ITALO lunchtime webinar: Creating address-based linkages

This event will explore how we create links from address matching.  

Abstract

Place-based information is often recorded in the live systems from which data can be extracted for analysis and linkage. It has the potential to provide detail about where people live, work, attend education or training courses, and request or receive services (such as health and social care). It can also provide more recent insight into household composition as time elapses since the previous population census. An emerging theme in linked data research is to mobilise place-based information both for individual matching purposes and for household inference. The webinars will introduce a few of the innovative approaches in this domain:

  • The creator of Splink, an open-source record linkage library, will present a newly released package `uk_address_matcher`
  • SeRP work on dual indexing for geographic information
  • Applied work using address-level linkages and household units from the GROVE project (Governance for household-level environment and health data)
  • Applied work on health and criminal justice events for a cohort of households that experience substance misuse

 

Presentation 1: `uk_address_matcher`
This free Python package for address matching and geocoding was released this month. It has three aims: simplicity, speed and accuracy. Its authors, Robin Linacre and Tom Hepworth, have published reproducible accuracy benchmarks using publicly available labelled datasets, which allows the package to be compared head-to-head with other approaches.

 

Key features:

  • Python only. Set up in seconds, runs on a laptop. No separate infrastructure or services needed.
  • Automated build pipeline for users wishing to match to Ordnance Survey data.
  • Fast. Match 100,000 addresses in ~30 seconds. The full end-to-end process from raw Ordnance Survey data to 100k matched addresses can be completed in less than a minute if matching to a small geographic area such as a local authority, and about 11 minutes for the whole UK (including one-time setup). When matching to the whole UK, subsequent matching runs of an additional 100k records take less than a minute.
  • Docs: https://lnkd.in/e9MHKZDZ
  • Discussion forum: https://lnkd.in/eU_4MP46
  • Source code: https://lnkd.in/eUZ8ZguU

 

Presentation 2: Moving towards open UPRN retrieval  

This talk describes work within SAIL and SeRP towards more transparent address matching and UPRN retrieval through the adoption and evaluation of ASSIGN, an open-source, rule-based algorithm. ASSIGN uses transparent rules, including Levenshtein distance comparisons, to handle common issues such as misspellings, equivalent terms, pluralisation and postcode errors, and it returns both a match prediction and the rules applied. Using AddressBase Premium as a source, the system supports high-performance, real-time lookup and is available as a hosted service via Endeavour Health Trust, while its open nature allows for self-hosting in air-gapped environments. We discuss our experience of self-hosting and evaluating ASSIGN compared with the previous black-box approach across several population-level datasets, as well as the impact of address matching on residential and household linkages.
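
For readers unfamiliar with rule-based matching, the general idea of returning a match prediction together with the rules applied can be sketched in a few lines of Python. This is an illustrative toy only and is not ASSIGN's actual rule set or code: the `EQUIVALENTS` table, the rule names, the `max_edits` threshold and the function names are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical equivalent-term table for illustration; not taken from ASSIGN.
EQUIVALENTS = {"rd": "road", "st": "street", "ave": "avenue"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalise(address: str) -> Tuple[List[str], List[str]]:
    """Lower-case and tokenise an address, expanding equivalent terms.

    Returns the tokens and the names of any rules that fired.
    """
    tokens, rules = [], []
    for raw in address.lower().replace(",", " ").split():
        token = raw.strip(".")
        if token in EQUIVALENTS:
            token = EQUIVALENTS[token]
            rules.append("equivalent_term")
        tokens.append(token)
    return tokens, rules


def match(candidate: str, reference: str, max_edits: int = 1) -> Tuple[bool, List[str]]:
    """Return (is_match, rules_applied) for a candidate against a reference address."""
    cand_tokens, rules = normalise(candidate)
    ref_tokens, _ = normalise(reference)
    if len(cand_tokens) != len(ref_tokens):
        return False, rules
    for cand, ref in zip(cand_tokens, ref_tokens):
        if cand == ref:
            continue
        # Tolerate small spelling differences in words, but never in numbers.
        if cand.isalpha() and ref.isalpha() and levenshtein(cand, ref) <= max_edits:
            rules.append("misspelling")
            continue
        return False, rules
    return True, rules


print(match("12 Hgh St, Cardiff", "12 High Street, Cardiff"))
```

Because every decision is tied to a named rule, a reviewer can see why two addresses were linked, which is the kind of transparency a black-box matcher does not offer.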

 

Other presentations will take place in a future webinar on 30th April. Details to be circulated nearer the time.

 

Who should attend: Users of linked data, methodologists working in data linkage, data scientists, data owners and controllers, funders supporting linked data research or data linkage methodology, and members of the public with an interest.

Collaborators: Improving Transparency Around Linkage Outputs (ITALO, DARE UK interest group), UK Data Linkage Community (UK DLC, DARE UK interest group). This webinar is supported by ITALO funding through DARE UK.

Agenda

13:00 Welcome
13:05 Presentation 1
13:20 Presentation 2
13:35 Discussion
13:55 Wrap-up

 

Registration is free.

Event Details

Wednesday 25 March 2026
13:00 - 14:00
Online
Public
Free

Event Speakers

Robin Linacre
Ministry of Justice
Dr Mike Edwards
Secure eResearch Platform, SeRP

Organiser

Rees Centre, ITALO and UK DLC, supported by funding through DARE UK