Key solutions to unlock the power of location data
With no standardised address identifiers across national property datasets and differing ways of recording information, UK property data remains fragmented, incomplete and inconsistent.
Unfortunately, this makes linking property data accurately for research and analysis fraught with difficulties.
A recent study - led by the Urban Big Data Centre (UBDC) at the University of Glasgow and funded by the Ordnance Survey (OS) as part of the development of the Public Sector Geospatial Agreement - explores some of the challenges in working with address-based data. It suggests ways reliability can be improved to help meet the UK’s geospatial strategic vision of ‘a coherent national location data framework’ by 2025.
The report shows how addresses are captured in many ways, with much scope for error or uncertainty: for instance, in the misspelling of place names, the use of abbreviations or punctuation, the separation of building and street names, and the varied writing of flat numbers.
As a result, it is difficult both to link information from different sources and to identify the location of properties accurately. However, tagging address-based data with the Unique Property Reference Number (UPRN) - now openly available for use from Ordnance Survey - can provide greater accuracy and efficiency in linking and analysing data.
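The kinds of variation described above (abbreviations, punctuation, differently written flat numbers) are commonly reduced by normalising address strings before any matching is attempted. The following is a minimal Python sketch of that idea, not the study's own rules: the `ABBREVIATIONS` table and the `normalise_address` function are hypothetical names introduced here for illustration, and a real rule set would be far larger.

```python
import re

# Hypothetical abbreviation table, for illustration only.
# Note the ambiguity a real rule set must handle: "st" can also mean "saint".
ABBREVIATIONS = {
    "rd": "road",
    "st": "street",
    "ave": "avenue",
    "apt": "flat",
    "apartment": "flat",
}

def normalise_address(raw: str) -> str:
    """Lower-case, strip punctuation, expand abbreviations, collapse whitespace."""
    text = raw.lower()
    # Replace every run of non-alphanumeric characters with a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    # Expand each token if it appears in the abbreviation table.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)

# Two differently written versions of the same address normalise to one string.
print(normalise_address("Flat 2, 10 High St."))
print(normalise_address("APT 2 10 High Street"))
```

Normalisation of this kind only makes strings comparable; it cannot recover information that is missing or wrong in the source record.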
Using two property datasets covering England and Wales, the study identified the following issues:
- The first dataset - Domestic Energy Performance Certificates (Domestic EPCs) - already had UPRNs attached to 93% of cases. A more extensive rules-based approach could improve this to a match rate of up to 96%.
- With the second dataset - the Land Registry (LR) Price Paid Data (PPD) - the same rate of 96% could be achieved but with less time and effort because the address information was more structured. However, this required the development of a new rules-based approach.
- In both datasets, there was a small proportion of addresses where no match was possible because the address information was incomplete or incorrect.
- Flats were particularly problematic as there is no common standard of writing flat numbers.
- UPRNs have a hierarchical structure for some properties with a ‘parent’ UPRN for the whole building or block, and ‘child’ UPRNs for individual units or flats within it. This can lead to errors when the wrong level of UPRN is attributed to a property.
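The parent and child relationship in the last point can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the UPRN values are made up, and `PARENT_OF`, `building_level_uprn` and `is_parent` are hypothetical names, not part of any OS or GeoPlace product.

```python
from typing import Dict

# Hypothetical child -> parent mapping with made-up UPRNs.
# Two flats share one parent UPRN that represents the whole block.
PARENT_OF: Dict[str, str] = {
    "100000000002": "100000000001",
    "100000000003": "100000000001",
}

def building_level_uprn(uprn: str, parent_of: Dict[str, str]) -> str:
    """Return the recorded parent UPRN, or the UPRN itself if it has no parent."""
    return parent_of.get(uprn, uprn)

def is_parent(uprn: str, parent_of: Dict[str, str]) -> bool:
    """True if at least one other UPRN lists this one as its parent."""
    return uprn in parent_of.values()

# A flat-level record matched to a parent UPRN may have landed at the wrong level.
matched = "100000000001"
if is_parent(matched, PARENT_OF):
    print("matched at building level; check whether a flat-level UPRN was intended")
```

A check like this does not say which level is correct. That depends on what the dataset records (a whole block or an individual flat), which is why the choice of 'target' UPRN needs to be made explicitly.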
The Enriching address-based data with UPRNs study suggests some key solutions to help unlock the power of location data with increasingly standardised property datasets:
- GeoPlace - the central source of information for all UK addresses - should provide guidance on how the UPRN system works in terms of the lifecycle of UPRNs and the structures of parent and child relationships so that different data owners can be clear what their ‘target’ UPRN is in any situation.
- Data collection should be based on selection from fixed address lists which are derived from the OS AddressBase system so that UPRNs can be attached at the point of creating the records, not added through address matching.
- Data owners should agree a standard methodology for collecting and storing address data.
- Ordnance Survey should continue to improve the AddressBase product, which matches Royal Mail postal addresses to Unique Property Reference Numbers (UPRNs). While the quality of addresses stored in AddressBase is very high, there are still some errors and inconsistencies in the data.
- There should be an agreed way of dealing with parent-child property relationships with UPRNs and rules on how to deal with retired properties.
The low quality of the information in the address data is the main obstacle to achieving higher rates of accurate UPRN linkage. Controlling and standardising the quality of address curation in property-based databases offers the most effective way of improving UPRN linkage accuracy.
Read the study in full (OS Briefing Report):
Enriching address-based data with UPRNs
Full technical report:
Learning from Domestic EPC and Land Registry PPD datasets
Project R code: