An important aspect of DataLoch’s mission is to support research into the social determinants of health: non-medical factors that influence a person’s health and wellbeing. The characteristics of where a person lives – urban or rural setting; distance from health facilities and other amenities; crime rates; and so on – can all greatly impact an individual’s health. These geospatial (location-related) data can therefore offer important context when undertaking health research.
To preserve individuals’ confidentiality, DataLoch does not provide identifiable address data to researchers. Instead, we derive the geospatial data needed to give the insights required for specific research needs. However, producing accurate locations to underpin the geospatial data is more challenging than it might initially seem.
Obstacles to location data caused by source datasets
The key issue when deriving location data is incomplete or inconsistent address data in the original source. This can be caused by spelling errors or typos, different numbering conventions (e.g. “Flat 15, 3 Random Street” or “3/15 Random Street”), or only the name of a specific building being provided.
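To illustrate the numbering problem, the following is a minimal Python sketch that rewrites the “Flat 15, 3 Random Street” form into the “3/15 Random Street” form so that both spellings compare equal. The function name and the regular expression are hypothetical and chosen only for this example; this is not how DataLoch or FLAP handle addresses, and real address data would need many more rules.

```python
import re

# Hypothetical pattern: "Flat <flat>, <building> <street>" (comma optional).
FLAT_FIRST = re.compile(
    r"^\s*flat\s+(\w+)\s*,?\s+(\d+\w*)\s+(.+?)\s*$",
    re.IGNORECASE,
)

def normalise_flat_notation(address: str) -> str:
    """Return the address in '<building>/<flat> <street>' form when it
    starts with 'Flat <flat>'; otherwise only collapse extra whitespace."""
    match = FLAT_FIRST.match(address)
    if match is None:
        return " ".join(address.split())
    flat, building, street = match.groups()
    return f"{building}/{flat} {' '.join(street.split())}"

a = normalise_flat_notation("Flat 15, 3 Random Street")
b = normalise_flat_notation("3/15  Random Street")
print(a == b)
```

A rule like this only covers one of the inconsistencies described above; spelling errors and building-name-only records need a different approach.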
For care home residents, this issue is particularly stark. A clinician is less likely to record the complete address for a care home resident than for someone living in a private residence. (The assumption is that other practitioners accessing the patient notes would know where, for example, “Random Care Home” is, so the care home name alone would be sufficient for clinical purposes.)
To overcome the problem of incomplete addresses, we have incorporated Unique Property Reference Numbers (UPRNs). Just as the Community Health Index (CHI) number is an identifier for every person who engages with health services in Scotland, so the UPRN allows every addressable location in the UK (including bus stops) to be reliably identified, even if the postal address changes over time or can be written in more than one way.
Using UPRN to improve location data
We have used an algorithm called FLAP (a Framework for Linking free-text Addresses to Ordnance Survey UPRN database), developed by partners in the Advanced Care Research Centre at the University of Edinburgh, to attempt to match all Lothian addresses to their correct UPRN. This first step resulted in a 75% matching success rate.
For a specific service-management project looking at care home locations, we then carried out a series of pattern-matching steps against care home names and related identifiers. For example, we searched the remaining list of addresses for more obvious indicators of care homes, such as the phrases “care home”, “nursing home” and “c/h” (short for care home). Once found, these additional addresses were assigned an accurate UPRN.
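The phrase search described above can be sketched in a few lines of Python. This is an illustrative example only: the pattern, the function name and the sample addresses are assumptions made for this sketch, and the actual DataLoch steps involved further identifiers and checks that are not shown here.

```python
import re

# Indicator phrases taken from the examples in the text; a real list
# would likely be longer. Matching ignores case and extra spaces.
CARE_HOME_PATTERN = re.compile(
    r"\b(?:care\s+home|nursing\s+home|c/h)\b",
    re.IGNORECASE,
)

def looks_like_care_home(address: str) -> bool:
    """Return True if the free-text address contains a care home indicator."""
    return CARE_HOME_PATTERN.search(address) is not None

# Made-up addresses standing in for those still without a UPRN.
unmatched = [
    "Random Care Home",
    "Flat 15, 3 Random Street",
    "Random Nursing  Home, Random Road",
    "Random C/H",
]

candidates = [a for a in unmatched if looks_like_care_home(a)]
print(candidates)
```

Addresses flagged this way are only candidates; each would still need to be linked to the correct property before a UPRN is assigned.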
The result is that we can now accurately – using a combination of our UPRN-informed location data plus information published by the Care Inspectorate – define the care home population for our region. This development offers an important foundation for a variety of projects, such as those focussed on improving the quality of care for residents or informing strategies for promoting the independence of people in later life.
Our next geospatial data priorities
Defining the care home population has been a significant challenge, and we have learnt a lot about geospatial data through the process. Our next priority is to expand our geospatial data to enable links with climate and pollution measurements, as well as proximity to urban green (parks and gardens) and blue (canals and rivers) spaces. Achieving these objectives would open the possibility for research to explore the impacts of these on health and wellbeing, and in doing so substantially add to the opportunities for novel research through everyday health data.
Contact us
If you are a researcher interested in including geospatial data in your work, please connect with us.