DataLoch has incorporated a significant selection of HDR UK Phenotypes within our repository. Our aim is to ease the burden on researchers who want to use DataLoch-hosted data in their projects.
What is an HDR UK phenotype?
In DataLoch, these phenotypes are health conditions or diseases – such as diabetes or hypertension – which we have drawn from the Health Data Research UK (HDR UK) Phenotype Library.
Non-disease phenotypes, such as measurements, are currently not within our repository, but we often access these from other data sources (e.g. our comprehensive laboratory results dataset). The phenotypes currently in DataLoch are those marked with a “CALIBER” label.
What is the value of using an HDR UK phenotype?
Health records are extremely complex. A record of a health condition or disease in an individual could be held in any of several datasets: GP records, hospital records, or, in some cases, only a death certificate.
Furthermore, there are multiple ways of describing a condition: for example, there are 42 different GP (‘Read’) codes that could be used to describe asthma.
The purpose of the HDR UK phenotypes is therefore to make diseases easier for researchers to identify, and to improve standardisation so that work in different datasets becomes more comparable. Each phenotype contains codelists of all the possible ways to describe a condition across GP, hospital, and death records. For example, a researcher interested only in patients with asthma could select the HDR UK phenotype, rather than trying to find and list every possible asthma code from each data source.
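As a minimal sketch of this idea, a phenotype can be pictured as a set of codelists keyed by data source. The codes, field names, and the `has_phenotype` helper below are invented placeholders for illustration; they are not real Read or hospital codes and do not reflect DataLoch's actual data structure.

```python
# Hypothetical phenotype: one codelist per data source.
# The codes are made-up placeholders, not real clinical codes.
ASTHMA_PHENOTYPE = {
    "gp": {"GP_A1", "GP_A2", "GP_A3"},
    "hospital": {"HOSP_A1"},
    "death": {"DEATH_A1"},
}


def has_phenotype(records, phenotype):
    """Return True if any record's code is in the codelist for its source."""
    return any(
        rec["code"] in phenotype.get(rec["source"], set())
        for rec in records
    )


# Illustrative records for one person (hypothetical structure).
patient_records = [
    {"source": "gp", "code": "GP_X9"},         # not in the GP codelist
    {"source": "hospital", "code": "HOSP_A1"},  # matches the hospital codelist
]

print(has_phenotype(patient_records, ASTHMA_PHENOTYPE))
```

The researcher only names the phenotype; the per-source codelists do the matching.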
What are the limitations?
Currently, DataLoch does not host all GP records. Therefore, if researchers select an HDR UK phenotype and want to use GP data, that data will only be available for people registered with a DataLoch GP practice – approximately 85% of the Lothian population. If GP data is critical for analysis, researchers must restrict their study population to people in these GP practices. Otherwise, patients could be incorrectly assumed not to have a condition simply because they do not appear in the GP data.
Also, the phenotypes are designed to simplify research approaches, and therefore may lose too much information for some studies. For example, all codes for heart attacks (myocardial infarction) are combined into a single phenotype, so using it will not provide a breakdown by type of myocardial infarction.
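The restriction on GP data can be sketched as follows. This is an illustration only: the `dataloch_gp` and `gp_asthma_code` fields and the `classify` helper are hypothetical names, not part of any DataLoch dataset.

```python
# Hypothetical study population. 'dataloch_gp' flags whether the person is
# registered with a GP practice whose records DataLoch hosts.
people = [
    {"id": 1, "dataloch_gp": True, "gp_asthma_code": True},
    {"id": 2, "dataloch_gp": True, "gp_asthma_code": False},
    {"id": 3, "dataloch_gp": False, "gp_asthma_code": False},  # GP data unavailable
]


def classify(person):
    """Return 'condition', 'no condition', or 'unknown' based on GP data."""
    if not person["dataloch_gp"]:
        # No code here means missing data, not absence of disease.
        return "unknown"
    return "condition" if person["gp_asthma_code"] else "no condition"


# Restrict the cohort to people whose GP data is actually available.
cohort = [p for p in people if p["dataloch_gp"]]

print([p["id"] for p in cohort])
print({p["id"]: classify(p) for p in people})
```

Person 3 is excluded from the cohort rather than counted as not having the condition.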
However, the standardisation of these HDR UK phenotypes is helpful for aligning research with others across the UK. These phenotypes have been used to describe the population prevalence of multiple conditions across four million people in England, as described in this Lancet Digital Health article.
How might researchers use the HDR UK phenotypes?
There are three broad areas where the HDR UK phenotypes could help:
1. To define a population of interest
If researchers are interested in a particular condition that is already an HDR UK phenotype, they could use this to select their study population, rather than providing their own codelists. This would still require some choices about dates of interest and sources (e.g. GP, hospital, or death data, or a combination of these), but it would provide an excellent starting point.
2. To define the baseline data of a cohort
If researchers are defining their study population in a different way, they could still use the HDR UK phenotypes to help provide a simpler view of baseline characteristics. They might want to report the frequency of common conditions (e.g. diabetes, heart disease, dementia, cancer) in their study population at the start of the study period of interest. The HDR UK phenotypes could be used to look for codes in any source that were recorded before the start date of the study.
3. To provide outcomes for a cohort
Similarly, HDR UK phenotypes may be used to measure outcomes if the codes appear after a date of interest. For example, if researchers are interested in dementia as an outcome for their study population, a new code from the HDR UK dementia phenotype codelist appearing after a defined study date could be used to identify that outcome across the study population.
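The baseline and outcome uses described above differ only in which side of the study date a code falls on. The sketch below illustrates this under stated assumptions: the codes are invented placeholders, the record structure and the `split_by_study_start` helper are hypothetical, a code recorded on the start date itself is treated as an outcome, and a "new" code is taken to mean one in a person with no matching code before the start date.

```python
from datetime import date

# Hypothetical placeholder codes standing in for a phenotype codelist.
DEMENTIA_CODES = {"DEM_1", "DEM_2"}

STUDY_START = date(2020, 1, 1)  # assumed study start date


def split_by_study_start(records, codelist, start):
    """Return (baseline, outcome) flags for one person.

    baseline: a matching code was recorded before the start date.
    outcome:  a matching code was recorded on or after the start date
              in a person with no matching code before it.
    """
    dates = [r["date"] for r in records if r["code"] in codelist]
    baseline = any(d < start for d in dates)
    outcome = (not baseline) and any(d >= start for d in dates)
    return baseline, outcome


# Illustrative people (hypothetical records).
person_a = [{"code": "DEM_1", "date": date(2018, 5, 1)}]   # coded before start
person_b = [{"code": "DEM_2", "date": date(2021, 3, 10)}]  # first coded after start
person_c = [{"code": "OTHER", "date": date(2021, 3, 10)}]  # no matching code

for person in (person_a, person_b, person_c):
    print(split_by_study_start(person, DEMENTIA_CODES, STUDY_START))
```

In a real study, the choice of sources and the handling of the start date itself would need to be decided for the specific research question.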
Together, the HDR UK phenotypes offer great potential for speeding up the selection of conditions that are relevant to researcher interests, substantially improving the application process.