What is DataLoch?
DataLoch is a secure data service that supports health and social care priorities and is funded under a ten-year Data-Driven Innovation (DDI) programme. DataLoch has been developed in partnership between NHS Lothian and the University of Edinburgh. Find out more about the DDI programme on their website.
In the DataLoch team, we believe that putting data at the centre of responses to health and care system challenges is critical to improving services through research, innovation and planning.
Our approach will lead to better decision making, research, and support for colleagues on the front line. We will do this by:
- bringing together health and social care data for the South-East Scotland region;
- working with experts in health and social care to understand and improve this data; and
- providing safe access to data for researchers.
Why use routinely collected data for research?
Using routinely collected data for research can enable population-wide insights over a significant time-scale as well as being a more efficient use of resources.
A key benefit of using routinely collected data for research is that genuine population-wide insights are possible because of the large sample size (i.e. potentially anyone who uses services). Routinely collected data therefore have a low risk of ‘selection bias’: characteristics from throughout society are included in the research data, so the data are representative of the entire population. Where research depends on recruiting people, there is usually an element of bias that is difficult to fully recognise and address, and this bias affects the results.
Also, in recruitment-based approaches, there is a risk that people will withdraw from the study over time for a wide variety of reasons. Participants may die, or may move away and become uncontactable, which reduces the available data. This is especially problematic for research looking at rare diseases. “Dropping out” still happens when using routinely collected data, but the reasons are more readily known because these data are captured routinely.
Finally, by using data that are already there within our health and social care services, there are potential time-savings, which would speed up the process of introducing new innovations into frontline services.
Therefore, by using routinely collected data, there is a significant opportunity for ensuring results work for all and enabling more equitable impacts on public health policies and health care services in a timely manner.
What are the issues in using routinely collected data for research?
Practical issues such as the implementation of different IT systems across health care systems and potentially different ways of recording data (such as the use of different codes) make the linkage of data time-consuming. Services like DataLoch seek to do this linkage once, thus avoiding the duplication of effort that exists for different researchers repeating similar steps for their individual studies.
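As an illustration of why differing codes slow down linkage, the sketch below harmonises one field from two source systems before joining records on a shared key. It is a minimal example only: the field names, the code mappings and the `person_id` key are hypothetical and are not taken from DataLoch's actual systems.

```python
# Hypothetical local codings for the same field in two source systems.
SEX_CODES = {
    "hospital": {"1": "male", "2": "female"},
    "gp": {"M": "male", "F": "female"},
}


def harmonise(record: dict, source: str) -> dict:
    """Translate a source-specific sex code into a shared vocabulary."""
    out = dict(record)
    out["sex"] = SEX_CODES[source].get(record["sex"], "unknown")
    return out


def link(hospital_records: list, gp_records: list, key: str = "person_id") -> list:
    """Merge records from both sources that share the same key value."""
    gp_by_key = {r[key]: r for r in gp_records}
    linked = []
    for h in hospital_records:
        g = gp_by_key.get(h[key])
        if g is not None:
            # Fields from both sources are combined into one record.
            linked.append({**g, **h})
    return linked


hospital = [{"person_id": "A1", "sex": "2", "admissions": 1}]
gp = [
    {"person_id": "A1", "sex": "F", "gp_visits": 4},
    {"person_id": "B2", "sex": "M", "gp_visits": 2},
]

linked = link(
    [harmonise(r, "hospital") for r in hospital],
    [harmonise(r, "gp") for r in gp],
)
```

Doing this harmonisation once, centrally, is what saves each research team from writing and checking its own version of the mapping.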
Also, routine data are collected for clinical or service delivery purposes, and their use for research is secondary. The data collection process has therefore not been designed for a specific research project, so some degree of processing is required. In contrast, data collected to support clinical trials are generally designed specifically for that research.
Finally, there is usually no direct individual consent process for the use of routine data for research, so there need to be strict security and governance procedures in place to ensure that confidentiality around the identity of individuals is maintained. For how DataLoch achieves this, see our Data Security page.
What sorts of data will DataLoch host?
DataLoch brings together routine data collected as part of people’s day-to-day interactions with health and social care services. These data include the types of services used, details of visits to hospitals or GPs, treatments and medicines, as well as outcomes and test results.
These data enable researchers to answer important questions about how to improve people’s wellbeing. There are many examples of how health data research has delivered insights to help solve challenging health problems, including diagnosing rare diseases, improving the performance and equity of care, identifying diseases early, and assessing the effectiveness of health systems. Read about some of the projects DataLoch has supported.
What is the legal basis for DataLoch's processing of these data?
NHS Lothian is the Data Controller of the data currently hosted by DataLoch. This means NHS Lothian exercises overall control over the purposes and means of the processing of personal data. Hence, DataLoch processes health care data in accordance with the NHS Lothian DataLoch Privacy Notice.
Who will have access to the data that DataLoch hosts?
Data hosted by DataLoch are accessed by NHS staff and those within the DataLoch team who are working to curate, link and de-identify data to prepare it for projects. De-identifying means removing aspects that can directly identify an individual (like names, addresses, and date of birth) and may include aggregating information into ranges (e.g. between ages 25-50) or suppressing rare conditions.
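The de-identification steps described above can be sketched in a few lines. This is an illustrative example only, not DataLoch's actual process: the field names, the ten-year band width and the reference date are all assumptions.

```python
from datetime import date

# Hypothetical field names; the real schema is not described in this FAQ.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def age_band(age: int, width: int = 10) -> str:
    """Return a range such as '30-39' instead of an exact age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def age_on(date_of_birth: date, reference: date) -> int:
    """Age in whole years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return reference.year - date_of_birth.year - (0 if had_birthday else 1)


def de_identify(record: dict, reference: date) -> dict:
    """Drop direct identifiers and keep only a coarse age band."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["age_band"] = age_band(age_on(record["date_of_birth"], reference))
    return cleaned


record = {
    "name": "Example Person",
    "address": "1 Example Street",
    "date_of_birth": date(1985, 6, 15),
    "diagnosis": "asthma",
}
result = de_identify(record, reference=date(2024, 1, 1))
```

Here `result` keeps the diagnosis and an age band but no name, address or date of birth.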
Prior to our full launch in July 2022, DataLoch applicants for access to specific data extracts included academic researchers and health and social care service managers. Now, we have developed the appropriate governance framework – in association with NHS Lothian – to allow approved researchers from third- and private-sector organisations to also apply for access.
Any person wishing to access extracts of the data must follow an approved application process and complete relevant training as described within the Charter for Safe Havens in Scotland definition of an approved researcher. This requires applicants to meet a number of key governance criteria to ensure their purposes are legitimate and in the public interest.
In all cases, applications will be processed according to the standards set out by the NHS Health Research Authority, National Data Guardian, and regulatory bodies including the Information Commissioner’s Office (ICO).
How will access to data in DataLoch be controlled?
Any person wishing to access extracts of the data hosted by DataLoch will need to meet a number of key governance criteria to ensure their purposes are legitimate and in the public interest.
Each project is scrutinised by NHS employees to ensure the request is proportionate and appropriate. Public Value Assessments from our Public Reference Group ensure that proposals are in the public interest. Researchers will access and analyse the minimal amount of data required to answer research questions within our secure data environment, which is dedicated to protecting the confidentiality of data and meets the best practice national standards for access and information security.
As well as supporting researchers, DataLoch also supports requests from the NHS team for access to data to help them understand and support service improvements. Again, these projects are approved through DataLoch’s governance process, and data are then made accessible to NHS colleagues.
Data are archived and deleted according to NHS Lothian record management policies. The retention periods vary according to the type of project.
How will DataLoch ensure that research will be transparent and focused on the public good?
Details of all research projects that are approved to access data extracts through the DataLoch service are available through our Projects Delivered page.
To establish whether applications are focused on the public good, Public Value Assessments from our Public Reference Group ensure that each approved project is in the public interest. Each project also goes through careful scrutiny by NHS employees trained in data privacy, to ensure the request is appropriate and proportionate. Depending on the specific purposes and data required to support the project, ethical approval may also be required.
For more details see our Data Security page.
Are the data accessed in DataLoch identifiable?
Data extracts accessed by researchers are de-identified, meaning aspects that can directly identify an individual (like names, addresses, and date of birth) are removed. This process is also called pseudonymisation. Data extracts accessed by researchers are also minimised, which means we provide no more data than the minimum needed to fulfil the specific approved application.
Before giving access to researchers, the DataLoch team also assesses the likelihood and impact of someone’s identity being inferred, for example from a rare condition or unique combination of information. We take steps to avoid this kind of inferred identification, for example aggregating information into ranges (e.g. between ages 25-50) or withholding data.
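One common way to reduce the risk of inferred identification is to withhold values that appear only a handful of times in an extract. The sketch below shows the idea under stated assumptions: the threshold of 5 and the placeholder text are illustrative choices, not DataLoch's documented rules.

```python
from collections import Counter


def suppress_rare(values: list, threshold: int = 5, placeholder: str = "SUPPRESSED") -> list:
    """Replace any value seen fewer than `threshold` times with a placeholder."""
    counts = Counter(values)
    return [v if counts[v] >= threshold else placeholder for v in values]


# Illustrative data: two common conditions and one that appears only once.
conditions = ["asthma"] * 6 + ["diabetes"] * 5 + ["rare condition X"]
safe = suppress_rare(conditions)
```

In this example the single occurrence of the rare condition is replaced, while the more common values are left unchanged.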
While the process varies with every project, it is designed (along with other controls such as the use of our secure data environment) to minimise the risk of researchers being able to identify individuals represented in the data without prejudicing the goals of the research.
A useful resource about the spectrum of identifiable to anonymous information is described by Understanding Patient Data in their ‘identifiability explainer’. Further details on how we keep data safe and protect individual identities can be found on our Data Security page.
How will the DataLoch team ensure its service is secure from cyber-attack or unauthorised use?
Data are held within NHS Lothian servers behind the NHS Lothian firewall. Access to these data outside of the NHS is only permitted via a secure data environment such as the Scottish National Safe Haven. This infrastructure is one of several Safe Havens across Scotland already dedicated to protecting NHS information and which are required to meet the best practice national standards for access and information security.
As with many other public data services, DataLoch has adopted the Five Safes Framework to design its governance procedures and ensure the security of the data we host. Further information, including a brief animation describing how this works, can be found on our Data Security page.
The DataLoch team has a Data Protection Impact Assessment in place to help identify and minimise any data protection risks. This is continually monitored and modified as the DataLoch service continues to develop, in consultation with other parties that may contribute data to DataLoch.
Have patients consented to the usage of their personal health data?
The data hosted by DataLoch represent unconsented patient information derived from NHS sources. These data contain identifiable information to enable data from different sources to be linked. Extracts of these data that are prepared for specific projects and accessed by others have identifiable information removed. Legislative and governance provisions exist for the re-use of these data under controlled circumstances for specific purposes. These purposes are detailed within the NHS Lothian DataLoch Privacy Notice. DataLoch processes data according to this Privacy Notice, which contains a description of your data-protection rights within NHS Lothian and details how to contact NHS Lothian should you have any queries.
When will DataLoch be available for applications?
After two years of development, we fully launched the DataLoch service in July 2022.
As well as service-management requests from the NHS, DataLoch also considers applications from researchers who wish to securely access health and social care data from the South-East Scotland region. Researchers can be from private- and third-sector organisations, as well as from academic or clinical settings. Applicants and applications need to meet a number of key governance criteria to ensure their purposes are legitimate and in the public interest.
What is the DataLoch team doing specifically in relation to COVID-19?
In March 2020, colleagues from NHS Lothian and the University of Edinburgh asked the DataLoch team to help in the production of a dedicated COVID-19-linked dataset. The request was intended to support immediate hospital-based service management and to provide a data asset for active and anticipated regional and national research into the outbreak.
The DataLoch team, working in collaboration with clinicians on NHS Lothian data, built a linked COVID-19 dataset for use by approved researchers on 30 April 2020. This dedicated dataset was retired as a separate entity in June 2022, with the contributing datasets being incorporated for service and research use within the main DataLoch repository.
Will data be shared with third-sector or private-sector organisations?
After two years of development, we fully launched the DataLoch service in July 2022.
As well as service-management requests from the NHS, DataLoch also considers applications from researchers who wish to securely access health and social care data from the South-East Scotland region. Researchers can be from private- and third-sector organisations, as well as from academic or clinical settings.
The application process includes an assessment of whether the project will benefit patients and is in the public interest. For example, partnerships between the NHS and private- or third-sector organisations can result in new healthcare technologies and treatments and medical devices that support better outcomes for patients.
See “How will DataLoch ensure that research will be transparent and focused on the public good?” above for further details on how we hold all applicants and applications to the same high standards.
What is the Five Safes Framework and how does it apply to DataLoch?
The Five Safes Framework is a set of principles adopted by a range of centres providing secure access to data around the world, including the Office for National Statistics Secure Research Service.
The Five Safes are:
- Safe Projects – Data are only used for valuable, ethical research that delivers clear public benefit;
- Safe People – Researchers are trained for safe handling of data;
- Safe Settings – Access to data is via secure technology systems;
- Safe Data – Researchers use de-identified extracts that contain the minimum amount of data needed to fulfil the purpose of their project; and
- Safe Outputs – All research outputs are checked to ensure they cannot indirectly identify people.
The DataLoch team has incorporated the Five Safes in our overall strategy on keeping data secure. Assessing each aspect of the framework individually and as a whole is part of how we ensure data are hosted and accessed safely.
How has the DataLoch service been financed?
DataLoch has received supportive funding through the Edinburgh and South East Scotland City Region deal (see the next FAQ) and the Chief Scientist Office, Scottish Government.
Also, for any project, there is a charge for the required support and infrastructure costs (such as time for data preparation and use of computing resources) to cover our operating costs: DataLoch is a non-profit service. This charge varies based on the complexity of the project and the type of funding and/or organisation. There is no charge for projects related to NHS service management. All successful applications are published on our Projects Delivered page when researchers have access to the approved data.
What is the Edinburgh and South East Scotland City Region Deal?
DataLoch is a secure data service that supports health and social care priorities and is funded under a ten-year Data-Driven Innovation (DDI) programme as part of the Edinburgh and South East Scotland (ESES) City Region Deal. Finalised in August 2018, the ESES City Region Deal is a UK and Scottish Government-led investment designed to accelerate productivity and inclusive growth in the region through the funding of infrastructure, skills and innovation.
The regional partners of the City Region Deal include three NHS Boards (Lothian, Borders and Fife), six local authorities (City of Edinburgh, Midlothian, East Lothian, West Lothian, Fife and the Scottish Borders), plus regional universities and colleges. Find out more about the Edinburgh and South East Scotland City Region Deal.