Data linkage provides insight, informs policy change and helps answer society’s most important questions by increasing the utility of data.
It is integral to government operations, decision making and statistics. However, linkage presents challenges and more work needs to be done to realise its full benefits.
A new cross-government review set out to engage with the data linkage community across government, academia, the third sector and internationally to understand the challenges faced and identify state-of-the-art data linking methods. Published this week, the review found a number of challenges, including: how to assess the quality of linkage, how to link data effectively while maintaining privacy, and how to overcome the siloed nature of government linkage work. It also highlights that changes in data records over time make longitudinal linkage (linking data across time) particularly challenging.
The review contains a set of recommendations concentrated on improving data linkage methods and capability across government. Developing cross-government data linking networks and increasing collaboration with academia form a large part of these.
The recommendations of the review are to:
- build capability across government, including expanding the toolkit of data linkage courses, case studies and guidance
- improve collaboration across government, academia and internationally, including setting up a data linking network and organising linkage events
- conduct research on methods for Privacy-Preserving Record Linkage (PPRL), whilst carrying out linkage in-the-clear where possible to maintain quality
- work with networks to develop and maintain a quality culture when linking data, ensuring that quality metrics are produced and communicated to others
- conduct research on longitudinal linkage methods and quality metrics to understand how error progresses through the process and how to improve the linkage quality
- conduct research on the scaling method and compare with other linkage methods using large data sets
- conduct research on machine learning methods and their potential for government linkage, in terms of suitability and practicalities
- conduct research on scalable software solutions suitable for linking large data sets both within and across government departments, including Splink
- conduct research on graph databases for management of linked data sets
- explore options for producing test data that government and academia can use to test linkage methods, including new linkage algorithms and privacy-preserving techniques
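To make the recommendation on quality metrics more concrete, the sketch below shows one common way of summarising linkage quality when a set of known true matches is available, for example from clerical review or test data. It is an illustration only and is not taken from the review: the function name `linkage_quality`, the record identifiers and the choice of precision, recall and F-measure are all assumptions made for this example.

```python
def linkage_quality(predicted_links, true_links):
    """Summarise linkage quality against a gold standard of true matches.

    Both arguments are iterables of (id_in_dataset_a, id_in_dataset_b) pairs.
    The function name and the choice of metrics are illustrative assumptions.
    """
    predicted = set(predicted_links)
    truth = set(true_links)

    true_pos = len(predicted & truth)    # links made that are correct
    false_pos = len(predicted - truth)   # links made that are wrong
    false_neg = len(truth - predicted)   # true matches that were missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0

    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical record identifiers, made up for this example.
predicted = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")]
truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]

metrics = linkage_quality(predicted, truth)
print(metrics)
```

In this made-up example, two of the three links made are correct and two of the four true matches are found. Reporting figures like these alongside a linked data set is one way of communicating its quality to others.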
The review discusses how longitudinal data linkage opens up opportunities to maximise the utility of data over time, but emphasises that there are many potential sources of error during the data journey that need to be understood to ensure longitudinal linkage is fit for purpose. Louisa Blackwell and Nicky Rogers (Office for National Statistics) produced a paper for the review that discusses what to consider when linking longitudinal data to meet user needs, in the context of using longitudinal linkage to estimate international migration. The paper also seeks to offer a template for the statistical design of longitudinal data sets derived by linking administrative data. The authors have designed a longitudinal linkage error framework for use when linking for statistical purposes. They recommend using it to identify error sources in the data journey, so that these can be built into the statistical design, and to develop quality indicators for reporting on the statistical properties of the linked data set.
Read the full review, Joined up data in government: the future of data linkage methods