Data from administrative records and longitudinal surveys contribute considerably to our understanding of society and help improve public policy and practice. Linking these two forms of data can yield even more insight, but is currently very challenging to carry out in practice.
The UK is one of only two countries in the world to host a number of longitudinal birth cohort studies that follow samples of participants from birth to death. These are fantastic at gathering detailed snapshots of their members’ lives at particular points in time.
The UK also collates a significant number of administrative datasets. These data are recorded close to real time in a systematic way and can cover the entire population. They are not collected for research purposes; typically they are created by government agencies to help with day-to-day administration. Examples include data about birth and death registrations, disease registries, outpatient and inpatient visits, educational attainment, and benefits, tax credits, earnings and pensions.
Both data sources (administrative and longitudinal) have traditionally been used in isolation. However, linking them offers potential that exceeds what each can offer on its own. Administrative records can provide enviable detail and frequency, but they rarely provide the richness and depth that come from survey data collected to tackle specific research questions. Combining the two can therefore yield considerable benefits by providing a more complete picture of study participants’ life stories.
A number of studies have demonstrated the benefit of linking administrative data to surveys. Here are some examples:
- The most famous study is perhaps that of Doll and Hill (1954), who surveyed doctors from the British Medical Association over time about their smoking habits and eventually linked this to their death registration data to obtain cause of death. This ground-breaking study was one of the first to show the link between smoking, lung cancer and cardiovascular disease.
- More recently, Campbell (2014) used the Millennium Cohort Study and data from the National Pupil Database to assess the impact of streaming on primary school children. She found evidence to suggest that teachers’ perceptions seem to be influenced by streaming in a way that advantages pupils in higher groups and penalises children in lower placements.
- Crawford and Greaves (2015) analysed data from the National Pupil Database and the Higher Education Statistics Agency. They found higher participation rates in higher education among a range of groups, including ethnic minorities, but the administrative records alone could not explain these ethnic differences. Exploring this further required linking individual administrative records to survey data collected as part of the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Longitudinal Study of Young People in England. This work, carried out by Bowes et al. (2015), yielded important insights about the interplay between gender, ethnicity and socioeconomic status and wider social, cultural, personal and economic factors.
Given the nature of the records involved, linking survey and administrative records brings many complexities and challenges. Consent from participants to access and link to their administrative records needs to be obtained, ideally at the outset of a study. Even with full consent, obtaining all the necessary permissions from the various government agencies to link two or more datasets can be a long and drawn-out process. Researchers then need to put measures in place to protect participants’ anonymity. Despite these challenges, the potential offered by combining administrative and survey data is invaluable. It is vital that the research community works together with data providers to overcome these challenges so we can help improve public policy and practice.
By Dr Efrosini Setakis,
Research Data Scientist, Centre for Longitudinal Studies
UCL Institute of Education