Missing those harder-to-reach: It’s time to make increased use of population data to bolster research equity

Andy Boyd (Director at the UK Longitudinal Linkage Collaboration) calls for appropriate use of population data to improve inclusiveness in longitudinal research. Here, he explains the findings of his recently published report into Population Data for Inclusive Research, commissioned by the Economic and Social Research Council (ESRC).

Almost all studies face challenges in ensuring inclusive sampling, recruitment and follow-up. This is not surprising: recruiting a diverse sample is difficult, and our participants’ health, work, family and social situations can introduce barriers to taking part. If these barriers are not addressed, some of the most vulnerable and marginalised groups, who are most in need of the benefits research can bring, are likely to be underrepresented and potentially further marginalised.

Our obligation to ‘missing’ communities

My report for the ESRC took me on a 24-month journey to scope appropriate mechanisms to use population data (the records that are collected on us all through interactions with health and other government services) to help deliver more inclusive longitudinal research. This was spurred by an international review panel, which recommended that the ESRC consider the viability of an ‘administrative data spine’ – effectively a population research register – for longitudinal research.

Interviews with experts and rapid literature reviews provided strong evidence that vulnerable and marginalised groups are disproportionately ‘missing’ from studies. Yet they also indicated that a ‘population register’ to support UK longitudinal research was likely to be neither feasible nor socially acceptable.

However, my research did identify a technical, legal and ethical basis for using population data to bolster inclusivity. It is clear that, as a research community, we have an ethical obligation to be fair and to strive for equitable research. Aligned with this, there is a distinct legal obligation (through equalities legislation) on UKRI – as funders of longitudinal research – for the UK’s longitudinal research strategy to be inclusive.

This is not to say that all studies need to be inclusive of all groups or adopt representative sampling; rather, across the portfolio of UK longitudinal research, there is a requirement to be inclusive, particularly of vulnerable and marginalised groups who are disproportionately ‘missing’ from studies.

Recommendations for research equity

My research found no sign that our community is systematically biased: indeed, I found the opposite, with studies making many efforts to connect with and involve harder-to-reach groups. However, these groups can face substantial barriers to taking part, such as those arising from unstable housing, from being a child in the care system, or from mental health challenges. Including them requires sustained support to build and retain relationships, with the right funding and training in place from the outset so that studies can make long-term commitments.

I recommended greater funding to help studies develop and implement ‘inclusion plans’, and that this be approached at a community level in order to share best practice, determine what works, and focus energy and resources.

While well-resourced and rigorous fieldwork and communications are key to addressing this challenge, more effective use of population data can help ensure that those currently missing are represented. My research suggests that, for this to succeed, sampling and initial recruitment, as well as the retrospective permissions needed to establish linkages in existing studies, should be based on opt-out approaches. There is a clear scientific, ethical and public benefit case for this, provided that there is a reasonable expectation of research benefits for these specific groups.

I specifically recommend the use of individual-level population data in study sampling and recruitment, rather than current approaches based on area-level data, as this will improve the precision with which invitations can be extended to diverse populations. This approach has been stymied by barriers to accessing individual-level data, such as the ‘consent for consent’ paradox, which limits access to contact details during recruitment. Yet some studies, such as the ‘Study of Early Education and Development’, have achieved this and realised substantial efficiencies.

I also recommended the development of a centralised research system for pooling and linking data from multiple studies: to generate the statistical power needed for sub-group analysis, to follow up outcomes for participants who have dropped out (attrited cases), and to assess inclusion holistically. This ambition is now embedded in the Population Research UK prospectus.

The new longitudinal studies proposed by the MRC, ESRC and the NHS provide a timely opportunity to make greater use of population data to drive inclusivity. Our Future Health’s public commitment to inclusivity through community building, and the HDR UK ‘Diversity in Data’ programme, are also welcome.

Post-pandemic practices

Since I developed these conclusions, the pandemic has changed data sharing practice and has likely shifted norms around the data research capabilities the UK needs.

Through the National Core Studies and other initiatives, the way individual records are used in research has changed, with improved flows of data coupled with innovation and investment in stronger safeguards. Two developments in particular stand out. Firstly, the model for ‘Trusted Research Environments’ (which provide a secure ‘reading library’ environment for analysis) has been refined and has gained traction. Secondly, the adoption of the ‘Five Safes’ framework for governing data use is helping to drive common governance standards, with the potential for increased efficiencies.

Through the Longitudinal Health & Wellbeing National Core Study, we, as a community of longitudinal studies, have established the UK Longitudinal Linkage Collaboration (UK LLC) as a national Trusted Research Environment. It hosts pooled data from more than 200,000 participants in more than 20 major UK interdisciplinary studies and systematically links participants’ survey data to their health and environmental records. Adding non-health administrative records would make this a truly cross-cutting and interdisciplinary resource, which could provide a means for those left out of direct follow-up to be included in research.

The case for public buy-in

Yet, we shouldn’t assume that the public and our study participants will accept these new ways of working. Recent years have seen the public accept the benefits of an ‘opt-out’ organ donation programme, but baulk at the notion of an ‘opt-out’ centralised GP database. The failed launch of the General Practice Data for Planning and Research database is an opportunity to learn more about the safeguards necessary for acceptable data use and how we can balance the benefits and burdens of research. It is reassuring to see some of those critical of the proposed database (e.g., some GPs, the media and privacy lobbyists) express support for the concept if established through a Trusted Research Environment, communicated properly to, and run with the involvement of, the public.

My recommendations provide some direction on how we can proceed as a research community to balance the benefits and burdens of our studies. There is an opportunity to distribute the risks fairly, with rigorous safeguards (the use of TREs, and secure, privacy-preserving approaches to sampling and recruitment) and meaningful benefits for the people who lend us their time, opinions, personal data, feelings and biological samples. We need to build public understanding of the value of our research and of the safeguards put in place. I recommend a long-term approach to this: introducing the concepts of longitudinal research, and more broadly of data science and research methods, into school-based learning.

We’ve long grappled with the ethical tension between opt-in and opt-out approaches to record linkage permissions, with participant trust being paramount. To take full advantage of the benefits that population data can bring to research equity, it may be time to reconsider the value of ‘opt-out’ approaches at a community level. With strong safeguards (the use of Trusted Research Environments, transparency in our work, and a clear social contract around data use), we have the opportunity, on the back of the pandemic, to build confidence in health and social data collection that is truly inclusive.

Note: This report was funded by the UK Strategic Priorities Fund as part of the ESRC’s ‘Population Data Laboratory’ programme. The UK LLC was established in 2020 by the UK Longitudinal Health and Wellbeing National Core Study as a centralised and interdisciplinary resource for linking established longitudinal studies and routine records for Covid-19 research.

Further information

Andy Boyd is Director of the UK Longitudinal Linkage Collaboration.

Suggested citation:

Boyd, A. (2022). ‘Missing those harder-to-reach: It’s time to make increased use of population data to bolster research equity’. CLOSER. 28 March 2022. Available at: https://www.closer.ac.uk/news-opinion/blog/missing-harder-to-reach/