Pioneers in data discoverability – the CLOSER metadata search platform

In 2012 a vision of a new and unique research tool was born – a uniform search platform that allowed researchers to search, discover and explore research data from multiple ESRC and MRC-funded longitudinal population studies.

Ten years later, that vision has been realised – CLOSER Discovery is the UK’s most detailed search engine for longitudinal population studies. It allows researchers to search through rich metadata, filter by study, topic and life stage, and find the data that meets their research needs.

The UK is home to a remarkable set of scientific studies that have tracked generations of people growing up in Britain over the last 75 years. These longitudinal population studies are unique in science and unparalleled elsewhere in the world – no other country has anything like them on the same scale.

CLOSER is the interdisciplinary partnership of leading social and biomedical longitudinal population studies, the UK Data Service and The British Library. We are an ESRC-funded data and research infrastructure centre charged with the mission to increase the visibility, use and impact of longitudinal population studies, data and research.

CLOSER Discovery is our flagship resource, developed to meet the complex needs of modern research in a world where depositing data in an archive is simply no longer enough. It enables researchers to search and explore the contents of 10 UK longitudinal population studies in unprecedented detail. As well as variable metadata, Discovery shows how the data were collected through questions and questionnaires. Additionally, all questions and variables are tagged to a common set of topics, making it easy for researchers and policy makers to explore an area of interest.

Our journey to Discovery

In 2012, researchers were faced with the situation of trying to find out what data had been collected by CLOSER’s longitudinal population studies by accessing and searching through a variety of different forms of documentation, often varying by study – a complex and time-consuming process. For those new to longitudinal population studies, this was even more daunting, as understanding the vast amount of data collected could be overwhelming.

For the longitudinal population studies themselves, handling requests from multiple researchers who wanted to browse the available data, and investing time in finding the most appropriate variables for each request, was a burdensome and inefficient use of limited time and resources. It also carried multiple risks, including loss of knowledge over time through staff changes or relocation. Additionally, many of the systems may not have been built for interoperability, often leading to duplication of work and information when working with others.

Looking more broadly, expectations about how we find and manage data had changed, with new researchers growing up in a digital world where Google is often the first port of call. We set out to create CLOSER Discovery to bring the discovery of longitudinal data in line with current technologies and address some of these challenges.

In developing the content for CLOSER Discovery, we first had to overcome several major challenges – the studies are geographically dispersed across the country, their leadership crosses scientific domains, information available across the studies (the oldest of which dates back to 1930) was inconsistently catalogued and held in various formats, and technical capacity varied from study to study. Added to these challenges was the sheer scale of the task ahead – for the first five years of the project, manual transcription of the questionnaires was the dominant method of capture.

Our exploration of the best tools to capture questionnaires eventually led us to develop the Archivist software, which allows us to work collaboratively within CLOSER and with our study partners. We also worked with Colectica to enhance their Portal, improving the discovery experience and automating the metadata workflow.

Taking Discovery to the next level

A series of controlled vocabularies was used to describe the type of data and the data collection methodology consistently, for interoperability with other metadata collections. Where these were not available we created our own, such as LifeStage, encompassing birth, childhood, early adulthood, etc. Other reference data, such as data collectors and funders, were also added.
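
As a rough illustration of the idea, and not a description of how CLOSER Discovery is actually built, a controlled vocabulary can be thought of as a fixed list of permitted terms that every label is checked against. The sketch below assumes a hypothetical LifeStage vocabulary holding only the three terms named above; the class and function names are invented for this example.

```python
from enum import Enum


class LifeStage(Enum):
    """Hypothetical subset of a life-stage vocabulary (only the terms named in the text)."""
    BIRTH = "birth"
    CHILDHOOD = "childhood"
    EARLY_ADULTHOOD = "early adulthood"


def parse_life_stage(label: str) -> LifeStage:
    """Map a free-text label onto the vocabulary, rejecting terms outside it."""
    # Normalise case and whitespace so "  Early   Adulthood " matches "early adulthood".
    normalised = " ".join(label.strip().lower().split())
    try:
        return LifeStage(normalised)
    except ValueError:
        allowed = ", ".join(stage.value for stage in LifeStage)
        raise ValueError(
            f"'{label}' is not in the vocabulary (allowed: {allowed})"
        ) from None
```

Rejecting unknown terms, rather than storing them as typed, is what keeps descriptions comparable from one study to the next.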

A set of minimum metadata was defined. This helped ensure a consistent set of information across all longitudinal population studies, allowing meaningful comparison within and between the studies. In addition, it established a set of expectations for the provision of metadata by studies joining CLOSER Discovery, and sent a strong signal to funders about the data management resources future studies need to build into their funding proposals.
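
To make the notion of a minimum metadata set concrete, here is a small sketch of a completeness check. The field names are assumptions chosen for illustration (this post does not list the real minimum set), and the function is hypothetical rather than part of any CLOSER tool.

```python
from __future__ import annotations

# Hypothetical minimum field set; the real CLOSER list is not given in this post.
REQUIRED_FIELDS = ("study", "title", "topic", "life_stage")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]
```

A study preparing to join could run a check like this over its records and see at a glance which fields still need filling in before submission.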

Enhancing the package

Best practices in software development and metadata capture such as wikis, metrics and How-To guides evolved into the CLOSER Technical Wiki.

Development of a suite of resources, including a series of walkthroughs, videos and FAQs, enhanced the support package and helped familiarise users with what is arguably more akin to a research tool than a data catalogue. Alongside this we produced a new module on ‘Understanding metadata’ for the CLOSER Learning Hub.

The onset of the COVID-19 pandemic transformed our training and learning offer. Our series of webinars ‘Metadata in the Real World’ invited others in the metadata community and Data Managers Network to showcase their work, as a way of connecting CLOSER’s wider network of research to these new developments in metadata.

Looking to the future

CLOSER Discovery has addressed the key challenges of longitudinal data discovery, provenance from question to data and vice versa, the use of a standard scheme for interoperability, and a vocabulary that can be used across all longitudinal population studies (regardless of whether they are a social science or medical study). But a research tool like CLOSER Discovery is a living thing – always growing, improving, and taking in more information as new data are collected, or historic records are made available. The underlying structure of the standards we are using future-proofs Discovery, so that it can cope with rapidly changing technology and new data collection methods.

The data landscape is also constantly changing and adapting to new challenges and demands. The ESRC is currently looking into how to fund data services in the future – the Future Data Services programme will develop a long-term plan to strategically invest in data services beyond 2024. Other initiatives such as Population Research UK are designed to rationalise and enhance the coordination of research activities in longitudinal population studies, with discovery metadata being a core component. We’re supporting these new initiatives by conducting a review of over sixty UK longitudinal studies to help inform the important next steps.

Raising the metadata bar: lessons from documenting UK longitudinal population studies

This week we will be in Sweden for IASSIST 2022 ‘Data by Design: Building a Sustainable Data Culture’. Our speaker, Jon Johnson (Technical Lead), will discuss how the provision of technical and logistical solutions, alongside the development of new ways of working, practical demonstration, and training and resource allocation has led to a positive change in the perception of the need for high quality metadata.

Visit our stand to find out more about CLOSER and our resources, or check out our dedicated conference page for a tour of CLOSER Discovery, our Learning Hub, and our bespoke COVID-19 Longitudinal Research Hub.

Further information

Related: CLOSER Discovery
Related: CLOSER Learning Hub

Rob Davies is CLOSER Head of Policy and Dialogue. Follow him on Twitter: @r0bdavies

Suggested citation:

Davies, R. (2022). ‘Pioneers in data discoverability – the CLOSER metadata search platform’. CLOSER. 7 June 2022. Available at: https://www.closer.ac.uk/news-opinion/blog/pioneers-in-data-discoverability-the-closer-metadata-search/