European Survey Research Association (ESRA) 2023 conference

CLOSER is delighted to sponsor and present at the 2023 European Survey Research Association (ESRA) conference. The conference will take place from 17 to 21 July 2023.

About the conference

ESRA hosts its main conference every two years to bring together applied survey researchers, methodologists and statisticians from Europe and beyond.

The conference showcases the latest survey research and offers a number of professional development opportunities, including short courses and awards. It is traditionally hosted in university buildings to keep registration fees to a minimum. ESRA aims to be as inclusive as possible, promoting in particular the participation of doctoral students and early career researchers.

The conference theme is “Survey research in times of crisis: Challenges, opportunities, and new directions”.

More information about the conference, including the programme, venue and registration, can be found on the ESRA conference website.

CLOSER’s presence at ESRA 2023

CLOSER is delighted to be sponsoring and running an exhibition stand at this year’s conference.

The CLOSER Discovery team will also run a session and present the following papers:

Session
Metadata uplift, machine learning and sustainable methods for metadata curation
Coordinators: Mr Jon Johnson (CLOSER, UCL Social Research Institute) & Dr Suparna De (University of Surrey, Department of Computer Science)
Date and time: Thursday 20 July, 16:00 – 17:30 (local time)

The establishment of cross-European infrastructures (European Question Bank, CESSDA Data Catalogue, SSHOC) and standards (DDI, European Language Social Science Thesaurus, Triple) to support FAIR data in the social sciences and humanities will have a significant impact on the level, quality and interoperability of the metadata that studies are required to provide to support discovery and reuse, for both legacy data and future data collections.

Whilst there has been significant progress in the development of technical architectures and the establishment of standards, generating high-quality content remains a challenge, particularly in the early capture of lifecycle metadata and the development of suitable training datasets.

Machine learning and allied technologies offer the possibility of helping both studies and infrastructures to uplift existing metadata, and of providing new automated methods for curating future metadata, so that FAIR data infrastructures can be sustained.

In this session we will explore the latest developments in automated and semi-automated metadata curation, to support FAIR data, reuse and interoperability.

Presentation
Initial findings from the automation of extraction of metadata from questionnaires and its classification – Jon Johnson and Suparna De
Session: Metadata uplift, machine learning and sustainable methods for metadata curation
Date and time: Thursday 20 July, 16:00 – 17:30 (local time)

Social science archives have a long history of producing well-documented datasets, which include the provenance (questionnaires), data description and methodological annotation. Alongside this, recent efforts to create thesauri such as ELSST, which can be used systematically across the social sciences, offer the possibility of enriching these valuable assets created over the last 50 years. However, this information is currently available mostly as PDFs alongside deposited datasets.

The presentation will show preliminary findings from a project between CLOSER and the University of Surrey which has used the metadata held in CLOSER Discovery to explore the automation of extraction of provenance data from PDFs of questionnaires, and the classification of the questions and associated data to a subset of ELSST.

The project has used four supervised model architectures (multinomial naive Bayes, LSTM, ULMFiT and BERT), and enhancements of them, to explore the strengths of each model for metadata extraction and the utility of the extracted metadata for classification, across a number of social science and health domains. This has provided valuable insights into both the most suitable methods and the composition of the training data needed to reliably extract metadata from questionnaires and to classify the questions and associated data to a suitable ontology.

Presentation
Towards Metadata Driven Harmonisation – Hayley Mills and Jon Johnson
Session: State of the Metadata Infrastructure
Date and time: Tuesday 18 July, 11:00 – 12:30 (local time)

Data harmonisation for longitudinal population studies (LPS) involves retrospectively adjusting data collected by different surveys to allow comparisons. Repeating the same analysis across several LPS allows researchers to test whether results are consistent, or differ because of changing or different social conditions. However, finding detailed information about data for harmonisation is resource-intensive, and there is a high level of uncertainty about whether it will succeed.

CLOSER Discovery is a metadata research tool which enables users to discover, explore and assess data. It can help provide assurance about the quality and utility of data for any potential harmonisation. Information held in CLOSER Discovery, such as the question text, the available responses, the mode of interview and a consistent vocabulary, can be used to identify comparable data within and across LPS.

CLOSER aims to enhance the metadata further by adding variable concordance, enabling direct comparison of variables within and across LPS. The purpose is to save researchers' time by providing sufficient information to decide whether the data are suitable for their harmonisation use case. This task is not only a large undertaking in scale, with CLOSER Discovery containing 11 LPS, but also poses a significant challenge in aligning variables within a consistent conceptual framework.

This presentation will set out our approach to identifying variables across surveys and how this information will be structured to be useful for Discovery users. It will detail the main considerations in determining a workable conceptual framework, and we would value input on whether this is the ideal approach for us and on its utility to the research user community.