Basics of research data management
Research data management refers to the process of handling and organising research data throughout the lifecycle of a research project.
What is research data?
Research data includes any data collected for research. It can take many forms, from raw data collected in the field to processed data used for analysis. Examples include:
- Responses to a questionnaire completed as part of a survey
- Transcripts or recordings of interviews
- Biological samples
- Measurements or tests carried out by research or medical professionals
- Cleaned or derived datasets from an archive
Longitudinal population studies collect large amounts of research data of many different types for each wave (or sweep) of a survey. This creates very large datasets across multiple data files at each timepoint.
Why is data management important?
Effective research data management has many benefits for researchers, the wider research community, funding bodies, and those taking part in research. These include:
Data management for longitudinal population studies and secondary data
Good data management is crucial for all types of data, but particularly when dealing with complex and large datasets, such as longitudinal population studies (LPS).
When using data from these types of studies for your research, some aspects of research data management are particularly important:
As a researcher using data from LPS, you are not typically involved in the data collection, preservation, and sharing stages. However, research data management principles are still important.
Good research data management helps when processing the data and preparing it for analysis, when carrying out the analysis, and when sharing the outputs of the research (e.g. the analysis syntax and results).
More information on data management in practice is available in the Training Hub:
Activities throughout the Research Data Lifecycle
Research data management involves many different tasks and takes place throughout all of the stages of a research project, from planning to analysing data.
The Research Data Lifecycle framework describes the stages that research data goes through during a research project. Data management is a key part of all these stages.
Even if you did not collect or produce the data that you are using for your research project, data management is still an important part of your project. The ‘Data management in practice’ section provides further guidance on each step below:
Understanding metadata
Metadata is a fundamental part of documenting, sharing, and using survey data. See the CLOSER Learning Hub module Understanding Metadata for a full introduction to metadata, including why it’s important and how to use it to identify relevant variables.
Why good data management, open science, and working reproducibly are beneficial:
Article
Five selfish reasons to work reproducibly
Five reasons why working reproducibly pays off in the long run and is in the self-interest of every ambitious, career-oriented scientist, by Florian Markowetz (2015).
Article
PLOS Biology: Open science challenges, benefits and tips in early career and beyond
There are great benefits but also significant challenges in the movement towards open science. By Christopher Allen and David M. A. Mehler (2019).
Article
eLife: Point of View: How open science helps researchers succeed
Literature review demonstrating benefits of open science, by Erin McKiernan et al. (2016).
Video
UK Data Service: Research data management
Well organised, well documented, preserved and shared data are invaluable to advance scientific inquiry and to increase opportunities for learning and innovation.