Basics of research data management
Research data management refers to the process of handling and organising research data throughout the lifecycle of a research project.
What is research data?
Research data includes any data collected for research. This can take many forms, from raw data collected in the field to processed data used for analysis. Examples include:
- Responses to a questionnaire completed as part of a survey
- Transcripts or recordings of interviews
- Biological samples
- Measurements or tests carried out by research or medical professionals
- Cleaned or derived datasets from an archive
Longitudinal population studies collect large amounts of research data of many different types for each wave (or sweep) of a survey. This creates very large datasets across multiple data files at each timepoint.
Why is data management important?
Effective research data management has many benefits to researchers and to the wider research community, funding bodies, and those taking part in research. These include:
- Data are easier to find, access, and use for research.
- Data are easier to understand – this improves your own understanding, for example, when returning to the data after some time has passed. It also assists other researchers who might use the data.
- Data are less likely to contain errors or inconsistencies.
- Data are easier to share with other researchers.
- Data and analyses are easier for other researchers to replicate.
Well-managed data and shared analysis code reduce the time and effort needed for others to replicate the data and analyses.
When using secondary data, there will likely be specific licensing and copyright requirements that you must follow.
Copyright plays a role when creating, reusing, reproducing and sharing research data. Read more about copyright issues in the Find and access existing data section.
Data management for longitudinal population studies and secondary data
Good data management is crucial for all types of data, but particularly when dealing with complex and large datasets, such as longitudinal population studies (LPS).
When using data from these types of studies for your research, some aspects of research data management are particularly important:
- Using consistent data formats (e.g., wide/long formatting, storage programme), metadata (e.g., variable type and variable labels), and variable names over different time points of an LPS makes it much easier to merge the data together and compare the data across time.
- Secondary data from LPS and other surveys can be complex and may be contained in many different files. Having a sensible and organised file structure simplifies the management and analysis of that data.
- LPS data can contain sensitive information, so it is important to have the correct safety measures and effective controls in place to protect the data from unauthorised access.
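As an illustration of the first point, the sketch below shows one possible way to harmonise variable names across two waves and stack the records in long format (one row per participant per wave). The records, the variable names (`w1_height`, `ht_cm_w2`, `height_cm`), and the `to_long` helper are all hypothetical and are not taken from any real study.

```python
# Hypothetical records from two waves; the same measure has a different name in each wave.
wave1 = [{"pid": 1, "w1_height": 120.5}, {"pid": 2, "w1_height": 118.0}]
wave2 = [{"pid": 1, "ht_cm_w2": 131.0}, {"pid": 2, "ht_cm_w2": 129.5}]

# Assumed mapping from each wave's original variable name to one harmonised name.
RENAME = {
    1: {"w1_height": "height_cm"},
    2: {"ht_cm_w2": "height_cm"},
}


def to_long(records, wave, rename):
    """Return one row per participant for this wave, using harmonised variable names."""
    rows = []
    for record in records:
        row = {"pid": record["pid"], "wave": wave}
        for name, value in record.items():
            if name != "pid":
                # Fall back to the original name if no harmonised name is defined.
                row[rename.get(name, name)] = value
        rows.append(row)
    return rows


# Stack both waves and order by participant, then by wave.
long_data = to_long(wave1, 1, RENAME[1]) + to_long(wave2, 2, RENAME[2])
long_data.sort(key=lambda r: (r["pid"], r["wave"]))
```

Once every wave uses the same variable name, a measure can be compared across time points without wave-specific handling in the analysis code.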
As a researcher using the data from LPS, you are not typically involved in the data collection, preservation, and sharing stages. However, research data management principles are still important.
Good research data management helps when processing the data and preparing it for analysis, when carrying out the analysis, and when sharing the outputs of the research, e.g., the analysis syntax and results.
More information on data management in practice is available in the Training Hub.
Activities throughout the Research Data Lifecycle
Research data management involves many different tasks and takes place throughout all of the stages of a research project, from planning to analysing data.
The Research Data Lifecycle framework describes the stages that research data goes through during a research project. Data management is a key part of all these stages.
Even if you did not collect or produce the data that you are using for your research project, data management is still an important part of your project. The ‘Data management in practice’ section provides further guidance on each step below:
- Writing a data management plan that sets out how you will organise, store, and share the data.
- Setting the data structure, documenting the metadata, developing data schemas or data dictionaries.
- Selecting appropriate storage locations and media, ensuring confidentiality and security of the data.
- Ensuring the long-term accessibility and usability of the data.
- Making the data available to others in a way that is accessible, discoverable, and ethical, while preserving confidentiality and security.
- Analysing and interpreting the data, generating and documenting derived variables, producing analysis code/syntax.
Metadata is a fundamental part of documenting, sharing, and using survey data. Check out the CLOSER Learning Hub module Understanding Metadata for a full introduction to metadata, to understand why it’s important and how to use metadata to identify relevant variables.
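To make the idea of using metadata to identify relevant variables concrete, here is a minimal sketch of a data dictionary held as a list of entries, with a simple keyword search over the variable labels. The variables, labels, and the `find_variables` helper are hypothetical examples, not the format used by any particular study or archive.

```python
# A hypothetical data dictionary: one entry per variable, including a derived variable.
DATA_DICTIONARY = [
    {"name": "pid", "label": "Participant identifier", "type": "integer", "derived": False},
    {"name": "height_cm", "label": "Height in centimetres", "type": "float", "derived": False},
    {"name": "weight_kg", "label": "Weight in kilograms", "type": "float", "derived": False},
    {
        "name": "bmi",
        "label": "Body mass index, derived from weight_kg and height_cm",
        "type": "float",
        "derived": True,
    },
]


def find_variables(dictionary, keyword):
    """Return the names of variables whose label contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [entry["name"] for entry in dictionary if keyword in entry["label"].lower()]


# Variables whose labels mention height, including the derived variable that uses it.
height_variables = find_variables(DATA_DICTIONARY, "height")

# Names of derived variables, which need their derivation documented.
derived_variables = [entry["name"] for entry in DATA_DICTIONARY if entry["derived"]]
```

Recording the derivation in the label (or in a dedicated field) means that a search for a source variable also surfaces the variables built from it.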