This section defines frequently used terms in data management.
Prospective data management is the practice of planning and organising data management activities from the beginning of a research project before data have been collected. It involves planning and establishing strategies for collecting, storing, analysing, and sharing data in a systematic and efficient manner, to ensure that data are well-managed throughout the entire research process.
Retrospective data management refers to the process of organising and managing data after they have been collected. It involves reviewing and organising existing data, implementing appropriate data management practices, and ensuring data quality and accessibility for future analysis and use.
A data management plan (DMP) is a document written during the planning stages of a research project which outlines the strategy for managing the data over the course of the project. It can include information about the data type, how the data will be stored, how the data will be shared, and any processing that the data will undergo. A data management plan may be required by a funder of the research.
Metadata is data about data. It provides structured information, including essential context which helps us to make sense of the data. Learn more about metadata with the CLOSER Learning Hub ‘Understanding metadata’ module.
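As an illustration, a minimal metadata record for a dataset can be sketched as a set of field–value pairs. The field names below loosely follow common descriptive-metadata conventions, and the dataset itself is hypothetical:

```python
# A minimal, hypothetical metadata record describing a dataset.
# Field names are illustrative, not a prescribed standard.
metadata = {
    "title": "Household Survey 2020",
    "creator": "Example Research Team",
    "date_collected": "2020-03-01/2020-06-30",
    "description": "Responses from a hypothetical household survey.",
    "format": "CSV",
    "licence": "CC BY 4.0",
}

# Structured metadata like this gives the essential context needed
# to interpret the data file it accompanies.
for field, value in metadata.items():
    print(f"{field}: {value}")
```

In practice, metadata is usually recorded against a documented standard so that catalogues and repositories can read it automatically.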
A data (or database) schema defines the structure of a database: how it is organised, the relationships between its elements, and any constraints that apply. It details how the data are stored, organised, and accessed.
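A schema with two related tables, keys, and constraints can be sketched as follows. This is a minimal example using SQLite via Python's standard library; the table and column names are hypothetical:

```python
import sqlite3

# A sketch of a simple schema: two tables linked by a key, with
# constraints on which values are required.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,   -- unique identifier
    year_of_birth  INTEGER NOT NULL       -- constraint: value required
);
CREATE TABLE response (
    response_id    INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL,
    answer         TEXT,
    -- relationship: each response belongs to one participant
    FOREIGN KEY (participant_id) REFERENCES participant(participant_id)
);
""")

# The schema is itself queryable metadata about the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['participant', 'response']
```

The foreign-key clause is what records the relationship between the tables, and the `NOT NULL` markers are examples of constraints.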
A data dictionary holds information or metadata about the data in a database or dataset, such as variable names, descriptions, data types, data format, and relationships or dependencies. A data dictionary is also sometimes called a metadata repository. When downloading research data from the UK Data Service, for example, a data dictionary is provided alongside the data files and documentation.
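A data dictionary can be sketched as one entry per variable, each recording the variable's name, type, and description. The variables below are hypothetical, and the CSV output mirrors how a data dictionary is often shared alongside the data files:

```python
import csv
import io

# A hypothetical data dictionary: one entry per variable in a dataset.
data_dictionary = [
    {"variable": "pid", "type": "integer", "description": "Participant ID"},
    {"variable": "age", "type": "integer", "description": "Age in years"},
    {"variable": "sex", "type": "string",  "description": "Self-reported sex"},
]

# Write the dictionary out as CSV, a common distribution format.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["variable", "type", "description"])
writer.writeheader()
writer.writerows(data_dictionary)
print(buffer.getvalue())
```

Real data dictionaries often carry further fields, such as value labels, valid ranges, and dependencies between variables.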
A data repository is a centralised location for storing and managing research data, normally designed for the long-term preservation, curation, and sharing of those data. A repository often also provides tools and services for data curation, preservation, sharing, and reuse. An example of a data repository is the UK Data Service.
Data sharing is the act of making research data available to others to use. It involves making data accessible to researchers and/or the wider community and is an important aspect of open science and research transparency. Data sharing can increase the impact and value of research data over time and so many research funders specify that data should be shared when this is possible.
Data preservation is the act of safeguarding research data for future use, by ensuring that it remains available, usable, and accessible over time. Preserving research data can involve taking measures to protect the data from loss, degradation, or becoming out of date, and may include strategies such as backup and recovery, migration to new formats or storage media, and ongoing curation and management.
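One common measure against loss or degradation is recording a checksum (a "fixity" value) for each file, so that copies made during backup or migration can be verified against the original. A minimal sketch, hashing in-memory bytes for simplicity (a real workflow would read files from disk):

```python
import hashlib

# Record a checksum for the original data at deposit time.
data = b"example file contents"
checksum = hashlib.sha256(data).hexdigest()

# Later, after a backup or a migration to new storage, recompute the
# checksum on the copy and compare: a match indicates the bytes are intact.
copy = b"example file contents"
intact = hashlib.sha256(copy).hexdigest() == checksum
print(intact)  # True
```

Repositories typically run such fixity checks on a schedule as part of ongoing curation.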
Data curation refers to the process of managing and enhancing research data throughout the whole research data lifecycle, with the aim of maximising the data’s quality, accessibility, and value. Curation may include tasks like data cleaning, writing or editing documentation, and creating metadata, as well as ensuring the data remain secure and follow privacy and ethical guidelines.
Data citation is the act of giving credit to research data in publications or other work. In the same way that references and citations in a paper give credit to previous publications, a data citation gives credit to the data used in the research. This helps other researchers to find and use the data in the future and helps to ensure that research is transparent and reproducible.
Open data or open access data refers to research data that are freely available for anyone to access, use, and share without restriction and/or charge. Other access levels include “safeguarded”, “secure”, or “controlled”, which impose greater restrictions on the access and use of the data, ranging from accepting terms and conditions of use to only accessing the data in a “Trusted Research Environment”.
Open science is a way of conducting research that focuses on making science more transparent, inclusive, and accessible to everyone, and involves sharing research findings, data, and tools openly with others. Open science principles aim to improve the quality of research and to make science more trustworthy and useful for society.
Trusted Research Environments, also known as “TREs”, “Data Safe Havens”, or “Secure Data Environments”, are secure computing environments that provide remote access to data that need to be held more securely – often personal and/or identifying data. For example, the UK Office for National Statistics Secure Research Service is a TRE where researchers can get secure access to de-identified, unpublished data for their research.