Find and access existing data
When planning a research project that will explore a certain research question and topic, it is always useful to check whether there are existing data that have covered these questions and topics.
Using secondary data in your research project can save the money, time, and other costs associated with collecting primary data. It can also reduce the burden on participants (particularly important if you are interested in a very specific population) and give you access to support and resources from the study/data team.
The first step is to identify existing data and studies that might be relevant to you.
Tips to identify a dataset
You can use CLOSER Discovery to search and browse questionnaires and data from the UK’s leading longitudinal studies.
The methods sections of published research articles relevant to your topic often describe their data sources. When reading them, look for the following information:
- Was this primary data collection specifically carried out for this article or project, or data from a larger study?
- Do they mention a study name?
- Did they include a data citation with information about the study?
You can also search data catalogues and repositories:
- A lot of economic, population and social research data from UK studies are held at the UK Data Service. Visit their site to search through the catalogue by theme or data type.
- The Consortium of European Social Science Data Archives (CESSDA) Data Catalogue is a one-stop shop for searching and finding European social science data.
- The US Inter-university Consortium for Political and Social Research (ICPSR) maintains an archive of research in the social and behavioural sciences.
- The Registry of Research Data Repositories (re3data) provides a global registry of research data repositories from a variety of academic disciplines, including repositories that enable permanent storage of, and access to, data sets to researchers, funding bodies, publishers, and scholarly institutions.
Data collected routinely by governments (administrative data) could be useful for your project. Some longitudinal population studies are linked with certain administrative datasets, such as the National Pupil Database, so the information from these administrative datasets is available for the study members alongside the data collected by the study.
- Some UK Government departments make certain databases available to access (via applications) e.g., the National Pupil Database from the Department for Education.
- Explore administrative datasets that have been made available for research using the ADR UK Data Catalogue.
- The UK Office for National Statistics (ONS) publishes summary data and statistics on various topics and holds data from the UK census.
Check whether a particular study you are interested in is linked to administrative data, for example, at the UK Data Service or on the study’s own website.
How to access the data?
It is important to understand the access requirements and restrictions associated with your selected data sources. There are different access requirements for different types of data due to the sensitivity of the data and potential disclosure risk.
“Disclosure risk” is the risk of identification of individuals in the dataset, while “disclosive information” refers to information that has the potential to reveal sensitive or personally identifiable details about individuals. More obvious examples of disclosive information include names, addresses, National Insurance numbers, email addresses, and medical records.
There are different levels of access for types of data with different disclosure risks.
Open access data contain no personal or disclosive information.
You can often access open data freely without any extra steps, although you may need to agree to some terms and conditions, such as the requirement for a data citation in any published work using the data or acceptance of a Creative Commons Licence.
Restricted or safeguarded data contain no personal information but there is a residual risk of disclosure.
Access to these data may require that some conditions are met, such as registering for an account, accepting some terms and conditions, filling out an application form, and obtaining the data owner’s permission.
Controlled data have a higher risk of disclosure. Access to controlled data often requires the researcher to complete training and accreditation, needs approval by a Data Access Committee, and may only be possible through a secure access environment.
Modes of access
There are different modes of data access for the different levels:
Open and safeguarded data can normally be downloaded for use on your computer.
With safeguarded data, there may be more restrictions on the storage of the data, such as keeping your computer password protected.
Controlled data might only be accessible remotely, where you log in to a platform to access the data without downloading it to your own computer (for example, the UK Data Service SecureLab).
Controlled data might also only be accessible via an on-site safe room or pod where you can only access and analyse the data inside the room.
The amount of data you can access may vary depending on the data and access type:
- Entire dataset: the whole dataset is accessible.
- Bespoke or limited dataset: access is granted to only certain variables or parts of the dataset which have been specified in a data application.
Some data sources may have a combination of access levels and requirements. For example, one dataset of a study may be safeguarded data which is available to download after acceptance of the terms and conditions, while the same study may include a controlled dataset which is only accessible via secure remote access.
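As a planning aid, the access levels and modes described above can be summarised in a small lookup. The Python sketch below is illustrative only: the class names, the dataset names, and the wording of each requirement are assumptions made for this example, and actual requirements vary by data provider.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """Access levels from the text, ordered from least to most restrictive."""
    OPEN = 1
    SAFEGUARDED = 2
    CONTROLLED = 3


@dataclass(frozen=True)
class AccessProfile:
    typical_requirements: tuple
    typical_modes: tuple


# Illustrative summary of the guidance above, not an official specification.
ACCESS_PROFILES = {
    AccessLevel.OPEN: AccessProfile(
        typical_requirements=("agree to terms and conditions (e.g. data citation, licence)",),
        typical_modes=("download",),
    ),
    AccessLevel.SAFEGUARDED: AccessProfile(
        typical_requirements=(
            "register for an account",
            "accept terms and conditions",
            "fill out an application form",
            "obtain the data owner's permission",
        ),
        typical_modes=("download, possibly with storage restrictions",),
    ),
    AccessLevel.CONTROLLED: AccessProfile(
        typical_requirements=(
            "researcher training and accreditation",
            "Data Access Committee approval",
        ),
        typical_modes=("secure remote access", "on-site safe room or pod"),
    ),
}


def strictest_level(levels):
    """Return the most restrictive access level among the given levels."""
    return max(levels, key=lambda level: level.value)


# Hypothetical study with one safeguarded and one controlled dataset.
planned = {
    "main_survey": AccessLevel.SAFEGUARDED,
    "linked_records": AccessLevel.CONTROLLED,
}
level = strictest_level(planned.values())
print(level.name, ACCESS_PROFILES[level].typical_modes)
```

Planning around the strictest level you need is a simple way to estimate lead time, since controlled data usually involve the longest application and accreditation process.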
Licensing and copyright
Before using any data, it is crucial to understand the licensing and copyright that apply to them. Copyright plays a role when creating, reusing, reproducing and sharing research data.
This is particularly important when using secondary data, because copyright and licensing conditions attach to the original dataset and to any data derived or reproduced from it.
For more learning and teaching resources and courses on data management, go to the Training opportunities section and filter by Data management.