Find and access existing data
When planning a research project, it is useful to check whether existing data already cover your research question and topic.
Using secondary data in your research project can save the money, time, and other costs associated with collecting primary data. It can also reduce the burden on participants (this is particularly important if you are interested in a very specific population) and give you access to support and resources from the study or data team.
The first step is to identify existing data and studies that might be relevant to you.
Tips to identify a dataset
How to access the data?
It is important to understand the access requirements and restrictions associated with your selected data sources. There are different access requirements for different types of data due to the sensitivity of the data and potential disclosure risk.
“Disclosure risk” is the risk of identification of individuals in the dataset, while “disclosive information” refers to information that has the potential to reveal sensitive or personally identifiable details about individuals. More obvious examples of disclosive information include names, addresses, National Insurance numbers, email addresses, and medical records.
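As a rough illustration of the examples above, the sketch below flags column names that look like direct identifiers. The pattern list and the function name `flag_disclosive_columns` are hypothetical and chosen only for this example; a real disclosure review is much broader and also considers combinations of indirect identifiers.

```python
import re

# Hypothetical keyword patterns based on the examples in the text
# (names, addresses, National Insurance numbers, email addresses, medical records).
# This is an assumption for illustration, not an official checklist.
DIRECT_IDENTIFIER_PATTERNS = [
    r"name",
    r"address",
    r"national_insurance|nino",
    r"email",
    r"medical",
]


def flag_disclosive_columns(column_names):
    """Return the column names that match any direct-identifier pattern."""
    flagged = []
    for column in column_names:
        # Normalise so that "Full Name" and "full_name" are treated alike.
        normalised = column.strip().lower().replace(" ", "_")
        if any(re.search(p, normalised) for p in DIRECT_IDENTIFIER_PATTERNS):
            flagged.append(column)
    return flagged


columns = ["participant_id", "Full Name", "email_address", "age_band", "region"]
print(flag_disclosive_columns(columns))  # ['Full Name', 'email_address']
```

A name-based check like this only catches the obvious cases; it says nothing about whether the remaining variables could identify someone when combined.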
Data with different disclosure risks are made available at different levels of access.
Access levels
Modes of access
There are different modes of data access for the different levels:
The amount of data you can access may vary depending on the dataset and access type:
- Entire dataset: the whole dataset is accessible.
- Bespoke or limited dataset: access is granted to only certain variables or parts of the dataset which have been specified in a data application.
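To make the idea of a bespoke or limited dataset concrete, the sketch below keeps only the variables named in a data application. The function `select_variables`, the sample data, and the variable names are all hypothetical; in practice the data provider prepares the extract, and the process differs between services.

```python
import csv
import io


def select_variables(csv_text, approved_variables):
    """Return rows containing only the approved variables (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    available = reader.fieldnames or []
    # Fail early if the application names a variable the dataset does not hold.
    missing = [v for v in approved_variables if v not in available]
    if missing:
        raise KeyError(f"Variables not in dataset: {missing}")
    return [{v: row[v] for v in approved_variables} for row in reader]


# Made-up sample data, for illustration only.
sample = "id,age_band,region,income\n1,30-39,North,high\n2,40-49,South,low\n"
subset = select_variables(sample, ["id", "age_band"])
print(subset)
# [{'id': '1', 'age_band': '30-39'}, {'id': '2', 'age_band': '40-49'}]
```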
Some data sources may have a combination of access levels and requirements. For example, one dataset from a study may be safeguarded and available to download once you accept the terms and conditions, while another dataset from the same study may be controlled and accessible only via secure remote access.
Licensing and copyright
Before using any data, it is crucial to understand the licensing and copyright that apply to them. Copyright plays a role when creating, reusing, reproducing and sharing research data.
This is particularly important for secondary data, where copyright and licensing attach both to the original dataset and to any data derived or reproduced from it.
For more learning and teaching resources and courses on data management, go to the Training opportunities section and filter by Data management.
Find out more in the How and where to share data section.
The data sharing method is often linked closely to your chosen method of storing your data.
Before sharing any data, including data derived from a secondary dataset, check the copyright and licensing requirements that may apply. Read about these in the Find and access existing data section.