Specific types of data
Many data management principles apply to all types of data, but some specialist types of data need additional consideration.
Some longitudinal population studies include or are linked to geographic information. This normally includes information about where the study member lives and features of the area in which they live.
Geographic information tends to be grouped into areas at different levels. These levels vary between countries, but in the UK they generally match the geographies used in the Census:
- Output Areas (OAs): the lowest level, covering the smallest area. Each includes between 100 and 625 individuals across 40 to 250 households.
- Lower layer Super Output Areas (LSOAs): made up of groups of OAs, normally about 4 or 5.
- Middle layer Super Output Areas (MSOAs): made up of groups of 4 or 5 LSOAs.
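Because each level is built from groups of the level below, figures recorded at OA level can be rolled up to LSOA and then MSOA level with lookup tables. The following is a minimal sketch of that idea. The area codes, lookup tables and counts are all made up for illustration; a real project would use a published lookup file for the relevant Census year.

```python
from collections import defaultdict

# Hypothetical lookups: which larger area each smaller area belongs to.
oa_to_lsoa = {"OA_A": "LSOA_1", "OA_B": "LSOA_1", "OA_C": "LSOA_2"}
lsoa_to_msoa = {"LSOA_1": "MSOA_X", "LSOA_2": "MSOA_X"}

# Hypothetical counts of individuals recorded at OA level.
oa_counts = {"OA_A": 120, "OA_B": 300, "OA_C": 250}


def aggregate(counts, lookup):
    """Sum counts from a lower geography into the parent geography."""
    totals = defaultdict(int)
    for code, n in counts.items():
        # A KeyError here means an area is missing from the lookup,
        # which is worth investigating rather than silently ignoring.
        totals[lookup[code]] += n
    return dict(totals)


lsoa_counts = aggregate(oa_counts, oa_to_lsoa)
msoa_counts = aggregate(lsoa_counts, lsoa_to_msoa)

print(lsoa_counts)
print(msoa_counts)
```

Aggregation only works in this direction: data held at MSOA level cannot be split back into LSOAs or OAs without further information.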
Other geographic areas in the UK
These include wards, parishes, Local Enterprise Partnerships (LEPs), Parliamentary constituencies, local authority districts, built up conglomerations and built up areas, workplace zones, and Travel To Work Areas.
When using geographic data, be sure to check the geography used and the level of the data. This is especially important if you plan to link the study's geographic data to other information, such as area-level statistics or indexes.
For example, the Index of Multiple Deprivation (IMD) is a measure of relative deprivation at the LSOA level.
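As a rough illustration of why the level matters, the sketch below links study records to an LSOA-level index and sets aside any record whose area code is at a different level or has no match. The record layout, the function name `link_to_lsoa_index` and the decile values are hypothetical. The check assumes that English LSOA codes begin with "E01", which should be confirmed against the documentation for the geography in use.

```python
# Hypothetical study records, each with an area code for the member's home.
study_records = [
    {"member_id": 1, "area_code": "E01000001"},
    {"member_id": 2, "area_code": "E01000002"},
    {"member_id": 3, "area_code": "E02000001"},  # not an LSOA-style code
]

# Hypothetical LSOA-level lookup; the decile values are invented.
imd_decile_by_lsoa = {"E01000001": 7, "E01000002": 3}


def link_to_lsoa_index(records, index_by_lsoa, lsoa_prefixes=("E01",)):
    """Attach an LSOA-level value to each record, keeping failures separate."""
    linked, unlinked = [], []
    for record in records:
        code = record["area_code"]
        # Only link when the code looks like an LSOA and is in the lookup.
        if code.startswith(lsoa_prefixes) and code in index_by_lsoa:
            linked.append({**record, "imd_decile": index_by_lsoa[code]})
        else:
            unlinked.append(record)
    return linked, unlinked


linked, unlinked = link_to_lsoa_index(study_records, imd_decile_by_lsoa)
print(len(linked), "linked;", len(unlinked), "need checking")
```

Keeping the unlinked records, rather than dropping them, makes it easier to see whether a mismatch in geography level or Census year is causing the failures.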
Medical codes and statistics
Some types of data collected and used in the medical field are often linked to longitudinal population studies, and these may require specific management.
Hospital Episode Statistics (HES): these contain information about admissions, outpatient appointments and historical Accident and Emergency attendances at NHS hospitals in England. There are different datasets within HES that focus on areas of care, e.g., Accident & Emergency, Critical Care, Outpatient care.
Studies linked to HES
Some longitudinal population studies are linked to HES and make the linked data available to researchers, alongside documentation that outlines any data processing carried out and gives guidance on its use. For example, see the National Child Development Study Linked health administrative datasets – Hospital Episode Statistics (HES) User guide.
Medical data can include specific clinical codes that may need to be cleaned and/or translated in order to be useful in research.
The NHS in the UK used “Read codes” until 2018 to clinically encode information about patients, including symptoms, laboratory tests and results, and other information.
The Pathology Bounded Code List (PBCL) is still used for laboratory medicine reports.
Diagnoses, conditions, and procedures are often coded using classification standards, such as the International Statistical Classification of Diseases and Related Health Problems (ICD), the Classification of Interventions and Procedures (OPCS) or the Diagnostic and Statistical Manual of Mental Disorders (DSM).
Electronic health records used for research may include some of these codes, which might need translating or recoding so that the information is relevant to the research project. It is important to check which classification, and which version of it, has been used.
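The sketch below shows one way recoding might look in practice: clinical codes are mapped to broader research categories, and the lookup is keyed on both the classification version and the code, so the same string is never interpreted under the wrong version. The `CODE_GROUPS` table and the `recode` function are illustrative only and are not a validated code list. Matching on the first three characters is a simplifying assumption.

```python
# Illustrative mapping from (classification, three-character code) to a
# research category. A real code list would need clinical review.
CODE_GROUPS = {
    ("ICD-10", "I21"): "myocardial_infarction",
    ("ICD-9", "410"): "myocardial_infarction",
    ("ICD-10", "J45"): "asthma",
    ("ICD-9", "493"): "asthma",
}


def recode(classification, code):
    """Return the research category for a code, or None if it is unmapped."""
    # Tidy the raw value: trim spaces, uppercase, drop the dot,
    # then keep the three-character stem.
    stem = code.strip().upper().replace(".", "")[:3]
    return CODE_GROUPS.get((classification, stem))


print(recode("ICD-10", " i21.0 "))
print(recode("ICD-9", "410.9"))
print(recode("ICD-10", "410"))  # right code, wrong classification
```

Returning `None` for unmapped codes, instead of guessing, leaves a clear list of values to review by hand.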