All of the original eight CLOSER studies include some form of diet-related questions; however, the dietary assessment method used and the number of repeat assessments over time vary greatly between the studies. This heterogeneity makes it difficult to create harmonised dietary variables for use in cross-cohort analyses.
Harmonisation aims to create comparable measures from various types of data across different studies. Harmonisation involves converting variables that capture the same latent construct across studies into a common format and it can be approached in different ways. Maelstrom Research developed guidelines for retrospective data harmonisation that can be found at https://www.maelstrom-research.org/about-harmonization/maelstrom-guidelines.
The DAPA toolkit described elsewhere in this guide also provides harmonisation principles from a dietary perspective (https://dapa-toolkit.mrc.ac.uk/) with these general steps: 1) Define the target variable; 2) Assess harmonisation potential; 3) Derive common format data. The section below outlines these steps using the harmonisation of fish intake across 12 studies as an exemplar.
The InterConnect consortium (http://www.interconnect-diabetes.eu/) was established to examine the causes of diabetes and obesity using existing data. As part of this aim, researchers used exemplar projects to understand challenges and approaches to harmonisation. The DAPA toolkit outlines the approach they took to harmonise fish consumption (https://dapa-toolkit.mrc.ac.uk/diet/harmonisation):
- Define target variable
The target variable is derived by harmonising the raw data from the different studies and should be specified with explicit units. It should be appropriate for answering the research question, and its definition will also depend on the data and methods available in the different datasets.
In InterConnect, the researchers aimed to harmonise a total of eight variables (total fish, fatty/oily fish, lean fish, shellfish, saltwater fish, freshwater fish, fried fish, smoked/salted fish), all in g/d, across 12 studies.
- Assess the harmonisation potential
It is important to know whether the existing data are able to capture the same latent constructs. This requires an understanding of the specific methods and instruments used in each study, the format of the data, the overall study design and any assumptions made when processing the data within each study.
In InterConnect, ten studies assessed fish intake using FFQs with two using diet history (a retrospective structured interview method consisting of questions about habitual intake of foods from the core food group). While all studies could create total fish, not all could contribute to the seven other variables.
- Derive a common format
A number of different methods can be applied to derive a common format for the target variable within each study, for example, using a conversion factor or collapsing to the least common denominator. Applying a conversion factor can be straightforward when the relationship between two units is known, as is the case for converting kilocalories per day to kilojoules per day. Collapsing to the least common denominator can include recoding or transforming existing data and would involve applying an agreed set of rules or algorithms depending on within-study data availability. External data can also be used to support deriving a common format. For example, data on average portion sizes could be used in combination with frequency and food type to derive food quantities. However, this should be applied with caution as the degree to which these values can be generalised depends on the specific study population.
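The two approaches described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular study: the conversion factor of 4.184 kJ per kcal is the standard thermochemical value, but the study names, response categories and recoding rules are hypothetical and would in practice be agreed with each study.

```python
# Conversion factor: the relationship between the two units is fixed and known.
KJ_PER_KCAL = 4.184


def kcal_to_kj(kcal_per_day: float) -> float:
    """Convert energy intake from kcal/d to kJ/d."""
    return kcal_per_day * KJ_PER_KCAL


# Collapsing to the least common denominator: every study's own response
# categories are recoded to the coarsest scheme that all studies can support.
# Study names, categories and rules below are hypothetical.
RECODE_RULES = {
    "study_a": {
        "never": "less than weekly",
        "1-3 times/month": "less than weekly",
        "once a week": "weekly or more",
        "2-4 times/week": "weekly or more",
    },
    "study_b": {
        "rarely": "less than weekly",
        "at least weekly": "weekly or more",
    },
}


def collapse(study: str, response: str) -> str:
    """Recode a study-specific response to the common category."""
    return RECODE_RULES[study][response]


print(kcal_to_kj(2000))
print(collapse("study_a", "once a week"))
print(collapse("study_b", "rarely"))
```

Keeping the recoding rules in a single table per study makes them easy to review and to record in a metadata inventory.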
When considering the harmonisation of dietary patterns (DPs), the food groups within each study and the items within these groups should be as similar as possible between the studies. If using principal component analysis (PCA) to determine a DP, the coefficients from one study will need to be applied to the other to ensure the same DP is being compared.
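As a rough sketch of what applying the coefficients from one study to another might look like, the example below scores participants in a second study using loadings derived elsewhere. The loadings, food groups and intakes are invented for illustration, and standardising intakes within the scored study is an assumption; other choices, such as standardising against the original study's means and standard deviations, are equally possible.

```python
from statistics import mean, pstdev


def standardise(values):
    """Return z-scores (assumes the values are not all identical)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pattern_scores(intakes, loadings):
    """Score each participant on a dietary pattern defined by `loadings`.

    intakes:  dict mapping food group -> list of intakes, one per participant
    loadings: dict mapping food group -> PCA coefficient from another study
    """
    groups = sorted(loadings)
    missing = [g for g in groups if g not in intakes]
    if missing:
        raise KeyError(f"food groups not available in this study: {missing}")
    z = {g: standardise(intakes[g]) for g in groups}
    n = len(z[groups[0]])
    return [sum(loadings[g] * z[g][i] for g in groups) for i in range(n)]


# Hypothetical loadings derived in study A.
loadings_from_study_a = {"fish": 0.6, "vegetables": 0.7, "processed_meat": -0.4}

# Hypothetical food group intakes (g/d) for three participants in study B.
study_b_intakes = {
    "fish": [10, 30, 50],
    "vegetables": [100, 200, 300],
    "processed_meat": [60, 40, 20],
}

scores = pattern_scores(study_b_intakes, loadings_from_study_a)
print(scores)
```

The function raises an error when a food group used in the original pattern is unavailable, which is one of the reasons food groups need to be aligned across studies before scoring.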
All of these suggested approaches have limitations that might make it difficult to compare absolute levels of dietary intake across studies. However, by ranking individuals within each study into quartiles according to intake or adherence to a DP, associations between diet and health outcomes can still be compared between studies.
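A minimal sketch of within-study quartile ranking is shown below. The intake values and units are hypothetical; the point is that each study is ranked separately, so studies that measured intake in different units can still be placed on a common relative scale. Ties are broken by input order here, which is a simplifying assumption.

```python
def quartile_ranks(intakes):
    """Assign each participant to a within-study quartile (1 = lowest intake).

    Participants are ranked by intake and split into four groups of as equal
    size as possible. Ties are broken by input order.
    """
    n = len(intakes)
    order = sorted(range(n), key=lambda i: intakes[i])
    quartiles = [0] * n
    for rank, i in enumerate(order):
        quartiles[i] = rank * 4 // n + 1
    return quartiles


# Hypothetical data: study A recorded g/d, study B recorded portions/week.
study_a = [5, 20, 35, 50, 0, 10, 60, 15]
study_b = [0.5, 3, 1, 7]

print(quartile_ranks(study_a))  # [1, 3, 3, 4, 1, 2, 4, 2]
print(quartile_ranks(study_b))  # [1, 3, 2, 4]
```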
For the InterConnect consortium, methods to transform variables from each study to the common target variable were created and agreed with each study. The two tables below outline the harmonisation approach taken. There were some specific challenges related to this study. For instance, for some types of fish it was unclear if they should be classified as lean or fatty. Furthermore, the fat content of certain fish and portion sizes can vary depending on location; therefore local knowledge was required to make these decisions.
|Fish items in the original cohort||Harmonised items|
|Fish types||Assumption of g/portion||Frequency and quantity||Target variable (g/d)||Harmonisation - categorisation of fish||Harmonisation - frequency and quantity|
|White fish (hake, whiting, bream, grouper, sole)||150 g||Frequency: never/almost never; 1-3 times/month; once a week; 2-4 times/week; 5-6 times/week; once per day; 2-3 times/day; 4-6 times/day; more than 6 times/day||Lean fish||White fish/day + Cod/day||Lean fish: multiply portion/day*150 g|
|Blue fish (sardines, tuna, bonito, mackerel, salmon)||150 g||Fatty fish||Blue fish/day||Fatty fish: multiply portion/day*150 g|
|Salted or smoked fish||50 g||Salted/smoked/dried||Salted or smoked fish/day||Salted/smoked/dried fish: multiply portion/day*50 g|
|Clam, oyster, mussels||60 g||Seafood other than fish||Total seafood per day||Source data already in g/d|
|Prawn, king prawn, crayfish||100 g|
|Octopus, squid, cuttlefish||150 g|
|Total fish and seafood per day (derived)||g/d||Total fish||Total fish and seafood per day||Source data already in g/d|
|Total seafood per day (derived)||g/d|
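The rule in the table above (portions per day multiplied by an assumed portion size) can be sketched as follows. The portion sizes and frequency categories are taken from the table, but the numeric value assigned to each frequency category is an assumption made for this illustration (midpoints of each range, and the lower bound for the open-ended top category); the table does not state which values were used.

```python
# Assumed times per day for each frequency category (illustrative midpoints).
TIMES_PER_DAY = {
    "never/almost never": 0.0,
    "1-3 times/month": 2 / 30,
    "once a week": 1 / 7,
    "2-4 times/week": 3 / 7,
    "5-6 times/week": 5.5 / 7,
    "once per day": 1.0,
    "2-3 times/day": 2.5,
    "4-6 times/day": 5.0,
    "more than 6 times/day": 6.0,
}

# Portion sizes (g) and target variables as listed in the table.
GRAMS_PER_PORTION = {
    "white fish": 150,
    "blue fish": 150,
    "salted or smoked fish": 50,
}
TARGET_VARIABLE = {
    "white fish": "lean fish",
    "blue fish": "fatty fish",
    "salted or smoked fish": "salted/smoked/dried fish",
}


def grams_per_day(responses):
    """Convert questionnaire responses (item -> frequency) to g/d per target."""
    totals = {}
    for item, frequency in responses.items():
        target = TARGET_VARIABLE[item]
        grams = TIMES_PER_DAY[frequency] * GRAMS_PER_PORTION[item]
        totals[target] = totals.get(target, 0.0) + grams
    return totals


result = grams_per_day({
    "white fish": "once a week",
    "blue fish": "once per day",
    "salted or smoked fish": "never/almost never",
})
print(result)
```

Under these assumptions, eating white fish once a week corresponds to roughly 21 g/d of lean fish (150 g divided by 7 days).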
|Fish items in the original cohort||Harmonised items|
|Fish types||Frequency and quantity||Target variable||Harmonised categorisation of fish||Harmonised frequency and quantity|
|Total fish||g/d||Total fish||Total fish (sum of all available variables) - variables are mutually exclusive||Source data already in g/d|
|Cod; Baltic herring with bones; Baltic herring; Salmon; Salmon salted; Baltic herring salted with bones; Herring salted; Smoked Baltic herring with bones; Sardine; Smoked redfish; Perch; Pike; Flounder; Bream; Vendace with bones; Fresh frozen saithe; Whitefish; Fish average; Fish in soup, average; Roe; Stockfish; Vendace, salted with bones; Smoked vendace with bones; Smoked lamprey; Smoked whitefish; Smoked fish average; Tuna; Shrimp||g/d||Lean fish||Cod; Stockfish; Fresh frozen saithe; Perch; Pike; Flounder; Fish, average; Fish in soup, average||Source data already in g/d|
|Fatty fish||Baltic herring with bones; Baltic herring; Salmon; Salmon salted; Baltic herring salted with bones; Herring salted; Smoked Baltic herring with bones; Sardine; Smoked redfish; Whitefish; Vendace, salted with bones; Smoked vendace with bones; Vendace with bones; Smoked fish average||Source data already in g/d|
|Salted/smoked/dried||Salmon salted; Baltic herring salted with bones; Herring salted; Smoked Baltic herring with bones; Smoked redfish; Vendace, salted with bones; Smoked vendace with bones; Smoked lamprey; Smoked whitefish; Smoked fish average: mean of four species; Baltic herring smoked; Vendace smoked; Whitefish smoked; Bream smoked||Source data already in g/d|
|Seafood other than fish||Shrimps||Source data already in g/d|
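When the source data are already in g/d, as in the table above, harmonisation reduces to agreeing which source items contribute to which target variable and summing them. The sketch below uses a small subset of the items from the table; the participant's intakes are hypothetical. Note that an item can contribute to more than one target category (salted salmon is both fatty and salted), whereas total fish sums each mutually exclusive source variable once.

```python
# Subset of the item-to-category assignments shown in the table.
CATEGORY_ITEMS = {
    "lean fish": {"Cod", "Perch", "Pike"},
    "fatty fish": {"Salmon", "Salmon salted", "Sardine"},
    "salted/smoked/dried": {"Salmon salted", "Smoked lamprey"},
    "seafood other than fish": {"Shrimp"},
}


def harmonise(intake_g_per_day):
    """Sum item-level intakes (g/d) into the harmonised target variables."""
    result = {
        category: sum(intake_g_per_day.get(item, 0.0) for item in items)
        for category, items in CATEGORY_ITEMS.items()
    }
    # Source variables are mutually exclusive, so each is counted once.
    result["total fish"] = sum(intake_g_per_day.values())
    return result


# Hypothetical participant (g/d per source item).
participant = {
    "Cod": 10.0,
    "Salmon": 5.0,
    "Salmon salted": 2.0,
    "Smoked lamprey": 1.0,
    "Shrimp": 3.0,
}

harmonised = harmonise(participant)
print(harmonised)
```

Because categories overlap, the category values should not be added together to obtain total fish; doing so would count salted salmon twice in this example.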
There are no fixed rules for harmonising dietary data across studies; the approach taken depends on the research question and the data available. Compiling a metadata inventory that documents the methods, data formats and nuances of data processing is the most time-consuming aspect of harmonisation. With this guide, we have completed this key step for the original CLOSER partner studies, so that researchers can focus on how best to answer their specific diet-related questions.
Learn more about the individual studies covered by this guide and their dietary measurements:
- Overview of dietary information in selected CLOSER studies
- Hertfordshire Cohort Study (HCS)
- 1946 National Survey for Health and Development (NSHD)
- 1958 National Child Development Study (NCDS)
- 1970 British Cohort Study (BCS70)
- Understanding Society: The UK Household Longitudinal Study (UKHLS)
- The Avon Longitudinal Study of Parents and Children (ALSPAC)
- Southampton Women’s Survey (SWS)
- Millennium Cohort Study (MCS)
This page is part of the CLOSER resource: ‘A guide to the dietary data in eight CLOSER studies’.