Data harmonisation involves recoding or modifying variables so that they are comparable across research studies. This can be a complex process, but CLOSER has resources to help!
This page defines terms frequently used in cross-study research and data harmonisation, and describes the different ways harmonisation can be carried out. Explore the resources on Harmonisation methods using the navigation link on this page.
Some key terminology commonly used when discussing data harmonisation:
Harmonisation (also spelled harmonization) is the prospective or retrospective process of making data and variables similar so that they can be compared directly and meaningfully.
For example, harmonising height across different datasets may involve converting measurements from feet and inches into centimetres, while harmonising income may involve converting a continuous measure into categories or bands so that it matches how another dataset measured income.
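The two conversions in the example above can be sketched in Python. This is a minimal illustration rather than a CLOSER method: the function names, the band boundaries, and the band labels are all hypothetical, and in practice the bands would be copied from the target dataset's documentation.

```python
# Minimal sketch of two retrospective harmonisation steps.
# All names and band boundaries below are assumptions for illustration.

CM_PER_INCH = 2.54
INCHES_PER_FOOT = 12


def height_to_cm(feet, inches):
    """Convert a height recorded in feet and inches to centimetres (1 d.p.)."""
    total_inches = feet * INCHES_PER_FOOT + inches
    return round(total_inches * CM_PER_INCH, 1)


# Hypothetical income bands: (exclusive upper bound, label).
INCOME_BANDS = [
    (10_000, "Under 10,000"),
    (20_000, "10,000 to 19,999"),
    (30_000, "20,000 to 29,999"),
]
TOP_BAND = "30,000 or more"


def income_to_band(income):
    """Map a continuous income value onto the assumed bands above."""
    for upper_bound, label in INCOME_BANDS:
        if income < upper_bound:
            return label
    return TOP_BAND


if __name__ == "__main__":
    print(height_to_cm(5, 7))      # 67 inches expressed in centimetres
    print(income_to_band(24_500))  # falls in the third assumed band
```

Note that banding discards information: once income has been collapsed into categories, the original continuous values cannot be recovered, so it is worth keeping the source variable alongside the harmonised one.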
Prospective harmonisation means collecting new data in a way that is intended to make it comparable with data from other sources.
Retrospective harmonisation means making data that already exist more comparable.
Concordance is the agreement or consistency between items, variables, or data. The term is often used when exploring questions or variables from different datasets to see how similar or different they are and whether harmonisation is possible.
Equivalence is the degree of similarity between the data, variables, or items from different datasets: that is, whether they can be treated as equivalent or equal.
Inferential equivalence is a principle underlying data harmonisation; this means that the inferences about the underlying truth are the same regardless of the method (e.g., the question or variable) that was used to collect the data (see more about inferential equivalence in the Measurement Toolkit).
Standardisation is the process of removing all variance or differences in the collection, storage, or transformation of data by using the same methods across different studies. Standardisation would most likely occur prior to data collection, so it can be seen as a form of prospective harmonisation. In practice, this would mean different studies using exactly the same questions, data collection methods, and procedures, so that the resulting data are directly comparable.