Cohort studies chart the lives of groups of individuals who experience common life events within a given period of time. Birth cohort studies are a type of cohort study that follows a group of people born in a particular period. For example, the 1958 National Child Development Study follows all the children born in one week in March 1958.
A cluster is one of the smaller groups into which a population is split. Cluster sampling is a form of probability sampling in which clusters are picked at random to form the sample. It is typically used with large, geographically spread-out populations.
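As a minimal sketch of the idea, the Python snippet below picks whole clusters at random and pools their members into a sample. The regions, member identifiers, and the function name cluster_sample are hypothetical and chosen only for illustration; real studies often add further sampling stages within the selected clusters.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and pool their members into one sample."""
    rng = random.Random(seed)
    # Sort the cluster names so the draw is reproducible for a given seed.
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical population grouped by region (the clusters).
population = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3", "e4"],
    "West": ["w1", "w2"],
}

chosen, sample = cluster_sample(population, n_clusters=2, seed=1)
print(chosen, sample)
```

Everyone in a selected cluster enters the sample, and nobody from an unselected cluster does.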
Cohort members are the participants or subjects of a cohort study.
Confounding occurs where the relationship between independent (exposure) and dependent (outcome) variables is distorted by one or more additional, and sometimes unmeasured, variables. A confounding variable must be associated with both the independent and dependent variables but must not be an intermediate step in the relationship between the two (i.e. not on the causal pathway). For example, we know that physical exercise (an independent variable) can reduce a person’s risk of cardiovascular disease (a dependent variable). We can say that age is a confounder of that relationship as it is associated with, but not caused by, physical activity and is also associated with coronary health. See also ‘unobserved heterogeneity’, below.
Contextual factors are things in the environment that affect an individual’s actions, behaviours, or circumstances, and describe the context in which they take place.
Researchers consider contextual factors to understand the context surrounding an action, behaviour, event, or circumstance. Contextual factors can include, for example, physical surroundings, laws and regulations, social and cultural norms, and the economic and political climate.
A cross-sectional study can be described as a snapshot observation of a group of people at one point in time. This differs from a longitudinal study, which follows a group of people over a period of time.
A cross-sectional association is the association between two factors that have been measured at the same point in time. Cross-sectional associations can be estimated using data from cross-sectional surveys, and also using data from longitudinal studies if only one wave or sweep of the study is used.
Epidemiology is a scientific discipline that studies the patterns and determinants of health and disease in a population.
Data harmonisation is the process of making data and variables similar so that they can be compared directly and meaningfully. It can be done prospectively or retrospectively, and within or across studies.
Heterogeneity is a term that refers to differences. This could be differences in characteristics between study participants or samples, or differences in methods and measurements. It is the opposite of homogeneity which refers to similarities. Methodological heterogeneity is sometimes used to refer to differences in study design or processes.
Longitudinal studies gather information about the same individuals repeatedly over a period of time, in some cases from birth until old age. Many longitudinal studies focus upon individuals, but some look at whole households or organisations.
Measurement error, also known as observational error, is the difference between a measured or observed value and its true (unknown) value.
Measurement error is made up of random error (errors in measurement by chance) and systematic error (errors in measurement due to some systematic process or influence).
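The two components can be illustrated with a small simulation. This is a sketch under stated assumptions: each reading is modelled as the true value plus a fixed bias (systematic error) plus normally distributed noise (random error). The true weight, bias, and noise level are invented for illustration.

```python
import random
import statistics


def measure(true_value, bias, noise_sd, rng):
    """Return one reading: true value + systematic bias + random noise."""
    return true_value + bias + rng.gauss(0.0, noise_sd)


rng = random.Random(42)
true_weight = 70.0  # hypothetical true value in kg

# Simulate many readings from a scale that reads 0.5 kg too high on average.
readings = [measure(true_weight, bias=0.5, noise_sd=1.0, rng=rng)
            for _ in range(10_000)]
errors = [reading - true_weight for reading in readings]

mean_error = statistics.fmean(errors)  # approximates the systematic error
spread = statistics.stdev(errors)      # reflects the size of the random error
print(round(mean_error, 2), round(spread, 2))
```

Averaging many readings shrinks the influence of random error, but the systematic error remains in the average.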
A multidisciplinary study includes more than one academic discipline or topic. For example, Understanding Society collects data on a wide range of topics that span multiple academic disciplines, including education, employment, finances, social relationships, health, and biomarkers.
When a sample is nationally representative, it means the population of interest that the sample is aiming to represent is the entire population of a country/nation. Therefore the sample will match the demographics of the country, e.g. age, sex, ethnicity, the location of where people live, etc. A nationally representative sample means that implications from the results of the study can be extrapolated to apply to the nation.
Non-communicable diseases (NCDs) are diseases which cannot be directly transmitted between people. They are often chronic diseases which require long-term treatment and management. Examples of NCDs include cardiovascular diseases, cancers and diabetes.
Nutritional epidemiology is a branch of epidemiology that investigates how diet relates to health and disease in populations.
Observational studies focus on observing the characteristics of a particular sample without attempting to influence any aspects of the participants’ lives. They can be contrasted with experimental studies, which apply a specific ‘treatment’ to some participants in order to understand its effect.
Panel studies are a type of longitudinal study and, as such, follow the same individuals over time. They vary considerably in scope and size. Examples include online opinion panels and short-term studies whereby people are followed up once or twice after an initial interview.
Piloting refers to the process of testing research procedures or instruments/measures to identify problems or issues before implementing them in the full study. Pilots are usually conducted on a small subset or separate sample of eligible participants who are sometimes encouraged to provide feedback on the process or instrument.
Postal questionnaires, also known as paper-and-pencil surveys, are surveys that are sent to respondents by post. These paper-based questionnaires are self-administered, meaning they are carried out by the respondent themselves and not an interviewer.
Prospective refers to the future and is the opposite to retrospective which refers to the past. In prospective studies, individuals are followed over time and data about them is collected as their characteristics or circumstances change.
Prospective studies generally collect information about the present or relatively recent past (e.g. the past week or month).
Random variation is a term used to describe unexpected variations in results. Random variation can be due to external factors, such as the environment, or uncontrolled factors within the experiment/analysis.
A randomised controlled trial (RCT) is a type of research experiment where participants are randomly assigned to two or more groups to test a specific intervention. An experimental group receives the intervention while a comparison or control group receives an alternative intervention, a placebo, or no intervention. RCTs are considered the gold standard method for testing the effectiveness of new health interventions, drugs, or treatments because their tightly controlled design reduces bias in the results.
Relative validity is the extent to which two methods of measurement can rank individuals in the same order, capturing their relative differences but not absolute differences. For example, you could test the relative validity of two different measurements of water consumption in a day. Say one measure asked for the number of glasses of water drunk per day and the other measure asked for the number of litres of water drunk per day. It is not possible to directly compare these two measures as they use different units – one uses glasses (and doesn’t specify the size of the glass) and one uses litres. If both measures ranked all the individuals in the same order (i.e. the order of those drinking the most to least amount of water was the same in both measures), then they would show high relative validity. By contrast, absolute validity refers to the degree of agreement between two methods measuring the same phenomenon in the exact same units.
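One common way to quantify agreement in rank order is a rank correlation. The sketch below computes Spearman’s rank correlation by hand for the water example; the choice of Spearman’s coefficient is an assumption made here for illustration, and the intake figures for the five individuals are invented.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical daily water intake for the same five people on both measures.
glasses = [2, 8, 5, 3, 10]
litres = [0.6, 1.5, 1.2, 0.9, 2.5]

rho = spearman(glasses, litres)
print(rho)
```

Both measures order the five people identically, so the coefficient is 1 even though the units differ. A value near 0 would indicate little agreement in ranking.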
Reporting bias occurs when only a select amount of information is shared, while other information is suppressed. It can occur in data collection when participants in a study choose to not report certain information (e.g. not report that they’ve previously smoked cigarettes when asked if they’ve ever smoked) and in dissemination of research findings when researchers choose only to report certain results from a study (e.g. only reporting results that back up their hypothesis).
Residual confounding is the distortion that remains in results even after controlling for confounding factors, for example because some confounders were unmeasured or were measured imprecisely.
Retrospective refers to the past and is the opposite of prospective which refers to the future. In retrospective studies, information is collected about individuals’ past.
This might be through interviews in which participants are asked to recall important events, or by identifying relevant administrative data to fill in information on past events and circumstances.
Sample attrition is when participants in a longitudinal study stop taking part over time. Attrition can reflect a range of factors, from study participants being untraceable, to them choosing not to take part when contacted. Problems that can stem from attrition include the potential to lead to bias in the study findings (if attrition is higher among some groups than others) and the reduction in the size of the sample as participants drop out.
Social mobility describes the movement in an individual’s socioeconomic circumstance and social status. Inter-generational mobility refers to the change in an individual’s socioeconomic circumstance and social status in comparison to their parents’, whilst intra-generational mobility is the change in socioeconomic circumstance and social status over the course of an individual’s life. Social mobility can be described as being upward, when socioeconomic circumstances improve, or downward, if circumstances worsen.
A stratified sample is a representational sample of a population that is acquired by splitting the population into homogeneous sub-groups called strata. With socially-stratified samples, each stratum/sub-group consists of people with similar characteristics based on socio-economic factors such as income, race, gender or educational attainment among other factors. Stratified random sampling (also known as proportional random sampling) is when random samples are taken from the strata in proportion to each stratum’s share of the population. In disproportionate sampling, the number sampled from each stratum is not proportional to that stratum’s share of the population.
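Proportional allocation can be sketched in a few lines of Python. The income strata, their sizes, and the function name proportional_stratified_sample are hypothetical and used only for illustration; the sketch assumes a simple random draw within each stratum.

```python
import random


def proportional_stratified_sample(strata, total_n, seed=None):
    """Draw a random sample from each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation. Rounding means the stratum sample sizes
        # may not always sum exactly to total_n.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 100 people split into three income strata.
strata = {
    "low income": [f"L{i}" for i in range(50)],
    "middle income": [f"M{i}" for i in range(30)],
    "high income": [f"H{i}" for i in range(20)],
}

sample = proportional_stratified_sample(strata, total_n=10, seed=0)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)
```

With strata of 50, 30, and 20 people and a total sample of 10, the allocation is 5, 3, and 2. A disproportionate design would replace the proportional calculation of k with sample sizes fixed per stratum, for example to over-sample a small group.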
A stratified cluster sampling framework brings together both cluster and stratified sampling techniques. Cluster sampling is a term used to describe probability sampling where a population is split into smaller groups called clusters and people are picked at random from these clusters to make a sample. Cluster sampling is typically used with large, geographically spread-out populations. A stratified sample is a representational sample of a population that is acquired by splitting a population into homogeneous sub-groups called strata. Each stratum consists of people with similar characteristics in terms of income, race, sex, or educational attainment, among other factors. This sampling method can be used to ensure adequate representation from particular groups, for example, those from disadvantaged areas or ethnic minority groups.
A sub-group is a smaller group of a study’s population or sample that is defined by specific characteristics or events, e.g. sex, ethnicity, age. Sub-groups may be used for analyses that are focused only on certain groups of people, e.g. only women.
A subsample is a smaller group of individuals that have been sampled from the original sample. They are often randomly sampled and so are representative of the original sample.
The term sweep is used to refer to a round of data collection in a particular longitudinal study (for example the age 7 sweep of the National Child Development Study refers to the data collection that took place in 1965 when participants were aged 7 years). Note that some studies use the term wave instead of sweep.
Validation studies test the validity (accuracy) of a particular method of measurement. This is often done by comparing the method in question with a “gold standard” measurement tool for collecting that particular data; however, it could also involve comparison with other available measurement tools (that may not be a “gold standard”). Validation studies are used to understand and prevent information bias.
Variables are the data items within a dataset. So, for example, a questionnaire might collect information about a participant’s job (job title, whether it involves any supervision, the type of organisation worked for and so on).
This information would then be coded using a code-frame and the results made available in the dataset in the form of a variable about occupation.
In data analysis, variables can be ‘dependent’ or ‘independent’, with the dependent variable being the particular outcome of interest (for example high attainment at school) and the independent variables being those that might have a bearing on this outcome (for example parental education, gender and so on).