
Latest videos – Longitudinal Methodology Series – Centre for Longitudinal Studies Missing Data Strategy


The ninth seminar in the CLOSER Longitudinal Methodology Series featured talks from the Centre for Longitudinal Studies (CLS) Missing Data Strategy team: George Ploubidis, Professor of Population Health and Statistics; Dr Tarek Mostafa, Research Officer; and Brian Dodgeon, Research Fellow.


George Ploubidis, Tarek Mostafa and Brian Dodgeon

Missing data handling in longitudinal studies: Evidence for the 1958 British birth cohort.

Selection bias arising from incomplete or missing data is unavoidable in longitudinal surveys. Missing data result in smaller samples, incomplete histories and lower statistical power, and it is well known that unbiased estimates cannot be obtained without properly addressing the implications of incompleteness. However, statistical methods exist that enable users to exploit the full richness of longitudinal data and address sources of bias.

We present the first results from the Centre for Longitudinal Studies Missing Data Strategy using data from the National Child Development Study (NCDS), which follows the lives of 17,416 people born in England, Scotland and Wales in a single week of 1958. Also known as the 1958 Birth Cohort Study, it collects information on physical and educational development, economic circumstances, employment, family life, health behaviour, wellbeing, social participation and attitudes. Since the birth survey in 1958, there have been ten further ‘sweeps’ of all cohort members, at ages 7, 11, 16, 23, 33, 42, 44, 46, 50 and 55.

Within Rubin’s framework, we present three papers in which we clarify the situations where complete case analysis and methods that operate under the Missing At Random (MAR) assumption return unbiased results. We present a three-step, data-driven approach that maximises the plausibility of the MAR assumption in NCDS, and we quantify the effect of strong departures from MAR. Our findings have implications for missing data handling in the 1958 cohort and other longitudinal studies: they will help inform the selection of auxiliary variables and allow researchers to communicate effectively the assumptions underlying popular MAR methods such as Multiple Imputation, Full Information Maximum Likelihood and Inverse Probability Weighting.
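To make the distinction between complete case analysis and a MAR-based method concrete, the following is a minimal simulation sketch. It is not the CLS team's analysis and uses no NCDS data: the variable names, the outcome model and the response model are all illustrative assumptions. An outcome `y` is missing with a probability that depends only on a fully observed auxiliary variable `x`, so the data are MAR given `x`. The complete case mean is then biased, while an inverse probability weighted mean recovers the true value. For simplicity the sketch weights by the true response probabilities, which are known only because the data are simulated; in practice they would have to be estimated from auxiliary variables.

```python
import numpy as np


def simulate_mar(n=200_000, seed=0):
    """Simulate an outcome whose missingness depends only on an observed covariate.

    All quantities here are hypothetical and chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                 # fully observed auxiliary variable
    y = 2.0 + x + rng.normal(size=n)       # outcome; its true mean is 2.0
    # Response probability depends on x only, so y is MAR given x.
    p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
    observed = rng.random(n) < p_obs       # True where y is observed
    return x, y, p_obs, observed


def complete_case_mean(y, observed):
    """Mean of y among respondents only (ignores the missingness mechanism)."""
    return y[observed].mean()


def ipw_mean(y, p_obs, observed):
    """Inverse probability weighted mean of y among respondents.

    Assumption: p_obs holds the true response probabilities. With real data
    these would be estimated, for example with a logistic regression.
    """
    weights = 1.0 / p_obs[observed]
    return np.sum(weights * y[observed]) / np.sum(weights)


x, y, p_obs, observed = simulate_mar()
cc = complete_case_mean(y, observed)
ipw = ipw_mean(y, p_obs, observed)

print(f"true mean:          {2.0:.3f}")
print(f"complete case mean: {cc:.3f}")
print(f"IPW mean:           {ipw:.3f}")
```

In this setup respondents tend to have higher values of `x`, and therefore of `y`, so the complete case mean overstates the true mean. Weighting each respondent by the inverse of their response probability restores the balance, which illustrates why the choice of auxiliary variables that predict missingness matters for MAR methods.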