
Background & instruction on using this guide


Cognition is a broad term that refers to the mechanisms by which we acquire, process, store and ultimately use information from the environment [1].

It encompasses processes such as perception, learning, memory, and reasoning [1]. The CLOSER British birth cohorts contain a wealth of information on cognition over the life course, and the cognitive measures available in these studies have been used to answer research questions in many different fields, e.g. education [2, 3], public health [4, 5], economics [6], psychiatry [7], psychology [8-10], and political science [11]. However, these cognitive tests vary considerably both within and across the cohorts, and this has hindered studies of developmental trends and cross-cohort differences. Moreover, there is considerable heterogeneity in the quality and quantity of the documentation used to describe these cognitive assessments, and, to date, there has been no attempt to develop a uniform description of the key features of these instruments.

Therefore, as a first step in facilitating developmental and cross-cohort studies, we provide a comprehensive description of the cognitive measures that are available in five British birth cohorts. A companion report (which will be available at) will assess the feasibility of harmonising the cognitive measures both within and across the cohorts.

Cohorts included

This guide documents the cognitive measures that have been administered in the following studies: i) the MRC National Survey of Health and Development (NSHD); ii) the 1958 National Child Development Study (NCDS); iii) the 1970 British Cohort Study (BCS70); iv) the Avon Longitudinal Study of Parents and Children (ALSPAC); and v) the Millennium Cohort Study (MCS). A brief description of each study follows:

The MRC National Survey of Health and Development: The NSHD is the longest running of the British birth cohort studies. It originally consisted of a socially stratified sample (N=5,362) of men and women born to married parents in England, Scotland or Wales in March 1946. The sample was selected from an initial maternity survey of 13,687 pregnancies, and consisted of all births to non-manual and agricultural families, and a random 1-in-4 sample from manual families. To date, the participants have been followed up in 24 data collections when they were aged 2, 4, 6, 7, 8, 9, 10, 11, 13, 15, 19, 20, 21, 22, 23, 24, 25, 26, 31, 36, 43, 47-54, 60-64 and 68-69. At age 69, the most recent home visit, 2,149 cohort members participated. More details about this study can be found at:

The 1958 National Child Development Study: The NCDS follows the lives of 17,415 people born in England, Scotland and Wales in a single week in 1958. The NCDS started in 1958 as the Perinatal Mortality Survey and captured 98% of the total births in Great Britain in a week. The cohort has been followed up a total of ten times at ages 7, 11, 16, 23, 33, 42, 44, 46, 50 and most recently at 55 when 9,137 cohort members took part. Additional information on these sweeps can be found at:

The 1970 British Cohort Study: The BCS70 follows the lives of 17,198 people born in England, Scotland and Wales in a single week in 1970. The BCS began as the British Births Survey and participants have since been followed up nine times at ages 5, 10, 16, 26, 30, 34, 38, 42 and the most recent at age 46, when 8,581 cohort members took part. In addition to the main BCS70 sweeps, the following sub-studies have been conducted: 1) Twins study (2008-2009); 2) Age 21 sweep (1992); 3) Age 7 sweep (1977); and 4) 22 month and 42 month sweeps (1972-1973). For further details of these sub-studies, see

The Avon Longitudinal Study of Parents and Children: The ALSPAC charts the lives of 14,541 people born in the former county of Avon between April 1991 and December 1992. Assessments have been administered frequently, with 68 data collection time points between birth and 18 years of age. Data is collected on both parents and children, and more recently ALSPAC has started to recruit and collect data on the children of the original cohort members. Further information can be found at:

The Millennium Cohort Study: The MCS follows the lives of 19,517 children born in England, Scotland, Wales and Northern Ireland in 2000-01. Since the initial birth survey at 9 months, the cohort has been followed up five times at ages 3, 5, 7, 11 and most recently at age 14, when 11,872 cohort members took part. A description of these sweeps is available at:

More details on each of the cohorts, including cohort profiles and guidance on accessing the data, can be found at

Measuring cognition

Researchers from different disciplines often approach the study of cognition from different perspectives, which can lead to inconsistencies in terminology. For instance, the term cognitive ability is most commonly used in the social sciences (e.g. education, economics, psychology), whereas the term cognitive functioning appears more often in medical disciplines (e.g. geriatric medicine). Both terms broadly refer to individual differences in mental processes of thinking, and the demarcation between them is poorly defined. At a more specific level, different terms may be applied to different groups of functionally connected cognitive processes. For example, the various cognitive mechanisms associated with attentional control (i.e. coordinating goal-directed behaviour) have been conceptualised as executive functioning by neuropsychologists and as working memory capacity by experimental psychologists [12].

Along with differences in terminology, measurement strategies can vary depending on factors such as academic discipline, historical factors, research setting, and characteristics of the population being studied. For instance, researchers with an educational background may be more likely to measure skills and abilities that are developed in the school environment, e.g. pen and paper tests of reading comprehension and arithmetic. Researchers from a cognitive neuroscience background may be more likely to administer instruments that aim to capture specific cognitive processes, e.g. computer-administered tests of working memory and visual processing.

In terms of research setting, due to time and resource constraints, large population-based studies may be forced to rely on short, easy-to-administer cognitive tests (e.g. [13]), whereas smaller-scale studies may have the opportunity to administer more comprehensive assessment batteries (e.g. [14]). Moreover, measures that are ostensibly similar in content may serve radically different purposes, e.g. tests of verbal fluency can be used to profile executive function in the general adult population (e.g. [15]), or as part of a screener for dementia in individual clinical assessments (e.g. [16]).

Given the heterogeneity in the study of cognition described above, we aim to be as inclusive as possible and document all measures of cognition that are available in five key British birth cohorts, regardless of academic discipline, methodology, function or participant (e.g. cohort member, cohort member’s mother).

Conventions in the available tests

In discussing cognitive measures that are available in the cohorts, it is possible to draw a distinction between tests of achievement and tests of ability [17]. Achievement tests are used to measure knowledge and competence accumulated within a particular area, e.g. reading skills, language skills, arithmetic and mathematics [18]. Ability tests typically assess an individual’s capability of solving unfamiliar problems, usually by employing some form of reasoning (e.g. verbal, numeric, visuospatial) [18]. This distinction is analogous to the idea of crystallised and fluid intelligence (see section on Specific features documented below). Although these types of test may seem well-differentiated, scores tend to correlate highly due to functional overlap [18]. Indeed, Dickens [17] argues that it is impossible to measure ability without also measuring the test taker’s reading or verbal comprehension. Furthermore, any reasoning task that involves some form of acquired knowledge (e.g. geometry, arithmetic, general knowledge) will also be impacted by the individual’s level of achievement. As such, the most widely used batteries of cognitive assessment typically include tests of both ability and achievement, e.g. the Wechsler scales [19] and the British Ability Scales [20]. Given this theoretical and functional overlap, this report documents both achievement and ability tests.

The tests that were administered during childhood in the earlier cohorts appear to reflect the curricula of those periods. For example, the early arithmetic tests contain several conventions that are no longer used in the teaching of mathematics. Moreover, we noted a trend whereby tests became more reflective of achievement and attained knowledge as children entered adolescence. We do not, however, include educational qualifications and school educational attainment measures, e.g. key stage national curriculum tests. Educationalists have criticised these tests for various reasons, such as: i) changes in the curricula and tests over time, ii) the high stakes for teachers and schools encouraging a “teaching to the test” mentality, and iii) questions regarding political interference in the monitoring and reporting of national standards (see [21] for a more detailed discussion of this issue).

Prior to the 1970s, no standardised tests of cognitive ability had been developed for use in the British population [22]. As such, many of the tests administered during childhood in the earlier cohorts (NSHD, NCDS) were devised specifically for the cohort studies by educationalists. In particular, many of the childhood tests were developed at the National Foundation for Educational Research (NFER) [23]. Standardised ability tests (e.g. the British Ability Scales) became the primary form of assessment beginning at the age 10 sweep of the BCS in 1980. The exact content of such standardised tests varies in order to be age appropriate for the study children. Moreover, there are important mode effects to consider; traditional pen and paper methods and physical tasks (e.g. block building) were more common in childhood (particularly in the older cohorts), whereas modern assessment formats (e.g. computer-assisted personal interviewing; CAPI) are used more regularly in later sweeps/cohorts. External factors may also have contributed to bias in the tests; e.g. at the age 16 sweep of the BCS, national teacher strikes meant that a smaller than expected number of cognitive tests were returned, and these were completed in different settings (approximately 3,000 in schools, approximately 2,000 in homes).

Regarding the cognitive measures that were administered in adulthood (available only in NSHD, NCDS and BCS), two trends became evident. First, there was a considerable period (when participants were in their 20s to early 40s) over which little information on cognition was gathered. In the NCDS and BCS, tests during this period focused on basic skills in adult literacy and numeracy, as well as cognitive measures from the children of the cohort members. Second, the measures of cognition that were administered in mid-life and beyond differed considerably from those used in childhood. Whereas the measures administered in childhood consisted largely of tests of ability (e.g. novel problem solving) and achievement (e.g. literacy and numeracy), the measures administered in adulthood (beginning primarily as participants entered their 40s) were more reflective of cognitive skills/abilities that impact on functioning in day-to-day adult life, e.g. short-term memory, visual scanning ability, and verbal fluency. Recent research, however, has shown that these common adult tests overlap structurally and functionally with childhood tests of ability and achievement [24]. As such, in addition to the childhood measures outlined above, we describe all the available measures of general cognitive function in adulthood.

Overview of the cognitive measures

In spite of the structural and functional overlap mentioned above, the broader differences that exist between the measures administered in childhood and adulthood informed our decision to divide our description of the cognitive measures into two separate sections reflecting these different stages of life. The links below provide tabulated overviews of the cognitive measures administered in the five cohorts during childhood and those used in adulthood. The overview tables outline the name of each test by cohort, age (or decade), and reporter (with the latter documented in the table footnotes).

Specific features documented

In order to provide a comprehensive and consistent description of the cognitive measures in the five British birth cohorts, we document various features of the different tests. Furthermore, in order to facilitate the comparison of these measures both within and across the cohorts, we classify each measure at a conceptual level under a common theoretical framework. Although there are multiple theoretical models that are proposed to account for individual differences in cognitive tests, we chose the Cattell-Horn-Carroll (CHC) model of cognitive ability [25] as our overarching framework. There are three primary reasons for this decision:

  1. The CHC model is built into the theoretical framework of (or is at least compatible with) many of the cognitive tests administered in the cohorts, particularly in childhood, e.g. the British Ability Scales, the Wechsler scales.
  2. The CHC model is the most comprehensive and strongly supported, empirically derived taxonomy of cognitive abilities [25, 26].
  3. The CHC model has shown a high degree of generality across different tests, including those designed under other theoretical frameworks; e.g. recent psychometric evidence has shown that neuropsychological tests designed to assess executive function align structurally and functionally with the CHC model [24].

This model conceptualises cognitive ability as multidimensional and functionally integrated [25]. The CHC model is hierarchical in nature, ranging from general ability (g) to broad, narrow, and specific abilities [25]. Specific abilities, at the bottom of the hierarchy, are the only observable cognitive abilities, and are usually tied to specific tests (e.g. ability to repeat back sentences). Narrow-stratum abilities are inferred, and are captured in clusters of highly correlated specific abilities (e.g. ability to repeat back sentences and ability to repeat back individual words may reflect a broader memory span ability). Similarly, broad-stratum abilities are reflected in clusters of correlated narrow-stratum abilities. Arguably the two most commonly discussed broad-stratum abilities are ‘crystallised intelligence’ and ‘fluid intelligence’. Crystallised intelligence broadly refers to acquired knowledge, and encompasses narrow-stratum abilities such as general knowledge, lexical knowledge, and language development [26]. Fluid intelligence refers to an individual’s ability to solve novel problems, without relying on acquired knowledge [26]. It includes processes such as induction and sequential reasoning. The ‘fluid-crystallised’ split mirrors the ability vs achievement test distinction previously discussed (see section on Conventions in the available tests above). By convention, abilities at the broad-stratum level are denoted with an abbreviation that begins with a capital ‘G’ (standing for ‘general’), followed by lowercase letters, e.g. Gc (crystallised intelligence), Gf (fluid intelligence) [25]. A brief description of each of the broad stratum abilities of the CHC model is provided in the table below.

Broad-stratum abilities as defined in the CHC model of intelligence [27]

| Abbreviation | Broad-stratum ability | Description |
| --- | --- | --- |
| Gf | Fluid reasoning/fluid intelligence | Ability to solve ‘novel’ problems without relying on previously acquired knowledge. |
| Gsm | Short-term memory | Ability to store and manipulate information in one’s immediate awareness. |
| Glr | Long-term storage & retrieval | Ability to store information in memory and recall this information over periods of time ranging from minutes to years. The main distinction between this and Gsm is that, in Gsm tests, there is a continuous effort to maintain awareness of the information, whereas in Glr tests the info has been placed out of conscious awareness for a specified period of time, and must be ‘retrieved’. |
| Gs | Processing speed | Degree to which cognitive tasks can be performed quickly and without error. |
| Gt | Reaction time | Speed and accuracy with which decisions/judgements can be made when presented with information. |
| Gps | Psychomotor speed | Speed and fluidity with which body movements can be made. |
| Gc | Acquired knowledge/crystallised intelligence | Skill/knowledge base acquired, e.g. knowledge of the fundamental meaning of words. Highly dependent on culture. |
| Gkn | Domain-specific knowledge | Mastery of specialised knowledge, e.g. foreign language proficiency, geographical knowledge. |
| Grw | Reading and writing | Skills related to written language, e.g. reading speed, spelling ability. |
| Gq | Quantitative knowledge | Knowledge/achievement related to mathematics. |
| Gv | Visual processing | Ability to mentally simulate and manipulate imagery. |
| Ga | Auditory processing | Ability to identify and process information from sound. |
| Go | Olfactory abilities | Ability to detect and process information from odours. |
| Gh | Tactile abilities | Ability to recognise and process information from touch. |
| Gk | Kinesthetic abilities | The ability to detect and process meaningful information in proprioceptive sensations. |
| Gp | Psychomotor ability | Precision, coordination and strength of body movements. |

At the highest level of the hierarchy, a general cognitive ability factor (g) is posited. Both the structure and validity of this model have been supported in many factor analytic studies [26], and general cognitive ability has been shown to be an important predictor of a wide range of life outcomes across different groups [28].
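For readers who want to work with these classifications programmatically, the broad-stratum abilities in the table above can be held in a simple lookup. The following is a minimal sketch only: the function names (`follows_convention`, `label_test`) and the test name used in the usage line are hypothetical and are not part of the guide or of any cohort dataset.

```python
import re

# Broad-stratum abilities, copied from the table above (abbreviation -> name).
CHC_BROAD_ABILITIES = {
    "Gf": "Fluid reasoning/fluid intelligence",
    "Gsm": "Short-term memory",
    "Glr": "Long-term storage & retrieval",
    "Gs": "Processing speed",
    "Gt": "Reaction time",
    "Gps": "Psychomotor speed",
    "Gc": "Acquired knowledge/crystallised intelligence",
    "Gkn": "Domain-specific knowledge",
    "Grw": "Reading and writing",
    "Gq": "Quantitative knowledge",
    "Gv": "Visual processing",
    "Ga": "Auditory processing",
    "Go": "Olfactory abilities",
    "Gh": "Tactile abilities",
    "Gk": "Kinesthetic abilities",
    "Gp": "Psychomotor ability",
}

# Naming convention described in the text: capital 'G' followed by lowercase letters.
_CODE_PATTERN = re.compile(r"G[a-z]+")


def follows_convention(code):
    """Return True if the abbreviation is a capital 'G' plus lowercase letters."""
    return _CODE_PATTERN.fullmatch(code) is not None


def label_test(test_name, codes):
    """Describe a test by its CHC codes (hypothetical helper).

    An empty list means no broad-stratum ability was assigned; a test may
    also carry more than one code.
    """
    unknown = [c for c in codes if c not in CHC_BROAD_ABILITIES]
    if unknown:
        raise ValueError(f"Unknown CHC abbreviation(s): {unknown}")
    if not codes:
        return f"{test_name}: no broad-stratum ability assigned"
    labels = ", ".join(f"{c} ({CHC_BROAD_ABILITIES[c]})" for c in codes)
    return f"{test_name}: {labels}"


# Usage with a made-up test name and an illustrative (not documented) coding.
print(label_test("Example test A", ["Gsm", "Glr"]))
```

Allowing an empty or multi-item list of codes reflects the fact that some tests fall outside the CHC framework while others are associated with more than one broad-stratum ability.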

The key features of each of the cognitive measures are documented as outlined in the table below. Please be aware that over time some of the features detailed in this report may have subsequently been updated or changed.

Outline of the key features documented for each measure of cognitive ability

Domain: First, each measure will be classified at the broadest possible level, e.g. does it assess a form of verbal, or non-verbal (i.e. performance) ability.
Measures: This section will list the more specific areas of cognition that are measured by each test, e.g. lexical knowledge, reading comprehension, general sequential reasoning, quantitative reasoning, short-term episodic memory, visual scanning, simple reaction time etc. This information will be taken from the original source documentation for the measure. If the source documentation is unavailable or does not contain this information, we will consult technical resources documented in the cohort literature.
CHC: In this section, we will document the broad-stratum ability (e.g. Gc, Gf, Gsm) associated with each test. Again, this will be determined using the source documentation. If the source documentation is unavailable/inadequate, the test/task will be matched with established broad-level cognitive abilities as described in the extant literature, e.g. [24]. For a more detailed description of the CHC model of cognitive ability, see [24, 25, 29]. Not all cognitive tests fit within the CHC framework, for example developmental tests in early childhood and basic language and numeracy tests in adulthood. In such instances, no broad-stratum ability will be assigned to these tests. In addition, some tests may be associated with more than one broad-stratum ability.
Administration method: Here we will describe the key features of how the test was administered, including the test administrator (e.g. teacher, psychologist, trained interviewer) and method used (e.g. CAPI, pen and paper, oral response). This section will help highlight any mode effects to consider when tests are being compared within/across cohorts.
Procedure: We provide a brief description of the test itself and the administration procedure. Details (where available) include:
- Nature of questions/items
- Number of questions/items
- Number of sub-tasks (if appropriate)
- Whether practice trials were administered
- Whether prompts or encouragement were used
- Duration of the test
Questionnaire: Where possible we provide links to the original questionnaire documentation (or provide the file name), the majority of which are freely available online.
Scoring: In this section we provide information on the scoring of the tests (both raw scores and any standardised/normalised scores available).
Item-level variable(s): Here we list the relevant item-level variable names (where available). For some tests, item level variables were not available as either the test had not been processed or the data were not readily available at the UKDS (for further information please contact the relevant data providers). Note, variables could be in either upper or lower case, so please check for both.
Total score/derived variable(s): Here we list (where available) any derived variables (i.e. any variables that were constructed by manipulating the original raw data) and summary/total scores for the test. For some tests, total scores were not available.
Descriptives: Where total scores were available we provide basic descriptive statistics for the tests, including number of available cases (N), mean (M), standard deviation (SD), and range of scores. We also include histograms as a means of quickly assessing the distribution of scores, enabling researchers to identify potential issues such as floor and ceiling effects. Note that, although the descriptive statistics are accurate at the time of writing, ongoing updates and improvements to the raw data by the hosts may lead to minor discrepancies with previous/future documents.
Age of participants: Here we note the M, SD, and age range (in weeks, months or years, as appropriate) of participants at time of assessment (where available).
Other sweep and/or cohort: Where the same measure has been administered in multiple waves or cohorts, this information will be recorded here. This does not necessarily mean that the test is exactly the same. For example, a British Ability Scales (BAS) test administered previously may have been subsequently revised and updated. There may also be mode effects to consider; e.g. the NSHD, NCDS and BCS all include word list learning tasks in mid-adulthood; however, in NSHD the words are presented visually, whereas in NCDS and BCS they are presented aurally. In addition, we have included references to equivalent tests that were devised by different test developers. For example, in ALSPAC the Wechsler Intelligence Scale for Children (WISC-III) was administered, and includes sub-scales such as Recall of Digits which is also available in the BAS and administered in the BCS.

Tests that cover very broad domains, such as mathematics and reading, and that are conceptually similar but not the same test, are not included in this section. For example, the mathematics tests do not all cover the same fields of mathematics (i.e. arithmetic, algebra, geometry), and they include different questions within each field.
Source: Here we specify the original source of each test. Typical sources include scale/test manuals, published empirical articles or descriptions of the processes used to create tests specifically for a given cohort study.
Technical resources: Here we provide details (where available) of useful technical resources and supplementary materials. Examples include user guides and methodological papers/materials (beyond the core source materials).
Reference examples: Finally, (where available) we provide examples of empirical articles that have made use of the given test (in these British birth cohorts only). This section is neither an exhaustive list nor an endorsement of the quality of the reported research or treatment of the cognitive variables therein; it serves simply to provide examples of the measures in use.
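Two of the features listed above lend themselves to a short worked example: checking variable names in both upper and lower case, and reproducing the basic descriptives (N, M, SD, range) together with a rough check for floor and ceiling effects. The sketch below uses only the Python standard library. The variable names and scores are invented for illustration, and the helper names (`find_variable`, `describe_scores`) are hypothetical rather than part of any cohort tooling.

```python
import statistics


def find_variable(columns, name):
    """Return the column whose name matches `name`, ignoring case."""
    matches = [c for c in columns if c.lower() == name.lower()]
    if len(matches) != 1:
        raise KeyError(f"Expected exactly one match for {name!r}, found {matches}")
    return matches[0]


def describe_scores(scores, min_possible, max_possible):
    """Basic descriptives for a total score; missing values are given as None.

    Needs at least two valid scores, because the sample SD is undefined
    for a single observation.
    """
    valid = [s for s in scores if s is not None]
    n = len(valid)
    return {
        "N": n,
        "M": statistics.mean(valid),
        "SD": statistics.stdev(valid),  # sample standard deviation (n - 1)
        "range": (min(valid), max(valid)),
        # Share of cases at the lowest/highest possible score; large values
        # hint at floor or ceiling effects.
        "pct_floor": 100 * sum(s == min_possible for s in valid) / n,
        "pct_ceiling": 100 * sum(s == max_possible for s in valid) / n,
    }


# Invented data: a 0-10 test stored under an upper-case variable name.
data = {"ID": [1, 2, 3, 4, 5, 6], "READTOT": [0, 5, 10, 10, None, 5]}
column = find_variable(data.keys(), "readtot")
summary = describe_scores(data[column], min_possible=0, max_possible=10)
print(column, summary)
```

The share of cases at the minimum and maximum possible score is only a crude indicator; the histograms provided for each measure remain the better guide to the full distribution.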

Explore the cognitive measures in the cohort studies covered by this guide:

Further information:

This page is part of CLOSER’s ‘A guide to the cognitive measures in five British birth cohort studies’.