(1) Background

The paper describes datasets from two linked projects conducted in Germany: PIAAC and PIAAC-L. PIAAC, the Programme for the International Assessment of Adult Competencies, is a large-scale international assessment initiated by the Organisation for Economic Co-operation and Development (OECD); it measures key adult cognitive skills in around 40 countries, including Germany (OECD, 2013a, 2013b). PIAAC is a multi-cyclical study conducted at 10-year intervals. In the first cycle of PIAAC, the German data were collected over an eight-month period in 2011/12 (Rammstedt, 2013; Zabal et al., 2014). PIAAC-L, the longitudinal follow-up study to PIAAC in Germany, collected data in 2014, 2015 and 2016 (Rammstedt et al., 2017).

PIAAC provides estimates of the level and distribution of cognitive skills in the adult population of the participating countries. It also examines factors associated with acquiring and maintaining these skills. To date, the results of the PIAAC study have been used mainly for cross-national comparative analyses and benchmarking of cognitive skills. These analyses are used, for example, to identify the strengths and weaknesses of national education systems across economically developed countries.

The skills measured in the first cycle of PIAAC were:

  • literacy – the ability to understand, use and interpret written texts (Jones et al., 2009);
  • numeracy – the ability to retrieve, interpret and use mathematical information (Gal et al., 2009); and
  • problem solving in technology-rich environments – the ability to use technology to access and process information (Rouet et al., 2009).

These skills are highly relevant for education, employment and participation in social life. They are considered central prerequisites for acquiring, maintaining and further developing job-specific competencies in the workplace and through continuing education and training. In the private context, these skills are needed for a wide range of everyday activities and situations, such as searching for and processing information on the Internet, reading and understanding written material (e.g. newspapers, instruction manuals), or interpreting numbers and information presented in graphs or tables.

The PIAAC interview consists of an interviewer-administered background questionnaire and a self-administered cognitive assessment. The computer-based background questionnaire collects a wide range of information about respondents in areas such as education; training; employment; current/recent work history; reading/writing/numeracy/computer activities at work and in everyday life; migration and family background; as well as other personal information, for example voluntary work, social trust, political efficacy and health (OECD, 2009).

PIAAC-L, the longitudinal follow-up study to PIAAC in Germany, extended the analytical scope by collecting additional information about the German PIAAC respondents (Martin et al., 2018; Rammstedt et al., 2017; Zabal et al., 2016, 2017).1 It was a collaborative project on the part of three institutes that are responsible for three large-scale surveys in Germany: PIAAC Germany, the German Socio-Economic Panel (SOEP) and the German National Educational Panel Study (NEPS).2 This collaboration was the foundation for implementing links and synergies between these three national surveys. On the one hand, the SOEP household panel survey concept was adopted and adapted for PIAAC-L. Specifically, in addition to the PIAAC Germany respondents, their household members aged 18 years and older were also interviewed. On the other hand, questionnaire items and constructs from the three surveys were combined. A key element of PIAAC-L was the re-assessment of cognitive skills in Wave 2 in 2015. The cognitive assessment included literacy and numeracy instruments from PIAAC and instruments in the cognitive domains of reading and mathematics from NEPS (Starting Cohort Adults [SC6]). The PIAAC-L survey also collected a broad range of detailed biographical information, for example on formal education, continuing education and training, work/employment, skill mismatch, income, benefits, wealth, languages, health, time use, leisure activities, satisfaction, attitudes, personality, trust and locus of control. While the PIAAC-L instruments incorporated questionnaire items and assessment instruments from PIAAC Germany, NEPS and the SOEP, the linkage of PIAAC-L with NEPS and SOEP is at a merely conceptual level. The datasets from the different surveys are not linked and are not distributed jointly.

Because PIAAC-L built on PIAAC, and because PIAAC was, in a sense, the starting wave for PIAAC-L, data from both surveys are presented jointly in this paper. To distinguish between the two data sources, we use the acronym PIAAC DEU in what follows to refer to the PIAAC survey conducted in Germany in 2011/12, and the acronym PIAAC-L to refer to the longitudinal follow-up to PIAAC in Germany, which collected data in 2014, 2015 and 2016.

(2) Methods

2.1 Study design and data collection

PIAAC DEU. All features of the PIAAC study design were defined by an international consortium of several institutions from Europe and the United States. In order to ensure comparability of results across participating countries, the PIAAC Consortium developed a set of best practice standards and guidelines to which all countries were required to adhere when implementing the survey (OECD, 2014). The Consortium was responsible for the development of the survey instruments, in cooperation with international domain expert groups.

PIAAC is a computer-assisted face-to-face survey. The interview consists of two parts: a background questionnaire and a cognitive assessment. The background questionnaire, a standard question-and-answer instrument, covers a broad range of topics and collects detailed information about the respondent. Interviewers administer the background questionnaire to respondents. In the first cycle of PIAAC, this took on average 40 minutes. The second part of the interview, the cognitive assessment, is self-administered. In the first cycle of PIAAC, respondents worked on items from the domains of literacy, numeracy and/or problem solving in technology-rich environments. Designed by experienced international test developers, these items reflect everyday situations. PIAAC implemented an assessment design in which only a subset of items was administered to respondents. This assessment design reduced the burden on respondents and enabled a more in-depth and accurate assessment of their skill levels.

Respondents worked on the tasks independently and without external help. They could take as much time as needed to work on the items; interviewers monitored their progress. On average, the assessment took 60 minutes. By default, the items were completed in a computer-based mode on the interviewer’s laptop. Respondents with no or insufficient computer experience were given the option of taking the assessment in a paper-based format.

In Germany, data collection was conducted during an eight-month period between August 2011 and March 2012; 129 interviewers administered the interviews at the respondents’ homes. Respondents received an incentive of 50 euros after completion of the interview. A set of comprehensive measures was implemented to address respondents (e.g. advance mailing, information material) and optimise survey operations (e.g. at least four in-person contact attempts, targeted refusal conversion). Table 1 summarises information about the study design and other key facts about PIAAC DEU and PIAAC-L.

Table 1

PIAAC DEU and PIAAC-L: study design and key facts.


Context
  PIAAC DEU: National implementation of international large-scale assessment; cross-sectional
  PIAAC-L: National study of three national surveys (PIAAC, SOEP, NEPS); longitudinal

Aims
  PIAAC DEU: To assess adult skills (level and distribution) and to measure possible factors related to the acquisition and maintenance of these skills
  PIAAC-L: To extend PIAAC DEU by enhancing the contextual information through a wide variety of questions and the administration of different cognitive instruments

Design
  PIAAC DEU: F2F
  • BQ (CAPI)
  • Cognitive assessment: default CBA, PBA option (both self-admin.)
  PIAAC-L: F2F
  • W1: SOEP household & person questionnaires (CAPI)
  • W2:
    → BQ (CAPI)
    → Cognitive assessment: PIAAC (default CBA, PBA option; both self-admin.); NEPS (PBA; self-admin.)
  • W3:
    → SOEP household & person questionnaires (CAPI)
    → Cognitive assessment: SOEP (CBA; interviewer- and self-admin.); Number Series Study (CBA; self-admin.)

Assessment domains
  PIAAC DEU: Literacy, numeracy, problem solving in technology-rich environments
  PIAAC-L:
  • W1: not applicable
  • W2: PIAAC instruments (literacy, numeracy); NEPS instruments (reading, mathematics)
  • W3: SOEP short scales (animal naming task, symbol-digit test, multiple-choice vocabulary test); Number Series instrument (numerical reasoning)

Target persons
  PIAAC DEU: Aged 16–65 years living in private HHs
  PIAAC-L:
  • W1/3: APs and their HH members aged 18+ years
  • W2: APs and their spouses/partners in same HH

Location/language
  PIAAC DEU: Germany/German
  PIAAC-L: Germany/German

Time of data collection
  PIAAC DEU: 08/2011–03/2012
  PIAAC-L:
  • W1: 02–08/2014
  • W2: 03–09/2015
  • W3: 03–07/2016

No. of interviewers
  PIAAC DEU: 129
  PIAAC-L:
  • W1: 138
  • W2: 117
  • W3: 117

Interview duration (Ø, in minutes)
  PIAAC DEU: BQ (40), cognitive skills assessment (60)
  PIAAC-L:
  • W1: HH interview (15); person interview (45)
  • W2: Person interview (90–100)
  • W3: HH interview (10); person interview (45)

Monetary incentive (post-paid)
  PIAAC DEU: €50
  PIAAC-L:
  • W1: €25 (HH + AP interview); €10 (HH person interview)
  • W2: €40
  • W3: €30 (HH + AP interview); €20 (HH person interview)

Source: Adapted from Martin et al. (2022, pp. 175–176).

Notes: AP = anchor person; BQ = background questionnaire; CAPI = computer-assisted personal interviewing; CBA = computer-based assessment; F2F = face-to-face; HH = household; PBA = paper-based assessment; self-admin = self-administered; W = wave.

PIAAC-L. The PIAAC-L design differed in several aspects from that of PIAAC. First, PIAAC-L combined input, material and processes from three surveys, PIAAC DEU, NEPS and the SOEP. Second, PIAAC-L was a longitudinal survey that collected comprehensive data over three waves (2014, 2015, 2016). Third, the scope was extended to the household level following an SOEP household panel survey concept.3 PIAAC DEU respondents who agreed to be re-contacted for a follow-up survey (called anchor persons in PIAAC-L) and their household members aged 18 years and older were interviewed. Fourth, PIAAC-L relied mainly on existing instruments or items from PIAAC DEU, the SOEP and NEPS. The design specifics of each wave are presented separately in what follows.

In Wave 1 (2014), 138 interviewers conducted computer-assisted face-to-face interviews in the households of the anchor persons. A household questionnaire was always administered (preferably to the anchor persons), and a separate person questionnaire was administered to the anchor persons and their household members aged 18 years and older. The questionnaire content was derived from the SOEP. On average, the household questionnaire took 15 minutes to complete and the person questionnaire 45 minutes. Data collection took place between February and August 2014. After completing the household and person questionnaires, the anchor person received an incentive of 25 euros. Every other household member who completed a person questionnaire received 10 euros.

In Wave 2 (2015), 117 interviewers conducted face-to-face interviews in the households of the anchor persons. In contrast to Wave 1, in Wave 2, only cohabiting spouses/partners of the anchor persons were interviewed in addition to the anchor persons. As in PIAAC DEU, the interview consisted of two parts, a background questionnaire and a cognitive assessment. The PIAAC-L Wave 2 background questionnaire included many items from PIAAC DEU and was supplemented with items from NEPS and other surveys.

The core of the Wave 2 data collection was the re-assessment of cognitive skills. The literacy and numeracy assessment instruments from PIAAC DEU and the reading and mathematical skills assessment instruments from NEPS were used. The assessment was conducted in a self-administered mode under the same assessment conditions as the original surveys. Thus, the PIAAC assessment was computer-based with a paper-based option for respondents with no or insufficient computer experience; no time limit was imposed. By contrast, the NEPS assessment was available only in a paper-based mode and had a time limit. Anchor persons were randomly allocated to one of eight assessment conditions. Two assessment conditions consisted solely of NEPS items; two assessment conditions comprised only PIAAC DEU items and four assessment conditions combined NEPS and PIAAC DEU instruments (Zabal et al., 2017). Partners and spouses of the anchor persons were randomly allocated to two assessment conditions, both of which consisted solely of NEPS instruments. On average, the interview took between 90 and 100 minutes. Data collection took place between March and September 2015. Every respondent received 40 euros upon completion of the interview.

In Wave 3 (2016), 117 interviewers conducted computer-assisted face-to-face interviews in the households of the anchor persons. A household questionnaire based on the SOEP was always administered (preferably to the anchor persons). Separate person questionnaires were administered to the anchor persons and their household members aged 18 years and older. The person questionnaire drew largely on SOEP content, supplemented with items from other surveys and several newly developed items. To measure general intellectual ability and education- and experience-related cognitive pragmatics, each person interview was extended to include a computer-assisted cognitive test sequence consisting of three short scales previously implemented in the SOEP (Richter et al., 2017). One of these scales was interviewer-administered; the other two were self-administered. In addition, a subgroup of anchor persons completed a set of number series tasks, measuring numerical reasoning, as part of an add-on module (Number Series Study developed by the German Institute for International Educational Research; Engelhardt & Goldhammer, 2018).

On average, the household questionnaire took 10 minutes to complete and the person interview 45 minutes. Data collection took place between March and July 2016. After completing the household and person questionnaires, the anchor person received an incentive of 30 euros. Every other household member who completed a person interview received 20 euros.

2.2 Sampling and sample characteristics

PIAAC DEU. The PIAAC target population consisted of persons aged 16–65 years living in private households in Germany. Nationality, legal status and native language were not relevant for eligibility. A registry-based two-stage stratified and clustered sampling design was implemented to obtain a representative sample of the target population. In Stage 1, municipalities were selected with probability proportional to their size. In Stage 2, target persons were randomly selected from the registries of the selected municipalities. A total gross sample of 10,240 target persons was selected. The achieved net sample comprised 5,465 completed cases (mean age = 40 years, SD = 14; 51% female). According to international PIAAC computations, Germany achieved a response rate of 55% (OECD, 2013c; Zabal et al., 2014).
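
The two selection stages can be illustrated with a minimal sketch. It assumes systematic selection with probability proportional to size in Stage 1 and simple random sampling without replacement in Stage 2; the stratification used in PIAAC DEU is omitted. The municipality labels, population sizes, registry entries and function names are invented for illustration, and the procedure actually used is described in the technical reports cited above.

```python
import random


def pps_systematic(sizes, n_draws, rng):
    """Stage 1: systematic selection with probability proportional to size.

    sizes maps each municipality to its population size. A unit larger than
    the sampling interval can be selected more than once.
    """
    total = sum(sizes.values())
    step = total / n_draws                 # sampling interval
    start = rng.random() * step            # random start in [0, step)
    targets = [start + k * step for k in range(n_draws)]

    selected, cumulative, i = [], 0, 0
    for unit, size in sizes.items():
        cumulative += size
        # Select the unit once for every target that falls into its size range.
        while i < n_draws and targets[i] < cumulative:
            selected.append(unit)
            i += 1
    return selected


def sample_persons(registry, n_persons, rng):
    """Stage 2: simple random sample without replacement from a registry."""
    return rng.sample(registry, n_persons)


rng = random.Random(2011)  # fixed seed so that the sketch is reproducible

# Hypothetical municipalities and population sizes.
municipality_sizes = {"A": 50_000, "B": 120_000, "C": 8_000, "D": 300_000, "E": 22_000}
stage1 = pps_systematic(municipality_sizes, n_draws=2, rng=rng)

# Hypothetical registry of 1,000 residents for the first selected municipality.
registry = [f"{stage1[0]}-{i:05d}" for i in range(1000)]
stage2 = sample_persons(registry, n_persons=10, rng=rng)

print(stage1, stage2)
```

Because larger municipalities cover a longer stretch of the cumulated size scale, they are more likely to contain one of the equally spaced selection points.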

PIAAC-L. Because the PIAAC DEU net sample was carried over into PIAAC-L, no separate sample was drawn for PIAAC-L. However, of the 5,465 PIAAC participants in Germany, a small group of 240 cases was not available for follow-up interviews (e.g. due to lack of consent).

In Wave 1 (2014), 5,225 persons (96% of the net PIAAC sample) could be recontacted, of whom 3,758 (72%) participated as anchor persons in PIAAC-L (mean age = 43 years, SD = 14; 51% female). In addition, members (aged 18+) of the anchor persons’ households were interviewed. Sixty-five percent of the 2,371 registered spouses/partners of anchor persons (n = 1,539; mean age = 47 years, SD = 13; 50% female) and 51% of other registered household members, such as children or parents (n = 934; mean age = 37 years, SD = 17; 56% female), provided a person interview.

In Wave 2 (2015), 3,263 anchor persons (87% of the net sample from Wave 1) remained as panellists in the sample (mean age = 44 years, SD = 14; 51% female). In addition, per design, only the spouses/partners of the anchor persons were interviewed in that wave. Of 2,103 registered spouses/partners, 1,368 (65%) participated in the survey (mean age = 47 years, SD = 13; 50% female) and completed a person questionnaire and a cognitive assessment.

Finally, in Wave 3 (2016), 2,967 anchor persons (91% of the net sample from Wave 2) participated in the last round of PIAAC-L (mean age = 46 years, SD = 14; 51% female). Of 1,954 registered spouses/partners of the anchor persons, 65% were successfully interviewed (n = 1,262; mean age = 48 years, SD = 13; 51% female). Of the 1,210 other registered household members, 54% also provided an interview (n = 652; mean age = 35 years, SD = 18; 53% female).
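
The retention figures for anchor persons reported above can be reproduced from the case counts. The following sketch recomputes each stage as a percentage of the preceding stage. The cumulative figure in the last line is derived here for illustration and is not reported in the text; the variable names are arbitrary.

```python
# Case counts for anchor persons as reported in Section 2.2.
stages = [
    ("PIAAC DEU net sample (2011/12)", 5465),
    ("Available for re-contact", 5225),
    ("PIAAC-L Wave 1 (2014)", 3758),
    ("PIAAC-L Wave 2 (2015)", 3263),
    ("PIAAC-L Wave 3 (2016)", 2967),
]

# Each stage as a rounded percentage of the preceding stage.
step_rates = []
for (_, previous), (label, current) in zip(stages, stages[1:]):
    rate = round(100 * current / previous)
    step_rates.append(rate)
    print(f"{label}: {current} of {previous} = {rate}%")

# Share of the original net sample still participating in Wave 3.
cumulative_rate = round(100 * stages[-1][1] / stages[0][1])
print(f"Wave 3 as a share of the PIAAC DEU net sample: {cumulative_rate}%")
```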

2.3 Survey instruments

Section 2.1, and especially Table 1 therein, shows that several features shaped the interview situation in PIAAC DEU and PIAAC-L. First, in PIAAC DEU and Wave 2 of PIAAC-L, interviewers started the interview by administering a comprehensive background questionnaire, followed by the assessment of cognitive skills. Second, PIAAC-L combined instruments from three large-scale surveys. Third, PIAAC DEU and PIAAC-L collected a multitude of in-depth information over four data collection periods. Thus, the constructs and survey instruments were more complex and more numerous than in many other surveys. A detailed presentation of survey instruments and constructs is beyond the scope of this paper. However, Table 2 provides a brief overview of the central constructs and the core instruments. For more in-depth information and descriptions of the interviewer-administered person and household questionnaire constructs and instruments, we refer the reader to the technical documentation (see also Section 3.2) and related (inter)national reports (e.g. DIW Berlin/SOEP, 2014a, 2014b, 2014c, 2015a, 2015b, 2015c; Martin et al., 2018; OECD, 2009, 2013c; Zabal et al., 2014; Zabal et al., 2016, 2017).

Table 2

Key constructs measured in PIAAC DEU and PIAAC-L.


PIAAC DEU PIAAC-L WAVE 1 PIAAC-L WAVE 2 PIAAC-L WAVE 3

CONTENT OF PERSON QUESTIONNAIRES

General Information

    • Country of birth/citizenship/immigration x x x

    • Household size x

    • Number of books at home x x

    • Household possessions x

    • Living and household situation x x

    • Life events x x

    • Childhood x x

    • Friends x

Family

    • Children x x x Updates

    • Marital/civil status x

    • Relationships x x

    • Activities with spouse/partner x

    • Siblings x

    • Parental information x x

Biographical calendar (15–65 years) x

Education

    • Formal – highest school qualification x x x x

    • Formal – highest professional qualification x x x x

    • Continuing education and training x x (x) x

Work

    • Employment status x x x Updates

    • Occupation & industry x x x

    • Information on current job x x x

    • Information on last job x x x

    • Job search x x x

    • Job changes x

    • Years paid work x

    • Information about employer x

    • Contract, working hours x

Income detailed Updates

    • Earnings x x x

    • Bonuses, benefits, income from various sources x

    • Household income x

Time use/leisure activities x x x

Health

    • General (basic information) x x

    • Detailed (e.g. the 12-item Short-Form Health Survey, SF-12) x x

    • Behaviour x x

    • Doctor visits x x

Attitudes, well-being, personality

    • Learning strategies x

    • Political efficacy x

    • Social trust x

    • Cultural engagement x

    • Life satisfaction/satisfaction with life domains x x x

    • Big Five x x

    • Locus of control x x

    • Risk propensity x

    • Trust x

    • Grit x

    • Political inclination, voting x

Languages and cultural identity

    • Mother tongue x x

    • Foreign languages x

    • Attachment to country of origin x x

    • Identification as German x x

    • For migrants: identification with country of origin x

Skill mismatch x x

Self-assessed skills x x

Skill use (e.g. literacy, numeracy, computer) x x

Cognitive skills

Literacy x x

Numeracy x x

Problem solving in technology-rich environments x

Short scales to assess general intellectual ability x

Number series tasks x

CONTENT OF HOUSEHOLD QUESTIONNAIRES

Residential/living conditions

    • Type of dwelling, size, number of rooms x x

    • Amenities and facilities x x

Living situation, conditions and costs

    • Neighbourhood characteristics and infrastructure x

    • Ownership/tenancy/rental incl. costs x x

Household income and wealth

    • Household income and sources x x

    • Social benefits/state assistance x x

    • Savings x

Household members

    • Children (age, school attendance, activities) x x

    • Other (e.g. persons in need of help/care) x

Source: Adapted from Martin et al. (2022, pp. 181–186).

The core element of PIAAC DEU and of Wave 2 of PIAAC-L was the measurement of cognitive skills. The first cycle of PIAAC measured the domains literacy, numeracy and problem solving in technology-rich environments (for a brief definition of each domain, see Section 1). For the assessment of these skills, respondents worked independently on everyday tasks from the aforementioned domains, either on a laptop or on paper. The items were presented in different formats. For example, the literacy items were presented as continuous texts (e.g. reports, emails), non-continuous texts (e.g. tables, forms), or mixed texts (e.g. newspaper articles with graphics). In the case of numeracy, mathematical information was presented in various ways (e.g. in graphs or diagrams, in texts, as symbols, with numbers). Different cognitive strategies were required to solve the items, and the amount of written text or mathematical information to be processed varied across items. The items originated from different contexts (e.g. work-related, personal) and covered different topics (e.g. health, finances, leisure activities). Construct definitions and the derivation of operational dimensions for item development are provided in related framework papers and international reports (Gal et al., 2009; Jones et al., 2009; OECD, 2013b; Rouet et al., 2009). Assessment items are not accessible to the public because items that have been disclosed can no longer be used in future assessments. However, examples and short descriptions of specific items are provided by the OECD (2013b).

Wave 2 of PIAAC-L also included an assessment of reading and mathematical skills using instruments from NEPS (Starting Cohort Adults [SC6]). The NEPS domains measure similar cognitive domains to PIAAC literacy and numeracy, and the PIAAC-L data allow researchers to explore links between these instruments. Detailed information on and a description of the conceptual frameworks of the NEPS cognitive competence domains can be found in Ehmke et al. (2009) and Gehrer et al. (2013).

2.4 Quality control and quality assurance

PIAAC DEU. The aim of PIAAC is to obtain high-quality data at both the national and international levels. To allow reliable comparisons across participating countries, the PIAAC Consortium established a set of best practice standards for all phases of the survey life cycle (OECD, 2014). These standards and guidelines covered areas such as sampling design; survey instruments, including translation and adaptation; field management; interviewer selection and training; and monitoring and control of interviewers’ work. Each participating country was required to comply with these regulations. Countries received training in several areas of survey implementation. The Consortium closely monitored the implementation of process steps and measures taken in the countries.

Recognised experts created the frameworks that formed the basis for developing the background questionnaire items and cognitive items. Subject- and/or domain-specific expert groups and item writers developed background questionnaire items and cognitive items based on these frameworks. Participating countries translated the final master source instruments (in English) into their national language(s) following the international PIAAC translation and adaptation guidelines. Where applicable, the content of the background questionnaire was adapted to reflect national differences (e.g. measurement of education). The entire instrument – the background questionnaire and the cognitive assessment with its scoring – was intensively tested to detect and remove any errors.

Between March and June 2010, a field test was conducted as a “dress rehearsal” for the main study. In the field test, the instruments were tested; the final selection of items for the main survey was based on in-depth analyses of the field test data. In addition, all processes of the PIAAC survey were tested and subsequently reviewed and corrected or improved as required.

In preparation for the main study, interviewers received intensive five-day face-to-face training to familiarise them with their tasks in PIAAC (e.g. gaining respondent cooperation, standardised administration of the questionnaire, understanding their role during the assessment). During data collection, supervisors or other home office staff of the survey institute thoroughly monitored interviewers and validated their work (e.g. through verification of completed interviews and audio recordings).

The distribution of the sample was closely monitored throughout the entire data collection period to detect potential bias at an early stage and to take measures to counteract any shortfalls. After data collection, (item) nonresponse analyses were conducted. There was no significant item nonresponse in the PIAAC DEU data. However, a low bias was found for age, citizenship and educational attainment due to unit nonresponse (Helmschrott & Martin, 2014; Zabal et al., 2014). To correct for this low bias, data users are thus advised to use the weighting factor included in the dataset for their analyses.

A committee of several stakeholders (e.g. the OECD, the PIAAC Consortium) evaluated each PIAAC country’s data in a data adjudication process to assess their fitness for use and approval for publication by the OECD. Detailed information on the implementation of PIAAC from an international perspective is provided in the international technical report by the OECD (2013c); the German technical report (Zabal et al., 2014) focuses on the implementation of PIAAC in Germany.

PIAAC-L. PIAAC-L benefited from the expertise of and adopted measures and processes from PIAAC, NEPS and the SOEP that follow(ed) best practices in survey implementation and have proved to be of high quality. For example, PIAAC-L continued to employ many of the quality control and monitoring activities used in PIAAC DEU. Questionnaire items and assessment instruments used in PIAAC-L across the three waves originated primarily from PIAAC DEU, NEPS and the SOEP. Thus, the reliability and measurement quality of the items had already been proven in the source surveys. New items were evaluated as needed in cognitive pretests.

As the same survey institute was commissioned to conduct PIAAC DEU and PIAAC-L, undertaking panel maintenance activities (e.g. sending a Christmas card) and planning interviewer assignments in the transition from PIAAC DEU to PIAAC-L was straightforward. If possible, the survey institute sent the same interviewer to the same anchor person’s household in each data collection period. An established relationship of trust between interviewer and respondent helped to further stabilise the willingness to participate. Interviewers’ work was closely monitored and completed interviews validated.

Despite all efforts, every panel survey is affected by attrition. As shown in Section 2.2, the number of participating anchor persons and other household members declined over time. This attrition was selective: for example, the risk of refusal in PIAAC-L was higher among anchor persons with low literacy skills than among those with higher literacy skills (Martin et al., 2021). In the anchor person sample, individuals younger than 25 years were slightly overrepresented, and low-educated individuals were underrepresented. Weights were computed to address selectivity from attrition, but only for anchor persons, because only they were randomly selected to participate in PIAAC DEU. For other household members, eligibility for participation in PIAAC-L depended solely on the anchor person’s participation in PIAAC-L. Thus, selection probabilities, which are the basis for weighting, could be calculated only for anchor persons. For each PIAAC-L wave, nonresponse and post-stratification weights were computed. Technical reports summarise the weighting activities in each wave (Bartsch et al., 2017; Burkhardt & Bartsch, 2017; Burkhardt et al., 2018).
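
The general idea of post-stratification can be illustrated with a minimal sketch. It is a generic illustration and not the procedure documented in the technical reports: within each cell, the (nonresponse-adjusted) weights are rescaled so that their sum equals a known population total. The cells, weights, population totals and function name are invented.

```python
from collections import defaultdict


def poststratify(cells, weights, population_totals):
    """Rescale weights so that they sum to the population total in each cell."""
    weighted_sums = defaultdict(float)
    for cell, weight in zip(cells, weights):
        weighted_sums[cell] += weight
    return [
        weight * population_totals[cell] / weighted_sums[cell]
        for cell, weight in zip(cells, weights)
    ]


# Hypothetical respondents: post-stratification cell and nonresponse-adjusted weight.
cells = ["16-34", "16-34", "35-65", "35-65", "35-65"]
nonresponse_weights = [100.0, 150.0, 200.0, 50.0, 250.0]

# Hypothetical known population totals per cell.
population_totals = {"16-34": 300.0, "35-65": 1000.0}

final_weights = poststratify(cells, nonresponse_weights, population_totals)
print(final_weights)
```

In this toy example, the weights in the first cell are multiplied by 300/250 and those in the second cell by 1000/500, so that underrepresented groups receive proportionally more weight.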

2.5 Data anonymisation and ethical issues

When the PIAAC DEU and PIAAC-L surveys were launched, approval by an ethics committee or review board was not required and not common practice in Germany. However, the project groups explicitly adhered to research methods that strictly followed professional ethical guidelines and good scientific practice. The commissioned survey institute was a member of the European Society for Opinion and Marketing Research (ESOMAR) and fully complied with the ESOMAR standards. Survey institute staff and interviewers were trained to follow these professional principles. Participation in the survey was voluntary, personal rights were respected, and informed consent was obtained from target persons before starting the interview. Individual data were processed according to the applicable data protection regulations. Special care was taken to ensure confidentiality and to protect against possible re-identification of participants. The data are available for scientific use only, and data users must sign a legally binding data use agreement to obtain them.

2.6 Existing use of data

The PIAAC Research Data Center at GESIS maintains a bibliography based on worldwide use of PIAAC data (Maehler & Konradt, 2022). Key publications that provide further insights into the PIAAC DEU and PIAAC-L data include, for example:

(3) Dataset description and access

The scientific use files (SUFs) for PIAAC DEU and PIAAC-L are available to researchers as a package through the GESIS Data Archive for the Social Sciences (PIAAC DEU: https://search.gesis.org/research_data/ZA5845; PIAAC-L: https://search.gesis.org/research_data/ZA5989) and the PIAAC Research Data Center (www.gesis.org/en/piaac/rdc).

PIAAC DEU. The German PIAAC SUF includes 1,407 variables for 5,465 respondents from a random sample of the population aged 16–65 years residing in private households in Germany. Given the assessment design, each respondent worked only on a subset of items. Thus, the dataset does not include point estimates for cognitive ability. Instead, it comprises 10 proficiency scores (called plausible values) imputed from an item response theory (IRT) model for each skill domain as a measure of proficiency (OECD, 2013c; von Davier et al., 2009). Researchers can also estimate plausible values independently using the open-access R package, PVPIAACL, developed by LIfBi (https://github.com/jcgaasch/PVPIAACL). However, using this tool requires a high level of psychometric and methodological knowledge.
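
Analyses with plausible values are typically run once per plausible value and then combined. The following minimal sketch assumes the standard combination rules for multiply imputed data: the statistic is averaged over the M plausible values, and the variance between the M estimates, inflated by 1 + 1/M, reflects the imputation uncertainty. The function names and the randomly generated toy data are hypothetical and do not correspond to variables in the SUF. The sampling variance, which must be estimated separately and added to obtain the total variance, is not covered in this sketch.

```python
import random


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def combine_plausible_values(pv_columns, weights):
    """Combine a weighted mean across plausible values.

    pv_columns holds one list of scores per plausible value. Returns the
    averaged estimate and the imputation variance (1 + 1/M) * B, where B is
    the variance between the M single-value estimates.
    """
    m = len(pv_columns)
    estimates = [weighted_mean(column, weights) for column in pv_columns]
    point_estimate = sum(estimates) / m
    between = sum((e - point_estimate) ** 2 for e in estimates) / (m - 1)
    return point_estimate, (1 + 1 / m) * between


# Toy data: 50 hypothetical respondents with 10 plausible values each.
# The score distribution and the weights are arbitrary.
rng = random.Random(1)
n_respondents, n_pvs = 50, 10
weights = [rng.uniform(0.5, 2.0) for _ in range(n_respondents)]
pv_columns = [
    [rng.gauss(270, 45) for _ in range(n_respondents)] for _ in range(n_pvs)
]

estimate, imputation_variance = combine_plausible_values(pv_columns, weights)
print(round(estimate, 1), round(imputation_variance, 2))
```

Averaging the plausible values within each respondent before the analysis is not equivalent to this procedure, because it would understate the uncertainty in the proficiency estimates.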

Due to the complex sampling design in PIAAC (stratification, clustering), a replication approach is recommended in order to correctly estimate the variance of a population statistic. Replication methods divide the full sample into several subsamples (replicates) that mirror the design of the full sample. The jackknife approach, one of several replication methods, was used for the PIAAC DEU data; the full sample was divided into 80 subsamples, and a separate set of weights was computed for each subsample. To estimate the variance of a full-sample statistic, the statistic is re-estimated in each replicate sample, and the squared deviations of the replicate estimates from the full-sample estimate are summed. For more information on replication approaches, see OECD (2013c).
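
The replicate-based variance computation described above can be sketched as follows. The example uses a toy sample of four units and delete-one jackknife replicate weights, for which the sum of squared deviations is multiplied by (G - 1)/G, where G is the number of replicates. This factor is an assumption made for the illustration, because it depends on the jackknife variant; the factor that applies to the PIAAC DEU data, with G = 80, should be taken from the documentation (OECD, 2013c). All names and values are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_variance(values, full_weights, replicate_weights, factor):
    """Variance of a weighted mean from replicate weights.

    The statistic is recomputed with each set of replicate weights; the
    squared deviations from the full-sample estimate are summed and
    multiplied by a factor that depends on the replication method.
    """
    full_estimate = weighted_mean(values, full_weights)
    squared_deviations = [
        (weighted_mean(values, weights) - full_estimate) ** 2
        for weights in replicate_weights
    ]
    return full_estimate, factor * sum(squared_deviations)


# Toy sample: four units, each forming its own replicate group.
values = [1.0, 2.0, 3.0, 4.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
G = len(values)

# Delete-one jackknife: replicate g drops unit g and reweights the others.
replicate_weights = [
    [0.0 if i == g else G / (G - 1) for i in range(len(values))]
    for g in range(G)
]

estimate, variance = replicate_variance(
    values, full_weights, replicate_weights, factor=(G - 1) / G
)
print(estimate, variance)
```

For this equally weighted toy sample, the result coincides with the textbook variance of a sample mean (the sample variance divided by the number of units), which offers a simple check of the computation.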

PIAAC-L. The PIAAC-L SUF comprises 12 separate datasets with 3,935 variables for the three waves of data collection (see Table 3). Data are available for anchor persons (2014: 3,758 cases; 2015: 3,263 cases; 2016: 2,967 cases) as well as for their adult household members who agreed to participate in PIAAC-L (2014: 2,473 cases; 2015: 1,368 cases; 2016: 1,914 cases); in addition, data were collected at the household level. Ten datasets are in a wide format; their file names were chosen to refer to the content and the year of data collection (e.g. file Household_14 relates to data collected with the household questionnaire in Wave 1 in 2014). Units in these files are either households or persons. The weighting files and the NumberSeries_16 file refer only to data from anchor persons.

Table 3

File information on the PIAAC DEU (ZA5845) and PIAAC-L (ZA5989) scientific use files.


| DATA SOURCE | DATA FILE | UNITS | n | QUESTIONNAIRE | COGNITIVE ASSESSMENT | # VARIABLES |
| --- | --- | --- | --- | --- | --- | --- |
| PIAAC DEU | ZA5845_v2-2-0 | PIAAC Rs 2012 | 5,465 | PIAAC BQ | PVs PIAAC literacy, numeracy, PS-TRE (scaled) | 1,407 |
| PIAAC-L | ZA5989_ | | | | | |
| | Household_14 | HHs 2014 | 3,737 | HH 2014 | | 353 |
| | Persons_14 | Rs 2014 (APs, HH members 18+) | 6,231 | PS 2014 | PVs PIAAC literacy, numeracy, PS-TRE (assessed in PIAAC, re-scaled) | 882 |
| | Weights_14 | APs 2014 | 3,758 | | | 8 |
| | Persons_15 | Rs 2015 (APs, partners in HH) | 4,631 | PS 2015 | (1) PVs PIAAC literacy, numeracy (assessed in PIAAC, re-scaled); (2) PVs PIAAC literacy, numeracy (assessed in PIAAC-L 2015); (3) WLEs PIAAC literacy, numeracy (assessed in PIAAC-L 2015); (4) WLEs NEPS reading, mathematics (assessed in PIAAC-L 2015) | 1,054 |
| | Weights_15 | APs 2015 | 3,263 | | | 8 |
| | Household_16 | HHs 2016 | 2,946 | HH 2016 | | 227 |
| | Persons_16 | Rs 2016 (APs, HH members 18+) | 4,881 | PS 2016 | | 635 |
| | Cognit_16 | Rs 2016 (APs, HH members 18+) | 4,818 | | Data from three short tests of cognitive ability | 589 |
| | NumberSeries_16 | Selected APs 2016 | 910 | | Data from add-on module (Number Series Study) | 104 |
| | Weights_16 | APs 2016 | 2,967 | | | 8 |
| | Calendar | Rs 2014 and/or 2016 (APs and HH members 18+) | 31,361 | PS 2014, 2016: biographical calendar | | 14 |
| | Registry | All persons ever registered in PIAAC-L | 10,343 | | | 53 |

Source: Adapted from Martin et al. (2022, p. 198).

Notes: AP = anchor person; BQ = background questionnaire; HH = household; PS = person; PS-TRE = problem solving in technology-rich environments; PVs = plausible values; WLEs = weighted maximum likelihood estimates; Rs = respondents; 18+ = aged 18 years and older.

The files Calendar and Registry are in a long format and incremental. The Calendar file has 31,361 observations (spells) from biographical calendars from each respondent who participated in 2014 and/or 2016. It contains information on each respondent’s activity status (e.g. at school, in vocational training, employed, retired) over the life course (aged 15–65 years). Each row represents one activity per individual. The Registry file includes data from all three waves and has 10,343 data entries. Each row in the file represents precisely one person nested in an anchor-person household. The Registry file provides socio-demographic information and several status variables (e.g. relationship to the anchor person, years of participation). It gives a comprehensive overview of respondents and other household members (e.g. non-eligible household members such as children younger than 18 years) across all PIAAC-L waves. The file includes all identification variables (ID variables) available in PIAAC-L and PIAAC DEU and is the primary source for merging data from different files.

Each anchor person’s household has a unique and permanently valid ID variable (hnrid). Users should use this ID variable for merging records from household-based data files. Every registered person within a household has a permanent and unique ID variable (pnrfestid). This ID variable combines the variable hnrid with a two-digit serial number. An anchor person always has the serial number 01. The ID variable pnrfestid should be used for merging records from two or more person-based PIAAC-L data files. To merge PIAAC-L anchor person data with the PIAAC DEU data, the PIAAC ID variable, seqid, must be used.
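The merging logic can be sketched in a few lines of Python. The records below are invented, and the sketch assumes that pnrfestid is stored as a number formed by appending the two-digit serial number to hnrid; the actual storage format should be checked in the codebooks. The helper names split_person_id and left_join are hypothetical, and in practice a statistical package or data-frame library would perform the join.

```python
def split_person_id(pnrfestid):
    """Split a person ID into (household ID, two-digit serial number).

    Assumption: pnrfestid is numeric and equals hnrid * 100 + serial.
    """
    household_id, serial = divmod(pnrfestid, 100)
    return household_id, serial


def left_join(left_rows, right_rows, key):
    """Keep every row of `left_rows`; add matching fields from `right_rows`."""
    lookup = {row[key]: row for row in right_rows}
    return [{**lookup.get(row[key], {}), **row} for row in left_rows]


# Invented example records; variable names other than the IDs are hypothetical.
persons_14 = [
    {"pnrfestid": 100101, "score_14": 270},
    {"pnrfestid": 100102, "score_14": 255},
    {"pnrfestid": 100201, "score_14": 290},
]
persons_16 = [
    {"pnrfestid": 100101, "score_16": 268},
    {"pnrfestid": 100201, "score_16": 294},
]
household_14 = [
    {"hnrid": 1001, "hh_size_14": 2},
    {"hnrid": 1002, "hh_size_14": 1},
]

# Person-level files are linked via pnrfestid.
person_panel = left_join(persons_14, persons_16, key="pnrfestid")

# Household-level information is linked via hnrid (derived here for illustration).
with_hnrid = [
    {**row, "hnrid": split_person_id(row["pnrfestid"])[0]}
    for row in person_panel
]
person_household = left_join(with_hnrid, household_14, key="hnrid")

# Anchor persons always carry the serial number 01.
anchors = [
    row for row in person_household
    if split_person_id(row["pnrfestid"])[1] == 1
]
print(anchors)
```

Linking anchor persons to the PIAAC DEU file would follow the same pattern, with seqid as the key.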

Cognitive assessment data are available in the files Persons_14, Persons_15, Cognit_16 and NumberSeries_16. For anchor persons, the file Persons_14 includes 10 plausible values for each cognitive domain assessed in PIAAC DEU, which were re-scaled using background data from PIAAC DEU and Wave 1 of PIAAC-L. The file Persons_15 contains plausible values for the domains literacy and numeracy measured with PIAAC instruments and weighted maximum likelihood estimates for reading and mathematics measured with NEPS instruments (for more information, see GESIS – Leibniz Institute for the Social Sciences et al., 2017). The file Cognit_16 includes data from the short scales assessing cognitive abilities. The file NumberSeries_16 contains data from the add-on module, the Number Series Study.

The computation of replicate weights for variance estimation introduced in PIAAC DEU was not continued in PIAAC-L. Instead, data users are encouraged to use variables on sampling and stratification in a Taylor series linearization approach. The document Notes to the User provides some Stata code as illustration (GESIS – Leibniz Institute for the Social Sciences et al., 2017, pp. 10–11). If users want to perform their analyses with weighted data, the correct selection of weighting factors depends on which data from which data collection years are combined. Explanations on how to combine different weighting factors for various analysis purposes can be found in the aforementioned Notes to the User (GESIS – Leibniz Institute for the Social Sciences et al., 2017) and in the three technical reports on weighting (Bartsch et al., 2017; Burkhardt & Bartsch, 2017; Burkhardt et al., 2018).
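For readers unfamiliar with Taylor series linearization, the following Python sketch shows the idea for a weighted mean under a stratified, clustered design. It is an illustration under simplifying assumptions (sampling units treated as drawn with replacement, no finite population correction), not the code from the Notes to the User; the function name and the argument names stratum and psu are hypothetical placeholders for the sampling and stratification variables in the weighting files.

```python
from collections import defaultdict


def linearized_variance_of_mean(y, w, stratum, psu):
    """Return (weighted mean, linearized variance of that mean).

    Each observation contributes a linearized score
    w_i * (y_i - mean) / sum(w); scores are summed within primary
    sampling units (PSUs) and their between-PSU variation is
    accumulated stratum by stratum.
    """
    total_weight = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_weight

    # Sum the linearized scores per PSU within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, stratum, psu):
        psu_totals[h][j] += wi * (yi - mean) / total_weight

    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum(
            (z - z_bar) ** 2 for z in totals.values()
        )
    return mean, variance


# Toy data: one stratum, every respondent is their own PSU, equal weights.
y = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
stratum = [1, 1, 1, 1]
psu = [1, 2, 3, 4]

mean, variance = linearized_variance_of_mean(y, w, stratum, psu)
print("mean:", mean, "standard error:", variance ** 0.5)
```

In this degenerate toy case, the result reduces to the familiar sample variance divided by the sample size, which offers a quick plausibility check.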

When estimating cognitive ability (e.g. literacy skills) with plausible values, the estimate must be computed using all 10 plausible values for the domain in question to avoid underestimating the standard error. Technically, each analysis (e.g. a linear regression) must be repeated for each plausible value separately. Applying Rubin’s rules (Rubin, 1987), the final regression estimate is the average of the 10 individual parameter estimates, and its standard error combines the average sampling variance with the variance of the estimates across the plausible values. Methodological advice (e.g. through online tutorials or workshops), tools, scripts and information on how to conduct analyses with plausible values are available from the PIAAC Research Data Center at GESIS (www.gesis.org/en/piaac/rdc), the OECD website (www.oecd.org/skills/piaac/data/) and, for example, a recently published textbook by Maehler and Rammstedt (2020).
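The pooling step can be sketched as follows. The function name pool_plausible_values and all numbers are invented for illustration; the sampling variances passed to it would, in a real analysis, come from a design-based method such as replication or linearization.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results following Rubin (1987).

    estimates          -- one parameter estimate per plausible value
    sampling_variances -- the squared standard error of each estimate
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total ** 0.5


# Invented regression coefficients, one per plausible value, and their
# (equally invented) squared standard errors.
coefficients = [0.52, 0.49, 0.55, 0.50, 0.53, 0.51, 0.48, 0.54, 0.50, 0.52]
squared_ses = [0.0016] * 10

estimate, standard_error = pool_plausible_values(coefficients, squared_ses)
print("pooled estimate:", estimate)
print("pooled standard error:", standard_error)
```

Using only one plausible value, or averaging the plausible values before the analysis, would drop the between-imputation component and therefore understate the standard error.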

3.1 Repository location

GESIS Data Archive for the Social Sciences, Cologne, Germany.

3.2 Object/file name

PIAAC DEU. Rammstedt, B., Martin, S., Zabal, A., Konradt, I., Maehler, D. B., Perry, A., Massing, N., Ackermann-Piek, D., & Helmschrott, S. (2016). Programme for the International Assessment of Adult Competencies (PIAAC), Germany – Reduced version (ZA5845; Version 2.2.0). GESIS Data Archive, Cologne. DOI: https://doi.org/10.4232/1.12660

Package comprises:

  • Data: ZA5845_v2-2-0.sav, ZA5845_v2-2-0.dta, ZA5845_noMissings_v2-2-0.dta (data in Stata format without missing definitions)
  • Questionnaire: ZA5845_fb.pdf
  • Codebook: ZA5845_cod.xlsx
  • User guide: ZA5845_Userguide.pdf
  • Technical report: ZA5845_technical_report.pdf (Zabal et al., 2014)
  • List of all missing values for Stata: ZA5845_missings.txt
  • List of changed variables in Version 2.2.0: ZA5845_changes_in_v2-2-0.pdf
  • Overview of files: Content.doc
  • Overview of codes (folder): ISCO-08.pdf, ISIC_Rev4.pdf, Language-Codes.txt

PIAAC-L. GESIS – Leibniz Institute for the Social Sciences, German Socio-Economic Panel (SOEP) at DIW Berlin, & LIfBi – Leibniz Institute for Educational Trajectories. (2017). PIAAC-Longitudinal (PIAAC-L), Germany (ZA5989; Version 3.0.0). GESIS Data Archive, Cologne. DOI: https://doi.org/10.4232/1.12925

Package comprises:

  • Data (.sav, .dta):
    – Across waves: ZA5989_Registry_v3-0-0, ZA5989_Calendar_v3-0-0
    – 2014: ZA5989_Persons_14_v3-0-0, ZA5989_Household_14_v3-0-0, ZA5989_Weights_14_v3-0-0
    – 2015: ZA5989_Persons_15_v3-0-0, ZA5989_Weights_15_v3-0-0
    – 2016: ZA5989_Persons_16_v3-0-0, ZA5989_Household_16_v3-0-0, ZA5989_Weights_16_v3-0-0, ZA5989_Cognit_16_v3-0-0, ZA5989_NumberSeries_16_v3-0-0
  • Questionnaires (.pdf):
    – 2014: ZA5989_fb_Persons_14, ZA5989_fb_Household_14
    – 2015: ZA5989_fb_Persons_15
    – 2016: ZA5989_fb_Persons_16, ZA5989_fb_Household_16, ZA5989_fb_Cognit_16
  • Codebooks (.pdf, .xlsx):
    – Across waves: ZA5989_cod_Registry, ZA5989_cod_Calendar
    – 2014: ZA5989_cod_Persons_14, ZA5989_cod_Household_14, ZA5989_cod_Weights_14
    – 2015: ZA5989_cod_Persons_15, ZA5989_cod_Weights_15
    – 2016: ZA5989_cod_Persons_16, ZA5989_cod_Household_16, ZA5989_cod_Weights_16, ZA5989_cod_Cognit_16, ZA5989_cod_NumberSeries_16
  • Technical and weighting reports, reports from survey institute (.pdf):
  • Notes to the user: ZA5989_NotesToTheUser.pdf
  • Information on variables in background model (.xlsx): ZA5989_PIAAC_L_Variables_PVs_background_model_12_15, ZA5989_PIAAC_L_Variables_PVs_background_model_14

3.3 Data type

PIAAC DEU:

  • Primary data (from interview: questionnaire/assessment) and processed data (e.g. derived variables, imputed plausible values, weighting factors)
  • Questionnaire, codebook, user guide, technical report, other information (e.g. list of updated variables, codes, overviews)

PIAAC-L:

  • Primary data (from interview: questionnaire/assessment) and processed data (e.g. derived variables, imputed plausible values, weighting factors)
  • Questionnaires, codebooks, notes to the user, technical reports, weighting reports, information on variables in background model

3.4 Format names and versions

PIAAC DEU:

  • Data: .sav, .dta (version 12)
  • Codebook: .xlsx
  • Other documents: .pdf, .doc, .txt

PIAAC-L:

  • Data: .sav, .dta (version 12)
  • Codebooks: .pdf, .xlsx
  • Reports: .pdf
  • Other documents: .pdf, .xlsx

Upon request, the PIAAC Research Data Center provides the PIAAC DEU and PIAAC-L data also in other common formats (e.g. csv).

3.5 Language

PIAAC DEU/PIAAC-L:

  • Questionnaire(s), report from survey institute (only PIAAC-L): German
  • Variable labels, other materials: English

3.6 License and limitations of sharing data

The PIAAC DEU and PIAAC-L data contain sensitive information. In compliance with data protection regulations (European Union General Data Protection Regulation [EU-GDPR]), the datasets are available as scientific use files for scientific research only. Data can be obtained and processed by researchers after signing a data use agreement (contact via the PIAAC Research Data Center). All other documents (e.g. questionnaires, codebooks, reports) are accessible without restrictions from the PIAAC Research Data Center website (www.gesis.org/en/piaac/rdc).

3.7 Publication date

PIAAC DEU: first release in 2014; latest updated release in 2016.

3.8 FAIR data

The PIAAC DEU and PIAAC-L scientific use file data conform to the FAIR Principles.

  • Findable. PIAAC DEU: https://doi.org/10.4232/1.12660; PIAAC-L: https://doi.org/10.4232/1.12925
  • Accessible. PIAAC DEU/PIAAC-L: Both datasets are accessible through the GESIS Data Archive for the Social Sciences.
  • Interoperable. Datasets and documentation files are available in standard formats (SPSS, Stata, Word, Excel, PDF).
  • Reusable. For both datasets, several documents (questionnaire documentation, codebooks, technical reports) are available through the PIAAC Research Data Center.

(4) Reuse potential

The PIAAC DEU and the PIAAC-L data contain information that allows for examining the acquisition, maintenance and consequences of adult cognitive skills. In addition to cognitive skills, a wide range of context information about respondents was collected, such as biographical information on education and employment history, further education and training, family-related issues, health, leisure activities, attitudes, well-being, personality and identity (see Table 2).

Thus, the data include comprehensive information for investigating questions related to demographic ageing and the challenges it poses to individuals and societies. Psychological research on ageing has focused on its effects on cognitive skills across the life span (e.g. Hanoch et al., 2021) and has addressed various factors related to the maintenance or decline of cognitive skills, health and productivity in later life. The PIAAC DEU/PIAAC-L data provide an opportunity to explore, for example, determinants of change and stability in cognitive skills in adulthood, such as lifestyle, job complexity, health, or aspects of socialization. With their large sample sizes, the datasets allow researchers to study different sub-populations, for example, individuals with low skill levels, couples, or persons with varying conditions of employment. Because different assessment instruments were administered, links between these instruments can also be investigated.

To date, only a few researchers in the psychological field have used these data to address research questions. Most research comes from other social sciences (for an overview, see Maehler & Konradt, 2022). Previous research with a psychological focus based on the present data has, for example, investigated the relationship between cognitive skills and personality (e.g. Rammstedt et al., 2016). Recent research has examined the stability of and changes in cognitive skills in adulthood (e.g. Lechner et al., 2021).

Although the PIAAC DEU and PIAAC-L data offer many analytical possibilities, their complex design and structure, with multiple data files and different survey units, can make it challenging to use them correctly. Longitudinal analyses and the handling of multiply imputed plausible values require a high level of analytical skill. Participation in data analysis workshops and access to the methodological advice and tutorials that are publicly available on the PIAAC Research Data Center website provide knowledge and support for such analyses.