(1) Overview

Collection Date(s)

Data were collected between August 18, 2010 and May 20, 2013.


The SAPA Project is a collaborative online data collection tool for assessing psychological constructs across multiple domains of personality. These domains (temperament, cognitive abilities, and interests) were chosen based on their historical and current prominence in the field of individual differences research. The primary goal of the SAPA Project is to determine the combined and independent structures of each of these domains based on the collection of large, cross-sectional, online samples. Secondary goals include (1) the identification of additional domains (e.g., motivation, character) that may also provide insight into the ways that individuals differ, and (2) an improved understanding of the demographic and psychographic correlates of individual differences in personality.

The data described here were collected to evaluate the item characteristics, reliability, and structural properties of ICAR measures assessing 4 distinct components of cognitive ability. The data were also used to validate the online-administered ICAR items against self-reported achievement test scores and university majors; see Condon & Revelle [1]. The broader goal of these analyses was to demonstrate the utility and potential of public-domain cognitive ability measures that can be administered via the internet without proctoring.

It should be noted that additional ICAR item types have been developed since these data were collected and further development remains ongoing. More information about all of the ICAR measures, including the item content, can be found at the ICAR website (icar-project.com).

(2) Methods


Participants (N = 96,958) completed the online survey in exchange for feedback about various aspects of their personality and cognitive abilities. No active advertisements or marketing efforts were used to attract participants for this data collection; web traffic statistics (collected through Google Analytics) suggest that participants who did not come to the website directly were directed to it through links from various other websites about personality, personality research, general psychology topics, and psychometrics. Many of these websites were academic/educational in nature.

Many demographic and psychographic variables are included in the data. These include: gender (66.2% of the participants were female); age (see Figure 1); marital status (see Table 1); body mass index (see Figure 2); country (198 countries were represented in total; 78.1% of participants were from the United States, and 34 countries had more than 100 participants); state/region (for 32 countries); educational attainment level (see Table 2); employment status (see Table 3); self-reported achievement test scores (SAT – Critical Reading, SAT – Mathematics, and ACT Composite); parental educational attainment level (for 1 or 2 parents); and parental field of employment (for 1 or 2 parents). Participants were not required to provide any of these data except age and gender.

Figure 1 

Participants by age and gender (males in blue, females in red).

Marital Status            Participants
Never Married                   71,227
Married                         16,597
Domestic Partnership             6,070
Divorced & Single                2,470
Divorced & Remarried               497
Widowed & Single                    97

Table 1

Marital Status.

Figure 2 

Participants by Body Mass Index.

Educational Level                               Participants
Less than 12 years                                    14,034
High school graduate                                   5,995
Currently in college/university                       49,810
Some college/university, but did not graduate          4,868
College/university degree                             11,382
Currently in graduate or professional school           4,225
Graduate or professional school degree                 6,644

Table 2

Educational Attainment Level.

Employment Status           Participants
Currently a student               48,716
Employed                          37,619
Not employed, seeking work         3,506
Not employed                       3,504
Homemaker                          1,686
Retired                              817

Table 3

Employment Status.


Items from 4 cognitive ability scales were administered. All of these scales were developed in the lab of the second author for the online assessment of cognitive ability. The 4 scales contained a total of 60 items: 9 Letter and Number Series items, 11 Matrix Reasoning items, 16 Verbal Reasoning items, and 24 Three-dimensional Rotation items. The Letter and Number Series (“LN”) items present short sequences of digits or letters and ask participants to identify the next element in the sequence from among six choices. Matrix Reasoning (“MR”) items contain stimuli similar to those used in Raven’s Progressive Matrices: 3x3 arrays of geometric shapes with one of the nine shapes missing. Participants are instructed to identify which of the six geometric shapes presented as response choices best completes the array. The Verbal Reasoning (“VR”) items include a variety of logic, vocabulary, and general knowledge questions. The Three-dimensional Rotation (“R3D”) items present renderings of cubes and ask participants to identify which of the response choices is a possible rotation of the target stimulus. None of the items were timed in these administrations, because untimed administration was expected to provide a more stringent and conservative evaluation of the items’ utility when given online. (Nothing specifically precludes timed administration of the ICAR items, whether online or offline.)

For each of the 60 items, participants were instructed to choose the best answer from 8 response choices, including “None of these” and “I don’t know”. To maintain the integrity of the measures, the data provided here have already been scored; each administered item is coded as correct or incorrect rather than by the response option selected. The raw, unscored data may be obtained by contacting the first author. Many useful analyses could be conducted on the unscored data, including evaluations of the ‘distractor’ response choices and their relationships with other items and with demographic variables.
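Researchers who obtain the raw responses from the first author would need to score them before analysis. The following is a minimal sketch of that step, not the authors' scoring procedure. It assumes a hypothetical coding in which each chosen option is stored as an integer from 1 to 8 (with 7 and 8 standing for “None of these” and “I don’t know”), a hypothetical answer key, and NaN for items that were never administered; the actual coding of the raw data is not described here.

```python
import numpy as np

# Hypothetical coding: options 1-6 are substantive answers, 7 = "None of these",
# 8 = "I don't know"; np.nan marks an item that was not administered.
def score_responses(raw, key):
    """Score a participants x items array of chosen options against an answer key.

    Returns 1.0 for a correct choice, 0.0 for any other choice (including
    "I don't know"), and np.nan where the item was not administered.
    """
    raw = np.asarray(raw, dtype=float)
    key = np.asarray(key, dtype=float)
    scored = (raw == key).astype(float)  # the key row is broadcast across participants
    scored[np.isnan(raw)] = np.nan       # keep planned missingness as missing
    return scored

raw = np.array([[3, 8, np.nan],
                [5, 2, 7]])
key = np.array([3, 2, 7])  # hypothetical answer key for three items
print(score_responses(raw, key))
```

Keeping unadministered items as NaN, rather than scoring them as 0, preserves the distinction between an incorrect answer and an item that was never shown.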


The items were administered using the Synthetic Aperture Personality Assessment (“SAPA”) technique [5], a variant of the matrix sampling procedures discussed by Lord [3]. This method produces data with “massive missingness” by design [4]. The missingness is missing completely at random (“MCAR” [2]), and it is described as massive because the mean level of missingness per participant was approximately 68%. The number of administrations of each item varied considerably (median = 21,764; mean = 19,998; sd = 10,958), as did the number of pairwise administrations of any two items in the set (median = 2,610; mean = 4,240; sd = 4,110).

The items were presented to participants in random order as part of a broader personality survey, and participants responded to as many items as they wished. The broader survey included items on topics such as the Big Five personality traits, vocational interests, and creativity. The number of items administered to each participant was procedurally independent of participants’ response characteristics; participants were encouraged to complete 16 items. On average, participants responded to 12.4 of the ICAR items (sd = 3.7; median = 12). It is not clear why some participants elected to complete fewer than 16 items, though it seems likely that they skipped items that were particularly challenging (this could be explored further using the data described here).

The feedback provided to participants on these cognitive ability items was informal and basic. Participants were told how many of the cognitive ability items they answered correctly out of those to which they responded (e.g., “you answered 12 items correctly out of 14”). They were also told the average number of items answered correctly by previous participants of their age and gender, though no specific interpretive guidance was given about their score.
For more information about the development and use of these measures, see Condon & Revelle [1].
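The missingness statistics reported above (missingness per participant, administrations per item, and pairwise administrations) can all be derived from a scored participants x items matrix in which unadministered items are missing. The sketch below assumes such a matrix with NaN for missing entries, as would result from reading the NA values of the data file into Python. Because the real data are not reproduced here, it demonstrates the calculation on a simulated SAPA-style administration with hypothetical parameters (2,000 participants, 8 to 16 randomly chosen items each, random 0/1 scores); its output will not match the figures reported for the ICAR data.

```python
import numpy as np

def missingness_summary(scored):
    """Summarize planned missingness in a participants x items array (nan = not administered)."""
    seen = ~np.isnan(scored)                        # True where the item was administered
    per_person_missing = 1 - seen.mean(axis=1)      # proportion of items missing for each participant
    per_item_n = seen.sum(axis=0)                   # number of administrations of each item
    pair_n = seen.T.astype(int) @ seen.astype(int)  # (i, j) = participants who saw both items
    off_diag = pair_n[~np.eye(seen.shape[1], dtype=bool)]
    return {
        "mean_missing": per_person_missing.mean(),
        "item_n_median": np.median(per_item_n),
        "pair_n_median": np.median(off_diag),
        "pair_n": pair_n,
    }

# Hypothetical simulation: each participant sees a random subset of items whose
# size is drawn independently of how they respond, so missingness is MCAR by design.
rng = np.random.default_rng(0)
n_people, n_items = 2000, 60
scored = np.full((n_people, n_items), np.nan)
for p in range(n_people):
    k = rng.integers(8, 17)                           # 8 to 16 items (illustrative only)
    items = rng.choice(n_items, size=k, replace=False)
    scored[p, items] = rng.integers(0, 2, size=k)     # random 0/1 scores

summary = missingness_summary(scored)
print(round(summary["mean_missing"], 2), summary["item_n_median"], summary["pair_n_median"])
```

The diagonal of `pair_n` equals the per-item administration counts, so a single matrix product yields both the item-level and pair-level sample sizes.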

Quality Control

The available data are presented largely as they were collected, with only two exceptions:

  1. Partial removal of repeat submissions. Each participant was assigned a random user ID that persisted as long as their browser session remained active. Where more than one response set was entered in a single browser session, only the first response set was kept. Repeat submissions made in separate browser sessions could not be identified in this way, so the removal is only partial.
  2. Removal of participants with self-reported ages younger than 14 or older than 90. The survey is not intended for participants younger than 14, and self-reported ages over 90 were removed because they were deemed unlikely to be accurate.
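The published data already reflect both rules, so the following only illustrates their logic. It is a minimal sketch with hypothetical record fields (`session_id`, `submitted`); the actual session identifiers and timestamps are not part of the released variables. It assumes integer ages and treats 14 and 90 themselves as valid, since only ages younger than 14 or older than 90 were removed.

```python
from dataclasses import dataclass

@dataclass
class ResponseSet:
    session_id: str   # hypothetical: random ID that persists for one browser session
    submitted: float  # hypothetical: submission time, used to find the first response set
    age: int          # self-reported age

def apply_quality_control(records, min_age=14, max_age=90):
    """Keep only the first response set per browser session, then drop
    response sets whose self-reported age falls outside [min_age, max_age]."""
    seen_sessions = set()
    kept = []
    for rec in sorted(records, key=lambda r: r.submitted):  # earliest submission first
        if rec.session_id in seen_sessions:
            continue  # a later submission from a session already seen
        seen_sessions.add(rec.session_id)
        if min_age <= rec.age <= max_age:
            kept.append(rec)
    return kept

records = [
    ResponseSet("a", 1.0, 25),
    ResponseSet("a", 2.0, 26),  # repeat in the same session: dropped
    ResponseSet("b", 1.5, 12),  # younger than 14: dropped
    ResponseSet("c", 3.0, 95),  # older than 90: dropped
    ResponseSet("d", 4.0, 90),  # 90 is within range: kept
]
print([(r.session_id, r.age) for r in apply_quality_control(records)])
# [('a', 25), ('d', 90)]
```

Deduplication is applied before the age filter, matching the order in which the two exceptions are listed.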

Ethical issues

No personally identifying information was collected from the participants whose data are reported here.

(3) Dataset description

Object name


The data file is named to indicate the data collection method (SAPA), the source of the items (ICAR), and the time period over which the data were collected (18aug2010 through 20may2013). The file can be found at: http://dx.doi.org/10.7910/DVN/AD9RVY.

The rdata file includes three objects. The most pertinent of these is the main data object (‘sapaICARData18aug2010thru20may2013’). The remaining two objects are helpers for data analysis: ‘ItemLists’ is a list object that indexes the ICAR items associated with each item type, and ‘superKey60’ is a scoring matrix for the 4 ICAR scales. Although the items have already been scored as correct (1) or incorrect (0), the scoring key remains useful for computing scale scores. There is also a text file (‘demographic codes.txt’) that describes the coding for all of the other variables in the data set.
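The exact layout of ‘superKey60’ is not documented here. The sketch below assumes it can be represented as an items x scales matrix of 0/1 entries (1 = the item belongs to the scale) and shows how such a key turns scored items into scale scores. Under SAPA-style missingness, one simple option (used here, and not necessarily the approach of Condon & Revelle [1]) is to score each scale as the proportion correct among the scale items a participant actually answered.

```python
import numpy as np

def score_scales(scored, keys):
    """Proportion correct per scale, using only the items each participant answered.

    scored: participants x items array of 1/0, with np.nan for unadministered items.
    keys:   items x scales 0/1 matrix (1 = item belongs to the scale); this layout
            is an assumption about what a key such as superKey60 encodes.
    """
    scored = np.asarray(scored, dtype=float)
    keys = np.asarray(keys, dtype=float)
    seen = ~np.isnan(scored)
    correct = np.where(seen, scored, 0.0) @ keys  # number correct per scale
    answered = seen.astype(float) @ keys          # number answered per scale
    with np.errstate(invalid="ignore", divide="ignore"):
        return correct / answered                 # nan where no item of the scale was seen

scored = np.array([[1, 0, np.nan, np.nan],
                   [np.nan, 1, 1, 1]])
keys = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # items 0-1: scale 1; items 2-3: scale 2
print(score_scales(scored, keys))
```

A participant who saw none of a scale's items receives NaN for that scale rather than a score of 0.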

The variable names within the data file have been coded with the acronyms “LN” for Letter and Number Series, “MR” for Matrix Reasoning, “R3D” for Three-Dimensional Rotation, and “VR” for Verbal Reasoning.

Data type

Self-report, cross-sectional survey data from 96,958 participants.

Format names and versions

The data are stored as a single rdata file (approximately 2.7 MB) and three separate csv files (approximately 32 MB in total). The rdata file includes the three objects described above: ‘ItemLists’, ‘superKey60’, and the main data object ‘sapaICARData18aug2010thru20may2013’. Each of these three objects is also provided as a separate csv file. An associated text file (‘demographic codes.txt’) provides full information on the demographic codes.

Data Collectors

The first and second authors were responsible for collecting all of the data described in this dataset.

Language

All aspects of the survey and website were written in English. Data collected about the website through Google Analytics suggest that some participants used browser-based translation software, but no details are available about the extent or effect of these translations.

License

The data have been deposited under the open license CC0 (Public Domain Dedication).


The data are freely available for use with appropriate citation.

Repository location

The data were published on Dataverse and are located at http://dx.doi.org/10.7910/DVN/AD9RVY.

Publication date

The dataset was published on September 23, 2015.

(4) Reuse potential

The data are well suited to many types of structural and correlational analyses of cognitive abilities, including those aimed at reproducing or extending the analyses described by Condon & Revelle [1]. These might include evaluations of how the 4 item types relate to one another, of their shared structure, and of structural relationships across constructs in various groups of participants, as well as evaluations of differential item functioning, meta-analyses, and the development of new IRT-based adaptive measures. Note that the substantial planned missingness in the data may limit the feasibility of some of these analyses; because each participant responded to only a subset of the 60 items, methods that require complete cases would discard nearly all of the data. Additional, non-overlapping data sets from the SAPA Project are also available for use, including those that contain measures of personality and other constructs; contact the authors for more information.
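A common starting point for correlational and structural analyses of SAPA data is a pairwise-complete (available-case) correlation matrix, in which each item pair uses only the participants who answered both items. The sketch below is a minimal illustration of that technique, assuming scored items with NaN for missing entries; the `min_n` threshold is a hypothetical parameter. For 0/1 items these Pearson coefficients are phi coefficients; tetrachoric correlations are a frequent alternative for dichotomous ability items but are not shown.

```python
import numpy as np

def pairwise_corr(scored, min_n=3):
    """Pairwise-complete Pearson (phi, for 0/1 items) correlations.

    For each item pair, only participants who answered both items are used.
    Returns the correlation matrix and the matrix of pairwise sample sizes;
    a correlation is nan when fewer than min_n participants saw both items
    or when either item has no variance in that subsample.
    """
    x = np.asarray(scored, dtype=float)
    k = x.shape[1]
    r = np.full((k, k), np.nan)
    n = np.zeros((k, k), dtype=int)
    for i in range(k):
        for j in range(i, k):
            both = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            n[i, j] = n[j, i] = both.sum()
            a, b = x[both, i], x[both, j]
            if n[i, j] >= min_n and a.std() > 0 and b.std() > 0:
                r[i, j] = r[j, i] = np.corrcoef(a, b)[0, 1]
    return r, n

x = np.array([[1, 1, np.nan],
              [0, 0, 1],
              [1, 1, 0],
              [0, 0, np.nan],
              [np.nan, 1, 1]])
r, n = pairwise_corr(x, min_n=2)
print(np.round(r, 2))
print(n)
```

Because each coefficient rests on a different subsample, a pairwise-complete matrix is not guaranteed to be positive semidefinite, which may need attention before factor analysis; the matrix `n` shows how many participants support each coefficient.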

Competing Interests

The authors declare that they have no competing interests.