Over the lifespan, people accumulate a large and idiosyncratic set of experiences that shape their mental knowledge representations. These experience-driven changes in mental representations could be a major factor underlying typical age-related patterns, such as decreased memory performance with increased age (Buchler & Reder, 2007; Ramscar et al., 2014; Wulff et al., 2019). In line with this view, recent research (Kenett et al., 2020; Siew et al., 2019) has documented consistent differences in the size and structure of younger and older adults’ mental representations (Dubossarsky et al., 2017; Wulff et al., 2018, October 29). To evaluate whether and how strongly these differences in representations contribute to differences in cognitive performance across age, we designed the My Small World of Words (MySWOW) project. Building on ongoing efforts to obtain word association norms for several languages in a large online citizen-science project, the Small World of Words (SWOW; e.g., De Deyne et al., 2019) study, MySWOW aims to elicit large-scale free association networks from single individuals and concurrently assess their cognitive performance across a variety of tasks that are known to be linked to semantic representations. MySWOW addresses shortcomings of previous research, which either focused on group-level representations (Dubossarsky et al., 2017) or did not concurrently assess cognitive performance on a broad scale (Wulff et al., 2018, October 29). We present data from a proof-of-concept study of MySWOW involving four younger and four older individuals. For additional details of the study rationale, see the companion article (Wulff et al., 2022).


The MySWOW proof-of-concept study relied on a correlational design encompassing the concurrent assessment of a large number of free word associations and a broad battery of cognitive tasks for four younger and four older individuals. The free association task and cognitive battery were designed to match each other in order to facilitate a comparison of semantic networks and cognitive performance.

Collection Date

The data were collected from August to October 2018.


Four older adults aged 68 to 70 years and four younger adults aged 24 to 28 years participated in and completed the study. Three more participants began the study but dropped out after completing 0.5%, 18.7%, and 41.7% of the free association task. We only report data for the eight participants with complete data. Participants were recruited from the participant pool of the Center for Cognitive and Decision Sciences (CDS) of the University of Basel. They were contacted via phone and completed an initial screening to confirm the following inclusion criteria: German or Swiss German as mother tongue, daily access to a computer with a stable Internet connection, and absence of neurological or psychiatric diagnoses. Participants were compensated with CHF 220 for their full participation, consisting of CHF 180 for 3,600 answered cues (CHF 0.05 per cue) and CHF 40 for two to three hours of laboratory assessment and instructions (approx. CHF 15/h).


Free association task

Free associations were collected via a password-protected web-based platform that participants could access from home. In the association task, participants were sequentially presented with a total of 3,600 cues for which they provided three associations each, following the same procedure used in SWOW. Participants were instructed to enter, using the keyboard, the first three words that came to mind when thinking about the cue. If fewer than three words came to mind or if the cue was not recognized, the participant could proceed to the next cue by clicking on a “no further responses” or “unknown word” button, respectively. Figure 1 shows a screenshot of the free association interface.

Figure 1 

Screenshot of the free association task. The screenshot shows one trial requiring associations to the cue “Büroklammer” (paper clip) in the training mini study.

The 3,600 cues consisted of 3,000 unique and 600 repeated cues. The 3,000 unique cues, in turn, consisted of three subsets of 1,000 cues each. To ensure high coverage of central words in people’s semantic networks, the first subset consisted of the 1,000 highest-frequency words among the 4,443 cue words that, at the time, were included in the German SWOW, with frequency determined using the German SUBTLEX frequency norms (Brysbaert et al., 2011). To ensure high coverage of the connections within people’s networks, the second subset consisted of those 1,000 of the remaining 3,443 cues in the German SWOW that were most likely to produce one of the cues in the first subset as a response. Finally, to ensure high network depth, the third subset consisted of the 1,000 most frequent associates given to the cues of the first subset in the German SWOW. The cues were presented to all participants in the same fixed, randomly determined order.
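The three selection criteria can be illustrated with a minimal sketch. It is not the procedure used to build the MySWOW cue set: the function name, the toy data, the use of the proportion of responses falling into the first subset as the measure of "most likely to produce", and the exclusion of already selected words from the third subset are all assumptions made for illustration.

```python
from collections import Counter


def select_cues(frequency, associations, n):
    """Pick three cue subsets of size n (hypothetical helper, toy logic).

    frequency:    dict mapping each candidate cue to a corpus frequency
    associations: dict mapping each candidate cue to a list of responses
                  (a stand-in for aggregate norm data)
    """
    # Subset 1: the n most frequent candidate cues (ties broken alphabetically).
    ranked = sorted(frequency, key=lambda w: (-frequency[w], w))
    subset1 = ranked[:n]
    core = set(subset1)

    # Subset 2: remaining cues whose responses most often fall into subset 1.
    # Assumption: "most likely" = proportion of responses that are subset-1 words.
    def hit_rate(cue):
        responses = associations.get(cue, [])
        if not responses:
            return 0.0
        return sum(r in core for r in responses) / len(responses)

    remaining = [w for w in ranked if w not in core]
    subset2 = sorted(remaining, key=lambda w: (-hit_rate(w), w))[:n]

    # Subset 3: the n most frequent responses given to subset-1 cues.
    # Assumption: words already selected are skipped to keep the cues unique.
    taken = core | set(subset2)
    counts = Counter(
        r for cue in subset1 for r in associations.get(cue, []) if r not in taken
    )
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    subset3 = [word for word, _ in ordered[:n]]
    return subset1, subset2, subset3


# Toy data (invented for illustration only).
frequency = {"haus": 90, "wasser": 80, "geld": 70, "hund": 10, "boot": 9, "katze": 5}
associations = {
    "haus": ["dach", "garten", "dach"],
    "wasser": ["meer", "trinken", "meer"],
    "geld": ["bank", "bank", "reich"],
    "hund": ["katze", "haus", "haus"],
    "boot": ["wasser", "meer", "wasser"],
    "katze": ["maus", "milch", "hund"],
}
subsets = select_cues(frequency, associations, n=2)
```

With the toy data, the first subset contains the two most frequent words, the second the two remaining cues that most often elicit them, and the third the two most frequent responses to the first subset.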

Table 1 presents the ten most central words according to node degree for each of the individual networks. Words in bold are also among the most central words in either the English (De Deyne et al., 2019) or the German SWOW data.

Table 1

Most Frequent Words in Association Task.



man | music | money | money | human | Germany | music | state
music | food | write | music | country | instrument¹ | development | country
work | work | beautiful | water | animal | military | money | goods
change | school | food | Germany | water | car | exact | first name
water | money | school | work | occupation | water | wood | computer
family | clothing | read | food | work | country | make | instrument¹
learn | write | important | furniture | food | food | school | family
wood | instrument¹ | economy | instrument¹ | car | name | computer | university
church | name | large | love | instrument | computer | children | church
love | summer | family | clothes | male | child | work | month

Note: Words were translated from German. Each column lists the ten most central words of one individual network. Words in bold face are also among the ten most central words in the German (money, music, work, school, food, water, car, love, green, important) or English (money, food, water, car, music, green, red, love, work, old) SWOW data sets. ¹ Musical instrument.
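Node degree, the centrality measure underlying Table 1, can be computed directly from cue-response pairs. The sketch below assumes an undirected network in which a cue is linked to each of its responses and degree is the number of distinct neighbors; the network construction actually used for Table 1 may differ, and the function names and toy pairs are hypothetical.

```python
from collections import defaultdict


def degree_centrality(pairs):
    """Return the number of distinct neighbors of every word.

    pairs: iterable of (cue, response) tuples from one individual.
    Assumption: edges are undirected and self-loops are ignored.
    """
    neighbors = defaultdict(set)
    for cue, response in pairs:
        if cue == response:
            continue
        neighbors[cue].add(response)
        neighbors[response].add(cue)
    return {word: len(linked) for word, linked in neighbors.items()}


def most_central(pairs, k=10):
    """Return the k words with the highest degree (ties broken alphabetically)."""
    degree = degree_centrality(pairs)
    return sorted(degree, key=lambda w: (-degree[w], w))[:k]


# Toy pairs (invented for illustration only).
pairs = [
    ("hund", "tier"),
    ("katze", "tier"),
    ("tier", "hund"),
    ("fisch", "wasser"),
    ("wasser", "tier"),
]
top_two = most_central(pairs, k=2)
```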

Cognitive assessment

The cognitive battery consisted of two sets of tasks fulfilling different purposes. The purpose of the first set was the assessment of people’s general cognitive abilities and functioning. This set included a 20-minute timed version of the Advanced Progressive Matrices (APM; Hamel & Schmittmann, 2006) as a measure of general intelligence; a Digit-Symbol Substitution Test, as found in the “coding” subtest of the Wechsler Adult Intelligence Scale IV (WAIS-IV; Wechsler, 2008), as a measure of processing speed; the Mehrfachwahl-Wortschatz-Intelligenztest: Form I (MWT-A; Lehrl et al., 1995) as a measure of vocabulary size; and, finally, the DemTect (Kalbe et al., 2004) as a screen for dementia. The purpose of the second set was to establish word-level links between the free association network and cognitive performance. This set included 10-minute category (animals) and phonemic (letter S) fluency tasks (e.g., Wulff et al., 2018, October 29), an episodic list memory task modeled after the Penn Electrophysiology of Encoding and Retrieval Study (e.g., Healey & Kahana, 2016), and an associative recall task modeled after Naveh-Benjamin et al. (2003). Behavior in the two fluency tasks can be related to the free association network because both cues and responses naturally included animals and words starting with the letter S. Participants retrieved between 62 and 113 animals and between 45 and 138 words starting with the letter S. The retrieved animals overlapped with 1.5% of cues and 0.8% of responses, whereas the retrieved words starting with the letter S overlapped with 11.1% of cues and 11.9% of responses. The episodic memory task and the associative recall task were populated with nouns from the cue set to establish comparability with the associative network. In the episodic memory task, a total of 20 lists of 16 words each were studied and subsequently recalled. Participants correctly recalled between 28.7% and 60.9% of words, with an additional 1.3% to 25% intrusions. In the associative recall task, 4 lists consisting of 16 word pairs each were presented and tested. Participants correctly recalled between 32.8% and 96.8% of pairs. See Table 2 for an overview of the tasks included in the cognitive assessment of the MySWOW proof-of-concept study.
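An overlap figure of the kind reported for the fluency tasks can be computed as the share of distinct cues (or responses) that also appear among the retrieved words. The sketch below is one plausible reading of that measure, not the computation behind the reported percentages; the function name, the case-insensitive matching, and the toy words are assumptions.

```python
def overlap_share(retrieved, vocabulary):
    """Share of distinct vocabulary items that also occur among retrieved words.

    retrieved:  words produced in a fluency task
    vocabulary: cues or responses from the free association task
    Assumption: matching ignores case and surrounding whitespace.
    """
    retrieved_set = {w.strip().lower() for w in retrieved}
    vocabulary_set = {w.strip().lower() for w in vocabulary}
    if not vocabulary_set:
        return 0.0
    return len(vocabulary_set & retrieved_set) / len(vocabulary_set)


# Toy example (invented): two of four cues were named in the animal fluency task.
share = overlap_share(["Hund", "Katze", "Maus"], ["hund", "haus", "wasser", "katze"])
```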

Table 2

Tasks in the Cognitive Battery.


Task | Description | Purpose | Reference
Category fluency | Name all the animals you can in 10 minutes. | Predict performance from network | Wulff et al. (2018, October 29)
Phonemic fluency | Name all the words starting with the letter S you can in 10 minutes. | Predict performance from network | Griffiths et al. (2007)
Episodic memory task | Study a word list and then recall the words in any order (20 lists, 16 words per list). | Predict performance from network | Healey and Kahana (2016)
Associative recall task | Study a list of word pairs, then recall one word of each pair while being cued with the other (4 lists of 16 word pairs). | Predict performance from network | Naveh-Benjamin et al. (2003)
Advanced Progressive Matrices | Solve abstract reasoning problems. | General cognitive abilities | Hamel and Schmittmann (2006)
Digit-Symbol Substitution | Assign digits to symbols according to a rule. | General cognitive abilities | Wechsler (2008)
Mehrfachwahl-Wortschatz-Intelligenztest | Recognize words in a list of words and non-words. | General cognitive abilities | Lehrl et al. (1995)
DemTect | Various cognitive tasks. | Screen for age-related pathologies | Kalbe et al. (2004)

Entry and debriefing questionnaires

At study entry, participants provided demographic information concerning their primary language (German or Swiss German), their current occupation, their highest academic degree, and the income level of their household. Participants further answered questions on their usual reading behavior, e.g., the number of books read in a year. At debriefing, participants were asked to provide information on their observations during the study, for example, whether they were able to sustain concentration while working on the free associations. The specific questions are reported in the codebook (see Table 3).

Table 3

Description of Data Files.


File | Description
participants.csv | Contains data on demographic information, reading behavior, the debriefing survey, and all but four cognitive assessments.
associations.csv | Contains the corrected and uncorrected free association data.
episodic_memory.csv | Contains the episodic memory training and test data.
associative_recall.csv | Contains the associative recall training and test data.
animal_fluency.csv | Contains animal fluency response sequences.
letter_fluency.csv | Contains letter fluency response sequences.
codebook.pdf | Contains descriptions of all variable names in the data files.


Participants passing the initial screening over the phone were invited to our laboratory at the University of Basel for an introductory session lasting approximately 30 minutes. During this session, participants provided informed consent, completed the entry questionnaire, and were introduced to the web-based platform using a training mini-study involving 15 cues. Over the course of the following weeks, participants were instructed to log in and work on the free association task twice a day for 30 minutes each. On average, participants completed the free association task in 26.1 hours spread over 39.4 days. After completing the free association task, participants were invited back to the laboratory for a three-hour session that included the cognitive assessments and study debriefing.

The cognitive assessment and study debriefing session consisted of the following elements. First, participants filled out the debriefing questionnaire. Next, the verbal fluency tasks were conducted orally and recorded for later transcription by two student assistants responsible for data collection. Following the verbal fluency tasks, the participants were administered a 90-second timed Digit-Symbol Substitution Test in paper-and-pencil format. To conclude the first part of the lab session, the associative recall task was completed as a computerized task implemented in E-Prime (Psychology Software Tools, Inc., 2016) on a lab computer. After a 10-minute break, the second part of the lab session began with the episodic list memory task, which was also implemented as a computerized task using E-Prime (Psychology Software Tools, Inc., 2016). The Mehrfachwahl-Wortschatz-Intelligenztest (MWT-A) was then conducted in paper-and-pencil format, followed by a 20-minute timed version of the APM in paper-and-pencil format. The lab session concluded with the interactive verbal administration of the DemTect, carried out by one of the student assistants. Subsequently, participants received their monetary compensation for participation.

Dataset description

Table 3 provides an overview of the different files containing the data. All data are available as comma-separated files. A codebook.pdf file provides descriptions of all variable names across the data files. All variable names and data labels have been translated to English. The association and fluency data, however, were not translated and are reported in German.
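Because all data files are comma-separated, they can be read with standard tooling. The sketch below groups the rows of such a file by participant using only the Python standard library. The column names `participant`, `cue`, and `response` and the sample rows are placeholders invented for illustration; the actual variable names should be taken from codebook.pdf.

```python
import csv
import io
from collections import defaultdict


def rows_by_participant(handle, id_column):
    """Group the rows of a comma-separated file by a participant identifier.

    handle:    an open text file (or any iterable of lines)
    id_column: name of the identifier column (hypothetical; see the codebook)
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row[id_column]].append(row)
    return dict(grouped)


# In-memory stand-in for a data file; header and rows are invented.
sample = io.StringIO(
    "participant,cue,response\n"
    "abcdef,Haus,Dach\n"
    "abcdef,Haus,Garten\n"
    "ghijkl,Haus,Hof\n"
)
grouped = rows_by_participant(sample, id_column="participant")
```

For the published files, the `io.StringIO` object would be replaced by an opened file, for example `open("associations.csv", newline="", encoding="utf-8")`, where the encoding is likewise an assumption.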

The data were published on the Open Science Framework (10.17605/OSF.IO/VKWPS) on 15 February 2021. The data are licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) and follow the FAIR guidelines (Wilkinson et al., 2016).

Quality control

Free associations were obtained using a computerized task adapted from the German version of the large and long-running citizen-science project Small World of Words (De Deyne et al., 2019). Responses were cleaned in the following way. First, all responses matching either individual words or composites of words included in the German aspell dictionary were accepted as valid. The remaining words were subjected to manual correction. Overall, 4.2% of responses were corrected manually with a median string edit distance (i.e., the number of letters that were changed) of 2 (mean = 2.42). The data include the repeated sampling of a subset of words as an indicator to assess the reliability of free association. In the cognitive assessments, standardized and computerized tasks were used to improve comparability with previous work. All participants in the data set completed all tasks. Three additional participants started the study but dropped out (see Participants section).
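The string edit distance used to summarize the manual corrections can be illustrated with the standard Levenshtein distance, which counts single-letter insertions, deletions, and substitutions. Whether exactly this variant was used for the reported median and mean is an assumption, and the misspellings in the example are invented.

```python
from statistics import median


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


# Invented (uncorrected, corrected) response pairs.
corrections = [("Hasu", "Haus"), ("Schuhle", "Schule"), ("Fahrad", "Fahrrad")]
distances = [edit_distance(raw, fixed) for raw, fixed in corrections]
median_distance = median(distances)
```

Note that a transposition such as "Hasu" for "Haus" counts as two edits under this definition.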

Ethical considerations

All data were recorded in reference to a random six-letter identifier assigned to participants at the beginning of the study. Identifying information such as names or addresses was not recorded. Potentially identifying information such as participants’ age, birthday, and occupation was not included in the publicly available files. Participants provided informed consent that included permission for public sharing of the data. The study was approved by the internal review board of the Department of Psychology at the University of Basel (# 014-17-1).

Reuse potential

The reported data present the only publicly available resource containing large-scale free-association and cognitive performance data on the level of the individual (cf. Morais et al., 2013). The data can be reused in at least four different ways.

First, the data can be used to investigate individual and age differences in the structure of semantic networks. Past research has either used methods that prevented an assessment of large-scale networks or has analyzed semantic networks in the aggregate (e.g., similarity ratings, Wulff et al., 2018, October 29). There is only one other study that has elicited individual-level semantic networks of comparable size (cf. Morais et al., 2013); however, it relied on a snowball approach that resulted in less comparable networks, and its data are not publicly available. We present a first comparison of the structure of individuals’ semantic networks in the companion paper (Wulff et al., 2022); however, that comparison considers only four structural properties and one approach to constructing the network from individuals’ free associations. The current data could be used to explore other structural properties, such as assortativity (Van Rensbergen et al., 2015), and to analyze the networks in ways that account for their bipartite nature and the direction of edges. In light of the large size of the individual networks, the current data could further be used to shed light on whether structural properties are distributed homogeneously across the network.

Second, by utilizing the large cognitive battery, the data can be used to study the link between individuals’ semantic networks and various aspects of cognitive performance. These analyses are facilitated by the fact that the items in four of the cognitive tasks (semantic and letter fluency, episodic memory, and associative recall) overlap by design with the contents of individual semantic networks. To our knowledge, there are no other publicly available data sets that permit item-level predictions of cognitive performance from individual semantic networks. We report a first analysis of the link between semantic networks and trial-level cognitive performance in our companion paper (Wulff et al., 2022). These analyses provide evidence that word centrality and relatedness derived from individuals’ networks predict cognitive performance well, but not better than an aggregate network. The current data could be used to study whether the links between the network and cognitive performance can be strengthened either by using alternative ways to construct networks from free associations or by using cognitive models of the cognitive tasks.

Third, the data could be used as the basis for the creation of synthetic data and simulation of individual and age differences in semantic representations. Such simulations could be particularly helpful in two ways. One way recruits the data to make forecasts for new studies aimed at eliciting individual semantic representations (see, e.g., Wulff et al., 2018, October 29). Specifically, one could assess whether alternative designs using different numbers of cues or responses may produce more reliable measurements of individual-level networks (De Deyne & Storms, 2008). The other way uses the data as the basis for simulating the role of different aging mechanisms (e.g., age-related changes in associations, search processes) on cognitive performance (see, e.g., Borge-Holthoefer et al., 2011).

Fourth, the data can be used as a free association norm (Nelson et al., 2004). With a total of 80,000 free associations, they represent one of the largest publicly available resources of free association data in German (e.g., Schulte im Walde & Borgwaldt, 2015; Schulte im Walde et al., 2008). The data can therefore be used for the many purposes for which free association norms are traditionally used in the psychological literature. These include comparisons between norms, which may shed light on how variations in the elicitation procedure (e.g., one versus three responses per cue) or the aggregation of responses across individuals affect the distribution of associations (De Deyne & Storms, 2008).
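As a rough illustration of this use, the following sketch pools responses across individuals and expresses each response as a proportion of all responses given to a cue, a quantity often called forward association strength. The input layout, the function name, and the toy data are assumptions; published norms may apply additional filtering or weight the first, second, and third responses differently.

```python
from collections import Counter, defaultdict


def association_strengths(triples):
    """Pool (participant, cue, response) triples into per-cue response proportions."""
    counts = defaultdict(Counter)
    for _participant, cue, response in triples:
        counts[cue][response] += 1
    strengths = {}
    for cue, response_counts in counts.items():
        total = sum(response_counts.values())
        strengths[cue] = {r: c / total for r, c in response_counts.items()}
    return strengths


# Toy data (invented): two participants responding to the same cue.
triples = [
    ("p1", "haus", "dach"),
    ("p2", "haus", "dach"),
    ("p1", "haus", "garten"),
    ("p2", "haus", "hof"),
]
norms = association_strengths(triples)
```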

While we see much potential for further uses of our data, some limitations apply. The data originate from only eight individuals, rendering between-person analyses difficult. Further, participants spent an average of 26.1 hours spread over almost 40 days on producing the free association data. The lengthy and repetitive nature of data collection could have led to tiredness or have been influenced by changing situational circumstances and systematic training effects that one cannot easily control for in these data.