Data From the MySWOW Proof-of-Concept Study: Linking Individual Semantic Networks and Cognitive Performance

We report data from a proof-of-concept study involving the concurrent assessment of large-scale individual semantic networks and cognitive performance. The data include up to 10,800 free associations per participant (3,600 cues with three responses each), collected using a dedicated web-based platform over the course of several weeks, and responses to several cognitive tasks, including verbal fluency, episodic memory, and associative recall tasks, from four younger and four older native German speakers. The data are unique in scope and composition and shed light on individual and age-related differences in mental representations and their role in cognitive performance across the lifespan.


BACKGROUND
Over the lifespan, people accumulate a large and idiosyncratic set of experiences that shape their mental knowledge representations. These experience-driven changes in mental representations could be a major factor underlying typical age-related patterns, such as the decline in memory performance with increasing age (Buchler & Reder, 2007; Ramscar et al., 2014; Wulff et al., 2019). In line with this view, recent research (Kenett et al., 2020; Siew et al., 2019) has documented consistent differences in the size and structure of younger and older adults' mental representations (Dubossarsky et al., 2017; Wulff et al., 2018, October 29). To evaluate whether and how strongly these differences in representations contribute to differences in cognitive performance across age, we designed the My Small World of Words (MySWOW) project. MySWOW builds on the Small World of Words (SWOW; e.g., De Deyne et al., 2019) study (https://smallworldofwords.org), a large online citizen-science project that collects word association norms for several languages. MySWOW aims to elicit large-scale free association networks from single individuals and to concurrently assess their cognitive performance across a variety of tasks that are known to be linked to semantic representations. It thereby addresses shortcomings of previous research, which either focused on group-level representations (Dubossarsky et al., 2017) or did not concurrently assess cognitive performance on a broad scale (Wulff et al., 2018, October 29). We present data from a proof-of-concept study of MySWOW involving four younger and four older individuals. For additional details of the study rationale, see the companion article (Wulff et al., 2022).

METHODS
The MySWOW proof-of-concept study relied on a correlational design encompassing the concurrent assessment of a large number of free word associations and a broad battery of cognitive tasks for four younger and four older individuals. The free association task and cognitive battery were designed to match each other in order to facilitate a comparison of semantic networks and cognitive performance.

COLLECTION DATE
The data were collected from August to October 2018.

PARTICIPANTS
Four older adults aged 68 to 70 years and four younger adults aged 24 to 28 years participated in and completed the study. Three more participants began the study but dropped out after completing 0.5%, 18.7%, and 41.7% of the free association task. We report data only for the eight participants with complete data. Participants were recruited from the participant pool of the Center for Cognitive and Decision Sciences (CDS) of the University of Basel. They were contacted via phone and completed an initial screening to confirm the following inclusion criteria: German or Swiss German as mother tongue, daily access to a computer with a stable Internet connection, and absence of neurological or psychiatric diagnoses. Participants were compensated with CHF 220 for their full participation, consisting of CHF 180 for 3,600 answered cues (CHF 0.05 per cue) and CHF 40 for two to three hours of laboratory assessment and instructions (approx. CHF 15/h).

Free association task
Free associations were collected via a password-protected web-based platform that participants could access from home. In the association task, participants were sequentially presented with a total of 3,600 cues, for each of which they provided three associations, following the same procedure used in SWOW. Participants were instructed to enter, using the keyboard, the first three words that came to mind when thinking about the cue. If fewer than three words came to mind or if the cue was not recognized, the participant could proceed to the next cue by clicking on a "no further responses" or "unknown word" button, respectively. Figure 1 shows a screenshot of the free association interface.
The 3,600 cues consisted of 3,000 unique and 600 repeated cues. The 3,000 unique cues, in turn, consisted of three subsets of 1,000 cues each. To ensure high coverage of central words in people's semantic networks, the first subset consisted of the 1,000 highest-frequency words among the 4,443 cue words that, at the time, were included in the German SWOW, with frequency determined using the German SUBTLEX frequency norms (Brysbaert et al., 2011). To ensure high coverage of the connections within people's networks, the second subset consisted of those 1,000 of the remaining 3,443 cues in the German SWOW that were most likely to produce one of the cues in the first subset as a response. Finally, to ensure high network depth, the third subset consisted of the 1,000 most frequent associates in the German SWOW given to the cues of the first subset. The cues were presented to all participants in the same fixed, randomly determined order.

Note: Words were translated from German. Words in bold face are also among the ten most central words in the German (money, music, work, school, food, water, car, love, green, important) or English (money, food, water, car, music, green, red, love, work, old) SWOW data sets. 1 Musical instrument.
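The three-step selection of unique cues described in this section can be illustrated with a minimal sketch. It is not the code used in the study: the function name `select_cues`, the dictionary-based inputs, and the toy words are hypothetical, and the exclusion of already selected words in the third step is our assumption, made so that all cues remain unique.

```python
from collections import Counter


def select_cues(norms, freq, n):
    """Pick three cue subsets of size n (n = 1,000 in the study).

    norms: dict mapping each cue to the list of responses given to it
           (hypothetical stand-in for the aggregate German SWOW data).
    freq:  dict mapping words to corpus frequency (stand-in for SUBTLEX).
    """
    # Subset 1: the n cues with the highest corpus frequency.
    subset1 = sorted(norms, key=lambda w: freq.get(w, 0), reverse=True)[:n]
    core = set(subset1)

    # Subset 2: remaining cues ranked by the share of their responses
    # that are subset-1 cues.
    def hit_rate(cue):
        responses = norms[cue]
        if not responses:
            return 0.0
        return sum(r in core for r in responses) / len(responses)

    remaining = [c for c in norms if c not in core]
    subset2 = sorted(remaining, key=hit_rate, reverse=True)[:n]

    # Subset 3: most frequent responses to subset-1 cues. Excluding words
    # already selected is our assumption, made to keep all cues unique.
    chosen = core | set(subset2)
    counts = Counter(r for c in subset1 for r in norms[c] if r not in chosen)
    subset3 = [word for word, _ in counts.most_common(n)]
    return subset1, subset2, subset3


# Toy data, invented for illustration only.
norms = {
    "haus": ["dach", "tür", "garten"],
    "geld": ["bank", "garten", "haus"],
    "bank": ["geld", "geld", "sitzen"],
    "dach": ["haus", "geld", "haus"],
    "ziegel": ["stein", "rot", "dach"],
}
freq = {"haus": 100, "geld": 90, "bank": 50, "dach": 40, "ziegel": 5}
subset1, subset2, subset3 = select_cues(norms, freq, n=2)
```

With the toy input, the first subset contains the two most frequent cues, the second the two remaining cues whose responses most often fall into the first subset, and the third the most frequent new associates of the first subset.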
The purpose of the second set was to establish word-level links between the free association network and cognitive performance. This set included 10-minute category (animals) and phonemic (letter S) fluency tasks (e.g., Wulff et al., 2018, October 29), an episodic list memory task modeled after the Penn Electrophysiology of Encoding and Retrieval Study (e.g., Healey & Kahana, 2016), and an associative recall task modeled after Naveh-Benjamin et al. (2003). Behavior in the two fluency tasks can be related to the free association network because both cues and responses naturally included animals and words starting with the letter S. Participants retrieved between 62 and 113 animals and between 45 and 138 words starting with the letter S. The retrieved animals overlapped with 1.5% of cues and 0.8% of responses, whereas the retrieved words starting with the letter S overlapped with 11.1% of cues and 11.9% of responses. The episodic memory task and the associative recall task were populated with nouns from the cue set to establish comparability with the associative network. In the episodic memory task, a total of 20 lists of 16 words each were studied and subsequently recalled. Participants correctly recalled between 28.7% and 60.9% of words, with an additional 1.3% to 25% intrusions. In the associative recall task, four lists of 16 word pairs each were presented and tested. Participants correctly recalled between 32.8% and 96.8% of pairs. See Table 2 for an overview of the tasks included in the cognitive assessment in the MySWOW proof-of-concept study.

Entry and debriefing questionnaires
At study entry, participants provided demographic information concerning their primary language (German or Swiss German), their current occupation, their highest academic degree, and the income level of their household. Participants further answered questions on their usual reading behavior, e.g., the number of books read in a year. At debriefing, participants were asked to provide information on their observations during the study, for example, whether they were able to sustain concentration while working on the free associations. The specific questions are reported in the codebook (see Table 3).

PROCEDURE
Participants who passed the initial phone screening were invited to our laboratory at the University of Basel for an introductory session lasting approximately 30 minutes. During this session, participants provided informed consent, completed the entry questionnaire, and were introduced to the web-based platform using a training mini-study involving 15 cues. Over the course of the following weeks, participants were instructed to log in and work on the free association task twice a day for 30 minutes each. On average, participants completed the free association task in 26.1 hours spread over 39.4 days. After completing the free association task, participants were invited back to the laboratory for a three-hour session that included the cognitive assessments and study debriefing. This session consisted of the following elements: First, participants filled out the debriefing questionnaire. Next, the verbal fluency tasks were conducted orally and recorded for later transcription by two student assistants responsible for data collection. Following the verbal fluency tasks, participants were administered a 90-second timed Digit-Symbol Substitution Test in paper-and-pencil format. To conclude the first part of the lab session, the associative recall task was completed as a computerized task implemented in E-Prime (Psychology Software Tools, Inc., 2016) on a lab computer. After a 10-minute break, the second part of the lab session began with the list memory task, which was also implemented as a computerized task using E-Prime (Psychology Software Tools, Inc., 2016). The Mehrfachwahl-Wortschatz-Intelligenztest (MWT-A) was then conducted in paper-and-pencil format, followed by a 20-minute timed version of the APM in paper-and-pencil format. The lab session concluded with the interactive verbal administration of the DemTect, carried out by one of the student assistants.
Subsequently, participants received their monetary compensation for participation.

Table 3 provides an overview of the different files containing the data. All data are available as comma-separated files. A codebook.pdf file provides descriptions of all variable names across the data files. All variable names and data labels have been translated to English. The association and fluency data, however, were not translated and are reported in German.

DATASET DESCRIPTION
The data were published on the Open Science

QUALITY CONTROL
Free associations were obtained using a computerized task adapted from the German version of the large and long-running citizen-science project Small World of Words (De Deyne et al., 2019). Responses were cleaned in the following way. First, all responses matching either individual words or composites of words included in the German aspell dictionary were accepted as valid. The remaining responses were subjected to manual correction. Overall, 4.2% of responses were corrected manually with a median string edit distance (i.e., the number of letters that were changed) of 2 (mean = 2.42). The data include repeated presentations of a subset of cues (the 600 repeated cues), which can be used to assess the reliability of the free associations. In the cognitive assessments, standardized and computerized tasks were used to improve comparability with previous work. All participants in the data set completed all tasks. Three additional participants started the study but dropped out (see Participants section).
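To illustrate how an edit distance between a raw and a corrected response can be computed, the following minimal sketch implements the standard Levenshtein distance (insertions, deletions, and substitutions each count as one change). Which exact edit-distance variant was used for the figures reported above is not specified here, so the choice of Levenshtein distance is an assumption, and the misspelled example words are invented.

```python
from statistics import median


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical (raw, corrected) pairs, invented for illustration.
corrections = [("Shcule", "Schule"), ("Waser", "Wasser"), ("Fharrad", "Fahrrad")]
distances = [edit_distance(raw, fixed) for raw, fixed in corrections]
median_distance = median(distances)
```

Note that swapped adjacent letters count as two changes under this definition; a variant that treats transpositions as a single edit would yield smaller values for such typos.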

ETHICAL CONSIDERATIONS
All data were recorded in reference to a random six-letter identifier assigned to participants at the beginning of the study. Identifying information such as names or addresses was not recorded. Potentially identifying information such as participants' age, birthday, and occupation was not included in the publicly available files. Participants provided informed consent that included permission for public sharing of the data. The study was approved by the internal review board of the Department of Psychology at the University of Basel (# 014-17-1).

REUSE POTENTIAL
The reported data represent the only publicly available resource containing large-scale free association and cognitive performance data at the level of the individual (cf. Morais et al., 2013). The data can be reused in at least four different ways.
First, the data can be used to investigate individual and age differences in the structure of semantic networks. Past research has either used methods that prevented an assessment of large-scale networks or has analyzed semantic networks in the aggregate (e.g., similarity ratings, Wulff et al., 2018, October 29). There is only one other study that has elicited individual-level semantic networks of comparable size (cf. Morais et al., 2013); however, it relied on a snowball approach that resulted in less comparable networks, and its data are not publicly available. We present a first comparison of the structure of individuals' semantic networks in the companion paper (Wulff et al., 2022); however, that comparison considers only four structural properties and one approach to constructing the network from individuals' free associations. The current data could be used to explore other structural properties, such as assortativity (Van Rensbergen et al., 2015), and to analyze the networks in ways that account for their bipartite nature and the direction of edges. In light of the large size of the individual networks, the current data could further be used to shed light on whether structural properties are distributed homogeneously across the network.
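As a starting point for such analyses, the following minimal sketch shows one possible way to turn one participant's cue-response pairs into a directed, weighted network and to compute in- and out-degrees. It is only one of several construction approaches and not necessarily the one used in the companion paper; the function names, the `cues_only` option, and the toy associations are hypothetical.

```python
from collections import Counter, defaultdict


def build_network(associations, cues_only=False):
    """Return directed edge weights {(cue, response): count}.

    associations: iterable of (cue, response) pairs from one participant.
    cues_only:    if True, keep only responses that also occur as cues,
                  which yields a network among the cue words themselves.
    """
    associations = list(associations)
    cues = {cue for cue, _ in associations}
    edges = Counter()
    for cue, response in associations:
        if cues_only and response not in cues:
            continue
        edges[(cue, response)] += 1
    return edges


def degrees(edges):
    """Count distinct incoming and outgoing edges per word."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return dict(in_deg), dict(out_deg)


# Toy associations, invented for illustration.
associations = [
    ("hund", "katze"), ("hund", "leine"),
    ("katze", "hund"), ("katze", "maus"),
    ("maus", "katze"),
]
full_network = build_network(associations)
cue_network = build_network(associations, cues_only=True)
in_deg, out_deg = degrees(full_network)
```

Keeping edge direction makes it possible to distinguish words that are frequently produced as responses (high in-degree) from words that elicit many different responses (high out-degree).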
Second, by utilizing the large cognitive battery, the data can be used to study the link between individuals' semantic networks and various aspects of cognitive performance. These analyses are facilitated by the fact that the items in four of the cognitive tasks (semantic fluency, letter fluency, episodic memory, and associative recall) overlap by design with the contents of the individual semantic networks. To our knowledge, there are no other publicly available data sets that permit item-level predictions of cognitive performance from individual semantic networks. We report a first analysis of the link between semantic networks and trial-level cognitive performance in our companion paper (Wulff et al., 2022). These analyses provide evidence that word centrality and relatedness derived from individuals' networks predict cognitive performance well, but not better than an aggregate network. The current data could be used to study whether the links between the network and cognitive performance can be strengthened either by using alternative ways to construct networks from free associations or by using cognitive models of the cognitive tasks.
Third, the data could be used as the basis for the creation of synthetic data and simulation of individual and age differences in semantic representations. Such simulations could be particularly helpful in two ways. One way recruits the data to make forecasts for new studies aimed at eliciting individual semantic representations (see, e.g., Wulff et al., 2018, October 29). Specifically, one could assess whether alternative designs using different numbers of cues or responses may produce more reliable measurements of individual-level networks (De Deyne & Storms, 2008). The other way uses the data as the basis for simulating the role of different aging mechanisms (e.g., age-related changes in associations, search processes) on cognitive performance (see, e.g., Borge-Holthoefer et al., 2011).
Fourth, the data can be used as a free association norm (Nelson et al., 2004). With a total of 80,000 free associations, they represent one of the largest publicly available resources of free association data in German (e.g., Schulte im Walde & Borgwaldt, 2015; Schulte im Walde et al., 2008). The data can therefore be used for the many purposes for which free association norms are traditionally used in the psychological literature. These include comparisons between norms, which may shed light on how variations in the elicitation procedure (e.g., one versus three responses per cue) or the aggregation of responses across different individuals affect the distribution of associations (De Deyne & Storms, 2008).
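A minimal sketch of how the individual-level responses could be pooled into norm-like association strengths is given below. Association strength is computed here as the proportion of all responses to a cue that match a given word, pooled across participants; this is one common definition and an assumption on our part, and the row format and participant labels are hypothetical.

```python
from collections import Counter, defaultdict


def association_strengths(rows):
    """Pool responses across participants.

    rows: iterable of (participant, cue, response) tuples.
    Returns {cue: {response: proportion of responses to that cue}}.
    """
    counts = defaultdict(Counter)
    for _participant, cue, response in rows:
        counts[cue][response] += 1
    strengths = {}
    for cue, responses in counts.items():
        total = sum(responses.values())
        strengths[cue] = {r: n / total for r, n in responses.items()}
    return strengths


# Toy rows, invented for illustration.
rows = [
    ("p1", "hund", "katze"), ("p1", "hund", "leine"),
    ("p2", "hund", "katze"), ("p2", "hund", "bellen"),
]
strengths = association_strengths(rows)
```

Restricting `rows` to each participant's first response per cue before pooling would allow a comparison with norms that elicit a single response per cue.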
While we see much potential for further uses of our data, some limitations apply. The data originate from only eight individuals, rendering between-person analyses difficult. Further, participants spent an average of 26.1 hours spread over almost 40 days on producing the free association data. The lengthy and repetitive nature of data collection could have led to tiredness or have been influenced by changing situational circumstances and systematic training effects that one cannot easily control for in these data.

NOTE
1 German SWOW data were downloaded on January 25, 2021.