Taxonomic and Thematic Relatedness Ratings for 659 Word Pairs

These data comprise taxonomic and thematic relatedness ratings for 300 target words, each paired with taxonomically and/or thematically related words, for a total of 659 word pairs. The pairs come from four prior studies and were normed through online surveys administered via Amazon's Mechanical Turk. Each pair was rated on both its taxonomic similarity and its thematic relatedness. The data are provided as comma-separated (.csv) and R data (.rdata) files and can be used to create new studies investigating taxonomic and thematic semantic processing.


Context
Collection Date(s)
September-November 2014 and June-July 2015.

Background
Understanding how taxonomic relations (based on feature similarity) and thematic relations (based on co-occurrence in events) operate in the mind has been a long-standing topic of interest among cognitive psychologists. Investigators have used a variety of tasks to understand how these relations operate and coincide in the mind, including free sorting, forced-choice categorization, similarity ratings, and semantic decision paradigms. Recently, the neural organization and processing of taxonomic and thematic semantic memory have also become a growing topic of debate among cognitive scientists [1,4,6]. To date, however, there has not been a consistent quantitative definition of taxonomic and thematic relatedness. Feature overlap or biological taxonomy is often used to define taxonomic similarity. For thematic relatedness, some researchers have used latent semantic analysis while others have relied on their expert opinion, so materials have been inconsistent across studies [1,2,3]. This inconsistency may be one reason for the conflicting results regarding the organization and neural mechanisms responsible for taxonomic and thematic processing [1,2,6]. In this dataset, word pairs were collected from four independent studies investigating differences in the processing of taxonomic and thematic semantics and were re-normed by obtaining the strength of the taxonomic and thematic relationships for each pair of words. This was done to provide the field with a common set of normed items for future research.

Sample
Participants were recruited through Amazon's Mechanical Turk services. All participants were from the United States and were paid for their participation in the surveys (payment varied depending on survey length). Each participant was limited to completing only one survey. Participants' responses were included only if they completed 90% or more of the survey, were from the United States of America, and spoke English as their primary language. Responses from a total of 157 participants were included in the final data (85 females and 72 males). Ages ranged from 19 to 67 (M = 34.8, SD = 10.63). All but four participants were native English speakers. The four participants who were not native English speakers learned English by the age of 10.

Materials
The 659 word pairs in this data set come from four studies conducted by independent research groups. The norming was done through a series of 12 online surveys (6 taxonomic and 6 thematic surveys). Each survey comprised 100 to 300 randomly assigned word pairs, and the pairs were randomly ordered within each survey.

Procedures
Through Amazon's Mechanical Turk web services, participants were provided a link to an external survey hosted on either Survey Monkey or Qualtrics. Each survey began with a brief demographic questionnaire, followed by the instructions for the survey. There were two types of surveys (one for taxonomic similarity ratings and one for thematic relatedness ratings), each with its own instructions.
The instructions for the taxonomic survey were:
Each word pair was included in each of the two survey types. Once the surveys were completed, the data were downloaded and collated, and average taxonomic similarity and thematic relatedness ratings were computed for each word pair. Standard deviations of the taxonomic and thematic ratings were also computed for each item. In addition, a difference score was calculated for each word pair by subtracting its thematic relatedness rating from its taxonomic similarity rating. Thus, pairs with high taxonomic similarity and low thematic relatedness have positive difference scores (6 being the maximum); pairs with high thematic relatedness and low taxonomic similarity have negative difference scores (-6 being the minimum); and pairs with difference scores near 0 are approximately equally taxonomically similar and thematically related (this includes both approximately equally high and equally low ratings).
Participants' individual ratings of word pairs are also provided in a separate data table. This table includes a participant id number, the participant's gender, age, native language, the age at which they learned English if it is not their native language, the country they live in, the rating value, and the rating type (i.e., taxonomic or thematic). In total, this table contains 27,317 individual word pair ratings. Note that two participants (identified in the individual data set as id40 and id138) did not report their ages; their ages are denoted as NA in the data set.
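The aggregation described above (per-pair means, standard deviations, and difference scores) can be sketched in Python as follows. This is a minimal illustration, not the authors' actual processing script: the word pairs, rating values, and field names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical individual ratings as (word_pair, rating_type, rating) tuples.
# The pairs and values are invented for illustration only.
ratings = [
    ("dog-cat", "taxonomic", 6),
    ("dog-cat", "taxonomic", 7),
    ("dog-cat", "thematic", 3),
    ("dog-cat", "thematic", 4),
    ("dog-leash", "taxonomic", 1),
    ("dog-leash", "taxonomic", 2),
    ("dog-leash", "thematic", 7),
    ("dog-leash", "thematic", 6),
]


def aggregate(rows):
    """Compute per-pair mean, SD, and difference score (taxonomic - thematic)."""
    grouped = defaultdict(lambda: {"taxonomic": [], "thematic": []})
    for pair, rating_type, value in rows:
        grouped[pair][rating_type].append(value)

    norms = {}
    for pair, by_type in grouped.items():
        tax, thm = by_type["taxonomic"], by_type["thematic"]
        norms[pair] = {
            "tax_mean": mean(tax),
            "tax_sd": stdev(tax),  # sample SD: an assumption
            "thm_mean": mean(thm),
            "thm_sd": stdev(thm),
            # Positive: more taxonomic; negative: more thematic.
            "difference": mean(tax) - mean(thm),
        }
    return norms


norms = aggregate(ratings)
for pair, row in norms.items():
    print(pair, row["difference"])
```

In this toy input, the first pair receives a positive difference score (predominantly taxonomic) and the second a negative one (predominantly thematic), mirroring the interpretation given in the text.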

Quality Control
All surveys were distributed through Amazon's Mechanical Turk web services and completed on either Survey Monkey or Qualtrics. Surveys were screened to ensure that participants did not just provide random responses or the same response repeatedly.
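The text does not specify how the screening was carried out. As one possible approach to the "same response repeatedly" check, the sketch below flags a participant whose longest run of identical consecutive ratings covers a large share of the survey. The function names and the 0.5 threshold are hypothetical choices for illustration, not the criteria used for this dataset.

```python
def longest_run(responses):
    """Return the length of the longest run of identical consecutive responses."""
    longest = 0
    current = 0
    previous = object()  # sentinel that equals no real response
    for value in responses:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def is_straightlined(responses, max_run_fraction=0.5):
    """Flag a response sequence whose longest identical run exceeds the given fraction.

    max_run_fraction is a hypothetical threshold chosen for illustration.
    """
    if not responses:
        return False
    return longest_run(responses) / len(responses) > max_run_fraction


# Hypothetical response sequences from two participants.
varied = [1, 5, 3, 7, 2, 6, 4, 1, 5, 3]
repeated = [4, 4, 4, 4, 4, 4, 4, 4, 2, 4]

print(is_straightlined(varied))
print(is_straightlined(repeated))
```

A check like this only addresses repeated responding; detecting random responding would need a different approach, such as comparing a participant's ratings against the per-pair means.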

Ethical issues
This study was deemed exempt from IRB review because it involved minimal-risk survey procedures without collection of individually identifying information. All participants were paid for their work.

(3) Dataset description

Object name
The dataset appears in the repository as listed below and can be downloaded in the following formats:
Aggregated Data: TxThmNorms.rdata, TxThmNorms.csv
Individual Rater Data: IndividualRatingsTxThm.rdata, IndividualRatingsTxThm-1.csv

Data type
Processed data.

Format names and versions
The TxThmNorms and IndividualRatingsTxThm data sets are provided in .rdata and .csv formats. Each .rdata file contains a single data frame analogous to the table in the corresponding .csv file. For examples of the data sets, see Table 1 and Table 2 below.

(4) Reuse potential
These aggregated norms have several forms of reuse potential. First, they could be used to investigate the organization and processing mechanisms of taxonomic and thematic semantic memory in typical adults and children, and in neurologically impaired individuals. Second, they could be used as comparison/control data for investigating typical and atypical conceptual development and cognitive aging, and for testing semantic processing in acquired neurological impairments such as stroke and dementia, including helping to track disease progression or recovery. Finally, the norms could provide preliminary guidance for studies using picture stimuli with high name agreement (though we recommend directly norming the pictures because pictures can evoke somewhat different performance in semantic tasks [5]). The individual ratings could be used, among other possibilities, to examine gender or age differences in ratings of taxonomic similarity or thematic relatedness.
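As an illustration of reusing the aggregated norms to build stimulus sets, the sketch below sorts word pairs into predominantly taxonomic, predominantly thematic, and balanced groups by their difference scores. The inline CSV excerpt, its column names, and the cutoff of 2.0 are all hypothetical; the column names in TxThmNorms.csv should be checked before adapting this.

```python
import csv
import io

# Hypothetical excerpt standing in for the aggregated norms table.
# Column names and values are assumptions, not copied from TxThmNorms.csv.
sample_csv = """pair,tax_mean,thm_mean,difference
pair_a,6.5,2.0,4.5
pair_b,1.5,6.0,-4.5
pair_c,5.0,5.2,-0.2
"""


def classify(difference, threshold=2.0):
    """Label a pair by its difference score (taxonomic minus thematic).

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    if difference >= threshold:
        return "taxonomic"
    if difference <= -threshold:
        return "thematic"
    return "balanced"


rows = list(csv.DictReader(io.StringIO(sample_csv)))
labels = {row["pair"]: classify(float(row["difference"])) for row in rows}

for pair, label in labels.items():
    print(pair, label)
```

To run this against the real file, replace `io.StringIO(sample_csv)` with an opened file handle and adjust the column names accordingly. Because a difference score near 0 covers both equally high and equally low ratings, a study needing strongly related balanced pairs would also need to filter on the two mean ratings.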