(1) Overview

Collection Date(s)

January 2016.


Evidence suggests that the processing of homonyms, or words with multiple unrelated meanings (e.g., “money/river bank”), varies depending on the relative frequencies of their meanings [e.g., 1, 2, 3, 4]. Here, we provide the first meaning-frequency ratings in British English that were collected online using the eDom norming procedure [5]. This data set is essential to UK-based researchers examining ambiguity processing who must otherwise collect their own small-scale meaning-frequency ratings prior to experimental testing [e.g., 6, 7, 8] or use normative data in other dialects of English, despite recent evidence for dialectal differences in meaning frequency [9]. We also appear to be the first to have derived the homonymous status of the word stimuli (i.e., multiple unrelated word meanings) based on linguistic criteria, dictionary entries, and subjective ratings combined. Previous studies [e.g., 6, 10, 11, 12] used only one of these methods, even though there is no consensus as to which of them best captures the nature of homonymy [13]. We therefore provide a list of 100 homonyms that were carefully selected and validated for further research, with the intention of promoting consistency in the ambiguity-processing literature.

(2) Methods


One hundred monolingual native British-English speakers (55 females) completed the norming study (30 students, 70 professionals). All were born and resident in the UK (for more information on participants’ places of residence, see the Participants File). Individuals with any language-related difficulties or with educational qualifications below A Level were not recruited. Participants’ ages ranged from 19 to 39 (M = 28.1, SD = 5.3). All were recruited via Prolific Academic (http://prolific.ac) and received £3 in exchange for participation. Eight additional participants began but did not complete the study; their data were not included in the analysis.


One hundred words with the same spelling and pronunciation were selected from Armstrong, Tokowicz, & Plaut’s norms [5]. Ninety-three of the words had two separate entries in the Wordsmyth dictionary [14], which is suggestive of homonymy. In the remaining cases (n = 7), the dictionary listed a third but highly uncommon meaning of the word (e.g., “sack” denoting a light-coloured dry sherry made in Spain). Most of the words had either noun-noun (n = 47) or noun-verb (n = 36) interpretations. The remaining words had noun-adjective (n = 9), verb-adjective (n = 5), and verb-verb (n = 3) interpretations. Our stimulus selection criteria excluded homonyms for which one of the two meanings was highly infrequent (e.g., “frail” denoting a basket made of dried rushes), known only to a specific population (e.g., “bleak” denoting a small European river fish), or archaic (e.g., “burden” denoting the refrain of a song). Using information about word-meaning usage in different dialects of English in the Oxford Dictionary [15], we excluded words with meanings that seem to be used exclusively by British-English speakers (e.g., “chap” denoting a man) or by American-English speakers (e.g., “bus” denoting to clear restaurant tables). This was intended to allow both UK- and USA-based researchers to use the same list of carefully selected homonyms. We also excluded items that were ambiguous because the word form was an abbreviation (e.g., “log”) or a past simple/participle (e.g., “dove”). Stimulus selection was further constrained by word length (3–6 letters) and word-form frequency (4–60 occurrences per million) in the British National Corpus [16]. Overall, the stimuli were as homogeneous as possible with respect to 14 lexical and semantic variables (e.g., word-form frequency, imageability, and the number of related word senses), allowing researchers to compile and match sets of homonyms with balanced and unbalanced meaning frequencies (see the Stimulus Properties File).

The homonymous status of the stimuli was confirmed on the basis of meaning-relatedness ratings collected from 30 monolingual native British-English speakers [16 females, aged 19–38 (M = 29.9, SD = 4.8)] with no language-related difficulties and at least A-Level educational qualifications. All participants were recruited via Prolific Academic (http://prolific.ac) and did not take part in the norming study. In this online pre-test, participants read the definitions of the two word meanings (as they appear in the Wordsmyth dictionary) and rated their semantic relatedness on a 7-point Likert scale (where 1 denoted “highly unrelated”). Participants rated one word at a time in a pseudo-randomised order. The homonyms were rated along with 100 non-homonymous items taken from [2] and [6]. For these fillers, participants rated the relatedness between two different senses of a word (e.g., “maple” denoting either a tree or the wood). The order of the definitions was pseudo-randomised for each item and rater. The average relatedness ratings for the homonyms are given in the Stimulus Properties File.


The homonyms were normed using an online survey designed in Qualtrics (http://qualtrics.com). We followed the eDom procedure [5] very closely. Participants were instructed to estimate, as a percentage, how often each meaning of a homonym was implied when they encountered that word. The instructions were taken directly from the eDom manual available at http://edom.cnbc.cmu.edu. Participants rated one word at a time in a pseudo-randomised order. First, participants were presented with a word in print and indicated whether they knew the word. Wordsmyth dictionary definitions [14] of the two meanings of the word were presented in the panel below in a pseudo-randomised order (an example of a trial is available at http://osf.io/7k3eh/). Participants were told that the order and length of the definitions did not reflect the relative frequencies of the meanings. Participants had an opportunity to list up to two additional meanings of a given word if the presented definitions were not exhaustive. We provided two text-entry boxes for this purpose. If participants knew more than two additional meanings, they were told to list the two that they had encountered/used the most. Finally, participants rated the relative frequency of each meaning (those in the dictionary definitions and those that they may have added themselves) using percentage scores (0–100). The percentage scores had to sum up to 100 across all the meanings.

On average, participants provided 12.6 (SD = 11.4) definitions of additional meanings for the homonyms, which is suggestive of their thorough approach to the task. These “additional-meaning responses” appeared across 91 words and constituted, on average, 32.0% (SD = 15.6) of participants’ encounters with a given homonym. Most of the generated definitions seemed to pertain to highly frequent variants of the main word meanings (e.g., “squash” denoting a sport) which were not explicitly conveyed in the presented definitions (to press, beat, or crush into a pulp or a flat mass). We established whether these additional-meaning definitions referred to a completely unrelated meaning of a homonym, or whether they referred to a semantically related sense of the presented meanings. In the latter case, we added the frequency rating of such additional meanings (e.g., “mate” denoting a friend) to the frequency rating of the presented meaning (“mate” denoting a marriage partner). Unlike [5], we determined the semantic relatedness between the additional and presented meanings based on subjective ratings rather than entries in the Wordsmyth dictionary [14]. All 1,263 additional-meaning responses were coded independently by the first author (GM) and a linguist (EO) who was not involved in any other stages of the research. Each response was coded as either unrelated to either of the presented word meanings, highly related to the first meaning, or highly related to the second meaning.
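The folding of related additional-meaning ratings into the presented meanings can be sketched as follows. This is a minimal illustration rather than the script used for the norms: the function name `fold_additional_ratings`, the labels `m1`, `m2`, and `unrelated`, and the example percentages are all hypothetical.

```python
def fold_additional_ratings(presented, additional):
    """Combine one participant's ratings for one homonym.

    presented  -- dict with the percentages given to the two presented
                  meanings, e.g. {"m1": 30, "m2": 10}
    additional -- list of (percentage, code) pairs for participant-generated
                  meanings, where code is "m1", "m2", or "unrelated"
                  (the three coding categories described in the text)
    """
    totals = {"m1": presented["m1"], "m2": presented["m2"], "unrelated": 0.0}
    for percentage, code in additional:
        if code not in totals:
            raise ValueError(f"unknown relatedness code: {code!r}")
        # A related sense is added to the presented meaning it belongs to;
        # an unrelated meaning is kept apart.
        totals[code] += percentage
    # The survey required the percentages to sum to 100 across all meanings.
    if abs(sum(totals.values()) - 100) > 1e-9:
        raise ValueError("ratings must sum to 100")
    return totals


# Hypothetical trial: 30% and 10% for the presented meanings, plus a
# participant-generated sense (60%) coded as related to the first meaning.
print(fold_additional_ratings({"m1": 30, "m2": 10}, [(60, "m1")]))
```

Keeping the unrelated responses in a separate bucket mirrors the distinction drawn above between related senses and genuinely new meanings.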

One additional-meaning definition was deleted from the norms due to an insufficient level of detail. There was substantial inter-rater agreement on the relatedness between the additional and presented meanings of the words (κ = .73, SE = 0.02, p < .001). The raters disagreed on only 14.3% of the responses, which mostly pertained to idiosyncratic definitions of the words. This disagreement was resolved by the second author (EK). Out of the remaining 1,262 responses, 45.6% of the definitions of the additional meanings, with a mean meaning-frequency rating of 40.5% (SD = 22.1), referred to a sense of the first meaning of the word. In contrast, only 5.2% of the definitions, with a mean frequency rating of 36.5% (SD = 21.3), referred to a sense of the second meaning. The meaning-frequency ratings, averaged across the raters, are given in the Norms File.
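For readers who wish to run a comparable agreement check on their own codings, the sketch below computes Cohen's κ for two raters using only the Python standard library. The source does not state which software produced the reported κ, and the short code lists here are made-up examples, not data from the norms.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two raters who coded the same responses."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty set")
    n = len(codes_a)
    # Proportion of responses on which the raters gave the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codings of six responses into the three categories
# used in the text.
rater_1 = ["m1", "m1", "m2", "unrelated", "unrelated", "m1"]
rater_2 = ["m1", "unrelated", "m2", "unrelated", "unrelated", "m1"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.74
```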

The remaining 49.2% of the definitions, with a mean meaning-frequency rating of 25.8% (SD = 16.8), were considered unrelated to either of the presented meanings. We reviewed each of these definitions and found consistent additional meanings, listed by at least five participants, for 21 of the homonymous words. Across the 21 words, these unrelated meanings were listed, on average, by 21.4 (SD = 16.8) participants, and had a mean meaning-frequency rating of 37.8% (SD = 16.3). Descriptive statistics and definitions of these additional meanings are given in the Additional Meanings File.

To demonstrate that the present norming study was indeed warranted, we compared our British-English ratings to the analogous American-English ratings [5] for the first meaning of each homonym in the Wordsmyth dictionary [14]. This analysis was not initially possible due to different approaches to the categorisation of additional-meaning responses in the two norming studies. Armstrong et al. [5] added the frequency ratings of additional meanings (e.g., “plane” denoting a vehicle) to the ratings of the main meanings presented in the study (a flat or level surface) if the two were listed as word senses in the Wordsmyth dictionary. Here, in contrast, the ratings of the two meanings were summed only if there was substantial conceptual overlap between them. This resulted in large differences in the ratings between the norms for 15 homonyms that had additional participant-generated meanings of questionable semantic relatedness to the presented word meanings. However, for the sake of the between-dialects comparison, we adopted the approach in Armstrong et al. and adjusted the ratings of these 15 words accordingly (both the adjusted and the American-English ratings are given in the Norms File). The analysis showed fairly considerable dialectal variation (R² = .69) in meaning dominance between the two norms, which is remarkably similar to that between the norms in Spanish dialects (R² = .72) [9].
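As an illustration only, the sketch below shows one way to obtain an R² value for two sets of first-meaning ratings. It assumes that R² is the squared Pearson correlation between the two norms; the source does not state how the reported value was computed. The first two rating pairs reuse the “pupil” and “camp” percentages quoted in this paper, and the remaining pairs are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Proportion of variance shared by the two sets of ratings."""
    return pearson_r(x, y) ** 2


# Ratings (%) of the same meaning in each dialect, one pair per homonym.
british = [57, 66, 80, 45, 92]    # last three values are hypothetical
american = [31, 86, 75, 50, 88]   # last three values are hypothetical
print(round(r_squared(british, american), 2))
```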

We suggest that the differences in the relative meaning frequencies of homonymous words reflect genuine differences in how British-English and American-English speakers use and encounter these words. Inspection of the items with large dialectal differences in the ratings revealed no specific pattern. Instead, we found a subset of homonyms (n = 16) with meanings far more common in one dialect than in the other. For instance, the student-related meaning of “pupil” is used more frequently by British-English speakers (57% vs. 31%), whereas the campsite-related meaning of “camp” seems to be highly dominant in American-English but not in British-English (86% vs. 66%). We surmise that such differences in meaning dominance only for a few of the homonyms reflect a number of cultural and linguistic factors (e.g., using synonyms to avoid ambiguity, prevalence of the word referents) that are specific to these particular words rather than to the entire dialect. Research into the nature of these factors and their contribution to dialectal differences in how words and their meanings are used is certainly worth further scrutiny.

Quality Control

The data revealed a very small proportion (0.3%) of instances in which participants were unfamiliar with the word. These “null responses” appeared across 20 items and 12 participants. The highest number of null responses was seven (both per item and per participant), which did not warrant any data deletion and reassured us that our participant group had a suitable linguistic background. The null responses were excluded from the norms as participants did not provide any frequency ratings for these words. Participants were given a maximum of 90 min to complete the study in order to ensure that they attended to the task. The average completion time was 43 min. Each participant normed all the words and provided several additional-meaning responses. These responses were coded and checked by two raters (see the Procedure subsection). Average meaning-frequency and meaning-relatedness ratings were computed using Excel functions and checked twice.

Although we normed the stimuli using a more diverse group of native British-English speakers in a web-based study, our estimates of meaning frequency show fairly considerable inter-rater reliability. To quantify meaning dominance, we subtracted the rating of the less frequent meaning from the rating of the more frequent meaning and then divided the result by the rating of the more frequent meaning. We computed this β value, a formal measure of meaning dominance introduced by Armstrong et al. [5], for each individual word and participant and then correlated each participant’s data with the group means of β across the 100 words. Spearman’s rank correlation coefficients ranged from .02 to .85 (M = .67, SE = .02, N = 100, mean R² = .45), indicating inter-rater consistency in meaning-frequency ratings similar to that in the American-English [mean R² = .49; 5] and European Spanish eDom norms [mean R² = .48; 9]. These results suggest that both web-based and lab-based eDom norming procedures provide reliable estimates of meaning frequency, regardless of sample characteristics (i.e., homogeneous student vs. heterogeneous non-student populations). There was no indication that participants’ ratings differed from those of the group depending on age, employment, geographical location, or other characteristics. Instead, the inter-rater variation in meaning-dominance norms appears to reflect inherent and unsystematic differences in native speakers’ linguistic environment and their actual experience with the meanings of ambiguous words [17].
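The β computation and the participant-to-group correlation described above can be sketched as follows. This is a minimal standard-library illustration, not the analysis script behind the reported values. It assumes complete data (no null responses) and group means that include every participant, neither of which is specified in the text. All function names and example ratings are hypothetical.

```python
import math


def beta(rating_1, rating_2):
    """Meaning dominance: (more frequent - less frequent) / more frequent."""
    high, low = max(rating_1, rating_2), min(rating_1, rating_2)
    if high == 0:
        raise ValueError("beta is undefined when both ratings are 0")
    return (high - low) / high


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def consistency_with_group(ratings):
    """ratings maps participant id -> list of (rating_1, rating_2) per word.

    Returns each participant's Spearman correlation between their own
    beta values and the group mean beta of each word.
    """
    betas = {pid: [beta(a, b) for a, b in pairs]
             for pid, pairs in ratings.items()}
    n_words = len(next(iter(betas.values())))
    group_means = [sum(b[w] for b in betas.values()) / len(betas)
                   for w in range(n_words)]
    return {pid: spearman_rho(b, group_means) for pid, b in betas.items()}


# Hypothetical ratings from three participants for four homonyms.
example = {
    "p1": [(90, 10), (60, 40), (75, 25), (50, 50)],
    "p2": [(80, 20), (55, 45), (70, 30), (65, 35)],
    "p3": [(95, 5), (50, 50), (60, 40), (85, 15)],
}
for pid, rho in consistency_with_group(example).items():
    print(pid, round(rho, 2))
```

Under this definition β is 0 for perfectly balanced meanings and 1 when one meaning accounts for every encounter.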

Ethical issues

All participants gave informed consent and could withdraw during or two weeks after their participation. Participants’ current place of residence was established based on the first half of their UK postcode using the Royal Mail postcode finder (http://www.royalmail.com/find-a-postcode). Participation in this study was completely anonymous. The study received ethical approval from the School of Psychology, University of Leeds Ethics Committee.

(3) Dataset description

Object name

British eDom norms

Data type

The data set contains primary (information about the participants), secondary (stimulus properties) and processed data (meaning-frequency ratings).

Format names and versions

The data set is available in the .csv format. The set contains five files: Variables, Norms, Additional meanings, Stimulus properties, and Participants.

Data Collectors

Greg Maciejewski collected all the data.







Repository location

Open Science Framework, http://osf.io/7k3eh/

Publication date


(4) Reuse potential

Our stimulus set is valuable to researchers examining ambiguity processing. We compiled a set of homonyms that are relatively homogeneous with respect to a large number of variables that are known to affect word processing. Furthermore, we provide data on how unrelated the multiple meanings of the words are and how many of these meanings native speakers actually know. The data set also provides UK-based researchers with the first meaning-frequency ratings in British English. Given that meaning frequency modulates homonym processing [1, 2, 3, 4], we recommend researchers use meaning-frequency norms to refine their future ambiguity-processing research. The ratings come from a large and diverse group of participants, and thus make the norms highly representative of native British-English speakers and their different linguistic experience. Finally, our data set together with the analogous American-English norms [5] might be of interest to those exploring dialectal variation in the estimates of psycholinguistic variables.