The Open Anchoring Quest Dataset: Anchored Estimates from 96 Studies on Anchoring Effects

People’s estimates are biased toward previously considered numbers (anchoring). We have aggregated all available data from anchoring studies that included at least two anchors into one large dataset. Data were standardized to comprise one estimate per row, coded according to a wide range of variables, and are available for download and analyses online (https://metaanalyses.shinyapps.io/OpAQ/). Because the dataset includes both original and meta-data it allows for fine-grained analyses (e.g., correlations of estimates for different tasks) but also for meta-analyses (e.g., effect sizes for anchoring effects).


(1) BACKGROUND
What is the percentage of psychological papers with open data? Is it more or less than 50%? Is it more or less than the last two digits of your phone number? What do you think the percentage is? Although neither suggestion is informative, subsequent estimates will be biased toward them. More generally, when people make numeric estimates and consider any number beforehand, their estimates are drawn toward the previously considered number. This phenomenon is called anchoring, the anchoring effect, or anchoring-and-adjustment (or even adjustment and anchoring; Tversky & Kahneman, 1974, p. 1128). As it has been shown that even entirely random numbers bias estimates (e.g., Bergman et al., 2010) and that even experts succumb to anchoring effects (e.g., Englich et al., 2006; Northcraft & Neale, 1987), anchoring has been termed one of the most robust phenomena of (social) psychology (e.g., Kahneman, 2012, p. 119).
Despite its robustness, there is currently no generally accepted theoretical account for the wide range of different anchoring effects, a state of affairs not helped by contradictory findings and replication failures (e.g., Bahník, 2021a, 2021b; Harris et al., 2019). Furthermore, replication failures have drawn into question moderator findings (e.g., Big Five, Cheek & Norem, 2019; Schindler et al., 2021; intelligence, cognitive reflection, and self-control, Röseler, 2021; ego depletion, Röseler et al., 2020; and whether anchors need to be considered explicitly or whether an incidental presentation suffices, Röseler et al., 2021; Shanks et al., 2020; for a discussion, see also ).
To sum up, theories that explain anchoring and its moderators need to be developed, but the replicability of many moderator findings is uncertain. We set out to build a comprehensive empirical dataset upon which future researchers can build new anchoring theories. Specifically, we aggregated all openly available anchoring datasets that include numeric estimates from studies with at least two different anchors and supplemented these with datasets that we received from other researchers' publications and file-drawers.
In aggregating the data, we tried to capture the full breadth of anchoring paradigms by coding numerous design features and potential moderators. Pick any two anchoring experiments and the procedural details will differ almost every time, as each researcher makes their own decisions with respect to the absolute judgment question (e.g., How many words are there in this paragraph?), the anchors (e.g., Are there more or fewer than 10 words?), whether anchors are framed as random (e.g., Write down the last two digits of your phone number and think about whether there are more or fewer words) or as potentially relevant (e.g., another participant estimated the number of words to be 90), whether participants are paid for accurate estimates, what the unit of the estimate is (e.g., meters or miles), and many more parameters, most of which have not received attention in previous research.
The primary goal in constructing the dataset was to test whether susceptibility to anchors has been measured reliably. That is, we tested how likely it was that people who were susceptible to an anchor in one task were also susceptible to an anchor in another task. Measuring a person-specific susceptibility to anchoring effects is necessary for personality research: only if susceptibility can be measured reliably as a trait does it make sense to expect that it may correlate with personality traits such as intelligence (e.g., Bergman et al., 2010) or need for cognition (e.g., Epley & Gilovich, 2006). Additionally, we tested which features of the anchoring paradigm (e.g., anchor extremeness, type of task, response scale), of the study (e.g., incentives), and of the participants (age and gender) affect reliability. This is also why we chose to aggregate participant-level datasets instead of meta-analytical data (e.g., effect sizes only). The reliability of people's susceptibility to anchoring was tested in all paradigms with multiple items, and currently there is no evidence that susceptibility to anchoring is a trait. Note that psychometric properties such as reliability are rarely assessed in social psychological tasks, and the lack of reliability might also apply to other tasks (e.g., Berthet, 2021; Hedge et al., 2018; Parsons et al., 2018). Possible reasons for the poor reliability of anchoring are discussed by . Nevertheless, the aggregated data allow numerous other moderators, such as the role of incentives, nationality, or specific paradigm features, to be tested.
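As a toy illustration of this reliability question, the sketch below correlates hypothetical per-person susceptibility scores from two anchoring tasks (one signed score per participant and task; all names and values are invented, not taken from the dataset). If susceptibility were a stable trait, such cross-task correlations should be positive; near-zero correlations speak against a trait interpretation:

```python
import numpy as np

# Hypothetical per-person anchoring-susceptibility scores for two tasks
# (one score per participant; higher = stronger pull toward the anchor).
# These values are invented for illustration only.
task_a = np.array([0.2, 0.8, 0.2, 0.8])
task_b = np.array([0.3, 0.3, 0.7, 0.7])

# Cross-task (parallel-forms) correlation of susceptibility scores.
# A trait-like susceptibility would produce a clearly positive r.
r = np.corrcoef(task_a, task_b)[0, 1]
print(r)  # near zero for these invented scores
```

With participant-level rows, such correlations can be computed for any pair of items in the dataset, which is exactly what aggregating raw estimates (rather than effect sizes) makes possible.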
We plan to add more anchoring datasets in the foreseeable future. The dataset can be viewed, downloaded, and analyzed interactively via our ShinyApp available at https://metaanalyses.shinyapps.io/OpAQ/ to aid researchers with power analyses, study design, and literature (or data) search.

STUDY DESIGN
Each row in the dataset represents one trial (i.e., an estimate) by a person (participant_id) for a given anchoring item (anchoring_item) and a given anchor (anchor). There may be multiple estimates per person and study (i.e., within-subjects manipulation of the anchoring item) or only one (i.e., between-subjects manipulation of the anchoring item). Studies included up to 30 anchoring items, but some included only one item. An item-wise version of the data with Hedges' g per anchoring item per study is available online (https://osf.io/k745n/). Sample anchoring items with variable names and codings are provided in Figure 1.
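To make the row structure concrete, the following sketch builds a few hypothetical trial-level rows (the column names participant_id, anchoring_item, anchor, anchorhigh, and estimate come from the codebook; the item and all values are invented) and computes Hedges' g per anchoring item, analogous to the item-wise version of the data:

```python
import numpy as np
import pandas as pd

# Hypothetical trial-level rows mirroring the dataset's structure;
# the item name and estimates are invented for illustration.
trials = pd.DataFrame({
    "participant_id": [1, 2, 3, 4, 5, 6],
    "anchoring_item": ["berlin_prague"] * 6,
    "anchor":         [500, 500, 500, 100, 100, 100],
    "anchorhigh":     [1, 1, 1, 0, 0, 0],
    "estimate":       [420.0, 390.0, 450.0, 210.0, 250.0, 230.0],
})

def hedges_g(high, low):
    """Standardized mean difference between high- and low-anchor
    estimates, with the small-sample correction factor J."""
    n1, n2 = len(high), len(low)
    s_pooled = np.sqrt(((n1 - 1) * np.var(high, ddof=1) +
                        (n2 - 1) * np.var(low, ddof=1)) / (n1 + n2 - 2))
    d = (np.mean(high) - np.mean(low)) / s_pooled
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)  # small-sample correction
    return j * d

# One effect size per anchoring item (here: a single item).
g_per_item = trials.groupby("anchoring_item").apply(
    lambda d: hedges_g(d.loc[d["anchorhigh"] == 1, "estimate"],
                       d.loc[d["anchorhigh"] == 0, "estimate"]))
print(g_per_item)
```

With more items per study, the same grouping yields one g per item per study, which is the shape of the item-wise file on the OSF.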
Estimates were aggregated from numerous cross-sectional studies, which is why they vary with respect to study design, type (online versus lab study), and many other variables that we coded. An overview of all variables is provided in Table 1. Note: Other types of stimuli may be written information, images, videos, or combinations of such stimuli. Unanchored mean estimates were computed on the basis of a condition in which no anchor was presented, a pretest, or a previous study with a similar setting and similar participants. Unanchored mean estimates were not used if true values differed between participants (e.g., the height of their grandfather).

VARIABLE DESCRIPTION NOTES AND EXAMPLES
anchor: Anchor that was presented in the trial.
anchorhigh: 1 = high anchor, 0 = low anchor.
anchortype: 1 = explicitly random, 2 = fixed and provided without explanation, 3 = having some relevance to the target, 4 = self-generated. Examples: (1) Participants are involved in creating the random number, e.g., because it consists of the last digits of their phone number or because they drew a number from an urn or threw a die. (2) Was the telephone invented before or after 1830? (3) The television was invented in 1900. When was the telephone invented? (4) What is the boiling temperature of water on Mt Everest? (The self-generated anchor is 100°C, but 100 is not explicitly presented.)

comparative_question: 0 = no, 1 = yes. The comparative question refers to a question such as "Is the distance between Berlin and Prague more or less than X?" For this variable, it does not matter whether participants had to give explicit responses to this question or which responses they gave. Incidental or subliminal anchors, or anchors framed as "Hint: The true value is more than 50 km", were coded as 0 (no).
direction: 1 = direction of adjustment was known, 0 = direction was unknown. Direction was coded as 0 if there was a comparative question (even if all participants gave the same answer to that question). Direction was coded as 1 if participants were told something like "Prices for this product in this store are given to compensate for decreases during negotiation" or "The true value is lower than $100".
estimate: Estimate that was given by the participant in the respective trial.
experiment_type: 1 = online, 2 = lab, 3 = class, 4 = field, 5 = mixed. Class refers to experiments conducted as part of a lecture or seminar in a classroom or in a synchronous online meeting. If the class was run online, it was coded as 1 (online).
Descriptions of individual studies are available for all data that were part of a published research article or preprint (variables: reference, link).

TIME OF DATA COLLECTION
Secondary data were collected from May 2021 through September 2022. Original data were collected between 2010 and 2022. The variable yearofpublication states the latest year of collection for unpublished datasets.

LOCATION OF DATA COLLECTION
Data were collected worldwide and stem from European, Asian, North American, and South American participants.

SAMPLING, SAMPLE AND DATA COLLECTION
The dataset includes k = 96 studies from 57 references. The total sample size is N = 21,359 participants, who provided estimates for some of the 412 unique anchoring items, yielding a total of 88,914 trials.
There are 6,941 male, 9,243 female, and 81 non-binary participants; gender data for the remaining 5,094 participants are not available. The mean age of participants with available age data was 32.69 years (median = 28, N = 15,322). Of all participants, 8,978 did not receive monetary incentives for participating in the respective anchoring study, 11,255 received monetary incentives for participating in the study, and 694 received monetary incentives for accurate estimates. For 432 participants, estimates were coupled to prices they would pay or receive for products.

MATERIALS/SURVEY INSTRUMENTS
The dataset includes 412 anchoring items. True values are available for 355 (86.2%) of these items. Adjustment and absolute adjustment susceptibility scores were computed for all estimates. 0-1 scores and restricted 0-1 scores could only be computed for items with true values. A list of all items is available online (https://osf.io/g95hp/). Links to single datasets are provided in the variable "link" in the dataset and are available for 74 studies (77.1%).
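One plausible formalization of a 0-1 score (a sketch of the general idea, not necessarily the exact scoring used in the dataset) scales the estimate so that 0 corresponds to the item's true value and 1 to the anchor, which is why such scores require a true value; the restricted variant clips the score to the [0, 1] interval. The function names and the worked numbers below are invented for illustration:

```python
def zero_one_score(estimate, anchor, true_value):
    """0 = estimate equals the true value (no pull toward the anchor),
    1 = estimate equals the anchor (maximal pull). Values outside [0, 1]
    indicate over-adjustment or movement away from the anchor."""
    return (estimate - true_value) / (anchor - true_value)

def restricted_zero_one_score(estimate, anchor, true_value):
    """The same score, clipped to the interval [0, 1]."""
    return min(1.0, max(0.0, zero_one_score(estimate, anchor, true_value)))

# Invented example: true distance 350 km, high anchor 500 km, estimate 420 km.
score = zero_one_score(estimate=420, anchor=500, true_value=350)
print(score)  # 0.466..., i.e., the estimate moved about halfway to the anchor
```

Because the denominator is the anchor-to-truth distance, scores from items with very different units (kilometers, years, degrees) land on a comparable scale.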

QUALITY CONTROL
• All study-level data and the first trial of all trial-level data were checked by one of the authors. All trial-level data were furthermore checked by the respective Resources contributors.
• We checked whether anchoring effects differed between published and unpublished studies or between preregistered and non-preregistered studies and found no differences.
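A minimal sketch of such a published-versus-unpublished comparison, assuming item-level Hedges' g values grouped by publication status (the helper name and all numbers are invented), could use Welch's t statistic:

```python
import numpy as np

# Invented item-level effect sizes (Hedges' g) for two groups of studies;
# the grouping mirrors the published/unpublished check described above,
# but the values are illustrative only.
g_published = np.array([1.1, 0.9, 1.4, 0.8, 1.2])
g_unpublished = np.array([1.0, 1.3, 0.9, 1.1])

def welch_t(x, y):
    """Welch's t statistic for a two-group comparison with
    unequal variances and unequal group sizes."""
    vx = x.var(ddof=1) / len(x)
    vy = y.var(ddof=1) / len(y)
    return (x.mean() - y.mean()) / np.sqrt(vx + vy)

t = welch_t(g_published, g_unpublished)
print(t)  # close to zero here, i.e., no group difference in these toy data
```

In practice one would add degrees of freedom and a p-value (or a meta-regression with publication status as a moderator), but the grouping logic is the same.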

DATA ANONYMISATION AND ETHICAL ISSUES
No ethical approval was obtained for the data collection as only secondary data that had already been anonymized were used. No further steps to anonymize the data were taken.

DATA TYPE
Secondary data, processed data, aggregated data.

FORMAT NAMES AND VERSIONS
Datasets are available in .csv and .xlsx formats. We recommend opening them with GNU R (version 4 or above; R Core Team, 2018) or Microsoft Office Excel (version 2004 or above).

LICENSE
CC BY 4.0 (Creative Commons Attribution 4.0 International).

LIMITS TO SHARING
The data are not under embargo and do not contain identifying information. The data may be updated with further anchoring data at a later date.

PUBLICATION DATE
The first version of the dataset, including data from four anchoring studies, was published on 23 June 2021. The latest version has been available since 1 April 2022.

FAIR DATA/CODEBOOK
The datasets have been posted publicly on the Open Science Framework (OSF), documented with meta-data, and assigned a DOI. The code with which the datasets were created is available and can be run with open-source software (e.g., GNU R).

(4) REUSE POTENTIAL
Researchers can use the data for questions related to anchoring effects, but also, more generally, for questions about numeric estimation, advice-taking, or judgment and decision making.
As the data provide detailed information about anchoring paradigms, such as true values of anchoring items (where applicable), researchers can use different anchoring scores (e.g., the absolute difference between anchor and estimate) but also new scores to study the influence of any participant, item, or study features. In contrast to previous meta-analyses (Bystranowski et al., 2021; Li et al., 2021; Orr & Guthrie, 2006; Shanks et al., 2020; Townson, 2019), we did not find evidence of publication bias, and there was no difference in effect size between published and unpublished studies. We plan to maintain the dataset for the foreseeable future and will add data from new studies. Thus, the dataset may become a starting point for reviews of anchoring research but also a solid base upon which researchers can build new theoretical accounts of the topic. An empty file-drawer means that all of this author's studies that were completed before February 1st, 2022, are included in the OpAQ dataset.
Resources co-authors have provided data and checked them in the processed version of the OpAQ dataset. Data Curation co-authors have processed their datasets themselves. All other datasets were processed by LR, LW, ES, and PT.