Background

Meta-analysis is an indispensable technique for summarizing a set of similar primary studies that share a common topic. Key goals include determining the average effect size (Hedges & Pigott, 2001), determining whether study characteristics moderate the effect size (Hedges & Pigott, 2005), and quantifying the extent of heterogeneity in the observed effects (Higgins & Thompson, 2002; Ioannidis et al., 2007). These applications make the role of meta-analysis pivotal to psychology research. Accordingly, it is essential to (1) teach psychological researchers how to be informed consumers of meta-analyses and (2) develop improved statistical methods to conduct meta-analyses. Supporting such efforts requires high-quality, easily accessible data.

To help meet these needs, we developed the R package psymetadata, which contains 22 recent open-source datasets from meta-analyses of the psychological literature. These data were collected through the Open Science Framework (OSF) and span areas such as social, developmental, and cognitive psychology, among others. The purpose of collecting these datasets was twofold: first, to provide psychologists with easily accessible empirical data that facilitate “real-world” examples in pedagogical settings concerning meta-analysis; and second, to enable methodological researchers to illustrate novel statistical techniques on archetypal psychological data. For example, the data contained in this package can be used throughout an introductory meta-analysis course or in a research article to show that a particular meta-analytic method has desirable statistical properties. Further, when conducting a Bayesian meta-analysis, informative priors can be elicited based on such data (e.g., van Erp et al., 2017). Notably, similar efforts already exist, such as the metadat R package (White et al., 2021), albeit without a focus on psychological data. Therefore, the psymetadata package is of wide relevance to learners, teachers, researchers, and practitioners of meta-analysis alike.

Methods

Study design

Traditionally, meta-analytic techniques include only one effect size per study because, otherwise, the classical assumption of independence among effect sizes is violated. However, the collected datasets may contain more than one effect size per study, that is, non-independent effect sizes. As we describe in the section Reuse Potential, this feature provides great flexibility in how the datasets can be used. Moreover, each dataset contains several variables suitable for use in moderator analysis. Lastly, in meta-analysis, the primary outcome under study is the average effect size. The effect sizes in psymetadata currently include Hedges’ g, Cohen’s d, and Pearson’s r, among others.
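
For instance, the dependency structure of a given dataset can be inspected directly in R. The following minimal sketch (assuming the package has been installed) counts the effect sizes contributed by each study in the coles2019 dataset described later in this article:

library(psymetadata)

# number of effect sizes contributed by each study; counts greater
# than one indicate dependent effect sizes
table(coles2019$study_id)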

Time of data collection

All datasets were collected between March 2021 and May 2021.

Location of data collection

The collected datasets were all freely available on the OSF. The original OSF repository for each dataset can be found in the package documentation (https://cran.r-project.org/web/packages/psymetadata/psymetadata.pdf).

Sampling, sample and data collection

A convenience sample was obtained by searching the keyword “meta-analysis” on the OSF and clicking through as many results as we could manage over the period of data collection. Each result was checked for whether it had openly available data and whether a codebook could be found either in the manuscript or the supplemental materials. We only collected datasets where there was a corresponding codebook. This procedure resulted in 22 datasets.

Materials/Survey instruments

No materials were used aside from the computers used to search, download, and clean the data.

Quality Control

The data were collected with diligence and care. All dataset names follow the convention [firstauthor][year]. For example, the dataset from Barroso et al. (2021) was named barroso2021 (see Table 1 for all authors and years). All common variables among the datasets were renamed according to mainstream conventions stemming from the popular metafor R package (Viechtbauer, 2010). Specifically, each dataset contains at least the following variables.

  • study_id: Unique identifier for each study.
  • es_id: Unique identifier for each effect size.
  • yi: The estimated effect size.
  • vi: The estimated variance of the effect size.

Variables that were included in an original dataset but whose definitions were either unavailable or unclear were excluded from the final version included in psymetadata.
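
As a brief sketch, these shared variables can be inspected as follows (assuming the package has been installed from CRAN):

library(psymetadata)

# the four columns shared by every dataset in the package
head(barroso2021[, c("study_id", "es_id", "yi", "vi")])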

Table 1

Datasets included in psymetadata.


AUTHOR(S)                  YEAR  TOPIC
Agadullina & Lovakov       2018  Out-group entitativity and prejudice
Aksayli et al.             2019  The cognitive and academic benefits of Cogmed
Barroso et al.             2021  Math anxiety and math achievement
Coles et al.               2019  Facial feedback
Gamble et al.              2019  Specificity of future thinking in depression
Gnambs                     2020  The color red and cognitive performance
Lowe et al.                2021  The advantage of bilingualism in children
MacCann et al.             2020  Student emotional intelligence and academic performance
Maldonado et al.           2020  Age differences in executive function
The ManyBabies Consortium  2020  Variation in infancy research
Klein et al.               2018  Variation in replicability across samples and settings
Noble et al.               2019  Shared reading and language development
Nuijten et al.             2020  Intelligence
Sala et al.                2019  Working-memory training and near- and far-transfer measures
Schroeder et al.           2020  Transcranial direct current stimulation and inhibitory control
Spaniol & Danielsson       2020  Executive function components in intellectual disability
Stasielowicz               2019  Goal orientation and performance adaptation
Stasielowicz               2020  Cognitive ability and performance adaptation
Steffens et al.            2021  Social Identity Theory and leadership
Stramaccia et al.          2020  Memory suppression
Wibbelink et al.           2017  Juvenile recidivism

Note: Stasielowicz (2019) contained two datasets.

Data anonymisation and ethical issues

The collected datasets are all secondary and only contain study-level information.

Existing use of data

The collected datasets all belonged to a primary meta-analysis. These original works are cited in the references and are listed in the documentation of the psymetadata package. Further, an applied example using one of the datasets (Gnambs, 2020) was included in Williams et al. (2021).

Dataset description and access

All datasets in psymetadata share a similar structure. To demonstrate this structure, Code 1 shows a truncated version of the coles2019 dataset originally used to conduct a meta-analysis examining the facial feedback hypothesis (Coles et al., 2019). As previously mentioned, each dataset contains the columns es_id, study_id, yi, and vi. These variables correspond to the unique identifier for the effect size, the unique identifier for the study from which the effect size was collected, the effect size, and the variance of the effect size, respectively. Additionally, most datasets contain information pertaining to the year of publication and the authors of the study from which the effect sizes were obtained. In the coles2019 dataset, the year column denotes the year the effect size was published. The remaining variables, in this case file_drawer and w_v_b, correspond to moderator variables. For example, one may want to test whether the average effect size varies according to whether a study was published in a peer-reviewed journal (file_drawer) or whether the study design was within- or between-participants (w_v_b).


CODE 1. EXAMPLE DATA FROM COLES ET AL. (2019).

es_id  study_id       yi      vi  year  file_drawer    w_v_b
    1         1    0.020   0.013  2013          yes   within
    2         2    0.179   0.050  1998           no  between
    3         3    1.019   0.085  2014           no  between
    4         3    0.074   0.069  2014           no  between
    5         3    1.074   0.131  2014           no  between
    6         3    0.202   0.079  2014           no  between
  ...       ...      ...     ...   ...          ...      ...
  284       138  -0.0049  0.0098  2009           no   within
  285       139   0.5374  0.0440  1997           no   within
  286       140  -0.2377  0.1222  2002           no  between
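
As an illustrative sketch (not a reanalysis of Coles et al., 2019), such a moderator analysis could be carried out with the metafor package (Viechtbauer, 2010), here using a three-level model to account for the dependent effect sizes:

library(psymetadata)
library(metafor)

# three-level model: effect sizes (es_id) nested within studies (study_id),
# with study design (within- vs. between-participants) as a moderator
fit <- rma.mv(yi, vi,
              mods = ~ w_v_b,
              random = ~ 1 | study_id/es_id,
              data = coles2019)
summary(fit)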

Repository location

The data from psymetadata can be accessed by downloading the R package psymetadata from CRAN, using install.packages("psymetadata"), or from GitHub. Alternatively, individual files may be downloaded from GitHub.
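
For example, installing from CRAN and listing the included datasets can be done as follows:

# install from CRAN and load
install.packages("psymetadata")
library(psymetadata)

# list all datasets included in the package
data(package = "psymetadata")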

Object/file name

All of the datasets are saved in the "data" folder of the psymetadata GitHub repository, using the format [firstauthor][year].rda.

Data type

All datasets are secondary.

Format names and versions

The data are saved in the R data format (i.e., the .rda file extension). Accessing files in this format requires using the R programming language (R Core Team, 2021).
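
For readers who download an individual .rda file instead of installing the full package, the file can be loaded into an R session as follows (assuming it has been saved to the working directory):

# loads the stored object (e.g., coles2019) into the current workspace
load("coles2019.rda")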

Language

The data are saved in American English.

License

The data are distributed under the GNU General Public License Version 2.

Limits to sharing

There are no limitations on the sharing of this data.

Publication date

The psymetadata package was originally published to CRAN on 31/05/2021.

FAIR data/Codebook

For each dataset, the variable names, variable definitions, topic, and reference(s) have been documented. The documentation is available for all datasets (https://cran.r-project.org/web/packages/psymetadata/psymetadata.pdf). The documentation of a given dataset can also be accessed using the ? function in R (e.g., ?coles2019).
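
For example (assuming the package has been loaded):

library(psymetadata)

# documentation for a single dataset
?coles2019

# documentation index for the entire package
help(package = "psymetadata")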

Reuse potential

The psymetadata package contains 22 datasets that contain multiple, dependent effect sizes and moderator variables. This affords a great deal of flexibility in teaching a variety of common techniques. For instance, if one either averages effect sizes within studies or selects a single effect size per study, then classical fixed-effects and random-effects meta-analysis (Borenstein et al., 2009) can be taught. On the other hand, these datasets may be used to demonstrate methods that explicitly account for dependent effect sizes, such as robust variance estimation (Hedges et al., 2010) or three-level meta-analysis (Assink & Wibbelink, 2016). Of course, additional techniques may be taught with these data, including moderator analysis (Hedges & Pigott, 2005), subgroup analysis (Borenstein & Higgins, 2013), testing for publication bias (Copas, 1999; Sutton, 2000), and Bayesian meta-analysis, among many others.
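
As one concrete sketch of this flexibility, again using the coles2019 dataset (the within-study averaging shown here is deliberately naive and for pedagogical illustration only, since it ignores the covariance among dependent effects):

library(psymetadata)
library(metafor)

# naive aggregation: average yi and vi within each study (illustration only)
agg <- aggregate(cbind(yi, vi) ~ study_id, data = coles2019, FUN = mean)
fit_re <- rma(yi, vi, data = agg)  # classical random-effects model

# three-level model that retains all dependent effect sizes
fit_ml <- rma.mv(yi, vi, random = ~ 1 | study_id/es_id, data = coles2019)

# cluster-robust (RVE-style) standard errors for the three-level model
fit_rve <- robust(fit_ml, cluster = coles2019$study_id)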

For methodological researchers, illustrative examples are commonly employed to demonstrate that novel methodologies have desirable statistical properties and are suitable for studying psychological phenomena. For example, Williams et al. (2021) used the gnambs2020 dataset to show how accounting for group differences in between-study heterogeneity may have profound implications for the resulting inferences of a meta-analysis. Further, priors for Bayesian meta-analyses can be determined by using these data. One can imagine that a future meta-analysis studying whether various developmental psychology studies replicate (e.g., The ManyBabies Consortium, 2020) may rely on a random-effects model to do so. By using, say, the manyBabies2020 dataset, informed priors may be determined for the overall effect size, or for the between-study heterogeneity (e.g., van Erp et al., 2017). Finally, the open-source nature of the package allows researchers to contribute their own meta-analytic datasets to psymetadata by following the steps outlined on the psymetadata GitHub repository.
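
As a rough sketch of such prior elicitation (the exact workflow will depend on the Bayesian model in question), one might fit a frequentist random-effects model to a related psymetadata dataset, here gnambs2020, and use the resulting estimates to anchor the priors:

library(psymetadata)
library(metafor)

# fit a random-effects model to an existing, related dataset
# (treating the effects as independent for simplicity)
fit <- rma(yi, vi, data = gnambs2020)

fit$b          # pooled effect: a candidate prior mean for the overall effect
sqrt(fit$tau2) # estimated heterogeneity: a candidate prior scale for tau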