Background
Meta-analysis is an indispensable technique for summarizing a set of similar primary studies that share a common topic. Key goals include determining the average effect size (Hedges & Pigott, 2001), determining whether study characteristics moderate the effect size (Hedges & Pigott, 2005), and quantifying the extent of heterogeneity in the observed effects (Higgins & Thompson, 2002; Ioannidis et al., 2007). These applications make meta-analysis pivotal to psychological research. Accordingly, it is essential to (1) teach psychological researchers how to be informed consumers of meta-analyses and (2) develop improved statistical methods for conducting meta-analyses. Both efforts depend on the availability of high-quality, easily accessible data.
To help meet these needs, we developed the R package psymetadata, which contains 22 recent open-source datasets from meta-analyses of the psychological literature. These data were collected through the Open Science Framework (OSF) and span areas such as social, developmental, and cognitive psychology, among others. The purpose of collecting these datasets was twofold: first, to provide psychologists with easily accessible empirical data that facilitate “real-world” examples in pedagogical settings concerning meta-analysis, and second, to enable methodological researchers to illustrate novel statistical techniques on archetypal psychological data. For example, the data contained in this package can be used throughout an introductory meta-analysis course or in a research article to show that a particular meta-analytic method has desirable statistical properties. Further, when conducting a Bayesian meta-analysis, informative priors can be elicited based on such data (e.g., van Erp et al., 2017). Notably, similar efforts already exist, such as the metadat R package (White et al., 2021), albeit without a focus on psychological data. The psymetadata package is therefore of wide relevance to learners, teachers, researchers, and practitioners of meta-analysis alike.
Methods
Study design
Traditionally, meta-analytic techniques include only one effect size per study because including more would violate the classical assumption of independence among effect sizes. However, the collected datasets may contain more than one effect size per study, or otherwise non-independent effect sizes. As we describe in the Reuse potential section, this feature provides great flexibility in how the datasets can be used. Moreover, each dataset contains several variables suitable for use in moderator analysis. Lastly, in meta-analysis, the primary outcome under study is the average effect size. The effect sizes in psymetadata currently include Hedges’ g, Cohen’s d, and Pearson’s r, among others.
Time of data collection
All datasets were collected between March 2021 and May 2021.
Location of data collection
The collected datasets were all freely available on the OSF. The original OSF repository for each dataset can be found in the package documentation (https://cran.r-project.org/web/packages/psymetadata/psymetadata.pdf).
Sampling, sample and data collection
A convenience sample was obtained by searching the keyword “meta-analysis” on the OSF and clicking through as many results as we could manage over the period of data collection. Each result was checked for whether it had openly available data and whether a codebook could be found either in the manuscript or the supplemental materials. We only collected datasets where there was a corresponding codebook. This procedure resulted in 22 datasets.
Materials/Survey instruments
No materials were used aside from the computers used to search, download, and clean the data.
Quality Control
The data were collected with diligence and care. All dataset names follow the convention [firstauthor][year]. For example, the dataset from Barroso et al. (2021) was named barroso2021 (see Table 1 for all authors and years). All variables common to the datasets were renamed according to mainstream conventions stemming from the popular metafor R package (Viechtbauer, 2010). Specifically, each dataset contains at least the following variables:
- study_id: Unique identifier for each study.
- es_id: Unique identifier for each effect size.
- yi: The estimated effect size.
- vi: The estimated variance of the effect size.
Variables that were included in the original dataset but whose definitions were either unavailable or unclear were excluded from the final version included in psymetadata.
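The shared-column convention described above can be checked programmatically. The following is a minimal sketch in base R; the helper name missing_core_columns and the toy data frame are hypothetical and are not part of psymetadata.

```r
# Columns that every psymetadata dataset is described as sharing.
core_columns <- c("study_id", "es_id", "yi", "vi")

# Hypothetical helper (not part of psymetadata): returns the names of any
# core columns that are missing from a data frame.
missing_core_columns <- function(dat) {
  setdiff(core_columns, names(dat))
}

# Toy data frame with made-up values, for illustration only.
toy <- data.frame(
  study_id = c(1, 1, 2),
  es_id    = c(1, 2, 3),
  yi       = c(0.10, 0.25, -0.05),
  vi       = c(0.02, 0.03, 0.01)
)

# All four core columns are present, so nothing is reported as missing.
missing_core_columns(toy)

# Dropping the identifiers leaves study_id and es_id reported as missing.
missing_core_columns(toy[, c("yi", "vi")])
```

A check of this kind may be useful before contributing a new dataset or before passing a dataset to code that expects the yi and vi columns.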
Table 1
Datasets included in psymetadata.
AUTHOR(S) | YEAR | TOPIC |
---|---|---|
Agadullina & Lovakov | 2018 | Out-group entitativity and prejudice |
Aksayli et al. | 2019 | The cognitive and academic benefits of Cogmed |
Barroso et al. | 2021 | Math anxiety and math achievement |
Coles et al. | 2019 | Facial feedback |
Gamble et al. | 2019 | Specificity of future thinking in depression |
Gnambs | 2020 | The color red and cognitive performance |
Lowe et al. | 2021 | The advantage of bilingualism in children |
MacCann et al. | 2020 | Student emotional intelligence and academic performance |
Maldonado et al. | 2020 | Age differences in executive function |
The ManyBabies Consortium | 2020 | Variation in infancy research |
Klein et al. | 2018 | Variation in replicability across samples and settings |
Noble et al. | 2019 | Shared reading and language development |
Nuijten et al. | 2020 | Intelligence |
Sala et al. | 2019 | Working-memory training and near- and far-transfer measures |
Schroeder et al. | 2020 | Transcranial direct current stimulation and inhibitory control |
Spaniol & Danielsson | 2020 | Executive function components in intellectual disability |
Stasielowicz | 2019 | Goal orientation and performance adaptation |
Stasielowicz | 2020 | Cognitive ability in performance adaptation |
Steffens et al. | 2021 | Social Identity Theory and leadership |
Stramaccia et al. | 2020 | Memory suppression |
Wibbelink et al. | 2017 | Juvenile recidivism |
Note: Stasielowicz (2019) contained two datasets.
Data anonymisation and ethical issues
The collected datasets are all secondary and only contain study-level information.
Existing use of data
The collected datasets all belonged to a primary meta-analysis. These original works are cited in the references and are listed in the documentation of the psymetadata package. Further, an applied example using one of the datasets (Gnambs, 2020) was included in Williams et al. (2021).
Dataset description and access
All datasets in psymetadata share a similar structure. To demonstrate this structure, Code 1 shows a truncated version of the coles2019 dataset, originally used to conduct a meta-analysis examining the facial feedback hypothesis (Coles et al., 2019). As previously mentioned, each dataset contains the columns es_id, study_id, yi, and vi. These variables correspond to the unique identifier for the effect size, the unique identifier for the study from which the effect size was collected, the effect size, and the variance of the effect size, respectively. Additionally, most datasets contain information pertaining to the year of publication and the authors of the study from which the effect sizes were obtained. In the coles2019 dataset, the year column denotes the year the effect size was published. The remaining variables, in this case file_drawer and w_v_b, are moderator variables. For example, one may want to test whether the average effect size varies according to whether a study was published in a peer-reviewed journal (file_drawer) or whether the study design was within- or between-participants (w_v_b).
Code 1
Example data from Coles et al. (2019).
es_id | study_id | yi | vi | year | file_drawer | w_v_b |
---|---|---|---|---|---|---|
1 | 1 | 0.020 | 0.013 | 2013 | yes | within |
2 | 2 | 0.179 | 0.050 | 1998 | no | between |
3 | 3 | 1.019 | 0.085 | 2014 | no | between |
4 | 3 | 0.074 | 0.069 | 2014 | no | between |
5 | 3 | 1.074 | 0.131 | 2014 | no | between |
6 | 3 | 0.202 | 0.079 | 2014 | no | between |
... | ... | ... | ... | ... | ... | ... |
284 | 138 | –0.0049 | 0.0098 | 2009 | no | within |
285 | 139 | 0.5374 | 0.0440 | 1997 | no | within |
286 | 140 | –0.2377 | 0.1222 | 2002 | no | between |
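To make the structure in Code 1 concrete, the sketch below rebuilds only the first six rows shown above as a data frame in base R, counts the effect sizes per study, and computes an inverse-variance weighted mean within each level of w_v_b. The object names are hypothetical, and the weighted means treat all rows as independent, which the multiple rows from study 3 violate; the output is illustrative rather than a result for the full dataset.

```r
# First six rows shown in Code 1 (coles2019); the full table continues
# to es_id 286.
code1 <- data.frame(
  es_id       = 1:6,
  study_id    = c(1, 2, 3, 3, 3, 3),
  yi          = c(0.020, 0.179, 1.019, 0.074, 1.074, 0.202),
  vi          = c(0.013, 0.050, 0.085, 0.069, 0.131, 0.079),
  year        = c(2013, 1998, 2014, 2014, 2014, 2014),
  file_drawer = c("yes", "no", "no", "no", "no", "no"),
  w_v_b       = c("within", "between", "between", "between", "between", "between")
)

# Number of effect sizes contributed by each study. Study 3 contributes
# four rows, so its effect sizes are not independent of one another.
es_per_study <- table(code1$study_id)
es_per_study

# Inverse-variance weighted mean effect size within each level of the
# moderator w_v_b. This simple summary ignores within-study dependence.
weighted_mean_by_level <- sapply(
  split(code1, code1$w_v_b),
  function(d) sum(d$yi / d$vi) / sum(1 / d$vi)
)
weighted_mean_by_level
```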
Repository location
The data can be accessed by installing the R package psymetadata from CRAN, using install.packages("psymetadata"), or from GitHub. Alternatively, individual files may be downloaded from GitHub.
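A typical session might look like the sketch below. It assumes the package has already been installed from CRAN and is guarded so that it also runs, with a message, where the package is absent. The variable name pkg_available is hypothetical.

```r
# Check whether psymetadata is installed without attaching it.
pkg_available <- requireNamespace("psymetadata", quietly = TRUE)

if (pkg_available) {
  # Load one dataset by name; data() works whether or not the package
  # uses lazy loading.
  data("coles2019", package = "psymetadata")
  print(head(coles2019))
} else {
  message('psymetadata is not installed; run install.packages("psymetadata") first.')
}
```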
Object/file name
All of the datasets are stored in the “Data” folder of the psymetadata GitHub repository and are named using the format [author][year].rda.
Data type
All datasets are secondary.
Format names and versions
The data are saved in the R data format (i.e., the .rda file extension). Accessing files in this format requires using the R programming language (R Core Team, 2021).
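For readers who download an individual .rda file rather than installing the package, the sketch below shows how such a file can be read with base R. Because no real file is assumed to be present, it first writes a toy data frame with made-up values to a temporary file; the name toy2021 is hypothetical and only mimics the [author][year].rda convention.

```r
# Toy data frame with made-up values, saved under a hypothetical name.
toy2021 <- data.frame(
  study_id = c(1, 2),
  es_id    = c(1, 2),
  yi       = c(0.10, -0.05),
  vi       = c(0.02, 0.01)
)
rda_path <- file.path(tempdir(), "toy2021.rda")
save(toy2021, file = rda_path)

# load() restores objects under the names they were saved with and returns
# those names. Loading into a fresh environment avoids overwriting objects
# that already exist in the workspace.
env <- new.env()
loaded_names <- load(rda_path, envir = env)
loaded_names

# Retrieve the restored data frame from the environment and inspect it.
restored <- env[[loaded_names[1]]]
str(restored)
```

For a file downloaded from the repository, rda_path would instead point to the downloaded file.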
Language
The data are saved in American English.
License
The data are distributed under the GNU General Public License Version 2.
Limits to sharing
There are no limitations on the sharing of these data.
Publication date
The psymetadata package was originally published to CRAN on 31/05/2021.
FAIR data/Codebook
For each dataset, the variable names, variable definitions, topic, and reference(s) have been documented. The documentation is available for all datasets (https://cran.r-project.org/web/packages/psymetadata/psymetadata.pdf). The documentation of a given dataset can also be accessed using the ? function in R (e.g., ?coles2019).
Reuse potential
The psymetadata package contains 22 datasets that include multiple, dependent effect sizes and moderator variables. This affords a great deal of flexibility in teaching a variety of common techniques. For instance, if one were to either average effect sizes within studies or select a single effect size per study, then classical fixed-effects and random-effects meta-analysis (Borenstein et al., 2009) can be taught. On the other hand, these datasets may be used to demonstrate methods that explicitly account for dependent effect sizes, such as robust variance estimation (Hedges et al., 2010) or three-level meta-analysis (Assink & Wibbelink, 2016). Of course, additional techniques may be taught with these data, including moderator analysis (Hedges & Pigott, 2005), subgroup analysis (Borenstein & Higgins, 2013), testing for publication bias (Copas, 1999; Sutton, 2000), and Bayesian meta-analysis, among many others.
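As a minimal sketch of the first teaching scenario, the base R code below selects a single effect size per study and then computes fixed-effect and random-effects pooled estimates. It uses only the first six rows shown in Code 1, so the numbers are illustrative and are not results for the full coles2019 dataset. The helper names pool and dl_tau2 are hypothetical. The DerSimonian-Laird moment estimator is assumed here as one common choice for the between-study variance; the text above does not prescribe an estimator.

```r
# First six rows shown in Code 1 (coles2019), used only for illustration.
dat <- data.frame(
  study_id = c(1, 2, 3, 3, 3, 3),
  es_id    = 1:6,
  yi       = c(0.020, 0.179, 1.019, 0.074, 1.074, 0.202),
  vi       = c(0.013, 0.050, 0.085, 0.069, 0.131, 0.079)
)

# Keep the first effect size listed for each study so that the retained
# effect sizes come from different studies.
one_per_study <- dat[!duplicated(dat$study_id), ]

# Hypothetical helper: inverse-variance pooling, with an optional
# between-study variance tau2 added to each sampling variance.
pool <- function(yi, vi, tau2 = 0) {
  w <- 1 / (vi + tau2)
  c(estimate = sum(w * yi) / sum(w), se = sqrt(1 / sum(w)))
}

# Hypothetical helper: DerSimonian-Laird estimate of the between-study
# variance, truncated at zero.
dl_tau2 <- function(yi, vi) {
  w <- 1 / vi
  q <- sum(w * (yi - sum(w * yi) / sum(w))^2)  # Cochran's Q
  c_const <- sum(w) - sum(w^2) / sum(w)
  max(0, (q - (length(yi) - 1)) / c_const)
}

# Fixed-effect estimate (tau2 = 0).
fixed <- pool(one_per_study$yi, one_per_study$vi)

# Random-effects estimate using the estimated between-study variance.
tau2 <- dl_tau2(one_per_study$yi, one_per_study$vi)
random <- pool(one_per_study$yi, one_per_study$vi, tau2 = tau2)

fixed
random
```

In practice, dedicated software such as the metafor package (Viechtbauer, 2010) would typically be used, both for these classical models and for models that retain all dependent effect sizes.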
For methodological researchers, illustrative examples are commonly employed to demonstrate that novel methodologies have desirable statistical properties and are suitable for studying psychological phenomena. For example, Williams et al. (2021) used the gnambs2020 dataset to show how accounting for group differences in between-study heterogeneity may have profound implications for the resulting inferences of a meta-analysis. Further, priors for Bayesian meta-analyses can be determined by using these data. One can imagine that a future meta-analysis studying whether various developmental psychology studies replicate (e.g., The ManyBabies Consortium, 2020) may rely on a random-effects model to do so. By using, say, the manyBabies2020 dataset, informed priors may be determined for the overall effect size or for the between-study heterogeneity (e.g., van Erp et al., 2017). Finally, the open-source nature of the package allows researchers to contribute their own meta-analytic datasets to psymetadata by following the steps outlined on the psymetadata GitHub repository.