## 1. Background

The academic performance of college students may be affected by mental health disorders arising from addiction to the internet or to social media in general (Faraci et al., 2013; Li et al., 2015). It is therefore important that any information useful for assessing the link between internet and social media addiction and mental health disorders be made available to policy makers and, in particular, to educational psychologists. In Malawi, however, such information was lacking. To address this gap, we conducted a study (Manda et al., 2019) of the relationship between internet addiction and mental health among college students in Malawi, based on primary data. The data were collected with a self-reporting questionnaire (SRQ) from randomly sampled university students (undergraduate and postgraduate) in Malawi pursuing either science or humanities and social science disciplines. We do not presume that the previous study (Manda et al., 2019) exhausted the possible uses of the data; hence the present article, which shares it. Recognising the need to share data in accordance with FAIR data policies (Carolan, 2016; Wang & Strong, 1996; Wilkinson et al., 2016), this article openly describes the data and shares them with the public and with other researchers who may find them useful. The raw data (Mwakilama et al., 2022) are available in both Excel and SPSS formats for ease of sharing and re-use. In addition, tabulated datasets are shared as supplementary files to this article.

Because they come from a population traditionally underrepresented in psychological research, the data have the potential to serve various stakeholders, especially those interested in conducting educational psychology research on college students in Malawi. The design, materials and methods used to collect the data may also be replicated in other cross-sectional or longitudinal studies elsewhere. In addition, this paper aims to motivate other researchers to publish their data so that studies can be replicated, in support of Open and FAIR data policies.

## 2. Methods

### Study Design

The data come from a cross-sectional study of undergraduate and postgraduate students in science, and in humanities and social science, disciplines at colleges across Malawi (Figure 1). The structured questionnaire was designed in Google Forms. The data contain demographic variables (gender, age, year of study, discipline of study and level of study), internet and social media use variables, and transformed variables for identifying possible cases of common mental disorders.

Figure 1

Map of Malawi showing the provinces and districts, and its location in south-east Africa. Source: Klopper et al. (2012).

### Time of data collection

The data were collected between May 2018 and July 2018.

### Location of data collection

A total of 13 tertiary education institutions across four major geographical regions in Malawi (Figure 1) contributed study participants. These institutions were: College of Medicine, The Malawi Polytechnic, Malawi University of Science & Technology (MUST), National College of Information Technology (NACIT), Malawi College of Accountancy (MCA) and Catholic University of Malawi (CUNIMA) in the South; Chancellor College and Domasi College of Education (DCE) in the East; Malawi Institute of Management (MIM), Nalikule College of Education, Daeyang University, Natural Resources College (NRC) and Lilongwe University of Agriculture and Natural Resources (LUANAR) in the Centre; and Mzuzu University in the North.

### Sampling, sample and data collection

We collected data from 984 participants (345 females and 639 males, with three missing values; age groups: 15–19 years, 19.9%; 20–24 years, 57.1%; 25–29 years, 14.9%; 30 years and above, 8.1%; 938 undergraduates and 45 postgraduates; 749 from Science and 233 from Humanities and Social Science disciplines). The number of missing responses per variable ranges from one to seven (see Tables S1–S8 in the supplementary file). Hence, only the 977 responses with complete socio-demographic records (see Table S1 in the supplement) were used for analysis in the main study (Manda et al., 2019). The sample of 984 participants was obtained through multistage sampling. First, universities and colleges were listed within two strata, namely public and private higher education institutions (HEIs). Second, from each stratum, only the colleges and universities whose study disciplines were fully accredited by the National Commission for Higher Education (NCHE) at the time of the study were selected. Finally, a simple random sample (SRS) of individuals was drawn from the student population of the selected institutions. The size of the SRS, n = 984 participants, was determined using Cochran's formula (Cochran, 1977):

$n=\frac{Z^{2}\sigma^{2}}{D^{2}}$

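where n is the sample size, Z = 1.96 is the standard normal value for a 95% confidence level (assuming a normally distributed population), D = 3 is the desired precision, and σ = 48 is the population standard deviation (estimated from a previous related study). Substituting these values gives n = (1.96² × 48²)/3² ≈ 983.4, which rounds up to 984.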
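The sample-size calculation can be reproduced, and its sensitivity explored, with a few lines of Python. This is a minimal sketch: the function name `cochran_sample_size` is our own, and rounding up to the next whole participant is an assumption, though one consistent with the reported n = 984. Varying `d` or `sigma` shows how strongly the required sample depends on the assumed precision and standard deviation.

```python
import math


def cochran_sample_size(z: float, sigma: float, d: float) -> int:
    """Cochran's sample size for estimating a mean: n = Z^2 * sigma^2 / D^2.

    Rounded up (an assumption), since a fraction of a participant cannot be sampled.
    """
    if sigma <= 0 or d <= 0:
        raise ValueError("sigma and d must be positive")
    return math.ceil(z ** 2 * sigma ** 2 / d ** 2)


# Values reported in the text: Z = 1.96 (95% confidence), sigma = 48, D = 3.
print(cochran_sample_size(1.96, 48, 3))  # 984

# Sensitivity of n to the chosen precision D (sigma held at 48).
for precision in (2, 3, 4, 5):
    print(precision, cochran_sample_size(1.96, 48, precision))
```

Halving the margin of error roughly quadruples the required sample, because D enters the formula squared.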

During data collection, a combination of online and paper-based English questionnaires was used to increase the response rate. Online forms were sent to study participants via the "All-students" mail accounts provided by the ICT directorates of some of the tertiary institutions. For institutions without such mail accounts, we used exponential non-discriminatory snowball sampling (Dudovskiy, 2022): we first shared the Google Form link with a few students identified from the ICT database, who then passed it on to others by email, WhatsApp or Facebook Messenger until the intended sample size was reached. To prevent multiple entries, the "Limit to 1 response" setting was activated when the Google Form was designed, so that no online participant could complete the form more than once. All institutions were also supplied with printed copies of the Google Form for participants who had difficulty accessing the institutional internet. An officer in the ICT directorate issued the printed forms only to participants who had no access to the online form. This approach was also necessary because the study was unfunded, so we could not provide participants with funds for internet data. The completed paper forms were returned to our institution through the university mailing office in sealed and signed envelopes. For continuity and consistency in data collection, three research assistants were then recruited to enter the data from the hard-copy questionnaires into the Excel database exported from the online responses after the form had stopped accepting responses. Apart from the research assistants, who were paid for delivering and collecting the printed questionnaires and for entering the data, no one was paid or offered any benefit for participating in the study.

### Materials/survey instruments

We used a single structured questionnaire, designed in Google Forms, which most participants completed in about 8 minutes. The questionnaire first asked participants to complete demographic measures, including gender, age, level of study and study discipline. Internet and social media use was then assessed with the Internet Addiction Test (IAT) (Faraci et al., 2013; Reinecke, 2009; Young & de Abreu, 2010). The IAT assesses aspects of an individual's life that may be affected by excessive internet use. It is a 20-item questionnaire (Figure 2) on which respondents rate each item on a six-point Likert scale (from 0, "Does not apply", to 5, "Always"), giving a total score between 0 and 100. The IAT total score, interpreted using the ranges in Table 1, indicates the degree to which internet use affects the respondent's daily routine, including social life, productivity, sleeping pattern and feelings (Widyanto & McMurran, 2004).

Figure 2

An example of questions covered in the IAT assessment tool.

Table 1

Description of IAT total score ranges.

| IAT total score range | Interpretation |
|---|---|
| Less than 20 | Infrequent Internet user |
| 20 to 39 | Average online user, able to control his/her Internet usage |
| 40 to 69 | Excessive Internet usage |
| 70 to 100 | Significant problems experienced due to Internet usage |
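Re-users who need to recompute IAT totals from the item-level responses could do so along the following lines. This is a minimal sketch that assumes each respondent's 20 items are already coded as integers from 0 to 5; the function names are hypothetical, and the band labels paraphrase Table 1.

```python
from typing import Sequence


def iat_total(responses: Sequence[int]) -> int:
    """Sum the 20 IAT items, each rated 0 (Does not apply) to 5 (Always)."""
    if len(responses) != 20:
        raise ValueError(f"expected 20 IAT items, got {len(responses)}")
    if any(not 0 <= r <= 5 for r in responses):
        raise ValueError("IAT item ratings must lie between 0 and 5")
    return sum(responses)


def iat_band(total: int) -> str:
    """Map an IAT total (0 to 100) to the interpretation bands of Table 1."""
    if not 0 <= total <= 100:
        raise ValueError("IAT total must lie between 0 and 100")
    if total < 20:
        return "Infrequent Internet user"
    if total < 40:
        return "Average online user"
    if total < 70:
        return "Excessive Internet usage"
    return "Significant problems due to Internet usage"


# Hypothetical respondent who rates every item 2.
ratings = [2] * 20
print(iat_total(ratings), iat_band(iat_total(ratings)))  # 40 Excessive Internet usage
```

Validating the item count and rating range before summing guards against silently scoring a record with missing or mis-coded items.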

The last part of the questionnaire captured data on mental health through the Self-Reporting Questionnaire (SRQ-20) (van der Westhuizen et al., 2016), developed by the World Health Organization (WHO) (Beusenberg & Orley, 1994) to identify probable cases of common mental disorders (CMD). CMD refers to the co-occurrence of depressive, anxious and somatic symptoms (Young & Rogers, 1998). The SRQ-20 has 20 items with a yes/no answer format, each "yes" scoring one point, so that total scores range from 0 to 20. It detects probable cases of anxiety and depression by asking participants how they have felt about their general health over the last two weeks (Figure 3). Both the IAT and the SRQ-20 have previously been validated in related studies in Malawi (Stewart et al., 2009; Udedi et al., 2014). Table 2 gives the interpretation of SRQ-20 total scores based on the 7/8 cut-off point; this information is useful to any study that intends to re-use the raw data.

Figure 3

An example of the SRQ-20 assessment tool.

Table 2

Description of SRQ-20 total score ranges.

| SRQ-20 total score range | Interpretation |
|---|---|
| ≤7 | No probable CMD |
| ≥8 | Probable CMD |
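Table 2 translates directly into a scoring rule. The sketch below is minimal and assumes that item responses have already been coded Yes = 1 and No = 0 (the coding in the shared files should be checked against the survey tool); the function names are hypothetical. Keeping the cut-off as a parameter lets re-users try alternative thresholds.

```python
from typing import Sequence

SRQ20_CUTOFF = 8  # 7/8 cut-off: totals of 8 or more indicate a probable CMD


def srq20_total(responses: Sequence[int]) -> int:
    """Sum the 20 SRQ-20 items, each coded Yes = 1, No = 0."""
    if len(responses) != 20:
        raise ValueError(f"expected 20 SRQ-20 items, got {len(responses)}")
    if any(r not in (0, 1) for r in responses):
        raise ValueError("SRQ-20 items must be coded 0 (No) or 1 (Yes)")
    return sum(responses)


def probable_cmd(total: int, cutoff: int = SRQ20_CUTOFF) -> bool:
    """True when the SRQ-20 total reaches the cut-off (Table 2)."""
    return total >= cutoff


below = [1] * 7 + [0] * 13  # 7 "yes" answers
at_cutoff = [1] * 8 + [0] * 12  # 8 "yes" answers
print(probable_cmd(srq20_total(below)), probable_cmd(srq20_total(at_cutoff)))  # False True
```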

### Quality control

To ensure that the collected data met six data quality criteria, namely (1) validity, (2) integrity, (3) precision, (4) reliability, (5) timeliness and (6) completeness, faculty members piloted the survey tool to eliminate any ambiguities in the questions. Only the two researchers responsible for data management could edit the online survey form. Because the shared link was not password protected, these two researchers checked the online data daily to ensure that no form had been submitted twice. In future, a more robust approach would be to assign a random password to each entry form. Weekly reminders were sent to the anonymous study participants through their institutional ICT directorates to encourage timely responses. All responses captured on printed forms are kept in secure cabinets with restricted access.
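The daily duplicate check could be partly automated on an export of the online responses. The sketch below is hypothetical: it assumes each response is a dictionary keyed by column header, that a `Timestamp` column should be ignored when comparing answers, and the column names in the example (`Gender`, `Q1`) are invented placeholders. Identical answers can legitimately come from two different respondents, so flagged rows should be reviewed manually rather than deleted automatically.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def find_duplicate_submissions(
    records: Iterable[Mapping[str, str]],
    ignore: tuple[str, ...] = ("Timestamp",),
) -> list[list[int]]:
    """Group row indices whose answers match on every field not listed in `ignore`."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for index, row in enumerate(records):
        # Build a hashable, column-order-independent key from the remaining fields.
        key = tuple(sorted((str(k), str(v)) for k, v in row.items() if k not in ignore))
        groups[key].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


responses = [
    {"Timestamp": "2018/05/02 10:00", "Gender": "Female", "Q1": "3"},
    {"Timestamp": "2018/05/02 10:05", "Gender": "Female", "Q1": "3"},
    {"Timestamp": "2018/05/03 09:00", "Gender": "Male", "Q1": "1"},
]
print(find_duplicate_submissions(responses))  # [[0, 1]]
```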

### Data anonymisation and ethical issues

Ethical approval to collect data from college students was sought from, and granted by, the ICT directorates on the condition that the data be used purely for academic purposes. Because information on mental health is confidential, especially for students who might have been experiencing a mental disorder during data collection, only masked student identities were captured. This ensured that the data could not be traced back to individual students. In addition, before participating in the study, respondents were presented with an informed consent form, included in the questionnaire, which explained the purpose of the study (Figure 4).

Figure 4

Informed Consent.

Participation was therefore voluntary, and returning a completed questionnaire to the researchers constituted consent to take part. The lead author's contact details were shared openly so that participants could direct any queries to them.

### Existing use of data

One publication (a book chapter) has been based on the data:

• Manda T.D., Jamu E.S., Mwakilama E.P., Maliwichi-Senganimalunje L. (2019). Internet addiction and mental health among college students in Malawi. In: Ndasauka Y., Kayange G. (eds), Addiction in South and East Africa. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-13593-5_16

## 3. Dataset description and access

In keeping with Open and FAIR data principles, this section describes the data, where they are located, the form of the datasets and their file types.

### Repository location

Data are available in the Mendeley Data (Digital Commons Data) repository (Mwakilama et al., 2022) through the link https://doi.org/10.17632/xbfbcy5bhv.3. The repository contains:

• IAT_data_imported.csv
• IAT_data_imported.dat
• IAT_data_imported.dta
• IAT_data_imported.sav
• IAT_data_imported.sd2
• IAT_SQR20 processed-data.docx
• IAT_SRQ20 Survey_tool.pdf

The data are also accessible through the link https://data.mendeley.com/drafts/xbfbcy5bhv.

### Data types

The repository holds primary data, processed data, the data collection tool, and the figures in the manuscript (Manda et al., 2019).

### Format names and versions

The file Internet_Addiction_Malawi_Data.xlsx is the primary raw, anonymised data file generated from the online Google Form and database. The primary data are also available in other formats (Internet_Addiction_Malawi_Data.csv and Internet_Addiction_Malawi_Data.txt). The coded datasets are provided in SPSS format (IAT_data_imported.sav), with copies in other formats (IAT_data_imported.csv, IAT_data_imported.dat and IAT_data_imported.dta). The variables in both types of dataset can be interpreted with the help of the study questionnaire (IAT_SRQ20 Survey_tool.pdf) and the tabulated processed data (IAT_SQR20_processed_data.docx).
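A re-user might start by loading one of the CSV files with Python's standard `csv` module, counting missing values per demographic variable and keeping complete cases, much like the complete-case selection of 977 records described in the Methods. This is a sketch under assumptions: the column names in `DEMOGRAPHICS` are hypothetical placeholders (check them against the actual file headers and the survey tool), missing values are assumed to appear as empty cells, and the file encoding shown in the comment is a guess. The demonstration uses a small in-memory sample rather than the real file.

```python
import csv
import io
from typing import TextIO

# Hypothetical column names; replace with the real headers of the CSV file.
DEMOGRAPHICS = ("Gender", "Age", "Level")


def load_rows(handle: TextIO) -> list[dict[str, str]]:
    """Read a CSV file (header row first) into a list of dictionaries."""
    return list(csv.DictReader(handle))


def is_missing(value) -> bool:
    """Treat absent fields and empty or whitespace-only cells as missing."""
    return value is None or not str(value).strip()


def missing_counts(rows, columns) -> dict[str, int]:
    """Number of missing values in each of the given columns."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}


def complete_cases(rows, columns):
    """Rows with no missing value in any of the given columns."""
    return [row for row in rows if not any(is_missing(row.get(col)) for col in columns)]


# For the real data one might use (file name from the repository; encoding assumed):
# with open("IAT_data_imported.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
sample = io.StringIO(
    "Gender,Age,Level\n"
    "Female,21,Undergraduate\n"
    "Male,,Undergraduate\n"
    ",24,Postgraduate\n"
)
rows = load_rows(sample)
print(missing_counts(rows, DEMOGRAPHICS))  # {'Gender': 1, 'Age': 1, 'Level': 0}
print(len(complete_cases(rows, DEMOGRAPHICS)))  # 1
```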

### Language

All materials are in British English.

### License

CC BY 4.0

### Publication date

1 March 2021 (version 1)

24 January 2022 (version 2)

11 November 2022 (version 3)

## 4. Reuse potential

The data may be useful to several groups of researchers, including clinical, counselling, health and educational psychologists, because:

• the dataset includes variables that, when associated with internet addiction, may indicate probable mental health disorders among university students;
• the SRQ-20 data offer an opportunity to derive anxiety scores along the lines of the Beck Anxiety Inventory (BAI). According to Azher (2014), the BAI is a 21-item questionnaire intended to measure anxiety, and among existing anxiety scales it is the third most used research instrument, the other two being the State-Trait Anxiety Inventory (STAI) and the Fear Survey Schedule. Each BAI item has four possible answers: not at all (0), mildly (1), moderately (2) and severely (3), giving a maximum score of 63. With the No (0) or Yes (1) responses to the SRQ-20 items, BAI-like scores can be derived by defining a cut-off point for the derived scores that separates no anxiety (stress) from anxiety (stress);
• the data contribute to the pool of sources for estimating the prevalence of depression and mental disorders among university students in Malawi when the protocols outlined by January et al. (2018) are adopted;
• the data should be useful to educational psychology researchers addressing further research questions posed by previous studies (Azher, 2014; Kassiani et al., 2018; Orsal et al., 2013);
• the data are a potential source for psychometric studies of college students (Zhang et al., 2022), particularly in Malawi, where such research is scarce;
• SRQ-20 items related to the Zuckerman-Kuhlman Personality Questionnaire III (ZKPQ) (Capetillo-Ventura & Juárez-Treviño, 2015) can be extracted and correlated with IAT scores to study associations between internet addiction, psychiatric symptoms and personality type among college students.

### Special collection

This paper has been submitted for review as a contribution to JOPD's special issue "Data for Psychological Research in the Educational Field", edited by Sonja Bayer, Katarina Blask, Timo Gnambs, Malte Jansen, Débora Maehler, Alexia Meyermann and Claudia Neuendorf.