1. Background

With the growth of internet-based recruitment and mass online job posting networks (e.g., Indeed), the process of aligning company job postings with candidates has become increasingly complex. In our study, we focus specifically on the text found in job advertisements that describes the job role (i.e., responsibilities) and candidate expectations (i.e., requirements). Walker and Hinojosa (2013) argued, “job advertisements viewed by job seekers early in the recruitment process can influence the development of organizational attitudes… job advertisements may be one of the only sources of organizational information used to evaluate potential employers” (p. 269). However, most job applicants would likely agree that job advertisements fall far short of the scientific rigor and accuracy they would have if they were written on the basis of an empirical job analysis. Many of these advertisements contain an “asymmetry of information about the requirements of the buyer (the employer) and quality (skill set) of the supplier (the job seeker)” (Carnevale et al., 2014, p. 1) that leads to poor hiring decisions.

According to the attraction-selection-attrition (ASA) model in recruiting, organizations attract people with similar personalities, values, and interests, select people who are most similar to job incumbents, and then experience the attrition of people who do not fit in well (Schneider, 1987). This model underlies much selection research on person-environment fit (Barrick & Parks-Leduc, 2019), and it provides a framework for understanding how recruitment functions from both sides of the process. Specifically, the organization designs the recruitment process to attract the applicants who will suit the job and the organization, and the applicant seeks and evaluates information and cues from the organization’s recruitment process to determine whether they would fit well or poorly.

As a recruitment message, job advertisements function to provide information about the specific characteristics of a given job as well as the organization in general; thus, accuracy in the advertisement is crucial to attracting the appropriate candidates. Prior research has shown that the composition of a job advertisement can differentially attract applicants via advertised staffing policy, work characteristics, and compensation (Highhouse et al., 1999). Moreover, the wording of advertisements allows applicants to make inferences about their level of potential fit with the job (Highhouse et al., 1999; Stevens & Szmerekovsky, 2010). The perception of job characteristics during recruitment is integral to attraction processes and fit perceptions (Ehrhart, 2006; Uggerslev et al., 2012), and applicants pay keen attention to the wording of such advertisements in determining whether they should apply to the job. Perhaps the most important component of the job advertisement is the description of a job incumbent’s responsibilities (i.e., what a person will actually be doing on the job). Best practices from industrial-organizational (I-O) psychology research clearly advise that each job advertisement be written specifically for the position to be filled, so that the selection process can clearly map each assessment onto specific tasks or requirements in the job description (Gatewood et al., 2015). As such, a well-written job advertisement should be modeled after empirical job analyses so as to attract the appropriate applicants who are qualified to perform the tasks listed in the job advertisement.

Unfortunately, job advertisements are not nearly as accurate as one would hope. Although the topic is under-researched, there is ample anecdotal evidence that applicants must read “between the lines” to truly understand the nature of the job being advertised (e.g., Hickok, 2021; Kelly, 2018). Relatedly, some recruiting companies suggest that job advertisements should be written with a personal tone and should emphasize the “sell” (SmartRecruiters, n.d.). This creates a tension in recruitment between “selling” a job (e.g., making it more appealing) and characterizing a job based on its job description, which shifts the focus of the job advertisement from veracity to audience appeal. This shift carries the potential dangers of overselling the job (e.g., “up to $65k in salary” when the actual package is $40k base plus commission) or exaggerating positive aspects of the company culture (e.g., “unlimited vacation days” when there is internal pressure not to take vacation; Clark, 2017). In fact, a review of 824 job advertisements in 2015 revealed that up to 88% fell short of basic advertising legal standards, 40% were unclear about key job details such as full-time/part-time or temporary/permanent status, and 33% did not give a salary estimate (Citizens Advice, 2015).

The problem, although heightened by modern technological advances, is not new; Cantor (1975) wrote a brief but pointed denunciation of hospital job advertisements for their increasingly “vague or misleading descriptions” (p. 44). Such misleading job advertisements can lead to numerous detrimental effects: they can be classified as employment fraud (which in Australia carries fines of up to $1.1 million; ACCC, 2011), and they have a significant negative impact on applicant sentiment towards the company even after employment (e.g., Ryan, 2017; Slezak, 2012). While prior empirical studies have not explicitly examined the accuracy of job ads, Feldman and colleagues’ (2006) important work found that the level of specificity (about the company, the job, and/or the work context) was vital to applicant outcomes and reactions. Similarly, Walker and colleagues (2008) found that more specifics on employee benefits (e.g., “you will receive 40–80 hours of training annually” as opposed to “you will have the opportunity to participate in optional training”, p. 625) led to better applicant reactions. Importantly, both studies used fake job ads in their experiments; our study is the first to use real-life job ads and advanced text analysis to directly assess the accuracy of job ads.

Our dataset consists of a large number of public job advertisements collected manually through a comprehensive, systematic search. Our pending research using this dataset involves analyzing the text of the job advertisements via natural language processing and capturing applicant reactions to job advertisements in an experimental study. Moreover, we included data from the Occupational Information Network (O*NET; www.onetonline.org; Occupational Information Network, n.d.). The O*NET is a powerful government database of empirically validated job descriptions (also known as “job analyses” in the field of industrial-organizational psychology) covering over 1,000 job titles in the United States. It is the go-to source of reliable data on occupational information such as job responsibilities, skills and tools required, job interests, salary, and more (Peterson et al., 2001). While it has its limitations, it is regularly updated by experts who conduct interviews and collect data on each occupation. Future research using our dataset could analyze other textual elements of job ads or compare them to other databases to examine important topics such as gender bias and minority recruiting.

2. Methods

From May 2021 to August 2021, we collected public job advertisements for 32 job titles from companies listed on the Fortune 500 2020 list (www.fortune.com/fortune500), yielding 990 ads in total. The subsections below describe the study design, the sampling and collection procedure, and the elements of each job ad that are available in the dataset.

2.1 Study design

A total of 990 job ads for 32 different job titles were collected from the websites of companies listed on the Fortune 500 2020 list (www.fortune.com/fortune500). Data collection ceased either when 50 job ads had been collected for a given job title or when our list of companies was exhausted; on average, 31 ads were found for each of the 32 job titles, with a range of 10 to 52 ads per job title. Individual ads serve as stimuli in a forthcoming study of applicants’ perceptions of job advertisements and as input to a forthcoming text-analysis study of the “Responsibilities” section of each job ad. Other elements of the job ad that are available in the dataset include the location of the job, the “Requirements” text of each ad, “Preferred Requirements”, and other text from the ad such as information about the company. We anticipate that the text data can be used for extensive further research. Company names are also included and can be used for further analysis of differences between companies (e.g., company size, location, financial records).

2.2 Time of data collection

May 2021 to August 2021.

2.3 Location of data collection

Data were collected online in the United States of America.

2.4 Sampling, sample and data collection

We selected four job titles marked as “bright outlook” for job growth on the O*NET (www.onetonline.org/find/bright) from each of eight different industries: computer science, human resources and management, finance, mechanical engineering, higher education, medicine, nursing, and data science. This resulted in 32 different job titles; for details on each job title selected, see Appendix A. Next, for each of the 32 job titles, we selected job advertisements from companies listed on the Fortune 500 2020 list (www.fortune.com/fortune500). We initially also searched for job advertisements from companies on the Fortune 50 Best Small Workplaces list, but quickly discovered that these job titles were scarce at small companies; thus, data collection focused on companies on the Fortune 500 list. For each job title, we searched through the list of companies (in random order) to find job advertisements that matched the job title or any of the alternative job titles listed on the O*NET job analysis page. For example, the “Computer Systems Analyst” job (O*NET ID 15-1211.00; https://www.onetonline.org/link/details/15-1211.00) includes the following alternative job titles: Applications Analyst, Computer Systems Consultant, IT Analyst, and Systems Analyst. On each company’s website, we started by searching “Computer Systems Analyst” and then saved job ads that matched either that job title or any of the alternative job titles.
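
Although matching was performed by hand, the inclusion rule can be expressed compactly in code. The following is a minimal sketch, not part of our actual procedure; the function and variable names are ours.

```python
def normalize(title: str) -> str:
    """Lowercase and strip punctuation so that title comparisons are lenient."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace()).strip()

def matches_onet_title(posting_title: str, onet_title: str, alt_titles: list[str]) -> bool:
    """A posting qualifies if its title equals the O*NET title or any alternative title."""
    accepted = {normalize(t) for t in [onet_title, *alt_titles]}
    return normalize(posting_title) in accepted

# The Computer Systems Analyst example from the text (O*NET ID 15-1211.00):
alts = ["Applications Analyst", "Computer Systems Consultant", "IT Analyst", "Systems Analyst"]
print(matches_onet_title("IT Analyst", "Computer Systems Analyst", alts))  # True
```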

We looked for the job ads on the Jobs or Careers page of each company’s website. We only downloaded ads directly from company websites (as opposed to third-party recruiting websites or job ad databases such as Indeed, which tend to include outdated or unverified jobs). We then downloaded a copy of the text of the job advertisement, extracting specifically the “responsibilities” section (i.e., tasks that the candidate would perform in the job) and the “requirements” section (i.e., knowledge, skills, abilities, and other characteristics [KSAOs] that the candidate should possess). We also saved information, where reported, about the job location (city, state), salary, company size in terms of total number of employees, and other text in the job advertisement that did not belong to the “responsibilities” or “requirements” section. We stopped searching for job advertisements after either (a) collecting 50 job advertisements for a given job title or (b) exhausting our list of companies. Because some job titles had a limited number of openings at the time of data collection (Summer 2021), the number of job ads found per job title ranged from 10 to 52, for a total sample of 990 downloaded job advertisements (an average of 31 job ads per job title).
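
Our extraction of these sections was likewise manual, but researchers scaling this procedure up could automate the sectioning step. The sketch below splits raw ad text on common section headings; the heading variants are assumptions on our part, since real ads label these sections inconsistently.

```python
import re

# Assumed heading variants; real job ads label sections inconsistently.
HEADING = re.compile(r"(responsibilities|requirements|preferred qualifications)\s*:?", re.IGNORECASE)

def split_sections(ad_text: str) -> dict[str, str]:
    """Group each line of a raw job ad under the most recent section heading."""
    sections: dict[str, list[str]] = {"other": []}
    current = "other"
    for line in ad_text.splitlines():
        match = HEADING.fullmatch(line.strip())
        if match:
            current = match.group(1).lower()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```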

2.5 Materials/Survey instruments

N/A.

2.6 Quality Control

N/A.

2.7 Data anonymisation and ethical issues

The George Mason University IRB reviewed and approved this study. Because the data we are submitting to this journal were publicly available online (i.e., public job ads), they do not need to be anonymized.

2.8 Existing use of data

Zhou, S., McEachern, P. J., Aitken, J. A., & Lee, P. (2022, April 28–30). Are we attracting the right candidates? A text analysis approach to understanding the applicability of O*NET in job advertising. In Zhou, S. (Co-Chair), McChesney, J. E. (Co-Chair), & Hoff, K. A. (Co-Chair), Putting the O*NET into good use: A critical evaluation of the use and misuse of O*NET [Symposium].

3. Dataset description and access

3.1 Repository location

https://doi.org/10.17605/OSF.IO/PFHX3.

3.2 Object/file name

JobAdsData2022_OSF.xlsx.

3.3 Data type

Primary Data.

3.4 Format names and versions

Excel (.xlsx).

3.5 Language

American English.

3.6 License

N/A (no license selected on OSF).

3.7 Limits to sharing

N/A.

3.8 Publication date

15/05/2022.

3.9 FAIR data/Codebook

Our data conform to FAIR guidelines. First, the data are easily findable; the metadata in the Excel file include relevant information such as title, tags, and hyperlink. Second, the data are open and freely accessible via an OSF permalink with a DOI. Third, the data are interoperable: they are formatted as an Excel file and can be loaded into statistical analysis software and combined with other datasets. Finally, the data are reusable, as they contain raw text and metadata variables that future researchers can analyze and extend.

The primary dataset (on the sheet “Data_Master”) contains one row per job advertisement (990 in total) and the following variables. id is a unique identifier for each advertisement. onet_id is the identifier from the O*NET database for the job title (see the second sheet, “ONET_Job_Titles”, for details on each job title). job_title is the actual job title used in the job advertisement. company is the name of the company that posted the job ad. job_location is the city and state in which the job is located. salary is the posted salary for the job (most job ads did not post salaries). responsibilities_text is the raw text describing the “Responsibilities” associated with the position. requirements_text is the raw text describing the “Requirements” of applicants to the position. preferred_text is the raw text describing any requirements for the position that were explicitly stated as “preferred” or “nice to have” in the job ad. company_desc is the raw text of any language found in the job ad that describes the overall company as opposed to the position.

The dataset also includes a sheet titled “ONET_Job_Titles”, which lists the 32 job titles included in our sample. As on the first sheet, onet_id is the identifier from the O*NET database for the job title. title is the formal job title according to the O*NET. alt_titles lists alternative titles for the job, as given on the O*NET. onet_zone is the job zone as indicated by the O*NET (see https://www.onetonline.org/help/online/zones), which ranges from 1 (occupations that need little to no preparation) to 5 (occupations that need extensive preparation). As a note, because we only selected “bright outlook” job titles as described in section 2.4 above, no jobs from zone 1 were included in our dataset. Finally, industry is the industry that the job belongs to, out of eight options: computer science, human resources and management, finance, mechanical engineering, higher education, medicine, nursing, and data science.
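
Because both sheets share the onet_id key, a few lines of code suffice to load and join them. The sketch below assumes Python with pandas (and an Excel engine such as openpyxl) installed, with the data file in the working directory.

```python
import pandas as pd

# Load both sheets of the OSF file (file and sheet names as documented above).
ads = pd.read_excel("JobAdsData2022_OSF.xlsx", sheet_name="Data_Master")
titles = pd.read_excel("JobAdsData2022_OSF.xlsx", sheet_name="ONET_Job_Titles")

# Attach each ad's O*NET job-title metadata via the shared key.
merged = ads.merge(titles, on="onet_id", how="left")
print(merged[["id", "job_title", "title", "industry", "onet_zone"]].head())
```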

4. Reuse potential

Presently, the data are being used in a research project comparing the “responsibilities_text” column to the corresponding empirically validated task statements in the O*NET database via natural language processing, using the Word Mover’s Distance algorithm (Kusner et al., 2015). The text is also being shown to potential job applicants in an experimental study to assess their reactions. The theory is that job advertisements whose textual content is more closely related to the empirically validated task statements in the O*NET will be rated as more “accurate” by applicants, who will thus be more likely to apply. The study is also examining vocational interests and gender biases using the same text.
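
For readers interested in the core comparison, the sketch below shows how Word Mover’s Distance can be computed with gensim; this is an illustration, not our exact pipeline. The pretrained embedding file and the two token lists are placeholders, and gensim’s wmdistance additionally requires an optimal-transport backend (POT in recent gensim versions).

```python
from gensim.models import KeyedVectors

# Pretrained word2vec embeddings (file name is a placeholder).
vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# Tokenized "responsibilities" text from an ad and an O*NET-style task
# statement (both illustrative); lower distance means more similar content.
ad_tokens = "analyze business requirements and design information systems".split()
task_tokens = "analyze science engineering business and other data processing problems".split()

print(f"Word Mover's Distance: {vectors.wmdistance(ad_tokens, task_tokens):.3f}")
```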

The “responsibilities_text” column can support substantially more research projects. For example, we are aware of other scholars building natural language processing models that predict an appropriate job tag for each job based on the advertisement. Such a model would be useful both for researchers seeking to train models on our dataset and then test them on new job ads, and for practitioners seeking to improve how job ads are written. As another example, the “responsibilities_text” column may include language on specific skills or tasks that vary in difficulty. Future research could build natural language processing models to identify the difficulty of responsibilities and assess the correct “zone” of the job (i.e., entry-level, manager, director). Finally, a recently published paper by Putka and colleagues (2022) demonstrates a new machine learning algorithm that can extract interest ratings from job analysis text. All of these, and more, are examples of natural language processing techniques that can be applied to these text data.
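
As a hedged illustration of such a tagging model, the sketch below fits a simple TF-IDF plus logistic-regression baseline that predicts onet_id from responsibilities_text, reusing the merged dataframe from the loading example above; the split and model choices are ours, not those of the scholars mentioned.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Baseline job-tag classifier: predict the O*NET job title from the
# responsibilities text. With roughly 31 ads per title, treat the held-out
# accuracy as a rough signal only; serious work would cross-validate.
X = merged["responsibilities_text"].fillna("")
y = merged["onet_id"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(f"Held-out tagging accuracy: {model.score(X_test, y_test):.2f}")
```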

There are many other columns in the dataset that have not yet been analyzed. For example, the dataset includes the exact company that posted each job ad. Researchers could pull data on these companies (e.g., size, revenue, industry) to explore how job ads differ as a function of the company. We would expect, for instance, that larger companies, which should have more resources to devote to industry best practices in recruiting, would have more “accurate” job ads. Additionally, the job_location column could be used to examine whether attributes of the city (e.g., size, distance to a major metropolitan area) are a factor.

In terms of the textual content, the requirements_text and preferred_text columns invite substantial further analysis. Researchers should examine the degree to which job ads impose requirements that are relevant to the role. For example, a job ad for an entry-level assistant position should not include requirements such as “3–5 years of experience” or “expertise in business strategy”. Natural language processing models can be built to explore the requirements text, both basic and preferred, and examine the accuracy of job advertisements through these lenses. As another example, there is a prevailing argument that “companies care more about years of experience than education.” Our dataset can be used to test this assumption empirically, by extracting expectations for experience and education from the requirements text and comparing the two. Finally, the company_desc column can be used to examine how job ads describe the company overall. Combined with company-level data, this could reveal the degree to which job ad text uses positive-sentiment vocabulary (e.g., via the AFINN or NRC sentiment lexicons), compared to the vocabulary found in public company reviews (e.g., on Glassdoor).
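
As a starting point for the experience-versus-education comparison, the sketch below pulls years-of-experience figures and degree mentions from requirements_text with simple regular expressions; the patterns are illustrative and will miss many real-world phrasings.

```python
import re

# Illustrative patterns; real requirements text is phrased in many more ways.
YEARS = re.compile(r"(\d+)\s*(?:-|–|to)?\s*\d*\+?\s*years?", re.IGNORECASE)
DEGREES = re.compile(r"\b(bachelor|master|ph\.?d|doctorate|associate)\b", re.IGNORECASE)

def extract_requirements(text: str) -> dict:
    """Pull the minimum years of experience and any degree mentions from raw text."""
    years = [int(m.group(1)) for m in YEARS.finditer(text)]
    degrees = {m.group(1).lower() for m in DEGREES.finditer(text)}
    return {"min_years": min(years) if years else None, "degrees": sorted(degrees)}

print(extract_requirements("3-5 years of experience and a Bachelor's degree required."))
# {'min_years': 3, 'degrees': ['bachelor']}
```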

Beyond these natural language processing models, traditional statistical analyses could also be applied to these data. For example, the data have a multilevel structure, such that individual job ads (990 ads) are nested within job titles (32 job titles). Multilevel modeling could be employed to investigate characteristics of job titles (Level 2) as predictors of differences between job ads (Level 1). Alternatively, response surface analysis (Shanock et al., 2010) may be an appropriate method for testing the similarity between job ads of the same job title, or between job ads and O*NET job analyses.
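
In Python, such a random-intercept model could be fit with statsmodels, as sketched below; accuracy_score is a hypothetical ad-level outcome (e.g., a rating derived from the analyses above), not a column in the released dataset.

```python
import statsmodels.formula.api as smf

# Random-intercept model: job ads (Level 1) nested within O*NET job titles
# (Level 2), with the Level-2 job zone as a predictor. 'accuracy_score' is
# hypothetical and must be added to the merged dataframe before fitting.
model = smf.mixedlm("accuracy_score ~ onet_zone", data=merged, groups=merged["onet_id"])
print(model.fit().summary())
```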

These examples are just the beginning of a vast amount of research that can be conducted on this dataset. The goal of making this dataset public and citable is to encourage researchers to further explore this important question of the degree to which job advertisements are accurate, and why or why not. Such findings are important both for research on the science of recruiting and for practitioners. Actual company recruiters could use these findings to improve their own work on the job, and everyday workers could use these findings to learn how to better evaluate job advertisements.