The Validity of the NACE Competency Assessment Tool

November 19, 2024 | By Joshua Kahn


NACE Journal / Fall 2024

In August 2024, NACE released the NACE Competency Assessment Tool to provide higher education professionals and early talent recruiters with the means to assess students, interns, and recent graduates on the eight NACE Career Readiness Competencies. Prior to its release, NACE’s research team engaged in more than two years of development and field testing of the tool to ensure its validity.

This article provides an overview of the evidence for the reliability and validity of the tool for the practitioner community in the early career talent recruiting space. It offers a brief context for the assessment tool, outlines the methodology employed, and provides the final, top-line results of the study. The full slate of more detailed results will be available through the forthcoming, in-depth technical report.

The Development of the Career Readiness Competencies

In 2015, NACE released its initial Career Readiness Competencies, which were developed by a task force of college career services and HR/recruiting professionals. The task force was charged with defining career readiness for the college educated.

Based on a review of the scholarly literature and informed by their practice, the task force identified a set of broad, aspirational competencies that were essential for 21st century graduates. These competencies were later refined with the intent of making them less aspirational and more observable. Subsequent work added evidence-based, observable behaviors to the competencies.1

Building on these behaviors, which were published in 2021, the next logical step in the evolution of the work was to develop an assessment tool so that the competencies could be measured reliably and validly.

A Review of the Literature

When surveying the field for existing measures of career readiness, NACE staff found surprisingly few valid and reliable instruments—and even fewer that were low or no cost. Moreover, we found no currently available assessments that are transparent, developmental and educational in nature, and intended to be student-facing.

Of the available instruments, ACT offers assessments that are valid and reliable.2 However, those instruments almost exclusively assess “hard” or technical skills—math reasoning and reading comprehension, for example—rather than “soft” skills, which are both highly desired and difficult to teach. In addition, the assessments do not support student learning in the way the rubric approach of the NACE Competency Assessment Tool does.

The American Association of Colleges and Universities (AAC&U) took a different approach from ACT by developing the VALUE rubrics, which provide faculty with a transparent way to score complex student work products.3 The VALUE rubrics overlap with several of the NACE Career Readiness Competencies but focus more on academic competencies that are central to higher education’s mission, such as critical thinking, integrative learning, and civic engagement. To assess the VALUE rubrics’ reliability, AAC&U used a methodology similar to the one NACE used in the current study: it created realistic student work products (e.g., student essays and homework assignments) that faculty, the intended users, would assess using the appropriate VALUE rubric. AAC&U’s reliability studies found that raters demonstrated perfect agreement on the critical thinking rubric 36% of the time, with a kappa of .29. Civic engagement produced 32% agreement and a kappa of .15, and integrative learning demonstrated 28% agreement and a kappa of .11. As AAC&U noted, these numbers are on the lower side, but they were generated by a diverse set of faculty from different disciplines, campuses, and school types, and the results provided grounds for future studies.

A Tool to Assess Career Readiness in Students—and Point the Way to Competency Development

The NACE Competency Assessment Tool, which includes eight assessments—one for each of the eight competencies—enables higher education and talent acquisition professionals to assess student proficiency across the eight NACE Career Readiness Competencies with an eye toward providing actionable feedback to further develop the individual’s career readiness. For example:

  • Students can use any of the eight assessments to rate themselves on their proficiency in the specific competency. They can use the assessments to consider how specific experiences, such as on-campus work experiences, practica, and internships, influenced the development of their career readiness across the elements of the assessments.
  • Career staff can use the assessments to center conversations on what is and will be expected of students in the workplace. These discussions can now move beyond the specific evidence-based behaviors to more general and abstract descriptions of career readiness in the workplace.
  • Faculty can use the assessments in their courses, both to aid students in the development of their career identity and as an assessment tool. Some faculty who participated in the field testing expressed a desire to use the assessments in a pretest/posttest manner, administering an assessment at the beginning of their course and then again toward the end to see how much growth students derived from the course.
  • Employers can use the assessments similarly to both career staff and faculty. Interestingly and anecdotally, when the assessments were shown to approximately one dozen internship supervisors, several said the assessments opened their eyes to a different way of evaluating their interns, a process that, they explained, had traditionally focused on interns’ technical skills. The assessments showed them a new way to think about what interns can learn from their internships.

Using the Rubric Format

The tool’s assessments are formatted as rubrics, the benefits of which are many. Rubrics are especially useful for making learning goals and performance targets transparent to the learner because they provide descriptions of the different levels of performance rather than a simple frequency or Likert-type scale.4,5 Rubrics are inherently educational in this way, and they make complex competencies simpler and more transparent by breaking them down into more manageable components. Moreover, the rubrics make even clearer to employers what competencies to expect from recent graduates and how those competencies are operationalized. Finally, rubrics that have been validated and accepted by higher education and the early career recruiting space will lead to acceptance of benchmark standards moving forward.6

Developing and Validating the Competency Assessment Tool

The NACE Competency Assessment Tool was developed iteratively over several phases spanning more than two years, which included more than 15 months of field testing with more than 600 volunteers—including students. As part of that development, the NACE research team used a methodology to evaluate the eight competency assessments for content validity, usability, reliability, and discriminant validity.

Developing the Tool’s Assessments

As is typical in measure development and validation, we began the process by drafting the assessments; the drafts were written by a dean of assessment contracted by NACE for that purpose.

It is important to note that, in 2022, prior to the development of the tool and its assessments, evidence-based behaviors were added to each competency.7 Building on that work, each draft assessment incorporated and relied on the evidence-based behaviors, but the performance descriptors were intentionally written to be slightly more general and abstract because there are more ways to demonstrate each dimension than the list of specific behaviors suggests. Additionally, when applicable, the specific behaviors were embedded as examples in the performance descriptors.

The 2022-23 NACE Career Readiness Task Force reviewed the draft assessments and provided feedback. Early feedback was also provided by practitioners taking part in a session at NACE’s 2022 annual conference. The feedback was incorporated into the next iteration of the assessments.

As a result of this process, one assessment was created for each competency. Each assessment contains three to four dimensions, or sub-skills, which together make up the competency and relate directly to some part of its definition. For example, the Communication competency comprises four dimensions: verbal communication, written communication, nonverbal communication, and active listening. These dimensions were determined by analyzing the evidence-based behaviors and grouping them into like categories; those categories then became the dimensions. Where enough behaviors did not exist, the definitions of the competencies were used to derive the appropriate dimensions. Each dimension is assessed along a 4-point scale: Emerging Knowledge (1), Understanding (2), Early Application (3), and Advanced Application (4). Performance descriptors were written to illustrate what performance looks like at each level.
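
For readers who find a concrete representation helpful, the following minimal sketch shows one way the structure described above could be modeled in code. It is illustrative only, not NACE’s implementation; the class names and empty descriptor fields are hypothetical, and the dimensions shown are those named for the Communication competency.

```python
# Minimal illustrative sketch (not NACE's implementation) of the structure
# described above: one assessment per competency, three to four dimensions,
# and a shared 4-point scale with performance descriptors at each level.
from dataclasses import dataclass, field

SCALE = {
    1: "Emerging Knowledge",
    2: "Understanding",
    3: "Early Application",
    4: "Advanced Application",
}

@dataclass
class Dimension:
    name: str
    # Performance descriptor for each scale level (1-4); text omitted here.
    descriptors: dict = field(default_factory=dict)

@dataclass
class Assessment:
    competency: str
    dimensions: list

# The Communication competency's four dimensions, as named in the article.
communication = Assessment(
    competency="Communication",
    dimensions=[
        Dimension("Verbal communication"),
        Dimension("Written communication"),
        Dimension("Nonverbal communication"),
        Dimension("Active listening"),
    ],
)
```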

Content Validity and Usability Methods

With completed drafts in place, NACE began the validation phase. This involved surveying experts in the field—career services practitioners, in this case—to ascertain whether the content of each assessment is appropriate and covers the depth and breadth of the specific competency, and to determine the usability of the assessments.

Typically, content validity is determined by experts in the field, including practitioners.8 Generally, at least 80% of experts should agree that the content is appropriate for an assessment to achieve content validity (a simple illustration of this criterion follows the list of questions below). In addition to gathering qualitative feedback, we surveyed 373 career services practitioners from 81 colleges and universities through two rounds of data collection on the following quantitative content validity questions:

  • To what extent do the dimensions reflect the definition?
  • To what extent do the dimensions cover the range of the competency?
  • To what extent do the performance descriptors cover the range of each dimension?
  • To what extent is each dimension essential?
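
As a concrete, hypothetical illustration of the 80% criterion mentioned above, the short sketch below computes the share of experts who judged an item appropriate and checks it against the threshold. The data and function name are invented for illustration; they are not NACE’s survey instrument or analysis code.

```python
# Hypothetical illustration of the 80% content-validity criterion described
# above; the ratings and function are invented, not NACE's analysis code.
def content_validity_rate(responses):
    """Share of experts who judged the content appropriate (0.0 to 1.0)."""
    return sum(responses) / len(responses)

# Hypothetical yes/no judgments from ten experts on one dimension.
ratings = [True, True, True, False, True, True, True, True, True, False]
rate = content_validity_rate(ratings)
print(f"{rate:.0%} agreement -> {'meets' if rate >= 0.80 else 'falls below'} the 80% criterion")
```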

Usability considers whether the assessments can actually be used in a practical way by the intended users. In our work, the intended users included career development professionals; recruiters and other employer-based professionals, such as internship supervisors; and students. (Note: More than 100 students over two rounds were surveyed on the usability of the assessments separately from the practitioners, but the surveys went out during the same time windows.)

We surveyed both career services practitioners and students with these questions:

  • To what extent are the instructions clear and easy to follow?
  • To what extent is the tool easy to use?
  • To what extent is the language student friendly?
  • To what extent are the dimension titles written clearly and easy to understand?
  • To what extent are the performance descriptors written clearly and easy to understand?
  • To what extent is the level of detail appropriate?

The results of the first round of these surveys were promising but did not achieve adequate content validity and usability standards. Consequently, substantial revisions were made—the most significant involved simplifying the language to make it more student friendly—and the surveys were sent back out to the practitioners and students.

Career services practitioner sample composition

Content Validity and Usability Results

After the second round of testing with practitioners, all eight assessments achieved content validity and usability. Across all four aspects of content validity that were assessed, every assessment scored above 80%, and about half scored above 90%. Therefore, we can say with confidence that the instruments have demonstrated content validity.

Similarly, on usability for practitioners, all assessments scored above 80% on questions one through four; all eight scored an average of 48.3 on question five, where a score of 0 was “too little,” 100 was “too much,” and 50 was “just right.”

For the second round with students, however, seven instances (out of 32) fell slightly under 80%, even though they were above 80% in the first round. With the smaller second-round sample (n=22), a single respondent shifts a metric by roughly 4.5 percentage points; falling under 80% came down to one person in six of the seven instances, and two people in the seventh. Furthermore, averaging all the metrics for each of the eight rubrics yields averages above 80% in every case. For these reasons, the NACE research team was not dissuaded from moving forward despite a few metrics falling slightly under 80% on a few competencies. (Please see the forthcoming detailed technical report for these figures.)

Reliability and Discriminant Validity Methods

Once content validity and usability were demonstrated, the NACE research team moved on to the next phase of the project: evaluating the reliability and discriminant validity of the assessments. Reliability refers to the stability of scores on an assessment. Technically, reliability is the overall consistency of a measure. A measure has high levels of reliability when it produces similar results under similar conditions.9 In this case, reliability measures the extent to which experts agree on the ratings, and high levels of agreement mean that experts view the performance similarly. In other words, the scores would be stable, i.e., reliable, regardless of who rates the performance.

To conduct these analyses, NACE used a methodology in which experts would rate common student performances, so their levels of agreement could be calculated. This methodology required writing short vignettes of higher- and lower-skilled performances that allowed multiple experts to rate the same student performance. (Please see the forthcoming technical report for details on how these vignettes were written and validated.)

Results from the first round of testing showed that many of the instruments were reliable or approaching reliability. Further revisions were made based on the results, and the instruments were sent back for a second round of testing—conducted at NACE’s 2024 annual conference—which showed that the assessments had achieved varying levels of reliability.

Using the higher- and lower-skill performance vignettes, NACE was able to evaluate the instruments’ discriminant validity, a form of validity that evaluates the extent to which an assessment is sensitive enough to distinguish, or discriminate, between higher- and lower-skilled performances. In this case, if mean differences on the total scores of the assessment are statistically significant, then the assessment is deemed to demonstrate this type of validity.
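
As a sketch of what that comparison might look like (with invented data, and without claiming to reproduce NACE’s analysis), the example below runs an independent-samples test on total rubric scores from hypothetical higher- and lower-skill vignette ratings. Welch’s test, which does not assume equal variances, is one reasonable choice; the article does not specify which significance test was used.

```python
# Illustrative sketch only (not NACE's analysis): comparing total rubric scores
# for higher- vs. lower-skill vignettes with an independent-samples t-test.
from scipy import stats

# Hypothetical total scores on a four-dimension assessment (range 4-16).
higher_skill_totals = [14, 15, 13, 16, 14, 15, 13, 14]
lower_skill_totals = [7, 8, 6, 9, 7, 8, 7, 6]

# Welch's t-test; a statistically significant mean difference supports
# discriminant validity as described above.
t_stat, p_value = stats.ttest_ind(higher_skill_totals, lower_skill_totals, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```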

Participants in reliability and discriminant validity testing

Reliability and Discriminant Validity Results

NACE used three metrics of reliability to assess each rubric and each dimension: 1) simple percent agreement, which calculates the percentage of ratings on which raters agreed (on a scale from 0% to 100%); 2) Fleiss’ kappa, a chance-corrected measure of agreement for more than two raters, used because some agreement could occur by random chance (on a scale from -1 to +1); and 3) the intraclass correlation (ICC), another standard way to assess agreement among raters (on a scale from -1 to +1).
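
To make these metrics concrete, the sketch below computes simple percent agreement (operationalized here as pairwise agreement, one common approach) and Fleiss’ kappa for a small, invented matrix of vignette ratings. It is an illustration under those assumptions, not NACE’s analysis code; the ICC is omitted for brevity but can be computed from the same matrix with standard statistical packages.

```python
# Illustrative sketch (not NACE's analysis): agreement metrics for a
# hypothetical matrix in which each row is one vignette and each column is
# one rater's score on the 4-point scale.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.array([   # 5 vignettes x 4 raters, hypothetical scores
    [4, 4, 3, 4],
    [2, 2, 2, 1],
    [3, 3, 3, 3],
    [1, 1, 2, 1],
    [4, 3, 4, 4],
])

def percent_agreement(mat):
    """Share of rater pairs (within each vignette) that gave identical scores."""
    agree, total = 0, 0
    for row in mat:
        for i in range(len(row)):
            for j in range(i + 1, len(row)):
                agree += int(row[i] == row[j])
                total += 1
    return agree / total

counts, _ = aggregate_raters(ratings)         # vignette-by-category count table
kappa = fleiss_kappa(counts, method="fleiss")

print(f"Percent agreement: {percent_agreement(ratings):.0%}")
print(f"Fleiss' kappa: {kappa:.2f}")
```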

To help readers interpret the results shared here: agreement above 80% is generally considered adequate, but that threshold should be interpreted in light of what is being measured. For example, on a simple two-point mathematics item, agreement should approach 100%. On the other hand, when scoring a complex six-point writing constructed-response item, agreement of 60% would be considered an acceptable result.10 Given that career readiness and these eight competencies are more complex constructs, lower levels of agreement are acceptable.

For Fleiss’ kappa, commonly cited benchmarks hold that .00-.20 reflects none to slight agreement; .21-.40, fair agreement; .41-.60, moderate agreement; .61-.80, substantial agreement; and .81-1.0, almost perfect agreement.11 For the ICC, values below .50 reflect poor agreement; .51-.75, moderate agreement; .76-.90, good agreement; and above .90, excellent agreement.12
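
For convenience, the small helper below maps a kappa or ICC value onto the benchmark labels just cited. It is an optional, illustrative utility written against the cut points above, not part of the tool or the study.

```python
# Optional illustrative helpers mapping agreement statistics onto the
# benchmark labels cited above (Landis & Koch for kappa; Koo & Li for ICC).
def interpret_kappa(kappa):
    if kappa <= 0.20:
        return "none to slight agreement"
    if kappa <= 0.40:
        return "fair agreement"
    if kappa <= 0.60:
        return "moderate agreement"
    if kappa <= 0.80:
        return "substantial agreement"
    return "almost perfect agreement"

def interpret_icc(icc):
    if icc < 0.50:
        return "poor agreement"
    if icc <= 0.75:
        return "moderate agreement"
    if icc <= 0.90:
        return "good agreement"
    return "excellent agreement"

# Example: the kappa of .29 reported earlier for AAC&U's critical thinking rubric.
print(interpret_kappa(0.29))   # -> "fair agreement"
```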

Figure 1

Figure 2 provides the statistical significance and effect sizes for all eight assessments; based on these results, we can infer that all eight assessments achieved discriminant validity. Effect sizes give readers an idea of how large or small the statistically significant differences are, and Hedges’ g is the preferred effect size for describing the magnitude of these mean differences.13,14 To help readers interpret the Hedges’ g effect sizes, 0.20-0.49 reflects a small effect; 0.50-0.79, a medium effect; and 0.80 and above, a large effect.15 Given these general cut points, the effect sizes in Figure 2 are quite large.

Figure 2
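
For readers who want to see how Hedges’ g is obtained, the sketch below computes it for the same kind of hypothetical higher- versus lower-skill comparison used earlier. The data are invented, and the calculation follows the standard formula: Cohen’s d multiplied by a small-sample bias correction.

```python
# Illustrative sketch (not NACE's analysis): Hedges' g for the mean difference
# between hypothetical higher- and lower-skill vignette total scores.
import statistics as st

def hedges_g(group1, group2):
    n1, n2 = len(group1), len(group2)
    # Pooled standard deviation across the two groups.
    s_pooled = (((n1 - 1) * st.variance(group1) + (n2 - 1) * st.variance(group2))
                / (n1 + n2 - 2)) ** 0.5
    d = (st.mean(group1) - st.mean(group2)) / s_pooled   # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)             # small-sample bias correction
    return d * correction

higher = [14, 15, 13, 16, 14, 15, 13, 14]   # hypothetical total scores
lower = [7, 8, 6, 9, 7, 8, 7, 6]
print(f"Hedges' g = {hedges_g(higher, lower):.2f}")      # well above 0.80 -> large effect
```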

 

Looking Ahead

Our results indicate that the NACE Competency Assessment Tool is valid and reliable. Our results also demonstrate that the tool is usable, as rated by practitioners and students. We look forward to seeing members implement the tool with their students and plan to report on those efforts. In addition, NACE will continue to collect data to support further refinement, as appropriate, of the tool.

Endnotes

1 Please see the Technical Report: Development and Validation of the NACE Career Readiness Competencies for more detail on this process.

2 ACT (2023). WorkKeys NCRC Assessments Technical Manual. ACT. https://www.act.org/content/dam/act/unsecured/documents/ACT-workkeys-NCRC-technical-manual.pdf.

3 Finley, A. (2011). How reliable are the VALUE rubrics? Peer Review, 13/14 (4/1).

4 Andrade, H. G. (2000). Using rubrics to promote thinking and learning. Educational Leadership, 57(5), 13-18.

5 Riebe, L., & Jackson, D. (2014). The use of rubrics in benchmarking and assessing employability skills. Journal of Management Education, 38, 319-344.

6 Ibid.

7 NACE (2022). Technical Report: Development and Validation of the NACE Career Readiness Competencies. Retrieved from www.naceweb.org/uploadedFiles/files/2022/resources/2022-nace-career-readiness-development-and-validation.pdf.

8 Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5-8.

9 American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (Eds.). (2014). Standards for educational and psychological testing. American Educational Research Association.

10 National Assessment of Educational Progress (NAEP) (n.d.). Constructed-response interrater reliability. NAEP Technical Documentation. National Assessment of Educational Progress. Retrieved from https://nces.ed.gov/nationsreportcard/tdw/analysis/initial_itemscore.aspx.

11 Landis, J. R., & Koch, G. G. (1977). The Measurement of Observer Agreement for Categorical Data. Biometrics, 33, 159–174. https://doi.org/10.2307/2529310.

12 Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15, 155-163.

13 Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 1-12.

14 Taylor, J.M., & Alanazi, S. (2023). Cohen’s and Hedges’ g. Journal of Nursing Education, 62, 316-317.

15 Brydges, C. R. (2019). Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging, 3(4).

Joshua Kahn is assistant director for research and public policy at NACE. He earned his Ph.D. in educational leadership with an emphasis on quantitative research methods at the University of Oregon in 2018. Kahn can be reached at jkahn@naceweb.org.
