Validity Evidence


Types of Validity Evidence

Content Validity

Content validity addresses the match between test questions and the content or subject area they are intended to assess. This concept of match is sometimes referred to as alignment, while the content or subject area of the test may be referred to as a performance domain.

Experts in a given performance domain generally judge content validity. For example, the content of the SAT Subject Tests is evaluated by committees made up of experts who ensure that each test covers content that matches all relevant subject matter in its academic discipline. Both face validity and curricular validity studies may be used to establish the content validity of a test.

Face Validity refers to the extent to which a test or the questions on a test appear to measure a particular construct as viewed by laypersons, clients, examinees, test users, the public, or other stakeholders. In other words, the test looks like a reasonable test for whatever purpose it is being used. This common-sense approach to validity is often important in convincing laypersons to allow the use of a test, regardless of the availability of more scientific means.

Content-related evidence of validity comes from the judgments of people who are either experts in the testing of that particular content area or are content experts.

Because these two groups may approach a test from different perspectives, it is important to recognize the valuable contributions made by both.

Curricular Validity is the extent to which the content of the test matches the objectives of a specific curriculum as it is formally described.

Curricular validity takes on particular importance in situations where tests are used for high-stakes decisions, such as state high school exit examinations. In these situations, curricular validity means that a test used to decide whether a student receives a high school diploma should measure the curriculum the student was taught in high school.

Curricular validity is evaluated by groups of curriculum/content experts. The experts are asked to judge whether the content of the test is parallel to the curriculum objectives and whether the test and curricular emphases are in proper balance.
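Once the experts have mapped each test item to a curriculum objective, the balance judgment described above can be summarized numerically. The Python sketch below is a minimal illustration, not an official procedure: the objective names, the intended curricular weights, and the 0.05 tolerance are all hypothetical, and the item-to-objective mapping is assumed to come from the expert panel.

```python
from collections import Counter

def emphasis_gaps(item_objectives, curriculum_weights, tolerance=0.05):
    """Flag objectives whose share of test items differs from the intended
    curricular weight by more than `tolerance` (proportions, not percent).

    Objectives that appear on the test but not in the curriculum are also
    flagged, with their full share of items as the gap.
    """
    counts = Counter(item_objectives)
    total = len(item_objectives)
    gaps = {}
    # Objectives the curriculum calls for: compare test share to intended weight.
    for objective, weight in curriculum_weights.items():
        gap = counts.get(objective, 0) / total - weight
        if abs(gap) > tolerance:
            gaps[objective] = round(gap, 3)
    # Items aligned to objectives that are not part of the curriculum.
    for objective, count in counts.items():
        if objective not in curriculum_weights:
            gaps[objective] = round(count / total, 3)
    return gaps

# Hypothetical expert mapping of a 20-item test to curriculum objectives.
items = ["linear"] * 8 + ["quadratic"] * 4 + ["functions"] * 6 + ["geometry"] * 2
weights = {"linear": 0.4, "quadratic": 0.3, "functions": 0.3}
result = emphasis_gaps(items, weights)
print(result)  # {'quadratic': -0.1, 'geometry': 0.1}
```

A flagged gap is only a prompt for the experts to look again; whether a given imbalance matters is still their judgment.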

Criterion-Related Validity

Criterion-related validity looks at the relationship between a test score and an outcome. For example, SAT™ scores are used to help predict whether a student will be successful in college, with first-year grade point average serving as the criterion for success. Examining the relationship between test scores and the criterion shows how well the test supports conclusions about success in college. The criterion can be any measure of success for the behavior of interest; in the case of a placement test, for instance, the criterion might be grades in the course.

A criterion-related validation study is completed by collecting both the test scores that will be used and information on the criterion for the same students (e.g., SAT scores and first-year grade point average, or AP® Calculus exam grades and performance in the next level college calculus course). The test scores are then correlated with the criterion to determine how strongly they are related to the criterion behavior.
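The correlation step can be sketched in Python as follows. The function computes the Pearson product-moment correlation, the usual validity coefficient; the five score/GPA pairs are made up for illustration, and a real study would use a much larger sample of students.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: test scores and first-year GPA for the same five students.
scores = [1050, 1180, 1240, 1320, 1410]
fygpa = [2.6, 2.9, 3.1, 3.0, 3.6]
r = pearson_r(scores, fygpa)
print(f"validity coefficient r = {r:.2f}")
```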

A criterion-related validation study can be either predictive of later behavior or a concurrent measure of behavior or knowledge.

Predictive validity refers to the "power" or usefulness of test scores to predict future performance.

Examples of such future performance may include academic success (or failure) in a particular course, good driving performance (if the test was a driver's exam), or aviation performance (predicted from a comprehensive piloting exam). Establishing predictive validity is particularly useful when colleges or universities use standardized test scores as part of their admission criteria for enrollment or for admittance into a particular program.
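When scores are used to predict a later outcome, the correlation is usually accompanied by a prediction equation. The sketch below fits an ordinary least-squares line predicting first-year GPA from a single test score; the data are hypothetical, and actual admission studies often combine several predictors (for example, test scores and high school GPA) in a multiple regression.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical admitted students: test score and first-year GPA.
scores = [1050, 1180, 1240, 1320, 1410]
fygpa = [2.6, 2.9, 3.1, 3.0, 3.6]
intercept, slope = fit_line(scores, fygpa)
# Predicted first-year GPA for a new applicant who scores 1300.
predicted = intercept + slope * 1300
print(f"predicted FYGPA at 1300: {predicted:.2f}")
```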

The same placement study used to determine the predictive validity of a test can be used to determine an optimal cut score for the test.
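One simple way to pick a cut score from placement-study data is to try each observed score as the cut and keep the one that produces the most correct placement decisions. The sketch below assumes every student in the study took the course, so success (a hypothetical criterion such as a grade of B or higher) is known for everyone; the scores and outcomes are invented. Operational studies often use other methods, such as logistic regression on the probability of success, so treat this only as an illustration of the idea.

```python
def best_cut_score(scores, succeeded):
    """Return (cut, accuracy) maximizing the share of correct placement decisions.

    A decision counts as correct when a student at or above the cut succeeded,
    or a student below the cut did not. Ties go to the lowest cut score.
    """
    if len(scores) != len(succeeded) or not scores:
        raise ValueError("need equal-length, non-empty inputs")
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(scores)):
        correct = sum((s >= cut) == ok for s, ok in zip(scores, succeeded))
        acc = correct / len(scores)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

# Hypothetical placement-test scores and course success for ten students.
scores = [35, 42, 48, 51, 55, 60, 63, 70, 74, 81]
succeeded = [False, False, False, True, False, True, True, True, True, True]
print(best_cut_score(scores, succeeded))  # (51, 0.9)
```

In this made-up example, cuts of 51 and 60 both classify 9 of 10 students correctly, so choosing between them is a policy decision about which kind of misplacement is more costly.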


Concurrent Validity needs to be examined whenever one measure is substituted for another, such as allowing students to pass a test instead of taking a course. Concurrent validity is assessed when test scores and criterion measurements are obtained at the same time (concurrently) or close together in time. For example, a successful score on the CLEP® College Algebra exam may be used in place of taking a college algebra course. To determine concurrent validity, students completing a college algebra course are administered the CLEP College Algebra exam. If there is a strong relationship (correlation) between CLEP exam scores and college algebra course grades, that is evidence that the test is valid for that use.
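Course grades are ordinal (an A is higher than a B, but the distance between grades is not a fixed number of points), so one reasonable choice for the concurrent correlation is Spearman's rank correlation rather than Pearson's. The sketch below uses a hypothetical grade-to-points mapping and made-up CLEP scores, ranks both variables (averaging the ranks of ties), and correlates the ranks. Whether to use Spearman or Pearson here is an analyst's choice, not something the procedure above prescribes.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical grade coding and data for six students who took the course.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
clep_scores = [42, 50, 55, 61, 66, 72]
course_grades = ["D", "C", "C", "B", "A", "A"]
rho = spearman_rho(clep_scores, [GRADE_POINTS[g] for g in course_grades])
print(f"Spearman rho = {rho:.2f}")
```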


Construct Validity

Construct validity refers to the degree to which a test or other measure assesses the underlying theoretical construct it is supposed to measure (i.e., the test is measuring what it is purported to measure).

As an example, think about a general knowledge test of basic algebra. If a test is designed to assess knowledge of facts concerning rate, time, distance, and their interrelationship with one another, but test questions are phrased in long and complex reading passages, then perhaps reading skills are inadvertently being measured instead of factual knowledge of basic algebra.

Construct validation requires the compilation of multiple sources of evidence. In order to demonstrate construct validity, evidence that the test measures what it purports to measure (in this case basic algebra) as well as evidence that the test does not measure irrelevant attributes (reading ability) are both required. These are referred to as convergent and discriminant validity.

Convergent validity consists of providing evidence that two tests that are believed to measure closely related skills or types of knowledge correlate strongly. That is to say, the two different tests end up ranking students similarly.

Discriminant validity, by the same logic, consists of providing evidence that two tests that do not measure closely related skills or types of knowledge do not correlate strongly (i.e., dissimilar ranking of students).

Both convergent and discriminant validity provide important evidence in the case of construct validity. As noted previously, a test of basic algebra should primarily measure algebra-related constructs and not reading constructs. In order to determine the construct validity of a particular algebra test, one would need to demonstrate that the correlations of scores on that test with scores on other algebra tests are higher than its correlations with scores on reading tests.

Consequential Validity

Some testing experts use consequential validity to refer to the social consequences of using a particular test for a particular purpose. The use of a test is said to have consequential validity to the extent that society benefits from that use of the test. Other testing experts believe that the social consequences of using a test—however important they may be—are not properly part of the concept of validity.

Messick (1988) makes the point that ". . . it is not that adverse social consequences of test use render the use invalid but, rather, that adverse social consequences should not be attributable to any source of test invalidity such as construct-irrelevant variance."

For example, suppose some subgroups obtain lower scores on a mathematics placement test and, consequently, are required to take developmental courses. According to Messick, this outcome alone does not render the test scores invalid. However, suppose it were determined that the test was measuring different traits for a particular subgroup than for the larger group, and that those traits were not important for doing the required mathematics. One could then conclude that the adverse social consequences (e.g., more subgroup members in developmental mathematics courses) were caused by using the test scores and were traceable to sources of invalidity, and the validity of the test use (course placement) would be jeopardized.