Completion-rate data show how many students finished the test or section of a test. The percent of students completing the test is influenced by students who may reach the final questions but choose not to answer them because of their greater difficulty. Therefore, both the percent completing the test or section of a test and the percent completing three-fourths of the test or section are evaluated to determine if a test's time limits are appropriate. In general, a test's time limits are appropriate for the intended population if virtually all of those taking the test complete 75 percent of the questions and 80 percent reach the final question.
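The rule of thumb above can be sketched in Python. This is only an illustration: the function name, the input layout (the number of the last question each student reached), and the 0.99 cutoff standing in for "virtually all" are assumptions, not values from the testing program.

```python
def time_limits_appropriate(last_reached, n_questions, virtually_all=0.99):
    """Apply the completion-rate rule of thumb to one test or section.

    last_reached: for each student, the number of the last question reached.
    virtually_all: assumed cutoff for "virtually all" (not given in the text).
    """
    n_students = len(last_reached)
    if n_students == 0:
        raise ValueError("need at least one student")

    # Share of students who got through at least three-fourths of the questions.
    three_quarters = 0.75 * n_questions
    share_75 = sum(r >= three_quarters for r in last_reached) / n_students

    # Share of students who reached the final question.
    share_final = sum(r >= n_questions for r in last_reached) / n_students

    return share_75 >= virtually_all and share_final >= 0.80
```

For a 20-question section in which 8 of 10 students reach question 20 and the other 2 reach question 16, both conditions hold and the function returns True.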
After each new version of the SAT is equated (see Equating), a table is used to convert the raw scores on a particular version of the test to the 200-to-800 College Board scale. Each conversion table is slightly different from any other conversion table because the difficulty levels of any two versions are never exactly the same across the entire score scale. The raw scores are not reported.
Correlation and Correlation Coefficient
Correlation refers to the extent to which two variables are related. If high scores on one variable are related to high scores on the second variable, the relationship is positive; if high scores on one variable are related to low scores on the other, the relationship is negative. The correlation coefficient ranges from -1.00 (perfect negative relationship) to +1.00 (perfect positive relationship). A zero correlation coefficient indicates no statistical relationship between the two variables. Most correlations of test scores and measures of academic success are positive. The higher the correlation is, the better the prediction.
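As an illustration, the sketch below computes the Pearson product-moment correlation, the most common correlation coefficient. The text does not say which coefficient is meant here, so that choice and the function name are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Deviations of each value from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]

    co_variation = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return co_variation / math.sqrt(var_x * var_y)
```

Two variables that rise together in exact proportion give a value of +1.00, and two that move in exactly opposite directions give -1.00.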
The questions in any particular test version vary from very easy (the percent answering correctly is high) to very hard (the percent answering correctly is low) to provide good estimates of test takers' abilities across the entire score scale. The average percent correct for the SAT is around 50; that is, averaged across all items in a section, about 50 percent of responses are correct.
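A minimal sketch of how percent correct could be computed per question and averaged over a section. The 0/1 response matrix and the function names are hypothetical, introduced only for illustration.

```python
def percent_correct_by_item(responses):
    """Percent of students answering each item correctly.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100.0 * sum(row[i] for row in responses) / n_students
        for i in range(n_items)
    ]

def average_percent_correct(responses):
    """Average of the per-item percent-correct values across a section."""
    per_item = percent_correct_by_item(responses)
    return sum(per_item) / len(per_item)
```

An item with a high percent correct is an easy item, and one with a low percent correct is a hard item.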
Each year several different forms of the SAT tests are administered to college-bound students. Detailed content and statistical specifications are used to assemble each new form of the tests. One goal of the test assembly process is to make all forms of a particular test equivalent in difficulty for test takers at all levels of ability. In practice, it is not possible to produce test forms that are exactly equivalent in difficulty, and a statistical procedure, referred to as score equating, is used to ensure that scores reported to students on different forms of a test are comparable. Thus, the purpose of equating is to adjust scores for minor differences in test difficulty from form to form, so that a score represents the same level of ability regardless of the difficulty of a particular form. That is, equating is the statistical procedure used to produce comparable scores on different versions of the SAT tests.
The mean is the arithmetic average.
The median is the point on the score scale at which 50 percent of the students' scores are above the point and 50 percent are below.
The percentile rank is the percentage of students whose scores fall below a particular scaled score. Percentile ranks should not be compared across SAT tests, including across different SAT Subject Tests, because the tests are taken by different groups of students.
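The definition above translates directly into a short calculation. The function name and the example scores are made up for illustration.

```python
def percentile_rank(scores, score):
    """Percentage of students whose scores fall below the given scaled score."""
    if not scores:
        raise ValueError("need at least one score")
    below = sum(s < score for s in scores)
    return 100.0 * below / len(scores)
```

For the hypothetical group [400, 450, 500, 550, 600], a score of 550 has three scores below it, so its rank is 60.0.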
A raw score is the number of questions answered correctly minus a fraction of the incorrect answers. The raw scores are converted to scaled scores for reporting to adjust for any differences in difficulty across test versions (see Equating). One-quarter point is subtracted for each incorrect response to a five-choice question, one-third point for a four-choice question, and one-half point for a three-choice question. Nothing is subtracted for answering student-produced response questions incorrectly.
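The scoring rule can be sketched as follows. The function name and argument layout are assumptions for illustration. Exact fractions are used so that one-third point deductions are not distorted by floating-point rounding.

```python
from fractions import Fraction

# Deduction per incorrect answer, keyed by the number of answer choices.
PENALTY = {5: Fraction(1, 4), 4: Fraction(1, 3), 3: Fraction(1, 2)}

def raw_score(n_correct, wrong_by_choices, wrong_student_produced=0):
    """Number correct minus a fraction of the incorrect answers.

    wrong_by_choices: maps number of answer choices (5, 4, or 3) to the
        count of incorrect answers on questions of that type.
    wrong_student_produced: accepted for completeness; nothing is
        subtracted for these, so it does not affect the result.
    """
    deduction = sum(PENALTY[k] * n for k, n in wrong_by_choices.items())
    return n_correct - deduction
```

For example, 40 correct answers and 8 incorrect answers on five-choice questions give 40 - 8 × 1/4 = 38.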
Reliability is the extent to which a test measures consistently. For scaled scores, a reliability coefficient of 1.00 would indicate a perfectly reliable test, but this is never realized in any operational testing program. The SAT tests are highly consistent, with reliability coefficients of approximately .90.
Restriction of Range
Because students choose colleges and colleges choose students, the range of high school grade point averages and admission test scores is narrower than the range found in the potential applicant pool. This restriction of range decreases the strength of the relationship (expressed as a correlation coefficient) between high school and college GPAs and between test scores and college GPA. The correlation coefficient is adjusted in these tables using the Pearson-Lawley multivariate correction to represent the correlation more accurately.
The standard deviation (SD) is a measure of the variability of a set of scores around their mean. If test scores cluster tightly around the mean score, as they do when the group tested is relatively homogeneous, the SD is smaller than it would be for a more diverse group whose scores spread farther from the mean.
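The contrast between a homogeneous and a diverse group can be seen in a small example. The scores are invented, and the population form of the standard deviation is an assumption, since the text does not say which form is used.

```python
import statistics

def score_sd(scores):
    """Population standard deviation of a set of scores around their mean."""
    return statistics.pstdev(scores)

# Hypothetical scaled scores, made up for illustration; both groups average 500.
homogeneous_group = [490, 500, 500, 510, 500]
diverse_group = [300, 420, 500, 610, 670]
```

Both groups have the same mean, but score_sd(homogeneous_group) is far smaller than score_sd(diverse_group) because the first group's scores sit much closer to 500.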
Standard Error of the Difference—SED
The SED is a tool for assessing how much two test scores must differ before they indicate ability differences. As noted in the definition of Standard Error of Measurement, students' test scores vary from their true scores (see True Score), and if a student takes the test several times, the reported score may be slightly different each time. This is why score ranges are provided in the score reports: to emphasize the possibility of small differences. When two students are administered a test, their scores will both have ranges based on the Standard Error of Measurement. However, to be confident that two scores indicate a true difference in ability, the scores must differ by at least 1.5 times the SED. For example, SAT verbal and math scores must differ by 60 points (40 × 1.5) before the two scores should be considered statistically different.
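The comparison rule can be written as a one-line check. The function name is an assumption, and the SED of 40 used in the usage note is taken from the example in the text.

```python
def scores_differ(score_a, score_b, sed, multiplier=1.5):
    """True if two scores differ by at least the SED times the multiplier."""
    return abs(score_a - score_b) >= sed * multiplier
```

With an SED of 40, scores of 520 and 580 differ by the required 60 points, so scores_differ(520, 580, sed=40) is True, while scores of 520 and 570 fall short of the threshold.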
True Score (See Standard Error of Measurement)
True score is a hypothetical concept indicating what an individual's score on a test would be if there were no error introduced by the measuring process. It is thought of as the hypothetical average of an infinite number of obtained scores for a test taker with the effect of practice removed.
A test is considered valid if it meets its intended purpose. Typical measures of validity are the correlation between test scores and grade point average and the correlation between test scores and course grades.