West Virginia Department of Education SAT-9 and Testing Conditions: Research Summary

Published by the National Center on Educational Outcomes
April 2003


Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:

West Virginia Department of Education. (2003). West Virginia Department of Education SAT-9 and testing conditions: Research summary. Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://education.umn.edu/NCEO/TopicAreas/Accommodations/WestVirginia.htm


Summary of Findings

This study employed descriptive and inferential statistics to analyze three years of Stanford Achievement Test, Ninth Edition (SAT-9) data from West Virginia in order to provide West Virginia Department of Education (WVDE) personnel with decision-making and quality assurance information. Under existing state board policy, districts and schools must maintain specified levels of student performance under standard testing conditions on the norm-referenced SAT-9 to attain and maintain accreditation. Although test results are disaggregated each year to report back to the schools and to ensure that school and district policies are aligned with state policies, there had not been a rigorous and systematic analysis of the SAT-9 data examining the test results of students with disabilities.

Nearly 200,000 students enrolled in grades 3-11 in West Virginia completed the SAT-9 in each of 1999, 2000, and 2001 (see Table 1). The main focus of this study was to describe how the test performance of students with different disabilities, and in various least restrictive environments (LREs), differed according to testing conditions. Testing condition was a dichotomous variable. Students tested under standard conditions may or may not have received an accommodation, but any accommodation was not to change the tested constructs. Students who took the test under non-standard conditions either moved into the school district mid-year and/or received a modification during testing that potentially changed the tested constructs. Scaled scores were used in all analyses performed in this study; however, to compare across subject areas and to provide a more understandable metric for comparing groups of students, grade equivalent scores are reported in some instances. As described in the body of the report, these scores are not intended to represent grade level performance; rather, they are used to illuminate differences found among subgroups and between subject areas.

Table 1. Numbers of Students With and Without Disabilities Completing SAT-9 (by Testing Condition)

| Test Takers | 1999 Standard | 1999 Non-Standard | 2000 Standard | 2000 Non-Standard | 2001 Standard | 2001 Non-Standard |
|---|---|---|---|---|---|---|
| Non-Disabled Students (General Education) | 159,631 | 4,452 | 155,446 | 4,734 | 153,496 | 4,837 |
| Students with Disabilities (Special Education) | 10,056 | 17,063 | 9,608 | 18,541 | 9,434 | 19,332 |
| Total | 191,202 | | 188,329 | | 187,099 | |

Students with disabilities as a group made gains in reading until the ninth grade, when their reading achievement stabilized. General education students, however, continued to make gains in reading, further widening the gap between students based on educational status (general education versus special education). Special education students (students with disabilities) showed marked growth in math from year to year. Throughout the research documented in this study, special education students' language arts scores lagged behind their reading and math scores, with the trend most pronounced in the middle school grades. Figures 1 through 3 illustrate the test performance at different grades of students with disabilities and general education students.

Students with behavior disorders and students with specific learning disabilities showed marked similarities in their reading, math, and language arts achievement. Students with speech/language impairments performed most similarly to general education students, and students with mild to moderate mental impairment performed at the lowest levels among these groups. When the scaled scores of students with different disabilities were disaggregated by testing condition (standard versus non-standard), there was essentially no difference (less than 2 points) in the test scores of students with mental impairment. However, scaled scores of students with specific learning disabilities, behavior disorders, and speech/language impairments showed large 22- and 23-point differences between those tested under standard and non-standard conditions. See Figure 4 for more detail.

In examining test scores based on levels of least restrictive environment, the students with disabilities who were in regular education full time performed at the highest levels. However, students in regular education full time who received modifications (non-standard conditions) performed similarly to those in regular education part time who did not (standard conditions). Furthermore, students in regular education part time who took the test under non-standard conditions scored similarly to students who spent most of their time in a separate special education class.

Districts have different policies, or at least different implementation procedures, for deciding which students with disabilities take the test under standard versus non-standard conditions. An initial look at a sample of districts chosen to represent the diversity of West Virginia's school districts showed that districts can differ by more than 50 percentage points in the proportion of special education students tested under standard conditions. Taking students out of standard testing did not appear to increase district test scores, however: the average reading score of a district that tested fewer than 10% of its special education students under standard conditions was almost identical to that of the district that tested 64.1% of its special education students under standard conditions (646 vs. 645, respectively). Conversely, when the scores of students who took the test under non-standard conditions were compared, the district with the more inclusive testing policies had a lower average reading score than the less inclusive district (619 vs. 638, respectively). Moreover, the less inclusive district showed a 7-point difference between its standard and non-standard test takers, compared with a 27-point difference in the more inclusive district. This evidence suggests that the highly inclusive district had all but the most profoundly disabled students take the test under standard conditions.

Relationships among variables were examined by calculating effect sizes for scaled score comparisons (t-tests) between the two testing conditions. Results showed that testing conditions make a more profound practical difference in language arts than in either reading or math scores. Although the differences due to testing conditions vary across subject areas, they are stable across time. Sequential regressions were also employed to determine the relative roles of type of disability, LRE, and testing conditions in predicting special education students' SAT-9 test scores. The three variables did not account for much of the variance in test scores (25-30%), and they accounted for less variance as grade level increased. Moreover, the predictive utility of the testing conditions variable decreased as grade level increased, whereas the opposite was true for both LRE and type of disability. Students' SAT-9 test scores therefore appear to be largely and increasingly dependent on LRE and type of disability, with testing conditions affecting performance less as grade level increases; overall, some 70 to 75% of the variance in students' test scores is left unexplained by these variables.
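The report does not specify which effect size statistic was computed; a common choice when pairing effect sizes with t-tests on two groups is Cohen's d, the standardized mean difference with a pooled standard deviation. The sketch below is illustrative only, using hypothetical scaled scores rather than data from the study:

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference between two score lists (pooled sample SD)."""
    na, nb = len(group_a), len(group_b)
    # (n-1)*s^2 recovers each group's sum of squared deviations
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical scaled scores for the two testing conditions (not study data)
standard = [646, 652, 640, 655, 648, 650]
non_standard = [619, 625, 615, 630, 622, 618]
d = cohens_d(standard, non_standard)
```

Computing d separately for reading, math, and language arts would make the "more profound practical difference in language arts" claim directly comparable across subject areas, since d is unit-free.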

A sample of eight districts representing different performance levels was drawn from the population data and examined in depth to determine the extent to which SAT-9 scores of students with disabilities varied as a function of district performance levels, testing conditions, disability status, content area, and student school level (elementary, middle, or high). Districts were identified by examining all 55 West Virginia districts' test scores and testing patterns for special education (percent tested under standard conditions). Four levels of district performance were identified:

  • Two “High Performing” districts that are known for inclusive testing policies and respected for high levels of student achievement in the state.
  • Two “Average” districts that are representative of the most common levels of inclusion in standard testing conditions and student achievement in the state.
  • Two “Improving” districts that had previously been identified by the state as low performing and received technical assistance from WVDE until they met state-mandated criteria for performance, but are no longer receiving technical assistance. These districts tend to have low rates of testing students with disabilities in standard conditions, and low student achievement levels.
  • Two “Low Performing” districts that have low rates of testing students with disabilities in standard conditions, that were recently identified by the state as having poor student achievement levels, and that are currently receiving technical assistance from the state but have not yet met the state-mandated criteria for performance.

The sample was further limited to students who were continuously enrolled for all three years in the same district, and to students whose disability status and LRE had remained the same between 1999 and 2001.   In addition, the types of disabilities examined were limited to behavior disorders, speech/language impairments, specific learning disabilities, and other health impairments. LRE was also limited to regular education full time, regular education part time, and special education separate class. Thus, the final sample consisted of 20,950 students, including 19,506 students in general education and 1,444 in special education.

As shown in Table 2, in this sample, lower performing districts tested larger proportions of students in non-standard conditions than did high performing districts.   Moreover, lower performing districts tested decreasing numbers of students in standard conditions from 1999 to 2001, whereas higher performing districts tended to test more of their students in special education under standard conditions during this period.

Table 2. Distribution of Students across Testing Conditions by Year and District Performance Level

| District Performance Level | District | Testing Condition | 1999 Number | 1999 Percent | 2000 Number | 2000 Percent | 2001 Number | 2001 Percent |
|---|---|---|---|---|---|---|---|---|
| High Performing | Mineral | Standard | 100 | 62.1 | 112 | 69.6 | 127 | 78.9 |
| | | Non-Standard | 61 | 37.9 | 49 | 30.4 | 34 | 21.1 |
| | Wood | Standard | 219 | 73.0 | 225 | 75.0 | 235 | 78.3 |
| | | Non-Standard | 81 | 27.0 | 75 | 25.0 | 65 | 21.7 |
| | Total | Standard | 319 | 69.2 | 337 | 73.1 | 362 | 78.5 |
| | | Non-Standard | 142 | 30.8 | 124 | 26.9 | 99 | 21.5 |
| Average | Jackson | Standard | 74 | 54.0 | 72 | 52.6 | 77 | 56.2 |
| | | Non-Standard | 63 | 46.0 | 65 | 47.4 | 60 | 43.8 |
| | Marion | Standard | 80 | 43.7 | 70 | 38.3 | 99 | 54.1 |
| | | Non-Standard | 103 | 56.3 | 113 | 61.7 | 84 | 45.9 |
| | Total | Standard | 154 | 48.1 | 142 | 44.4 | 176 | 55.0 |
| | | Non-Standard | 166 | 51.9 | 178 | 55.6 | 144 | 45.0 |
| Improving | Logan | Standard | 56 | 36.1 | 47 | 30.3 | 36 | 23.2 |
| | | Non-Standard | 99 | 63.9 | 108 | 69.7 | 119 | 76.8 |
| | Mingo | Standard | 50 | 26.6 | 49 | 26.1 | 55 | 29.3 |
| | | Non-Standard | 138 | 73.4 | 139 | 73.9 | 133 | 70.7 |
| | Total | Standard | 106 | 30.9 | 96 | 28.0 | 91 | 26.5 |
| | | Non-Standard | 237 | 69.1 | 247 | 72.0 | 252 | 73.5 |
| Low Performing | Lincoln | Standard | 38 | 21.1 | 32 | 17.8 | 29 | 16.1 |
| | | Non-Standard | 142 | 78.9 | 148 | 82.2 | 151 | 83.9 |
| | McDowell | Standard | 24 | 17.1 | 15 | 10.7 | 18 | 12.9 |
| | | Non-Standard | 116 | 82.9 | 125 | 89.3 | 122 | 87.1 |
| | Total | Standard | 62 | 19.4 | 47 | 14.7 | 47 | 14.7 |
| | | Non-Standard | 258 | 80.6 | 273 | 85.3 | 273 | 85.3 |

Several series of sequential regressions were used to determine the extent to which district performance levels accounted for variance in SAT-9 test scores, and the extent to which they predicted assignment to testing conditions. In the first series of regressions, LRE, district performance level, school level (elementary, middle, or high school), and four dichotomous disability categories (behavior disorder, speech/language impairment, specific learning disabilities, and other health impairments) were used to predict testing conditions in each of the three years. The next series of regressions repeated the first, but separately at each of the three school levels: elementary school (grades 3-5), middle school (grades 6-8), and high school (grades 9-11). This process was then repeated to predict special education students' SAT-9 scores, this time with testing conditions included as a predictor. Finally, the same process was used to examine the role of district performance level in predicting general education students' SAT-9 scores.
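The logic of a sequential (hierarchical) regression is to enter predictors in steps and record the increment in R² contributed at each step. The sketch below illustrates this on simulated data; the variable names, codings, and effect magnitudes are assumptions for illustration, not values from the study:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X (intercept added)."""
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return 1.0 - resid.var() / y.var()

# Simulated data with hypothetical codings (not study data)
rng = np.random.default_rng(0)
n = 500
lre = rng.integers(0, 3, n).astype(float)       # 0=full time ... 2=separate class
district = rng.integers(0, 4, n).astype(float)  # 0=high ... 3=low performing
score = 600 + 8 * lre - 5 * district + rng.normal(0, 10, n)

# Step 1: enter LRE alone; Step 2: add district performance level
r2_step1 = r_squared(lre.reshape(-1, 1), score)
r2_step2 = r_squared(np.column_stack([lre, district]), score)
delta_r2 = r2_step2 - r2_step1  # variance uniquely added by district level
```

The per-variable entries in Table 3 correspond to these R² increments, while the combined total is the R² of the final step with all significant predictors entered.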

Results showed the following:

  • LRE was the most important predictor and had the strongest unique relationship with assignment to testing conditions when all school levels were considered together, accounting for approximately 30% of the variance in testing conditions each year.
  • District performance level had a smaller but significant relationship with assignment to testing conditions and became an increasingly important predictor from year to year, accounting for 6.5% of the variance in 1999 but 9.2% and 12.5% of the variance in 2000 and 2001, respectively.
  • When school level was held constant, there was a notable shift in the importance of each of these predictors. As shown in Table 3, results suggested that district performance level becomes an increasingly important predictor in the assignment of testing conditions over time as students mature, although speech/language impairment and LRE tended to have the strongest relationships and predictive utility with assignment to testing conditions at the elementary and middle school levels respectively.

Table 3. Proportion of Variance Accounted for in Assignment to Testing Conditions by School Level for Each Independent Variable (ΔR²) and the Combined Total (R²)

| Independent Variables** | 1999 ΔR² | 2000 ΔR² | 2001 ΔR² |
|---|---|---|---|
| Elementary School | | | |
| Speech/Language Impairments | .454 | .544 | .464 |
| LRE | .105 | .072 | .044 |
| District Performance Level | .035 | .036 | .116 |
| Behavior Disorders | .002 | .003 | * |
| Other Health Impairments | * | * | .011 |
| Combined Total (R²) | .596 | .655 | .634 |
| Middle School | | | |
| LRE | .235 | .277 | .315 |
| District Performance Level | .073 | .101 | .095 |
| Speech/Language Impairment | .044 | .055 | .110 |
| Behavior Disorders | * | .009 | .005 |
| Combined Total (R²) | .351 | .442 | .525 |
| High School | | | |
| District Performance Level | .449 | .471 | .439 |
| LRE | .054 | .069 | .005 |
| Behavior Disorders | * | .006 | * |
| Specific Learning Disabilities | * | * | .005 |
| Combined Total (R²) | .503 | .547 | .573 |

*Variable did not make a statistically significant contribution to the regression equation.
**Variables not listed did not make a statistically significant contribution to the regression equation.

  • Results of regressions on SAT-9 test scores showed that grade level and LRE were the most important predictors of performance in reading, math, and language arts every year, accounting for 25 to 46% of the variance; the overall variance explained by all predictors (i.e., grade level, testing conditions, LRE, type of disability, and district performance level) ranged from 28 to 48%.
  • When school level was held constant, 1999 data showed again that LRE had the strongest relationship with students' SAT-9 scores at all school levels and in all content areas, accounting for between 5 and 18% of the variance. Elementary school students' scores on the language arts subtest were an exception to this pattern: testing conditions was a more important predictor than LRE, contributing some 12% of the variance explained.
  • The 2000 data showed a similar pattern, with LRE the most important predictor of students' SAT-9 scores for all subtests at the middle and high school levels, contributing between 12 and 17% of the variance, but having a weak relationship with SAT-9 scores at the elementary level (range 2-5%). At the elementary school level, speech/language impairment was the most important predictor of SAT-9 reading and math scores (11% and 21%, respectively), and testing conditions was an extremely strong predictor for the language arts subtest alone, accounting for 21% of the variance in test scores.
  • In 2001, LRE showed a very strong relationship with SAT-9 reading scores at all school levels (range 9 to 17%), SAT-9 math scores at the middle school level (14%), and SAT-9 language arts scores at the high school level (16%). However, of all the variables entered, testing conditions had the strongest relationship with SAT-9 math scores at the elementary and high school levels (19% and 14%, respectively) and language arts scores at the elementary level (23%), and also made a modest but significant contribution to the prediction of these scores at the middle school level.
  • When district performance level and grade level were used as predictors of general education students' SAT-9 reading, math, and language arts scores, results showed that school level had a very strong relationship to SAT-9 scores, accounting for 14 to 29% of the variance, but that district performance level contributed very little (less than 1 to 2%) to the prediction of test scores. Moreover, when school level was held constant, district performance again showed a very weak relationship with general education students' SAT-9 reading, math, and language arts scores.

Taken together, these results underscore the importance of appropriate assignment to LRE, and suggest that the influence of district performance level on SAT-9 test scores is indirect: it affects the assignment of students with disabilities to testing conditions, which in turn exerts a modest effect on their test performance. It is apparent from the earlier findings that as district performance levels decrease, the percentage of students with disabilities who take the test under standard conditions also diminishes. Therefore, although testing conditions appear to have only a minimal effect on test performance, because of policies and/or practices in low-performing districts, student performance (especially that of students tested under non-standard conditions) may be somewhat artificially inflated by the unnecessary assignment of students with disabilities to non-standard testing conditions. This evidence may prove valuable in WVDE staff efforts to increase the standard-condition participation rates of students with disabilities, and may help persuade superintendents and school boards that excluding these students' scores does little to enhance overall test performance.

This study is significant because very little previous research has analyzed data from statewide administrations of a norm-referenced assessment to determine what effect testing conditions have on the test scores of students with disabilities. Implications for WVDE personnel and policymakers are discussed with regard to improving the data collection and documentation associated with testing conditions of students with disabilities, collecting data on the specific testing accommodations and modifications that are used, and redefining accountability measures for schools and counties that use student test scores, particularly as West Virginia implements requirements of the NCLB Act. Areas of professional development for school district personnel, especially special education teachers, and for parents are highlighted. Recommendations for future research are also suggested.


© 2006 by the Regents of the University of Minnesota.
The University of Minnesota is an equal opportunity educator and employer.
