Assessing Special Education Students (ASES)
The purpose of this Brief is to help states and districts examine their current policies on assessment accommodations, so that the policies and procedures can be improved. This Brief describes the types of evidence needed to show the comparability of accommodated and standard test administrations, so that informed decisions can be made about which accommodations are appropriate.
One result of the reauthorization of the Individuals with Disabilities Education Act (IDEA) in 1997 is a strong directive for students with disabilities to be included in large-scale testing. The legislation requires that for a state to receive Part B funds, children with disabilities must be "included in general State and district-wide assessment programs, with appropriate accommodations, where necessary" [IDEA, section 612(a)(17)(A)]. The legislation does not specify what constitutes an "appropriate" accommodation. Yet, decisions about which accommodations are allowed during testing have to be made by states as they develop and revise their assessment policies.
A testing accommodation is a change made to ensure that information is obtained on the construct being assessed rather than on the student's disability. Accommodations usually are designed for specific individuals, to meet specific needs that these individuals have. While meeting these needs, accommodations should not change the nature of the construct being tested. In other words, accommodated and non-accommodated administrations should produce comparable results for students.
When are two tasks considered comparable and when are they considered not comparable? How different is too different? According to measurement experts, this distinction is important because if tasks are comparable, then scores obtained from them can be aggregated and compared. When they are not comparable, then scores cannot be aggregated, and they cannot be used in comparisons. Whether tasks are comparable is the key question to be answered when deciding which accommodations are appropriate.
Task comparability is the topic of a paper developed for the Council of Chief State School Officers (CCSSO) by Dr. Gerald Tindal of the University of Oregon. Because the text of that paper is long and somewhat complex, this Brief was developed for state education agency personnel, especially assessment and special education specialists. Individuals interested in obtaining the longer report can request it from CCSSO (see Resource at end of the Brief).
By thinking through issues and examining state policies, more sophisticated accommodation policies and procedures can be designed by states. For example, a state may already have tailored a list of accommodations to fit the specifics of its assessment, perhaps providing specific conditions under which accommodations can be used, and even explanations of why the accommodations are considered appropriate. Based on the information in this Brief, the state might decide to further justify its policies by conducting post-hoc evaluations of the test results to determine whether the accommodations that were used had a consistent relationship to student characteristics and scores. The state might decide further to conduct a research study during field testing of its math assessment, a study designed to examine whether the construct remains the same when the test is presented orally. Finally, the state might decide to develop an observation checklist for IEP teams to use when making decisions about an individual student's need for specific accommodations. Thus, states can use the approaches in this paper to enhance their current practices to meet the requirements of IDEA 1997.
A Continuum of Changes
Accommodations cover a range from those that are completely acceptable (i.e., people agree that they do not change the construct of the assessment) to those that are very controversial (i.e., people disagree about whether they change the construct), and further, to those that are completely unacceptable (i.e., people agree that they change the construct of the assessment). This range is reflected in terminology that is commonly used to describe test administrations:
Standard Administration: procedures are used that are exactly the same as those used during test development; they are usually defined by the test developer.
Accommodated Administration: a change is made in the test's setting, scheduling, timing, presentation format, response format, or some combination of these. To be an acceptable accommodation, it must not change the construct measured or the meaning of scores.
Modified Administration: a change is made that causes the construct being measured to be different from that measured in the standard assessment, or that produces a score that means something different from scores of standard administrations. Not all states refer to these as modifications. Instead they may refer to them as inappropriate accommodations, or as accommodations that result in the student's score not being counted or reported.
None of the above is the same as an alternate assessment, which usually measures constructs that are different from those measured in standard, accommodated, or modified administrations. The critical question is, when does a change in testing constitute an accommodated administration, and when does it constitute a modified administration? Or, put another way, when is an accommodation appropriate and when is it not appropriate because it changes the construct measured or the meaning of scores? Six approaches to addressing these questions are presented in this Brief.
Six Approaches to Addressing Accommodation Issues
Six approaches to defining the rationale for a state's accommodation policies are described in this Brief. They are based on evaluating the extent to which accommodated administrations are comparable to standard administrations. The approaches are:
- Policy Presentation
- Policy Interpretation
- Implementation Analysis
- Post-hoc Evaluations
- Group Experiments
- Single Subject Designs
While the six approaches are grouped under those in which policy is supported by general decision making and those in which policy is supported by research, they actually fall along a continuum from descriptive to comparative to experimental approaches (see figure below). The approaches also overlap somewhat in their procedures, and more than one approach may be used at the same time.
Policy Presentation. In this approach, decisions about accommodations are made by referring to policy. Although this approach is basically descriptive, it includes a range from simply listing "approved" accommodations all the way to providing broad reasons for considering an accommodation as one that keeps a test administration comparable to a standard administration.
Example: This is an illustration of a policy listing approved accommodations, how they were selected by policymakers, and what happens to scores as a result of the policy.
Mississippi has a list of allowable accommodations organized around seating/setting, scheduling, format, and recording/transferring. The accommodations that are considered appropriate and included in the list were ones that: (a) do not affect the validity of the test, (b) function only to allow the test to measure what it purports to measure, and (c) can be narrowly tailored to address a specific student need (Mississippi Assessment System Exclusions and Accommodations, revised, 1995).
Two of the four tests used in Mississippi (Iowa Tests of Basic Skills, Tests of Achievement and Proficiency) allow very limited accommodations. For certain accommodations, the students' scores are not included in summary statistics because it was decided that the results "cannot be interpreted in the same manner" as the results of students who meet the qualifications for test standardization procedures.
Whenever non-allowable accommodations are used, or when a Special Education, 504, or LEP student elects to take the test even though eligible for exclusion, the scores are excluded from summary statistics. Still, these scores are considered to provide valuable information that should be used as a tool to tailor the student's educational goals.
Policy Interpretation. This descriptive approach goes beyond Policy Presentation in that it provides detailed explanations designed to help guide the decision-making process, so educators can make appropriate decisions about accommodations not listed in policy. Furthermore, the explanations help IEP teams understand how to make decisions about accommodations for individual students: how to base them on a student's needs and how to ensure that they have been used during instruction prior to testing. Case studies may be presented to explain the decision-making process. These focus on fine points that highlight the central features to attend to when making decisions.
Example: Maryland's Guidelines for Accommodations, Excuses, and Exemptions (revised 9/3/96) include a list of general principles, as well as definitions and procedures for the Maryland Functional Testing Program (MFTP), the Comprehensive Tests of Basic Skills (CTBS, 5th edition), and the Maryland School Performance Assessment Program (MSPAP).
The general principles in Maryland's accommodations policies include:
- Accommodations are made to ensure valid assessment of a student's real achievement.
- Accommodations must not invalidate the assessment for which they are granted . . . accommodations must be based upon individual needs and not upon a category of disability, level of instruction, environment, or other group characteristics.
- Accommodations must have been operational in the student's ongoing instructional program and in all assessment activities during the school year; they may not be introduced for the first time in the testing of an individual.
- The decision . . . of not allowing an accommodation for testing purposes does not imply that the accommodation cannot be used for instructional purposes. (p. 2)
A summary is presented of five broad categories of accommodations (scheduling, setting, equipment, presentation, and response); each category includes a list of specific accommodations permitted for each test. For example, calculators can be used for the MFTP, but not for the CTBS/5; for the MSPAP, they can be used but are considered to invalidate the mathematics scores.
Case studies also are presented to help guide decision making. For example, a calculator is considered an allowable accommodation for a student from an elementary school who is described as having learning disabilities, memory problems with no mastery of facts, and an IEP with goals and objectives in reading, mathematics, and written expression. Other examples cover recommendations for use of accommodations that are considered to invalidate specific tests.
The Maryland example illustrates a Policy Interpretation approach. It provides a detailed explanation and interpretation of accommodations, what they mean, and how they are to be used and interpreted.
Implementation Analysis. In this descriptive approach, data are used to guide decisions about when an accommodation results in a test administration that is comparable to a standard administration. The first step is to examine other factors, such as student characteristics, as the basis for selecting accommodations. For example, accommodations might be selected based on the student's ability to work in a group, work independently, or remain on-task for 45-60 minutes (characteristics required by the test). For students who cannot remain engaged in this manner, it can be inferred that their performance is adversely affected and that the test is not measuring what it should be measuring. This information, documented by IEP teams, would help them make decisions about appropriate accommodations.
Example: In Rhode Island, the following questions focus on student characteristics related to the requirements of the assessment. They represent questions the IEP team should be able to answer to identify needed accommodations:
1. Can the student work independently?
2. Can the student work in a room with 25 to 30 other students in a quiet setting?
3. Can the student work continuously for 45-60 minutes?
4. Can the student listen and follow oral directions?
5. Can the student use paper and pencil to write paragraph length responses to open-ended questions?
6. Based on the sample questions, can the student read and understand these questions?
7. Can the student manipulate a tagboard ruler and various tagboard shapes in small sizes?
8. Can the student operate a calculator?
9. Can the student follow oral directions in English?
10. Can the student write paragraph length responses to open-ended questions in English?
11. Based on the sample questions, can the student read and understand these questions in English?
"No" to any of these questions should result in the identification of appropriate accommodations. It is recommended that the principal and/or relevant school and district staff then be consulted on making any accommodations recommendations.
Distinctions can be made among test purposes (e.g., graduation diploma, district report card) and types of test instruments (e.g., norm-referenced, standards-based) to help inform policy about implementation. Whether an accommodated administration can be considered comparable to a standard administration usually is a function of the students tested, the decision being made, and the type of test. Opinion and survey data also could be collected, to the point of asking how students using accommodations perform compared to students not using accommodations. This last question, however, falls within the realm of post-hoc evaluations.
Post-hoc Evaluations. Post-hoc evaluations represent a comparative approach to defining task comparability. They rely on multiple data sources within a system, so that relational statements can be made about process and performance.
Example: An early study of the effects of providing accommodations is an example of a post-hoc evaluation (Grise, Beattie, & Algozzine, 1982).
Fifth grade students were given an accommodated version of the Florida State Student Assessment Test. Seven format changes were included in this version: (1) presenting items in an order that reflected a hierarchical progression of skills, (2) placing bubbles to the right side of multiple-choice options, (3) using an elliptical shape for bubbles, (4) not breaking sentences to fit a right-justified format, (5) placing reading passages in shaded boxes, (6) providing examples for each skill assessed, and (7) using directional symbols for continuing and stopping. Some of these tests also were enlarged by 30%.
The researchers found that enlarging the test had little effect on the performance of students with learning disabilities. In contrast, many students performed at mastery levels when the accommodated version of the test was administered.
Post-hoc evaluation data generally are confounded by other variables, thus limiting their usefulness for reaching cause-effect conclusions. For example, when existing groups (e.g., classes) are used, rather than random samples, findings are often difficult to interpret because the sample may be biased in some way (such as only those students expected to perform poorly receiving accommodations). The biggest difficulty, however, is that there is no way to determine differential effects of accommodations because the same students are not tested with and without accommodations. This is why experimental approaches are needed.
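To make the comparative logic concrete, a post-hoc evaluation can be sketched as a descriptive comparison of existing score data. The Python sketch below uses invented scores; because the groups are pre-existing rather than randomly assigned, the gap it reports is descriptive only and cannot be attributed to the accommodation itself.

```python
# A minimal post-hoc comparison sketch. The scores are invented for
# illustration; a real evaluation would pull them from the state's
# assessment records.
from statistics import mean, stdev

accommodated = [62, 58, 71, 65, 55, 60]      # hypothetical scale scores
standard     = [70, 74, 68, 77, 72, 69, 75]  # hypothetical scale scores

def cohens_d(a, b):
    """Standardized mean difference using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
              / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled

# Descriptive only: selection bias (e.g., lower-performing students being
# the ones given accommodations) could fully explain a gap like this.
print(f"mean gap: {mean(accommodated) - mean(standard):.1f} points")
print(f"effect size (d): {cohens_d(accommodated, standard):.2f}")
```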
Group Experiments. Group designs for assessing the effects of accommodations typically involve the same students taking a test with accommodations one time and without the accommodations another time. Usually at least two groups are included; often one group is students with disabilities and the other is students without disabilities. A research design is established before data collection begins, taking into consideration any potential threats to the validity of the findings. A measurement system is used that is technically adequate for analyzing student performance.
Example: A recent study of the effects of providing accommodations is an example of a group experiment (Tindal, Hollenbeck, Heath, & Stieber, 1997).
All students in the study (both students with disabilities and students without disabilities) participated in two writing exams. The first exam involved students composing handwritten essays over 3 days as part of the Oregon State Writing Assessment. The second occurred about 3 months later under identical conditions except that students composed their essays on word processors. All essays were scored by certified state raters using the state's six-trait analytic scoring. For comparison, each exam was transcribed to the other form (e.g., handwritten compositions were word processed by researchers and word processed compositions were transcribed into handwriting by a youngster of the same age as the students in the study).
One of the findings of this study was that on four of six scoring traits state judges rated the handwritten versions of the same compositions significantly higher than the word processed versions. This finding implies that the two tasks (writing compositions by hand and writing them with computers) are not comparable and should not be used in the same evaluation system. Even though this interpretation is a function of rater sensitivity to the composition format, until this sensitivity can be removed from the judgment process, incorrect interpretations might be made from test results. For example, the conclusion that a student who did not get a passing score on "Conventions" for a word-processed composition needed more instruction on the mechanics of writing would be inappropriate because that same composition in handwritten format might receive a passing score.
Ideally, group experiments need to be part of a program of research, so that studies can be linked together to inform decision making. In this way, there will be information on the effects of accommodations that is less confounded and that contains fewer threats to validity. This approach allows conclusions to be reached for groups, such as the conclusion that a specific accommodation affects the performance of students with disabilities but not the performance of students without disabilities.
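The paired logic of a group experiment can be sketched the same way: each student is tested with and without the accommodation, so each student serves as their own control, and the quantity of interest is the within-student gain compared across the two groups. All scores below are invented for illustration.

```python
# Sketch of the differential-boost analysis a group experiment affords.
# Scores are invented; each pair is (without_accommodation, with_accommodation).
from statistics import mean

students_with_disabilities = [(48, 61), (52, 60), (45, 58), (55, 57)]
students_without_disabilities = [(72, 73), (68, 70), (75, 74), (70, 71)]

def mean_gain(pairs):
    """Average within-student change when the accommodation is added."""
    return mean(with_acc - without for without, with_acc in pairs)

# A large gain for students with disabilities but a negligible one for
# students without disabilities suggests the change removes a
# disability-related barrier (an accommodation) rather than altering the
# construct for everyone (a modification).
print(f"gain, students with disabilities:    {mean_gain(students_with_disabilities):+.2f}")
print(f"gain, students without disabilities: {mean_gain(students_without_disabilities):+.2f}")
```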
Single Subject Designs. Decisions about the effects of accommodations for individual students are most accurately made when research questions are anchored to the unique needs of the student. This is best accomplished in single subject designs, where the effects of accommodations for a student can be examined over time. In this research design, interviews and observations are used to identify why certain behaviors occur and what their function might be. Systematic changes in accommodations can be introduced to confirm explanations and to verify the need for certain accommodations.
When using a single subject approach, it is important to make decisions about the appropriateness of accommodations in relation to the standards being assessed, not the way that the standard is measured. If the outcome is the same for a person who uses a needed accommodation as it is for a person who does not need an accommodation, then the accommodated and standard conditions should be considered comparable. For example, if the standard is that a student demonstrate the ability to interact with text to comprehend it, then those who successfully comprehend text received in Braille can be considered the same as those who receive their text in printed format.
Interviews, observations, and systematic manipulations are all ways to collect data to judge comparability. Some of the useful information that can be obtained through interviews includes:
- Dimensions of behavior (characteristics, frequency, duration, intensity).
- Ecological events that have a potential impact on performance (e.g., medications, sleep cycles, eating routines, activity schedules).
- Events and situations that predict performance (e.g., time of day, setting, social control).
- Primary mode of communication.
- Undesirable behavior functions and history of occurrence.
- Presence of "functional alternative" behaviors.
Example: Interview information could be useful in developing IEPs that include accommodations to allow a student to participate in and complete statewide tests. It may be discovered that a student taking medications has times during the day in which the uptake or dissipation of a drug influences different dimensions of behavior, such as attention, time on task, or concentration. Without knowing this from the parents, the specialist, or the nurse, the person administering a large-scale test might assume the student is "taking" the test when, in fact, the test is merely in front of the student with no attention being given to its completion. By discovering this information, the test could be given at another time of day, with the assumption that the task is more comparable to the standard administration since the student is now actively interacting with the test materials.
Direct observations, which are more objective and timely than interviews, also can provide information useful for determining appropriate accommodations. Some of the dimensions that might be the focus of direct observations are:
- Time of day.
- Setting events that affect student behavior.
Example: In most test-taking situations, students need to be attentive to several aspects of the actual test, such as listening to directions, reading, scanning alternate choices on multiple choice measures, and planning written responses on performance measures. Direct observations are useful in determining whether these test expectations have been met.
There are a number of ways that students might not behave as expected during a test. For example, they may not be paying attention while directions are given, or they might lose their place and begin recording answers in the wrong place. On multiple choice tests, a student may place an "x" in the bubble or make a mark that is too large, resulting in scanning errors.
Direct observations in classrooms also can be useful. For example, a student may attend to distractions whenever a teacher is talking at the front of the room. For this student, an adult reading from a paper or book at the front of the room looks the same whether it is a test or not, signaling time for disengagement.
Systematic manipulations are probably the most direct way to determine which accommodations are appropriate. This approach involves studying the effects of using an accommodation for individual students.
Example: Systematic manipulations can be used to identify inappropriate accommodations as well as to confirm the appropriateness of selected ones.
In response to a parent's request, a student was provided an accommodation that involved the use of Irlen lens technology, a reduction of various light frequencies intended to reduce distracting stimulation and to focus attention (Robinson & Miles, 1987). This student had an attention deficit-hyperactivity disorder. His doctor had determined that the student needed a reduction of artificial light, since it was perceived to overstimulate the student's sight and brain. The doctor's prescription was to provide accommodations that included: (a) wearing a baseball cap, (b) sitting near the window (for maximal exposure to natural light), and (c) using colored overlays on all his work.
Up to this time, the teacher had used two approaches to instruction (direct instruction; direct instruction plus points for reinforcement), and had documented their effects by charting the student's correct reading performance, incorrect reading performance, and on-task behavior. Thus, through experimental manipulation, the student's performance under these two conditions could be compared with performance when the doctor's recommended accommodations were implemented.
The students performance, shown in the table below, is clearly different under each of the conditions. Graphic depictions of actual performance also can be constructed to show these differences. The students performance indicates that access to the test items and response opportunities are better without the doctor-recommended accommodations.
[Table: words correct per minute, errors per minute, and percent on-task time under the Direct Instruction, DI plus Points, and Irlen Lens Treatment conditions]
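The condition comparison in a single subject design can be summarized by charting session-level data under each condition and computing condition means. The sketch below uses invented session values, not the actual study's data.

```python
# Condition means for a single subject design. Session values (words read
# correctly per minute) are invented for illustration only.
from statistics import mean

sessions = {
    "direct instruction":   [55, 58, 60, 62],
    "DI plus points":       [64, 66, 65, 68],
    "Irlen lens treatment": [40, 38, 42, 39],
}

for condition, wcpm in sessions.items():
    print(f"{condition:22s} mean WCPM = {mean(wcpm):.2f}")
```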
This type of approach can easily be transferred to performance on a statewide assessment.
Next Steps for States
The six approaches included in this Brief describe how judgments can be made about the comparability of accommodated and standard administrations. The amount of data needed in the various approaches varies, as does the nature of the conclusions that can be reached (from simple description to cause and effect). These approaches, and often a combination of them, can be discovered when examining states' policies on accommodations. When the approaches currently in use are articulated, states can decide how to strengthen them.
As noted at the beginning of this Brief, a state may already have acquired a list of commonly allowed accommodations and tailored it to fit the specifics of its state test. It also might already have provided a broad rationale for the accommodations. After reviewing the six approaches described in this Brief and presented in more detail in the full paper, the state might decide to further justify its policy by conducting post-hoc evaluations of the test results to determine whether the accommodations used had a consistent relationship to student characteristics and scores. In addition, the state might design a research study to be carried out during a field test of its next mathematics assessment to see whether an oral presentation of the test changes the construct of the test. Finally, the state might develop an observation checklist for IEP teams to use in making decisions about an individual student's need for specific accommodations. States can thus enhance their current practices to meet the requirements of IDEA 1997.
References
Beattie, S., Grise, P., & Algozzine, B. (1983). Effects of test modifications on the minimum competency performance of learning disabled students. Learning Disability Quarterly, 6 (1), 75-77.
Grise, P., Beattie, S., & Algozzine, B. (1982). Assessment of minimum competency in fifth grade learning disabled students: Test modifications make a difference. Journal of Educational Research, 76 (1), 35-40.
Robinson, G.L., & Miles, J. (1987). The use of coloured overlays to improve visual processing: A preliminary survey. The Exceptional Child, 34 (1), 63-70.
Tindal, G., Hollenbeck, K., Heath, W., & Stieber, S. (1997). Trait differences on handwritten versus word-processed compositions: Do judges rate them differently? Eugene, OR: University of Oregon, Research, Consultation, and Teaching Program.
Tindal, G. (1998). Models for understanding task comparability in accommodated testing. Paper prepared for the Council of Chief State School Officers, State Collaborative on Assessment and Student Standards, Assessing Special Education Students (ASES). Obtain from CCSSO, One Massachusetts Avenue NW, Suite 700, Washington, DC 20001-1431. Also, see http://www.ccsso.org/.