NCEO Synthesis Report 48
Published by the National Center on Educational Outcomes
Any or all portions of this document may be reproduced and distributed without prior permission, provided the source is cited as:
Wiener, D. (2002). Massachusetts: One state's approach to setting performance levels on the alternate assessment (Synthesis Report 48). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://education.umn.edu/NCEO/OnlinePubs/Synthesis48.html
In Massachusetts, about one percent of all students being assessed submit portfolios for the Massachusetts Comprehensive Assessment System (MCAS) Alternate Assessment. These portfolios are based on "expanded" state standards that describe academic outcomes appropriate for students with significant disabilities. Teachers collect "evidence" of their students' performance on the standards during targeted instructional activities or structured student observations to create portfolios that contain an array of work samples, instructional data sheets, audio- and videotapes, or other evidence organized into "portfolio strands" in each content area.
MCAS Alternate Assessments are submitted to the state for scoring and designation of a performance level that gives parents and teachers information on how well these students are learning the general curriculum relative to their past performance and the performance of other students. The process used by the Massachusetts Department of Education to assign performance levels to alternate assessments is the focus of this report. This technical phase, called standard setting, reflects several steps that typically occur between scoring and reporting. However, the process reflects theoretical debates and decisions that occurred much earlier in the development process of the alternate assessment, sometimes years before the first portfolio was compiled and submitted. Several of these earlier conversations and their consequences are also described in this report since the recommendations form the philosophical basis of much that followed.
The alternate assessment in Massachusetts is one pathway to meet the state requirements for earning a "competency determination" needed to receive a regular high school diploma. Therefore, it was necessary to calibrate performance levels precisely between the alternate assessment and the general assessment, especially at the Needs Improvement level, which is the level required to earn the competency determination. Massachusetts decided to use an analytical rubric to convert raw scores to performance levels. Combinations of scores that could be obtained across the alternate assessment scoring rubrics for Level of Complexity, Demonstration of Knowledge and Skills, and Independence were discussed, and reasoned perceptions were used to assign performance levels of Awareness, Emerging, Progressing, and Needs Improvement or above (Proficient, Advanced) in each portfolio strand. The reasoning behind the Massachusetts approach and the ways in which performance levels in each strand are combined to produce an overall performance level are described further in the report. This approach reflects not only the Massachusetts standards, but also its unique culture and values.
States are finding different ways to adapt their accountability systems to include all students, because the achievement of students with disabilities has typically lagged behind that of their non-disabled peers, and because recent state and federal laws require the participation of all students. Special educators are considering how, rather than whether, students with disabilities will participate in statewide assessments, while assessment policies themselves have become more flexible in accommodating the administration of those tests. Curriculum experts are placing increased emphasis on teaching students with disabilities the same content and skills being taught to their non-disabled peers, while regular and special educators are working together to adapt curriculum and instruction so diverse learners may participate more fully in academic activities.
A comparatively small number of students with the most complex and significant disabilities, though, have been more difficult to include in statewide assessments. Academic skills and subject matter have not always been a part of the curriculum for this population, and information has not systematically been collected on what these students have learned. The performance of these students is not easily determined using the same standardized paper-and-pencil tests used with the majority of students, but since participation in these assessments is now required, states have had to decide how best to include these students by giving them "alternate assessments." Alternate assessment methods and formats are determined by each state individually, though their common purpose is to improve instruction for these students and report their academic performance. By using alternate assessments with this population, schools can document what is being taught for purposes of system accountability, and demonstrate to parents and the public to what degree each of these students has learned state standards.
A majority of states have adopted individual academic portfolios as the most effective method of assessment for these "difficult-to-assess" students. Student portfolios accommodate a range of approaches to document learning, and afford teachers options for determining the ideal time, place, and method to assess their students. Portfolios provide teachers, students, and their parents with tangible evidence of student performance and feedback on their progress. While the contents of each portfolio are unique, their common structure allows for evaluation and scoring using uniform criteria that can be shared with teachers beforehand.
The demands of creating and managing portfolios, and compiling this information for submission to the state, however, require additional expertise on the part of teachers and time in which to complete this work. This fundamental change in classroom practice has required states to make a strong and continued commitment to provide professional development and technical assistance to educators who conduct alternate assessments, and to engage in an open dialogue about the efficiency, rigor, and usefulness of the process with those who are most affected by it.
Alternate Assessment in Massachusetts
In Massachusetts, about 5,000 students, or one percent of all students being assessed, submit portfolios for the Massachusetts Comprehensive Assessment System (MCAS) Alternate Assessment. In creating portfolios, their teachers must first identify challenging outcomes for each student based on the standards in each content area being assessed. Many states, including Massachusetts, use an "expanded" version of their standards that describes academic outcomes that are appropriate for students with significant disabilities. Teachers then collect "evidence" of their students' performance on those standards during targeted instructional activities or structured student observations. Portfolios may contain an array of work samples, instructional data sheets, audio- and videotapes, and other evidence organized into "portfolio strands" in each content area.
Once MCAS Alternate Assessments are submitted to the state, these are scored and a performance level assigned in each content area so parents and teachers have information on how well these students are learning the general curriculum relative to their past performance and the performance of other students. The process used by the Massachusetts Department of Education to assign performance levels to alternate assessments is the focus of this report. This technical phase, called standard setting, reflects several steps that typically occur between scoring and reporting (Quenemoen, Rigney, & Thurlow, 2002). However, the process reflects theoretical debates and decisions that occurred much earlier in the development process of the alternate assessment, sometimes years before the first portfolio was compiled and submitted. Several of these earlier conversations and their consequences are also described in this report since the recommendations form the philosophical basis of much that followed. First among these conceptual discussions was defining who should take an alternate assessment.
A Diverse Group of Advisors
Late in 1998, the Massachusetts Department of Education began convening regular meetings of a task force composed of DOE staff (from Special Education and Assessment units), the contractor team (Measured Progress and the ILSSA group at the University of Kentucky), and the Massachusetts Alternate Assessment Advisory Committee (a diverse stakeholders group), which provided recommendations to the Department on a range of assessment issues, including:
Guidelines for IEP Teams: Who Should Take Alternate Assessments?
It was assumed from the beginning that students who needed alternate assessments were, for the most part, those who could not take paper-and-pencil tests and whose academic performance was based on the expanded standards appropriate for students with significant disabilities. However, the task force also identified students whose disabilities were not primarily cognitive whom they felt should also be considered for alternate assessments by their IEP and 504 teams. Generally, this smaller group of identified students had disabilities that presented them with "unique and significant challenges" to participation in standardized statewide testing regardless of the accommodations they could use on those tests. They recommended, for example, that students with severe behavioral and emotional disabilities, or those with cerebral palsy, sensory impairments (deaf, blind, or deaf and blind), or fragile health and medical conditions should also be considered for alternate assessments, regardless of their levels of academic performance since taking on-demand statewide tests could present them with insurmountable barriers to their participation, and therefore deny them access to the assessment (Massachusetts Department of Education, 1999).
Based on guidelines provided to Massachusetts IEP Teams since 1999, students across the full spectrum of academic performance, then, are eligible to take alternate assessments, even when they are able to demonstrate the same (or higher) levels of performance as a tested student. They simply require an alternate assessment format to demonstrate their knowledge and skills. Therefore, the MCAS reporting system required sufficient flexibility and integrity to provide meaningful feedback on students who demonstrate a "comparable performance" to a student who scores at the highest levels on the standard tests. It also became necessary to incorporate a method by which a student could meet the state's graduation requirement through an alternate assessment. The task force strongly advised that the alternate assessment be a different, though not easier, pathway to demonstrate the same performance as a tested student.
Scoring Alternate Assessments
The task force next considered and selected criteria on which to base the scores of alternate assessment portfolios. They advised the Department to develop criteria based primarily on student performance, since that is what the standard assessment measured, rather than assessing how well the student's program provided opportunities to learn this material. Some on the task force, however, felt that student achievement could not be separated from program effectiveness. In the end, a scoring rubric was developed in which four out of six categories are based on student performance, and two reflect the effectiveness of the student's program:
Scores are determined and reported in each of the rubric areas listed above. Once numerical scores are obtained for a portfolio in these rubric areas, raw scores must somehow be combined to identify an overall performance level in the content area. Before performance levels can be determined, however, several important questions must be answered:
Defining Performance Levels
The task force recommended that performance levels be identical to performance levels on standard MCAS tests; but that the lowest performance level, called "Warning/Failing at Grade 10" for tested students, would be sub-divided into three distinct levels in order to provide more meaningful descriptions of performance at these lower levels. Figure 1 illustrates the performance levels and definitions used by Massachusetts to report assessment results on the standard and the alternate assessments, and the relationship between the two reporting scales.
Figure 1. MCAS Performance Levels
Counting Scores Toward an Overall Performance Level
On several occasions, the task force revisited the question of which scores to count in calculating the overall level of performance. In reviewing the goals, methods, and purpose of the general assessment, they realized, in essence, that regular MCAS tests measure the ability of a student to respond to test items accurately, with no assistance from peers or from the adult(s) administering the test, and that test results are based solely on the correctness of the student's responses.
In the end, their recommendation was to "parallel the goals, methods, and purpose of the general assessment, where possible," when no other solution is obvious. With this advice, the task force established a foundation for future decision-making, and returned to this guidance frequently.
With these assumptions about the general assessment, and the advice of the task force to parallel the general assessment where possible, the Department decided it would base alternate assessment performance levels on raw numerical portfolio scores given in the areas of completeness, complexity, accuracy, and independence only; but not on self-evaluation or generalized performance, since scores in these last two areas depended on opportunities provided to the student, not on the student's direct performance of the skill being assessed. Scores in all rubric areas, however, would be reported to schools and parents in order to provide those who work most closely with the student detailed information on his or her performance as shown in Figure 2.
Separate scores are reported for each strand in Level of Complexity, Demonstration of Skills and Concepts (accuracy), and Independence, while scores in the secondary areas of Self-Evaluation and Generalized Performance are combined for the entire content area.
Figure 2. Excerpt of Sample Parent/Guardian Report
How Will Numerical Scores be Combined to Yield a Performance Level?
The Massachusetts Department of Education consulted with Ed Roeber of Measured Progress to assist in developing a strategy or formula for combining scores to obtain an overall performance level for each content area. Over time, Dr. Roeber recommended several options for calculating a numerical score total in each content area of a portfolio. The following were two mathematical formulas considered by the Department:
Consider the following scenario using both scoring methods:
Because the LC score is used as a multiplier in Method #2, scores also were spread over a wider range (1-40), avoiding the possibility of overlapping totals. Method #1, on the other hand, spreads scores across a narrow range (1-13) since scores are simply added together. It was agreed that Method #2 would be explored further for its effectiveness, impact, and unintended consequences, if any.
Combining Scores to Yield a Performance Level
Dr. Roeber suggested that MCAS-Alt project leadership meet with regular assessment psychometricians and data analysts from the Department and from Measured Progress to review and select the most effective formula for calculating a total content area score, and to identify "cut scores" for specific performance levels based on a range of calculated score totals. During ensuing discussions, however, questions were raised about the necessity of generating a single total numerical score for each strand and content area in the alternate assessment, and whether it might cause confusion to introduce another, entirely different score scale beside the 200-280 score scale already in use for MCAS test results. Some felt this would reinforce the separateness of the alternate assessment and wondered instead whether a system could be developed that used reasoned judgment, instead of a calculation, to describe overall student performance based on different raw score combinations. After a spirited discussion, this reasoning prevailed, and the idea of calculating a total numerical portfolio score was abandoned in favor of a different approach.
Whether a mathematical equation or a reasoned approach is used to determine a student's performance, however, some kind of scale, analytical rubric, or other consistent method must be used to convert raw scores to performance levels (Roeber, 2002). The analytical rubric developed for this purpose in Massachusetts is actually a series of grids based on a student's score as shown in Figure 3.
Sixty-four different possible score combinations were discussed and analyzed by the group, and a performance level identified by consensus for each. Decisions were based on reasoned perceptions of what each score combination revealed about the student's performance, and the relative position of that performance level within the hierarchy of other levels. It was easier to analyze and assign performance levels beginning with the lowest and highest levels, then working toward the middle. In the end, the group was able to define and categorize all score combinations. The model was tested using various arbitrary score combinations to check that the defined performance level made sense, given the student's scores, and that scores were appropriately scaled relative to adjacent scores.
An analysis of several arbitrary score combinations reveals, for example, that a student who scores LC=3, DSC=2, and Ind=3 according to the MCAS-Alt scoring rubric, is a student who is working on modified (or "expanded") learning standards, who demonstrates 26-50% accuracy, and who needs assistance 51-75% of the time during standards-based activities (Massachusetts Department of Education, 2001). From this information, the student would appear to be performing above the definition of Awareness in this content area, but not yet at Progressing, in which the student would perform the skills and demonstrate the knowledge with greater independence and accuracy. Since this student is somewhere between the Awareness and Progressing performance levels, we can say with relative confidence that the student is at the Emerging level. Another student who hypothetically scored LC=3, DSC=3, Ind=4 is also working on modified standards, but performs with a sufficiently high rate of accuracy and independence to be placed in the Progressing performance level. He or she is probably ready to attempt even more challenging tasks, skills, and concepts in the coming year, since the data suggest he or she has mastered skills and content in the current portfolio. Figure 3 shows the complete analytical rubric for determining performance levels in each portfolio strand.
Figure 3. Analytical Rubric for Determining Performance Levels in Each Portfolio Strand
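The grid-based rubric described above can be thought of as a lookup table keyed by the three strand scores. The following is a minimal sketch only: the names `STRAND_RUBRIC` and `strand_level` are hypothetical, and the table holds just the two score combinations worked through in the text (LC=3, DSC=2, Ind=3 and LC=3, DSC=3, Ind=4). The remaining cells of Figure 3 are not reproduced here and are deliberately left out rather than guessed.

```python
from typing import Optional

# Partial, illustrative table keyed by (LC, DSC, Ind).
# Only the two combinations discussed in the text are included;
# all other cells of the rubric are intentionally omitted.
STRAND_RUBRIC = {
    (3, 2, 3): "Emerging",
    (3, 3, 4): "Progressing",
}


def strand_level(lc: int, dsc: int, ind: int) -> Optional[str]:
    """Return the strand performance level for a score combination,
    or None when the combination is not in this partial table."""
    return STRAND_RUBRIC.get((lc, dsc, ind))


if __name__ == "__main__":
    print(strand_level(3, 2, 3))
    print(strand_level(3, 3, 4))
```

A table of this kind makes each consensus decision explicit and auditable, which fits the reasoned-judgment approach better than a formula that produces a total score.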
Calculating the Overall Performance Level
Once performance levels are determined for each of three required portfolio strands in the content area, based on the analytical rubric shown in Figure 3, these are averaged and rounded to the nearest whole number to determine the overall performance level in that subject. To calculate the average of three performance levels, consecutive numerical values are given to each performance level, as follows: Awareness = 1, Emerging = 2, Progressing = 3, Needs Improvement = 4, etc. Figure 4 shows how different combinations are averaged to yield a final performance level.
Figure 4. Performance Levels in Each Strand are Averaged to Determine an Overall Performance Level
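The averaging step can be sketched in a few lines. This is an illustration under stated assumptions, not the Department's implementation: the function name `overall_level` is hypothetical, and the values Proficient = 5 and Advanced = 6 are assumed continuations of the "etc." in the numbering given above. Because the average of three whole numbers can only end in .00, .33, or .67, rounding to the nearest whole number never encounters a tie.

```python
# Level order from the text (Awareness = 1 ... Needs Improvement = 4).
# Proficient = 5 and Advanced = 6 are ASSUMED continuations of "etc."
LEVELS = [
    "Awareness",
    "Emerging",
    "Progressing",
    "Needs Improvement",
    "Proficient",
    "Advanced",
]
VALUE = {name: index + 1 for index, name in enumerate(LEVELS)}


def overall_level(strand_levels):
    """Average three strand performance levels and round to the
    nearest whole number to get the overall content-area level."""
    if len(strand_levels) != 3:
        raise ValueError("exactly three strand levels are required")
    total = sum(VALUE[name] for name in strand_levels)
    # (total + 1) // 3 rounds total / 3 to the nearest integer;
    # a .5 tie cannot occur when averaging three integers.
    rounded = (total + 1) // 3
    return LEVELS[rounded - 1]


if __name__ == "__main__":
    print(overall_level(["Emerging", "Progressing", "Progressing"]))
    print(overall_level(["Awareness", "Emerging", "Emerging"]))
```

For example, strands at Emerging, Progressing, and Progressing average to 8/3, or about 2.67, which rounds to 3 (Progressing).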
Meeting the State's Graduation Requirement Through MCAS Alternate Assessment
A performance level of Needs Improvement or higher is required on grade 10 MCAS assessments in English Language Arts and Mathematics in order to earn a "competency determination" (the state's requirement to receive a regular high school diploma). As previously stated, alternate assessment is one pathway to meet that requirement. Therefore, it is necessary to calibrate performance levels precisely between the alternate assessment and the general assessment, especially at the Needs Improvement level. What does a Needs Improvement portfolio look like, and what specifically constitutes a "comparable performance" to a student who was tested and earned this score? Although portfolio scorers can accurately determine a portfolio's completeness, accuracy, and independence of performance, an additional level of review seemed necessary in order to assure the breadth, quality, and comparability of the student's performance to that of other students who passed the grade 10 MCAS tests in those subjects.
To accomplish this, the Department convenes a panel of math and English language arts content specialists each year to review a selection of grade 10 portfolios set aside for this purpose, and to make recommendations to the Department on whether these students have demonstrated achievement at or above Needs Improvement level based on the evidence in their portfolios. Panelists, themselves, were selected by the Department for their secondary-level teaching expertise in the content area; their experience serving on the state's Assessment Development Committees that develop and review general assessment test items with the state's test contractor; and their extensive familiarity with Massachusetts Curriculum Frameworks. Panelists are familiar with work typical of students who "passed" the grade 10 MCAS tests in ELA and Mathematics since they teach these students on a daily basis. Panel members were asked to examine pre-scored portfolios at Level of Complexity 4 and 5, and to verify whether they felt the contents:
Although the number of students each year who perform at or above the Needs Improvement level on grade 10 ELA and Math alternate assessments is relatively small, this number can be expected to grow over time. Of course, as teachers also gain familiarity with portfolio management techniques, submission requirements, curriculum alignment, and instructional improvements, the scores of all students will rise. It is important for states to demonstrate the effectiveness of their statewide alternate assessments to improve the nature of instruction for students with significant disabilities generally, and to show that these improvements translate into expanded opportunities for these students both in and out of school. It is also important to demonstrate the capacity of the alternate assessment to assist students to meet the same important scholastic requirements as other students.
Developing a statewide alternate assessment presents states with a range of difficult choices, such as how to determine participation, measure performance, and report results. The demand for professional development and technical assistance required by such a system can be intensive, and there must be an ongoing commitment by state assessment personnel to maintain communication and accessibility with the public. In the end, each state must develop an alternate assessment that reflects not only its standards but also its unique culture and values, that is integrated with the standard assessment system, and that promotes the greatest benefits to the most students.
Kleinert, H., & Kearns, J. (2001). Alternate assessment: Measuring outcomes and supports for students with disabilities. Baltimore, MD: Paul H. Brookes.
Massachusetts Department of Education. (1999). Participation guidelines for MCAS Alternate Assessment. Malden, MA: Author.
Massachusetts Department of Education. (2001). Rubric for scoring portfolio strands in the 2002 Educator's Manual for MCAS Alternate Assessment. Malden, MA: Author.
Quenemoen, R., Rigney, S., & Thurlow, M. (2002). Use of alternate assessment results in reporting and accountability systems: Conditions for use based on research and practice (Synthesis Report 43). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.
Roeber, E. (2002). Setting standards on alternate assessments (Synthesis Report 42). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes.