Wiener, D. (2002). Massachusetts: One state's
approach to setting performance levels on the alternate assessment (Synthesis
Report 48). Minneapolis, MN: University of Minnesota, National Center on
Educational Outcomes. Retrieved [today's date], from the World Wide Web: http://education.umn.edu/NCEO/OnlinePubs/Synthesis48.html
Executive Summary
In Massachusetts, about one percent of all
students being assessed submit portfolios for the Massachusetts Comprehensive
Assessment System (MCAS) Alternate Assessment. These portfolios are based on
"expanded" state standards that describe academic outcomes appropriate for
students with significant disabilities. Teachers collect "evidence" of their
students’ performance on the standards during targeted instructional activities
or structured student observations to create portfolios that contain an array of
work samples, instructional data sheets, audio- and videotapes, or other
evidence organized into "portfolio strands" in each content area.
MCAS Alternate Assessments are submitted
to the state for scoring and designation of a performance level that gives
parents and teachers information on how well these students are learning the
general curriculum relative to their past performance and the performance of
other students. The process used by the Massachusetts Department of Education to
assign performance levels to alternate assessments is the focus of this report.
This technical phase, called standard setting, reflects several steps
that typically occur between scoring and reporting. However, the process
reflects theoretical debates and decisions that occurred much earlier in the
development process of the alternate assessment, sometimes years before the
first portfolio was compiled and submitted. Several of these earlier
conversations and their consequences are also described in this report since the
recommendations form the philosophical basis of much that followed.
The alternate assessment in Massachusetts
is one pathway to meet the state requirements for earning a "competency
determination" needed to receive a regular high school diploma. Therefore, it
was necessary to calibrate performance levels precisely between the alternate
assessment and the general assessment, especially at the Needs Improvement
level, which is the level required to earn the competency determination.
Massachusetts decided to use an analytical rubric to convert raw scores to
performance levels. Combinations of scores that could be obtained across the
alternate assessment scoring rubrics for Level of Complexity, Demonstration of
Skills and Concepts, and Independence were discussed, and reasoned perceptions
were used to assign performance levels of Awareness, Emerging, Progressing, and
Needs Improvement or above (Proficient, Advanced) in each portfolio strand. The
reasoning behind the Massachusetts approach and the ways in which performance
levels in each strand are combined to produce an overall performance level are
described further in the report. This approach reflects not only the
Massachusetts standards, but also its unique culture and values.
Overview
States are finding different ways to adapt
their accountability systems to include all students, because the achievement of
students with disabilities has typically lagged behind that of their
non-disabled peers, and because recent state and federal laws require the
participation of all students. Special educators are considering how,
rather than whether, students with disabilities will participate in
statewide assessments, while assessment policies themselves have become more
flexible in accommodating the administration of those tests. Curriculum experts
are placing increased emphasis on teaching students with disabilities the same
content and skills being taught to their non-disabled peers, while regular and
special educators are working together to adapt curriculum and instruction so
diverse learners may participate more fully in academic activities.
A comparatively small number of students
with the most complex and significant disabilities, though, have been more
difficult to include in statewide assessments. Academic skills and subject
matter have not always been a part of the curriculum for this population, and
information has not systematically been collected on what these students have
learned. The performance of these students is not easily determined using the
same standardized paper-and-pencil tests used with the majority of students, but
since participation in these assessments is now required, states have had to
decide how best to include these students by giving them "alternate
assessments." Alternate assessment methods and formats are determined by each
state individually, though their common purpose is to improve instruction for
these students and report their academic performance. By using alternate
assessments with this population, schools can document what is being taught for
purposes of system accountability, and demonstrate to parents and the public to
what degree each of these students has learned state standards.
A majority of states have adopted
individual academic portfolios as the most effective method of assessment for
these "difficult-to-assess" students. Student portfolios accommodate a range of
approaches to document learning, and afford teachers options for determining the
ideal time, place, and method to assess their students. Portfolios provide
teachers, students, and their parents with tangible evidence of student
performance and feedback on their progress. While the contents of each portfolio
are unique, the common structure allows for evaluation and scoring using uniform
criteria that can be shared with teachers beforehand.
The demands of creating and managing
portfolios, and compiling this information for submission to the state, however,
require additional expertise on the part of teachers and time in which to
complete this work. This fundamental change in classroom practice has required
states to make a strong and continued commitment to provide professional
development and technical assistance to educators who conduct alternate
assessments, and to engage in an open dialogue about the efficiency, rigor, and
usefulness of the process with those who are most affected by it.
Alternate Assessment in Massachusetts
In Massachusetts, about 5,000 students, or
one percent of all students being assessed, submit portfolios for the
Massachusetts Comprehensive Assessment System (MCAS) Alternate Assessment. In
creating portfolios, their teachers must first identify challenging outcomes for
each student based on the standards in each content area being assessed. Many
states, including Massachusetts, use an "expanded" version of their standards
that describes academic outcomes that are appropriate for students with
significant disabilities. Teachers then collect "evidence" of their students’
performance on those standards during targeted instructional activities or
structured student observations. Portfolios may contain an array of work
samples, instructional data sheets, audio- and videotapes, and other evidence
organized into "portfolio strands" in each content area.
Once MCAS Alternate Assessments are
submitted to the state, they are scored and a performance level is assigned in
each content area so parents and teachers have information on how well these
students are learning the general curriculum relative to their past performance
and the performance of other students. The process used by the Massachusetts
Department of Education to assign performance levels to alternate assessments is
the focus of this report. This technical phase, called standard setting,
reflects several steps that typically occur between scoring and reporting
(Quenemoen, Rigney, & Thurlow, 2002). However, the process reflects theoretical
debates and decisions that occurred much earlier in the development process of
the alternate assessment, sometimes years before the first portfolio was
compiled and submitted. Several of these earlier conversations and their
consequences are also described in this report since the recommendations form
the philosophical basis of much that followed. First among these conceptual
discussions was defining who should take an alternate assessment.
A Diverse Group of Advisors
Late in 1998, the Massachusetts Department
of Education began convening regular meetings of a task force composed of DOE
staff (from Special Education and Assessment units), the contractor team (Measured
Progress and the ILSSA group at the University of Kentucky), and the
Massachusetts Alternate Assessment Advisory Committee (a diverse stakeholder
group). The task force provided recommendations to the Department on a range of
assessment issues, including:
how to provide guidance to IEP teams about which students to consider for alternate assessments;
what alternate assessments should look like;
how alternate assessments should be scored;
which scores should "count" toward overall performance; and
how to describe and report the performance of students who take alternate assessments.
Guidelines for IEP Teams: Who Should Take
Alternate Assessments?
It was assumed from the beginning that
students who needed alternate assessments were, for the most part, those who
could not take paper-and-pencil tests and whose academic performance was based
on the expanded standards appropriate for students with significant
disabilities. However, the task force also identified students whose
disabilities were not primarily cognitive who they felt should also be
considered for alternate assessments by their IEP and 504 teams. Generally, this
smaller group of identified students had disabilities that presented them with
"unique and significant challenges" to participation in standardized statewide
testing regardless of the accommodations they could use on those tests. The task
force recommended, for example, that students with severe behavioral and
emotional disabilities, or those with cerebral palsy, sensory impairments (deaf,
blind, or deaf and blind), or fragile health and medical conditions should also
be considered for alternate assessments regardless of their levels of academic
performance, since taking on-demand statewide tests could present them with
insurmountable barriers to participation and therefore deny them access
to the assessment (Massachusetts Department of Education, 1999).
Based on guidelines provided to
Massachusetts IEP Teams since 1999, students across the full spectrum of
academic performance, then, are eligible to take alternate assessments, even
when they are able to demonstrate the same (or higher) levels of performance as
a tested student. They simply require an alternate assessment format to
demonstrate their knowledge and skills. Therefore, the MCAS reporting system
required sufficient flexibility and integrity to provide meaningful feedback on
students who demonstrate a "comparable performance" to a student who scores at
the highest levels on the standard tests. It also became necessary to
incorporate a method by which a student could meet the state’s graduation
requirement through an alternate assessment. The task force strongly advised
that the alternate assessment be a different, though not easier, pathway to
demonstrate the same performance as a tested student.
Scoring Alternate Assessments
The task force next considered and
selected criteria on which to base the scores of alternate assessment
portfolios. They advised the Department to develop criteria based primarily on
student performance, since that is what the standard assessment measured, rather
than assessing how well the student’s program provided opportunities to learn
this material. Some on the task force, however, felt that student achievement
could not be separated from program effectiveness. In the end, a scoring rubric
was developed in which four out of six categories are based on student
performance, and two reflect the effectiveness of the student’s program:
Completeness of the portfolio
Level of Complexity: the difficulty of academic tasks and knowledge attempted by the student
Demonstration of Skills and Concepts: the accuracy of the student’s performance
Independence: cues, prompts, and other assistance required by the student to perform the tasks or activities
Self-Evaluation: the extent to which opportunities are provided to reflect, set goals, evaluate, and monitor the student’s own performance
Generalized Performance: the number of contexts and instructional approaches provided to the student to perform tasks and demonstrate knowledge
Scores are determined and reported in each
of the rubric areas listed above. Once numerical scores are obtained for a
portfolio in these rubric areas, raw scores must somehow be combined to identify
an overall performance level in the content area. Before performance levels can
be determined, however, several important questions must be answered:
What will each performance level be called; how many performance levels will there be; and how will each be defined?
Which numerical scores in which rubric areas will be counted in determining the overall performance level?
How will numerical scores in those rubric areas be combined to yield a performance level?
What range or combination of scores will yield a particular performance level?
Defining Performance Levels
The task force recommended that
performance levels be identical to performance levels on standard MCAS tests,
but that the lowest performance level, called "Warning/Failing at Grade 10" for
tested students, be subdivided into three distinct levels in order to
provide more meaningful descriptions of performance at these lower levels.
Figure 1 illustrates the performance levels and definitions used by
Massachusetts to report assessment results on the standard and the alternate
assessments, and the relationship between the two reporting scales.
Figure 1. MCAS Performance Levels

Counting Scores Toward an Overall Performance Level
On several occasions, the task force
revisited the question of which scores to count in calculating the overall level
of performance. In reviewing the goals, methods, and purpose of the general
assessment, they realized, in essence, that regular MCAS tests measure the
ability of a student to respond to test items accurately, with no assistance
from peers or from the adult(s) administering the test, and that test results
are based solely on the correctness of the student’s responses.
In the end, their recommendation was to
"parallel the goals, methods, and purpose of the general assessment, where
possible," when no other solution is obvious. With this advice, the task force
established a foundation for future decision-making, and returned to this
guidance frequently.
With these assumptions about the general
assessment, and the advice of the task force to parallel the general assessment
where possible, the Department decided it would base alternate assessment
performance levels on raw numerical portfolio scores given in the areas of
completeness, complexity, accuracy, and independence only, and not
on self-evaluation or generalized performance, since scores in these last two
areas depended on opportunities provided to the student, not on the student’s
direct performance of the skill being assessed. Scores in all rubric
areas, however, would be reported to schools and parents in order to provide
those who work most closely with the student detailed information on his or her
performance as shown in Figure 2.
Separate scores are reported for each
strand in Level of Complexity, Demonstration of Skills and Concepts
(accuracy), and Independence, while scores in the secondary areas of
Self-Evaluation and Generalized Performance are combined for the
entire content area.
Figure 2. Excerpt of
Sample Parent/Guardian Report

How Will Numerical Scores be Combined to
Yield a Performance Level?
The Massachusetts Department of Education
consulted with Ed Roeber of Measured Progress to assist in developing a strategy
or formula for combining scores to obtain an overall performance level for each
content area. Over time, Dr. Roeber recommended several options for calculating
a numerical score total in each content area of a portfolio. The following were
two mathematical formulas considered by the Department:
Method #1 - Calculate the sum of
scores in three rubric areas:
LC + DSC + Ind = Total Score
Method #2 - Multiply LC by the sum
of the other two rubric areas:
LC x (DSC + Ind) = Total Score
Key
LC = Level of Complexity
DSC = Demonstration of Skills and Concepts
Ind = Independence
Consider the following scenario using both
scoring methods:
| Raw Scores | Student A | Student B |
| LC | 3 | 2 |
| DSC | 3 | 4 |
| Ind | 3 | 4 |
| Total Score (Method #1) | 9 | 10 |
| Total Score (Method #2) | 18 | 16 |
Using Method #1, Student A scored lower (9) than Student B (10), although
Student A worked on more challenging subject matter (LC=3) than Student B
(LC=2). Using Method #2, on the other hand, Student A scored higher (18)
than Student B (16), thereby rewarding Student A for attempting more challenging
material. For certain score combinations, Method #1 appeared to create a
disincentive for students to attempt increasingly complex skills and content,
and discouraged teachers from providing more challenging instruction to their
students, which was certainly not the intent of the alternate assessment.
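The comparison between the two methods can be reproduced with a short sketch. The class and function names here are hypothetical, chosen only for illustration; the code applies the two formulas as stated in the text to the scores given for Student A and Student B, and is not the Department's scoring software.

```python
from typing import NamedTuple


class StrandScores(NamedTuple):
    """Raw rubric scores for one portfolio strand (hypothetical container)."""
    lc: int   # Level of Complexity
    dsc: int  # Demonstration of Skills and Concepts
    ind: int  # Independence


def total_method_1(s: StrandScores) -> int:
    """Method #1: LC + DSC + Ind."""
    return s.lc + s.dsc + s.ind


def total_method_2(s: StrandScores) -> int:
    """Method #2: LC x (DSC + Ind)."""
    return s.lc * (s.dsc + s.ind)


# Scores taken from the scenario in the text.
student_a = StrandScores(lc=3, dsc=3, ind=3)
student_b = StrandScores(lc=2, dsc=4, ind=4)

for name, s in (("Student A", student_a), ("Student B", student_b)):
    print(name, "Method #1:", total_method_1(s), "Method #2:", total_method_2(s))
```

Under Method #1 the student working at the lower complexity level comes out ahead (10 versus 9), while under Method #2 the ordering reverses (18 versus 16), which is the incentive effect described above.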
Because the LC score is used as a
multiplier in Method #2, scores also were spread over a wider range (1-40),
avoiding the possibility of overlapping totals. Method #1, on the other hand,
spreads scores across a narrow range (1-13) since scores are simply added
together. It was agreed that Method #2 would be explored further for its
effectiveness, impact, and unintended consequences, if any.
Combining Scores to Yield a Performance Level
Dr. Roeber suggested that MCAS-Alt project
leadership meet with regular assessment psychometricians and data analysts from
the Department and from Measured Progress to review and select the most
effective formula for calculating a total content area score, and to identify
"cut scores" for specific performance levels based on a range of calculated
score totals. During ensuing discussions, however, questions were raised about
the necessity of generating a single total numerical score for each strand and
content area in the alternate assessment, and whether it might cause confusion
to introduce another, entirely different score scale beside the 200-280 score
scale already in use for MCAS test results. Some felt this would reinforce the
separateness of the alternate assessment and wondered instead whether a system
could be developed that used reasoned judgment, instead of a calculation, to
describe overall student performance based on different raw score combinations.
After a spirited discussion, this reasoning prevailed, and the idea of
calculating a total numerical portfolio score was abandoned in favor of a
different approach.
Whether a mathematical equation or a
reasoned approach is used to determine a student’s performance, however, some
kind of scale, analytical rubric, or other consistent method must be used to
convert raw scores to performance levels (Roeber, 2002). The analytical rubric
developed for this purpose in Massachusetts is actually a series of grids based
on a student’s score as shown in Figure 3.
Sixty-four different possible score
combinations were discussed and analyzed by the group, and a performance level
identified by consensus for each. Decisions were based on reasoned perceptions
of what each score combination revealed about the student’s performance, and the
relative position of that performance level within the hierarchy of other
levels. It was easier to analyze and assign performance levels beginning with
the lowest and highest levels, then working toward the middle. In the end, the
group was able to define and categorize all score combinations. The model was
tested using various arbitrary score combinations to check that the defined
performance level made sense, given the student’s scores, and that scores were
appropriately scaled relative to adjacent scores.
An analysis of several arbitrary score
combinations reveals, for example, that a student who scores LC=3, DSC=2, and
Ind=3 according to the MCAS-Alt scoring rubric, is a student who is working on
modified (or "expanded") learning standards, who demonstrates 26-50% accuracy,
and who needs assistance 51-75% of the time during standards-based activities
(Massachusetts Department of Education, 2001). From this information, the
student would appear to be performing above the definition of Awareness
in this content area, but not yet at Progressing, in which the
student would perform the skills and demonstrate the knowledge with greater
independence and accuracy. Since this student is somewhere between the
Awareness and Progressing performance levels, we can say with relative
confidence that the student is at the Emerging level. Another student who
hypothetically scored LC=3, DSC=3, Ind=4 is also
working on modified standards, but performs with a sufficiently high rate of
accuracy and independence to be placed in the Progressing performance
level. He or she is probably ready to attempt even more challenging tasks,
skills, and concepts in the coming year, since the data suggest he or she has
mastered skills and content in the current portfolio. Figure 3 shows the
complete analytical rubric for determining performance levels in each portfolio
strand.
Figure 3. Analytical Rubric for Determining
Performance Levels in Each Portfolio Strand
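One way to picture an analytical rubric of this kind is as a lookup table keyed by the three raw scores, rather than as a formula. The sketch below is hypothetical: it contains only the two score combinations worked through in the text, the remaining entries of the rubric are deliberately left out, and the names `RUBRIC` and `strand_performance_level` are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# (LC, DSC, Ind) -> performance level for one portfolio strand.
# Only the two combinations discussed in the text are filled in;
# every other entry of the rubric is intentionally omitted here.
RUBRIC: Dict[Tuple[int, int, int], str] = {
    (3, 2, 3): "Emerging",
    (3, 3, 4): "Progressing",
}


def strand_performance_level(lc: int, dsc: int, ind: int) -> Optional[str]:
    """Return the level recorded for a score combination, or None if it is not listed."""
    return RUBRIC.get((lc, dsc, ind))


print(strand_performance_level(3, 2, 3))
print(strand_performance_level(3, 3, 4))
```

A table lookup fits the consensus-based approach described above, since each combination carries a level assigned by judgment and no total score is ever computed.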
Calculating the Overall Performance Level
Once performance levels are determined for
each of three required portfolio strands in the content area, based on the
analytical rubric shown in Figure 3, these are averaged and rounded to the
nearest whole number to determine the overall performance level in that subject.
To calculate the average of three performance levels, consecutive numerical
values are given to each performance level, as follows: Awareness
= 1, Emerging = 2, Progressing = 3, Needs Improvement = 4,
etc. Figure 4 shows how different combinations are averaged to yield a final
performance level.
Figure 4. Performance Levels in Each Strand
are Averaged to Determine an Overall Performance Level
| Student | Portfolio Strand #1 | Portfolio Strand #2 | Portfolio Strand #3 | Performance Level |
| A | Aw (1) | Aw (1) | Em (2) | Awareness (ave. 1.33) |
| B | Aw (1) | Em (2) | Em (2) | Emerging (ave. 1.67) |
| C | Em (2) | Pg (3) | NI (4) | Progressing (ave. 3.0) |
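The averaging step can be sketched in a few lines. This is an illustration under stated assumptions, not the state's implementation: the function name `overall_level` is hypothetical, and the numerical values for Proficient (5) and Advanced (6) are assumed by continuing the consecutive sequence that the text ends with "etc."

```python
from typing import Sequence

# Values 1-4 are given in the text; 5 and 6 are an assumption that
# continues the consecutive numbering implied by "etc."
LEVEL_VALUES = {
    "Awareness": 1,
    "Emerging": 2,
    "Progressing": 3,
    "Needs Improvement": 4,
    "Proficient": 5,  # assumed
    "Advanced": 6,    # assumed
}
VALUE_LEVELS = {value: name for name, value in LEVEL_VALUES.items()}


def overall_level(strand_levels: Sequence[str]) -> str:
    """Average the strand levels and round to the nearest whole number."""
    total = sum(LEVEL_VALUES[level] for level in strand_levels)
    n = len(strand_levels)
    # Integer arithmetic for floor(total / n + 1/2), i.e. round half up.
    rounded = (2 * total + n) // (2 * n)
    return VALUE_LEVELS[rounded]


# The three students from Figure 4.
print(overall_level(["Awareness", "Awareness", "Emerging"]))
print(overall_level(["Awareness", "Emerging", "Emerging"]))
print(overall_level(["Emerging", "Progressing", "Needs Improvement"]))
```

With three strands the average always ends in .00, .33, or .67, so the choice of tie-breaking rule for exact halves never comes into play.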
Meeting the State’s Graduation Requirement Through MCAS Alternate Assessment
A performance level of Needs
Improvement or higher is required on grade 10 MCAS assessments in English
Language Arts and Mathematics in order to earn a "competency determination" (the
state’s requirement to receive a regular high school diploma). As previously
stated, alternate assessment is one pathway to meet that requirement. Therefore,
it is necessary to calibrate performance levels precisely between the alternate
assessment and the general assessment, especially at the Needs Improvement
level. What does a Needs Improvement portfolio look like, and what
specifically constitutes a "comparable performance" to a student who was tested
and earned this score? Although portfolio scorers can accurately determine a
portfolio’s completeness, accuracy, and independence of performance, an
additional level of review seemed necessary in order to assure the breadth,
quality, and comparability of the student’s performance to that of other
students who passed the grade 10 MCAS tests in those subjects.
To accomplish this, the Department
convenes a panel of math and English language arts content specialists each year
to review a selection of grade 10 portfolios set aside for this purpose, and to
make recommendations to the Department on whether these students have
demonstrated achievement at or above the Needs Improvement level based on the
evidence in their portfolios. The panelists themselves were selected by the
Department for their secondary-level teaching expertise in the content area;
their experience serving on the state’s Assessment Development Committees that
develop and review general assessment test items with the state’s test
contractor; and their extensive familiarity with Massachusetts Curriculum
Frameworks. Panelists are familiar with work typical of students who
"passed" the grade 10 MCAS tests in ELA and Mathematics since they teach these
students on a daily basis. Panel members were asked to examine pre-scored
portfolios at Level of Complexity 4 and 5, and to verify whether they
felt the contents:
document the full range of learning standards, covering knowledge and skills tested on grade 10 MCAS tests in the content area;
demonstrate a level of performance typical of students who perform at the Needs Improvement level on the MCAS test in that subject; and
exemplify an even higher performance level than Needs Improvement; for example, Proficient or Advanced.
Conclusion
Although the number of students each year
who perform at or above the Needs Improvement level on grade 10 ELA and
Math alternate assessments is relatively small, this number can be expected to
grow over time. Of course, as teachers also gain familiarity with portfolio
management techniques, submission requirements, curriculum alignment, and
instructional improvements, the scores of all
students will rise. It is important for states to demonstrate the effectiveness
of their statewide alternate assessments to improve the nature of instruction
for students with significant disabilities generally, and to show that these
improvements translate into expanded opportunities for these students both in
and out of school. It is also important to demonstrate the capacity of the
alternate assessment to assist students to meet the same important scholastic
requirements as other students.
Developing a statewide alternate
assessment presents states with a range of difficult choices, such as how to
determine participation, measure performance, and report results. The demand for
professional development and technical assistance required by such a system can
be intensive, and there must be an ongoing commitment by state assessment
personnel to maintain communication and accessibility with the public. In the
end, each state must develop an alternate assessment that reflects not only its
standards but also its unique culture and values, that is integrated with the
standard assessment system, and that promotes the greatest benefits to the most
students.
References
Kleinert, H., & Kearns, J. (2001).
Alternate assessment: Measuring outcomes and supports for students with
disabilities. Baltimore, MD: Paul H. Brookes.
Massachusetts Department of Education.
(1999). Participation guidelines for MCAS Alternate Assessment. Malden,
MA: Author.
Massachusetts Department of Education.
(2001). Rubric for scoring portfolio strands in the 2002 Educator’s
Manual for MCAS Alternate Assessment. Malden, MA: Author.
Quenemoen, R., Rigney, S., & Thurlow, M. (2002).
Use of alternate assessment results in reporting and accountability
systems: Conditions for use based on research and practice (Synthesis
Report 43). Minneapolis, MN: University of Minnesota, National Center on
Educational Outcomes.
Roeber, E. (2002). Setting standards on
alternate assessments (Synthesis Report 42). Minneapolis, MN: University of
Minnesota, National Center on Educational Outcomes.