CAREI Research/Practice Newsletter

Center for Applied Research and Educational Improvement (CAREI)
275 Peik Hall - 159 Pillsbury Dr. SE - Minneapolis MN 55455
Tel: 612-624-0300 - Fax: 612-625-3086

Volume 2, Number 1


Performance Assessments of Critical Thinking: The Reflective Judgment Approach

by Richard Christen and James Angermeyer, Independent District 196, Rosemount, Minn.; Mark L. Davison, College of Education, University of Minnesota; and Karen Anderson, College of Education, University of Minnesota

Abstract

We developed a performance assessment of critical thinking based on a developmental theory of how students reason about complex problems that may not have a single right or wrong answer. The assessment was designed not as a single test, but as a format that can be adapted to various subject areas and scored with a common rubric.

In pilot research with high school social studies students during 1992-93, rater agreement was quite respectable for scores based on a single task, correlations with a multiple-choice measure were low, scores increased by grade, and teachers were generally satisfied with the measure. During 1993-94, District 196 is extending the assessment on a trial basis to high school science courses and, with the addition of graphic organizers, to a middle school.

Should Columbus be remembered as a great person whose landing in America transformed the course of history? Or was he a villain who inaugurated five centuries of suffering for America's native populations? Is the Endangered Species Act a valuable instrument in the fight to preserve threatened wildlife and plant life? Or is it primarily a source of job loss and economic ruin for parts of the United States?

Educators generally consider the ability to think critically about issues such as these to be one of the most important educational outcomes. But what exactly is critical thinking? Can so complex a concept be reliably and meaningfully assessed in the classroom? Although descriptions of thinking skills are varied and often overlap, the selection of a particular assessment methodology should be based as closely as possible on how thinking is defined. In addition, if critical thinking assessment is to drive instruction, both the definition and assessment of critical thinking should mirror activities and tasks that classroom teachers value and understand. While a multiple-choice measure like the Cornell Critical Thinking Test (Ennis and Millman, 1985) may have a solid theoretical basis and sound technical characteristics, it does not lend itself to the design of instructional strategies, and the task of taking such a test does not reflect the kinds of skills that classroom teachers expect in strong critical thinkers.

Stephen Norris and Robert Ennis define critical thinking as reasonable and reflective thinking that is focused upon deciding what to do or believe (1989, 1). Stated in this way, one can expect students to use critical thinking skills to reach a conclusion based on available information. According to Norris and Ennis, critical thinking is demonstrated by the ability to: (1) judge the soundness of information, (2) assess conclusions, and (3) make good inferences.

The Norris and Ennis definition helps clarify the assessment task. Rather than being the ability to solve "puzzles" (Churchman, 1971), critical thinking becomes the ability to make decisions or to reach conclusions about problems and issues that may not always have a single correct answer. Furthermore, one can easily envision assessment activities that would allow students to perform these kinds of tasks in a way that encourages classroom practices. A written essay, for example, in which the student is asked to sift through conflicting points of view and cogently defend a position would require many of the skills embodied in the Norris and Ennis definition and would also exercise the reading, writing, and problem-solving skills that teachers value.

Reflective judgment, a concept of critical thinking encompassed by the Norris and Ennis definition, is the basis for a developmental theory proposed by Karen Kitchener and Patricia King (King and Kitchener, 1993; Davison, King, and Kitchener, 1990). According to this theory, individuals proceed through a specified sequence of cognitive growth in which they show characteristic changes along four dimensions. By responding to ill-structured problems, i.e., problems with no clear right or wrong answers, students demonstrate their maturing capabilities along these dimensions: (1) Use of Evidence; (2) Use of Authority; (3) Plausibility of the Argument; and (4) Evaluation. Assessments based on the Reflective Judgment model can be scaled to show changes over time, providing valuable information from the standpoint of both testing and instruction.

Using a Reflective Judgment model, Independent School District 196 (Rosemount, Apple Valley, Eagan) has designed and tested high school-level critical thinking assessments in an attempt to answer these questions:

  • Can a reliable assessment and scoring system be designed for reflective judgment?
  • How do scores on this new measure compare to established critical thinking measures?
  • Do the average scores on the reflective judgment measure increase as students move from ninth grade to twelfth grade?
  • How do teachers feel about the utility of the reflective judgment measure?

Method

A committee of high school teachers designed tasks around four ill-structured problems, which roughly corresponded with general themes discussed at the four levels of the school district's social studies curriculum. The issues were election campaign financing, in the ninth-grade civics/U.S. government course; the quincentennial of Columbus' first voyage to America, in the tenth-grade U.S. history course; the collapse of the Roman Empire, in grade eleven world history; and the effects of working parents on children, in the twelfth-grade sociology/political science course. Figure 1 shows the Columbus controversy as it was posed to tenth-grade students.

Figure 1

Columbus and the Quincentennial: How Should They be Viewed?

The year 1992 is the quincentenary or 500th anniversary of Columbus' first voyage to America. This event has sparked a controversy over the proper place of Columbus in history and the appropriateness of "celebrating" his voyages. While many view Columbus as one of the greatest individuals of modern times, others call for a recognition of the pain and suffering Columbus brought to Native Americans.

Task

The authors of the following articles offer a variety of viewpoints on the significance of Columbus and the appropriateness of celebrating the quincentennial of his first voyage. After reading these arguments, write an essay expressing your thinking about Columbus and the celebration of his quincentennial.

Be sure to explain your answer using the authorities and evidence cited in the following articles and any other authorities and evidence that you think is related.

Classrooms presumed to represent the general population of the district were selected from each of the three high schools, with approximately 75 students participating in each grade. Students were given two days of class time to read the material and to write their essays. They received no specific preparation prior to the assessment and were not to take the material home at the end of the first day. Later, the students also completed the Cornell Critical Thinking Test.

Scoring

The rubric used to score the essays underwent several revisions. The initial model used an analytical scoring approach in which the four dimensions of the Reflective Judgment model were described at three different stages of development. After field testing, the four dimensions were combined into the holistic scoring rubric shown in Figure 2. A five-point scale was defined with descriptions at scale points one, three and five. This holistic scoring approach has been used successfully with student writing samples (Northwest Evaluation Association, 1992).

To train the readers, six sample papers representing various levels of the rubric were selected by the project directors. Readers were given a chance to read and rate the papers against the rubric criteria. In the discussion that followed, readers asked questions about the meaning of various phrases in the scoring rubric and made recommendations to change some of the wording. Discussions continued until the group was comfortable with the criteria and the rating scale. This process has been used successfully by the project directors to train readers for other student performance-based assessments (Independent District 196, 1992).

In the study itself, each reader scored a packet of approximately 25 papers, with essays from all grades mixed together. Although the reader would obviously know the student's grade from the paper's content, it was believed that scoring them together would encourage readers to apply the same standards to all papers. Each paper was scored by two readers. Whenever the two scores were within one point, they were considered in agreement. Papers with scores that differed by more than one point were read by a third reader who determined a final set of scores.
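The double-scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `score_status` and the sample score pairs are hypothetical and do not come from the study, and the sketch does not model how the third reader set the final scores, since the article does not describe that step.

```python
def score_status(first: int, second: int) -> str:
    """Classify a pair of reader scores under the one-point agreement rule."""
    gap = abs(first - second)
    if gap <= 1:
        # Identical scores, or scores one point apart, count as agreement.
        return "agreement"
    # A gap of more than one point sends the paper to a third reader.
    return "third reader"


# Hypothetical score pairs on the rubric scale (not data from the study).
pairs = [(3, 3), (2, 3), (1, 4)]
statuses = [score_status(a, b) for a, b in pairs]
print(statuses)
```

Only the third pair, where the two readers differ by three points, would be routed to a third reader.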

Figure 2

Reflective Judgment Scoring Criteria

Level 5

  • The student draws extensively on evidence presented in the articles to support the conclusion. The conclusion makes coherent use of the evidence.
  • Significant recognition of authority; the student may have made some attempt to consider the authority's credibility or point of view.
  • The student recognizes both sides of the issue and is able to weigh the pros and cons of both sides; recognizes the strength and limitations of each position in taking a stand.

Level 4

Level 3

  • The student has made a limited effort to use evidence from the articles to support the argument; the evidence may not support the conclusions or may be used somewhat incoherently.
  • Some identification or recognition of authority, but little development.
  • The student recognizes that another side of the issue exists, but finds support for only his or her side; may tend to build up his or her argument by tearing down the other side.
  • May be a "laundry list", citing much evidence both pro and con, but student makes no attempt to take a side and make a decision.

Level 2

Level 1

  • The student does not cite evidence from the articles.
  • No identification or recognition of authority, i.e., those responsible for the arguments used.
  • Student sees only one side of the argument; no evaluation attempted.
  • No evidence that the student used the articles; could have written the essay by only skimming the articles.

Level 0

  • Non-scorable response

Results

In this study, we examined agreement among raters, the relationship between the performance-based Reflective Judgment Performance Assessment and the multiple-choice Cornell Critical Thinking Test, grade trends on both the Reflective Judgment Performance Assessment and the Cornell Critical Thinking Test, and teachers' opinions of the Reflective Judgment Performance Assessment. Reported results were statistically significant (p < .05) except where noted.

Interrater Agreement

Each pair of scores was categorized as being in complete agreement when raters gave identical ratings, and it was categorized as being in close agreement when there was a difference of one point. Out of 302 papers rated, raters completely agreed on 149 (50%), closely agreed on 136 more (45%), and disagreed on only 17 (5%). The interrater reliability correlation between rater pairs was .75 for ninth grade, .74 for tenth grade, .59 for eleventh, and .58 for twelfth. To see if some raters might be systematically more lenient or harsh than others, we compared the mean rating of each rater with that of the raters with whom they were paired. The largest mean difference between a rater and those with whom s/he was paired was about half a point.
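The agreement categories defined above lend themselves to a simple tally. The following is a minimal sketch, not the authors' analysis code: the helper names and the sample rating pairs are hypothetical, while the three counts (149, 136, 17) are the ones reported in the paragraph above.

```python
from collections import Counter

CATEGORIES = ("complete", "close", "disagreement")


def agreement_category(first: int, second: int) -> str:
    """Label a pair of ratings by the size of the gap between them."""
    gap = abs(first - second)
    if gap == 0:
        return "complete"
    if gap == 1:
        return "close"
    return "disagreement"


def tally(pairs):
    """Return the share of rating pairs falling in each category."""
    counts = Counter(agreement_category(a, b) for a, b in pairs)
    return {name: counts[name] / len(pairs) for name in CATEGORIES}


# Hypothetical rating pairs, used only to exercise the functions.
sample_pairs = [(3, 3), (2, 3), (1, 1), (4, 2)]
sample_shares = tally(sample_pairs)

# Counts reported in the article, converted to shares of all rated papers.
reported = {"complete": 149, "close": 136, "disagreement": 17}
total_papers = sum(reported.values())
reported_shares = {name: count / total_papers for name, count in reported.items()}

for name in CATEGORIES:
    print(f"{name}: {reported_shares[name]:.1%}")
```

The three reported counts sum to the 302 papers rated, and the computed shares come out close to the rounded percentages given in the text.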

Relations to the Cornell Critical Thinking Test

The correlations between the Cornell Critical Thinking Test and the Reflective Judgment Performance Assessment were low. Within each grade, the correlations were .25 for ninth grade, .29 for tenth, .30 for eleventh, and .18 for twelfth, none of which was significant. These measures seem to share no more than 10 percent of their variance and cannot be said to measure the same construct.
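The shared-variance figure follows from squaring each correlation, since the squared correlation gives the proportion of variance two measures have in common. A short sketch using the four within-grade correlations reported above (the variable names are illustrative only):

```python
# Within-grade correlations reported in the article, keyed by grade.
within_grade_r = {9: 0.25, 10: 0.29, 11: 0.30, 12: 0.18}

# Squared correlation = proportion of variance the two measures share.
shared_variance = {grade: round(r ** 2, 4) for grade, r in within_grade_r.items()}
largest_share = max(shared_variance.values())

for grade, share in shared_variance.items():
    print(f"grade {grade}: {share:.1%} shared variance")
```

The largest value, for eleventh grade, stays under 10 percent, which is consistent with the statement in the text.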

Grade Trends

Figures 3a and 3b show the changes in mean Reflective Judgment Performance Assessment and Cornell Critical Thinking Test scores across grades. Figure 4 shows how the proportion of students scoring at each level of the scoring rubric changed from ninth to twelfth grade.

Figure 3a

Mean scores on the Reflective Judgment Performance Assessment generally increased across grades. Expressed on the five-point scale of the scoring rubric, the means for grades 9, 10, 11, and 12 respectively were 1.94, 2.22, 2.75, and 2.61. The decline from eleventh to twelfth grade was not statistically significant. Based on the mean scores and the Level 3 scoring criteria in Figure 2, it appears the teachers judged the average eleventh- and twelfth-grade performance as limited in use of evidence, use of authority, and recognition of alternative points of view, with little attempt to take a side or make a decision.

Figure 3b

On the Cornell Critical Thinking Test, means also generally increased across grades, except for a nonsignificant drop from eleventh to twelfth grade. At grades 9, 10, 11, and 12, the mean Cornell Critical Thinking Test scores were 43.96, 46.53, 49.69, and 48.60.

Although the correlations between the Reflective Judgment Performance Assessment and the Cornell Test were low within grades, the trends in mean scores across grades were very similar. The low correlations mean that within a grade, the students who performed best on the Reflective Judgment Performance Assessment were not necessarily the same students who performed best on the Cornell Critical Thinking Test.

Figure 4

 

Teachers' Opinions

While not always satisfied with the particular issues around which the assessments were constructed, the thirteen social studies teachers generally believed the essay format allowed students to display their critical thinking; the scoring rubric was clear, appropriate and easy to use; and the assessment model would be appropriate for other disciplines.

Conclusions and Future Plans

The low correlation between the Cornell Critical Thinking Test and the Reflective Judgment Performance Assessment suggests that they measure different forms of critical thinking. Teachers strongly supported the performance-based format, however, and were convinced that the assessment measured important aspects of critical thinking that should be taught and measured. Accordingly, the field test of the Reflective Judgment Performance Assessment will be continued into a second year in School District 196, with several adjustments to both the measurement itself and the field test model.

Although the credibility or bias of authorities is an important scoring criterion, few students actually considered it during the original field test. Two minor changes address this shortcoming: (1) a brief statement reminding students to take into account the source of evidence and the point of view of the author will be added to the instructions, and (2) more detailed biographical information on the authors will be provided.

To gather additional data, the second-year field test will be administered in science as well as in social studies classes. A newly designed testing model will also be used. A sample of science and social studies students from all four grade levels and each district high school will perform the same task in an attempt to more clearly identify the reasons for improved student mean scores from ninth to twelfth grades.

In the original field test, interrater agreement and reliability were very respectable for an assessment based on a single performance. To test an alternative scoring format during the second year, a number of assessments will be scored by one teacher provided with the rubric but no training as well as by two trained teachers. A sample of students also will complete two different assessments in order to provide data on the consistency of performance across tasks.

Based on the success of the assessments at the high school, the study will be expanded to the middle schools during the second year. A sample of middle school students from grades six, seven, and eight will complete an assessment examining the causes of dinosaur extinction. The task format and rubric will be identical to those used in the high school, but some students will be provided with graphic organizers in order to test their impact on performance. As this extension to the middle school illustrates, the format is flexible enough for adaptation to various grades and subject matters. Because of this flexibility, the Reflective Judgment model has the potential to serve not only as a basis for critical thinking assessment, but also as an impetus for curriculum improvement organized around tasks and activities valued by teachers in various disciplines.

References

Churchman, C.W. 1971. The design of inquiring systems: Basic concepts of systems and organizations. New York: Basic.

Davison, M.L., King, P.M., Kitchener, K.S. 1990. Developing reflective thinking and writing. In R. Beach and S. Hynds, eds., Developing discourse practices in adolescence and adulthood. New York: Ablex.

Ennis, R.H., Millman, J. 1985. Cornell critical thinking test - level X, 3d ed. Pacific Grove, Calif.: Midwest Publications.

Independent District 196. 1992. Performance indicators of achievement: Interim report no. 2 - writing assessment results. Unpublished report, Rosemount, Minn.

King, P.M., Kitchener, K.S. 1993. The development of reflective judgment in adolescence and adulthood. San Francisco: Jossey-Bass.

Norris, S.P., Ennis, R.H. 1989. Evaluating critical thinking. Pacific Grove, Calif.: Midwest Publications.

Paul, R., Binker, A.J.A., Martin, D., Vetrano, C., Kreklau, H. 1989. Critical thinking handbook: A guide for remodeling lesson plans in language arts, social studies, and science. Rohnert Park, Calif.: Center for Critical Thinking and Moral Critique.

 

 

©2006 Regents of the University of Minnesota. All rights reserved.

The University of Minnesota is an equal opportunity educator and employer.

Last modified on September 17, 2009
