

Equity, Assessment, and Thinking Mathematically
Richard Lesh, Mark Hoover, Anthony E. Kelly

Traditional approaches to mathematics assessment have favored the design of problems that are easy to score, but that deny individual students the opportunity to demonstrate mathematical problem solving. In this way, traditional problems may be serving to screen out students who are poor at computation and memorization, but who may have mathematical abilities that are not being properly elicited. The focus of this paper is a description of six principles for the design of what we will call model-eliciting activities, whose goal is to promote greater equity in mathematics assessment by spurring, nurturing, and supporting mathematical contributions from a larger pool of students.
According to the Mathematical Sciences Education Board, the characteristics that distinguish mathematics from other domains of knowledge can be summarized as follows: (i) doing "pure" mathematics means investigating patterns (or systems) for their own sake, by constructing and transforming them in structurally interesting ways, and by studying their structural properties, and (ii) doing "applied" mathematics means using patterns (or systems) as models (or structural metaphors) to describe, explain, predict, or control other systems. Yet, when studies have investigated the alignment of nationally significant standardized tests with the NCTM Standards, their conclusions have been consistent and discouraging (National Research Council, 1990; Romberg, Wilson, & Khaketla, 1991). When new conceptual and procedural tools are used for new purposes in new types of problem solving situations, past conceptions of mathematical ability are often far too narrow, low-level, and restricted to use as a basis for identifying students whose mathematical abilities should be recognized and encouraged.
To illustrate the contrasts between traditional and more current views of assessment in mathematics, consider the following table.


Assumptions Underlying Alternative Approaches to Assessment




At the national level, our foremost problem is not to screen talent; it is to identify and nurture capable students. As long as we continue to collapse all achievements and abilities into a single score or letter grade, discrimination is inevitable, since many talented students will be misclassified (Lesh & Lamon, 1992). Research in mathematics education offers overwhelming evidence that: (i) there are many alternative types of mathematical talent, (ii) many different kinds of personalities, knowledge, and capabilities can lead to success, (iii) many different types of success are possible, and (iv) most people have irregular profiles of expertise, with strengths in some areas and weaknesses in others. In ethnographic studies investigating the mathematical capabilities of shoppers, tailors, carpenters, street vendors, and others (e.g., Lave, 1988; Saxe, 1990; Carraher et al., 1985), it has become clear that most people's "school math" abilities operate relatively independently from their "real math" abilities, and that failure or success in one area does not guarantee failure or success in the other. For example, Resnick & Resnick (1987) focus on the following reasons why traditional textbooks, teaching, and tests have been inconsistent with "real life" problem solving and decision making. School learning (i) emphasizes individual cognition, while learning in everyday contexts tends to be a cooperative enterprise; (ii) stresses "pure thought," while the outside world makes heavy use of tool-aided learning; (iii) emphasizes the manipulation of abstract symbols, while non-school reasoning is heavily involved with objects and events; (iv) tends to be generalized, while the learning required for on-the-job competency tends to be situation specific. They conclude that "... school work draws on only a limited aspect of intelligence, ignoring many of the intelligences needed for vocational success, especially in the more prestigious vocations" (p. 21).
Mathematical Problem Solving Involves Modeling Cycles
Our study of actual problem solving in mathematics suggests that the more traditional "get-one-answer-in-one-minute-or-less" tasks do not capture the problem solving that we wish to promote. To illustrate the types of modeling cycles which tend to occur when problem solvers engage with model-eliciting problems, consider a problem that was discussed in one of our teacher groups:




The solution that follows illustrates the kind of "modeling cycles" that one typical group of teachers went through while dealing with the problem. The First Modeling Cycle. The group's first idea was to try to calculate a single "average score" for each student. They spent ten minutes reading and recording the numerical data in a spreadsheet table. One member of the group read the data; another member typed; and, the third member monitored and checked for data entry errors. Then, one member showed the others how to calculate an average for each row of scores. The Second Modeling Cycle. After the preceding calculations were completed, the teachers began sorting students into groups based on the averages that were given. But disagreements arose about where some students should be assigned; in the course of these discussions, the group began to think more deeply about what the preceding scores meant in terms of whether a student was doing well or not doing well at a given grade level. For example, a score of 7.0 at the sixth grade means something quite different from a score of 7.0 at the eighth grade. As a result of these discussions, the original data were converted to "grade level equivalent" scores (as shown below):




Next, average scores were again calculated for each student, and the teachers again began to sort students into three groups, a high ability group, a middle ability group, and a low ability group. But disagreements arose again about where some students should be assigned. One of the teachers pointed out that "It doesn't make sense to use reading scores to sort students into math groups!" The Third Modeling Cycle. As a result of the preceding discussions, a new table was created in which the reading scores were deleted (as shown below). Then, because the spreadsheet was able to create graphs quickly and easily, the following graph was created.




The Fourth Modeling Cycle. Again, when the group tried to use the preceding information to sort students into groups, everyone began to doubt that scores from the primary grades should count as much as scores from later grades. So, gradually, the idea arose to look at TRENDS rather than AVERAGES ... as shown in the graph that follows.  
The Fifth Modeling Cycle. Finally, when the group began to use the information based on trends, they also began to make use of the qualitative information (i.e., the teachers' comments) which had been ignored in their earlier interpretations of the data. For example, they noticed that Hank's scores in math were probably influenced by his early lack of proficiency in speaking English. Therefore, when such factors began to be considered, the teachers concluded that the school should abandon its policy of sorting students into low, middle, and high ability groups. Instead, they recommended that: (i) three equivalent math groups should be created, (ii) students who appeared to need special attention should be distributed equally among the three sections, and (iii) all of the case histories (e.g., the trend information and teachers' comments) should be made available to the students' teachers.
Summary. The preceding solution illustrates the following characteristics which are common in "real life" situations in which mathematics is used: (i) coordinating and communicating efforts is often as important as the efforts themselves, (ii) analyzing plans (or processes, or results) is often as important as generating them, and (iii) justifying and explaining decisions is often as important as simply making decisions. Further, since powerful technology-based tools were available, more time was spent discussing the meaning of the procedures that were used; and, because of this process, one-rule/single-step solutions often emerged as being overly simplistic. Finally, because both "too much" and "not enough" information were available, and because some of the relevant information involved patterns of data rather than simply isolated pieces, an explicit model was needed to filter, organize, interpret, "fill holes," and "project beyond" the information that was available.
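The shift from averages (first cycle) to grade-relative scores (second cycle) to trends (fourth cycle) can be sketched in a few lines of Python. This is only an illustration, not the teachers' actual spreadsheet: the student labels and scores are hypothetical, the function names are invented here, and the grade-relative conversion assumes that a score is simply compared with the grade in which it was earned.

```python
from statistics import mean

# Hypothetical records (not the data from the problem): grade -> math score.
records = {
    "Student A": {3: 3.5, 4: 4.2, 5: 5.6, 6: 7.0},
    "Student B": {3: 4.4, 4: 5.0, 5: 5.3, 6: 5.5},
}


def average_score(scores):
    """First cycle: collapse a student's record into one average."""
    return mean(scores.values())


def relative_to_grade(scores):
    """Second cycle (assumed conversion): score minus the grade it was earned in."""
    return {grade: score - grade for grade, score in scores.items()}


def trend(scores):
    """Fourth cycle: least-squares slope of grade-relative score against grade."""
    relative = relative_to_grade(scores)
    grades = list(relative)
    values = list(relative.values())
    g_mean, v_mean = mean(grades), mean(values)
    numerator = sum((g - g_mean) * (v - v_mean) for g, v in zip(grades, values))
    denominator = sum((g - g_mean) ** 2 for g in grades)
    return numerator / denominator


for name, scores in records.items():
    print(f"{name}: average={average_score(scores):.2f}, trend={trend(scores):+.2f}")
```

With these made-up numbers the two students have nearly identical averages but trends of opposite sign, which is the kind of contrast that led the group to doubt a single-score ranking.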
If the solution process itself is examined, some other common characteristics of "real life" problems also become apparent. First, the overall solution process involved a series of "modeling cycles," each of which involved somewhat different interpretations of givens, goals, and available solution steps. Second, the earliest interpretations of the problem made a number of unjustified assumptions that were not given, and (at the same time) failed to recognize a number of significant pieces of information that were given. Third, the final solution involved a significant reformulation of the original conception of the problem itself; in addition it also went beyond providing a solution to a single isolated situation to provide a conceptual framework for interpreting a wide range of other structurally similar situations. Fourth, even though many levels and types of "correct" responses were possible, the problem solvers themselves knew when one model was better than another, and they were also able to identify in which directions a given model should be modified in order to improve. After discussing the characteristics of a variety of problem solving situations of this type, the teachers who participated in our projects agreed to adopt the following first-round assumptions about the "real life" mathematical activities that they were trying to develop.
Beyond the preceding characteristics, however, the most important attribute of the problems is that they should require students to invent (construct, extend, refine) a conceptual system that is both mathematically significant and practically important ... without this system being explicitly taught to them. In other words, the problem should be model-eliciting.
Principles for the Design of Model-Eliciting Problems
The remainder of our paper will focus on examples to illustrate the following six principles for developing model-eliciting activities aimed at helping students (simultaneously) develop and document "real life" problem solving abilities.
The Model Construction Principle
The goal is not simply for students to produce an "answer" to a question. Students should also construct mathematically significant systems that can be used to describe, explain, manipulate, or predict a wide range of "real life" experiences. Therefore, in order for a problem to be model-eliciting, one of the activity author's main goals is to confront students with the need for a model. If the need for a model is clear, then students tend to produce one. How can activities create the need for a model? Authors should ask themselves "What kinds of situations create the need for anyone (myself, other adults) to create models, whether they are working in mathematics, in science, in business, or in everyday life?" Answers to this question include the following.
From the point of view of instruction, the preceding kinds of activities lead to three positive outcomes. First, the models that students construct often involve mathematical ideas that are far more sophisticated than those associated with their prior failures in traditional textbooks, tests, and teaching. Second, they enable teachers to recognize students' baseline abilities and understandings (or misunderstandings), so that follow-up instruction can build on students' strengths and address their weaknesses. Third, the solutions that students produce explicitly reveal the diverse ways that students think about mathematically rich situations. Therefore, they help teachers recognize and reward a broader range of mathematically capable students. Consider the following running problem (for students), which is based on a recent newspaper article and is similar to the math placement problem (for teachers) that was described earlier. It involves making predictions that extrapolate beyond the given information; it also involves producing materials to explain the predictions that are made.








If an activity is truly model-eliciting, then authors should be able to give clear answers to the following questions about the systems that students are being challenged to construct. (i) What kind of mathematical "objects" do students need to consider? (Possible answers include ratios, trends, and coordinates.) (ii) What kind of relationships among objects do students need to consider? (Possible answers include equivalence relationships, ordering relationships, and invariance under transformations.) (iii) What kind of combinations or interactions among objects do students need to consider? (Possible answers include additive combinations and multiplicative interactions.) Therefore, if we are to write effective model-eliciting activities, the goal is to create the need for students to construct a description or an explanation that involves interesting mathematical objects, relations, and operations.
Of course, when an activity creates the need for a description or an explanation, there is nothing to guarantee that the system that students construct will be identical to the one the teacher had in mind. This is because choices are nearly always available about issues such as: (i) which representation system to use (e.g., graphs versus equations), (ii) which types of units to emphasize (e.g., trends involving ratios, or relationships involving trends), or (iii) which level of accuracy and precision is most appropriate. For example, in the case of the running problem, one recent sample of middle schoolers generated results which reflected the following ways to think about the situation.
Different Ways to Think about the Running Problem
Units of Analysis (Simple vs. Composite): What are the units people think about when working on this problem? Sometimes people use small simple units such as one year, one Olympics, individual running speeds, or individual running times. Other times they use larger, composite units, such as blocks of data, patterns, or trends.




Differences vs. Ratios: How do people think about change? Sometimes people think of change in terms of differences (absolute change). Other times they think more in terms of percentages or ratios (relative change). Change can be relative to time intervals (e.g., years), or change can be relative to running speeds or running times. Complex ratios can be relative to both time intervals and running speeds/times.
Numerical Patterns vs. Graphical Patterns: How do people think about data? Sometimes people think in terms of numerical data, such as numbers, sequences of numbers (e.g., lists or tables), or composites of numbers (e.g., sums or averages). Other times they think in terms of visual data, such as points, sequences of points (e.g., patterns or graphs), and composites of points (e.g., slopes).
Independent Data vs. Comparative Data: How do people think about more than one set of data? Sometimes people respond to men's and women's data separately, project into the future, then compare the two. Other times they consider differences between the two sets of data, then project the differences.
Little-Picture vs. Big-Picture: How do people think about making predictions based on past performances? Sometimes people think about the problem by projecting from little-picture information what a next one will be, then a next one, a next one, and so on. Other times they extrapolate from big-picture information what some future situation will be and then attempt to fill in the holes.




Linear vs. Non-Linear: How do people think about trends? Sometimes people think about the problem in a linear fashion. They see a constant rate of increase and project it in a steady-state fashion. Other times people think about the problem in non-linear ways. They think in terms of a limiting factor or a dynamic rate of change. This may be expressed as leveling off, maxing out, or peaking.
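Two of the contrasts above, differences versus ratios and linear versus non-linear projection, can be made concrete with a short Python sketch. The function names and the winning times are hypothetical, chosen only for illustration; they are not the data from the running problem, and repeating the last observed ratio is just one simple stand-in for a "leveling off" model.

```python
def project_by_difference(values, steps):
    """Linear view: repeat the most recent absolute change."""
    change = values[-1] - values[-2]
    return [values[-1] + change * k for k in range(1, steps + 1)]


def project_by_ratio(values, steps):
    """Non-linear view: repeat the most recent relative change."""
    ratio = values[-1] / values[-2]
    return [values[-1] * ratio ** k for k in range(1, steps + 1)]


# Hypothetical winning times in seconds for three successive Olympics.
times = [100.0, 96.0, 92.16]

linear = project_by_difference(times, 3)
leveling = project_by_ratio(times, 3)
for step, (a, b) in enumerate(zip(linear, leveling), start=1):
    print(f"{step} Olympics ahead: difference rule {a:.2f}, ratio rule {b:.2f}")
```

Projected far enough ahead, the difference rule eventually predicts negative running times, while under the ratio rule each improvement is smaller than the last. Noticing that kind of breakdown is one way students come to revise a linear model.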
The important point to notice about the preceding ways of thinking about the problem is that the various types of objects, relationships, and operations could be combined in many ways to produce acceptable responses. Also, just as in the case of the math placement problem for teachers, students' solution procedures for the running problem could involve using many different types of tools, and a given student could play many different roles during the solution process. Nonetheless, in each case, when students construct a structurally interesting model, they actually invent (or modify, or refine, or extend) an important mathematical system.
The Simple Prototype Principle
Situations that serve as good prototypes (or good structural metaphors) tend to be elegant. This is why the goal of the simple prototype (or parsimony) principle is to emphasize activities that are as simple as possible, while still serving as useful prototypes for thinking about significant mathematical systems. Yet, when teachers first begin to try to write model-eliciting activities of the type described in this paper, simplicity is not one of the terms that they are most likely to use to describe the kind of activities they want to emphasize. Instead, they use terms such as realistic, applications, complex, and difficult. There is some truth in each of the preceding terms. But they also tend to reflect the following common misconceptions:
One reason this last phenomenon occurs so frequently is that teachers, much more than their students, are often reluctant to put aside their "school math minds" and use their "real math minds." Yet many skills and beliefs that contribute to success in school are counterproductive in most "real life" situations. For example, in school, being caught thinking tends to mean that you were caught not knowing; taking more than three seconds to respond means you'll probably be passed over by the teacher; getting an answer that isn't a whole number is a clue that you're doing something wrong; and using more than a single rule means that you probably aren't doing it the "right" way. By contrast, in many "real life" situations, it is often the inappropriate responses that are associated with quickly generated answers and single-rule "canned" solution procedures.
The Model Documentation Principle
One way in which the running problem differed from the math placement problem was that for many groups of teachers who worked the latter, it was necessary to watch videotapes of their solution processes in order to discover how they had interpreted the situation. This is because the stated goal of the math placement problem focused mainly on making decisions (or on giving answers) rather than on descriptions or explanations of the decisions that were made. From the point of view of the author of the running problem, it was important to state the goal of the problem in such a way that the results students produce reveal as much as possible about: (i) how they are thinking about the situation, and (ii) information that otherwise would have become apparent only by watching a videotaped record of the solution process. Of course, not everything that is apparent on a videotape can be captured in students' final results.
For example, the final results that students produce seldom provide much information about: (i) the "modeling cycles" that students went through to arrive at these results, or (ii) alternative models that students developed and rejected, or (iii) the roles that different students played during the solution process. Nonetheless, if care is taken, it is usually possible to state the goals of model-eliciting problems so that they capture as much information as possible about how students think about the situation. Then other relevant information can usually be recorded by teachers using classroom observation forms, or using other data gathering tools and procedures. One straightforward implication of the model-documentation principle is that traditional kinds of mathematics questions are not acceptable if the results they ask students to produce consist of nothing more than simple numerical answers (e.g., 30 feet, 5 dollars) or simple decisions (e.g., yes-no, multiple choice). Even though it is possible for such problems to elicit a model, they provide little information about the nature of the model that is elicited. Effective model-eliciting activities should enable students to simultaneously learn and document what they are learning.
The Self-Evaluation Principle
If students are unable to make judgments about whether (or in which directions) current solutions need to improve, or about which of several alternative solutions is most useful, then it is unlikely that they will ever go beyond first-round solutions to problems. The direction of the task (as described by information about what is to be produced, when, why, for whom, and under which conditions it is to be produced) is especially important in the case of model-eliciting activities for groups of students, because outstanding solutions generally involve several modeling cycles. Judging the model produced at each cycle requires that the statement of the problem provide some criteria for its adequacy.
Effective model-eliciting activities should be stated in such a way that students themselves can assess the usefulness of the results that they produce. Therefore, when students ask "Are we done yet?", the appropriate response from teachers should be to refer students back to the statement of the problem, where guidelines should be available (implicitly or explicitly) about issues such as: useful to whom? useful for what purpose? useful under what conditions? To emphasize the preceding points, some teachers who participated in our research have written "letters from the editor" (see the example below) to address some of the student responses. Other teachers have chosen to use such letters to provide guidelines for whole-class discussions, where the goal is for various groups of students to compare alternative responses to the problem and to assess the strengths, weaknesses, and directions for improvement for each.
The Model Generalization Principle
The following two types of generalization are not the same. First, one can ask whether a given student is able to generalize a particular piece of mathematical knowledge; second, one can ask whether a student is able to construct a model which generalizes to a whole class of situations. In the first case, generalizability is attributed to the student; in the second case, generalizability is attributed to the model. In the first case, generalizability tends to be quite difficult to verify; but in the second case, it is usually a straightforward task to verify it. It is the second case to which the model generalization principle applies. To understand the intent of the Generalization Principle, consider the Don't Drink & Drive Problem.








Early versions of the Don't Drink & Drive Problem were similar to many problems in textbooks and tests. For example, students were asked to estimate the blood-alcohol index for a person whose weight or level of consumption fell between or beyond the data given in the table. But no information was given about who was asking the question, or for what purpose the question was being asked. Consequently, there was no way for students to make judgments about issues such as how accurate the answer should be. Also, if the question involved only a single instance (i.e., a single person, and a single set of circumstances), then it usually was possible for students to generate "acceptable sounding answers" without really doing much mathematics. On the other hand, when the task that was posed involved using a graph or a formula to describe (and generate results for) a whole class of situations, then the responses students generated naturally focused on significant mathematical ideas.
The Reality Principle
The goal here is to encourage students to develop mathematical models based on extensions of things that they already understand and can do. Therefore, if students' legitimate ideas are dismissed or not taken seriously (even though their points might be valid in real situations), then the result is that the students usually "tune out" even if the topic appeared to be one that would "turn them on." To illustrate the Reality Principle, consider a teacher's rewriting of the following problem, which was included in the NCTM's Curriculum and Evaluation Standards for School Mathematics (NCTM, 1989).




As the preceding analysis makes clear, a "real" problem is not simply one that refers to a real situation. The question that is asked should also make sense in terms of students' "real life" knowledge and experiences. The topics that work best tend to be those that fit the current local interests of specifically targeted groups of students; thus, teachers should "localize" problems to fit the current interests and experiences of specific groups of students. Not all students should have to demonstrate their competencies in the same situation. Clones, or structurally isomorphic problems, should be available that involve a variety of contexts. The goal is to provide all students with as many "low pressure, high interest" opportunities as possible to demonstrate their abilities and achievements within contexts that are familiar and comfortable. Therefore, teachers and students should work together to select whichever task fits their interests and experiences.
Summary
Even in the case of everyday problems that involve little more than addition or subtraction, the mathematics that is used is functioning as a model of the situation. But unfortunately, students in schools are rarely allowed to see situations in which the model that is being used is problematic, even though such situations abound in "real life." Thinking mathematically means, more than anything else, interpreting situations mathematically. Yet, most of the word problems in traditional textbooks and tests attempt to minimize (or eliminate) the interpretation phases of problem solving, often because of the misguided belief that thinking mathematically consists mainly of rule following, and because of the corollary that mathematical problems usually have only one correct answer.
Mathematical ability does not simply consist of the ability to flawlessly remember and execute intricate sequences of rules; and when doing mathematics includes generating descriptions, explanations, manipulations, and predictions, there is rarely only one correct answer. Choices about which answer is most useful generally depend on conditions (e.g., time constraints, resource constraints) and on purposes (e.g., risk constraints, generalizability constraints). Furthermore, when powerful conceptual and procedural tools are available, many more options are available concerning modes of responses and solution paths. When a broader range of mathematical ideas and tools is used to address a broader range of problem solving situations, a broader range of mathematical abilities emerges as resources for success, and a broader range of students emerges with higher potential.
References
Carraher, T., Carraher, D., & Schliemann, A. D. (1985). Mathematics in the streets and the schools. British Journal of Developmental Psychology, 3, 21-29.
Lave, J. (1988). Cognition in practice: Mind, mathematics and culture in everyday life. New York: Cambridge University Press.
Lesh, R., & Lamon, S. (1992). Assessments of authentic performance in elementary mathematics. Washington, DC: American Association for the Advancement of Science.
Mathematical Sciences Education Board. (1990). Reshaping School Mathematics: A Philosophy and Framework for Curriculum. National Research Council. Washington, DC: National Academy Press.
Mislevy, R. J. (1992). Foundations of a new test theory. In N. Frederiksen, R. J. Mislevy, & I. Bejar (Eds.), Test theory for a new generation of tests. Hillsdale, NJ: Lawrence Erlbaum Associates.
National Council of Teachers of Mathematics. (1989). Curriculum and Evaluation Standards for School Mathematics, K-12. Reston, VA: Author.
Resnick, D. P., & Resnick, L. B. (1985). Standards, curriculum, and performance: A historical and comparative perspective. Educational Researcher, 14(4), 5-21.
Resnick, L., & Tucker, M. (1991). The standards project: Creating a national examination system that prepares students for the challenges of the 21st century. (Unpublished overview). Pittsburgh, PA: Learning Research Development Center, University of Pittsburgh.
Romberg, T. A., Wilson, L., & Khaketla, M. (1991). The alignment of six standardized tests with the NCTM standards. Madison, WI: University of Wisconsin.
Romberg, T. A., Zarinnia, E. A., & Collis, K. F. (1989). A new world view of assessment of mathematics. In G. Kulm (Ed.), Assessing higher order thinking in mathematics (pp. 21-38). Washington, DC: American Association for the Advancement of Science.
Saxe, G. B. (1990). Culture and Cognitive Development: Studies in Mathematical Understanding. Hillsdale, NJ: Lawrence Erlbaum Associates.
Steen, L. A. (Ed.). (1990). On the Shoulders of Giants: New Approaches to Numeracy. National Research Council. Washington, DC: National Academy Press.
Biographical Notes
Richard Lesh is a Principal Research Scientist at Educational Testing Service, where he is currently serving as the Director of ETS's Research Program on Technology and Assessment. For more than fifteen years he was a Professor of Mathematics and Psychology at Northwestern University. Dr. Lesh's research has included studying children's abilities to use elementary mathematics in everyday problem-solving situations, where realistic tools are available and responses often require 30-60 minutes to construct. Current NSF-funded projects include three in which "real life problem-solving activities" are being used to focus on issues involving equity, teacher development, and the influence of technology-based tools in on-the-job problem-solving situations. He has contributed much to software development, and has authored books for children, teachers, teacher educators, curriculum developers, and researchers. He is coauthor of Assessments of Authentic Performance in Elementary Mathematics, published by the American Association for the Advancement of Science, which is particularly relevant to the current national curriculum reform effort.
Mark Hoover is an Associate Research Scientist who divides his time between computer science test development for the GRE and AP programs and research in mathematics and computer science education. He has an M.A. in mathematics and an M.A. in anthropology from the University of North Carolina at Chapel Hill and a Ph.D. in computer science from the University of New Mexico at Albuquerque. Dr. Hoover's early research was in educational anthropology, theoretical computer science, and computational graph theory. He is currently conducting research on personality types in mathematics education and instruction/assessment in mathematics education.
Anthony E. Kelly serves as Assistant Professor at the Graduate School of Education of Rutgers University. He received his Ph.D. from Stanford University in psychological studies in education. Dr. Kelly's research interests include student modeling techniques for intelligent tutoring systems as they relate to mathematics and science, as well as authentic assessment.