

A Diagnostic Analysis of a
William M. Bart and Thomas Post

It is generally recognized that there are serious problems with the achievement of American youth in mathematics. The Second International Mathematics Study (McKnight et al., 1987) provided results to support that conclusion. In that study, eighth graders from 20 countries were administered a variety of mathematics tests. The Japanese children produced the highest mean scores for the tests. The American children, on the other hand, produced average scores for the same tests that ranged from eighth to eighteenth place in the rankings. In the second International Assessment of Educational Progress (Lapointe, Mead, & Askew, 1992), American 13-year-olds tied for 13th place among large samples of 13-year-old subjects from 15 countries. New efforts are clearly needed to correct this widespread academic deficit.
One such effort lies in improving the diagnosis of error in mathematical problem solving in terms of faulty cognitive processes. Such diagnosis is termed cognitive diagnosis and results from answering questions such as the following: (a) why do students make the mistakes they do when given a mathematical problem? and (b) what defective rules do students follow when incorrectly solving a mathematical problem? (e.g., Brown & Burton, 1978). These questions need to be addressed and answered if we are to mount more effective mathematics education programs. Cognitive diagnosis is necessary for the prescription of effective instruction in mathematics (Glaser, 1981; Linn, 1986; Bart & Orton, 1991).
One issue that deserves consideration is the nature of a diagnostic test. But a test is only as diagnostic as its items, so the issue can be restated as follows: what is the nature of a diagnostic test item? In other words, what are the properties of a test item that would permit one to make diagnostic inferences as to why students make the errors they make? It is the purpose of this paper to investigate the properties of a diagnostic item.
Although interview data and complete written answers by students can provide detailed information about student thought processes, such techniques are often costly, time-consuming, and impractical. Educators need to draw valid conclusions about student understandings more quickly while retaining the qualitative cognitive information often associated with the interview. Although both free response and multiple choice items can be assessed as to their diagnostic properties, items that are multiple choice in format will be highlighted in this paper. These items are easily scored and therefore are economical to use. The thesis of this paper is that, if properly constructed, these items can readily provide accurate information related to children's cognitive understandings and misunderstandings.

The Semi-Dense Item
Various terms need to be defined in this paper for the benefit of the reader. One such term is that of the semi-dense item. An informal definition of a semi-dense item is the following: an item is semi-dense if one can exactly interpret the errors students make when they respond to the item. Such an item is viewed as "inference-rich" in that one can infer cognitive rules from the responses to the item. The term "cognitive rule" has a specific meaning within this context. A cognitive rule is any sequence of cognitive operations that produces an item response from the processing of an item stem. Cognitive rules include problem solving strategies, decision making strategies, and algorithms. A viable cognitive rule underlies the generation of a correct response to an item. However, there is always a possibility that some other unanticipated cognitive rule was used by a subject to generate a given item response. Thus there is never a guarantee that the cognitive rule inferred from an item response was without doubt the cognitive rule used by a subject in generating that response to the item. For this reason, cognitive rules inferred from item responses are viewed as interpretations of the responses. From a cognitive diagnostic perspective, a semi-dense item is an ideal test item. Unfortunately, there may be very few if any semi-dense items in certain fields. What should then be sought and developed are items that have as many of the properties of a semi-dense item as possible.

Cognitive Microtheory  
In order to analyze an item in terms of the extent to which the item has semi-dense item properties, it is necessary to examine the relationship between the responses to the item and the cognitive rules used to interpret the responses to the item. Such cognitive rules associated with an item constitute a cognitive microtheory for the item. An item that has no cognitive microtheory has no diagnostic value, because none of the responses to the item can be interpreted cognitively. An item that has a cognitive microtheory has at least one of its responses interpreted cognitively. The properties of a semi-dense item describe relationships between the set of responses to an item and the cognitive rules indicated by the cognitive microtheory for the item.
The Properties of the Semi-Dense Item
To explicate the properties of a semi-dense item, the framework for diagnostic testing formulated by Bart and his advisees (e.g., Bart, 1984; Bart, 1985; Bart & Williams-Morris, 1986; Bart & Williams-Morris, 1990) will be used. The five properties of a semi-dense item are the following: (a) response interpretability, (b) response discrimination, (c) rule discrimination I, (d) exhaustive rule set usage, and (e) semi-density.
Response Interpretability. The first property of a semi-dense item is response interpretability. An item has response interpretability if each response to the item is interpretable by at least one cognitive rule. In other words, an item has response interpretability if the cognitive microtheory associated with the item can account for each response to the item with at least one cognitive rule. An item lacks response interpretability to the extent to which there are responses to the item that cannot be cognitively interpreted.
Response Discrimination. The second property of a semi-dense item is response discrimination. An item has response discrimination if each response to the item is interpretable by only one cognitive rule. In other words, an item has response discrimination if there exists a functional relationship between the responses to the item and the cognitive rules indicated by the cognitive microtheory associated with the item. The item lacks response discrimination to the extent to which there are responses to the item that either are not cognitively interpretable or are interpreted by more than one cognitive rule. This property relates to whether the responses to the item discriminate among the cognitive rules interpreting the responses. If an item has some responses that are interpreted by several cognitive rules, then that item has responses that do not discriminate well among cognitive rules. In such a case, one cannot infer which of several cognitive rules accounts for some response to the item.
Rule Discrimination I. The third property of a semi-dense item is rule discrimination I. An item has rule discrimination I if (a) the item has response discrimination and (b) each cognitive rule that interprets a response to the item interprets only one response to the item. Thus the property of rule discrimination I has two defining conditions which together constitute a sufficient condition for that property. In other words, an item has rule discrimination I if there exists a one-to-one function between the responses to the item and the cognitive rules indicated by the cognitive microtheory associated with the item. The second defining condition of rule discrimination I relates to the extent to which a cognitive rule is fine-grained. If a cognitive rule indicated by the cognitive microtheory for an item interprets more than one response to the item, then that cognitive rule is too inclusive and grossly defined, because it cannot discriminate among different responses. Cognitive rules should be precisely defined so that each of them interprets only one response to an item. An item with response discrimination lacks rule discrimination I to the extent to which the associated cognitive rules are grossly defined.
Cognitive rules can discriminate among item responses, and that condition refers to rule discrimination I. But cognitive rules can also discriminate among instructional sequences, and that condition refers to rule discrimination II. Thus there are two forms of rule discrimination, distinguished from each other by the suffixes "I" and "II". Rule discrimination II is a higher order property of a test which relates to its prescriptive value. To discuss rule discrimination II in depth transcends the scope of this paper, which is limited to an examination of the diagnostic value of an item. Suffice it to say that if an item has rule discrimination II, then one could, for any given unsuccessful response to the item, prescribe the instructional experience that would correct the defective cognitive rule that interprets that response. This commentary is provided merely to explain why the suffix "I" is used in relation to the form of rule discrimination that relates to diagnostic testing.
Exhaustive Rule Set Usage. The fourth property of the semi-dense item is exhaustive rule set usage. An item has exhaustive rule set usage if (a) the item has response discrimination and (b) every cognitive rule in the cognitive microtheory associated with the item interprets at least one response to the item. The property of exhaustive rule set usage thus has two defining conditions which together constitute a sufficient condition for that property.
Both the property of exhaustive rule set usage and the property of rule discrimination I have two defining conditions. They share one defining condition, namely, response discrimination. However, the two properties are independent, because their other defining conditions are different. Thus an item could have both properties, or just one of the properties, or neither property. The property of exhaustive rule set usage relates to the extent to which the cognitive rules indicated by the cognitive microtheory for the item interpret responses to the item. If a cognitive rule indicated by the cognitive microtheory for an item does not interpret even one response to the item, then the cognitive microtheory is not economical. Such a cognitive microtheory is too expansive and too broadly formulated and would need to be more narrowly conceptualized, so that no extraneous cognitive rules are derivable from the cognitive microtheory for the item. The cognitive microtheory for an item should provide as good an explanatory fit for the item and its responses as possible.
Semi-Density. The fifth property of the semi-dense item is semi-density. An item has semi-density if the item has (a) rule discrimination I and (b) exhaustive rule set usage. The property of semi-density thus has two defining conditions which together constitute a sufficient condition for that property. Whereas the other four properties of a semi-dense item are necessary conditions of a semi-dense item, the property of semi-density is a necessary and sufficient condition of a semi-dense item; i.e., an item is semi-dense if and only if the item has semi-density. For an item with semi-density, there is exactly one cognitive rule interpreting each response to the item, and each cognitive rule indicated by the cognitive microtheory for the item interprets exactly one response. An item with semi-density thus has a properly fitting cognitive microtheory with fine-grained cognitive rules. Such an item would be an ideal diagnostic test item.
The Order Among the Properties. The five properties of a semi-dense item are not mutually exclusive. From an examination of the properties, one may infer that they can be ordered by the relation "is a precondition to". Figure 1 depicts the order among the five properties.

Figure 1. The order among the properties of a semi-dense item.


Figure 1 indicates the following relationships: (a) a precondition for semi-density is rule discrimination I, (b) another precondition for semi-density is exhaustive rule set usage, (c) a precondition for rule discrimination I is response discrimination, (d) a precondition for exhaustive rule set usage is response discrimination, and (e) a precondition for response discrimination is response interpretability. A test item acquires more semi-dense item properties as the cognitive microtheory for the item is refined so that its derived cognitive rules are better able to interpret the responses to the item in an unambiguous, unconfounded, and economical manner.
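The five definitions above can be restated as set-based checks. The following is only an illustrative sketch, not a procedure taken from the framework itself: the function name `semi_dense_properties`, the labels `r1`, `t1`, and so on, and the encoding of a cognitive microtheory as a set of (rule, response) pairs are all assumptions made for the example.

```python
def semi_dense_properties(responses, rules, interprets):
    """Check the five properties for one item (illustrative encoding).

    responses  : set of response labels (the responses to the item)
    rules      : set of cognitive-rule labels in the microtheory
    interprets : set of (rule, response) pairs, read "rule interprets response"
    """
    # Rules interpreting each response, and responses interpreted by each rule.
    rules_for = {r: {t for (t, r2) in interprets if r2 == r} for r in responses}
    responses_for = {t: {r for (t2, r) in interprets if t2 == t} for t in rules}

    # Each response is interpreted by at least one rule.
    interpretability = all(len(ts) >= 1 for ts in rules_for.values())
    # Each response is interpreted by exactly one rule.
    discrimination = all(len(ts) == 1 for ts in rules_for.values())
    # Response discrimination, and every rule that interprets a response
    # interprets only one response (unused rules are ignored here).
    rule_discrimination_1 = discrimination and all(
        len(rs) == 1 for rs in responses_for.values() if rs
    )
    # Response discrimination, and every rule interprets at least one response.
    exhaustive_usage = discrimination and all(
        len(rs) >= 1 for rs in responses_for.values()
    )
    semi_density = rule_discrimination_1 and exhaustive_usage

    return {
        "response interpretability": interpretability,
        "response discrimination": discrimination,
        "rule discrimination I": rule_discrimination_1,
        "exhaustive rule set usage": exhaustive_usage,
        "semi-density": semi_density,
    }


# One rule per response and one response per rule: all five properties hold.
ideal = semi_dense_properties(
    {"r1", "r2"}, {"t1", "t2"}, {("t1", "r1"), ("t2", "r2")}
)

# An extra rule t3 that interprets nothing: exhaustive rule set usage fails.
extra_rule = semi_dense_properties(
    {"r1", "r2"}, {"t1", "t2", "t3"}, {("t1", "r1"), ("t2", "r2")}
)

# A single coarse rule covering both responses: rule discrimination I fails.
coarse_rule = semi_dense_properties(
    {"r1", "r2"}, {"t1"}, {("t1", "r1"), ("t1", "r2")}
)
```

The last two cases illustrate the independence of rule discrimination I and exhaustive rule set usage: each can hold while the other fails, and semi-density requires both.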
The Refined Item Digraph


To assist in the analysis of the item, a refined item digraph of the item is constructed. Some explanatory commentary is necessary, because a refined item digraph is a relatively new technical term. A digraph is an array of points interconnected by arrows (Harary, Norman, & Cartwright, 1965). An item digraph is any digraph depicting the inferences (represented as arrows) interrelating the item stem (e.g., "7 + 7 = _"), the responses to the item (e.g., "14"), the cognitive rules interpreting the item responses, and the instructional sequences interrelating the cognitive rules, whether defective or not, to the correct rule (represented as points). A refined item digraph is any item digraph that has no inferences depicted between item responses, between cognitive rules in an item cognitive microtheory, or between instructional sequences interrelating cognitive rules. Refined item digraphs can be used to determine how diagnostic and prescriptive an item is. In the context of this paper, instructional prescription is not considered, because there are no instructional sequences associated with the item under examination. In this case, the following notation is used: (a) the item stem is termed "s" and constitutes the set S, (b) the item responses are termed "r_{1},...,r_{n}" and constitute the set R, and (c) the cognitive rules are termed "t_{1},...,t_{v}" and constitute the set T. A refined item digraph is thus an item digraph that has only between-set inferences and no within-set inferences.
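The between-set condition can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: the function name `is_refined`, the node labels, and the direction chosen for the arrows (stem to response, response to rule) are hypothetical choices made for the example and are not specified in the text.

```python
def is_refined(stem, responses, rules, arrows):
    """Return True if no arrow connects two points of the same set.

    stem, responses, rules : the sets S, R, and T of point labels
    arrows                 : set of (tail, head) pairs of point labels
    """
    # Record which of the three sets each point belongs to.
    set_of = {}
    for label, members in (("S", stem), ("R", responses), ("T", rules)):
        for node in members:
            set_of[node] = label
    # A refined item digraph has only between-set arrows.
    return all(set_of[tail] != set_of[head] for tail, head in arrows)


S = {"s"}
R = {"r1", "r2"}
T = {"t1", "t2", "t3"}

# Arrows run from the stem to each response and from each response to a rule.
between_set_only = {
    ("s", "r1"), ("s", "r2"),
    ("r1", "t1"), ("r1", "t2"), ("r2", "t3"),
}
refined = is_refined(S, R, T, between_set_only)

# Adding an arrow between two rules makes the digraph unrefined.
not_refined = is_refined(S, R, T, between_set_only | {("t1", "t2")})
```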

An Example


To exemplify how an item can be analyzed in terms of its diagnostic value, an item and a set of solution strategies observed by the Rational Number Project (Post, Behr, & Lesh, 1989) were selected. The specific item is the following:


Let us assume that there were two responses to this item, namely, 24 cents and 12 cents.
A Cognitive Microtheory


The cognitive microtheory related to this type of item includes seven types of solution strategies identified earlier by the Rational Number Project (Heller et al., 1985): (a) unit rate, (b) factor of change, (c) cross multiplication algorithm, (d) equivalent fractions, (e) equivalence class, (f) pair generation, and (g) additive. Each strategy was also encountered by Bezuk (1986). Each will be described in terms of its application to the item, and the response resulting from that strategy will be indicated.
Unit rate. This strategy involves (a) computing the cost of one piece of gum and then (b) multiplying this unit price by the number of pieces bought to generate the desired answer. This answer is the cost of the pieces to Kathy. For this item, each piece costs three cents, so eight pieces will cost 24 cents, because 8 pieces x 3 cents/piece = 24 cents.
Factor of change. This strategy involves (a) comparing the number of pieces of gum that each person bought, (b) determining the factor of change, which indicates how many times as much gum one person bought as the other person, and (c) multiplying the factor by the price the first person paid. For this item, Kathy bought four times as much gum as Ann, so Kathy should pay four times as much, or 24 cents (i.e., 4 x 6 cents).
Cross multiplication algorithm. This strategy involves the formulation of a proportion determined by the equality of two ratios of pieces-to-cents. The cross products are computed and the resulting equation is solved for the unknown. In other words, if d/e = f/x, then x = ef/d, or if d/e = x/f, then x = df/e. For this item, the proportion can be expressed as 2 pieces/6 cents = 8 pieces/x cents. Thus x = 8 pieces x 6 cents/2 pieces = 24 cents.
Equivalent fractions. This strategy involves determining a proportion given a simple ratio of pieces-to-cents and one term of a second, equivalent ratio. The ratios are treated as equivalent fractions. The task is to find a fraction equivalent to the given one. The ratio with values given is multiplied by a particular fraction of the form n/n, equal to one, so that the product ratio has a term equal to the desired answer. For this item, the equivalence is 2/6 = 8/x. A ratio (fraction) is then sought that is equivalent to 2 pieces/6 cents and that has a numerator of eight pieces. The fraction (ratio) 2 pieces/6 cents is thus multiplied by 4/4 to produce 8 pieces/24 cents. The desired answer is then 24 cents, since the numerator is eight pieces.
Equivalence class. This strategy involves expressing the given rate pair as a fraction. The student then generates a class of equivalent fractions until the desired rate pair is identified. For this item, the given rate pair is 2 pieces/6 cents. The class of equivalent fractions is 2/6 = 4/12 = 6/18 = 8/24. The desired rate pair is then 8 pieces/24 cents, as it corresponds to the fraction 8/24. The answer is thus 24 cents.
Pair generation. This strategy involves (a) beginning with the rate pair given in the problem and (b) generating equivalent pairs until the desired pair is generated. One way to generate equivalent pairs is by doubling the previous rate pair. For this item, the rate pair given is two pieces for six cents. From that rate pair come (a) four pieces for twelve cents and (b) eight pieces for 24 cents. Thus the response is 24 cents, because Kathy bought eight pieces of gum. This strategy could be viewed as a special case of the equivalence class strategy in that only a subclass of equivalent fractions resulting from, say, the doubling procedure is generated.
Additive. This inadequate strategy involves determining how much needs to be added to one value in the given rate pair to produce the other value in the same rate pair. A comparable additive relationship is then posited between the values in the other rate pair, in which one value is known and the other value unknown. The unknown value is then computed. For this item, the value of 4 needs to be added to the value of 2 (corresponding to 2 pieces) to produce the value of 6 (corresponding to 6 cents) in the case of Ann; therefore, the value of 4 needs to be added to the value of 8 (corresponding to 8 pieces) to produce the value of 12 (corresponding to 12 cents) in the case of Kathy. Thus Kathy paid 12 cents.
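The arithmetic of the seven strategies can be traced in a short sketch. The values used (2 pieces for 6 cents in the given rate pair, and 8 pieces in the second) are taken from the strategy descriptions above; the function names and the way each strategy is encoded are assumptions made for illustration and are not a model of how students actually reason.

```python
from fractions import Fraction


def unit_rate(pieces, cents, target):
    # Cost of one piece, multiplied by the number of pieces bought.
    return Fraction(cents, pieces) * target


def factor_of_change(pieces, cents, target):
    # How many times as much gum was bought, multiplied by the known price.
    return Fraction(target, pieces) * cents


def cross_multiplication(pieces, cents, target):
    # pieces/cents = target/x, so x = target * cents / pieces.
    return Fraction(target * cents, pieces)


def equivalent_fractions(pieces, cents, target):
    # Multiply pieces/cents by n/n so that the numerator becomes target.
    n = Fraction(target, pieces)
    return cents * n


def equivalence_class(pieces, cents, target):
    # Generate k*pieces / k*cents for k = 1, 2, 3, ... until target is reached.
    k = 1
    while k * pieces < target:
        k += 1
    return k * cents if k * pieces == target else None


def pair_generation(pieces, cents, target):
    # Double the rate pair until the desired number of pieces is reached.
    while pieces < target:
        pieces, cents = 2 * pieces, 2 * cents
    return cents if pieces == target else None


def additive(pieces, cents, target):
    # Defective rule: keep the difference constant instead of the ratio.
    return target + (cents - pieces)


STRATEGIES = [
    unit_rate, factor_of_change, cross_multiplication, equivalent_fractions,
    equivalence_class, pair_generation, additive,
]

# Response produced by each strategy for 2 pieces at 6 cents and 8 pieces.
answers = {strategy.__name__: strategy(2, 6, 8) for strategy in STRATEGIES}
```

In this sketch the first six strategies all yield 24 (cents) and the additive strategy yields 12 (cents), so a bare numerical response of 24 cents cannot indicate which of the six viable strategies produced it.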

A Refined Item Digraph  
With respect to the proportional reasoning item under consideration, there are two responses and seven cognitive rules or strategies. Figure 2 depicts the refined item digraph for the item.  
Figure 2. The refined item digraph of a proportional reasoning item.


Using the refined item digraph represented in Figure 2, one can identify the semi-dense properties of the proportional reasoning item.
First of all, each response is interpreted by at least one cognitive rule. Response r_{1} is interpreted by six cognitive rules and response r_{2} is interpreted by one cognitive rule. Thus this item has response interpretability, which is a property very few test items have. Secondly, one response (r_{1}) is interpreted by more than one rule. Thus the item lacks response discrimination. One consequence is that this item cannot discriminate among the six cognitive rules posited for that response. To make such a discrimination, either subject protocols (interview or written) would need to be collected or other items would be needed that discriminate among the six rules. Because response discrimination is a precondition for rule discrimination I, exhaustive rule set usage, and semi-density, a further consequence is that the item lacks three additional properties of the semi-dense item, namely, rule discrimination I, exhaustive rule set usage, and semi-density.
In general, this carefully formulated item assessing one aspect of proportional reasoning has only very modest diagnostic value, because the item lacks four out of five semi-dense item properties. This item would need to be modified so that it could attain more semi-dense item properties. One possible revision which would improve the diagnostic value of this item involves providing each rule-based (i.e., strategy-based) rationale with its numerical response. This procedure was first used by Ruud (1976) in a science context. The following array of responses could be used in a revision of the gum buying problem:



Such a modification of the set of responses would permit the item to have the additional diagnostic item properties of response discrimination, rule discrimination I, exhaustive rule set usage, and semi-density. The strategies cited in the parentheses would not actually be used in the item. These strategies are cited solely to help the reader understand the response rationales. To provide such a set of responses requires a fairly well-developed cognitive microtheory for that item type.
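The effect of the revision can be illustrated by comparing the two response sets against the property definitions. The sketch below is hypothetical throughout: the function name `properties`, the strategy labels, and in particular the revised responses, which are encoded simply as (amount, rationale) pairs with one pair per strategy, are assumptions standing in for the actual response array.

```python
def properties(responses, rules, interprets):
    """Return which semi-dense item properties hold (illustrative encoding)."""
    rules_for = {r: {t for (t, r2) in interprets if r2 == r} for r in responses}
    responses_for = {t: {r for (t2, r) in interprets if t2 == t} for t in rules}
    interpretability = all(len(ts) >= 1 for ts in rules_for.values())
    discrimination = all(len(ts) == 1 for ts in rules_for.values())
    rule_disc_1 = discrimination and all(
        len(rs) == 1 for rs in responses_for.values() if rs
    )
    exhaustive = discrimination and all(
        len(rs) >= 1 for rs in responses_for.values()
    )
    return {
        "response interpretability": interpretability,
        "response discrimination": discrimination,
        "rule discrimination I": rule_disc_1,
        "exhaustive rule set usage": exhaustive,
        "semi-density": rule_disc_1 and exhaustive,
    }


RULES = {
    "unit rate", "factor of change", "cross multiplication",
    "equivalent fractions", "equivalence class", "pair generation", "additive",
}

# Original item: two bare numerical responses.
original_responses = {"24 cents", "12 cents"}
original_interprets = {
    (rule, "12 cents" if rule == "additive" else "24 cents") for rule in RULES
}
original = properties(original_responses, RULES, original_interprets)

# Hypothetical revision: each response pairs an amount with one rationale.
revised_responses = {
    ("12 cents" if rule == "additive" else "24 cents", rule) for rule in RULES
}
revised_interprets = {(rule, response) for response in revised_responses
                      for rule in RULES if response[1] == rule}
revised = properties(revised_responses, RULES, revised_interprets)
```

Under these assumptions the original item has only response interpretability, whereas the revised item has all five properties, because each response-with-rationale is interpreted by exactly one rule and every rule interprets exactly one response.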
Conclusion


The diagnostic analysis of items using refined item digraphs should be useful in the determination of the diagnostic values of test items. Because of the paucity of cognitive microtheories associated with items, most test items will be found to have little if any diagnostic value, because they have few if any semi-dense item properties. This methodology will, it is hoped, stimulate more research into the cognitive rules accounting for the correct and incorrect responses to test items. Presently, research into the cognitive foundations of mathematics test items is progressing (e.g., Ginsburg, Lopez, Mukhopadhyay, Yamamoto, Willis, & Kelly, 1992; Lamon & Lesh, 1992; Mislevy, Yamamoto, & Anacker, 1992; Siegler, 1976). As a result, diagnostic testing in mathematics education is becoming more feasible.
However, certain provisos need to be made clear for this type of inquiry. First of all, a cognitive microtheory which consists of the cognitive rules underlying a test item or a set of test items should result from empirical research. If the rules cited in the cognitive microtheory are spurious, then the diagnostic inferences drawn from item responses will be invalid. If the rules cited in the cognitive microtheory are valid in terms of being grounded in rigorous research probing the cognitive rules used by subjects when solving selected items, then the diagnostic inferences drawn from responses to similar items will be valid. The validity of the diagnostic inferences drawn from the item responses is highly dependent on the validity of the cognitive microtheory underlying the items to be used.
Another proviso concerns the determination of the rule that best characterizes the rule used by a subject when solving a set of problems. Such determination should always be based on the responses to several items and not merely one or two items. One procedure is a scaling technique discussed by Bart & Orton (1991). One is essentially attempting to identify the rule among a set of rules that best interprets the set of responses of a subject to a set of items. This type of psychometric inquiry is in the early stages of development.
A third proviso concerns the utility of the semi-dense item concept in the evaluation of mathematics achievement. It is premature to make any pronouncements as to such utility, because we are at the beginning stages of the development of the semi-dense concept. More theoretical and empirical inquiry will be needed in the investigation of its utility in a range of educational settings before any firm set of guidelines can be provided for the productive application of the semi-dense concept and its associated methodology in the evaluation of mathematics achievement.
Continued progress in cognitive diagnostic testing in mathematics education would be an important measure in overall plans to correct the mediocre status of typical student achievement in mathematics.

REFERENCES


Bart, W.M. (1984). The dense item: A bridge between learning and instruction. Proceedings of the 26th Conference of the Military Testing Association. Munich, Federal Republic of Germany, 391-396.
Bart, W.M. (1985). How qualitatively informative are test items?: A dense item analysis. Proceedings of the 27th Annual Conference of the Military Testing Association. San Diego, CA, 707-712.
Bart, W.M., & Orton, R.E. (1991). The cognitive effects of a mathematics inservice workshop on elementary school teachers. Instructional Science, 20, 267-288.
Bart, W.M., & Williams-Morris, R. (1986). A refined item digraph analysis of a cognitive ability test. Proceedings of the 28th Annual Conference of the Military Testing Association. Mystic, CT, 396-400.
Bart, W.M., & Williams-Morris, R. (1990). A refined item digraph analysis of a proportional reasoning test. Applied Measurement in Education, 3, 143-165.
Bezuk, N. (1986). Task variables affecting seventh grade students' performance and solution strategies on proportional reasoning word problems. Unpublished doctoral dissertation, University of Minnesota.
Brown, J., & Burton, R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science, 2, 155-192.
Ginsburg, H.P., Lopez, L.S., Mukhopadhyay, S., Yamamoto, K., Willis, M., & Kelly, M. (1992). Assessing understanding of arithmetic. In R. Lesh (Ed.), Assessment of authentic performance in school mathematics (pp. 265-289). Washington, DC: American Association for the Advancement of Science.
Glaser, R. (1981). The future of testing: A research agenda for cognitive psychology and psychometrics. American Psychologist, 36, 923-936.
Harary, F., Norman, R., & Cartwright, D. (1965). Structural models: An introduction to the theory of directed graphs. New York: Wiley.
Heller, P., Post, T., & Behr, M. (1985). The effect of rate type, problem setting, and rational number achievement on seventh grade students' performance on qualitative and numerical proportional reasoning problems. Proceedings of the 7th General Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education. Columbus, OH.
Lamon, S.J., & Lesh, R. (1992). Interpreting responses to problems with several levels and types of correct answers. In R. Lesh (Ed.), Assessment of authentic performance in school mathematics (pp. 319-342). Washington, DC: American Association for the Advancement of Science.
Lapointe, A.E., Mead, N.A., & Askew, J.M. (1992). Learning mathematics. Princeton, NJ: Educational Testing Service.
Linn, R.L. (1986). Educational testing and assessment. American Psychologist, 41, 1153-1160.
McKnight, C.C., Crosswhite, S.J., Dossey, J.A., Kifer, E., Swafford, J.O., Travers, K.J., & Cooney, T.J. (1987). The underachieving curriculum: Assessing U.S. school mathematics from an international perspective. Champaign, IL: Stipes Publishing Company.
Mislevy, R.J., Yamamoto, K., & Anacker, S. (1992). Toward a test theory for assessing student understanding. In R. Lesh (Ed.), Assessment of authentic performance in school mathematics (pp. 293-318). Washington, DC: American Association for the Advancement of Science.
Post, T., Behr, M., & Lesh, R. (1989). Proportionality and the development of pre-algebra understandings. In The Algebra Curriculum K-12. The National Council of Teachers of Mathematics 1988 Yearbook. Reston, VA: NCTM.
Ruud, O. (1976). The construction of an instrument to measure proportional reasoning ability of junior high students. Ph.D. thesis, University of Minnesota.