1.1 Introduction
In many educational and psychological measurement situations there is an underlying variable of interest. This variable is often something that is intuitively understood, such as intelligence. People can be described as being bright or average, and the listener has some idea of what the speaker is conveying about the object of the discussion. Similarly, one can talk about scholastic ability and its attributes, such as getting good grades, learning new material easily, relating various sources of information, and using study time effectively. In academic areas, one can use descriptive terms such as reading ability and arithmetic ability. Each of these is what psychometricians refer to as an unobservable or latent trait. While such a variable is easily described and knowledgeable persons can list its attributes, it cannot be measured directly as can height or weight, since the variable is a concept rather than a physical dimension. A primary goal of educational and psychological measurement is the determination of how much of such a latent trait a person possesses. Since most of the research has dealt with variables such as scholastic, reading, mathematical, and arithmetic abilities, the generic term ability is used within item response theory to refer to such latent traits.
If one is going to measure how much of a latent trait a person has, it is necessary to have a scale of measurement, that is, a ruler having a given metric. For a number of technical reasons, defining the scale of measurement, the numbers on the scale, and the amount of the trait that the numbers represent is a very difficult task. For the purposes of the first six chapters, this problem shall be solved by simply defining an arbitrary underlying ability scale. It will be assumed that, whatever the ability, it can be measured on a scale having a midpoint of zero, a unit of measurement of one, and a range from negative infinity to positive infinity. Since there is a unit of measurement and an arbitrary zero point, such a scale is referred to as existing at an interval level of measurement. The underlying idea here is that, if one could physically ascertain the ability of a person, this ruler would be used to tell how much ability a given person has, and the ability of several persons could be compared. While the theoretical range of ability is from negative infinity to positive infinity, practical considerations usually limit the range of values to, say, -3 to +3. Consequently, the discussions in the text and the computer sessions will only deal with ability values within this range. However, you should be aware that values beyond this range are possible.
The usual approach taken to measure an ability is to develop a test consisting of a number of items (i.e., questions). Each of these items measures some facet of the particular ability of interest. From a purely technical point of view, such items should be free response items where the examinee can write any response that seems appropriate. The person scoring the test then must decide whether the response is correct or not. When the item response is determined to be correct, the examinee receives a score of one; an incorrect answer receives a score of zero. That is, the item is dichotomously scored. Under classical test theory, the examinee's raw test score would be the sum of the scores received on the items in the test. Under item response theory, the primary interest is in whether an examinee got each individual item correct or not rather than in the raw test score. This is because the basic concepts of item response theory rest upon the individual items of a test rather than upon some aggregate of the item responses such as a test score.
From a practical point of view, free response items are difficult to use in a test. In particular, they are difficult to score in a reliable manner. As a result, most tests used under item response theory consist of multiple-choice items. These are scored dichotomously, with the correct answer receiving a score of one and each of the distractors yielding a score of zero. Items scored dichotomously are often referred to as binary items.
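The dichotomous scoring of multiple-choice items described above can be illustrated with a short Python sketch. The answer key, the examinee's responses, and the function name score_items are hypothetical and chosen only for illustration; they do not come from any particular test. The sketch keeps the item-level scores, which are what item response theory works with, separate from the raw score used under classical test theory.

```python
# Minimal sketch of dichotomous scoring. The answer key and the responses
# are hypothetical values made up for illustration.

def score_items(responses, key):
    """Return 1 for each response that matches the key and 0 otherwise."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return [1 if r == k else 0 for r, k in zip(responses, key)]

answer_key = ["B", "D", "A", "C", "A"]  # hypothetical five-item test
responses = ["B", "D", "C", "C", "B"]   # one examinee's choices

# Item-level binary scores: the data of interest under item response theory.
item_scores = score_items(responses, answer_key)

# Sum of the item scores: the raw test score of classical test theory.
raw_score = sum(item_scores)

print("Item scores:", item_scores)
print("Raw score:", raw_score)
```

Two examinees can obtain the same raw score from different patterns of item scores, which is one reason the item-level vector is kept rather than only its sum.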
1.2 The Item Characteristic Curve
A reasonable assumption is that each examinee responding to a test item possesses some amount of the underlying ability. Thus, one can consider each examinee to have a numerical value, a score, that places the examinee somewhere on the ability scale. This ability score will be denoted by the Greek letter theta, θ. At each ability level there will be a certain probability that an examinee with that ability will give a correct answer to the item. This probability will be denoted by P(θ). In the case of a typical test item, this probability will be small for examinees of low ability and large for examinees of high ability. If one plotted P(θ) as a function of ability, the result would be a smooth S-shaped curve such as that shown in Fig. 1.1. The probability of correct response is near zero at the lowest levels of ability and increases until, at the highest levels of ability, the probability of correct response approaches unity. This S-shaped curve describes the relationship between the probability of correct response to an item and the ability scale. In item response theory, it is known as the item characteristic curve. Each item in a test will have its own item characteristic curve.
Fig. 1.1 A typical item characteristic curve
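The S-shaped relationship between ability and the probability of a correct response can be tabulated numerically. The sketch below assumes a logistic function for the curve; this section does not specify a functional form, so the formula, the function name icc, and the parameter values slope = 1 and location = 0 are illustrative assumptions only.

```python
import math

def icc(theta, slope=1.0, location=0.0):
    """Probability of a correct response at ability theta.

    Assumes a logistic form purely for illustration; the slope and
    location values are hypothetical defaults.
    """
    return 1.0 / (1.0 + math.exp(-slope * (theta - location)))

# Ability levels across the practical range of the scale, -3 to +3.
thetas = [-3, -2, -1, 0, 1, 2, 3]
probs = [icc(t) for t in thetas]

for t, p in zip(thetas, probs):
    print(f"theta = {t:+d}   P(theta) = {p:.3f}")
```

Under these assumed values the printed probabilities rise steadily from near zero at the low end of the scale toward one at the high end, which is the pattern described for a typical item.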
The item characteristic curve is the basic building block of item response theory, and all the other constructs of the theory depend upon this curve. Because of this, considerable attention will be devoted to this curve and its role within the theory. There are two technical properties of an item characteristic curve that are used to describe it. The first is the difficulty of the item. Under item response theory, the difficulty of an item describes where the item functions along the ability scale. For example, an easy item functions among the low-ability examinees while a hard item functions among the high-ability examinees; thus, item difficulty is a location index. The second technical property is the discrimination of an item, which describes how well an item can differentiate between examinees having abilities below the item location and those having abilities above the item location. This property essentially reflects the steepness of the item characteristic curve in its middle section. The steeper the curve, the better the item can discriminate. The flatter the curve, the less the item is able to discriminate, since the probability of correct response at low ability levels is nearly the same as it is at high ability levels. Using these two descriptors, one can describe the general form of the item characteristic curve. These descriptors are also used to discuss the technical properties of an item. It should be noted that these two properties say nothing about whether the item really measures some facet of the underlying ability or not; that is a question of validity. These two properties simply describe the form of the item characteristic curve.
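The roles of difficulty as a location index and discrimination as steepness can be explored with a small Python sketch. It assumes a logistic curve in which a difficulty value shifts the curve along the ability scale and a discrimination value controls its slope; the functional form, the function names, and the three items with their parameter values are hypothetical and serve only as illustration.

```python
import math

def prob_correct(theta, difficulty, discrimination):
    """Assumed logistic curve: difficulty locates it, discrimination steepens it."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def separation(difficulty, discrimination):
    """Gap in P(theta) between examinees one unit above and one unit
    below the item's location on the ability scale."""
    above = prob_correct(difficulty + 1.0, difficulty, discrimination)
    below = prob_correct(difficulty - 1.0, difficulty, discrimination)
    return above - below

# Hypothetical items: (difficulty, discrimination)
items = {
    "easy, steep": (-1.0, 2.0),
    "hard, steep": (1.0, 2.0),
    "medium, flat": (0.0, 0.5),
}

for name, (b, a) in items.items():
    p_mid = prob_correct(0.0, b, a)  # examinee at the middle of the scale
    print(f"{name:13s} P(theta=0) = {p_mid:.3f}   separation = {separation(b, a):.3f}")
```

In this sketch an examinee at the middle of the scale is more likely to answer the easy item correctly than the hard one, and the steep items separate examinees just below and just above their location far more than the flat item does.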