Robustness of BW Aberrance Indices Against Test Length

Many studies have shown that the detection rates of person fit indices for aberrant responses may be influenced by test length. The purpose of this study was to examine test length effects on the BW aberrance indices. Three conditions were designed in this study: test length (K, including 25, 50, 100, and 200 items), ability ratio (T/K, defined as the total person score divided by test length K), and error ratio (E/K, defined as the number of errors within ability level divided by test length). Four 100-person by varying-item data matrices (100x25, 100x50, 100x100, and 100x200) were randomly generated and permuted 500 times for each data matrix over 20 repetitions. Results showed that, after partialling out E/K and T/K, the effect of test length on the association between the two indices was very slight. In nonlinear regression analyses, E/K and T/K predicted more than 76 percent of the variance of the B index and almost 73 percent of the variance of the W index, whereas test length contributed very little to either. Furthermore, a very well fitting model from SEM analyses also showed that the effects of test length on the B and W indices were negligible. All these pieces of evidence indicate that the B and W indices are robust against test length.


Introduction
Aberrant responses can characterize responders by their response patterns. For example, some responders may have trouble in starting to take a test, i.e., they may appear slow, fumbling, or anxious in startup (Wright & Stone, 1979; Smith, 1982). Other responders may become careless in answering easy items, or may be lucky to get some hard items correct (Wright & Stone, 1979; Smith, 1982; Sato, 1975; D'Costa, 1993). Still other responders appear to be plodders who unexpectedly omit items at the end of a test (Wright & Stone, 1979; Smith, 1982). There is also the type that shows extreme creativity by reinterpreting the easiest items as too simple to be true (Hulin, Drasgow, & Parsons, 1983). The patterns of these aberrant responses, defined as unexpected response patterns compared to an ideal response model, can provide diagnostic information for individuals.
Several indices, called aberrance indices or person fit indices, have been developed for detecting aberrant response patterns (e.g., D'Costa, 1993; Drasgow, Levine, & Williams, 1985; Sato, 1975; Harnisch & Linn, 1981; Smith, 1991; Linacre & Wright, 1994; Tatsuoka & Tatsuoka, 1982; Tatsuoka & Linn, 1983). Two new indices named the BW aberrance indices (Huang, 2006, 2008, 2011), modified from the Sato Caution Index (SCI; Sato, 1975) and derived from the beyond-ability surprise and within-ability concern indices (D'Costa, 1993), were designed to detect aberrant response patterns beyond or within one's ability level. The main idea of the B and W indices, expressed in equations (1) and (2), is that the discrepancy between a person's ability and the difficulty of an aberrantly answered item reflects the level of aberrance. Note that u_ij represents the responses, with 1 for a correct answer and 0 for a wrong one. The q's are the levels of item difficulty, ordered from easy to hard and bounded within the interval [0, 1], and q*_iT is the corrected ability level for a person with total score T. The bracketed expression [(K-1)/2], which involves test length K and represents the theoretical maximum value of the numerator, denotes the Gauss (floor) integer, that is, the largest integer not exceeding (K-1)/2. Ideally, a person is supposed to answer all within-ability items correctly and all beyond-ability items wrongly. Items on which a person should have succeeded or failed but did not are defined as aberrant. The discrepancies between a person's ability level and the difficulty levels of the aberrant items indicate how far the responses depart from the ideal response relationship. That is, the greater the discrepancies, the more aberrant the responses.
A response matrix of four persons by ten items, with ability in descending order from top to bottom and difficulty increasing from left (0.1) to right (1.0) in steps of 0.1, is used to illustrate the B and W indices. As can be seen, the first two persons leaned toward guessing and the latter two leaned toward carelessness. Furthermore, Person J succeeded on 2 hard items, whereas Person H succeeded on only 1. Person M missed 2 easy items, whereas Person N missed only 1. Thus, different kinds and levels of aberrance should be displayed among these four persons. As expected, Person H and Person J showed more surprising performance than Person M and Person N and thus received higher B values (34 and 56 vs. 8 and 2, respectively), but showed less concerning performance than the latter two, with lower W values (2 and 8 vs. 56 and 34, respectively). In addition, although both missed the first two easy items within their ability levels, the more able Person M (total score = 8) received a higher concern value (W = 56) but a lower surprise value (B = 8) than the less able Person J (total score = 2; W = 8, B = 56). As the matrix shows, the B and W indices did reflect the variations in individuals' response patterns.
Test length, however, is commonly treated as a manipulated variable when the power of an index is examined. Many studies have shown that the detection rates of person fit indices for aberrant responses may be influenced by test length (Cui & Leighton, 2009; de la Torre & Deng, 2008; Karabatsos, 2003; Meijer, 1994; Meijer & Sijtsma, 2001). The results are largely consistent: as test length increases, the detection rate increases. However, few studies have considered how test length confounds the indices themselves, that is, how stable an index remains across test lengths. If an index itself is influenced by test length, the detection rate of misfitting responses cannot be examined independently of it. A high detection accuracy might be due to the increase in test length rather than to the power of the index. Thus, the main purpose of this study was to examine test length effects on the BW aberrance indices so as to answer the question of whether the two indices are robust against the influence of test length.

Method
Three conditions were designed in this study: test length, ability ratio, and error ratio. Four test lengths (25, 50, 100, and 200 items) were used. Ability ratio (t = T/K) was defined as the total person score T (the sum of 1's) divided by test length K. Ten categories, coded from 1 to 10, represented different levels of ability: 0 < t_1 ≤ 0.1, 0.1 < t_2 ≤ 0.2, ..., 0.8 < t_9 ≤ 0.9, 0.9 < t_10 < 1.0. Similarly, error ratio (s = E/K) was defined as the number of errors within ability level, E, divided by test length. Note that the number of errors within a person's ability is the same as the number of errors beyond ability. Five categories of error ratio, coded from 1 to 5, represented different levels of aberrance: 0 < s_1 ≤ 0.1, 0.1 < s_2 ≤ 0.2, 0.2 < s_3 ≤ 0.3, 0.3 < s_4 ≤ 0.4, 0.4 < s_5 ≤ 0.5. Finally, four 100-person by varying-item data matrices (100x25, 100x50, 100x100, and 100x200) were randomly generated and permuted 500 times for each data matrix over 20 repetitions. Four statistical techniques, namely partial correlation, nonlinear regression analysis, principal component analysis, and structural equation modelling, were applied in sequence to analyze the effects of test length on the BW indices.
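The definitions of ability ratio and error ratio above can be illustrated with a minimal Python sketch. The function names `ratios` and `category` are hypothetical and are not taken from the study; the sketch assumes a 0/1 response vector already ordered from the easiest to the hardest item, and it treats the T easiest items as the within-ability items. The example vector is the ten-item pattern described earlier for Person M (total score 8, first two easy items missed).

```python
def ratios(responses):
    """Return (T, E_within, E_beyond) for a 0/1 list ordered easiest to hardest.

    Assumption: the T easiest items count as "within ability" and the
    remaining K - T items as "beyond ability".
    """
    t = sum(responses)                     # total score T
    e_within = responses[:t].count(0)      # misses on within-ability items
    e_beyond = responses[t:].count(1)      # successes on beyond-ability items
    return t, e_within, e_beyond


def category(count, k, bins=10):
    """Category code ceil(bins * count / k), using integer arithmetic.

    With bins=10 this gives right-closed intervals of width 0.1,
    e.g. a ratio in (0.1, 0.2] is coded 2.
    """
    return -(-count * bins // k)


person_m = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
k = len(person_m)
t, e_within, e_beyond = ratios(person_m)

print(t / k, e_within / k)                     # 0.8 0.2
print(category(t, k), category(e_within, k))   # 8 2
print(e_within == e_beyond)                    # True
```

The last line reflects the note in the text: because the total score T equals the within-ability successes (T minus the within-ability errors) plus the beyond-ability successes, the two error counts are always equal.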

Relationship Investigation
An overview of the relationships among the B index, the W index, test length (K), ability ratio (T/K), and error ratio (E/K) is presented in Table 1. As can be seen in the lower-left triangle of the correlation matrix, almost all variables are significantly correlated with each other. This is especially true for the correlation between error ratio (E/K) and the W index and the correlation between error ratio (E/K) and the B index (r = .773 and .797, respectively). However, these intercorrelations might be due to common factors that influence them. Thus, to obtain more accurate results, it is necessary to examine the partial correlations further, that is, to filter specific effects out of the combined factors. Although the W index and the B index appear substantially correlated in Table 1 (r = .488, p < .01), both were positively correlated with E/K and T/K, so it was suspected that the relationship between the two indices would shrink once the effects of error ratio and ability ratio were partialled out. Results showed that the partial correlation between the W index and the B index was not significant (see Table 2, r = -.03, p = .243). This indicates that, after partialling out the effects of error ratio, ability ratio, and test length, there is no association between the W and the B indices. Note that the effect of test length is very slight: the p value changes by only .005 (.243 vs. .248) when test length is partialled out at the cost of one degree of freedom.
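The idea of partialling out common factors can be sketched as follows. This is a minimal illustration using NumPy, not the procedure or software used in the study: the function name `partial_corr` is hypothetical, the data are synthetic, and the partial correlation is computed by the standard residual method (regress both variables on the controls, then correlate the residuals).

```python
import numpy as np


def partial_corr(x, y, controls):
    """Correlation between x and y after removing the linear effect of controls."""
    z = np.column_stack([np.ones(len(x)), *controls])   # intercept + controls
    res_x = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]
    res_y = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


# Synthetic data (assumption): two variables that share one common factor
# and are otherwise unrelated.
rng = np.random.default_rng(0)
n = 2000
common = rng.normal(size=n)
x = common + rng.normal(size=n)
y = common + rng.normal(size=n)

raw = np.corrcoef(x, y)[0, 1]          # sizeable, driven by the common factor
partial = partial_corr(x, y, [common]) # close to zero once it is removed
print(round(raw, 3), round(partial, 3))
```

In this toy setting the raw correlation is substantial while the partial correlation is near zero, which mirrors the pattern reported for the W and B indices once E/K and T/K are controlled.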

Nonlinear regression analysis
Since the B and W indices showed a nonlinear relationship with ability ratio (T/K), it was appropriate first to posit a curvilinear model. To choose the curve-fitting regressions for the B and W indices, the R² statistic, which estimates the percentage of variance explained by a specific model, was used to identify the best-fitting model. Results showed that cubic models provided the highest R² for both the B and W indices (R² = .284 and .288, respectively, both ps < .001) and thus fitted best. In addition, because both indices were linearly related to error ratio, and because the aim was to examine the effect of test length on the B and W indices, it was reasonable to combine linear terms for error ratio (E/K) and test length (K) with the cubic expression for ability ratio (T/K). As can be seen in Table 3, more than 76 percent of the variance of the B index can be predicted by error ratio (E/K) and ability ratio (T/K) in a nonlinear regression model. However, the predictor test length (K), with a very small regression coefficient (0.0002), contributed very little to the R² value in the prediction of the B index (R² difference = 0.0001). This supports the earlier finding that the B index is independent of test length (K) and supports the generalizability of the B index across test lengths. Similarly, almost 73 percent of the variance of the W index can be predicted by error ratio (E/K) and ability ratio (T/K) in a nonlinear regression model. The predictor test length (K), with a very small regression coefficient (0.0005), contributed very little to the R² value in the prediction of the W index (R² difference = 0.0005). This likewise supports the earlier finding that the W index is independent of test length (K) and supports the generalizability of the W index across test lengths.
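The comparison of R² with and without test length can be sketched as below. This is an illustrative example under stated assumptions, not the study's analysis: the data are synthetic, the outcome `y` is constructed so that it depends on E/K linearly and on T/K through a cubic but not on K, the coefficients are arbitrary, and the model is fitted by ordinary least squares on polynomial terms with NumPy. The helper name `r_squared` is hypothetical.

```python
import numpy as np


def r_squared(design, y):
    """R-square of an ordinary least-squares fit of y on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


# Synthetic data (assumption): arbitrary coefficients, no true effect of K.
rng = np.random.default_rng(1)
n = 1000
k = rng.choice([25, 50, 100, 200], size=n).astype(float)   # test length K
t = rng.uniform(0.05, 0.95, size=n)                        # ability ratio T/K
s = rng.uniform(0.0, 0.5, size=n)                          # error ratio E/K
y = 0.8 * s + 0.5 * t - 1.2 * t**2 + 0.6 * t**3 + rng.normal(0, 0.05, n)

# Linear term for E/K plus cubic polynomial in T/K, then the same plus K.
base = np.column_stack([np.ones(n), s, t, t**2, t**3])
full = np.column_stack([base, k])

r2_base = r_squared(base, y)
r2_full = r_squared(full, y)
delta = r2_full - r2_base          # R-square gained by adding test length
print(round(r2_base, 3), round(delta, 5))
```

Because the two models are nested, the R² difference cannot be negative; a difference near zero is what an index that is independent of test length would be expected to produce.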
Table 4 presents the results of a principal factor analysis for the B and W indices, error ratio, ability ratio, and test length. Two factors with eigenvalues greater than 1 were extracted and rotated orthogonally. As can be seen, Component 1 appears to be error-oriented, containing error ratio, the W index, and the B index. This suggests that all three variables contribute to the concept of error. Component 2 appears to be ability-oriented, containing ability ratio and, to a lesser extent, the W and B indices. Component 2 is bipolar, with the W index and ability ratio (T/K) on one side and the B index on the other. This bipolar property is consistent with D'Costa's (1993) findings, which indicated a positive relationship between the W index and ability ratio but a negative relationship between the B index and ability ratio. It is interesting that the B index and the W index contributed to both components simultaneously. This is reasonable because the W index and the B index both measure aberrance (Component 1) while, at the same time, measuring different aspects of aberrance (Component 2) based on ability. The relationships among the variables can also be seen in Figure 1.
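The extraction rule mentioned above (retain components with eigenvalues greater than 1) can be sketched with NumPy as follows. This is only an illustration on synthetic data, with a hypothetical function name `kaiser_components`; it performs an unrotated principal component extraction from the correlation matrix and does not include the orthogonal rotation applied in the study.

```python
import numpy as np


def kaiser_components(data):
    """Eigenvalues of the correlation matrix and loadings of components > 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # ascending order
    order = np.argsort(eigvals)[::-1]              # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                           # eigenvalue-greater-than-1 rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings


# Synthetic data (assumption): five variables, three driven by one latent
# factor and two by another, so two components should be retained.
rng = np.random.default_rng(2)
n = 500
f1, f2 = rng.normal(size=n), rng.normal(size=n)
latent = np.column_stack([f1, f1, f1, f2, f2])
data = 0.8 * latent + 0.6 * rng.normal(size=(n, 5))

eigvals, loadings = kaiser_components(data)
print(np.round(eigvals, 2))
print(loadings.shape)   # (5, 2): five variables, two retained components
```

The eigenvalues of a correlation matrix sum to the number of variables, so an eigenvalue above 1 marks a component that accounts for more variance than any single standardized variable.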

Structural equation modelling analysis
Another approach to examining the overall relationships among the variables in this study was structural equation modelling (SEM). A very well fitting model (χ² = 1.368, p = .242) for these five variables is displayed in Figure 2. As can be seen in this model, approximately 76 percent of the variance of the B index and 72 percent of the variance of the W index can be predicted by the model. Because the SEM used in this study is linear, the fact that these figures are almost the same as those from the nonlinear regression models indicates that the nonlinear effects contributed by ability ratio (T/K) were slight. Also note that ability ratio (T/K) had a positive effect on the W index and a negative effect on the B index (β = .35 and -.35, respectively, both ps < .05). This indicates that, given a certain error ratio, high-ability persons tended to show higher within-ability-concern aberrance and lower beyond-ability-surprise aberrance than low-ability persons. On the other hand, error ratio (E/K), as expected, contributed the largest effects to both predicted variances (β = .80 and .76, respectively, both ps < .05). However, test length (K) had only a very slight effect on the B and W indices (β = .00 and .03, respectively). This again confirms that the B and W indices are independent of test length (K).
It is important to recognize that the effects of ability ratio (T/K) on both indices were not linear (see the previous analysis). Thus, the preceding structural equation model, with its linear prediction, might not correctly reflect the true effects over the entire range of ability ratio (T/K). Therefore, it is necessary to reexamine the effect of ability ratio (T/K) on both indices over each half of the range, in other words, separately for a low-ability group and a high-ability group. The following paragraph explores these effects by using the same structural equation model for the low-ability group (T/K category ≤ 5) and for the high-ability group (T/K category ≥ 6).
Results showed good fits for both the low-ability group model (χ² = .091, p = .764) and the high-ability group model (χ² = .093, p = .760). The low-ability model predicted approximately 73 percent of the variance of the B index and 72 percent of the variance of the W index, while the high-ability model predicted 76 percent of the variance of the B index and 68 percent of the variance of the W index. It is interesting to note that the effects of error ratio (E/K) and ability ratio (T/K) on the W and B indices in the low-ability group were opposite to those in the high-ability group. Specifically, error ratio (E/K) had a larger effect on the B index (β = .92, p < .05) than on the W index (β = .68, p < .05) in the low-ability group, while it had a larger effect on the W index (β = .88, p < .05) than on the B index (β = .71, p < .05) in the high-ability group. This implies that the effect of the number of errors on the B index was larger than that on the W index for low-ability persons, while the effect of the number of errors on the W index was larger than that on the B index for high-ability persons. In other words, given the same number of within-ability errors (or beyond-ability errors), low-ability persons tended to display more severe beyond-ability-surprise aberrance, but less severe within-ability-concern aberrance, than high-ability persons.

Conclusions
The main purpose of this study was to examine the robustness of the BW aberrance indices against test length. Three conditions were designed in this study: test length (K, including 25, 50, 100, and 200 items), ability ratio (T/K, defined as the total person score divided by test length K), and error ratio (E/K, defined as the number of errors within ability level divided by test length). Four 100-person by varying-item data matrices (100x25, 100x50, 100x100, and 100x200) were randomly generated and permuted 500 times for each data matrix over 20 repetitions. Results showed that, after partialling out E/K and T/K, the effect of test length on the association between the two indices was very slight. In nonlinear regression analyses, E/K and T/K predicted a high percentage of the variance of the B index and of the W index, whereas test length contributed very little to either. Furthermore, a very well fitting model from SEM analyses also showed that the effects of test length on the B and W indices were negligible. All these pieces of evidence suggest that the B and W indices are robust against test length.
Since all findings showed the robustness of the B and W indices against the influence of test length, the two aberrance indices appear to possess good internal quality. This indicates that the B and W indices can be used with either short or long tests, e.g., in a classroom assessment or in a standardized achievement test, to detect whether an individual's response pattern is aberrant. It is also interesting that the B index and the W index contributed to both components simultaneously. The W index and the B index seemed to measure "error-oriented" aberrance and, at the same time, a different type of "ability-oriented" aberrance based on an individual's ability level. For future research, the power of the B and W indices could be compared with that of other person fit indices in detecting individuals' aberrant responses while explicitly conditioning on test length. For practice, high values of the B and W indices indicate that students' response patterns on a given test might be confounded with guessing and carelessness. These students might not really understand what they have learned, or might have some original interpretation of a certain concept. In either case, such students warrant further attention.