Explanatory Modelling of Rater Behaviour

Comparative judgment as a means of “scoring” assessment responses is currently attracting considerable attention because of its purported ability to remove rater effects from the resulting scores.

Comparative judgment (CJ) scoring engages raters in direct comparisons of two responses; the rater simply indicates which of the two is “better.” This approach has several shortcomings: a single response may need to be read several times before it can be scaled, and the comparisons yield a continuous scale of scores with no natural demarcations unless further decisions are made (e.g., through a standard-setting process). This project is an initial look at how much comparative judgment scores differ from the more typical categorical scores employed in many testing programs. The data also serve a secondary purpose: determining the degree to which automated scoring engines trained on a comparative judgment scale versus categorical scores result in similar scoring model parameterizations.
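As an illustration of the continuous scale mentioned above, pairwise comparisons are commonly converted into scores by fitting a pairwise comparison model such as Bradley–Terry. The sketch below is not the project’s actual scaling procedure; the model choice, the function, and the win-count data are assumptions used only to show how a matrix of judgments becomes a set of continuous log-strength estimates with no built-in category boundaries.

    import numpy as np

    def bradley_terry(wins, n_iter=500, tol=1e-8):
        # Illustrative Bradley-Terry fit (MM algorithm); not the project's method.
        # wins[i, j]: number of times response i was judged better than response j.
        n = wins.shape[0]
        comparisons = wins + wins.T            # total comparisons for each pair
        total_wins = wins.sum(axis=1)          # total wins per response
        p = np.ones(n)                         # initial strength estimates
        for _ in range(n_iter):
            denom = comparisons / (p[:, None] + p[None, :])
            np.fill_diagonal(denom, 0.0)
            p_new = total_wins / denom.sum(axis=1)
            p_new /= p_new.sum()               # fix the scale for identifiability
            if np.max(np.abs(p_new - p)) < tol:
                p = p_new
                break
            p = p_new
        return np.log(p)                       # continuous log-strength scale

    # Hypothetical data: 4 responses, each pair compared a handful of times.
    wins = np.array([
        [0, 3, 4, 5],
        [1, 0, 3, 4],
        [0, 1, 0, 3],
        [0, 0, 1, 0],
    ], dtype=float)
    print(bradley_terry(wins))

The resulting values order the responses on a continuous scale but carry no category boundaries; any cut scores would have to be imposed afterwards, for example through a standard-setting process as noted above.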

External project members include Dr Edward Wolfe (ETS).