Advanced Search

Journal Navigation

Journal Home

Subscriptions

Archive

Contact Us

Table of Contents

Sign In to gain access to subscriptions and/or personal tools.
Language Testing
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to Saved Citations
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Request Reprints
Right arrow Add to My Marked Citations
Citing Articles
Right arrow Citing Articles via Google Scholar
Right arrow Citing Articles via Scopus
Google Scholar
Right arrow Articles by Eckes, T.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Complore   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati   Add to Twitter  
What's this?

Rater types in writing performance assessments: A classification approach to rater variability

Thomas Eckes

TestDaF Institute, Germany, thomas.eckes{at}testdaf.de

Research on rater effects in language performance assessments has provided ample evidence for a considerable degree of variability among raters. Building on this research, I advance the hypothesis that experienced raters fall into types or classes that are clearly distinguishable from one another with respect to the importance they attach to scoring criteria. To examine the rater type hypothesis, I asked 64 raters actively involved in scoring examinee writing performance on a large-scale assessment instrument to indicate on a four-point scale how much importance they would attach to each of nine routinely used criteria. The criteria covered various performance aspects, such as fluency, completeness, and grammatical correctness. In a preliminary step, many-facet Rasch analysis revealed that raters differed significantly in their views on the importance of the various criteria. A two-mode clustering technique yielded a joint classification of raters and criteria, with six rater types emerging from the analysis. Each of these types was characterized by a distinct scoring profile, indicating that raters were far from dividing their attention evenly among the set of criteria. Moreover, rater background variables were shown to partially account for the scoring profile differences. The findings have implications for assessing the quality of large-scale rater-mediated language testing, rater monitoring, and rater training.

Key Words: classification • large-scale performance assessment • rater cognition • rater monitoring • rater variability

Language Testing, Vol. 25, No. 2, 155-185 (2008)
DOI: 10.1177/0265532207086780


Add to CiteULike CiteULike   Add to Complore Complore   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati   Add to Twitter Twitter    What's this?