Assessment criteria in a large-scale writing test: what do they really mean to the raters?
Tom Lumley
Hong Kong Polytechnic University, egluml{at}polyu.edu.hk
The process of rating written language performance is still not well understood, despite a body of work investigating this issue over the last decade or so (e.g., Cumming, 1990; Huot, 1990; Vaughan, 1991; Weigle, 1994a; Milanovic et al., 1996). The purpose of this study is to investigate the process by which raters of texts written by ESL learners make their scoring decisions using an analytic rating scale designed for multiple test forms. The context is the Special Test of English Proficiency (STEP), which is used by the Australian government to assist in immigration decisions. Four trained, experienced and reliable STEP raters took part in the study, providing scores for two sets of 24 texts. The first set was scored as in an operational rating session. Raters then provided think-aloud protocols describing the rating process as they rated the second set. A coding scheme developed to describe the think-aloud data allowed analysis of the sequence of rating, the interpretations the raters made of the scoring categories in the analytic rating scale, and the difficulties raters faced in rating.
Data show that although raters follow a fundamentally similar rating process in three stages, the relationship between scale contents and text quality remains obscure. The study demonstrates that the task raters face is to reconcile their impression of the text, the specific features of the text, and the wordings of the rating scale, thereby producing a set of scores. The rules and the scale do not cover all eventualities, forcing the raters to develop various strategies to help them cope with problematic aspects of the rating process. In doing this they try to remain close to the scale, but are also heavily influenced by the complex intuitive impression of the text obtained when they first read it. This sets up a tension between the rules and the intuitive impression, which raters resolve by what is ultimately a somewhat indeterminate process. In spite of this tension and indeterminacy, rating can succeed in yielding consistent scores provided raters are supported by adequate training, with additional guidelines to assist them in dealing with problems. Rating requires such constraining procedures to produce reliable measurement.
Language Testing, Vol. 19, No. 3, 246-276 (2002)
DOI: 10.1191/0265532202lt230oa
This article has been cited by other articles:

A. Gebril. Score generalizability of academic writing tasks: Does one test method fit it all? Language Testing, October 1, 2009; 26(4): 507-531.

G. Yu. Lexical Diversity in Writing and Speaking Task Performances. Applied Linguistics, June 4, 2009; (2009) amp024v1.

U. Knoch. Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, April 1, 2009; 26(2): 275-304.

E. Schaefer. Rater bias patterns in an EFL writing assessment. Language Testing, October 1, 2008; 25(4): 465-493.

H. Saito. EFL classroom peer assessment: Training effects on rating and commenting. Language Testing, October 1, 2008; 25(4): 553-581.

T. Eckes. Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, April 1, 2008; 25(2): 155-185.

A. Cumming. Book reviews: Lumley, T. 2005: Assessing second language writing: the rater's perspective. Frankfurt: Peter Lang (Volume 3, Language Testing and Evaluation Series, edited by Rudiger Grotjahn and Gunther Sigott). 368 pp. ISBN 3-631-53327-6 US-ISBN 0-8204-7655-2 US$62.95. Language Testing, April 1, 2007; 24(2): 287-291.

H. Breland, Y.-W. Lee, and E. Muraki. Comparability of TOEFL CBT Essay Prompts: Response-Mode Analyses. Educational and Psychological Measurement, August 1, 2005; 65(4): 577-595.

R. Schoonen. Generalizability of writing scores: an application of structural equation modeling. Language Testing, January 1, 2005; 22(1): 1-30.