Language Testing

Using FACETS to model rater training effects

Sara Cushing Weigle

Department of Applied Linguistics and ESL, Georgia State University, sweigle{at}gsu.edu

This article describes a study conducted to explore differences in rater severity and consistency among inexperienced and experienced raters both before and after rater training. Sixteen raters (eight experienced and eight inexperienced) rated overlapping subsets of essays from a total sample of 60 essays before and after rater training in the context of an operational administration of UCLA’s English as a Second Language Placement Examination (ESLPE). A three-part scale was used, comprising content, rhetorical control, and language. Ratings were analysed using FACETS, a multi-faceted Rasch analysis program that provides estimates of rater severity on a linear scale as well as fit statistics, which are indicators of rater consistency. The analysis showed that the inexperienced raters tended to be both more severe and less consistent in their ratings than the experienced raters before training. After training, the differences between the two groups of raters were less pronounced; however, significant differences in severity were still found among raters, although consistency had improved for most raters. These results provide support for the notion that rater training is more successful in helping raters give more predictable scores (i.e., intra-rater reliability) than in getting them to give identical scores (i.e., inter-rater reliability).
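The abstract sums up the modelling idea in one clause: FACETS places rater severity on a linear (logit) scale and reports fit statistics that indicate consistency. The Python sketch below is a rough illustration only. It assumes the standard many-facet Rasch rating-scale form, log(P_k / P_(k-1)) = B_n - D_i - C_j - F_k. Here B_n is essay (examinee) ability, D_i is the difficulty of a scale component such as content, rhetorical control, or language, C_j is the severity of rater j, and F_k is the threshold for score k. The sketch is not FACETS and estimates nothing. The parameter values, the four-category score range, and the function names (`category_probs`, `expected_and_variance`, `rater_fit`) are hypothetical. It takes the measures as given and shows two things: how a severity parameter shifts expected scores, and how infit and outfit mean-squares respond to erratic ratings.

```python
import math


def category_probs(theta, thresholds):
    """Probabilities of scores 0..m under a Rasch rating-scale model.

    theta is the combined measure B_n - D_i - C_j (essay ability minus
    component difficulty minus rater severity); thresholds are F_1..F_m.
    """
    logits = [0.0]  # score 0 is the reference category
    running = 0.0
    for f in thresholds:
        running += theta - f  # log(P_k / P_{k-1}) = theta - F_k
        logits.append(running)
    top = max(logits)  # shift by the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(theta, thresholds):
    """Model-expected score and its variance for one rating."""
    probs = category_probs(theta, thresholds)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def rater_fit(ratings, severity, thresholds):
    """Infit and outfit mean-squares for a single rater.

    ratings: list of (ability, difficulty, observed_score) tuples.
    Values near 1.0 mean the rater's scores scatter about as much as the
    model predicts; much larger values flag erratic (inconsistent) scoring.
    """
    sq_resid_sum = 0.0  # sum of squared raw residuals (infit numerator)
    var_sum = 0.0       # sum of model variances (infit denominator)
    z_sq_sum = 0.0      # sum of squared standardized residuals (outfit)
    for ability, difficulty, observed in ratings:
        theta = ability - difficulty - severity
        expected, var = expected_and_variance(theta, thresholds)
        sq_resid_sum += (observed - expected) ** 2
        var_sum += var
        z_sq_sum += (observed - expected) ** 2 / var
    infit = sq_resid_sum / var_sum
    outfit = z_sq_sum / len(ratings)
    return infit, outfit


# Hypothetical parameters, in logits: four score categories (0-3) and
# five essays rated on one scale component of difficulty 0.
THRESHOLDS = [-1.0, 0.0, 1.0]
ABILITIES = [-2.0, -1.0, 0.0, 1.0, 2.0]
SEVERITY = 0.5

# Two hypothetical raters with the same severity but different patterns.
consistent = [(a, 0.0, x) for a, x in zip(ABILITIES, [0, 1, 1, 2, 2])]
erratic = [(a, 0.0, x) for a, x in zip(ABILITIES, [3, 0, 3, 0, 3])]

if __name__ == "__main__":
    for name, ratings in [("consistent", consistent), ("erratic", erratic)]:
        infit, outfit = rater_fit(ratings, SEVERITY, THRESHOLDS)
        print(f"{name}: infit={infit:.2f} outfit={outfit:.2f}")
```

In this parameterization, severity and consistency are separate quantities. `SEVERITY` shifts the expected score of every rating the rater gives. The mean-squares measure how far the observed scores scatter around those expectations. When run, the erratic pattern lands far above 1.0, while the pattern that tracks the expectations falls below 1.0.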

Language Testing, Vol. 15, No. 2, 263-287 (1998)
DOI: 10.1177/026553229801500205


This article has been cited by other articles:

J. S. Johnson and G. S. Lim. The influence of rater language background on writing performance assessment. Language Testing, October 1, 2009; 26(4): 485-505.

A. Gebril. Score generalizability of academic writing tasks: Does one test method fit it all? Language Testing, October 1, 2009; 26(4): 507-531.

E. Schaefer. Rater bias patterns in an EFL writing assessment. Language Testing, October 1, 2008; 25(4): 465-493.

H. Saito. EFL classroom peer assessment: Training effects on rating and commenting. Language Testing, October 1, 2008; 25(4): 553-581.

T. Eckes. Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, April 1, 2008; 25(2): 155-185.

Y. Sawaki. Construct validation of analytic rating scales in a speaking assessment: Reporting a score profile and a composite. Language Testing, July 1, 2007; 24(3): 355-390.

M. Usman Erdosy. Book review: Weigle, S.C. 2002: Assessing writing. Cambridge, UK: Cambridge University Press. xiv, 268 pp. ISBN: 0521780276 (cloth) 0521784468 (paperback). Language Testing, July 1, 2007; 24(3): 445-451.

C. Elder, G. Barkhuizen, U. Knoch, and J. von Randow. Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, January 1, 2007; 24(1): 37-64.

H. Breland, Y.-W. Lee, and E. Muraki. Comparability of TOEFL CBT Essay Prompts: Response-Mode Analyses. Educational and Psychological Measurement, August 1, 2005; 65(4): 577-595.

Y. Kozaki. Using GENOVA and FACETS to set multiple standards on performance assessment for certification in medical translation from Japanese into English. Language Testing, January 1, 2004; 21(1): 1-27.

W. J. Bonk and G. J. Ockey. A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, January 1, 2003; 20(1): 89-110.

T. Lumley. Assessment criteria in a large-scale writing test: what do they really mean to the raters? Language Testing, July 1, 2002; 19(3): 246-276.

K. Kondo-Brown. A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, January 1, 2002; 19(1): 3-31.

L. F. Bachman. Modern language testing at the turn of the century: assuring that what we count counts. Language Testing, January 1, 2000; 17(1): 1-42.