Using FACETS to model rater training effects
Sara Cushing Weigle
Department of Applied Linguistics and ESL, Georgia State University, sweigle{at}gsu.edu
This article describes a study conducted to explore differences in rater severity and consistency among inexperienced and experienced raters both before and after rater training. Sixteen raters (eight experienced and eight inexperienced) rated overlapping subsets of essays from a total sample of 60 essays before and after rater training in the context of an operational administration of UCLA's English as a Second Language Placement Examination (ESLPE). A three-part scale was used, comprising content, rhetorical control, and language. Ratings were analysed using FACETS, a multi-faceted Rasch analysis program that provides estimates of rater severity on a linear scale as well as fit statistics, which are indicators of rater consistency. The analysis showed that the inexperienced raters tended to be both more severe and less consistent in their ratings than the experienced raters before training. After training, the differences between the two groups of raters were less pronounced; however, significant differences in severity were still found among raters, although consistency had improved for most raters. These results provide support for the notion that rater training is more successful in helping raters give more predictable scores (i.e., intra-rater reliability) than in getting them to give identical scores (i.e., inter-rater reliability).
Language Testing, Vol. 15, No. 2, 263-287 (1998)
DOI: 10.1177/026553229801500205
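
As a rough illustration of what "rater severity on a linear scale" means in the abstract, the sketch below implements the rating-scale form of the many-facet Rasch model, in which the log-odds of a score in category k rather than k-1 equal examinee ability minus rater severity minus criterion difficulty minus the step threshold, all in logits. The abstract does not give the exact model specification used in the study, and this is not the FACETS program itself: the function names, the four-category scale, and every numeric value are hypothetical.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Probability of each score category under a rating-scale
    many-facet Rasch model. All arguments are in logits;
    thresholds[k - 1] is the step from category k - 1 to category k."""
    location = ability - severity - difficulty
    # Log-numerator of category k is the running sum of (location - threshold);
    # category 0 has log-numerator 0.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + location - tau)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability, severity, difficulty, thresholds):
    """Model-expected rating: sum over categories of k * P(k)."""
    probs = category_probabilities(ability, severity, difficulty, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values: one essay writer, one criterion, two raters.
THRESHOLDS = [-1.0, 0.0, 1.0]  # four score categories, 0..3
ABILITY = 0.5
CRITERION_DIFFICULTY = 0.0
LENIENT_RATER = -0.5  # negative severity: more lenient
SEVERE_RATER = 0.5    # positive severity: more severe

print("lenient rater:", expected_score(ABILITY, LENIENT_RATER, CRITERION_DIFFICULTY, THRESHOLDS))
print("severe rater: ", expected_score(ABILITY, SEVERE_RATER, CRITERION_DIFFICULTY, THRESHOLDS))
```

Under these assumptions the same essay receives a lower expected score from the more severe rater. A rater who is consistent (good fit) still yields predictable scores once severity is estimated, which is the distinction the abstract draws between predictable scores and identical scores.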

This article has been cited by other articles:

J. S. Johnson and G. S. Lim. The influence of rater language background on writing performance assessment. Language Testing, October 1, 2009; 26(4): 485-505.

A. Gebril. Score generalizability of academic writing tasks: Does one test method fit it all? Language Testing, October 1, 2009; 26(4): 507-531.

E. Schaefer. Rater bias patterns in an EFL writing assessment. Language Testing, October 1, 2008; 25(4): 465-493.

H. Saito. EFL classroom peer assessment: Training effects on rating and commenting. Language Testing, October 1, 2008; 25(4): 553-581.

T. Eckes. Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, April 1, 2008; 25(2): 155-185.

Y. Sawaki. Construct validation of analytic rating scales in a speaking assessment: Reporting a score profile and a composite. Language Testing, July 1, 2007; 24(3): 355-390.

M. Usman Erdosy. Book review: Weigle, S.C. 2002: Assessing writing. Cambridge, UK: Cambridge University Press. xiv, 268 pp. ISBN: 0521780276 (cloth) 0521784468 (paperback). Language Testing, July 1, 2007; 24(3): 445-451.

C. Elder, G. Barkhuizen, U. Knoch, and J. von Randow. Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, January 1, 2007; 24(1): 37-64.

H. Breland, Y.-W. Lee, and E. Muraki. Comparability of TOEFL CBT Essay Prompts: Response-Mode Analyses. Educational and Psychological Measurement, August 1, 2005; 65(4): 577-595.

Y. Kozaki. Using GENOVA and FACETS to set multiple standards on performance assessment for certification in medical translation from Japanese into English. Language Testing, January 1, 2004; 21(1): 1-27.

W. J. Bonk and G. J. Ockey. A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, January 1, 2003; 20(1): 89-110.

T. Lumley. Assessment criteria in a large-scale writing test: what do they really mean to the raters? Language Testing, July 1, 2002; 19(3): 246-276.

K. Kondo-Brown. A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, January 1, 2002; 19(1): 3-31.

L. F. Bachman. Modern language testing at the turn of the century: assuring that what we count counts. Language Testing, January 1, 2000; 17(1): 1-42.