Rater characteristics and rater bias: implications for training
Tom Lumley
NLLIA Language Testing Centre, University of Melbourne
T.F. McNamara
NLLIA Language Testing Centre, University of Melbourne
Recent developments in multifaceted Rasch measurement (Linacre, 1989) have made possible new kinds of investigation of aspects (or 'facets') of performance assessments. Relevant characteristics of such facets (for example, the relative harshness of individual raters, the relative difficulty of test tasks) are modelled and reflected in the resulting person ability measures.
In addition, bias, that is, interaction between elements of different facets, can also be analysed. (For the facet 'person', an element is an individual candidate; for the facet 'rater', an element is an individual judge, and so on.) This permits investigation of the way a particular aspect of the test situation (type of candidate, choice of prompt, etc.) may elicit a consistently biased pattern of responses from a rater.
The purpose of the research is to investigate the use of these analytical techniques in rater training for the speaking subtest of the Occupational English Test (OET), a specific-purpose ESL performance test for health professionals. The test involves a role-play-based, profession-specific interaction, with some degree of choice of role-play material. Data are presented from two rater training sessions separated by an 18-month interval and a subsequent operational test administration session. The analysis is used to establish (1) consistency of rater characteristics over different occasions; and (2) rater bias in relation to occasion of rating. The study thus addresses the question of the stability of rater characteristics, which has practical implications in terms of the accreditation of raters and the requirements of data analysis following test administration sessions. It also has research implications concerning the role of multifaceted Rasch measurement in understanding rater behaviour in performance assessment contexts.
Language Testing, Vol. 12, No. 1, 54-71 (1995)
DOI: 10.1177/026553229501200104

This article has been cited by other articles:

J. S. Johnson and G. S. Lim. The influence of rater language background on writing performance assessment. Language Testing, October 1, 2009; 26(4): 485-505.

S. Matsuno. Self-, peer-, and teacher-assessments in Japanese university EFL writing classrooms. Language Testing, January 1, 2009; 26(1): 75-100.

E. Schaefer. Rater bias patterns in an EFL writing assessment. Language Testing, October 1, 2008; 25(4): 465-493.

H. Saito. EFL classroom peer assessment: Training effects on rating and commenting. Language Testing, October 1, 2008; 25(4): 553-581.

T. Eckes. Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, April 1, 2008; 25(2): 155-185.

L. Llosa. Validating a standards-based classroom assessment of English proficiency: A multitrait-multimethod approach. Language Testing, October 1, 2007; 24(4): 489-515.

M. East. Bilingual dictionaries in tests of L2 writing proficiency: do they make a difference? Language Testing, July 1, 2007; 24(3): 331-353.

Y. Sawaki. Construct validation of analytic rating scales in a speaking assessment: Reporting a score profile and a composite. Language Testing, July 1, 2007; 24(3): 355-390.

C. Elder, G. Barkhuizen, U. Knoch, and J. von Randow. Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, January 1, 2007; 24(1): 37-64.

B. Garrett, E. Towles, H. Kleinert, and J. Kearns. Portfolios in Large-Scale Alternate Assessment Systems: Frameworks for Reliability. Assessment for Effective Intervention, January 1, 2003; 28(2): 17-27.

W. J. Bonk and G. J. Ockey. A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, January 1, 2003; 20(1): 89-110.

K. Kondo-Brown. A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, January 1, 2002; 19(1): 3-31.

L. F. Bachman. Modern language testing at the turn of the century: assuring that what we count counts. Language Testing, January 1, 2000; 17(1): 1-42.

J. A. Upshur and C. E. Turner. Systematic effects in the rating of second-language speaking ability: test method and learner discourse. Language Testing, January 1, 1999; 16(1): 82-111.

B. K. Lynch and T. F. McNamara. Using G-theory and Many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, April 1, 1998; 15(2): 158-180.

G. Brindley. Outcomes-based assessment and reporting in language learning programmes: a review of the issues. Language Testing, January 1, 1998; 15(1): 45-85.

R. Schoonen, M. Vergeer, and M. Eiting. The assessment of writing ability: expert readers versus lay readers. Language Testing, July 1, 1997; 14(2): 157-184.