Evaluating rater responses to an online training program for L2 writing assessment

Catherine Elder

University of Melbourne, caelder{at}unimelb.edu.au

Gary Barkhuizen

University of Auckland

Ute Knoch

University of Auckland

Janet von Randow

University of Auckland

Online rater self-training is growing in popularity and has obvious practical benefits: it facilitates access to training materials and rating samples, and it allows raters to reorient themselves to the rating scale and self-monitor their behaviour at their own convenience. However, there has thus far been little research into rater attitudes to training via this modality or into its effectiveness in enhancing levels of inter- and intra-rater agreement.

The current study explores these issues in relation to an analytically scored academic writing task designed to diagnose undergraduates’ English learning needs. Eight ESL raters scored a number of pre-rated benchmark writing samples online and received immediate feedback in the form of a discrepancy score indicating the gap between their own rating on each category of the rating scale and the official rating assigned to the benchmark writing sample.
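
As an illustration only, the per-category discrepancy feedback described above might be computed along the following lines. This is a minimal sketch, not the program used in the study: the function name `discrepancy_scores`, the category labels, the score values, and the choice of a signed difference (rater minus benchmark) are all assumptions made for the example.

```python
from typing import Dict


def discrepancy_scores(rater: Dict[str, int], benchmark: Dict[str, int]) -> Dict[str, int]:
    """Return the signed gap per rating category (rater minus benchmark).

    A positive value means the rater scored higher than the official
    benchmark rating; a negative value means the rater scored lower.
    The signed-difference definition is an assumption for this sketch.
    """
    missing = benchmark.keys() - rater.keys()
    if missing:
        raise ValueError(f"rater gave no score for: {sorted(missing)}")
    return {category: rater[category] - benchmark[category] for category in benchmark}


# Hypothetical category names and scores, invented for illustration.
benchmark_rating = {"content": 5, "organisation": 5, "grammar": 6}
rater_rating = {"content": 6, "organisation": 5, "grammar": 4}

feedback = discrepancy_scores(rater_rating, benchmark_rating)
for category, gap in feedback.items():
    print(f"{category}: {gap:+d}")
```

A signed difference is used here so that a rater could see the direction of the gap (harsher or more lenient than the benchmark) as well as its size; an absolute difference would serve equally well if only the size matters.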

Each rater rated a batch of writing samples twice (before and after participating in the online training), and multifaceted Rasch analyses were used to compare levels of rater agreement and rater bias on each analytic rating category. Raters’ views regarding the effectiveness of the training were also canvassed.

While the findings revealed limited overall gains in reliability, there was considerable individual variation in receptiveness to the training input. The paper concludes with suggestions for refining the online training program and for further research into factors influencing rater responsiveness.

Language Testing, Vol. 24, No. 1, 37-64 (2007)
DOI: 10.1177/0265532207071511

