
Investigating variability in tasks and rater judgements in a performance test of foreign language speaking

Lyle F. Bachman

University of California, Los Angeles and The Chinese University of Hong Kong

Brian K. Lynch

University of Melbourne

Maureen Mason

University of California, Los Angeles

Much of the recent debate surrounding the development and use of 'performance', or 'communicative', language tests has focused on a supposed trade-off between two sets of desirable qualities: the correspondence of test tasks and test performance to non-test language use, for content relevance; and the reliability of scores derived from test performance. One area of particular concern with performance tests is the potential variability in tasks and rater judgements, which has been investigated in the language testing literature with two complementary approaches: generalizability theory and many-faceted Rasch modelling. GENOVA, which performs generalizability theory analyses, estimates the relative contribution of variation in test tasks and rater judgements to variation in test scores. FACETS, which performs many-faceted Rasch modelling, estimates differences in task difficulty and rater severity, and adjusts the ability estimates of test takers to take these differences into account. In this article we first discuss the design and development of a foreign language (Spanish) test battery that was designed for two purposes: first, to place University of California Education Abroad students into programmes at universities abroad that are appropriate for their level of language ability; and second, to provide diagnostic information that will be useful for designing appropriate teaching and learning programmes for prospective education abroad students. The test battery consists of four subtests: reading, listening and note-taking, speaking, and writing. All subtests share a common theme or topic, and are interdependent. We then discuss the results of the GENOVA and FACETS analyses of the speaking subtest, based on a full field trial with a group of University of California undergraduate students who had been selected for participation in the Education Abroad Program.
Finally, we discuss the implications of these results for the use of G-theory and many-faceted Rasch modelling for the development of performance tests of foreign language ability.
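
The kind of variance decomposition that a generalizability study produces can be illustrated with a minimal sketch. It is not GENOVA's implementation and does not reproduce the article's design or data, neither of which is given in this abstract. It assumes a simpler, fully crossed persons x raters design with one rating per cell, and the function name `g_study_p_x_r` and the example ratings are hypothetical. The estimates follow the standard expected-mean-square equations for that design.

```python
from statistics import mean


def g_study_p_x_r(scores):
    """Estimate variance components for a crossed persons x raters design.

    scores[p][r] is the single rating that rater r gave to person p.
    Hypothetical helper for illustration; not part of GENOVA or FACETS.
    """
    n_p = len(scores)
    n_r = len(scores[0])

    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    rater_means = [mean(scores[p][r] for p in range(n_p)) for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual
    # (person-by-rater interaction confounded with error).
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_pr = sum(
        (scores[p][r] - person_means[p] - rater_means[r] + grand) ** 2
        for p in range(n_p)
        for r in range(n_r)
    )

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates are
    # truncated to zero, a common convention.
    var_residual = ms_pr
    var_person = max((ms_p - ms_pr) / n_r, 0.0)
    var_rater = max((ms_r - ms_pr) / n_p, 0.0)

    # Relative generalizability coefficient for a mean over n_r raters.
    denom = var_person + var_residual / n_r
    g_coef = var_person / denom if denom > 0 else float("nan")

    return {
        "person": var_person,
        "rater": var_rater,
        "residual": var_residual,
        "g_coefficient": g_coef,
    }


if __name__ == "__main__":
    # Made-up ratings: 3 test takers (rows) scored by 2 raters (columns).
    example = [[1, 2], [2, 3], [3, 4]]
    print(g_study_p_x_r(example))
```

A large rater component relative to the person component would suggest that differences in rater severity account for much of the score variation. Extending the sketch to include a task facet, as the speaking subtest analysis does, would require the additional task, person-by-task, and rater-by-task terms.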

Language Testing, Vol. 12, No. 2, 238-257 (1995)
DOI: 10.1177/026553229501200206



