Investigating variability in tasks and rater judgements in a performance test of foreign language speaking
Lyle F. Bachman
University of California, Los Angeles and The Chinese University of Hong Kong
Brian K. Lynch
University of Melbourne
Maureen Mason
University of California, Los Angeles
Much of the recent debate surrounding the development and use of 'performance', or 'communicative', language tests has focused on a supposed trade-off between two sets of desirable qualities: the correspondence of test tasks and test performance to nontest language use (content relevance), and the reliability of scores derived from test performance. One area of particular concern with performance tests is the potential variability in tasks and rater judgements. This has been investigated in the language testing literature with two complementary approaches: generalizability theory (G-theory) and many-faceted Rasch modelling. GENOVA, which performs generalizability theory analyses, estimates the relative contribution of variation in test tasks and rater judgements to variation in test scores. FACETS, which performs many-faceted Rasch modelling, estimates differences in task difficulty and rater severity, and adjusts the ability estimates of test takers to take these differences into account. In this article we first discuss the design and development of a foreign language (Spanish) test battery intended for two purposes: first, to place University of California Education Abroad students into programmes at universities abroad that are appropriate for their level of language ability; and second, to provide diagnostic information useful for designing appropriate teaching and learning programmes for prospective Education Abroad students. The test battery consists of four subtests: reading, listening and note-taking, speaking, and writing. All subtests share a common theme or topic and are interdependent. We then discuss the results of the GENOVA and FACETS analyses of the speaking subtest, based on a full field trial with a group of University of California undergraduate students who had been selected for participation in the Education Abroad Program. Finally, we discuss the implications of these results for the use of G-theory and many-faceted Rasch modelling in the development of performance tests of foreign language ability.
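To make the G-theory side of the abstract concrete, here is a minimal sketch, not GENOVA itself and not the authors' analysis, of how variance components can be estimated for a fully crossed persons × tasks × raters (p × t × r) random-effects design with one rating per cell, using the standard ANOVA expected-mean-square equations. It then computes a D-study generalizability coefficient, the ratio of person variance to person variance plus relative error. The function names `g_study_ptr` and `g_coefficient` are hypothetical, each facet is assumed to have at least two levels, and the simulated scores in the usage example are invented for illustration, not data from the study.

```python
import numpy as np


def g_study_ptr(scores: np.ndarray) -> dict[str, float]:
    """Variance components for a fully crossed p x t x r random design.

    scores[p, t, r] is the rating rater r gave person p on task t.
    Negative estimates are truncated to zero (one common convention).
    """
    n_p, n_t, n_r = scores.shape
    grand = scores.mean()
    # Main-effect and two-way marginal means
    m_p, m_t, m_r = scores.mean(axis=(1, 2)), scores.mean(axis=(0, 2)), scores.mean(axis=(0, 1))
    m_pt, m_pr, m_tr = scores.mean(axis=2), scores.mean(axis=1), scores.mean(axis=0)

    # Sums of squares for each effect
    ss = {
        "p": n_t * n_r * np.sum((m_p - grand) ** 2),
        "t": n_p * n_r * np.sum((m_t - grand) ** 2),
        "r": n_p * n_t * np.sum((m_r - grand) ** 2),
        "pt": n_r * np.sum((m_pt - m_p[:, None] - m_t[None, :] + grand) ** 2),
        "pr": n_t * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2),
        "tr": n_p * np.sum((m_tr - m_t[:, None] - m_r[None, :] + grand) ** 2),
    }
    # Three-way interaction confounded with residual error
    resid = (scores - m_pt[:, :, None] - m_pr[:, None, :] - m_tr[None, :, :]
             + m_p[:, None, None] + m_t[None, :, None] + m_r[None, None, :] - grand)
    ss["ptr,e"] = np.sum(resid ** 2)

    df = {
        "p": n_p - 1, "t": n_t - 1, "r": n_r - 1,
        "pt": (n_p - 1) * (n_t - 1), "pr": (n_p - 1) * (n_r - 1),
        "tr": (n_t - 1) * (n_r - 1), "ptr,e": (n_p - 1) * (n_t - 1) * (n_r - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for each component
    v = {
        "ptr,e": ms["ptr,e"],
        "pt": (ms["pt"] - ms["ptr,e"]) / n_r,
        "pr": (ms["pr"] - ms["ptr,e"]) / n_t,
        "tr": (ms["tr"] - ms["ptr,e"]) / n_p,
        "p": (ms["p"] - ms["pt"] - ms["pr"] + ms["ptr,e"]) / (n_t * n_r),
        "t": (ms["t"] - ms["pt"] - ms["tr"] + ms["ptr,e"]) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["tr"] + ms["ptr,e"]) / (n_p * n_t),
    }
    return {k: max(0.0, float(x)) for k, x in v.items()}


def g_coefficient(v: dict[str, float], n_tasks: int, n_raters: int) -> float:
    """D-study generalizability coefficient for n_tasks tasks and n_raters raters."""
    rel_error = v["pt"] / n_tasks + v["pr"] / n_raters + v["ptr,e"] / (n_tasks * n_raters)
    return v["p"] / (v["p"] + rel_error)


# Usage with simulated (invented) ratings: 30 persons x 4 tasks x 3 raters
rng = np.random.default_rng(0)
scores = (3.0
          + rng.normal(0.0, 1.0, size=(30, 1, 1))   # person effects
          + rng.normal(0.0, 0.3, size=(1, 4, 1))    # task effects
          + rng.normal(0.0, 0.2, size=(1, 1, 3))    # rater effects
          + rng.normal(0.0, 0.5, size=(30, 4, 3)))  # interactions and error
components = g_study_ptr(scores)
print(components)
print(g_coefficient(components, n_tasks=4, n_raters=3))
```

Calling `g_coefficient` with a larger `n_raters` or `n_tasks` shows how adding raters or tasks would shrink relative error variance. Truncating negative estimates to zero is only one convention; GENOVA and other programs may treat them differently.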
Language Testing, Vol. 12, No. 2, 238-257 (1995)
DOI: 10.1177/026553229501200206
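The abstract describes FACETS as estimating task difficulty and rater severity and adjusting test takers' ability estimates accordingly. The sketch below assumes the rating-scale form of the many-facet Rasch model, in which log(P_k / P_{k-1}) = B - D - C - F_k for ability B, task difficulty D, rater severity C and category threshold F_k, all in logits. It only evaluates the model for given parameter values and inverts it for a single observed score by bisection; it does not reproduce the joint estimation that FACETS performs. The function names and the threshold values are hypothetical.

```python
import math


def category_probs(ability, task_difficulty, rater_severity, thresholds):
    """Category probabilities (0..m) under a rating-scale many-facet Rasch model.

    log(P_k / P_{k-1}) = ability - task_difficulty - rater_severity - thresholds[k-1]
    """
    eta = ability - task_difficulty - rater_severity
    # Cumulative sums of (eta - F_h); category 0 has an empty sum
    logits = [0.0]
    for f in thresholds:
        logits.append(logits[-1] + eta - f)
    top = max(logits)  # subtract the max before exponentiating for stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability, task_difficulty, rater_severity, thresholds):
    probs = category_probs(ability, task_difficulty, rater_severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


def ability_for_score(score, task_difficulty, rater_severity, thresholds,
                      lo=-10.0, hi=10.0, tol=1e-8):
    """Ability (logits) whose expected score equals `score`, found by bisection.

    Expected score rises monotonically with ability. If the root lies outside
    [lo, hi], the result simply converges to the nearer bound.
    """
    if not 0 < score < len(thresholds):
        raise ValueError("score must lie strictly between 0 and the top category")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_score(mid, task_difficulty, rater_severity, thresholds) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical 0-3 rating scale; the same observed score from two raters
thresholds = [-1.5, 0.0, 1.5]
lenient = ability_for_score(2.0, 0.0, -0.5, thresholds)
severe = ability_for_score(2.0, 0.0, 0.5, thresholds)
print(lenient, severe)
```

Because the model depends on ability only through B - D - C, the same observed score implies an ability estimate exactly one logit higher when awarded by a rater who is one logit more severe, which illustrates the kind of adjustment the abstract attributes to FACETS.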

This article has been cited by other articles:

T. Eckes. Rater types in writing performance assessments: A classification approach to rater variability. Language Testing, April 1, 2008; 25(2): 155-185.

Y. Sawaki. Construct validation of analytic rating scales in a speaking assessment: Reporting a score profile and a composite. Language Testing, July 1, 2007; 24(3): 355-390.

Xiaoming Xi. Evaluating analytic scoring for the TOEFL® Academic Speaking Test (TAST) for operational use. Language Testing, April 1, 2007; 24(2): 251-286.

Y.-W. Lee. Dependability of scores for a new ESL speaking assessment consisting of integrated and independent tasks. Language Testing, April 1, 2006; 23(2): 131-166.

R. Schoonen. Generalizability of writing scores: an application of structural equation modeling. Language Testing, January 1, 2005; 22(1): 1-30.

Y. Kozaki. Using GENOVA and FACETS to set multiple standards on performance assessment for certification in medical translation from Japanese into English. Language Testing, January 1, 2004; 21(1): 1-27.

G. Fulcher and R. M. Reiter. Task difficulty in speaking tests. Language Testing, July 1, 2003; 20(3): 321-344.

A. Brown. Interviewer variation and the co-construction of speaking proficiency. Language Testing, January 1, 2003; 20(1): 1-25.

W. J. Bonk and G. J. Ockey. A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, January 1, 2003; 20(1): 89-110.

K. Kondo-Brown. A FACETS analysis of rater bias in measuring Japanese second language writing performance. Language Testing, January 1, 2002; 19(1): 3-31.

L. F. Bachman. Modern language testing at the turn of the century: assuring that what we count counts. Language Testing, January 1, 2000; 17(1): 1-42.

J. A. Upshur and C. E. Turner. Systematic effects in the rating of second-language speaking ability: test method and learner discourse. Language Testing, January 1, 1999; 16(1): 82-111.

B. K. Lynch and T. F. McNamara. Using G-theory and Many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, April 1, 1998; 15(2): 158-180.

R. Schoonen, M. Vergeer, and M. Eiting. The assessment of writing ability: expert readers versus lay readers. Language Testing, July 1, 1997; 14(2): 157-184.