The Concise Encyclopedia of Applied Linguistics - Carol A. Chapelle
Rating Scales and Scale Descriptors
Assessment of speaking requires assigning numbers to the characteristics of the speech sample in a systematic fashion through the use of a scale. A scale represents the range of values associated with particular levels of performance, and scaling rules represent the relationship between the characteristic of interest and the value assigned (Crocker & Algina, 1986). The use of a scale for measurement is more intuitively clear in domains apart from language ability. For example, we can measure weight very accurately.
Measurement of a speaking performance, however, requires a different kind of scale, such as those used in certain sports competitions (see Spolsky, 1995) where the quality of performance is based on rank. There is no equal‐interval unit of measurement comparable to ounces or pounds that allows the precise measurement of a figure skating performance. Likewise, assessing speaking ranks students into ordinal categories (often referred to as vertical categories) similar to bronze, silver, and gold; A2, B2, C2; or beginning, intermediate, and advanced.
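The practical consequence of an ordinal scale can be illustrated with a minimal Python sketch. The level names come from the example above; the helper names (`LEVELS`, `RANK`, `is_higher`, `median_level`) and the sample ratings are hypothetical and serve only as illustration. The sketch assumes that ranks support ordering and a median, but not meaningful averaging, because the distance between adjacent levels is not a fixed unit.

```python
from statistics import median_low

# Ordered levels, lowest to highest (names taken from the example in the text).
LEVELS = ("beginning", "intermediate", "advanced")

# Map each level to its rank. The ranks encode order only, not distance.
RANK = {name: position for position, name in enumerate(LEVELS)}


def is_higher(level_a: str, level_b: str) -> bool:
    """Return True if level_a ranks above level_b."""
    return RANK[level_a] > RANK[level_b]


def median_level(ratings: list[str]) -> str:
    """Summarize several ratings with the (low) median level.

    The median relies on order alone, so it is defined for ordinal data;
    an arithmetic mean would assume equal intervals between levels.
    """
    ranks = sorted(RANK[rating] for rating in ratings)
    return LEVELS[median_low(ranks)]


# Hypothetical ratings assigned by four raters to one performance.
ratings = ["intermediate", "beginning", "advanced", "intermediate"]
summary = median_level(ratings)
```

Here `summary` is `"intermediate"`. Reporting a mean such as "1.0" would suggest a precision that the scale does not have.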
The global assessment of performance is associated with holistic scales, in which abilities are represented by level descriptors composed of a qualitative summary of the raters' observations. Benchmark performances are selected to exemplify the levels and their descriptors. Scale descriptors are typically associated with, but not limited to: pronunciation (focusing on segmentals); phonological control (focusing on suprasegmentals); grammar/accuracy (morphology, syntax, and usage); fluency (speed and pausing); vocabulary (range and idiomaticity); coherence; and organization. If the assessment involves evaluation of interaction, the following may also be included: turn‐taking strategies, cooperative strategies, and asking for or providing clarification.
Holistic vertical indicators, even when accompanied by scale descriptors and benchmarks, may not be sufficient for making instructional or placement decisions. In such cases, an analytic rating, discourse analysis, or extraction of temporal measures of fluency may be conducted to explicate components of the examinee's performance. The specific components chosen, which can include any of the same aspects of performance used in holistic scale descriptors, depend on the purpose of the test, the needs of the score users, and the interests of the researcher.
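As one illustration of extracting temporal measures of fluency, the following Python sketch computes a few commonly reported measures from a single response. It is a sketch under stated assumptions, not a procedure taken from the studies cited here: the `SpeechSample` structure, the function name, the sample figures, and the 0.25-second silent-pause threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class SpeechSample:
    """Hypothetical summary of one examinee response."""
    syllables: int        # total syllables produced
    total_seconds: float  # total response time, pauses included
    pauses: list[float]   # durations of silent intervals, in seconds


def fluency_measures(sample: SpeechSample, min_pause: float = 0.25) -> dict:
    """Compute simple temporal measures of fluency.

    min_pause is an assumed threshold: silences shorter than this are
    treated as part of articulation rather than as pauses.
    """
    pauses = [p for p in sample.pauses if p >= min_pause]
    pause_time = sum(pauses)
    speaking_time = sample.total_seconds - pause_time
    return {
        # Syllables per minute over the whole response, pauses included.
        "speech_rate": sample.syllables / sample.total_seconds * 60,
        # Syllables per minute of actual speaking time, pauses excluded.
        "articulation_rate": sample.syllables / speaking_time * 60,
        # Average duration of the pauses that met the threshold.
        "mean_pause": pause_time / len(pauses) if pauses else 0.0,
        "pause_count": len(pauses),
    }


# Hypothetical one-minute response containing four silent intervals.
sample = SpeechSample(syllables=180, total_seconds=60.0,
                      pauses=[0.5, 1.0, 0.1, 1.5])
measures = fluency_measures(sample)
```

For this sample, the 0.1-second silence falls below the threshold, so three pauses are counted, with a mean length of 1.0 second and a speech rate of 180 syllables per minute. Measures of this kind describe only the "speed and pausing" aspect of performance; which of them are worth reporting depends, as noted above, on the purpose of the test and the needs of the score users.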
Score users are central in Alderson's (1991) distinction among three types of scales: constructor oriented, assessor oriented, and user oriented. The language used to describe abilities tends to focus on the positive aspects of performance in user‐oriented and constructor‐oriented scales, where the former may focus on likely behaviors at a given level and the latter may focus on particular tasks associated with a curriculum or course of instruction. Assessor‐oriented scales shift the focus from the learner and the objectives of learning toward the rater; their descriptors are often negatively worded, focus on the rater's perceptions, and are often more useful for screening purposes.
From another perspective, scales for speaking assessments can be theoretically oriented, empirically oriented, or both. The starting point for all assessments is usually a description or theory of language ability (e.g., Canale & Swain, 1980; Bachman, 1990). These broad orientations are then narrowed down to focus on a particular skill and particular components of that skill. Empirical approaches to the development and validation of speaking assessment scales involve identification of characteristics of interest for the subsequent development of scale levels (e.g., Chalhoub‐Deville, 1995; Fulcher, 1996) or explications of assigned ability levels (e.g., Xi & Mollaun, 2006; Iwashita, Brown, McNamara, & O'Hagan, 2008; Ginther, Dimova, & Yang, 2010), or both. In addition, collecting data about examinee perspectives and experiences can be used to improve test development and administrative procedures (Yan, Thirakunkovit, Kauper, & Ginther, 2016).
Specific‐purpose scales are often derived from general guidelines and frameworks. For example, the ACTFL proficiency guidelines (2009) serve as a starting point for the ACTFL OPI scale. Another influential framework is the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001). The CEFR is a collection of descriptions of language ability, ranging from beginning to advanced, across and within the four main skills. The document is comprehensive and formidable in scope, but, in spite of its breadth, the CEFR has been used to construct scales for assessing language performance, to communicate about levels locally and nationally (Figueras & Noijons, 2009), and to interpret test scores (Tannenbaum & Wylie, 2008).