Embedded Formative Assessment - Dylan Wiliam

Chapter 2

The Case for Formative Assessment

We’ve discussed how increasing the educational achievement of students is a national economic priority, and the only way to do that is to improve teacher quality. We also saw that deselecting existing teachers and improving the quality of entrants into the profession will have, at best, marginal effects, and so securing our economic future boils down to helping current teachers become more effective.

This chapter reviews the research on teacher professional development—specifically focusing on learning styles, educational neuroscience, and content-area knowledge—and shows that while there are many possible ways in which we could seek to develop the practice of serving teachers, attention to minute-by-minute and day-to-day formative assessment is likely to have the biggest impact on student outcomes. It continues by discussing the origins of formative assessment and by defining what, exactly, formative assessment is. The chapter concludes by presenting the strategies of formative assessment, which will be the subjects of each subsequent chapter in this book, and by discussing assessment as the bridge between teaching and learning.

The Importance of Professional Development

Andrew Leigh (2010) analyzed a data set that includes test scores on ninety thousand Australian elementary school students and found that, as in the American research, whether the teacher has a master’s degree or not makes no difference. He did, however, find a statistically significant relationship between how much a student learns and the experience of the teacher, as seen in figure 2.1.

Source: Adapted from Leigh, 2010.

Figure 2.1: Increases in teacher productivity with experience.

The value added by a teacher increases particularly quickly in the first five years of teaching, but what is most sobering about figure 2.1 is the vertical axis. If a student’s literacy teacher is a twenty-year veteran, the student will learn more than he will if his teacher is a novice, but not much more. In a year with a twenty-year veteran, a student will make an extra half-month’s progress—in other words, a twenty-year veteran teacher achieves in thirty-four weeks what a novice teacher will take thirty-six weeks to achieve. Because of the size of the study, this result is statistically significant, and the improvement is worth having, but it is not a large difference. Therefore, it’s not surprising that many have argued that the answer is more, and better, professional development for teachers.
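To put the size of that difference in perspective, the arithmetic can be sketched in a few lines of Python. The thirty-four and thirty-six week figures come from the text above; the function name and the framing as a relative gain are illustrative assumptions, not part of Leigh's analysis.

```python
def relative_gain(novice_weeks: float, veteran_weeks: float) -> float:
    """Fraction by which the veteran's pace exceeds the novice's.

    Both teachers cover the same material; the veteran simply needs fewer weeks.
    """
    if novice_weeks <= 0 or veteran_weeks <= 0:
        raise ValueError("week counts must be positive")
    return novice_weeks / veteran_weeks - 1.0


# Figures quoted in the text: 36 weeks for a novice, 34 for a twenty-year veteran.
gain = relative_gain(novice_weeks=36, veteran_weeks=34)
print(f"Veteran advantage: {gain:.1%}")  # Veteran advantage: 5.9%
```

On these figures, twenty years of experience is worth roughly a 6 percent increase in pace, which is why the difference, though real, is small.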

Indeed, it would be hard to find anyone who would say that teacher professional development is unnecessary. Professional development for serving teachers is a statutory requirement in most states. However, most of these requirements are so loosely worded as to be almost meaningless. Pennsylvania’s Act 48 (Act of Nov. 23, 1999, P.L. 529, No. 48) requires teachers to complete 180 hours of professional development that relates to an educator’s certificate type or area of assignment every five years. Note that there is no requirement for teachers to improve their practice or even to learn anything. The only requirement is to endure 180 hours of professional development.

Many states justify these requirements with the need for teachers to “keep up to date” with the latest developments in the field, but such a justification merely encourages teachers to chase the latest fad. One year, it’s language across the curriculum; the next year, it’s differentiated instruction. Because teachers are bombarded with innovations, none of these innovations has time to take root, so nothing really changes. And worse, not only is there little or no real improvement in what happens in classrooms, but teachers get justifiably cynical about the constant barrage of innovations to which they are subjected. The reason that teachers need professional development has nothing to do with professional updating. Teachers need professional development because the job of teaching is so difficult, so complex, that one lifetime is not enough to master it.

The fact that teaching is so complex is what makes it such a great job. At one time, André Previn was the highest-paid film-score composer in Hollywood, and yet one day, he quit. People asked him why he had given up this amazing job, and he replied, “I wasn’t scared anymore.” Every day, he was going in to his office knowing that his job held no challenges for him. This is not something that any teacher is ever going to have to worry about.

Even the best teachers fail. Talk to these teachers, and no matter how well the lesson went, they always can think of things that didn’t go as well as they would have liked, things that they will do differently next time. But things get much, much worse when we collect the students’ notebooks and look at what they thought we said. That’s why Doug Lemov (2010) says that, for teachers, no amount of success is enough. The only teachers who think they are successful are those who have low expectations of their students. They are the sort of teachers who say, “What can you expect from these kids?” The answer is, of course, a lot more than the students are achieving with those teachers. The best teachers fail all the time because they have such high aspirations for what their students can achieve (generally much higher aspirations than the students themselves have).

People often contact me and ask whether I have any research instruments for evaluating the quality of teaching. I don’t, because working out which teachers are good and which teachers are not so good is of far less interest to me than helping teachers improve. No teacher is so good—or so bad—that he or she cannot improve. That is why we need professional development.

Although there is widespread agreement that professional development is valuable, there is much less agreement about what form it should take, and there is little research about what should be the focus of teacher professional development. However, there does seem to be a consensus that one-shot deals—sessions ranging from one to five days held during the summer—are of limited effectiveness, even though they are the most common model (Muijs, Kyriakides, van der Werf, Creemers, Timperley, & Earl, 2014). The following sections highlight some of the more popular areas of focus for professional development.

Learning Styles

Many teachers are attracted to developments such as theories pertaining to students’ learning styles. The idea that each learner has a particular preferred style of learning is attractive—intuitive even. It marries up with every teacher’s experience that students really are different; it just feels right. However, there is little agreement among psychologists about what learning styles are, let alone how to define them. One review of the research in this area finds seventy-one different models of learning styles (Coffield, Moseley, Hall, & Ecclestone, 2004). Indeed, it is difficult not to get the impression that the proposers of new classifications of learning styles have followed Annette Karmiloff-Smith’s advice: “If you want to get ahead, get a theory” (Karmiloff-Smith & Inhelder, 1974/1975). Some of the definitions, and the questionnaires used to measure them, are so flaky that one may classify an individual as having one learning style one day and a different one the next (Boyle, 1995). Others do seem to tap into deep and stable differences between individuals in how they think and learn, but there does not appear to be any way to use this in teaching.

Although many studies have tried to show that taking students’ individual learning styles into account improves learning, evidence remains elusive (Coffield et al., 2004). The Association for Psychological Science asked a blue-ribbon panel of America’s leading psychologists of education to review the available research evidence to see whether there was evidence that teaching students in their preferred learning style would have an impact on student achievement. They realized that any experiment that showed the benefit of teaching students in their preferred learning style (what they called the meshing hypothesis) would have to satisfy three conditions.

1. Following some assessment of their presumed learning style, teachers would divide learners into two or more groups (for example, visual, auditory, and kinesthetic learners).

2. Teachers would randomly allocate learners within each of the learning-style groups to at least two different methods of instruction (for example, visual- and auditory-based approaches).

3. Teachers would give all students in the study the same final test of achievement.

In such an experiment, the meshing hypothesis would be supported if the results showed that the learning method that optimized test performance of one learning-style group (for example, visual learners) was different from the learning method that optimized the test performance of a second learning-style group (for example, auditory learners). In their review, Harold Pashler and colleagues found only one study that gave even partial support to the meshing hypothesis, and two that clearly contradicted it. Their conclusion was stark: “If classification of students’ learning styles has practical utility, it remains to be demonstrated” (Pashler, McDaniel, Rohrer, & Bjork, 2008, p. 117).
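The decision rule described above can be sketched in Python. This is a minimal illustration under stated assumptions: the group names, method names, and scores are hypothetical, and a real analysis would also test whether the difference between groups is statistically significant rather than simply comparing means.

```python
from typing import Dict

# Mean final-test score for each (learning-style group, instruction method).
Scores = Dict[str, Dict[str, float]]


def best_method(method_means: Dict[str, float]) -> str:
    """Return the instruction method with the highest mean test score."""
    return max(method_means, key=lambda method: method_means[method])


def supports_meshing(scores: Scores) -> bool:
    """True if at least two learning-style groups have different best methods."""
    best_methods = {best_method(means) for means in scores.values()}
    return len(best_methods) > 1


# Hypothetical results: both groups do best with the same method.
no_interaction = {
    "visual": {"visual_based": 72.0, "auditory_based": 65.0},
    "auditory": {"visual_based": 70.0, "auditory_based": 64.0},
}

# Hypothetical results: each group does best with a different method.
crossover = {
    "visual": {"visual_based": 72.0, "auditory_based": 65.0},
    "auditory": {"visual_based": 63.0, "auditory_based": 71.0},
}

print(supports_meshing(no_interaction))  # False
print(supports_meshing(crossover))       # True
```

Only the second pattern, in which the best method changes from one group to the other, would count as support for the meshing hypothesis. A method that is simply better for everyone says nothing about learning styles.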

Now, of course, the fact that there is currently no evidence in favor of the meshing hypothesis does not mean that such evidence will not be forthcoming in the future; absence of evidence is not evidence of absence. However, it could be that the whole idea of learning styles research is misguided because its basic assumption—that the purpose of instructional design is to make learning easy—may just be incorrect.

Since the pioneering work of Hugh Carlton Blodgett in the 1920s, psychologists have found that performance on a learning task is a poor predictor of long-term retention (for a summary of this research, see Soderstrom & Bjork, 2015). More precisely, when learners do well on a learning task, they are likely to forget things more quickly than if they do badly on the learning task; good instruction creates “desirable difficulties” (Bjork, 1994, p. 193) for the learner. As Daniel Willingham (2009) says, “memory is the residue of thought” (p. 41). By trying to match our instruction to our students’ preferred learning style, we may, in fact, be reducing learning. If students do not have to work hard to make sense of what they are learning, then they are less likely to remember it in six weeks’ time. Perhaps the most important takeaway from the research on learning styles is that teachers need to know about learning styles if only to avoid the trap of teaching in the style they believe works best for them. A review of the literature on learning styles and learning strategies (Adey, Fairbrother, Wiliam, Johnson, & Jones, 1999) concludes that:

The only feasible “solution” is that teachers should NOT try to fit their teaching to each child’s style, but rather that they should become aware of different styles (and help students also to become aware of different styles) and then encourage all students to use as wide a variety of styles as possible. Students need to learn both how to make the best of their own learning style and also how to use a variety of styles, and to understand the dangers of taking a limited view of their own capabilities. (p. 36)

As long as teachers vary their teaching style, then it is likely that all students will get some experience of being in their comfort zone and some experience of being pushed beyond it. Ultimately, we should remember that teaching is interesting because our students are so different, but only possible because they are so similar.

Educational Neuroscience

Another potential area for teacher professional development—and one that has received a lot of publicity—is applying what we are learning about the brain to the design of effective teaching. Cognitive psychologists work to understand what the brain does and how it does what it does, while neuroscientists try to connect what the brain does to its physiology.

Some of the earliest attempts to relate brain physiology to educational matters concerned the respective roles of the left and right sides of the brain in various kinds of tasks in education and training, despite clear evidence that the conclusions being drawn were unwarranted (see, for example, Hines, 1987). Schools have been inundated with suggestions for how they can use the latest findings from cognitive neuroscience to develop brain-based education, and despite the wealth of evidence that these claims are at best premature and at worst simply disingenuous (for example, Bruer, 1997, 1999; Goswami, 2006; Howard-Jones, 2009), many neuromyths still abound.

• Approximately 50 percent of teachers in China, Greece, the Netherlands, Turkey, and the United Kingdom believe that we only use about 10 percent of our brains, and more than 90 percent of teachers in these countries believe that instruction in students’ preferred learning styles is more effective (Howard-Jones, 2014). Neither of these claims is actually true.

• People are more likely to believe a psychological report if the explanation claims to be based in neuroscience, even if the explanation is nonsense (Weisberg, Keil, Goodstein, Rawson, & Gray, 2008).

• Over 50 percent of teachers in the Netherlands and the United Kingdom believe that children are less attentive after consuming drinks or snacks that contain a lot of sugar (they’re not), and 90 percent believe that differences in whether the left or the right brain is dominant can help explain individual differences among learners (they can’t; Dekker, Lee, Howard-Jones, & Jolles, 2012).

• Many believe that people remember 10 percent of what they read, 20 percent of what they hear, 30 percent of what they see, 50 percent of what they hear and see, 70 percent of what they see and write, and 90 percent of what they do, despite the fact that there is absolutely no evidence to support these suspiciously neat percentages (De Bruyckere, Kirschner, & Hulshof, 2015).

Other neuromyths include the idea that the left side of our brain is analytical and the right side is creative, that you can train your brain with activities like Brain Gym (www.braingym.org), that male and female brains are different, that listening to classical music can improve a child’s cognitive development (the so-called Mozart effect), or that we can learn when we are asleep. None of these is true as far as we know right now (De Bruyckere, Kirschner, & Hulshof, 2015). In fact, we know a great deal about how the brain works and what kinds of activities help students learn, but these findings come from cognitive science rather than neuroscience. Neuroscience, rather, provides plausible explanatory mechanisms for things we already knew from cognitive science. Two leading experts in the field of neuroscience and education, Sergio Della Sala and Mike Anderson (2011), sum it up thus in their “opinionated introduction” to their book, Neuroscience in Education:

While the use of the term “neuroscience” is attractive for education it seems to us that it is cognitive psychology that does all the useful work or “heavy lifting.” The reason for this is straightforward. We believe that for educators, research indicating that one form of learning is more efficient than another is more relevant than knowing where in the brain that learning happens. There is indeed a gap between neuroscience and education. But that gap is not filled by the “interaction” of neuroscientists and teachers (nearly always constituted by the former patronizing the latter) or “bridging” the two fields by training teachers in basic neuroscience and having neuroscientists as active participators in educating children. Rather what will ultimately fill the gap is the development of evidence-based education where that base is cognitive psychology. (p. 3)

Content-Area Knowledge

If training teachers in cognitive neuroscience isn’t going to help, what about increasing teachers’ knowledge of their subjects? After all, surely the more teachers know about their subjects, the more their students will learn.

There is evidence that teachers in countries that are more successful in international comparisons than the United States appear to have stronger knowledge of the subjects they are teaching (Babcock et al., 2010; Ma, 1999), and this, at least in part, appears to be responsible for a widespread belief that teacher professional development needs to be focused on teachers’ knowledge of the subject matter they are teaching.

It is important to note that not all kinds of subject-matter knowledge have the same impact on student progress. A study of German high school mathematics teachers found that students did not make more progress when their teachers had advanced mathematics knowledge (such as knowledge of mathematics studied at university). However, when teachers had a profound understanding of the school-level mathematics they were teaching, then, echoing Heather Hill, Brian Rowan, and Deborah Ball’s (2005) study, students did make more progress (Baumert et al., 2010). Thus, it appears that an in-depth understanding of the curriculum may be more beneficial to student progress than advanced study of a subject on the part of the teacher.

Most studies of the relationship between teacher subject-matter knowledge and student progress, including those by Hill et al. (2005) and Jürgen Baumert et al. (2010) discussed previously, are cross-sectional in nature; researchers look to see whether the teachers whose classes make more progress have higher levels of subject knowledge. However, even if a link is found, it is not clear what this means. It could be that what really matters is general intellectual ability: those with higher intellectual ability may find learning their subject easier and may also make more effective teachers. To rule this out, we need experimental studies, where some teachers work on improving their subject knowledge while others work on something else, and then we compare their students’ progress. Here, the results are rather disappointing.

Summer professional development workshops do increase teachers’ knowledge of their subjects (Hill & Ball, 2004), but most studies that have increased teachers’ subject knowledge find little or no knock-on effect on student achievement. For example, an evaluation of professional development designed to improve second-grade teachers’ reading instruction found that an eight-day content-focused workshop increased teachers’ knowledge of scientifically based reading instruction and also improved the teachers’ classroom practice on one of the three instructional practices that had been emphasized in the professional development (Garet et al., 2008). However, at the end of the following school year, there was no impact on the students’ reading test scores. More surprising, even when the workshop was supplemented with in-school coaching, the effects were the same.

A similar story emerges from an evaluation of professional development for middle school mathematics teachers in seventy-seven schools in twelve districts (Garet et al., 2010). The districts implemented the program as intended, which resulted in an average of fifty-five hours of additional professional development for participants (who had been selected by lottery). Although the professional development had been specifically designed to be relevant to the curricula that teachers were using in their classrooms and did have some impact on teachers’ classroom practice (specifically the extent to which they engaged in activities that elicited student thinking), there was no impact on student achievement, even in the specific areas on which the intervention focused (ratio, proportion, fractions, percentages, and decimals). A study that attempted to improve mathematics and science learning in early years teaching found that increasing teachers’ subject knowledge had no impact on student achievement (Piasta, Logan, Pelatti, Capps, & Petrill, 2015).

These findings are clearly counterintuitive. It seems obvious that teachers need to know about the subjects they are teaching, yet the relationship between teachers’ knowledge of the subjects and their students’ progress is weak. Attempts to improve student outcomes by increasing teachers’ subject knowledge appear to be almost entirely failures.

Of course, these failures could be due to our inability to capture the kinds of subject knowledge that are necessary for good teaching, but they suggest that there is much more to good teaching than just knowing the subject. We know that teachers make a difference, but we know much less about what makes the difference in teachers. However, there is a body of literature that shows a large impact on student achievement across different subjects, across different age groups, and across different countries, and that is the research on formative assessment.

The Origins of Formative Assessment

Polymath and academic philosopher Michael Scriven coined the term formative evaluation in 1967 to describe the role that evaluation could play “in the on-going improvement of the curriculum” (p. 41). He contrasted this with summative evaluation. Summative evaluation’s job is:

To enable administrators to decide whether the entire finished curriculum, refined by use of the evaluation process in its first role, represents a sufficiently significant advance on the available alternatives to justify the expense of adoption by a school system. (Scriven, 1967, pp. 41–42)

Two years later, Benjamin Bloom (1969) applied the same distinction to classroom tests:

Quite in contrast is the use of “formative evaluation” to provide feedback and correctives at each stage in the teaching-learning process. By formative evaluation we mean evaluation by brief tests used by teachers and students as aids in the learning process. While such tests may be graded and used as part of the judging and classificatory function of evaluation, we see much more effective use of formative evaluation if it is separated from the grading process and used primarily as an aid to teaching. (p. 48)

Bloom (1969) went on to say, “Evaluation which is directly related to the teaching-learning process as it unfolds can have highly beneficial effects on the learning of students, the instructional process of teachers, and the use of instructional materials by teachers and learners” (p. 50).

Although educators used the term formative infrequently in the twenty years after Bloom’s (1969) research, a number of studies investigated ways of using assessment to inform instruction, the best known of which is cognitively guided instruction (CGI).

In the original CGI project, a group of twenty-one elementary school teachers participated, over a period of four years, in a series of workshops that showed teachers extracts of videotapes selected to illustrate critical aspects of children’s thinking. The researchers then prompted teachers to reflect on what they had seen, by, for example, challenging them to relate the way a child had solved one problem to how he or she had solved or might solve other problems (for a summary of the whole project, see Fennema et al., 1996). Throughout the project, researchers encouraged the teachers to make use of the evidence they had collected about the achievement of their students to adjust their instruction to better meet their students’ learning needs. Students taught by CGI teachers did better in number fact knowledge, understanding, problem solving, and confidence (Carpenter, Fennema, Peterson, Chiang, & Loef, 1989), and four years after the end of the program, the participating teachers were still implementing the principles of the program (Franke, Carpenter, Levi, & Fennema, 2001).

The power of using assessment to adapt instruction was vividly illustrated in a 1991 study of the implementation of the measurement and planning system (MAPS), in which 29 teachers, each with an aide and a site manager, assessed the readiness for learning of 428 kindergarten students (Bergan, Sladeczek, Schwarz, & Smith, 1991). The researchers tested students in mathematics and reading in the fall and again in the spring. Their teachers learned to interpret the test results and to use the classroom activity library—a series of activities typical of early grades instruction but tied specifically to empirically validated developmental progressions—to individualize instruction. The researchers then compared these students’ performances with the performances of 410 other students taught by 27 different teachers. At the end of the year, 27 percent of the students in the control group were referred for placement and 20 percent were actually placed into special education programs for the following year. In the MAPS group, only 6 percent were referred, and fewer than 2 percent were placed in special education programs.
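The percentages above can be turned into approximate head counts with a short Python sketch. The group sizes and percentages are those quoted in the text; the rounding and the helper name are illustrative assumptions, and because the text reports only "fewer than 2 percent" placed in the MAPS group, that figure is treated as an upper bound.

```python
def approx_count(group_size: int, percent: float) -> int:
    """Approximate number of students corresponding to a reported percentage."""
    return round(group_size * percent / 100)


control_size, maps_size = 410, 428  # group sizes reported in the text

control_referred = approx_count(control_size, 27)  # about 111 students
control_placed = approx_count(control_size, 20)    # 82 students
maps_referred = approx_count(maps_size, 6)         # about 26 students
maps_placed_max = approx_count(maps_size, 2)       # upper bound: "fewer than 2 percent"

# Ratio of referral rates, control group relative to MAPS group.
referral_ratio = 27 / 6

print(control_referred, control_placed, maps_referred, maps_placed_max)  # 111 82 26 9
print(f"Control students were referred {referral_ratio:.1f} times as often")
```

In round numbers, about 82 students in the control group were placed in special education, compared with fewer than 9 in the MAPS group, even though the two groups were of similar size.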

In addition to these specific studies, in the late 1980s, a number of research reviews began to highlight the importance of using assessment to inform instruction. A review by Lynn Fuchs and Douglas Fuchs (1986) synthesized findings from twenty-one different research studies on the use of assessment to inform the instruction of students with special needs. They found that regular assessment (two to five times per week) with follow-up action produced a substantial increase in student learning. A couple of other findings were notable. First, some studies required teachers, before assessing their students, to make up systematic evaluation rules that would tell the teachers when or how they were to make changes to the instructional plans they had made for their students. In other studies, teachers made judgments about what instructional changes might be needed only after seeing their students’ results. Both strategies increased student achievement, but the benefit was twice as great when the teachers used rules, rather than judgments, to determine what to do next. Second, when teachers tracked their students’ progress with graphs of individual students’ achievements as a guide and stimulus to action, the effect was almost three times as great as when they didn’t track progress. These findings suggest that when teachers rely on evidence to make decisions about what to do next, students learn more.

Over the next two years, two further research reviews, one by Gary Natriello (1987) and the other by Terence Crooks (1988), provided clear evidence that while classroom assessment can improve learning, it can also have a substantial negative impact on student achievement. Natriello (1987) concludes that much of the research he reviewed was difficult to interpret because of a failure to make key distinctions in the design of the research (for example, between the quality and quantity of feedback). As a result, although some studies showed that assessment could be harmful, it was not clear why. He also points out that assessments serve a number of different purposes in schools, and many of the studies showed that assessment that is designed for selecting students (for example, by giving a grade) is not likely to improve student achievement as much as assessment that is specifically designed to support learning. Crooks’s (1988) paper focuses specifically on the impact of assessment practices on students and concludes that although classroom assessments do have the power to influence learning, too often the use of assessments for summative purposes (grading, sorting, and ranking students) gets in the way.

In 1998, Paul Black and I sought to update Natriello’s and Crooks’s reviews. One of the immediate difficulties we encountered was how to define the field of study. Their reviews cited 91 and 241 references respectively, and yet only 9 references were common to both papers. Neither paper cited Fuchs and Fuchs’s (1986) review. While many reviews of research use electronic searches to identify relevant studies, we found that the keywords used by authors were just too inconsistent to be of much help. Where one researcher might use the term formative assessment, another might use formative evaluation, and a third might use responsive teaching. In the end, we decided that there was no alternative to actually going into the library and physically examining each issue, from 1987 to 1997, of seventy-six education and psychology journals that we thought most likely to contain relevant research studies. We read the abstracts, and where those looked relevant, read the studies. Through this process, we found just over 600 studies related to the field of classroom assessment and learning, of which approximately 250 were directly relevant.
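How little the two earlier reviews overlapped can be made concrete with a small Python sketch. The counts are those quoted above; the function and the choice of summary measures are illustrative assumptions rather than anything reported in the reviews themselves.

```python
def overlap_summary(refs_a: int, refs_b: int, shared: int) -> dict:
    """Summarize how much two reference lists overlap, given only their sizes."""
    if shared > min(refs_a, refs_b):
        raise ValueError("shared count cannot exceed either list")
    union = refs_a + refs_b - shared  # distinct references across both reviews
    return {
        "distinct_references": union,
        "share_of_a": shared / refs_a,
        "share_of_b": shared / refs_b,
        "share_of_union": shared / union,
    }


# Counts quoted in the text: 91 (Natriello), 241 (Crooks), 9 in common.
summary = overlap_summary(refs_a=91, refs_b=241, shared=9)
print(summary["distinct_references"])      # 323
print(f"{summary['share_of_union']:.1%}")  # 2.8%
```

Fewer than 3 percent of the 323 distinct references appear in both reviews, which illustrates how loosely the field was defined at the time.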

We did consider at this point conducting a formal meta-analysis of the studies we had identified, but we quickly realized that with such a diverse range of studies, meta-analysis would simply not be appropriate (see Wiliam, 2016, for an extended analysis of the problems with meta-analysis in education). Instead we conducted what we might call a configurative review (Gough, 2015) because our main purpose was to make sense of the field rather than to quantify the effects of classroom assessment processes on students. However, many of the studies that we reviewed provided considerable evidence that attention to classroom assessment processes could substantially increase the rate of student learning, in some cases effectively doubling the speed of student learning. We realized that because of the diversity of the studies, there was no simple recipe to easily apply in every classroom, but we were confident we had identified some fruitful avenues for further exploration:

Despite the existence of some marginal and even negative results, the range of conditions and contexts under which studies have shown that gains can be achieved must indicate that the principles that underlie achievement of substantial improvements in learning are robust. Significant gains can be achieved by many different routes, and initiatives here are not likely to fail through neglect of delicate and subtle features. (Black & Wiliam, 1998a, pp. 61–62)

While we did not conduct a formal meta-analysis, in a subsequent publication (Black & Wiliam, 1998b), we did try to provide some indication for practitioners and policymakers of the likely potential benefits of formative assessment. We suggested that the effective use of formative assessment would increase achievement by between 0.4 and 0.7 standard deviations, which would be equivalent to a 50 to 70 percent increase in the rate of student learning (see Wiliam, 2006, for details).

While we were confident that the research evidence that we had compiled made a compelling case for making classroom formative assessment a priority, we were not sure that these ideas could be implemented in real classrooms, especially where students were regularly subjected to external standardized tests, and where teachers were held accountable for their students’ achievement.

We therefore recruited twenty-four (later expanded to thirty-six) secondary school mathematics and science teachers in six schools in two districts in England to help us explore what classroom formative assessment might look like in classrooms (Black, Harrison, Lee, Marshall, & Wiliam, 2003). The work with teachers had two main components. The first was a series of eight workshops over an eighteen-month period, which introduced teachers to the research base underlying how assessment can support learning, allowed them the opportunity to develop their own plans for implementing formative assessment practices, and, at later meetings, provided them the time to discuss with colleagues the changes they had attempted to make in their practice. Most of the teachers’ plans contained reference to two or three important areas in their teaching in which they were seeking to increase their use of formative assessment, generally followed by details of techniques that they could use to make this happen. The second component was a series of visits to the teachers’ classrooms, so that the researchers could observe teachers implementing some of the ideas they had discussed in the workshops and could discuss how their ideas could be put into practice more effectively.

Because each teacher had made his or her own decisions about what aspect of formative assessment to emphasize and which classes to try it with, it was impossible to use a traditional experimental design to evaluate the effects of our intervention. Therefore, we designed a poly-experiment. For each class with which a teacher was trying out formative assessment techniques, we looked for the most similar comparison class and set up a mini-experiment in which we compared the test scores of the class that was using formative assessment with the test scores of the comparison class. In some cases, this was a parallel class taught by the same teacher; in some cases, it was a similar class that the same teacher had taught in a previous year; and in other cases, it was a similar class taught by a different teacher. This experimental design is not as good as a random-allocation trial, because the teachers participating in the experiment might have been better teachers to begin with, and so the results need to be interpreted with some caution. Nevertheless, in this study, using scores on externally scored standardized tests, the students with whom the teachers had used formative assessment techniques made twice as much progress over the year (Wiliam, Lee, Harrison, & Black, 2004).
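To make the logic of such a poly-experiment concrete, here is a minimal sketch of how one might summarize a set of mini-experiments: each pairs a formative assessment class with its comparison class, and the difference in mean test scores is expressed in pooled standard deviation units. The class names, the scores, and the choice of a pooled-standard-deviation effect size are all illustrative assumptions; this is not the analysis reported in the study.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_difference(treated, comparison):
    """Difference in mean scores, in pooled standard deviation units (Cohen's d)."""
    n1, n2 = len(treated), len(comparison)
    s1, s2 = stdev(treated), stdev(comparison)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(comparison)) / pooled_sd


# Hypothetical data: each mini-experiment is a pair of score lists,
# (formative assessment class, most similar comparison class).
mini_experiments = {
    "parallel_class_same_teacher": ([62, 70, 75, 68, 80], [60, 65, 72, 66, 71]),
    "same_teacher_previous_year": ([55, 61, 67, 72, 58], [54, 59, 63, 70, 52]),
    "similar_class_other_teacher": ([71, 77, 69, 83, 75], [70, 74, 66, 79, 73]),
}

# One effect size per mini-experiment, then a simple unweighted average.
effects = {
    name: standardized_difference(treated, comparison)
    for name, (treated, comparison) in mini_experiments.items()
}
average_effect = mean(list(effects.values()))

for name, effect in effects.items():
    print(f"{name}: {effect:.2f}")
print(f"average across mini-experiments: {average_effect:.2f}")
```

Because each comparison is only as good as the match between the two classes, an average of this kind inherits the caution noted above: it describes the pattern across mini-experiments, not a randomized estimate.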

Definitions of Formative Assessment

As the evidence that formative assessment can have a significant impact on student learning accumulates, many researchers have proposed a variety of definitions of formative assessment. In our original review, Paul Black and I (1998a) define formative assessment “as encompassing all those activities undertaken by teachers, and/or by their students, which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged” (p. 7). Writing around the same time, Bronwen Cowie and Beverley Bell (1999) qualify this slightly by requiring that teachers and students act upon the information from the assessment while learning takes place. They define formative assessment as “the process used by teachers and students to recognise and respond to student learning in order to enhance that learning, during the learning” (Cowie & Bell, 1999, p. 32, emphasis added). Others also emphasize the need for action during instruction and define formative assessment as “assessment carried out during the instructional process for the purpose of improving teaching or learning” (Shepard et al., 2005, p. 275). Reviewing practice across eight countries, OECD defines formative assessment as “frequent, interactive assessments of students’ progress and understanding to identify learning needs and adjust teaching appropriately” (as cited in Looney, 2005, p. 21).

What is notable about these definitions is that, however implicitly, they regard formative assessment as a process. Others tend to regard formative assessment as a tool. For example, Stuart Kahl (2005), cofounder of Measured Progress, defines formative assessment as “a tool that teachers use to measure student grasp of specific topics and skills they are teaching. It’s a ‘midstream’ tool to identify specific student misconceptions and mistakes while the material is being taught” (p. 11). Indeed, it appears educators more often use formative assessment to refer to a particular kind of assessment instrument than a process to improve instruction.

The difficulty with trying to make the term formative assessment apply to a thing (the assessment itself) is that it just does not work. Consider an advanced placement (AP) calculus teacher who is getting her students ready to take their examination. Like many teachers, she has her students take a practice examination under formal test conditions. Most teachers would then collect the papers, score them, write comments for the students, and return the papers to the students so that they could see where they went wrong. However, this calculus teacher does something slightly different. She collects the papers at the end of the examination, but she does not score them. Instead, during her next period with the class, each group of four students receives its unscored papers and one blank examination paper, and has to compile the best composite examination paper response that it can. Within each group, the students review their responses, comparing their answers to each question and discussing what the best answer would be. Toward the end of the period, the teacher reviews the activity with the whole class, asking each group to share its agreed-on answers with the rest of the class.

The AP calculus assessment that the teacher uses was designed entirely for summative purposes. The College Board designs AP exams to confer college-level credit so that students passing the exams at a suitable level are exempt from introductory courses in college. However, this teacher uses the assessment instrument formatively, in what Black and I have called “formative use of summative tests” (Black et al., 2003, p. 53). Describing an assessment as formative is, in fact, what philosopher Gilbert Ryle (1949) calls a category error: the error of ascribing to something a property that it cannot have, like describing a rock as happy. Because the teacher can use the same assessment both formatively and summatively, the terms formative and summative make much more sense as descriptions of the function that assessment data serve, rather than of the assessments themselves (Wiliam & Black, 1996).

Some people (for example, Popham, 2006; Shepard, 2008) call for the term formative assessment not to be used at all, unless instruction is improved. In the United Kingdom, the Assessment Reform Group argues that using assessment to improve learning requires five elements to be in place (as cited in Broadfoot et al., 1999).

1. Providing effective feedback to students

2. Actively involving students in their own learning

3. Adjusting teaching to take into account the assessment results

4. Recognizing the profound influence assessment has on students’ motivation and self-esteem, both of which are crucial influences on learning

5. Needing students to be able to assess themselves and understand how to improve

The group suggests that formative assessment—at least in the way many people use it—is not a helpful term for describing such uses of assessment because, as it says, “the term ‘formative’ itself is open to a variety of interpretations and often means no more than that assessment is carried out frequently and is planned at the same time as teaching” (Broadfoot et al., 1999, p. 7). Instead, it suggests that it would be better to use the phrase assessment for learning.

The earliest use of the term assessment for learning appears to be in the book Assessment for Learning in the Mentally Handicapped (Mittler, 1973). Harry Black (1986) used the term as the title of a chapter in the book Assessing Educational Achievement, and Mary James brought it to a wider audience as the title of a paper presented at the 1992 annual conference of the Association for Supervision and Curriculum Development in New Orleans (James, 1992), but the term has become common in North America as a result of the work of Rick Stiggins, who adopted the term assessment for learning as being very different from formative assessment.

In the United States, for many years, educators used the term formative assessment to describe a process for monitoring student achievement. Students took assessments at regular periods (typically four to ten weeks), and teachers then looked at the resulting data to determine which students were making sufficient progress and which were not. Where students were not making sufficient progress, teachers would investigate what might be done to improve progress (such assessments are also called benchmark assessments or interim assessments).

Now it is important to realize that monitoring student progress is a good thing to do. Any well-run organization should be able to monitor its progress toward its goals. As W. Edwards Deming is reputed to have said, “In God we trust. All others bring data” (Hastie, Tibshirani, & Friedman, 2009, p. vii). However, if formative assessment merely identifies which students are falling behind, then it limits the impact on student achievement. It is in response to this limited view of formative assessment that Rick Stiggins (2005), founder of the Assessment Training Institute, writes:

If formative assessment is about more frequent, assessment FOR learning is about continuous. If formative assessment is about providing teachers with evidence, assessment FOR learning is about informing the students themselves. If formative assessment tells users who is and who is not meeting state standards, assessment FOR learning tells them what progress each student is making toward meeting each standard while the learning is happening—when there’s still time to be helpful. (pp. 1–2)

However, just replacing the term formative assessment with the term assessment for learning merely clouds the definitional issue (Bennett, 2011). What really matters is what kind of processes we value, not what we call them. The problem, as researcher Randy Bennett (2011) points out, is that it is an oversimplification to say that formative assessment is only a matter of process or only a matter of instrumentation. Good processes require good instruments, and instruments are useless unless teachers use them intelligently.

The original, literal meaning of the word formative, according to Merriam-Webster’s Online Dictionary, is “capable of alteration by growth and development” (“formative,” 2017). This suggests that formative assessments should shape instruction—our formative experiences are those that have shaped our current selves—and so we need a definition that can accommodate all the ways in which assessment can shape instruction. And there are many. Consider the following eight scenarios.

1. In spring 2016, a science curriculum supervisor needs to plan the summer workshops that the district will offer to its middle school science teachers. She analyzes the scores the district’s middle school students obtained on the 2015 state tests and notes that while the science scores are generally comparable to those of the rest of the state, the students in her district appear to be scoring rather poorly on items involving physical sciences when compared with those testing life sciences. She decides to make physical science the focus of the professional development activities offered in summer 2016, which are well attended by the district’s middle school science teachers. Teachers return to school in fall 2016 and use the revised instructional methods they have developed over the summer. As a result, when students take the state test in spring 2017, the achievement of middle school students in the district on items involving physical sciences increases, and so the district’s performance on the state tests, reported in summer 2017, improves.

2. Each year, a group of algebra 1 high school teachers reviews students’ performance on a statewide algebra 1 test and, in particular, looks at the facility (proportion correct) for each item on the test. When item facilities are lower than the group expects, the group looks at how teachers prepared and delivered instruction on that aspect of the curriculum and considers ways in which teachers can strengthen the instruction in the following year.

3. A school district administers a series of interim tests, tied to the curriculum, at intervals of six to ten weeks to check on student progress. The district uses past experience to determine a threshold that gives students an 80 percent chance of passing the state test, and requires students whose interim test scores fall below the threshold to attend additional instruction on Saturday mornings.

4. Since 2003, the School District of Philadelphia has mandated a core curriculum that includes a tightly sequenced planning and scheduling timeline, in which the school year is divided up into a number of six-week cycles. In each six-week cycle, the district expects teachers to use the first five weeks for instruction, at the end of which students take a multiple-choice test, which the teachers can use to determine how to spend the final week of the cycle. If students have done well, teachers typically schedule enrichment and enhancement activities, but if there are significant weaknesses in students’ understanding, the final week becomes a “re-teaching week” (Oláh, Lawrence, & Riggan, 2010).

5. A middle school science teacher is designing a unit on pulleys and levers. She allocates fourteen periods to the unit, but plans to cover all the content in the first eleven periods. Building on ideas common in Japan (see, for example, Lewis, 2002), in period twelve, the teacher gives the students a quiz and collects the papers. Instead of grading the papers, she reads through them carefully, and based on what she discovers about what the class has and has not learned, she plans appropriate remedial activity for periods thirteen and fourteen.

6. A history teacher has been teaching about the issue of bias in historical sources. Three minutes before the end of the lesson, students pack away their books and receive an index card on which the teacher asks them to respond to the question “Why are historians concerned about bias in historical sources?” The students turn in these exit passes as they leave the class at the end of the period. After all the students leave, the teacher reads through the cards and then discards them, concluding that the students’ answers indicate a good-enough understanding for the teacher to move on to a new chapter.

7. A language arts teacher has been teaching his students about different kinds of figurative language. Before moving on, he wants to check his students’ understanding of the terms he has been teaching, so he uses a real-time test. The teacher gives each student a set of six cards bearing the letters A, B, C, D, E, and F; and on the board, he displays the following.

A. Alliteration

B. Onomatopoeia

C. Hyperbole

D. Personification

E. Simile

F. Metaphor

He then reads a series of statements.

• This backpack weighs a ton.

• He was as tall as a house.

• The sweetly smiling sunshine melted all the snow.

• He honked his horn at the cyclist.

• He was a bull in a china shop.

After the teacher reads each statement, he asks the class to hold up a letter card (or cards) to indicate which kind or kinds of figurative language appear in each of the five statements. All students respond correctly to the first question, but in responding to the second, each student holds up a single card (some hold up E, and some hold up C). The teacher reminds the class that some statements might be more than a single type of figurative language. Once the students realize that there can be more than one answer, the class responds correctly to statements two, three, and four. About half the students, however, indicate that they think statement five is a simile. The teacher then leads a whole-class discussion during which students give their reasons for why they think statement five is a simile or a metaphor, and after a few minutes, all the students agree that it is a metaphor, because the statement does not include like or as.

8. An AP calculus teacher has been teaching students about graph sketching and wants to check quickly that the students have grasped the main principles. She asks the students, “Please sketch the graph of $y = \frac{1}{1 + x^2}$.” Each student sketches the graph on a whiteboard and holds it up for the teacher to see. The teacher sees that the class understands and moves on.
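The item-by-item review described in scenario 2 can be sketched in a few lines. The example below computes the facility (proportion correct) of each test item from a table of scored responses and flags items whose facility falls below what the teachers expected. The response data, the expected facilities, and the function names are hypothetical, chosen only to illustrate the calculation.

```python
def item_facilities(responses):
    """Return the proportion of students answering each item correctly.

    `responses` holds one row per student; each row has one 0/1 flag per
    item, where 1 means the student answered that item correctly.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[item] for student in responses) / n_students
        for item in range(n_items)
    ]


def items_below_expectation(facilities, expected):
    """Return the (zero-based) indexes of items whose facility is lower than expected."""
    return [
        item
        for item, (actual, target) in enumerate(zip(facilities, expected))
        if actual < target
    ]


# Hypothetical scored responses: four students, four items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
# Hypothetical facilities the teacher group expected for each item.
expected = [0.9, 0.7, 0.5, 0.6]

facilities = item_facilities(responses)
flagged = items_below_expectation(facilities, expected)

print(facilities)  # [1.0, 0.75, 0.25, 0.75]
print(flagged)     # [2]
```

Here only the third item falls short of expectations, so that is the part of the curriculum whose instruction the group would review for the following year.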

In each of these eight examples, educators elicit and interpret evidence of student achievement and use it to make a decision about what to do next, but whether this is enough to make each of these an example of formative assessment is a matter of some debate.

I often ask teachers which of these eight cases they would regard as formative, and there is rarely any consensus. In example 1, assessment modifies instruction, especially if you regard the supervisor as the teacher and the teachers as her students, but many people are unhappy that it is two years before the changes occur. Similar concerns are raised about example 2, especially since the students on whom the data are collected do not benefit from the process, and, moreover, it is not clear that next year’s algebra 1 students will have the same problems. Example 3 raises concerns for many teachers that assessment is being used in a punitive way, but as Harvard economist Roland Fryer (2014) points out, some students need more instructional time to reach proficiency on their state’s standards. He calls this the basic physics of education: “If your students are falling behind, you have two choices: spend more time in school or convince the high-performing schools to give their kids four-day weekends. The key is to change the ratio.”

Requiring students to attend additional classes on Saturdays may not be ideal, especially in rural areas where transportation requirements create additional difficulties. However, a school that provides additional instruction on Saturdays for students who need it has at least found a solution to the problem of how it is going to get more instructional input to the students who need it. Any school that hopes to “close the gap” without having a way of getting more instructional input to the students who need it is just paying lip service to the idea of equality.

Example 4 is interesting because it is a districtwide policy in which the formative assessment process is hardwired into the school year. To create the slack needed for the system to work, teachers have to prioritize content, which is difficult because teachers and administrators are generally told that all the state standards are essential. The problem is that most state standards contain so much material that only the fastest-learning students have any chance of mastering the required material in the time available. Robert Marzano and his colleagues asked teachers how much time they would need to cover all the content in their state standards for each year, and the average figure was twenty months (Marzano, Kendall, & Gaddy, 1999).

While that figure may have been reduced somewhat in states that have adopted variants of the Common Core State Standards, the fact is that most states specify considerably more content for their students to learn than can possibly be achieved by most students in the time available. The teacher could, of course, cover the required material at a rate that guarantees all the standards are covered, but that would mean that most students would be floundering. In Philadelphia’s school system, as discussed previously, teachers have to make choices about which of the standards are essential and which are desirable, teach the prioritized standards, and then assess. If students have made enough progress, then the teacher can spend some of the “re-teaching week” on new material, but if a substantial number of students have not made enough progress on the essential standards, they remain the priority. The defining feature of this system is that the teacher does not know what he or she will be teaching in the re-teaching week until he or she sees how the students have done on the assessment.

Most teachers, in my experience, are happy to regard example 5 as an example of formative assessment, although some teachers suggest that twelve periods is a long time to wait to find out whether students are learning anything. On the other hand, many teachers are disturbed by example 6 because the teacher discarded the students’ responses, rather than giving students individual feedback. However, this rather misses the point: the teacher used exit passes in this situation because she did not want to give students individual feedback. Her aim was to decide how to start the next lesson, as I discovered when I asked her why she had discarded the cards.

Me: Why did you discard the exit passes?

Teacher: Because I know where to start tomorrow’s lesson.

Me: What did you decide?

Teacher: They mostly got it right. I’m moving on.

Me: What would you have done if they weren’t ready to move on?

Teacher: I would have taught it again, but slower and louder. I’m joking of course. I would have taught it again, but in a different way.

Me: What would you have done if half the students had answered correctly and half of the students had answered incorrectly?
