Embedded Formative Assessment - Dylan Wiliam

Chapter 1

Why Educational Achievement Matters

Educational achievement matters—more now than at any time in the past. It matters for individuals, and it matters for society. For individuals, higher levels of education mean higher earnings, better health, and increased life span. For society, higher levels of education mean lower health care costs, lower criminal justice costs, and increased economic growth. In this chapter, we will explore why education and educational achievement are vital to the prosperity of every nation and why the vast majority of attempts by policymakers to improve the achievement of school students have failed. We will then discuss three generations of school effectiveness research, the impact of teacher quality, and research-proven ways to increase teacher quality.

The Increasing Importance of Educational Achievement

Education has always been important, but it has never been as important as it is now. In 1979, the median salary of those with bachelor’s degrees was $30,000 higher than the median salary of those with a high school diploma or GED (in constant 2012 dollars). By 2012, the annual earnings gap had widened to over $58,000 (Autor, 2014).

Higher levels of education are also associated with better health; people with more education are less susceptible to a whole range of diseases, including cancer, and are less likely to have a significant period of disability toward the end of their lives (Jagger et al., 2007). No doubt this is partly due to lifestyle choices, such as smoking, but it is also due in part to the kinds of work that are available to those with limited education. According to the Organisation for Economic Co-operation and Development (OECD, 2010), approximately 75 percent of American adults who did not complete high school say they are in good health, compared with 95 percent of those with college degrees.

Perhaps more surprisingly, people with more education live longer. Between 1915 and 1939, at least thirty states changed their child labor laws and periods of compulsory schooling. As a result, a number of students were required to attend school for one more year than children in other states. By looking at the life spans of those who had been required to attend an extra year of school, Adriana Lleras-Muney (2005) estimates that each additional year of schooling adds 1.7 years to one’s life. These are high stakes indeed.

Educational achievement also matters for society. Henry Levin and his colleagues at Columbia University estimate that preventing one high school dropout produces a net benefit to society of $209,000 (Levin, Belfield, Muennig, & Rouse, 2007). The main components of this total are:

• $139,000 in extra taxes the individual would pay because he or she would be earning more money

• $40,500 in reduced health care costs, partly because the individual would be healthier, as noted previously, but also partly because he or she would be more likely to get health benefits from an employer and, therefore, be less dependent on public assistance

• $26,600 in reduced criminal justice costs (largely because the individual would be less likely to be incarcerated)

With higher levels of education, the U.S. economy will also grow faster. Using data from OECD’s Programme for International Student Assessment (PISA) and a variety of other sources, Eric Hanushek and Ludger Woessmann (2015) examine the impact of increased achievement on economic growth. They estimate that if educators could raise the scores of American fifteen-year-olds on the triennial PISA tests by twenty-five points—the improvement that Poland made over a period of ten years—then by 2095, the U.S. economy would be 30 percent larger than it would otherwise be. And if American fifteen-year-olds could achieve a level of reading and mathematics that allowed them to participate effectively in modern society (defined by a score of 420 on PISA as compared with the international average of 500), the U.S. economy would grow by an extra $30 trillion (Hanushek & Woessmann, 2015).

Higher levels of education are so valuable because employers’ educational demands are increasing steadily, nowhere more so than in manufacturing. According to the U.S. Bureau of Labor Statistics (2016), seventeen million Americans worked in manufacturing in 2000. Ten years later, the figure was less than twelve million. This means that in the first decade of the 21st century, the U.S. economy lost roughly 1,500 manufacturing jobs every day (more than five million jobs spread over about 3,650 days). It is common to hear people say, “We don’t make stuff in America anymore,” but the sentiment is wrong. It turns out that more goods were manufactured in the United States in 2016 than at any other point in its history, surpassing the previous peak reached in 2008 (Federal Reserve Bank of St. Louis, n.d.). The United States makes more stuff than it has ever made before. It just doesn’t use so many humans to do it, and that’s a good thing. Across all U.S. manufacturing, between 2002 and 2015, manufacturing output per hour of labor went up by 47 percent (Levinson, 2016). The average American worker employed in manufacturing in 2016 was more than six times as productive per hour as a worker in 1950, the heyday of American manufacturing. American workers are so much more productive because they can work with more sophisticated technology, but this means that modern workers need higher levels of skill. Almost half of the manufacturing jobs that did not require a high school diploma in 2000 were gone by 2015, while the number of manufacturing jobs that required at least a master’s degree rose by 32 percent (Levinson, 2016).

One common reaction to such changes in the working world is the fear that we are going to run out of jobs. Many people believe that there are only a certain number of jobs to go around, and if some of these jobs are destroyed, then there will not be enough work for everyone. This is an incorrect, albeit common, belief. Despite the millions of jobs that were lost in manufacturing, according to the Bureau of Labor Statistics, there are more people working in the United States than at any time in its history—160 million as of May 2017 (U.S. Bureau of Labor Statistics, 2017).

Interestingly, many of the new jobs being created do not demand much in the way of educational qualifications. In 2013, the U.S. Bureau of Labor Statistics projected that between 2012 and 2022, the U.S. economy will create just over four million new jobs for those with college degrees, but it will also create three million that require some education beyond high school but not a college degree, four million that require only a high school diploma, and another four million jobs that do not even require a high school diploma (U.S. Bureau of Labor Statistics, 2013). As far as can be seen, there will be jobs for people whatever their level of education in the United States. It is therefore probably not accurate to say to young people that they need to get a good education to get a job, but it does seem that education will be important to getting a good job.

We have already seen this in the changes that have occurred since the late 1990s. The greatest job destruction has not been for the lowest-skilled workers; rather, it has been for those doing routine jobs, whatever the skill level. And because computers are simpler and less expensive than robots, things like routine office work—what economists call routine cognitive jobs—have been easier to automate than manual work (Dvorkin, 2016). We think playing chess is an amazing human achievement, but for a few dollars, you can now buy a smartphone app that will beat most humans on the planet. What employers haven’t been able to do yet is to use robots to stack shelves in Walmart in a cost-effective way, which is why humans do the job right now. But the message since the Industrial Revolution has been that as soon as a job can be done cost-effectively by a machine, it will be.

If having a valued skill no longer guarantees employment, then the only way to be sure of being employable is to be able to develop new skills, as Seymour Papert (1998) observes:

So the model that says learn while you’re at school, while you’re young, the skills that you will apply during your lifetime is no longer tenable. The skills that you can learn when you’re at school will not be applicable. They will be obsolete by the time you get into the workplace and need them, except for one skill. The one really competitive skill is the skill of being able to learn. It is the skill of being able not to give the right answer to questions about what you were taught in school, but to make the right response to situations that are outside the scope of what you were taught in school. We need to produce people who know how to act when they’re faced with situations for which they were not specifically prepared.

This is why education—as opposed to training—is so important. Not only does education confer skills, but it also produces the ability to develop new skills.

The fundamental idea that education is the engine of future economic prosperity has been understood for many years, but studies have shown just how much education increases economic growth (or, conversely, just how much economic growth is limited by low educational achievement).

The Difficulties of Raising Student Achievement

Successive governments have understood the importance of educational achievement and have sought to raise standards through a bewildering number of policy initiatives. Although most of these seemed like sensible measures at the time, the depressing reality is that the net effect of the vast majority of these measures on student achievement has been close to, if not actually, zero.

A number of reform efforts have focused on the structures of schooling. In the United States, particular attention was given to reducing school size. The logic was simple: many high schools are very large and impersonal, and so the creation of smaller high schools should create more inclusive learning communities, which should then result in better learning. Advocates for smaller high schools also pointed to evidence that in many states, the highest test scores were in small high schools. However, they forgot to look at the other end of the distribution. The lowest test scores were also in small high schools (Wainer & Zwerling, 2006). The evidence suggests that small schools aren’t any better, on average. They are just more likely to have extreme results—whether high or low—because they are small (Kahneman, 2011). The fewer students there are in a class, the greater the chance that, in a particular year, the students happen to be either very strong or very weak academically. In fact, the evidence suggests that smaller high schools are actually less effective than larger ones because teachers have to teach a range of courses, and therefore have fewer opportunities to specialize. As one high school student in Seattle summarizes, “There’s just one English teacher and one mathematics teacher. They end up teaching things they don’t really know” (Geballe, 2005).
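The statistical point about small schools can be illustrated with a short simulation. The following is only a sketch under stated assumptions, not an analysis from any of the cited studies: every student, in every school, is drawn from the same score distribution, so no school is "better" than any other, and the school sizes (100 and 1,600 students) and the number of schools are made-up values chosen for illustration.

```python
import random
import statistics

def simulate_school_means(n_schools, school_size, rng):
    """Return the mean score of each simulated school.

    Every student is drawn from the same N(0, 1) distribution, so any
    difference between school means is due to chance alone.
    """
    means = []
    for _ in range(n_schools):
        scores = [rng.gauss(0.0, 1.0) for _ in range(school_size)]
        means.append(statistics.fmean(scores))
    return means

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sizes: 200 small schools and 200 large schools.
small = simulate_school_means(200, 100, rng)
large = simulate_school_means(200, 1600, rng)

# Spread of school averages: small schools vary far more from one to the next.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

# Rank all 400 schools together and look at both ends of the league table.
ranked = sorted([(m, "small") for m in small] + [(m, "large") for m in large])
bottom_20 = [label for _, label in ranked[:20]]
top_20 = [label for _, label in ranked[-20:]]

print(f"Spread of small-school averages: {spread_small:.3f}")
print(f"Spread of large-school averages: {spread_large:.3f}")
print(f"Small schools among the top 20:    {top_20.count('small')}")
print(f"Small schools among the bottom 20: {bottom_20.count('small')}")
```

In a run like this, small schools crowd both the top and the bottom of the ranking even though, by construction, school size has no effect on learning. Looking only at the top of the table would therefore give a misleading impression.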

The creation of smaller high schools can also be rather inefficient. In many cases, large high schools of around three thousand students are divided into five or six smaller high schools, each with five hundred or six hundred students but housed in the same building. Often, in such cases that I have seen, the only change is increased administrative costs, as a result of appointing a new principal for each of the six small high schools and increasing the compensation of the existing principal for looking after six newly appointed junior principals.

In other cases, students have not experienced all the potential benefits of small high schools because leaders assumed the creation of small high schools was an end in itself, rather than a change in structure that would make other needed reforms easier to achieve. One benefit leaders hoped for was that smaller high schools would improve staff-student relationships, and with improved relationships, students would become more engaged in their learning. Students would interact with a smaller number of teachers, thus fostering the development of better staff-student relationships. This may well be effective, although it should be said that getting students engaged so that they can be taught something seems much less efficient than getting them engaged by teaching them something that engages them. But every student would still have a language arts teacher, a mathematics teacher, a science teacher, a social studies teacher, and so on. The size of a high school does not affect the number of teachers a student meets in a day. Staff-student relationships can grow stronger if teachers loop through with their students so that the same teacher teaches a class for more than a single semester or year; however, this requires amendments to schedules and depends on having teachers who can teach multiple grades. Large high schools could easily incorporate this system, if they considered it a priority.

Other countries are going in the opposite direction. In England, for example, high-performing schools are asking their principals to assume responsibility for less successful schools by forming federations of schools—groups of schools with a single principal—but as yet, there is no evidence that this has led to improvement.

Other reforms have involved changes to the governance of schools. The most widespread such reform in the United States has been the introduction of charter schools. According to the Education Commission of the States (2017), forty-three states and the District of Columbia now have charter laws, but the evaluations of their impact on student achievement do not allow for any easy conclusions.

There is no doubt that some charter schools are achieving notable success, but others are not, and it appears that, at least to begin with, there were more of the latter than the former. In 2009, the Center for Research on Education Outcomes (CREDO) at Stanford University reported that across fifteen states and the District of Columbia, approximately one-half of charter schools obtain similar results to traditional public schools, one-third get worse results, and one-sixth get better results (CREDO, 2009). For the first twenty years of their operation, the net effect of charter schools was to lower student achievement rather than increase it, although this may well have been partly because most charters get less money per student (Miron & Urschel, 2010). Some charter schools, such as those that the Knowledge Is Power Program (KIPP) operates, are undoubtedly much more effective than comparable traditional public schools. Students at KIPP schools typically make an extra three to four months of progress each year, but they achieve this by having longer school days, some Saturday classes, and a longer school year (Tuttle et al., 2013). Each year KIPP school students spend 45 percent more time in school and make about 30 percent more progress, a clear example of diminishing returns. Moreover, while some charter schools are highly effective, most are not. An evaluation of the charter school system in Chicago (Hoxby & Rockoff, 2004) finds that students attending charter schools score higher on the Iowa Test of Basic Skills, but the effects are small: an increase of 4 percentile points for reading and just 2 percentile points for mathematics. As states have become better at closing less effective charter schools, the performance of charter schools has improved relative to traditional public schools, but a report from the CREDO team at Stanford (now covering twenty-two states and Washington, DC) finds that the differences are small (CREDO, 2013). In mathematics, performance was higher in 29 percent of charter schools, about the same in 40 percent, and lower in 31 percent. For reading, the corresponding figures were 25 percent, 56 percent, and 19 percent (CREDO, 2013). To put this in perspective, on average, a student attending a charter school in the United States would make 4 percent more progress (equivalent to eight days) than if he or she attended a traditional public school. That is an improvement worth having, but much less improvement than we need.

As the characteristics of successful charter schools become better understood, it will, no doubt, be possible to ensure that charter schools are more successful, but it is worth noting that the organizations that run the best charter schools are not keen to expand quickly, so any impact on the whole education system will be slow. For example, if we assume that the North American school population increases, as it has in the past, at a rate of around 0.7 percent per year, and the number of charter school places increases by 250,000 each year (the average rate over the last few years), then even if these new charter schools are as good as KIPP schools, it will be 2058 before students achieve an extra three weeks’ learning per year (Wiliam, in press a). Of course, we could expand charter schools more aggressively, but this would be likely to result in lower quality, thus weakening the impact. Whatever their benefits, the creation of charter schools is not likely to have a substantial and immediate impact on student achievement (Carnoy, Jacobsen, Mishel, & Rothstein, 2005).
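The logic of this kind of projection can be sketched in a few lines. This is not the calculation in the cited source (Wiliam, in press a). The growth rate of 0.7 percent, the 250,000 new places per year, and the three-week target come from the text. The starting year, the starting school population, the starting number of charter places, and the figure of 15 extra weeks of learning per year in a KIPP-quality school (a rough reading of "three to four months") are all assumptions made for illustration, and the sketch generously treats every charter place as KIPP-quality.

```python
def year_reaching_target(
    start_year=2017,                 # assumption: starting point of the projection
    school_population=50_000_000,    # assumption: total students at the start
    charter_places=3_000_000,        # assumption: charter enrollment at the start
    population_growth=0.007,         # from the text: about 0.7 percent per year
    new_places_per_year=250_000,     # from the text: new charter places per year
    extra_weeks_in_charter=15.0,     # assumption: KIPP-level gain, in weeks per year
    target_extra_weeks=3.0,          # from the text: three weeks averaged over all students
    max_years=200,
):
    """Return the first year in which the system-wide average gain hits the target.

    The system-wide gain is the share of students in charter schools
    multiplied by the extra learning each of those students receives.
    Returns None if the target is not reached within max_years.
    """
    year = start_year
    for _ in range(max_years):
        share = charter_places / school_population
        if share * extra_weeks_in_charter >= target_extra_weeks:
            return year
        school_population *= 1 + population_growth
        charter_places += new_places_per_year
        year += 1
    return None

print(year_reaching_target())
# Doubling the rate of expansion brings the date forward, ignoring any loss of quality.
print(year_reaching_target(new_places_per_year=500_000))
```

Because the starting values are placeholders, the year printed should not be read as a reproduction of the published estimate. The point is the shape of the argument: charter schools serve a small share of a growing population, so even large gains per charter student take decades to show up in the system-wide average.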

In England, the government has reconstituted many low-performing schools as “academies” that are run by philanthropic bodies but receive public funds equivalent to public schools, in addition to a large capital grant for school rebuilding. The principals of these academies have far greater freedom to hire and fire staff and are not required to follow national agreements on teacher compensation and benefits, nor to follow the national curriculum. Student test scores in these academies have risen faster than those in regular public schools, but this is to be expected, since such schools start from a lower baseline, and therefore have more room for improvement. A comparison with similarly low-performing schools not reconstituted as academies shows that they improve at the same rate (Machin & Wilson, 2009).

One of the most radical experiments in the organization of schooling has been taking place in Sweden. In 1992, the Swedish government invited for-profit providers to run public schools. Although many evaluations of this initiative found some successes, each of these studies contained significant methodological weaknesses. An evaluation from the Institute for the Study of Labour (Böhlmark & Lindahl, 2008), which corrected the flaws of earlier studies, found that the introduction of for-profit education providers did produce moderate improvements in short-term outcomes such as ninth-grade GPA and in the proportion of students who chose an academic high school track. However, these improvements appeared to be concentrated in more affluent students and were transient. There was no impact on longer-term outcomes such as high school GPA, university attainment, or years of schooling (Böhlmark & Lindahl, 2008), nor on employment, earnings, or engagement with the criminal justice system (Wondratschek, Edmark, & Frölich, 2014).

In England, since 1986, secondary schools have applied for specialist school status, along the lines of magnet schools in the United States. Specialist schools do get higher test scores than traditional secondary schools in England, but they also get more money—around $200 more per student per year. The improvement in results achieved by specialist schools turns out to be just what you would expect if you gave traditional public schools an extra $200 per student per year (Mangan, Pugh, & Gray, 2007). Moreover, specialist schools do not get better results in the subjects in which they specialize than they do in other subjects (Smithers & Robinson, 2009).

Other reform efforts have focused on curriculum. Almost every country aspires to have a 21st century curriculum. For instance, the Scottish government has adopted a Curriculum for Excellence, but whether anything changes in Scottish classrooms remains to be seen, and the short-term results are not encouraging (OECD, 2016). Trying to change students’ classroom experiences through changes in curriculum is very difficult. A bad curriculum well taught is usually a better experience for students than a good curriculum badly taught; pedagogy trumps curriculum. Or more precisely, pedagogy is curriculum, because what matters is how things are taught, rather than what is taught.

There is no standard definition of the term curriculum. The word originally (in English at least) described the selection of courses in Scottish universities in the 17th century, but over the years, it has come to mean activities that educational organizations arrange to help their students learn the intended material. However, there are at least three levels at which the word might be applied: (1) intended, (2) implemented, and (3) achieved. The intended curriculum includes the things that states or national governments determine students should learn in school. The implemented curriculum includes the textbooks and other materials that schools and districts adopt, and the achieved curriculum is what actually happens in classrooms. Needless to say, there is often some slippage between these three kinds of curricula. Often the textbooks that schools adopt do not align well with the intended curriculum (even though the publishers often claim that they do), and the way that teachers use those materials often does not accord with the intentions of those who wrote the textbooks (Wiliam, 2013).

While the way that teachers use textbooks may not always accord with the intentions of textbook publishers, textbooks do influence how teachers teach, and so there has been a great deal of interest in whether some textbooks are more effective than others. Some textbooks may align better with one state’s standards than others. However, as more and more states have either adopted common standards, such as the Common Core State Standards or the Next Generation Science Standards, or other similar local standards, alignment has become less of an issue. What has become apparent, however, and particularly in mathematics, is that some textbooks are much more effective than others at teaching the same content.

Initially, most studies of textbook adoption found little evidence that changes in textbooks alone have much impact on student achievement. However, particularly in elementary schools, just changing the textbooks can have a marked impact on student achievement, increasing the rate of student learning by up to 25 percent (Agodini & Harris, 2016). When researchers consider all grade levels, it is only when the programs change teaching practices and student interactions that a significant impact on achievement occurs (Slavin & Lake, 2008; Slavin, Lake, Chambers, Cheung, & Davis, 2009; Slavin, Lake, & Groff, 2009). At present, though, it does not seem to be possible to predict which textbooks are likely to be the most effective. We know that textbooks make a difference, but we don’t know what makes the difference in textbooks. Thus, while textbook choice is important, it does not seem to be, at present, a reliable way of raising student achievement.

Many reforms look promising at the pilot stage but, when rolled out at scale, fail to achieve the same effects. In 1998, after less than a year in office, Tony Blair’s Labour Party launched the National Literacy Strategy and, a year later, the National Numeracy Strategy for elementary schools in England and Wales. Although these programs showed promising results in the early stages, their effectiveness when rolled out to all elementary schools was equivalent to only one extra eleven-year-old in each elementary school reaching proficiency per year (Machin & McNally, 2009). Bizarrely, the fastest improvement in the achievement of English eleven-year-olds was in science, which had not been subject to any government reform efforts.

Other reform efforts have emphasized the potential impact of educational technology, such as computers, on classrooms. While there is no shortage of boosters for the potential of computers to transform education, reliable evidence of their impact on student achievement is rather hard to find. The history of computers in education is perhaps best summarized by the title Oversold and Underused (Cuban, 2002). This is not to say that computers have no place in education; some computer programs can be effective at teaching challenging content matter. One good example is Carnegie Learning’s Cognitive Tutor: Algebra I, developed over a period of twenty years at Carnegie Mellon University (Ritter, Anderson, Koedinger, & Corbett, 2007). The program has a very specific focus—teaching procedural aspects of ninth-grade algebra—and therefore should be used only for two or three hours per week, but it is more effective than many teachers at teaching this particular content (Pane, Griffin, McCaffrey, & Karam, 2014; Ritter et al., 2007). However, such examples are rare, and computers have failed to revolutionize our classrooms in the way leaders predicted (Bulman & Fairlie, 2016). As Heinz Wolff once said, “The future is further away than you think” (as cited in Wolff & Jackson, 1983).

Attention has become focused on the potential of interactive whiteboards. In the hands of expert practitioners, these are stunning pieces of educational technology, but as tools for improving educational outcomes at scale, they appear to be very limited. We know this from an experiment that took place in London. The English secretary for education, Charles Clarke, was so taken with the interactive whiteboard that he established a fund that doubled the number of interactive whiteboards in London schools. The net impact on student achievement was zero (Moss et al., 2007). But, say the technology boosters, you should provide professional development to go with the technology. This may be so, but if interactive whiteboards are only effective when teachers are given a certain number of hours of professional development, then surely it is right to ask whether the same number of hours of professional development could be more usefully, and less expensively, used in another way.

As a final example of an effort to produce substantial improvement in student achievement at scale, it is instructive to consider the impact of teachers’ aides in England. One large-scale evaluation of the impact of support staff on student achievement found that teachers’ aides actually lowered the performance of the students they were intended to help (Blatchford et al., 2009), largely because in many schools, aides were routinely tasked to help the students with the most profound learning needs, a task for which they were not well suited. Of course, this does not mean that the use of teachers’ aides cannot increase student achievement. Evidence from North Carolina suggests that teachers’ aides can be cost-effective, particularly for minority students, if they are well managed and assigned suitable classroom roles (Clotfelter, Hemelt, & Ladd, 2016). However, all this means is that two or three teachers’ aides can be as effective as a regular teacher. Where qualified teachers are in short supply, the deployment of teachers’ aides may be a useful short-term measure, but it is unlikely to have much impact on overall student achievement.

The reform efforts discussed here, and the history of a host of other reform efforts, show that improving education at scale is clearly more difficult than we often imagine. Why have we pursued such ineffective policies for so long? Much of the answer lies in the fact that we have been looking in the wrong place for answers.

Three Generations of School Effectiveness Research

Economists have known about the importance of education for economic growth for years, and this knowledge has led to surges of interest in studies of school effectiveness. Some schools appeared to get consistently good test results, while others seemed to get consistently poor results. The main thrust of the first generation of school effectiveness research, which began in the 1970s, was to understand the characteristics of the most effective schools. Perhaps if we understood that, we could reproduce the same effect in other schools.

Unfortunately, things are rarely that simple. Trying to emulate the characteristics of today’s most effective schools would lead to the following three measures.

1. First, get rid of the boys. All over the developed world, girls are outperforming boys, even in traditionally male-dominated subjects such as mathematics and science (OECD, 2016). The more girls you have in your school, the better you are going to look.

2. Second, become a parochial school. Again, all over the world, parochial schools tend to get better results than other schools, although this appears to be largely because parochial schools tend to be more socially selective than public schools (see, for example, Cullinane, Hillary, Andrade, & McNamara, 2017).

3. Third, and most important, move your school into a nice, leafy, suburban area. This will produce three immediate benefits. First, it will bring you much higher-achieving students. Second, parents will better support their students, whether this is in terms of supporting the school and its mission or paying for private tuition. Third, the school will have more money—potentially lots more. Some American schools receive more than $40,000 per student per year, compared with others that receive less than $5,000 per student per year (National Public Radio, 2016).

In case it wasn’t obvious, these are not, of course, serious suggestions. Girls’ schools, parochial schools, and schools in affluent areas get better test scores primarily because of who goes there, rather than how good the schools are, as researchers pointed out in the second generation of school effectiveness studies (see, for example, Thrupp, 1999). These researchers pointed out that most of the differences between school scores are due to the differences in students attending those schools rather than any differences in the quality of the schools themselves. The OECD data (Programme for International Student Assessment, 2010) are helpful in quantifying this. The PISA data show that 74 percent of the variation in the achievement of fifteen-year-olds in the United States is within schools, which means that 26 percent of the variation in student achievement is between schools (such as some schools getting better test scores than others). However, around two-thirds of the between-school variation is caused by differences in the students attending those schools. This, in turn, means that only the remaining one-third of that 26 percent, or about 8 percent of the variability in student achievement, is attributable to the school. Put the other way around, 92 percent of the variability in achievement is not attributable to the school (PISA, 2010). What this means in practice is that if fifteen out of a class of thirty students achieve proficiency in an average school, then seventeen out of thirty would do so in a “good” school (one standard deviation above the mean, which puts it among roughly the best one-sixth of all schools) and thirteen out of thirty would do so in a “bad” school (one standard deviation below the mean). While these differences are no doubt important to the four students in the middle who are affected, they are, in my experience, much smaller than people imagine. It seems that Basil Bernstein (1970) was right, therefore, when he said that “education cannot compensate for society” and that we should be realistic about what schools can, and cannot, do (as cited in Thrupp, 1999).
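The chain of percentages in the preceding paragraph can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted there; treating "around two-thirds" as exactly two-thirds is an assumption, which is why the result comes out a little above the rounded figure of 8 percent.

```python
# Shares of the total variation in achievement, as quoted in the text (PISA, 2010).
within_school = 0.74
between_school = 1.0 - within_school       # variation between schools: 0.26

# Assumption: "around two-thirds" of the between-school variation is taken
# as exactly two-thirds, reflecting who attends each school.
due_to_intake = 2.0 / 3.0

# What is left of the between-school variation is attributable to the school.
school_effect = between_school * (1.0 - due_to_intake)
not_school = 1.0 - school_effect

print(f"Between-school variation:       {between_school:.0%}")
print(f"Attributable to the school:     {school_effect:.1%}")
print(f"Not attributable to the school: {not_school:.1%}")
```

On these inputs the school's share is a little under 9 percent, which is consistent with the figure of about 8 percent given that the two-thirds is itself approximate.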

However, as higher-quality data sets have become available, we have been able—in the third generation of school effectiveness studies—to dig a little deeper. In particular, where a data set allows us to compare a student’s achievement at the beginning of the year and at the end of the year, we can estimate the school’s value added (the difference between what a student knew when she arrived at a school and what she knew when she left). It turns out that as long as you go to school (and that’s important), then it doesn’t matter very much which school you go to, but it matters very much which classrooms you’re in.

In the United States, the classroom effect appears to be at least four times the size of the school effect (PISA, 2007), which, predictably, has generated a lot of interest in what might be causing these differences. It turns out that these substantial differences between how much students learn in different classes have little to do with class size, how the teacher groups the students for instruction, or even the presence of between-class grouping practices (for example, tracking). The most critical difference is simply the quality of the teacher.

Parents have always understood how important having a good teacher is to their children’s progress, but only since the mid-1990s have we been able to quantify exactly how much of a difference teacher quality makes.

The Impact of Teacher Quality

For a long time, it seems, many people involved in education assumed that the correlation between teacher quality and student progress was effectively zero. In other words, if properly qualified, all teachers were equally good, and so on average, students should progress at the same rate in all classrooms. Of course, different students would progress at different rates according to their talents and aptitudes, but the assumption was that all teachers were comparable and therefore able to function like a commodity.

To an economist, a commodity is a good for which there is a demand, and it is fungible—a person can substitute one unit for another, since all units are assumed to be of equal quality. It is convenient for policymakers to treat teachers as a commodity, because then they can determine teacher compensation on a supply-and-demand basis. Teacher compensation could—like that for traders on the financial markets—be set based on the value they contribute, but this would mean that the best teachers would cost way too much—over $300,000 per year according to one study (Chetty et al., 2010). It is convenient for politicians to set a standard for “the qualified teacher,” so that everyone who meets that standard gets in. Teacher compensation can then be determined by supply and demand—how much needs to be paid to get a qualified teacher in every classroom (although in this context, it is worth noting that this is not the basis that politicians tend to use to determine their own compensation!).

The desire of teacher unions to treat all teachers as equally good is understandable, because it generates solidarity among their members, but more important because performance-related pay is in principle impossible to determine fairly. Consider a district that tests its students every year from third through eighth grade and then uses the test score data to work out which teachers have added the most value each year. This looks straightforward, but there is a fatal flaw: no test can capture all that is important for future progress. A fourth-grade teacher who spends a great deal of time developing skills of independent and collaborative learning, who ensures that her students become adept at solving problems, and who develops her students’ abilities at speaking, listening, and writing in addition to teaching reading may find that her students’ scores on the fourth-grade mathematics and reading tests are not as high as those of other teachers in her school who have been emphasizing only what is on the test. And yet, the teacher who inherits this class in fifth grade will look very good when the results of the fifth-grade tests are in, not because of what the fifth-grade teacher has done, but because of the firm foundations that the fourth-grade teacher laid.

In addition, evidence suggests that paying teachers bonuses for the achievement of their students does not raise test scores. Between 2006 and 2009, researchers selected teachers in Nashville, Tennessee, at random and offered bonuses of $15,000 for getting their students’ achievement to match the highest-performing 5 percent of students, with lesser bonuses of $10,000 and $5,000 for matching the highest-performing 10 percent and 20 percent, respectively. An evaluation of the incentives found that the scores of the students taught by teachers offered bonuses were no higher than the scores of those taught by other teachers (Springer et al., 2010).

Such results seem to surprise many economists. They tend to assume that people’s primary motivation is economic reward, and so offering cash incentives for people to try harder must surely increase results. They forget that such incentives work only when people are not already trying as hard as they can. There are, no doubt, some teachers who do not care about how well their students do, and for this small minority of teachers, incentives may work. But the vast majority of teachers are trying everything they can to increase their students’ achievement. There is certainly no evidence that there are teachers who are holding on to a secret proven method for teaching fractions until someone pays them more money. So, performance-related pay is impossible to implement fairly, does not seem to work, and even if it can be made to work, will make a difference only for that small minority of teachers who are not already trying their best.

As noted previously, for many years, researchers and politicians assumed that one teacher was as good as another, providing each was adequately qualified for the job. However, in 1996, William Sanders and June Rivers published a paper based on an analysis of three million records on the achievement of Tennessee’s entire student population from second to eighth grade. Because of the way they collected the data, it was possible to track the progress each individual student made and match that to the teacher who had been teaching them each year. They found that there were differences in how much students learned with different teachers, and that these differences were large. To show how large the differences were, they classified the teachers into five equally sized groups based on how much progress their students had made (low, below average, average, above average, and high). They then examined how an average eight-year-old student would fare, depending on what teachers he or she got. What they found was rather surprising. A student who started second grade at the 50th percentile of achievement would end up at the 90th percentile of achievement after three years with a high-performing teacher but, if assigned to the classes of low-performing teachers for three years, would end up at the 37th percentile of achievement—a difference of over 50 percentile points. They found that increases in teacher quality were most beneficial for lower-achieving students, and the general effects were the same for students from different ethnic backgrounds (Sanders & Rivers, 1996).

Subsequent studies (for example, Rivkin, Hanushek, & Kain, 2005; Rockoff, 2004) confirmed the link between teacher quality and student progress on standardized tests. While different studies have found slightly different results, there is now a clear consensus among researchers that the correlation between teacher quality and student progress is at least 0.1, and may be over 0.2, especially for mathematics, as the data in table 1.1 clearly indicate. A correlation of 0.1 would mean that if a student was taught by an above-average teacher (for example, a teacher who is one standard deviation above the mean), then that student would make 0.1 standard deviations’ more progress in a year than a student taught by an average teacher. Since for most students in these studies, one year’s progress is around 0.4 standard deviations, this would equate to a 25 percent increase in the rate of learning.
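The conversion from a teacher effect to a change in the rate of learning is a single division. The sketch below assumes, as the paragraph above does, that the quoted correlation can be read as an effect in standard-deviation units and that a typical year's progress is 0.4 standard deviations. The function name is hypothetical.

```python
def relative_learning_gain(teacher_effect_sd, annual_progress_sd=0.4):
    """Extra progress from a +1 SD teacher, as a fraction of a typical year's progress."""
    return teacher_effect_sd / annual_progress_sd


# The two ends of the range quoted in the text.
for effect in (0.1, 0.2):
    gain = relative_learning_gain(effect)
    print(f"effect of {effect} SD -> {gain:.0%} faster learning")
```

On these assumptions, the lower bound of 0.1 gives the 25 percent figure in the text, and the upper bound of 0.2 would double it.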

Table 1.1: Correlation Between Teacher Quality and Student Progress in Reading and Mathematics

The estimates in table 1.1 are based on the progress made from one year to the next on standardized tests, and these numbers could be very different if we looked at other measures of student achievement. However, there is no evidence that having a good teacher is more important for performance on standardized tests than it is for other measures of performance. As far as we know, having a good teacher makes a difference no matter what the subject, or the age of the student.

One objection to this argument is that teachers may appear to make more progress with their students not because they are better teachers, but because they teach higher-achieving students. There is no doubt that in many school districts, teachers with seniority have influence over the classes they are assigned to teach. To test this explanation, the Measures of Effective Teaching (MET) project, funded by the Bill and Melinda Gates Foundation, identified teachers who had been successful in one school, and reassigned them to teach in another school, often teaching students of different socioeconomic backgrounds. The project found that the teachers who were more successful in one school were more successful in a very different school (Kane, McCaffrey, Miller, & Staiger, 2013). While the circumstances in which teachers work—the time they have to plan instruction, the quality of curriculum with which they are working, the size of classes they teach—undoubtedly have an influence, there is something that successful teachers carry around with them in their heads that makes them more effective wherever they work.

The average correlations for reading and mathematics in table 1.1 are 0.14 and 0.18 respectively, so 0.15 is a reasonable working value for the correlation between teacher quality and student progress. What this means is that an increase of one standard deviation of teacher quality would result in an increase of 0.15 standard deviations in student achievement, equivalent to moving a student from the 50th to the 56th percentile of achievement. Averaged across K–12, one year’s progress is equivalent to moving a student from the 50th to the 65th percentile of achievement. A teacher one standard deviation above the mean therefore adds about 6 percentile points to the 15 percentile points of a normal year’s progress, an increase of roughly 40 percent, or close to an extra five months’ progress per year.
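The percentile figures above can be checked with the standard normal distribution. This is a rough sketch, assuming that achievement is normally distributed and that a typical year's progress is 0.4 standard deviations, as stated earlier in the chapter. The function and variable names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # assumption: achievement is normally distributed


def percentile_after_gain(gain_sd, start_percentile=50.0):
    """Percentile reached by a student who starts at start_percentile and gains gain_sd."""
    start_z = STANDARD_NORMAL.inv_cdf(start_percentile / 100.0)
    return 100.0 * STANDARD_NORMAL.cdf(start_z + gain_sd)


teacher_effect_sd = 0.15   # working value used in the text
annual_progress_sd = 0.4   # typical year's progress quoted earlier in the chapter

print(f"+1 SD teacher: {percentile_after_gain(teacher_effect_sd):.1f}th percentile")
print(f"one year's progress: {percentile_after_gain(annual_progress_sd):.1f}th percentile")

extra_fraction = teacher_effect_sd / annual_progress_sd
print(f"extra progress: {extra_fraction:.1%} of a year, "
      f"about {12 * extra_fraction:.1f} months")
```

Working in standard-deviation units gives a little under 40 percent of a year. Using the rounded percentile figures instead (6 points on top of 15) gives 40 percent, which is presumably where the approximate figure of five months comes from.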

At the extremes, these effects are even more pronounced. Take a group of fifty teachers. The students of the most effective teacher in that group will learn in six months what the students of the average teachers will learn in a year. And those students of the least effective teacher in that group of fifty teachers are likely to take two years to learn the same material. In other words, the very best teachers generate learning in their students at four times the rate of the least effective teachers.

Just as important, teacher quality appears to play a significant role in promoting equality of outcomes. In the United States, many policymakers seem to have assumed that excellence and equity are somehow in tension, so that we can have one or the other, but not both. However, evidence from international comparisons has shown that countries with high average scores also tend to have a narrow range of achievement (Burstein, 1992; Mullis, Martin, & Foy, 2008; OECD, 2016).

As noted previously, Sanders and Rivers (1996) find that increases in teacher quality confer greater benefits on low achievers than high achievers, and a particularly well-designed study of fifteen- and sixteen-year-olds (Slater, Davies, & Burgess, 2008) also finds that the benefits of having a high-performing teacher are greatest for low achievers (although, interestingly, it also finds that high achievers benefit more than those of average achievement). In their work in kindergarten and first-grade classrooms, Bridget Hamre and Robert Pianta (2005) find that in the classrooms where students make the most progress in reading and mathematics (as measured by scores on the revised Woodcock–Johnson Psychoeducational Battery, a standardized test of cognitive ability and academic achievement), students from socioeconomically disadvantaged backgrounds make as much progress as those from wealthy homes, and those with behavioral difficulties (such as aggressive or defiant behaviors) progress as well as those without. In other words, when teachers are more effective on average, they are disproportionately more effective for at-risk students.

This last finding is particularly important because it shows that Basil Bernstein was wrong: education can compensate for society, provided it is of high quality. Ideally, we would concentrate on getting the highest-quality teachers to the students who need them most, because schools will only secure equitable outcomes by ensuring that the lowest-achieving students get the best teachers. In the short term, this means taking the best teachers away from the high-achieving students, which is politically challenging, to say the least.

In the longer term, a focus on improving teacher quality will mean that teacher allocation will no longer be a zero-sum game. Our focus on the achievement gap draws attention to the gap between the high achievers and the low achievers. The problem with thinking about this as an issue of gaps is that one can reduce the gap either by improving the performance of the lowest achievers or by reducing the achievement of the highest achievers. This leads back to the traditional, and now discredited, thinking that equity is the enemy of excellence. Rather than thinking about narrowing the gap, we should set a goal of proficiency for all, excellence for many, with all student groups fairly represented in the excellent. And the way to achieve this is simply to increase teacher quality. As Michael Barber says, “The quality of an education system cannot exceed the quality of its teachers” (Barber & Mourshed, 2007, p. 19).

Ways to Increase Teacher Quality

The realization that teacher quality is the single most important variable in an education system has led to an exploration of how teacher quality can be improved, and there are only two options. The first is to attempt to replace existing teachers with better ones, which includes both deselecting serving teachers and improving the quality of entrants to the profession. The second is to improve the quality of teachers already in the profession.

Because past attempts to improve the performance of serving teachers have achieved so little success, some authors suggest that the only way to improve the profession is through replacement, including both rigorous deselection and increasing the threshold for entry into the profession (see, for example, Hanushek, 2010).

Teacher deselection may be politically attractive—after all, who could possibly be against getting rid of ineffective teachers? But it is hard to do, it may not work anyway, and even when it does work, it is a slow process (see, for example, Winters & Cowen, 2013). First, it is hard to do because, although we know that teachers make a difference, it is very difficult, if not impossible, to work out who really are the least effective teachers. Second, it may not work because, to be effective, you must be able to replace the teachers you deselect with better ones, and that depends on whether there are better potential teachers not currently teaching. Jack Welch famously believes in getting rid of the lowest-performing 10 percent of employees each year (Welch & Welch, 2005). Such an approach sounds a little like the joke that “firings will continue until morale improves,” but even if it does not have a negative impact on those who remain employed, it is only effective if one can replace the deselected 10 percent with better employees. When those recruited to fill the vacancies are worse than those fired, the 10 percent rule is guaranteed to lower average employee quality.

The third problem with deselection is that it is very slow. Replacing the lowest 10 percent with teachers who are only slightly better will take many years to have any noticeable impact on average teacher quality. Given the difficulty with deselecting the right teachers, it is natural that much attention has focused instead on improving the quality of entrants into the profession.
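The dependence of deselection on the quality of replacements can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are not in the source: teacher quality is measured in standard-deviation units and starts out normally distributed, the lowest 10 percent can be identified perfectly each year (which the text argues is unrealistic), and replacements are drawn from a normal pool with a chosen mean. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_deselection(years, replacement_mean, n_teachers=1000, cut=0.10, seed=0):
    """Yearly mean teacher quality when the lowest `cut` share is replaced each year.

    The starting workforce is drawn from N(0, 1); replacements are drawn from
    N(replacement_mean, 1). Both distributions are illustrative assumptions.
    """
    rng = random.Random(seed)
    workforce = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    n_cut = int(n_teachers * cut)
    means = [statistics.fmean(workforce)]
    for _ in range(years):
        workforce.sort()  # lowest-quality teachers come first
        hires = [rng.gauss(replacement_mean, 1.0) for _ in range(n_cut)]
        workforce = workforce[n_cut:] + hires
        means.append(statistics.fmean(workforce))
    return means


# The lowest 10 percent of a N(0, 1) workforce averages roughly -1.75.
scenarios = [
    ("hires drawn from the same pool as the workforce", 0.0),
    ("hires only slightly better than those removed", -1.5),
    ("hires worse than those removed", -3.0),
]
for label, hire_mean in scenarios:
    means = simulate_deselection(years=10, replacement_mean=hire_mean)
    print(f"{label}: start {means[0]:+.2f}, after 10 years {means[-1]:+.2f}")
```

Even in this idealized setting, the average only rises when the replacement pool is better than the teachers removed, and the rise is small when the replacements are only slightly better.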

When looking for ways to improve educational achievement, many people look at high-performing countries like Finland and Singapore and note that these countries hold teaching in high regard (Tucker, 2011). Therefore, not surprisingly, many people want to be teachers, so competition for places on preservice teacher preparation programs is intense. For example, at the end of twelfth grade, students at Finnish high schools take a series of end-of-course tests. In addition, all students who want to go on to university in Finland must take a national written test. Those who perform well then take a second test set by the university to which they have applied. In 2014, the University of Helsinki had 1,650 applicants for the 120 places on its preservice teacher program (Sahlberg, 2015). The university selected approximately seventy students based on a combination of their school end-of-course test scores and their scores on the university tests, and fifty students based on the university tests alone. While it is not true to say that Finland recruits only the highest-achieving students, it certainly draws from the highest-achieving one-third of students (Ingersoll, 2007).

In contrast, the United States tends to draw teachers from the lower end of the college achievement range. According to Marigee Bacolod (2007), only around 10 percent of teachers who began their careers in the mid-1980s were academic high achievers in high school (in the top 20 percent), while for other professions, the figure is over 60 percent. This has, predictably, led to calls to recruit more academic high achievers into teaching. The problem with this attractive solution is that there is little evidence that academic achievement has much of a relationship with teacher quality (Harris & Sass, 2009).

Now this does not, of course, mean that academic ability is unimportant in teaching—clearly a certain amount of academic ability is necessary to be a teacher—and so it probably makes sense to require teachers to have college degrees. But what many people find surprising is that, beyond this, qualifications do not seem to matter much (Harris & Sass, 2009). Teachers with higher college grade point averages do not seem to be any more effective than other teachers. Perhaps even more surprisingly, teachers with master’s degrees in education are no more effective than those with just a bachelor’s degree (Harris & Sass, 2009). Some researchers have gone so far as to claim that the only teacher variable that consistently predicts how much students will learn is teacher IQ (Hanushek & Rivkin, 2006), although other studies (like Harris & Sass, 2009) find no consistent relationship between teachers’ intellectual abilities and the progress of their students.

In the 21st century, researchers have made some progress in determining what kinds of teacher knowledge do contribute to student progress. For example, elementary school teachers’ scores on a test of mathematical knowledge for teaching (MKT) correlated significantly with their students’ progress in mathematics (Hill, Rowan, & Ball, 2005). Although the effect was greater than the impact of socioeconomic status or race, it was, in real terms, small; a one standard deviation increase in a teacher’s mathematical knowledge for teaching resulted in a 4 percent increase in a student’s rate of learning. In other words, students whose teacher scored highly on the MKT (that is, one standard deviation above the mean) would learn in fifty weeks what a student whose teacher scored the average would learn in fifty-two weeks—a difference, but not a big one. Or, to put it another way, we saw earlier that one standard deviation of teacher quality increases the rate of student learning by around 40 percent, and we have just seen that one standard deviation of mathematical knowledge for teaching increases the rate of student learning by 4 percent. This suggests that subject knowledge accounts for only around 10 percent of the variability in teacher quality.
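The two comparisons in this paragraph reduce to simple ratios. The sketch below uses only the figures quoted in the text (a 4 percent gain per standard deviation of MKT and roughly 40 percent per standard deviation of teacher quality). The function and variable names are illustrative.

```python
def weeks_to_cover_a_year(rate_increase, weeks_in_year=52):
    """Weeks needed to learn a standard year's material at a faster rate of learning."""
    return weeks_in_year / (1.0 + rate_increase)


mkt_effect = 0.04              # +4 percent rate of learning per SD of MKT
teacher_quality_effect = 0.40  # roughly +40 percent per SD of teacher quality

print(f"weeks needed with a +1 SD MKT teacher: {weeks_to_cover_a_year(mkt_effect):.1f}")
print(f"MKT effect relative to overall teacher effect: "
      f"{mkt_effect / teacher_quality_effect:.0%}")
```

The 10 percent figure here is the simple ratio of the two effects, which is how the text arrives at it.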

While the impact of teacher knowledge on student progress in Hill et al.’s (2005) research may be disappointingly small, this is, in fact, one of the strongest results in the research literature. A study of over thirteen thousand teachers, involving almost one million items of data on over three hundred thousand students in the Los Angeles Unified School District (LAUSD), shows that student progress is unrelated to teachers’ scores on licensure examinations, nor are teachers with advanced degrees more effective (Buddin & Zamarro, 2009). Most surprising, in the study, there is no relationship between LAUSD teachers’ scores on the Reading Instruction Competence Assessment (which all elementary school teachers must pass) and their students’ scores in reading. As the researchers themselves note, since this test is a requirement for all elementary school teachers, those who fail the test are not permitted to teach, and so we cannot conclude that the test is not effective in screening out weaker teachers, but the results do suggest that the relationship between teachers’ knowledge of reading pedagogy and student progress in reading is, at best, weak and perhaps nonexistent (Buddin & Zamarro, 2009).

In an article in the New Yorker, Malcolm Gladwell (2008a) likens this situation to the difficulties of finding a good quarterback for the National Football League (NFL). Apparently, for most positions, how well a player plays in college predicts how well he will fare in the NFL, but at quarterback, how well a player plays in college seems to be useless at predicting how well he will play in the pros.

One theory about why good—and often even outstanding—college quarterbacks fail in the NFL is that the professional game is so complex (Gladwell, 2008a). To try to mitigate this discrepancy, all players drafted in the NFL now take the Wonderlic Personnel Test—a fifty-item test that assesses arithmetic, geometric, logical, and verbal reasoning. Unfortunately, as several studies have shown (for example, Mirabile, 2005), there does not appear to be any clear relationship between scores on the Wonderlic and how good a quarterback will be in the NFL. In 1999, for example, of the five quarterbacks taken in the first round of the draft, only one—Donovan McNabb—is likely to end up in the Hall of Fame, and yet his score was the lowest of the five. Other quarterbacks scoring in the same range as McNabb include Dan Marino and Terry Bradshaw—widely agreed to be two of the greatest quarterbacks ever (Mirabile, 2005). Although efforts continue to try to predict who will do well and who will not within the NFL, Gladwell (2008a) suggests that there is increasing acceptance that the only way to find out whether someone will do well in the NFL is to try him out in the NFL.

The same appears to be true for teaching. It may be that the only way to find out whether someone has what it takes to be a teacher is to try him or her out in the classroom, even though Thomas Kane and Douglas Staiger have estimated that we might need to try out four prospective teachers to get one good one (as cited in Gladwell, 2008a).

Even if we could identify in advance who would make the best teachers, doing anything useful with that information would take a long time. Suppose, for example, we could predict exactly how good each teacher was going to be. Suppose also that we had the luxury of so many people wanting to be teachers that we could raise the bar to a level whereby only two-thirds of those who are currently entering the profession would get in. Over time, this would certainly raise the quality of teachers. However, if we “raise the bar” for entry into teaching today, it would still be forty years before those who had begun teaching before the raising of the bar finally retired.
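How slowly such a change would work through the profession can be sketched numerically. The model below rests on assumptions that go beyond the text: teacher quality is normally distributed, it can be predicted perfectly at entry (as the paragraph supposes), careers last forty years, and an equal share of teachers retires each year. The function names are hypothetical.

```python
from statistics import NormalDist


def entrant_quality_gain(admitted_fraction):
    """Mean quality (in SD units) of the top admitted_fraction of a N(0, 1) pool.

    Requires 0 < admitted_fraction < 1.
    """
    standard_normal = NormalDist()
    cutoff = standard_normal.inv_cdf(1.0 - admitted_fraction)
    # Mean of a standard normal truncated below at `cutoff`.
    return standard_normal.pdf(cutoff) / admitted_fraction


def workforce_quality_gain(years_since_change, admitted_fraction=2 / 3, career_years=40):
    """Average workforce gain, assuming an equal share of teachers retires each year."""
    replaced_share = min(years_since_change / career_years, 1.0)
    return replaced_share * entrant_quality_gain(admitted_fraction)


for years in (5, 10, 20, 40):
    print(f"after {years:2d} years: +{workforce_quality_gain(years):.2f} SD")
```

Under these assumptions, new entrants would be about half a standard deviation better than before, but the workforce as a whole would reach that level only once everyone hired under the old bar had retired.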

We can’t wait that long. While deselecting the least effective teachers and trying to raise the quality of those entering the profession will have some effects, they are likely to be small, and nothing like the kinds of improvements in teacher quality our students need. If we are serious about securing our economic future, we have to help improve the quality of those teachers already working in our schools—what Marnie Thompson, my former colleague at the Educational Testing Service, calls the “love the one you’re with” strategy.

Conclusion

Improving educational outcomes is a vital economic necessity, and the only way that we can achieve this is by increasing the quality of the teaching force. Identifying the least effective teachers and deselecting them has a role to play, as does trying to increase the quality of those entering the profession, but as the data and the research studies examined in this chapter have shown, the effects of these measures will be small and will take a long time to materialize. In short, if we rely on these measures to raise student achievement, the benefits will be too small and will arrive too late to maintain the United States’ status as one of the world’s leading economies. Our future economic prosperity, therefore, depends on investing in those teachers already working in our schools.
