CHAPTER 1
The Assessment-Friendly Curriculum
The starting place for a new assessment paradigm is a curriculum that provides teachers with clear guidance in terms of what they should assess and how they should assess it. At first, this might sound like a very simple undertaking. After all, don’t schools and districts already have standards that teachers are supposed to follow when designing assessments? While the answer to this question is yes, the standards themselves do not provide much assessment guidance. That is one of the primary messages of this chapter. In fact, national, provincial, state, and local standards as currently written actually muddy the waters in terms of classroom assessments. More pointedly, we believe that the standards movement has unwittingly hurt classroom assessment practices as much as it has helped.
In this chapter, we discuss the problem with standards and practices that render them inconsequential. We describe the limited assessment focus of standards and the need to create supplemental measurement topics.
The Problem With Standards
There are at least three reasons why standards do not provide classroom teachers with adequate guidance in using classroom assessments: (1) too much content, (2) redundancy, and (3) equivocal descriptions of content.
Too Much Content
To illustrate the problem of too much content, consider the following mathematics standard: “Understands the properties of operations with rational numbers (for example, distributive property, commutative and associative properties of addition and multiplication, inverse properties, identity properties)” (standard 3, grades 6–8; Mid-continent Research for Education and Learning [McREL], 2014a).
If we unpack the content in this standard, it becomes clear that it contains at least five elements. The student:
1. Understands the distributive property with rational numbers
2. Understands the commutative property of addition with rational numbers
3. Understands the commutative property of multiplication with rational numbers
4. Understands the inverse property of rational numbers
5. Understands the identity properties of rational numbers
While this standard is for mathematics, the same problem holds true for other subject areas. This is the crux of the problem that standards documents have created for classroom teachers who wish to design highly focused classroom assessments. For example, consider the following middle school science standard.
MS-LS1-4. Use argument based on empirical evidence and scientific reasoning to support an explanation for how characteristic animal behaviors and specialized plant structures affect the probability of successful reproduction of animals and plants respectively. (NGSS Lead States, 2013)
This standard appears relatively straightforward and fairly focused until we consider the clarification statement that accompanies it:
Examples of behaviors that affect the probability of animal reproduction could include nest building to protect young from cold, herding of animals to protect young from predators, and vocalization of animals and colorful plumage to attract mates for breeding. Examples of animal behaviors that affect the probability of plant reproduction could include transferring pollen or seeds, and creating conditions for seed germination and growth. Examples of plant structures could include bright flowers attracting butterflies that transfer pollen, flower nectar and odors that attract insects that transfer pollen, and hard shells on nuts that squirrels bury. (NGSS Lead States, 2013)
If we unpack the content in this standard and its clarification statement, a number of topics rise to the surface. The student:
• Understands how to identify empirical evidence and how to use it in an argument
• Understands scientific reasoning and how to use it in an argument
• Understands examples of animal behaviors
• Understands how specific animal behaviors affect successful reproduction
• Uses empirical evidence and scientific reasoning to explain how and why specific animal behaviors affect successful reproduction
• Understands examples of plant structures
• Understands how specific plant structures affect successful reproduction
• Uses empirical evidence and scientific reasoning to explain how and why specific plant structures affect successful reproduction
In effect, standards documents typically embed so much content in a single statement that it would be impossible to assess (or teach) all those topics in the amount of time available to teachers. To illustrate, Robert J. Marzano, David C. Yanoski, Jan K. Hoegh, and Julia A. Simms (2013) identify seventy-three standards statements (which they refer to as elements) for eighth-grade English language arts (ELA) within the Common Core State Standards (CCSS; NGA & CCSSO, n.d.a, n.d.b, 2010a, 2010b, 2010c). If we assume an average of five topics embedded in each element, which seems reasonable given the previous example, then we can conclude that eighth-grade ELA teachers must assess and teach 365 topics in a single school year.
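The estimate above is a single multiplication, and it depends entirely on the assumed average number of topics per element. The short sketch below reproduces it; the function name is hypothetical, and the uniform average of five is the text's own working assumption rather than a measured value.

```python
# Figure from the text: 73 standards statements (elements) in eighth-grade ELA.
ELEMENTS = 73


def estimated_topics(elements: int, topics_per_element: int) -> int:
    """Total embedded topics, assuming every element holds the same number of topics."""
    return elements * topics_per_element


# The text assumes an average of five topics per element.
print(estimated_topics(ELEMENTS, 5))  # 365
```

Changing the second argument shows how sensitive the total is to that assumption.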
This problem is prevalent in every subject area. Figure 1.1 provides a few more examples of standards with embedded topics.
Source for standards: McREL, 2014a.
Figure 1.1: Multiple topics in content standards.
Redundancy
The second problem with standards is that they include a great amount of redundant content. This was one of the findings of Julia A. Simms (2016) in her analysis of the CCSS. To illustrate, consider the topic of “examining claims and evidence.” When examining the eighth-grade CCSS for ELA standards and benchmarks, Simms (2016) finds overlapping aspects of this topic in six different standards or benchmark statements. See figure 1.2, which depicts these six unpacked ELA standards at the eighth-grade level: RI.8.8, W.8.1b, W.8.1a, W.8.1, SL.8.3, and SL.8.1d.
Source for standards: NGA & CCSSO, 2010a.
Figure 1.2: Overlapping components in ELA standards at the eighth-grade level.
When unpacked, the Common Core standard RI.8.8 has five statements, standard W.8.1b has two statements, and so on. In all, there are twenty-two statements embedded in six standards. Even though these statements employ different phrasing, they pretty much all deal with claims, evidence, and reasoning. While the problem of redundancy might seem to mitigate the problem of too much content, it still adds to the teacher’s workload by requiring him or her to analyze standards in the manner that Simms (2016) exemplifies.
Equivocal Descriptions of Content
The final problem with current standards statements is that many of them are highly equivocal—they are open to a number of possible interpretations. To illustrate, consider the following standard from grade 4 mathematics:
Solve multistep word problems with whole numbers and have whole number answers using the four operations (addition, subtraction, multiplication and division) including division word problems in which remainders must be interpreted. (4.OA.A.3, NGA & CCSSO, 2010b)
While it is clear that the standard’s overall focus is multistep problems with whole numbers and whole-number answers, such problems are very different across the operations for addition, subtraction, multiplication, and division. Consider the following four problems that would appear to fulfill this standard.
1. Addition problem: James wants to paint racing stripes around his room. His room is 11 feet long and 10 feet wide. If James paints a stripe to go around the top and bottom of his room, how many linear feet of racing stripes will he need?
We might represent the reasoning involved in this problem in the following way.
♦ 10 + 10 + 11 + 11 = 42 feet
♦ Top of room = 42 feet
♦ Bottom of room = 42 feet
♦ 42 + 42 = 84 feet
James will need to paint 84 feet of racing stripes.
2. Subtraction problem: Kelly works at her family’s pet store. She put 272 bags of dog food on the shelf. Last week, customers bought 117 bags. How many bags were left? Her parents also need to place an order to buy more dog food when they reach 50 bags. How many more bags can they sell before they need to place a new order?
We might represent the reasoning involved in this problem in the following way.
♦ 272 – 117 = 155 bags remaining
♦ 155 – 50 = ?
♦ 155 – 50 = 105
The family can sell 105 bags before they need to reorder.
3. Multiplication problem: Aidan collects baseball cards. His collection currently includes 48 players. His brother has 4 times as many cards, and his friend has 3 times as many. How many cards do Aidan’s brother and friend have in all?
We might represent the reasoning involved in this problem in the following way.
♦ 48 × 4 = brother
♦ 48 × 3 = friend
♦ Brother = 192 cards
♦ Friend = 144 cards
♦ 192 + 144 = 336 cards
Aidan’s brother and friend have 336 cards in all.
4. Division problem: Libby collects concert shirts. Each shirt costs $12. If Libby has $120, how many concert shirts can she buy? If Libby saved $9 per month, how long would it take her to save enough money to buy that many shirts?
We might represent the reasoning involved in this problem in the following way.
♦ $120 ÷ 12 = 10 shirts
♦ Libby can buy 10 shirts with $120.
♦ $120 ÷ $9 = 13.33
It would take Libby thirteen months and about ten days to save enough money to buy ten shirts.
Clearly, the steps in reasoning necessary to solve these problems differ significantly from type to type. The steps to solving the addition problem are straightforward. Students find the perimeter by adding the lengths of the sides and then doubling that quantity to account for the two stripes.
The steps to the subtraction problem are more involved. Students first determine the remaining number of bags after 117 are sold. This is quite simple. Students then compute the difference between the remaining number of bags and the threshold number of 50. Although this step involves subtraction, as did the first step, it has a totally new perspective.
The multiplication problem begins with students multiplying two numbers—one for the brother and one for the friend. The next step involves addition.
The division problem involves the most complex set of steps. It begins with a straightforward division task—the total amount of money available divided by the cost of each shirt. The problem then shifts contexts. Students must take the total amount of money available and divide it by an amount of money that Libby can save each month. However, this step also involves dealing with a remainder, which adds complexity.
Looking at the problem types, the subtraction problem is the only one of the four that requires students to perform an operation on a quantity that the directions to the problem do not explicitly state. The division problem is the only one that involves a remainder. One can also make the case that each of the four problems makes some unique cognitive demands simply to understand it.
The reason this standard is equivocal, then, is that it does not make clear how important each of the four operations is to demonstrating proficiency. Nor does it make clear how important remainders are to demonstrating proficiency, and it seems to treat the four operations equally, although they have significant differences in their execution.
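The arithmetic in the four sample problems can be checked with a short sketch. All quantities come from the problems as stated. The variable names are illustrative, and the conversion of the leftover amount into days assumes a 30-day month, which the text does not state. The division problem is the only one where integer division leaves a remainder that must be interpreted.

```python
# Problem 1 (addition): perimeter of an 11 ft by 10 ft room, two stripes.
perimeter = 10 + 10 + 11 + 11
stripes_feet = perimeter + perimeter  # top stripe + bottom stripe

# Problem 2 (subtraction): bags left, then bags that can be sold before reordering at 50.
bags_left = 272 - 117
bags_before_reorder = bags_left - 50

# Problem 3 (multiplication, then addition): brother and friend combined.
brother_cards = 48 * 4
friend_cards = 48 * 3
total_cards = brother_cards + friend_cards

# Problem 4 (division): shirts affordable, then months of saving at $9 per month.
shirts = 120 // 12
full_months, dollars_remaining = divmod(120, 9)  # whole months and leftover dollars
extra_days = 30 * dollars_remaining / 9  # assumption: a 30-day month

print(stripes_feet, bags_before_reorder, total_cards, shirts)
print(full_months, dollars_remaining, extra_days)
```

The first three problems reduce to exact whole-number results. The fourth leaves $3 still to be saved after thirteen full months, and that leftover is what the solver has to interpret.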
As another example of equivocality in standards, consider the following high school ELA standard and upper elementary civics standard.
ELA (high school): Analyze multiple interpretations of a story, drama, or poem (e.g., recorded or live production of a play or recorded novel or poetry), evaluating how each version interprets the source text (RL.11–12.7; NGA & CCSSO, 2010a)
Civics (grades 3–5): Identify the major duties, powers, privileges, and limitations of a position of leadership (e.g., class president, mayor, state senator, tribal chairperson, president of the United States) … evaluate the strengths and weaknesses of candidates in terms of the qualifications of a particular leadership role. (section H, standard 1; Center for Civic Education, 2014)
In both of these examples, assessments would be quite different depending on a teacher’s selection of available options. For example, in the ELA standard, comparing the treatment of the same content in a story and a poem is a quite different task from comparing a story and a play. In the civics standard, knowing the duties, powers, and privileges of a class president is a quite different task from knowing the duties of a state senator.
Standards as Inconsequential
We have observed two practices that appear to address the problems associated with standards but, in fact, render standards inconsequential: (1) tagging multiple standards and (2) relying on sampling across standards.
Tagging Multiple Standards
One common practice is for teachers to assess standards by simply tagging multiple standards in the tests they give. For example, assume that a teacher has created the following assessment in a seventh-grade ELA class.
We have been reading Roll of Thunder, Hear My Cry by Mildred D. Taylor, which tells the story of Cassie Logan and her family who live in rural Mississippi. In the novel, Taylor develops several themes. Describe how the author develops the theme of the importance of family through characters, setting, and plot. Compare the importance of family theme with one other theme from the book. Write a short essay that explains which of the two themes you think is the most important to the development of the novel. Justify your choice with logical reasoning and provide textual evidence.
Because the teacher must cover all the seventh-grade ELA standards, he or she simply identifies all those standards directly or tangentially associated with this assessment. For example, the teacher might assert that this assessment addresses the following Common Core standards to one degree or another:
RL.7.1
Cite several pieces of textual evidence to support analysis of what the text says explicitly as well as inferences drawn from the text.
RL.7.2
Determine a theme or central idea of a text and analyze its development over the course of the text; provide an objective summary of the text.
RL.7.10
By the end of the year, read and comprehend literature, including stories, dramas, and poems, in the grades 6–8 text complexity band proficiently, with scaffolding as needed at the high end of the range.
WHST.6–8.1.B
Support claim(s) with logical reasoning and relevant, accurate data and evidence that demonstrate an understanding of the topic or text, using credible sources.
WHST.6–8.10
Write routinely over extended time frames (time for reflection and revision) and shorter time frames (a single sitting or a day or two) for a range of discipline-specific tasks, purposes, and audiences. (NGA & CCSSO, 2010a)
In effect, then, the teacher uses the score on one test to represent a student’s standing on five separate standards. Such an approach gives the impression that teachers are addressing standards, but in reality it constitutes a record-keeping convention that wastes teachers’ time and renders the standards inconsequential. In fact, we believe that this approach is the antithesis of using standards meaningfully.
Relying on Sampling Across Standards
At first glance, it might appear that designing assessments that sample content from multiple standards solves the problem of too much content. If a teacher has seventy-three standards statements to cover in a year, he or she can design assessments that include items from multiple statements. One assessment might have items from three or more statements. If a teacher systematically samples across the standards in such a way as to emphasize all topics equally, then in the aggregate, the test scores for a particular student should paint an accurate picture of the student’s standing within the subject area. This is different from and better than tagging because the teacher designs assessments by starting with the standards. With tagging, the teacher designs assessments and then looks for standards that appear to be related.
Even though sampling has an intuitive logic to it, it still doesn’t work well with classroom assessments. Indeed, sampling was designed for large-scale assessments, but even there it doesn’t work very well. To illustrate, consider the following example:
You are tasked with creating a test of science knowledge and skills for grade 5 students. The school will report test results at both the individual and school levels to help students, parents, teachers, and leaders understand how well students are learning the curriculum. The test must address a variety of topics such as X, Y, Z and, in order to effectively assess their knowledge, many of the items require students to construct and justify responses. Some of the items are multiple choice.
Pilot testing of items indicates that students require about 10 minutes to complete a constructed response item and about two minutes to complete a multiple-choice item. Your team has created 32 constructed response items and 16 multiple choice items that you feel cover all topics in the grade 5 science curriculum. Based on your estimates of how much time a student needs to complete items, the test will require approximately 6 hours to complete, not including time for set up and instructions, and breaks. And that’s just one content area test. (Childs & Jaciw, 2003, p. 8)
We can infer from the comments of Ruth A. Childs and Andrew P. Jaciw (2003) that adequate sampling, even for three topics, requires a very long assessment. As a side note, Childs and Jaciw (2003) imply that fifth-grade science involves three topics only (for example, X, Y, and Z). In fact, Simms (2016) has determined that fifth-grade science involves at least twelve topics, four times the amount of content that Childs and Jaciw’s (2003) example implies.
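The six-hour figure in the Childs and Jaciw example follows directly from the item counts and per-item timings they report. The sketch below reproduces that calculation; the constant names are illustrative. Like the quoted passage, it excludes setup, instructions, and breaks.

```python
# Item counts and per-item timings from the Childs and Jaciw (2003) example.
CONSTRUCTED_RESPONSE_ITEMS = 32
MINUTES_PER_CONSTRUCTED_RESPONSE = 10
MULTIPLE_CHOICE_ITEMS = 16
MINUTES_PER_MULTIPLE_CHOICE = 2

# Total working time, excluding setup, instructions, and breaks.
total_minutes = (
    CONSTRUCTED_RESPONSE_ITEMS * MINUTES_PER_CONSTRUCTED_RESPONSE
    + MULTIPLE_CHOICE_ITEMS * MINUTES_PER_MULTIPLE_CHOICE
)
total_hours = total_minutes / 60

print(f"{total_minutes} minutes, about {total_hours:.1f} hours")
```

The constructed-response items account for 320 of the 352 minutes, so the length of the test is driven almost entirely by the items that require students to construct and justify responses.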
Finally, even with a relatively slim version of the content involved in fifth-grade science (three topics as opposed to twelve), and a test that requires six hours to complete, the sampling process might not be robust enough to justify reporting scores for individual students. Childs and Jaciw (2003) describe the following concern for any test that purports to provide accurate scores for individual students:
Whether there is enough information at the student level to report subscores may be a concern. For example, research by Gao, Shavelson, and Baxter (1994) suggests that each student must answer at least nine or ten performance tasks to avoid very large effects of person-by-item interactions. To produce reliable subscores, even more items may have to be administered. Given that there are limits in test administration time, it may not be feasible to administer enough items to support student-level subscores. Instead, only overall scores might be reported at the student level, while both overall scores and subscores are reported at the school level. (p. 8)
Despite these clear flaws in sampling procedures as the basis for test design, educators do it all the time. Everyone in the system (students, teachers, leaders, parents) relies on the resulting information to make important decisions that influence student grades, placement in classes and coursework, and advancement to the next grade or course.
As we mention in the introduction, using proficiency scales solves a variety of assessment problems, sampling being one of those. In a system that uses proficiency scales as a measurement tool, one might lose the ability to generalize across a content area using a single test but gain immense clarity in particular slices of the target domain (for example, fifth-grade science).
The Assessment Focus of Standards
Clearly, standards statements as currently written are not effective vehicles for driving classroom assessment. We recommend that educators rewrite standards statements so they provide a clear and unequivocal focus for classroom assessments. We call these rewritten standards statements focus statements. Focus statements translate into measurement topics. As the name implies, these measurement topics are considered important enough to assess multiple times at the school level or district level in an effort to determine the most accurate scores for individual students. To illustrate, we present figure 1.3.
Source for standards: Adapted from McREL, 2014a.
Figure 1.3: Standards with focus statements.
The focus statements in figure 1.3 contain the essence of the content in the full standards statement with enough detail to provide guidance for assessment, but not so much as to add unnecessary complexity. As we demonstrate in chapter 2, proficiency scales add even more detail, but focus statements are a useful step in the process of identifying critical content. As we indicate in the last column of figure 1.3, once educators articulate focus statements, it is easy to translate them into measurement topics.
The wording of the focus statements in figure 1.3 highlights the type of knowledge they represent. Those that begin with the word knows or understands are examples of declarative knowledge. Those that begin with the word executes are examples of procedural knowledge. It is important to note that we did this to make a point—namely, that the content embedded in standards statements comes in two different forms—declarative and procedural knowledge. We believe this distinction is critical simply because assessments should reflect the type of knowledge on which they focus. We describe how to do this in chapter 3 (page 43).
For now, suffice it to say that different subject areas have differing proportions of declarative and procedural knowledge. To illustrate, we consider a McREL (2014b) study in which researchers analyze the standards in fourteen different subject areas and determine the distributions of declarative and procedural knowledge in those subject areas. They updated their analysis in 2008, producing the results we depict in table 1.1.
Table 1.1: Standards Relating to Percentages of Declarative Versus Procedural Knowledge
Subject | Declarative | Procedural |
Mathematics | 139 | 84 |
ELA | 86 | 254 |
Science | 253 | 8 |
History | 1,240 | 41 |
Geography | 230 | 8 |
Arts | 147 | 122 |
Civics | 426 | 1 |
Economics | 159 | 0 |
Foreign Language | 52 | 56 |
Health | 121 | 15 |
Physical Education | 47 | 58 |
Behavioral Studies | 100 | 0 |
Technology | 106 | 38 |
Life Skills | 67 | 241 |
Total | 3,173 (77.41 percent) | 926 (22.59 percent) |
Note: Procedural and contextual have been combined.
Source: McREL, 2014b.
Notice that, in general, there are far more declarative standards than procedural standards. More specifically, 77 percent of the standards in this study involve declarative content, and 23 percent involve procedural content. That noted, there is some significant variation from subject area to subject area. For example, physical education, arts, and foreign language are about equal in terms of their distribution of declarative and procedural content; ELA and life skills are predominantly procedural; and behavioral studies and economics have no procedural content.
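The totals and percentages in table 1.1 can be recomputed from the per-subject counts. The sketch below is one way to do so; the dictionary name is illustrative, and the counts are copied from the table as (declarative, procedural) pairs.

```python
# Counts from table 1.1: subject -> (declarative, procedural).
TABLE_1_1 = {
    "Mathematics": (139, 84),
    "ELA": (86, 254),
    "Science": (253, 8),
    "History": (1240, 41),
    "Geography": (230, 8),
    "Arts": (147, 122),
    "Civics": (426, 1),
    "Economics": (159, 0),
    "Foreign Language": (52, 56),
    "Health": (121, 15),
    "Physical Education": (47, 58),
    "Behavioral Studies": (100, 0),
    "Technology": (106, 38),
    "Life Skills": (67, 241),
}

declarative_total = sum(d for d, _ in TABLE_1_1.values())
procedural_total = sum(p for _, p in TABLE_1_1.values())
grand_total = declarative_total + procedural_total

declarative_pct = round(100 * declarative_total / grand_total, 2)
procedural_pct = round(100 * procedural_total / grand_total, 2)
print(declarative_total, declarative_pct)
print(procedural_total, procedural_pct)

# Per-subject procedural share, to show the variation across subject areas.
for subject, (d, p) in TABLE_1_1.items():
    print(f"{subject:20s} {100 * p / (d + p):5.1f}% procedural")
```

The recomputed totals match the table's final row. The per-subject shares show how widely individual subjects depart from the overall split.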
A Small List of Measurement Topics
Ultimately, the purpose of analyzing and restating standards is to identify a relatively small set of measurement topics as the subject of classroom assessment and instruction. This list constitutes the assessment-friendly curriculum that is essential for a new paradigm of classroom assessment.
Developing the assessment-friendly curriculum is somewhat of a value-driven decision, but research guidance does exist. Specifically, Simms (2016) finds that if one removes the redundancy in standards and considers only those that national assessments typically contain, then the list of essential measurement topics is quite small. Table 1.2 reports the number of essential topics in mathematics, science, and ELA.
Table 1.2: Essential Topics in Mathematics, Science, and ELA
Source: Simms, 2016.
The list of essential measurement topics is available in The Critical Concepts (Simms, 2016). Educators can use this list as a starting place as they translate their local or state standards into measurement topics. As we indicate in table 1.2, there are a relatively small number of measurement topics at each grade level or grade-level span. For example, consider fifth grade. There are fourteen essential measurement topics in mathematics, ten in science, and fifteen in ELA. Contrast this with the seventy-three standards statements from the Common Core State Standards for ELA at the eighth-grade level, which we discussed previously.
Narrowing down all the content in state and national standards into a small set means that the measurement topics will not include all content. What, then, do we do with all this leftover content? There are two basic approaches to answering this question: (1) relying on incidental learning and (2) creating a supplemental measurement topic.
Relying on Incidental Learning
Incidental learning is a largely untapped resource that teachers can leverage to enhance content coverage in a classroom that focuses on standards. To understand how this works, assume that a teacher is working with the following ten measurement topics in fourth-grade science.
1. Energy
2. Motion
3. Light and vision
4. Waves
5. Geographic features
6. Earth changes
7. Natural hazards
8. Natural resources
9. Plant needs
10. Animal needs
As we have discussed, national, provincial, state, and local standards documents would surely contain many other topics like Earth’s history, human impacts on resource use, and scientific contributions throughout history. Even though the measurement topics do not specifically include these topics, the teacher might integrate the content into instruction formally or informally. Formally means that the teacher actually plans for direct instruction in the supplemental content. Informally means that the teacher does not plan for direct instruction in the content but addresses it if it comes up naturally during class. For example, while discussing the topic of different methods of energy production, which is part of the measurement topic energy, the teacher might remember that the supplemental topic of human impact on resource use also applies to the example he or she is providing. This approach inserts additional content into the instructional process but doesn’t necessarily assess that content.
Relying on incidental learning counters the common misconception that if a teacher doesn’t test on it, then students don’t learn it. While it is true that students stand a better chance of remembering content they’ve taken tests on, it is also true that even brief exposure to content gives them a good chance of remembering it. Some educators refer to this as fast mapping (Carey, 1978).
Creating a Supplemental Measurement Topic