the merit of merit pay

Big new study finds that performance bonuses for teachers boost test scores (a bit)

PHOTO: Megan Witucki
Megan Witucki, a teacher at Compass Montessori School in Wheat Ridge, works with students.

A school district leader with money for teacher bonuses faces a choice: Should she spread the money around to all teachers equally or give more to teachers who have performed best?

A new study, released by the federal government, suggests that merit-based bonuses are the way to go, as they help raise student test scores without making a significant dent in teacher morale. It offers the latest evidence that programs of this sort can help schools and students, despite the common perception that they are ineffective.

The research focuses on a federal program known as the Teacher Incentive Fund, and compares schools that gave all teachers an automatic 1 percent bonus to those schools that gave bonuses based on classroom observations and student test scores. About 130 schools across 10 districts were randomly assigned to one of the groups.

The result? The schools that gave performance bonuses boosted student test scores throughout the four years of the study, between 2011 and 2015.

The differences between the two groups of schools weren’t big, and they were only sometimes statistically significant. For instance, in year three of the study, performance pay increased student test score performance by 2 percentile points.

A graph showing the share of teachers getting different size bonuses in districts that implemented performance pay.

But because the program was relatively cheap to implement, performance bonuses were very cost effective. They offered a better bang for the buck than class-size reductions, the researchers estimated.

And what about fears that performance pay would dampen teacher morale or reduce collaboration as teachers competed with each other for a fixed pool of bonus money? There was little evidence of that. According to teacher surveys, the program had only a limited impact on their job satisfaction, interactions with colleagues, or school morale. In the initial year of performance pay, there seemed to be small dips in morale and collaboration, but in year three the effects were actually positive.

The program also led to small increases in teacher retention, which may help explain the positive findings.

The study comes with an important caveat: It relies on test scores alone as a measure of learning. If the incentives pushed teachers to raise scores by less desirable means, like cheating or “teaching to the test,” that wouldn’t say much about the success of the policy.

The research adds to the hotly contested debate about how teachers should be paid. High-profile research in New York City and Tennessee from several years ago found disappointing results for merit pay, solidifying a conventional wisdom that the approach doesn’t work for teachers. Other districts, like Denver, have struggled with the design of a performance pay program and have seen inconsistent results.

But some recent studies have re-opened this debate. An overview of research last year showed that performance pay leads to small boosts in test scores, and a number of studies have found that added pay can help keep more effective teachers in the classroom.

It’s not entirely clear why the studies have reached such different conclusions. But performance-pay programs that judge teachers by test scores alone and only measure short-term effects tend to be disappointing. Programs like the Teacher Incentive Fund and those in Minnesota and Washington, DC, which use performance pay as part of a broader system for evaluating teachers, often produce more positive results.

That doesn’t mean that districts will heed the latest research. In fact, even in the new federal study — which found benefits for students — less than half the districts said they planned to continue offering performance bonuses after the Department of Education grant ran out.

testing 1-2-3

Tennessee students to test the test under reworked computer platform

About 45,000 high school students in a third of Tennessee districts will log on Tuesday for a 40-minute simulation to make sure the state’s testing company has worked the bugs out of its online platform.

That platform, called Nextera, was rife with glitches last spring, disrupting days of testing and mostly disqualifying the results from the state’s accountability systems for students, teachers, and schools.

This week’s simulation is designed to make sure those technical problems don’t happen again under Questar, which in June will finish out its contract to administer the state’s TNReady assessment.

Tuesday’s trial run will begin at 8:30 a.m. in participating high schools statewide to simulate testing scheduled for Nov. 26-Dec. 14, when some students will take their TNReady exams. Another simulation is planned before spring testing begins in April on a much larger scale.

The simulation is expected to involve far more than the 30,000 students who will test in real life after Thanksgiving. It also will take into account that Tennessee is split into two time zones.

“We’re looking at a true simulation,” said Education Commissioner Candice McQueen, noting that students on Eastern Time will be submitting their trial test forms while students on Central Time are logging on to their computers and tablets.

The goal is to verify that Questar, which has struggled to deliver a clean TNReady administration the last two years, has fixed the online problems that caused headaches for students who tried unsuccessfully to log on or submit their end-of-course tests.


The two primary culprits were functions that Questar added after a successful administration of TNReady last fall but before spring testing began in April: 1) a text-to-speech tool that enabled students with special needs to receive audible instructions; and 2) coupling the test’s login system with a new system for teachers to build practice tests.

Because Questar made the changes without conferring with the state, the company breached its contract and was docked $2.5 million out of its $30 million agreement.

“At the end of the day, this is about vendor execution,” McQueen told members of the State Board of Education last week. “We feel like there was a readiness on the part of the department and the districts … but our vendor execution was poor.”

PHOTO: TN.gov
Education Commissioner Candice McQueen

She added: “That’s why we’re taking extra precautions to verify in real time, before the testing window, that things have actually been accomplished.”

By the year’s end, Tennessee plans to request proposals from other companies to take over its testing program beginning in the fall of 2019, with a contract likely to be awarded in April.

The administration of outgoing Gov. Bill Haslam has kept both of Tennessee’s top gubernatorial candidates — Democrat Karl Dean and Republican Bill Lee — in the loop about the process. Officials say they want to avoid the pitfalls that happened as the state raced to find a new vendor in 2014 after the legislature pulled the plug on participating in a multi-state testing consortium known as PARCC.


“We feel like, during the first RFP process, there was lots of content expertise, meaning people who understood math and English language arts,” McQueen said. “But the need to have folks that understand assessment deeply as well as the technical side of assessment was potentially missing.”

Academic Accountability

Coming soon: Not one, but two ratings for every Chicago school

Starting this month, Chicago schools will have to juggle two ratings — one from the school district, and another from the state.

The Illinois State Board of Education is scheduled to release its annual report cards for schools across the state on October 31. This year, for the first time, each school will receive one of four quality stamps from the state: an “exemplary” or “commendable” rating signals that the school is meeting standards, while an “underperforming” or “lowest performing” designation could trigger intervention, according to state board of education spokeswoman Jackie Matthews.

A federal accountability law, the Every Student Succeeds Act, requires these new ratings.

To complicate matters, the city and state ratings are each based on different underlying metrics and even a different set of standardized tests. The state ratings, for example, are based on a modified version of the PARCC assessment, while Chicago ratings are based largely on the NWEA. The new state ratings, like those the school district issues, can be given out without observers ever having visited a classroom, which is why critics argue that the approach lacks the qualitative metrics necessary to assess the learning, teaching, and leadership at individual schools.

Patricia Brekke, principal at Back of the Yards College Preparatory High School, said she’s still waiting to see how the ratings will be used, “and how that matters for us,” but that parents at her school aren’t necessarily focused on what the state says.

“What our parents usually want to know is what [Chicago Public Schools] says about us, and how we’re doing in comparison to other schools nearby that their children are interested in,” she said.

Educators at Chicago Public Schools understand the power of school quality ratings. The district already has its own five-tiered rating system: Level 1+ and Level 1 designate the highest-performing schools, Level 2+ and Level 2 describe average and below-average performing schools, respectively, and Level 3, the lowest performance rating, is for schools in need of “intensive intervention.” The ratings help parents decide where to enroll their children, and are supposed to signal to the district when a school needs more support. But the ratings are also a source of angst, used to justify replacing school leaders, closing schools, or opening new schools in neighborhoods where options are deemed inadequate.

In contrast, the state’s school quality designations actually target underperforming and lowest-performing schools with additional federal funding and support with the goal of improving student outcomes. Matthews said schools will work with “school support managers” from the state to do a self-inquiry and identify areas for improvement. She described Chicago’s school quality rating system as “a local dashboard that they have developed to communicate with their communities.”

Staff from the Illinois State Board of Education will be traveling around the state next week to meet with district leaders and principals to discuss the new accountability system, including the ratings. They’ll be in Bloomington, Marion, O’Fallon, Chicago, and Melrose Park. The Chicago meeting is Wednesday, Oct. 24, at 5 p.m. at Chicago Public Schools headquarters.

Rae Clementz, director of assessment and accountability at the state board, said that a second set of ratings reveals “that there are multiple valid ways to look at school quality and success; it’s a richer picture.”

Under the auspices of the Every Student Succeeds Act, the state school report cards released at the end of the month for elementary schools are based 75 percent on academics, including English language arts and math test scores, English learner progress as measured by the ACCESS test, and academic growth. The other 25 percent reflects school climate and success, such as attendance and chronic absenteeism.

Other measures are slated to be phased in over the next several years, including academic indicators like science proficiency and school quality indicators, such as school climate surveys of staff, students, and parents.

High school designations take a similar approach with English and math test scores but take into account graduation rates instead of academic growth. They also include the percentage of 9th graders on track to graduate, meaning freshmen who earn 10 semester credits and no more than one semester F in a core course.

Critics of Chicago’s school rating system argue that the ratings correlate more with socioeconomic status and race than they do with school quality, and say little about what’s happening in classrooms and how kids are learning. Chicago does try to mitigate these issues by placing greater emphasis on growth in test scores than on absolute attainment, by using school climate surveys, and by including academic growth for priority groups, such as African-American and Latino students, English language learners, and students in special education.

Cory Koedel, a professor of economics and public policy at the University of Missouri, said that many rating systems basically capture poverty status by focusing on how high or low students score on tests. Chicago’s approach, he said, is fairer than that of many other school systems.

“What I like about this is it does seem to have a high weight on growth and lower weight on attainment levels,” he said.

Morgan Polikoff, a professor at University of Southern California’s school of education, said that Chicago’s emphasis on student growth is a good thing “if the purpose of the system is to identify schools doing a good job educating kids.”

Chicago bases 50 percent of its rating on growth, while Polikoff said he has seen weights of 35 percent to as low as 15 percent in other districts. But he said the school district’s reliance on the NWEA test, rather than the PARCC test used in the state school ratings, was atypical.

“It’s not a state test, and though they say it aligns with standards, I know from talking to educators that a lot of them feel the tests are not well aligned with what they are supposed to be teaching,” he said. “It’s just a little odd to me they would have state assessment data, which is what they are held accountable for with the state, but use the other data.”

He’s skeptical about school systems relying too heavily on standardized test scores, whether the SAT, PARCC, or NWEA, because “you worry that now you’re just turning the curriculum to test prep, and that’s an incentive you don’t want to create for educators.”

He said the high school measures in particular cover a wide array, including some that follow students into college, “so I love that.”

“I really like the idea of broadening the set of indicators on which we evaluate schools and encouraging schools to really pay attention to how well they prepare students for what comes next,” he said.