First Person

Rigor Mortis And Measurement Error In New Evaluations

The word rigor comes up a lot in teacher-evaluation systems. It’s akin to motherhood, apple pie and the American flag. What policymaker is going to take a stand against rigor? But the term is getting distorted almost beyond recognition.

In science, a rigorous study is one in which the scientific claims are supported by the evidence. Scientific rigor is primarily determined by the study’s design and data-analysis methods. It has nothing to do with the substance of the scientific claims. A study that concludes that an educational program or intervention is ineffective, for example, is not inherently more rigorous than one that concludes that a program works.

In the current discourse on teacher-evaluation systems, however, an evaluation system is deemed rigorous based either on how much of the evaluation rests on direct measures of student-learning outcomes, or the distribution of teachers into the various rating categories, or both. If an evaluation system relies heavily on No Child Left Behind-style state standardized tests in reading and mathematics — say, 40 percent of the overall evaluation or more — its proponents are likely to describe it as rigorous. Similarly, if an evaluation system has four performance categories — e.g., ineffective, developing, effective and highly effective — a system that classifies very few teachers as highly effective and many teachers as ineffective may be labeled rigorous.

In these instances, the word rigor obscures the subjectivity involved in the final composite rating assigned to teachers. The fraction of the overall evaluation based on student-learning outcomes is wholly a matter of judgment; and if you believe, as I do, that a teacher’s responsibility for advancing student learning extends well beyond the content that appears on standardized tests, you could conceivably argue that increasing the weight given to standardized tests in teacher evaluations makes these evaluations less rigorous. This is, however, a hard sell in the absence of other concrete measures of student-learning outcomes that could supplement the standardized-test results.

Even more importantly, describing a teacher-evaluation system as rigorous hides the fact that the criteria for assigning teachers to performance categories — either for subcomponents or for the overall composite evaluation — are arbitrary. There’s no scientific basis for saying, as New York has, that of the 20 points out of 100 allocated for student “growth” on New York’s state tests, a teacher needs to receive 18 to be rated “highly effective,” or that a teacher receiving 3 to 8 points will be classified as “developing.” In fact, the cut-off separating “developing” from “effective” changed last week as a result of an agreement reached between the New York State Education Department and the state teachers’ union — not because of science, mind you, but because of politics.

And it’s politics, and politics alone, that accounts for the fact that the rules for the overall composite evaluation say that any teacher who scores 0 to 64 points will be classified as ineffective, and that the two subcomponents for student “growth” and local assessments, each of which counts for 20 points, classify teachers who score 0 to 2 points on each component as ineffective. This means, as Long Island principal Carol Burris and others have pointed out, that if a teacher is classified as ineffective on both of these subcomponents, that teacher is automatically rated ineffective overall, even if that teacher is rated highly effective on the 60 points allocated for measures of a teacher’s professional practices. Such a teacher can earn at most 2 + 2 = 4 of the 40 test-based points, so even a perfect 60 on professional practice yields a composite of no more than 64, which still falls in the ineffective range. It certainly seems odd that two components accounting for 40 percent of a teacher’s overall rating can trump the remaining 60 percent, but this isn’t science, it’s politics.

Other states face the same challenge in assigning teachers’ value-added scores or student growth percentile scores to performance categories, and most of them have punted, issuing regulations that defer these difficult decisions until later. Illinois says that it’s “working diligently” on this. Georgia claims that its model will be identified soon. Michigan is counting on a rating system to be developed by the Governor’s Council on Educator Effectiveness. After a year of debate, Delaware concluded that it couldn’t figure out how to use students’ scores on the state assessment system in teachers’ summative ratings for the 2011-2012 school year, and deferred implementation to a later year.

It violates a basic principle of fairness for teachers to be held accountable for performance criteria that aren’t clearly specified in advance and that may be unattainable. These states, and many others, have their work cut out for them.

Nowhere is this more evident than with the mapping of teachers’ value-added or student growth percentile scores onto the ratings composing a teacher’s summative evaluation. The value-added or student growth percentile scores are measured with errors that can be substantial, especially when they are based on a single year’s worth of student achievement data. But the scoring bands for ratings categories such as “developing” or “effective” have strict cut-offs. What to do?

One way of reclaiming the concept of rigor in teacher-evaluation systems is to assign ratings that take into account the uncertainty or errors in the measures. This is consistent with a scientific conception of rigor: the assignment of teachers to rating categories should be consistent with the quality of the evidence for doing so. A teacher shouldn’t be assigned a rating of “ineffective” based on a value-added score, for example, if there’s a substantial probability that the teacher’s true rating is “developing.”

So here’s a challenge, and a proposal. The challenge is to state education policymakers across the country who have hitched their teacher-evaluation systems to measures that seek to isolate teachers’ contributions to their students’ learning: Develop clear and consistent guidelines for assigning teachers to rating categories that take into account the inherent uncertainty and errors in the value-added measures and their variants.

And here’s the proposal: A teacher should be assigned to the lower of two adjacent rating categories only if there is at least 90 percent confidence that the teacher is not in the higher category. Operationally, this involves a statistical test based on a cut score, a teacher’s score and the error associated with that score.
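The test can be written out as a short formula. This is a sketch under an assumption the post does not spell out: that a teacher’s true score is approximately normally distributed around the observed score x, with standard deviation equal to the reported standard error s. With c the cut score and Φ the standard normal cumulative distribution function:

```latex
\begin{aligned}
\Pr(\text{true score} > c) &\approx 1 - \Phi\!\left(\frac{c - x}{s}\right) \\
\text{assign the lower category} &\iff \Phi\!\left(\frac{c - x}{s}\right) \ge 0.90
  \iff x \le c - 1.2816\,s
\end{aligned}
```

The constant 1.2816 is the 90th percentile of the standard normal distribution. With a cut-off at the 10th percentile and a standard error of two percentile points, for instance, a teacher would be placed in the lower category only with a score at or below roughly 7.44.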

Suppose, for example, that the cut-off separating “ineffective” from “developing” is the 10th percentile statewide on a value-added or student growth percentile measure. Teacher A’s percentile rating is the eighth percentile, but the standard error for her rating is two percentile points, so her score sits one standard error below the cut-off. Assuming the errors are approximately normally distributed, there is a 16 percent probability that Teacher A’s true percentile rating is greater than the 10th percentile, and an 84 percent probability that it is lower. Because 84 percent falls short of the 90 percent standard, my proposal would classify Teacher A as developing, not ineffective.

Conversely, Teacher B’s percentile rating is the fourth percentile, and the standard error for her rating is three percentile points, so her score sits two standard errors below the cut-off. Given that uncertainty, there is only a 2 percent probability that Teacher B’s true percentile rating is above the 10th percentile, and a 98 percent probability that it is lower. Because 98 percent exceeds the 90 percent standard, Teacher B would be classified as ineffective.
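The two worked examples can be reproduced with a few lines of Python. This is a minimal sketch, not an official scoring procedure. The function names (`prob_above_cut`, `classify`) are hypothetical. As in the examples, the sketch assumes the teacher’s true score is normally distributed around the observed score, with the reported standard error as its standard deviation. It uses only `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist


def prob_above_cut(score: float, cut: float, se: float) -> float:
    """Probability that the true score lies above the cut score.

    Assumption (not stated by any state's rules): the true score is
    Normal(mean=score, sd=se).
    """
    return 1.0 - NormalDist(mu=score, sigma=se).cdf(cut)


def classify(score: float, cut: float, se: float,
             lower: str = "ineffective", higher: str = "developing",
             confidence: float = 0.90) -> str:
    """Assign the lower of two adjacent categories only if we are at least
    `confidence` sure that the true score is NOT above the cut."""
    prob_not_higher = 1.0 - prob_above_cut(score, cut, se)
    return lower if prob_not_higher >= confidence else higher


if __name__ == "__main__":
    # Teacher A: 8th percentile, SE 2; Teacher B: 4th percentile, SE 3; cut = 10th percentile.
    for name, score, se in [("Teacher A", 8, 2), ("Teacher B", 4, 3)]:
        p = prob_above_cut(score, cut=10, se=se)
        print(f"{name}: P(true > 10th) = {p:.0%} -> {classify(score, 10, se)}")
    # Teacher A: P(true > 10th) = 16% -> developing
    # Teacher B: P(true > 10th) = 2% -> ineffective
```

The cut score, the category labels and the confidence level are parameters, so the same rule can be applied to any pair of adjacent categories. Setting `confidence` to 0.95 would mirror the stricter 5 percent convention discussed in the next paragraph.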

Other approaches are certainly viable. The 90 percent confidence threshold is arbitrary, but it seems sensible to me. In most educational, social and medical research, a common standard is to trust an observed effect only if an effect that large would occur by chance less than 5 percent of the time when there is no true effect in the population. The 90 percent standard I’m proposing is slightly more lenient. And of course this approach doesn’t address the arbitrariness of the New York cut scores described above.

If policymakers aren’t willing to take measurement error into account in a defensible way in teacher-evaluation systems, don’t talk to me about rigor — rigor is dead.

This post also appears on Eye on Education, Aaron Pallas’s Hechinger Report blog.

First Person

I’ve been mistaken for the other black male leader at my charter network. Let’s talk about it.

PHOTO: Alan Petersime

I was recently invited to a reunion for folks who had worked at the New York City Department of Education under Mayor Michael Bloomberg. It was a privilege for me to have been part of that work, and it was a privilege for me to be in that room reflecting on our legacy.

The counterweight is that only four people in the room were black males. Two were waiters, and I was one of the remaining two. There were definitely more than two black men who were part of the work that took place in New York City during that era, but it was still striking how few were present.

The event pushed me to reflect again on the jarring impact of the power dynamics that determine who gets to make decisions in so-called education reform. The privileged end up being relatively few, and even fewer look like the kids we serve.

I’m now the chief operating officer at YES Prep, a charter school network in Houston. When I arrived at YES four years ago, I had been warned that it was a good old boys club. Specifically, that it was a good old white boys club. It was something I assessed in taking the role: Would my voice be heard? Would I truly have a seat at the table? Would I have any influence?

As a man born into this world with a black father and white mother, I struggled at an early age with questions about identity and have been asking those questions ever since.

As I became an adult, I came to understand that being from the suburbs, going to good schools, and being a lighter-skinned black person affords me greater access to many settings in America. At the same time, I experience my life as a black man.

Jeremy Beard, head of schools at YES, started the same day I did. It was the first time YES had black men at the leadership table of the organization. The running joke was that people kept mistaking Jeremy and me for each other. We all laughed about it, but it revealed some deeper issues that had pervaded YES for some time.

“Remember when you led that tour in the Rio Grande Valley to see schools?” a board member asked me about three months into my tenure.

“That wasn’t me,” I replied. I knew he meant Jeremy, who had worked at IDEA in the Valley. At that time, I had never been to the Valley and didn’t even know where it was on the map.

“Yes, it was,” he insisted.

“I’ve never been to the Valley. It wasn’t me. I think you mean Jeremy.”

“No, it was you, don’t you remember?” he continued, pleading with me to recall something that never happened.

“It wasn’t me.”

He stopped, thought about it, confused, and uttered, “Huh.”

It is difficult for me to assign intent here, and this dynamic does not describe every board member. That particular person may have truly been confused about my identity. And sure, two black men may have a similar skin tone, and we may both work at YES. But my life experience suggests something else was at play. It reminds me that while I have the privilege of sitting at the table with our board, they, as board members, have the privilege of not having to know who I am, or that Jeremy and I are different black dudes.

It would be easy to just chalk this all up to racial politics in America and accept it as status quo, but I believe we can change the conversation on privilege and race by having more conversations on privilege and race. We can change the dynamics of the game by continuing to build awareness of diversity, equity, and inclusion. We can also advocate to change who has seats at the table and whose voices will be heard.

I remain hopeful thanks to the changes I have witnessed during my time at YES. The board has been intentional in its efforts to address its own privilege, and is actively working to become more diverse and inclusive.

Personally, I have worked to ensure there are more people of color with seats at the table by mentoring future leaders of color at YES Prep and other black men in this work. Jeremy and I also created Brothers on Books, a book club for black men at YES to find mentorship and fellowship. Through this book club, we can create a safe space to have candid discussions based on literature we read and explore what it means to be black men at YES.

When I think about privilege, I am torn between the privilege that has been afforded to me and the jarring power dynamics that determine who gets to have conversations and make decisions in so-called education reform. White people are afforded more voices and seats at the table, making decisions that primarily impact children of color.

It is not lost on me that it is my own privilege that affords me access to a seat at the table. My hope is that by using my role, my voice and my privilege, I can open up dialogue, hearts, minds, opinions, and perceptions. I hope that readers are similarly encouraged to assess their own privileges and determine how they can create positive change.

Recy Benjamin Dunn is YES Prep’s chief operating officer, overseeing operations, district partnerships, and growth strategy for the charter school network. A version of this piece was first published on YES Prep’s blog.

First Person

I’m a Bronx teacher, and I see up close what we all lose when undocumented students live with uncertainty

The author at her school.

It was our high school’s first graduation ceremony. Students were laughing as they lined up in front of the auditorium, their families cheering them on as they entered. We were there to celebrate their accomplishments and their futures.

Next to each student’s name on the back of those 2013 graduation programs was the college the student planned to attend in the fall. Two names, however, had noticeable blanks next to them.

But I was especially proud of these two students, whom I’ll call Sofia and Isabella. These young women started high school as English learners and were diagnosed with learning disabilities. Despite these obstacles, I have never seen two students work so hard.

By the time they graduated, they had two of the highest grade point averages in their class. It would have made sense for them to be college-bound. But neither would go to college. Because of their undocumented status, they did not qualify for financial aid, and, without aid, they could not afford it.

During this year’s State of the Union, I listened to President Trump’s nativist rhetoric and I thought of my students and the thousands of others in New York City who are undocumented. President Trump falsely portrayed them as gang members and killers. The truth is, they came to this country before they even understood politics and borders. They grew up in the U.S. They worked hard in school. Sofia and Isabella graduated with honors. They want to be doctors and teachers. Why won’t we let them?

Instead, as Trump works to repeal President Obama’s broader efforts to enfranchise these young people, their futures are plagued by uncertainty and fear. A Supreme Court move just last week means that young people enrolled in the Deferred Action for Childhood Arrivals program remain protected but in limbo.

While Trump and Congress continue to struggle to find compromise on immigration, we have a unique opportunity here in New York State to help Dreamers. Recently, Governor Cuomo proposed and the state Assembly passed New York’s DREAM Act, which would allow Sofia, Isabella, and their undocumented peers to access financial aid and pursue higher education on equal footing with their documented peers. Republicans in the New York State Senate, however, have refused to take up this bill, arguing that New York State has to prioritize the needs of American-born middle-class families.

This argument baffles me. In high school, Sofia worked hard to excel in math and science in order to become a radiologist. Isabella was so passionate about becoming a special education teacher that she spent her free periods volunteering with students with severe disabilities at the school co-located in our building.

These young people are Americans. True, they may not have been born here, but they have grown up here and seek to build their futures here. They are integral members of our communities.

By not passing the DREAM Act, it feels like lawmakers have decided that some of the young people who graduate from my school do not deserve the opportunity to achieve their dreams. I applaud the governor’s leadership, in partnership with the New York Assembly, to support Dreamers like Sofia and Isabella and I urge Senate Republicans to reconsider their opposition to the bill.

Today, Sofia and Isabella have been forced to find low-wage jobs, and our community and our state are the poorer for it.

Ilona Nanay is a 10th grade global history teacher and wellness coordinator at Mott Hall V in the Bronx. She is also a member of Educators for Excellence – New York.