First Person

Student growth percentiles and shoe leather

Editor’s note: This piece was submitted by Damian W. Betebenner, Richard J. Wenning, and Professor Derek C. Briggs. Thumbnail biographies of the three authors appear at the bottom of this article.

Bruce D. Baker recently published a critique of the Colorado Growth Model and its use of Student Growth Percentiles (SGPs) on his School Finance 101 blog (cross-posted on Education News Colorado). In his post, he mischaracterizes both the SGP methodology and the policy context. Having participated in creating the Colorado Growth Model and led the policy development associated with it, we thought it would be useful to address these misconceptions.

In our work over the past decade with more than two dozen State Education Agencies (SEAs) to develop models of student growth based upon state assessment results, one lesson we have learned repeatedly is that data, regardless of their quality, can be used well and can be used poorly. Unfortunately, Professor Baker conflates the data (i.e., the measure) with the use. A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles/SGPs) was to distinguish the measure from the use: to separate the description of student progress (the SGP) from the attribution of responsibility for that progress.

There is a continuum of opinion about how large-scale assessment data and derived quantities can be used in accountability systems. At one extreme are those who believe large-scale assessment results are the ONLY “objective” indicator and thus any judgment about educator/education quality should be based on such measures. At the other extreme are those who hold that any use of large-scale assessment data is an abuse.

Our experience in discussing these issues in numerous contexts with stakeholders ranging from parents to policy makers, students to superintendents, is that they fall in between these two extremes. We believe that the results of large-scale assessments, particularly when examined in a longitudinal fashion, can yield numerous insights (some quite profound) about the manner in which the education system is functioning.

Not all growth models are value-added models

In work with the Colorado Department of Education and numerous other SEAs, we clearly state that all growth models (including the Colorado Growth Model) can be turned into a value-added model (VAM). A VAM is a type of growth model, but not all growth models are necessarily VAMs. We propose that a VAM is, in fact, constituted by its use, not by any particular statistical model specification. A simple gain score model, for example, is often used as an example (usually a bad example) of a value-added model. Other examples abound in the literature (see, for example, McCaffrey, Han, & Lockwood, 2008).

After deriving quantities of individual growth, it is natural (and responsible) to ask whether there are contexts or curricular programs where students demonstrate higher or lower rates of growth, on average, than others. This is where investigations of growth start to become investigations of value-added. Because we believe that “value-added” is a hypothesis to be tested (Ho, 2011) and not a quantity derived from a model, the challenge in Colorado and other states we work with is to develop indicator systems that facilitate the investigation of what programs, districts, schools, teachers, and contexts promote (and fail to promote) the greatest growth amongst students in the state.

Furthermore, the challenge is to go beyond traditional VAM approaches focused on attributing responsibility and to use student growth to investigate progress toward career and college readiness, as well as issues of equal educational opportunity, through the examination of growth gaps between demographic and other student subgroups of interest.

The causal nature of the questions together with the observational nature of the data makes the use of large-scale assessment data difficult “detective work”. Indeed, good detective work requires shoe leather, looking at multiple sources of evidence, particularly as stakes become high, to ensure that conclusions about responsibility are warranted. We believe that the education system as a whole can benefit from such scrupulous detective work, particularly when all stakeholders hold a seat at the table and are collectively engaged in these efforts to develop and maintain an education system geared toward maximizing the academic progress of all students.

Test scores cannot be the sole determinant

To be clear about our own opinions on the subject: The results of large-scale assessments should never be used as the sole determinant of education/educator quality.

No state or district that we work with intends them to be used in such a fashion. That, however, does not mean that these data cannot be part of a larger body of evidence collected to examine education/educator quality. The dichotomy of appropriate/inappropriate does not and should not lead to an all-or-nothing dichotomy of data use. The challenge is to enable appropriate and beneficial uses while minimizing those that are inappropriate and detrimental.

Despite Professor Baker’s criticism of VAM/SGP models for teacher evaluation, he appears to hold out more hope than we do that statistical models can precisely parse the contribution of an individual teacher or school from the myriad of other factors that contribute to students’ achievement.

Numerous published writings by scholars on the subject over the past decade (see, for example, Raudenbush, 2004; Rubin, Stuart, & Zanutto, 2004; Braun, 2005; Lockwood, McCaffrey, Mariano, & Setodji, 2007; Linn, 2008; Rothstein, 2009, 2010; Betebenner & Linn, 2010; Briggs & Domingue, 2011) have taken issue with this presumption.

Professor Baker emphasizes this with SGPs:

Again, the whole point here is that it would be a leap, a massive freakin’ unwarranted leap to assume a causal relationship between SGP and school quality, if not building the SGP into a model that more precisely attempts to distill that causal relationship (if any). [Emphasis in original]

We would add that it is a similar “massive … leap” to assume that any VAM quantity, not just an SGP, represents the causal effect of a teacher or school. We concur with Rubin et al. (2004), who assert that quantities derived from these models are descriptive, not causal, measures. However, just because measures are descriptive does NOT imply that the quantities cannot and should not be used as part of a larger investigation of root causes.

There are a number of excellent papers and books published over the last two decades that lay out the use and abuse of regression techniques in the social sciences, particularly with regard to making unsubstantiated causal claims. David Freedman’s “Statistical Models and Shoe Leather” (1991) and Richard Berk’s “Regression Analysis: A Constructive Critique” (2003) are particularly good. Berk’s book, in fact, details the importance of using regression analyses descriptively as part of a larger program to identify root causes. This aligns with Linn’s (2008, p. 21) call for descriptive accountability:

“Accountability system results can have value without making causal inferences about school quality, solely from the results of student achievement measures and demographic characteristics. Treating the results as descriptive information and for identification of schools that require more intensive investigation of organizational and instructional process characteristics are potentially of considerable value. Rather than using the results of the accountability system as the sole determiner of sanctions for schools, they could be used to flag schools that need more intensive investigation to reach sound conclusions about needed improvements or judgments about quality.”

The development of the Student Growth Percentile methodology was guided by the admonition of Rubin et al. (2004) that VAM quantities are, at best, descriptive measures. Taking that admonition seriously, we are tasked with constructing the best and most useful description possible. Because we believe that the quality of a description is judged primarily by its utility, our goal in developing and using the SGP methodology is to maximize utility while maintaining the technical sophistication of a growth model that serves both norm- and criterion-referenced purposes (Betebenner, 2009). Given that all data, regardless of their quality, can be abused, the challenge is to produce an indicator system that maximizes the beneficial use cases of data.

We encourage the continued investigation of measures of student growth with the goal of producing indicator systems that address fundamental policy considerations and maximize utility without compromising technical quality. Comparisons between models (especially those utilizing the full achievement history of student scores) often produce results that are highly correlated (> 0.8), making determinations of which model is “best” difficult if not impossible to resolve using technical criteria alone. For example, comparisons of SGPs with value-added model results have high correlations (Briggs & Betebenner, 2009; Wright, 2010).

Claims of model “bias” that Professor Baker refers to are often difficult to disentangle because, as McCaffrey, Han, and Lockwood (2008) point out in their comprehensive comparison of VAM measures, there is no gold standard “teacher effect” or “school effect” against which to judge any of these measures. And scenarios in which performance on a growth/value-added measure differs by demographic subgroup do not necessarily imply “bias” any more than do scenarios in which achievement-level performance (e.g., percent at or above proficient) differs by demographic subgroup. On the contrary, such growth gaps can be indicative of unequal educational opportunity. The determination of model validity is complex, involving judgments that are both technical and practical. This reality, we believe, reaffirms the wisdom of Box’s (1987, p. 424) famous maxim: “All models are wrong, but some are useful.”

Returning to the opening point, our work is directed toward the use of large-scale assessment results as an evidence base to promote and help facilitate the difficult detective work associated with investigations of quality and effectiveness in an education system. Ultimately, we contend, the goal is to use what we learn to improve the education system for the benefit of all children. To that end, the validity of an accountability system is determined by the consequences that derive from it.

Assessment practices and systems of accountability are systemically valid if they generate useful information and constructive responses that support one or more policy goals (Access, Quality, Efficacy, Equity, and Efficiency) within an education system, without causing undue deterioration with respect to other goals. (Braun, 2008)

Large-scale assessment results are an important piece of evidence but are not sufficient to make causal claims about school or teacher quality. Black-and-white polemics about appropriate/inappropriate use of data often undercut valuable descriptions of the reality of a system in which large percentages of students are not receiving the education they deserve and we desire. Our goal is not to promote scapegoating for these unpalatable realities but to give stakeholders interpretable and actionable data that enable sound decision making, promote learning, and marshal a consensus for change.

Dr. Damian W. Betebenner is a Senior Associate with the National Center for the Improvement of Educational Assessment (NCIEA). Since joining the NCIEA in 2007, his work has centered exclusively on the research and development of student growth models for state accountability systems. He is the analytic architect of the student growth percentile (SGP) methodology developed in collaboration with the Colorado Department of Education as the Colorado Growth Model. 

Richard J. Wenning served until June 2011 as the Associate Commissioner of the Colorado Department of Education (CDE) and led CDE’s Office of Performance and Policy.  His responsibilities included public policy development and the design and implementation of Colorado’s educational accountability system, including the Colorado Growth Model.  

Professor Derek C. Briggs is chair of the Research and Evaluation Methodology Program at the University of Colorado at Boulder, where he also serves as an associate professor of quantitative methods and policy analysis. In general, his research agenda focuses upon building sound methodological approaches for the valid measurement and evaluation of growth in student achievement. His daily agenda is to challenge conventional wisdom and methodological chicanery as they manifest themselves in educational research, policy and practice.

  • Baker, B. D. (2011). Take your SGP and VAMit, Damn it!
  • Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4):42-51.
  • Betebenner, D. W. & Linn, R. L. (2010). Growth in student achievement: issues of measurement, longitudinal data analysis, and accountability. Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda: Center for K-12 Assessment and Performance
  • Berk, R. A. (2003). Regression Analysis: A Constructive Critique. Sage, Thousand Oaks, CA
  • Berk, R. A. & Freedman, D. A. (2003). Statistical assumptions as empirical commitments. In T. G. Blomberg and S. Cohen (eds.), Law, Punishment, and Social Control: Essays in Honor of Sheldon Messinger, 2nd ed. (2003), Aldine de Gruyter, pp. 235–
  • Box, G. E. P. & Draper, N. R. (1987). Empirical Model-Building and Response Surfaces, Wiley
  • Braun, H. I. (2008). Vicissitudes of the validators. Presentation made at the 2008 Reidy Interactive Lecture Series, Portsmouth, NH, September,
  • Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models. Technical report, Educational Testing Service, Princeton, New
  • Briggs, D. C. & Betebenner, D. (2009). Is Growth in Student Achievement Scale Dependent? Paper presented at the invited symposium “Measuring and Evaluating Changes in Student Achievement: A Conversation about Technical and Conceptual Issues” at the annual meeting of the National Council for Measurement in Education, San Diego, CA, April 14, 2009.
  • Briggs, D. & Domingue, B. (2011). Due Diligence and the Evaluation of Teachers: A review of the value-added analysis underlying the effectiveness rankings of Los Angeles Unified School District teachers by the Los Angeles Times. Boulder, CO: National Education Policy Center.
  • Freedman, D. (1991). Statistical Models and Shoe Leather. In P. V. Marsden (ed.), Sociological Methodology, Volume 21. Washington, D.C.: The American Sociological Association.
  • Ho, A. (2011). Supporting Growth Interpretations using Through Course Assessments. Center for K-12 Assessment and Performance Management at
  • Linn, R. L. (2008). Educational accountability systems. In The Future of Test Based Educational Accountability, pages 3–24. Taylor & Francis, New York.
  • Lockwood, J., McCaffrey, D., Mariano, L., & Setodji, C. (2007). Bayesian methods for scalable multivariate value-added assessment. Journal of Educational and Behavioral Statistics, 32, 125–150.
  • McCaffrey, D., Han, B., & Lockwood, J. (2008). From Data to Bonuses: A Case Study of the Issues Related to Awarding Teachers Pay on the Basis of Their Students’ Progress. National Center on Performance Incentives Working Paper …/McCaffrey_et_al_2008.pdf
  • McCaffrey, D., Lockwood, J., Koretz, D., Louis, T., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29, 67-101.
  • Raudenbush, S. (2004). Schooling, statistics, and poverty: Can we measure school improvement? (Technical report). Princeton, NJ: Educational Testing Service.

  • Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571.
  • Rothstein, J. (2010). Teacher Quality in Educational Production: Tracking, Decay, and Student Achievement. Quarterly Journal of Economics, 125(1), 175–214.
  • Rubin, D. B., Stuart, E. A., and Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1):103–116.
  • Wright, P. S. (2010). An Investigation of Two Nonparametric Regression Models for Value-Added Assessment in Education, White paper.


First Person

As historians and New York City educators, here’s what we hope teachers hear in the city’s new anti-bias training

New York City Schools Chancellor Richard Carranza and Mayor Bill de Blasio just committed $23 million over the next four years to support anti-bias education for the city’s teachers. After a year in which a white teacher stepped on a student during a lesson on slavery and white parents used blackface images in their PTA publicity, it’s a necessary first step.

But what exactly will the $23 million pay for? The devil is in the details.

As current and former New York City teachers, and as historians and educators working in the city today, we call for the education department to base its anti-bias program in an understanding of the history of racism in the nation and in this city. We also hope that the program recognizes and builds upon the work of the city’s anti-racist teachers.

Chancellor Carranza has promised that the program will emphasize training on “implicit bias” and “culturally responsive pedagogy.” These are valuable, but insufficient. Workshops on implicit bias may help educators evaluate and change split-second, yet consequential, decisions they make every day. They may help teachers interrogate, for example, what decisions lead to disproportionately high rates of suspension for black children as early as pre-K, or lower rates of referrals to gifted programs for black students by white teachers.

But U.S. racism is not only split-second and individual. It is centuries deep, collective, and institutional. Done poorly, implicit bias training might shift disproportionate blame for unequal educational resources and outcomes onto the shoulders of classroom teachers.

Anti-bias education should lead teachers not only to address racism as an individual matter, but to perceive and struggle against its institutional and structural forms. Structural racism shapes the lives of students, families, and communities, and the classrooms in which teachers work: whether teachers find sufficient resources in their classrooms, how segregated their schools are, how often their students are stopped by police, and how much wealth the families they serve hold. Without attending to the history that has created these inequities, anti-bias education might continue the long American tradition of pretending that racism rooted in capitalism and institutional power can be solved by adjusting individual attitudes and behaviors.

We have experienced teacher professional development that takes this approach. Before moving to New York, Adam taught in Portland, Oregon, and participated in several anti-bias trainings that presented racism as a problem to be solved through individual reflection and behaviors within the classroom. While many anti-racist teachers initially approached these meetings excited to discuss the larger forces that shape teaching students of color in the whitest city in America, they grew increasingly frustrated as they were encouraged to focus only on “what they could control.”

Similarly, at his very first professional development meeting as a first-year teacher of sixth grade in Harlem, Brian remembers being told by his principal that neither the conditions of students’ home lives nor conditions of the school in which he worked were within teachers’ power to change, and were therefore off-limits for discussion. The only thing he could control, the principal said, was his attitude towards his students.

But his students were extremely eager to talk about those conditions. For example, the process of gentrification in Harlem emerged repeatedly in classroom conversations. Even if teachers can’t immediately stop a process like gentrification, surely it is essential for both teachers and their students to learn to think about conditions they see around them as products of history — and therefore as something that can change.

While conversations about individual attitudes and classroom practices are important, they are insufficient to tackle racism. Particularly in one of the most segregated school districts in America, taking a historical perspective matters.

How do public school teachers understand the growth of racial and financial inequality in New York City? Consciously or otherwise, do they lean on tired but still powerful ideas that poverty reflects a failure of individual will, or a cultural deficit? Encountering the history of state-sponsored racism and inequality makes those ideas untenable.

Every New York City teacher should understand what a redlining map is. These maps helped the federal government subsidize mid-twentieth century white suburbanization while barring African American families from the suburbs and the wealth they helped generate. These maps helped shape the city, the metropolitan region, and its schools – including the wealth or poverty of students that teachers see in their classrooms. This is but one example of how history can help educators ground their understanding of their schools and students in fact rather than (often racist) mythology.

And how well do New York City educators know and teach the histories of the communities they serve? Those histories are rich sources of narratives about how New Yorkers have imagined their freedom and struggled for it, often by advocating for education. Every New York City teacher should know that the largest protest of the Civil Rights Movement took place not in Washington, D.C., not in the Deep South, but right here. On February 3, 1964, nearly half a million students stayed out of school and marched through the city’s streets, demanding desegregation and fully funded public schools. Every New York City teacher should know about Evelina Antonetty, a Puerto Rico-born, East Harlem-raised advocate who organized her fellow Bronx parents to press for some of the city’s first attempts at bilingual education and just treatment for language minority students in school.

Even if they don’t teach history or social studies, educators can see in the 1964 boycott and in Antonetty’s story prompts to approach parents as allies, to see communities as funds of knowledge and energy to connect to and build from. The chancellor’s initiative can be an opportunity to help teachers uncover and reflect on these histories.

Ansley first taught at a small high school in central Harlem, in a building that earlier housed Junior High School 136. J.H.S. 136 was one of three Harlem schools where in 1958 black parents protested segregation and inequality by withdrawing their children from school – risking imprisonment for violating truancy laws. The protest helped build momentum for later educational activism – and demonstrated black Harlem mothers’ deep commitment to securing powerful education for their children.

Although she taught in the same school – perhaps even the same classroom – where boycotting students had studied, Ansley didn’t know about this history until a few years after she left the school. Since learning about it, she has often reflected on the missed opportunities. How could the story of this “Harlem Nine” boycott have helped her students learn about their community’s history and interrogate the inequalities that still shaped their school? What could this story of parent activism have meant for how Ansley thought about and worked with her students’ parents?

Today, teaching future teachers, Ansley strives to convey the value of local and community history in her classes. One new teacher, now working in the Bronx, commented that her own learning about local history “taught me that we should not only think of schools as places of learning. They also are important places of community.”

The history of racism and of freedom struggles needs to be part of every New York City student’s learning as well as that of their teachers. Some of the $23 million should support the work of local anti-racist educators, such as those who spearheaded the Black Lives Matter Week of Action last February, in developing materials that help teach about this history. These efforts align with the chancellor’s pledge for culturally responsive education. And they offer ways to recognize and build on the knowledge of New York City’s community organizations and anti-racist education networks.

Attitudes matter, and educators – like everyone – can learn from the psychology of bias and stereotype. But historical ignorance or misrepresentation has fed racism, and history can be a tool in its undoing.

That would be a good $23 million investment for New York and all of its children.

Ansley Erickson is an associate professor of history and education at Teachers College, Columbia University and a former New York City high school teacher.

Brian Jones is the associate director of education at the Schomburg Center for Research in Black Culture of the New York Public Library and a former New York City elementary school teacher.

Adam Sanchez is a teacher at Harvest Collegiate High School in New York City and an organizer and curriculum writer with the Zinn Education Project.

First Person

In honor of Teacher Appreciation Week, 8 essays from educators who raised their voices this year

Teachers are often on the front lines of national conversations, kickstarting discussions that their students or communities need to have.

They also add their own voices to debates that would be less meaningful without them.

This year, as we mark Teacher Appreciation Week, we’re sharing some of the educator perspectives that we’ve published in our First Person section over the last year. Many thanks to the teachers who raised their voices in these essays.

If you’d like to contribute your own personal essay to Chalkbeat, please email us.

A Queens teacher on Charlottesville: ‘It can’t just be teachers of color’ offering lessons on race

After racial violence erupted in Virginia last year, New York City teacher Vivett Dukes called on teachers to engage students in honest conversations about racism.

“We do our children and ourselves a disservice when we don’t have these difficult conversations as a part of our collective curriculums. However, many teachers from various walks of life are neither well-versed nor fully comfortable discussing race on any level with their students. Not talking about racism won’t make it go away.”

Why the phrase ‘with fidelity’ is an affront to good teaching

Too often teachers are blamed for bad curriculum, writes Tom Rademacher, Minnesota’s Teacher of the Year in 2014. And that needs to stop.

“It keeps happening because admitting that schools are messy and students are human, and teaching is both creative and artistic, would also mean you have to trust teachers and let them have some power.”

I’m a Bronx teacher, and I see up close what we all lose when undocumented students live with uncertainty

Two of Ilona Nanay’s best students started high school as English learners and were diagnosed with learning disabilities. But their educational careers came to an end after graduation because both were undocumented and couldn’t afford out-of-state tuition.

“By not passing the DREAM Act, it feels like lawmakers have decided that some of the young people that graduate from my school do not deserve the opportunity to achieve their dreams.”

I’m a Florida teacher in the era of school shootings. This is the terrifying reality of my classroom during a lockdown drill.

K.T. Katzmann is a teacher in Broward County, Florida. In this essay, she shares what it’s like knowing that you could be the only thing between a mass shooter and a group of students.

“The experience of being isolated, uninformed, and responsible for the lives of dozens of children is now universal to our profession, whether because of actual emergencies or planned drills.”

I’m a Houston geography teacher. This is my plan for our first day back — as soon as it arrives.

Alex McNaughton teaches a human geography course in Houston. After Hurricane Harvey, he decided to move up a lesson about how urbanization can exacerbate flooding.

“Teachers have a unique power — the power to shape the minds of future generations to solve the problems that we face. Houston’s location means that it will always be susceptible to flooding. But by teaching about the flood I hope I can play a small role in helping our city avoid repeating some of the tragic scenes I witnessed this week.”

How one Harlem teacher gave his student — the ‘Chris Rock of third grade’ — a chance to shine

Ruben Brosbe, a New York City teacher, has a soft spot for troublemakers. In this story, he shares how he got one of his favorite pranksters, Chris, to go through a day without interrupting class.

“Dealing with him taught me a valuable lesson, a lesson I’ve had to learn again and again: At the end of the day, everything that we want to accomplish as teachers is built on our relationships. It’s built on me saying to you, ‘I see you,’ ‘I care about you,’ ‘I care about what you care about and I’m going to make that a part of our class.’”

Cut from the same cloth: Why it matters that black male teachers like me aren’t alone in our schools

Being a black educator can be isolating, writes William Anderson, a Denver teacher. He argues that a more supportive environment for black educators could help cities like Denver improve the lives of black students.

“Without colleagues of the same gender and cultural and ethnic background, having supportive and fulfilling professional relationships is much harder.”

I went viral for walking my students home from school in Memphis. Here’s what got lost in the shuffle.

For years, Memphis teacher Carl Schneider walked his students home to a nearby apartment complex. Then a photograph of him performing this daily ritual caught the attention of the national media. In this essay, Schneider reminds readers that he shouldn’t be the focus — the challenges his students face should. His call to action:

“Educate yourself about the ways systemic racism creates vastly different Americas.”


Thanks to our partners at Yoobi for supporting our Teacher Appreciation campaign.