First Person

Student growth percentiles and shoe leather

Editor’s note: This piece was submitted by Damian W. Betebenner, Richard J. Wenning and Professor Derek C. Briggs. Thumbnail biographies of the three authors appear at the bottom of this article.

Bruce D. Baker recently published a critique of the Colorado Growth Model and its use of Student Growth Percentiles in his School Finance 101 blog (cross-posted on Education News Colorado). In his post, he mischaracterizes both the SGP methodology and the policy context. Having participated in creating the Colorado Growth Model and leading the policy development associated with it, we thought it would be useful to clarify these misconceptions.

In work over the past decade with more than two dozen State Education Agencies (SEAs) to develop models of student growth based upon state assessment results, one lesson we have learned repeatedly is that data, regardless of their quality, can be used well or used poorly. Unfortunately, Professor Baker conflates the data (i.e., the measure) with the use. A primary purpose in the development of the Colorado Growth Model (Student Growth Percentiles, or SGPs) was to distinguish the measure from the use: to separate the description of student progress (the SGP) from the attribution of responsibility for that progress.

There is a continuum of opinion about how large-scale assessment data and derived quantities can be used in accountability systems. On one extreme are those who believe large-scale assessment results are the ONLY “objective” indicator and thus any judgment about educator/education quality should be based on such measures. At the other extreme are those who hold that any use of large-scale assessment data is an abuse.

Our experience in discussing these issues in numerous contexts with stakeholders ranging from parents to policy makers, students to superintendents, is that they fall in between these two extremes. We believe that the results of large-scale assessments, particularly when examined in a longitudinal fashion, can yield numerous insights (some quite profound) about the manner in which the education system is functioning.

Not all growth models are value-added models

In work with the Colorado Department of Education and numerous other SEAs, we clearly state that any growth model (including the Colorado Growth Model) can be turned into a value-added model (VAM). A VAM is a type of growth model, but not all growth models are VAMs. We propose that a VAM is, in fact, constituted by its use, not by any particular statistical model specification. A simple gain score model, for example, is often used as an example (usually a bad example) of a value-added model. Other examples abound in the literature (see, for example, McCaffrey, Han & Lockwood, 2008).
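The distinction between these descriptions can be made concrete with a toy example. A gain score is the raw difference between a student’s scores in two years, while an SGP-style description asks where a student’s current score falls among academic peers with similar prior scores. The sketch below uses simulated data and a crude peer-band approximation; the operational SGP methodology estimates conditional percentiles with quantile regression over students’ full score histories, so this is a conceptual illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (simulated) prior-year and current-year scale scores
# for 1,000 students; none of this is real assessment data.
prior = rng.normal(500, 50, size=1000)
current = 0.8 * prior + rng.normal(100, 30, size=1000)

# A simple gain score: the raw difference between years.
gain = current - prior

# An SGP-style description: the percentile rank of a student's current
# score among peers with similar prior scores (here, a +/-10-point band).
def growth_percentile(i, band=10.0):
    peers = np.abs(prior - prior[i]) <= band
    return 100.0 * np.mean(current[peers] <= current[i])

sgp_like = np.array([growth_percentile(i) for i in range(len(prior))])
```

Two students with identical gain scores can receive very different conditional percentiles, which is the sense in which the SGP is a description of progress relative to academic peers rather than a raw change score.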

After deriving quantities of individual growth, it is natural (and responsible) to ask whether there are contexts or curricular programs where students demonstrate higher or lower rates of growth, on average, than others. This is where investigations of growth start to become investigations of value-added. Because we believe that “value-added” is a hypothesis to be tested (Ho, 2011), not a quantity derived from a model, the challenge in Colorado and the other states we work with is to develop indicator systems that facilitate the investigation of which programs, districts, schools, teachers, and contexts promote (and fail to promote) the greatest growth among students in the state.

Furthermore, we go beyond traditional VAM approaches focused on attributing responsibility, using student growth to investigate progress toward career and college readiness and to examine issues of equal educational opportunity through growth gaps between demographic and other student subgroups of interest.

The causal nature of the questions together with the observational nature of the data makes the use of large-scale assessment data difficult “detective work”. Indeed, good detective work requires shoe leather, looking at multiple sources of evidence, particularly as stakes become high, to ensure that conclusions about responsibility are warranted. We believe that the education system as a whole can benefit from such scrupulous detective work, particularly when all stakeholders hold a seat at the table and are collectively engaged in these efforts to develop and maintain an education system geared toward maximizing the academic progress of all students.

Test scores cannot be the sole determinant

To be clear about our own opinions on the subject: The results of large-scale assessments should never be used as the sole determinant of education/educator quality.

No state or district that we work with intends them to be used in such a fashion. That, however, does not mean that these data cannot be part of a larger body of evidence collected to examine education/educator quality. The dichotomy of appropriate/inappropriate does not and should not lead to an all-or-nothing dichotomy of data use. The challenge is to enable appropriate and beneficial uses while minimizing those that are inappropriate and detrimental.

Despite Professor Baker’s criticism of VAM/SGP models for teacher evaluation, he appears to hold out more hope than we do that statistical models can precisely parse the contribution of an individual teacher or school from the myriad of other factors that contribute to students’ achievement.

Numerous published writings by scholars on the subject over the past decade (see, for example, Raudenbush, 2004; Rubin, Stuart, & Zanutto, 2004; Braun, 2005; Lockwood, McCaffrey, Mariano, & Setodji, 2007; Linn, 2008; Rothstein, 2009, 2010; Betebenner & Linn, 2010; Briggs & Domingue, 2011) have taken issue with this presumption.

Professor Baker emphasizes this point with respect to SGPs:

Again, the whole point here is that it would be a leap, a massive freakin’ unwarranted leap to assume a causal relationship between SGP and school quality, if not building the SGP into a model that more precisely attempts to distill that causal relationship (if any). [Emphasis in original]

We would add that it is a similar “massive … leap” to assume a causal relationship between any VAM quantity and a causal effect for a teacher or school, not just SGPs. We concur with Rubin et al. (2004), who assert that quantities derived from these models are descriptive, not causal, measures. However, just because measures are descriptive does NOT imply that the quantities cannot and should not be used as part of a larger investigation of root causes.

There are a number of excellent papers and books published over the last two decades that lay out the use and abuse of regression techniques in the social sciences, particularly with regard to making unsubstantiated causal claims. David Freedman’s “Statistical Models and Shoe Leather” (1991) and Richard Berk’s Regression Analysis: A Constructive Critique (2003) are particularly good. Berk’s book, in fact, details the importance of using regression analyses descriptively as part of a larger program to identify root causes. This aligns with Linn’s (2008, p. 21) call for descriptive accountability:

“Accountability system results can have value without making causal inferences about school quality, solely from the results of student achievement measures and demographic characteristics. Treating the results as descriptive information and for identification of schools that require more intensive investigation of organizational and instructional process characteristics are potentially of considerable value. Rather than using the results of the accountability system as the sole determiner of sanctions for schools, they could be used to flag schools that need more intensive investigation to reach sound conclusions about needed improvements or judgments about quality.”

The development of the Student Growth Percentile methodology was guided by Rubin et al.’s (2004) admonition that VAM quantities are, at best, descriptive measures. Taking that seriously, we are tasked with constructing the best and most useful description possible. Because the quality of a description is judged primarily by its utility, the goal with the development and use of the SGP methodology is to maximize utility while maintaining the technical sophistication of a growth model that serves both norm- and criterion-referenced purposes (Betebenner, 2009). Given that all data, regardless of their quality, can be abused, the challenge is to produce an indicator system that maximizes the beneficial uses of the data.

We encourage the continued investigation of measures of student growth with the goal of producing indicator systems that address fundamental policy considerations and maximize utility without compromising technical quality. Comparisons between models (especially those utilizing the full achievement history of student scores) often produce results that are highly correlated (> 0.8), making determinations of which model is “best” difficult if not impossible to resolve using technical criteria alone. For example, comparisons of SGPs with value-added model results have high correlations (Briggs & Betebenner, 2009; Wright, 2010).
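Such model comparisons typically correlate school- or teacher-level estimates produced by the two models. The sketch below uses simulated estimates (the variable names and data are hypothetical, not results from any of the cited studies) to show how a Pearson and a rank-based Spearman correlation might be computed for a comparison of this kind:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical school-level estimates from two growth models that share
# most of their signal but differ in model-specific noise.
true_signal = rng.normal(0, 1, size=200)
median_sgp = true_signal + rng.normal(0, 0.4, size=200)
vam_estimate = true_signal + rng.normal(0, 0.4, size=200)

# Pearson correlation between the two sets of estimates.
r = np.corrcoef(median_sgp, vam_estimate)[0, 1]

# Spearman (rank) correlation, computed from rank transforms;
# double argsort yields each value's rank when there are no ties.
def rankdata(x):
    return np.argsort(np.argsort(x))

rho = np.corrcoef(rankdata(median_sgp), rankdata(vam_estimate))[0, 1]
```

With shared signal dominating the noise, as in the simulation above, both correlations land well above 0.8, which mirrors the pattern reported in the model-comparison literature: high agreement in the aggregate, even though individual schools can still shift rank noticeably between models.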

Claims of model “bias” that Professor Baker refers to are often difficult to disentangle because, as McCaffrey, Han, and Lockwood (2008) point out in their comprehensive comparison of VAM measures, there is no gold standard “teacher effect” or “school effect” against which to judge any of these measures. And differential performance by demographic subgroup on a growth/value-added measure does not necessarily imply “bias” any more than differential achievement-level performance by demographic subgroup (e.g., percent at or above proficient) does. On the contrary, such growth gaps can be indicative of unequal educational opportunity. The determination of model validity is complex, involving judgments that are both technical and practical. This reality, we believe, reaffirms the wisdom of Box’s (1987, p. 424) famous maxim: “All models are wrong, but some are useful”.

Returning to the opening point, our work is directed toward the use of large-scale assessment results as an evidence base to promote and help facilitate the difficult detective work associated with investigations of quality and effectiveness in an education system. Ultimately, we contend, the goal is to use what we learn to improve the education system for the benefit of all children. To that end, the validity of an accountability system is determined by the consequences that derive from it.

Assessment practices and systems of accountability are systemically valid if they generate useful information and constructive responses that support one or more policy goals (Access, Quality, Efficacy, Equity, and Efficiency) within an education system, without causing undue deterioration with respect to other goals. (Braun, 2008)

Large-scale assessment results are an important piece of evidence but are not sufficient to make causal claims about school or teacher quality. Black and white polemics about appropriate/inappropriate use of data often undercut valuable descriptions of the reality of a system in which large percentages of students are not receiving the education they deserve and we desire. Our goal is not to promote scapegoating for these unpalatable realities but to give stakeholders interpretable and actionable data that enable sound decision making, promote learning, and marshal a consensus for change.

Dr. Damian W. Betebenner is a Senior Associate with the National Center for the Improvement of Educational Assessment (NCIEA). Since joining the NCIEA in 2007, his work has centered exclusively on the research and development of student growth models for state accountability systems. He is the analytic architect of the student growth percentile (SGP) methodology developed in collaboration with the Colorado Department of Education as the Colorado Growth Model. 

Richard J. Wenning served until June 2011 as the Associate Commissioner of the Colorado Department of Education (CDE) and led CDE’s Office of Performance and Policy.  His responsibilities included public policy development and the design and implementation of Colorado’s educational accountability system, including the Colorado Growth Model.  

Professor Derek C. Briggs is chair of the Research and Evaluation Methodology Program at the University of Colorado at Boulder, where he also serves as an associate professor of quantitative methods and policy analysis. In general, his research agenda focuses upon building sound methodological approaches for the valid measurement and evaluation of growth in student achievement. His daily agenda is to challenge conventional wisdom and methodological chicanery as they manifest themselves in educational research, policy and practice.

  • Baker, B. D. (2011). Take your SGP and VAMit, Damn it! School Finance 101 blog.
  • Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4), 42–51.
  • Betebenner, D. W. & Linn, R. L. (2010). Growth in student achievement: Issues of measurement, longitudinal data analysis, and accountability. Exploratory Seminar: Measurement Challenges Within the Race to the Top Agenda, Center for K-12 Assessment and Performance Management.
  • Berk, R. A. (2003). Regression Analysis: A Constructive Critique. Thousand Oaks, CA: Sage.
  • Berk, R. A. & Freedman, D. A. (2003). Statistical assumptions as empirical commitments. In T. G. Blomberg & S. Cohen (Eds.), Law, Punishment, and Social Control: Essays in Honor of Sheldon Messinger (2nd ed., pp. 235–). Aldine de Gruyter.
  • Box, G. E. P. & Draper, N. R. (1987). Empirical Model-Building and Response Surfaces. Wiley.
  • Braun, H. I. (2008). Vicissitudes of the validators. Presentation at the 2008 Reidy Interactive Lecture Series, Portsmouth, NH, September 2008.
  • Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value-added models. Technical report. Princeton, NJ: Educational Testing Service.
  • Briggs, D. C. & Betebenner, D. (2009). Is growth in student achievement scale dependent? Paper presented at the invited symposium “Measuring and Evaluating Changes in Student Achievement: A Conversation about Technical and Conceptual Issues” at the annual meeting of the National Council on Measurement in Education, San Diego, CA, April 14, 2009.
  • Briggs, D. & Domingue, B. (2011). Due diligence and the evaluation of teachers: A review of the value-added analysis underlying the effectiveness rankings of Los Angeles Unified School District teachers by the Los Angeles Times. Boulder, CO: National Education Policy Center.
  • Freedman, D. (1991). Statistical models and shoe leather. In P. V. Marsden (Ed.), Sociological Methodology, Volume 21. Washington, DC: American Sociological Association.
  • Ho, A. (2011). Supporting growth interpretations using through-course assessments. Center for K-12 Assessment and Performance Management.
  • Linn, R. L. (2008). Educational accountability systems. In The Future of Test-Based Educational Accountability (pp. 3–24). New York: Taylor & Francis.
  • Lockwood, J., McCaffrey, D., Mariano, L., & Setodji, C. (2007). Bayesian methods for scalable multivariate value-added assessment. Journal of Educational and Behavioral Statistics, 32, 125–150.
  • McCaffrey, D., Han, B., & Lockwood, J. (2008). From data to bonuses: A case study of the issues related to awarding teachers pay on the basis of their students’ progress. National Center on Performance Incentives working paper.
  • McCaffrey, D., Lockwood, J., Koretz, D., Louis, T., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29, 67–101.
  • Raudenbush, S. (2004). Schooling, statistics, and poverty: Can we measure school improvement? Technical report. Princeton, NJ: Educational Testing Service.
  • Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537–571.
  • Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.
  • Rubin, D. B., Stuart, E. A., & Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103–116.
  • Wright, P. S. (2010). An investigation of two nonparametric regression models for value-added assessment in education. White paper.


First Person

I’m a teacher in Memphis, and I know ‘grading floors’ aren’t a cheat — they’re a key motivator


Growing up, my father used to tell me not to come to him with a problem unless I had a solution.

That meant I learned quickly what kinds of solutions wouldn’t go over well — like ones involving my father and his money. His policy also meant that I had to weigh pros and cons, thinking about what I was able to do, what I wasn’t, and whom I needed help from in order to make things happen.

I sometimes wish decision-makers in Memphis had a father like mine. Because more often than not, it seems we are talking about problems devoid of a solution, or even possible solutions to vet.

Right now, the issue in Memphis and Shelby County Schools is the “grading floor,” the policy of setting a lowest possible grade a teacher can assign a student. Grading floors have been temporarily banned after a controversy over grade changing at high schools.

Grading floors aren’t new to teachers in Memphis, or to me, a fifth-grade teacher. I have taught and still teach students who are at least two grade levels behind. This was true when I taught fourth grade and when I taught sixth grade. Honestly, as the grade level increased, so did the gaps I saw.

More often than not, these students have been failed by a school, teacher, leader or system that did not adequately prepare them for the next grade. Meanwhile, in my classroom, I have a responsibility to teach grade-level material — adjusting it for individual students — and to grade their work accordingly.

That’s where “grading floors” come in. Without a grading floor, all of my current students would have grades below a 65 percent.

Can you imagine seeing the face of a fifth-grade boy who tried his hardest on your test, who answered all the questions you gave orally, who made connections to the text through auditory comprehension, only to receive a 0 on his paper?

I don’t have to imagine – I see similar reactions multiple times a day. Whether it’s a 65 percent or a 14 percent, it’s still an F, which signals to them “failure.” The difference between the two was summed up by Superintendent Hopson, who stated, “With a zero, it’s impossible to pass a course. It creates kids who don’t have hope, disciplinary issues; that creates a really bad scenario.”

I know that as years go by and a student’s proficiency gap increases, confidence decreases, too. With a lowered confidence comes a lower level of self-efficacy — the belief that they can do what they need to do to succeed. This, to me, is the argument for the grading floor.

In completing research for my master’s degree, I studied the correlation between reading comprehension scores and the use of a motivational curriculum. There was, as you might have guessed, an increase in reading scores for students who received this additional curriculum.

So every day, I speak life into my students, who see Fs far too often in their daily lives. It is not my job as their teacher to eradicate their confidence, stifle their effort, and diminish their hope by giving them “true” Fs.

“This is not an indication of your hard work, son. Yet, the reality is, we have to work harder,” I tell students. “We have to grind in order to make up what we’ve missed and I’m the best coach you have this year.”

In education, there are no absolutes, so I don’t propose implementing grading floors across the board. But I do understand their potential — not to make students appear more skilled than they are, or to make schools appear to be better than they are, but to keep students motivated enough to stay on track, even when it’s difficult.

If it is implemented, a grade floor must be coupled with data and other reports that provide parents, teachers, and other stakeholders with information that accurately highlights where a student is, both within the district and nationally. Parents shouldn’t see their child’s progress through rose-colored glasses, or be slapped by reality when options for their child are limited during and after high school.

But without hope, effort and attainment are impossible. If we can’t give hope to our kids, what are we here for?

I don’t have all the answers, but in the spirit of my father, don’t come with a problem unless you have a solution.

Marlena Little is a fifth-grade teacher in Memphis. A version of this piece first appeared on Memphis K-12, a blog for parents and students.


Our readers had a lot to say in 2017. Make your voice heard in 2018.

PHOTO: Chris Hill/Whitney Achievement School
Teacher Carl Schneider walks children home in 2015 as part of the after-school walking program at Whitney Achievement Elementary School in Memphis. This photograph went viral and inspired a First Person reflection from Schneider in 2017.

Last year, some of our most popular pieces came from readers who told their stories in a series that we call First Person.

For instance, Carl Schneider wrote about the 2015 viral photograph that showed him walking his students home from school in a low-income neighborhood of Memphis. His perspective on what got lost in the shuffle continues to draw thousands of readers.

First Person is also a platform to influence policy. Recent high school graduate Anisah Karim described the pressure she felt to apply to 100 colleges in the quest for millions of dollars in scholarships. Because of her piece, the school board in Memphis is reviewing the so-called “million-dollar scholar” culture at some high schools.

Do you have a story to tell or a point to make? In 2018, we want to give an even greater voice to students, parents, teachers, administrators, advocates and others who are trying to improve public education in Tennessee. We’re looking for essays of 500 to 750 words grounded in personal experience.

Whether your piece is finished or you just have an idea to discuss, drop a line to Community Editor Caroline Bauman at [email protected]

But first, check out these top First Person pieces from Tennesseans in 2017:

My high school told me to apply to 100 colleges — and I almost lost myself in the process

“A counselor never tried to determine what the absolute best school for me would be. I wasted a lot of time, money and resources trying to figure that out. And I almost lost myself in the process.” —Anisah Karim     

Why I’m not anxious about where my kids go to school — but do worry about the segregation that surrounds us

“In fact, it will be a good thing for my boys to learn alongside children who are different from them in many ways — that is one advantage they will have that I did not, attending parochial schools in a lily-white suburb.” —Mary Jo Cramb

I covered Tennessee’s ed beat for Chalkbeat. Here’s what I learned.

“Apathy is often cited as a major problem facing education. That’s not the case in Tennessee.” —Grace Tatter

I went viral for walking my students home from school in Memphis. Here’s what got lost in the shuffle.

“When #blacklivesmatter is a controversial statement; when our black male students have a one in three chance of facing jail time; when kids in Memphis raised in the bottom fifth of the socioeconomic bracket have a 2.6 percent chance of climbing to the top fifth — our walking students home does not fix that, either.” —Carl Schneider

I think traditional public schools are the backbone of democracy. My child attends a charter school. Let’s talk.

“It was a complicated choice to make. The dialogue around school choice in Nashville, though, doesn’t often include much nuance — or many voices of parents like me.” —Aidan Hoyal

I grew up near Charlottesville and got a misleading education about Civil War history. Students deserve better.

“In my classroom discussions, the impetus for the Civil War was resigned to a debate over the balance of power between federal and state governments. Slavery was taught as a footnote to the cause of the war.” —Laura Faith Kebede