Why are most teachers rated effective when most students test below standards?

On Board Online • December 16, 2013

By Cathy Woodruff
Senior writer

Here’s a word problem that could stump even the savviest student of Common Core-aligned mathematics:

Fewer than one-third of New York students passed the state math and English Language Arts tests they took in April. Yet, more than 90 percent of the state’s teachers were rated effective or highly effective under the state’s new Annual Professional Performance Review (APPR) rating system. Explain.

It’s a head-scratcher, all right. How can New York’s teachers possibly be so effective if their students are struggling so mightily to meet the state’s new academic standards?

Even a more sophisticated analysis – limiting the sample of teachers to those in the elementary and middle school classrooms where students took state exams in April, and limiting the ratings solely to the 20 points tied to those state test results – still reveals a sharp apparent contradiction.

More than 83 percent of the teachers in grades 4-8 were rated effective or highly effective on the portion of their APPR tied to their students’ test scores. But just 31 percent of students who took the ELA and math tests met the new proficiency standards on each test.

How is that possible? Isn’t a high level of teacher effectiveness supposed to correlate with high student achievement? Isn’t this supposed to be an accountability system?

According to state Education Commissioner John B. King Jr., trying to resolve the apparent paradox of good teacher ratings despite disappointing test scores for their students is a lot like the folly of trying to compare apples to oranges. Students are being tested on their mastery of the standards, but teachers are not actually being evaluated on their students’ level of mastery of the standards. Rather, they are being evaluated, in part, on their students’ growth in mastering the standards.

That’s why teacher scores can be high while student scores are low, educators say.

“I don’t think the two are connected at all – at least as the system currently is set up,” said Herricks School District Superintendent Jack Bierwirth, who serves on the Metrics Work Group of the State Education Department’s APPR Task Force.

But if some New Yorkers were puzzled by the sunny teacher evaluation scores, it would be hard to blame them. After all, they were told repeatedly by state leaders that a teacher evaluation system would be instrumental in improving academic achievement and holding teachers accountable for student learning. They also were told that students’ performance on state tests would be a strong, objective indicator of their teachers’ effectiveness.

“The new statewide evaluation law sets clear standards for measuring educators based on how our students are performing in the classroom,” Gov. Andrew Cuomo declared when he announced a March 2012 agreement with legislative leaders “to put the governor’s new groundbreaking teacher and principal evaluation system into law.”

Despite that rhetoric, New York’s new APPR system does not draw anything close to a straight line between student achievement and teacher and principal evaluation ratings.

First, it must be noted that teachers’ overall or “composite” APPR ratings give far more weight to other factors, such as classroom observations and local measures of student learning, than they give to the portion linked to state test results.

And while it’s often said that 20 percent of a teacher evaluation is based on state test scores, it would be more accurate to say that portion is based on the degree of change in student test scores. That component is derived from a calculation designed to determine how much a student has improved as a result of a teacher’s instruction that year. Performing the growth calculation requires comparisons with prior test performance.
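
The difference between proficiency and growth can be shown with a small worked example. The sketch below is not the state’s growth model; it is a simplified illustration that assumes prior-year and current-year scores already sit on a common scale. The cutoff, the scores and the function names are all invented for the purpose.

```python
# Hypothetical illustration only: every number below is invented,
# and this is not the State Education Department's growth formula.
PROFICIENCY_CUTOFF = 70  # assumed cutoff on an assumed common scale

# (prior-year score, current-year score) for one imaginary class
scores = [(40, 55), (50, 62), (45, 58), (60, 72), (35, 50), (65, 74)]


def proficiency_rate(pairs, cutoff):
    """Share of students whose current-year score meets the cutoff."""
    return sum(1 for _, current in pairs if current >= cutoff) / len(pairs)


def average_growth(pairs):
    """Mean change from the prior-year score to the current-year score."""
    return sum(current - prior for prior, current in pairs) / len(pairs)


print(f"Proficient: {proficiency_rate(scores, PROFICIENCY_CUTOFF):.0%}")
print(f"Average growth: {average_growth(scores):.1f} points")
```

In this imaginary class, only two of six students reach the cutoff, yet every student’s score rises, by an average of about 12.7 points. A rating built on the second number can look strong even when the first looks weak.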

Estimating the student growth component was especially tricky this year because the 2013 tests measured students against the new Common Core standards, while state tests in previous years were designed to measure performance based on standards set in 2005. That’s why the State Education Department sent out a flurry of charts, Excel worksheets, tables and guidance documents in August. The tools were intended to help administrators place old and new student test scores on a common scale so they could be compared.

Without comparisons, raw test results are virtually worthless for judging teacher performance, said Bierwirth, the Herricks superintendent. A 2013 score, alone, “doesn’t take into account where students started. It only describes where they ended up,” he noted.

“I do think the effort to measure a teacher’s value, based on what they contribute to a student’s learning, is the right direction,” Bierwirth added, but he is critical of the metric gymnastics now being used to calculate student growth for use in APPR formulas. “I believe the teacher evaluation system is, as it is now set up, highly flawed and not a terribly good measure of effectiveness,” he said.

The complexity and the lack of a clear connection between the test scores and APPR ratings are what can make it so hard for policy makers, including school board members, to explain how, exactly, the new system makes schools more accountable for results.

Aaron M. Pallas, a professor of sociology and education with Teachers College at Columbia University, has doubts about how precise educators can expect APPR to be in diagnosing an individual teacher’s impact on academic achievement. He says there are just too many other variables in play, including groundwork laid by teachers in earlier grades and whatever is going on in a student’s home life.

“It’s really hard to isolate the contribution of one teacher to a cumulative level of performance,” Pallas said. “I think one recommendation would be to forgo some of the false sense of quantified process that APPR has created. All of the components are things that I think are a bit fuzzy. Yet, we are adding them up and treating them as though the result is not fuzzy.”

Pallas and other observers say it’s also likely that this year’s strong overall teacher effectiveness ratings were bolstered by positive ratings for classroom observations and other locally developed criteria, which were crafted amid concerns about the unpredictable impact of state test scores.

Again, political rhetoric played a role in forming perceptions about the aims and hazards of APPR. For instance, Gov. Cuomo issued a March 2011 news release that touted a statewide teacher evaluation system as an alternative to the “so-called ‘last in, first out’ seniority policy,” which he said “lacks objectivity by maintaining teachers simply based on years of service without factoring classroom effectiveness, performance or need.”

“I think that the way the state framed it put too much emphasis on the APPR process as a way to identify ineffective teachers who ought to be drummed out,” said Pallas. “In some cases, districts thought they already knew who the good and bad teachers were.”

Attorney Howard Goldsmith of the Harris Beach law firm has coined a phrase to describe the problems with perceptions about APPR and school reform in general. He calls it “the disconnect gap.”

Writing on the Harris Beach municipal affairs blog, nymuniblog, Goldsmith said that low student scores and high teacher scores are emblematic of a broader problem in which the elements of New York’s education reform operations don’t work together in a way that’s clear.

Better communication could help, Goldsmith said, but he argues that a solution that restores faith in educational reform will require more substantial action.

His suggestions include extending and coordinating the multiple timelines for implementing Common Core standards, new curriculum and tests and APPR, with common benchmarks and transition dates for the various initiatives. He also suggests simplifying the APPR process and removing entirely the second component, which relies on locally determined measures of student achievement.

“Closing the disconnect gap will require adjustments in the actual implementation of the reform agenda initiatives, not just some positive but minimal changes in state testing policies,” Goldsmith wrote in nymuniblog. “To close the disconnect gap, adjustments must be taken to properly align the implementation plans of the respective reform agenda elements, making them connected in a logical and easy-to-understand common-sense fashion.”
