Thursday, May 21, 2009

Research-based education programming

I found this blog post on a recent column by David Brooks to be very interesting. As FAVL moves increasingly into evaluation of programs, it is worth bearing in mind that even the best program evaluators (e.g., Fryer) can be subject to lots of immodestly... All of it is worth keeping in mind the next time you read a study about the amazing power of books.
Just How Gullible Is David Brooks?
Gotham Schools | 8 May 2009 | Aaron Pallas

Posted on Monday, May 11, 2009 4:46:09 PM by Bob017

Now that I have your attention … Today’s New York Times column by David Brooks touts a new study by Roland Fryer and Will Dobbie of the Harlem Children’s Zone (HCZ) Promise Academy charter schools, two celebrated schools in Harlem. Fryer and Dobbie’s finding that the typical eighth-grader was in the 74th percentile among New York City students in mathematics leads Brooks to state that HCZ Promise Academy eliminated the black-white achievement gap. He’s so dumbstruck by this that he says it twice. Brooks takes this evidence as support for the “no excuses” model of charter schools, and, claiming that “the approach works,” challenges all cities to adopt this “remedy for the achievement gap.”

Coming on the heels of yesterday’s release of the 2009 New York State English Language Arts (ELA) results, in which the HCZ schools outperformed the citywide white average in grade 3, but were well behind the white average in grades 4, 5 and 8, skoolboy decided to drink a bit more deeply from the datastream. The figure below shows the gap between the average performance in HCZ Promise Academy and white students in New York City in ELA and math, expressed as a fraction of the standard deviation of overall performance in a given grade and year. The left side of the figure shows math performance, and the right side shows ELA performance.

It’s true that eighth-graders in 2008 scored .20 standard deviations above the citywide average for white students. But it may also be apparent that this is a very unusual pattern relative to the other data represented in this figure, all of which show continuing and sizeable advantages for white students in New York City over HCZ students. The fact that HCZ seventh-graders in 2008 were only .3 standard deviations behind white students citywide in math is a real accomplishment, and represents a shrinkage of the gap of .42 standard deviations for these students in the preceding year. However, Fryer and Dobbie, and Brooks in turn, are putting an awful lot of faith in a single data point — the remarkable increase in math scores between seventh and eighth grade for the students at HCZ who entered sixth grade in 2006. If what HCZ is doing can routinely produce a .67 standard deviation shift in math test scores in the eighth grade, that would be great. But we’re certainly not seeing an effect of that magnitude in the seventh grade. And, of course, none of this speaks to the continuing large gaps in English performance.
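The prior-year gaps implied by these figures can be backed out with simple subtraction. The sketch below is only an illustration of that arithmetic: it assumes the quoted .67 and .42 are plain differences between one cohort's gaps in consecutive years, and the variable and function names are made up for this example.

```python
# Gaps are HCZ average minus citywide white average, in standard
# deviation units, as quoted in the post. Negative means HCZ is behind.
gap_grade8_2008 = 0.20      # eighth-graders in 2008 (math)
shift_grade7_to_8 = 0.67    # one-year shift quoted for that cohort
gap_grade7_2008 = -0.30     # seventh-graders in 2008 (math)
shrinkage_grade6_to_7 = 0.42  # gap shrinkage quoted for that cohort


def implied_prior_gap(current_gap, one_year_change):
    """Gap one year earlier, assuming the change is a simple difference."""
    return round(current_gap - one_year_change, 2)


prior_gap_grade8_cohort = implied_prior_gap(gap_grade8_2008, shift_grade7_to_8)
prior_gap_grade7_cohort = implied_prior_gap(gap_grade7_2008, shrinkage_grade6_to_7)

print("Eighth-grade cohort, one year earlier:", prior_gap_grade8_cohort)
print("Seventh-grade cohort, one year earlier:", prior_gap_grade7_cohort)
```

Under that reading, both cohorts were well behind the citywide white average a year earlier, and the eighth-grade jump is clearly the larger of the two one-year changes.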

But here’s the kicker. In the HCZ Annual Report for the 2007-08 school year submitted to the State Education Department, data are presented on not just the state ELA and math assessments, but also the Iowa Test of Basic Skills. Those eighth-graders who kicked ass on the state math test? They didn’t do so well on the low-stakes Iowa Tests. Curiously, only 2 of the 77 eighth-graders were absent on the ITBS reading test day in June, 2008, but 20 of these 77 were absent for the ITBS math test. For the 57 students who did take the ITBS math test, HCZ reported an average Normal Curve Equivalent (NCE) score of 41, which failed to meet the school’s objective of an average NCE of 50 for a cohort of students who have completed at least two consecutive years at HCZ Promise Academy. In fact, this same cohort had a slightly higher average NCE of 42 in June, 2007.

Normal Curve Equivalents (NCE’s) range from 1 to 99, and are scaled to have a mean of 50 and a standard deviation of 21.06. An NCE of 41 corresponds to roughly the 33rd percentile of the reference distribution, which for the ITBS would likely be a national sample of on-grade test-takers. Scoring at the 33rd percentile is no great success story.
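The percentile figure can be checked from the definition of the NCE scale. This is a minimal sketch, assuming the reference distribution is normal with the mean of 50 and standard deviation of 21.06 given above; the function name `nce_to_percentile` is invented for this example.

```python
import math

NCE_MEAN = 50.0
NCE_SD = 21.06


def nce_to_percentile(nce):
    """Convert an NCE score to a percentile rank under a normal curve."""
    z = (nce - NCE_MEAN) / NCE_SD
    # Standard normal cumulative distribution via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for score in (41, 42, 50):
    print(f"NCE {score}: percentile of about {nce_to_percentile(score):.1f}")
```

An NCE of 41 lands between the 33rd and 34th percentiles, consistent with the figure quoted above, and an NCE of 50 sits at the 50th percentile by construction.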

How are we to make sense of this? One possibility is that the HCZ students didn’t take the Iowa tests seriously, and that their performance on that test doesn’t reflect their true mastery of eighth-grade mathematics. The HCZ Annual Report doesn’t offer this as a possibility, perhaps because it would be embarrassing to admit that students didn’t take some aspect of their schoolwork and school accountability plan seriously. But the three explanations that are offered are not compelling: the Iowa test skills were not consistently aligned with the New York State Standards and the Harcourt Curriculum used in the school; the linkage of classroom instruction to the skills tested on the Iowa test wasn’t consistent across the school year, and Iowa test prep began in February, 2008; and school staff didn’t use 2007 Iowa test results to identify areas of weaknesses for individual students and design appropriate intervention.

If proficiency in English and math is to mean anything, these skills have to generalize to contexts other than a particular high-stakes state test. No college or employer is ever going to look at the New York State ELA and math exams in making judgments about who has the skills to be successful in their school or workplace. I’m going to hold off labeling the HCZ schools as the “Harlem Miracle” until there’s some additional evidence supporting the claim that these schools have placed their students on a level academic playing field with white students in New York City.
And killjoy Charles Murray (who I always understood to be a little weird) actually posted a sensible comment:
“I’m not being mindlessly pessimistic. The problem is that we have had 40 years of “Miracle in X” (the early Head Start results, the Milwaukee Project, Perry Preschool, the Abecedarian Project, Marva Collins’s schools, and the Infant Health Development Project, to name some of the most widely known stories), and the history is depressingly consistent: an initial research report gets ecstatic attention in the press, then a couple of years later it turns out that the miracle is, at best, a marginal success that is not close to the initial claims.
I haven’t seen the study by Roland Fryer and Will Dobbie that was the basis for Brooks’s column, but if I’m going to be such a grinch I might as well lay out the kinds of things I will be looking for (these are generic issues, not things that I necessarily think are problems with this particular study) when I get hold of a copy:

1. Selection factors among the students. Did the program deal with a representative sample? Was random assignment used?
2. Comparison group. Who’s in it? Are they comparable to the students in the experimental group?
3. Attrition. What about the students who started the program but dropped out? How many were there? How were they doing when they dropped out?
4. Teaching to the test. After seven years of No Child Left Behind, everybody knows about this one. Worse, there are the school officials who have rigged attendance on the day the test was taken or simply faked the scores—that’s been happening too with high stakes testing.
5. Cherry-picking. Do the reported test scores include all of the tests that the students took, or just the ones that make the program look good?
6. The tests. Do they meet ordinary standards for statistical reliability, predictive validity, etc.?
7. Fade-out. Large short-term test score improvements have, without exception to date, faded to modest ones within a few years.”
