The Trends in International Mathematics and Science Study (TIMSS) seems to indicate that the U.S. improved relative to the rest of the world between 2003 and 2007. Compared to the average score of 500 across the 36 nations studied, U.S. fourth graders scored an average of 529 in math and 539 in science; eighth graders, for whom 48 nations were studied, scored 508 in math and 520 in science.
East Asian nations topped all the lists; the results were particularly striking in the eighth grade math scores, where the five East Asian participants - Taiwan, Korea, Singapore, Hong Kong, and Japan - averaged between 570 and 598, while no other nation averaged above 517. The bottom of the list was dominated by Middle Eastern nations, though it should be noted that the study didn't cover some major areas of the world, like sub-Saharan Africa and Latin America.
Massachusetts fourth graders scored an average of 572 in math and 571 in science, comparable to the East Asian nations; eighth graders' average scores were 547 and 556 respectively. I wonder what we're doing better than the rest of the U.S.?
Source for pdf data comparing the U.S. to the rest of the world: http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2009001
Page containing Massachusetts numbers: http://www.doe.mass.edu/news/news.asp?id=4457
I'm also curious. Possible bad reasons include teaching to the test, careful selection of which schools get tested, and discouraging bad students from taking the tests. Note that I have no evidence of any of these, but they are things a serious investigation should be looking to rule out.
The schools and students appear to be initially selected randomly, based on reading the basic report. Schools can elect not to participate, which may introduce self-selection bias; the U.S. report mentions this, and mentions that they've done some analyses indicating this probably isn't affecting the results much, but that they haven't done enough analysis yet to prove that there's no effect at all or to quantify any small effect.
Even if there is self-selection bias, though, would that explain a discrepancy between Massachusetts and the rest of the U.S.? I think you'd have to have strong self-selection in Massachusetts and not in the rest of the U.S., which seems somewhat unlikely - or preferential participation by the weaker schools in the rest of the U.S., which seems even less likely. And, of course, even if it is self-selection among the schools, it seems fairly unlikely that 95 schools in Massachusetts would manage to score nearly a full standard deviation higher than other reporting schools without some real effect beyond random variation. It's likely there's something different about those schools, even if they're self-selected, and it would be interesting to find out what.
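To put a rough number on "fairly unlikely": if the 95 schools were an unbiased draw from the same population as the other reporting schools, the standard error of their mean shrinks like 1/sqrt(n), so a full-SD shift is enormous in standard-error terms. A quick sketch (the one-SD-per-school spread is a deliberately generous, made-up assumption; real school averages vary less, which would only make the z larger):

```python
# If the 95 Massachusetts schools were just a random sample from the same
# population as other reporting schools, how big is a full student-level
# standard deviation of shift, measured in standard errors of their mean?
# Generous assumption (not from the report): each school's average spreads
# about one student-level SD around the national mean.
n_schools = 95
se = 1 / n_schools ** 0.5    # SE of the mean, in student-SD units
z = 1.0 / se                 # a full-SD shift, in standard errors
print(f"shift of {z:.1f} standard errors")   # far beyond chance variation
```

A shift of nearly ten standard errors is astronomically unlikely under random sampling, which is the point: something real distinguishes those schools, self-selected or not.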
It's possible schools are teaching to the test, though I'd think they'd more likely be teaching to the No Child Left Behind tests rather than to the tests from an academic study. Again, though, this doesn't by itself explain the difference between Massachusetts and East Asia, on the one hand, and the rest of the world on the other hand - you'd have to believe people are teaching to the test in the places that did well, and not in other places.
Teaching to the test - or perhaps a more general Flynn effect - might help explain a general increase in absolute test scores. Without additional ad hoc assumptions, though, it would still not explain U.S. improvements relative to the rest of the world. However, those improvements are small, which is why I think the differences between high scoring areas and lower scoring areas are more interesting.
I've been too busy to respond in detail, so I've probably missed the chance for an inclusive discussion, but here are a few things I would have mentioned two days ago.
First, we always need to be wary of the ecological fallacy. The thing that's actually being measured is the test scores of individual students. Averaging them across a state and comparing that to the larger average can make differences look a lot larger than they really are, because you're treating averages as if they were data points, which they aren't.
The next thing that raises a warning flag is that only two out of fifty states asked to be split out. I'm not sure whether that indicates something screwy going on, but it is odd, and I would like to understand better why that is. I also can't help but point out that the study is run out of a Massachusetts university. Again, I don't know precisely how that might affect the results or the choice to be broken out separately, but it is a bit odd that one of the two states that asked to be broken out is the one where the study is headquartered.
I still haven't actually looked at even the short report, but I know that TIMSS has had a couple of problems in the past, and I don't know if they've been addressed. One big issue is that the definition of the population being studied varies a lot from one country to the next. What constitutes "fourth grade" or "eighth grade" is not an international standard, and TIMSS scores tend to be about 40-50 points higher for students a year older, so if one country's "fourth grade" has an average age of 8.7 and another's has an average age of 9.7, that could easily account for much of the difference. I wouldn't expect a year difference in average age between Massachusetts fourth grade and Alabama fourth grade, but there could easily be a few months there if one state generally uses a cutoff birthdate significantly different from the others', or if policy regarding making exceptions to the cutoff is significantly different.
By the way, it doesn't directly relate to your questions, but at least in 2003, none of China, Brazil, India, France, Italy, Germany, or Canada were in TIMSS at all.
Can you explain how the ecological fallacy "can make differences look a lot larger than they really are"? My understanding of the ecological fallacy is just that we should be wary of making assumptions about individuals - and comparisons between individuals - based on the subgroup averages.
It's not clear to me whether two out of fifty states actually asked to be split out, or whether two out of fifty states noticed that they did really well and so featured their state-specific information on their web sites. I tried to download the master PDFs from the TIMSS site, but my viewers said they were corrupted - I'm guessing they work on Windows or a more recent version of Acrobat. Being run from a Massachusetts university does suggest that the case of Minnesota might be more interesting.
They mentioned grade matching, but it's not clear to me whether there might not still be a discrepancy of up to 6 months. That could affect some of the differences in the report, but given that the standard error is in the single digits of points, it would not change the qualitative conclusion that the East Asian participants, Massachusetts, and Minnesota did significantly better than the rest of the world.
Most of the countries you mentioned as nonparticipants in 2003 were still nonparticipants in 2007, for what it's worth.
Can you explain how the ecological fallacy "can make differences look a lot larger than they really are"?
That was sloppy language on my part. What I really should have said was that ecological correlations make associations between variables look a lot stronger than they really are. If I'm trying to explain student academic achievement, with an eye to making policy decisions, it's of interest to know how strong the correlations are. An R-squared on the association between state average income and state average test scores will be much larger than an R-squared on the association between individual parental income and individual test scores.
One consequence is that if I say that Massachusetts has an average score one standard deviation above the national mean, then it's critically important to know if that standard deviation is measured from a sample of students or from a population of states. If I have fifty subgroup averages and I'm taking the variance of those, I'm going to get a smaller variance than if I directly measure the variance of individual student scores.
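Both points are easy to see in a simulation with made-up numbers (not TIMSS data): a weak individual-level income-score association, aggregated into state averages, produces a much larger R-squared, and the state means have far less spread than the individual scores.

```python
import random
import statistics

random.seed(42)

def pearson_r(xs, ys):
    """Plain Pearson correlation, stdlib only."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

STATES, STUDENTS = 50, 200
incomes, scores = [], []
state_inc_means, state_score_means = [], []
for _ in range(STATES):
    state_effect = random.gauss(0, 0.7)       # between-state income variation
    xs, ys = [], []
    for _ in range(STUDENTS):
        x = state_effect + random.gauss(0, 1)  # individual family income
        y = 0.3 * x + random.gauss(0, 1)       # weak individual-level effect
        xs.append(x)
        ys.append(y)
    incomes += xs
    scores += ys
    state_inc_means.append(statistics.fmean(xs))
    state_score_means.append(statistics.fmean(ys))

r2_individual = pearson_r(incomes, scores) ** 2
r2_state = pearson_r(state_inc_means, state_score_means) ** 2
print(f"individual-level R^2: {r2_individual:.2f}")   # modest
print(f"state-average R^2:    {r2_state:.2f}")        # much larger
print(f"SD of individual scores: {statistics.stdev(scores):.2f}")
print(f"SD of state means:       {statistics.stdev(state_score_means):.2f}")
```

Averaging within states cancels the individual-level noise, so the same weak underlying effect looks nearly deterministic at the state level - which is exactly why "one standard deviation above the mean" needs its reference population spelled out.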
An interesting investigation would be into how individual test scores are associated with parental income as one independent variable and average family income as a separate independent variable - that is, measuring how much is a family effect and how much is a community effect. If we only use average scores that investigation is impossible, but with individual scores that potentially very useful information may be extractable.
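With individual-level data, that decomposition is just a two-predictor regression. A sketch on simulated data (the coefficients 0.3 and 0.5 are invented for illustration; nothing here comes from TIMSS), using the closed-form OLS solution for two demeaned predictors:

```python
import random
import statistics

random.seed(1)

# Simulated students: score depends on own family income (true effect 0.3)
# and on the community's average income (true effect 0.5) - the two effects
# we'd like to separate.
students = []
for _ in range(50):                        # 50 communities
    community = random.gauss(0, 1)         # community income level
    xs = [community + random.gauss(0, 1) for _ in range(200)]
    avg = statistics.fmean(xs)             # observed community average
    for x in xs:
        score = 0.3 * x + 0.5 * avg + random.gauss(0, 1)
        students.append((x, avg, score))

# Two-predictor OLS in closed form (demean everything first).
mx1 = statistics.fmean(s[0] for s in students)
mx2 = statistics.fmean(s[1] for s in students)
my = statistics.fmean(s[2] for s in students)
s11 = s22 = s12 = s1y = s2y = 0.0
for x1, x2, y in students:
    d1, d2, dy = x1 - mx1, x2 - mx2, y - my
    s11 += d1 * d1; s22 += d2 * d2; s12 += d1 * d2
    s1y += d1 * dy; s2y += d2 * dy
det = s11 * s22 - s12 ** 2
b_family = (s1y * s22 - s2y * s12) / det
b_community = (s2y * s11 - s1y * s12) / det
print(f"family effect:    {b_family:.2f}")     # should land near 0.3
print(f"community effect: {b_community:.2f}")  # should land near 0.5
```

The regression recovers both effects separately, which is precisely what's lost once only the community averages are published.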
Ah, that makes sense in the context of examining correlations. To be honest, I wasn't paying much attention to any correlations other than to the geographical study areas because it didn't seem to me that the available candidates for independent variables were things that either most parents or schools would have control over.
For what it's worth, the standard deviations I was thinking of were the standard deviations of the individual students within national populations, which were typically around 60-70 test points. The standard deviation of the national averages within the population of national averages is probably in the same range, just eyeballing the results. The standard deviation of the individuals within the overall population of individuals would be, as you say, higher - I'm guessing it might be exactly 100 points, because the point values might have been normalized to that number.
I agree that investigating the relative contributions of parental and community effects would be interesting. I'm guessing this study won't provide the data for the former, though.
So I've downloaded the 630-page 42MB monster off the BC site, and Massachusetts and Minnesota are listed as being "Benchmarking Participants" along with Alberta, British Columbia, Ontario, Quebec, the Basque Country, and Dubai. I haven't gone through in detail to figure out how being a "benchmarking participant" could affect results relative to the countries being studied, although I notice that some of the obscure stratification procedures are only mentioned as being used for the actual study participants and not for the benchmarks.
According to this, an average Massachusetts fourth grader is 10.3 years old, the same as for Minnesota and for the entire US. (An average Yemeni fourth grader is 11.2 years old, an average Qatari fourth grader is 9.7, and those are the extreme values.)
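Combining those ages with the earlier rough figure of 40-50 TIMSS points per year of age gives a quick sense of how much the age spread could matter across countries (the points-per-year range is the approximate figure from upthread, not an official TIMSS parameter):

```python
# Rough bound on how much of a score gap average-age differences could
# explain, at roughly 40-50 points per year of age (approximate figure
# from the discussion above). Ages are the fourth grade averages quoted.
POINTS_PER_YEAR = (40, 50)
ages = {"US/Massachusetts/Minnesota": 10.3, "Yemen": 11.2, "Qatar": 9.7}

baseline = ages["US/Massachusetts/Minnesota"]
for country, age in ages.items():
    lo, hi = (abs(age - baseline) * p for p in POINTS_PER_YEAR)
    print(f"{country}: age gap {age - baseline:+.1f} yr "
          f"-> roughly {lo:.0f}-{hi:.0f} points")
```

The extremes span enough points to matter internationally, but since Massachusetts, Minnesota, and the U.S. all average exactly 10.3 years, age can't explain any of the Massachusetts-vs-US gap.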
One more possible explanation for the performance of Massachusetts relative to the rest of the country: excluded students were (a) special education students, (b) disabled students in regular classes, and (c) non-English speakers. If Massachusetts is more liberal in its classification of students in one or more of those categories, it wouldn't take much to skew the average.
Er ... my impression is that it would take much, since the average needs to be skewed by more than half an individual standard deviation. I think you'd have to exclude something like the bottom third of students, which seems to me like a lot.
Then again, I don't actually have much of an idea on how many students might be excluded into those categories; for all I know, maybe it is a third.
Well, it turns out that it doesn't much matter. I found the tables detailing exclusion rates, and there isn't much difference between MA and the US. For fourth graders, Massachusetts had a 10.4% exclusion rate, Minnesota 8.3%, and the US as a whole 9.2%. For eighth graders Massachusetts was 8.4%, Minnesota 7.5%, and the US 7.9%. Unless the excluded students differ markedly state-to-state for some reason I can't imagine, that's not nearly enough of a difference to explain much of the gap in average scores.
Some odd things about those exclusion rates, though: outside of the US and Canada, the highest 4th grade exclusion rate was Italy at 5.3%, and almost all countries were below 2%; Singapore's was 0.0%, Hong Kong 0.5%, Japan 0.6%. For the eighth grade things look pretty similar, except for some odd numbers out of Canada, where British Columbia's exclusion rate was 15.0% and Quebec's was 12.1%, both markedly higher than their fourth grade exclusion rates of 6.9% and 4.3%. Wacky.
(Exclusions due to language were classified separately; at the extreme, Latvia only included students taught in Latvian, who make up only 72% of the school population, so that's a much bigger gap than any of the special-needs exclusions.)
(Just for the record, I did a back of the envelope calculation, and to shift a mean up by half a standard deviation you'd have to remove roughly the bottom 15% of the sample, or everyone lower than z=-1.07. The average individual removed would have had z=-1.58.)
I had no idea that we classified 10% of students as learning disabled these days. That seems awfully high - and the Singapore, Hong Kong, and Japan numbers suggest that that level of exclusion isn't necessary to maintain quality of education.
I do know of two regulatory effects that might encourage large "disabled" lists. One is that schools are now required to give special schooling to the learning impaired, or pay for whatever alternative the parents choose to use, which can be quite expensive. The other is that, I think, handicapped students are excluded from the No Child Left Behind testing on which schools are graded.
(Uh, 0.15 x 1.58 ~= 0.25, not 0.5? What am I missing here?)
You're missing the fact that when I try to do quick calculations late at night, I'm frequently an idiot. The actual answer is that you'd have to strike the bottom 30%, all the way up to z=-0.53, they'd have an average z=-1.16, and 0.7*0.5 + 0.3*-1.16 ~= 0. So your original "something like the bottom third" is right. Anyway, nothing like what we actually see when comparing MA to the whole US.
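For anyone who wants to check this kind of calculation without risking late-night arithmetic, Python's standard library can reproduce the corrected numbers directly (this just verifies the figures above; `NormalDist` has been in the stdlib since 3.8):

```python
from statistics import NormalDist

nd = NormalDist()   # standard normal

# Cut the bottom 30% of a standard normal: where is the cutoff, what is
# the average z of the removed group, and what do the survivors average?
p = 0.30                      # fraction removed (the corrected figure above)
cutoff = nd.inv_cdf(p)        # z below which students are removed
density = nd.pdf(cutoff)

mean_removed = -density / p           # average z of the removed students
mean_survivors = density / (1 - p)    # average z of those remaining

print(f"cutoff z:        {cutoff:.2f}")          # about -0.52
print(f"avg z removed:   {mean_removed:.2f}")    # about -1.16
print(f"avg z survivors: {mean_survivors:.2f}")  # about +0.50

# Sanity check: the weighted pieces recombine to the original mean of 0.
assert abs((1 - p) * mean_survivors + p * mean_removed) < 1e-9
```

So shifting a mean up by half a standard deviation really does require striking roughly the bottom third - far more than the MA-vs-US exclusion rates could plausibly account for.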
Oh, and that ~10% includes "students unable to be tested in English" in addition to special education students. Nonetheless, I think you're right that we've classified an awful lot of kids into special education, and the reasons you mention are definitely a big part of that. I wouldn't necessarily conclude that the East Asian numbers demonstrate that it's unnecessary, though; it could easily be that if they adopted our approach their test scores would be stratospheric.
That was my hypothesis, with the caveat that I think the inheritance is mostly social rather than genetic.
The following article brings up a similar hypothesis, but then provides a counterexample in the last paragraph:
That's a fine counterexample, but doesn't really count as evidence of much except that that one school is doing something right (that might or might not generalize well). I don't see any reasonable claim that this one school is somehow representative of any larger trend.
Well, I'm only presenting it as a counterexample, suggesting that there's more to it than smart parents resulting in smart kids, and not necessarily as an example of a different trend. I'll even admit that it's not a complete counterexample, as it could be that smart parents are more likely to pressure the school district into doing the positive things that particular person mentions. I think parental intelligence and education definitely play some role.
On the other hand, I do suspect that there's more to it than just that the parents are smart, and I suspect that expectations are part of it. I think it's likely that kids in schools with a lot of academically oriented parents might do better because the parents expect more, even relative to schools with equally smart parents that aren't in academia.
I would also be surprised if parental intelligence could fully explain the strongly bimodal distribution of the 8th grade math averages.
If you're saying that a child's academic performance is determined by more than parental intelligence, then I think anyone would agree with that. If you're saying that this one school represents evidence that there is something other than parental intelligence that helps explain why Massachusetts does better than the American average, then I would disagree.
(Also, when I say parental intelligence -> academic performance, I'm not especially trying to distinguish between better academic performance due to genetically smarter kids, better upbringing, parental pressure on school system, parents' employment in technical industries, or any of several other ways that parental intelligence could indirectly affect academic performance. I think disentangling all those effects might be partially possible with an enormous amount of careful work, but it would be very hard.)
The bimodal distribution (that I haven't actually seen myself) doesn't seem like something that any of my suggestions would go very far in explaining. I don't really have any idea why there would be such a distribution.
"Bimodal distribution" may not be exactly accurate given it's a discrete distribution, but there's a huge empty gap between the Pacific Rim and the rest of the world. The first 10 numbers are 598, 597, 593, 572, 570, 517, 513, 512, 508, 506, with the remaining thirty-eight numbers pretty uniformly distributed down to 307. Perhaps this is more striking if one looks at the differences between the places: 1, 4, 21, 2, 53, 4, 1, 4, 2, etc. The 53 kind of stands out, especially since it also represents a pretty clear geographical boundary.
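The gaps are easy to recompute from the score list itself, which also guards against transcription slips:

```python
# Successive differences between adjacent scores in the sorted 8th grade
# math averages quoted above (top ten only).
scores = [598, 597, 593, 572, 570, 517, 513, 512, 508, 506]
gaps = [a - b for a, b in zip(scores, scores[1:])]
print(gaps)   # the 53-point gap after Japan dwarfs every other step
```

Every other step in the top ten is 21 points or less, so the 53-point drop between Japan and the next nation really is the outlier.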
Personally, I wonder if it might represent a difference in expectations about how much math a typical adult needs. For example, I think that even smart U.S. parents generally think that arithmetic is sufficient to get by in everyday life, even if they'd prefer their own children go further. If the expectation in the Pacific Rim is that you really need algebra to get by in everyday life, that might help explain why Pacific Rim countries continue to stand out in 8th grade, while Massachusetts and the U.S. average start falling back towards the norm.
Ah! That bimodal distribution. I was confused and thought you meant something else entirely.
After some clumsy fiddling with a spreadsheet, I sorted the scores for 4th and 8th grades for both math and science. I should probably present these as a table, but I'm lazy:
For 4th grade math, the first nine gaps are 7, 24, 8, 19, 5, 3, 4, 2, 5; the top four countries are Hong Kong, Singapore, Taiwan, and Japan (South Korea does not appear to have participated at the 4th grade level). For 4th grade science, the gaps are 30, 3, 6, 2, 4, 0, 3, 2, 1; the top country is Singapore, and while the next three are Taiwan, Hong Kong, and Japan, the gaps are not too dramatic. 8th grade math is the one you already listed. 8th grade science gaps are 6, 7, 1, 12, 2, 0, 1, 7, 1; the top four countries are Singapore, Taiwan, Japan, and South Korea, and for some reason Hong Kong slips to ninth place here.
There are a lot of oddball things in those numbers. The gaps between Japan and the Tigers vs. Rest Of World are highly variable; they're more distinct in math than in science, and they get more distinct from 4th to 8th grade. For 4th grade science it's really Singapore and then everyone else, but by 8th grade that's completely different, as the Singapore kids have fallen back to earth, and Hong Kong fell below Hungary, the Czech Republic, and Slovenia.
Remind me not to send my kids to school in Qatar. 296, 294, 307, 319.
Oh, and if Massachusetts were a country, it would have ranked fourth in 4th grade math, second in 4th grade science, sixth in 8th grade math, and third in 8th grade science.
(Then there are a whole pile of tables breaking down math and science into different subcategories, including a breakdown into knowledge vs application vs. reasoning.)
I doubt that the rich, smart parts of MA are unlike other rich, smart places in the US in this regard. But, I think it's possible that the rich, smart parts of MA are large relative to the size of the state, unlike similar areas in other states.