How to Lie With Test Scores

Sean Carroll comments on an item in the Atlantic Monthly comparing test scores across nations. There are two things that really bug me about this piece, the most important of which is the deeply dishonest graphic the Atlantic used to illustrate it.

Here's the honest version of the graph, redone using data from this table (the relevant figures don't appear in the report cited in the original piece). (Click on the graph for a larger version.)

[Figure: normalized eighth-grade TIMSS scores for the seven countries, plotted on a scale from 0 to 1]

I've plotted the normalized test score (the score for each country divided by the reported maximum score, because I'm a physicist and like numbers between 0 and 1) for the same seven countries cited in the original. I made the simplifying assumption that the minimum score is zero, because I can't seem to find a clear statement of what the actual score range is. It may very well be something like 200-800, because the people who design standardized tests like to make the results as confusing as possible, but that doesn't really change the point.

If you look at that graph, you wouldn't say that there's a major crisis in American science education. Yeah, we're at the bottom of the range, with a score of 66% compared to Singapore's 72%, but the difference isn't all that huge.
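For reference, here's a minimal sketch of how a plot like this could be generated, in Python with matplotlib. The country list and scores below are pieced together from numbers quoted in this post and in the comments, not necessarily the exact seven countries in the Atlantic graphic, and the 800-point maximum is an assumption that happens to reproduce the 66% and 72% figures above.

    # Sketch of the "honest" plot: normalized TIMSS scores on a full 0-1 axis.
    # Country list and scores are assumptions assembled from figures quoted in
    # the post and comments; the 800-point maximum is also an assumption.
    import matplotlib.pyplot as plt

    scores = {
        "Singapore": 578,
        "Hong Kong": 556,
        "Japan": 554,
        "Hungary": 543,
        "Netherlands": 536,
        "United States": 527,
        "New Zealand": 520,
    }
    max_score = 800  # assumed maximum possible score

    countries = list(scores)
    normalized = [scores[c] / max_score for c in countries]

    plt.bar(countries, normalized)
    plt.ylim(0, 1)  # full range, so the bars stay in honest proportion
    plt.ylabel("Score / maximum score")
    plt.title("Eighth-grade TIMSS scores, normalized")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.show()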

The dishonest version of the graphic is below the fold:

[Figure: the Atlantic's eighth-grade math graphic, with the same scores plotted on a truncated axis alongside student confidence ratings]

The top part of the graphic is the same set of scores, rescaled to make everything look more dramatic. If you zoom way in, suddenly the US bar is one-third the height of Singapore's. Aiieeee! Crisis in science education!
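To see where that "one-third" comes from, here's a quick back-of-the-envelope calculation, assuming the Atlantic graphic's axis starts at roughly 500 (which is what it looks like, though I haven't measured the pixels):

    # How a truncated baseline exaggerates the gap. The ~500 baseline is an
    # assumption about the Atlantic graphic; the scores are the 2003 figures.
    us, singapore = 527, 578
    baseline = 500

    actual_ratio = us / singapore                              # what the data say
    apparent_ratio = (us - baseline) / (singapore - baseline)  # what the truncated bars show

    print(f"Actual ratio of scores: {actual_ratio:.2f}")    # ~0.91
    print(f"Apparent ratio of bars: {apparent_ratio:.2f}")  # ~0.35, i.e. about one-third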

Now, look, I'm not going to argue that we couldn't be doing a better job with science education-- obviously, we could. But this sort of cheap graphical manipulation is straight out of How to Lie with Statistics. The sky isn't actually falling, and we've actually closed the gap somewhat since the 1995 test-- the US score went from 513 to 527, while the top-scoring countries stayed essentially the same (Singapore dropped from 580 to 578, and Japan went from 552 to 554). Granted, our score increase was nothing compared to Hong Kong, which jumped from 510 to 556, but it's not completely trivial, either.

Even if the point is to scare people into taking positive actions, though, there's just no excuse for this level of graphical deception.


"But this sort of cheap graphical manipulation is straight out of How to Lie with Statistcs."
I suppose it would be an unnatural stretch to give the author the benefit of the doubt that a deceptive graphic in an article about the failings of our science education was a self-conscious bit of irony?

Sadly, it does not surprise me. The irony is that while the United States media tend to lie in order to paint the US as being in an educational crisis, at least some other nations lie to inflate their standings.

About six or eight months ago there was a report coming out of China talking about the number of trained engineers they were graduating, which, as one might expect based on their population alone, was a rather large number. This report was even out of proportion to that. And for months I was seeing the shockwaves of that report echoed in US media-- crisis in US education, crisis in US engineering, projected crises in this, national crises in that, and thirty years from now we're all going to be speaking Chinese.

About a month later, since my news trackers tend to fetch me these things, a counter report came from a US university pointing out that the Chinese report was, shall we say, fatally flawed. Among other things, the Chinese standards for "engineering graduate" are about on par with the United States requirements for a tech/associates degree. When all those things were factored in, China came in faring quite a bit worse. It was a piece of PR/propaganda that the US/Western media took almost completely uncritically.

Even more than that, criticism of the piece was avoided precisely by (most of) those who should have disregarded it, because of politics and money-- it made a great public hammer to use when begging (or demanding) greater funding for this and that. Don't worry about whether it's actually true, just use it to wrangle more money. (Obviously, the exception was whatever school did the counter-study-- Harvard, or possibly MIT, if memory serves.)

But even so, for months afterward, even when pointing to the counterstudy, the dominant theme of every conversation I was in regarding the subject was, "We're doomed!"

I love the United States. I love the freedom of our media. I'll even go so far as to opine that our tendency to believe the absolute worst about ourselves and the absolute best about others, combined with our frantic need to do something about it is probably a part of why we do so well.

But it drives me nuts, sometimes.

By John Novak (not verified) on 07 Oct 2006 #permalink

If the Atlantic did self-conscious irony, maybe.

Our seminar topic for the science majors this week is exactly this... how to present information visually... thanks for the example.

But we don't know the significance of the range from 527 to 578, so we can't honestly say whether this 50-point variation means we are much dumber, or barely significantly dumber, than, say, Singapore. I'd prefer their original chart; at least you can tell who is where. It would be better not to use bars in a case like this, where we don't know where to place a proper zero point.

Not that I disagree with you, but the format of the graphs may have been partially governed by space and readability issues - it would not be particularly easy to see any differences at all on your graph if you squashed it down to the aspect ratio of the ones in the article.

You could say that's your point (!) but I thought that the more interesting thing was not so much the relative test scores, but more the fact that US and Australian students seem far less aware of their possible ignorance. Shame there's no data for the UK...

Not having read the Atlantic article, but knowing psychological and educational scaling, I'll take an educated guess. The standard deviation will be about 100 for each nation. Thus, the difference between the US and Singapore is about 1/2 standard deviation. If so, then about 70% of Singaporean students score higher than the average US student. The problem with absolute scaling is there is no real meaning to the score range, it all relies on the standard deviation.
That said, the US and Australian students seem to share a joyous ignorance in Grade 8. What does this mean by the end of high school - how many are still taking science courses then?
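For what it's worth, that 70% figure checks out under those assumptions (roughly normal score distributions with a standard deviation of about 100, which is the commenter's guess rather than a reported number):

    # Sanity check on "about 70% of Singaporean students score higher than the
    # average US student," assuming roughly normal scores with SD ~100.
    from scipy.stats import norm

    us_mean, singapore_mean, sd = 527, 578, 100

    # Fraction of the assumed Singaporean distribution above the US mean:
    fraction_above = norm.sf(us_mean, loc=singapore_mean, scale=sd)
    print(f"Fraction above US mean: {fraction_above:.2f}")  # ~0.69, i.e. about 70%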

I disagree that the original is misleading. The numbers are there on the graph for anyone to compare. It's just uninformative, since we have no idea if a 50-point difference is meaningful. Your rescaled plot is equally uninformative. One highlights the difference, the other hides it. Neither says whether it's meaningful.

It's not like those USAToday graphs where the length of the bars has no relationship to the numbers next to them. I saw one once where a *higher* average SAT score was given a *shorter* bar, just so they could make their point.

A 200-800 test "score" should (is supposed to?) be normalised so that the center of the Gaussian is at 500. As every country has more than 500 points, it is clear it is not the case. If it were a normalised score, the proper way to draw the bars would be from the center of the Gaussian up and down. This could explain why the test is plotted taking 500 as the origin.

By Alejandro Rivero (not verified) on 07 Oct 2006 #permalink

"As every country has more than 500 points, it is clear it is not the case."

Many more countries were considered than the few selected for this comparison. The average country score is 473, which is likely different from the average student's score.

-------

"But, we don't know the significance of the range from 527 to 578."
I didn't look at the detailed statistics (although they are in the report), but they used color coding to indicate significant differences compared to the US, at the 0.05 level. The US was at 527. The Netherlands at 536 were not significantly different, but Hungary at 543 was. New Zealand at 520 was not, but Lithuania at 519 was. That should give an idea, but further details are in the report:
http://nces.ed.gov/pubs2005/2005005.pdf

Not that I disagree with you, but the format of the graphs may have been partially governed by space and readability issues - it would not be particularly easy to see any differences at all on your graph if you squashed it down to the aspect ratio of the ones in the article.

In that case, the appropriate display format is a table of numbers.

The original report linked by the Atlantic has a graph of math scores that was more acceptable-- it runs from 350-650, which isn't the full range, but does show enough of it that the distortion isn't too great.

You could say that's your point (!) but I thought that the more interesting thing was not so much the relative test scores, but more the fact that US and Australian students seem far less aware of their possible ignorance. Shame there's no data for the UK...

I agree that the main point of the graph was to highlight the difference between the scores and the confidence ratings, and they probably squashed the graph to highlight that. The net effect is deceptive, though, because it looks like the average US student is a blithering idiot with an inflated sense of self-confidence, when in reality, the score difference isn't all that big.

I'm also highly skeptical of the self-confidence thing, given the rather drastic cultural differences between the US and Japan. I half think that if you asked about student self-confidence in conversational Japanese, American students would come in higher than Japanese ones.

The joke is that you get 200 points on the SAT if your check doesn't bounce. Don't know if that's still a good mythologization, but anecdotally I'd say we can safely bet that the bottom ain't zero, maybe somewhere between 0 and 200.

-----------

The net effect is deceptive, though, because it looks like the average US student is a blithering idiot with an inflated sense of self-confidence, when in reality, the score difference isn't all that big.

Except that this, in either of the graphs as presented in this post, is simply not true -- no error bars, no error statistics, so we cannot say either way. If, as #10 says, this is supposed to be Gaussian, then the usual 1,2, and 3-sigma interpretation applies. So, if a standard deviation was, as an example, 2 points (or even better, the sample standard deviation for the country with the smallest sample, which is, after all, essentially identical to the MLE for the standard deviation for large sample sizes, and we all know that to statisticians large is more than 30) then the difference between the US and Singapore would have staggering implications. Whereas, if the sample standard deviation were more like 25-30, then it wouldn't really be.

And for the purposes of comparing against other countries who are significant economic competitors or partners, you're already filtering, which shifts the relevant range of statistics. You'd only care about who's doing better than you or who's catching up to you (so, say, not SAT scores for people who become English majors); this implies that you threshold somewhere on only the data you want to look at, since there's no useful information below 500 if the lowest score for the countries you care about is 527. Really this is a ghastly graph, but that's because, as a number of people pointed out, you can't tell how important the differences are. If it weren't for the error bar issue (which in my opinion renders the graph completely useless), this would be a perfectly fine plot.

Correction: You'd only care about who's doing better than you or who's catching up to you in ways you care about economically (so, say, not SAT scores for people who become English majors)...

Upon review, I realize that these aren't necessarily SAT scores.

They're not SAT scores-- they're scores on the "Trends in International Mathematics and Science Study" test, and I haven't been able to find anything giving the score range. Percentage-wise, a shift to a 200-800 score range wouldn't make a huge difference-- the US-Singapore gap goes from 66-72 to 55-63.
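(The arithmetic behind those percentages, for anyone who wants to check it; both ranges are guesses, since I haven't found the actual TIMSS score range anywhere:)

    # Normalized US and Singapore scores under two possible score ranges.
    # Both ranges are assumptions; the actual TIMSS range isn't stated anywhere I found.
    us, singapore = 527, 578

    for low, high in [(0, 800), (200, 800)]:
        us_frac = (us - low) / (high - low)
        sg_frac = (singapore - low) / (high - low)
        print(f"Range {low}-{high}: US {us_frac:.1%}, Singapore {sg_frac:.1%}")
    # roughly 66% vs 72% for 0-800, and 55% vs 63% for 200-800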

The bigger issue in comparing scores across countries is structural-- many other countries have educational systems that are much more strongly "tracked" than the American system. It wouldn't surprise me terribly if there are students taking the test in the US who wouldn't be in a position to take it in some other countries.

(Back when I was in high school, a local district got some positive media attention for a dramatic uptick in their average SAT scores. It took a few years before somebody took a closer look, and discovered that their guidance counselors were advising the weaker students not to take the SAT, and thus removing them from the test pool. It's sort of the "evaporative cooling" method of boosting your test scores...)
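(A toy illustration of that selection effect, with made-up numbers: drop the bottom chunk of the test-taking pool, and the average jumps without anyone learning a thing.)

    # Toy "evaporative cooling": remove the weakest test-takers from the pool and
    # the average rises, even though no individual student improved. All numbers
    # here are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.normal(loc=500, scale=100, size=10_000)  # hypothetical SAT-like scores

    cutoff = np.percentile(scores, 20)   # "advise" the bottom 20% not to take the test
    remaining = scores[scores > cutoff]

    print(f"Mean before: {scores.mean():.0f}")
    print(f"Mean after:  {remaining.mean():.0f}")  # noticeably higher, same student body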

Lie Factor FTW?

Tufte:

"The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented."

By Adam Ciapponi (not verified) on 07 Oct 2006 #permalink
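For the record, Tufte's "Lie Factor" is the size of the effect shown in the graphic divided by the size of the effect in the data, and a rough estimate for the Atlantic graphic (again assuming its bars start at about 500) comes out pretty damning:

    # Rough Lie Factor estimate (Tufte: effect shown in graphic / effect in data).
    # The ~500 baseline for the Atlantic graphic is an assumption.
    us, singapore = 527, 578
    baseline = 500

    effect_in_data = (singapore - us) / us                            # ~10% difference in scores
    effect_in_graphic = (singapore - baseline) / (us - baseline) - 1  # ~190% difference in bar heights

    lie_factor = effect_in_graphic / effect_in_data
    print(f"Lie Factor: roughly {lie_factor:.0f}")  # on the order of 20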

I'm also highly skeptical of the self-confidence thing, given the rather drastic cultural differences between the US and Japan. I half think that if you asked about student self-confidence in conversational Japanese, American students would come in higher than Japanese ones.

Agreed - the fact that the major difference is not between the US and the rest, but between English-speaking/'Western' and SE Asian societies, would suggest to me that cultural factors are at least partially involved (that's why UK data would be useful).

Then there's the question of whether having more confidence is necessarily a bad thing; people with more self-confidence may be more likely to continue with, or at least maintain an interest in, science.

Colst, thanks for the link to the original report.

( http://nces.ed.gov/pubs2005/2005005.pdf )

Page 15 says that the international average for 4th grade is 495. Round-off errors? Or something worse? Because the next page lists the international average for 8th grade as 466! Surely, as you say, they are not weighting the countries when taking the average.

The plots on pages 12, 21, etc. are done in the way Orzel suggests, not mine :-(

By Alejandro Rivero (not verified) on 08 Oct 2006 #permalink

There are small selection effects: in some countries, non-native English-speaking students are selected only if they have taken at least one year of English courses. Also, sometimes the length of formal schooling (4 or 8 years) varies.

It is not only an international study (spread in space) but also a longitudinal one (spread in time). So perhaps the distribution was adjusted to 500 in some preliminary test years ago.

By Alejandro Rivero (not verified) on 08 Oct 2006 #permalink

It's sort of the "evaporative cooling" method of boosting your test scores...

I know this is a dead thread, but I have to say, this is just hilarious.