More Deceptive Graphs: Scales Matter

Yet More Deceptive Graphs

As you've probably heard, there was a horrible incident in Pittsburgh this weekend, in
which a crazed white supremacist who believed that Obama was coming to take his guns shot and
killed three policemen. Markos Moulitsas, of Daily Kos, pointed out that lunatics like this shooter
are acting on conspiracy theories that are being relentlessly promoted by the likes of Glenn
Beck and Michele Bachmann. It's not an unreasonable thing to point out, given the amount of
time that Beck and Bachmann have spent lately talking about the impending socialist/fascist
crackdowns that will require a revolutionary response from all right-thinking patriotic
citizens.

Now, you may think that Kos is an idiot. In fact, even though we agree on many
political issues, I think that Kos is an idiot. I happen to agree (obviously, from what
I wrote above) with the basic hypothesis that if you tell
people that the government is going to come and get them and that they need to
defend themselves, some people are going to believe that the government is
coming to get them and that they need to defend themselves. But the way
that Kos responded was disgusting; it was latching on to a tragic event in
a shallow, snide, heartless way.

But whether you think Kos is an ass or not isn't the point. Regardless of your opinion of
the man, there's no arguing the fact that he's created a website that draws a really
astonishing amount of traffic, and has become a nexus for many activists on the political
left.

And that, in turn, naturally draws hatred and mockery from the political right. Because,
you see, no one who disagrees with those fine patriotic folks could possibly be an
honest, serious person. They must be a bunch of scheming bastards, obviously.

So, when Kos came out bitching about how the rantings of various crazies really do
have a connection to the actions of people like the Pittsburgh killer, naturally it couldn't be that he actually believed that people ranting about how the President is
creating a fascistic tyranny that's going to come take all of your guns could actually
inspire a crazy person to believe that the President was creating a fascistic tyranny that was going to come and take away his guns. No, that couldn't be. He must be up to something - like trawling for hits!

Which, finally, brings us to our topic.

A conservative blogger named Moe Lane posted his theory about why Kos spoke out about the Pittsburgh shooter: it's because his pageviews have declined so much. But, of course, it wouldn't be good enough to just say that DKos pageviews are down - he's got to show that it's specific to those dirty liberals. So he produces two graphs - one for DKos, and one for RedState, a major conservative site. Here are his graphs; DKos first, RedState second:

i-e2d454db676c79ec4ddd086b3d40e465-dkos-300x244.jpg
i-e2992d4c22e06ac04b150b3cd17ad5f8-rs-300x244.jpg

A quick glance shows that both had a huge spike right around the elections, and then they
dropped off pretty dramatically. Then both had a slow upward trend. But the RedState trend
looks a lot steeper.

What you won't notice in a quick glance is that the scales are totally different. The DKos scale runs to 80 million hits in October '08; the RedState scale runs to 3 million hits. The absolute increase in hits per month since the election is
actually larger at DKos than at RedState. The DKos increase per month since December is over a million pageviews; the total increase in pageviews at RedState over
those four months is around 1/2 a million.

This is a very common problem in published graphs. It's often done by mistake: if you use
a tool like Excel, and tell it to plot data, it will automatically select scales based on the data. If you're clueless, and you do things the most naive way, you'll wind up
with two graphs that are, individually, perfectly accurate; but taken together, are
very misleading.

On the other hand, it's also a common misleading tactic. People mess with scales
to produce very misleading results all the time. For two graphs to be meaningfully comparable, they need to have the same zero point and the same scale. So if you're going to use graphs to compare two sets of data, at a minimum, you need to match the axes. Better, if comparison is the goal,
you should plot both sets of data on the same graph.
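To make the point about matching axes concrete, here's a minimal sketch in Python. The function names are my own inventions for illustration, and the only numbers taken from the graphs above are the two scale maximums (80 million and 3 million); the rest are made up. It picks one zero-based axis range that covers both data sets, and shows how tall a bar would be drawn under that shared range.

```python
def shared_axis_limits(*series, include_zero=True):
    """Return one (low, high) pair that covers every value in every series."""
    values = [v for s in series for v in s]
    low, high = min(values), max(values)
    if include_zero:
        # Force the axis to contain the zero point.
        low = min(low, 0)
        high = max(high, 0)
    return low, high


def drawn_fraction(value, limits):
    """Fraction of the full axis height at which `value` is drawn."""
    low, high = limits
    return (value - low) / (high - low)


# Hypothetical monthly pageviews; only the two peaks come from the graphs.
big_site = [80_000_000, 30_000_000, 32_000_000]
small_site = [3_000_000, 1_000_000, 1_200_000]

# Autoscaled separately, each site's peak fills its own chart.
own_big = drawn_fraction(80_000_000, shared_axis_limits(big_site))
own_small = drawn_fraction(3_000_000, shared_axis_limits(small_site))

# On a shared axis, the small site's peak is a sliver of the big one's.
common = shared_axis_limits(big_site, small_site)
shared_small = drawn_fraction(3_000_000, common)

print(own_big, own_small, shared_small)
```

With separate autoscaling, both peaks reach the top of their charts; on the shared axis, the smaller peak sits at under 4% of the height.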

To give you a sense of how the data actually compares, I took the two charts posted
by Moe Lane, and eyeballed them to try to get numbers, and using Google Docs, I combined those numbers into one graph. Here's the result.

i-43a746fab23240335c9f63051cba083e-dkos_vs_redstate.png

There are two interesting things about that graph. One is just how badly DKos traffic dwarfs RedState. The two really aren't comparable. When you look at website traffic on
community-oriented sites like DKos and RedState, you get vastly different behaviors
at different scales. It's not fair to RedState to compare it to DKos - in community-oriented sites, size begets size, and RedState simply isn't close enough to DKos to be able to
sustain the comparison. But if you insist on making it, the one relevant comparison would be
the slopes of the increase lines from December to now. I don't know how to superimpose it
on the graph (I'm not a GDocs wizard), but computing a best-fit line to the four
points does produce a slightly steeper slope for RedState - about 1.11 to 1.07. (And that's
being a bit generous in the computation; I can't really claim to have more than
two significant figures in my measurements; but those slopes are 3 sigfigs each.) In other
words, the growth rates are, pretty much, equivalent. In fact, overall, they track each other extremely well - each bump, each dip, appears in both graphs. The exact month-to-month pageview ratios vary somewhat, but overall they're pretty similar.
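For anyone who wants to try the slope computation themselves, here's a rough sketch of an ordinary least-squares fit through four monthly points. The numbers below are invented placeholders, not the values I read off the charts, and the helper names are hypothetical. Dividing each series by its first month so that slopes are comparable across sites of different sizes is one possible normalization, an assumption of this sketch rather than a description of how the figures above were produced.

```python
def least_squares_slope(ys):
    """Slope of the best-fit line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def relative_slope(ys):
    """Slope after dividing every value by the first one,
    i.e. growth per month as a fraction of the starting level."""
    return least_squares_slope([y / ys[0] for y in ys])


# Made-up pageviews for four consecutive months (illustration only).
large_site = [20_000_000, 21_000_000, 22_000_000, 23_000_000]
small_site = [1_000_000, 1_150_000, 1_300_000, 1_450_000]

for name, series in [("large", large_site), ("small", small_site)]:
    print(name, least_squares_slope(series), relative_slope(series))
```

In this invented data the large site gains more pageviews per month in absolute terms, while the small site grows faster relative to its own size, which is why it matters which of the two slopes you're comparing.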

Which isn't exactly what Mr. Lane tried to suggest.

I actually think that in this case, he's just clueless. Those graphs look almost exactly like the graphs produced by Site Meter, a common web service that monitors pageviews
on a website. I think that he just looked at the pageview graphs for the two sites, and
really genuinely thought that they were comparable.


I really like this series on deceptive graphs.

I have to disagree on this. Particularly:

It's not fair to RedState to compare it to DKos - in community oriented sites, size begets size, and RedState simply isn't close enough to DKos to be able to sustain the comparison. But if you insist on making it, the one relevant comparison would be the slopes of the increase lines from December to now.
If the comparison of absolute numbers isn't fair because of size, looking at the slope of the lines doesn't solve the problem. You would need to look at the % change in traffic, or something similar. Of course, this is basically what the two graphs that were originally posted let you easily do by eye.

No need to eyeball the original graphs to get the numbers, Marc. Those are clearly Site Meter graphs and Site Meter publishes the numbers, too. The one for Daily Kos is here and the one for Red State is here. The links were even provided as part of Moe Lane's original paranoid rant.

In fairness, I should point out that my blog Halfway There also spiked in the run-up to the November election. The awesome data is here, but looking at the vertical scale brings it all down to earth with a thud. Instead, I prefer to point out that my October 2008 numbers exceeded (the square root of) the corresponding numbers for Daily Kos. Wow!

Made a mistake on the tags in the above comment. Last paragraph is my words, not part of the quote.

Redstate should only stick to relative increase. In this eyeball comparison Redstate has gained ~45% while DKos has only gained ~5%. You are soooo right about the scales.

By natural cynic (not verified) on 06 Apr 2009

When he looked over the graphs himself, Moe Lane was probably excited that the Red State bars appear to have a steeper slope than the Daily Kos bars. Even if that were true, however, given the order of magnitude difference (actually more like a 1.5 order of magnitude difference) in the actual traffic numbers, that would just mean that Red State might overtake Daily Kos if given a sufficiently large number of decades. Since the slopes aren't actually dramatically different, perhaps I should say centuries instead of decades.

I'm not certain, but I think it was "The Rachel Maddow Show" (which surprised me) which recently had a graph with an axis something like this:
9
8
7
6
5
0
evenly spaced on the left and then data which had gone from 5 to 8 or 9. That of course completely distorted the information, making it look like it had quadrupled rather than doubled. Not to mention that with the 0 there, the graph was just _wrong_.

By Robert Thille (not verified) on 06 Apr 2009
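The distortion described in this comment can be worked out directly. The sketch below assumes the tick labels 0, 5, 6, 7, 8, 9 are drawn one equal step apart, as described, and uses a hypothetical helper, `apparent_height`, to find where a value lands on such an axis. It then compares the ratio the eye sees with the ratio in the data.

```python
def apparent_height(value, ticks):
    """Drawn height of `value` when the labels in `ticks` are spaced
    one equal step apart, whatever the numeric gap between them."""
    for i in range(len(ticks) - 1):
        lo, hi = ticks[i], ticks[i + 1]
        if lo <= value <= hi:
            # Interpolate linearly inside the step that contains the value.
            return i + (value - lo) / (hi - lo)
    raise ValueError("value outside the axis range")


ticks = [0, 5, 6, 7, 8, 9]  # axis labels as described in the comment

for start, end in [(5, 8), (5, 9)]:
    true_ratio = end / start
    seen_ratio = apparent_height(end, ticks) / apparent_height(start, ticks)
    print(start, end, true_ratio, seen_ratio)
```

A rise from 5 to 8 is a factor of 1.6 in the data but is drawn four times as tall; a rise from 5 to 9 is a factor of 1.8 but is drawn five times as tall.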

A related deception that bugs me every time I see it: stock market graphs with no zero. Every day it looks like the market fell off a cliff. But in reality, the market lost a smaller percentage of its total value than is apparent.

The bottom of the graph should be zero, even if it appears less dramatic.

The stock market graph that doesn't go to zero is now such a convention that nobody is misled. There are occasions where a graph that doesn't go to zero makes an important point that would be obscured in a full-range plot, but I like to see a big, obvious break in the y-axis to alert the viewer that something is being subtracted.

As far as stock market graphs go, often they don't go to 0 because they /can't/ - many places use a semilog graph, which makes percentage changes (the important bit) the same size no matter where they are... but makes 0 impossible to display because then the graph would have to be literally infinitely tall.

I've been considering using asinh (the inverse of sinh) as a graph scaling; it acts logarithmic away from 0, but more linear near 0, and allows use of both positive and negative values. Not sure how well it would work, and I haven't had data where it would be useful lately.
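A scaling with the properties described here (roughly logarithmic far from 0, roughly linear near 0, and defined for negative values) is the inverse hyperbolic sine, asinh, rather than sinh itself, which grows exponentially. Here's a small sketch using Python's standard `math.asinh`; the `linear_width` parameter is my own addition, controlling how wide the near-linear region around zero is.

```python
import math


def asinh_scale(value, linear_width=1.0):
    """Map a data value to an axis position with an asinh transform.

    Values much smaller than `linear_width` are placed almost linearly;
    values much larger are placed almost logarithmically.
    """
    return math.asinh(value / linear_width)


# Near zero the transform is close to the identity.
print(asinh_scale(0.001))

# Far from zero it is close to log(2 * x).
print(asinh_scale(1_000_000), math.log(2 * 1_000_000))

# Negative values are handled symmetrically, unlike a plain log scale.
print(asinh_scale(-1_000_000))
```

Unlike a semilog axis, this one can show zero and negative values, at the cost that equal percentage changes are only drawn at equal sizes well away from zero.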

The daily movement of the various stock market averages is almost always less than 5%, with a 5% daily change probably being more than a couple of standard deviations from the mean. If the chart is meant to represent the movement during the day, and for many people this is what they want, the requirement for a zero baseline would require a huge chart to show the intraday detail. People using these charts should be familiar with what they are looking at. Should. The recent problems with derivatives show that plenty of people do not really understand the various markets, even if they work full time in the markets.

There are many ways of representing data. Understanding the scaling of the graph, the time frame, the other variables (dividend announcement and distribution, earnings announcements, mergers, addition to or subtraction from an index, . . .), . . . . may help to figure out what a stock did, or what a stock is going to do. Or it might just lead you to believe you know more than you think you do.

It always is good to have a sharp eye to evaluate graphically presented data. There is one thing I would point out, regarding the critical evaluation of data:

a crazed white supremacist who believed that Obama was coming to take his guns shot and killed three policemen

We do not know what the shooter thought. It is impossible to know what he thought. We only know what he said (or wrote).

Even non-deranged persons often say one thing, but think another. In this particular case, there is at least some evidence that the person was seriously deranged, meaning that the correlation between spoken word and actual thought could be rather poor.

In this case, it is somewhat likely that the message was contrived for some unknown reason.

There are these techniques called "2-way ANOVA", and "trend tests in residuals".
1. Take the logs (so we are looking at relative changes, not absolute)
2. Get the residuals from a two-way ANOVA
(residual = raw log - month_mean - site_mean + grand_mean)
3. Plot residuals versus time.
After adjustment for site size and for monthly variations, KOS is declining and the other one is increasing. If you want to get fancy, you can drop it into SAS or R and do a test for a trend interaction.

Oops. Wrote that backwards. Kos is increasing readership and the other is decreasing. The test for trend is significant.
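The residual formula in the steps above can be sketched in a few lines of Python. This is only an illustration: the function name and the data are made up, and it computes the residuals only, not the significance test for trend mentioned in the comment.

```python
import math


def two_way_residuals(pageviews):
    """pageviews maps a site name to its monthly counts (same months for
    every site). Returns, per site, the residuals
    log value - month mean - site mean + grand mean."""
    logs = {site: [math.log(v) for v in counts]
            for site, counts in pageviews.items()}
    sites = list(logs)
    n_months = len(logs[sites[0]])

    grand_mean = sum(sum(row) for row in logs.values()) / (len(sites) * n_months)
    site_mean = {s: sum(logs[s]) / n_months for s in sites}
    month_mean = [sum(logs[s][m] for s in sites) / len(sites)
                  for m in range(n_months)]

    return {
        s: [logs[s][m] - month_mean[m] - site_mean[s] + grand_mean
            for m in range(n_months)]
        for s in sites
    }


# Invented data: site A grows 10% a month, site B stays flat.
residuals = two_way_residuals({"A": [100, 110, 121], "B": [1000, 1000, 1000]})
for site, values in residuals.items():
    print(site, values)
```

In this invented data, A's residuals rise over time and B's fall by the same amount: with only two sites the residuals mirror each other, so the plot in step 3 shows which site is gaining relative to the other once size and month effects are removed.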

Ah, "How to lie with statistics" called the out-of-scale or non-zero-based graph a "Gee-whiz graph" because it exaggerates what it purports to show.

My comparison of the right-wing doomsayers is that they risk the kind of extreme reaction that "The War of the Worlds" radio broadcast produced. In spite of repeated disclaimers during the show that it was a work of fiction, people in the U.S. were killing themselves to avoid the horrors of the Martian invasion.