Election Fraud? Or just bad math?

By goodmath on June 6, 2006.

I've gotten an absolutely unprecedented number of requests to write about RFK Jr.'s Rolling Stone article about the 2004 election.

RFK Jr.'s article tries to argue that the 2004 election was stolen. It does a wretched, sloppy, irresponsible job of making the argument. The shame of it is that I happen to believe, based on the information that I've seen, that the 2004 presidential election was stolen. But RFK Jr.'s argument is just plain bad: a classic case of how you can use bad math to support any argument you care to make. As a result, I think that the article does just about the worst thing that it could do: it utterly discredits anyone who questions the results of that disastrous election, and makes it far more difficult for anyone who wants to do a responsible job of looking into the question.

Let's get right into it. He starts his argument by claiming that the exit polls indicated a different result than the official election results:

"The first indication that something was gravely amiss on November 2nd, 2004, was the inexplicable discrepancies between exit polls and actual vote counts. Polls in thirty states weren't just off the mark -- they deviated to an extent that cannot be accounted for by their margin of error. In all but four states, the discrepancy favored President Bush."

The key sentence that indicates just how poorly RFK Jr. understands the math? "they deviated to an extent that cannot be accounted for by their margin of error". That statement is, quite simply, nonsensical. The margin of error in a poll is a statistical measure based on the standard deviation. Contrary to popular opinion, a poll with a margin of error of "4%" doesn't mean that the actual quantity being measured must be within plus or minus 4% of the poll result.

A margin of error is always stated at some level of confidence. Most of the time, the MoE that we see cited is the MoE with 95% confidence. What this means is that if you ran the same poll over and over, about 95% of the time the sampled (polled) result would be within +/- n% of the true value. But no result is impossible: the margin of error is an expression of how confident the pollster is in the quality of their measurement, and nothing more than that. Like any other measurement based on statistical sampling, the sample can deviate from the population by any quantity: a sample can be arbitrarily bad, even if you're careful about how you select it.
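To make the 95% figure concrete, here's a minimal sketch in Python. It assumes a simple random sample and uses the standard normal-approximation formula for a proportion, MoE = z * sqrt(p(1-p)/n) with z of about 1.96. The function names, the sample size of 600, and the 50/50 population are my own choices for illustration; they don't come from any real poll.

```python
import math
import random

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sampled proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def fraction_outside_moe(true_p=0.5, n=600, trials=2000, seed=1):
    """Simulate many unbiased polls; return how often one misses by more than the MoE."""
    rng = random.Random(seed)
    moe = margin_of_error(true_p, n)
    misses = 0
    for _ in range(trials):
        # Each respondent independently backs the candidate with probability true_p.
        hits = sum(1 for _ in range(n) if rng.random() < true_p)
        if abs(hits / n - true_p) > moe:
            misses += 1
    return misses / trials

if __name__ == "__main__":
    print(f"MoE for n=600: +/- {margin_of_error(0.5, 600):.1%}")
    print(f"Share of simulated polls outside the MoE: {fraction_outside_moe():.1%}")
```

A sample of 600 gives a margin of error of about 4%, and the simulated miss rate should land somewhere near one poll in twenty. Those misses come from perfectly honest, unbiased samples, which is exactly why "outside the margin of error" can't mean "impossible".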

Elections have consistently shown a bias in the exit polls: a bias in favor of the democratic vote. For some reason, which has not been adequately studied, exit polls almost always err on the side of sampling too many democratic voters. This could be the result of any number of factors: it could be a question of time (when were the polled people asked?); it could be a question of location (where were the pollsters located relative to the polling place?); it could be a social issue (the group of people that consistently votes for the democratic party may be more willing/have more time to answer pollsters' questions); it could be something else entirely.

But you can't conclude that an election was stolen on the basis of a discrepancy between official election results and exit polls. The best you can do is conclude that you need to look at both the election and the exit polling process to try to determine the reason for the discrepancy.

"According to Steven F. Freeman, a visiting scholar at the University of Pennsylvania who specializes in research methodology, the odds against all three of those shifts occurring in concert are one in 660,000. 'As much as we can say in sound science that something is impossible,' he says, 'it is impossible that the discrepancies between predicted and actual vote count in the three critical battleground states of the 2004 election could have been due to chance or random error.'"

That entire quote is, to put it crudely, utter bullshit. Anyone who would make that statement should be absolutely disqualified from ever commenting on a statistical result.

"Now, thanks to careful examination of Mitofsky's own data by Freeman and a team of eight researchers, we can say conclusively that the theory is dead wrong. In fact it was Democrats, not Republicans, who were more disinclined to answer pollsters' questions on Election Day. In Bush strongholds, Freeman and the other researchers found that fifty-six percent of voters completed the exit survey -- compared to only fifty-three percent in Kerry strongholds.(38) 'The data presented to support the claim not only fails to substantiate it,' observes Freeman, 'but actually contradicts it.'"

Again, nonsense. There are two distinct questions in that paragraph, which are being deliberately conflated:

In each given polling place, what percentage of people who voted were willing to participate in exit polls?
In each given polling place, what percentage of the people who were willing to participate in exit polls were voters for each of the major parties?

The fact that a smaller percentage of people in places that tended to vote for the democratic candidate were willing to participate in exit polls is entirely independent of whether or not in a specific polling place a larger percentage of democratic voters than republican voters were willing to participate in the exit polls. This is a deliberate attempt to mislead readers about the meanings of the results - aka, a lie.

"What's more, Freeman found, the greatest disparities between exit polls and the official vote count came in Republican strongholds. In precincts where Bush received at least eighty percent of the vote, the exit polls were off by an average of ten percent. By contrast, in precincts where Kerry dominated by eighty percent or more, the exit polls were accurate to within three tenths of one percent -- a pattern that suggests Republican election officials stuffed the ballot box in Bush country."

It could indicate that. It could also indicate that democratic voters were consistently more willing to participate in exit polls than republican voters. And therefore, in polling places that were strongly democratic, the sampling was quite representative; but in polling places that were strongly republican, the sampling was lousy.

Just to give an idea of how this works, suppose we have two districts, each of which has 20,000 voters. Suppose in district one, there is a 60%/40% split in favor of democratic voters; and in district two, there's the opposite: 60% republican, 40% democrat. And let's just use similar numbers for simplicity; suppose that in both districts, 60% of democrats were willing to participate in exit polls, and 40% of republicans were willing. What's the result?

District one will have 12000 votes for the democrat, and 8000 for the republican. The exit polls will get 7200 democratic voters, and 3200 republican voters, or 69% of the vote going to democrats according to exit poll, versus 60% actual.
District two will have the opposite number of votes: 8000 for the democrat, and 12000 for the republican. The exit polls would get 4800 democrats, and 4800 votes for republicans - predicting a 50/50 split.

The democratic margin of victory in the democratic area was increased; the republican margin was decreased by slightly more.
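If you'd rather check the arithmetic than take my word for it, here's a small Python sketch of the two-district example. The vote counts and the 60%/40% response rates are the ones from the example above; the function name is just something I made up for the illustration.

```python
def exit_poll_share(dem_votes, rep_votes, dem_response=0.6, rep_response=0.4):
    """Return the democratic share among the voters who answer the exit poll."""
    dem_polled = dem_votes * dem_response
    rep_polled = rep_votes * rep_response
    return dem_polled / (dem_polled + rep_polled)

if __name__ == "__main__":
    districts = [("District one", 12000, 8000), ("District two", 8000, 12000)]
    for name, dem, rep in districts:
        actual = dem / (dem + rep)
        polled = exit_poll_share(dem, rep)
        print(f"{name}: actual {actual:.1%} democratic, exit poll {polled:.1%}")
```

The same response rates are applied in both districts, and that alone is enough to make the exit poll look fine in one place and badly off in the other. No fraud is needed to produce the pattern.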

It continues very much in this same vein: giving unreliable samples undue evidence; bait-and-switch of statistics; and claims of measurement errors being impossible. But none of the mathematical arguments are true.

Was there fraud in the election? Almost certainly. Particularly in Ohio, there are some serious flaws that we know about. But this article manages to mix the facts of partisan manipulation of the election with so much gibberish that it discredits the few real facts that it presents.

RFK Jr. should be ashamed of himself. But based on his past record, I rather doubt that he is.


Well-done. Where can one find better data/analysis on fraud, or lack thereof, in the election?

Sounds to me like you just want to poke holes in his argument, instead of taking on the fundamental obviousness of his claims. Seventy percent or more of each block of votes thrown out were African-American. Given that 98 percent of blacks now hate Bush, and it was maybe more like 85 or 90 in 2004, throwing out those huge blocs of votes had a major impact. If you think you could do such a good job, why don't you contact his people and offer to help him clean up his math? He had staffers and fact-checkers looking into this, while the Salon debunk had none. Maybe you could even get a job with Kennedy.

My guess is you're just a Republican tool masquerading as a high-minded independent. Give it up. This President is a fascist, and if we don't stop him soon, mountaintop removal will be coming to a town near you, defended by an army of illiterate Minute Men. Enjoy.

Question: I don't quite see how a scientific procedure could both have a systematic bias and an evenly distributed margin of error... There are occasionally margins of error that are uneven, not just "plus or minus x," right? If exit polls always skew Democratic, why doesn't the exit poll estimation of the results take this into account, either in the predicted outcome or the margin of error? Thanks, Mark!

Remember that what we're talking about here isn't a scientific procedure - it's a statistical sample of the real world.

The "margin of error" is a precise mathematical term which refers to a degree of certainty about the quality of measurement based on the sample size and the measured deviations within the sample. It is not a measure that attempts to describe the sum of all possible sources of errors: it's just a statistical measure of variation within the sample.

The value of the margin of error assumes that the sample is a properly collected, valid, representative sample. The consistent democratic bias of exit polls represents a sampling error - which is something that simply is not included in the meaning of the margin of error.

Pollsters can, in theory, try to modify their sampling procedure to eliminate the bias, but since we don't know exactly what the source of the errors in exit polls is, it's hard to figure out how the procedure should be changed, or even to make a good guess of how large the error is in a given election.
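Here's a rough Python sketch of why the two kinds of error are different animals. It simulates a poll where the true split is 50/50 but democrats agree to answer a bit more often than republicans. The 60% and 50% response rates are hypothetical numbers I picked for the illustration, not measurements from any real exit poll.

```python
import math
import random

def biased_poll(true_dem=0.5, dem_response=0.6, rep_response=0.5, n=1000, seed=7):
    """Approach random voters until n agree to answer; return the democratic share."""
    rng = random.Random(seed)
    dem_answers = 0
    answered = 0
    while answered < n:
        is_dem = rng.random() < true_dem
        # Hypothetical response rates: democrats agree slightly more often.
        if rng.random() < (dem_response if is_dem else rep_response):
            answered += 1
            dem_answers += is_dem
    return dem_answers / n

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sampled proportion."""
    return z * math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    for n in (500, 5000, 50000):
        share = biased_poll(n=n)
        print(f"n={n}: poll says {share:.1%} democratic, MoE +/- {margin_of_error(share, n):.1%}")
```

As the sample grows, the reported margin of error shrinks toward nothing, but the poll keeps landing several points above the true 50%. A bigger sample buys you a more precise measurement of the wrong thing.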

Greg:

There is a wealth of information about the elections online - sources that purport to be balanced, sources with democratic bias, sources with republican bias... I don't want to stick myself into a giant rat's nest like that on my first day at my new blog :-). Any source I point at is going to offend someone.

If you're interested, I would suggest just doing web searches on Ohio in 2004. There were a variety of issues that have been covered in great detail: differentials between numbers of voting machines per voter in different districts; differing wait times; registration irregularities, etc.

Mark,

Great to see you here at Scienceblogs. I hope it works out well for you.

When I first heard about the RFK Jr. article, I just rolled my eyes and said "That guy has lost all credibility long ago." It's a shame because I don't think he's a bad guy and he's pretty smart.

Anyway, good luck again at the new site.

nice explanation...good to see a math guy on scienceblogs, finally. i like the title too...as a high school math teacher, i spend basically all day trying to squelch the bad math. thanks for the work.

"Like any other measurement based on statistical sampling, the sample can deviate from the population by any quantity: a sample can be arbitrarily bad, even if you're careful about how you select it."
Nitpick: this is not quite true. In most cases every sample, representative or otherwise, is a *subset* of the population, so anything that's true of at least n members of the sample *must* be true of at least n members of the population, as the worst case. (The exception to this is something like weather forecasting, where you sample past events to predict the behavior of a population of future events. This requires an assumption of uniformity, otherwise you can't draw any conclusions whatsoever.)

"The value of the margin of error assumes that the sample is a properly collected, valid, representative sample."
I don't think this is quite right either. The margin of sampling error assumes that the sample is collected in an unbiased way (which as you point out may not be true of exit polls), but that still doesn't mean that it will actually be representative. It's possible that *purely by chance* you happened to pick a sample that was significantly different from the overall population distribution.

Your general point is quite correct, of course; it's possible (although unlikely for a sample of any great size) to have the exit poll say 70% Kerry while the whole precinct actually voted 70% Bush, and the much smaller discrepancy actually reported is even less surprising.

I'd like to believe there was fraud, but I'm afraid I can't attribute to malice what can be adequately explained by (voter) stupidity.

You do realize that now you'll have to do a post on why you think the election was stolen? I've seen lots of accusations and would greatly appreciate your analysis.

"I'd like to believe there was fraud, but I'm afraid I can't attribute to malice what can be adequately explained by (voter) stupidity."

A colleague of mine had a fun working paper presented last year at the American Political Science Association annual meeting. She did some interesting simulations to demonstrate how very small biases could in principle be induced by petty fraud at the precinct level that would sway the election outcome, but would be very difficult to distinguish from random variation and honest counting errors.

Rothkin, Karen. "Voting as (Weak) Insurance: Tamper-Evidence of Low-Level Disenfranchisement" Paper presented at the annual meeting of the American Political Science Association, Marriott Wardman Park, Omni Shoreham, Washington Hilton, Washington, DC, 2005-09-01

Abstract: While normative political theorists have never been puzzled about why individuals choose to vote when there is little chance that their vote would be decisive, formal theorists continue to seek a quantitative explanation of why voting remains rational in democratic societies. I argue that voting is the best insurance an individual has to make election fraud detectable. It is not good insurance, so if the risk of occult disenfranchisement is troubling, we should seek more effective ways to detect, prevent and punish it. This paper assesses the magnitude of impact of one hard-to-detect form of vote suppression, a "conspiracy theory" discussed after the 2004 election.

Imagine a scheme that targets precincts where the other party will do well, to somehow reduce the number of votes in those precincts. That plan would cost the schemer's candidate votes, but cost the other candidate more. The Condorcet jury theorem shows that, in a binary choice, the aggregation of individual propensities-to-choose is highly sensitive to small changes in the number of individual choices on each side. A small, unequal change in the number of votes on each side can be decisive in the aggregate.

Based on precinct-level data from the 2000 Election, I construct a distribution of number of districts as a function of two-party vote-share of the district. I simulate different strategies for suppressing turnout in precincts chosen for partisan leanings, and calculate who wins, by how much, and what fraction of citizens are "disenfranchised" locally and nationally in the process.

This kind of scheme could alter the election outcome by suppressing far fewer votes nationwide than are already lost due to tabulation errors, and far fewer than the number reporting they were discouraged from voting by inconveniences. Thus, such a scheme could be effective and difficult to trace (though it may be impractical for other reasons). Nothing limits these conclusions to cases of malice. If there were systematically different obstacles in precincts of different partisanship, the outcome would be biased in the same way.

For those who've been asking why I think the election was stolen:

There were a lot of irregularities scattered around the country in the last election. What I find most compelling are the well-documented actions of Ken Blackwell in Ohio. They range from his deliberate attempt to throw away voter registrations for being on the wrong thickness of paper, once it was noticed that the registrations were tending to be democrats; to the outrageous voting delays in democratic districts; to the refusal to send extra voting machines to democratic districts, even though the state deliberately had a number of unused machines reserved for solving that problem; and so on. Mr. Blackwell quite clearly did everything in his power to depress the democratic vote, and to stand in the way of any investigation.

(And he's still at it today. Go to today's NYT for info about the bizarre (and illegal under federal election law!) rules he's pushing in Ohio to govern registration for this fall's gubernatorial election _in which he is a candidate_.)

While I agree there were flaws in RFK's article, I think this post conflates two types of polling: standard telephone-style polls, and exit polls. The blog post is correct that in telephone polls, democratic votes are overcounted: democrats are simply more likely to be available and answer the phone when polled.

However, for exit polls on election day, there is no such democratic bias -- in fact, there is a bias the other way, towards republicans. Exit polls are in person after people vote, and there is actually no evidence that in exit polls the democratic bias mentioned in this blog post somehow distorted the exit poll results towards Kerry.

The short answer is there was fraud. Because there was no real paper trail, though, all we have is statistics that suggest something smells bad in Ohio, individual stories of voter intimidation, long lines, and documentation of thousands of people (predominantly democrats) suddenly kicked off the voter rolls. Statistics, however, are viewed through the filter of the viewer, and can be used to suggest there was fraud, and can be used to suggest there wasn't fraud.

The numbers from Ohio can be used to legitimately support both analyses. For example, you can perform a Benford's law analysis on the Ohio results on a precinct-by-precinct level, and it will strongly suggest that, for the most part, no one sat down and randomly changed or added numbers to the vote totals, helping to rule out one kind of fraud. However, a Benford's law analysis will not rule out a corrupt elections board simply switching the results between candidates (something alleged to have happened in both Ohio and Florida in 2000). In Florida in 2000, for example, in many precincts down-ballot initiatives such as a minimum wage amendment would get 60-40 support, and the presidential ballot would be 60 Bush, 40 Gore. It is suspicious, but doesn't prove anything, that the same percentage that voted for Bush also sought to increase the minimum wage.
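For anyone who hasn't run into it, a Benford's law check is simple enough to sketch in a few lines of Python. This is only an illustration of the first-digit version, where the expected frequency of leading digit d is log10(1 + 1/d). The function names are hypothetical, the input numbers are made up and are NOT real precinct totals, and election forensics work often looks at the second digit instead, because precinct counts may not span enough orders of magnitude for the first-digit law to apply.

```python
import math
from collections import Counter

def benford_expected():
    """Expected first-digit frequencies under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_frequencies(counts):
    """Observed first-digit frequencies for a list of positive integer counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}

def chi_square(counts):
    """Pearson chi-square statistic comparing observed first digits to Benford's law."""
    n = sum(1 for c in counts if c > 0)
    observed = first_digit_frequencies(counts)
    expected = benford_expected()
    return sum(n * (observed[d] - expected[d]) ** 2 / expected[d] for d in range(1, 10))

if __name__ == "__main__":
    # Made-up inputs purely for illustration; these are not vote totals.
    benford_like = [2 ** k for k in range(1, 60)]
    suspicious = [500] * 59
    print(f"powers of two: chi-square {chi_square(benford_like):.1f}")
    print(f"all counts 500: chi-square {chi_square(suspicious):.1f}")
```

A small statistic means the leading digits look roughly the way Benford's law predicts; a large one means they don't. As the comment above says, passing a check like this can only argue against someone inventing numbers by hand. It says nothing about totals being swapped between candidates.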

The point that everyone agrees on, though, is that steps are necessary to return voters' confidence in our election system. I think this can be done in three simple steps: (1) disallow election officials from being in any way involved in political campaigns, (2) provide electronic chain of custody and verification to vote totals, something that is not an intractable problem, and (3) make all publicly used vote counting equipment, all vote counting done by officials, and all tallies and software used for tallies open for public inspection and viewing both before and during use.

doug:

Every piece of data that I've been able to find on the web seems to support the idea that democrats are more likely to participate in exit polls, and that the exit polls have error towards higher democratic vote percentages in each of the last six presidential and congressional elections. Where are you getting data to suggest that the exit polls skew republican?

I do agree with you that the current vote system is sufficiently screwed up that there is no way to actually prove conclusively whether recent elections have been fair, or if they've been manipulated/stolen; and that a big part of that is the highly partisan nature of the people in charge of running the elections. Partisans should not have any role in making the rules during an election.

Just to use Ohio as an example again: how can anyone feel sure that this fall's election is going to be fair, when one of the candidates on the top of a party ticket is the person who is currently changing the rules for how people register and vote?

Mark, have you seen this Josh Mitteldorf paper? The graph on the last page shows "Hand Counted Paper" errors much less than the other four kinds of counting methods. Do you have any ideas why this would be? Is the difference statistically significant?

Thanks

Mark Chu-Carroll's cavalier dismissal of the Kennedy Rolling Stone article displays an utter ignorance of the vast amount of detailed quantitative work that has been done on the relationship between partisan exit poll response rates and exit poll discrepancies that precisely addresses the issues that he prognosticates on in his blog.

I would strongly urge Mark to check this out before he again "shoots from the hip without looking"!

See for example Baiman June 5 reports at www.freepress.org that summarize in a recent AAPOR power point and paper various NEDA studies since the election.

In short, estimates using precise relationships between partisan exit poll response bias and actual precinct partisanship show that a fixed bias will give (on average, subject to noise) a "U" shaped exit poll discrepancy. Ohio precinct-level exit poll discrepancies (the only precinct-level discrepancies we've been able to get, as Mitofsky has refused to release detailed data) show no such "U" shape but rather inexplicable unbiased discrepancy on the left and extreme Kerry discrepancy on the right with an overall large and highly significant Kerry discrepancy. All this adds up to a clear pattern of vote shifting and cannot be explained by overall partisan exit poll response bias in any way shape or form. The "most explanatory" level of bias (a ratio of 1.18, i.e. 59% exit poll response by Kerry voters and 50% response by Bush voters) leaves 30% of Ohio precincts with significant exit poll discrepancies and more than double the discrepancy necessary to give Bush the election.

Let me though summarize the non-exit poll data that is most damning. This is excerpted from my second response to Farhad Manjoo of Salon:
b) Manjoo's efforts to dismiss what he calls the "purported rural vote shift" are even more outlandish. As Kennedy points out, he doesn't seem to understand the difference between a popular incumbent who earned more votes statewide than Gore in 2000 and a former Republican judge from Cincinnati who got a "favored son" boost in that region; and an unknown, underfunded, very liberal judge from Cleveland, who got 24% fewer votes than Kerry statewide, inexplicably getting more votes than Kerry in 12 of the most conservative counties (judging by their Bush vote shares) in Ohio!
Moreover, these same 12 counties just happen to be among the only 14 (out of 88 counties) where Bush's vote is larger than Moyer's (the incumbent conservative judge) by more than 43%. Moreover, the amount of "excess Bush" vote (more than Bush's state average of 21% more than Moyer) just happens to roughly match, both by county and for the entire state, the "lost Kerry" vote (what Kerry would have gotten if he had received his state average of 32% more votes than Connally in these counties) without any overall substitution from Moyer to Connally (Moyer's vote is larger than the state average and Connally's is smaller than the state average in all but one of these 12 counties).
Farhad, do you understand how absolutely remarkable such a series of "coincidences" is?!!
I challenge you or anyone else to provide a plausible non-vote shifting explanation for these patterns.
Note that the Bush to Moyer ratio is independent of the Kerry to Connally ratio when there is no substitution between Moyer and Connally. It is simply impossible to understand why, out of all the 88 counties, 9 out of 14 cases where Bush does extraordinarily well relative to Moyer, just happen to be in the same counties where Connally does extraordinarily well relative to Kerry?!!!! And it is even more impossible to understand why the relative magnitudes of these impossible undercounts for Kerry and over counts for Bush should so closely match!!!!
I would take this evidence to a trial. Clearly a crime was committed in Ohio. There is simply no other explanation for these patterns other than vote shifting. The only thing we don't know is who did it and how. And exactly this kind of information is necessary to get serious electoral reform - that you claim to support.

See www.baiman.blogspot.com for more tables and graphs on these points.

See documents stored at www.freepress.org for spreadsheets and detailed calculations and graphs. See www.uscountvotes.org for more papers and analysis.

Ron:

I approved your comment this time. I won't do it again if all you're doing is re-posting a column that you've written for another site on the web.

The bulk of Kennedy's argument is exit poll discrepancies. The fact of the matter is, if the exit polls are selecting non-representative samples, you *do* get a biased result. And the historical evidence is that exit polls *do* skew left.

There's plenty of stuff that went wrong in Ohio. It's my opinion, based on the evidence that I've seen, that there was vote manipulation in Ohio, and that (as I said in the original post), the election was stolen.

RFK's article credulously accepts any reasoning, any analysis, no matter how poorly done, no matter how poorly supported, if it supports the argument he wants to make. By mixing together the good and the bad indiscriminately, he produces an unconvincing case which is all too easy for partisans to knock down by pointing out all of the obvious errors. RFK has done a disservice to all of us who want to see the election system in this country get fixed: he's provided ammunition to the very people who he believes fixed the election.

Unfortunately, this is not unusual for RFK. His writing has shown a strong trend towards accepting *anything* that supports his beliefs, and ignoring anything that doesn't.

Mark, some of what you say is on-target but much is seriously flawed, from both a math and statistics perspective. In addition, your comments on Freeman and Kennedy are intemperate and inappropriate for a blog focusing on science and good math.

Let me make four points. First, if you look at the data for prior elections in the U.S., you will not find a Democratic bias. At the national level we don't have enough observations to state that the errors are normal, but there is no consistent bias. And that is exactly what one would expect if those conducting the exit polls have been trying to be as accurate as possible. If there were a factor causing a bias in one election, one should expect the pollsters to correct it for the next election.

Second, "But you can't conclude that an election was stolen on the basis of a discrepancy between official election results and exit polls. The best you can do is conclude that you need to look at both the election and the exit polling process to try to determine the reason for the discrepancy." This statement sounds very reasonable, and in general I'm willing to concur. However, counterexamples are very easy to find. The Ukraine comes right to mind.

Third, in your second quote you dismiss Steven Freeman crudely but you never state a reason why. Suppose Professor Freeman had stated "As much as we can say in sound science that something is extremely unlikely, it is extremely unlikely that the discrepancies between predicted and actual vote count ..."; would you still dismiss him? It seems that his "offense" is to use the word "impossible" rather than a term like "highly improbable." Given the intended audience and given his caveat included in the quote, your dismissal of his view is at best problematic.

Fourth and most important, when there is a discrepancy between the vote and the exit poll, clearly either could be in error. There are, in fact, good reasons to believe that there were errors in the exit poll. If Mitofsky had "liberal-looking" pollsters outside heavily Republican precincts, then one could obtain the observed results. However, your analysis as a matter of statistics does not explain the discrepancy that Freeman cites. Your example finds "The democratic margin of victory in the democratic area was increased; the republican margin was decreased by slightly more." That is not, I believe, what the data indicate. I would recommend that you contact Professor Freeman and he can provide much more detail on what the data actually state rather than the oversimplification that you have provided.

Given the problems raised above, I have a serious issue with the statement "It continues very much in this same vein: giving unreliable samples undue evidence; bait-and-switch of statistics; and claims of measurement errors being impossible. But none of the mathematical arguments are true."

Mark - Baiman is right. Do the math! If you graph the example you use in your original post from 100% Bush to 0% Bush and apply any amount of reluctant Bush responder bias (rBr), you will see exactly what Baiman is talking about: a U-shaped within-precinct discrepancy (WPD).

Then graph the actual data. It is not U shaped. It is a downward (or upward depending on the conventions you use) sloping line with a slight U shape (or inverted U depending on the conventions you use). The only explanation I can come up with is a slight rBr plus a stronger vote shift. Please explain!

It seems you are the one shooting from the hip, not Kennedy and the people like Baiman whom Kennedy consulted for his article. Please investigate to the depths that others have before trashing them.
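The constant-bias curve that both comments describe can be sketched in a few lines of Python. This assumes the same fixed response rates in every precinct, using the 59% (Kerry voters) and 50% (Bush voters) figures quoted in Baiman's comment above; the function names are hypothetical. It only reproduces the theoretical baseline and says nothing about what the actual precinct data look like.

```python
def kerry_poll_share(true_kerry, kerry_response=0.59, bush_response=0.50):
    """Expected Kerry share among exit-poll respondents, given the true Kerry share."""
    kerry_polled = true_kerry * kerry_response
    bush_polled = (1 - true_kerry) * bush_response
    return kerry_polled / (kerry_polled + bush_polled)

def discrepancy_curve(steps=10):
    """Exit-poll minus actual Kerry share, from 0% Kerry to 100% Kerry precincts."""
    curve = []
    for i in range(steps + 1):
        k = i / steps
        curve.append((k, kerry_poll_share(k) - k))
    return curve

if __name__ == "__main__":
    for k, gap in discrepancy_curve():
        print(f"true Kerry share {k:4.0%}: poll minus actual {gap:+.1%}")
```

The gap is zero at both extremes, because a precinct that is all Bush or all Kerry can't be mis-sampled on party, and it peaks in evenly split precincts. Whether you call that a U or an inverted U depends on the sign convention. The open question in this thread is how closely the real data follow that curve.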

Rich,

(1) "First, if you look at the data for prior elections in the U.S., you will not find a Democratic bias." Well, in 1988, 1992, 1996, 2000, and 2004 (and apparently at least most of the intervening off-years), you will find a Democratic bias. (See the Edison/Mitofsky evaluation report, http://exit-poll.net/election-night/EvaluationJan192005.pdf, p. 34.) What prior elections did you have in mind?

(2) I'm not sure how you think Ukraine 2004 is a counterexample, except in the sense that most observers concluded that there was credible evidence of fraud sufficient to put the outcome in doubt. But that evidence was much more robust than in the U.S., and as far as I can tell, the Ukraine exit polls did not figure prominently in the debate. (I say "polls" because there were two of them, and the difference between the two polls was larger than the nationwide difference between the U.S. poll and official returns.)

(3) I don't know why Chu-Carroll wrote that, and I agree that he should have slowed down and explained it. I would say that it is an eye-popping statistic directed at what, at this point, amounts to a straw man -- although I wouldn't necessarily hold that against Freeman.

(4) There has been a lot of subtle -- perhaps too subtle -- measurement discussion arising from the "Bush stronghold" argument. I invite you to stare at http://inside.bard.edu/~lindeman/aapor1b.jpg, which depicts the original percentage-difference "WPE" measure, only inverted so that red shift is positive. The scatterplot depicts a slight positive correlation, with percentage "red shift" slightly larger toward the right of the plot. As Elizabeth Liddle pointed out over a year ago, to some extent this correlation is likely to be an artifact. (Most simply put, consider the fact that large red shift is simply impossible in "Kerry strongholds," and large blue shift is impossible in "Bush strongholds." Although Elizabeth didn't emphasize this point, there are many more Kerry strongholds than Bush strongholds. You can almost visualize a diagonal line at upper left tracing the bounds of arithmetical possibility. This is more difficult at lower right because there are so few precincts.) All that said, I don't see how this plot supports a meaningful story about "Bush strongholds."

While I am at it, here is the rest of what I think:
http://inside.bard.edu/~lindeman/beyond-epf.pdf

Mark Lindeman

Skeptico, what we can say is that the difference isn't statistically significant once one controls for location -- urban versus rural. Edison/Mitofsky made this point in their evaluation report. Freeman has challenged it, but I don't think his position is tenable. See numbered pages 12-13 (PDF pages 13-14) of my paper http://inside.bard.edu/~lindeman/beyond-epf.pdf ; for further details, see around p. 40 of the E/M evaluation report at http://exit-poll.net/election-night/EvaluationJan192005.pdf .

Honestly, I think it is Very Strange to look at a chart that shows that the largest red shifts occurred in lever-machine precincts (n = 118), yet to focus analytically on the paper ballot precincts (n = 40). I might think differently if I had heard anyone venture a hypothesis about vote miscount in lever-machine jurisdictions _before_ this result appeared. As a New Yorker, I am still waiting for someone seriously to make the case that Kerry won my state by about 30 percentage points, as the exit polls indicate.

Dave:

Sorry, but the "math" used by Kennedy is terribly bogus. Exit polls *do* have a history of error in presidential elections; there are a variety of possible causes for that bias; and there is ample evidence of the bias being from benign causes. For example, see http://www.mysterypollster.com/main/2005/05/aapor_exit_poll.html for an overview of some of the analyses of the exit polling data. The fact is, a variety of ways of looking at the bias shows very normal bias distributions.

And for the arguments about vote shift - look at the Bush vote if you assume a vote switch like Baiman argues - it requires dramatic, non-credible declines in Bush votes in republican stronghold districts.

The fact of the matter is, Ohio is a state that tends very much towards the conservatives outside of the cities; and Ken Blackwell's actions WRT urban voter registration, voting machine allocation, and voter ID verification were more than enough to ensure a republican victory in Ohio - there's no need to invoke crappy mathematical pseudo-analysis to support an overly elaborate conspiracy theory.

Mark:

Thanks. I can't follow the math but I think I understand you to say that the difference in the "paper counted" districts is not statistically significant.

Sorry to go on about this, but I just wanted to ask you one more thing that puzzles me. Regarding the claim that the greatest disparities between exit polls and the official vote count came in Republican strongholds: you say this could also indicate that democratic voters were consistently more willing to participate in exit polls than republican voters, and you give an example. But in your example, if say you were in an area with only a 10% Bush vote, you would expect:

Kerry - 18,000 votes - 10,800 exit poll
Bush - 2,000 votes - 800 exit poll

Kerry exit poll - 93%
Kerry election - 90%

-Still a difference. But if you look at the same Mitteldorf study and look at the graph at the top of page four, you see that in the 0-20% Bush precincts the discrepancy is about zero. It seems to increase with the % of Bush votes, leveling off at approximately the 50% level. Doesn't the graph, assuming it is correct, show that Bush districts had a greater shift to Bush (compared with the exits) than the Kerry districts? How else would you explain the 0% difference in the 0-20% Bush region?
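
For readers who want to check the arithmetic in the example above, here is a minimal sketch. The 60% and 40% response rates are not stated anywhere in the thread; they are inferred from the example's own numbers (10,800 of 18,000 Kerry voters and 800 of 2,000 Bush voters answering the poll), and the function name is just an illustration.

```python
def exit_poll_share(kerry_votes, bush_votes, kerry_rate, bush_rate):
    """Kerry's share among exit-poll respondents, given per-group response rates."""
    kerry_respondents = kerry_votes * kerry_rate
    bush_respondents = bush_votes * bush_rate
    return kerry_respondents / (kerry_respondents + bush_respondents)


# Hypothetical precinct from the example: 10% Bush vote, 20,000 voters.
kerry_votes, bush_votes = 18_000, 2_000
official_share = kerry_votes / (kerry_votes + bush_votes)

# Assumed response rates, inferred from 10,800/18,000 and 800/2,000.
poll_share = exit_poll_share(kerry_votes, bush_votes, 0.60, 0.40)

print(f"Kerry official:  {official_share:.1%}")
print(f"Kerry exit poll: {poll_share:.1%}")
```

The poll share works out to 10,800 / 11,600, which is where the roughly 93% figure comes from.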

Sorry to bother you again - I'm just trying to understand this. Thanks again.

skeptico:

If you're seeing a bias in exit polls that tends to over-sample democratic voters, you would expect it to disappear as the conservative voters approach zero. So that's not meaningful. Remember that for my example, I exaggerated the difference - the real data would be supported by a democratic oversample of much less than 20%.

WRT the issue of the kinds of voting machines; I just don't know. The thing is, the different voting methods have a lot of differences, and they correlate with location in interesting ways. So actually isolating the voting machine impact from the number of people voting in a location, the economic profile of the location; the average wait to vote in a location; the political leanings of the location; and all of the other factors isn't easy, and I simply don't have the data.

Mark - Sorry but your answer does not address the issue I raised. Reluctant responder bias has a signature. In very partisan precincts the bias will have less effect. It will have maximum effect in competitive precincts. Skeptico just pointed this effect out above. It's pretty simple math.

Baiman even controlled for the best fit rBr bias and still found very statistically significant evidence of something else that looks very much like vote shift.

If the WPD curve is not U shaped and is skewed what is causing it? Vote shift is a very plausible explanation. Try to come up with a better explanation and I'll listen.

I've listened very carefully to Liddle's explanation but it doesn't work for me. I've read all of Blumenthal's (AKA Mystery Pollster) stuff and it is not convincing. You're surely a smart guy, but you've got to look much deeper before you'll sound very smart to the people that have been studying this since Nov 04. I'm sorry to say, you're just scratching the surface. Keep digging!

Mark - Just another quick note. Most people that refuse to believe vote shifting could occur believe so on the basis that it would require something impossible - a massive conspiracy. Well - here's some proof that it can happen

http://www.zwire.com/site/news.cfm?newsid=16751509&BRD=

The quick synopsis: machine vote count 79 for incumbent to 99 for challenger. Hand count 153 incumbent to 25 challenger. Maybe it was some kind of malfunction, but how hard is it to intentionally create such a malfunction? The motive, means, opportunity and evidence all exist. Everyone should at least be suspicious.

to Skeptico, re the Mitteldorf paper.

There is a confound between the size of place served by a precinct and the method used. Only precincts serving places less than 50,000 or suburban and rural districts used paper, and those were very much in the minority.

In those smaller places, there was no significant difference between the discrepancy in precincts using any of the five different voting methods: levers, punchcards, DREs, optical scanners, paper, and indeed in such places the discrepancy was less anyway.

Of precincts serving places with populations greater than 50,000, none used paper ballots, but the discrepancy between precincts using older technologies (levers, punchcards) was significantly greater than in precincts using DREs or optical scanners. This may suggest higher residual vote rates for Democratic votes on these older technologies, such as those on punchcards that cost Gore the presidency. On the other hand there may be some other collinear factor.

When the Mitteldorf paper was written, this was not clear, although the E-M report notes the confound by size of place. I myself was involved in further analysis of the data for Mitofsky, and was able to clarify this point at the recent AAPOR meeting in Montreal.

Dave:

Sorry, but no - responder bias does not always have a single specific signature. There are a variety of kinds of responder bias, which can produce different signatures. The fact that one specific form of bias isn't clearly reflected in the data doesn't mean that the data shows that there is no bias. Go to the link that I put in the message to skeptico - there's a series of analyses there of the data - including a range of scatter-plots showing the bias in the data - and what you find is that the data has all of the randomness and uniformity properties that you would expect if it were a matter of responder bias, but which would be remarkably difficult to produce deliberately.

WRT vote shifting: I don't believe that it's impossible. What I believe is that it doesn't match the data that we have from the election. As the analysis that I linked to shows - there are a variety of different ways of looking at the properties of the actual vote compared to the exit polls; and they all show the correct structure and distribution. It *does* require a fairly elaborate conspiracy to do large-scale vote shifting without perturbing the structure and distribution of data statewide.

As I've said before: I *do* think that the election was stolen. But dragging in bad arguments that simply *do not match* the data does nothing but discredit the entire argument that there was a deliberate effort by election officials to steal the election in more straightforward ways. In my opinion, the data just doesn't support any vote-shifting conspiracy - it *does* reflect a deliberate effort by election officials to make sure that democratic voters were disenfranchised; that democratic districts were deliberately provided with less reliable machines; that democratic districts were deliberately underequipped so that democratic voters would be unable to vote; and that problems on election day in democratic districts were deliberately ignored in order to suppress the democratic vote.

They didn't *need* to do any more than that. And the data doesn't support the argument that they did any more than that. But if that's how they stole the election, then conspiracy mongering about vote-shifts is just a distraction - and if they *do* try to fix the next election using vote shifting, it's going to be a heck of a lot harder to convince anyone of that once those arguments regarding this election are widely discredited.

Mark - uniform bias produces a unique signature that does not fit the data. We should be able to agree on that, it's pretty elementary. We should also be able to agree that bias is necessary to reconcile the poll data with the vote data. So, what are the necessary characteristics of such a bias?

A very specific and unusual (IMO) non-uniform bias can produce a signature that fits the data. I've gone through the exercise of forcing the poll data to fit the vote data by introducing a non-uniform bias. In my opinion you'd have to be a conspiracy theorist to believe society behaves in a manner consistent with this type of bias.

I don't see any basis to believe that non-uniform bias exists except for faith that it must be so since the votes can't be shifted. I prefer to be more scientific than that.

Dave:

No. A *specific model* of uniform data produces the signature you're demanding. Data in the real world is rarely so simple.

If you assume that there is *exactly one* source of selection bias; and you assume that that *one* source of bias is completely uniform across a non-uniform population, then you would expect to see a clear signature like you're demanding.

But if we accept the fact that we're dealing with a non-uniform population, and that there are multiple sources of selection bias, then you would not expect to see a single simple bias signature. What kind of signature would you expect to see if there were, say, five kinds of selection bias that tended to reduce the republican participation in exit polls, and two kinds of selection bias that tended to reduce democratic participation, and another two kinds of selection bias that tended to reduce split-ticket voter participation? (Note that I'm not saying that that's the exact scenario in Ohio - just giving an example of multiple non-independent biases.)

Real world statistics are rarely as clean as classroom examples. Simple bias signatures are not likely to appear in real data about millions of people gathered by thousands of pollsters in hundreds of locations. You need to do deeper analyses to recognize bias in complex real-world situations. And it's rarely as clear-cut as we might like it to be. But when you do the math - you wind up seeing that the complex properties of that data *are* what you would expect them to be; and that getting that kind of data distribution if there were significant vote shifting would be extremely difficult.

I just got email from a statistician who's studied this - one of the people mentioned in the article I linked to in my response to skeptico. She's got a good analysis of the data at http://inside.bard.edu/~lindeman/ASApaper_060409.pdf . Why don't you look at that, and see if you can respond to her analysis, instead of just repeating the "but it doesn't have a simple uniform bias signature" mantra? (I edited this comment to correct an error in the link; clicking it should work now -MarkCC)

Mark:

Re: "If you're seeing a bias in exit polls that tends to over-sample democratic voters, you would expect it to disappear as the conservative voters approach zero."

Shouldn't it also approach zero as Kerry voters approach zero? Surely it should be approaching zero at both ends of the curve? But the graph shows it stays roughly the same after the 50% mark.

skeptico:

No, you wouldn't expect the pro-democratic bias to approach zero as the number of kerry voters approaches zero. Because the real situation is that the number of democratic votes essentially never drops below 20% or so; and because one of the major factors in the exit-poll skew is that *republican* voters are less likely to participate in exit polls. Even when the democratic voting population approaches very low values in the 15 to 20% range - the over-sampling of those people produces an upward skew; and as long as the democratic population remains large enough to influence the exit polls at all, the over-representation of democrats will have a skew factor because when the actual percentage of democratic votes gets smaller, it takes *fewer people* to create an upward skew in the percentage of that group of voters.

The bias only starts to decrease *when the undersampled population starts to decrease*. Not when the oversampled population decreases (until the oversampled population becomes too small to measure accurately by sample).

Also, describing the bias in terms of things like "as X-type voters approach zero" is actually just not a realistic description of the population being sampled. It isn't one single factor producing a tendency to skew; it's multiple overlapping factors; and none of the three main populations in the vote (republican voters, split ticket voters, democratic voters) ever really get particularly close to zero.

On U shapes:

Ron is correct that uniform participation bias (or indeed uniform fraud of a certain type) would produce a U shape (actually, a slightly asymmetric U shape, a point which has some relevance - more later). However, because both uniform fraud and uniform response bias would produce the same shape, it wouldn't distinguish the two anyway, even if there were no other error variance in the data (which there must be - sampling error for a start) to muck up the U.
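
To make the shape concrete, here is a rough sketch of what a perfectly uniform participation bias does across the whole range of precinct partisanship. Everything in it is an assumption for illustration: a two-candidate race, no sampling error, and a single ratio `alpha` of Kerry-voter to Bush-voter response rates (1.5 corresponds to the exaggerated 60%/40% rates implied by the numbers in skeptico's example above). The discrepancy is expressed as "red shift" (poll Kerry share minus official Kerry share), so the U appears upside down as a hump.

```python
def red_shift(bush_share, alpha):
    """Exit-poll Kerry share minus official Kerry share for one precinct.

    alpha is the assumed ratio of Kerry-voter to Bush-voter response rates.
    """
    kerry_share = 1.0 - bush_share
    poll_kerry = alpha * kerry_share / (alpha * kerry_share + bush_share)
    return poll_kerry - kerry_share


ALPHA = 1.5  # assumed, deliberately exaggerated
grid = [i / 100 for i in range(101)]  # official Bush share from 0% to 100%
curve = [red_shift(b, ALPHA) for b in grid]

peak_bush_share = grid[curve.index(max(curve))]

for b in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"Bush {b:4.0%}: red shift {red_shift(b, ALPHA):+.3f}")
print(f"largest shift at Bush share {peak_bush_share:.2f}")
```

Under these assumptions the shift is zero at both extremes and largest slightly on the Bush side of an even split, which is the slight asymmetry mentioned above.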

However, most things in life, whether fraud or participation bias, are not uniform; they have a distribution, often a symmetrical distribution about a mean. If we look at the result of a normal distribution of something that causes bias (whether in count or poll), equally distributed regardless of the vote-share in the precinct, what we see is a family of asymmetric U shapes. If these are plotted as discrete datapoints (they can also be plotted as a surface) then we see an ovoid distribution that is slightly tilted. The variance is higher in the centre of the distribution (precincts where support for each candidate is even) and goes to zero, as Ron says, at the extremes. And because the plot tends to be tilted in the presence of non-sampling error, a regression line run through the plot will tend to have a non-zero slope.
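
A small simulation can illustrate the ovoid and the tilt. This is only a sketch under made-up assumptions, not a reconstruction of anyone's actual analysis: precinct partisanship is spread evenly from 0% to 100% Bush, each precinct draws its own response-rate ratio `alpha` independently of partisanship, there is no sampling error, and a lognormal distribution (with invented parameters) is swapped in for the normal one so that the ratio stays positive.

```python
import random
import statistics


def red_shift(bush_share, alpha):
    """Exit-poll Kerry share minus official Kerry share for one precinct."""
    kerry_share = 1.0 - bush_share
    poll_kerry = alpha * kerry_share / (alpha * kerry_share + bush_share)
    return poll_kerry - kerry_share


rng = random.Random(2004)  # fixed seed so the run is repeatable
precincts = []
for _ in range(20_000):
    bush_share = rng.random()             # assumed: partisanship spread evenly
    alpha = rng.lognormvariate(0.2, 0.3)  # assumed: invented bias distribution
    precincts.append((bush_share, red_shift(bush_share, alpha)))

# Spread of the discrepancy in competitive versus lopsided precincts.
center = [s for b, s in precincts if 0.4 <= b < 0.6]
edges = [s for b, s in precincts if b < 0.1 or b >= 0.9]
center_spread = statistics.pstdev(center)
edge_spread = statistics.pstdev(edges)

# Ordinary least-squares slope of red shift on official Bush share.
mean_b = statistics.fmean(b for b, _ in precincts)
mean_s = statistics.fmean(s for _, s in precincts)
slope = (sum((b - mean_b) * (s - mean_s) for b, s in precincts)
         / sum((b - mean_b) ** 2 for b, _ in precincts))

print(f"spread in competitive precincts: {center_spread:.3f}")
print(f"spread in lopsided precincts:    {edge_spread:.3f}")
print(f"regression slope:                {slope:+.4f}")
```

In this toy setup the bias is unrelated to partisanship by construction, yet the spread is widest in the middle and the fitted line still comes out tilted, because each precinct's hump is lopsided in the same direction whichever candidate the bias favours.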

And when we turn to look at the data Ron cites from Ohio, that is exactly what we find. In the most extreme precinct (96% Kerry) the WPE is indeed zero. There are no such extreme data points at the Bush end (max 78%), and so we do not see the drop-off to zero at that end. However, what we do see is a slight tilt to the regression line, again, as predicted under the null hypothesis of equally distributed bias. We cannot therefore, without looking more closely, infer from that slope that there is something fishy about one end of the plot. A slope would be present even if the distribution of bias was orthogonal with respect to vote-share.

What we need to do therefore, is to compute the coefficient of the slope. It is in fact significantly greater than zero (as we expect under the null) but only at p=.05, i.e. on the borderline of the criteria for significance usual in the social sciences. And because the expected slope under the null is in the observed direction, the observed slope cannot be significantly greater than the expected slope, and we must retain the null.

Although a more marked slope might well be a signature of vote-shifting (because it might reflect a Bushward movement of the fraudulent precincts coupled with negative movement of their WPE), the fact that it is not present (and it is not) does not rule out fraud. Ron's paper points out that if fraud was not uniform, but proportional to Kerry's vote share (more Kerry votes flipped to Bush where there were more Kerry votes to flip), then this might cancel out the slope that Ron thinks is present, but which I argue is not.

One test of this hypothesis would be to regress WPE (or some other measure of discrepancy - I recommend the one Lindeman, Brady and I describe in the paper Markcc links to above) on some variable that might represent an uncorrupted vote count. Unfortunately, Ron and his co-author do this using the exit poll share itself, and because they find the predicted negative correlation between WPE and Kerry's exit poll share (more negative WPE where Kerry's "uncorrupted" vote share is greater) they infer fraud. The trouble with this inference is that of course all error variance in WPE (sampling error; any non-sampling error) will also be reflected in exit poll share (same sources of variance on both sides of the equation). So any correlation cannot be a unique signature of fraud, and, again, could as easily be attributable to error in the poll, including sampling error. Indeed, as all polls have sampling error, it would be surprising not to find some degree of correlation between WPD and exit poll share for one candidate.
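
The shared-variance problem can be seen in a toy simulation with no fraud and no response bias at all, only sampling error. The setup is entirely hypothetical: 2,000 precincts, 50 respondents each, true Kerry shares spread between 20% and 80%, and WPE taken as the official Kerry margin minus the poll Kerry margin (so it is negative when the poll overstates Kerry). The sign convention is an assumption made for this sketch.

```python
import random

rng = random.Random(1104)  # fixed seed so the run is repeatable
PRECINCTS = 2_000          # hypothetical number of sampled precincts
RESPONDENTS = 50           # hypothetical respondents per precinct

wpe = []
poll_share = []
for _ in range(PRECINCTS):
    kerry_true = rng.uniform(0.2, 0.8)  # honest count, by construction
    # Unbiased poll: every voter is equally likely to be interviewed.
    kerry_in_sample = sum(rng.random() < kerry_true for _ in range(RESPONDENTS))
    kerry_poll = kerry_in_sample / RESPONDENTS
    # Margin = Kerry share minus Bush share = 2 * Kerry share - 1.
    official_margin = 2 * kerry_true - 1
    poll_margin = 2 * kerry_poll - 1
    wpe.append(official_margin - poll_margin)
    poll_share.append(kerry_poll)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


r = pearson(wpe, poll_share)
print(f"correlation between WPE and Kerry's exit-poll share: {r:+.2f}")
```

The correlation comes out clearly negative even though nothing was shifted, simply because the same sampling noise sits on both sides of the regression.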

In short: there is no reason to suppose that a U shape, if found, would mean fraud rather than polling bias; a distribution of fraud/polling bias would not produce a U but a distribution with a positive slope, as observed; extreme precincts would tend to have near zero WPEs, as indeed the one extreme precinct in this dataset does; and in any case, a quite plausible mechanism of fraud would tend to cancel out the pattern Ron adduces as evidence of it.

Mark, earlier you said:

If you're seeing a bias in exit polls that tends to over-sample democratic voters, you would expect it to disappear as the conservative voters approach zero.

And you just said:

Re: you wouldn't expect the pro-democratic bias to approach zero as the number of kerry voters approaches zero. Because the real situation is that the number of democratic votes essentially never drops below 20% or so

Just so I'm sure I understand you - are you saying the Bush voters did approach zero in some areas, but the Kerry voters never dropped below 20%? Is that a fact? That would explain the graph - is that what you are saying? Thanks.

Skeptico:

Yes: as Elizabeth mentioned, the data shows that in the most extreme democratic precincts, the Kerry vote was greater than 95%; whereas in the most extreme republican precincts, the Bush vote was less than 80%.

Just for information: someone tried to post a comment with absolutely no identification of the poster. I did not approve the comment in comment moderation. (Anything posted without typekey authentication comes to me for moderation.)

In moderation, I will never reject posts for content unless they are very seriously personally abusive (towards me, or towards other commenters). But I think it's reasonable to ask you to attach *some* kind of identification to your posts, to make it possible for other commenters to respond to you by name.

You're welcome to use pseudonyms for posting if you're uncomfortable using your real name. But any posts with *no* identification whatsoever, I will delete.

Clarification: the data I quoted were from Ohio, and were "blurred" data used in the analysis by Election Science Institute (ESI), and referred to by Ron.

A plot showing WPE against Bush's vote share for the nationwide dataset analysed in the WPE section of the Mitofsky report (1250 precincts) is here:

http://inside.bard.edu/~lindeman/aapor1b.jpg

It shows that there are more extreme Kerry precincts in the sample than extreme Bush precincts, and also shows nicely the ovoid distribution predicted by the null hypothesis of bias distributed orthogonally with respect to vote share, with maximum variance in the centre, going to zero at the extremes, and with a slight overall tilt (note that WPE is plotted with negative upwards).

However as "bias" could arise from either fraud or vote-shifting, it doesn't get us a whole lot further.

Sorry Mark, the unidentified poster was me. I forgot to put my name and email address in before I hit post.

Dave:

The moderation software doesn't allow me to get back posts that I've rejected in moderation. If you resubmit the post, I'll approve it.

Skeptico, I think MarkCC isn't quite right about the partisanship distro, as displayed in the plot that Elizabeth recently linked to. As I read his statement, it would mean that there were _no_ precincts with Bush vote share > 80%, in which case we wouldn't have a WPE figure for that group at all.

What the plot shows is that there are few precincts above, say, 85% Bush vote share, whereas there are many more precincts more than 85% Kerry. Also note that (e.g.) for true Bush vote proportion 0.2, the maximum possible red shift (overstatement of Kerry vote share) is 40 points -- so observations at upper left and lower right are arithmetically impossible.
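
The arithmetic bound is easy to verify. The sketch below assumes a two-candidate race and measures shift on the margin (Kerry share minus Bush share), in percentage points; the function name is made up for illustration.

```python
def shift_bounds(bush_share):
    """Largest possible red and blue shift, in margin points, for one precinct.

    Red shift: the poll overstates Kerry's margin.
    Blue shift: the poll overstates Bush's margin.
    """
    kerry_share = 1.0 - bush_share
    official_margin = kerry_share - bush_share  # between -1 and +1
    max_red = 1.0 - official_margin    # poll shows 100% Kerry
    max_blue = official_margin + 1.0   # poll shows 100% Bush
    return max_red * 100, max_blue * 100


for bush in (0.05, 0.2, 0.5, 0.8, 0.95):
    red, blue = shift_bounds(bush)
    print(f"Bush {bush:.0%}: red shift at most {red:.0f}, blue shift at most {blue:.0f}")
```

At a true Bush proportion of 0.2 the official Kerry margin is 60 points, so the poll can overstate it by at most 40, while in a precinct that is 80% Bush the same limit applies to blue shift instead.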

I would not want to try to diagnose Vote Shift by examining the mean WPEs (or mean anything-elses) of those precincts with Bush vote share < 0.2 on the one hand, and those > 0.8 on the other. If vote shift is what we are looking for, a comparison with past returns should get us further -- see http://inside.bard.edu/~lindeman/slides.html (basically my report and interpretation of work by Mitofsky and others). I have additional work filling some of the gaps in that writeup and responding to criticisms.

The confusion was due to me: Markcc picked up my reference to the Ohio distro, where the most extreme precinct is Kerry 96% (and has a WPE of zero) and the most extreme Bush precinct is only 78% for Bush.

In the full 1250-precinct dataset, the range is wider, although, as Mark Lindeman points out, still with a heavier tail at the Kerry end.

Folks:

I just discovered that the way that our software here at SB handles comments, if it thinks that a comment is spam, it doesn't show up in the normal place in my moderation queue. As a result, there were a couple of comments in this thread which were hung up in the queue. I just discovered the junk list, and published those comments. Sorry to the commenters. You can avoid any risk of getting modded out by registering with typekey; typekey verified posts get published without moderation by me. Also, now that I know about the spam queue, I'll check it regularly to try to publish mis-categorized posts.

Tahoma:

Take a look at the other comments in this thread. The fact is, an honest and careful look at the data does not show a clear case for fraud by vote-shifting.

And there's no reason to create elaborate schemes to explain how the election was stolen. It's very hard to manipulate data in a way that creates a credible data distribution with the uniformity properties of the actual Ohio vote data. It's *easy* to use the power of the state official in charge of elections to selectively suppress votes. Blackwell's highly irregular use of his power running the elections is a matter of undisputed public record. They didn't need to do more than that. And they don't need to do more than that in any of the coming elections, either. Just look at what Blackwell has been up to in Ohio in the last two weeks!

If we waste time and energy focusing on easily discredited arguments about how the election was stolen, the only thing that gets accomplished is to make people believe that all of the arguments about what really happened are equally easily discredited. People like RFK that naively accept any argument that supports the case they want to make do far more damage than good.

The True Vote Model

http://www.geocities.com/electionmodel/TruthIsAllFAQResponse.htm#TrueVo…

The True Vote Model encapsulates the mathematical arguments which strongly suggest that Kerry easily won the 2004 election. The model uses 2000/2004 election data, 2000 voter mortality and 2000 voter turnout in 2004 in order to determine mathematically feasible (and plausible) weights. The 12:22am NEP vote shares are the base case assumptions which can be overridden by the user. According to the Census, 125.74mm votes were cast (including 3.44mm uncounted votes). A powerful sensitivity analysis feature enables the user to view the effects of incremental changes in the assumptions on Kerry's national vote. Many scenario combinations are displayed in various tables.

The Base Case (most likely) True Vote:

Kerry 66.10mm (52.57%)
Bush 58.38mm (46.43%)
Other 1.27mm (1.01%)

Users of the True Vote model are challenged to find one plausible Bush win scenario.
