Schools of thought in Probability Theory

By goodmath on April 7, 2008.

To understand a lot of statistical ideas, you need to know about
probability. The two fields are inextricably entwined: sampled statistics
works because of probabilistic properties of populations.

I approach writing about probability with no small amount of trepidation.

For some reason that I've never quite understood, discussions of probability
theory bring out an intensity of emotion that is more extreme than anything else
I've seen in mathematics. It's an almost religious topic, like programming
languages in CS. This post is intended really as a flame attractor: that is, I'd request that if you want to argue about Bayesian probability versus frequentist probability, please do it here, and don't clutter up every comment thread that
discusses probability!

There are two main schools of thought in probability:
frequentism and Bayesianism, and the Bayesians have an intense contempt for the
frequentists. As I said, I really don't get it: the intensity seems to be mostly
one way - I can't count the number of times that I've read Bayesian screeds about
the intense stupidity of frequentists, but not the other direction. And while I
sit out the dispute - I'm undecided; sometimes I lean frequentist, and sometimes I
lean Bayesian - every time I write about probability, I get emails and comments
from tons of Bayesians tearing me to ribbons for not being sufficiently
Bayesian.

It's hard to even define probability without getting into trouble, because the
two schools of thought end up defining it quite differently.

The frequentist approach to probability basically defines probability in terms
of experiment. If you repeated an experiment an infinite number of times, and
found that out of every 1,000 trials, a given outcome occurred 350 times, then
a frequentist would say that the probability of that outcome was 35%. Based on
that, a frequentist says that for a given event, there is a true
probability associated with it: the probability that you'd get from repeated
trials. The frequentist approach is thus based on studying the "real" probability
of things - trying to determine how close a given measurement from a set of
experiments is to the real probability. So a frequentist would define probability
as the mathematics of predicting the actual likelihood of certain events occurring
based on observed patterns.
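
To make the frequentist picture concrete, here is a minimal sketch that simulates repeated trials and watches the relative frequency settle down. The 35% figure comes from the example above; the function name, the seed, and the checkpoints are illustrative assumptions, and a finite simulation can only hint at the infinite limit.

```python
import random

def running_frequency(p, checkpoints, seed=0):
    """Simulate independent trials where the outcome occurs with
    probability p, and record the relative frequency of the outcome
    after each checkpoint number of trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wanted = set(checkpoints)
    hits = 0
    results = {}
    for n in range(1, max(checkpoints) + 1):
        if rng.random() < p:
            hits += 1
        if n in wanted:
            results[n] = hits / n
    return results

# p=0.35 is the "true" probability in the frequentist sense (assumed here).
freqs = running_frequency(p=0.35, checkpoints=[10, 100, 1000, 10000, 100000])
for n, f in freqs.items():
    print(f"{n:>6} trials: relative frequency = {f:.4f}")
```

The early checkpoints can be well off from 0.35; the later ones should sit close to it, which is the sense in which a frequentist talks about measurements approaching the real probability.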

The Bayesian approach is based on incomplete knowledge. It says that you only
associate a probability with an event because there is uncertainty about it -
because you don't know all the facts. In reality, a given event either will happen
(probability=100%) or it won't happen (probability=0%). Anything else is an
approximation based on your incomplete knowledge. The Bayesian approach is
therefore based on the idea of refining predictions in the face of new knowledge.
A Bayesian would define probability as a mathematical system of measuring the
completeness of knowledge used to make predictions. So to a Bayesian, strictly speaking, it's incorrect to say "I predict that there's a 30% chance of P"; the correct statement is "Based on the current state of my knowledge, I am 30% certain that P will occur."
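
As a rough illustration of "refining predictions in the face of new knowledge", here is a small sketch of one standard way to do it: a Beta distribution over the chance of P, updated one observation at a time. The uniform starting belief and the list of observations are made up for the example; nothing in the post prescribes this particular model.

```python
from fractions import Fraction

def update(alpha, beta, occurred):
    """Update a Beta(alpha, beta) belief about the chance of P
    after one observation; occurred=True means P happened."""
    return (alpha + 1, beta) if occurred else (alpha, beta + 1)

def mean(alpha, beta):
    """Current degree of certainty that P will occur next time."""
    return Fraction(alpha, alpha + beta)

alpha, beta = 1, 1  # assumed starting point: uniform, i.e. "I know nothing"
observations = [False, True, False, False, True,
                False, False, True, False, False]  # hypothetical data

print("before any data:", mean(alpha, beta))
for occurred in observations:
    alpha, beta = update(alpha, beta, occurred)
    print("after", "P" if occurred else "not P", "->", mean(alpha, beta))
```

With 3 occurrences in 10 observations, the belief ends at Beta(4, 8), whose mean is 1/3. The number is a statement about the state of knowledge after these ten observations, and it keeps moving as more arrive.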

Like I said, I tend to sit in the middle. On the one hand, I think that the
Bayesian approach makes some things clearer. For example, a lot of people
frequently misunderstand how to apply statistics: they'll take a study showing
that, say, 10 out of 100 smokers will develop cancer, and assume that it means
that for a specific smoker, there's a 10% chance that they'll develop cancer.
That's not true. The study showing that 10 out of 100 people who smoke will develop cancer can be taken as a good starting point for making a prediction - but a Bayesian will be very clear on the fact that it's incomplete knowledge, and that it therefore isn't very meaningful unless you can add more information to increase the certainty.

On the other hand, Bayesian reasoning is often used by cranks. A Bayesian
argues that you can do a probabilistic analysis of almost anything, by lining
up the set of factors that influence it, and combining your knowledge of those factors in the correct way. That's been used incredibly frequently by cranks for
arguing for the existence of God, for the "fact" that aliens have visited the
earth, for the "fact" that artists have been planting secret messages in
paintings, for the "fact" that there are magic codes embedded in various holy texts, etc. I've dealt with these sorts of arguments numerous times on this blog; the link above is a typical example.

Frequentism doesn't fall victim to that problem; a frequentist only
believes probabilities make sense in the setting of a repeatable experiment. You
can't properly formulate something like a probabilistic proof of God under the
frequentist approach, because the existence of a creator of the universe isn't a
problem amenable to repeated experimental trials. But frequentism suffers
from the idea that there is an absolute probability for things - which is often ridiculous.

I'd argue that they're both right, and both wrong, each in their own settings. There are definitely settings in which the idea of a fixed probability based on a model of repeatable, controlled experiment is, quite simply, silly. And there
are settings in which the idea of a probability only measuring a state of knowledge is equally silly.


These are interpretations of statistics. Probability is a mathematical theory about measures of total measure 1 on sigma-algebras. The name of the reverend Bayes only enters on Bayes's theorem. There is a great deal of use of probability entirely separate from statistics. It provides a convenient language for existence proofs in algebraic combinatorics. It gives the underlying structure for stochastic processes, which can be given their interpretation as tools for working with PDEs, again independent of statistics.

Probably the real reason that frequentists don't write screeds is that the ones who are really serious about being frequentist consider that Fisher already wrote sufficient attacks on Bayesian statistics, and the rest of us are completely willing to use Bayesian techniques anywhere we can give ourselves prior information. They're far, far easier.

"I'd argue that they're both right, and both wrong, each in their own settings. There are definitely settings in which the idea of a fixed probability based on a model of repeatable, controlled experiment is, quite simply, silly. And there are settings in which the idea of a probability only measuring a state of knowledge is equally silly."

What's a setting in which the idea of a fixed probability based on a model of repeatable, controlled experiments is silly, but where the idea of a probability only measuring a state of knowledge is not silly?

What's a setting where the idea of a probability only measuring a state of knowledge is silly, but in which the idea of a fixed probability based on a model of repeatable, controlled experiments is not silly?

I'm not quite sure I understand the nature of the Bayesian arguments about the "intense stupidity of frequentists", unless there are frequentists that insist that everything is experimentally measurable, which obviously isn't true.

Can you elaborate on the nature of the vitriol, or point me to some examples?

The Bayesian position, as I understand it, is very simple: the problem of reasoning under incomplete information is very important. There's one set of rules that lets you do this consistently, that being probability theory. Therefore I will gladly use probability theory to do so.

Sure, cranks can misuse probability theory. They, pretty much by definition can misuse anything. I think in most cases the validity of their use can be tested by thinking about whether there are simple experiments that can be done to refine their probabilities that they don't think of. (Or that have already been done, and that they ignore.)

Frequentism, strictly speaking, lets you ask a lot of seemingly useless questions, such as "what percent of samples will be within 10% of the true value of this parameter", but not "what is the chance that the true value of this parameter will be within 10% of the mean of the samples I've already collected", because, strictly speaking the parameter isn't a random variable.
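
The first kind of question, the one frequentism does let you ask, can be answered by brute force when the true value is known. A minimal sketch, assuming a normally distributed measurement; the values of `mu`, `sigma`, `n` and the seed are invented for illustration:

```python
import random

def fraction_within(mu, sigma, n, tolerance, trials, seed=0):
    """Draw `trials` samples of size n from Normal(mu, sigma) and return
    the fraction whose sample mean lies within `tolerance` (as a
    proportion of mu) of the true value mu."""
    rng = random.Random(seed)
    close = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(sample_mean - mu) <= tolerance * abs(mu):
            close += 1
    return close / trials

# The true parameter is fixed and known to the simulation, not to the
# experimenter; only the data vary from sample to sample.
frac = fraction_within(mu=10.0, sigma=2.0, n=25, tolerance=0.10, trials=5000)
print(f"fraction of samples within 10% of the true value: {frac:.3f}")
```

Note what is random here: the samples. The parameter `mu` never changes, which is exactly why the second question in the comment has no frequentist answer.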

Actually, the two systems are not mutually exclusive, unless you wish to be dogmatic. Bayesian statistics analyzes probability distributions of current unobservable parameters, and frequentist statistics analyzes the probability distributions of future or hypothetical data (or of functions of future or hypothetical data).

In fact, in some situations the two methods coexist quite nicely. For example, in a few adaptive clinical trials Bayesian statistics is used to determine when enough evidence has been accumulated to make a determination, and frequentist statistics is used to make the determination.

Add to this the fact that for most common models, Bayesian and frequentist models give the same answer. This is true when a flat prior is used, and the frequentist method is based on maximum likelihood.
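
For one of the simplest models, estimating a proportion, the agreement can be checked directly. This is a sketch under the assumption of a binomial model with a Beta prior, where "flat" means Beta(1, 1) and the Bayesian answer is taken to be the posterior mode; the counts are hypothetical.

```python
from fractions import Fraction

def mle(successes, trials):
    """Frequentist maximum-likelihood estimate of a proportion."""
    return Fraction(successes, trials)

def posterior_mode(successes, trials, alpha=1, beta=1):
    """Mode of the Beta posterior obtained from a Beta(alpha, beta) prior.
    Valid when both posterior parameters exceed 1."""
    a = alpha + successes
    b = beta + (trials - successes)
    return Fraction(a - 1, a + b - 2)

k, n = 35, 100  # hypothetical data: 35 successes in 100 trials
print("maximum likelihood:        ", mle(k, n))
print("flat prior, posterior mode:", posterior_mode(k, n))
print("Beta(5, 5) prior, mode:    ", posterior_mode(k, n, alpha=5, beta=5))
```

With the flat prior the two estimates coincide exactly; with an informative prior such as Beta(5, 5) they no longer do, which is where the two approaches start to part ways.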

I would say that among mainstream statisticians the controversy is essentially over. I'm pretty skeptical of attempts to refute an argument on the basis of what kind of statistics they use without a detailed discussion of why a particular model is inappropriate. You know, the kinds of things often discussed on this blog.

> Probably the real reason that frequentists don't write screeds is that the ones who are really serious about being frequentist consider that Fisher already wrote sufficient attacks on Bayesian statistics, and the rest of us are completely willing to use Bayesian techniques anywhere we can give ourselves prior information. They're far, far easier.

Spoken like a true pure mathematician! :) Bayesian methods are certainly far more elegant theoretically, but generally far, far more difficult to apply in practice, due to computational issues and the necessity of identifying an appropriate prior (a fundamentally non-mathematical problem, and often intractable). Also, scientists find (or think they find) p-values easier to understand than posterior distributions. This is why the medical literature (for instance) is full of frequentism.

As for the perceived one-way nature of the Bayesian / Frequentist dispute, I think you're right to bring Fisher into the discussion, because I think this is a generational issue. In fact, when Bayesians were a tiny minority in academic departments (the 50s - 70s), it was the die-hard Frequentists who more regularly spewed vitriol at their subjectivist colleagues.

But now almost everyone will use Bayesian methods in at least some circumstances, and so there's only a tiny minority of true Frequentists who get the brunt of abuse from the majority Bayesians. And there's a reason for the change in dominance (and for the abuse heaped on Frequentist true believers): Frequentism is philosophical garbage, and has been fairly thoroughly discredited. Even Mark's justification for not being a complete Bayesian is really just an appeal to emotion, not a scientific or philosophical case. ("UFOlogists use Bayesian methods, so I'm part Frequentist"? Pretty weak.)

Bayesians see the few-and-far-between Frequentists (and I mean True capital-"F" Frequentists, who accept no Bayesian interpretations of probability) essentially as a modern chemist would view someone using phlogiston to balance their chemical equations. What do you do with someone like that besides ridicule them?

None of which is to say that frequentist methods are totally without use. The principles are philosophically shaky, but the methods are often quite reasonable in a common-sense kind of way. Indeed, frequentist methods can very often be derived as special cases of Bayesian methods.

My stats professor is intensely frequentist. He bashes Bayesians all the time, mostly in terms of hypothesis testing.

I tend to lean Bayesian myself - God does not play dice. =P

It almost seems like what little I know of the dispute in quantum mechanics between the Schrödinger wave equation and Dirac's matrix formulation. Or something like that...

I too use statistics and probability a lot in my work and I think there is a place for both systems depending on the particular situation. A very good example illustrating the value of Bayesian analysis was in a spoof paper analyzing the effect of ex post facto prayer on length of hospital stays. The paper clearly showed that patients that were prayed for after they left the hospital had statistically significantly shorter stays in the hospital than those who were not prayed for. The purpose of the paper was to illustrate the importance of choosing a reasonable alternative hypothesis for your experiment. If you use Bayesian statistics to analyze the data you would have to assign a very small a priori probability that prayer works. This will reduce the statistical significance of the result tremendously; to the point of insignificance.
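
The effect of a tiny prior is easy to see in the odds form of Bayes' theorem: posterior odds = prior odds x Bayes factor. The sketch below uses invented numbers (a prior of one in a million and a Bayes factor of 20); neither comes from the spoof paper.

```python
def posterior_probability(prior, bayes_factor):
    """Combine a prior probability for a hypothesis with a Bayes factor
    (how much more likely the data are under the hypothesis than under
    its alternative) and return the posterior probability."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)

evidence = 20.0  # hypothetical: data 20 times more likely if the effect is real

print("open-minded prior (0.5): ", posterior_probability(0.5, evidence))
print("skeptical prior (1e-6):  ", posterior_probability(1e-6, evidence))
```

The same evidence that would make an open-minded observer about 95% sure leaves the skeptic at roughly 2 in 100,000, so a "significant" result for retroactive prayer barely moves the needle.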

Back in my undergraduate days, I remember my probability professor emphasizing the difference between probability and statistics and to be very careful about interpreting statistics as probabilities. Like your smoking example, my "favorite" is divorce statistics. That 50% of marriages end in divorce is not the same as telling a newlywed couple that they have a 50% chance of divorcing.

I think the issue with Frequentists by Bayesians is that simply counting up statistics and extending that to a prediction *tends* to lead people to have the mind-set that past==future, which for hard science phenomena, that also have nicely behaved Gaussian distributions, is reasonable, but that there are enough cases, even in the hard sciences, where such an extrapolation is a dangerous default assumption.

An extreme example (which will also trip up a naive use of Bayesian probabilities) is the Thanksgiving turkey who has 364 days of a good life and, if he uses retrospective statistics per a Frequentist methodology, will get the probabilities seriously wrong about how good life will be on day 365 (Thanksgiving).

Starting from a presumption of uncertainty seems like a more likely way to catch such cases than starting from the presumption that past events indicate future probabilities. Strictly the turkey case *should* be answered as "the turkey doesn't have enough knowledge/information to even be making a prediction for day 365" because the turkey would need to somehow know what Thanksgiving is and what the role of turkeys is in Thanksgiving *or* he would need additional empirical information about previous turkeys' life statistics. All of this sounds more Bayesian than Frequentist.

I guess underlying this is the tacit assumption of Gaussian statistics that is often assumed in the thinking of people who *apply* the Frequentist methods. Presuming a Gaussian distribution *isn't a necessary condition of being a Frequentist* but its prevalence may contribute to some of the Bayesians' ire. Like the author, I'm a bit more middle of the road.

Even more I see any "extremist" position as being more inherently prone to being wrong as such tend to oversimplify explanatory models and exclude new information when their model is wrong.

If you employ "diffuse" or "non-informative" prior distributions, then Bayesian statistics gives identical results to frequentist statistics.

It's also possible to be an "Empirical Bayesian", i.e. you can use Bayesian statistics for appropriate types of problems without acquiring the messianic fervour of some.

But Mark's experience has also been mine... many Bayesians are dark, bitter and twisted, indulging their victimhood at every opportunity.

However, Bayesianism was hampered for many years by the fact that incorporating the prior knowledge often led to intractable integrals that could only be solved through numerical approximations, or through the use of the limited number of prior distribution types that led to tractable results.

In the last ten to fifteen years, use of Markov Chain Monte Carlo algorithms has changed this appreciably. I would say that the use of Bayesian statistics is growing. Maybe that's also why Bayesians are getting to be more at ease with the statistics world:))

BradC:

Just look at the unfolding comment threads. For instance,
the comment by "js", who asserts that "frequentism is
philosophical garbage", and takes my attempt at a reasoned
explanation of both schools, and turns it into "but bad people use bayesian".

In fact, my criticism of Bayesianism - as I said in the original post - is that not every problem is naturally described in terms of incomplete states of knowledge. To
give a trivial example, I think that the example I used in my original stats post - semiconductor manufacturing - is a good example. The basic idea of statistical analysis of
production runs of semiconductors is that the manufacturing process is strictly controlled, and so things like the failure properties will be consistent from run to run. And the actual observations fit that: the failure rates and failure conditions are nearly identical between different
runs. You know, within the margin of error, what percentage of chips produced by a run will work correctly.

You can restate that in Bayesian terms - but I find the Bayesian statement of it has quite a bit less intuitive
value for understanding the key points. (I won't restate
it in Bayesian terms myself, because I've learned that if I don't find the most perfect possible statement, I'll be flamed from here to the moon and back for slandering the clear and obvious perfection of the Bayesian statement. Bayesians, feel free to produce your own version of the statement to prove me wrong.)

On the other hand, Bayesian reasoning is often used by cranks.

I have to say, that's the worst argument against Bayesianism I've seen. Cranks use maths as well, so are you going to be consistent and eschew algebra?

Actually, the two systems are not mutually exclusive, unless you wish to be dogmatic. Bayesian statistics analyzes probability distributions of current unobservable parameters, and frequentist statistics analyzes the probability distributions of future or hypothetical data (or of functions of future or hypothetical data).

Once you actually try doing stats, you'll find that they are mutually exclusive. Frequentists condition on their (unknown) parameters, because they are constants and only the data vary. Bayesians condition on the data (because that is what is known), and let the parameters vary.

If you employ "diffuse" or "non-informative" prior distributions, then Bayesian statistics gives identical results to frequentist statistics.

Only for flat models, and for suitable flat distributions. Once you add in hierarchical structures, you can get different results because of the way random effects are treated. It's at this point that the frequentist approach becomes an absolute mess: you marginalise over some parameters, but condition on estimates of others, and you're not allowed to estimate random effect parameters, you have to predict them.

I think things have calmed down a lot in statistics now. Bayesian methods have shown themselves to be useful, so they tend to be accepted as part of the toolbox. As with a lot of disputes, I think the dogmatists are dying out.

I don't know enough about probability or statistics to enter the flame war between interpretations, but I do have a suggestion about the vitriol: everyone needs a hobbyhorse.

My impression is that for the majority of everyday work, Bayesian and Frequentist interpretations get essentially the same results and even go through most of the same steps and calculations. So you're free to do the "boring stuff" (read: useful) with whichever interpretation you feel like. It's only the interesting unusual cases, or the philosophy that differs, which is probably more entertaining to disagree about than to actually do.

Clearly I've been missing something. I've been under the impression that the very term probability is defined in 'Frequentist terms' as the limit of the relative frequency as the number of experiments approaches infinity. Certainly that is a perfectly reasonable definition. The 'Bayesian' definition of probability is obviously quite different. Is this really a quibble over semantics or is there something that I'm not getting? I really don't see any fundamental incompatibility with the ideas themselves, even though I tend to prefer the 'Frequentist' definition.

Isn't the problem just semantics? Surely frequentists don't think a coin flip is fundamentally 50/50, in the same way that a quantum measurement might be. If the exact parameters of coin position, flicking force, finger position, air density, etc. were known, you could in theory predict better than 50/50. The frequentist position, it seems to me, says that repeated trials *with the current level of uncertainty* would achieve the same result x% of the time, not that repeated trials with perfectly identical initial conditions would return the same result x% of the time.

Am I mistaken?

Just look at the unfolding comment threads. For instance,
the comment by "js", who asserts that "frequentism is
philosophical garbage", and takes my attempt at a reasoned
explanation of both schools, and turns it into "but bad people use bayesian".

Mark, this is what you provided as the (quantitative) bulk of your argument for Frequentism:

On the other hand, Bayesian reasoning is often used by cranks. A Bayesian argues that you can do a probabilistic analysis of almost anything, by lining up the set of factors that influence it, and combining your knowledge of those factors in the correct way. That's been used incredibly frequently by cranks for arguing for the existence of God, for the "fact" that aliens have visited the earth, for the "fact" that artists have been planting secret messages in paintings, for the "fact" that there are magic codes embedded in various holy texts, etc. I've dealt with these sorts of arguments numerous times on this blog; the link above is a typical example.

Now who's playing the victim? :P

Isn't the problem just semantics? Surely frequentists don't think a coin flip is fundamentally 50/50, in the same way that a quantum measurement might be. If the exact parameters of coin position, flicking force, finger position, air density, etc. were known, you could in theory predict better than 50/50. The frequentist position, it seems to me, says that repeated trials *with the current level of uncertainty* would achieve the same result x% of the time, not that repeated trials with perfectly identical initial conditions would return the same result x% of the time.

Am I mistaken?

You're only mistaken if you think that Frequentism provides a satisfactory answer to this question. This is exactly one of several reasons why I referred to Frequentism as philosophical garbage (yes, I'll defend that claim). No experiment is truly replicable, much less infinitely replicable, so how does a Frequentist decide when a trial constitutes a replication?

If you press on this point, the answer usually comes down to some version of "gut feelings." The terrible irony is that die-hard Frequentists (and let me again stress that I'm really only talking about a very few people left in academia) condemn Bayesians for subjectivity!

Really, why define probability in terms of something that doesn't actually exist (identical replications) in the first place? Mark's point is correct, that it's sometimes more and sometimes less natural to talk about probabilities as beliefs, but at least it always makes sense. (Even if you want to have a Frequentist interpretation of semiconductor assembly, you can always couch the interpretation as describing how the replications affect your degree of belief.) The Frequentist interpretation doesn't make sense in any real-world circumstances, if you dig closely enough.

js:

Okay, you've got a point. Looking back, I did do a lousy job in the post. What I was trying to say is that the Bayesian approach often gives people an artificial sense that they can produce a probability estimate of anything by assembling a set of priors. After all - it's all about
measurement of uncertainty - so you can always line up sets of factors that express certainty and uncertainty, and combine them.

The problem there is that that's often an extremely deceptive process, where you can arrange the priors to produce whatever result you want. This results in some very strange Bayesian "proofs" - not just the flakes and crackpots; they're the most extreme example of this phenomenon, but I've also seen it used in all sorts of
incorrect ways - ranging from the insane crackpottery to
things as mundane as questionable interpretations of genetic tests or x-rays. (For a trivial mundane example, my wife is Chinese. But our ob/gyn insisted on doing a Tay-Sachs blood test on her, despite the fact that the test doesn't really work on people of Asian descent; it looks for markers that are specific to Europeans. The testing company insisted on having me tested, because based on their mathematical analysis, there was a risk. Basically, since the test result for her was "Unknown", they treated that as the default prior: 50/50, despite the fact that Tay-Sachs is unknown in Han Chinese.)

The frequentist approach in those cases would, basically, be that you can't make a reasonable prediction in a non-reproducible setting - they wouldn't just assign a 50/50
default prior. (Not that a non-clueless Bayesian would do something this stupid - but the point is, the general perception that you should always be able to line up your priors to come up with a reasonable assessment of your state of knowledge can be quite problematic.)

The problem there is that that's often an extremely deceptive process, where you can arrange the priors to produce whatever result you want.

True. But the priors are specified, so you can explicitly see the assumptions that are made.

I was recently asked to look at a Bayes net model for predicting whether polar bears would decline. One criticism that had been made was that all of the priors came from one expert, and another expert might come to a different conclusion. This is a valid criticism (and was acknowledged in the report), but the process of creating the model and the priors had the effect of making the assumptions explicit and open to criticism and modification. I think this is a good thing - if people mis-use the tools, the mis-use is explicit. Isn't this a good thing?

Any system can be abused - the Prosecutor's Fallacy is a classic abuse of frequentism (and one where the Bayesian White Knight charges over the hill to the rescue!). I wouldn't use that to argue that frequentists are wrong, only that the method can be abused.

Hell, p-values are a frequentist evil, but even then that's because people abuse the methodology rather than thinking through what it means.

Gosh, has Uncommon Descent, Egnor et al. soured your opinion of the value of information theory too? It might be worth doing some digging into why Bayesians and frequentists think the way they do -- you could make an excellent post on Cox's theorem for instance.

Philosophically, I find it interesting that when you read the falsificationists' reasons for rejecting probabilist epistemology as unworkable in the middle of the 20th century, you find that they are based on a very frequentist understanding of probability. Popper even went so far as to propose "propensity," i.e. an absolute probability, metaphysically attached to events, in order to make sense of probability theory. Properly you would call this 'propensity theory' as a third class apart from frequentism and bayesianism -- Fisher certainly did not postulate an absolute probability for events.

Bob O'H:

I wasn't arguing that Bayesianism is wrong - just that one of its potential weaknesses is a tendency to see everything as amenable to a Bayesian approach, which can lead to some bizarre results.

Mark, this is great. But I'm not sure I'd be able to tell the difference between the two approaches if I saw them. Could you maybe give a few examples, and show the math too? I'm rather curious how the two approaches would see, say, the Monty Hall problem.

Monty Hall problem.

I don't think Bayesian applies as it is a contrived problem with known conditions and behaviors (as normally stated).

However, suppose that instead of stating the entire problem, you are only given the statistics of the results of players keeping their original choice or switching for just a few players, with no other knowledge of Monty's strategy. Then Bayes would be appropriate to include the question, "what is the likelihood that Monty opens a random door versus always showing a goat?" As you collect more plays, you would start moving the probability of "always a goat" towards 1. Whereas the normal problem statement tells you that Monty always reveals a goat before offering you the choice to switch.
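
Here is a rough sketch of that update, under some explicit assumptions that go beyond the comment: the data are wins and losses of players who switched, only games where the opened door showed a goat are counted, switching wins 2/3 of the time if Monty always shows a goat and 1/2 of the time if he opens a random door, and the starting belief in "always a goat" is 50%. The counts are hypothetical.

```python
def posterior_always_goat(switch_wins, switch_losses, prior=0.5):
    """Posterior probability that Monty always reveals a goat, given how
    often players who switched won or lost (assumed model, see lead-in)."""
    plays = switch_wins + switch_losses
    # Likelihood of the record under each hypothesis about Monty's strategy.
    like_always = (2.0 / 3.0) ** switch_wins * (1.0 / 3.0) ** switch_losses
    like_random = 0.5 ** plays
    numerator = prior * like_always
    return numerator / (numerator + (1.0 - prior) * like_random)

# The win rate is held at 2/3 while the amount of data grows.
for wins, losses in [(0, 0), (2, 1), (20, 10), (200, 100)]:
    p = posterior_always_goat(wins, losses)
    print(f"{wins:>3} wins, {losses:>3} losses -> P(always a goat) = {p:.4f}")
```

With no data the belief stays at the prior; with a handful of plays it barely moves; with hundreds of plays at a 2/3 win rate it is pushed close to 1, which matches the comment's description.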

Although Bayesian statistics has its uses, the user (and the journal article reader) must take care to ensure that the hypothesis is reasonable.

In the 1950s the Journal of Irreproducible Results published an article by Henry R. Lewis on the Data Enrichment Method, which is an example of misapplied Bayesian statistics. The argument is that if you are doing an experiment in which the independent and dependent variables are known a priori to be correlated--for example, radiation dose and semiconductor failure, although the article uses a different example--you can create what appear to be additional cases by counting, e.g., the chips that failed at a given dose to have failed at all higher doses, and the chips that survived a given dose to have survived all lower doses. The article gives two examples (I don't remember what the first one was; my copy of the article is at home and the article in question isn't turning up in the first page of my Google searches). The first one was something which sounded perfectly reasonable, like Mark's failure rate versus radiation dose example. The second investigated the hypothesis that the probability of obtaining "heads" in a coin flip is positively correlated with altitude, and the article uses the method to "prove" this hypothesis. The logical flaw, of course, is that using the same technique, the data can be shown to support the opposite hypothesis equally well.

Again, I'm not saying that Bayesian statistics are useless, just that you need to have a good BS detector when you encounter an argument from Bayesian statistics. Which I think is part of Mark's point: the reason cranks tend to use Bayesian arguments is that it is much easier to construct a plausible-sounding-but-wrong Bayesian argument than a plausible-sounding-but-wrong frequentist argument. As long as the assumptions are reasonable (and this is easier to check with frequentist arguments), a plausible sounding frequentist argument is overwhelmingly likely to be right.

I'm confused about this part of #8:

"That 50% of marriages end in divorce is not the same as telling a newlywed couple that they have a 50% chance of divorcing."

If you had no further information, what odds would you give the newlywed couple?

An unrelated point:

The interpretation of the wavefunction in quantum mechanics is defined precisely in frequentist terms: |psi(x)|^2 is probability of measuring the particle at x in the sense of making many measurements on identical systems. It turns out that for a surprisingly large number of systems it's fairly easy to prepare identical systems. The frequentist definition is completely reasonable for QM.

js:

"Also, scientists find (or think they find) p-values easier to understand than posterior distributions."

Regarding your parenthesized qualifier, I think a lot of people interpret a p-value to be something actually akin to a Bayesian posterior. (e.g., they think p<0.05 means "There is less than a 5% chance the null hypothesis is true.") If they knew the actual definition, I wonder if they'd find it so easy to understand ...

E. Lund, I think it's quite the opposite. Frequentist techniques have priors, but they're implicit and hidden, so it's easier to get them wrong or manipulate them, whereas Bayesian priors are explicit and visible for anyone to see. I see manipulation through frequentist techniques _all the time_. Pharmaceutical companies run with p-values like it's the Olympic torch, and the same often goes for researchers who just want one more publication to put on their CV. For a lot of problems, flat priors are obvious in either school of thought, but take, for example, the repeated experiment of measuring the size of a cube. You can't have the length, the area and the volume of the box all take a uniform prior because they differ by an exponent; only one of these scales can be uniform, and you get different bounds for your confidence interval depending on which one you pick. Bayesians have found another prior for these types of problems, the Jeffreys prior, which reconciles the knowledge and gives the same estimate of probable dimensions whether you're using a measuring tape or a scale to find the size of your box. Sometimes the priors are hard to find. You have to look for assumptions of symmetry in the problem and use things like group theory. Frequentists use ridiculous tools like null hypothesis testing which are designed to reject nothing, literally nothing, nada, "null". A null hypothesis test that rejects a value of 0 does not reject the neighboring values on either side, +-0.000001. You'd have to do an infinite number of null hypothesis tests for them to be useful. You can easily manipulate the results by simply using a big N in your sample. You will be able to reject any null hypothesis because of inevitable small biases in your experiments and measuring tools. I'm flabbergasted that people still trust p-values, when they are clearly mathematical nonsense. Some frequentists even go as far as saying your samples shouldn't be too big. Who decides how big of an N is too big?
Also whoever thinks that less data results in better knowledge is full of it.

Ambitwitsor:

Regarding your parenthesized qualifier, I think a lot of people interpret a p-value to be something actually akin to a Bayesian posterior. (e.g., they think p

Looks like you got cut off, but yes, your point is exactly what I meant by my parenthetical remark. Many scientists (I would venture "most", but I don't have any actual surveys at hand - although I know I've seen one somewhere!) erroneously believe or act as though they believe that the p-value is the probability of the hypothesis given the data which, of course, it isn't.

Jacob Cohen's classic article, The Earth is Round (p < .05) engagingly discusses this and several other common misinterpretations of p-values.

As a practical matter, Bayesians are Bayesians only for the initial iteration, and once they start getting data they become frequentists. Bayes' method, after all, is useful in the case where initially there is no data and one must make reasoned guesses. As the work proceeds, one gets the missing data.

> cut off

This blog software interprets any attempt to type something with the "less than" (left angle bracket) as the beginning of some HTML code and supplies what it thinks you intended to be doing. It's that damned helpful paperclip again ...

View Source and search for your phrase and you'll see what it did for you there.

Common problem. Dunno how to get around it.

Frequentists use ridiculous tools like null hypothesis testing which are designed to reject nothing, literally nothing, nada, "null".

The above quote demonstrates that its author knows nothing about the type of statistical inference he criticizes at great length.

Let's suppose that you are trying to prove that two sample populations are drawn from distinct underlying populations. In this case you have a well-formulated null hypothesis, namely that the sample populations are derived from the same underlying population. Note that there is exactly one null hypothesis in this scenario. If you can show that the probability of this is sufficiently small, you have disproven the null hypothesis. Of course, you cannot prove that the null hypothesis is true, only that it is consistent with your data.

The concept can certainly be misused, but that does not mean that the concept is bogus.

SteveM:
Maybe I'm misunderstanding how Bayesian works, but I only really understood the Monty Hall problem from a Bayesian point of view. The Monty Hall problem is confusing, because the way people normally think about probability, it appears that the "true" probability of picking the car changes depending on whether you switch doors or not. But when you realize that since Monty Hall always opens a goat door, he is introducing new information into the problem, then it makes sense. Similarly, if someone not watching the show came in and picked a door after the goat door had been opened, he would have a 50% chance of picking the car, instead of the original contestant's 2/3 chance when switching.
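
For anyone who would rather check the Monty Hall numbers than argue about them, here is a minimal simulation sketch. It assumes the standard rules (three doors, Monty always opens a goat door other than the contestant's pick); the function names are made up for illustration.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty always opens a goat door that is not the contestant's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def late_arrival(rng):
    """A newcomer who picks at random between the two doors still closed."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    opened = rng.choice([d for d in doors if d != pick and d != car])
    return rng.choice([d for d in doors if d != opened]) == car

def win_rate(game, trials=50_000, seed=0):
    """Fraction of simulated rounds in which `game` wins the car."""
    rng = random.Random(seed)
    return sum(game(rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    print("stay:  ", win_rate(lambda r: play(False, r)))
    print("switch:", win_rate(lambda r: play(True, r)))
    print("late:  ", win_rate(late_arrival))
```

Staying should come out near 1/3, switching near 2/3, and the newcomer who knows nothing about the original pick near 1/2.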

Last year at the ASA meeting, 12 people were arrested.

They were caught adjusting their priors :-)

Speak for yourself on your knowledge of inference, Eric. Regardless of the validity of Neyman-Pearson inference it is completely incoherent to claim that a hypothesis test will "show the probability of the null hypothesis to be small." That's quite simply not what a hypothesis test does. Hypothesis tests do not measure P(H). They compare P(X|H0) to P(X|~H0). The null hypothesis is on the right of the conditioning sign. The probabilities you compute in hypothesis testing are those of your observations, not of your hypotheses.

The way you can get from P(X|H0) to P(H|X) is, well, Bayes' theorem, and so you have to introduce a prior. Except that when you look at what kind of prior you have to introduce -- say you take a measure of effect size Z -- well, Z is a continuous variable and you would have to adopt a continuous prior, so P(Z=0) is in general 0. This is why it makes no sense to talk about measuring the probability of the null hypothesis, and this is why Popper adopted falsificationism and not probabilism.

Which is not to say that hypothesis tests are no good. For instance, the P-value of a t-test can be quite simply interpreted as the probability that the effect size we measured has the correct sign.

Ok Eric, but if you have even a tiny bias between the two groups anywhere in your experiment or in your measuring instruments, you will always, given a large enough N, be able to reject your null. What's the point of doing the test? I stand by my assertion that null hypothesis tests, even if interpreted correctly, are useless. The only time they are just barely useful is when they turn out to show non-significance and don't allow you to reject the null. Then at least you know, for sure, that your experiment doesn't have enough data to tell you anything. But if you do find significance, it only reveals that there was some small effect greater than delta-y where delta-y tends toward zero (since the width of the null hypothesis is null), and this possibly tiny difference, given your analysis, has as much chance of being caused by the weather being different the day you studied the second group as by anything else. You can't do any inference about properties of your population. Only with a Bayesian estimate, with a confidence interval of the amplitude of the difference between the groups, can you find that the difference is probably great enough that it could not be explained by the margin of error of your instruments or experimental environment. Only then can you start thinking: "hey, these populations are probably not the same". Frequentist methods don't allow you to do that and are thus, I reiterate: mostly useless.

Jon L; I may be looking at it weird, but to me Monty introduces no new information at all. Even before he reveals a goat, we know that there is at least one goat that he can show. And while it looks like Monty is eliminating a choice by opening a door, what he is really offering is A (our original choice) or "not A".

jbh: okay, maybe I'm just being sensitive, but I think there is a difference between asking me what I think some random couple's chances of getting divorced are, versus telling a particular couple what their chances are. Tell them what the statistics are but not in terms of probability. That is, I agree with you, in the absence of any other information, I can treat divorce rates as a random process. But to a particular couple, I don't believe it is a random process; whether they divorce or not depends on their actions and personalities, a whole host of things that are known, if only to them. What I'm saying is that divorce is not a random process. It only becomes probability in terms of sampling couples from a population; then you can say there is a 50% chance of finding a divorce.

About 20 years ago I did a PhD that looked at the issues between frequentist and Bayesian principles (in an unsuccessful attempt to better the Bayesian "informationless prior" methods). I was a Bayesian in a faculty of frequentists. Some observations:
* frequentists never scathingly attacked me because instead many "knew" they were right, and were in a majority, and so didn't need to get worked up
* the two primary reasons as I see it why each side of the debate survives are: (1) the frequentist theoretical framework is ultimately at odds with actual experiments (2) it doesn't matter, because a good practitioner of frequentist techniques can avoid silly results

Example of (1), a random sample of two observations happens to give two numbers that are identical. The estimate of standard error is zero. Common sense tells you that this is an underestimate so its unbiasedness property is meaningless whenever you actually get a sample.

To put what celeriac and I said more succinctly: Null hypothesis tests cannot deal with cases where there are even tiny biases in the experimental setup or measuring instruments. They don't tell you anything about the magnitude of measured effects, so it is impossible to determine when you are within the normal experimental difference margin or not. And since in _this universe_ pretty much all experiments have at least some variables not 100% controlled, initial conditions that are not exactly the same, measuring devices that wear out with time, temperature, humidity changes, or even changes in the mood of the experimenters: null hypothesis tests are useless. Of course if you were in another universe where you could rewind time and experiment on the second group at the exact same time and place with the exact same initial conditions, with the instruments in the exact same condition and with the manipulators doing the exact same movements, things would be different.

In response to Jesse:

I tend to lean Bayesian myself - God does not play dice. =P

Well, obviously the famous Einstein quote. Only, we already know that Einstein was wrong on that one ... see EPR Paradox

I'm not very well versed in this, but here's my simplified understanding:

o Basic tenet: Learning from data is not possible without assumptions.

- Bayesians make these assumptions explicit (priors).

- Frequentists dance around elaborate concepts and provide all kinds of values calculated that only obfuscate the assumptions used. Usually results are presented as if they only used the data as input (not mentioning the assumptions).

- In practice Bayesian inference very often falls back to not very well explicitly justified priors (just assume a normal distribution) and then it sucks too.

- So as an end result, we have data that is fit to some fashionable curves and some values are then calculated and everybody is happy, even though the results might be further from the reality than with more logical assumptions. (The assumptions are not much discussed.)
Rigorous meta-studies usually find appalling quality of statistics in a vast array of research papers. Starting from randomization already.

- Still this all has to be lived with and there's a lot of good stuff done too.

- I'm not a scientist

Hank Roberts: This blog software interprets any attempt to type something with the "less than" (left angle bracket) as the beginning of some HTML code and supplies what it thinks you intended to be doing. It's that damned helpful paperclip again
...
Common problem. Dunno how to get around it.

The best bet is to use the character entity codes for less than (&lt;) and greater than (&gt;); the precise code to use can be found here. One problem is that if you preview your post, the blog software will convert the code into the less than or greater than sign, so you need to replace the code every time you preview.

And since in _this universe_ pretty much all experiments have at least some variables not 100% controlled, initial conditions that are not exactly the same, measuring devices that wear out with time, temperature, humidity changes, or even changes in the mood of the experimenters: null hypothesis tests are useless.

As an engineer, maybe I have a different definition of "useless". I find hypothesis testing to be quite useful, extremely useful actually as it is the basis of most manufacturing process control. You just have to be aware of the "noise" in the system; all those uncontrolled variables you mentioned. Just be sure to interpret the statistical results in the light of your knowledge and experience, don't just accept the p-value blindly.

since the width of the null hypothesis is null

As Steve #42 correctly points out (and as you admit in your post), the width of the null hypothesis in real (as opposed to idealized mathematical) statistical analyses is not null. The width decreases without bound as the number of samples increases, but time and funding constraints will always limit you to a finite population size. (Maximizing N subject to these constraints is one of the secrets of good experimental design in fields which depend on statistics.) As long as the systematic variation in your instruments is smaller than the statistical noise in the populations you are comparing, null hypothesis testing remains a well-posed problem. For most applications, that criterion is met. In physics, you go to some trouble to minimize, or calibrate out, any systematic variations (except for the one you are controlling) in your apparatus. If you're dealing with human subjects, remember that we're a diverse lot, and the noise level will be inherently high.

Steve "Just be sure to interpret the statistical results in the light of your knowledge and experience, don't just accept the p-value blindly." This translates to using a subjective Bayesian assessment of the situation to compensate for the inadequate null tests. If you're going to do it informally why not do it on paper?

"As Steve #42 correctly points out (and as you admit in your post), the width of the null hypothesis in real (as opposed to idealized mathematical) statistical analyses is not null."

This is blatantly incorrect, and Steve said no such thing. The width of the null hypothesis remains null. It is the margin you need to reject this zero width hypothesis that shrinks with N.

"As long as the systematic variation in your instruments is smaller than the statistical noise in the populations you are comparing, null hypothesis testing remains a well-posed problem."

That is, again, patently false. The margin you need to reject the null tends towards zero as N becomes high even if the noise is high. With a high N, even a small bias in your experiment will allow you to reject the null (a fixed small bias is greater than "tends towards 0"). When your N becomes great enough that the margin is smaller than the experimental biases the null will be rejected even if you have identical populations noisy or not and even if you had identical samples! Take for example a high noise situation of interviewing humans. You want to evaluate, say, the tendency to become anxious of two groups of people. You evaluate them by showing them a scary movie and interviewing them afterwards. You hire a bunch of students to do the interviews and write down the answers. Maybe not all the students are available at the same time, maybe good weather makes everyone in a good mood on a certain day. Even if you would be evaluating the exact same sample of people, if you had enough of them, you would be able to reject the null. IMO, this kind of flaw accounts for a majority of published scientific articles in the social sciences. Since it helps them fill their CVs a lot of researchers piss out flawed p-value based research like drunken teenagers.
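
The large-N claim is easy to try out. Below is a rough sketch, not anyone's actual study: both groups are drawn from the same standard normal population, and a fixed bias (0.02 standard deviations, a number picked purely for illustration) is added to the second group's measurements. The p-value comes from a plain large-sample z-test.

```python
import math
import random

def two_sample_p_value(xs, ys):
    """Two-sided p-value of a large-sample z-test for equal means."""
    n, m = len(xs), len(ys)
    mean_x, mean_y = sum(xs) / n, sum(ys) / m
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (m - 1)
    z = (mean_x - mean_y) / math.sqrt(var_x / n + var_y / m)
    # P(|Z| > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))

def biased_experiment(n, bias, seed=0):
    """Same population for both groups; `bias` is a hypothetical systematic
    measurement offset applied only to the second group."""
    rng = random.Random(seed)
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group_b = [rng.gauss(0.0, 1.0) + bias for _ in range(n)]
    return two_sample_p_value(group_a, group_b)

if __name__ == "__main__":
    for n in (100, 10_000, 400_000):
        print(n, biased_experiment(n, bias=0.02))
```

With a small N the bias is typically invisible; by N = 400,000 per group the expected z statistic is around 9, so the null is rejected even though the underlying populations are identical.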

I just want to second celeriac's suggestion for a post on Cox's theorem. It's essential reading if you're going to talk about the grounds for Bayesian probability.

Bob writes:

As a practical matter, Bayesians are Bayesians only for the initial iteration, and once they start getting data they become frequentists.

Well, there is a reason for that. The real difference in practice between Bayesian and frequentist approaches is the Bayesian use of priors. But if you repeat the same experiment many times, the sensitivity to your prior goes away. That's actually the nice thing about Bayesian probability theory, it has a subjective element, but as you acquire more data, it becomes more and more objective (in the sense that the subjective element becomes less and less important).
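
A small sketch of the prior washing out, under assumptions of my own choosing: a coin-like experiment with a Beta prior, one analyst using a flat Beta(1, 1) and another a strongly opinionated Beta(20, 2), and data in which 30% of trials succeed. The posterior mean of a Beta(a, b) prior after k successes in n trials is (a + k) / (a + b + n).

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta posterior after observing binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

def prior_gap(successes, trials, flat=(1, 1), opinionated=(20, 2)):
    """Absolute difference between the two analysts' posterior means.

    Both priors are illustrative choices, not taken from the discussion."""
    return abs(posterior_mean(*flat, successes, trials)
               - posterior_mean(*opinionated, successes, trials))

if __name__ == "__main__":
    for trials in (10, 1_000, 100_000):
        successes = round(0.3 * trials)  # pretend 30% of trials succeed
        print(trials, prior_gap(successes, trials))
```

The two analysts start far apart, and the gap between their estimates shrinks roughly like 1/n as data accumulates.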

Mark,

It seems to me that the examples that are not amenable to Bayesian analysis are examples that are also not amenable to a frequentist approach. I suppose you could say that it's more honest to say "There just isn't enough repeatable data to say anything at all" than to take the Bayesian approach "Well, I don't have any data to go on, but I can still use my priors to make a prediction". But unfortunately, frequentism by itself doesn't suggest a criterion for when you have enough data. That criterion is inevitably subjective or ad hoc (it seems to me).

I don't see anything wrong with using Bayesian probabilities to compute the odds of wacky things. As others have said, you can criticize these predictions by questioning the priors. If the prediction is extremely sensitive to the prior, then that's a good reason to reject the prediction or at least to take it with a grain of salt.

Only with a Bayesian estimate, with a confidence interval of the amplitude of the difference between the groups

Confidence intervals are very useful, much more so than bare null hypothesis tests. But since when are they Bayesian?

Kudos to Mark for walking the tightrope. One starts out asking simple questions, and making apparently reasonable assumptions. Yet fistfights quickly break out. And stuff just can't help but get complicated. Here's an example from today's edition of the arXiv which is unlikely to convince any ID-iots.

A New Estimator for the Number of Species in a Population
Authors: L. Cecconi, A. Gandolfi, C. C. A. Sastri
Subjects: Applications (stat.AP); Probability (math.PR); Statistics (math.ST); Quantitative Methods (q-bio.QM)
(cross-list from stat.AP)

We consider the classic problem of estimating T, the total number of species in a population, from repeated counts in a simple random sample. We look first at the Chao-Lee estimator: we initially show that such estimator can be obtained by reconciling two estimators of the unobserved probability, and then develop a sequence of improvements culminating in a Dirichlet prior Bayesian reinterpretation of the estimation problem. By means of this, we obtain simultaneous estimates of T, of the normalized interspecies variance gamma^2 and of the parameter lambda of the prior. Several simulations show that our estimation method is more flexible than several known methods we used as comparison; the only limitation, apparently shared by all other methods, seems to be that it cannot deal with the rare cases in which gamma^2 is greater than 1.

Benoit, I think I see the problem. You are apparently assuming that whatever is being measured is being measured with arbitrary precision. Heisenberg might have laughed in your face for thinking this, but I can't be certain.

Back in the real world which I inhabit, whatever I am using to measure the quantity of interest can only do so with a finite precision. For example, a typical ruler has marks every millimeter, and it is difficult to use such a ruler to make more precise length measurements--you can gain more precision by switching to a Vernier caliper or the like, but that only reduces your granularity. Anybody with a physical science background knows this, and I assume that social scientists are also aware of the problem, which is even more acute for them (and unfortunately, harder to quantify).

Here's why that matters: It means that even as N goes to infinity your statistically aggregated measurement error remains finite, because for large N the uncertainty inherent in your measurement device eventually dominates over random fluctuations. (This is a perfectly valid reason IMO for concluding that your value of N is large enough.) For example, if you use a standard meter stick to measure widget length I will never believe your claim that the average length of widgets from factory A is 23.81 mm and therefore different from the average length of widgets from factory B of 23.76 mm, no matter how large a value of N you use. A meter stick is simply incapable of distinguishing these two lengths. If you use a Vernier caliper instead, and therefore could distinguish the two lengths, I'll actually look at the details of your experiment before concluding whether or not your result is correct.

Now factor in my point above that N is always finite. Here is the other problem with your claim that the null hypothesis has null width: There is always a finite error bar in the measurement. This is why almost all of the quantities in any table of physical constants have error bars (the exceptions are things like c and mu_0 whose value is defined). To take an example from the most handy such table I have (the CRC handbook which has been on my bookshelf since my undergraduate days), the ratio of proton mass to electron mass is given as 1836.15152(70). That 70 in parentheses represents the uncertainty in the last two digits. Thus any experiment I do which produces a proton to electron mass ratio between 1836.15082 and 1836.15222 is consistent with the value in said table. That is certainly not a set of measure zero. The actual error bar may be smaller now due to additional measurements made since that table was produced, but it isn't 0, and that is at least partly because there have been only a finite number of experimental runs made to measure that number.

Frequentists do suggest the use of confidence intervals, but they don't prescribe them as Bayesians do. Bayesians say that they should be mandatory. Also, frequentists have convoluted, backwards ways of calculating them which hide the prior and the assumptions and sometimes lead to errors, although I admit most of the time the Bayesian and frequentist intervals agree, and when they disagree the difference is usually minimal.

*sigh* Eric, as I already explained, it is the null hypothesis tests that assume infinite precision and perfect experiments and become nonsensical in real world situations. Confidence intervals can take these uncertainties into account. Null tests cannot.

Eric Lund,

For example, if you use a standard meter stick to measure widget length I will never believe your claim that the average length of widgets from factory A is 23.81 mm and therefore different from the average length of widgets from factory B of 23.76 mm, no matter how large a value of N you use.

If you happen to have a computer handy, you can simulate this experiment. Assuming means as specified, manufacturing errors Gaussian with standard deviation 100 mm, and measurements rounded to the nearest millimeter, you'll find that you need about 6,200 measurements from each factory to tell that the factories have different means.

If you really believe that there's no drift in the manufacturing error and that the measurements are reliable up to rounding, there's no reason to doubt the claim.

Sorry, that should be "manufacturing errors Gaussian with standard deviation 1 mm".
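
Here is one way the simulation could be set up, as a sketch. The means (23.81 mm and 23.76 mm), the 1 mm standard deviation, the rounding to the nearest millimeter and the 6,200 measurements per factory come from the comment; reading "tell that the factories have different means" as a two-sided z-test at the 5% level is my assumption.

```python
import math
import random

def measure(mean, sd, n, rng):
    """Draw n widget lengths and read each off to the nearest millimeter."""
    return [round(rng.gauss(mean, sd)) for _ in range(n)]

def z_statistic(xs, ys):
    """Large-sample z statistic for the difference in means."""
    n, m = len(xs), len(ys)
    mean_x, mean_y = sum(xs) / n, sum(ys) / m
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (m - 1)
    return (mean_x - mean_y) / math.sqrt(var_x / n + var_y / m)

def detection_rate(n=6200, runs=100, seed=0):
    """Fraction of simulated experiments with |z| > 1.96 (two-sided 5% level)."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(runs):
        factory_a = measure(23.81, 1.0, n, rng)
        factory_b = measure(23.76, 1.0, n, rng)
        detected += abs(z_statistic(factory_a, factory_b)) > 1.96
    return detected / runs
```

Calling `detection_rate()` estimates how often an experiment of that size picks up the 0.05 mm difference. A back-of-envelope normal approximation puts it at roughly three quarters of the time, even though every single reading is a whole number of millimeters.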

I happen to be a Bayesian, but this is almost straightforward t-testing stuff, isn't it? Even if the data are rounded, you can use the central limit theorem to justify a Gaussian approximation to the sampling distribution of the mean.

It's interesting to compare Bayesian probability with another approach to adjusting one's beliefs based on empirical data, namely Popperian falsificationism. The Popper view of scientific progress is (greatly oversimplifying) we make observations, we develop theories to explain those observations, we use the theories to make new predictions, we make more observations to test those predictions, and then we throw out the theories that made incorrect predictions.

In practice, though, there are difficulties. In particular, no single experiment definitively falsifies a theory, because, at least with modern theories, the predictions are probabilistic. We can certainly give confidence intervals and say that if the difference between the predicted probabilities and the measured frequencies lie outside the confidence interval, then the theory is falsified.

A purely Bayesian approach, in contrast, would never falsify any theory. One could imagine (this isn't of course feasible to actually do) writing down every conceivable theory that might explain the way the world works, and then giving each a prior probability based on intuitive plausibility, or whatever. Then, in light of new data, you adjust the probabilities. With this approach, the probabilities of some theories would slowly drift down towards zero, and the probabilities of other theories would rise. But no single experiment would ever be definitive in falsifying anything; each experiment would just result in a refinement of the probabilities associated with each of many competing theories.

When making a probabilistic prediction in this approach, one would compute a weighted average over all theories. The best tested theory would presumably make the greatest contribution, but the other theories would make a slight contribution, as well.

One could imagine (this isn't of course feasible to actually do) writing down every conceivable theory that might explain the way the world works, and then giving each a prior probability based on intuitive plausibility, or whatever.

I believe that the appropriate prior has actually already been described in algorithmic complexity theory. It's related to the Kolmogorov complexity of each theory (coded as a program relative to some universal Turing machine), and is therefore, sadly, definable but not computable.

As I understand it, Bayes' theorem is a way of modifying a prior probability P in the light of evidence. By hand-waving the initial value of P, you can make your conclusion seem reasonable: "either there is or is not a god, so our initial probability P is 50%". If you start with a probability of 50%, then the fact that most people think that trees look designed is indeed pretty compelling evidence that they are.

------

What's a setting in which the idea of a fixed probability based on a model of repeatable, controlled experiments is silly, but where the idea of a probability only measuring a state of knowledge is not silly?

A man has two children. What's the chance that he has two boys? If he tells you "my first child is a boy", what's the chance now?

Now ... how are you going to perform repeated experiments on this man?
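
The two answers can be worked out by brute-force enumeration. This is just a sketch, and it assumes boys and girls are equally likely and that the two births are independent, which the question leaves unstated.

```python
from fractions import Fraction
from itertools import product

# All equally likely families as (first child, second child).
families = list(product("BG", repeat=2))

def prob(event, given=lambda family: True):
    """P(event | given) by counting equally likely families."""
    pool = [f for f in families if given(f)]
    return Fraction(sum(1 for f in pool if event(f)), len(pool))

def two_boys(family):
    return family == ("B", "B")

def first_is_boy(family):
    return family[0] == "B"

if __name__ == "__main__":
    print(prob(two_boys))                      # before he says anything
    print(prob(two_boys, given=first_is_boy))  # after "my first child is a boy"
```

Nothing about the man changed between the two calls; only the set of families still consistent with what you know did, which is the sense in which the probability measures a state of knowledge.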

What's a setting where the idea of a probability only measuring a state of knowledge is silly, but in which the idea of a fixed probability based on a model of repeatable, controlled experiments is not silly?

Hmm. How about when you discuss the outcomes of hypothetical dice rolls? If I roll 100 die, I'd expect the total to be around 350, right? Well - how does it make any sense to talk about the "state of my knowledge" about my hypothetical rolling of these hypothetical die, when there are in fact no actual die present and I am not actually going to roll them? What referent would the "state of my knowledge" have? Knowledge about what, exactly?

I haven't read all the comments here, but all this really only BEGINS to show how wide a range of interpretations of probability theory(ies) exists if you really think about it. Many mathematicians will say, as I understand it, that the sum or integral of probabilities has to equal 1. As I understand it, the probability theory used in quantum mechanics doesn't have or need this requirement and we can have complex probabilities. One might also compare a relative frequency approach to probability to an axiomatic characterization of probability theory. Also, besides probability of events that either happen or don't, there exist probabilities of fuzzy events. For instance, let's talk about the probability that a person randomly selected from your school or a school you attended stood very tall. Many factors will affect the decider as to if that selected person stands very tall or *how* tall that person stands. In other words, the decider might say the person stands tall, but not very tall. Or that s/he stands very tall, but just barely so, and consequently that person ends up somewhat very tall as opposed to definitely very tall. Or one could talk about the probability of autism in people. Since there exist different degrees of autism, probabilities end up more complicated as above. Some people even basically say it is incorrect to talk about probabilities of random events. Instead, we should talk about probability distributions, and that only in the light of probability distributions (functions) do probabilities make much sense, as in the theory of probabilistic metric spaces. In other words, it doesn't make sense to talk about the probability of a coin flip... after all the coin supposedly acts deterministically according to the laws of physics. But, it does make sense to talk about the probability distribution function of possible coin flips, where heads has a value of 1/2 and tails a value of 1/2 (assuming a fair coin). Or so some people argue.

Paul, the knowledge you lack when you throw a die consists of things like the initial conditions, the characteristics of the surfaces the die will bounce off, the perturbations caused by the air and the speed vector of the throw. If you knew all of these exactly you could predict the outcome. For an interesting read about this: http://omega.albany.edu:8008/JaynesBook.html Chapter 10: Physics of "Random Experiments", subsection: How to Cheat at Coin and Die Tossing.

@jbh (No.25)

Saying that 50% of marriages end in divorce is not the same as saying that a couple has a 50% chance of getting divorced, mostly because some people get divorced more than once. Imagine you had a very small town where four men and four women got married and stayed married forever, but there were two men (call them A and B), and two women (1 and 2) who got married and divorced in various combinations---call the resulting marriages A1, A2, B1 and B2. Then the town has had eight marriages, four of which ended in divorce, producing a 50% rate of marriages that ended in divorce, but notice that only 1/3rd of the people in the town have ever had a divorce.

In fact, if you kept a pathological case like that (a large group that stays married forever and a small group that marries and divorces in all combinations) you could keep a 50% rate of marriages ending in divorce while having an arbitrarily small percentage of people ever having a divorce.

Although this is rather contrived, the fact that most divorcees get remarried and most non-divorcees do not, implies that the rate of marriages ending in divorce will probably be higher than the rate of individuals having divorces. The marrying couple, however, are more interested in their personal chance of getting divorced than the chance any given marriage dissolves.
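
The arithmetic of the pathological town is easy to check and to scale up. The sketch below follows the setup described above (a group of couples who stay married forever, plus a small group who marry and divorce in all combinations, with every one of those marriages counted as ended); the function name and parameters are just for illustration.

```python
from fractions import Fraction

def town(stable_couples, serial_men, serial_women):
    """Return (share of marriages ending in divorce,
               share of people who have ever divorced)."""
    serial_marriages = serial_men * serial_women  # every combination, all divorced
    marriages = stable_couples + serial_marriages
    people = 2 * stable_couples + serial_men + serial_women
    divorced_people = serial_men + serial_women
    return Fraction(serial_marriages, marriages), Fraction(divorced_people, people)

if __name__ == "__main__":
    print(town(4, 2, 2))       # the town described above
    print(town(100, 10, 10))   # same marriage rate, fewer divorced people
```

The first call reproduces the 50% and 1/3 figures. With k serial men, k serial women and k*k stable couples, the marriage-level rate stays at 50% while the share of people ever divorced is 1/(k+1), which can be made as small as you like.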

How about when you discuss the outcomes of hypothetical dice rolls? ... there are in fact no actual die [sic] present and I am not actually going to roll them ...

Nice try, but this is a bait-and-switch. If we are to take as given the fact that the whole experiment will not take place, that rather simplifies our state of knowledge about the potential outcomes, don't you think? ;-)

I'd describe myself as a "practical Bayesian"; I broadly identify with the Bayesian arguments (though in some cases these are not entirely without issues), but I use both Bayesian and "frequentist" or "classical" approaches (I don't like the term frequentist for a variety of reasons).

Often the typical non-Bayesian approaches tend to correspond to something that's at least not ridiculous in a Bayesian world, though sometimes the implied priors can give one pause. When sensible ideas like shrinkage are applied in the non-Bayesian world, the differences often become smaller. I think it is essential for us to be aware of the properties of whatever methodologies we apply, and to avoid them or alter them when their properties are not what we would wish.

These days, in many applications, there is so much data that issues like priors become almost a non-issue. Indeed, in many applications, efficiency is often a non-issue. In many cases, we're arguing about the wrong things!

On the discussion above about Frequentist vs. Bayesian confidence intervals (usually called "credible intervals"): while Frequentists do use confidence intervals, these intervals don't have a very useful interpretation in most cases. In fact, most scientists think they can apply a Bayesian interpretation to Frequentist confidence intervals (i.e. "We can have 95% confidence that the true parameter is in this interval"). This is quite wrong, of course, because personal confidence has no place in Frequentist theory.

The correct interpretation of a Frequentist 95% confidence interval is this: "95% of intervals constructed in this way will contain the true parameter." This has no direct relationship with the confidence one should have in this particular interval, and especially not with the confidence one should have in this particular parameter estimate.

Bayesian credible intervals mean what you want them to mean. Another point for Bayes.
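
The "95% of intervals constructed in this way" reading can be illustrated with a small simulation. This is a sketch under stated assumptions, not anything from the comment itself: Gaussian data with a known standard deviation, the usual z-interval for the mean, and made-up values for the true mean, sample size and seed.

```python
import random
import statistics

def coverage(true_mean=10.0, sigma=2.0, n=25, trials=10_000, seed=0):
    """Fraction of simulated 95% z-intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5  # known-sigma interval half-width
    hits = 0
    for _ in range(trials):
        sample_mean = statistics.fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        # The interval is sample_mean +/- half_width; check whether it covers the truth.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

print(coverage())  # close to 0.95: a property of the procedure over many repetitions
```

Each individual interval either contains the true mean or it does not; the 95% figure only shows up when you count over the whole series of repetitions.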

I am not sure what the problem is. When I calculate a confidence interval, I only really want the probability that the "true value" is within the interval. What you are saying is that there is a 5% chance that the parameter is not within the interval. That is also how I understand a CI is to be interpreted. It is this chance that I use to affect my confidence in what course of action to take based on that result. So, I guess I agree with you that "confidence interval" may be a poor choice of words, but I don't see how this is "another point for Bayes".

I don't know whether I am a Bayesian or a Frequentist; my probability and statistics knowledge comes mostly from its application in statistical process control.

I am not sure what the problem is. When I calculate a confidence interval, I only really want the probability that the "true value" is within the interval. What you are saying is that there is a 5% chance that the parameter is not within the interval.

No, unfortunately this is exactly the wrong interpretation again. You're not getting the probability that the "true value" is in the interval. You have to think of it as an infinite series of intervals, obtained from an infinite series of experiments, 95% of which contain the true value. The probability that the particular interval you calculated from your particular experiment contains the true parameter is completely unknowable.

It's natural to imagine that the two ideas are equivalent, but they're simply not. Wikipedia's explanation is pretty good on this topic. And this paper contrasts confidence and credible intervals directly.

You have to think of it as an infinite series of intervals, obtained from an infinite series of experiments, 95% of which contain the true value.

Really, I am not trying to be contrary, I truly am confused by this statement. This sounds equivalent to, "you have a barrel with a very large number of marbles, 95% of which are white". Why can't I say that the probability that I will select a white marble is 95%?

To rephrase my question:

You're not getting the probability that the "true value" is in the interval. You have to think of it as an infinite series of intervals, obtained from an infinite series of experiments, 95% of which contain the true value.

To me, this reads pretty much the same as Marc's description of the frequentist's definition of a probability:

The frequentist approach to probability basically defines probability in terms of experiment. If you repeated an experiment an infinite number of times, and you'd find that out of every 1,000 trials, a given outcome occurred 350 times, then a frequentist would say that the probability of that outcome was 35%.

What am I missing?

Really, I am not trying to be contrary, I truly am confused by this statement. This sounds equivalent to, "you have a barrel with a very large number of marbles, 95% of which are white". Why can't I say that the probability that I will select a white marble is 95%?

This you can say, actually.

To me, this reads pretty much the same as Marc's description of the frequentist's definition of a probability:
...

What am I missing?

This is exactly the point: the frequentist confidence interval has to be interpreted in the context of the frequentist definition of probability. To a frequentist, the parameter is fixed, so it's meaningless to talk about the probability that the parameter is in an interval. It either is or it isn't; it's not a question of probability. Therefore it's not possible to interpret a 95% frequentist confidence interval as saying anything about the probability that the true parameter lies in it. It's just formally meaningless.

Of course, as in many other areas, frequentist methods and Bayesian methods will often give very similar results. In fact, in the important special case of a CI for the mean of a Gaussian distribution, the frequentist confidence interval is identical to a symmetrical Bayesian credible interval under a flat prior. So, if you want, you can say that the 95% confidence interval in that case really does mean that there's 95% probability that the parameter is in that interval, but you're being a Bayesian when you say that. :) (A Bayesian who's assuming a flat prior.)
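
For anyone who wants to see why the two intervals coincide in the Gaussian case, here is a short sketch of the standard argument. It assumes the standard deviation $\sigma$ is known and the prior on $\mu$ is flat; the symbols $\bar{x}$, $n$ and $\sigma$ are introduced here for illustration.

```latex
\begin{align*}
\text{Frequentist ($\mu$ fixed, $\bar{x}$ random):}\quad
  & \bar{x} \mid \mu \sim \mathcal{N}\!\left(\mu, \tfrac{\sigma^2}{n}\right)
  \;\Longrightarrow\;
  \Pr\!\left(\left|\bar{x} - \mu\right| \le 1.96\,\tfrac{\sigma}{\sqrt{n}} \,\middle|\, \mu\right) = 0.95 \\[4pt]
\text{Bayesian ($\bar{x}$ observed, flat prior $p(\mu) \propto 1$):}\quad
  & p(\mu \mid x_1,\dots,x_n) \propto \exp\!\left(-\frac{n\,(\mu - \bar{x})^2}{2\sigma^2}\right)
  \;\Longrightarrow\;
  \mu \mid x_1,\dots,x_n \sim \mathcal{N}\!\left(\bar{x}, \tfrac{\sigma^2}{n}\right) \\[4pt]
\text{Both give the interval:}\quad
  & \left[\,\bar{x} - 1.96\,\tfrac{\sigma}{\sqrt{n}},\;\; \bar{x} + 1.96\,\tfrac{\sigma}{\sqrt{n}}\,\right]
\end{align*}
```

The endpoints are the same numbers; what differs is which quantity is treated as random in the 0.95 statement.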

How do Bayesians respond to things like quantum physics, in which a lot of low-level stuff truly appears to be "pure probability", where previous states not only don't appear to determine an outcome, but can't, as that would result in different outcomes altogether?

I'd say the frequentist approach, at least from what little I know, is the only applicable view of those events.

That said, a dice roll occurs on a more macroscopic level and could in theory be predicted.

Dark Jaguar:

With the caveat that I'm not a hardcore Bayesian, I think that the Bayesian approach to things like quantum physics would be pretty straightforward. Given a quantum phenomenon, it's got certain possible outcomes. Until one of them happens, and we can observe the outcome of it, we can only talk about it in terms of probability - that is, the relative degrees of certainty about the various outcomes.

So it's no different than anything else Bayesian: until an event happens, its outcome is unknown, and can only be described in terms of probabilities.

I think that what throws you there is that many people look at Bayesian versus frequentist as if the distinction is that frequentists think that events have an intrinsic probability to them, whereas Bayesians only think that we lack certainty about things, and the probability is a measure of that uncertainty. That leads to an idea that the Bayesian approach dictates that there is an outcome *knowable in advance* if only you had all of the inputs.

That's not what Bayesianism says, and it's not what frequentism says. Frequentism never talks about a specific event. We can't talk about the probability of different quantum states in a frequentist framework; frequentism only works in the context of a series of identical trials. And Bayesianism doesn't imply that all priors are knowable. The state that results from a quantum event is fully determined after it's been observed; up until that point, there's a prior which is unknowable.

SteveM:

A Bayesian 95% credible interval tells you that the true value of the parameter lies within that interval with probability 0.95.

A frequentist 95% confidence interval means that the procedure for generating confidence intervals will, 95% of the time, produce an interval that contains the true value.

The former is a statement about the probability that the true value lies within the specific interval constructed.

The latter is a statement about the general reliability of a procedure for generating intervals, over the space of all possible intervals that one might construct. It does not (and cannot) tell you the probability that the true value lies within any specific interval constructed by the procedure.

Let me give a silly but illustrative example to amplify the point. Here is a stupid algorithm for constructing a 95% confidence interval:

95% of the time, return the entire real line (-infinity,+infinity).
5% of the time, return the empty interval {}.

Technically, this is a valid frequentist procedure for constructing a confidence interval. (It's stupid and nobody would actually follow it, because it totally ignores the data. But it still fits the definition!) 95% of the intervals constructed by this algorithm will contain the true value, no matter what it is, because they contain all possible values. 5% of the intervals will not contain the true value (because they don't contain anything!).

Of course, these intervals are totally useless. 95% of the time you have an interval you know the true value lies within, but which tells you absolutely nothing about where the true value might likely be. 5% of the time you have an interval that cannot possibly contain the true value. That's because "95% confidence" doesn't apply to any SPECIFIC interval you've been handed — if you're handed one of the empty intervals, of course there is not a 95% probability that the true value lies within it; there is a 0% probability. "95% confidence" is a statement about how often the algorithm produces an interval that contains the true value.
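
Here is the stupid algorithm written out, as a sketch. The names `silly_interval`, `contains` and `silly_coverage` are made up for illustration, and `None` is used as a stand-in for the empty interval.

```python
import math
import random

def silly_interval(data, rng):
    """Ignore the data: 95% of the time return the whole real line, else the empty interval."""
    if rng.random() < 0.95:
        return (-math.inf, math.inf)
    return None  # stand-in for the empty interval {}

def contains(interval, value):
    """True if the value lies in the interval; the empty interval contains nothing."""
    return interval is not None and interval[0] < value < interval[1]

def silly_coverage(true_value, trials=100_000, seed=1):
    """Long-run fraction of silly intervals that contain the true value."""
    rng = random.Random(seed)
    hits = sum(contains(silly_interval([], rng), true_value) for _ in range(trials))
    return hits / trials

print(silly_coverage(3.7))  # about 0.95, whatever the true value is
```

The coverage comes out right even though no single interval tells you anything: each one contains the true value with certainty or not at all.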

So the mere fact that you have a frequentist procedure for constructing a 95% interval doesn't necessarily help you make inferences. You need to ask more of your confidence interval than just coverage. Then you get into the frequentist literature of how best to evaluate the quality of your algorithm for constructing intervals.

On the other hand, with a 95% Bayesian credible interval, there really is a 95% chance that the single, specific interval you've constructed will contain the true value. (Of course, this interval will depend on your prior assumptions.)

Dark Jaguar:

The quantum wavefunction gives you a likelihood function: given that you're in some state, what is the probability that you will measure an observable to have a particular value?

Both frequentists and Bayesians use likelihood functions. If you're just predicting the outcome of an experiment for a system prepared in some state, there is no difference between the two: you're conditioning on your hypothesis (the initial quantum state), and deducing the likelihood of observing something in an experiment.

What Bayesianism adds is inference, working backwards inductively from observations to hypotheses. It allows you to address the inverse question: given that I have measured an observable to have a particular value, what is the probability that the system was originally prepared in a given state?

The answer to that question depends on how probable you think a given state was before you made the measurement: that's where the prior comes in.
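
To make the role of the prior concrete, here is a toy sketch of that inverse question. Everything in it is hypothetical: two candidate preparations (`state_a`, `state_b`), a two-outcome measurement, and made-up likelihood and prior values.

```python
from fractions import Fraction

def posterior(prior, likelihood, observed):
    """Bayes' rule over a finite set of candidate states."""
    joint = {state: prior[state] * likelihood[state][observed] for state in prior}
    evidence = sum(joint.values())  # total probability of the observation
    return {state: weight / evidence for state, weight in joint.items()}

# Hypothetical likelihoods: P(measured outcome | prepared state).
likelihood = {
    "state_a": {"up": Fraction(9, 10), "down": Fraction(1, 10)},
    "state_b": {"up": Fraction(1, 2), "down": Fraction(1, 2)},
}
even_prior = {"state_a": Fraction(1, 2), "state_b": Fraction(1, 2)}
skewed_prior = {"state_a": Fraction(1, 10), "state_b": Fraction(9, 10)}

print(posterior(even_prior, likelihood, "up"))
print(posterior(skewed_prior, likelihood, "up"))
```

The likelihood function is identical in both calls; only the prior changes, and with it the answer to "which state was the system probably prepared in?"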

SteveM,

Most of the standard confidence intervals people are taught have a nice Bayesian interpretation, so they don't deceive. But if you try to construct one for a nonstandard problem, watch out! David MacKay's free textbook has an example starting on page 464.

I recently encountered a very amusing paper regarding the interpretation of confidence intervals for null hypothesis significance testing by Jacob Cohen entitled "The Earth Is Round (p < 0.05)". It points out how easy it is to misinterpret p-values and why. I wrote a brief blog post about it.

Arrgh! The less than sign in the paper's title ate the HTML for the link - even after I previewed it and replaced it with an `lt` entity.

@mark: I had the same problem when I linked to the same paper in comment 28! Someone else pointed out that when you preview, the ampersand-lt gets resolved to the less than sign, so you have to redo the code every time you hit preview.

I really appreciate the several replies that have tried to explain Bayesian vs. frequentist confidence intervals to me. So let me see if I understand. The way that I am familiar with calculating CIs from a set of samples assumes the samples are drawn from a Gaussian distribution. Does this assumption make it essentially a Bayesian credible interval rather than a confidence interval?

SteveM,

Assuming normality is something that either a frequentist or a Bayesian can do; they use the same likelihood function. Your distributional assumptions don't determine whether you're a frequentist or a Bayesian; what you condition your conclusions on does.

(Frequentist results are conditional on an assumed hypothesis, and they make conclusions about the probabilities of data which can be generated by that hypothesis. Bayesian results are conditional on an assumed set of observational data, and they make conclusions about the probabilities of hypotheses which can generate the observed data.)

If you have normality and a flat (uniform) Bayesian prior, then it turns out that the most common way of constructing a Bayesian credible interval gives the same result as the most common way of constructing a frequentist confidence interval. The text by Bolstad has a nice comparison of this situation in Section 12.2 ("Comparing credible and confidence intervals for the mean"). I think others in this thread have given references which do this too.

In that specific case, the interval is "essentially a Bayesian credible interval". But I wouldn't say "rather than" a confidence interval. It is equally well a frequentist confidence interval. The two methods produce the same result in this case, but they interpret the meaning of the interval in very different ways. In general, the two approaches will lead to different intervals, differently interpreted.

SteveM (#36) writes:

I may be looking at it weird, but to me Monty introduces no new information at all. Even before he reveals a goat, we know that there is at least one goat that he can show. And while it looks like Monty is eliminating a choice by opening a door, what he is really offering is A (our original choice) or "not A".

You are incorrect that Monty introduces no new information. This can be shown via math, via simulations, via enumerating possible worlds, etc. Look up "Monty Hall problem" in Wikipedia for more details.

So the real question is, why do you (and so many other people), think that there is no new information?

This is a question about your intuitions, not a question about the correct answer.

Perhaps this will help: assuming that Monty will always open some door (after you make your first choice), he usually doesn't have a choice as to which door to open. That's where the extra information comes from.

You choose door A originally. There's a 1/3 chance that you've chosen the car. In this case, both the other doors have a goat, and Monty can open either one. And in this (1/3) case, switching to a new door results in you getting a goat too.

But, when you choose door A, there's a 2/3 chance that you've picked a goat, not the car. And in that case, only one remaining door has a goat, and Monty must open that exact door. In this (2/3!) situation, switching to a new door gets you the car.

Does that help? If not, study the Wikipedia page.
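
The "enumerating possible worlds" route is short enough to sketch in Python. The function name `win_probabilities` is made up, and the code assumes that when Monty has two goat doors to choose from he picks one uniformly at random (that assumption does not affect the overall stay/switch odds).

```python
from fractions import Fraction

DOORS = ("A", "B", "C")

def win_probabilities(first_pick="A"):
    """Return (P(win by staying), P(win by switching)) by enumerating every case."""
    stay = Fraction(0)
    switch = Fraction(0)
    for car in DOORS:  # the car is behind each door with probability 1/3
        # Monty may only open a door that is neither your pick nor the car.
        openable = [d for d in DOORS if d != first_pick and d != car]
        for opened in openable:
            weight = Fraction(1, 3) / len(openable)
            remaining = next(d for d in DOORS if d not in (first_pick, opened))
            if first_pick == car:
                stay += weight
            if remaining == car:
                switch += weight
    return stay, switch

print(win_probabilities())  # staying wins 1/3 of the time, switching 2/3
```

Notice that `openable` has two entries only when your first pick was the car; in the other two cases Monty's hand is forced, which is where the extra information comes from.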

Hello Mark,

I apologize in advance for my complete ignorance of this topic (aside from my child's-level exposure to the fact that if you flip a coin the probability is 0.5 that it will come up heads).

My question regarding your brief explanation of the Bayesian approach is based on the following:

The bayesian approach is based on incomplete knowledge. It says that you only associate a probability with an event because there is uncertainty about it - because you don't know all the facts. In reality, a given event either will happen (probability=100%) or it won't happen (probability=0%). Anything else is an approximation based on your incomplete knowledge.

How does this school of thought take into account quantum theory, which from what I have read states that knowledge can never be perfect? For example, the uncertainty principle.

Thanks.

Monty IS introducing some new information: he's letting you know which door one of the goats is behind.

At first, you picked a goat (2/3) or the car (1/3). Monty reveals a goat. The probability that there is a goat behind the door you originally chose is still 2/3, though, because you made that choice before you saw which door Monty opened. Therefore, the probability that the car is behind your door is only 1/3. The probability that it is behind either of the other two doors is 2/3, but you know that one of the doors is hiding a goat, so the probability that the car is behind the remaining, untouched door must be 2/3.

We ponder two approaches to statistics in my courses: probability theory, if one can "map" the outcomes in some fashion (a tree or a chart, perhaps), AND, if theory fails, the experimental approach, sometimes using computer sims if we cannot do *real* experiments in sufficient numbers.

I don't know if that makes me a Bayesian or frequentist or what.

"Math," I tell my students, "is what one does if one has all the time and money one needs to obtain an exact model."

"Statistics is what one does if we need a 'close enough' answer by Thursday."

BTW, I don't know why high schools emphasize the calculus (which I also teach) over stats and probability, as I think stats and probability are both (1) easier and (2) more interesting and useful to the average joes and josephinas in my classes. I'm just sayin'.

I agree with you. The so-called "Bayesian school" is a farce. Not the mathematical equations they produce; those are accurate, and sometimes even interesting. But the "Bayesianist approach" is a false dilemma created to attract attention. Pure rhetoric and sophistry. I also have never met a "frequentist", but only quiet, humble and open-minded scientists on one side, and nervous "bayesians" on the other, shouting exaggerated claims at the others, victims of some sort of paranoia. They are naïve researchers who believe science can be explained as feeble Manichaean dichotomies.

When people say their research is no mundane science, but instead a struggle against a foreign enemy, things look more important than they really are, and more exciting. Also, it helps to unite people, because differences amongst trench-brothers pale in front of the threat of the outside enemy. This is an old technique for population control: to declare war on a foreign enemy to make things easier at home.

I like how you said that people criticize you for not being "Bayesian" enough. "Bayesianism" is sort of a "twist" that makes things suddenly look more challenging, inventive, unconventional and revolutionary. It's an attempt to fabricate a Kuhnian change of paradigm (I have read that claim explicitly once).

What strikes me the most is that trying to differentiate themselves from the imagined enemies, they spend huge amounts of time trying to explain how the so-called "frequentists" think, and attribute absurd beliefs to these "non-bayesianists", like saying that they are "forbidden" to do things such as treating parameters being estimated as random variables, or stipulating a probability distribution function to a variable before analysing it... There is even a poetic symmetry here: They attribute to their imaginary enemies the (normal) belief that they would not be able to arbitrarily attribute (probabilistic) belief to things.

Quixotes in the worst sense of the term.

I was just getting ready to write about this topic in my own blog, it's good to find someone else to refer to!...
