Steve Novella at Science-Based Medicine, a level-headed and judicious advocate of better use of scientific evidence in clinical medicine, has written his own view of the BPA issue we covered in a post the other day. Orac pointed to it in the comments as "another take" on the issue. We aren't sure if he meant it disagreed with the view we expressed or not. For the record, we have some differences, but not on the judgment about the BPA paper. Differences about that are mainly a matter of emphasis and that's pretty subjective. "Real but small effects" may be very important but we don't know that yet. Those judgments aside, we would like to take the opportunity to comment on some other parts of Steve's post which we think bear some discussion, particularly the question of how one demonstrates causation. He opines that the only way to do this is through a randomized experiment. We disagree. Our view is that there is no way to prove causation but many ways to demonstrate it. Unfortunately this subject quickly gets us into deep water and we can't do it justice in a single post. Indeed, since many books have been written on this subject and there is no consensus, many posts won't do the trick either. So we'll settle for making a couple of points.
The most important is that the question of causation and how to demonstrate it is not considered settled by philosophers of science. The only ones who think it's settled are scientists, and that's because they aren't experts in the subject. As one wag once said, expecting a scientist to understand scientific method is like expecting a fish to understand hydrodynamics. Scientists are experts in doing science. But they often do not understand exactly the logic of what they are doing.
Consider the role of deductive reasoning, which most scientists take to be one of the hallmarks of scientific method. Yet its use is fairly restricted, mainly to constructing mathematical tools. Beyond that it has limited relevance because deductive reasoning requires something we don't have in empirical science, absolute certainty. Here's an example from the late ET Jaynes's book, Probability Theory:
Suppose some dark night a policeman walks down a street, apparently deserted. Suddenly he hears a burglar alarm, looks across the street, and sees a jewelry store with a broken window. Then a gentleman wearing a mask comes crawling out through the broken window, carrying a bag which turns out to be full of expensive jewelry. The policeman doesn't hesitate at all in deciding that this gentleman is dishonest. But by what reasoning process does he arrive at this conclusion?
[snip]
A moment's thought makes it clear that our policeman's conclusion was not a logical deduction from the evidence; for there may have been a perfectly innocent explanation for everything. It might be, for example, that this gentleman was the owner of the jewelry store and he was coming home from a masquerade party, and didn't have the key with him. However, just as he walked by his store, a passing truck threw a stone through the window, and he was only protecting his own property.
Now, while the policeman's reasoning process was not logical deduction, we will grant that it had a certain degree of validity. The evidence did not make the gentleman's dishonesty certain, but it did make it extremely plausible. This is an example of a kind of reasoning in which we have all become more or less proficient, necessarily, long before studying mathematical theories. We are hardly able to get through one waking hour without facing some situation (e.g., will it rain or won't it?) where we do not have enough information to permit deductive reasoning; but still we must decide immediately what to do. [emphasis in the original]
This passage appears in the first two paragraphs (Chapter 1, p. 3) of Jaynes's Probability Theory: The Logic of Science. It is the posthumous masterwork of one of the 20th century's foremost mathematical probabilists, a graduate level text in mathematical probability theory. He was no lightweight and certainly not a crank. The 700 pages that follow contain Jaynes's development of the seminal work of Jeffreys, Cox, Shannon and Polya, focusing not on deductive reasoning but on plausible reasoning and its rules. It's a book in the logic of probability, not the logic of certainty, which is what deductive logic studies.
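To make the policeman's reasoning a little more concrete, here is a minimal sketch of plausible reasoning as repeated application of Bayes' rule. Every number in it is invented for illustration and none comes from Jaynes; the function names are ours. It also treats the three pieces of evidence as independent given guilt or innocence, which is a simplifying assumption, not a claim about real burglaries.

```python
def update(prior, p_if_guilty, p_if_innocent):
    """Bayes' rule for one piece of evidence and a yes/no hypothesis."""
    guilty = prior * p_if_guilty
    innocent = (1.0 - prior) * p_if_innocent
    return guilty / (guilty + innocent)


# All numbers below are made up for illustration; none come from Jaynes.
# Each tuple: (evidence, P(evidence | burglar), P(evidence | innocent)).
EVIDENCE = [
    ("alarm ringing and window broken", 0.90, 0.05),
    ("man wearing a mask", 0.80, 0.01),
    ("bag full of jewelry", 0.90, 0.01),
]


def plausibility_after_all(prior, evidence=EVIDENCE):
    """Apply the updates in turn, treating the items as independent
    given guilt or innocence (a simplifying assumption)."""
    p = prior
    for _label, p_if_guilty, p_if_innocent in evidence:
        p = update(p, p_if_guilty, p_if_innocent)
    return p


if __name__ == "__main__":
    final = plausibility_after_all(prior=0.01)
    print(f"plausibility of dishonesty: {final:.4f}")
```

With these made-up inputs the plausibility of dishonesty ends up very high but never reaches 1. The innocent explanation (the store owner back from a masquerade party) keeps a small but nonzero weight, which is exactly the gap between plausible reasoning and proof.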
The product of deductive reasoning is proof. Judgments about causation are the product of plausible reasoning. "Randomized" experiments are one (important) tool of plausible reasoning in epidemiology and some parts of clinical medicine. They are rarely used in physical and chemical science laboratories or in animal studies. Animals are rarely randomized, nor are the researchers usually unaware of the independent variable. The purpose of randomization, when it's done, is to reduce the chance of differences in two groups unrelated to the independent variable of interest. Randomization neither guarantees comparability nor demonstrates causation. There is a much deeper and more complicated argument that goes with each of those statements but for the moment let's observe that if a randomized experiment were required we would still not know if cigarettes cause lung cancer, be unsure about the validity of relativity and not know much about chemistry, astronomy or geology, much less evolution and paleontology. All of these sciences use rigorous forms of plausible reasoning without requiring randomized experiments to demonstrate their basic premises or accepted truths. Steve acknowledges this in discussing lung cancer and smoking without, in our view, fully grasping the nettle.
Whether required or not, there are some deep questions about randomization and causation. The role of randomization was the subject of some fundamental disagreements between RA Fisher and Egon Pearson. There is nothing cut and dried about the subject and the consequences of the differences are important. But this is a whole other set of topics that epidemiologists and statisticians argue about, particularly causation, where the counterfactual school battles with more conventional methodologists in the realm of theoretical and practical methods. It's a subject of some interest to us, but there is too much to say within the confines of this post.
The bottom line here is this. We think laying down absolute markers on what is required to demonstrate causation in medicine is a fool's errand and potentially dangerous. If accepted as the summa of scientific methods in toxicology ("the only direct evidence of causation") it would zero out the bulk not only of toxicology results but of all science (because if it is a valid and required method for toxicology, why just for toxicology?). That doesn't make much sense to us. There is no "direct evidence" of causation. Causation is a judgment about the available evidence using plausible reasoning. There is no privileged form of evidence. Indeed we consider this to be a scientific sin: methodolatry.
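The claim that randomization does not guarantee comparability can be illustrated with a small simulation. The sketch below is ours and rests on stated assumptions: exactly half the subjects carry some hidden risk factor, subjects are split at random into two equal arms, and we call an allocation "imbalanced" when the arms differ by 40 percentage points or more in how many carry the factor. The function name and the 40-point cutoff are arbitrary choices for illustration.

```python
import random


def imbalance_rate(n_subjects, n_randomizations, rng, threshold_pct=40):
    """Fraction of random allocations in which the two arms differ by at
    least `threshold_pct` percentage points in prevalence of a hidden factor."""
    half = n_subjects // 2
    # Assumption: exactly half the subjects carry the hidden factor (1).
    subjects = [1] * half + [0] * (n_subjects - half)
    imbalanced = 0
    for _ in range(n_randomizations):
        rng.shuffle(subjects)                      # the randomization step
        in_treated = sum(subjects[:half])
        in_control = sum(subjects[half:2 * half])
        # Integer arithmetic avoids floating-point surprises at the cutoff.
        if 100 * abs(in_treated - in_control) >= threshold_pct * half:
            imbalanced += 1
    return imbalanced / n_randomizations


if __name__ == "__main__":
    rng = random.Random(2008)
    print("20 subjects: ", imbalance_rate(20, 2000, rng))
    print("200 subjects:", imbalance_rate(200, 2000, rng))
```

Under these assumptions a noticeable share of the 20-subject allocations come out badly lopsided (7 versus 3 or worse), while with 200 subjects such an imbalance essentially never occurs. Randomization makes gross imbalance less likely as numbers grow; it does not rule it out in any single experiment.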
We think Steve is a terrific blogger (as is Orac). What fun is having a blog if you can't start a food fight about ideas with someone you respect?
 
My position is closer to Steve's than to yours in that I tend to agree that a randomized prospective study is usually the most reliable way to determine causation. I wish he hadn't used the word "only" because I know that he's not that dogmatic and may have just been uncharacteristically sloppy with language in that sentence. However, I also think you're being a bit too coy here. Retrospective, epidemiological studies can never really be conclusive evidence of causation because it's impossible to control for every variable. Even the epidemiology of smoking wouldn't be conclusive without all the basic science and animal studies that support the hypothesis that cigarette smoke contains many carcinogens. Indeed, that's the very point tobacco companies tried to play up (while downplaying the other evidence). Randomized trials produce stronger evidence of causation and require less other evidence to support them. Evidence of the sort for cigarette smoking requires more ancillary evidence from other sources.
Regarding "plausible" evidence, the very point here is this: at what level of plausibility does one conclude that a correlation is highly likely to equal causation? Obviously that's a whole other can of worms. Also, Steve thinks like a physician, which is what he and I are. There are lots and lots of "statistically significant" correlations out there in medicine, but if a correlation is relatively small, we often consider it clinically insignificant because it doesn't produce an obvious difference that the patient cares about or because the cost of mitigating the correlation is higher than the actual small risk. If a correlation is real but small, we still sometimes don't do anything about it because it isn't clinically significant.
Actually, by "another" take, my point wasn't that I disagreed with you about the study itself, perhaps other than on your emphasis on the Bush administration at the end of your post. After all, the EU made a similar decision as the US did. It is possible that the EU may modify that decision based on these new studies, but the point was that, given the evidence available before these studies came out, the EU came to an even stronger conclusion than the US that elimination of BPA was not necessary. That's what I meant by "another take."
Orac: If by "coy" you mean evading my real meaning I guess I would have to plead guilty. So let me stop being coy. What I really think is this:
i. Many advocates of "evidence based medicine" think too highly of randomized clinical trials because they don't know how the sausage is made. In other words, they are naive and ignorant of methodology. Let me be very clear about you and Steve. I do not think either of you falls into this category. You both seem to have a good grasp of methods and know well how to critique a scientific article. It is a more general observation about the worship of RCTs by too many knee-jerk EBM advocates;
ii. Having said nice things about you both, my second view is that you are both quite wrong -- and in a fundamental way -- about what you clearly feel is a privileged evidentiary position for RCTs; and that your error propagates a dangerous misconception about scientific evidence in medicine.
Nothing coy about this. Let me briefly (there is much to argue about here) justify my statements.
First, let's be clear about terminology, which I think is being used by both of you in a dangerously loose way. I will distinguish between two kinds of scientific studies: observational ones and experimental ones. They differ in only one respect. In an observational study the investigator does not have control over the independent variable, while in an experimental one, he/she does. Thus an RCT is a type of experiment, as are most laboratory experiments, whether involving animals or test tubes. An observational epidemiological study -- which can be either retrospective, prospective or neither -- observes a relationship between two variables (one of which may be thought of as an independent variable and the other as a dependent variable) but the investigator doesn't control either.
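The one-respect difference between the two designs can be shown in a toy simulation. This is only a sketch under invented assumptions: a hidden factor triples the risk of the outcome, the exposure itself does nothing, and in the observational setting people with the hidden factor are more likely to end up exposed. The function name and all probabilities are hypothetical.

```python
import random


def risk_difference(n, investigator_assigns, rng):
    """Risk of the outcome among the exposed minus risk among the unexposed.

    The only thing that changes between designs is who sets the exposure.
    """
    exposed = [0, 0]      # [cases, subjects]
    unexposed = [0, 0]
    for _ in range(n):
        hidden = rng.random() < 0.5
        if investigator_assigns:
            # Experimental: exposure set by a coin flip.
            is_exposed = rng.random() < 0.5
        else:
            # Observational: exposure tracks the hidden factor.
            is_exposed = rng.random() < (0.8 if hidden else 0.2)
        # Assumption: the outcome depends only on the hidden factor.
        is_case = rng.random() < (0.30 if hidden else 0.10)
        group = exposed if is_exposed else unexposed
        group[0] += is_case
        group[1] += 1
    return exposed[0] / exposed[1] - unexposed[0] / unexposed[1]


if __name__ == "__main__":
    rng = random.Random(7)
    print("observational:", round(risk_difference(20000, False, rng), 3))
    print("experimental: ", round(risk_difference(20000, True, rng), 3))
```

In this setup the observational comparison shows a clear excess risk among the exposed even though exposure is harmless, and the coin-flip comparison shows a difference near zero. The sketch illustrates what control over the independent variable buys in an idealized case; it says nothing about how a well-conducted observational study with measured and adjusted confounders compares with a poorly conducted experiment.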
I hope you would both agree that a well done observational study is almost always better than a lousy RCT. Randomizing to achieve comparability is usually only a minor part of an experiment if it is part of it at all. Lots of experimental designs, even when randomized, are just plain lousy, incompetent, lacking in power or wrongly interpreted and they are not only worse than a good observational study but worse than nothing. While this is an obvious proposition (bad science is always bad), it is not taken into account when you privilege RCTs as a form of evidence. Doing a good RCT is quite difficult and involves a surprising number of subtleties that RCT worshippers do not appreciate. It is an art form and adept practitioners are fairly uncommon. As Steve's post notes, it is also an infeasible design for answering many questions of scientific and practical interest. So while it can be imagined it cannot be done, sort of like weighing the earth on a balance.
More importantly, an RCT needn't be done to make a causation judgment, as the smoking and lung cancer (and many other) examples show. The "triangulation" Steve talks about is just another description of the plausible reasoning I wrote about in my post and which is at the heart of the issue, not the use of randomization or an experimental design. That is a distracting and misleading view of scientific method (IMHO). So that's my first point: an RCT is not necessary to make a sound causation judgment in medicine, public health or toxicology (much less other disciplines).
Nor, quite obviously, is it sufficient. The last time I looked there were something like eight RCTs on the value of mammography. They produce discordant results. If evidence from an RCT were sufficient, all that would be required is to do one. That is clearly not the case. My complaint boils down to claiming that the exaltation of RCTs by EBM advocates leads too many naive readers and scientist consumers of the literature to conclude that an RCT is both necessary and sufficient when it is neither. Simultaneously it leads them to downgrade, a priori, other kinds of information that may well be superior. I'll concede this is not a logical consequence of the EBM position (indeed true EBM advocates are careful about this when talking amongst themselves) but it is a practical consequence of the way some EBM advocates often talk, and I take both Steve's post and your comment to be examples of EBM statements likely to produce this inference in naive readers.
Your use of the phrase "conclusive evidence" I assume is inadvertent but revealing. "Causation" is not an empirically discoverable property, it is a judgment about evidence against a theoretical background (RCTs do not provide direct evidence of causation, at least from the counterfactual point of view, but that's a whole different discussion). At issue is both the logic for making a causation judgment and the kinds of evidence that go into making it (again, that's why I introduced the topic of plausible evidence and its logic in my post). In my view you emphasize the latter too strongly and mistakenly. I am taking you and Steve to task for privileging a particular kind of evidence when the weight that should be accorded to it depends on the context and the quality of the evidence itself. I am using the word privilege in a colloquial sense: giving the benefit of the doubt to something or someone before even examining its value. While I suspect you both recognize these points when pushed, you have to be pushed. Maybe that's because you assume others also recognize them. I can assure you from many years dealing with students, colleagues, the press and the public, they don't. And statements from scientists that don't explicitly recognize them contribute to that, IMHO.
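One simple way identical trials can disagree is limited statistical power. The following sketch simulates many two-arm trials of the same hypothetical intervention, all with the same true effect, and counts how often a conventional test calls the result significant. The risks, sample size and function names are invented for illustration, and nothing here is a model of the actual mammography trials, whose disagreements may have quite different sources.

```python
import math
import random


def trial_is_significant(n_per_arm, risk_control, risk_treated, rng):
    """Simulate one two-arm trial; True if a pooled two-proportion z-test
    gives |z| > 1.96 (roughly p < 0.05, two-sided)."""
    cases_c = sum(rng.random() < risk_control for _ in range(n_per_arm))
    cases_t = sum(rng.random() < risk_treated for _ in range(n_per_arm))
    pooled = (cases_c + cases_t) / (2 * n_per_arm)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    if se == 0:
        return False
    z = ((cases_c - cases_t) / n_per_arm) / se
    return abs(z) > 1.96


def fraction_significant(n_trials, rng, n_per_arm=500,
                         risk_control=0.10, risk_treated=0.07):
    """Share of simulated trials that reach significance.

    The default risks and sample size are hypothetical.
    """
    hits = sum(trial_is_significant(n_per_arm, risk_control, risk_treated, rng)
               for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    rng = random.Random(8)
    print("share of trials 'significant':", fraction_significant(400, rng))
```

With these made-up settings the effect is real in every simulated trial, yet well under two thirds of them detect it. A reader who takes any single trial as sufficient would be misled a good part of the time, in either direction.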
Finally, I think your observation that you and Steve look at this like clinicians is highly pertinent. That leads to a particular prejudice. I look at it as an epidemiologist who has spent a good part of his career doing observational studies. That leads to another kind of prejudice. Returning to my post, that's why I said I welcomed it as an opportunity to have a discussion, so I'm pleased you commented.
Excellent post, excellent comment and excellent response.
The kind of plausible reasoning that Steve and Orac want to use depends on either understanding the details of what is being investigated, or using such a large and randomized trial that what is unknown is "washed out". In general such large trials are impossible because we are too ignorant of what needs to be randomized out and (in the specifics of endocrine disruption) because there are too many compounds with endocrine activity in food and in the environment. The background is not zero. Many plants generate endocrine disrupting chemicals to control predation by herbivores. Plants increase the levels until animals eating them get sufficient endocrine disrupting effects that predation is reduced. Domestication of plants by humans has decreased some of those levels in some plants, but has probably increased them in animals (selecting animals that grow big and fast and fat, selects for animals with high androgen levels).
In the case of endocrine disrupters, until we really understand the physiology of the endocrine system, understanding how agents perturb it is simply not possible. Especially when that endocrine system is not identical in every individual and the endocrine system itself changes over time during development, and due to environmental effects. Designing an RCT to prevent confounding requires that one know what the potential confounds are. There may always be some that are being missed because they are not appreciated (my bacteria for instance).
This does bear on my NO research. NO is the major regulator of steroid synthesis (via inhibition of cytochrome P450 enzymes), so changing the basal NO level will act as an endocrine disrupter. Physiology is already using changes in NO levels to modulate the endocrine system, so changing the basal NO level will have endocrine disrupting effects with no threshold. Direct action on steroid receptors is not the only way that chemicals can have endocrine disrupting effects. Acting on any part of the feedback loop that regulates the synthesis or signaling of steroids will also disrupt signaling mediated through steroids.
Disrupting the normal biofilm of ammonia oxidizing bacteria that many eukaryotes have, will lower their NO level and will perturb their steroid synthesis pathways. I suspect that this may be how some compounds with seemingly anomalously low endocrine disrupting effects in vitro experimentally have anomalously large endocrine disrupting effects in vivo. This might be what is happening with atrazine and other agents that inhibit ammonia oxidizing bacteria. It could also be what is producing effects observed between the developed world and the rural undeveloped world; those effects hypothesized to be mediated through the "hygiene hypothesis". Since NO is inside the feedback loop, disrupting normal NO regulation will perturb steroid regulation with no threshold. It will perturb it in the characteristic direction of low NO. That is the direction that normal "stress" causes physiology to adjust itself. The effects of low NO are (in many cases) "the same" as and are indistinguishable from the effects of "stress". Many of those stress responses are completely adaptive in the short term, and only become maladaptive in the long term.
The question of when there is enough plausibility to take a certain action depends on what action one is contemplating. Switching from polycarbonate to polyethylene for infant formula bottles takes, to me, a trivial level of plausibility. Not starting an addictive, health-damaging habit such as smoking takes a trivial level of plausibility to me.
Bravo Revere and Orac!
Fantastic post and comments. I would love to be able to argue this stuff in person.
I'm also an epidemiologist but I tend to work with clinicians in answering clinical questions (I also work things through from a population perspective). My opinion is that in order to do that one needs both observational and experimental methods and these must come from a range of sources salient to the problem (i.e. animal models, in vitro, human observational and experimental, etc.).
The setting of the problem also requires some mental gymnastics. This is true when clinicians make statements about public health effects but are still thinking about the setting of their waiting room. But the same is also true of public health people who don't adjust their picture of setting away from the population and into the doctor's waiting room when it's appropriate to do so. This is typically (in my field at least) where you get the least fruitful arguments. It may also be part of the difference of opinion here.
When talking about the type of problem where it's both a public health and a clinical problem there is however a 'third way'. Systematic public health intervention and monitoring. The best example I can think of at the moment is the New Zealand Housing and Health Study (doi:10.1136/bmj.39070.573032.80)
Randomised controlled trials, that are also cohort studies, in the real world that employ endpoints that are both of clinical and public health import. This is of course not 'the answer' either. But it's another tool for picking away at what might be 'causal'.
Fascinating subject and posts - more of the same please!
To give a concrete example of how a lack of knowledge can limit what you can do with RCTs investigating potential treatments, consider ulcers. Before there was the realization that Helicobacter pylori was important in the cause and resolution of ulcers, RCTs on the various treatments not involving bacterial removal would have been of limited utility. What you would observe is lots of noise because you are not investigating the actual causal phenomena.
daedalus: This gets us into the tricky territory of hypothesis seeking versus hypothesis confirming investigations. RCTs pertain to the latter. They are of no value in the former because a hypothesis is stipulated as the independent variable.
Wonderful post! I am fascinated by discussions on the philosophy of science and lately I'm particularly interested in EBM and its limitations (EBM is a concept which is terribly misused and glorified by physicians here in Ecuador as are RCT and meta-analyses). I've been wanting to write on the subject for some time, but before I venture into that I would like to ask for some bibliographical suggestions. Please bear in mind I live in Ecuador, and specialized books in English are very difficult to come by. So, if it were possible and you could possibly suggest publications (or send them to me by e-mail :), I would be eternally grateful. Thank you very much in advance.
alvaro: I'm not much of an EBM expert (except that I am a professional epidemiologist), but David Sackett's papers in the New England Journal in years past (they should be easy to locate; try Googling EBM) are a good starting place for a judicious and balanced view of EBM. The deeper questions of RCTs and controversies about methods unfortunately are probably mainly scattered in the research literature. A good discussion of RCTs versus observational studies can be found in Paul Rosenbaum's book, Observational Studies. I'm not sure how easy it will be to find that book where you are. Papers by Robins, Greenland, Rubin on counterfactuals and potential outcomes can be found in some commonly available journals, but these papers are not an easy read. The whole issue of causation is fraught with difficulty. I don't have any particular papers to send, alas. I try to keep an eye on this literature but it is not my main area of research at the moment. Good luck and I wish I could help more.
Thanks a million, the information is extremely helpful.