Hypothesis-Free Research?

Steven Wiley, writing in The Scientist, discusses the contradiction at the heart of the recent fad for "hypothesis-free" research:

Following a recent computational biology meeting, a group of us got together for dinner, during which the subject of our individual research projects came up. After I described my efforts to model signaling pathways, the young scientist next to me shrugged and said that models were of no use to him because he did "discovery-driven research". He then went on to state that discovery-driven research is hypothesis-free, and thus independent of the preexisting bias of traditional biology. I listened patiently, because I have heard this argument many times before.

I was too polite to point out that all biological research was hypothesis-driven, although the hypothesis might be implicit. Genomic sequencing projects might seem to lack a hypothesis, but the resulting data is exploited by hypothesizing specific evolutionary relationships between different genes.

The idea that there are actually two distinct ways of conducting biological research was formally proposed several years ago in a Nature Biotechnology commentary (R. Aebersold et al., 18:359, 2000). The authors described "discovery science," like genome sequencing projects, as blindly cataloguing the elements of a system, disregarding any hypotheses on how it works. In contrast, they described "hypothesis-driven science" as being small-scale, narrowly focused, and using a limited range of technologies.

Although the authors' intent was to justify large-scale research as a valid way to approach biological problems (another frequent topic at after-meeting dinners), in my opinion, casting it as hypothesis-free did the emerging field of systems biology a great disservice. To imply that large-scale systems biology research can be productively conducted without a prior set of underlying hypotheses is nonsense. A good hypothesis is at the heart of the best science, regardless of scale.

For the unfamiliar, hypothesis-free or discovery-driven research generally focuses on finding significant correlations in very large data sets, such as sequencing or expression data. An example of this sort of project is using microarrays to compare expression patterns in diseased vs. non-diseased tissue. Microarrays allow the simultaneous measurement of the expression levels of thousands of genes. Researchers use the data to find the genes that show the largest expression changes between the diseased and the normal state, as a way to understand disease pathophysiology.
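To make that concrete, here is a minimal sketch of what such a first-pass analysis looks like in Python. The data are simulated rather than taken from any real microarray study, and the matrix shapes, the spiked-in genes, and the per-gene t-test ranking are all illustrative assumptions, not anyone's published pipeline:

```python
# A minimal sketch of the microarray comparison described above.
# The data are simulated; in a real study the matrix would hold
# normalized log2 expression intensities (genes x samples).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_diseased, n_normal = 5000, 10, 10

# Hypothetical log2 expression values; 20 genes are genuinely
# up-regulated in the diseased samples.
diseased = rng.normal(0.0, 1.0, size=(n_genes, n_diseased))
normal = rng.normal(0.0, 1.0, size=(n_genes, n_normal))
diseased[:20] += 2.0

# Per-gene t-test and log2 fold change -- the usual first pass.
t_stat, p_val = stats.ttest_ind(diseased, normal, axis=1)
log2_fc = diseased.mean(axis=1) - normal.mean(axis=1)

# Rank genes by evidence of differential expression.
for i in np.argsort(p_val)[:5]:
    print(f"gene {i}: log2FC = {log2_fc[i]:+.2f}, p = {p_val[i]:.2e}")
```

Note the implicit hypothesis baked into even this "hypothesis-free" pipeline: that per-gene mean differences between the two groups are the relevant signal.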

Now these experiments are all well and good. They exploit new technologies to uncover information that we would not otherwise know, or that would be prohibitively expensive to obtain by other means. However, as Wiley points out, so-called hypothesis-free research still has a hypothesis. It is just implicit rather than explicit.

For the example experiment I discussed above, the experimenters may not enumerate in advance those genes that they expect to show the largest changes in expression. However, the implicit hypothesis is that expression changes in sets of genes -- modules of proteins -- will correlate with differences between the diseased and normal state. In many ways, the nature of the experiment implies a certain set of hypotheses.

It is always better to make hypotheses explicit, however, which is why I am uncomfortable with the idea of hypothesis-free research.

In my experience, a failure to state clear hypotheses also indicates a failure to think deeply about the problem at hand. In non-computational biology, we have a derogatory term for such poorly thought-out research: a fishing expedition. Exemplars are research grants submitted to the NIH where the authors have not explicitly stated the endpoints or how their experiments will address them. If you can't assert an interpretation even if the results are definitively positive or negative, you need to design a different experiment.

This requirement to state hypotheses explicitly improves experimental design. Before I had to write my thesis proposal, my experiments had numerous flaws, all of which came to light when I had to write them down and justify them.

Further, NIH reviewers' prejudice against such studies is justified. Funding is a limited resource, and you want to give it to the best experiments with the best chance of success. Nothing is certain in science, but having clear hypotheses is a good indicator of likely success. Even better is designing experiments where, regardless of whether you turn out to be right, you still learn something useful.

I love the new technologies being developed in biology, and I realize that they allow us to ask questions we couldn't ask before. But I can't help but think that hypothesis-free research is going to create a whole bunch of data whose interpretation is ambiguous -- and hence largely useless. We will just have to go back to hypothesis-driven research to apply our findings.


If you can't assert an interpretation even if the results are definitively positive or negative, you need to design a different experiment.

This is totally wrong. Sometimes important experiments are only informative if the outcome is definitely positive (or negative), with the opposite result being uninformative.

I think a lot of fishing expeditions are just an excuse to play with a new toy. It's more or less inevitable with microarrays, which produce far too much data anyway.

I think things will calm down a bit, once the true biologists realise that they're not learning anything by fishing. We saw it in population genetics, when molecular markers started appearing: everyone rushed to use them, and then did little other than calculate FSTs. In the meantime, let the little dears play with their toys. They'll work out what they can do with them eventually.

PhysioProf,

I think you are missing what I am saying. Yes, there are good experiments where either a positive or a negative result is clear, and the other is ambiguous. But you sometimes see people designing experiments where both the absolute positive and the absolute negative leave the interpretation still in doubt.

Basically, I am saying a bad experiment is one where you can't win either way. Winning one way is fine. Winning both ways is great. But if you can't win either way, why are you playing?

Hey, please think this through a little more, before you get on an NIH study section and become part of the problem.

Exploratory research is important too: looking at something new with open eyes. That's the way to find entirely new problems to study. When you've characterized something interesting, then you develop hypotheses.

Fruitful, interesting science is an interplay between exploration and hypothesis testing. Too much exploration, and you're just stamp collecting. Too much hypothesis testing, and you're just turning the crank.

What you've written is a justification of a status quo in federally funded science that has gone too far in the direction of risk minimization, squashing out exploration -- where every experiment has to have an explicitly stated outcome, even before you've done it. That's not science. That's an industry of timelines and deliverables.

By Sean Eddy (not verified) on 19 May 2008

I firmly believe that "fishing expeditions" are extremely important. You often don't know what will become an interesting question until you've mucked around in an interesting system for a while. Sometimes you have to find a question before you can ask it.

"Hypothesis-free" is probably a misnomer. I'd call exploratory research "hypothesis-generating" or "hypothesis-secondary." That is, it's not done with a particular hypothesis in mind, but it's not based on wild guesses either. Sean has it right -- the best way to do this is "looking at problems with open eyes," not "looking at problems but forcing your eyes shut every time you're afraid there's a testable hypothesis in there somewhere." The putative weakness of the latter approach seems like a straw-man argument; I don't know of anyone who works that way.

By Julie Stahlhut (not verified) on 19 May 2008

Both sides have it wrong. The problem with "discovery-based" research is not that there is no hypothesis. The problem is that there is no theory. In these kinds of experiments, you're not working with a firm theoretical model and trying to find out how it works -- you're trying to find arbitrary "correlations" that are unlinked to any real mechanistic explanation.

What you get in that case is a random slice through a high-dimensional problem, which will inevitably lead to correlations that are meaningless. That is a problem whether or not you have a firm "hypothesis".

Everyone does "hypothesis-free" experimentation -- you look at a system and explore it. At the end, you write up a hypothesis and confirm it with good controls, etc. Hypothesis is the clean-up stage, and fetishizing it does just as much damage as dismissing it. Both come out of an imbalance in how theoretical work and wet-bench work are valued; the opposite extreme is string theory, where some physicists have been accused of coming unmoored from experiment entirely.
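The "random slice through a high-dimensional problem" point above is easy to demonstrate with a quick simulation of my own (not the commenter's): screen enough pure-noise variables against a pure-noise outcome and "significant" correlations appear by chance alone.

```python
# Screening many noise variables against a noise outcome:
# roughly 5% will pass p < 0.05 by chance, none of them meaningful.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_samples, n_features = 20, 10000

outcome = rng.normal(size=n_samples)                 # fake phenotype
features = rng.normal(size=(n_features, n_samples))  # fake "genes"

p_vals = np.array([stats.pearsonr(f, outcome)[1] for f in features])
hits = int((p_vals < 0.05).sum())
print(f"{hits} of {n_features} noise features 'correlate' at p < 0.05")
# Expect ~500 false positives -- a random slice through noise.
```

Multiple-testing corrections help, but the deeper point stands: without a mechanistic theory, even the surviving correlations have no interpretation.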

C'mon Jake, we do discovery-driven or hypothesis-free research every time we sequence a genome, or interrogate a transcriptome, or whatever -ome is fashionable at the moment.

That's not to say that there aren't hypotheses that can be examined once you have the data in hand, just that you don't always need to have the hypotheses first.

Basically, I am saying a bad experiment is one where you can't win either way. Winning one way is fine. Winning both ways is great. But if you can't win either way, why are you playing?

Got it. Yes, I did misunderstand what you meant.

I agree with the others who have stated that there is a place for exploratory or discovery-based science, but I also agree that such approaches do need to be grounded in some conceptual theory.

Maybe it shouldn't be called research at all, but rather the development of a new data set. I'm not that familiar with biology, but in many fields of study people have been creating data sets for decades without much of a clear, or at least high-priority, purpose. With new hypotheses or research priorities coming from new directions, those data sets suddenly become quite useful. Examples include the temperature data series from Lake Baikal or the CO2 content data from Arctic ice. No one cared until global warming became an issue. In economics, people come up with a hypothesis and then try to find a data set that happens to fit.

In biology, the genome seems like a classic example. The project is to generate a huge data set and then let people come up with hypotheses to test against it later. While I agree that you could in some cases create a poor data set if it's not oriented towards certain hypotheses, in many cases you simply don't have the full set of hypotheses up-front. Generating the data is still useful.