Some comments on data and data reproduction.

Part of the problem with Science is the verification process. From the outside looking in, you may guess that there is a quick and easy solution ... data should be reproduced by others. In the end scientists should be concerned with the facts.

Alejandro Rivero comments on my entry on being scooped:

What do you mean by "Being scooped"? If the paper that comes out coincides with your research, that is good, shouldn't it be? If it proves that your research line is a failure, then really your work has been useless.

In response to my entry on Nature's new and experimental peer review system, Polly Anna asks (rhetorically):

If one really believes in the scientific approach, why not let the global peer review system of reproducibility, utility, self-correction and evaluation of investigator reputation go to work after publication, not before?

My answer to both of these comments: time and money. You see, researchers want to find out the truth, but to keep their jobs, scientists also need to make original discoveries. And the pressure is great. I know of one lab whose grant was not immediately renewed despite the fact that a Science paper, a Cell paper and a Current Biology paper were published using data produced from that grant. So yes, we can spend time checking others' facts, but we are likely to sink if we do so. And if our facts are first published by others, then there is no way that we will be able to obtain funds for future projects. I've tried to publish papers that try to resolve conflicts in a field, only to be rebuffed by advocates from one camp proclaiming "these findings have already been shown" and "this paper does not advance the field sufficiently to warrant publication". It set me back many months. Is this good for my career? And I was not checking data, but testing two opposing hypotheses in mammalian cells (previously most work was done in frog oocytes).

There is a nice comment on this topic in the latest Nature Cell Biology:

When publication pressures were more manageable, and before the old adage 'publish or perish' emerged as a primary driving force in the molecular biosciences, reproducibility was still a key step in the scientific process. Scientists at the beginning of their academic career often learned their craft by reproducing published data -- a bit like teaching art by copying the great masters.

These days, scientists request reagents from each other more than ever; however, the primary aim is not to reproduce, but to move to the next step. It is exceedingly difficult to convince a postdoc to spend months reproducing a complex set of experiments when the outcome is either an unpublishable confirmation, or a lack of confirmation, which would require much more work to ensure that the case made is watertight and often result in the publication of an abbreviated refutation (see editorials, May and November 2005). The PI will worry about the significant drain on resources that a rock solid refutation requires, and the drain on morale that may result from a protracted fight for acceptance of negative data by the original author and the broader community.

Consequently, competitive labs are not often motivated to reproduce data; more importantly, it is not something they are encouraged to do.

I agree; the biomedical science field is extremely competitive. There is too much pressure to move on and make a new discovery. Consequently data is not often reproduced directly. But I'm not worried. Often data is reproduced indirectly. And even more often, many labs are working on the same project.

So the answer? Well, curiously, the method of publishing could adapt. Again from the NCB comment:

One way to address this would be to allocate a percentage of the time of each lab and researcher solely for independent data confirmation. Granting agencies should take these endeavours seriously and give credit for documented evidence of data reproduction. Initiating an online repository for this data would also be worthwhile and the confirmatory nature of the data may allow for curation without full-blown peer review.

Online repository of data. (I guess this is what Polly Anna was suggesting).

Is this the future of scientific publishing? Many have suggested scrapping the old journal process in favor of an online repository. This would shortcut the whole journal issue and the problems of peer review. But there are two problems with this.

1) How to transition from one system to the next. Just like evolution, human endeavours can't stop in mid stride and perform a 180 degree turn. Publishing data is the most important part of maintaining your viability as a scientist. Asking people to quickly abandon the old system in favor of a new one will never work ... UNLESS those that provide research money (the NIH, NSF) deem that data published in online repositories "count".
2) Journals are a hierarchical system. Things are published in "high" journals because they are (on average) more important than findings in "lower" journals. Sure there are problems (yes, lots of problems) with the current review system, but important findings do generally get published in better journals. Self-sorting plays a big part in this. People send their best work to the big journals. Sometimes it's published there, other times in slightly "lower" journals. If every experiment and thought is published online, we'll be drowning in crap. We could not browse through important work easily. Here's a comment from Mark Paris:

My world, a contractor-government-program world, has lots of technical and scientific publication, mainly in the form of reports and briefings, essentially none of which is peer reviewed. As a result, there is a tremendous amount of garbage that is presented both to the community and to the government sponsors, and there is no good way for someone not technically competent in the area to tell the difference. A great deal is self-serving, but some comes simply from lack of technical oversight by competent scientists (I was about to say pure stupidity, but that would be ungenerous.)

Many might say that in academia, we are already drowning. So then how to search this online repository? PubMed and search engines help, but only if you are looking for something in particular. Flipping through Science, Nature, Cell, JCB etc., on the other hand, can lead you to interesting papers that, although not directly relevant to your research, may provide key insights to further it. Good scientists will always flip through the major journals. It's essential to remain aware of what is going on in your field. One solution is to have the papers "rated" by readers, but then whose vote counts? Some resources may have opinions on what is important, and indeed we have such publications: literature reviews. Other sources are websites such as Faculty of 1000. But these are just introductions to fields. If you are really studying a particular process, you need to read the original data as it gets published.

**************************

In thinking about this whole topic there are two issues.
What is good for science.
What is fair for scientists. (who gets credit, who gets research funding)

I think that this is the way we need to think about publishing. These two issues are not necessarily in agreement. Right now we need to discuss peer review, reproduction of data, journals, online access etc. with these two points in mind. I know I'm copping out by leaving the discussion there. I'll try to write another entry on this soon. But tell me what you think ...

I'll let Polly Anna have the last word:

In the end, if you publish poop, you will be judged as a pooper, and that is the real value of peer review.

As UndergradChemist and I pointed out in comments to your previous post, an online repository of preprints for physicists exists in the form of arXiv.org. So, I guess we can learn from how it is working for them. Although I can't claim that I'm sufficiently familiar with how it works, I think it's common for a paper to first appear in arXiv and later to be published in a journal (Phys. Rev. Lett., etc.). Why can't that work for biology? Journals can still have a role of filtering worthy papers. Authors will still be motivated to publish in "high" journals, where their papers will get more recognition.

Hi HI (I just had to write that),

You and others have pointed out arXiv.org in several posts. I've read some of the online lit on it and I was going to write something about it at some point. In some ways Harold Varmus wanted to create something similar to this early on. He had a hard time selling his ideas to the biomedical science community. Later his ideas changed to add an element of peer review within the system, and that is how PLoS was born. See this entry on the recent article in WIRED on Varmus.

One question I do have is how do the major journals treat manuscripts that come out first in arXiv.org? (Maxine if you're reading this ....) I know that Nature and Science have embargoes on even discussing publications before they are out in print. Also if journals are against open access, how would they see arXiv.org?

Also the biomedical community is a lot larger than the physical science community, and (I think) a lot rowdier. Competition is very tough. To attempt risky projects is allowable, but to publish the results from those projects in a risky manner is another thing, and hard to justify on an individual basis. You may say, professor X is well established, what does he have to lose? But the postdoc or grad student who did the work has a career at stake.

I am aware that Harold Varmus wanted to create something similar to arXiv. But while I applaud the creation of PLoS, I think it is more similar in nature to more conventional journals than something like arXiv, except that it is freely accessible online; it is a new kind of journal, but it is another peer-reviewed journal nonetheless.

I found a list of physics papers most cited in 2005. You can see that many of them are available from arXiv, even though they are all published in various journals. (There is a paper titled "HI in the galaxy" ranked #33. :-))

There is definitely a cultural difference between the physics community and the biomedical community. (I suspect that the "medical" part is particularly problematic.) In life sciences, there is much more emphasis on publishing in "high" journals (Nature, Science, and Cell). As a result, they have too much power. But I think if there is enough incentive to claim priority, people may choose to make their manuscripts openly accessible rather than risk having their papers rejected or their ideas stolen by the reviewers/competitors. And if that becomes a trend, Nature and Science may have to reconsider their policies.

The essential issue is how to evaluate a scientist (for funding, hiring, etc.). Not all important papers are published in Nature, Science and Cell. The papers that led to a Lasker Award to McCulloch and Till were published in Rad. Res. Major papers of another Lasker winner, Yoshio Masui, were published in J. exp. Zool.

I agree. In part it's laziness in biology, in part it's because the life sciences encompass many more disciplines. So in many cases, if it's not your field of expertise, it's very hard to judge the importance of a certain individual's body of work.

I also must say that it has little to do with the "medicine" part - for example what we do here in our lab is truly basic research and we have little to do with any direct clinical applications.

Putting a preprint online does not violate Nature's embargo policy (nor Science's I think). My understanding is that you run into problems only if the popular press picks up on the preprint. I have seen that happen before but it's rare. It was probably done through the university's PR department with the researchers' knowledge, but I can't be sure. In any case, I think you can still put a preprint-looking version of a paper online even after it's been published for archival/access purposes.