Is it crazy to consider community curation?

Or is it just an idea that's ahead of the curve?

Last week, I was stunned to discover that at least 31 papers were listed in the NCBI Gene database entry for the wrong gene. I wrote about this here, here, here, and here.

Now, an oversight like this is somewhat understandable. The titles of these articles do name the wrong gene (DRD2, the dopamine D2 receptor). And it was only four years ago that people figured out that the marker named in those titles actually maps somewhere else.

If computers were responsible for the annotation, well, this would be understandable. The annotation programs would most likely add the article titles without ever noticing that some of them contradict each other.

But unlike GenBank, the NCBI Gene database is supposed to be a curated database.

And there is a mechanism for the community to help out. During the last four years, people have published papers that placed the marker in the right gene. Clearly, I wasn't the first person to find the problem: with the citation, if not with the genetic mapping itself.

To me, this situation seems analogous to my days in the lab. We didn't always live up to this ideal, but there was an implicit ethical standard: if you broke a piece of lab equipment or found a piece of broken equipment, you had some responsibility either to fix it or to make sure it got fixed.

I think it might be time to view our shared informational resources in a similar way. Certainly the authors who published later papers on the TaqI polymorphism knew where the marker mapped and knew that the NCBI Gene database had it wrong. Just as responsible people do in the lab, these authors could have acted responsibly in the information world and submitted corrections to the GeneRIFs.

Maybe this whole idea is absurd. With thousands of databases, researchers can't be expected to keep track of every place their data has gone. However, since many databases pull and repackage subsets of information from the NCBI collection anyway, maybe there could be a few NCBI databases where everyone shares the responsibility and the work of keeping them up to date.

It could happen.

What will it take?


It is definitely NOT crazy to consider community curation. We are doing exactly that on ChemSpider every day, using the crowd to help validate the data:

http://www.chemspider.com/docs/Whitney-Symposium-Lecture-June-2008.pdf

http://www.chemspider.com/blog/is-community-curation-of-chemspider-educ…

There is also now a call for a million minds to annotate proteins. http://conceptweblinker.wikiprofessional.org/default.py?url=nph-proxy.c…

I think it's a fine idea...but not everyone agrees.

A pilot project to mobilize the fission yeast community to curate their own papers has been very successful. The community recognizes the advantages of broadly and accurately curated genome data and has eagerly participated. Currently, the major obstacles to rolling out this initiative to all published papers are the lack of a user-friendly interface that makes curation as easy as possible for submitters, and the lack of personnel to handle the submissions (at present, curator input is still required to refine user submissions to fit the data models).
http://www.sanger.ac.uk/Projects/S_pombe/community_curation.shtml

Thanks, Val! I just attended a conference in June on this subject. It's good to hear about the pilot project at Sanger, too!