Is it crazy to consider community curation?

or is it just an idea that's ahead of the curve?

Last week, I was stunned to discover that at least 31 papers cited in an NCBI Gene database entry were attached to the entry for the wrong gene. I wrote about this here, here, here, and here.

Now, an oversight like this is a little understandable. The titles of the entries do include the name of the wrong gene (DRD2 - the dopamine D2 receptor). And it was only four years ago that people figured out that the marker in the title of the articles mapped somewhere else.

If computers were responsible for the annotation, well, this would be understandable. The annotation programs would most likely add the article titles without ever noticing that some of them contradict each other.

But unlike GenBank, the NCBI Gene database is supposed to be a curated database.

And there is a mechanism for the community to help out. People have published papers during the last four years that had the marker in the right place. Clearly, I wasn't the first person to find the problem - with the citation, if not with the genetic mapping itself.

To me, it seems like this situation is analogous to my days in the lab. We didn't always live up to this ideal, but there was an implicit ethical standard that said that if you broke a piece of lab equipment or found a piece of broken equipment, you had some responsibility either for fixing it or for seeing it get fixed.

I think it might be time to view our shared informational resources in a similar way. Certainly the authors who published later papers on the TaqI polymorphism knew where the marker mapped and knew that the NCBI Gene database had it wrong. Just like the people who act responsibly in the lab, these authors could have acted responsibly in the information world and submitted corrections to the GeneRIFs.

Maybe this whole idea is absurd. With thousands of databases, researchers can't be expected to keep track of every place that their data has gone. However, many databases pull and repackage subsets of information from the NCBI collection anyway, so maybe there could be a few NCBI databases where everyone shares the responsibility and work of keeping them up-to-date.

It could happen.

What will it take?


It is definitely NOT crazy to consider community curation. We are actively doing that on ChemSpider every day using the crowds to assist in the validation of the data:

http://www.chemspider.com/docs/Whitney-Symposium-Lecture-June-2008.pdf

http://www.chemspider.com/blog/is-community-curation-of-chemspider-educ…

There is also now the request for a million minds to annotate proteins. http://conceptweblinker.wikiprofessional.org/default.py?url=nph-proxy.c…

I think it's a fine idea...but not everyone agrees.

A pilot project to mobilize the fission yeast community to curate their own papers has been very successful. The community recognize the advantages of broadly and accurately curated genome data and have eagerly participated. Currently, the major obstacles to rolling out this initiative to all published papers are the lack of a user-friendly interface to make the curation process as easy as possible for the submitters, and the lack of personnel to handle the submissions (at present, curator input is still required to refine user submissions to fit the data models).
http://www.sanger.ac.uk/Projects/S_pombe/community_curation.shtml

Thanks, Val! I just attended a conference in June on this subject. It's good to hear about the project at Sanger, too!