Is it crazy to consider community curation?

By sporte on June 24, 2008.

or is it just an idea that's ahead of the curve?

Last week, I was stunned to discover that at least 31 papers cited in an NCBI Gene database entry had been filed under the wrong gene. I wrote about this here, here, here, and here.

Now, an oversight like this is a little understandable. The titles of the articles do include the name of the wrong gene (DRD2, the dopamine D2 receptor). And it was only four years ago that people figured out that the marker named in those titles mapped somewhere else.

If computers were responsible for the annotation, well, this would be understandable. The annotation programs would most likely add the article titles without ever noticing that some of them contradict each other.

But unlike GenBank, the NCBI Gene database is supposed to be a curated database.

And there is a mechanism for the community to help out. People have published papers during the last four years that had the marker in the right place. Clearly, I wasn't the first person to find the problem - with the citation, if not with the genetic mapping itself.

To me, it seems like this situation is analogous to my days in the lab. We didn't always live up to this ideal, but there was an implicit ethical standard that said that if you broke a piece of lab equipment or found a piece of broken equipment, you had some responsibility either for fixing it or for seeing it get fixed.

I think it might be time to view our shared informational resources in a similar way. Certainly the authors who published later papers on the TaqI polymorphism knew where the marker mapped and knew that the NCBI Gene database had it wrong. Just like the people who act responsibly in the lab, these authors could have acted responsibly in the information world and submitted corrections to the GeneRIFs.

Maybe this whole idea is absurd. With thousands of databases, researchers can't be expected to keep track of every place that their data has gone. However, many databases pull and repackage subsets of information from the NCBI collection anyway, so maybe there could be a few NCBI databases where everyone shares the responsibility and the work of keeping them up-to-date.

It could happen.

What will it take?

It is definitely NOT crazy to consider community curation. We are actively doing that on ChemSpider every day using the crowds to assist in the validation of the data:

http://www.chemspider.com/docs/Whitney-Symposium-Lecture-June-2008.pdf

http://www.chemspider.com/blog/is-community-curation-of-chemspider-educ…

There is also now the request for a million minds to annotate proteins. http://conceptweblinker.wikiprofessional.org/default.py?url=nph-proxy.c…

I think it's a fine idea...but not everyone agrees.

A pilot project to mobilize the fission yeast community to curate their own papers has been very successful. The community recognize the advantages of broadly and accurately curated genome data and have eagerly participated. Currently the major obstacles to rolling out this initiative to all published papers are the lack of a user-friendly interface that makes the curation process as easy as possible for the submitters, and the lack of personnel to handle the submissions (at present curator input is still required to refine user submissions to fit the data models).
http://www.sanger.ac.uk/Projects/S_pombe/community_curation.shtml

Thanks Val! I just attended a conference in June on this subject. It's good to hear about the work at Sanger, too!
