The Implementation of Molecular Evolution for the Masses

A couple of years ago, there was talk in the bioblogosphere about getting the general public interested in bioinformatics and molecular evolution.

The idea was inspired by the findings of armchair astronomers -- people who have no professional training, but make contributions to astronomy via their stargazing hobbies. With so much data available in publicly accessible databases, there's no reason we can't motivate armchair biologists to start mining for interesting results.

But how do we train these new comp-bio code-monkeys? Bioinformatics requires both computational skills and an understanding of biology, and finding people with both skill sets (and interests) can be tricky. Well, a recent paper in PLoS Biology lays out a framework for teaching those skills (doi:10.1371/journal.pbio.0060296). The authors present Annotathon, a web-based interface through which students apply standard online tools for DNA sequence analysis.

The course described in the paper takes advantage of the vast amount of data deposited in sequence repositories from metagenomic projects (specifically, the Global Ocean Sampling sequences). Starting with these data, the students perform simple molecular evolutionary analyses, including gene prediction, sequence alignment, and phylogenetic tree construction. Here's how the authors summarize their course:

The goal of the course is to teach students how to computationally annotate biological sequences (DNA and protein sequences). The starting point is a short stretch of DNA sequence (such as a single metagenomic sequencing read) that students are asked to study according to two major lines of inquiry: (1) prediction of gene product putative function and (2) prediction of taxonomic group of origin.
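To give a feel for what the gene prediction step involves, here is a minimal Python sketch of a six-frame open reading frame (ORF) search over a short stretch of DNA. It is not the Annotathon's code or the tool the students use; the function names (`reverse_complement`, `find_orfs`), the `min_codons` cutoff, and the example read are all made up for illustration. It assumes the standard genetic code and only reports complete ORFs that start with ATG and end at a stop codon, whereas real metagenomic reads often contain partial genes that run off either end.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=3):
    """Return (strand, frame, protein) for each complete ATG...stop ORF.

    All three reading frames on both strands are scanned. min_codons is a
    hypothetical cutoff on protein length, chosen only for this toy example.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = None  # None means we are not inside an ORF
            for i in range(frame, len(seq) - 2, 3):
                # Codons with ambiguous bases (e.g. N) translate to "X".
                aa = CODON_TABLE.get(seq[i:i + 3], "X")
                if protein is None:
                    if aa == "M":
                        protein = "M"  # start codon opens an ORF
                elif aa == "*":
                    if len(protein) >= min_codons:
                        orfs.append((strand, frame, protein))
                    protein = None  # stop codon closes the ORF
                else:
                    protein += aa
    return orfs


if __name__ == "__main__":
    # Made-up read, not a real Global Ocean Sampling sequence.
    read = "CCATGAAATTTGGGTAACC"
    for strand, frame, protein in find_orfs(read):
        print(strand, frame, protein)
```

In the course itself, a predicted protein like this would then be handed to similarity searches, alignment, and tree-building tools to address the two lines of inquiry in the quote: putative function and taxonomic origin.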

The question remains: how can we translate these courses offered at universities to the general public? Can we inspire armchair computational biologists to analyze data outside of the classroom?


Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D, et al. (2008) Metagenome Annotation Using a Distributed Grid of Undergraduate Students. PLoS Biol 6(11): e296 doi:10.1371/journal.pbio.0060296


Very interesting. There is a long tradition of this sort of thing in natural history and there are groups out there who promote 'citizen science'. Christmas Bird Counts and Adopt a Pond (a Toronto Zoo + other partners program to monitor wetlands) are some contemporary examples. Perhaps starting with general citizen science groups and then promoting their discoveries would help it catch on. I think once people realize that amateurs can make contributions to a field that otherwise seems impenetrably complex and technologically inaccessible, those who are interested would recruit themselves.
