BLASTing through the kingdom of life

No biology course is complete these days without learning how to do a BLAST search.

Here I describe an assignment and an animated tutorial that teachers can readily adopt, and I explain how teachers can obtain the password-protected answer key.

Development of the tutorial and the activity was supported by funding from the National Science Foundation.

This is reposted from the original DigitalBio blog.

This popular activity, designed to accompany the BLAST for beginners tutorial, has been updated to incorporate student comments and teacher requests. Originally developed for the BIO 99 teacher workshop, this activity has been one of the most popular items on Geospiza's web site. We have seen the activity used in several venues from high school courses to workshops for researchers offered by the Lawrence Livermore National Laboratory.

Students BLAST through the kingdom of life by using blastn to identify 16 "unknown" sequences. The 16 sequences were chosen to represent diverse organisms, ranging from RNA viruses that infect yeast to humans. To minimize confusion, the set was compiled from a mixture of cDNA sequences and intron-less sequences from bacteria or viruses. Further, every sequence in this set codes for some kind of protein that might be recognizable to students, such as amylase (an enzyme found in spit that breaks down starch) or DNA polymerase (which makes DNA). This version of the activity, updated last summer, includes an example sequence along with the answers.

There are three pieces of information needed for this activity. These are:

1. The taxonomically diverse sequences, located in the Data Set section.

2. A worksheet and answer key, located in the Worksheet section.

3. The BLAST for beginners animated tutorial, located at the top of the Tutorial section.

All of these sections are part of Geospiza's Bioinformatics Teaching Materials.

Unlike "canned" activities, this one has students use real sequences and real databases. Since new information is continually added to the databases, the exact results of a database search change over time, even though the sequence itself and its original source do not.

On one hand, this can be disconcerting when it's unexpected. On the other hand, knowing that these are living and changing resources is exciting. Students know when they use these resources and programs that they're not using old or simplified techniques that are only employed in a classroom setting.

An unfortunate consequence is that grading gets a bit more challenging. The continual addition of information to the NCBI databases, used in this activity, means that some information that's unknown today might be known tomorrow. The majority of the answers in our key will not change - but new information might be added. Our current plan is to update the answers on a yearly basis, or when we're alerted to problems.

The answer key is password-protected to limit access by students. If you wish to get the password, send an e-mail to digitalbio at with your name, position, and the name of your school.


I was wondering if you could suggest a simple exercise (re: bioinformatics) that I can give to my Genetics students? I will teach the genetics lab in Spring 2009. Our department is not using a particular lab manual, so I can easily incorporate additional exercises for my students. Thanks!

By Veronica Allen (not verified) on 30 Oct 2008

Hi Veronica,

As an introduction, I like this activity (BLASTing through the kingdom of life) or Head, Shoulders, Knees, and Toes. I have a worksheet that goes with these activities at my web site. I also have data sets and an animated tutorial that shows how to use BLAST at the NCBI.

The reason I like BLAST is that it's a very commonly used program for comparing sequences. I also like to use nucleotide BLAST because it's easier for students to understand the scoring methods. Basically, 2 points are assigned for every matching base, and then an E value is calculated from that score, the length of the sequence, and the size of the database.
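For anyone who wants to show students the arithmetic, here is a minimal Python sketch of that idea. It is not the BLAST source code. It assumes the standard Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and S is the raw score. The mismatch penalty of -3 and the lambda and K values used in the demo are placeholders I picked for illustration; BLAST reports the values it actually used in its search statistics. The sketch also ignores gaps and the length corrections that the real program applies.

```python
import math


def raw_score(matches, mismatches, reward=2, penalty=-3):
    """Score an ungapped nucleotide alignment.

    reward=2 follows the description above; penalty=-3 is an assumed
    mismatch cost used only for illustration.
    """
    return matches * reward + mismatches * penalty


def expect_value(score, query_len, db_len, lam, k):
    """Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S).

    lam and k depend on the scoring system; callers must supply them.
    """
    return k * query_len * db_len * math.exp(-lam * score)


if __name__ == "__main__":
    # Hypothetical example: 50 matching bases, no mismatches.
    s = raw_score(matches=50, mismatches=0)
    # Placeholder statistics and database size, chosen for illustration.
    e = expect_value(s, query_len=500, db_len=1_000_000, lam=0.625, k=0.41)
    print(f"raw score = {s}, E value = {e:.3g}")
```

Two things are worth pointing out to students: a longer or better match raises the score and shrinks the E value exponentially, while doubling the size of the database doubles the E value for the same score. That second point is one reason the numbers students see today may differ from the numbers they would see next year.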

Anyway, it's kind of cool because you find lots of interesting things.