My thoughts on biology, teaching, life, and exploring the living world via the digital one. These postings represent only my opinions; they do not represent the viewpoints of any funding agency or Geospiza, Inc.
I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st century biology (Geospiza Education).
Workforce shortages are a growing problem in the biotech industry. Communities are concerned that a lack of trained workers will either keep companies away or cause companies to move. If companies do have to move, it's likely those jobs might be lost forever, never to return. According to Robert Reich, former U.S. secretary of labor, now a professor at UC-Berkeley, biotech companies that can't hire in the U.S. will recruit foreign workers or open research centers overseas (Luke Timmerman, Seattle PI).
Dave Robinson and Joann Lau from Bellarmine College in Kentucky are going to be describing their student project in a free webinar next Friday, May 16th. Their students clone GAPDH (Glyceraldehyde 3-Phosphate Dehydrogenase) genes from new plants, assemble the DNA sequences, and submit them to the NCBI. Here's an example.
Plus, since GAPDH is highly conserved, it's a great model for looking at evolution.
Community colleges are such extraordinary places that even California's governor, Arnold Schwarzenegger, credits his time at Santa Monica Community College as one of the secrets to his success.
"People always ask me 'What is the secret of your success?' " he said Tuesday. "I always say, 'Come to America. Go to community college. And marry a Kennedy. It's all very simple.' "
Last week the Senate passed the Genetic Information Nondiscrimination Act (GINA). This week it was passed by the House. It only needs one signature and GINA will become law.
For years, those of us who teach genetics have had to caution students about genetic testing. The biggest reason was the fear that having a genetic test would cause them to lose their health insurance.
Yesterday morning I was sitting at a conference table, downing coffee to keep my eyes open, when I heard someone say that it's springtime now and the snakes are waking up. Well, those kinds of statements at the breakfast table do have a way of getting my attention.
APRIL was so much fun that I thought I should find a molecule for May. I searched the Gene database, the structure database, everywhere, without any luck.
Finally, I decided to change the search and use the date instead of the name of the month. And here we have it, straight from PubChem. A molecule for May. 05012008 is the compound substance ID.
A potential link between lung cancer and human papilloma virus may make parents even more glad about vaccinating their children with Gardasil®. Not only are the children protected against viruses that commonly cause cervical cancer, they may be protected against some forms of lung cancer as well.
The April 25th version of Nature News reports (1) that two viruses, HPV (Human papilloma virus) and measles virus, have been found in lung tumors.
Over 2600 genetic diseases have been found where a change in a single gene is linked to the disease. One of the questions we might ask is how those mutations change the shape, and possibly the function, of a protein.
If the structures of the mutant and wild type (normal) proteins have been solved, NCBI has a program called VAST that can be used to align those structures. I have an example here where you can see how a single amino acid change makes influenza resistant to Tamiflu®.
The 4 minute movie below shows how we can obtain those aligned structures from VAST and view them with Cn3D.
Bill Gates, Eric Lander, Maynard Olson, Leena Peltonen, and George Church fielded questions last night at a fascinating panel discussion on personal genomics at the University of Washington.
We were fortunate to be in the audience. I'll share some of the questions and answers, in some cases shortened and paraphrased.
One of my favorite web 2.0 technologies is the webinar. When you work at a company and not a University, with constant seminars, it gets a bit harder to hop on a bus and travel across town to learn about new things. Webinars are a good way to fill that gap. I grab my coffee cup, put on my headphones, and I get to listen to someone tell me about their work for an hour and show slides over the web. It's nice.
Our company is even going to be involved in two webinars in the next two months. One of us is giving an Illumina webinar tomorrow on managing Next Generation Sequencing data. A description of the webinar and digital gene expression workflows is here.
Next month, yours truly will be assisting (if needed) in a science education webinar on cloning novel plant genes and using bioinformatics to sort out good and bad data and figure out what you've cloned. You can register for that one here.
In the class that I'm teaching, we found that several PCR products, amplified from the 16S ribosomal RNA genes from bacterial isolates, contain a mixed base in one or more positions.
We picked samples where the mixed bases were located in high quality regions of the sequence (Q > 40), and determined that the mixed bases most likely come from different ribosomal RNA genes. Many species of bacteria have multiple copies of 16S ribosomal RNA genes and the copies can differ from each other within a single genome and between genomes.
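For readers who haven't worked with quality values before, here is a minimal Python sketch of what a cutoff like Q > 40 means. It assumes the Q values are standard Phred quality scores, where Q = -10 × log10(P) and P is the estimated probability that a base call is wrong. The function names are made up for illustration and aren't part of any sequencing software.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score Q into the estimated probability
    that the base call is wrong (assumes the standard Phred scale)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability back into a Phred quality score."""
    return -10 * math.log10(p)


# Show roughly how often a base call is expected to be wrong at each score.
for q in (20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: about 1 error in {round(1 / p):,} base calls")
```

On this scale, a base with Q = 40 has roughly a 1 in 10,000 chance of being miscalled, which is why a mixed base in such a region is unlikely to be a simple sequencing error.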
Now, in one of our last projects we are determining where the polymorphic bases map within the structure of the 30S ribosomal subunit (see a video for background information on ribosomes here).
This video shows how we align the sequences and find the polymorphic sites in the three dimensional structure.
Ribosomes are molecular machines that build new proteins. This process of synthesizing a protein is also known as translation.
Many antibiotics prevent translation by binding to ribosomal RNA. In the class that I'm teaching, we're going to be looking at ribosome structures to see if the polymorphisms that we find in the sequences of 16S ribosomal RNA are related to antibiotic resistance.
This is related to our metagenomics project where we investigate the polymorphisms we find in 16S ribosomal RNAs.
I know some of you enjoy looking at data and seeing if you can figure out what's going on.
For this Friday's puzzler, I'm going to send you to FinchTalk, our company blog, to take a look at lots of data from a resequencing experiment that was done to look for SNPs and count alleles. The graph is at the end of the post.
The graph shows data from 4608 reads (sequenced from both strands, forward and reverse). And there are some interesting patterns. Can you figure them out?
I love using molecular structures as teaching tools. They're beautiful, they're easy to obtain, and working with them is fun.
But working with molecular structures as an educator can present some challenges. The biggest problem is that many of the articles describing the structures are not accessible, particularly those published by the ACS (American Chemical Society). I'm hoping that the new NIH Open Access policy will include legacy publications and increase access to lots of publications about structures.
This morning I had a banana genome, an orange genome, two chicken genomes (haploid, of course), and some fried pig genome, on the side. Later today, I will consume genomes from different kinds of green plants and perhaps even a cow or fish genome. I probably drank a bit of coffee DNA too, but didn't consume a complete coffee genome since my grinder isn't that powerful and much of the DNA would be trapped inside the ground up beans.
Of course, microbes have genomes, too. But I do my best to cook those first.
So, what is a genome? Is it a chromosome? Is it one of those DNA fragments or sequences that people are always writing about?
Both my students and I have been challenged this semester by the diversity of computer platforms, software versions, and unexpected bugs. Naturally, I turned to the world and my readers for help and suggestions. Some readers have suggested we could solve everything by using Linux. Others have convincingly demonstrated that Open Office is a reasonable alternative.
But, now there's something new and cool on the web.
Our new Scibling, Jane, is a real life computer scientist. If you've ever wondered what computer scientists really do during the day, Jane will set you straight (I guess they're not playing Nintendo. Darn! Another illusion shattered, just like that.)
If you're old enough or you've taken microbiology, there's a chance that sometime in your life you heard of Legionnaires' disease.
This disease was caused by bacteria that inhabited the air conditioners in a hotel where several veterans held a conference. Naturally, it was the microbiologists who collected samples of the bacteria and figured out what was going on.
Now, there's something else going on and I'm thankful to Mike for letting us know.
I made this video (below the fold) to illustrate the steps involved in making a phylogenetic tree. The basic steps are to:
Build a data set
Align the sequences
Make a tree
In the class that I'm teaching, we're making these trees in order to compare sequences from our metagenomics experiment with the multiple copies of 16S ribosomal RNA (rRNA) genes that we can find in single bacterial genomes. Bacteria contain between 2 and 13 copies of 16S rRNA genes and we're interested in knowing how much they differ from each other. Later, we'll compare the 16S ribosomal RNA genes from multiple species of bacteria to see how much these genes differ between a variety of bacteria.
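If you'd like to see the "how much do they differ" question as a calculation, here is a rough Python sketch. It is not the method shown in the video. It assumes the sequences have already been aligned to the same length, with "-" marking gaps, and it simply counts mismatched positions for every pair. The sequence names and bases are invented for illustration.

```python
from itertools import combinations

# Hypothetical, already-aligned fragments from three 16S rRNA gene copies.
# These sequences are made up; real ones would come from an alignment program.
aligned = {
    "copy_A": "ACGTTGCA-A",
    "copy_B": "ACGTTGCATA",
    "copy_C": "ACGATGCATA",
}


def count_differences(seq1, seq2):
    """Count positions where two aligned sequences differ, ignoring gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("Aligned sequences must be the same length")
    return sum(
        1
        for a, b in zip(seq1, seq2)
        if a != b and a != "-" and b != "-"
    )


def pairwise_differences(seqs):
    """Return {(name1, name2): number of mismatches} for every pair."""
    return {
        (n1, n2): count_differences(seqs[n1], seqs[n2])
        for n1, n2 in combinations(seqs, 2)
    }


for pair, diffs in pairwise_differences(aligned).items():
    print(pair, diffs)
```

Tree-building programs start from this kind of pairwise comparison, although they generally use more careful distance measures than a raw mismatch count.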
Believe it or not, there is a remote possibility that I may get to have some influence in getting a web application built that I can use in teaching and that will do something I want.
Unfortunately, I know very little about the relative merits of AJAX/JavaScript vs. Flash vs. a custom C++ plug-in that does something with wxWindows or Qt.
Have you ever wondered how to view and annotate molecular structures? At least digital versions?
It's surprisingly easy and lots of fun.
Here's a movie I made that demonstrates how you can use Cn3D, a free structure-viewing program from the NCBI. Luckily, Cn3D behaves almost the same way on both Windows and Mac OS X.
I'll be there, doing some kind of bioinformatics workshop. I'll probably be talking about either metagenomics or comparing protein structures and drug resistance, but if you have topic requests, feel free to submit them in the comments.
One of my colleagues has a two part series on FinchTalk (starting today) that discusses uncertainty in measurement and what that uncertainty means for the present and Next Generation DNA sequencing technologies.
I've been running into this uncertainty myself lately.
Conflicts between predators like cougars and coyotes and human companions like pets and small children are becoming more common as people move into areas that used to be wildlife habitat.
The Seattle Times has a great story this morning about biologists in Washington who are studying cougars to learn if cougars and people can coexist. The biologists think most of the trouble might be caused by teenage male cougars who move into the territory when the older, smarter males get killed.
It's a good thing my 13 yr old doesn't read my blog.
Why? Because I'm on to her. Being a biologist, well, acronyms are my life. And, for a long time, I've been able to interpret some of the lingo that she uses on AIM. Lately, we've certainly been having our little talks about cell phone bills for texting and the things she can do around the house to earn the right to keep her phone.
I've been writing quite a bit this week about my search for a cross platform spreadsheet program that would support pivot tables and make pie graphs correctly.
This all started because of a bug that my students encountered in Microsoft Excel, on Windows. I'm not personally motivated to look for something new, since Office 2004 on Mac OS 10.5 doesn't seem to have the same bugs that appear on Windows. However, I would like things to work for my students. Since I don't want to have to write instructions for every software system on the planet, Google Docs would be my ideal answer, if it supported pivot tables, since it runs over the web and presents a consistent interface on all systems.
Anyway, I heard from many enthusiastic Open Office fans and I asked if they could show me if OO has the features I want (pivot tables and pie graphs).
I think all of us (me, the students, the OO advocates, a thoughtful group of commenters, some instructors) learned some things that we didn't anticipate the other day and got some interesting glimpses into the ways that other people view and interact with their computers.
Some of the people who participated in the challenge found out that it was harder than they expected.
I've held off on blogging about Next Generation Sequencing here, but now that one of my colleagues has started blogging about it, it seems like a good time to write a little about FinchTalk, our company blog.
We've decided that we can serve an educational role for people who are interested in Next Generation DNA Sequencing.
Earlier this week, I wrote about my challenges with a bug in Microsoft Excel that only appears on Windows computers. Since I use a Mac, I didn't know about the bug when I wrote the assignment and I only found out about it after all but one of my students turned in assignment results with nonsensical pie graphs.
So, I asked what other instructors do with software that behaves differently on different computing platforms. I never did hear from any other instructors, but I did hear from lots of Linux fans. And, lots of other people kindly informed me that I could use OpenOffice and that it runs and behaves exactly the same on Macs, Linux, Windows XP, and VISTA.
One commenter, Chris Miller, even went the extra mile and made a very nice screencast demonstrating how to use OpenOffice to count unique data. The screencast didn't show the things that I need students to do, but it did give me an idea.
I read about this in Bio-IT World and had to go check it out. It's called the Genome Projector and it has to be the coolest genome browser I've ever seen.
They have 320 bacterial genomes to play with. Naturally, I chose our friend E. coli. The little red pins in the picture below mark the positions of ribosomal RNA genes (It's not perfect, at least one of these genes is a ribosomal RNA methyltransferase and not a 16S ribosomal RNA.)
I'm not entirely happy about finding it now, after I've already written and posted all the assignments for my class, but still, I'll post a link for my students since it's just so cool and it's kind of a neat way to see all the ribosomal RNA genes in one picture.
Your canopy is disappearing, you're likely to freeze.
NASA's Earth Observatory reports that over 1,110 acres of forest were illegally logged, during the past four years, in the Monarch Butterfly Biosphere Reserve in central Mexico.
Monarch butterflies travel here from all over the United States and Canada. Images from the Ikonos satellite tell us, though, that future migrating butterflies are likely to have problems in this reserve. The top image is from 2004, the bottom image shows what things are like now.
The other day, I wrote that I wanted to make things easier for my students by using the kinds of software that they were likely to have on their computers and the kinds that they are likely to see in the business and biotech world when they graduate from college.
More than one person told me that I should have my students install an entirely different operating system and download OpenOffice to do something that looks a whole lot harder in OpenOffice than it is in Microsoft Excel.
I guess they missed the part where I said that I wanted to make the students' work a little easier.
The NASA Earth Observing System is an incredible resource for both science and education. One of the amazing things about it is that all the different kinds and quantities of data are assembled together into pictures that even grade school kids can immediately comprehend.
How do they do it?
Each of the EOS satellites delivers a terabyte or more of data per day from many different instruments.
How do they take satellite imagery, rainfall statistics, temperature information, and other kinds of data and assemble these data into meaningful pictures?
Three (or more) operating systems times three (or more) versions of software with bugs unique to one or more systems (that I don't have) means too many systems for me to manage teaching.
Thank the FSM they're not using Linux, too. (Let me see, that would be Ubuntu Linux, RedHat Linux, Debian Linux, Yellow Dog Linux, Vine, Turbo, Slackware, etc. It quickly gets to be too exponential.) Nope, sorry, three versions of Microsoft Office on three different operating systems are bad enough.
This semester, I'm teaching an on-line course for the first time ever. The subject isn't new to me. I've taught bioinformatics off and on in different venues for almost ten years. It is strange for me, though, to communicate entirely through e-mail, an occasional video, and through writing. I miss seeing the students face to face and without the ability to watch what they're doing when they use computers, I'm slower to diagnose problems and figure out how to help.
I've also found a new unanticipated challenge and I wonder how other people deal with this. This challenge is dealing with all the varieties of computing platforms, versions of different software, and bugs that seem to pop up in the versions that I don't have.
Do different kinds of biomes (forest vs. creek) support different kinds of bacteria?
Or do we find the same amounts of each genus wherever we look?
Those are the questions that we'll answer in this last video. We're going to use pivot tables and count all the genera that live in each biome. Then, we'll make pie graphs so that we can have a visual picture of which bacteria live in each environment.
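The videos do this counting with spreadsheet pivot tables. For anyone who would rather script it, here is a small Python sketch of the same idea: count each genus within each biome, then work out the fractions that a pie graph would display. The records are made up for illustration and are not the students' data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up (biome, genus) records standing in for the real spreadsheet rows.
records = [
    ("forest", "Bacillus"),
    ("forest", "Pseudomonas"),
    ("forest", "Bacillus"),
    ("creek", "Pseudomonas"),
    ("creek", "Aeromonas"),
    ("creek", "Pseudomonas"),
    ("creek", "Bacillus"),
]


def count_genera_by_biome(rows):
    """Build a pivot-table-like structure: {biome: Counter({genus: count})}."""
    table = defaultdict(Counter)
    for biome, genus in rows:
        table[biome][genus] += 1
    return table


def fractions(counter):
    """Turn the counts for one biome into the fractions a pie graph shows."""
    total = sum(counter.values())
    return {genus: n / total for genus, n in counter.items()}


table = count_genera_by_biome(records)
for biome, counts in table.items():
    print(biome, dict(counts), fractions(counts))
```

Comparing the fractions for each biome side by side is one way to see whether the forest and the creek support different mixes of genera.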
This is the third video in our series on analyzing the DNA sequences that came from bacteria on the JHU campus.
In this video, we use a pivot table to count all the different types of bacteria that students found in 2004 and we make a pie graph to visualize the different numbers of each genus.
What do you do after you've used DNA sequencing to identify the bacteria, viruses, or other organisms in the environment?
What's the next step?
This four part video series covers those next steps. In this part, we learn that a surprisingly large portion of bioinformatics, or any type of informatics, is concerned with fixing data entry errors and spelling mistakes.
For the past few years, I've been collaborating with a friend, Dr. Rebecca Pearlman, who teaches introductory biology at the Johns Hopkins University. Her students isolate bacteria from different environments on campus, use PCR to amplify the 16S ribosomal RNA genes, send the samples to the JHU core lab for sequencing, and use blastn to identify what they found.
Every year, I collect the data from her students' experiments. Then, in the bioinformatics classes I teach, we work with the chromatograms and other data to see what we can find.
This is the first part of a four part video series on using pivot tables to analyze the data.
Bora had an enjoyable post yesterday on obsolete lab skills. I can empathize because I have a pretty good collection of obsolete lab skills myself. These days I'm rarely (okay, never) called upon to do rocket immunoelectrophoresis, take blood from a rat's tail, culture tumor cells in the anterior eye chamber of a frog, locate obscure parasites in solutions of liquid nitrogen, or inoculate Kalanchoe leaves with pathogenic bacteria.
(Wow! It sounds like I worked for the three witches in Macbeth! Fire burn and cauldron bubble!)
Long ago, I worked in a large lab that was divided into several small rooms. For part of that time, I shared one of the small rooms with a graduate student from Taiwan. She was a wonderful person who taught me that many cultural norms are not normal in other cultures.
I love the way you show me secret things.
All I do is type: Select * from name_of_a_table
And you share everything with me.
Without you, my vision is obscured, and all I see is the display on the page.
In fact, this was the push that finally made me decide to learn SQL.
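If you'd like to try that one-line query yourself without setting up a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and rows are all invented for illustration. They aren't from any real database mentioned here.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A hypothetical table with made-up rows.
conn.execute(
    "CREATE TABLE isolates (sample_id INTEGER, biome TEXT, genus TEXT)"
)
conn.executemany(
    "INSERT INTO isolates VALUES (?, ?, ?)",
    [(1, "forest", "Bacillus"), (2, "creek", "Pseudomonas")],
)

# The query from the poem: show every column of every row in the table.
cursor = conn.execute("SELECT * FROM isolates")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)

conn.close()
```

SELECT * returns every column that is stored in the table, including ones that a web page's display might leave out.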