We had a great discussion in the comments yesterday after I published my NJ trees from some of the flu sequences.
If I listed all the wonderful pieces of advice that readers shared, I wouldn't have any time to do the searches, but there are a few that I want to mention before getting down to work and posting my BLAST results.
Here are some of the great suggestions and pieces of advice:
1. Do a BLAST search. Right! I can't believe I didn't do that first thing. I think the trees I got surprised me so much that all sense flew out of my brain.
2. Show us the multiple alignments. Okay. I'll post the files soon.
3. Make Maximum likelihood trees. David Koppstein has done that. My only complaint is that I'm too near-sighted to identify the sequences in his trees, so while I can see that the CA sequences are clustering together, I can't really interpret the results.
4. Use FASTA. Hmmm. I don't know why this would be helpful. FASTA is more sensitive than BLAST, but if I want to find sequences that are over 95% identical, I don't see where I need more sensitivity. It would be nice to have the reasoning explained, 'cause I don't get it.
5. Use the nucleotide sequences. YES! YES!
7. Gwen Aimes, and Victor Hanson-Smith, and Brian Foley from the Los Alamos National Lab (especially Brian Foley!) have given some great advice and shared their expertise on comparing viral sequences. And irayork (?) has had some good suggestions, too.
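For anyone wondering what the "over 95% identical" cutoff in point 4 amounts to, here is a minimal Python sketch. It assumes two sequences that have already been aligned to the same length, with `-` marking gaps. The function name `percent_identity` and the toy sequences are made up for illustration and are not real flu data. Skipping gapped columns is also just one convention; alignment tools differ in how they count gaps, so the numbers may not match what a given program reports.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        # Assumption: columns with a gap in either sequence are not counted.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Made-up toy sequences, not real flu data.
query = "ATGGCAAT-CGT"
hit = "ATGGCTATACGT"
identity = percent_identity(query, hit)
print(f"{identity:.1f}% identical; passes 95% cutoff: {identity >= 95.0}")
```

With a cutoff this strict, the hits of interest differ from the query at only a few positions per hundred, which is the situation point 4 is describing.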
It's been really hard the past couple of days to focus on my other work, like grading homework and teaching my class, since I've really, really wanted to get back to my computer and do more analyses. But now I'm calmer, and I can take a deep breath and look at the flu sequences in a more methodical and systematic fashion.
One of the things that I never liked about academic science was all the secrecy. It seemed to me that the people around me felt that you should keep everything secret and not tell anyone anything until you were absolutely sure you were right. The trouble is, that philosophy makes people really afraid to ever be wrong. And, so many times, we are wrong. Or maybe just not 100% right.
So, since I don't have a lab or tenure worries, I thought: why not do science in the open? I've heard people suggest that original research shouldn't be published in blogs, that it should only be published as peer-reviewed work. I don't buy that suggestion.
Crazy as it seems, I think this preliminary activity has gotten far better "peer-review" than some of the papers I submitted to official publications. I wish all peer review was as helpful and transparent.
I'll post more data in a bit, but first I want to say "thanks!"
I think you might like this: http://michaelnielsen.org/blog/?p=448
Why aren't there any Mexican strain sequences yet in the NCBI database (http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html)? I just looked it up, and they already have genes from a lot of places, but not from Mexico. When can we expect to have some?
Thank you (and other bloggers) for your sites. You are a great resource for my bioinformatics course.
Talk about up-to-date information! There's no way we can get this type of information from a textbook!
Bill: I did like the article, thanks for the link!
Ana: I don't know why the Mexican sequences aren't there. I would like to see them, too.
Ying-Tsu - thanks! I plan to post some more when I can fit it in. I hope I get to see you in Berkeley! We'll get to work with some Next Gen sequence data in my workshop.