Careers in biotechnology, part IV: the tip of the informatics iceberg

I don't usually blog about work for a wide variety of reasons. But, last week, since I wanted to write about bioinformatics software companies, I broke with tradition and wrote about Geospiza as an example.

Naturally, I got some feedback about this. Some people liked it, but one of the most opinionated people said that I had given the software engineering and IT side short shrift and that I should write about that side a bit more.

Today is my attempt at a remedy.

The tip of the iceberg
This diagram was given to me to use as an example of the engineering and IT areas in bioinformatics companies.

[Figure: iceberg diagram of the engineering and IT areas in bioinformatics companies]

Lots of people think that bioinformatics only concerns the algorithms, like Smith-Waterman, FASTA, phrap, or BLAST. Or they think it's the visual reports that you get from web servers like the UCSC genome browser, or the web forms where you request data from GenBank, or the databases like the PDB, or the graphical interfaces (GUIs) that you use to work with your data - like with FinchTV. But, that's only the tip. There's much, much more going on below the surface.

And of course, this is where I get into deep water.

Don't blame me, I'm not a native speaker
Bear in mind, I'm a biologist, so when it comes to writing about IT, I will certainly get a few things wrong.

Working in a software company, I continually encounter worlds that I never knew existed, and things that make me feel ignorant, so you can all feel free to correct me in the comments section if I make a mistake with some of the technology, terminology or acronyms.

Take databases, for example. I've decided that databases are probably at least as different from each other as different species of animals. You think they'll all behave the same, but no. Some don't like capital letters or spaces; some have different limits on the amount of text that can be entered.

Even the language that you use to ask the database questions has different dialects. Some databases like one version of SQL (Structured Query Language); other databases use SQL with slight variations. (And that's only when they're ANSI compliant.) Even a single flavor of database will behave differently when it holds different amounts of data, or lives on a different operating system.
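To make the dialect point concrete, here is a rough sketch in Python of how one simple request, "give me the first ten rows," gets spelled differently depending on the database. The helper `first_rows_query` and the table name `sequences` are made up for illustration; only the SQL keywords themselves come from the databases.

```python
# Hypothetical helper: the same "first N rows" request in several SQL dialects.
# The function name and the table name are illustrative assumptions.

def first_rows_query(table: str, n: int, dialect: str) -> str:
    """Return a query string asking for the first n rows of a table."""
    if dialect in ("postgresql", "mysql", "sqlite"):
        # These databases accept a trailing LIMIT clause.
        return f"SELECT * FROM {table} LIMIT {n}"
    if dialect == "sqlserver":
        # SQL Server puts TOP right after SELECT.
        return f"SELECT TOP {n} * FROM {table}"
    if dialect == "ansi":
        # The spelling standardized in SQL:2008.
        return f"SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unknown dialect: {dialect}")


if __name__ == "__main__":
    for name in ("postgresql", "sqlserver", "ansi"):
        print(name, "->", first_rows_query("sequences", 10, name))
```

Three spellings for one question, and this is about the simplest query there is.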

And then there are things like vacuuming and tuning.

Databases need to be vacuumed? Who knew?
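For the curious, here is a minimal sketch of what vacuuming looks like in practice. It uses SQLite only because SQLite ships with Python; the post doesn't say which database is meant, and other databases with a vacuum command differ in the details. The table name `reads` and the function `vacuum_demo` are invented for this example.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(rows=5000):
    """Fill a table, delete everything, and return the database file size
    (in bytes) before and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None means autocommit, so VACUUM is never
        # blocked by a transaction left open behind our backs.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("CREATE TABLE reads (id INTEGER PRIMARY KEY, seq TEXT)")

        # Insert the rows inside one explicit transaction to keep it fast.
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO reads (seq) VALUES (?)",
            (("ACGT" * 50,) for _ in range(rows)),
        )
        conn.execute("COMMIT")

        # Deleting rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM reads")
        size_before = os.path.getsize(path)

        # VACUUM rebuilds the file and hands the free space back.
        conn.execute("VACUUM")
        conn.close()
        size_after = os.path.getsize(path)
        return size_before, size_after


if __name__ == "__main__":
    before, after = vacuum_demo()
    print(f"before VACUUM: {before} bytes, after VACUUM: {after} bytes")
```

The deleted rows are gone either way; the vacuum step is what actually returns the space they were occupying.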

A very brief description of the IT & engineering side
Even buying computers isn't as simple as you might imagine. Servers are not like laptops. You have to make sure that the processors are up to the tasks that you have in mind. There has to be sufficient RAM, hard drive space and back-up systems.

(Hi! I'm Linux and I'll be your server today. Would you like that RAID 0, 1, or 5?)

So, some of the people who work at our company are very focused on all the subtleties of shopping for equipment and knowing which types are compatible with which databases.

We also have experts in databases, not just the knowledge architecture, but the methods for backing up information, checking integrity, measuring performance, tuning queries, upgrading systems, and things that to me are as comprehensible as metaphysics.

All the variety, of course, with different kinds and versions of databases, different operating systems, and different versions of our own software would leave us with an incomprehensible mess if we didn't have methods for tracking everything, version control, building software every night, documenting what's been done, APIs and application frameworks. The people who work testing our systems and software have to be very organized, creative, and methodical to make sure that the most important features have been tested with multiple databases, multiple operating systems, and multiple web browsers.
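A quick sketch shows why the testers need to be so methodical: the combinations multiply. The platform names below are placeholders chosen for illustration, not a list of what is actually tested, and `build_matrix` is a made-up helper.

```python
from itertools import product

# Placeholder platform names; assumptions for illustration only.
databases = ["PostgreSQL", "Oracle", "MySQL"]
operating_systems = ["Linux", "Windows", "Mac OS X"]
browsers = ["Firefox", "Safari", "Internet Explorer"]


def build_matrix(*axes):
    """Return every combination that takes one choice from each axis."""
    return list(product(*axes))


combos = build_matrix(databases, operating_systems, browsers)
print(len(combos), "configurations for every feature tested")
for database, operating_system, browser in combos[:3]:
    print(database, "/", operating_system, "/", browser)
```

Three choices on each of three axes already gives 27 configurations, and every extra database version or browser multiplies the total again.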

One of the commenters in a previous post mentioned a tendency to view programming/software engineering as a skill akin to "advanced typing."

Nothing could be further from the truth. Certainly, we have people involved in designing software who have lots of lab experience and sometimes master's degrees or Ph.D.'s in biology. That background is essential when it comes to designing software that will be useful for working with biological data. But, that's only a small part of the iceberg. I almost forgot - every level of that iceberg requires some kind of documentation, written for its own specialized audience. We hire software engineers, database specialists, software architects, programmers, technical writers, and others, because they have specialized technical knowledge in different areas.

It takes expertise in many areas to create software that will withstand the test of time.

Read the whole series:

  • Part I. Careers in biotechnology
  • A look at the jobs in a biotech company, making biomedical products.

  • Part II: Bioinformatics
  • Where does bioinformatics fit into a biotech company? Who makes bioinformatics tools? Who uses them?

  • Part III: Life in a bioinformatics software company
  • How do people work together to make bioinformatics software?

  • Part IV: The tip of the informatics iceberg
  • What about the software engineering and IT side of bioinformatics software companies?

i loved this post...and the previous ones for that matter. i'm a computer science major and have been heavily considering applying to bioinformatics/computational bio phd programs after i finish. prior to reading this, i was overwhelmed by how much knowledge i needed to amass and schooling i needed to get through before i could enter the bioinformatics field. good to know there are a variety of jobs available to me even without a phd. thanks.

By chris eigner (not verified) on 31 Jul 2007 #permalink

Great post again. In my first job, I was a scientific programmer/algorithms guy and got to work with one of the best software engineering teams possible. Realized very quickly how little I knew about that side and got to appreciate the need to have the informaticians/programmers work hand in hand with the software/database geeks.

Now, a lot of that iceberg you're pointing to is a result of the pathological state of computer science in the past twenty years. Take version control systems. They are an attempt to remedy the fact that the crude file system model ubiquitous today has no support for versions, and compilers have no idea how to maintain historical traces. Perl has slight support for this (you can include specific versions of modules). A lot more of that code is there to try to handle getting data back and forth between different towers of code, when the data is sitting in the same physical memory. And yet people regard these Chinese walls as normal.

Take version control systems. They are an attempt to remedy the fact that the crude file system model ubiquitous today has no support for versions, and compilers have no idea how to maintain historical traces. Perl has slight support for this (you can include specific versions of modules).

Then why is nobody building enterprise applications in Perl? (trick question) :))) Probably, because it's not the most important thing in software engineering, IMHO.

By S. Johnson (not verified) on 02 Aug 2007 #permalink

Then why is nobody building enterprise applications in Perl? (trick question) :))) Probably, because it's not the most important thing in software engineering, IMHO.

Why do you assume that they're not?

Oh, I'd say that quite a bit of Perl is in use in the Biopharma industry to serve up web services to scientists. Less than before, that's for sure, but still a significant bit.

How do I start a bioinformatics research organisation? Or, I am interested in starting research in stem cells. How do I get collaboration with other research organisations? How do I raise funds?