Careers in biotechnology, part IV: the tip of the informatics iceberg

By sporte on July 31, 2007.

I don't usually blog about work, for a wide variety of reasons. But last week, since I wanted to write about bioinformatics software companies, I broke with tradition and wrote about Geospiza as an example.

Naturally, I got some feedback about this. Some people liked it, but one of the most opinionated people said that I had given the software engineering and IT side short shrift and that I should write about that side a bit more.

Today is my attempt at a remedy.

tags: biotechnology careers, biotechnology, career+descriptions, bioinformatics

The tip of the iceberg
This diagram was given to me to use as an example of the engineering and IT areas in bioinformatics companies.

Lots of people think that bioinformatics only concerns the algorithms, like Smith-Waterman, FASTA, phrap, or BLAST. Or they think it's the visual reports that you get from web servers like the UCSC genome browser, or the web forms where you request data from GenBank, or the databases like the PDB, or the graphical interfaces (GUIs) that you use to work with your data - like with FinchTV. But, that's only the tip. There's much, much more going on below the surface.

And of course, this is where I get into deep water.

Don't blame me, I'm not a native speaker
Bear in mind, I'm a biologist, so when it comes to writing about IT, I will certainly get a few things wrong.

Working in a software company, I continually encounter worlds that I never knew existed, and things that make me feel ignorant, so you can all feel free to correct me in the comments section if I make a mistake with some of the technology, terminology or acronyms.

Take databases, for example. I've decided that databases are probably at least as different from each other as different species of animals. You think they'll all behave the same, but no. Some don't like capital letters or spaces; some have different limits on the amount of text that can be entered.

Even the language that you use to ask the database questions has different dialects. Some databases like one version of SQL (Structured Query Language); other databases use SQL with slight variations. (And that's only when they're ANSI compliant.) Even a single flavor of database will behave differently when it holds different amounts of data, or lives on a different operating system.
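To make the dialect point concrete, here is a minimal sketch of one well-known difference: how to ask for only the first few rows of a result. SQLite, PostgreSQL, and MySQL accept LIMIT, SQL Server uses TOP, and the SQL standard spells it FETCH FIRST n ROWS ONLY. The table name, column names, and helper functions are made up for illustration, and only the SQLite variant is actually run.

```python
import sqlite3

# One question ("give me the first n rows"), three spellings.
# The dialect keys and the reads table are illustrative assumptions.
LIMIT_TEMPLATES = {
    "sqlite": "SELECT {cols} FROM {table} ORDER BY {order} LIMIT {n}",
    "postgresql": "SELECT {cols} FROM {table} ORDER BY {order} LIMIT {n}",
    "sqlserver": "SELECT TOP {n} {cols} FROM {table} ORDER BY {order}",
    "standard": "SELECT {cols} FROM {table} ORDER BY {order} "
                "FETCH FIRST {n} ROWS ONLY",
}


def first_rows_query(dialect, table, cols, order, n):
    """Build a 'first n rows' query in the requested dialect."""
    return LIMIT_TEMPLATES[dialect].format(
        cols=cols, table=table, order=order, n=int(n)
    )


def demo():
    """Run the SQLite flavor against a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reads (name TEXT, length INTEGER)")
    conn.executemany(
        "INSERT INTO reads VALUES (?, ?)",
        [("r1", 650), ("r2", 720), ("r3", 480)],
    )
    sql = first_rows_query("sqlite", "reads", "name", "length DESC", 2)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for dialect in LIMIT_TEMPLATES:
        print(dialect, "->",
              first_rows_query(dialect, "reads", "name", "length DESC", 2))
    print(demo())
```

Software that has to support several databases usually hides this kind of difference behind one function, much as the lookup table does here.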

And then there are things like vacuuming and tuning.

Databases need to be vacuumed? Who knew?
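For the curious, here is a rough sketch of what vacuuming does, using SQLite as a stand-in because it ships with Python. The post doesn't say which database is meant; PostgreSQL, for example, has its own VACUUM command with different mechanics. The idea is the same, though: deleting rows leaves unused pages behind in the file, and vacuuming reclaims them. The table name and row contents are invented for the example.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(n_rows=2000):
    """Fill a table, empty it, and compare page counts around VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
        # Make sure automatic vacuuming is off so the effect is visible.
        conn.execute("PRAGMA auto_vacuum = NONE")
        conn.execute(
            "CREATE TABLE traces (id INTEGER PRIMARY KEY, bases TEXT)"
        )
        conn.executemany(
            "INSERT INTO traces (bases) VALUES (?)",
            (("ACGT" * 100,) for _ in range(n_rows)),
        )
        conn.commit()

        # Deleting rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM traces")
        conn.commit()
        pages_before = conn.execute("PRAGMA page_count").fetchall()[0][0]
        free_before = conn.execute("PRAGMA freelist_count").fetchall()[0][0]

        # VACUUM rebuilds the file without the unused pages.
        conn.execute("VACUUM")
        pages_after = conn.execute("PRAGMA page_count").fetchall()[0][0]
        free_after = conn.execute("PRAGMA freelist_count").fetchall()[0][0]
        conn.close()

    return {
        "pages_before": pages_before,
        "free_before": free_before,
        "pages_after": pages_after,
        "free_after": free_after,
    }


if __name__ == "__main__":
    print(vacuum_demo())
```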

A very brief description of the IT & engineering side
Even buying computers isn't as simple as you might imagine. Servers are not like laptops. You have to make sure that the processors are up to the tasks that you have in mind. There has to be sufficient RAM, hard drive space and back-up systems.

(Hi! I'm Linux and I'll be your server today. Would you like that RAID 0, 1, or 5?)

So, some of the people who work at our company are very focused on all the subtleties of shopping for equipment and knowing which types are compatible with which databases.
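The RAID levels in the joke above are a good example of those subtleties, because each one trades capacity for safety differently. RAID 0 stripes data across all disks with no redundancy, RAID 1 mirrors the same data onto every disk, and RAID 5 gives up one disk's worth of space to parity. A minimal sketch of the arithmetic, assuming identical disks (the function names are invented for illustration):

```python
def usable_capacity(level, disks, disk_tb):
    """Usable space in TB for an array of identical disks."""
    if level == 0:
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_tb           # striping: all space is usable
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb                   # mirroring: one disk's worth
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb     # one disk's worth goes to parity
    raise ValueError("only RAID 0, 1, and 5 are covered in this sketch")


def failures_tolerated(level, disks):
    """How many disks can fail before data is lost."""
    if level == 0:
        return 0
    if level == 1:
        return disks - 1
    if level == 5:
        return 1
    raise ValueError("only RAID 0, 1, and 5 are covered in this sketch")


if __name__ == "__main__":
    for level, disks in [(0, 4), (1, 2), (5, 4)]:
        print("RAID", level, "with", disks, "disks of 2 TB:",
              usable_capacity(level, disks, 2.0), "TB usable,",
              failures_tolerated(level, disks), "failure(s) tolerated")
```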

We also have experts in databases, not just the knowledge architecture, but the methods for backing up information, checking integrity, measuring performance, tuning queries, upgrading systems, and things that to me are as comprehensible as metaphysics.

All the variety, of course, with different kinds and versions of databases, different operating systems, and different versions of our own software would leave us with an incomprehensible mess if we didn't have methods for tracking everything: version control, building software every night, documenting what's been done, APIs, and application frameworks. The people who work testing our systems and software have to be very organized, creative, and methodical to make sure that the most important features have been tested with multiple databases, multiple operating systems, and multiple web browsers.
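A quick sketch shows why that testing takes organization: every additional database, operating system, or browser multiplies the number of combinations. The product names below are placeholders chosen for illustration, not a list of what any particular company supports.

```python
from itertools import product

# Placeholder platform lists; a real support matrix would differ.
DATABASES = ["PostgreSQL", "Oracle", "MySQL"]
OPERATING_SYSTEMS = ["Linux", "Windows", "macOS"]
BROWSERS = ["Firefox", "Safari"]


def build_matrix(databases, systems, browsers):
    """List every database / operating system / browser combination."""
    return [
        {"database": db, "os": os_name, "browser": browser}
        for db, os_name, browser in product(databases, systems, browsers)
    ]


if __name__ == "__main__":
    matrix = build_matrix(DATABASES, OPERATING_SYSTEMS, BROWSERS)
    print(len(matrix), "combinations to cover")
    for combo in matrix[:3]:
        print(combo)
```

Three databases, three operating systems, and two browsers already make eighteen combinations, which is why testers concentrate on the most important features.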

One of the commenters on a previous post mentioned a tendency to view programming/software engineering as a skill akin to "advanced typing."

Nothing could be further from the truth. Certainly, we have people involved in designing software who have lots of lab experience and sometimes master's degrees or Ph.D.'s in biology. That background is essential when it comes to designing software that will be useful for working with biological data. But that's only a small part of the iceberg. I almost forgot: every level of that iceberg requires some kind of documentation, written for its own specialized audience. We hire software engineers, database specialists, software architects, programmers, technical writers, and others, because they have specialized technical knowledge in different areas.

It takes expertise in many areas to create software that will withstand the test of time.

Read the whole series:

Part I. Careers in biotechnology

A look at the jobs in a biotech company, making biomedical products.

Part II: Bioinformatics

Where does bioinformatics fit into a biotech company? Who makes bioinformatics tools? Who uses them?

Part III: Life in a bioinformatics software company

How do people work together to make bioinformatics software?

Part IV: The tip of the informatics iceberg

What about the software engineering and IT side of bioinformatics software companies?

i loved this post...and the previous ones for that matter. i'm a computer science major and have been heavily considering applying to bioinformatics/computational bio phd programs after i finish. prior to reading this, i was overwhelmed by how much knowledge i needed to amass and schooling i needed to get through before i could enter the bioinformatics field. good to know there are a variety of jobs available to me even without a phd. thanks.

Great post again. In my first job, I was a scientific programmer/algorithms guy and got to work with one of the best software engineering teams possible. Realized very quickly how little I knew about that side and got to appreciate the need to have the informaticians/programmers work hand in hand with the software/database geeks.

Now, a lot of that iceberg you're pointing to is a result of the pathological state of computer science in the past twenty years. Take version control systems. They are an attempt to remedy the fact that the crude file system model ubiquitous today has no support for versions, and compilers have no idea how to maintain historical traces. Perl has slight support for this (you can include specific versions of modules). A lot more of that code is there to try to handle getting data back and forth between different towers of code, when the data is sitting in the same physical memory. And yet people regard these Chinese walls as normal.

Take version control systems. They are an attempt to remedy the fact that the crude file system model ubiquitous today has no support for versions, and compilers have no idea how to maintain historical traces. Perl has slight support for this (you can include specific versions of modules).

Then why nobody is building enterprise applications in Perl? (trick question) :))) Probably, because it's not the most important thing in a software engineering, IMHO.

Then why nobody is building enterprise applications in Perl? (trick question) :))) Probably, because it's not the most important thing in a software engineering, IMHO.

Why do you assume that they're not?

Oh, I'd say that quite a bit of Perl is in use in the Biopharma industry to serve up web services to scientists. Less than before, that's for sure, but still a significant bit.

How do I start with a bioinformatics research organisation? Or, I am interested in starting research in stem cells: how do I get collaboration with other research organisations, and how do I raise funds?
