InChI (Semantic Web Chatter)

This might not apply to many readers, but I think a decent subset will find it important. As you know, it's not so easy to search for chemical information. Most search engines are limited to the tags associated with a document or (more often) the text within it. Chemical structures, though, usually exist as graphics: either discrete raster image files or images embedded in PDFs, and neither is yet amenable to searching.

IUPAC has settled on InChI as an almost-Open unique identifier for compounds. By tagging your pages with InChIs, you can make your chemical information searchable. Nature Chemical Biology and Beilstein have adopted it (among others; see here for more info).

I don't keep up with molecular- and bioinformatics as well as I probably should, but I think this is easy enough to be worth considering (this goes double for organic chemistry professors and instructors, many of whom generate hundreds, if not thousands, of chemical images for their classes).

There are a number of ways to tag your compounds. Here is the easiest way I know of, which I've been using for the past couple of days:

I, like a pretty large subset of chemists, generate my structure drawings in ChemDraw and save them as GIFs. Generating an InChI is as easy as saving your compound as a .MOL and converting it here (note that this tool is not aware of Cahn-Ingold-Prelog descriptors, multiple structures in one .MOL, or isotopes).

For example, here is 1-chloro-2-fluoro-3-bromo-4-methyl-5-hydroxy-6-cyanobenzene:

1-chloro-2-fluoro-3-bromo-4-methyl-5-hydroxy-6-cyanobenzene: InChI=1/C8H4BrClFNO/c1-3-5(9)7(11)6(10)4(2-12)8(3)13/h13H,1H3
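To see what the string above contains, here is a minimal Python sketch that splits an InChI into its slash-separated layers. The function name `split_inchi` is my own invention for illustration, not part of any InChI tool, and the sketch assumes a simple identifier like this one, where each layer prefix (`c` for connectivity, `h` for hydrogens) appears only once.

```python
def split_inchi(inchi: str) -> dict:
    """Split a simple InChI string into version, formula, and prefixed layers.

    Hypothetical helper for illustration only. It assumes each layer
    prefix occurs at most once; a repeated prefix would overwrite the
    earlier entry.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")

    parts = inchi[len(prefix):].split("/")
    if len(parts) < 2:
        raise ValueError("expected at least a version and a formula")

    # The first two fields are the version and the molecular formula.
    layers = {"version": parts[0], "formula": parts[1]}

    # Every later field starts with a one-letter prefix naming the layer.
    for field in parts[2:]:
        if field:
            layers[field[0]] = field[1:]
    return layers


layers = split_inchi(
    "InChI=1/C8H4BrClFNO/c1-3-5(9)7(11)6(10)4(2-12)8(3)13/h13H,1H3"
)
print(layers["formula"])  # C8H4BrClFNO
```

The formula layer is a quick sanity check that the conversion picked up the right compound: eight carbons, four hydrogens, and one each of Br, Cl, F, N and O.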

Making your structures InChI-aware is as easy as putting the InChI in the alt attribute of their images. The image above has one:

InChI=1/C8H4BrClFNO/c1-3-5(9)7(11)6(10)4(2-12)8(3)13/h13H,1H3
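For anyone generating a lot of these, here is a rough Python sketch of the tagging step, using only the standard library. The file name `compound.gif` and the names `inchi_img_tag` and `InchiAltCollector` are made up for illustration. The collector class is just my guess at how a script might pull InChIs back out of a page; it is not how any search engine actually indexes them.

```python
from html import escape
from html.parser import HTMLParser


def inchi_img_tag(src: str, inchi: str) -> str:
    """Build an <img> tag whose alt attribute carries the InChI (hypothetical helper)."""
    return f'<img src="{escape(src, quote=True)}" alt="{escape(inchi, quote=True)}">'


class InchiAltCollector(HTMLParser):
    """Collect alt attributes of <img> tags that look like InChI strings."""

    def __init__(self):
        super().__init__()
        self.inchis = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            if name == "alt" and value and value.startswith("InChI="):
                self.inchis.append(value)


INCHI = "InChI=1/C8H4BrClFNO/c1-3-5(9)7(11)6(10)4(2-12)8(3)13/h13H,1H3"

# "compound.gif" is a placeholder file name, not the real image on this page.
tag = inchi_img_tag("compound.gif", INCHI)

collector = InchiAltCollector()
collector.feed(tag)
print(tag)
print(collector.inchis)
```

Escaping the attribute value matters less for this compound, but other InChIs could contain characters that need it, so it seems safer to escape every time.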

Feel free to leave any comments about this sort of stuff (and whether I'm going about this the best way); I'll admit to quite a bit of ignorance on the subject, but this seems useful enough to make the small effort. Also take a look at Peter Murray-Rust's blog for more information - he's the one who turned me on to this.


Thanks very much for this.

If you want to try it out, use the example above. Go to our site (Open/free) at http://wwmm-svc.ch.cam.ac.uk/wwmm/html/googleinchiserver.html
and use the GoogleInChI tab if necessary. This will bring up a chemical sketching applet (Marvin) - draw the molecule in the normal way and press "Search". Within 1-2 seconds you will get this:
The size in kilobytes of the cached version: 53k
URL: http://www.neuralgourmet.com/brainsnacks?from=140

The size in kilobytes of the cached version: 38k
URL: http://scienceblogs.com/moleculeoftheday/2006/10/inchi_semantic_web_bab…

(Interestingly, Google appears to have indexed an aggregation of this blog rather than the blog itself. That's a general limitation: you never know exactly what Google is doing...)

P.

Just curious, how do you pronounce InChI?
inch-eye, inchie...

I guess the latter sounds better once you make the plural.