Why sequencing matters for personal genomics

The first ever post on the new group blog I announced yesterday, Genomes Unzipped, is now live: it's Luke Jostins of Genetic Inference talking about the importance of sequencing for the future of personal genomics. Here's a taste:

There is a particular type of variation that genotype chips can never get at, the type of variation that most people will find most interesting: variation that is unique to you, or to your family. If you get sequenced now, about 200,000 single-base variants in your genome will never have been seen before, ever. These are likely to include changes that modify proteins in a unique way, that may make them act differently in your cells. A big proportion of indels and structural variants will be novel, and these can include strange and exotic things: genes that have been swapped around, jumbled up, fused together, or deleted entirely. There may well be stretches of DNA, hundreds of base pairs long or longer, that have never been observed in another human. Regardless of how "useful" these personal oddities are, to be able to look directly at new genomic discoveries that live inside you makes them invaluable.

Read the rest here.


Luke Jostins should note that the extremely low prior probabilities associated with true de novo variants make them very challenging to actually identify. At least with familial variants you have other individuals in the family tree which you can check against. But right now it's pretty hard to distinguish true de novos from sequencing error without a lot of confirmatory followup.
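A rough way to see why a low prior makes true de novo calls hard to trust is Bayes' rule. The sketch below is illustrative only. It assumes a de novo rate of about 1 per 100 million bases, and a hypothetical caller that detects 99% of real variants and miscalls 1 in a million reference sites. None of these numbers come from the post.

```python
def posterior_true(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(variant is real | caller reports it), computed with Bayes' rule."""
    true_calls = sensitivity * prior                   # real variants that get called
    false_calls = false_positive_rate * (1.0 - prior)  # reference sites miscalled as variants
    return true_calls / (true_calls + false_calls)

# Hypothetical numbers: ~1e-8 de novo mutations per base, a caller with
# 99% sensitivity and a per-site false-positive rate of 1e-6.
p = posterior_true(prior=1e-8, sensitivity=0.99, false_positive_rate=1e-6)
print(f"P(real | called) = {p:.4f}")
```

Under these assumptions only about 1% of raw de novo calls would be real, which is why confirmatory follow-up matters so much.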

@asdf

That depends on what you mean by "identifying de novo mutations". The case I am talking about is closer to "identifying novel variants", i.e. finding mutations that have not been seen before in anyone else; with 30X coverage and paired-end reads of 100+ bp this isn't that difficult. Yes, the lower prior means an elevated false-positive rate, but this can be overcome (at the expense of sensitivity) by using appropriate quality thresholds.
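The trade-off described above can be sketched with the same Bayes calculation. As an assumption, the prior for a novel variant is taken as the post's figure of about 200,000 novel single-base variants spread over roughly 3 billion sites. The two quality-threshold settings and their sensitivity and false-positive rates are hypothetical, chosen only to show the direction of the effect.

```python
GENOME_SITES = 3_000_000_000  # approximate haploid genome size (assumption)
NOVEL_VARIANTS = 200_000      # novel single-base variants, figure quoted in the post

def posterior_true(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(variant is real | caller reports it), computed with Bayes' rule."""
    true_calls = sensitivity * prior
    false_calls = false_positive_rate * (1.0 - prior)
    return true_calls / (true_calls + false_calls)

prior = NOVEL_VARIANTS / GENOME_SITES

# Hypothetical caller behaviour at two quality thresholds:
# (sensitivity, per-site false-positive rate).
settings = {"lenient": (0.99, 1e-6), "strict": (0.90, 1e-7)}

results = {}
for name, (sens, fpr) in settings.items():
    results[name] = {
        "posterior": posterior_true(prior, sens, fpr),
        "missed_true": NOVEL_VARIANTS * (1 - sens),                       # real variants not called
        "expected_false_calls": fpr * (GENOME_SITES - NOVEL_VARIANTS),    # spurious calls
    }
    r = results[name]
    print(f"{name:8s} P(real|called)={r['posterior']:.3f} "
          f"missed={r['missed_true']:.0f} false_calls={r['expected_false_calls']:.0f}")
```

With these made-up settings, the stricter threshold raises the fraction of calls that are real but misses many more of the true novel variants.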

I expect what you are thinking of is identifying de novo mutations from trio sequencing, i.e. looking for all mutations that have arisen between the parents and the child. This is indeed difficult, but with the most up-to-date next-gen technology it is very much possible.
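At its simplest, trio-based de novo detection flags sites where the child's genotype cannot be explained by one allele from each parent. The sketch below is a simplified illustration, not any particular tool's method. It assumes biallelic autosomal sites with genotypes coded as allele pairs (0 = reference, 1 = alternate), and the sites themselves are invented. In real data most such inconsistencies are genotyping errors, so flagged sites are only candidates.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # pair of alleles, 0 = reference, 1 = alternate

def mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

# Hypothetical sites: (position, child, mother, father).
sites: List[Tuple[int, Genotype, Genotype, Genotype]] = [
    (101, (0, 1), (0, 0), (0, 1)),  # alt allele inherited from the father
    (202, (0, 1), (0, 0), (0, 0)),  # alt allele carried by neither parent
    (303, (1, 1), (0, 1), (0, 1)),  # one alt allele from each parent
    (404, (1, 1), (0, 0), (0, 1)),  # mother cannot supply an alt allele
]

candidates = [pos for pos, c, m, f in sites if not mendelian_consistent(c, m, f)]
print(candidates)
```

Only positions 202 and 404 are flagged here, since each requires at least one allele that neither parent carries.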