Genetic Variation I: What is a SNP?

If you've read any of the many stories lately about Craig Venter's or Jim Watson's genome, you've probably seen a "SNP" appear somewhere. (If you haven't read any of the stories, CNN has one here, and my fellow bloggers have posted several here, here, here, here, here, and here.)

You may be wondering, and rightly so: just what is a SNP?

Never fear. With luck, this post will answer that question.


SNP stands for Single Nucleotide Polymorphism. That's a mouthful. It means that some people will have one base at a certain position in a sequence of bases, and other people will have a different base at that position. The different forms of a SNP are called "alleles." (Usually there are only two forms.)

If we compared two DNA sequences, and they contain a SNP, we might see something like this:
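
To make the idea concrete, here's a minimal sketch in Python. The two sequences and the function name `find_differences` are made up for illustration, and the sketch assumes the sequences are already aligned and the same length. Strictly speaking, a single-base difference between two people only counts as a SNP once we know it's a variant that shows up in the population, so treat the output as candidate SNP positions.

```python
def find_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for every position where two
    aligned, equal-length sequences differ. Positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical sequences from two people, same region, same strand.
person_1 = "ATGGCTAGCTAAGCT"
person_2 = "ATGGCTGGCTAAGCT"

for position, base_1, base_2 in find_differences(person_1, person_2):
    print(f"position {position}: {base_1} in person 1, {base_2} in person 2")
```

Here the two sequences match everywhere except position 7, where one person has an A and the other has a G.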


If we look at a trace from a chromatogram, and we have a mixed sample of DNA (you have DNA from both your mom and dad, so your DNA is a mixture), a SNP looks like this:


Image made with FinchTV

Those of you who've taken genetics are probably looking at this and saying, uh sure, that's a substitution mutation, right? What makes that so special?

SNPs are different because they are inherited.

Mutations can happen in any DNA molecule, in any cell, but they are only inherited if they occur in the DNA that's passed on to our offspring. For example, mutations probably occur in my skin cell DNA whenever I spend too much time in the sun, but my children won't have those mutations. They only got the mitochondrial DNA and the set of chromosomes that I contributed when my body made their eggs.

That SNPs are inherited is pretty cool. We can use SNPs to look at human migration patterns and see where people's ancestors have been. We can also use SNPs to identify medical conditions and evaluate someone's ability to metabolize drugs, like warfarin or caffeine.

Other fun facts about SNPs:

  • SNPs occur about every 200-1000 bases.
  • SNPs are usually binary. That is, I might find an A or G at a certain position, but I'm far less likely to find an A, G, or C.
  • The process of doing a genetic test to identify which allele of a SNP you have is called "genotyping."
  • Craig Venter has 3,213,401 SNPs (1).
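
A quick back-of-the-envelope check ties the first and last of these facts together. The sketch below assumes a round figure of 3 billion bases for one copy of the human genome (that number is my assumption, not something from the paper), and the function name `expected_snps` is just for illustration.

```python
GENOME_SIZE = 3_000_000_000  # assumed round figure for one copy of the genome
VENTER_SNPS = 3_213_401      # SNP count reported in reference (1)


def expected_snps(genome_size, bases_per_snp):
    """Rough SNP count if one SNP occurs every `bases_per_snp` bases."""
    return genome_size // bases_per_snp


low = expected_snps(GENOME_SIZE, 1000)   # one SNP per 1000 bases
high = expected_snps(GENOME_SIZE, 200)   # one SNP per 200 bases
print(f"expected range: {low:,} to {high:,} SNPs")

# Average spacing implied by Venter's count, under the same assumption.
spacing = GENOME_SIZE / VENTER_SNPS
print(f"Venter's SNPs: about one every {spacing:.0f} bases")
```

Under that assumption, the range works out to roughly 3 to 15 million SNPs, and Venter's count sits near the low end of it.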

SNPs are not the only form of genetic variation. I'll cover some of the others (indels, inversions, etc.) later on.

1. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. (2007) The Diploid Genome Sequence of an Individual Human. PLoS Biol 5(10): e254 doi:10.1371/journal.pbio.0050254

Copyright Geospiza, Inc.


Is it safe to say that if Craig Venter has 3,213,401 SNPs, I likely have 3,213,401 SNPs, or am I overlooking some effect like duplication or deletion of SNPs?

The authors of the paper use the term "SNP" to refer to positions where one base is substituted for another. The tally of insertions, deletions, copy number variations, etc. is presented separately.

Yes, it is safe to say that you have at least 3,213,401 SNPs. In fact, I'd say that you probably have more than that.

The only way that we can find SNPs is by sequencing the same region of DNA from several individuals of varying ancestry. We call this "resequencing" or sometimes, "deep resequencing." This has only been done for a few select genes that are known to be medically important.

So, there are many SNPs that are yet to be discovered. We only know about Craig's SNPs because those are positions where his DNA differs from the reference sequence at the NCBI, or where his two chromosomes differ from each other.
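
Here's a small sketch of those two cases in Python. The reference and the two chromosome copies are invented toy sequences, and `classify_site` is a hypothetical helper, not anything from the paper's pipeline.

```python
def classify_site(reference, copy_1, copy_2):
    """Label one position given the reference base and a person's two copies."""
    if copy_1 != copy_2:
        return "heterozygous"        # the two chromosomes differ from each other
    if copy_1 != reference:
        return "homozygous variant"  # both copies differ from the reference
    return "matches reference"       # nothing visible at this position


# Toy data: same region, same strand, already aligned.
reference = "GATTACA"
copy_1 = "GACTACA"
copy_2 = "GACTATA"

calls = [classify_site(r, a, b) for r, a, b in zip(reference, copy_1, copy_2)]
visible = [(i + 1, call) for i, call in enumerate(calls) if call != "matches reference"]
print(visible)
```

Any position where both copies match the reference stays invisible in this kind of comparison, even if other people carry a different base there. That's why sequencing more individuals turns up more SNPs.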

You're going to lose a few people here, even those who have taken two or three semesters of bio in university. Some are going to assume that base pairings are A-T and G-C, but that in the special case of a SNP, there can also be pairings of T-C. I know that seems out there, but I have run into this interpretation before. Clarity above brevity.

Good point, Mike! I changed the image a bit to emphasize that these are equivalent strands.