- Download your raw data from 23andMe (or use one of the files from me or my colleagues at Genomes Unzipped);
- Install the plug-in from here and point it to your 23andMe data;
- Browse to a website discussing one of the genetic variants included on the 23andMe chip, and you'll see highlights around the rsID of any variant on the page (rsIDs are unique codes assigned by dbSNP to most of the common variants targeted by personal genomics companies);
- Mouse over the rsID and your own genotype for that SNP will appear.
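For the curious, the steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plug-in's actual code: the function names `parse_23andme` and `annotate` are hypothetical, and the parser assumes the usual 23andMe raw data layout (tab-separated rsid, chromosome, position and genotype columns, with `#` comment lines).

```python
import re

# rsIDs are written as "rs" followed by digits.
RSID_PATTERN = re.compile(r"\brs\d+\b")


def parse_23andme(lines):
    """Map rsID -> genotype from 23andMe-style raw data lines.

    Assumes tab-separated columns (rsid, chromosome, position, genotype)
    and '#' comment lines; rows that do not fit are skipped.
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        genotypes[fields[0]] = fields[3]
    return genotypes


def annotate(text, genotypes):
    """Return {rsid: genotype} for every rsID in the text that is in the data."""
    found = {}
    for match in RSID_PATTERN.finditer(text):
        rsid = match.group(0)
        if rsid in genotypes:
            found[rsid] = genotypes[rsid]
    return found
```

A browser plug-in would do the matching against the page's DOM rather than a plain string, but the lookup itself is no more than a dictionary keyed on rsID.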
I love this idea. Is it 23andMe only, or does it work with deCODEme too?
Currently 23andMe only, but they have plans to extend it to other companies soon. Of course, for the impatient and informatics-savvy deCODEme customer it would be easy to reformat your deCODEme file as a 23andMe file, which would work just as well.
Beyond that - Luke has imputed everyone in Genomes Unzipped onto HapMap 2, and I'm planning to try those files out later this week.
Neat tool! Re the AT/GC flip tweeted about, is that referring to the cases where the common use of the SNP is the opposite of the result provided by 23andMe? As in your (coincidental?) example - in dbSNP and most papers the LCT SNP is C>T (so you would be TT, lactose persistent)?
I guess I'll have to wait till the Chrome extension comes out then.
Who uses IE or Firefox anymore??
Keith: yes, that's correct. Users will just have to learn how to do strand flipping in their heads - the only problem will be A/T and C/G SNPs, but fortunately these are reasonably rare (0.75% of SNPs on the 23andMe v2 chip and 1.5% of SNPs on the v3 chip). Alternatively, geneticists could agree to adopt a consistent standard for strand representation (cue hysterical laughter).
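For anyone who would rather not flip strands in their head, here is a minimal Python sketch of both operations. The helper names `flip_strand` and `is_strand_ambiguous` are made up for illustration. It shows why A/T and C/G SNPs are the awkward ones: their two alleles are complements of each other, so a flipped genotype still looks like a valid call.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(genotype):
    """Complement each allele in a genotype string, e.g. 'AG' -> 'TC'.

    Characters that are not A/C/G/T (such as no-call markers) pass through.
    """
    return "".join(COMPLEMENT.get(base, base) for base in genotype)


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose alleles are each other's complement."""
    return COMPLEMENT.get(allele1) == allele2
```

With these, an AA genotype reported on one strand reads as `flip_strand("AA")`, i.e. TT, on the other. For a C/T SNP that mismatch is obvious on sight, whereas for an A/T SNP both AA and TT are plausible calls on either strand.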
Who uses IE or Firefox anymore??
20.6% and 42.0% of ScienceBlogs readers, respectively. Chrome comes in third with a 17.9% market share. But that said, Chrome support is the first thing I asked about. :-)
Re: reverse complementing SNPs
In general, Illumina GWAS chips (unlike Affymetrix's) purposefully only include A/C, A/G, C/T and G/T SNPs, so the small number of A/T and C/G SNPs reported by 23andMe is from their custom component on top. The fact that strand flipping is not a problem here is therefore an artifact of the technology used.
Alternatively, geneticists could agree to adopt a consistent standard for strand representation (cue hysterical laughter).
To take the bait (and as Daniel knows full well) the obvious method would be to report only on the forward strand, but that implies a fixed genome assembly. So, a large chunk of lab work is still done on assembly NCBI36, as that is what HapMap's data has been reported on. (And people don't know how to cope with alternate named haplotypes in GRCh37, but that's a different story).
The alternative to fixing the assembly is to fix which allele to report first by sequence context - Illumina's version of that is online.
Yes, I should have specified that the AT/GC problem will be more of an issue for other chips (e.g. Navigenics, which last time I checked used an Affy 6.0 array).
And all fair points on the strandedness issue; strandedness has long been the bane of many a geneticist's existence, but I agree that's because it's a genuinely difficult problem to solve.
I agree that's because it's a genuinely difficult problem to solve.
No! No it's not! This isn't 1999! We're not talking about the variation in the long-nosed weevil! The vast majority of the human genome does not change strand between builds, and thus a given SNP is very unlikely to change strand. Report on the FORWARD STRAND. Always the FORWARD STRAND. There is never any good reason, in this day and age, to report a variant on the negative strand.
The solution is certainly not to chuck away a third of human genetic variation because it doesn't fit your format requirements!
The Illumina GWAS chips do not assay A/T and C/G SNPs because it is cheaper not to - or, if you prefer, they have chosen to double the SNP density by not doing so:
Bead Type Definition
Depending on the type of SNP or marker being assayed, the Infinium HD Assay uses one of two probe (or bead type) designs, Infinium I or Infinium II. The Infinium II probe design, which stops at the base before the SNP of interest, uses only one probe per loci (i.e., one probe for both alleles). This probe design is suitable for the majority of loci in most organisms. Infinium I probe design is required for relatively less common A/T and C/G SNPs and requires two probes (or bead types) per SNP because the probe stops at the base representing the SNP of interest (i.e., one probe for each of both alleles).
Yes, I should get out more.
Hi everyone -
Yes, Chrome is definitely high on the port list, as Daniel points out. So far, it's the top vote in our informal survey from the release. Safari is a distant second, and IE is last with zero votes.
On the strand issues - these break down into two categories, basically - which strand the SNP is reported on, and whether the SNP is prone to ambiguity (A/T and C/G). Each probably calls for a different solution. 23andMe data is normalized to the + strand always, so we've taken the tack in this first release to just report it as-is, and assume that people will do whatever they normally do with their raw 23andMe data anyway (flip strands mentally, look up in dbSNP, etc.). We are looking at ways, however, to make this friendlier in future releases - at a minimum, we could flag the A/T and C/G SNPs so people have some warning when dealing with them. For strand orientation, we are contemplating ways to show dbSNP orientation alongside the native data, etc. This is a bit tricky because it either requires pre-processing of raw data or real-time API lookup, each of which has disadvantages.
DECODEme will probably be the platform we support next, just because it's one we are familiar with (so if you want Navigenics, or any other platform, *please* point us to some good data files to use for testing! email firstname.lastname@example.org if you don't mind). DECODEme conveniently puts strand orientation per SNP right in the file, even though everything isn't + strand normalized.
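As a rough illustration of what normalizing such a file might involve, here is a Python sketch that reads deCODEme-style rows and writes 23andMe-style ones on the + strand. The column names (`Name`, `Chromosome`, `Position`, `Strand`, `YourCode`) are an assumption about the deCODEme export, and `convert` is a hypothetical helper, so check both against a real file before relying on it.

```python
import csv
import io

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_plus_strand(genotype, strand):
    """Complement the genotype if it was reported on the '-' strand."""
    if strand == "-":
        return "".join(COMPLEMENT.get(base, base) for base in genotype)
    return genotype


def convert(csv_text):
    """Turn deCODEme-style CSV text into 23andMe-style tab-separated lines.

    Assumed input columns: Name, Chromosome, Position, Strand, YourCode.
    Output columns: rsid, chromosome, position, genotype (+ strand).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        genotype = to_plus_strand(row["YourCode"], row["Strand"])
        lines.append(
            "\t".join([row["Name"], row["Chromosome"], row["Position"], genotype])
        )
    return lines
```

Because the strand is recorded per SNP, this conversion needs no external lookup, which is what makes the pre-processing route cheap for this format.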
Do stay tuned to SNPTips and 5AM Solutions - we have more cool things in the works...
Increasingly, SNPs are included on chips because they are the best tags for known effects, or because they are rare SNPs that cannot be tagged any other way, meaning that the GC/AT rate continues to rise. The Illumina 1M chip has 0.4% GC/AT SNPs, whereas the 2.5M has about 2.5%. Plus, Illumina genotyping is far from the only technology that calls variants, and sequencing calls are going to get increasingly common.
Now is the time to acknowledge that times have changed - with a stable genome, there is no good reason to not report something on the forward strand, and the future will thank you for doing so.