It's been an intensive week of genomics here at the American Society of Human Genetics meeting, and I haven't been able to grab time to blog as much as I'd have liked. In fact there's a whole load of genomics news I'll be trying to cover in some detail over the next couple of weeks; for the moment, though, I couldn't let today's presentation from personal genomics company 23andMe go by without at least some comment. (For other coverage of the conference, do check out Luke Jostins' blog coverage and the stream of live analysis on Twitter.)
The 23andMe presenter (Nick Eriksson) delivered an overview of the potential of the 23andMe cohort for association studies: all 23andMe customers have genetic information for over 500,000 common genetic variants, and they are also encouraged to provide self-reported phenotype data on a wide range of traits ranging from the presence of detached earlobes to longitudinal tracking of Parkinson's disease symptoms. Eriksson reported that the company now had sufficient numbers of returned surveys to perform genome-wide association studies for 22 traits, with sample sizes ranging between 2500 and 6000 individuals - reasonable sample sizes for an initial look at the genetic architecture of a complex trait.
The company seems to be doing a reasonable job of identifying and controlling for the various potential confounders that plague genome-wide association studies, such as population structure. However, 23andMe faces an unusual challenge that standard academic GWAS consortia don't: the possibility that a subject will give a biased trait report after seeing their own genetic data.
This was powerfully illustrated by results from the "athlete gene" ACTN3 (a gene close to my own heart). There was no association between the athletic performance-associated variant in this gene and self-reported sprinter/endurance preference in individuals who hadn't seen their genetic data - but in individuals who had already seen their genotype there was a marked shift towards carriers of the "sprint" or "endurance" allele self-identifying with those respective categories. In other words, people were altering their self-reported athletic affiliation on the basis of their genotype; Eriksson estimated that around 25% of individuals must be shifting their self-identification to explain the effect, a staggeringly large number.
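To get a feel for how a shift of that size could manufacture an association out of nothing, here is a minimal sketch under a deliberately crude model. The model is an illustrative assumption, not anything presented in the talk: unbiased self-reports match the genotype-predicted category with some baseline probability (0.5 if there is truly no association), and a fraction of respondents simply adopt the category their genotype suggests. The function name and the 0.5 baseline are hypothetical.

```python
def concordance_after_shift(baseline: float, shift_fraction: float) -> float:
    """Expected share of self-reports matching the genotype-predicted category.

    baseline: probability that an unbiased self-report matches the genotype
        (0.5 means no association between genotype and self-report).
    shift_fraction: share of respondents who adopt the genotype-predicted
        category after seeing their data, whatever they would have said.
    Both parameters belong to a toy model assumed for illustration.
    """
    if not 0.0 <= baseline <= 1.0 or not 0.0 <= shift_fraction <= 1.0:
        raise ValueError("both arguments must be between 0 and 1")
    # Shifters always match; everyone else matches at the baseline rate.
    return shift_fraction + (1.0 - shift_fraction) * baseline


if __name__ == "__main__":
    for shift in (0.0, 0.25):
        rate = concordance_after_shift(0.5, shift)
        print(f"shift={shift:.2f} -> concordance={rate:.3f}")
```

Under this toy model, a 25% shift moves a trait with no real association from 50% to 62.5% concordance with genotype, which a naive analysis would read as a genuine signal.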
Eriksson played down the potential impact of this effect, but this is still a rather worrying finding for a company relying on self-reported (and often quite subjective) phenotype data from a customer base that has often peeked at their genetic data before ever filling in a survey; at the very least there is potential for inflation of apparent association with known markers already in the 23andMe database. One way around this might be to provide some kind of incentive for customers to complete phenotype surveys before they ever see their genotype data, perhaps by providing discounts on future product updates.
Aside from this niggling concern, the major message from the talk is that 23andMe's approach works in terms of generating genome-wide significant associations for complex traits: the company has successfully replicated a series of known associations with eye, skin and hair colour, for instance. More interestingly, 23andMe has also nailed down a handful of genuinely novel genetic associations: a massively significant association between an olfactory receptor region and "asparagus anosmia" (the inability to smell asparagus in one's own urine), and two regions associated with hair curl.
These traits seem pretty trivial, but this is precisely the sort of area where 23andMe will be able to out-compete academic consortia, and these types of associations are also extremely (perhaps perversely) attractive to personal genomics customers; it's just cool to be able to see the region of the genome that underlies a trait you can see in yourself, and to follow the inheritance of these traits through a family. These types of associations won't contribute to clinical genetics, but they are likely to non-trivially boost 23andMe's appeal to consumers.
Will 23andMe be able to uncover novel associations with a greater relevance to disease genetics? I suspect their impact here will be much more modest, at least in the near future; academic consortia are generally far better powered to pick up disease risk associations given their more stringent quality control and phenotype definitions. However, it's important not to underestimate 23andMe's ability to recruit and maintain an active base of participants, and their Facebook-like viral marketing appeal (in which customers have an incentive to recruit other people). This may make it possible for 23andMe to tap long-term phenotypic change, such as the progression of symptoms in patients suffering from diseases such as Parkinson's.
It's been interesting to watch the perception of the genomics community towards 23andMe shift over time. There's still some hostility out there - and indeed, the first question directed towards Eriksson was a needlessly combative and rather incoherent one about the ascertainment bias in 23andMe's sample towards wealthier individuals - but the strangeness of the 23andMe model is starting to wear off, and presentations like this one will no doubt help to convince scientists that this is a company that is at least capable of doing solid science.
There's one other small nugget of data worth mentioning. It's always been hard to get a solid estimate of the number of customers in 23andMe's database, but we now have a conservative lower bound: the company has at least 6,000 unrelated individuals of European ancestry enrolled who have taken phenotype surveys, suggesting a total active (i.e. engaged in phenotype surveys) customer base substantially higher than this. I don't think this number would surprise many regular readers, but it's a useful antidote to the sorts of ridiculously low recruitment numbers I've heard quoted by personal genomics critics.
Oh, damn, sorry to have not realized you were there. I would have tried to meet you.
We were even in that room at the same time because I had the same take on that question :)
I thought of you on another occasion when one of the talks I was in specifically had a "do not tweet" request, and I was wondering how that was being handled in other rooms.
23andMe advertisements will appear from federatedmedia.net
So this leads to two interesting questions:
1) Is 23andMe selling user data to attention brokers to deliver more targeted advertising?
2) If this is good for users ---and I'm willing to entertain the notion--- then why isn't Google selling the advertising, but instead, "federatedmedia.net"? What does Google know about 23andMe that others do not?
I agree with your comments regarding the utility of the 23andMe setup - they are excellently positioned to do the type of studies for which academic institutions will never get direct funding (such as hair curl - fantastic!). As you say, they are very unlikely to be able to offer much help in the effort to identify human disease associations. Academic institutions with access to clinical samples are able to ascertain many more case individuals than population-based ascertainment can. For the most common diseases (where they will be able to recruit the greatest number of case individuals), heterogeneity is likely to pose a major problem. High quality subphenotype data will probably be needed to delineate this heterogeneity, and unfortunately 23andMe are unlikely to be able to obtain this information accurately.
I wonder what journal will publish non-IRB approved human research? Will any clinical journal publish?
For several reasons, the effect of seeing one's ACTN3 genotype on self-reported running ability is likely to be much larger than for most self-reported phenotype data. Most people have never run at a professional level, so they are unlikely to know whether they are natural sprinters or endurance athletes. The question "are you a natural sprinter or an endurance athlete?" requires the person to make an inference based on numerous diverse experiences. For example, the person needs to decide what it means to be a natural sprinter or a natural endurance athlete. They also need to make an inference about the reference population: how does my sprinting ability compare to the average person's sprinting ability? Because this type of question asks the person to draw upon numerous past experiences and infer a complex and under-defined trait relative to a subjectively defined reference population, responses are likely to be especially susceptible to suggestion.
For most of the self-reported data gathered by 23andMe, such as whether one's hair is curly or straight, this is not the case.
Wow Steve, you've got a lot of nerve showing up here after you accused Daniel of talking out his "A$$" in a scientific publication and called his ACTN3 work "swill", a "crappy study designed to get on the cover of magazines", and "hype".
I'm glad we have such an "honest" person here to "compensate" for all the hooey!
Genetic Future is doing an admirable job bringing the recent developments in human genetics to its readers. I would like to bring the following to your kind attention. Perhaps it may be interesting to some of your colleagues.
Firstly, ;-) at the obscure but amusing reference in your name...
Secondly, Steve is welcome to keep on talking here as often as he likes. Steve's name came up a number of times at the ASHG meeting, and it appears he's done a pretty spectacular job of disqualifying himself from being taken seriously by virtually any of his potential allies among the genomics community. So long as he's willing to keep shooting himself in the foot, I'm happy to keep providing the firing range...
I tend to agree with you - I certainly don't think this potential bias kills the 23andMe model for association. It's most likely to have an effect in cases where 23andMe scientists are trying to use their customers to validate known variants associated with somewhat subjective traits; in such cases they would be expected to see a marked inflation of the associations. In terms of discovery of new variants, on the other hand, the effect will probably be pretty small.
However, even for the most sensitive associations, so long as 23andMe can encourage a sufficiently large proportion of its customers to fill in phenotype surveys before receiving their genetic data they can always restrict their association studies to these users (who will be free of bias).
So: an interesting and novel source of bias, but not one that will seriously undermine 23andMe's capacity to identify novel genetic associations.
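The restriction described above (keeping only survey answers given before the customer saw their genotype) amounts to a simple timestamp filter. The sketch below assumes each response carries a survey completion time and the time the genotype was first viewed; the class, field and function names are all hypothetical and are not 23andMe's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class SurveyResponse:
    # All fields are hypothetical, chosen only for this illustration.
    customer_id: str
    survey_completed_at: datetime
    genotype_first_viewed_at: Optional[datetime]  # None = never viewed


def unbiased_responses(responses: List[SurveyResponse]) -> List[SurveyResponse]:
    """Keep responses given before the customer first viewed their genotype."""
    return [
        r for r in responses
        if r.genotype_first_viewed_at is None
        or r.survey_completed_at < r.genotype_first_viewed_at
    ]
```

Whether such timestamps are recorded is itself an assumption; without them, the two groups of respondents could not be separated after the fact.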
Carl @3 is showing an excess of manners.
Perhaps 2,500 to 6,000 subjects sounds like enough to do something clinically relevant, because those are the sort of numbers that have been going into GWAS studies.
However, those GWAS are typically case/control studies - where perhaps half the subjects have been cherry-picked to have a particular disease.
There is a reason why prospective genetic studies (like the UK Biobank) are set up to include 100,000s of subjects - and that is because most "common" diseases are not, actually, that common.
As an example, without targeted recruitment, the number of subjects in a population sample of 6,000 people who would be expected to develop Parkinson's disease - eventually - is perhaps 4-8.
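The arithmetic behind that point can be sketched as follows. The figure of 4-8 cases per 6,000 people is the rough estimate given above; the target of 1,000 cases is purely an illustrative assumption, as is the function name.

```python
def cohort_size_needed(target_cases: int, cases_seen: int, sample_size: int) -> int:
    """Population sample needed to expect `target_cases` cases.

    Assumes cases scale linearly with sample size, using the observed ratio
    of `cases_seen` cases per `sample_size` people. Integer ceiling division
    avoids floating-point rounding.
    """
    if target_cases < 0 or cases_seen <= 0 or sample_size <= 0:
        raise ValueError("counts must be positive (target_cases may be zero)")
    return (target_cases * sample_size + cases_seen - 1) // cases_seen


if __name__ == "__main__":
    # 4 and 8 are the low and high ends of the estimate quoted above.
    for cases in (4, 8):
        needed = cohort_size_needed(1000, cases, 6000)
        print(f"{cases} cases per 6,000 -> {needed:,} people for 1,000 cases")
```

At those rates, collecting 1,000 cases without targeted recruitment would take a population sample of between 750,000 and 1.5 million people.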
Which isn't to say prospective studies don't have their place in genetics - see:
Teri A. Manolio, Joan E. Bailey-Wilson and Francis S. Collins (2006)
Genes, environment and the value of prospective cohort studies
Nature Reviews Genetics 7, 812-820 (October 2006) | doi:10.1038/nrg1919
which has some cracking tables to show what you would have to do, if you were really going to do this properly.