Adaptive Evolution in Genomes

Adam Eyre-Walker has published a review of adaptive evolution in a few well-studied systems: Drosophila, humans, viruses, Arabidopsis, etc. These organisms have been the subject of many studies that used DNA polymorphism, DNA divergence, or a combination of the two to detect natural selection in both protein-coding and non-coding regions of the genome. Now that we have whole-genome sequences for multiple closely related species from a few different taxa, many researchers are interested in determining the role of natural selection in the evolution of DNA sequences.

Eyre-Walker claims that the evidence for adaptive evolution is greater in Drosophila than in humans. But JP at GNXP thinks that Eyre-Walker doesn't give the full story of adaptive evolution in the human genome, leaving out important examples. Eyre-Walker relates the difference in adaptive evolution between these two well-studied species to differences in population size; humans have a smaller population size, therefore they fix fewer weakly advantageous mutations.

One way of measuring adaptive evolution is to compare polymorphism and divergence at synonymous and non-synonymous sites (the McDonald-Kreitman test). Unlike some other tests (e.g., Tajima's D), the MK test is fairly robust to historical changes in population size, but an ancestral increase in population size may lead to an overestimate of the number of advantageous substitutions. Eyre-Walker claims that this is not a concern for studies of a pair of model Drosophila species for two reasons:

"First, if anything, D. melanogaster appears to have gone through a population size decrease. Second, estimates using polymorphism data from either D. simulans or D. melanogaster are very similar; it is difficult to see how the bias could be the same given that the two species have different Ne." [References omitted.]

Eyre-Walker's citation for the D. melanogaster ancestral population size is a study that looked at codon bias. The effect of Ne on codon bias will persist much longer than that on polymorphism. It's more probable that D. melanogaster has been recovering from a small ancestral population size (one that left that signature in codon usage), and has in fact been increasing in population size. It seems to me that the estimate of adaptive evolution in Drosophila is a bit high because of the increased population size in both D. melanogaster and D. simulans.
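
For readers who want to see the arithmetic behind an MK comparison, here is a minimal sketch in Python. It assumes a commonly used point estimate of the proportion of non-synonymous substitutions fixed by adaptive evolution, α = 1 - (Ds × Pn)/(Dn × Ps), where Dn and Ds are fixed non-synonymous and synonymous differences and Pn and Ps are the corresponding polymorphism counts. The counts are made up for illustration, the function names are my own, and the Fisher exact test is written out with the standard library rather than taken from any particular package.

```python
from math import comb


def mk_alpha(dn, ds, pn, ps):
    """Point estimate of alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts, not taken from any real data set.
dn, ds = 20, 40   # fixed differences: non-synonymous, synonymous
pn, ps = 10, 40   # polymorphisms: non-synonymous, synonymous

alpha = mk_alpha(dn, ds, pn, ps)
p_value = fisher_two_sided(dn, ds, pn, ps)
print(f"alpha = {alpha:.2f}, Fisher exact p = {p_value:.3f}")
```

With these invented counts the estimate of α comes out at 0.5, meaning half of the non-synonymous substitutions would be attributed to adaptive evolution. Every bias discussed in this post acts by shifting one of the four counts.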

As mentioned previously, MK tests are robust to many violations of the assumptions that underlie the tests. Eyre-Walker points out that slightly deleterious mutations may lead to biased estimates of advantageous substitutions:

"The exception is the segregation of slightly deleterious non-synonymous mutations, because these can bias the estimate of α [proportion of non-synonymous substitutions that have been fixed by adaptive evolution] either upwards or downwards depending on the demography of the population. If the population size has been relatively stable, the estimate of α is an underestimate, because slightly deleterious mutations tend to contribute relatively more to polymorphism than they do to divergence when compared with neutral mutations. These slightly deleterious mutations can be controlled for by removing low-frequency polymorphisms from the analysis, because such mutations tend to segregate at lower frequencies than do neutral mutations. However, slightly deleterious mutations can lead to an overestimate of α if population sizes have expanded, because mutations that might have been fixed in the past, when the population size was small, no longer segregate as polymorphisms. Even fairly modest increases in population size can create artifactual evidence of adaptive evolution." [References omitted.]

Slightly deleterious mutations exaggerate the effect of changes in population size. I'm not pointing this out because of how it relates to adaptive evolution in Drosophila. Instead, I find the solution to this problem quite fascinating: remove low-frequency polymorphisms from the data set. This should remove most slightly deleterious polymorphisms from consideration (assuming constant population size). This immediately led me to think of a particular data set that has this quality built in: HapMap.
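
The frequency filter is easy to sketch. The Python below is only an illustration: the allele frequencies, the divergence counts, and the 10% cutoff are all made up (I don't know what the right cutoff is), and the function name is my own. It counts only polymorphisms at or above the cutoff and then computes α = 1 - (Ds × Pn)/(Dn × Ps).

```python
def alpha_with_cutoff(dn, ds, nonsyn_freqs, syn_freqs, min_freq=0.0):
    """Estimate alpha after dropping polymorphisms below min_freq.

    nonsyn_freqs and syn_freqs are allele frequencies of the segregating
    non-synonymous and synonymous variants (hypothetical inputs).
    """
    pn = sum(1 for f in nonsyn_freqs if f >= min_freq)
    ps = sum(1 for f in syn_freqs if f >= min_freq)
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Invented data: non-synonymous variants are skewed toward low frequency,
# as expected if many of them are slightly deleterious.
nonsyn_freqs = [0.02, 0.03, 0.05, 0.08, 0.20, 0.40]
syn_freqs = [0.02, 0.10, 0.15, 0.25, 0.30, 0.45, 0.50, 0.60]
dn, ds = 10, 20

alpha_all = alpha_with_cutoff(dn, ds, nonsyn_freqs, syn_freqs)
alpha_common = alpha_with_cutoff(dn, ds, nonsyn_freqs, syn_freqs, min_freq=0.10)
print(f"all polymorphisms: alpha = {alpha_all:.2f}")
print(f"frequency >= 0.10: alpha = {alpha_common:.2f}")
```

In this toy example the estimate moves from -0.50 with all polymorphisms to about 0.43 once the rare variants are dropped, which is the direction of correction Eyre-Walker describes for a stable population.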

One major criticism of much of the SNP data in circulation is that it suffers from ascertainment bias (see here for example). Because SNPs are first identified in a small sample and then assayed in a larger sample, many rare SNPs are missed. This poses a big problem for tests that depend on the site frequency spectrum of polymorphisms (e.g., Tajima's D), but could actually be useful if slightly deleterious mutations are segregating in the population. This assumes two things: the researcher is using an MK-based test and the population size has been constant for many generations. We know that human populations have increased greatly over many generations, so we're probably still overestimating adaptive evolution if we don't take mildly deleterious mutations into account.
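
To put rough numbers on why a small discovery sample misses rare SNPs, here is a back-of-the-envelope sketch in Python. It assumes the discovery panel is a random draw of chromosomes from a large population, so a SNP is found only if both alleles turn up in the panel. The panel size of 8 chromosomes and the function name are hypothetical.

```python
def discovery_probability(freq, panel_size):
    """Chance that both alleles of a SNP appear in the discovery panel.

    freq is the population frequency of one allele; panel_size is the
    number of chromosomes sampled (assumed to be independent draws).
    """
    return 1.0 - freq ** panel_size - (1.0 - freq) ** panel_size


panel_size = 8  # hypothetical discovery panel of 8 chromosomes
for freq in (0.01, 0.05, 0.20, 0.50):
    p = discovery_probability(freq, panel_size)
    print(f"allele frequency {freq:.2f}: discovered with probability {p:.2f}")
```

Under these assumptions a variant at 1% frequency is almost never discovered while one at 50% almost always is, so the ascertainment scheme acts like a soft version of a frequency cutoff.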


You should also consider mildly deleterious synonymous mutations as a possible cause of a significant MK test. In a species with strong codon bias (such as D. simulans), many synonymous polymorphisms will be the result of a mutation from a preferred codon to an unpreferred codon. Selection against the unpreferred codon is weak enough that it may appear as a polymorphism, but it is unlikely to become a fixed difference. The lower fixed-to-polymorphism ratio for synonymous variation will be interpreted as a relatively higher fixed-to-polymorphism ratio for amino acid variation, and thus as an excess of fixed amino acid differences due to adaptation.
One way to deal with this, as you point out, would be to use a frequency cutoff, so variants below a certain frequency aren't counted as polymorphisms. I'm not sure what the right cutoff would be, however. If you have enough data, it might be better to compare the frequency spectrum of different kinds of polymorphisms, similar to what Hiroshi Akashi has done when investigating codon bias. Using polymorphisms and fixed differences in introns as the basis of comparison could be informative, too; an MK test comparing the fixed/polymorphism ratio for introns vs. synonymous variation is one way to detect selection on synonymous variation.
Of course, none of these complications apply if your species has a population size too small for mild selection on synonymous variation to be effective.

By John H. McDonald on 19 Sep 2006

Ideally, we would have both polymorphism and outgroup data so that we can look at site frequency spectra, polarize mutations (i.e., unfold the SFS), and compare polymorphism to divergence. I definitely agree with you about the non-neutrality of silent sites, although Andolfatto showed (for D. melanogaster) that synonymous mutations are "more neutral" than non-coding changes. All of these tests require some neutral standard against which to compare our putative locus under selection. Where that neutral standard comes from seems to be an unresolved issue to me.

By the way, if you write about the MK test, it's pretty sweet to have John McDonald (the M in MK) comment.