Adaptive Evolution in Genomes

By evolgen on September 18, 2006.

Adam Eyre-Walker has published a review of adaptive evolution in a few well studied systems: Drosophila, humans, viruses, Arabidopsis, etc. These organisms have been the subject of many studies that used DNA polymorphism, DNA divergence, or a combination of the two to detect natural selection in both protein coding and non-coding regions of the genomes. Now that we have whole genome sequences for multiple closely related species from a few different taxa, many researchers are interested in determining the role of natural selection in the evolution of DNA sequences.

Eyre-Walker claims that the evidence for adaptive evolution is greater in Drosophila than in humans. But JP at GNXP thinks that Eyre-Walker doesn't give the full story of adaptive evolution in the human genome, leaving out important examples. Eyre-Walker relates the difference in adaptive evolution between these two well-studied species to differences in population size: humans have a smaller population size, so they fix fewer weakly advantageous mutations.

One way of measuring adaptive evolution is to compare polymorphism and divergence at synonymous and non-synonymous sites (the McDonald-Kreitman, or MK, test). Unlike some other tests (e.g., Tajima's D), the MK test is fairly robust to historical changes in population size, although an ancestral increase in population size may lead to an overestimate of advantageous substitutions. Eyre-Walker claims that this is not a concern for studies of a pair of model Drosophila species, for two reasons:
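To make the comparison concrete, here is a minimal sketch of the MK test as a 2x2 table of fixed (D) versus polymorphic (P) counts for non-synonymous (n) and synonymous (s) changes. The counts and function names are hypothetical, not taken from Eyre-Walker's review. Significance is assessed with a two-sided Fisher's exact test, which is one common choice; other tests of independence (such as a G-test) are also used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell == x) with all margins held fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def mk_test(dn, ds, pn, ps):
    """MK table: rows are fixed (D) vs polymorphic (P) sites,
    columns are non-synonymous (n) vs synonymous (s) changes."""
    if 0 in (dn, ds, pn, ps):
        raise ValueError("the neutrality index needs all four counts > 0")
    # neutrality index; NI < 1 means an excess of non-synonymous divergence
    ni = (pn / ps) / (dn / ds)
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts for one gene, for illustration only.
ni, p = mk_test(dn=20, ds=20, pn=10, ps=40)
print(f"NI = {ni:.2f}, Fisher p = {p:.4f}")
```

Under neutrality the non-synonymous/synonymous ratio should be the same for fixed and polymorphic sites (NI near 1); an excess of fixed non-synonymous differences is what gets read as adaptive evolution.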

"First, if anything, D. melanogaster appears to have gone through a population size decrease. Second, estimates using polymorphism data from either D. simulans or D. melanogaster are very similar; it is difficult to see how the bias could be the same given that the two species have different N_e." [References omitted.]

Eyre-Walker's citation for the D. melanogaster ancestral population size is a study that looked at codon bias. The effect of N_e on codon bias will persist much longer than that on polymorphism. It's more probable that D. melanogaster has been recovering from a small ancestral population size (one that left that signature in codon usage), and has in fact been increasing in population size. It seems to me that the estimate of adaptive evolution in Drosophila is a bit high because of the increased population size in both D. melanogaster and D. simulans.

As mentioned previously, MK tests are robust to many violations of the assumptions that underlie the tests. Eyre-Walker points out that slightly deleterious mutations may lead to biased estimates of advantageous substitutions:

"The exception is the segregation of slightly deleterious non-synonymous mutations, because these can bias the estimate of α [proportion of non-synonymous substitutions that have been fixed by adaptive evolution] either upwards or downwards depending on the demography of the population. If the population size has been relatively stable, the estimate of α is an underestimate, because slightly deleterious mutations tend to contribute relatively more to polymorphism than they do to divergence when compared with neutral mutations. These slightly deleterious mutations can be controlled for by removing low-frequency polymorphisms from the analysis, because such mutations tend to segregate at lower frequencies than do neutral mutations. However, slightly deleterious mutations can lead to an overestimate of α if population sizes have expanded, because mutations that might have been fixed in the past, when the population size was small, no longer segregate as polymorphisms. Even fairly modest increases in population size can create artifactual evidence of adaptive evolution." [References omitted.]
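The usual estimator of α from MK counts is 1 - (Ds Pn)/(Dn Ps). The sketch below, using hypothetical counts, shows the stable-population case from the quote: slightly deleterious non-synonymous polymorphisms inflate Pn without adding to Dn, which drags the estimate of α down.

```python
def alpha(dn, ds, pn, ps):
    """Estimate alpha, the proportion of non-synonymous substitutions fixed by
    adaptive evolution, as 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical locus with neutral polymorphism only.
baseline = alpha(dn=20, ds=20, pn=10, ps=40)
# The same locus after adding 10 hypothetical slightly deleterious non-synonymous
# polymorphisms: they inflate Pn but, being unlikely to fix, add nothing to Dn.
with_deleterious = alpha(dn=20, ds=20, pn=20, ps=40)
print(baseline, with_deleterious)  # 0.75 0.5
```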

Slightly deleterious mutations exaggerate the effect of changes in population size. I'm not pointing this out because of how it relates to adaptive evolution in Drosophila. Instead, I find the solution to this problem quite fascinating: remove low-frequency polymorphisms from the data set. This should remove most slightly deleterious polymorphisms from consideration (assuming constant population size). This immediately led me to think of a particular data set that has this quality built in: HapMap.
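A minimal sketch of the frequency-cutoff fix, assuming each polymorphism is recorded as a site class ("N" for non-synonymous, "S" for synonymous) plus the frequency of its rarer allele in the sample. The function name, the data, and the 5% cutoff are all hypothetical; there is no obviously correct cutoff.

```python
def alpha_with_cutoff(dn, ds, polymorphisms, cutoff):
    """Estimate alpha = 1 - (Ds * Pn) / (Dn * Ps), counting only polymorphisms
    whose minor-allele frequency is at least `cutoff`.

    polymorphisms: iterable of (site_class, maf) pairs, site_class "N" or "S".
    """
    pn = ps = 0
    for site_class, maf in polymorphisms:
        if maf < cutoff:
            continue  # drop rare variants, where slightly deleterious ones concentrate
        if site_class == "N":
            pn += 1
        elif site_class == "S":
            ps += 1
        else:
            raise ValueError(f"unknown site class: {site_class!r}")
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero after filtering")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical data: most non-synonymous variants are rare, most synonymous ones are not.
polys = ([("N", 0.02)] * 8 + [("N", 0.30)] * 2
         + [("S", 0.02)] * 10 + [("S", 0.30)] * 30)
for cutoff in (0.0, 0.05):
    print(cutoff, round(alpha_with_cutoff(20, 20, polys, cutoff), 3))
```

With no cutoff this hypothetical locus gives α = 0.75; dropping variants below 5% removes most of the rare non-synonymous polymorphisms and raises the estimate to about 0.93.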

One major criticism of much of the SNP data in circulation is that it suffers from ascertainment bias. Because SNPs are first identified in a small sample and then assayed in a larger sample, many rare SNPs are missed. This poses a big problem for tests that depend on the site frequency spectrum of polymorphisms (e.g., Tajima's D), but it could actually be useful if slightly deleterious mutations are segregating in the population. This assumes two things: that the researcher is using an MK-based test, and that the population size has been constant for many generations. We know that human populations have grown greatly over many generations, so we're probably still overestimating adaptive evolution if we don't take mildly deleterious mutations into account.
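Why rare SNPs drop out can be sketched with a simple model, assuming the discovery panel is n independently sampled chromosomes and a SNP is found only if both alleles show up in that panel. The panel size below is hypothetical and is not HapMap's actual design.

```python
def discovery_probability(p, n):
    """Chance that a biallelic SNP at population frequency p shows both alleles
    in a discovery panel of n independently sampled chromosomes."""
    if not 0.0 <= p <= 1.0 or n < 2:
        raise ValueError("need 0 <= p <= 1 and at least 2 chromosomes")
    # subtract the chance the panel is all one allele or all the other
    return 1.0 - p ** n - (1.0 - p) ** n


panel = 4  # hypothetical discovery-panel size, in chromosomes
for freq in (0.01, 0.05, 0.2, 0.5):
    print(f"{freq:.2f}: {discovery_probability(freq, panel):.3f}")
```

In this model a SNP at 1% frequency is found less than 4% of the time with a four-chromosome panel, while one at 50% is found 87.5% of the time, so the assayed SNPs are heavily skewed toward common variants.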

Comments

You should also consider mildly deleterious synonymous mutations as a possible cause of a significant MK test. In a species with strong codon bias (such as D. simulans), many synonymous polymorphisms will be the result of a mutation from a preferred codon to an unpreferred codon. Selection against the unpreferred codon is weak enough that it may appear as a polymorphism, but it is unlikely to become a fixed difference. The lower fixed-to-polymorphism ratio for synonymous variation will be interpreted as a relatively higher fixed-to-polymorphism ratio for amino acid variation, and thus as an excess of fixed amino acid differences due to adaptation.
One way to deal with this, as you point out, would be to use a frequency cutoff, so variants below a certain frequency aren't counted as polymorphisms. I'm not sure what the right cutoff would be, however. If you have enough data, it might be better to compare the frequency spectrum of different kinds of polymorphisms, similar to what Hiroshi Akashi has done when investigating codon bias. Using polymorphisms and fixed differences in introns as the basis of comparison could be informative, too; an MK test comparing the fixed/polymorphism ratio for introns vs. synonymous variation is one way to detect selection on synonymous variation.
Of course, none of these complications apply if your species has a population size too small for mild selection on synonymous variation to be effective.

Ideally, we would have both polymorphism and outgroup data so that we can look at site frequency spectra, polarize mutations (i.e., unfold the SFS), and compare polymorphism to divergence. I definitely agree with you about the non-neutrality of silent sites. That said, Andolfatto showed (for D. melanogaster) that synonymous mutations are "more neutral" than non-coding changes. All of these tests require some neutral standard against which to compare the putative locus under selection. Where that neutral standard comes from seems to me to be an unresolved issue.
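Polarizing with an outgroup can be sketched as follows, assuming a single outgroup whose base is taken as the ancestral state (simple parsimony, ignoring multiple hits and misoriented sites). The function names and the toy alignment columns are hypothetical. Sites that are monomorphic in the sample but differ from the outgroup are tallied as fixed differences, which gives the divergence side of an MK comparison; note that this lumps changes on both lineages together.

```python
def polarize(sites, n):
    """Unfolded SFS and fixed-difference count, using one outgroup as the ancestral state.

    sites: iterable of (sample_alleles, outgroup_allele), where sample_alleles is a
    string with one base per sampled chromosome (length n).
    Returns (sfs, fixed): sfs[i - 1] counts sites with i derived copies (i = 1..n-1).
    """
    sfs = [0] * (n - 1)
    fixed = 0
    for alleles, outgroup in sites:
        if len(alleles) != n:
            raise ValueError("every site needs exactly n sampled alleles")
        states = set(alleles)
        if len(states) == 1:
            if outgroup not in states:
                fixed += 1  # monomorphic but differs from outgroup: divergence
            continue
        if len(states) != 2 or outgroup not in states:
            continue  # multiallelic, or outgroup matches neither allele: cannot polarize
        derived = sum(a != outgroup for a in alleles)
        sfs[derived - 1] += 1
    return sfs, fixed


def fold(sfs, n):
    """Collapse an unfolded SFS onto minor-allele counts 1..n//2."""
    folded = [0] * (n // 2)
    for i, count in enumerate(sfs, start=1):
        folded[min(i, n - i) - 1] += count
    return folded


sites = [("AAAG", "A"), ("AGGG", "A"), ("CCCC", "T"),
         ("AAAA", "A"), ("ACGT", "A"), ("GGGA", "A")]
sfs, fixed = polarize(sites, n=4)
print(sfs, fixed, fold(sfs, n=4))  # [1, 0, 2] 1 [3, 0]
```

The folded spectrum lumps the singleton and the two high-frequency derived variants together, which is exactly the information an outgroup lets you recover.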

By the way, if you write about the MK test, it's pretty sweet to have John McDonald (the M in MK) comment.
