Phylogeny Friday - 2 June 2006

I wrote about the possibility of gene trees and species trees giving conflicting information in a previous Phylogeny Friday. In that example, the discordance was due to balancing selection maintaining multiple alleles across species boundaries. But can incongruities between genetic data and species history arise via entirely neutral processes? The answer is implied in the setup, but check out some of the details below the fold.

Humans and chimpanzees share HLA alleles that have been maintained by selection. These loci are far from ideal for reconstructing evolutionary relationships between species. Evolutionary biologists prefer neutral loci (loci not under selection) when studying species relationships. The evolutionary relationship among humans, chimps, and gorillas proved troublesome for many years because these three lineages diverged from each other in a (relatively) short period of time. There are three possible ways for these species to be related:

  1. Chimps and gorillas are the closest relatives with humans as the outgroup.
  2. Chimps and humans are the closest relatives with gorillas as the outgroup.
  3. Gorillas and humans are the closest relatives with chimps as the outgroup.

Different loci supported different hypotheses, but as the amount of data increased, hypothesis 2 turned out to be the most probable. This is based on the assumption that as you sample more loci throughout the genome, you stand a higher chance of capturing the "true" relationships of the species in question.
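For three species, the reasoning behind that assumption can be made concrete. The sketch below uses the standard neutral coalescent result for a rooted three-species tree, assuming a constant population size and no gene flow after divergence (assumptions that are not spelled out in this post). The function name and the labels X, Y, and Z are hypothetical; `t` is the internal branch length in coalescent units, taken here to mean generations divided by 2N for a diploid population of size N.

```python
import math

def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for a species tree ((X,Y),Z).

    t is the internal branch length in coalescent units (assumed here to be
    generations / 2N for a diploid population of constant size N).
    """
    # Probability that the X and Y lineages fail to coalesce on the internal branch.
    p_no_coal = math.exp(-t)
    return {
        # Matches the species tree: coalescence on the internal branch, or
        # 1 of the 3 equally likely pairings in the root population.
        "((X,Y),Z)": 1.0 - (2.0 / 3.0) * p_no_coal,
        # The two discordant topologies are equally likely.
        "((X,Z),Y)": p_no_coal / 3.0,
        "((Y,Z),X)": p_no_coal / 3.0,
    }

for t in (0.01, 0.5, 2.0):
    probs = three_taxon_gene_tree_probs(t)
    print(t, {name: round(p, 3) for name, p in probs.items()})
```

Under these assumptions the matching topology is never less probable than either alternative, however short the internal branch is. With three species, then, the most common gene tree across many neutral loci should point to the species tree, although a short branch means many loci are needed to see the difference.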

A recent study, however, calls that assumption into question. The figure below shows the relationship of four species, A, B, C, and D. The true species tree is shown by the outlined ("fat") tree -- A and B are the closest relatives, C is the next closest relative, and D is the outgroup. Within the fat tree are lines representing possible gene trees constructed from sequences from each of these species. The gene tree in panel (c) agrees with the species tree, while the gene trees in panels (a) and (b) do not.

[Figure: gene trees drawn inside the species tree, panels (a), (b), and (c)]

Anomalous Gene Trees for Four Taxa

Colored lines represent gene lineages that trace back to a common ancestor along the branches of a species tree with topology (((AB)C)D). The figure illustrates how a gene tree can have a higher probability of having a symmetric topology, in this case ((AD)(BC)), than of having the topology that matches the species tree. If the internal branches of the species tree--x and y--are short so that coalescences occur deep in the tree, the two sequences of coalescences that produce a given symmetric gene tree topology together have higher probability than the single sequence that produces the topology that matches the species tree.

(a) and (b) Two coalescence sequences leading to gene tree topology ((AD)(BC)). In (a), the lineages from B and C coalesce more recently than those from A and D, and in (b), the reverse is true.

(c) The single sequence of coalescences leading to gene tree topology (((AB)C)D).

If you skipped over the caption, the take-home message is that when the internal branches of the species tree (i.e., x and y) are short, a gene tree topology that conflicts with the species tree can be more probable than the topology that matches it. How short is short? Well, it depends on the lengths of the other branches. The authors acknowledge that the human-chimp-gorilla relationship probably does not suffer from the problem described above, as the internal branches appear to be long enough. If you want more details, go read the article.
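The counting argument in the caption can be checked by brute force. The sketch below covers only the limiting case in which x and y are so short that all four lineages reach the root population without coalescing, and it assumes that at each step every pair of remaining lineages is equally likely to merge (the neutral coalescent in a single population). The function names are hypothetical, and a topology is represented simply as the set of clades it contains.

```python
from collections import Counter
from itertools import combinations

def coalescence_sequences(lineages):
    """Yield (probability, clades) for every order in which lineages can merge.

    Each lineage is a frozenset of species labels. At every step each pair of
    remaining lineages is assumed equally likely to coalesce.
    """
    if len(lineages) == 1:
        yield 1.0, frozenset()
        return
    pairs = list(combinations(lineages, 2))
    for a, b in pairs:
        merged = a | b
        rest = [l for l in lineages if l != a and l != b] + [merged]
        for p, clades in coalescence_sequences(rest):
            yield p / len(pairs), clades | {merged}

def topology_probs(species="ABCD"):
    """Return (probability, number of sequences) per gene tree topology."""
    start = [frozenset(s) for s in species]
    totals, counts = Counter(), Counter()
    for p, clades in coalescence_sequences(start):
        totals[clades] += p
        counts[clades] += 1
    return totals, counts

def clades_of(*groups):
    """Helper: build a topology key from its clades, e.g. 'AD', 'BC', 'ABCD'."""
    return frozenset(frozenset(g) for g in groups)

totals, counts = topology_probs("ABCD")
symmetric = clades_of("AD", "BC", "ABCD")   # ((AD)(BC))
matching = clades_of("AB", "ABC", "ABCD")   # (((AB)C)D)
print("((AD)(BC)):  sequences =", counts[symmetric], " probability =", totals[symmetric])
print("(((AB)C)D): sequences =", counts[matching], " probability =", totals[matching])
```

There are 18 equally likely coalescence sequences for four lineages. Each symmetric topology is produced by two of them and each asymmetric topology by only one, so in this limit ((AD)(BC)) has probability 2/18 while (((AB)C)D) has probability 1/18. When x and y are longer, coalescences on the internal branches add probability to the matching topology, which is why the effect disappears once the branches are long enough.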

Some of my more informed readers may think that this sounds a lot like long-branch attraction, but I don't think the authors have simply rederived the Felsenstein Zone. Long-branch attraction occurs via homoplasies, whereas this model deals with lineage sorting.


Degnan JH, Rosenberg NA (2006) Discordance of Species Trees with Their Most Likely Gene Trees. PLoS Genet 2(5): e68
