Is Nested Clade Analysis Worthwhile? [evolgen]

Population biologists often want to infer the demographic history of the species they study. This includes identifying population subdivision, expansion, and bottlenecks. Genetic data sampled from multiple individuals can often be used to study population structure. When phylogenetic methods are used to link evolutionary relationships to geography, the approaches fall under the rubric of phylogeography.

The past decade has seen a rise in the popularity of a particular phylogeographical approach for intra-specific data: nested clade analysis (Templeton et al. 1995; Templeton 2004). Many of the methods used in intra-specific phylogeography have been called into question because of their lack of statistical rigor, as I have described previously (How do you really feel, Dr. Wakeley?). Nested clade phylogeographical analysis (NCPA) is no exception. Lacey Knowles summarizes the criticisms of NCPA in the most recent issue of Evolution (Why does a method that fails continue to be used?).

The strongest part of Knowles' critique focuses on one primary issue: NCPA tends to produce false positives. While NCPA does an adequate job of inferring actual demographic events (such as population subdivision), it also identifies extra events that never happened. Knowles points out that it is especially biased towards inferring isolation by distance when there has not been any. Interestingly, these false inferences occur with both empirical data (for which the demographic history is assumed to be well known) and simulated data (for which the demographic history is known). Alan Templeton (the creator of NCPA and its most ardent defender) argues that the simulation studies are not an adequate test of NCPA because they only offer simple evolutionary scenarios. Knowles counters with a question: if NCPA fails with simple scenarios, how can it be trusted with the complicated ones that exist in nature?

NCPA is a very popular method -- Remy Petit identified over 1700 citations as of about one year ago (doi:10.1111/j.1365-294X.2007.03589.x). Additionally, Knowles points out that a six-year-old critique of NCPA (Knowles and Maddison 2002) has been cited 210 times, often by empirical studies that still used NCPA! That raises the question: why do people continue to use NCPA if it hasn't been shown to work? It can't be because they don't know of the limitations of NCPA -- they're citing papers that lay out those limitations.

Finally, I will relate this to previous rants on evolgen. Knowles points out that some of the criticisms of NCPA rely on the inference of historical events from simulation studies of a single locus. As I have mentioned previously, inference of historical demographic events from a single locus is not acceptable (see here and here). There is so much stochastic noise in evolutionary systems, and trying to identify demographic history using a sample size of one does not take the large variance of the system into account. Templeton and other NCPA defenders argue that simulations using a single locus are not an adequate test of NCPA. But 88% of the NCPA studies Knowles identified used only a single locus. Not only are people using a method that has never been shown to work, but they are also using the method with insufficient data. Double fail!
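To make the single-locus point concrete, here is a minimal sketch of how noisy one gene genealogy is. It is not NCPA and not anything from Knowles' paper. It assumes the standard neutral coalescent with constant population size and unlinked loci, measures time in units of 2N generations, and uses function names (`simulate_tmrca`, `multilocus_mean_tmrca`, `coefficient_of_variation`) and sample sizes that I made up for illustration.

```python
import random
import statistics


def simulate_tmrca(n_samples, rng):
    """Draw one time to the most recent common ancestor (TMRCA) for a
    single locus under the standard neutral coalescent (assumed model),
    in units of 2N generations."""
    total = 0.0
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2  # coalescence rate while k lineages remain
        total += rng.expovariate(rate)
    return total


def multilocus_mean_tmrca(n_samples, n_loci, rng):
    """Average the TMRCA over n_loci independent (unlinked) loci."""
    return statistics.fmean(
        [simulate_tmrca(n_samples, rng) for _ in range(n_loci)]
    )


def coefficient_of_variation(values):
    """Standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_samples, n_loci, n_reps = 20, 10, 5000  # illustrative choices only

single_locus = [simulate_tmrca(n_samples, rng) for _ in range(n_reps)]
ten_loci = [multilocus_mean_tmrca(n_samples, n_loci, rng) for _ in range(n_reps)]

print("CV of TMRCA, one locus: ", round(coefficient_of_variation(single_locus), 2))
print("CV of TMRCA, ten loci:  ", round(coefficient_of_variation(ten_loci), 2))
```

Every replicate here shares exactly the same demographic history, yet the single-locus TMRCA bounces around far more than the ten-locus average. Any method that reads history off one genealogy has to contend with that spread.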


Knowles and Maddison 2002. Statistical phylogeography. Mol Ecol 11: 2623-2635

Knowles 2008. Why does a method that fails continue to be used? Evolution 62: 2713-2717

Petit 2008. The coup de grâce for the nested clade phylogeographic analysis? Mol Ecol 17: 516-518 doi:10.1111/j.1365-294X.2007.03589.x

Templeton et al. 1995. Separating Population Structure from Population History: A Cladistic Analysis of the Geographical Distribution of Mitochondrial DNA Haplotypes in the Tiger Salamander, Ambystoma tigrinum. Genetics 140: 767-782

Templeton 2004. Statistical phylogeography: methods of evaluating and minimizing inference errors. Mol Ecol 13: 789-809 doi:10.1046/j.1365-294X.2003.02041.x
