A recent study of dog genetics, published in PLoS, seeks to improve the quality of genetic research by better understanding the underlying patterns of genetic variation at the level of specific dog breeds.
Sometimes we are interested in the evolutionary relationship between two "species" or populations, and genetics can be helpful. The more different the genetic sequence between two populations, the more distantly related they are (on average) and thus we can construct phylogenies ("family trees" of species or groups).
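The "more different means more distantly related" idea can be sketched with a very simple distance measure: the proportion of aligned sites at which two sequences differ. This is only an illustration. The sequences and population names below are made up, and real phylogenetic work uses far richer models than this.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def pairwise_distances(sequences: dict) -> dict:
    """Distance for every unordered pair of named sequences."""
    return {
        (x, y): p_distance(sequences[x], sequences[y])
        for x, y in combinations(sequences, 2)
    }


def closest_pair(sequences: dict) -> tuple:
    """The pair with the smallest distance, i.e. the first to join in a tree."""
    distances = pairwise_distances(sequences)
    return min(distances, key=distances.get)


# Hypothetical aligned sequences, invented for illustration only.
toy_sequences = {
    "pop_A": "ACGTACGTAC",
    "pop_B": "ACGTACGTAT",
    "pop_C": "TCGAACGGAT",
}

if __name__ == "__main__":
    for pair, dist in pairwise_distances(toy_sequences).items():
        print(pair, dist)
    print("closest:", closest_pair(toy_sequences))
```

A tree-building method would join the closest pair first and then attach the more divergent population further out.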
Sometimes we are interested in finding genes that are linked to particular phenotypes, like the gene for this or that disease. Finding a gene usually involves having a "probe," which is essentially a molecule that can locate a particular DNA sequence under the proper conditions. Probes tend to be small compared to the whole genome, and the genome is very big (and generally uninteresting at the detailed level). For this and other reasons, it is not the case that there are probes for just any "address" in the genome. One tends to work with the probes that exist where possible, using nearby addresses (near some area of interest) to navigate the actual genome in a particular sample.
Both of these efforts would be easiest if there were very little variation in the genetic makeup across individuals within a given population. If all cats had the same genome, and all rats had the same genome (at the most detailed level), then any one cat would be useful to inform us of all the genetic details of all cats, same with the rats, and the rat-cat relationship would be simple to work out. But of course, there is variation within species or populations, and that variation can be both large and patterned. By this I mean that it is not simply a matter of "more" or "less" variation ... there may be patterns to the variation that apply to one population that don't apply as much to some other population.
In other words, lack of a detailed understanding of the structure of genetic variation in a particular population leaves a fair amount of uncertainty. A better understanding of this structure would allow for the application of more appropriate analytical techniques, more secure results, and overall more useful research.
One example of variation is the location of genes themselves in relation to other specific genes and their alleles. (An allele is a variant of a gene ... different alleles may result in different products, say a "normal" one vs. one connected with a disease.) For any pair of genes, there is a certain probability that during reproduction the specific alleles will be inherited together ... this is called linkage. The pattern of inheritance for genes on different chromosomes is typically thought of as random ... there is no link between them. However, if two genes are right next to each other on the same chromosome, there is a pretty good chance that they will be inherited together. The farther apart they are on the same chromosome, the more 'random' the inheritance pattern is, due to crossing-over.
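One common textbook way to put numbers on this is Haldane's mapping function, which converts the genetic distance between two loci (in Morgans) into the probability that a crossover separates them. It is offered here only as an illustration: it assumes crossovers occur independently along the chromosome, and it is not something the study discussed below relies on.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Probability that two loci are separated by recombination.

    Uses Haldane's mapping function, r = 0.5 * (1 - exp(-2d)),
    which assumes no crossover interference.
    """
    if distance_morgans < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


if __name__ == "__main__":
    # Distances in Morgans (0.01 Morgan = 1 centimorgan); values are arbitrary.
    for d in (0.0, 0.01, 0.1, 0.5, 1.0, 3.0):
        print(f"{d:>5} M -> r = {haldane_recombination_fraction(d):.4f}")
```

At very short distances the recombination fraction is close to zero (the alleles travel together), and as the distance grows it approaches 0.5, which is the same as the 'random' inheritance expected for genes on different chromosomes.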
The same is true of the linkage between a genetic marker that may be used (with a probe) to find a particular area of interest, and the actual DNA sequences of interest. A marker and a gene of interest should be very close to each other, or they may get passed on randomly ... that would be a very inaccurate marker.
In both cases, as DNA sequences change over time, with the insertion or removal of sections of junk, or the movement of genes in relationship to each other, the linkage patterns of genes and of genes and their markers change. Say there is a gene that causes Disease X in many species of carnivores. There is little reason to expect that a marker for this gene in raccoons would be useful in finding this gene in distantly related pandas. But would the marker serve reliably within raccoons, or within pandas? It depends. You get the idea.
This is an example of fairly complex patterning in the DNA of a given population. If one is using markers to find disease-connected alleles, one would ideally have information on population-level patterning of linkage. Non-random behavior of genes (a particular allele being selected for or against, for instance) is often revealed by examining the linkage-related measures. So, understanding the pattern of linkage within a population is important.
What is needed is a better understanding of the nature of genetic variation within populations or subpopulations.
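Two widely used linkage-related measures are the linkage disequilibrium coefficient D and its normalized form r². The sketch below computes both from counts of the four possible two-locus haplotypes. The function name and the counts are hypothetical, chosen only to show the arithmetic.

```python
def linkage_disequilibrium(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r_squared) from counts of the four two-locus haplotypes.

    D = p_AB - p_A * p_B
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    r^2 is reported as NaN when either locus has only one allele present.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype must be observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second locus
    d = p_AB - p_A * p_B
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return d, r_squared


if __name__ == "__main__":
    # Made-up counts: alleles always travel together vs. fully shuffled.
    print(linkage_disequilibrium(50, 0, 0, 50))
    print(linkage_disequilibrium(25, 25, 25, 25))
```

An r² near 1 means a marker allele is a good stand-in for the allele at the other locus in that population. An r² near 0 means the marker says nothing about it.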
Now we come to the part about the dogs...
A new paper in PLoS, "Canine Population Structure: Assessment and Impact of Intra-Breed Stratification on SNP-Based Association Studies" by Quignon et al., explores this issue.
In gene studies of dogs, the problem of variation within breeds is usually managed by using a sample of a number of individuals as controls and individuals with a particular gene or condition of interest. One way to increase the utility of these studies is to sample (within a breed) individuals from different geographic areas. The separation in time and space between these individuals makes the individual sampling points more independent, which makes the statistical analysis more powerful. However, these practices, of even sampling of treatment and control, or of using geographically distinct populations, are based on (reasonable) assumptions about how the genetic structure underlying the actual dogs looks. The present study looks more closely at the reality of the underlying genetic patterning, to replace assumption with measured observation where possible.
These researchers looked only at a small selection of common breeds recognized in the U.S. and Europe: in particular, the Rottweiler, the Bernese mountain dog, the flat-coated retriever, and the golden retriever. These all have a genetic susceptibility to a certain class of cancer (e.g. malignant histiocytosis in the Bernese). They looked at a particular set of genetic data on one chromosome (canine chromosome 1) across 119 dogs.
We showed that each population is characterized by distinct genetic diversity that can be correlated with breed history. When the breed studied has a reduced intra-breed diversity, the combination of dogs from international locations does not increase the rate of false positives and potentially increases the power of association studies. However, over-sampling cases from one geographic location is more likely to lead to false positive results in breeds with significant genetic diversity. ... [thus] ... These data provide new guidelines for [statistical] studies using purebred dogs that take into account population structure.
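The false-positive risk described in that passage can be illustrated with a little arithmetic. The sketch below is not the paper's method. It takes a marker that has no effect on disease but differs in frequency between two subpopulations of a breed, and compares a design where cases are over-sampled from one subpopulation against a balanced design. All frequencies, sampling proportions, and sample sizes are invented for illustration.

```python
def mixed_allele_frequency(subpop_freqs, mixing_weights):
    """Expected allele frequency in a sample drawn from several subpopulations."""
    if abs(sum(mixing_weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    return sum(f * w for f, w in zip(subpop_freqs, mixing_weights))


def allelic_chi_square(case_freq, control_freq, n_case_alleles, n_control_alleles):
    """Pearson chi-square (1 degree of freedom) on expected allele counts."""
    observed = [
        [case_freq * n_case_alleles, (1 - case_freq) * n_case_alleles],
        [control_freq * n_control_alleles, (1 - control_freq) * n_control_alleles],
    ]
    row_totals = [n_case_alleles, n_control_alleles]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = n_case_alleles + n_control_alleles
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical neutral marker: allele frequency 0.2 in one region, 0.6 in another.
SUBPOP_FREQS = [0.2, 0.6]
N_ALLELES = 200  # 100 dogs per group, two alleles each

if __name__ == "__main__":
    controls = mixed_allele_frequency(SUBPOP_FREQS, [0.5, 0.5])
    skewed_cases = mixed_allele_frequency(SUBPOP_FREQS, [0.9, 0.1])
    balanced_cases = mixed_allele_frequency(SUBPOP_FREQS, [0.5, 0.5])
    # 3.84 is the usual 5% critical value for chi-square with 1 degree of freedom.
    print("skewed:  ", allelic_chi_square(skewed_cases, controls, N_ALLELES, N_ALLELES))
    print("balanced:", allelic_chi_square(balanced_cases, controls, N_ALLELES, N_ALLELES))
```

With these made-up numbers the skewed design produces a test statistic well above the usual 5% threshold even though the marker does nothing, while the balanced design produces none. If the two subpopulations had nearly the same allele frequencies (a breed with little internal diversity), the skew would matter far less, which is in line with the authors' conclusion.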
One question that comes to mind immediately for me is the difference between breeds that are, essentially, offshoots of some basic stock vs. breeds that are amalgams of multiple breeds. One could say that to some extent both are true of all breeds, but I think that would be wrong. For instance, the mountain dogs such as the Bernese and the Pyrenees are probably bred from Tibetan mastiffs more or less directly, thus involving a reduction in genetic variation within the breed. In contrast, the Newfoundland is also bred from a mastiff stock but possibly with another very distantly related breed added in for special effect (thus offsetting the variation). The Doberman is one of the most complex breeds of recent times, with several different breeds used to achieve a true-breeding, highly specialized form. Breeds that derive mainly from divergence should have different patterns (genetically) than breeds derived from combinatorial breeding.
Quignon, P., Herbin, L., Cadieu, E., Kirkness, E.F., Hédan, B., Mosher, D.S., Galibert, F., André, C., Ostrander, E.A., Hitte, C., Awadalla, P. (2007). Canine Population Structure: Assessment and Impact of Intra-Breed Stratification on SNP-Based Association Studies. PLoS ONE, 2(12), e1324. DOI: 10.1371/journal.pone.0001324
Just out of curiosity, do you have any evidence of your claims of the ancestry of the Great Pyrenees and the Bernese Mountain Dog? A link to the Tibetan Mastiff certainly does not seem to be the consensus of public opinion. For both breeds the histories record them as having been bred from local farm dogs.
Nice post. This seems like a good thread for this question: When a genome for a species, like humans, is said to be completely sequenced, does this mean that the genome for a single individual has been mapped or is the result more of a composite sketch built from a number of individuals?
John, I don't have time to lay out the argument for you here. I've read every word written on the dogs, and this is what I think is most likely, and it accords with the genetic evidence. Not necessarily the Tibetan mastiff, but some mountain dog from the Atlas-Nepalese axis. The TM is considered by most to be the most like the "aboriginal" mountain dog in the region.
This applies to the G.P. The BMD is a bit more obscure but it is probably derived from one of these mountain dogs.
(Oh: And public opinion and dogs... that's not a good source of information about dog history OR behavior!)
Gort: Good question. I think both have been true, usually the latter but sometimes the former. Maybe someone who knows will chime in here.
http://www.genome.gov/11006943 Frequently asked questions about the genome project from that link. Check out the "Whose DNA was sequenced for the Human Genome Project?" section.
The "official" or "reference" genome sequence that we usually refer to is a composite of different people. But there are individual genomes you can look at. For example, Jim Watson's genome can be viewed here: http://jimwatsonsequence.cshl.edu/
I just remembered Susie, though--too: the orangutan genome sequence (you can see the orangutan data in the UCSC Genome Browser here) came from Susie in the Gladys Porter Zoo; check the "assembly details" section on this page.
I would say that if all you have to go on is what the locals think is the history of the breed then that's all you have to go on. If there's more solid evidence then that is what you rely on. I don't see this as a life or death or even an ego thing. I just want to know more.
So to sum it up, I was hoping you'd cough up a couple of quick references to save me the trouble of looking for the papers myself. I guess I'll have to go do the work myself then.
Love the topic, btw. When I was a kid I had a chart of about 200 or so dog breeds and how it was thought they were related stuck to my wall. That and a map of Middle Earth.
This is a major research project of mine that I have put aside, but when I lay my hands on some materials I'll pass it on.
Per the Dog Genome Project:
The new "mountain" cluster, shown in purple in the structure graph on the left side of Figure 1, is anchored by the Bernese Mountain Dog and Greater Swiss Mountain Dog and includes other large dogs such as the German Shepherd and Saint Bernard (Fig. 1).
The other two clusters are the mastiff/terrier cluster, which first becomes apparent at K = 3, and the herding/sighthound cluster (Supplemental Fig. 5).
So it sounds like you're saying the Tibetan Mastiff falls into the Mountain Dog Cluster? In fact if what you say is correct it should anchor it.
Or possibly you dispute their findings (or their analysis) in which case I freely admit it's likely to go over my head pretty fast.
It doesn't look like the Great Pyrenees falls into that group. They're grouping it between Irish Wolfhound and Airedale Terrier.
I do not dispute this paper and the research it represents overall, but I absolutely do not agree that these clusters are accurate at any level of detail whatsoever. The primary reason for this is that the model assumes divergence of lineages. However, what has really happened with dog breeds is a steady flow of divergence (especially earlier in history, most likely) followed by frequent cross-breeding. For instance, there is a pile of evidence linking the Newfoundland and the Pyrenees. That may be wrong, but it is reasonable. But these two dogs do not show up in the same cluster. Why? Because the Newfie was bred from the Pyr by adding alleles from other breeds. Depending on what part of the genome is driving the phylogeny, this could cause the Newfie and the Pyr to remain identical, or it could cause the Newfie to float off to some other cluster.
In this case the Newfoundland is linked with the corgi and a bunch of terriers. DNA is not truth!
Why is the African Basenji linked only with Asian breeds?
The St Bernard is linked with spaniels and the greyhound, etc. I don't think so....
Fortunately, this is not a study of dog phylogeny.
By the way, the mastiffs on this chart are not the southern (i.e. Tibetan) mastiffs. They are English.
For those interested, here's a link to another paper on this issue. We actually point out the difficulties in inferring the dog breed phylogeny in the paper.