How many genes do you share with your twentieth cousin?

John Hawks has an interesting post on what it means to be human in which he argues that our "human-ness" (humanity?) is our shared evolutionary history. I like it. But Hawks also writes the following:

It is our history that connects us to our distant relatives, not our genes. Even with a close relative like a twentieth cousin, there is a decent likelihood that you will share no genes at all because of your shared kinship from your most recent common ancestor. By the fiftieth generation, it is a virtual certainty. You are a genetic stranger to your ancestors.

I could share no genes with my twentieth cousin? This kind of sloppy use of terminology is not what I've come to expect from John. He's usually at his best when writing about human genetics. You see, the quoted statement could vary from true to wildly inaccurate depending on the definition of gene we're using.

Let's start with the molecular genetics definition of "gene". In this case, a gene is a locus on a chromosome that performs some function. Recent work has shown that individuals vary in the particular genes they carry (as a result of copy number polymorphisms, CNPs). That means I might have a gene you don't have, or you might carry a gene that I'm missing. But CNPs do not lead to totally different genes between two individuals. In fact, a large number of our genes are shared with our most distant mammalian relatives, indicating they're probably found in all humans as well. So, using this definition of "gene", Hawks is dead wrong. But that's not the definition he intended.

Next, let's consider a population genetics definition, where "gene" actually means "allele". In this case, "sharing a gene" with someone actually means having the same allele as another person. If we limit ourselves to the 20,000 or so protein coding genes in the human genome, what's the likelihood that two individuals have no genes with the same sequence in common? An analysis of coding sequence polymorphisms within humans and differences from chimpanzee genes found that 92.6% of the >10,000 genes analyzed had at least one protein coding polymorphism or difference (doi:10.1038/nature04240). That means, conservatively, a few percent of all human genes do not vary within human populations (if anyone can cite data on synonymous variation, I'd greatly appreciate it). If those genes don't vary within the human species, you're guaranteed to have at least some shared genes with everyone. But that's not what John Hawks meant either.

Finally, we'll consider a genealogical definition of gene. In this case, we're interested in whether the genes two people have are identical by descent because they were inherited from one of the most recent common ancestors of the two people. For every gene, full siblings have a 75% chance of sharing at least one copy by descent, according to this definition of a gene. Because we have two copies of each gene (one from the mother and one from the father), we can also determine the probability full siblings will share zero, one, or two copies. There's a 25% chance full siblings share zero copies, a 50% chance they share one copy, and a 25% chance they share both copies. First cousins will have a 25% chance of sharing a copy of a given gene. The probabilities of sharing genes by descent continue to decrease as you begin to deal with more distant relatives. This is what John Hawks meant when he wrote that you probably don't share many genes with most of your relatives.
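The sibling numbers above can be checked by brute force. The sketch below is only an illustration: it assumes a single locus, two unrelated parents, and four distinguishable parental alleles (the labels `m1`, `m2`, `f1`, `f2` are made up for the example), then counts every equally likely pair of children.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels for the four parental alleles at one locus,
# so that identity by descent can be tracked.
MOTHER = ("m1", "m2")
FATHER = ("f1", "f2")

def sibling_sharing_distribution():
    """Return {alleles shared by descent: probability} for full siblings."""
    counts = {0: 0, 1: 0, 2: 0}
    # Each child receives one maternal and one paternal allele,
    # and all four combinations are equally likely.
    children = list(product(MOTHER, FATHER))
    for sib_a, sib_b in product(children, repeat=2):
        shared = len(set(sib_a) & set(sib_b))
        counts[shared] += 1
    total = len(children) ** 2
    return {k: Fraction(v, total) for k, v in counts.items()}

dist = sibling_sharing_distribution()
print(dist)         # chance of sharing zero, one, or two alleles
print(1 - dist[0])  # chance of sharing at least one allele
```

Under those assumptions the enumeration gives 1/4, 1/2, and 1/4 for zero, one, and two shared alleles, so the chance of sharing at least one is 3/4.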

Despite the low probability of having the same genes passed on to you as your cousins had passed on to them, you still share an evolutionary history with them. In fact, all humans share an evolutionary history -- a tighter history than we share with our closest living relatives, the chimpanzees. And that's what Hawks says makes us human. As I said before, I like that definition, but I don't think it's the only one we can use.


Bustamante et al. 2005. Natural selection on protein-coding genes in the human genome. Nature 437: 1153-1157 doi:10.1038/nature04240

Gillespie 2004. Population Genetics: A Concise Guide. Johns Hopkins University Press, Baltimore.


Thanks, RPM -- I think that's a good clarification, but I thought the original "because of kinship from your most recent common ancestor" was clear also. Anyway, the point is as you describe -- you and your twentieth cousin share nothing that you don't share with a random stranger.

To someone who doesn't spend a lot of time thinking about relatedness & descent it would be a confusing statement, but it's not sloppy. If you start by assuming he meant it in the genealogical sense, which should have been implied given the context, then it's a perfectly precise statement, as evidenced by the fact that you and I know exactly what he meant.

The probabilities of sharing genes by descent continue to decrease as you begin to deal with more distant relatives.

To a point.

You're skipping the fact that our ancestry doesn't continue branching like a tree. This is most obvious if your father and mother are also brother and sister (leaving you 2 grandparents instead of 4), but is more typical for relations several generations removed and was even more common in earlier centuries. Go back 20 generations and you almost certainly can't count 2-to-the-20th distinct ancestors. Populations were largely separated and homogenized. That's why Icelanders look distinct from Polynesians.
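For scale, here is the count of ancestor slots a strictly branching tree would require. This is just arithmetic, with the (unrealistic) assumption that no ancestor ever appears twice in the pedigree; the function name is made up for the example.

```python
def nominal_ancestors(generations):
    """Ancestor slots at a given generation if no ancestor appears twice."""
    return 2 ** generations

for g in (1, 2, 10, 20):
    print(g, nominal_ancestors(g))
```

Twenty generations back that comes to 1,048,576 slots, which is why the same people must fill many of them in any real population.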

Thanks for this post. As a layman, I still don't have a firm grasp on a number of concepts that get lumped together under the term "gene". Your differentiation between "allele" and "genealogical gene" is helpful...I'd love to see more written about distinctions between the two concepts.

To give you an example of my confusion, you point out that a meaningful percentage of "genes" (allele pairs?) do not vary within human populations. OK, that's consistent with the notion that humans are incredibly similar genetically (99.9%+). But then I read that I share less than 1% of my "genes" with my 5th cousin, who is obviously far more genetically similar to me than 99.9% of humanity.

Like I said, I'm confused. It would be great to find a primer that explains the physical distinctions between an allele and a "genealogical gene".

While I'm asking questions, help me with your first cousin math. Wouldn't there be a 5/16 chance (31%) that we share a particular gene? (1/4 chance that we share 1 gene + 1/16 chance that we share 2 genes) Or am I doing something wrong?

Welcome back! You've been missed!

John, it was clear, but only after I read it a few times. What you wrote makes perfect sense, but it didn't register with me until after I had written my post. At that point, I went ahead and published it because I still thought it clarified what you were saying.

Jinchi, yes, I am assuming that there is no inbreeding within the genealogy. It's not a valid assumption, but it's still unlikely that 20th cousins share more genes by descent than a random draw from the same population even if we do allow for inbreeding. Now, if we draw from different populations, that's a whole other story, as you point out.

On the "what it means to be human" panel:

Collins got widespread agreement when he suggested that engineering or selecting genetic improvements was a non-starter: "it implies that someone knows what an improvement is, and our track record there is a little problematic."

I think cancer-free children would be an improvement. But as long as we can do it by removing some of their organs and subjecting them to lifetime medication, we should stick with that, because eugenics is bad, mmkay?

What comprises the genealogical definition of a gene? The first and second definition are clear, but what makes the third distinct? To a molecular biologist it seems theoretical or conceptual.

The chance of sharing a particular allele with a first cousin is actually 0.125, not 0.25. A first cousin is a third degree relative--one degree to your parent, the second degree to your aunt or uncle, and the third degree to your cousin. At each step, there's a 0.5 chance of sharing an allele, so for a third degree relative the overall chance is 0.5 to the third power, or 0.125.

Regarding Eric #5's question, normal first cousins cannot share both alleles of a gene, because each cousin has an allele from an unrelated parent who married into the family. (An exception would be double first cousins, the offspring for example of two sisters who married two brothers.)

The probability that first cousins have one allele in common is 0.25. For them to share the allele by descent, the same allele must be passed to each of them from one of their grandparents. For this to happen, the allele must be passed from one grandparent to both of that grandparent's children (i.e., one parent of each cousin). The parents of the cousins (i.e., a pair of siblings) must also pass that allele on to each of the cousins. The probability of each of these events happening is 0.5, and there are four independent events (two grandparent to parent/aunt/uncle transmissions, and two parent/uncle/aunt to cousin transmissions). That means the probability of that particular allele getting passed to both cousins is 0.5^4 = 0.0625. But there are four possible alleles that can get passed from grandparents to the cousins. These outcomes are mutually exclusive, so we can add their probabilities, and we get 4 * 0.0625 = 0.25.
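The same answer falls out of an exhaustive count. The sketch below assumes one locus, one shared pair of grandparents with four distinguishable alleles (the labels `a` to `d` are invented for the example), and no inbreeding; the allele each cousin gets from the parent who married in is ignored because it cannot be identical by descent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels for the shared grandparents' alleles at one locus.
GRANDMOTHER = ("a", "b")
GRANDFATHER = ("c", "d")

def first_cousin_sharing_probability():
    """P(first cousins carry the same grandparental allele by descent)."""
    # Each sibling parent carries one allele from each grandparent.
    sibling_genotypes = list(product(GRANDMOTHER, GRANDFATHER))
    shared = 0
    total = 0
    for parent_1, parent_2 in product(sibling_genotypes, repeat=2):
        # Each parent passes one of their two alleles to their child.
        for allele_1, allele_2 in product(parent_1, parent_2):
            total += 1
            if allele_1 == allele_2:
                shared += 1
    return Fraction(shared, total)

print(first_cousin_sharing_probability())
print(4 * 0.5 ** 4)
```

Of the 64 equally likely outcomes, 16 leave both cousins holding the same grandparental allele, which matches 4 * 0.5^4 = 0.25.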

kghales, are you assuming a particular allele is restricted to a single lineage within a population? If not, it is possible that normal cousins could share two of the same alleles, but not from the same source. I think this would be the case for all but very rare alleles.

By Jim Thomerson (not verified) on 19 Jun 2008 #permalink

I would imagine that many people who are 20th cousins of each other are 20th cousins of each other by multiple genealogical pathways, as well as being multiple 21st cousins of each other, and so on.

Actually, being at most 20th cousins is pretty far apart. For example, my vague impression is that when genealogists compete to link the new President of the United States to the last President as closely as possible, the winner usually comes up with something like 10th cousins. But, of course, the two Presidents are linked by a lot of other slightly less close genealogical pathways. (Of course, the great majority of Presidents have been of British descent, so they are more closely related than two average random Americans.)

The Iceland national genealogy database could quantify this for one particular population.

RPM (#11), yes, your math makes sense. I think the confusion arises because both of the following statements are true:
1. First cousins share 12.5% of their alleles.
2. For a given gene, there's a 25% chance that first cousins share an allele.
It becomes more intuitive to consider that at the outset, each cousin has half of his/her genetic material from an unrelated person marrying in. Within the remaining half, there's a 25% chance of sharing an allele for any given gene. All together that means having 12.5% of alleles in common.
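A rough sketch of how the two statements fit together, and how quickly both numbers shrink for more distant cousins. It assumes a single shared pair of ancestors and no inbreeding, and it counts two extra transmissions for every additional degree of cousinship; the function names are made up for the example. These are per-gene probabilities only and say nothing about how recombination breaks the genome into segments.

```python
from fractions import Fraction

def per_locus_sharing(n):
    """P(n-th cousins share an allele by descent at a given gene)."""
    # n + 1 transmissions down each of the two lineages.
    meioses = 2 * (n + 1)
    # Four ancestral alleles, each needing every transmission to succeed.
    return 4 * Fraction(1, 2) ** meioses

def expected_fraction_shared(n):
    """Expected fraction of alleles shared by descent."""
    # Half of each cousin's alleles come from the unrelated lineage.
    return per_locus_sharing(n) / 2

for n in (1, 2, 5, 20):
    print(n, per_locus_sharing(n), float(expected_fraction_shared(n)))
```

For first cousins this reproduces the 25% and 12.5% figures; for 20th cousins the per-gene probability drops to 1 in 2^40.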

Jim #12, it's certainly true that alleles are in multiple lineages, but the discussion here is primarily on identity by descent.

But Hawks doesn't say that you and your 20th cousin share no genes. He says "Even with a close relative like a twentieth cousin, there is a decent likelihood that you will share no genes at all because of your shared kinship from your most recent common ancestor." You seem to have missed that important qualifier.

When I first started reading this post, I assumed Hawks was talking about genealogical descent from the outset. I thought RPM's objection would be that two people who share a great^19-grandparent (= 20th cousins) would very likely share other common ancestors at least as recently.

But in #7, RPM says:

I am assuming that there is no inbreeding within the genealogy. It's not a valid assumption, but it's still unlikely that 20th cousins share more genes by descent than a random draw from the same population even if we do allow for inbreeding.

Is that really true? 20 generations would be at least 300 years ago (assuming 15y/gen as a lower limit), and probably closer to 400, right? If two people in the US share a common ancestor from the 17th century, that ancestor likely didn't even live in North America.

So what is the average relatedness of, say, a US citizen of European descent? Does anyone know? If you pick two people at random, how many generations back would you have to go to have a 50%+ chance of finding a common ancestor?