Methinks it is like a fox terrier

I've had, off and on, a minor obsession with a particular number. That number is 210. Look for it in any review of evolutionary complexity; some number in the 200+ range will get trotted out as the estimated number of cell types in a chordate/vertebrate/mammal/human, and it will typically be touted as the peak number of cell types in any organism. We have the most cellular diversity! Yay for us! We are sooo complicated!

It's an aspect of the Deflated Ego problem, in which scientists exercise a little confirmation bias to find some metric that puts humans at the top of the complexity heap. Larry Moran is talking about the various techniques people use to inflate the complexity of the genome, making special case arguments for novel molecular gimmicks that we mammals use to get far more ooomph out of our genes than those other, lesser organisms do.

As I was reading it, I had this sense of deja vu, and using my psychic powers, I predicted that someone was going to make the argument that because we mammals have so many more cell types than other organisms, there must be some genetic trick we're playing to increase the number of outcomes from our developmental processes, and that therefore there must be something to it. Because we are measurably more complex than other animals, there must be a mechanism to get more complexity out of our 20,000 genes than nematodes get out of theirs.

And did I call it? I did. Very first comment:

I dont think its a sign of an inflated ego to think mammals are more complex than flies. There are objective measures one could use such as cell type number, number of neurons or neural connectivity.

There's a problem with this claim, though. Many people, including quite a few prestigious scientists, believe that cell type number in various organisms has actually been measured, and you'll even find respected people like Valentine putting together charts like this:

[Image: chart of cell type numbers by animal group]

That chart is total bullshit. You know how I expressed my visceral repugnance for an MRA who made up a "sexual market value" chart? I feel the same rage when I see this chart. There is no data supporting it. There we see humans listed as having 210 cell types, and everything else is lesser: birds have only 187 cell types. Do you believe that? I sure as hell don't.

I periodically get a bit pissed off about this. I wrote about it in a thread on Talk.Origins in 2000, and I've put a copy of that below. I complained about it in a blog post from 2007. It hasn't sunk in. I still run into this nonsense fairly regularly.

The short answer: this number and the imaginary trend in cell type complexity are derived entirely from an otherwise obscure and rarely cited review paper, now about 50 years old, that contained no original data on the problem; the values are all guesswork, estimates from the number of cell types listed in histology textbooks. That's it.

The long answer, my digging from 13 years ago:

This is a topic in which I've long had an interest, of a peculiar and morbid sort. It's been a case of occasionally running into these arguments about cell types, and wondering whether I'm stupidly missing something obvious, or whether the authors of these claims are the cockeyed ones. I can't see a middle ground; it's one or the other. Maybe somebody here can point out how idiotic I must be.

The issue is whether we can identify a good measure of organismal complexity. One way, you might think, would be to look at the number of different cell types present. I first ran across this metric in the late '70s, in JT Bonner's book _On Development: the biology of form_. He has a number of provocative graphs in that book that try to relate various parameters of form to life history and evolution. Some of the parameters are easy to assess: maximum length, or approximate number of cells (which is just roughly proportional to volume). Others were messy: number of different cell types. Bonner didn't push that one too much, just pointing out that a plot of number of types vs. total number of cells was sorta linear on a logarithmic plot, and he kept the comparison crude, looking at a whale vs. a sequoia vs. a sponge, that sort of thing. He also said of counting cell types that it was "in itself an approximate and arbitrary task", but didn't say or cite where the numbers he used came from, or how they were obtained.

It came up again in Stuart Kauffman's work. He tried to justify his claim that the number of cell states (or types) in an organism was a function of the number of genes, and he put together a chart of genome size vs. number of cell types. It was glaringly bogus. He (or someone) clearly selected the data, leaving out organisms with what I guess he would consider anomalous genome sizes -- and Raff and Kaufman thoroughly trashed that entire line of argument in their chapter on the C-value paradox in _Embryos, Genes, and Evolution_, showing that one axis of Kauffman's graph has to be invalid. Nobody has touched on that other axis, the number of cell types, and I'm still wondering how anybody determined that humans have precisely 210 different kinds of cells, while flies have 50 (those numbers seem to have become canonized, by the way -- I've found several sources that cite them, +/- a bit, but very few say where they came from).

And then Morton mentions this interesting little paper that I hadn't seen before:

Valentine, JW, AG Collins, CP Meyer (1994) Morphological complexity increase in metazoans. Paleobiology 20(2):131-142.

[note to Glenn: the citation on your page is incorrect. It's in Paleobiology, not Paleontology]

Abstract.-The number of cell types required for the construction of a metazoan body plan can serve as an index of morphological (or anatomical) complexity; living metazoans range from four (placozoans) to over 200 (hominids) somatic cell types. A plot of the times of origin of body plans against their cell type numbers suggests that the upper bound of complexity has increased more or less steadily from the earliest metazoans until today, at an average rate of about one cell type per 3 my (when nerve cells are lumped). Computer models in which increase or decrease in cell type number was random were used to investigate the behavior of the upper bound of cell type number in evolving clades. The models are Markovian; variance in cell type number increases linearly through time. Scaled to the fossil record of the upper bound of cell type numbers, the models suggest that early rates of increase in maximum complexity were relatively high. The models and the data are mutually consistent and suggest that the Metazoa originated near 600 Ma, that the metazoan "explosion" near the Precambrian/Cambrian transition was not associated with any important increase in complexity of body plans, and that important decreases in the upper bound of complexity are unlikely to have occurred.

At least, the paper *sounds* interesting. After reading it, though, I'm left feeling that it is an awful, lousy bit of work.

The first major flaw: there is no data in the paper. The first figure is a plot of cell type number against age, in millions of years before the present -- the numbers and groups described are listed on Glenn Morton's page. These are the observations against which several computer models will be compared. These data were not measured by the authors, but were gleaned from the literature. The sources for these critical numbers are listed in an appendix, about which more in a little bit.

The bulk of the paper is about the computer models they developed. The final figure is the same as the first, showing the data points from the literature with the plot generated by their best-fit simulation superimposed. It's a very good fit. From this, they make several conclusions: 1) that their model is in good agreement with the historical data, 2) that the rate of increase in complexity was greatest near the origin of metazoans, 3) that that origin was relatively late, and 4) there was no particular change in rate during the Cambrian explosion. It is a fine example of GIGO.
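To make the abstract's "Markovian" claim concrete, here is a minimal sketch of an unbiased random walk in cell type number. It is not the authors' model: the function name `random_walk_lineages`, the number of lineages, the step size of one cell type, and the seed are all assumptions chosen for illustration, and the walk tracks the net change in cell type number with no floor at zero. It only shows the general behavior the abstract describes, namely that variance across lineages grows roughly linearly with the number of steps, and that the maximum across lineages drifts upward even though no lineage is biased toward gaining cell types.

```python
import random
from statistics import pvariance


def random_walk_lineages(n_lineages=2000, n_steps=200, seed=1):
    """Simulate independent lineages whose cell type number changes at random.

    Hypothetical illustration only. Returns a list of snapshots, where
    snapshots[t] holds the net change in cell type number of every
    lineage after t steps.
    """
    rng = random.Random(seed)
    changes = [0] * n_lineages
    snapshots = [list(changes)]
    for _ in range(n_steps):
        # Each lineage gains or loses one cell type with equal probability.
        changes = [c + rng.choice((-1, 1)) for c in changes]
        snapshots.append(list(changes))
    return snapshots


if __name__ == "__main__":
    snapshots = random_walk_lineages()
    for t in (50, 100, 200):
        step = snapshots[t]
        # Print the step, the variance across lineages, and the upper bound.
        print(t, round(pvariance(step), 1), max(step))
```

Whatever curve such a model produces, it can only be as meaningful as the cell type numbers it is scaled to.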

The work is completely reliant on the validity of the data about cell type number, which is not generated by the authors, and worse, which is not even critically evaluated by the authors. It is just accepted. That data left me cold, though, with lots of questions.

What is a cell type? There was no attempt to define it. Histologically, it's a fuzzy mess -- you can go through any histology text and find long lists of cell types that have been recognized by morphology, location, staining properties, and so forth. I just skimmed through the index of an old text I have on hand (Leeson and Leeson), and without trying too hard, counted a bit more than a hundred distinct, named, vertebrate cell types in the first 5 pages...and there were 25 more pages to go. What criteria are the authors using? How well do these superficial criteria for identification mesh with the molecular reality of the processes that shape these cells?

Why did they throw out huge categories of cells? The nervous system is simply not considered -- it's 'lumped'. This seems to me to be grossly inappropriate. Here is this HUGE heap of cellular diversity, in which half the genome is involved, and it is discarded in what are supposedly quantitative models. I can guess that it was thrown out because it is impossible to quantify...but that doesn't sound like a good excuse if you are trying to model numbers. Furthermore, they only count cells in adults, so cell types found only in larvae or juveniles are rejected. Whoops. Isn't that an admission that complexity in arthropods is going to be seriously underestimated? I don't know, since they don't say how they define a cell type.

How did they get these tidy single numbers for a whole group? 'Arthropods' have only 50 cell types. They admit that "within some groups there is a significant range of cell type numbers". The range of variation, however, is not reflected in any of their graphs, nor do they say which groups exhibit this range. Instead, they say, they picked a representative "primitive number" of cell types from "the more primitive living forms within each group". I guess the more primitive living forms haven't done any evolving.

A really bothersome and related point: the high end of their plot is anchored by the hominids, with 210 cell types and a time of origin within the last few million years. Remember, they are going to fit all these computer-generated curves to these data, and they explicitly scale everything to this endpoint and an earlier one. This point is invalid, though. We humans don't have any novel cell types that were generated a few million years ago -- that number of 210 cell types ought to be applied to all of Mammalia, and the time of origin shoved back a hundred million years. Or more. Is there any reason to think 200 million year old therapsids were lacking any significant number of histological cell types found in mammals today?

For that matter, why should we think that these cell type numbers are anything but arbitrary indicators of the relative amount of time histologists have spent picking over the tissues of these various organisms? Do fish really have fewer cell types than mammals, or just different ones? Fish may lack all the cell types associated with hairs, but we don't have all the ones that form scales. The authors show amphibians as being more complex than fish, on the basis of cell type counts in living forms...and that is completely the reverse of what I would expect, if I thought there was any difference at all.

What was really the killer for me, and what I was really looking for, was the primary sources for these numbers. These are listed at the very end, in a separate appendix. A few are easy: it's not hard to imagine being able to count all the different cell types in a sponge or a jellyfish. One is admitted speculation by Valentine -- he estimates the number of cell types a primitive hemocoelic bilaterian must have had. Another, the number of cell types in arthropods, is cited as an unpublished ms by Valentine. However, almost all of the counts boil down to one source, a critical source I haven't yet been able to find. This very important paper, that purports to give cell type numbers for echinoderms, cephalopods, fish, amphibians, lizards, and birds, is:

Sneath, PHA (1964) Comparative biochemical genetics in bacterial taxonomy. pp 565-583 in CA Leone, ed. _Taxonomic biochemistry and serology_. Ronald, New York.

It's a paper about bacterial taxonomy? And biochemistry? The only discussion in the text of the Valentine paper about this source mentions that it compares DNA content to cell type number, a measure that Raff and Kaufman have shown most emphatically to be invalid. And it's from 1964, although the author seems to still be around and active in bacterial taxonomy and molecular biology right up until at least a few years ago. He doesn't look like a histologist or comparative zoologist though, that's for sure.

It's from 1964. Oh, boy. I did manage to track down a copy of this volume in a library a few miles away, but I haven't yet been able to get out and read it. I'm not too inclined to even try right now, because this appendix also has a little subscript in fine print at the bottom...virtually every source in this list, including Sneath, is marked with an asterisk, and the fine print tells us that that means "estimates NOT [my emphasis] documented by lists of cell types or by references to published histological descriptions". In other words, there ain't no data there, either.

I'm afraid to look up Sneath, for fear that it will turn out to be an estimate of cell type number derived from measures of DNA content, with a bit of subjective eyeballing tossed in. At least that would explain why Kauffman could find a correlation between DNA content and complexity, though.

From my perspective right now, this whole issue of cell type number is looking like a snipe hunt, a biological myth that is receding away as I pursue it. Does anybody know any different?

I didn't have quick access to the all-important Sneath paper, but Mel Turner did, and he summarized it for everyone.

…there's no original data. Here's the relevant text:

"Although there are many possible correlations, for example, that between cell size and DNA content (135), it seems plausible to suggest that the amount of DNA is largely determined by the amount of genetic information that is required and that this will be greater in the more complex organisms. Fig. 38-2 shows the distribution of DNA contents of haploid nuclei taken from the literature, mostly from several compendia (4,10,87,128,134,135). The haploid nucleus was chosen for uniformity, and because the genetic information in diploids is presumably mostly reduplicated. The values are plotted against the number of histologically distinguishable cell types in the life cycle of the organism (suggested by a figure of Zimmerman (141)). This number is some measure of complexity, and was estimated from standard textbooks (5,13,85,126). In Fig. 38-2 organisms incapable of independent multiplication (e.g., viruses) have been assigned to the 0.1 cell level. The values for some well-known organisms are shown in Fig. 38-3."

Fig. 38-2 is a graph of number of cell types (Y-axis) vs. log content of DNA/gamete, with an extra superimposed x-axis of "number of bits" ("one nucleotide pair = two bits"). No species names are indicated, but there are clusters of multiple separate points plotted for "mammals", "birds", "fish", "angiosperms", "bacteria", "algae & fungi", "viruses", etc. [oddly, he scores "RNA viruses" as having DNA content].

Fig. 38-3 purports to show "the histological complexity of some well-known organisms" with a log graph placing examples like "Man, Mammals" at the top with ca. 200 cell types, and "birds", "reptiles", "amphibia", "fish" [again, no species names] just below that, then various cited generic names of plants, animals, protists and bacteria [e.g., Pteromyzon (sic), Sepia, Helix, Ranunculus, Polypodium, Escherichia, etc.; about 50 taxa altogether]. Strictly unicellular organisms with different cell types during the life cycle [cysts, spores, gametes, etc.] are properly scored as having histological complexity; e.g., Plasmodium is scored with ca. 6 cell types.

There's also discussion of the significance of the reported rough correlation of complexity and DNA content, a suggestion that histologically complex organisms should require disproportionately many times the DNA amounts of simple ones [cell specialization and regulation], a mention of some plants and amphibia with 'unexplained' very large DNA contents, and a page of stuff on base-pair changes, informational "bits", & Kimura.

Table 38-3 "estimated amount of genetic and phenetic change in vertebrate evolution" looks pretty odd indeed [especially in a paper on bacterial biochemistry!]; it apparently tries to say something about times of origin and amounts of DNA change [% and in "bits"] for classes, orders, families, genera, species.... a bit dubious, to put it mildly.

Looking at the References list for the anatomical data sources cited for Figs 38-2 and 38-3, the "standard textbooks" were indeed just that:

5. Andrew, W. 1959. Textbook of Comparative histology. Oxford Univ. Press, London

13. Borradaile, L.A., L.E.S. Eastham, F.A. Potts, & J. T. Saunders. 1941. The Invertebrata: A manual for the use of students. 2nd ed. Cambridge Univ. Press, Cambridge.

85. Maximow, A.A. & W. Bloom. 1940. A textbook of histology. W. B. Saunders Co., Philadelphia.

126. Strasburger, E., L. Jost, H. Schenck, & G. Karsten. 1912. A textbook of botany. 4th English ed. Macmillan & Co. Ltd. London.

The Zimmerman citation from above is: Zimmerman, W. 1953. Evolution: Die Geschichte ihrer Probleme und Erkenntnisse. Alber, Freiburg & Munchen 623 pp.

Stephen Jay Gould wrote about a similar issue in Bully for Brontosaurus, in his essay on "The Case of the Creeping Fox Terrier Clone", which describes how certain conventions, like describing the size of a horse ancestor as being as large as a fox terrier, get canonized in the literature and then get reiterated over and over again in multiple editions of textbooks.

This one isn't as much a textbook problem as it is a deeply imbedded myth in the scientific literature. We haven't even defined what a cell type is, yet somehow, again and again, we find papers and books claiming that it has been accurately quantified, and further, that it supports a claim of increasing complexity that puts humans at the pinnacle.

STOP IT.

I seem to have written about this problem every 6 or 7 years, to no avail. I'll probably complain again in 2020, so look for a version of this post again, then.

Great article, thanks. Would you care to speculate about what kinds of parameters could be used to create a taxonomy of cell types?

The argument about number of cell types re fish/amphibians/"arthropods" is interesting. At a guess, fish would have a vast array of cell types compared to amphibians, but that's only because they're more widespread. What does that tell us about the complexity of any given lower level grouping? (Does it make sense to compare any group of animals by genus and number of cell types and ask, "What does this tell us about phylogenetic history, especially in relation to completely unrelated groupings?")

I suspect that the more familiar one is with an organism, the more types of cells will be recognized. Ask a plant biologist how many kinds of lymphocytes there are in a mammal, and he might remember two from freshman biology. Ask an immunologist, and she might come up with a couple of dozen.

By Nick Theodorakis on 04 Nov 2013

@Nick: That's kind of what I was thinking. Thanks for trying to answer my disorganized question.

Methinks it is like a weasel.

Does it make sense to compare any group of animals by genus

No. Genera don't even exist outside of classifications.

By David Marjanović on 12 Nov 2013
