Figure 4 time (the data strikes back)

I thought I'd expand a bit more on why Svensmark's figure 4 is unacceptable (fig 4 of the arXiv version; fig 6 in Cosmoclimatology: a new theory emerges). Bear in mind that there is more wrong in the article than just this, though! The fig is:

[Figure: svensmark-fig-4.png]

I'm arguing about the lower line, which purports to be a 90-64S average. This is sourced to: http://data.giss.nasa.gov/gistemp/tabledata/ZonAnn.Ts+dSST.txt (90-64S zonal mean) and that's a perfectly reputable source. However, it is not one to be used blindly, as S does. You have to wonder about the data quality. And even a cursory think would lead you to wonder how much early data there is in there.

One hint is that the early data is more variable. If you look closely you can see this in S's fig. If you draw the raw data it's more obvious; and if you take the standard deviation it's 1 °C before 1957 and 0.4 °C after. That is because a whole pile of extra stations become available after 1957, which smooths things out. The table referenced says it also uses HadISST1 for SSTs in the early period, and it says it's a land-ocean mean. But that isn't consistent with the change in variability. (There is another table, http://data.giss.nasa.gov/gistemp/tabledata/ZonAnn.Ts.txt, which only has the stations in. But the difference between that and the former, for 90-64S, is very small.)
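The variability check is easy to reproduce. What follows is only a minimal sketch, not the script behind the numbers quoted: the function name `std_before_after` is made up for illustration, the series is synthetic rather than the GISS column, and putting 1957 itself in the "after" group is an assumption.

```python
import numpy as np

def std_before_after(years, temps, split_year):
    """Return (sample std before split_year, sample std from split_year on)."""
    years = np.asarray(years)
    temps = np.asarray(temps, dtype=float)
    early = temps[years < split_year]
    late = temps[years >= split_year]
    # ddof=1 gives the sample (n-1) standard deviation
    return early.std(ddof=1), late.std(ddof=1)

# Synthetic series, NOT real data: noisy early part, quieter later part.
rng = np.random.default_rng(0)
years = np.arange(1904, 2006)
temps = np.where(years < 1957,
                 rng.normal(0.0, 1.0, years.size),
                 rng.normal(0.0, 0.4, years.size))

sd_early, sd_late = std_before_after(years, temps, 1957)
print(f"std before 1957: {sd_early:.2f}, from 1957 on: {sd_late:.2f}")
```

With the real table you would pass the year column and the 90-64S column instead of the synthetic arrays.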

A good source of temperature data is the BAS READER project. A quick glance shows only Orcadas in the early years in the correct region (and even that's a bit wrong, since Orcadas is only at 60 S; maybe GISS is taking a 5 degree radius-of-influence to include it S of 64S?).

In fact, I can demonstrate that the early data *is* pure Orcadas by plotting it:

[Figure: giss-orcadas.png]

Black: GISS data. Blue: Orcadas station data. Note that the GISS values are anomalies, so I've adjusted them vertically to fit (by 3.6 °C, if you care). The early fit is so good it's clear that the GISS data *is* pure Orcadas. Which means the table description is odd? Anyway. I've also added 4th order polys a la S. Amusingly, the poly fit is fine, so I would have no complaints if S just used the Orcadas data. But then the fit wouldn't be so good after 1950; and he wouldn't be able to call it "Antarctic" temperatures.
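For anyone wanting to redo the overlay, here is a rough sketch of the two steps: shifting an anomaly series onto a station series, and fitting a 4th order polynomial. The names `best_offset` and `poly4` are hypothetical, the data is a toy series, and the sign convention of the offset (added to the anomalies) is an assumption; 3.6 is used only to echo the number above.

```python
import numpy as np
from numpy.polynomial import Polynomial

def best_offset(anomaly, station):
    """Constant to add to the anomalies so they match the station series
    in the least-squares sense (this is just the mean difference)."""
    anomaly = np.asarray(anomaly, dtype=float)
    station = np.asarray(station, dtype=float)
    return float(np.mean(station - anomaly))

def poly4(years, temps):
    """4th order least-squares polynomial fit.
    Polynomial.fit rescales the x-axis internally, which keeps a
    high-order fit on calendar years numerically well behaved."""
    return Polynomial.fit(years, temps, deg=4)

# Toy series, NOT real data.
years = np.arange(1905, 1951)
station = -4.0 + 0.5 * np.sin((years - 1905) / 8.0)  # made-up station temps
anomaly = station - 3.6                              # made-up anomaly series

offset = best_offset(anomaly, station)
fit = poly4(years, station)
print(f"offset: {offset:.2f}")
print("fitted value in 1930:", round(float(fit(1930)), 3))
```

Taking the mean difference over the overlap period is one simple way to choose the vertical shift; restricting it to the early years, where the two series agree, would be another reasonable choice.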

So: just to be clear: the early part of S's data comes from Orcadas, an island station at 60 S. It is *not* an average of 90-64S as he says: it's data from a single station. Any competent Antarctic-type reviewer would have caught this glaring error. This is a teensy bit of a problem for him, as his "Antarctic theory" is most pronounced S of 75S; arguably, 60 S should actually be in rest-of-world as far as he is concerned. I've no doubt though that his theory will prove sufficiently pliable to account for this :-)

[Update: the data sparsity is a bit more obvious via a map: e.g. for 1910 (thanks G) -W]

[A reader writes: Could you please be more clear? Write what is on the axes in different colors, what it is used by Svensmark for, and why you think that it's wrong as opposed to some vague cliches that the data are not enough. I thought I had been. OK, my pic (the lower one) shows in black the raw data used in the lower line in S's plot. They are the same, except that I haven't put a 12y filter through the data. Overplotted in blue is the raw Orcadas station data. From 1905 to 1950 the two overlie so exactly that it's hard to see the blue line unless you look closely, except for a few excursions (1945 is the most obvious) that are presumably caused by more data becoming transiently available for that year. This demonstrates that the data S is using from GISS really is the Orcadas data. Therefore it isn't a mean for 90-64S or anything like it (that is obvious from the data from 1960 on, which disagrees; if anything there is an antiphase relation, especially from 1980 on). I hope that's clear now -W]
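For reference, a "12y filter" could be something as simple as a 12-year running mean. The sketch below assumes a plain boxcar average; the post does not say what filter S actually used, and the name `running_mean` is made up for illustration.

```python
import numpy as np

def running_mean(values, window=12):
    """Boxcar running mean; returns only the points where the full
    window fits, so the output is window-1 samples shorter."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Toy input, NOT real data.
smoothed = running_mean([1.0, 2.0, 3.0, 4.0], window=2)
print(smoothed)
```

Any such filter will damp year-to-year excursions like the 1945 one, which is why the raw series is the better thing to compare against the station data.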

Could you please be more clear? Write what is on the axes in different colors, what it is used by Svensmark for, and why you think that it's wrong as opposed to some vague cliches that the data are not enough. Why they're not enough? I find it completely conceivable that you have a point but it is extremely hard to see any point in what you're writing.

One item I failed to notice yesterday: the GHCN data map tool has two smoothing settings, 1200 km radius (the default) and 250 km radius. The 1200 km smoothing left me thinking the peninsula was covered when it was not.

Not having time to do the full job I note that:

1. The temperature goes FLAT at ~1970. The curvature in the plots is due to the fitting form. Use wavelets or splines to avoid imposing a functional form on the data. This also was the point at which higher latitude stations began to appear.

2. After about 1960, the variation in the data from Orcadas almost perfectly ANTICORRELATES with the averaged data. It also anticorrelates with the global rise observed in the 1930s/1940s.
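The anticorrelation in point 2 can be put on a number by computing the correlation coefficient separately for the early and late periods. This is only a sketch under assumptions: `correlation_by_period` is a made-up name, the two series are synthetic, and no detrending is done (with real data you might want to remove a trend first).

```python
import numpy as np

def correlation_by_period(years, a, b, split_year):
    """Pearson correlation of a and b before split_year and from it on."""
    years = np.asarray(years)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    early = years < split_year
    r_early = np.corrcoef(a[early], b[early])[0, 1]
    r_late = np.corrcoef(a[~early], b[~early])[0, 1]
    return r_early, r_late

# Synthetic series, NOT real data: identical early on, mirrored later.
years = np.arange(1905, 2005)
station = np.sin(0.7 * years)
zonal = np.where(years < 1960, station, -station)

r_early, r_late = correlation_by_period(years, station, zonal, 1960)
print(f"r before 1960: {r_early:.2f}, r from 1960 on: {r_late:.2f}")
```

A value near +1 early and clearly negative late would be the quantitative version of "same data before, antiphase after".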

Oh yeah, if you look a little closer you see exactly how the pea is pulled out from under the shell. It's the old Landscheidt wide arrow ploy. Look at the phases of increase and decrease, and where the max/mins are. First of all, having to use a fourth order fit tells you that you need a lot of free parameters to fit a smoothed curve. Second, the big increase in cosmic ray flux is after 1970, when the temperature is flat. The increase in the temperature occurs when the cosmic ray flux is flat. Third, the curvature at ~1900 is an artifact of the fitting form which fools your eyes.

The smoothed red curve is nonsense for recent times. This smoothing of data reminds me of two other cases. The first is probit analysis in medical research, where you assume that all S-shaped dose-response relationships follow the integral of the normal distribution and just need to fit an LD50 (median lethal dose) and a standard deviation to the data. In reality the S-shaped curve comes about because the probability of 1 organ dying from poisoning goes simply as p ~ 1 - e^(-x), where x is proportional to dose, but the probability of the organism as a whole dying depends on the combined response of, for example, 3 critical organs. Hence p ~ (1 - e^(-x))^3, which gives the S-shaped curve and has nothing whatsoever to do with the Gaussian/normal distribution! The second is the statistical analysis of fallout particle size distributions. In early analyses they took ground-deposited fallout samples and found they could reasonably fit a log-normal distribution to the data, with the logarithm of particle diameter as the variable in a Gaussian/normal distribution. Later they used aircraft and got cloud samples of early fallout before gravitational settling had occurred, finding that a power-law distribution described it far better than the log-normal distribution which approximated the grounded deposits.
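The multi-organ argument is easy to check numerically. The sketch below takes the exponent as negative, p = 1 - e^(-x), so that p stays between 0 and 1 (an assumption about what was meant), and assumes the organs fail independently and that all of them must fail. The function names are made up for illustration.

```python
import math

def organ_failure_prob(x):
    """Probability that a single organ fails at scaled dose x >= 0."""
    return 1.0 - math.exp(-x)

def organism_death_prob(x, n_organs=3):
    """Probability that all n_organs independent organs fail."""
    return organ_failure_prob(x) ** n_organs

def second_difference(f, x, h=1e-3):
    """Finite-difference estimate of the curvature sign of f at x."""
    return f(x + h) - 2.0 * f(x) + f(x - h)

# For n_organs = 3 the curvature changes sign at x = ln 3.
inflection_x = math.log(3.0)
curv_low = second_difference(organism_death_prob, 0.5)
curv_high = second_difference(organism_death_prob, 2.0)
print("curvature below/above ln 3:", curv_low > 0, curv_high < 0)
```

The single-organ curve is steepest at zero dose and has no inflection, while the cubed version starts flat, curves upwards, and then bends over: an S-shape with no normal distribution anywhere in it.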

What is the curve-fitting business in this global warming case? The scientific thing to do is to plot data versus predictions made from mechanistic models, not from plotting arbitrary curves through raw data to "smooth it out". Regression lines aren't scientific unless the formula used is theoretically defensible.

Looking at the lower (red) curve, at both extremes (earliest and latest data) the shape of the smoothed curve is pointing the opposite way to the data trend at those extremes. Near 1900, the smoothed curve shows falling temperature while the data points show rising temperature. Near 2000, the smoothed curve shows falling temperature, while the data points show rising temperature.

In fact, what's missing are error bars. Unless each of the temperature measurements since 1980 is accurate and representative to a standard deviation within about 0.2 K (which I don't believe), the small variations since 1980 in the lower curve are statistically insignificant. Similarly, the early data before 1920 probably isn't accurate enough to show statistically significant trends.

Hence, my regression curve for the lower data set would be of the form

T = a + bt/(c + t)

= -1 + 1.4(t - 1900)/[30 + (t - 1900)],

(t in years AD)

so that the red curve would become constant for times near 1900 and 2000, and only vary in the region where statistically significant variation occurs (1920-80).
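To see what this curve does, it can simply be evaluated. A minimal sketch, using the quoted values a = -1, b = 1.4, c = 30 with time counted from 1900; the function names are made up, and the slope is just the analytic derivative b*c/(c + t)^2 of the formula above.

```python
def fitted_temperature(year, a=-1.0, b=1.4, c=30.0, year0=1900.0):
    """T = a + b*t/(c + t), with t = year - year0."""
    t = year - year0
    return a + b * t / (c + t)

def fitted_slope(year, b=1.4, c=30.0, year0=1900.0):
    """dT/dt = b*c/(c + t)^2, in degrees per year."""
    t = year - year0
    return b * c / (c + t) ** 2

for year in (1900, 1930, 1960, 2000):
    print(year,
          round(fitted_temperature(year), 3),
          round(fitted_slope(year), 4))
```

In this form the curve starts at a = -1 in 1900 and levels off towards a + b = 0.4 at late times. The slope is largest at 1900 (b/c, about 0.047 per year) and decays afterwards, so the flattening happens at the late end only.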

I thought that this was a very interesting observation about the GISS data set and I'm going to do a cross-post referencing this. I thought that this post exemplifies nicely something that you can do on a blog, that's worth doing, which should be part of the discussion even though it's probably not the sort of thing that you'd write up in a journal article.

[Thanks. Were S to try to publish this in a P-R journal I'd write in to complain about it -W]