Rounding and Bias

By goodmath on March 1, 2009.

Another alert reader sent me a link to a YouTube video which is moderately interesting.
The video itself is really a deliberate joke, but it does demonstrate a worthwhile point. It's about rounding.

The overwhelming majority of us were taught how to round decimals back in either elementary or middle school. (I don't even recall exactly when.) The rule that most of us were taught is:

If the first digit after the rounding point is 0, 1, 2, 3, or 4, then round the previous digit down;
If the first digit after the rounding point is 5, 6, 7, 8, or 9, then round the
previous digit up.

Here's the problem: those rules are wrong.

The problem is that if the first digit after the rounding point is zero, you're not really rounding - that is, you're not doing anything that changes the value of the data point. But if the first digit after the rounding point is 5, then it's exactly halfway in between; it's not closer to either the rounded-up value or the rounded-down value. Always rounding 5 up will create a bias, because it takes the point in the middle and shifts it as if it were closer to the upward side.

To demonstrate, let's try an easy example. Suppose we've got the following set
of numbers: {0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5}. Let's compute the mean
of those numbers: 22.5/10 = 2.25.

Now, let's round them off: {0, 1, 1, 2, 2, 3, 3, 4, 4, 5}; and then compute the mean: 25/10 = 2.5.

With the standard rounding rule, we've biased the numbers upwards enough to create a significant error!
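The arithmetic above is easy to check mechanically. Here is a minimal Python sketch, assuming the standard library's decimal module is an acceptable stand-in for pencil-and-paper rounding; the names values, round_half_up, and bias are just illustrative choices, not anything from the video.

```python
from decimal import Decimal, ROUND_HALF_UP

# The ten sample values from the example: 0, 0.5, 1, ..., 4.5 (exact decimals).
values = [Decimal(n) / 2 for n in range(10)]

def round_half_up(x):
    """Round to the nearest integer; an exact half goes away from zero,
    which is 'up' for the non-negative values used here."""
    return x.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

rounded = [round_half_up(v) for v in values]

original_mean = sum(values) / len(values)
rounded_mean = sum(rounded) / len(rounded)
bias = rounded_mean - original_mean  # how far the rounding moved the mean

print("original mean:", original_mean)
print("rounded mean: ", rounded_mean)
print("bias:         ", bias)
```

Every one of the five values ending in .5 moves up by a half, and nothing moves down to compensate, which is where the whole shift in the mean comes from.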

The correct way to round is to randomly round 5s either up or down. The standard rule, used in most scientific settings, is a deterministic stand-in for that: pick either odd or even as the "preferred" outcome, and always round 5s towards the preferred outcome. If we try that with our example, using preferred even, the rounding is {0, 0, 1, 2, 2, 2, 3, 4, 4, 4}. Taking the mean of that, we get 22/10 = 2.2 - which is much closer to the mean of the original numbers (off by 0.05) than the mean we get by rounding 5s up (off by 0.25). The practice of rounding up adds a systematic bias to the data. It's a very small systematic bias, but it's a real one.
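To compare the two rules side by side, here is a small Python sketch. It assumes the decimal module's ROUND_HALF_EVEN and ROUND_HALF_UP modes are fair models of "preferred even" and "always round 5 up"; the helper name mean_after is made up for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Same sample set as in the example: 0, 0.5, 1, ..., 4.5.
values = [Decimal(n) / 2 for n in range(10)]
true_mean = sum(values) / len(values)

def mean_after(rounding):
    """Round every value to an integer with the given mode; return (list, mean)."""
    rounded = [v.quantize(Decimal("1"), rounding=rounding) for v in values]
    return rounded, sum(rounded) / len(rounded)

even_rounded, even_mean = mean_after(ROUND_HALF_EVEN)
up_rounded, up_mean = mean_after(ROUND_HALF_UP)

print("true mean:     ", true_mean)
print("half-even mean:", even_mean, "error", even_mean - true_mean)
print("half-up mean:  ", up_mean, "error", up_mean - true_mean)
```

The small leftover error under preferred-even comes from the sample itself: it contains five ties, so three of them (0.5, 2.5, 4.5) go down and only two (1.5, 3.5) go up. With an even number of ties split evenly, the errors would cancel exactly. Python's built-in round() also sends exact halves to the even neighbour, so round(2.5) gives 2.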

Does it matter? Not usually. As the commentary to the video points out, over the space of a couple of years, that systematic error in rounding gas prices amounts to about a dime. For most things in our daily experience, the difference between random rounding and upward rounding for 5s is just not significant. But if you're doing statistical analysis of
large quantities of data, or you're doing computations that rely on a high degree of
precision, then it can introduce enough error to foul your results. If you're doing statistical analysis, it can do things like make an insignificant result appear to be statistically significant. If you're doing high precision computations for things like
navigation of a space probe through a gravitational slingshot, it can introduce enough error
to crash your probe.
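To see why a bigger data set does not wash the problem out, here is a rough simulation sketch in Python. Everything about the setup is an assumption made for illustration: readings are drawn uniformly from 0.0 to 9.9 in steps of 0.1, stored as integer tenths so the arithmetic is exact, and the seed is fixed only so the run is repeatable.

```python
import random

def round_half_up(tenths):
    """Round a non-negative value given in tenths to a whole number, ties upward."""
    whole, digit = divmod(tenths, 10)
    return whole + (1 if digit >= 5 else 0)

def round_half_even(tenths):
    """Same, but an exact .5 goes to whichever neighbour is even."""
    whole, digit = divmod(tenths, 10)
    if digit > 5 or (digit == 5 and whole % 2 == 1):
        return whole + 1
    return whole

rng = random.Random(0)  # fixed seed (arbitrary choice) for repeatability
readings = [rng.randrange(100) for _ in range(100_000)]  # 0.0 .. 9.9, in tenths

def mean_error(rounder):
    """Average of (rounded - exact) over all readings, in whole units."""
    total_tenths = sum(10 * rounder(t) - t for t in readings)
    return total_tenths / (10 * len(readings))

bias_up = mean_error(round_half_up)
bias_even = mean_error(round_half_even)
print(f"half up: {bias_up:+.4f}   half even: {bias_even:+.4f}")
```

In this setup one reading in ten is an exact tie, so rounding ties upward should leave an average error near +0.05 however many readings are collected, while the preferred-even rule should hover around zero. Random noise shrinks as the sample grows; a systematic bias does not.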


Mark,

Your explanation only works if you're removing exactly one significant digit when rounding (ie. written as real numbers, the 0 or 5 that gets chopped off is followed by an infinite string of 0's). If you assume that you are very likely to encounter a non-zero digit somewhere beyond the digit that you're rounding off, then lopping off a 0 is indeed (almost) always rounding down, and also the "rounded up" value is (almost) always closer to the "true" value than if you just lopped off the 5 and rounded down.
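The point about extra digits can be checked exhaustively. The Python sketch below is only an illustration under one assumption: every value in [0, 10) with a fixed number of decimal places is equally likely. The function name half_up_bias is invented for the example.

```python
from fractions import Fraction
import math

def half_up_bias(digits):
    """Mean of (rounded - exact) over every value in [0, 10) that has
    exactly `digits` decimal places, rounding to the nearest integer
    with ties going up."""
    scale = 10 ** digits
    count = 10 * scale
    total = Fraction(0)
    for k in range(count):
        x = Fraction(k, scale)
        rounded = math.floor(x + Fraction(1, 2))  # half up for non-negative x
        total += rounded - x
    return total / count

for digits in (1, 2, 3):
    print(digits, "decimal place(s): bias =", half_up_bias(digits))
```

Under that assumption the bias shrinks by a factor of ten with each extra digit (1/20, then 1/200, then 1/2000), because exact ties become rarer. It never reaches zero for a finite number of digits, so both sides of the argument have a piece of the truth: the more digits are being dropped, the less the tie rule matters.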

What #1 said.

#1 - nonsense. By what process in the world do we produce truncated numbers like you suggest? You have this odd idea that if I take some measurement, and get 2.5 as a value, then the true value is of the form 2.5xxxxx where I just don't know what the x's happen to be (i.e. the measured value is just a truncated version of the true value). But it's not. If our instruments are good, we think the value is near 2.5. Maybe a little above or a little below. There just ain't many ways to produce data where we know all the digits are true in the truncation sense.

-kevin

Re #3.

o $1.09 rounded to the nearest dollar.
o 24-bit sample rounded to 16-bits

Comments #1-3 indicate it is time for a post on significant figures. Here is the quick version:

2.5 actually means a value between 2.45 and 2.55. If this is not what you want it to mean, you could perhaps write 2.50 or 2.5 +/- 0.2. If you want exactly two and a half, it is properly written 25 * 10^-1; note that the lack of any decimal point makes a number exact.

WRT Gas: The pump at my station measures price to 4 sig figs and volume to 5. I think that means that it only rounds wrong one time in 200.

"If you're doing high precision computations for things like navigation of a space probe through a gravitational slingshot, it can introduce enough error to crash your probe."
In that case maybe you shouldn't be rounding.

#6, It's the basic problem of computation. You can't store infinite length numbers on a computer, except symbolically. The second you can't store, and calculate, everything symbolically, you have to account for the need to truncate or round. And it's not even just numbers you would think need special handling. 0.1 is a classic example of a number that cannot be stored exactly using IEEE 754 floating point, because 0.1 is a repeating (non-terminating) fraction in base 2. This is why you need to use proper numeric methods to guarantee N digits of accuracy, and this post about rounding is an example of how to reduce the error in the least significant digit.
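The 0.1 example is easy to see for yourself. This short Python sketch assumes the interpreter's float is an IEEE 754 double, which is the case on mainstream platforms but is an assumption all the same.

```python
from decimal import Decimal
from fractions import Fraction

# Converting the float 0.1 to Decimal or Fraction is exact, so these show
# the value that is really stored, not the value that was typed.
stored = Decimal(0.1)
as_fraction = Fraction(0.1)

print("stored value:", stored)
print("as a fraction:", as_fraction)
print("equal to 1/10?", as_fraction == Fraction(1, 10))
print("0.1 + 0.2 == 0.3?", 0.1 + 0.2 == 0.3)
```

The stored fraction has a power of two as its denominator, which is all a binary float can represent, so it can only sit near 1/10 rather than on it. The comparison 0.1 + 0.2 == 0.3 comes out False for the same reason.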

Often you have more than one non-significant digit, ie, digits you want to round away. In those cases #1 is correct. It's not that you have 2.5xxxx where you don't know x, it's that you have 2.534 and you don't care about anything after the decimal point.

Also, I would say that taking 1.0000(etc) to 1 actually IS rounding, it's just rounding with a no-op, in the same way that dividing something by one is still dividing... But that's a definitions thing...

Re #4:

$1.09 rounded to the nearest dollar? $1
24-bit sample rounded to 16-bits? my argument applies.

How about this: $1.05 rounded to the nearest dollar. Mark is right. There is no single "nearest" dollar. They are both equally near. #1 tried to imply that $1.05 really stands for a true value of $1.05+delta, with delta>=0, and therefore the "nearest" dollar should more likely be $2. This is nonsense. It is almost always going to be $1.05+/-delta for any kind of real sampling or measurements.

Just to be pedantic, I assume that when you say 1.05 to the nearest dollar you mean 1.50. But the point is that 1.50, ok, is equal, but 1.5x where x>0 is NOT equal, no matter

Yeah I assume that's what #9 meant.

#8 - I think any rounding algorithm would have to lose accuracy in general.

Mark is right about rounding (for the record, so am I, although it turns out I might be wrong re. gas pump rounding, but no one really knows, because depending on who you ask the machines are either much less or much more accurate than I gave them credit for in the video).

It is kind of my lifelong dream to be deemed moderately interesting by people who like math (I went so far as to write a novel about such people), so I appreciate the link and the thoughtful commentary. -John

#12: I still maintain that you and Mark are ONLY correct about (in the case of rounding to the next integer) xxxx.5 EXACTLY. If you prefer evens, and you round 4.51 down to 4, you are doing it wrong.

I agree a post on significant digits is needed. If you measure 3.52, what you know is that what you are measuring is 3.5xxx..., where 0.0xxx... is close to 0.02. (how close depends on your tool, and should be specified.)

Well, sure, but 3.51 isn't 3.5. Obviously this is only relevant if the calculation being done ends either by 0'ing out or if the calculator in question rounds wrongly.

(Example: I was taught in third grade that 3.3345 rounded to the nearest penny would be 3.34, because you have to round up the 4 and then you round up the 3, which is totally ludicrous. But I have heard--although no confirmation from the nice people at exxon--that gas pumps regularly round this way.)

re: 15, wait, so you're saying that 3.3344444444445 gets rounded to 3.34??? that's dumb. If you were taught that in 3rd grade your 3rd grade teacher should be fired. from a cannon.

Rounding isn't a recursive process. You pick a point, and round.
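The difference between rounding once and rounding digit by digit shows up directly with the 3.3345 example from the thread. This is a minimal Python sketch using the decimal module; the variable names are illustrative only, and it says nothing about how any real gas pump behaves.

```python
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("3.3345")

# Pick a point and round: straight to the nearest penny.
once = price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# The "recursive" version: round to three places first, then to two.
step = price.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
twice = step.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print("rounded once: ", once)
print("intermediate: ", step)
print("rounded twice:", twice)
```

Rounding once gives 3.33, since 3.3345 is closer to 3.33 than to 3.34. Rounding in two steps gives 3.335 and then 3.34: the first step manufactures a tie that was never in the original number, and the second step then pushes that tie up.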

If you're doing high precision computations for things like navigation of a space probe through a gravitational slingshot, it can introduce enough error to crash your probe.

Especially if readings are processed in recursive equations, where little errors can accumulate over time.

Rounding is a form of quantization. And quantization can be done in various ways (truncation, rounding, rounding toward 0, rounding toward infinity, etc.). And quantization error (quantized value - actual value) can be handled by adding noise (dither). And dither can have a Gaussian PDF, or other PDFs, e.g., triangular, depending upon the application.
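As a rough illustration of the dither idea, here is a Python sketch. It uses uniform (rectangular) noise rather than the Gaussian or triangular shapes mentioned above, purely to keep the example short; that choice, the 0.3 test signal, and the fixed seed are all assumptions of this sketch.

```python
import math
import random

def quantize(x):
    """Plain quantizer: round to the nearest integer, halves upward."""
    return math.floor(x + 0.5)

def quantize_with_dither(x, rng):
    """Add uniform noise in [-0.5, 0.5) before quantizing."""
    return quantize(x + (rng.random() - 0.5))

rng = random.Random(0)  # fixed seed (arbitrary choice) for repeatability
signal = 0.3            # a constant input that sits between two levels
n = 100_000

plain = quantize(signal)  # the same answer every time
dithered_mean = sum(quantize_with_dither(signal, rng) for _ in range(n)) / n

print("plain quantizer output:", plain)
print("mean of dithered outputs:", dithered_mean)
```

Without dither the quantizer returns 0 every time, so the error is a fixed -0.3 that no amount of averaging removes. With dither each individual output is noisier (either 0 or 1), but the average lands close to 0.3. That is the trade: a systematic error is exchanged for random noise that averages out.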

Anyway, MarkCC is mostly correct, and even in the case when he is less than correct, I get his point.

Thanks for all the great information, MarkCC. I always enjoy reading your blog.

I would also like to join those asking for a post about the concepts and methods regarding significant digits.

I have tried to read material about it from NIST and others in the past, but my understanding is still very low, and I would appreciate your treatment of this subject, if it's something that would interest you.

Thanks.
