Quantization of Books 3: How Many Books Is That?

By drorzel on December 28, 2009.

When I saw the data generated by the sales rank tracker Matthew Beckler was kind enough to put together, I joked that I hoped to someday need a logarithmic scale to display the sales rank history of How to Teach Physics to Your Dog. Thanks to links from Boing Boing, John Scalzi, and Kevin Drum, I got my wish:

For those not familiar with the concept, a log scale plots values on a scale that represents each order of magnitude as a fixed distance. So, the top horizontal line on that plot represents a sales rank of a million, the line below that a hundred thousand, the line below that ten thousand, and so on. This tends to blow up the detail at smaller ranges, allowing you to see more of the variation. On a linear scale, everything after the big downward spike at about 260 hours is just flat. Zooming in just a little, it still looks like this:
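To make the log-scale idea concrete, here is a minimal Python sketch. The function name log_axis_position is my own invention for illustration, not anything from the tracker; it just converts a rank into its position on a base-10 log axis, measured in decades.

```python
import math


def log_axis_position(rank: float) -> float:
    """Position of a sales rank on a base-10 log axis, in decades."""
    return math.log10(rank)


# Each factor of ten in rank moves the same distance (one decade) on the axis.
for rank in (1_000_000, 100_000, 10_000, 1_000):
    print(rank, log_axis_position(rank))

# Width on a log axis of the range between two ranks, in decades.
flat_bit_width = log_axis_position(2500) - log_axis_position(396)
print(flat_bit_width)
```

On this axis a range like 396 to 2500 covers about 0.8 of a decade, while on a linear axis running up to a million it would take up only about 0.2% of the height.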

There's still a good deal of variation in the flat bit of that graph, from a minimum value of 396 to a maximum of just over 2500 (as of 8pm Eastern Sunday night), but it's hard to see just what's going on without losing the higher points of the data.

This is all very nice, but of course, the whole point of having this data is to try to extract information that you wouldn't be able to get otherwise. So, can we figure out from this plot how many books were sold in this interval?

If you recall my previous excursion into number-crunching of these data, you'll remember that I made a plot of the (downward) change in sales rank as a function of the starting sales rank. This turned out to be remarkably linear, corresponding to a model where a single sale at a lower rank produces less of a change in that rank than a single sale at a higher rank. In other words, if you start at a ranking of 100,000, selling one book leaps you past a large number of other books, while if you start at a ranking of 1,000, a single sale doesn't make as much difference.
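A straight-line fit of rank change against starting rank can be sketched with ordinary least squares. This is only an illustration: the data below are made up (and deliberately exactly linear so the answer is easy to check), and fit_line, start_ranks, and drops are hypothetical names, not the actual tracker data or the spreadsheet fit used for the plots.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example: starting rank and the downward change that followed it.
start_ranks = [20_000, 40_000, 60_000, 80_000, 100_000]
drops = [3_000, 13_000, 23_000, 33_000, 43_000]

slope, intercept = fit_line(start_ranks, drops)
print(slope, intercept)
```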

Doing the same thing with the larger dataset yields the following plot (I've deleted a few oddball points where the rank changed by only a few places in the 70,000 range):

The blue points are data from before the publication and the big sales boost from Boing Boing/Whatever; the red points are from after that. You can see that they clearly don't all fall on the same line. The two solid lines represent straight lines fit to the two data sets, and you can easily figure out which equation goes with which. It should be noted that while on this scale, the red points sort of look like they fit a line, if you zoom in, they really don't:

I suppose you could fit a line to that, if you were an economist or an astronomer, but I'm not going to waste anybody's time with that.

So, using this linear model, what does the big downward spike correspond to? Well, using the fit parameters from the plot above would suggest that the large jump Tuesday morning was 5.4 times bigger than the model would predict, suggesting that it represents the sale of 5-6 books.

That's nice, and all, but the problem is that the next spike down, according to the model, represents the sale of -1.4 books. That's because the fit above has a non-zero intercept, meaning that it predicts a ranking change of zero for a single sale at a rank of around 14,000, and below that level, the predicted ranking change is negative. That's clearly wrong-- if 1.4 people returned their copies, my sales ranking would not get better.
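The arithmetic behind the negative book count can be sketched as follows. The slope and intercept here are hypothetical values, picked only so that the zero crossing lands at 14,000 as described; they are not the actual fit parameters from the plot, and the function names are mine.

```python
# Hypothetical fit parameters (NOT the real ones from the plot); chosen so
# the predicted single-sale change crosses zero at a rank of 14,000.
SLOPE = 0.5
INTERCEPT = -7000.0


def predicted_drop(start_rank: float) -> float:
    """Rank change the linear model predicts for one sale at start_rank."""
    return SLOPE * start_rank + INTERCEPT


def books_for_jump(start_rank: float, observed_drop: float) -> float:
    """Estimated sales: observed change divided by the one-sale prediction."""
    return observed_drop / predicted_drop(start_rank)


# Rank at which the model predicts no change at all for a single sale.
zero_crossing = -INTERCEPT / SLOPE
print(zero_crossing)

# Below the zero crossing the predicted change is negative, so any real
# improvement in rank comes out as a negative number of books.
print(predicted_drop(5_000), books_for_jump(5_000, 2_000))
```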

So, how could we improve this? Well, logic dictates that a sales rank of 1 can't get any higher, so we could impose a model where the ranking change is 0 for a sales rank of 1. If we do that, the pre-publication data look like this:

I've done two different fits to this, one a linear fit constrained to go through the origin, the other a power law fit, just to have something with a bit of upward curve to it. Using the simple linear model, the big downward jump corresponds to about 2 books, and using the power-law fit, it's 4 books. Interestingly, the power-law fit gives higher values for some of the later downward jumps, with a peak of 13 for the jump from 1106 to 683 a few hours after the initial spike.
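For anyone who wants to try the same two fits, here is one way to sketch them in Python. This assumes the power law is fit by a straight line in log-log space, which is a common approach but not necessarily what the plotting software did; the data are made up and the function names are hypothetical.

```python
import math


def fit_through_origin(xs, ys):
    """Least-squares slope for y = slope * x, with the intercept forced to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y). Needs x, y > 0."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_lx = sum(log_x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((lx - mean_lx) ** 2 for lx in log_x)
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_ly - b * mean_lx)
    return a, b


# Made-up data that follow drop = 0.001 * rank**1.5 exactly.
start_ranks = [10_000, 40_000, 90_000]
drops = [1_000, 8_000, 27_000]

slope = fit_through_origin(start_ranks, drops)
a, b = fit_power_law(start_ranks, drops)
print(slope, a, b)
```

With upward-curving data like this, the through-origin line overestimates the change at low ranks and underestimates it at high ranks, which is why the two fits can give quite different book counts for the same jump.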

So, how many books does all this represent? Well, summing up all the changes from the power-law model gives 154 books. The same summing for the simple linear fit with zero intercept gives just 11-- the vast majority of the points after the spike correspond to less than one book's worth of that model's prediction. A third fit, using a second-order polynomial (which had a slightly better R² than either of the others) predicts around 30 books.
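The summing step amounts to dividing each observed downward jump by the model's one-sale prediction at that starting rank, then adding everything up. A minimal sketch, assuming made-up model parameters: only the 1106-to-683 jump comes from the data above, and the other jumps, both models, and all the names are hypothetical.

```python
def total_books(jumps, predicted_drop):
    """Sum of (observed drop / predicted one-sale drop) over all jumps."""
    return sum(drop / predicted_drop(start) for start, drop in jumps)


def linear_model(rank):
    """Hypothetical through-origin linear model: one sale moves you 0.5 * rank."""
    return 0.5 * rank


def power_model(rank):
    """Hypothetical power-law model: one sale moves you 0.001 * rank**1.5."""
    return 0.001 * rank ** 1.5


# (starting rank, observed downward change). The first pair is the jump
# from 1106 to 683 mentioned in the post; the rest are invented.
jumps = [(1106, 1106 - 683), (2000, 500), (1000, 100)]

total_linear = total_books(jumps, linear_model)
total_power = total_books(jumps, power_model)
print(total_linear, total_power)
```

A model that predicts small single-sale changes at low ranks turns the same jumps into many more books, which is the basic reason the totals from different fits disagree so badly.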

None of these models are particularly good, though-- for one thing, the fits aren't great. And there's no particular justification for the use of a power law or a parabola-- they're just easy functions to work with, mathematically.

In the end, the best I can say is that, over the whole data period, there are just about 100 points where the sales rank improved from one hour to the next. If you take the incredibly naive picture that each of those improvements represents at least one sale, that gives a lower bound of about 100 books sold. That's more or less consistent with other people's analyses of what sales rank means in terms of sales.
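The naive lower bound is just a count of the hours in which the rank number went down. A minimal sketch, with a made-up list of hourly ranks and a hypothetical function name:

```python
def count_improvements(hourly_ranks):
    """Number of hour-to-hour steps where the sales rank got better (smaller)."""
    return sum(
        1
        for previous, current in zip(hourly_ranks, hourly_ranks[1:])
        if current < previous
    )


# Made-up hourly readings, not the real tracker data.
hourly_ranks = [2500, 2100, 2300, 2600, 1106, 683, 700]
improvements = count_improvements(hourly_ranks)
print(improvements)  # 3
```

Each improvement counts as one sale no matter how big the jump, which is why this can only be a lower bound.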

Which of these figures is right? I have no idea, and no way to determine the answer. I won't get any kind of real sales numbers for at least six months, maybe a year (unless somebody at Scribner is feeling generous, and wants to send me numbers). What I eventually get won't be nearly fine-grained enough to determine the number of sales via Amazon in the first week after publication, either.

But, hey, playing with numbers is fun...


Your guess is probably a little low--I'd be willing to bet more like 2-400. But my personal experience with Amazon sales ranks is a little stale--several years old. But when I worked "in-house" as it were for New York publishers, I could actually see how much Amazon was ordering for any given title (and this was before Bookscan, so it wasn't always clear what actual point-of-sales were). But since the number represents a relative rate of sales, it's difficult to translate it to absolute numbers.
