A Statistics Question

By mikethemadbiologist on July 7, 2006.

Since my amateur discursion into stochasticity appears to have flushed out all of the mathematically savvy, I'm going to pose a real-life statistics question for you. I have some data that are non-normally distributed (in fact, they really don't seem to fit any distribution well--and, yes, I've tried various transformations and the data still don't fit anything). If the data were normally distributed, I would like to perform ANOVA to partition the sources of variance (using percent sums of squares).
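To make "percent sums of squares" concrete, here is a minimal sketch of the partition for a balanced two-factor design with replication (the real problem has four factors, but the arithmetic extends the same way). The function name `percent_ss`, the factor labels, and the toy numbers are all made up for illustration; none of them come from the actual data set.

```python
from itertools import product
from statistics import mean


def percent_ss(data):
    """Percent of the total sum of squares attributed to each term.

    `data` maps (a_level, b_level) -> list of replicate values.
    Assumes a balanced design: every cell has the same number of replicates.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))  # replicates per cell

    all_vals = [y for ys in data.values() for y in ys]
    grand = mean(all_vals)
    a_mean = {a: mean([y for b in b_levels for y in data[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([y for a in a_levels for y in data[(a, b)]]) for b in b_levels}
    cell_mean = {cell: mean(ys) for cell, ys in data.items()}

    ss_total = sum((y - grand) ** 2 for y in all_vals)
    # Main effects: squared deviations of marginal means from the grand mean.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    # Interaction: what the cell means show beyond the two main effects.
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a, b in product(a_levels, b_levels)
    )
    # Residual: scatter of replicates around their own cell mean.
    ss_resid = sum((y - cell_mean[cell]) ** 2 for cell, ys in data.items() for y in ys)

    parts = {"A": ss_a, "B": ss_b, "A:B": ss_ab, "residual": ss_resid}
    return {name: 100.0 * ss / ss_total for name, ss in parts.items()}


# Hypothetical toy data: two levels of each factor, two replicates per cell.
data = {
    ("a1", "b1"): [1.0, 2.0],
    ("a1", "b2"): [3.0, 5.0],
    ("a2", "b1"): [2.0, 4.0],
    ("a2", "b2"): [8.0, 9.0],
}
pct = percent_ss(data)
for term, value in pct.items():
    print(f"{term:>8}: {value:5.1f}%")
```

In a balanced design the four pieces add up to the total sum of squares, so the percentages sum to 100. With unbalanced data the pieces no longer add up this cleanly and this simple recipe does not apply as written.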

Since the data aren't normally distributed, ANOVA is not the right test to use. Because I'm using a full-factorial model with four factors, using Friedman's test or any of the one-way non-parametric tests doesn't seem to apply either. I should add that, while my data are continuous, I can transform the data into ordinal and/or nominal categories if that would help. While logistic regression could circumvent the non-normality problem, I'm unaware of how one partitions variance with that test (if there's a way to do so, please let me know).

Any ideas? If there are any statistics programs or R packages out there that can handle this, please let me know. Don't be shy...


I remember vaguely that the output for logistic regression (in Stata, SAS, SPSS, etc.) includes a quantity called the 'deviance', which measures the variation the model leaves unexplained. The amount of variation due to a certain factor is hence the difference between the deviance without the factor in the model and the deviance with it. That difference is also used for the chi-square test of the factor's significance.

R-wise, maybe the examples for the lrm function in Frank Harrell's Design package would help:

http://lib.stat.cmu.edu/S/Harrell/help/Design/html/lrm.html
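The deviance-difference idea in the comment above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the response has already been reduced to a binary outcome, the two 0/1 factors `A` and `B` and the cell counts are invented, and the helper names `solve` and `fit_deviance` are hypothetical, not taken from any of the packages mentioned. In practice a package routine would do the fitting.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_deviance(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson and return -2 * log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p

    def fitted(beta):
        return [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]

    for _ in range(iters):
        mu = fitted(beta)
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X)) for j in range(p)]
        hess = [[sum(mi * (1 - mi) * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    mu = fitted(beta)
    return -2.0 * sum(yi * math.log(mi) + (1 - yi) * math.log(1 - mi)
                      for yi, mi in zip(y, mu))


# Hypothetical data: (A, B) cell -> successes out of 10 trials.
cells = {(0, 0): 2, (0, 1): 4, (1, 0): 5, (1, 1): 8}
rows, y = [], []
for (a, b), successes in cells.items():
    for i in range(10):
        rows.append((a, b))
        y.append(1 if i < successes else 0)

dev_null = fit_deviance([[1.0] for _ in rows], y)            # intercept only
dev_full = fit_deviance([[1.0, a, b] for a, b in rows], y)   # intercept + A + B
dev_no_a = fit_deviance([[1.0, b] for a, b in rows], y)      # A dropped
dev_no_b = fit_deviance([[1.0, a] for a, b in rows], y)      # B dropped

for name, dev_reduced in (("A", dev_no_a), ("B", dev_no_b)):
    lr = dev_reduced - dev_full          # deviance attributable to the factor
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square tail area, 1 df
    print(f"{name}: deviance difference = {lr:.3f}, p = {p_value:.4f}")
```

Note that each difference here is measured with the other factor still in the model. Unlike sums of squares in a balanced ANOVA, these differences generally do not add up to the total drop from `dev_null` to `dev_full`, so they are not a clean percentage partition.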

With a logistic model, the variance isn't a good measure of fit because the domain of the errors is restricted. Maybe your data is nasty enough that optimizing some function of the model errors doesn't give you meaningful results, and any partitioning of that error measure won't tell you anything significant. You need to have some idea of the error distribution to make meaningful inferences about apportioning that error to factors.

I had a similar problem.
I'm using JMP (the full-on, expensive version).
JMP lets you decree that your continuous numbers are ordinal, without having to change the actual numbers. (I found it worked better when I truncated my numbers to 3 or 4 sig figs, though.) Then it lets you run a general linear model, giving you logistic regression output (instead of parametric). The GLM can handle ties in the ordinal data, multiple input variables, interactions, and even a little nesting.
It gives you the likelihood ratio tests for each effect -- like effects tests in ANOVA -- and provides parameter estimates (with P-values and 95%CIs) for each level in each effect. Ergo, the analysis does partition the error for you.
It also has the refs for the analyses it's doing, so you can both figure out how the logistic GLM handles the error, and cite the authorities to back it up.
Finally, for those of us with advisors unfamiliar with logistic regression, you can run a (technically invalid) parametric ANOVA that is exactly congruent, just by switching your response variable back to "continuous." When they give the same answer, this congruent-but-invalid analysis can be useful for educating advisors.
I can email you an example output if you're interested.

PS
Mike, you've got two copies of the original post.
And they've got different comments.
Read both!
