Limitations of ANOVA ©2005 Dr. B. C. Paul
The Data Size Effect
We did ANOVA with one factor. We did it with two factors (driver and car). We could have used the same procedure to do a three-way ANOVA (maybe add country or city driving)
or a four-way ANOVA (maybe add season of the year).
What happens to our test data requirements?
Data Set Size Illustration
Let’s say we tested 4 cars with pairs of before and after MPG. At a minimum we would need 2 drivers on each car, i.e., 8 paired driving tests.
Why 8 tests for four cars? We need a way to measure variability not accounted for by the test. If all differences are accounted for, you have no SS error. With no denominator, you cannot do the F test to decide whether your effect is significant.
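To make the SS error point concrete, here is a minimal one-way ANOVA computed by hand in Python, treating car as the only factor and the two drivers per car as repeat tests. The function name `one_way_anova` and the MPG improvement numbers are hypothetical, made up for illustration; this is a sketch, not the SPSS procedure.

```python
from statistics import mean

def one_way_anova(groups):
    """Hypothetical helper: return (ss_between, ss_error, df_between, df_error, F).

    groups: list of lists, one inner list of measurements per group (e.g. per car).
    Assumes at least 2 groups.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n = len(all_values)
    # Variation explained by differences between the group (car) means
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group: the "SS error"
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_error = n - k
    if df_error == 0:
        # One test per car: every difference is "accounted for",
        # so there is no error term and no denominator for F.
        return ss_between, ss_error, df_between, df_error, None
    ms_error = ss_error / df_error
    f_stat = float("inf") if ms_error == 0 else (ss_between / df_between) / ms_error
    return ss_between, ss_error, df_between, df_error, f_stat

# Made-up MPG improvements: 4 cars, 2 drivers per car (8 paired tests)
result = one_way_anova([[2.0, 3.0], [1.0, 2.0], [4.0, 5.0], [3.0, 3.0]])
print(round(result[4], 2))  # 8.33

# Only 1 test per car: df_error is 0, so no F test is possible
print(one_way_anova([[2.0], [1.0], [4.0], [3.0]])[4])  # None
```

With one test per car the error degrees of freedom (n - k) drop to 0, and the sketch returns None instead of an F value, which is the missing denominator described above.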
The Exponential Explosion
Now let’s look for a driver effect. Let’s try 4 drivers. Each driver drives each car twice, so 8 pairs of data become 32 pairs.
Now let’s look for a country/urban effect. We’ll run the tests in the downtown of a large city, in a suburb, and in the country. Now we need 96 pairs of data.
Now let’s see if it depends on season. We’ll do winter, spring, summer, and fall. Now we need 384 pairs of data.
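Each of these counts is just the number of levels of every factor multiplied together, times the number of repeat tests per combination. A small Python sketch of that arithmetic, assuming 2 repeats per combination as in the slides (the helper name `paired_tests_needed` is hypothetical):

```python
from math import prod

def paired_tests_needed(levels, replicates):
    """Number of paired before/after tests for a full factorial design.

    levels: dict mapping factor name -> number of levels of that factor.
    replicates: how many times each combination (cell) is tested.
    """
    return replicates * prod(levels.values())

design = {"car": 4}
print(paired_tests_needed(design, replicates=2))  # 8
design["driver"] = 4
print(paired_tests_needed(design, replicates=2))  # 32
design["setting"] = 3  # downtown, suburb, country
print(paired_tests_needed(design, replicates=2))  # 96
design["season"] = 4   # winter, spring, summer, fall
print(paired_tests_needed(design, replicates=2))  # 384
```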
Problem of ANOVA
As you add more and more effects to study, the amount of data needed grows exponentially.
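In general, for k factors where factor i has L_i levels and each combination is tested r times, the count works out as below. This is a sketch of the arithmetic, using the slide numbers as the check; if every factor had the same number of levels L, the count would be r L^k, which grows exponentially in the number of factors k.

```latex
N = r \prod_{i=1}^{k} L_i
\qquad\text{e.g.}\qquad
N = 2 \times \underbrace{4}_{\text{cars}} \times \underbrace{4}_{\text{drivers}}
      \times \underbrace{3}_{\text{settings}} \times \underbrace{4}_{\text{seasons}} = 384
```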
There are practical limits to how much you can do at once
There are specialized techniques that can be used instead. We had every driver test every car under every condition; Eng 540 Design of Experiments covers a lot of more elegant alternatives.
Relief from SPSS at a Price
Our experimental design called for equal numbers of tests under all conditions
Actually, the SPSS procedure I showed will run the test even without equal numbers of tests under every condition.
The Price
If I do not check every driver in every car, I will lose my ability to measure interaction effects (they will go into the SS error). If I have equal numbers of cases in every cell, the test tends to be forgiving (“robust”) against violations of the normal distribution assumption.
If my cells are uneven and I violate the normal distribution assumption, my model will start spitting out more poorly fit answers.
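Before running an unbalanced model, it can help to check which driver/car cells are empty (so no interaction estimate is possible there) and whether the cell counts are equal. A minimal Python sketch, assuming the data is a list of hypothetical `(driver, car)` tuples, one per paired test; `check_design` is an illustrative helper, not an SPSS feature, and the driver names are made up.

```python
from collections import Counter
from itertools import product

def check_design(observations):
    """Report empty and unequal cells in a two-factor (driver x car) design.

    observations: list of (driver, car) tuples, one per paired test.
    Returns (empty_cells, cell_counts, balanced).
    """
    counts = Counter(observations)
    drivers = sorted({d for d, _ in observations})
    cars = sorted({c for _, c in observations})
    # Every driver/car combination the fully crossed design would contain
    all_cells = list(product(drivers, cars))
    empty = [cell for cell in all_cells if counts[cell] == 0]
    # Balanced: no empty cells and every cell has the same number of tests
    balanced = not empty and len({counts[cell] for cell in all_cells}) == 1
    return empty, {cell: counts[cell] for cell in all_cells}, balanced

obs = [("Ann", "A"), ("Ann", "A"), ("Ann", "B"), ("Bob", "A"), ("Bob", "A")]
empty, cell_counts, balanced = check_design(obs)
print(empty)     # [('Bob', 'B')]
print(balanced)  # False
```

Here Bob never drove car B, so that cell is empty and the driver-by-car interaction cannot be fully measured.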
The Who Done It Mystery
ANOVA will easily tell you whether an effect exists, but the effect might still come from only part of the data set.
When it says the driver makes a difference, did you have 2 wacko drivers while the other 8 are all the same?
Coping With Who Done It
You can run all sorts of plots to see if it’s just a few results that are different. You can run statistical tests to compare different subsets of the data against each other. The little options button in the SPSS dialog where you clicked OK leads to a menu of optional tests and plots.
We’re not going to deal with them right now, other than telling you they are there.
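As an informal first look (not a formal post hoc test), one could compare each driver’s mean improvement to the grand mean to see whether one or two drivers account for most of the difference. A sketch in Python; `group_deviations` and the sample numbers are hypothetical.

```python
from statistics import mean

def group_deviations(data):
    """Each group's mean minus the grand mean, largest absolute deviation first.

    data: dict mapping group name (e.g. driver) -> list of MPG improvements.
    """
    grand = mean(x for values in data.values() for x in values)
    devs = {name: mean(values) - grand for name, values in data.items()}
    return sorted(devs.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up data: three drivers behave alike, one is very different
drivers = {"D1": [2.0, 2.0], "D2": [2.0, 2.0], "D3": [2.0, 2.0], "D4": [6.0, 6.0]}
print(group_deviations(drivers))
# [('D4', 3.0), ('D1', -1.0), ('D2', -1.0), ('D3', -1.0)]
```

Here D4 sits 3 MPG above the grand mean while the others are all 1 below it, the “wacko driver” pattern. A formal comparison of the subsets would still be needed before drawing conclusions.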
Now I Know – What Does It Mean?
We found that the MPG improvement from the Red Rooster Carburetor varied with the driver
We put an “individual results may vary” disclaimer on our advertising.
OK, but how much do individual results vary? ANOVA doesn’t tell us. For some types of engineering work, we have to know how big a difference something will make (yes/no doesn’t always cut it).
Dealing with large numbers of possible causes
We may have a large but more randomly organized set of data, along with conditions that might have influenced it.
Trying to do an ANOVA on 15 variables that might affect the results would be unwieldy.
The solution both to “I need to quantify the effect” and to maxing out the computer memory (and corporate budget) with a 15-way ANOVA is a method called “Regression”.
Our next exciting topic!!!