Thursday, January 13, 2011

Much Better Error Bars for Within-Subjects Studies

For any scientists reading this blog who use within-subjects designs, this will be a revelation. Everyone else can skip it. A problem came up in our last set of reviewer comments: if you have a within-subjects factorial design, standard error bars or 95% confidence intervals drawn on the bars representing your means do not portray the results of the repeated-measures ANOVA. Basically they're far too big, because they include the between-subjects variance and so don't reflect the benefit of comparing people to themselves. As a result, the basic trick for judging by eye whether two means differ significantly, as described here

http://scienceblogs.com/cognitivedaily/2008/07/most_researchers_dont_understa_1.php

doesn't work. (In a nutshell, if the bars represent standard errors, they have to be separated by a gap of about half an interval before the difference between the means is significant at alpha = .05.) You lose the very desirable property of being able to tell the story of your results purely through your figures.
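To see roughly where the half-interval rule of thumb comes from, here is a back-of-the-envelope check. It assumes two independent groups with equal standard errors and uses a large-sample normal approximation rather than a t-test, so treat the numbers as ballpark only; the function names are hypothetical. The gap is measured in SE units between the inner ends of the two ±1 SE bars, so a gap of 1 SE is half of one bar's full 2-SE width.

```python
import math

def two_tailed_p(z):
    # P(|Z| > z) for a standard normal variable
    return math.erfc(z / math.sqrt(2))

def p_for_se_gap(gap_in_se):
    """Approximate p-value for two independent means whose +/-1 SE bars
    are separated by `gap_in_se` standard errors (equal SEs assumed)."""
    se = 1.0                              # work in units of one SE
    diff = gap_in_se * se + 2 * se        # the gap plus the two inner bar arms
    se_diff = math.sqrt(se**2 + se**2)    # SE of a difference of independent means
    return two_tailed_p(diff / se_diff)

for gap in (0.0, 0.5, 1.0):
    print(f"gap of {gap:.1f} SE -> p ~ {p_for_se_gap(gap):.3f}")
```

With the bars just touching, the approximate p is about .16; a one-SE gap brings it down to about .03, close to the .05 threshold. With few participants a t distribution would push these p-values up somewhat.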

To the rescue come Cousineau's (2005) within-subject confidence intervals.
http://www.tqmp.org/Content/vol01-1/p042/p042.pdf
The idea is straightforward and easy to implement. If your data are organized with participants as rows and conditions as columns, take the mean of each row and subtract it from every entry in that row, making a new table. Then add the overall (grand) mean of the original table to every entry of the new table. Each column will have the same mean as in the original data, but the row means will all be identical to each other and to the grand mean. Now construct your standard error bars or 95% confidence intervals in the usual manner. The error bars will then no longer include the between-subjects variability, and visually comparing any two of them in the manner described above should be roughly equivalent to a paired-samples t-test between those means (I haven't double-checked that). When we did this for our latest paper, the difference was like night and day: all of a sudden nearly all of our significant findings were clear and easy to read off the bar graph.
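A minimal sketch of the normalization step described above, using only the Python standard library. The data layout (a list of participant rows, one column per condition), the function names, and the numbers are hypothetical; the SEs are the ordinary sample standard deviation over the square root of n, computed on the normalized table, without Morey's later correction.

```python
import math
import statistics

def cousineau_normalize(data):
    """Cousineau (2005) normalization.

    `data` holds one row per participant and one column per condition.
    Each score has its participant's mean removed and the grand mean added back.
    """
    grand_mean = statistics.fmean(x for row in data for x in row)
    normalized = []
    for row in data:
        row_mean = statistics.fmean(row)
        normalized.append([x - row_mean + grand_mean for x in row])
    return normalized

def condition_se(table):
    """Ordinary standard error of each column: sample SD / sqrt(n)."""
    n = len(table)
    return [statistics.stdev(col) / math.sqrt(n) for col in zip(*table)]

# Hypothetical data: 4 participants (rows) x 3 conditions (columns).
scores = [
    [10, 12, 15],
    [20, 23, 24],
    [30, 31, 35],
    [15, 18, 20],
]

norm = cousineau_normalize(scores)
print("raw SE:           ", [round(se, 2) for se in condition_se(scores)])
print("within-subject SE:", [round(se, 2) for se in condition_se(norm)])
```

For data like these, where participants differ a lot in overall level but respond consistently across conditions, the within-subject SEs come out far smaller than the raw ones, while the column means stay exactly where they were.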

There's a risk here: your readers may not know what the heck you're doing, or may even suspect that you are trying to make your results look better than they are. But the visual pairwise comparisons will come very close to (though not necessarily exactly match) the pattern of results from the corresponding ANOVA, and at least one reviewer out there is certainly applying that kind of visual test even where it is inappropriate, that is, to a within-subjects design. There is also a paper to cite for the idea; it has now been cited 67 times, so the idea appears to be catching on.

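Read the Cousineau paper first, but there is also a late-breaking correction to it, Morey (2008):
http://www.tqmp.org/Content/vol04-2/p061/p061.pdf
It appears that error bars computed the Cousineau way are slightly too small, but they can be fixed with an easy numerical correction: multiply the variance of the normalized scores by M/(M-1), where M is the number of within-subject conditions (equivalently, multiply the standard errors by the square root of that ratio).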
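A sketch of the corrected standard errors, assuming the same participants-as-rows layout. The correction factor sqrt(M/(M-1)) follows Morey (2008); the function name, the `morey` flag, and the data are illustrative only. To get confidence intervals rather than SE bars you would still multiply by an appropriate t critical value, which is left out here to stay within the standard library.

```python
import math
import statistics

def within_subject_se(data, morey=True):
    """Per-condition within-subject SEs from Cousineau-normalized scores.

    `data`: one row per participant, one column per within-subject condition.
    With morey=True, each SE is scaled by sqrt(M / (M - 1)), M = number of
    conditions, i.e. the variance is multiplied by M / (M - 1).
    """
    n = len(data)        # participants
    m = len(data[0])     # within-subject conditions (M)
    grand_mean = statistics.fmean(x for row in data for x in row)
    norm = []
    for row in data:
        row_mean = statistics.fmean(row)
        norm.append([x - row_mean + grand_mean for x in row])
    factor = math.sqrt(m / (m - 1)) if morey else 1.0
    return [factor * statistics.stdev(col) / math.sqrt(n) for col in zip(*norm)]

# Hypothetical data: 4 participants x 3 conditions.
scores = [[10, 12, 15], [20, 23, 24], [30, 31, 35], [15, 18, 20]]
print("Cousineau only: ", [round(se, 3) for se in within_subject_se(scores, morey=False)])
print("Cousineau-Morey:", [round(se, 3) for se in within_subject_se(scores)])
```

The correction matters most when there are few conditions: with two conditions the factor is sqrt(2), about 1.41, and it approaches 1 as M grows.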

3 comments:

Anonymous said...

AMAZING! I've been thinking of something like this for my within-subject design data because the figures never do it justice. Thanks for the post and the links to the papers!

I just wanted to add that if you are doing a mixed-design ANOVA, you need to tweak the method a tiny bit. In the last step of creating the new data set, when you add the mean of the entire data set to each value, you should do this part "between groups", i.e., calculate the mean separately for each group and add each group's own mean to that group's data. You can check that you've done it right by running the ANOVA and making sure the means are exactly as they were before (only the standard errors should change).
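A sketch of this mixed-design tweak, assuming each between-subjects group is stored as its own table of participant rows. Everything here (the dict layout, `normalize_mixed`, the data) is hypothetical. Instead of running the ANOVA, it checks the same property directly: every group-by-condition cell mean should be unchanged after normalization.

```python
import statistics

def normalize_mixed(groups):
    """Cousineau-style normalization for a mixed (between x within) design.

    `groups` maps each between-subjects group label to that group's table:
    one row per participant, one column per within-subject condition.
    The mean added back is the group's own mean, not the mean of the
    entire data set.
    """
    out = {}
    for label, rows in groups.items():
        group_mean = statistics.fmean(x for row in rows for x in row)
        normalized = []
        for row in rows:
            row_mean = statistics.fmean(row)
            normalized.append([x - row_mean + group_mean for x in row])
        out[label] = normalized
    return out

# Hypothetical data: 2 groups x 3 participants x 2 conditions.
data = {
    "control":   [[10, 14], [20, 22], [15, 20]],
    "treatment": [[30, 40], [25, 33], [35, 44]],
}
normalized = normalize_mixed(data)

# Direct version of the sanity check: cell means should not move.
for label in data:
    before = [statistics.fmean(col) for col in zip(*data[label])]
    after = [statistics.fmean(col) for col in zip(*normalized[label])]
    print(label, [round(b, 3) for b in before], [round(a, 3) for a in after])
```

Adding back the mean of the entire data set instead would shift every cell in a group by the difference between that group's mean and the overall mean, which is exactly what the check would catch.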

Anonymous said...

This may be a case of miscommunication between disciplines. This practice has been encouraged in psychology since the mid-1990s (see Loftus & Masson, 1994).

Anonymous said...

The above comment misses the point. Loftus & Masson error-bars are based on the entire main effect's MSE and use the same value for all conditions. In other words, L&M's bars are common (equal) error-bars. In contrast, Cousineau-type error-bars (which must be corrected using something like Morey's trick or they are flat-out wrong) produce condition-specific error-bars. That's the whole point of Cousineau: to produce condition-specific error-bars.
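To make the common-versus-condition-specific distinction concrete, the sketch below computes both kinds of standard error for one hypothetical data set. The Loftus & Masson version uses sqrt(MS_SxC / n), where MS_SxC is the subject-by-condition interaction mean square (the error term for the within-subject main effect); the other applies Cousineau's normalization plus Morey's correction. Function names and data are illustrative. One algebraic consequence worth knowing: the average of the squared Cousineau-Morey SEs equals the squared Loftus & Masson SE, so the two agree on average and differ only in letting each condition have its own bar.

```python
import math
import statistics

def loftus_masson_se(data):
    """One common within-subject SE: sqrt(MS_SxC / n)."""
    n, m = len(data), len(data[0])
    grand = statistics.fmean(x for row in data for x in row)
    row_means = [statistics.fmean(row) for row in data]
    col_means = [statistics.fmean(col) for col in zip(*data)]
    # Subject-by-condition interaction sum of squares (residuals after
    # removing subject and condition effects).
    ss_sxc = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(m)
    )
    ms_sxc = ss_sxc / ((n - 1) * (m - 1))
    return math.sqrt(ms_sxc / n)

def cousineau_morey_se(data):
    """Condition-specific SEs: Cousineau normalization + Morey correction."""
    n, m = len(data), len(data[0])
    grand = statistics.fmean(x for row in data for x in row)
    norm = []
    for row in data:
        row_mean = statistics.fmean(row)
        norm.append([x - row_mean + grand for x in row])
    factor = math.sqrt(m / (m - 1))
    return [factor * statistics.stdev(col) / math.sqrt(n) for col in zip(*norm)]

# Hypothetical data: 4 participants x 3 conditions.
scores = [[10, 12, 15], [20, 23, 24], [30, 31, 35], [15, 18, 20]]
print("Loftus & Masson (one bar for all):", round(loftus_masson_se(scores), 3))
print("Cousineau-Morey (per condition):  ", [round(s, 3) for s in cousineau_morey_se(scores)])
```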

With that said, it needs to be clear that even Morey-corrected Cousineau-type error-bars do not take violations of sphericity into account and, therefore, do not exactly match what matters for the pairwise tests. This is why Loftus has a new paper (maybe still in press) that pretty much says to stop using any sort of condition-based error-bars and to use only paired-differences error-bars. The problem with Loftus's new suggestion is that it makes for ugly-smugly plots, so we're almost back at square one right now.
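Paired-differences error bars, as described in this comment, can be sketched as follows: for each pair of conditions, compute every participant's difference score and put a standard error on the mean difference. The function name and data are hypothetical, and this is a generic sketch rather than the specific method in Loftus's paper. Because each pair gets its own variance estimate, no sphericity assumption is involved, and the mean difference divided by its SE is exactly the paired-samples t statistic for that pair.

```python
import itertools
import math
import statistics

def paired_difference_bars(data, labels):
    """Mean difference and its SE for every pair of within-subject conditions.

    `data`: one row per participant, one column per condition (ordered as
    `labels`). Each pair's SE comes only from that pair's difference scores.
    """
    n = len(data)
    bars = {}
    for a, b in itertools.combinations(range(len(labels)), 2):
        diffs = [row[b] - row[a] for row in data]
        mean_diff = statistics.fmean(diffs)
        se_diff = statistics.stdev(diffs) / math.sqrt(n)
        bars[(labels[a], labels[b])] = (mean_diff, se_diff)
    return bars

# Hypothetical data: 4 participants x 3 conditions.
scores = [[10, 12, 15], [20, 23, 24], [30, 31, 35], [15, 18, 20]]
for pair, (mean_diff, se) in paired_difference_bars(scores, ["A", "B", "C"]).items():
    # mean_diff / se is the paired-samples t statistic for this pair
    print(pair, round(mean_diff, 2), round(se, 3), "t =", round(mean_diff / se, 2))
```

The trade-off the comment points to is visible here: with k conditions there are k(k-1)/2 difference bars, none of which sits on the bar for a condition mean.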