
What’s stats got to do with it?

by Eva Amsen

I recently learned that I have an above average number of legs. This is no cause for concern: most of you do, too. It was something I first learned when watching Hans Rosling’s The Joy of Stats BBC documentary. He pointed out that, since there are a few people with only one leg or none at all, the average number of legs is about 1.99 – just short of most people’s two.
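As a rough sketch of that arithmetic, here is the calculation in Python. The population below is invented purely for illustration (the counts are assumptions, chosen so that the mean lands near 1.99), and the name `leg_counts` is hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical population of 10,000 people. These counts are made up
# for illustration; they are not real data.
leg_counts = [2] * 9920 + [1] * 60 + [0] * 20


def summarize(values):
    """Return three common kinds of 'average' for a list of numbers."""
    return {
        "mean": mean(values),      # arithmetic average
        "median": median(values),  # middle value when sorted
        "mode": mode(values),      # most common value
    }


summary = summarize(leg_counts)
print(summary)
```

With these made-up counts the mean comes out just under two, while the median and the mode are both exactly two, which is why almost everyone ends up "above average".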

It shows that sometimes statistics are meaningless. There is no practical application to knowing the exact average number of legs per person. If you told a jeans manufacturer that he was accounting for too many legs, since the average person has less than two, he’d rightly say “What does that have to do with anything?”

And that is pretty much how I’ve seen all statistics for a very long time. Sometimes I understood it, or at least understood how to manipulate some numbers according to the proper rules, but I always thought “What does this have to do with anything?”

I managed to go through all of high school without learning a single thing about statistics. It was part of the other math class, the kind the kids that went into economics took. For the science path, you learned to solve differential equations or calculate the area of a plane intersecting a cube at weird angles, but not statistics.

While in high school, we visited the university once, and took intro classes in two departments. In the biochemistry group of the chemistry department (where I would later study), we isolated DNA from E. coli, and got to take it home in a little vial in ethanol. Awesome. In the economics department, we calculated the odds of finding a yellow marble in a jar of lots of blue and yellow ones. I had no clue what this had to do with economics.

In undergrad, there was a bit of statistics in one of the required courses in the chemistry program, but it ended up being only one question on one exam, and you could get it wrong and still do really well overall. No need for statistics there.

I didn’t really need statistics at all until I had to analyse the data that came out of my PhD research. I tried to look at introductory texts and websites, but nothing made sense. Okay, so that’s the formula for the Student’s T test, but what does that have to do with anything? All textbooks that explained statistics either gave some formulas that I still didn’t know when or how to use, or were talking about situations entirely different from the research I was doing. I couldn’t relate the examples about profit margins to my data from the lab.

Eventually, a lab mate and I convinced our supervisor to buy Intuitive Biostatistics for the lab. It’s by the guy who developed GraphPad, but it’s not shilling the software in any way. It explains when to use which kind of calculations, and why, using examples from biology. Suddenly it all made sense, and I could analyse my work, and even give sensible answers to the questions about statistics at my defense, such as why I calculated confidence intervals for some of my experiments.

My feelings about statistics are best summarized in this section from my thesis. It’s one of my favourite paragraphs in there, and yet another way of saying “What does this have to do with anything?”

“However, a statistically significant difference between transfected and untransfected cells does not necessarily correlate with a biologically significant difference. This is clear from the data collected from non-silencing controls. After both 48 hour and 72 hour transfections, two out of eight non-silencing controls show a statistically significant (P<0.05) reduction (…). This means that there is a strong possibility that about a quarter of all “hits” for which P<0.05 are a false positive.”
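The numbers in that paragraph can be checked with a small sketch. It assumes (this is not stated in the thesis excerpt) that the eight non-silencing controls are independent and that each has a 5% chance of crossing P<0.05 when nothing is really going on. The function name `prob_at_least` is made up for this example.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more 'successes' in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_controls = 8      # non-silencing controls, from the quoted paragraph
n_significant = 2   # controls that showed P<0.05
alpha = 0.05        # significance threshold

# Fraction of controls that were "significant" despite being controls
observed_fraction = n_significant / n_controls

# Assumption: independent controls, each with a 5% false positive chance
chance_of_two_or_more = prob_at_least(n_significant, n_controls, alpha)

print(f"observed fraction: {observed_fraction:.2f}")
print(f"chance of 2 or more of 8 by luck alone: {chance_of_two_or_more:.3f}")
```

Under those assumptions, two or more "significant" controls out of eight would turn up by chance alone roughly 6% of the time, so the observed quarter is well above the nominal 5% rate that the threshold suggests.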

But oddly, as much as I dislike the kind of stats that come with data analysis – the kind that I usually don’t see the point of – I really LOVE graphs and data.

Here is a graph that shows how much I love graphs compared to how much I dislike statistics:

 

[Graph: graphs and stats]

It’s a little confusing, though, because this love includes visualisation of website access numbers – which are still called “statistics”. But they’re just the data, and nobody is asking me to do a T-test on the numbers and then grill me about R values and curve fitting and probabilities.

I also really love Information is Beautiful. I have the book, and enjoyed David McCandless’ talk at Science Online this year.

And in undergrad, where I avoided statistics like the plague, a friend and I spent a few days drawing absurd graphs of absolutely everything, based on fictional scenarios. My favourite of the batch was the number of visitors to the university cafeteria plotted against the cooking time of the green beans they served. It peaked at 10 minutes, but of course it was never that busy, because the beans were always cooked for about 20 minutes.

Years later, in October 2008, a few months after I got completely annoyed by prettifying all the graphs in my PhD thesis, I made a similarly silly graph showing my happiness and IQ over time.

I also make more serious graphs for fun. For the past eleven months (since I moved), I’ve religiously tracked every single penny I spent, and sorted the resulting amounts in a pie chart, to see where my money goes. The different categories are roughly coloured by whether I can reduce them or not. (I’m not showing the legend for personal reasons, but can tell you that pink represents money spent on the cat. I didn’t have my cat with me most of the year, so this is pretty small, and the only section where I’m planning an increase in spending. Although, as I type this, the same cat is pulling decorations off the Christmas tree in the background. Reduce spending! Reduce!)

 

Another graph I really like is this one. It shows individual visitors to the Node, with clear dips at weekends.

We now (finally) have stats on our Nature Network blogs as well, but since mine only started tracking on December 24, they’re not very exciting yet. I’m intrigued, though.

Why do I like graphs but not the other kind of stats?

What data visualisation does, and R values and T tests don’t, is make it immediately clear what you’re looking at, and how it’s relevant to the real world.

Here’s a graph I’ve discussed on the blog before. It comes from a study showing that in a certain field, reported P-values are almost never just above 0.05, implying data manipulation or adding more experiments to get to just below 0.05 and allow it to be “significant”.

And I’ve got to say: as much as I don’t like the arbitrariness of “statistical significance” and the rules about which analysis to use for what kind of data, I really like this graph.

So, you see, it’s complicated. Initially, I was going to title the post “The Fear of Stats”, but as I started writing, I realized I wasn’t scared of stats: just bored and annoyed and wondering, indeed, what they had to do with various things. Keeping track of lots of data makes for pretty graphs and useful trends. Those kinds of stats are cool. But statistical analysis of data doesn’t always make sense to the people using it. Not just because it’s complicated, but because it’s not always informative of what they’re looking at. It has to make sense in context. You have to be able to actually answer the question “what’s stats got to do with it?”, and not just use it rhetorically like I did in most of this post.

19 comments

MuKa December 30, 2010 - 12:47 AM

Great post! I had one semester of stats during my biochem undergrad but didn’t touch stats until my honours year. Now that I am doing a psychology PhD, stats is a daily occurrence, and I had to play catch-up. I found Salkind’s Statistics For People Who (Think They) Hate Statistics very helpful in explaining the basic concepts.

Heather Etchevers December 30, 2010 - 12:04 PM

It’s something in the water today? GrrlScientist also just blogged about data visualization.

I also love xkcd graphs. PhDcomics has a lot of graphs that communicate clearly, as well. And their axes are always labeled.

Heather Etchevers December 30, 2010 - 12:16 PM

Grr. Too many links in my other comment. Have a field day. Held for moderation.

Heather Etchevers December 30, 2010 - 12:19 PM

"What’s stats, but a second dis-tri-bu-tion…?"

HT Tina Turner.

It’s lunchtime, methinks.

Eva Amsen December 30, 2010 - 12:41 PM

Approved the link-heavy comment. No, I’ve had this post in my head for weeks – just finally had time and motivation to write it.

Linda Lin December 30, 2010 - 3:03 PM

 Oops, linked Peak Productivity twice. Here’s procras-correlation 

Richard Wintle December 30, 2010 - 3:54 PM

Excellent post, Eva.

Second-year stats was one of my worst University marks – not because it was difficult, but because it was a year-long course and I was bored to tears by about February. It did, however, have the mild advantage that we could log in remotely to do assignments, picking up the printouts from outside the room at the Arts & Science building later.

The key was that you were allowed to name your assignment anything at all, so as to make finding it easier. And it was printed on an old-style dot matrix printer, with your chosen name in HUGE letters on the front. Just imagine the fun and games. My most memorable one was called BUTTF****, which is a bit rude but amused us greatly at the time.

Also, reading Grrl’s post and yours reminds me that someone once told me that pie charts are a really terrible way of presenting data, because humans just aren’t very good at discriminating differences between small angles. Unfortunately I’m far too lazy to try and dig up a reliable source for that, though.

Austin Elliott December 30, 2010 - 4:47 PM

 I first heard that question used, a few years back, in the slightly different form:

 "Have I got the average number of arms?"

– in a medical student viva voce exam. I later heard it was a favourite question often used to test medical students’ understanding of statistics, and/or whether they could think on their feet. The person I heard use it was a well-known Professor of  Psychology and Medical Education who was one of our external examiners. I remember this was in a viva for a student on the pass/honours/distinction boundary for his/her preclinical exams, and faced with three examiners, including me and the questioner.

A wrinkle to the problem is that the correct answer can actually be “Yes” or “No” depending on which kind of average you use. It is “No” for the arithmetic average, of course, but it would be “Yes” if you were to take the median or the mode of the distribution.

Eva Amsen December 30, 2010 - 4:55 PM

Wouldn’t the median be 1?

Bob O'Hara December 30, 2010 - 5:00 PM

If you don’t find the stats you’re calculating are useful, then you’re doing it wrong. The problem is that stats is taught so poorly to scientists: it’s usually presented as a series of recipes to get the right p-value, rather than giving any understanding of what data analysis is about (i.e. extracting information from data). Graphs are one way to do this, but simply calculating the right statistic is also useful (e.g. saying that 98.4% of Swedes have 2 legs, rather than giving the mean).

Mike Fowler December 30, 2010 - 6:47 PM

Eva, you should have done some psychology or ecology courses as an undergrad. They generally do statistics in a much more user friendly manner. In 2nd year Psych, we had to do a 2 way ANOVA by hand. Tedious, but extremely satisfying, and gives a great insight into what the heck is going on.

I’ll also point out that graphs mix volatilely with human brains. We look for patterns everywhere, and find them where they don’t exist. An anecdote: one of the PhD students in my old group teamed up with his Dad – a game and fisheries employee with decades of expertise in the field – to analyse some game-bird time-series to see if weather was affecting population trends. His Dad was convinced that the weather had a critical effect on the populations. The statistical analysis suggested it did not. Other things were far more important (forest age, size, etc). His Dad didn’t believe the stats…

But you can be comforted by the fact that more people now realise that ‘biological significance’ can have a different value to ‘statistical significance’, and p-values can be modified by, or complemented with appropriate effect sizes which relate much more sensibly to the biological issue you want to measure.

Eva Amsen December 30, 2010 - 6:56 PM

You can’t really choose which courses to take in Holland: you choose a program, and take all the courses in there. We didn’t have psychology or ecology courses in the chemistry program. The first few years were completely fixed, and after that you just picked a stream of chemistry to specialize in.

Austin Elliott December 30, 2010 - 9:56 PM

The real point about taking stats as part of psychology or ecology, I think, is less the doing of the stats than having statistical ideas presented to you in a context where (i) you can see why they are useful, and (ii) where the concepts (like the distribution of a value of some parameter within a population) have clear meaning. As a chemistry undergrad in the early 80s I had to do a full term of statistics in my first yr, but I promptly forgot the lot as stats was never mentioned in a single one of our chemistry or biochemistry courses.   

You can see the same thing with our bioscience undergrads. We bang on endlessly about statistics – partly since it is something examiners always say students are weak on – but it is usually very clear that it isn’t until their final year research project work (where they actually have some data of their own to analyse) that the students grasp what the point of it is.

Cath Ennis December 31, 2010 - 12:51 AM

Excellent post, Eva! I’m also stats-phobic (but love graphs, especially silly ones), so I might see if my boss will also shell out for a copy of Intuitive Biostatistics!

Eva Amsen January 4, 2011 - 2:10 PM

I’m currently analysing Node stats to the music of La Roux.

Music improves stats.

Tom Webb January 5, 2011 - 11:08 AM

 Enjoyed the post! Bit late on this, but re. the comment about pie charts above, Edward Tufte is very anti them (‘the only thing worse than a pie chart is several pie charts’) Might have to get my intended post on statistical graphics in shape soon…

Åsa Karlström January 13, 2011 - 5:25 PM

(I’m late – blaming the holidays and catching up)

Love the post Eva! I took statistics as undergrad and then again as graduate student but the one thing that stuck with me was always "what is the comparison about" since some of that "significancy" discussion would be what you refer to in the "1/4 is false positives".

As for ideas, I now want to go home and chart my costs/finances for last year. I fear there will be one HUGE category, 4 equally big ones and some smaller ones – like your dark blue and the other blues… and then some pink ones 😉

Barbara Ferreira January 15, 2011 - 9:43 AM

This may come a bit too late but it’s so cool I had to share it. It’s the TED talk on the beauty of data visualization. I strongly recommend it for those of you who love graphs.
