Indiana University Bloomington

ScIU: Conversations in Science at Indiana University
A skeptic’s guide to statistical significance

Posted on March 24, 2019 by Evan Arnet

“There’s no safe amount of alcohol,” CNN reported. This year the largest ever study on the health risks of alcohol was released, attracting mass media attention and igniting a science journalism furor over its interpretation.

In the study, researchers found a statistically significant increase in the risk of death for individuals who consume even one drink a day. “Statistically significant,” misleadingly, has little to do with what most people mean by “significant.” Statisticians quickly went after the media’s interpretation of the study: even though the increased risk of death was statistically significant, it represented only a tiny increase in personal risk (although risks do spike rapidly for binge drinkers).

[Image: a meme about statistical significance, in which “The media” looks past “Statistics” to ogle a “new scientific study” instead. Original photo by Antonio Guillem.]

The problem, however, is not with the study but with the interpretation of statistical significance. While this measure is a powerful tool deployed throughout the sciences, misunderstandings are rampant, even among scientists themselves.

Statistical significance is defined by reference to a “p-value,” most commonly p < 0.05. These are precise technical concepts with some strengths but many limitations. To understand them, let’s imagine a different study finding: “People taking antidepressants report a statistically significant decrease in depression (p = 0.03).”

If we see something like this, what are we supposed to think?

First, before even worrying about the p-value, don’t assume a “statistically significant” effect is “clinically significant.” This problem is most acute in large-sample studies. It’s easy to think that a large study, with many participants, is better than a small study, with few; in most cases this is true. However, large studies often pick up extremely small effects, and we have to be careful not to confuse these “statistically significant” effects with meaningful ones. This is what happened with the alcohol study: there appears to be a real effect of alcohol on health at very low levels of consumption, but that effect is so minimal that most people shouldn’t worry about it. Such overstated conclusions are not rare. Antidepressants, which are tested in massive pharmaceutical trials, are notorious for having relatively minor effects, although this problem is not confined to psychiatric medication.
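To see how a huge sample can make a negligible effect “statistically significant,” here is a minimal simulation sketch (all numbers are hypothetical): a “treatment” that shifts scores by only 0.05 standard deviations still yields a vanishingly small p-value when tested on tens of thousands of people.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical setup: a treatment that shifts scores by a mere
# 0.05 standard deviations -- far too small to matter in practice.
n = 50_000  # very large sample per group
control = [random.gauss(0.0, 1.0) for _ in range(n)]
treated = [random.gauss(0.05, 1.0) for _ in range(n)]

diff = statistics.fmean(treated) - statistics.fmean(control)

# Two-sided z-test for a difference in means (with n this large,
# the z-test is a fine stand-in for the t-test).
se = math.sqrt(statistics.variance(treated) / n
               + statistics.variance(control) / n)
z = diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"difference in means: {diff:.3f}")  # tiny effect
print(f"p-value: {p_value:.1e}")           # yet wildly "significant"
```

The same 0.05-standard-deviation difference in a study of 50 people per group would not come close to significance; only the sample size changed.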

Second, and trickier, is p = 0.03. A common, but wrong, interpretation is that a p-value of 0.03 means the study has only a 3% chance of being false. Unfortunately, the correct interpretation of “p” is clunkier and rests on the idea of a null hypothesis. The null hypothesis is the hypothesis that the intervention we are studying has no effect. In this example, the null hypothesis would be that the antidepressant has the same effect as a placebo (sugar pill).

P-values cannot tell you how likely your preferred hypothesis (e.g., antidepressants improve depression) is to be true given your data; they can only tell you how likely your data are given your null hypothesis. In other words, they tell you how likely your data would be IF your null hypothesis were true. In this case, the null hypothesis is that antidepressants are no better than placebo. IF antidepressants were no better than placebo, THEN we would get data that looked like ours only 3% of the time. The smaller the p-value, the less the data match the null hypothesis, and the less confident we should be in it.
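The IF–THEN logic above can be sketched as a small simulation (all numbers are hypothetical): assume the null hypothesis is literally true, rerun the “experiment” many times, and count how often chance alone produces a difference at least as large as the one observed. That fraction is exactly what a p-value estimates.

```python
import random
import statistics

random.seed(1)

# Hypothetical trial: 20 people per arm, and an observed improvement
# of 0.7 units on some depression scale (assumed, for illustration).
n = 20
observed_diff = 0.7

# IF the null hypothesis were true -- drug and placebo both just
# noise with the same mean -- how often would random variation
# alone produce a difference this extreme?
trials = 20_000
extreme = 0
for _ in range(trials):
    drug = [random.gauss(0.0, 1.0) for _ in range(n)]
    placebo = [random.gauss(0.0, 1.0) for _ in range(n)]
    sim_diff = statistics.fmean(drug) - statistics.fmean(placebo)
    if abs(sim_diff) >= observed_diff:
        extreme += 1

p_value = extreme / trials
print(f"simulated p-value: {p_value:.3f}")  # lands near 0.03
```

Note what the simulation never computes: the probability that the drug works. It only answers how surprising the data would be in a world where the drug does nothing.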

With the p-value in hand, scientists have a choice to make. Either the data are too far-fetched if we assume the null hypothesis, and we should “reject the null” (that is, we should not assume the real world is like the null hypothesis); or the null hypothesis does a sufficient job of accounting for the data, and we should not reject it. This is a judgment call: there is no definitive answer about how much certainty is required. However, different fields have their conventions. In the social sciences, scientists usually reject the null when p-values are less than 0.05.

Finally, our interpretation of statistical significance should be informed by current knowledge. There’s already evidence that antidepressants are at least mildly effective, so another study corroborating that is no big surprise. But what would happen if a new study came out saying, “Antidepressants actually make depression worse (p = 0.03)”? This is just the kind of shocking finding that makes it into top scientific journals and gets science journalists frothing at the mouth with excitement. Just one problem: there’s very little reason to think it’s right. Looked at in isolation, the single study may be compelling; but, considered against the background of all the other studies that have been done on antidepressants, a fluke seems far more likely. The boring truth is that when a finding flies in the face of existing evidence, it’s probably wrong. Unfortunately, scientists and journalists don’t always play their part, and it can be left to the reader to exercise due skepticism and interpret the finding carefully. Reading science is like drinking: consume responsibly.
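The “probably wrong” intuition can be made rough and quantitative with a back-of-envelope Bayes calculation (every number here is an illustrative assumption, not a measured quantity): if shocking claims are rarely true to begin with, even a “significant” result leaves them unlikely.

```python
# All numbers are hypothetical assumptions for illustration.
prior = 0.01   # assume 1 in 100 shocking claims of this kind is true
power = 0.80   # assumed chance of a significant result if the claim is true
alpha = 0.05   # conventional false-positive rate under the null

# Bayes' rule: P(true | significant) =
#   P(sig | true) P(true) / [P(sig | true) P(true) + P(sig | false) P(false)]
posterior = (prior * power) / (prior * power + (1 - prior) * alpha)
print(f"P(claim true | significant result) = {posterior:.2f}")  # about 0.14
```

Under these assumptions, a surprising claim backed by a significant result is still false roughly six times out of seven, which is why replication and prior evidence matter so much.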

Thanks to Dr. John Kruschke for helpful comments.

Edited by Abigail Kimmitt and Maria Tiongco

Filed under: General Science, Scientific Methods and Techniques. Tagged: methods, Statistics.

Copyright © 2018 The Trustees of Indiana University