
Best Practices for Statistical Significance Testing

Misunderstanding statistical significance leads to wrong conclusions. Learn proper hypothesis testing and common pitfalls to avoid.

Key Takeaways

  • A p-value is the probability of observing results at least as extreme as the data, assuming the null hypothesis is true.
  • The conventional threshold is p < 0.05, meaning a 5% chance of a false positive when the null hypothesis is true.
  • Small samples produce unreliable results.
  • Testing 20 hypotheses at p < 0.05 produces about one false positive on average by chance alone.
  • A statistically significant result may not be practically important.

What a p-Value Actually Means

A p-value is the probability of observing results at least as extreme as the data, assuming the null hypothesis is true. It is NOT the probability that the hypothesis is true or false.
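The definition can be made concrete with a short simulation. The scenario below is hypothetical (not from the article): we observe 60 heads in 100 flips and ask how often a fair coin would produce a result at least that extreme.

```python
import random

def simulated_p_value(observed_heads, n_flips, n_sims=20_000, seed=42):
    """Fraction of null-hypothesis datasets whose statistic is at least
    as extreme as the observed one -- an empirical one-sided p-value."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        # Generate one dataset under the null hypothesis: a fair coin.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:  # "at least as extreme"
            extreme += 1
    return extreme / n_sims

p = simulated_p_value(60, 100)  # roughly 0.03: unusual, but possible, for a fair coin
```

Note that the simulation only ever generates data under the null hypothesis; the p-value says nothing directly about how likely the alternative is.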

Choosing Significance Levels

The conventional threshold is p < 0.05, meaning a 5% chance of a false positive when the null hypothesis is true. For high-stakes decisions (medical treatments, safety systems), use p < 0.01 or p < 0.001. For exploratory analysis, p < 0.10 may be acceptable.
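As a minimal sketch, the guidance above can be encoded as a lookup from decision stakes to threshold (the context names and helper are illustrative, not a standard API):

```python
# Illustrative significance thresholds, following the guidance above.
ALPHA_BY_CONTEXT = {
    "exploratory": 0.10,   # early-stage analysis, tolerant of false positives
    "conventional": 0.05,  # the usual default
    "high_stakes": 0.01,   # medical treatments, safety systems
}

def is_significant(p_value, context="conventional"):
    """Compare a p-value against the threshold chosen for the context."""
    return p_value < ALPHA_BY_CONTEXT[context]
```

For example, a p-value of 0.03 clears the conventional bar but not the high-stakes one: `is_significant(0.03)` is true while `is_significant(0.03, "high_stakes")` is false.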

Sample Size Matters

Small samples produce unreliable results. A coin that lands heads 7 out of 10 times might be fair. A coin that lands heads 700 out of 1000 times is almost certainly biased. Power analysis helps determine the minimum sample size.
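The coin example can be checked exactly with the binomial distribution. The sketch below computes the one-sided tail probability of seeing at least k heads under a fair coin:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """One-sided probability of k or more successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

small = binom_tail(7, 10)      # 7/10 heads: p ≈ 0.17, consistent with a fair coin
large = binom_tail(700, 1000)  # 700/1000 heads: vanishingly small under fairness
```

The same 70% heads rate is unremarkable at n = 10 but overwhelming evidence at n = 1000, which is exactly why power analysis should fix the sample size before data collection.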

Multiple Comparisons Problem

Testing 20 hypotheses at p < 0.05 produces about one false positive on average by chance alone, and roughly a 64% chance of at least one. Apply the Bonferroni correction (divide alpha by the number of tests) or control the False Discovery Rate (FDR) for multiple comparisons.
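Both numbers above follow from a two-line calculation, assuming the 20 tests are independent:

```python
alpha, m = 0.05, 20

# Family-wise error rate: chance of at least one false positive across
# m independent tests when every null hypothesis is true.
fwer = 1 - (1 - alpha) ** m   # ≈ 0.64

# Bonferroni correction: per-test threshold that caps the family-wise
# error rate at alpha.
bonferroni_alpha = alpha / m  # 0.0025
```

The expected number of false positives is simply alpha * m = 1, which is where "approximately one by chance alone" comes from.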

Practical vs Statistical Significance

A statistically significant result may not be practically important. A drug that reduces blood pressure by 0.1 mmHg might achieve p < 0.001 with a large sample but is clinically meaningless. Always report effect sizes alongside p-values.
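The blood-pressure example can be worked through with a z-test; the sample size and standard deviation below are hypothetical, chosen to mirror the article's scenario:

```python
from math import erfc, sqrt

# Hypothetical numbers: a 0.1 mmHg mean reduction (sd = 10 mmHg)
# measured on one million patients.
mean_diff, sd, n = 0.1, 10.0, 1_000_000

z = mean_diff / (sd / sqrt(n))  # standard error shrinks as n grows, so z ≈ 10
p_value = erfc(z / sqrt(2))     # two-sided normal p-value: astronomically small
cohens_d = mean_diff / sd       # effect size: 0.01, far below "small" (0.2)
```

The p-value clears any threshold, yet the standardized effect size shows the reduction is clinically negligible; reporting both tells the whole story.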
