P-Value Calculator

Enter a z-score or t-statistic to find the exact p-value for your hypothesis test — one-tailed and two-tailed — with significance determination and step-by-step interpretation at common α levels.


Tips & Notes

  • Two-tailed tests are the standard default. Use one-tailed only when you specified the direction of the effect before data collection, not after seeing the results.
  • The p-value depends on both the test statistic and the degrees of freedom (for t-tests). The same t=2.10 gives p≈0.036 at df=∞ (the z-approximation) but p≈0.053 at df=15 — small samples require larger statistics to achieve significance.
  • The p-value is not the probability that H₀ is true. P=0.03 does not mean there is a 97% chance the alternative hypothesis is correct — it means data this extreme would be unlikely if H₀ were true.
  • P-values are affected by sample size. With n=10,000, even a trivial effect produces p<0.001. With n=10, even a large effect may give p>0.05. Always report effect size alongside the p-value.
  • A p-value just above 0.05 (e.g., 0.052) is not fundamentally different from one just below (0.048). Avoid binary thinking — consider confidence intervals and practical significance instead.
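
The sample-size point above can be made concrete with a short sketch. It uses a one-sample z-approximation (z = d·√n for a standardized effect d); the function name and the specific effect and sample sizes here are illustrative, not part of the calculator:

```python
from math import sqrt
from statistics import NormalDist

def p_two_tailed(effect_d, n):
    # Illustrative one-sample z-approximation: z = d * sqrt(n),
    # two-tailed p = 2 * (1 - Phi(|z|)).
    z = effect_d * sqrt(n)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# A tiny standardized effect (d = 0.05) clears p < 0.001 once n = 10,000 ...
print(p_two_tailed(0.05, 10_000) < 0.001)   # True
# ... while a large effect (d = 0.60) misses p < 0.05 at n = 10.
print(p_two_tailed(0.60, 10) > 0.05)        # True
```

The same mechanism is behind the bullet above: the p-value mixes effect size with sample size, which is why both should be reported.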

Common Mistakes

  • Interpreting p=0.05 as a 5% probability that H₀ is true. The p-value is the probability of the observed data (or more extreme) given H₀ is true — not the probability that H₀ itself is true. These are completely different quantities.
  • Using one-tailed tests post-hoc to achieve significance. Switching from two-tailed to one-tailed after seeing that the result is p=0.07 (two-tailed) is p-hacking. The tail direction must be pre-specified before data collection.
  • Treating p>0.05 as proof that H₀ is true. Failing to reject H₀ does not confirm it. The test may have been underpowered — too small a sample to detect a real effect. "Not significant" ≠ "no effect".
  • Reporting p-value without the test statistic or effect size. P=0.03 alone is uninformative. Always report t(df)=value or z=value and Cohen's d or other effect size alongside the p-value.
  • Comparing p-values across studies to judge effect strength. p=0.001 in one study and p=0.04 in another does not mean the first study found a stronger effect — p-values depend heavily on sample size. Compare effect sizes instead.

P-Value Calculator Overview

The p-value is the probability of obtaining a test result as extreme as the observed result (or more extreme) if the null hypothesis were true. It is the most reported number in scientific research and one of the most misunderstood. A small p-value means the observed data would be unlikely under H₀ — not that H₀ is proven false or that the alternative is proven true.

P-value from a z-score (large samples or known σ):

Two-tailed: p = 2 × P(Z > |z|) = 2 × [1 − Φ(|z|)]
EX: z = 2.10 → P(Z > 2.10) = 1 − Φ(2.10) = 1 − 0.9821 = 0.0179 → Two-tailed p = 2 × 0.0179 = 0.0357 → Significant at α=0.05

P-value from a t-statistic (small samples or unknown σ):

Two-tailed: p = 2 × P(T_df > |t|)
EX: t = 2.10, df = 15 → Two-tailed p ≈ 0.053 → NOT significant at α=0.05 (just above the threshold — note how df affects the result vs. the z-test)
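
Both worked examples can be reproduced in code. In practice a statistics library would supply the t-distribution tail probability (with SciPy, `scipy.stats.t.sf(abs(t), df) * 2`); the sketch below is dependency-free, evaluating the t tail through a standard continued-fraction computation of the regularized incomplete beta function. Function names are illustrative:

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-12):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < 1e-30:
        d = 1e-30
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < 1e-30:
            d = 1e-30
        c = 1.0 + aa / c
        if abs(c) < 1e-30:
            c = 1e-30
        d = 1.0 / d
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < 1e-30:
            d = 1e-30
        c = 1.0 + aa / c
        if abs(c) < 1e-30:
            c = 1e-30
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def p_two_tailed_z(z):
    """Two-tailed p = 2 * P(Z > |z|), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_two_tailed_t(t, df):
    """Two-tailed p = P(|T_df| > |t|) = I_x(df/2, 1/2) with x = df / (df + t^2)."""
    x = df / (df + t * t)
    return _reg_inc_beta(df / 2.0, 0.5, x)

print(round(p_two_tailed_z(2.10), 4))      # matches the worked z example (~0.0357)
print(round(p_two_tailed_t(2.10, 15), 3))  # matches the worked t example (~0.053)
```

As df grows, `p_two_tailed_t` converges to `p_two_tailed_z`, which is exactly the df effect the examples illustrate.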

Significance thresholds — conventional α levels:

α Level   Significance Label              p-value Required   Common Use
0.10      Marginal                        p < 0.10           Exploratory research
0.05      Significant (*)                 p < 0.05           Most social and medical research
0.01      Highly significant (**)         p < 0.01           Stricter scientific standards
0.001     Very highly significant (***)   p < 0.001          High-stakes clinical or policy decisions
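
As a sketch, the conventional thresholds can be applied mechanically (the function name and label strings are illustrative; real reports should give exact p-values, not just labels):

```python
def significance_label(p):
    # Thresholds mirror the conventional alpha levels listed above.
    if p < 0.001:
        return "very highly significant"
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    if p < 0.10:
        return "marginal"
    return "not significant"

print(significance_label(0.0357))  # the z example: significant
print(significance_label(0.053))   # the t example: marginal
```
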
A p-value answers one specific question: given that the null hypothesis is true, how probable is it to observe a test statistic at least as extreme as this one by chance? A small p-value means the data would be unlikely under the null hypothesis — not that the null is false, and not that the result is practically important. Statistical significance and practical significance are different things, and conflating them is the most common misuse of p-values in published research.

The choice of significance threshold (α = 0.05, 0.01, or 0.001) is a design decision made before seeing the data — not a magic boundary that separates real from spurious effects. P-values of 0.049 and 0.051 represent virtually identical evidence; treating them categorically differently is intellectually inconsistent. Reporting exact p-values, effect sizes, and confidence intervals together gives readers far more information than a binary significant/not-significant judgment.

Frequently Asked Questions

What exactly does a p-value tell you?

The p-value is the probability of observing a test statistic as extreme as yours (or more extreme) if the null hypothesis were true. P=0.03 means: if H₀ is true (no real effect), there is only a 3% chance of getting a result this far from the expected value by random chance. Small p-values suggest the data is unusual under H₀ — not that H₀ is false. It is probabilistic evidence, not proof.

Why is 0.05 the standard significance threshold?

The 0.05 threshold was proposed by Ronald Fisher in the 1920s as a convenient rule of thumb — not a fundamental law. It means you accept a 5% false-positive rate (rejecting a true H₀). Different fields use different thresholds: physics requires p < 0.0000003 (5-sigma) for new particle discovery; some clinical fields use p < 0.01. The 0.05 threshold is a convention, not a magic boundary between truth and falsehood.

What is the difference between one-tailed and two-tailed tests?

A two-tailed test looks for a difference in either direction: p=0.04 (two-tailed) means a 4% probability of a result this extreme in either direction. A one-tailed test looks for a difference only in a pre-specified direction (e.g., Group A > Group B). For the same data, one-tailed p = two-tailed p ÷ 2, so two-tailed p=0.06 becomes one-tailed p=0.03. Always specify the tails before collecting data — switching post-hoc is invalid.

Does a statistically significant result mean the effect is important?

No. Statistical significance and practical importance are independent. With n=100,000, a difference of 0.1 units on a 100-point scale produces p<0.001 — statistically significant but completely trivial. With n=20, a difference of 10 units may give p=0.08 — not statistically significant but potentially practically important. Always interpret p alongside an effect size (Cohen's d, η², etc.) and the practical context.

When should I use a z-test versus a t-test?

Use a z-test when the population standard deviation σ is known or the sample size is large (n > 30). Use a t-test when σ is unknown and estimated from the sample, especially when the sample is small (n ≤ 30). In practice σ is almost always unknown, so t-tests are far more common. With large samples (n > 100), the t-distribution approaches the normal distribution and the distinction becomes negligible.

What does a p-value of exactly 0.05 mean?

P=0.05 sits exactly on the conventional boundary, but there is nothing magical about it. Results at p=0.049 and p=0.051 represent virtually identical evidence — calling one 'significant' and the other 'not significant' is an artifact of the arbitrary threshold. Most methodologists recommend reporting exact p-values (e.g., p=0.048 or p=0.052, not just 'p < 0.05') and always pairing them with effect sizes and confidence intervals.