T-Test Calculator

Enter sample data to perform one-sample or two-sample t-tests — get t-statistic, degrees of freedom, p-value, Cohen's d effect size, and a clear conclusion on whether means are significantly different.

Tips & Notes

  • Use Welch's t-test (unequal variances) by default for two-sample tests. It is robust even when variances are equal and is safer when they differ. The traditional pooled t-test requires equal variances — verify with Levene's test first.
  • Two-tailed tests are almost always the right choice. Use one-tailed only when you predicted the direction of the difference before collecting data and have strong theoretical justification.
  • T-tests assume approximately normal data within each group. With n > 30 per group, the central limit theorem makes t-tests robust to non-normality. For small non-normal samples, use the Mann-Whitney U test.
  • Always report effect size (Cohen's d) alongside p-value. Large samples make tiny, meaningless differences statistically significant. d < 0.2 is negligible even if p < 0.001.
  • Paired t-tests (for before/after or matched designs) are more powerful than independent two-sample t-tests. If observations are matched or repeated, use the paired version — it removes between-subject variability.
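The Welch-by-default advice above can be seen directly in SciPy, whose `ttest_ind` exposes an `equal_var` flag. A minimal sketch with simulated data (the group means and SDs here are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(loc=85, scale=8, size=20)   # hypothetical group A scores
b = rng.normal(loc=78, scale=10, size=20)  # hypothetical group B scores

# Welch's t-test (equal_var=False) is the safer default
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)

# Pooled t-test (equal_var=True) assumes equal variances
t_pooled, p_pooled = stats.ttest_ind(a, b, equal_var=True)

print(f"Welch:  t={t_welch:.3f}, p={p_welch:.4f}")
print(f"Pooled: t={t_pooled:.3f}, p={p_pooled:.4f}")
```

With equal sample sizes the two t-statistics coincide and only the degrees of freedom differ, which is why Welch costs nothing when variances happen to be equal.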

Common Mistakes

  • Using a one-tailed test without pre-specifying the direction. Choosing one-tailed after seeing the data to get p < 0.05 is p-hacking. The direction must be theoretically justified before data collection.
  • Confusing independent and paired t-tests. Measuring 20 people before and after treatment gives 20 differences — use a paired t-test on the 20 differences. Treating this as two independent groups of 20 ignores the matching and reduces power.
  • Reporting p-value without effect size. p=0.001 with d=0.08 (negligible) is statistically significant but practically meaningless. p=0.06 with d=0.85 (large) may be practically important but not statistically significant at α=0.05.
  • Applying t-test to very small non-normal samples. With n=5 per group and strongly skewed data, the normality assumption fails and t-test p-values are unreliable. Use permutation tests or the Mann-Whitney U test.
  • Using multiple t-tests instead of ANOVA for three or more groups. Three t-tests comparing groups A vs B, A vs C, and B vs C inflates the false positive rate to 14.3% at α=0.05. Use one-way ANOVA instead.
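The 14.3% figure in the last bullet is the standard back-of-envelope calculation that treats the three pairwise tests as independent (they are not fully independent, so this is an approximation):

```python
# Familywise false-positive rate for k independent tests at alpha = 0.05
alpha = 0.05
fwer_3 = 1 - (1 - alpha) ** 3   # three pairwise comparisons: A-B, A-C, B-C
print(f"{fwer_3 * 100:.1f}%")   # ≈ 14.3%
```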

T-Test Calculator Overview

The t-test is the most widely used hypothesis test for comparing means. It asks: is the difference between a sample mean and a reference value (or between two sample means) large enough to be statistically significant — or is it plausibly due to random sampling variation? Every time a study reports "p < 0.05" for comparing two groups, a t-test (or similar test) is almost certainly behind it.

One-sample t-test — comparing a sample mean to a known value:

t = (x̄ − μ₀) / (s / √n) | df = n − 1
EX: Sample of 25 students, x̄=78, s=10, hypothesized μ₀=75 → t = (78−75)/(10/√25) = 3/2 = 1.50, df=24 → p ≈ 0.147 (two-tailed) → not significant at α=0.05
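The one-sample example can be reproduced from summary statistics alone, using SciPy's t distribution for the two-tailed p-value (a sketch; `scipy.stats.ttest_1samp` would be the direct route if raw data were available):

```python
import math
from scipy import stats

# One-sample t-test from the summary statistics above
xbar, mu0, s, n = 78, 75, 10, 25
t = (xbar - mu0) / (s / math.sqrt(n))  # (78 - 75) / (10 / 5) = 1.50
df = n - 1                             # 24
p = 2 * stats.t.sf(abs(t), df)         # two-tailed p ≈ 0.147
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```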
Two-sample t-test (equal variances) — comparing two independent groups:
t = (x̄₁ − x̄₂) / [sp × √(1/n₁ + 1/n₂)] | sp = √[(s₁²(n₁−1) + s₂²(n₂−1))/(n₁+n₂−2)]
EX: Group A: n=20, x̄=85, s=8 | Group B: n=20, x̄=78, s=10 → sp=√[(64×19+100×19)/38]=√82=9.06 → t=(85−78)/(9.06×√(0.1))=7/2.86=2.44, df=38, p≈0.019
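SciPy can run the same pooled test directly from summary statistics via `ttest_ind_from_stats`:

```python
from scipy import stats

# Pooled two-sample t-test from the Group A / Group B summary stats above
res = stats.ttest_ind_from_stats(mean1=85, std1=8, nobs1=20,
                                 mean2=78, std2=10, nobs2=20,
                                 equal_var=True)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.3f}")  # t ≈ 2.44, p ≈ 0.019
```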
Welch's t-test (unequal variances) — safer default for two groups:
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂) | df = Welch-Satterthwaite approximation
EX: Same groups → SE = √(64/20 + 100/20) = √(3.2+5.0) = √8.2 = 2.864 → t = 7/2.864 = 2.44, df ≈ 36.3 → with equal sample sizes the t-statistic matches the pooled test and only the df differs; Welch stays reliable when variances differ
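The same summary statistics with `equal_var=False` give Welch's version; the Welch–Satterthwaite df can also be checked by hand (a sketch using the formula with s²/n terms):

```python
from scipy import stats

# Welch's t-test from the same summary statistics
res = stats.ttest_ind_from_stats(mean1=85, std1=8, nobs1=20,
                                 mean2=78, std2=10, nobs2=20,
                                 equal_var=False)

# Welch-Satterthwaite degrees of freedom, by hand
v1, v2 = 8**2 / 20, 10**2 / 20                   # s_i^2 / n_i per group
df = (v1 + v2) ** 2 / (v1**2 / 19 + v2**2 / 19)  # ≈ 36.3, vs. 38 pooled
print(f"t = {res.statistic:.2f}, df = {df:.1f}, p = {res.pvalue:.3f}")
```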
Cohen's d — effect size for practical significance:
Cohen's d | Effect Size | Practical Meaning
0.00 – 0.19 | Negligible | Barely detectable difference
0.20 – 0.49 | Small | Noticeable but modest
0.50 – 0.79 | Medium | Meaningful and visible
0.80+ | Large | Substantial and practically important
Statistical significance (p < 0.05) and practical significance (large Cohen's d) are independent. With n=10,000, even d=0.05 (trivial) produces p < 0.001. Always report effect size alongside p-value to give a complete picture.
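For the two-group example above, Cohen's d lands in the medium band; a quick check using the pooled SD from the pooled t-test section:

```python
import math

# Cohen's d for Group A (mean 85, s=8, n=20) vs Group B (mean 78, s=10, n=20)
n1 = n2 = 20
s1, s2 = 8, 10
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # ≈ 9.06
d = (85 - 78) / sp
print(f"d = {d:.2f}")  # ≈ 0.77: medium, bordering on large
```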

Frequently Asked Questions

What is the difference between a one-sample and a two-sample t-test?

One-sample: tests whether a sample mean equals a known or hypothesized value. Example: does this class's average score of 78 differ from the national average of 75? Two-sample: tests whether the means of two independent groups are equal. Example: do students taught by Method A (x̄=85) score higher than those taught by Method B (x̄=78)? The formulas differ; both produce a t-statistic and p-value.

What does a p-value of 0.03 mean?

It means there is a 3% probability of observing a t-statistic this extreme (or more extreme) if the null hypothesis (equal means) were true. Conventionally, p < 0.05 leads to rejecting H₀, so p = 0.03 is statistically significant — the observed difference is unlikely to be due to random sampling variation alone. It does not mean there is a 97% chance the alternative hypothesis is true.

Should I use Welch's t-test or the pooled t-test?

Use Welch's t-test (unequal variances assumed) by default for two-sample comparisons — it performs well whether or not variances are equal. Use the standard pooled t-test only when you have verified equal variances using Levene's test and have roughly equal sample sizes. Most modern statistics textbooks and software default to Welch's t-test, as it is more robust with no cost when variances happen to be equal.

What is Cohen's d and why report it?

Cohen's d = (x̄₁ − x̄₂) / pooled SD. It measures the size of the difference in standard deviation units, independent of sample size. With large samples, trivially small differences become statistically significant. d=0.08 means groups differ by 0.08 SDs — negligible. d=0.80 means groups differ by 0.80 SDs — large and practically important. Always report d alongside p to separate statistical from practical significance.

What is a paired t-test and when should I use it?

A paired t-test compares two related measurements — before/after, matched pairs, or repeated measures. Example: measuring blood pressure in 30 patients before and after medication. Instead of comparing two groups of 30, you compute 30 differences (after − before) and test whether the mean difference is zero. Paired t-tests are more powerful than independent t-tests because they eliminate between-subject variability.
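A paired t-test is mathematically identical to a one-sample t-test on the per-subject differences, which a short simulation makes concrete (the blood-pressure numbers here are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(140, 10, size=30)        # hypothetical pre-treatment BP
after = before - rng.normal(5, 4, size=30)   # drug lowers BP by ~5 on average

t_paired, p_paired = stats.ttest_rel(before, after)
t_diff, p_diff = stats.ttest_1samp(before - after, 0.0)

# The two formulations agree exactly
print(f"paired: t={t_paired:.3f}  one-sample on differences: t={t_diff:.3f}")
```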

What is the difference between one-tailed and two-tailed tests?

A two-tailed test evaluates H₀: μ₁=μ₂ against H₁: μ₁≠μ₂ (a difference in either direction). A one-tailed test targets a specific direction: H₁: μ₁>μ₂ or H₁: μ₁<μ₂. The one-tailed p-value is half the two-tailed p-value, making it easier to achieve significance. Only use one-tailed when you specified the direction before collecting data. Choosing one-tailed post hoc to get significance is p-hacking and is methodologically invalid.
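The halving relationship is easy to verify for the t-statistic from the two-group example (t ≈ 2.44, df = 38):

```python
from scipy import stats

t, df = 2.444, 38
p_two = 2 * stats.t.sf(t, df)  # H1: means differ in either direction
p_one = stats.t.sf(t, df)      # H1: mean1 > mean2 (direction pre-specified)
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```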