T-Test Calculator
Enter sample data to perform one-sample or two-sample t-tests — get t-statistic, degrees of freedom, p-value, Cohen's d effect size, and a clear conclusion on whether means are significantly different.
Tips & Notes
- ✓ Use Welch's t-test (unequal variances) by default for two-sample tests. It is robust even when variances are equal and is safer when they differ. The traditional pooled t-test requires equal variances — verify with Levene's test first.
- ✓ Two-tailed tests are almost always the right choice. Use one-tailed only when you predicted the direction of the difference before collecting data and have strong theoretical justification.
- ✓ T-tests assume approximately normal data within each group. With n > 30 per group, the central limit theorem makes t-tests robust to non-normality. For small non-normal samples, use the Mann-Whitney U test.
- ✓ Always report effect size (Cohen's d) alongside the p-value. Large samples make tiny, meaningless differences statistically significant. d < 0.2 is negligible even if p < 0.001.
- ✓ Paired t-tests (for before/after or matched designs) are more powerful than independent two-sample t-tests. If observations are matched or repeated, use the paired version — it removes between-subject variability.
Common Mistakes
- ✗ Using a one-tailed test without pre-specifying the direction. Choosing one-tailed after seeing the data to get p < 0.05 is p-hacking. The direction must be theoretically justified before data collection.
- ✗ Confusing independent and paired t-tests. Measuring 20 people before and after treatment gives 20 differences — use a paired t-test on the 20 differences. Treating this as two independent groups of 20 ignores the matching and reduces power.
- ✗ Reporting a p-value without an effect size. p=0.001 with d=0.08 (negligible) is statistically significant but practically meaningless. p=0.06 with d=0.85 (large) may be practically important but not statistically significant at α=0.05.
- ✗ Applying a t-test to very small non-normal samples. With n=5 per group and strongly skewed data, the normality assumption fails and t-test p-values are unreliable. Use permutation tests or the Mann-Whitney U test.
- ✗ Using multiple t-tests instead of ANOVA for three or more groups. Three t-tests comparing groups A vs B, A vs C, and B vs C inflate the false positive rate to 14.3% at α=0.05. Use one-way ANOVA instead.
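The 14.3% figure follows from treating the three comparisons as independent tests at α=0.05 — a simplifying assumption, since the comparisons share groups, but it illustrates the inflation. A quick check:

```python
# Familywise error rate for k independent tests, each at per-test alpha:
# FWER = 1 - (1 - alpha)^k
alpha = 0.05
k = 3  # A vs B, A vs C, B vs C
fwer = 1 - (1 - alpha) ** k
print(f"{fwer:.3f}")  # 0.143 — the 14.3% quoted above
```

With 6 groups (15 pairwise tests), the same formula gives a familywise error rate above 53%, which is why ANOVA with post-hoc corrections is the standard approach.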
T-Test Calculator Overview
The t-test is the most widely used hypothesis test for comparing means. It asks: is the difference between a sample mean and a reference value (or between two sample means) large enough to be statistically significant — or is it plausibly due to random sampling variation? Every time a study reports "p < 0.05" for comparing two groups, a t-test (or similar test) is almost certainly behind it.
One-sample t-test — comparing a sample mean to a known value:
t = (x̄ − μ₀) / (s / √n) | df = n − 1
EX: Sample of 25 students, x̄=78, s=10, hypothesized μ₀=75 → t = (78−75)/(10/√25) = 3/2 = 1.50, df=24 → p ≈ 0.147 (two-tailed) → not significant at α=0.05
Two-sample t-test (equal variances) — comparing two independent groups:
t = (x̄₁ − x̄₂) / [sp × √(1/n₁ + 1/n₂)] | sp = √[(s₁²(n₁−1) + s₂²(n₂−1))/(n₁+n₂−2)]
EX: Group A: n=20, x̄=85, s=8 | Group B: n=20, x̄=78, s=10 → sp=√[(64×19+100×19)/38]=√82=9.06 → t=(85−78)/(9.06×√(0.1))=7/2.864=2.44, df=38, p≈0.019
Welch's t-test (unequal variances) — safer default for two groups:
t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂) | df = Welch-Satterthwaite approximation
EX: Same groups → SE = √(64/20 + 100/20) = √(3.2+5.0) = √8.2 = 2.864 → t = 7/2.864 = 2.44 → similar to the equal-variance result but more reliable when variances differ
Cohen's d — effect size for practical significance:
| Cohen's d | Effect Size | Practical Meaning |
|---|---|---|
| 0.00 – 0.19 | Negligible | Barely detectable difference |
| 0.20 – 0.49 | Small | Noticeable but modest |
| 0.50 – 0.79 | Medium | Meaningful and visible |
| 0.80+ | Large | Substantial and practically important |
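The worked examples above can be reproduced from summary statistics alone. A minimal Python sketch (function names are illustrative, not from any particular library):

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """One-sample t-statistic and degrees of freedom from summary statistics."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Equal-variance two-sample t, pooled SD, and df."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, sp, n1 + n2 - 2

def welch_t(x1, s1, n1, x2, s2, n2):
    """Welch t with the Welch-Satterthwaite df approximation."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

def cohens_d(x1, x2, sp):
    """Cohen's d: mean difference measured in pooled-SD units."""
    return (x1 - x2) / sp

# Worked examples from the text:
t1, df1 = one_sample_t(78, 75, 10, 25)          # t = 1.50, df = 24
t2, sp, df2 = pooled_t(85, 8, 20, 78, 10, 20)   # t ≈ 2.44, sp ≈ 9.06, df = 38
t3, df3 = welch_t(85, 8, 20, 78, 10, 20)        # t ≈ 2.44
d = cohens_d(85, 78, sp)                        # d ≈ 0.77 — medium-to-large
```

Note that with equal group sizes the pooled and Welch t-statistics coincide exactly; they diverge when n₁ ≠ n₂, which is where Welch's version earns its keep.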
Frequently Asked Questions
What is the difference between a one-sample and a two-sample t-test?
One-sample: tests whether a sample mean equals a known or hypothesized value. Example: does this class's average score of 78 differ from the national average of 75? Two-sample: tests whether the means of two independent groups are equal. Example: do students taught by Method A (x̄=85) score higher than those taught by Method B (x̄=78)? The formulas differ; both produce a t-statistic and a p-value.
What does p = 0.03 mean?
It means there is a 3% probability of observing a t-statistic this extreme (or more extreme) if the null hypothesis of equal means were true. Conventionally, p < 0.05 leads to rejecting H₀, so p = 0.03 means the result is statistically significant — the observed difference is unlikely to be due to random sampling variation alone. It does not mean there is a 97% chance the alternative hypothesis is true.
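One way to see what α = 0.05 controls: simulate many two-group experiments in which H₀ is actually true and count how often |t| exceeds the two-tailed critical value (≈ 2.024 for df = 38). A rough sketch, assuming normal data and using a fixed critical value rather than the per-sample Welch df:

```python
import math
import random

def welch_t_raw(x, y):
    """Welch t-statistic computed from two raw samples."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

random.seed(1)
trials, hits = 4000, 0
for _ in range(trials):
    # Both groups drawn from the SAME distribution: H0 is true by construction
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    if abs(welch_t_raw(a, b)) > 2.024:  # two-tailed critical value for df ≈ 38
        hits += 1
print(hits / trials)  # lands near 0.05: "significant" results occur ~5% of the time under H0
```

That ~5% is the false-positive rate the significance threshold is designed to cap — not the probability that any particular significant result is real.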
Should I use Welch's t-test or the pooled t-test?
Use Welch's t-test (unequal variances assumed) by default for two-sample comparisons — it performs well whether or not the variances are equal. Use the standard pooled t-test only when you have verified equal variances with Levene's test and have roughly equal sample sizes. Most modern statistics textbooks and software default to Welch's t-test, as it is more robust with no cost when variances happen to be equal.
What is Cohen's d and why report it?
Cohen's d = (x̄₁ − x̄₂) / pooled SD. It measures the size of the difference in standard deviation units, independent of sample size. With large samples, trivially small differences become statistically significant. d=0.08 means the groups differ by 0.08 SDs — negligible. d=0.80 means they differ by 0.80 SDs — large and practically important. Always report d alongside p to separate statistical from practical significance.
What is a paired t-test?
A paired t-test compares two related measurements — before/after, matched pairs, or repeated measures. Example: measuring blood pressure in 30 patients before and after medication. Instead of comparing two groups of 30, you compute 30 differences (after − before) and test whether the mean difference is zero. Paired t-tests are more powerful than independent t-tests because they eliminate between-subject variability.
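A paired t-test reduces to a one-sample t-test on the differences. A sketch with invented blood-pressure readings (the numbers are hypothetical, for illustration only):

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-test: t-statistic and df from the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # One-sample t on the differences against a hypothesized mean of 0
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical systolic readings for 8 patients (values invented for illustration)
before = [148, 152, 139, 160, 145, 155, 150, 142]
after  = [140, 147, 138, 150, 143, 148, 145, 141]
t, df = paired_t(before, after)  # tests H0: mean difference = 0
```

The key design point: each subject serves as their own control, so stable between-subject differences (one patient simply running higher than another) cancel out in the subtraction instead of inflating the error term.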
When should I use a one-tailed vs. two-tailed test?
A two-tailed test evaluates H₀: μ₁=μ₂ against H₁: μ₁≠μ₂ (a difference in either direction). A one-tailed test specifies a direction: H₁: μ₁>μ₂ or H₁: μ₁<μ₂. The one-tailed p-value is half the two-tailed p-value (when the observed effect is in the predicted direction), making it easier to achieve significance. Only use a one-tailed test when you specified the direction before collecting data. Choosing one-tailed post hoc to reach significance is p-hacking and is methodologically invalid.