Linear Regression Calculator
Enter paired X and Y data points to find the slope, intercept, and R² of the best-fit line — with the regression equation, predicted values, and step-by-step least squares calculation.
Tips & Notes
- ✓ Always plot your data before running regression. A scatterplot reveals whether a linear model is appropriate — a curved pattern means linear regression will give misleading predictions regardless of how high R² appears.
- ✓ R² measures how well the line fits your sample data, not how well it will predict new data. Overfitting is possible with small samples — a line through 2 points always gives R²=1, but this tells you nothing about predictive ability.
- ✓ Extrapolation — predicting Y for X values outside your data range — is dangerous. A model that fits perfectly between X=10 and X=50 may be completely wrong at X=100 if the true relationship curves outside that range.
- ✓ The slope's unit is Y-units per X-unit. If X is hours and Y is dollars earned, a slope of m=25 means $25 per hour. Always interpret the slope with its units to make the prediction meaningful.
- ✓ Linear regression assumes a linear relationship, normally distributed residuals, constant variance (homoscedasticity), and independent observations. Violating these assumptions makes regression results unreliable.
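These assumptions can be checked numerically as well as visually. A minimal Python sketch, using hypothetical quadratic data, shows how a linear fit can produce a high R² while the residuals form a systematic curve — the signature of the wrong model:

```python
# Minimal sketch (hypothetical data): a linear fit to curved data
# can yield a high R² while the residuals form a systematic pattern.
x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]  # truly quadratic: 1, 4, 9, 16, 25
n = len(x)

# Least-squares slope and intercept
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - m * (sum_x / n)

residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
y_bar = sum_y / n
r2 = 1 - sum(e ** 2 for e in residuals) / sum((yi - y_bar) ** 2 for yi in y)

print(round(r2, 3))   # → 0.963, which looks excellent
print(residuals)      # → [2.0, -1.0, -2.0, -1.0, 2.0]: a curve, not random noise
```

The residuals swing positive, then negative, then positive again — a pattern, not noise — even though R² alone would suggest a strong fit.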
Common Mistakes
- ✗ Confusing correlation with regression. Correlation (r) measures the strength of a linear relationship. Regression (ŷ=mx+b) produces a prediction equation. You can have high correlation but a slope that is practically trivial, or a useful regression line with moderate correlation.
- ✗ Interpreting the intercept as a meaningful prediction when X=0 is impossible or unrealistic. If X is age (in years, ranging 20–60) and b=−50, this does not mean a person of age 0 would have −50 income — b is a mathematical artifact, not a real prediction.
- ✗ Assuming R²=0.30 means the model is useless. In the social sciences, R²=0.30 (explaining 30% of the variation) is often very meaningful. In engineering quality control, R²=0.90 might be unacceptably low. Context defines what counts as a good R².
- ✗ Using regression to establish causation. A regression equation describes a statistical relationship. Higher X predicting higher Y does not mean X causes Y — unmeasured confounding variables may drive both.
- ✗ Applying a regression equation outside the range of the training data. If your data covers X from 1 to 10, the equation ŷ = 3x + 5 may be accurate there but completely wrong at X = 100. Extrapolation requires strong domain knowledge to justify.
Linear Regression Calculator Overview
Linear regression finds the straight line that best fits a set of data points, quantifying the relationship between two variables and enabling prediction. The line minimizes the total squared vertical distance from each point to the line — the least squares method. Every time you see a trend line on a graph, a sales forecast from historical data, or a dose-response relationship in medicine, linear regression is the underlying tool.
Linear regression equation:
ŷ = mx + b (slope-intercept form)

Slope formula — rate of change in Y per unit of X:
m = [n×Σ(xy) − Σx×Σy] / [n×Σ(x²) − (Σx)²]
EX: Hours worked (X): [1,2,3,4,5] | Output (Y): [3,5,7,9,11] → m = [5×125 − 15×35] / [5×55 − 15²] = (625−525)/(275−225) = 100/50 = 2.0 (each extra hour adds 2 units of output)

Intercept formula — predicted Y when X = 0:
b = ȳ − m × x̄
EX: x̄=3, ȳ=7, m=2.0 → b = 7 − 2.0×3 = 7 − 6 = 1.0 → Equation: ŷ = 2x + 1

Coefficient of determination R² — goodness of fit:
R² = 1 − [Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²]
EX: ŷ = 2x + 1 fits perfectly → all residuals = 0 → R² = 1.00 → the regression line explains 100% of Y's variation

Interpreting the regression output:
| Output | What It Means | Example Interpretation |
|---|---|---|
| Slope (m) | Change in Y for each 1-unit increase in X | m=2.5: each extra study hour predicts 2.5 more exam points |
| Intercept (b) | Predicted Y when X = 0 | b=40: predicted score with zero study hours is 40 |
| R² | Proportion of Y's variation explained by X | R²=0.72: study hours explain 72% of score variation |
| ŷ for a given x | Predicted Y value at that X | x=4 hours: ŷ = 2.5×4+40 = 50 points predicted |
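The formulas above can be reproduced end to end. A short Python sketch, using the hypothetical hours-worked data from the worked examples, computes the slope, intercept, R², and a prediction:

```python
# Least-squares regression from scratch, using the worked example
# above (hypothetical data: X = hours worked, Y = output).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                # Σx = 15, Σy = 35
sum_xy = sum(a * b for a, b in zip(x, y))    # Σxy = 125
sum_x2 = sum(a * a for a in x)               # Σx² = 55

# Slope: m = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²]
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: b = ȳ − m·x̄
b = sum_y / n - m * (sum_x / n)

# R² = 1 − SS_res / SS_tot
y_bar = sum_y / n
ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

print(m, b, r2)    # → 2.0 1.0 1.0 (equation: ŷ = 2x + 1)
print(m * 6 + b)   # predicted output at x = 6 → 13.0
```

The output matches the step-by-step calculation: slope 2.0, intercept 1.0, and R²=1.0 because this example data is exactly linear.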
Frequently Asked Questions
How does linear regression work?
Linear regression finds the straight line ŷ = mx + b that best fits your data by minimizing the sum of squared vertical distances from each point to the line (the least squares method). It outputs the slope m (change in Y per unit of X), the intercept b (predicted Y when X=0), and R² (the proportion of Y's variation explained by X). The resulting equation predicts Y for any given X value.
How do I interpret the slope and intercept?
Slope (m): for each 1-unit increase in X, Y changes by m units. Example: m=3.5 with X=study hours and Y=exam score → each additional study hour predicts 3.5 more points. Intercept (b): the predicted Y when X=0. b=45 means a student who studies 0 hours is predicted to score 45. Interpret b carefully — if X=0 is impossible or outside the data range, b is just a mathematical parameter, not a real prediction.
What does R² tell me?
R² (the coefficient of determination) is the proportion of Y's variation explained by the regression line. R²=0.75 means 75% of the variation in Y is accounted for by X through the linear model; the remaining 25% comes from other factors or random variation. R²=1.0 is a perfect fit; R²=0.0 means the line explains nothing. What counts as a 'good' R² depends on the field — psychology often accepts 0.20, while engineering may demand 0.95+.
What is the difference between interpolation and extrapolation?
Interpolation is predicting Y for X values within your data range — generally reliable. Extrapolation is predicting Y for X outside your data range — unreliable and potentially very wrong. If your data spans X=1 to X=10 and your equation is ŷ=2x+5, predicting at X=6 (interpolation) is fine. Predicting at X=100 (extrapolation) assumes the linear relationship continues indefinitely, which rarely holds in reality.
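One practical safeguard is to refuse predictions outside the training range. A minimal sketch, assuming the ŷ=2x+5 equation and X=1 to 10 range from the example (the function name and bounds are illustrative, not part of the calculator):

```python
# Sketch of a range-guarded predictor (hypothetical equation and bounds).
def predict_safe(x, m=2.0, b=5.0, x_min=1.0, x_max=10.0):
    """Return ŷ = mx + b, refusing to extrapolate beyond the training range."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"x={x} is outside the training range [{x_min}, {x_max}]")
    return m * x + b

print(predict_safe(6))    # interpolation → 17.0
# predict_safe(100)       # extrapolation → raises ValueError
```

Raising an error instead of silently returning a number forces the caller to justify any out-of-range prediction with domain knowledge.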
How many data points do I need?
You need at least 2 points (which always produce a perfect line and therefore prove nothing), but meaningful regression requires at least 10–20 points, and ideally more. A rule of thumb: at least 10 observations per variable in the model. With fewer points, R² is inflated and predictions are unreliable. Also verify that the range of X values in your data is wide enough to meaningfully represent the relationship you are modeling.
When should I use correlation instead of regression?
Use correlation (r) when you want to measure the strength of the relationship between two variables without making predictions. Use linear regression when you want a prediction equation — an estimate of Y for a given X. Both rest on the same underlying calculation but serve different purposes. You can have high correlation (r=0.9) with a slope so small (m=0.001) that the prediction is practically useless. Regression tells you not just that a relationship exists, but how to use it for prediction.
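This distinction can be made concrete in a few lines. A sketch with hypothetical data in which Y tracks X almost perfectly (r near 1) yet changes only about 0.001 units per unit of X:

```python
import math

# Hypothetical data: very strong linear relationship, tiny slope.
x = [100, 200, 300, 400, 500]
y = [5.10, 5.21, 5.29, 5.41, 5.50]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)
sy2 = sum(b * b for b in y)

# Regression slope (for prediction) vs. Pearson r (relationship strength)
m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(round(m, 4), round(r, 3))   # → 0.001 0.999: near-perfect correlation, trivial slope
```

Here r reports that X and Y move together almost perfectly, while m reports that the practical effect of X on Y is minuscule — two different questions, two different answers.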