Understanding Correlation and Linear Regression
Correlation measures the strength and direction of the linear relationship between two variables. The Pearson correlation coefficient (r) ranges from −1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). It is one of the most widely used statistics in research and data analysis.
Pearson Correlation Coefficient (r)
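The Pearson r quantifies the degree to which two continuous variables move together linearly. A value of r = 0.95 indicates a very strong positive relationship: as one variable increases, the other tends to increase along a nearly straight line. Conversely, r = −0.85 indicates a strong negative (inverse) relationship, in which one variable tends to decrease as the other increases. Note that r describes how tightly the points cluster around a line, not how steep that line is.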
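As a minimal sketch of how r can be computed from raw data, the function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The function name pearson_r and the hours/score data are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)              # sum of squares for X
    syy = sum(b * b for b in dy)              # sum of squares for Y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 4))  # about 0.7746
```

For everyday work a library routine is usually preferable; this version only spells out the arithmetic.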
Interpreting Correlation Strength
As a general guideline: |r| < 0.3 represents a weak correlation, 0.3 ≤ |r| < 0.7 is moderate, and |r| ≥ 0.7 is strong. These cutoffs are conventions rather than rules, so interpretation should always consider the context. In the social sciences, where measurements are noisy, r = 0.5 may be remarkably strong, while in a controlled physics experiment, r = 0.95 might be considered weak.
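The guideline above can be written as a small lookup. This is only a sketch of the rule of thumb as stated here; the function name correlation_strength is hypothetical, and the cutoffs should be adjusted to the conventions of the field.

```python
def correlation_strength(r):
    """Label r as weak, moderate, or strong using the 0.3 / 0.7 cutoffs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign (direction)
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"

print(correlation_strength(0.95))   # strong
print(correlation_strength(-0.85))  # strong
print(correlation_strength(0.5))    # moderate
```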
R-Squared: Coefficient of Determination
In simple linear regression (one predictor, with an intercept), R² = r². It tells you the proportion of variance in the dependent variable (Y) that is explained by the independent variable (X) through the linear model. An R² of 0.81, which corresponds to r = 0.9 or r = −0.9, means that 81% of the variability in Y can be accounted for by X. The remaining 19% is unexplained variance.
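To illustrate the "proportion of variance explained" reading, the sketch below computes R² as 1 − SS_res / SS_tot for a least-squares line and compares it with r² computed directly. The function name r_squared_two_ways and the data are hypothetical, and the equality holds only for simple linear regression with an intercept.

```python
def r_squared_two_ways(xs, ys):
    """Return (1 - SS_res/SS_tot, r**2) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y
    # Least-squares slope and intercept.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation in Y left unexplained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    explained = 1 - ss_res / ss_tot
    r_sq = sxy ** 2 / (sxx * ss_tot)
    return explained, r_sq

# Hypothetical data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
explained, r_sq = r_squared_two_ways(hours, score)
print(round(explained, 4), round(r_sq, 4))  # both about 0.6
```

Here about 60% of the variation in the scores is accounted for by the line, and 40% is left in the residuals.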
Linear Regression: y = mx + b
Simple linear regression fits a straight line through the data points that minimizes the sum of squared residuals (ordinary least squares). The slope (m) tells you how much the predicted Y changes for each one-unit increase in X. The intercept (b) is the predicted value of Y when X = 0, which is meaningful only if X = 0 lies within or near the observed range of the data. Together, they form the regression equation used for prediction.
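A minimal sketch of an ordinary least squares fit follows, using the closed-form solution m = S_xy / S_xx and b = mean(Y) − m · mean(X). The names fit_line, predict, and sum_squared_residuals, along with the data, are hypothetical and used only for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all X values are equal")
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def predict(m, b, x):
    """Predicted Y for a given X under the fitted line."""
    return m * x + b

def sum_squared_residuals(m, b, xs, ys):
    """The quantity that ordinary least squares minimizes."""
    return sum((y - predict(m, b, x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied and quiz score.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
m, b = fit_line(hours, score)
print(round(m, 2), round(b, 2))         # about 0.6 and 2.2
print(round(predict(m, b, 6), 2))       # about 5.8
```

Nudging m or b away from the fitted values should never lower sum_squared_residuals, which is one quick way to sanity-check a fit.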
Covariance
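Covariance measures how two variables change together: it is positive when they tend to move in the same direction and negative when they move in opposite directions. Unlike correlation, covariance is not standardized, so its magnitude depends on the units of the variables. Correlation is covariance divided by the product of the two standard deviations, r = cov(X, Y) / (s_X · s_Y), which removes the units and makes it comparable across different datasets.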
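The sketch below shows the unit dependence in practice: changing height from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged. The function names and the height/weight data are hypothetical, and the sample (n − 1) form of covariance is an assumption made here to match statistics.stdev.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation_from_covariance(xs, ys):
    """Normalize covariance by the two sample standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical data: the same heights in two different units.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
height_cm = [h * 100 for h in height_m]
weight_kg = [52, 58, 66, 75, 80]

cov_m = sample_covariance(height_m, weight_kg)
cov_cm = sample_covariance(height_cm, weight_kg)
r_m = correlation_from_covariance(height_m, weight_kg)
r_cm = correlation_from_covariance(height_cm, weight_kg)

print(cov_m, cov_cm)  # covariance scales with the units
print(r_m, r_cm)      # correlation does not
```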
Caution: Correlation ≠ Causation
A strong correlation between two variables does not imply that one causes the other. There may be confounding variables, reverse causation, or coincidental relationships. Always combine statistical analysis with domain knowledge before drawing causal conclusions.