- Understand the concept and practice of partitioning sums-of-squares
- Understand the uses of \(R^2\) and adjusted-\(R^2\) for linear models
8 April 2026
In general, we have something like
\[ DATA = MODEL + ERRORS \]
and hence
\[ \text{Var}(DATA) = \text{Var}(MODEL) + \text{Var}(ERRORS) \]
The total deviations in the data equal the sum of those for the model and errors
\[ \underbrace{y_i - \bar{y}}_{\text{Total}} = \underbrace{\widehat{y}_i - \bar{y}}_{\text{Model}} + \underbrace{y_i - \widehat{y}_i}_{\text{Error}} \]
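As a quick numerical check, the three deviations add up exactly for every observation. This sketch uses simulated data (the soil data are loaded elsewhere, so the variable names here are illustrative only):

```r
set.seed(42)
## simulated predictor and response
x <- runif(15, 0, 180)
y <- 2 + 0.05 * x + rnorm(15)
## fitted values from a simple linear regression
y_hat <- fitted(lm(y ~ x))
## total deviation = model deviation + error deviation (element-wise)
total <- y - mean(y)
model <- y_hat - mean(y)
error <- y - y_hat
all.equal(unname(total), unname(model + error))
```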
Let’s consider a model for the relationship between:

- soil surface temperature (°C) measured across 15 small plots on a hill, and
- the aspect of each plot (direction in degrees; N to S)
Let’s fit this model:
\[temperature_i = \beta_0 + \beta_1 \times aspect_i + e_i\]
and find \(\hat{\boldsymbol{\beta}}\) and \(\hat{\boldsymbol{y}}\)
Create our design matrix \(\mathbf{X}\)
```r
## construct our "Design matrix"
XX <- cbind(intercept = rep(1, nn), x = aspect)
head(XX)
```

```
##      intercept   x
## [1,]         1  49
## [2,]         1 144
## [3,]         1  66
## [4,]         1 172
## [5,]         1 114
## [6,]         1  25
```
We can calculate our betas using
\(\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y}\)
```r
## calculate betas
beta_hat <- solve(t(XX) %*% XX) %*% t(XX) %*% temp
```
Recall that t() transposes a matrix and solve() inverts it
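For example, with a small invertible matrix (values chosen arbitrarily):

```r
## a small 2x2 matrix
A <- matrix(c(2, 1, 1, 3), nrow = 2)
## t() returns the transpose
t(A)
## solve() returns the inverse
A_inv <- solve(A)
## a matrix times its inverse gives the identity
A %*% A_inv
```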
We can calculate our model estimates (\(\hat{\mathbf{y}}\)) using
\(\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}\)
```r
## calculate model estimates
temp_hat <- XX %*% beta_hat
```
Let’s plot our model estimates (i.e., the fitted regression line)
The sums-of-squares have the same additive property as the deviations
\[ \underbrace{\sum (y_i - \bar{y})^2}_{SSTO} = \underbrace{\sum (\widehat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum (y_i - \widehat{y}_i)^2}_{SSE} \]
The total sum-of-squares \((SSTO)\) measures the total variation in the data as the differences between the data and their mean
\[ SSTO = \sum \left( y_i - \bar{y} \right)^2 \]
The model (regression) sum-of-squares \((SSR)\) measures the variation between the model fits and the mean of the data
\[ SSR = \sum \left( \widehat{y}_i - \bar{y} \right)^2 \]
The error sum-of-squares \((SSE)\) measures the variation between the data and the model fits
\[ SSE = \sum \left( y_i - \widehat{y}_i \right)^2 \]
Let’s calculate the total sum-of-squares (SSTO) using
\(SSTO = \sum \left( y_i - \bar{y} \right)^2\)
```r
## mean of the response
y_bar <- mean(temp)
## total sum-of-squares
SSTO <- t(temp - y_bar) %*% (temp - y_bar)
```
Recall that \(\mathbf{x}^{\top} \mathbf{x}\) will give the sum of the squared elements in \(\mathbf{x}\)
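A tiny demonstration with a made-up vector:

```r
## a small example vector
x <- c(1, 2, 3)
## the inner product is a 1x1 matrix holding the sum of squared elements
t(x) %*% x
sum(x^2)
```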
Let’s calculate the model sum-of-squares (SSR) using
\(SSR = \sum \left( \widehat{y}_i - \bar{y} \right)^2\)
```r
## model sum-of-squares
SSR <- t(temp_hat - y_bar) %*% (temp_hat - y_bar)
```
Let’s calculate the error sum-of-squares (SSE) using
\(SSE = \sum \left( y_i - \widehat{y}_i \right)^2\)
```r
## error sum-of-squares
SSE <- t(temp - temp_hat) %*% (temp - temp_hat)
```
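We can verify the additive property \(SSTO = SSR + SSE\) numerically. This sketch simulates its own data and repeats the matrix calculations above, since the soil data are loaded elsewhere:

```r
set.seed(42)
## simulated predictor and response
x <- runif(15, 0, 180)
y <- 2 + 0.05 * x + rnorm(15)
## design matrix, betas, and fitted values
XX <- cbind(1, x)
beta_hat <- solve(t(XX) %*% XX) %*% t(XX) %*% y
y_hat <- XX %*% beta_hat
## the three sums-of-squares
y_bar <- mean(y)
SSTO <- sum((y - y_bar)^2)
SSR <- sum((y_hat - y_bar)^2)
SSE <- sum((y - y_hat)^2)
## the partition holds (up to floating-point error)
all.equal(SSTO, SSR + SSE)
```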
How about a measure of how well a model fits the data?
A common option is the coefficient of determination, or \(R^2\)
\[ R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO} \\ ~ \\ 0 \leq R^2 \leq 1 \]
Let’s calculate \(R^2\) for our model
```r
## coefficient of determination
SSR / SSTO
```

```
##           [,1]
## [1,] 0.8831171
```

```r
## or via
1 - SSE / SSTO
```

```
##           [,1]
## [1,] 0.8831171
```
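As a sanity check, the same quantity is reported by `summary()` on a fitted `lm` object. Simulated data again stand in for the soil data here:

```r
set.seed(42)
x <- runif(15, 0, 180)
y <- 2 + 0.05 * x + rnorm(15)
fit <- lm(y ~ x)
## R^2 by hand from the residuals
R2 <- 1 - sum(resid(fit)^2) / sum((y - mean(y))^2)
R2
## R^2 as reported by lm()
summary(fit)$r.squared
```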
The degrees of freedom (\(df\)) are the number of independent elements that are free to vary when estimating quantities of interest
Beginning with \(SSTO\), we have
\[ SSTO = \sum \left( y_i - \bar{y} \right)^2 \]
The data are unconstrained and lie in an \(n\)-dimensional space, but estimating the mean \((\bar{y})\) from the data costs 1 degree of freedom \((df)\), so
\[ df_{SSTO} = n - 1 \]
For the \(SSR\) we have
\[ SSR = \sum \left( \widehat{y}_i - \bar{y} \right)^2 \]
We estimate the data \((\widehat{y})\) with a \(k\)-dimensional model, but we lose 1 \(df\) when estimating the mean, so
\[ df_{SSR} = k - 1 \]
The \(SSE\) is analogous
\[ SSE = \sum \left( y_i - \widehat{y}_i \right)^2 \]
The data lie in an \(n\)-dimensional space and we represent them in a \(k\)-dimensional subspace, so
\[ df_{SSE} = n - k \]
A “mean square” gives an indication of the variance of the model or errors
A mean square is a sum-of-squares divided by its degrees of freedom
\[ MS = \frac{SS}{df} \\ \Downarrow \\ MSR = \frac{SSR}{k - 1} ~~~ \& ~~~ MSE = \frac{SSE}{n - k} \]
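For a simple regression (\(k = 2\)), these mean squares match the “Mean Sq” column of the ANOVA table returned by R. A sketch with simulated data:

```r
set.seed(42)
n <- 15
k <- 2  ## intercept + one slope
x <- runif(n, 0, 180)
y <- 2 + 0.05 * x + rnorm(n)
fit <- lm(y ~ x)
y_hat <- fitted(fit)
y_bar <- mean(y)
## mean squares by hand
MSR <- sum((y_hat - y_bar)^2) / (k - 1)
MSE <- sum((y - y_hat)^2) / (n - k)
## compare with R's ANOVA table
anova(fit)[["Mean Sq"]]
```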
We are typically interested in two variance estimates:
- The variance of the residuals \(\mathbf{e}\)
- The variance of the model parameters \(\widehat{\boldsymbol{\beta}}\)
In a least squares context, we assume that the model errors (residuals) are independent and identically distributed with mean 0 and variance \(\sigma^2\)
The problem is that we don’t know \(\sigma^2\) and therefore we must estimate it
If \(z_i \sim \text{N}(0, 1)\) then
\[ \sum_{i = 1}^{n} z_i^2 = \mathbf{z}^{\top}\mathbf{z} \sim \chi^2_{n} \]
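We can check this by simulation: the mean of a \(\chi^2_n\) random variable is \(n\), so averaging many sums of \(n\) squared standard normals should come out close to \(n\):

```r
set.seed(42)
n <- 5
## many replicate sums of n squared standard normals
zz <- replicate(10000, sum(rnorm(n)^2))
## should be close to n (the mean of a chi-squared with n df)
mean(zz)
```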
In our linear model, \(e_i \sim \text{N}(0, \sigma^2)\) so
\[ \sum_{i = 1}^{n} e_i^2 = \mathbf{e}^{\top}\mathbf{e} \sim \sigma^2 \cdot \chi^2_{n - k} \]
Thus, given
\[ \mathbf{e}^{\top}\mathbf{e} \sim \sigma^2 \cdot \chi^2_{n - k} \\ \text{E}(\chi^2_{n - k}) = n - k \\ \mathbf{e}^{\top}\mathbf{e} = SSE \]
then
\[ \text{E}(SSE) = \sigma^2 (n - k) ~ \Rightarrow ~ \widehat{\sigma}^2 = \frac{SSE}{n - k} = MSE \]
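This is exactly the residual variance that `lm()` reports (via `sigma()`, the residual standard error). A sketch with simulated data:

```r
set.seed(42)
n <- 15
k <- 2  ## intercept + one slope
x <- runif(n, 0, 180)
y <- 2 + 0.05 * x + rnorm(n)
fit <- lm(y ~ x)
## sigma^2 estimated by hand as SSE / (n - k)
sigma2_hat <- sum(resid(fit)^2) / (n - k)
sigma2_hat
## the same quantity from lm(): squared residual standard error
sigma(fit)^2
```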
Recall that our estimate of the model parameters is
\[ \widehat{\boldsymbol{\beta}} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y} \]
Estimating the variance of the model parameters \(\boldsymbol{\beta}\) requires some linear algebra
For a scalar \(z\), if \(\text{Var}(z) = \sigma^2\) then \(\text{Var}(az) = a^2 \sigma^2\)
For a vector \(\mathbf{z}\), if \(\text{Var}(\mathbf{z}) = \mathbf{\Sigma}\) then \(\text{Var}(\mathbf{A z}) = \mathbf{A} \mathbf{\Sigma} \mathbf{A}^{\top}\)
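A simulation check of the vector rule, using an arbitrary covariance \(\mathbf{\Sigma}\) and transformation \(\mathbf{A}\): the empirical covariance of \(\mathbf{A z}\) should approach \(\mathbf{A \Sigma A}^{\top}\):

```r
set.seed(42)
## an arbitrary covariance matrix and transformation matrix
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
A <- matrix(c(1, 0, 1, 1), 2, 2)
## simulate z with covariance Sigma via its Cholesky factor
Z <- matrix(rnorm(2e5), ncol = 2) %*% chol(Sigma)
## empirical covariance of A z (rows of Z are draws of z)
cov(Z %*% t(A))
## theoretical covariance A Sigma A'
A %*% Sigma %*% t(A)
```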
The variance of the parameters is therefore
\[ \begin{aligned} \widehat{\boldsymbol{\beta}} &= (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y} \\ &= \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \right] \mathbf{y} \\ \end{aligned} \\ \Downarrow \\ \text{Var}(\widehat{\boldsymbol{\beta}}) = \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \right] \text{Var}(\mathbf{y}) \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \right]^{\top} \]
Recall that we can write our model in matrix form as
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{e} \\ \mathbf{e} \sim \text{MVN}(\mathbf{0}, \sigma^2 \mathbf{I}) \]
We can rewrite our model more compactly as
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{e} \\ \mathbf{e} \sim \text{MVN}(\mathbf{0}, \sigma^2 \mathbf{I}) \\ \Downarrow \\ \mathbf{y} \sim \text{MVN}(\mathbf{X} \boldsymbol{\beta}, \underbrace{\sigma^2 \mathbf{I}}_{\text{Var}(\mathbf{y} | \mathbf{X} \boldsymbol{\beta})}) \\ \]
Our estimate of \(\text{Var}(\widehat{\boldsymbol{\beta}})\) is then
\[ \begin{aligned} \text{Var}(\widehat{\boldsymbol{\beta}}) &= \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \right] \text{Var}(\mathbf{y}) \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \right]^{\top} \\ &= \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \right] \sigma^2 \mathbf{I} \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \right]^{\top} \\ &= \sigma^2 (\mathbf{X}^{\top} \mathbf{X})^{-1} (\mathbf{X}^{\top} \mathbf{X}) \left[ (\mathbf{X}^{\top} \mathbf{X})^{-1} \right]^{\top} \\ &= \sigma^2 (\mathbf{X}^{\top} \mathbf{X})^{-1} \end{aligned} \]
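Plugging \(\widehat{\sigma}^2 = MSE\) in for \(\sigma^2\) gives the same covariance matrix that `vcov()` returns for a fitted `lm` object. A sketch with simulated data:

```r
set.seed(42)
n <- 15
x <- runif(n, 0, 180)
y <- 2 + 0.05 * x + rnorm(n)
XX <- cbind(1, x)
fit <- lm(y ~ x)
## sigma^2 estimated as MSE = SSE / (n - k)
sigma2_hat <- sum(resid(fit)^2) / (n - 2)
## Var(beta_hat) = sigma^2 (X'X)^{-1} by hand
V_manual <- sigma2_hat * solve(t(XX) %*% XX)
V_manual
## the same matrix from lm()
vcov(fit)
```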
Here is a plot of some data \(y\) and a predictor \(x\)
And let’s consider this model: \(y_i = \alpha + \beta x_i + e_i\)
Here is our model fit to the data
Let’s think about the variance of \(\widehat{\boldsymbol{\beta}}\)
\[ \text{Var}(\widehat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^{\top} \mathbf{X})^{-1} \]
This suggests that our confidence in our estimate increases with the spread in \(\mathbf{X}\)
Consider these two scenarios where the slope of the relationship is identical
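The two scenarios can be sketched numerically: with the same error variance, a predictor spread over a narrow range yields a larger slope variance than one spread over a wide range (values here are illustrative):

```r
set.seed(42)
sigma2 <- 1
## same number of plots, narrow vs wide spread in the predictor
x_narrow <- runif(15, 80, 100)
x_wide <- runif(15, 0, 180)
## slope variance from Var(beta_hat) = sigma^2 (X'X)^{-1}
var_slope <- function(x) {
  XX <- cbind(1, x)
  (sigma2 * solve(t(XX) %*% XX))[2, 2]
}
var_slope(x_narrow)
var_slope(x_wide)
```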