- Understand the use of F-tests for hypothesis testing
- Understand how to estimate confidence intervals
10 April 2026
Once we’ve estimated the model parameters and their variance, we might want to draw conclusions from our analysis
Imagine we had 2 linear models of varying complexity:
1. a model with one predictor
2. a model with five predictors
It would seem logical to ask whether the additional complexity of the second model is necessary
Recall our partitioning of sums-of-squares, where
\[ SSTO = SSR + SSE \]
We might prefer the more complex model (call it \(\Theta\)) over the simple model (call it \(\theta\)) if
\[ SSE_{\Theta} < SSE_{\theta} \]
or, more formally, if
\[ \frac{SSE_{\theta} - SSE_{\Theta}}{SSE_{\Theta}} > \text{a constant} \]
If \(\Theta\) has \(k_{\Theta}\) parameters and \(\theta\) has \(k_{\theta}\), we can scale this ratio to arrive at an \(F\)-statistic that follows an \(F\) distribution
\[ F = \frac{ \left( SSE_{\theta} - SSE_{\Theta} \right) / (k_{\Theta} - k_{\theta})}{ SSE_{\Theta} / (n - k_{\Theta})} \sim F_{k_{\Theta} - k_{\theta}, n - k_{\Theta}} \]
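In lab we will use R, but the calculation is easy to sketch directly. The following is a minimal Python example (assuming `numpy` and `scipy`, with simulated data rather than the gala dataset) that computes this \(F\)-statistic for two nested models and its \(p\)-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50

# Simulated data: y depends on x1; x2 is pure noise
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def sse(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_small = np.column_stack([np.ones(n), x1])      # theta: k_theta = 2 parameters
X_big = np.column_stack([np.ones(n), x1, x2])    # Theta: k_Theta = 3 parameters

k_small, k_big = X_small.shape[1], X_big.shape[1]
F = ((sse(X_small, y) - sse(X_big, y)) / (k_big - k_small)) / (
    sse(X_big, y) / (n - k_big))
p_value = stats.f.sf(F, k_big - k_small, n - k_big)
```

Because the smaller model is nested in the larger one, \(SSE_{\theta} \geq SSE_{\Theta}\) always holds, so \(F \geq 0\).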
The \(F\)-distribution arises as the ratio of two independent \(\chi^2\) random variates, each divided by its own degrees of freedom
If \(A \sim \chi_{df_{A}}^{2}\) and \(B \sim \chi_{df_{B}}^{2}\) are independent, then
\[ \frac{\left( \frac{A}{df_{A}} \right) }{ \left( \frac{B}{df_{B}} \right) } \sim F _{df_{A},df_{B}} \]
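We can check this definition by simulation. This sketch (assuming `numpy` and `scipy`) draws two independent \(\chi^2\) samples, forms the scaled ratio, and compares an empirical quantile to the theoretical \(F\) quantile:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
df_a, df_b, n_draws = 3, 20, 100_000

# Two independent chi-square samples, each scaled by its degrees of freedom
a = rng.chisquare(df_a, size=n_draws)
b = rng.chisquare(df_b, size=n_draws)
ratio = (a / df_a) / (b / df_b)

# The empirical upper quantile should match the theoretical F quantile
emp_q = np.quantile(ratio, 0.95)
theo_q = stats.f.ppf(0.95, df_a, df_b)
```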
Suppose we wanted to test whether the collection of predictors in a model was better than simply estimating the data by their mean.
\[ \Theta: \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{e} \\ \theta: \mathbf{y} = \boldsymbol{\mu} + \mathbf{e} \\ \]
We write the null hypothesis as
\[ H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0 \]
and we would reject \(H_0\) if \(F > F^{(\alpha)}_{k_{\Theta} - k_{\theta}, n - k_{\Theta}}\)
\[ SSE_{\Theta} = \left( \mathbf{y} - \mathbf{X} \boldsymbol{\beta} \right)^{\top} \left( \mathbf{y} - \mathbf{X} \boldsymbol{\beta} \right) = \mathbf{e}^{\top} \mathbf{e} = SSE \\ SSE_{\theta} = \left( \mathbf{y} - \bar{y} \right)^{\top} \left( \mathbf{y} - \bar{y} \right) = SSTO \\ \Downarrow \\ F = \frac{ \left( SSTO - SSE \right) / (k - 1) } { SSE / (n - k)} \]
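As a sketch (in Python with simulated data, assuming `numpy` and `scipy`; in lab we will do this in R), the overall \(F\)-test against the mean-only model looks like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 40, 3  # k parameters, including the intercept

x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = np.sum((y - X @ beta) ** 2)    # full model Theta
ssto = np.sum((y - y.mean()) ** 2)   # mean-only model theta

F = ((ssto - sse) / (k - 1)) / (sse / (n - k))
p = stats.f.sf(F, k - 1, n - k)
```

Here the data were simulated with nonzero slopes, so we expect to reject \(H_0\).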
Later in lab we will work with the gala dataset\(\dagger\) in the faraway package, which contains data on the diversity of plant species across 30 Galapagos islands
For now, let’s hypothesize a model for these data
\(\dagger\)From Johnson & Raven (1973) Science 179:893-895
We might ask whether any one predictor could be dropped from a model
For example, can \(\text{nearest}\) be dropped from our full model?
\[ \text{species}_i = \alpha + \beta_1 \text{area}_i + \beta_2 \text{elevation}_i + \beta_3 \text{nearest}_i + \epsilon_i \]
One option is to fit these two models and compare them via our \(F\)-test with \(H_0: \beta_3 = 0\)
\[ \begin{aligned} \text{species}_i &= \alpha + \beta_1 \text{area}_i + \beta_2 \text{elevation}_i + \beta_3 \text{nearest}_i + \epsilon_i \\ ~ \\ \text{species}_i &= \alpha + \beta_1 \text{area}_i + \beta_2 \text{elevation}_i + \epsilon_i \end{aligned} \]
Another option is to estimate a \(t\)-statistic as
\[ t_i = \frac{\widehat{\beta}_i}{\text{SE} \left( \widehat{\beta}_i \right)} \]
and compare it to a \(t\)-distribution with \(n - k\) degrees of freedom
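These two approaches agree: for a single coefficient, \(t_i^2\) equals the \(F\)-statistic from the model comparison. A minimal Python sketch (simulated data standing in for the gala predictors, assuming `numpy` and `scipy`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.0 * x1 + rng.normal(size=n)   # x2 has no true effect

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
se = np.sqrt(sigma2 * np.diag(XtX_inv))

t = beta / se                             # t-statistic for each coefficient
p = 2 * stats.t.sf(np.abs(t), n - k)      # two-sided p-values

# The t-test for one coefficient matches the F-test that drops it: t^2 = F
X_red = X[:, :2]                          # drop x2
beta_r, *_ = np.linalg.lstsq(X_red, y, rcond=None)
sse_red = np.sum((y - X_red @ beta_r) ** 2)
sse_full = resid @ resid
F = (sse_red - sse_full) / (sse_full / (n - k))
```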
Sometimes we might want to know whether we can drop 2+ predictors from a model
For example, can we drop both \(\text{elevation}\) and \(\text{nearest}\) from our full model?
\[ \begin{aligned} \text{species}_i &= \alpha + \beta_1 \text{area}_i + \beta_2 \text{elevation}_i + \beta_3 \text{nearest}_i + \epsilon_i \\ ~ \\ \text{species}_i &= \alpha + \beta_1 \text{area}_i + \epsilon_i \end{aligned} \]
\(H_0 : \beta_2 = \beta_3 = 0\)
Some tests cannot be expressed in terms of the inclusion or exclusion of predictors
Consider a test of whether the areas of the current and adjacent islands could be added together and used in place of the two separate predictors
\[ \text{species}_i = \alpha + \beta_1 \text{area}_i + \beta_2 \text{adjacent}_i + \dots + \epsilon_i \\ ~ \\ \text{species}_i = \alpha + \beta_1 \text{(area + adjacent)}_i + \dots + \epsilon_i \]
\(H_0 : \beta_{\text{area}} = \beta_{\text{adjacent}}\)
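Tests like this still reduce to an \(F\)-test: the constrained model (one shared slope on the summed predictor) is nested in the full model. A Python sketch with hypothetical stand-ins for `area` and `adjacent` (assuming `numpy` and `scipy`; not the real gala data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n = 50
# Hypothetical stand-ins for the gala predictors `area` and `adjacent`
area = rng.exponential(scale=1.0, size=n)
adjacent = rng.exponential(scale=1.0, size=n)
y = 1.0 + 0.7 * area + 0.7 * adjacent + rng.normal(size=n)  # equal true slopes

def sse(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_full = np.column_stack([np.ones(n), area, adjacent])   # separate slopes
X_null = np.column_stack([np.ones(n), area + adjacent])  # one shared slope

F = (sse(X_null, y) - sse(X_full, y)) / (sse(X_full, y) / (n - 3))
p = stats.f.sf(F, 1, n - 3)
```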
What if we wanted to test whether a predictor had a specific (non-zero) value?
For example, is there a 1:1 relationship between \(\text{species}\) and \(\text{elevation}\) after controlling for the other predictors?
\[ \text{species}_i = \alpha + \beta_1 \text{area}_i + \underline{1} \text{elevation}_i + \beta_3 \text{nearest}_i + \epsilon_i \]
\(H_0 : \beta_2 = 1\)
We can also modify our \(t\)-test from before and use it for our comparison by including the hypothesized \(\beta_{H_0}\) as an offset
\[ t_i = \frac{\widehat{\beta}_i - \beta_{H_0}}{\text{SE} \left( \widehat{\beta}_i \right)} \]
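A minimal Python sketch of the offset \(t\)-test for \(H_0: \beta = 1\) (simulated data, assuming `numpy` and `scipy`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 40
x = rng.normal(size=n)
y = 2.0 + 1.0 * x + rng.normal(scale=0.5, size=n)  # true slope is 1

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)
se_slope = np.sqrt(sigma2 * XtX_inv[1, 1])

beta_H0 = 1.0                                  # hypothesized slope
t = (beta[1] - beta_H0) / se_slope             # offset t-statistic
p = 2 * stats.t.sf(abs(t), n - 2)              # two-sided p-value
```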
Null hypothesis testing (NHT) is a slippery slope
We can also use confidence intervals (CI’s) to express uncertainty in \(\widehat{\beta}_i\)
They take the form
\[ 100(1 - \alpha)\% ~ \text{CI}: \widehat{\beta}_{i} \pm t_{n-k}^{(\alpha / 2)} \operatorname{SE}(\widehat{\beta}_i) \]
where here \(\alpha\) is our predetermined Type-I error rate
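Computing these CI's by hand is straightforward. A Python sketch with simulated data (assuming `numpy` and `scipy`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, alpha = 50, 0.05
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)
se = np.sqrt(sigma2 * np.diag(XtX_inv))     # SE for each coefficient

t_crit = stats.t.ppf(1 - alpha / 2, n - k)  # t critical value
lower = beta - t_crit * se
upper = beta + t_crit * se
```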
The \(F\)- and \(t\)-based CI’s we have described depend on the assumption of normality
The bootstrap\(\dagger\) method provides a way to construct CI’s without this assumption
\(\dagger\)Efron (1979) The Annals of Statistics 7:1–26
Fit your model to the data
Calculate \(\mathbf{e} = \mathbf{y} - \mathbf{X} \widehat{\boldsymbol{\beta}}\)
Do the following many times:
- resample the residuals \(\mathbf{e}\) with replacement to get \(\mathbf{e}^*\)
- generate new data \(\mathbf{y}^* = \mathbf{X} \widehat{\boldsymbol{\beta}} + \mathbf{e}^*\)
- refit the model to \(\mathbf{y}^*\) and save \(\widehat{\boldsymbol{\beta}}^*\)
Select the \(\tfrac{\alpha}{2}\) and \((1 - \tfrac{\alpha}{2})\) percentiles from the saved \(\widehat{\boldsymbol{\beta}}^*\)
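The residual bootstrap sketched above might look like this in Python (simulated data, assuming `numpy`):

```python
import numpy as np

rng = np.random.default_rng(13)
n, n_boot, alpha = 60, 2000, 0.05
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Fit the model and save the residuals
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
e = y - fitted

# Many times: resample residuals, rebuild y, refit, save beta*
boot_betas = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    e_star = rng.choice(e, size=n, replace=True)
    y_star = fitted + e_star
    sol, *_ = np.linalg.lstsq(X, y_star, rcond=None)
    boot_betas[b] = sol

# Percentile interval from the saved bootstrap estimates
lower, upper = np.quantile(boot_betas, [alpha / 2, 1 - alpha / 2], axis=0)
```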
Given a fitted model \(\mathbf{y} = \mathbf{X} \widehat{\boldsymbol{\beta}} + \mathbf{e}\), we might want to know the uncertainty around a new estimate \(\mathbf{y}^*\) given some new predictor \(\mathbf{X}^*\)
Suppose we wanted to estimate the uncertainty in the average response given by
\[ \widehat{\mathbf{y}}^* = \mathbf{X}^* \widehat{\boldsymbol{\beta}} \]
Recall that the general formula for a CI on a quantity \(z\) is
\[ 100(1 - \alpha)\% ~ \text{CI}: \text{E}(z) ~ \pm ~ t^{(\alpha / 2)}_{df}\text{SD}(z) \]
So we would have
\[ \widehat{\mathbf{y}}^* ~ \pm ~ t^{(\alpha / 2)}_{df} \sqrt{\text{Var} \left( \widehat{\mathbf{y}}^* \right)} \]
We can calculate the SD of our expectation as
\[ \begin{aligned} \text{Var} \left( \widehat{\mathbf{y}}^* \right) &= \text{Var} \left( \mathbf{X}^* \widehat{\boldsymbol{\beta}} \right) \\ &= {\mathbf{X}^*}^{\top} \text{Var}\left( \widehat{\boldsymbol{\beta}} \right) \mathbf{X}^* \\ &= {\mathbf{X}^*}^{\top} \left[ \sigma^2 (\mathbf{X}^{\top} \mathbf{X})^{-1} \right] \mathbf{X}^* \\ &\Downarrow \\ \text{SD} \left( \widehat{\mathbf{y}}^* \right) &= \sigma \sqrt{ {\mathbf{X}^*}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^* } \end{aligned} \]
So our CI on the mean response is given by
\[ \widehat{\mathbf{y}}^* \pm ~ t^{(\alpha / 2)}_{df} \sigma \sqrt{ {\mathbf{X}^*}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^* } \]
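A Python sketch of the CI on the mean response at a new predictor value (simulated data, assuming `numpy` and `scipy`; the new row \(\mathbf{X}^*\) is chosen arbitrarily for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)
n, alpha = 50, 0.05
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma = np.sqrt(resid @ resid / (n - k))   # residual SD estimate

x_star = np.array([1.0, 0.5])              # new predictor row (intercept, x)
y_star = x_star @ beta                     # estimated mean response
h = x_star @ XtX_inv @ x_star              # X*^T (X^T X)^{-1} X*
t_crit = stats.t.ppf(1 - alpha / 2, n - k)

half_width = t_crit * sigma * np.sqrt(h)
ci = (y_star - half_width, y_star + half_width)
```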
What about the uncertainty in a specific prediction?
In that case we need to account for our additional uncertainty owing to the error in our relationship, which is given by
\[ \mathbf{y}^* = \mathbf{X}^* \widehat{\boldsymbol{\beta}} + \mathbf{e} \]
The SD of the new prediction is given by
\[ \begin{aligned} \text{Var} \left( \widehat{\mathbf{y}}^* \right) &= {\mathbf{X}^*}^{\top} \text{Var}\left( \widehat{\boldsymbol{\beta}} \right) \mathbf{X}^* + \text{Var} \left( \mathbf{e} \right) \\ &= {\mathbf{X}^*}^{\top} \left[ \sigma^2 (\mathbf{X}^{\top} \mathbf{X})^{-1} \right] \mathbf{X}^* + \sigma^2\\ &= \sigma^2 \left( {\mathbf{X}^*}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^* + 1 \right) \\ &\Downarrow \\ \text{SD} \left( \widehat{\mathbf{y}}^* \right) &= \sigma \sqrt{1 + {\mathbf{X}^*}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^* } \end{aligned} \]
So our CI on the new prediction is given by
\[ \widehat{\mathbf{y}}^* \pm ~ t^{(\alpha / 2)}_{df} \sigma \sqrt{1 + {\mathbf{X}^*}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^* } \]
This is typically referred to as the prediction interval
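The only change from the confidence interval on the mean response is the extra \(\sigma^2\) inside the square root, which makes the prediction interval strictly wider. A Python sketch computing both half-widths side by side (simulated data, assuming `numpy` and `scipy`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, alpha = 50, 0.05
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
k = X.shape[1]
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
sigma = np.sqrt(resid @ resid / (n - k))

x_star = np.array([1.0, 0.5])              # arbitrary new predictor row
y_star = x_star @ beta
h = x_star @ XtX_inv @ x_star
t_crit = stats.t.ppf(1 - alpha / 2, n - k)

ci_half = t_crit * sigma * np.sqrt(h)      # interval for the mean response
pi_half = t_crit * sigma * np.sqrt(1 + h)  # prediction interval: extra sigma^2
```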