- Understand types of random effects structures
- Understand how random effects are estimated
- Understand restricted maximum likelihood
- Understand approaches to make inference from mixed models
4 May 2020
Imagine we are interested in modeling the mass of fish measured in several different lakes
We have 3 hypotheses about the variation in fish sizes
differences in mass are due mostly to individual fish with no differences among lakes
differences in mass are due mostly to specific factors that differ among lakes
differences in mass are due mostly to general factors that are shared among lakes
Our first model simply treats all of the fish \(i\) in the different lakes \(j\) as one large group
\[ y_{ij} = \mu + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \]
where \(\mu\) is the mean mass of fish across all lakes & our primary interest is the size of \(\sigma_{\epsilon}^2\)
In essence, we are pooling all of the fish from the different lakes together, so we can drop the \(j\) subscript
\[ y_{ij} = \mu + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \Downarrow \\ y_{i} = \mu + \epsilon_{i} \\ \epsilon_{i} \sim \text{N}(0, \sigma^2_{\epsilon}) \]
Our second model separates all of the fish \(i\) into groups based on the specific lake \(j\) from which they were caught
\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \]
where \(\alpha_j\) is the specific effect of lake \(j\)
Here there is no pooling of fish from different lakes and the \(j\) subscript tells us about a specific lake
Our last model treats differences in fish mass among lakes as similar to one another (correlated)
\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \alpha_j \sim \text{N}(0, \sigma^2_{\alpha}) \]
where \(\alpha_j\) is the effect of lake \(j\) as though it were randomly chosen
The degree of correlation among fish within a lake \((\rho)\) is determined by the relative sizes of \(\sigma^2_{\alpha}\) and \(\sigma^2_{\epsilon}\)
\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \alpha_j \sim \text{N}(0, \sigma^2_{\alpha}) \\ \Downarrow \\ \rho = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\epsilon}} \]
Here we could say that the lakes are partially pooled together by formally addressing correlations among lakes
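For example, with hypothetical values \(\sigma^2_{\alpha} = 1\) and \(\sigma^2_{\epsilon} = 3\),
\[ \rho = \frac{1}{1 + 3} = 0.25 \]
so two fish drawn from the same lake are only weakly correlated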
Simple model with complete pooling
## log of fish mass (lfm) as grand mean
m1 <- lm(lfm ~ 1)
Fixed effects model with no pooling across lakes
## log of fish mass (lfm) with lake-level means
m2 <- lm(lfm ~ 1 + as.factor(IDs))
Random effects model with partial pooling across lakes
## load lme4 package
library(lme4)
## log of fish mass (lfm) with lake-level effects
m3 <- lmer(lfm ~ 1 + (1|IDs))
In fixed effects models, the estimated group-level effects are the full deviations of the group means from the grand mean
\[ \alpha_j = \bar{y}_j - \mu \]
In random effects models, the group-level effects "shrink" towards the overall mean
\[ \alpha_j = (\bar{y}_j - \mu) \left( \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\epsilon}} \right) \]
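As a toy numeric illustration (all values below are hypothetical):

## toy illustration of shrinkage (hypothetical values)
sigma2_alpha <- 0.5   ## among-lake variance
sigma2_eps <- 1.5     ## within-lake variance
ybar_j <- 2.0         ## sample mean for lake j
mu <- 1.2             ## grand mean
## fixed effects model: the full deviation
ybar_j - mu                                                  ## 0.8
## random effects model: the deviation shrunk toward zero
(ybar_j - mu) * sigma2_alpha / (sigma2_alpha + sigma2_eps)   ## 0.2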
Let’s return to our model for fish mass across different lakes
Now we want to include the effect of fish length as well
Fish mass as a function of its length (no lake effects)
\[ y_{i} = \underbrace{\beta_0 + \beta_1 x_{i}}_{\text{fixed}} + \epsilon_{i} \]
\(\epsilon_{i} \sim \text{N}(0,\sigma^2_\epsilon)\)
## fit global regression model
a1 <- lm(lfm ~ lfl)
Fish mass as a function of its length for each lake
\[ y_{ij} = \underbrace{\beta_{0j} + \beta_{1j} x_{ij}}_{\text{fixed}} + \epsilon_{ij} \]
\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)
## matrix for coefs
cf <- matrix(NA, nl, 2)
## fit regression unique to each lake
for(i in 1:nl) {
  cf[i,] <- coef(lm(fm[[i]] ~ fl[[i]]))
}
Fish mass as a function of its length for a random lake
\[ y_{ij} = \underbrace{\beta_{0} + \beta_1 x_{ij}}_{\text{fixed}} + \underbrace{\alpha_{j}}_\text{random} + \epsilon_{ij} \]
\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)
\(\alpha_{j} \sim \text{N}(0,\sigma^2_\alpha)\)
## fit ANCOVA with fixed effect of length & random effect of lake
a2 <- lmer(lfm ~ lfl + (1|IDs))
Fish mass as a function of its length for a random fish and lake
\[ y_{ij} = (\beta_{0} + \alpha_{j}) + (\beta_{1} + \delta_j) x_{ij} + \epsilon_{ij} \\ y_{ij} = \underbrace{\beta_{0} + \beta_{1} x_{ij}}_\text{fixed} + \underbrace{\alpha_{j} + \delta_j x_{ij}}_\text{random} + \epsilon_{ij} \]
\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)
\(\alpha_{j} \sim \text{N}(0,\sigma^2_\alpha)\)
\(\delta_{j} \sim \text{N}(0,\sigma^2_\delta)\)
## fit ANCOVA with random effects for length & lake
a3 <- lmer(lfm ~ lfl + (lfl|IDs))
We have seen how to write a general linear model as
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{e} \]
where \(\mathbf{X}\) is the design matrix and \(\boldsymbol{\beta}\) contains the fixed effects of \(\mathbf{X}\) on \(\mathbf{y}\)
We can extend the general linear model to include both fixed and random effects (a mixed effects model)
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\alpha} + \mathbf{e} \]
where \(\mathbf{Z}\) is also a design matrix containing a mix of \(z \in \{-1,0,1\}\) and \(z \in \mathbb{R}\), and \(\boldsymbol{\alpha}\) contains the random effects
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\alpha} + \mathbf{e} \\ ~ \\ \mathbf{e} \sim \text{MVN}(\mathbf{0}, \sigma^2 \mathbf{I}) \\ ~ \\ \boldsymbol{\alpha} \sim \text{MVN}(\mathbf{0}, \sigma^2 \mathbf{D}) \]
where \(\mathbf{I}\) is the identity matrix and \(\mathbf{D}\) is a square matrix of constants
Variance decomposition
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\alpha} + \mathbf{e} \\ \Downarrow \\ \text{Var}(\mathbf{y}) = \text{Var}(\mathbf{\mathbf{X} \boldsymbol{\beta}}) + \text{Var}(\mathbf{Z} \boldsymbol{\alpha}) + \text{Var}(\mathbf{e}) \]
Variance of random components
\[ \text{Var}(\mathbf{y} | \mathbf{\mathbf{X} \boldsymbol{\beta}}) = \text{Var}(\mathbf{Z} \boldsymbol{\alpha}) + \text{Var}(\mathbf{e}) \\ \Downarrow \\ \begin{aligned} \mathbf{V} &= \mathbf{Z} \text{Var} (\boldsymbol{\alpha}) \mathbf{Z}^\top + \text{Var}(\mathbf{e}) \\ &= \mathbf{Z} (\sigma^2 \mathbf{D}) \mathbf{Z}^\top + \sigma^2 \mathbf{I} \\ &= \sigma^2 (\mathbf{Z} \mathbf{D} \mathbf{Z}^\top + \mathbf{I}) \end{aligned} \]
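As a minimal sketch of what \(\mathbf{V}\) looks like, consider a hypothetical case with 2 lakes, 2 fish per lake, random intercepts only, and \(\sigma^2 = 1\) with \(\mathbf{D} = \mathbf{I}\):

## toy V for 2 lakes with 2 fish each (hypothetical values)
Z <- kronecker(diag(2), rep(1, 2))   ## 4 x 2 lake indicators
D <- diag(2)                         ## identity for simplicity
sigma2 <- 1
V <- sigma2 * (Z %*% D %*% t(Z) + diag(4))
V   ## block-diagonal: covariance is nonzero only within a lake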
Recall that we think of likelihoods in terms of the observed data
But the random effects in our model are unobserved random variables, so we need to integrate them out of the likelihood
The log-likelihood for the fixed effects \(\boldsymbol{\beta}\) (up to an additive constant)
\[ \log \mathcal{L}(\mathbf{y}; \boldsymbol{\beta}, \sigma^2) = - \frac{1}{2} \log \left| \mathbf{V} \right| - \frac{1}{2}( \mathbf{y} - \mathbf{X} \boldsymbol{\beta})^\top \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) \]
This leads us to our familiar statement for the weighted least squares estimate for \(\boldsymbol{\beta}\)
\[ \begin{aligned} \hat{\boldsymbol{\beta}} &= \underset{\boldsymbol{\beta}}{\arg\min} ~ (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})^{\top} \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) \\ &= (\mathbf{X}^{\top} \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{V}^{-1} \mathbf{y} \end{aligned} \]
Our variance estimate for \(\boldsymbol{\beta}\) is then
\[ \text{Var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^{\top} \mathbf{V}^{-1} \mathbf{X})^{-1} \]
The joint log-likelihood for the data and random effects (up to an additive constant) is given by
\[ \begin{aligned} \log \mathcal{L}(\mathbf{y}; \boldsymbol{\beta}, \sigma^2, \boldsymbol{\alpha}) = - \frac{n}{2} \log \sigma^2 &- \frac{1}{2 \sigma^2}( \mathbf{y} - \mathbf{X} \boldsymbol{\beta} - \mathbf{Z} \boldsymbol{\alpha})^\top (\mathbf{y} - \mathbf{X} \boldsymbol{\beta} - \mathbf{Z} \boldsymbol{\alpha}) \\ &- \frac{1}{2} \log \left| \sigma^2 \mathbf{D} \right| - \frac{1}{2} \boldsymbol{\alpha}^\top (\sigma^2 \mathbf{D})^{-1} \boldsymbol{\alpha} \end{aligned} \]
This leads to the best linear unbiased predictor for \(\boldsymbol{\alpha}\)
\[ \hat{\boldsymbol{\alpha}} = \sigma^2 \mathbf{D} \mathbf{Z}^\top \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}}) \]
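Continuing the hypothetical 2-lake sketch from above, we can check both estimators directly (a toy calculation, not how lme4 actually fits models):

## toy check of the GLS estimate & the BLUP (hypothetical values)
set.seed(42)
Z <- kronecker(diag(2), rep(1, 2))   ## 4 x 2 lake indicators
D <- diag(2)
sigma2 <- 1
V <- sigma2 * (Z %*% D %*% t(Z) + diag(4))
X <- matrix(1, 4, 1)                 ## intercept-only design
y <- rnorm(4, mean = 2)              ## fake masses
Vi <- solve(V)
## weighted least squares estimate of beta
(beta_hat <- solve(t(X) %*% Vi %*% X) %*% t(X) %*% Vi %*% y)
## BLUP for the lake-level effects
(alpha_hat <- sigma2 * D %*% t(Z) %*% Vi %*% (y - X %*% beta_hat))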
Estimating the parameters in a mixed effects model typically relies on restricted maximum likelihood (REML)
REML works by
estimating the fixed effects \((\hat{\boldsymbol{\beta}})\) via ML
using the residuals \(\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}}\) to estimate the variance components, which removes the downward bias of the ML variance estimates
using the \(\hat{\boldsymbol{\beta}}\) and the variance estimates to predict the random effects \((\hat{\boldsymbol{\alpha}})\)
lme4 makes this easy for us
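For example, a sketch with the intercept-only model from above; lmer() uses REML by default, and REML = FALSE gives ML fits:

## REML (the default)
m3_reml <- lmer(lfm ~ 1 + (1|IDs))
## ML (e.g., for comparing fixed effects via AIC)
m3_ml <- lmer(lfm ~ 1 + (1|IDs), REML = FALSE)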
With random effects models, we can't use our standard inference tools because we don't know the distribution of our test statistic
(lme4 won’t give \(p\)-values)
We can use a likelihood ratio test for nested models, but the assumption of a \(\chi^2\) distribution can be poor
We can also use \(F\)-tests to evaluate a single fixed effect, but again the assumption of an \(F\) distribution can be poor
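For example, a sketch of a likelihood ratio test comparing the nested models a2 & a3 from above; anova() refits the models via ML and reports a \(\chi^2\)-based \(p\)-value:

## LRT for nested models (assumes a chi-squared null distribution)
anova(a2, a3)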
We can use bootstrapping to conduct likelihood ratio tests (see the sketch after this list)
1. simulate data from the simple model
2. fit the simple & full models to the simulated data and calculate the likelihood ratio
3. repeat steps 1 & 2 many times to estimate the null distribution of the test statistic
4. see where the observed test statistic falls within the distribution from (3)
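Here is a minimal sketch of such a parametric bootstrap, assuming the nested models a2 (simple) & a3 (full) from above; packages like pbkrtest automate this procedure:

## observed likelihood ratio statistic
lr_obs <- as.numeric(2 * (logLik(a3) - logLik(a2)))
## null distribution of the statistic via simulation
nb <- 1000
lr_null <- numeric(nb)
for (b in 1:nb) {
  ## (1) simulate data from the simple model
  y_sim <- simulate(a2)[, 1]
  ## (2) refit both models & calculate the likelihood ratio
  lr_null[b] <- as.numeric(2 * (logLik(refit(a3, y_sim)) - logLik(refit(a2, y_sim))))
}
## (4) bootstrap p-value: where does lr_obs fall in the null distribution?
mean(lr_null >= lr_obs)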
We can report parameter estimates and CIs via bootstrapping
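For example, lme4's confint() has a built-in parametric bootstrap option (a sketch using m3 from above):

## bootstrap CIs for the variance parameters & fixed effects
confint(m3, method = "boot", nsim = 1000)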
We can generate predictions given fixed and random effects and estimate their uncertainty via bootstrapping
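A sketch using lme4's bootMer() with model a3 from above, where FUN extracts predictions that include both the fixed & random effects:

## parametric bootstrap of model predictions
pb <- bootMer(a3, FUN = function(x) predict(x, re.form = NULL), nsim = 250)
## approximate 95% intervals for each prediction
apply(pb$t, 2, quantile, probs = c(0.025, 0.975))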
Recall that \(AIC = 2 k - 2 \log \mathcal{L}\)
The problem with mixed effects models is that it's not clear what \(k\) equals, because the effective number of parameters for a random effect lies somewhere between one (a single variance) and the number of groups
It works well to select among fixed effects if random effects are held constant
To use AIC, we can follow these steps (see the sketch after them)
Fit a model with all of the possible fixed-effects included
Keep the fixed effects constant and search for random effects
Keep random effects as is and fit different fixed effects
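A minimal sketch of steps 2 & 3 with the models from above; note that comparing fixed-effects structures requires ML fits (REML = FALSE), whereas random-effects structures with common fixed effects can be compared with the default REML fits:

## step 2: hold fixed effects constant, compare random effects (REML)
r1 <- lmer(lfm ~ lfl + (1|IDs))
r2 <- lmer(lfm ~ lfl + (lfl|IDs))
AIC(r1, r2)
## step 3: hold random effects constant, compare fixed effects (ML)
f1 <- lmer(lfm ~ 1 + (1|IDs), REML = FALSE)
f2 <- lmer(lfm ~ lfl + (1|IDs), REML = FALSE)
AIC(f1, f2)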
Other options include
BIC
cross-validation
Decide what random effects make sense
Once random effects are chosen, select fixed effects
Inference will generally require bootstrapping