- Understand types of random effects structures
- Understand how random effects are estimated
- Understand restricted maximum likelihood
- Understand approaches to make inference from mixed models
4 May 2026
| Errors | Single random process | Multiple random processes |
|---|---|---|
| Normal Errors | Linear Model (LM) | Linear Mixed Model (LMM) |
| Multiple Forms of Errors | Generalized Linear Model (GLM) | Generalized Linear Mixed Model (GLMM) |
We are now moving into linear models with multiple random processes
Let’s start with fixed effects, which are what we’ve been estimating
We use fixed effects for:
- Continuous predictors
- Categorical predictors (factors) when
  - we are interested in the specific levels (eg, “N”, “P”, “N & P”, “Control”)
  - the factors don’t come from a larger population (eg, “Age 1”, “Age 2”, “Age 3+”)
  - we only have a few levels (eg, “Plot 1”, “Plot 2”, “Plot 3”)
We use random effects for:
- factor levels sampled from a distribution (eg, we randomly chose 12 study plots)
- correlated errors when data are
  - nested (eg, plants within plots & multiple plots sampled)
  - time series (eg, productivity of captive birds over many years)
  - spatial (eg, multiple ponds under study)
Imagine we are interested in modeling the mass of fish measured in several different lakes
We have 3 hypotheses about the variation in fish sizes:
- differences in mass are due mostly to individual fish with no differences among lakes
- differences in mass are due mostly to specific factors that differ among lakes
- differences in mass are due mostly to general factors that are shared among lakes
Our first model simply treats all of the fish \(i\) in the different lakes \(j\) as one large group
\[ y_{ij} = \mu + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \]
where \(\mu\) is the mean mass of fish across all lakes & our primary interest is the size of \(\sigma_{\epsilon}^2\)
In essence, we are pooling all of the fish from the different lakes together, so we can drop the \(j\) subscript
\[ y_{ij} = \mu + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \Downarrow \\ y_{i} = \mu + \epsilon_{i} \\ \epsilon_{i} \sim \text{N}(0, \sigma^2_{\epsilon}) \]
Our second model separates all of the fish \(i\) into groups based on the specific lake \(j\) from which they were caught
\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \]
where \(\alpha_j\) is the specific effect of lake \(j\)
Here there is no pooling of fish from different lakes and the \(j\) subscript tells us about a specific lake
Our last model treats differences in fish mass among lakes as similar to one another (correlated)
\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \alpha_j \sim \text{N}(0, \sigma^2_{\alpha}) \]
where \(\alpha_j\) is the effect of lake \(j\) as though it were randomly chosen
The correlation between fish from the same lake \((\rho)\), known as the intraclass correlation, is determined by the relative sizes of \(\sigma^2_{\alpha}\) and \(\sigma^2_{\epsilon}\)
\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \alpha_j \sim \text{N}(0, \sigma^2_{\alpha}) \\ \Downarrow \\ \rho = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\epsilon}} \]
Here we could say that the lakes are partially pooled together by formally addressing correlations among lakes
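One way to build intuition for \(\rho\) is to simulate it. The sketch below assumes made-up values for \(\mu\), \(\sigma_{\alpha}\), and \(\sigma_{\epsilon}\) (none come from the fish data) and draws two fish from each of many hypothetical lakes. Because both fish share the same lake effect \(\alpha_j\), their masses should be correlated at roughly \(\rho\).

```r
## illustrative values only (assumptions, not estimates from the data)
set.seed(4)
mu <- 2
sigma_alpha <- 0.6   # SD of lake effects
sigma_eps <- 0.4     # SD of residual (fish-level) errors

## theoretical correlation from the variance ratio
rho <- sigma_alpha^2 / (sigma_alpha^2 + sigma_eps^2)

## simulate 2 fish from each of many lakes
nl <- 5000
alpha <- rnorm(nl, 0, sigma_alpha)
fish1 <- mu + alpha + rnorm(nl, 0, sigma_eps)
fish2 <- mu + alpha + rnorm(nl, 0, sigma_eps)

## empirical correlation between fish sharing a lake
rho_hat <- cor(fish1, fish2)
c(theory = rho, simulated = rho_hat)
```

When \(\sigma^2_{\alpha}\) is large relative to \(\sigma^2_{\epsilon}\), fish from the same lake look alike and \(\rho\) approaches 1; when it is small, the lake tells us little and \(\rho\) approaches 0.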
Simple model with complete pooling
## log of fish mass (lfm) as grand mean
m1 <- lm(lfm ~ 1)
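As a quick check on what complete pooling estimates, the sketch below fits the intercept-only model to simulated values (the data are made up for illustration; only the name `lfm` follows the slides). The intercept should match the sample mean and the residual standard deviation should match the sample SD.

```r
## simulated log fish mass (illustrative values only)
set.seed(1)
lfm <- rnorm(60, mean = 2, sd = 0.5)

## complete pooling: one grand mean for all fish
m1 <- lm(lfm ~ 1)

## estimate of mu vs the sample mean
mu_hat <- coef(m1)[["(Intercept)"]]
c(model = mu_hat, sample = mean(lfm))

## estimate of sigma_epsilon vs the sample SD
c(model = sigma(m1), sample = sd(lfm))
```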
Fixed effects model with no pooling across lakes
## log of fish mass (lfm) with lake-level means
m2 <- lm(lfm ~ 1 + as.factor(IDs))
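One thing to watch for with the no-pooling fit is how R parameterizes the factor. With the default treatment contrasts, the intercept is the mean of the first lake and the remaining coefficients are differences from that lake, rather than \(\mu\) and \(\alpha_j\) directly. A minimal sketch with simulated data (the lake means here are invented for illustration):

```r
## simulated data: 4 lakes with 10 fish each (illustrative values only)
set.seed(2)
nl <- 4
nf <- 10
IDs <- rep(seq_len(nl), each = nf)
lake_means <- c(1.5, 2.0, 2.5, 3.0)
lfm <- rnorm(nl * nf, mean = lake_means[IDs], sd = 0.3)

## no pooling: a separate mean for each lake
m2 <- lm(lfm ~ 1 + as.factor(IDs))

## observed mean of log mass in each lake
ybar <- tapply(lfm, IDs, mean)

## rebuild lake means from the coefficients:
## intercept = lake 1; other terms = differences from lake 1
b <- coef(m2)
est_means <- c(b[1], b[1] + b[-1])

cbind(observed = as.numeric(ybar), from_model = unname(est_means))
```

The fitted lake means reproduce the observed lake means exactly, which is what "no pooling" implies: each lake is estimated from its own fish only.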
Random effects model with partial pooling across lakes
## load lme4 package
library(lme4)
## log of fish mass (lfm) with lake-level effects
m3 <- lmer(lfm ~ 1 + (1|IDs))
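To see partial pooling in action, the sketch below simulates a balanced data set (12 hypothetical lakes with 8 fish each; all parameter values are assumptions) and compares the lake deviations from a no-pooling calculation with the lake effects predicted by `lmer()`.

```r
## load lme4 package
library(lme4)

## simulated balanced data (illustrative values only)
set.seed(3)
nl <- 12
nf <- 8
IDs <- rep(seq_len(nl), each = nf)
alpha <- rnorm(nl, 0, 0.5)                      # true lake effects
lfm <- 2 + alpha[IDs] + rnorm(nl * nf, 0, 0.5)  # log fish mass
dat <- data.frame(lfm = lfm, IDs = factor(IDs))

## random effects model with partial pooling
m3 <- lmer(lfm ~ 1 + (1 | IDs), data = dat)

## no pooling: lake means minus the grand mean
no_pool <- tapply(dat$lfm, dat$IDs, mean) - mean(dat$lfm)

## partial pooling: predicted lake effects from the mixed model
part_pool <- ranef(m3)$IDs[, "(Intercept)"]

cbind(no_pool = as.numeric(no_pool), part_pool = part_pool)
```

Each partially pooled effect has the same sign as its no-pooling counterpart but is pulled towards zero, so the implied lake means sit closer to the overall mean.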
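In fixed effects models, the group effects are the deviations of each group mean \(\bar{y}_j\) from the overall mean
\[ \alpha_j = \bar{y}_j - \mu \]
In random effects models, the group effects “shrink” towards zero, so the group means shrink towards the overall mean
\[ \alpha_j = (\bar{y}_j - \mu) \left( \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\epsilon} / n_j} \right) \]
where \(n_j\) is the number of fish sampled in lake \(j\); lakes with fewer fish shrink more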
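A minimal sketch of the shrinkage calculation by hand, assuming the variances and overall mean are known (all values below are invented). It uses the general form of the weight in which the residual variance is divided by the number of fish per lake, \(n_j\); with \(n_j = 1\) the weight is simply \(\sigma^2_{\alpha} / (\sigma^2_{\alpha} + \sigma^2_{\epsilon})\).

```r
## assumed (known) values for illustration only
sigma2_alpha <- 0.25   # variance among lakes
sigma2_eps <- 0.25     # variance among fish within a lake
mu <- 2                # overall mean of log mass

## three hypothetical lakes with different sample sizes
ybar_j <- c(1.2, 2.0, 2.9)   # observed lake means
n_j <- c(2, 10, 50)          # number of fish per lake

## shrinkage weight for each lake (between 0 and 1)
w <- sigma2_alpha / (sigma2_alpha + sigma2_eps / n_j)

## fixed effects (no pooling) vs random effects (partial pooling)
alpha_fixed <- ybar_j - mu
alpha_random <- w * alpha_fixed

data.frame(n_j, w, alpha_fixed, alpha_random)
```

The lake with only 2 fish is pulled about a third of the way back towards the overall mean, whereas the lake with 50 fish is left nearly unchanged.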
Let’s return to our model for fish mass across different lakes
Now we want to include the effect of fish length as well
Fish mass as a function of its length (no lake effects)
\[ y_{i} = \underbrace{\beta_0 + \beta_1 x_{i}}_{\text{fixed}} + \epsilon_{i} \]
\(\epsilon_{i} \sim \text{N}(0,\sigma^2_\epsilon)\)
Fish mass as a function of its length (no lake effects)
## fit global regression model
a1 <- lm(lfm ~ lfl)
Fish mass as a function of its length for each lake
\[ y_{ij} = \underbrace{\beta_{0j} + \beta_{1j} x_{ij}}_{\text{fixed}} + \epsilon_{ij} \]
\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)
Fish mass as a function of its length for each lake
## matrix for coefs (one row per lake: intercept, slope)
cf <- matrix(NA, nl, 2)
## fit regression unique to each lake
for(i in seq_len(nl)) {
  cf[i,] <- coef(lm(fm[[i]] ~ fl[[i]]))
}
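The loop above relies on objects that are not defined on the slide. A self-contained sketch follows, assuming `nl` is the number of lakes and that `fm` and `fl` are lists holding (log) fish mass and (log) fish length for each lake; the simulated values and the `b0`/`b1` names are hypothetical.

```r
## simulated data: 5 lakes with 20 fish each (illustrative values only)
set.seed(5)
nl <- 5
nf <- 20
b0 <- rnorm(nl, 1, 0.3)   # lake-specific intercepts
b1 <- rnorm(nl, 3, 0.2)   # lake-specific slopes
fl <- vector("list", nl)  # fish length by lake
fm <- vector("list", nl)  # fish mass by lake
for(j in seq_len(nl)) {
  fl[[j]] <- rnorm(nf, 0, 0.5)
  fm[[j]] <- b0[j] + b1[j] * fl[[j]] + rnorm(nf, 0, 0.2)
}

## matrix for coefs (one row per lake)
cf <- matrix(NA, nl, 2, dimnames = list(NULL, c("intercept", "slope")))

## fit regression unique to each lake
for(i in seq_len(nl)) {
  cf[i,] <- coef(lm(fm[[i]] ~ fl[[i]]))
}

## estimated vs true lake-specific parameters
cbind(cf, true_b0 = b0, true_b1 = b1)
```

Each row of `cf` uses only the fish from that lake, so lakes with few fish will have noisy intercepts and slopes.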
Fish mass as a function of its length for a random lake
\[ y_{ij} = \underbrace{\beta_0 + \beta_1 x_{ij}}_{\text{fixed}} + \underbrace{\alpha_{j}}_\text{random} + \epsilon_{ij} \]
\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)
\(\alpha_{j} \sim \text{N}(0,\sigma^2_\alpha)\)
Fish mass as a function of its length and random lake
## fit ANCOVA with fixed effect of length & random effect (intercept) of lake
a2 <- lmer(lfm ~ lfl + (1|IDs))
Fish mass as a function of its length with random intercepts and slopes among lakes
\[ y_{ij} = (\beta_0 + \alpha_{j}) + (\beta_1 + \delta_j) x_{ij} + \epsilon_{ij} \\ y_{ij} = \underbrace{\beta_0 + \beta_1 x_{ij}}_\text{fixed} + \underbrace{\alpha_{j} + \delta_j x_{ij}}_\text{random} + \epsilon_{ij} \]
\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)
\(\alpha_{j} \sim \text{N}(0,\sigma^2_\alpha)\)
\(\delta_{j} \sim \text{N}(0,\sigma^2_\delta)\)
Fish mass as a function of its length with random intercepts and slopes among lakes
## fit ANCOVA with random effects (intercept & slope) for length by lake
a3 <- lmer(lfm ~ lfl + (lfl|IDs))
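The random-intercept and random-slope models can be compared side by side on simulated data. The sketch below assumes 12 hypothetical lakes with 30 fish each, a centered log length, and invented parameter values; only the model formulas and the names `lfm`, `lfl`, and `IDs` follow the slides.

```r
## load lme4 package
library(lme4)

## simulated data (illustrative values only)
set.seed(6)
nl <- 12
nf <- 30
IDs <- factor(rep(seq_len(nl), each = nf))
j <- as.integer(IDs)
lfl <- rnorm(nl * nf, 0, 0.5)   # centered log fish length
alpha <- rnorm(nl, 0, 0.4)      # random lake intercepts
delta <- rnorm(nl, 0, 0.2)      # random lake slopes
lfm <- (1 + alpha[j]) + (3 + delta[j]) * lfl + rnorm(nl * nf, 0, 0.3)
dat <- data.frame(lfm = lfm, lfl = lfl, IDs = IDs)

## random intercept for lake
a2 <- lmer(lfm ~ lfl + (1 | IDs), data = dat)

## random intercept & slope for lake
a3 <- lmer(lfm ~ lfl + (lfl | IDs), data = dat)

## fixed effects (beta_0, beta_1)
fixef(a3)

## estimated SDs of alpha_j, delta_j & epsilon_ij
VarCorr(a3)

## lake-specific intercepts & slopes (fixed + random parts)
lake_coefs <- coef(a3)$IDs
head(lake_coefs)
```

In `a2` every lake shares one slope and `coef(a2)$IDs` differs only in the intercept column, whereas `a3` lets both columns vary by lake.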
Think hard about your question and data
Are there groups or levels?
Are there temporal or spatial dimensions?