Intro to mixed effects models

4 May 2026

Goals for today

Understand types of random effects structures

Understand how random effects are estimated

Understand restricted maximum likelihood

Understand approaches to make inference from mixed models

General types of models

Errors	Single random process	Multiple random processes
Normal Errors	Linear Model (LM)	Linear Mixed Model (LMM)
Multiple Forms of Errors	Generalized Linear Model (GLM)	Generalized Linear Mixed Model (GLMM)

Multiple random processes

Linear mixed models

We are now moving into linear models with multiple random processes

the first still describes our model errors (as before)

the other(s) describe the random effect(s) of some factor(s)

What do we mean by random effects?

Let’s start with fixed effects, which are what we’ve been estimating

Fixed effects

We use fixed effects for:

Continuous predictors
Categorical predictors (factors) when
- we are interested in the specific levels (eg, “N”, “P”, “N & P”, “Control”)
- the factors don’t come from a larger population (eg, “Age 1”, “Age 2”, “Age 3+”)
- we only have a few levels (eg, “Plot 1”, “Plot 2”, “Plot 3”)

Fixed vs. random effects?

We use random effects for

factor levels sampled from a distribution (eg, we randomly chose 12 study plots)
correlated errors when data are
- nested (eg, plants within plots & multiple plots sampled)
- time series (eg, productivity of captive birds over many years)
- spatial (eg, multiple ponds under study)

Model for means

Imagine we are interested in modeling the mass of fish measured in several different lakes

We have 3 hypotheses about the variation in fish sizes

differences in mass are due mostly to individual fish with no differences among lakes
differences in mass are due mostly to specific factors that differ among lakes
differences in mass are due mostly to general factors that are shared among lakes

Model for means

Our first model simply treats all of the fish \(i\) in the different lakes \(j\) as one large group

\[ y_{ij} = \mu + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \]

where \(\mu\) is the mean mass of fish across all lakes & our primary interest is the size of \(\sigma_{\epsilon}^2\)

Model for means

In essence, we are pooling all of the fish from the different lakes together, so we can drop the \(j\) subscript

\[ y_{ij} = \mu + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \Downarrow \\ y_{i} = \mu + \epsilon_{i} \\ \epsilon_{i} \sim \text{N}(0, \sigma^2_{\epsilon}) \]
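As a minimal sketch of complete pooling, the intercept-only model can be fit to simulated data. The sample size, mean, and standard deviation below are assumed values for illustration only, not the lecture data.

```r
## simulated log fish mass (hypothetical values, not the lecture data)
set.seed(123)
n_fish <- 50        # assumed number of fish
mu <- 2             # assumed grand mean
sigma_eps <- 0.5    # assumed SD among individual fish
y <- rnorm(n_fish, mu, sigma_eps)

## complete pooling: a single mean for all fish
fit <- lm(y ~ 1)

## the intercept is the sample mean
mu_hat <- unname(coef(fit))
## the residual variance is the sample variance
sigma2_hat <- sigma(fit)^2
```

Under this model the only estimated quantities are the grand mean and the variance among individual fish.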

Model for means

Our second model separates all of the fish \(i\) into groups based on the specific lake \(j\) from which they were caught

\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \]

where \(\alpha_j\) is the specific effect of lake \(j\)

Model for means

Here there is no pooling of fish from different lakes and the \(j\) subscript tells us about a specific lake

\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \]
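A minimal sketch of the no-pooling model on simulated data follows. The lake effects and sample sizes are assumed values. The model is written without an intercept (`0 + lake`) so that each coefficient is a lake mean; this is a different parameterization of the same fit as `1 + lake`.

```r
## simulated data (hypothetical values, not the lecture data)
set.seed(123)
n_lakes <- 4
n_per <- 10
lake <- factor(rep(seq_len(n_lakes), each = n_per))
alpha <- c(-0.6, -0.1, 0.2, 0.5)   # assumed lake effects
y <- 2 + alpha[as.integer(lake)] + rnorm(n_lakes * n_per, 0, 0.5)

## no pooling: each lake gets its own mean
fit <- lm(y ~ 0 + lake)
lake_means <- unname(coef(fit))

## lake effects expressed as deviations from the mean of the lake means
alpha_hat <- lake_means - mean(lake_means)
```

Each estimated lake mean uses only the fish from that lake, so lakes with few fish have noisy estimates.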

Model for means

Our last model treats differences in fish mass among lakes as similar to one another (correlated)

\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \alpha_j \sim \text{N}(0, \sigma^2_{\alpha}) \]

where \(\alpha_j\) is the effect of lake \(j\) as though it were randomly chosen

Model for means

The degree of correlation among fish from the same lake \((\rho)\), known as the intraclass correlation, is determined by the relative sizes of \(\sigma^2_{\alpha}\) and \(\sigma^2_{\epsilon}\)

\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \alpha_j \sim \text{N}(0, \sigma^2_{\alpha}) \\ \Downarrow \\ \rho = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\epsilon}} \]

Model for means

Here we could say that the lakes are partially pooled together by formally addressing the correlation among fish within lakes

\[ y_{ij} = \mu + \alpha_j + \epsilon_{ij} \\ \epsilon_{ij} \sim \text{N}(0, \sigma^2_{\epsilon}) \\ \alpha_j \sim \text{N}(0, \sigma^2_{\alpha}) \]

with

\[ \rho = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\epsilon}} \]
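As a worked example, \(\rho\) can be computed from two assumed variances and then checked by simulation, reading \(\rho\) as the correlation between two fish drawn from the same lake. The variance values and the number of simulated lakes are arbitrary choices for illustration.

```r
## assumed variances (hypothetical values)
sigma2_alpha <- 0.4   # among lakes
sigma2_eps <- 0.6     # among fish within a lake

## correlation implied by the formula
rho <- sigma2_alpha / (sigma2_alpha + sigma2_eps)

## check by simulation: two fish from each of many lakes
set.seed(42)
n_lakes <- 20000
alpha <- rnorm(n_lakes, 0, sqrt(sigma2_alpha))
fish1 <- alpha + rnorm(n_lakes, 0, sqrt(sigma2_eps))
fish2 <- alpha + rnorm(n_lakes, 0, sqrt(sigma2_eps))

## the two fish share alpha, so they are correlated
rho_sim <- cor(fish1, fish2)
```

When \(\sigma^2_{\alpha}\) is small relative to \(\sigma^2_{\epsilon}\), \(\rho\) approaches 0 and the model behaves like complete pooling; when it is large, \(\rho\) approaches 1 and the model behaves like no pooling.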

Fish mass across lakes

Simple model with complete pooling

## log of fish mass (lfm) as grand mean
m1 <- lm(lfm ~ 1)

Fish mass across lakes

Fixed effects model with no pooling across lakes

## log of fish mass (lfm) with lake-level means
m2 <- lm(lfm ~ 1 + as.factor(IDs))

Fish mass across lakes

Random effects model with partial pooling across lakes

## load lme4 package
library(lme4)
## log of fish mass (lfm) with lake-level effects
m3 <- lmer(lfm ~ 1 + (1|IDs))
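Once a model like this is fit, the two variance components can be pulled out to estimate \(\rho\). The sketch below uses simulated data with assumed parameter values; the names `lfm` and `IDs` mirror the slides, but the data are hypothetical.

```r
library(lme4)

## simulated data (hypothetical values, not the lecture data)
set.seed(1)
n_lakes <- 12
n_per <- 15
dat <- data.frame(IDs = factor(rep(seq_len(n_lakes), each = n_per)))
alpha <- rnorm(n_lakes, 0, 0.6)   # assumed SD among lakes
dat$lfm <- 2 + alpha[as.integer(dat$IDs)] + rnorm(nrow(dat), 0, 0.5)

## random intercept for each lake
fit <- lmer(lfm ~ 1 + (1 | IDs), data = dat)

## variance components as a data frame
vc <- as.data.frame(VarCorr(fit))
sigma2_alpha_hat <- vc$vcov[vc$grp == "IDs"]
sigma2_eps_hat <- vc$vcov[vc$grp == "Residual"]

## estimated intraclass correlation
rho_hat <- sigma2_alpha_hat / (sigma2_alpha_hat + sigma2_eps_hat)
```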

Shrinkage of group means

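In fixed effects models, the group (lake) effects are simply each group's mean minus the grand mean

\[ \alpha_j = \bar{y}_j - \mu \]

In random effects models, the group effects “shrink” towards zero, so the group means shrink towards the grand mean, and groups with fewer observations \(n_j\) shrink more

\[ \alpha_j = (\bar{y}_j - \mu) \left( \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\epsilon} / n_j} \right) \]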
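Shrinkage can be checked numerically. The sketch below simulates lakes with unequal numbers of fish (assumed values), fits the random intercept model, and rebuilds the lake effects by hand using the weight \(\sigma^2_{\alpha} / (\sigma^2_{\alpha} + \sigma^2_{\epsilon} / n_j)\), where \(n_j\) is the number of fish in lake \(j\). The hand-computed values should match those reported by `ranef()`.

```r
library(lme4)

## simulated data with unequal sample sizes (hypothetical values)
set.seed(7)
n_j <- c(3, 5, 8, 12, 20, 30)   # assumed fish per lake
IDs <- factor(rep(seq_along(n_j), times = n_j))
alpha <- rnorm(length(n_j), 0, 0.6)
lfm <- 2 + alpha[as.integer(IDs)] + rnorm(length(IDs), 0, 0.5)

## random intercept model
fit <- lmer(lfm ~ 1 + (1 | IDs))
mu_hat <- unname(fixef(fit))
vc <- as.data.frame(VarCorr(fit))
s2a <- vc$vcov[vc$grp == "IDs"]
s2e <- vc$vcov[vc$grp == "Residual"]

## raw lake means and their shrinkage weights
ybar_j <- as.vector(tapply(lfm, IDs, mean))
weight <- s2a / (s2a + s2e / n_j)

## shrunken lake effects computed by hand
alpha_shrunk <- (ybar_j - mu_hat) * weight
## lake effects reported by lme4
alpha_lmer <- ranef(fit)$IDs[, "(Intercept)"]
```

The weights increase with \(n_j\), so lakes with only a few fish are pulled hardest towards the grand mean.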

QUESTIONS?

Fish mass across lakes

Let’s return to our model for fish mass across different lakes

Now we want to include the effect of fish length as well

Fish mass versus length

A global regression model

Fish mass as a function of its length (no lake effects)

\[ y_{i} = \underbrace{\beta_0 + \beta_1 x_{i}}_{\text{fixed}} + \epsilon_{i} \]

\(\epsilon_{i} \sim \text{N}(0,\sigma^2_\epsilon)\)

A global regression model

Fish mass as a function of its length (no lake effects)

## fit global regression model
a1 <- lm(lfm ~ lfl)

Unique regression models

Fish mass as a function of its length for each lake

\[ y_{ij} = \underbrace{\beta_{0j} + \beta_{1j} x_{ij}}_{\text{fixed}} + \epsilon_{ij} \]

\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)

Unique regression models

Fish mass as a function of its length for each lake

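## matrix for coefs (rows = lakes; cols = intercept & slope)
cf <- matrix(NA, nl, 2)
## fit regression unique to each lake
## (fm & fl are lists of mass & length with one element per lake)
for(i in seq_len(nl)) {
  cf[i,] <- coef(lm(fm[[i]] ~ fl[[i]]))
}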

A linear mixed model

Fish mass as a function of its length for a random lake

\[ y_{ij} = \underbrace{\beta_0 + \beta_1 x_{ij}}_{\text{fixed}} + \underbrace{\alpha_{j}}_{\text{random}} + \epsilon_{ij} \]

\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)

\(\alpha_{j} \sim \text{N}(0,\sigma^2_\alpha)\)

A linear mixed model

Fish mass as a function of its length and a random effect of lake

## fit LMM with fixed effect of length & random intercept for lake
a2 <- lmer(lfm ~ lfl + (1|IDs))

Fish mass versus length

A random effects model

Fish mass as a function of its length, with an intercept and slope that vary randomly among lakes

\[ y_{ij} = (\beta_0 + \alpha_{j}) + (\beta_1 + \delta_j) x_{ij} + \epsilon_{ij} \\ y_{ij} = \underbrace{\beta_0 + \beta_1 x_{ij}}_\text{fixed} + \underbrace{\alpha_{j} + \delta_j x_{ij}}_\text{random} + \epsilon_{ij} \]

\(\epsilon_{ij} \sim \text{N}(0,\sigma^2_\epsilon)\)

\(\alpha_{j} \sim \text{N}(0,\sigma^2_\alpha)\)

\(\delta_{j} \sim \text{N}(0,\sigma^2_\delta)\)

A random effects model

Fish mass as a function of its length, with an intercept and slope that vary randomly among lakes

## fit LMM with fixed effect of length & random intercept and slope for lake
a3 <- lmer(lfm ~ lfl + (lfl|IDs))
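To see how the fixed and random parts combine, the sketch below simulates data with lake-specific intercepts and slopes (all parameter values are assumptions), fits the same model structure, and extracts the lake-level coefficients. Note that `(lfl | IDs)` also estimates a correlation between the random intercepts and slopes.

```r
library(lme4)

## simulated data (hypothetical values, not the lecture data)
set.seed(11)
n_lakes <- 15
n_per <- 20
IDs <- factor(rep(seq_len(n_lakes), each = n_per))
j <- as.integer(IDs)
lfl <- rnorm(n_lakes * n_per, 0, 1)    # centered log length
alpha <- rnorm(n_lakes, 0, 0.5)        # assumed random intercepts
delta <- rnorm(n_lakes, 0, 0.3)        # assumed random slopes
lfm <- (1 + alpha[j]) + (3 + delta[j]) * lfl + rnorm(length(j), 0, 0.4)

## random intercept & slope for each lake
fit <- lmer(lfm ~ lfl + (lfl | IDs))

## fixed effects (beta_0, beta_1)
beta_hat <- fixef(fit)
## random deviations (alpha_j, delta_j)
re <- ranef(fit)$IDs
## lake-specific intercepts & slopes (fixed + random)
lake_coefs <- coef(fit)$IDs
```

Each row of `lake_coefs` is the regression line for one lake, obtained by adding that lake's random deviations to the shared fixed effects.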

Model diagnostics

Summary

Think hard about your question and data
- Are there groups or levels?
- Are there temporal or spatial dimensions?

Decide what random effects make sense