Understand some of the diagnostics available for evaluating GLMM fits
Become familiar with some of the inferential tools for GLMMs
29 May 2026
Understand some of the diagnostics available for evaluating GLMM fits
Become familiar with some of the inferential tools for GLMMs
Diagnostics for GLMMs are similar to those for GLMs, but we are limited in our choices
Our data consist of counts of the number of eggs per female at 10 locations over 13 years
We are interested in the fixed effect of body size and the random effects of location and year
We’ll begin with only the effects of body size and location
library(lme4)
## set control parameters for model fitting
cntrl <- glmerControl(optimizer = "bobyqa", tol = 1e-4, optCtrl = list(maxfun = 100000))
## fixed effect of size plus random effect of location
fit_1 <- glmer(eggs ~ size + (1|loc),
family = poisson,
data = df_eggs,
control = cntrl)
## fixed effect of size plus random effects of location & year
fit_2 <- glmer(eggs ~ size + (1|loc) + (1|year),
family = poisson,
data = df_eggs,
control = cntrl)
Recall our goodness of fit test based on the Pearson’s \(\chi^2\)
\[ X^2 = \sum_{i = 1}^n \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{(n - 1)} \]
where \(O_i\) is the observed count and \(E_i\) is the expected count
For a binomial distribution
\[ X^2 = \sum_{i = 1}^n \frac{(y_i - n_i \hat{p}_i)^2}{n_i \hat{p}_i} \]
For a Poisson distribution
\[ X^2 = \sum_{i = 1}^n \frac{(y_i - \lambda_i)^2}{\lambda_i} \]
\(H_0\): Our model is correctly specified
## Pearson's X^2 statistic
X2 <- sum((eggs - fitted(fit_2))^2 / fitted(fit_2))
## likelihood ratio test
pchisq(X2, df = nn - length(coef(fit_2)),
lower.tail = FALSE)
## [1] 0.6447966
The \(p\)-value is large so we cannot reject \(H_0\)
Check for heteroscedasticity
Use DHARMa::simulateResiduals to scale the residuals
For other models, we can calculate the leverages to evaluate potentially extreme values in predictor space
For GLMMs, however, the leverages depend on the estimated variance-covariance matrices of the random effects
We can fit a GLM and evaluate leverage of the fixed effects
For other models, we can calculate Cook’s distances to identify potentially influential data points
For GLMMs, however, the Cook’s distances involve derivatives of the likelihood with respect to the random effects (this is an active area of research)
Again, we can fit a GLM and look at the Cook’s distances for the fixed effects
Large snakes with lots of eggs
## large Cook's D cooks_big <- sort(cooks_d, decreasing = TRUE)[1:2] ## details about them df_eggs[as.numeric(names(cooks_big)),]
## eggs size loc year ## 138 30 0.92 1 12 ## 186 28 0.89 1 8
We can test the significance of the fixed effects via a \(\chi^2\) test
by comparing models with and without the effect(s)
## fit reduced model with only random effects
fit_0 <- glmer(eggs ~ (1 | loc) + (1 | year),
data = df_eggs, family = poisson)
## compare null and full models
anova(fit_2, fit_0)
## Data: df_eggs ## Models: ## fit_0: eggs ~ (1 | loc) + (1 | year) ## fit_2: eggs ~ size + (1 | loc) + (1 | year) ## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) ## fit_0 3 1148.1 1158.4 -571.03 1142.1 ## fit_2 4 1041.9 1055.8 -516.97 1033.9 108.14 1 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can test the significance of the random effects via a \(\chi^2\) test
by comparing models with and without the effect(s)
## fit reduced model with only fixed effects
fit_0 <- glm(eggs ~ size, data = df_eggs,
family = poisson(link = "log"))
## compare null and full models
anova(fit_2, fit_0)
## Data: df_eggs ## Models: ## fit_0: eggs ~ size ## fit_2: eggs ~ size + (1 | loc) + (1 | year) ## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq) ## fit_0 2 1659.5 1666.4 -827.74 1655.5 ## fit_2 4 1041.9 1055.8 -516.97 1033.9 621.54 2 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can evaluate our fixed effects with AIC - as long as we keep the random effects unchanged
## fit null model with only RE's
fit_0 <- glmer(eggs ~ (1 | loc) + (1 | year),
family = poisson,
data = df_eggs)
## compare AIC
AIC(fit_0, fit_2)
## df AIC ## fit_0 3 1148.070 ## fit_2 4 1041.932
We can test the significance of the random effects using a bootstrap LRT
A significant p-value leads us to reject the null hypothesis that the simpler model is adequate
## fit_1 and fit_2 fit on slide 7
## threshold deviance
lrt <- 2 * (logLik(fit_2) - logLik(fit_1))
## bootstrapped estimates
b_boot <- rep(NA, 100)
for(i in 1:100) {
y <- unlist(simulate(fit_1))
b_null <- update(fit_1, y ~ ., control = cntrl)
b_alt <- update(fit_2, y ~ ., control = cntrl)
b_boot[i] <- as.numeric(2 * (logLik(b_alt) - logLik(b_null)))
}
## approximate p-value
(p_value <- mean(b_boot > lrt))
## [1] 0
For confidence intervals, we will again use bootstrapping
As with GLMs, we can check GLMMs for evidence of overdispersion, which we estimate as
\[ \hat{c} = \frac{X^2}{n - k} \]
Let’s do so for our snake model applied to another data set
## Pearson's X^2 statistic X2 <- sum((eggs - fitted(snakes))^2 / fitted(snakes)) ## number of parameters k <- length(coef(snakes)) + length(ranef(snakes)) ## overdispersion parameter (c_hat <- X2 / (nn - k))
## [1] 2.180794
## test for overdispersion pchisq(deviance(snakes), k, lower.tail = FALSE)
## [1] 0
## neg binomial model for overdispersion
fit_nb <- glmer.nb(eggs_nb ~ size + (1 | loc) + (1 | year),
data = df_eggs,
control = cntrl)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(0.4683) ( log )
## Formula: eggs_nb ~ size + (1 | loc) + (1 | year)
## Data: df_eggs
## Control: cntrl
##
## AIC BIC logLik deviance df.resid
## 1190.9 1208.2 -590.4 1180.9 229
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.6762 -0.6265 -0.3639 0.2628 7.3453
##
## Random effects:
## Groups Name Variance Std.Dev.
## year (Intercept) 0.0000 0.000
## loc (Intercept) 0.1954 0.442
## Number of obs: 234, groups: year, 13; loc, 10
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.4447 0.1744 8.283 < 2e-16 ***
## size 0.7127 0.1806 3.946 7.95e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## size -0.042
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Diagnostics for GLMMs are an art rather than a science
Use LRTs to test random effects (or just leave them in)
Use robust inferential methods like bootstrapped confidence intervals