29 May 2026

Goals for today

  • Understand some of the diagnostics available for evaluating GLMM fits

  • Become familiar with some of the inferential tools for GLMMs

Diagnostics

Diagnostics for GLMMs are similar to those for GLMs, but we are limited in our choices

 

Brown tree snakes

Our data consist of counts of the number of eggs per female at 10 locations over 13 years

We are interested in the fixed effect of body size and the random effects of location and year

We’ll begin with only the effects of body size and location

Brown tree snakes

Let’s fit some models

library(lme4)

## set control parameters for model fitting
cntrl <- glmerControl(optimizer = "bobyqa", tol = 1e-4, optCtrl = list(maxfun = 100000))

## fixed effect of size plus random effect of location
fit_1 <- glmer(eggs ~ size + (1|loc),
               family = poisson,
               data = df_eggs,
               control = cntrl)

## fixed effect of size plus random effects of location & year
fit_2 <- glmer(eggs ~ size + (1|loc) + (1|year),
               family = poisson,
               data = df_eggs, 
               control = cntrl)

Goodness of fit

Recall our goodness of fit test based on the Pearson’s \(\chi^2\)

\[ X^2 = \sum_{i = 1}^n \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{(n - 1)} \]


where \(O_i\) is the observed count and \(E_i\) is the expected count

Pearson’s \(\chi^2\) statistic

For a binomial distribution

\[ X^2 = \sum_{i = 1}^n \frac{(y_i - n_i \hat{p}_i)^2}{n_i \hat{p}_i} \]

For a Poisson distribution

\[ X^2 = \sum_{i = 1}^n \frac{(y_i - \lambda_i)^2}{\lambda_i} \]

Goodness of fit

\(H_0\): Our model is correctly specified

## Pearson's X^2 statistic
X2 <- sum((eggs - fitted(fit_2))^2 / fitted(fit_2))

## likelihood ratio test
pchisq(X2, df = nn - length(coef(fit_2)),
       lower.tail = FALSE)
## [1] 0.6447966

The \(p\)-value is large so we cannot reject \(H_0\)

Model diagnostics

Check for heteroscedasticity

Diagnostics

Model diagnostics

Leverage

For other models, we can calculate the leverages to evaluate potentially extreme values in predictor space

For GLMMs, however, the leverages depend on the estimated variance-covariance matrices of the random effects

Leverage

We can fit a GLM and evaluate leverage of the fixed effects

Cook’s Distance

For other models, we can calculate Cook’s distances to identify potentially influential data points

For GLMMs, however, the Cook’s distances involve derivatives of the likelihood with respect to the random effects (this is an active area of research)

Cook’s Distance

Again, we can fit a GLM and look at the Cook’s distances for the fixed effects

Large Cook’s distances (from GLM)

Large snakes with lots of eggs

## large Cook's D
cooks_big <- sort(cooks_d, decreasing = TRUE)[1:2]       

## details about them
df_eggs[as.numeric(names(cooks_big)),]
##     eggs size loc year
## 138   30 0.92   1   12
## 186   28 0.89   1    8

Inference for fixed effects

We can test the significance of the fixed effects via a \(\chi^2\) test
by comparing models with and without the effect(s)

## fit reduced model with only random effects
fit_0 <- glmer(eggs ~ (1 | loc) + (1 | year),
               data = df_eggs, family = poisson)

## compare null and full models
anova(fit_2, fit_0)
## Data: df_eggs
## Models:
## fit_0: eggs ~ (1 | loc) + (1 | year)
## fit_2: eggs ~ size + (1 | loc) + (1 | year)
##       npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## fit_0    3 1148.1 1158.4 -571.03   1142.1                         
## fit_2    4 1041.9 1055.8 -516.97   1033.9 108.14  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference for random effects

We can test the significance of the random effects via a \(\chi^2\) test
by comparing models with and without the effect(s)

## fit reduced model with only fixed effects
fit_0 <- glm(eggs ~ size, data = df_eggs,
                 family = poisson(link = "log"))

## compare null and full models
anova(fit_2, fit_0)
## Data: df_eggs
## Models:
## fit_0: eggs ~ size
## fit_2: eggs ~ size + (1 | loc) + (1 | year)
##       npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## fit_0    2 1659.5 1666.4 -827.74   1655.5                         
## fit_2    4 1041.9 1055.8 -516.97   1033.9 621.54  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference for fixed effects

We can evaluate our fixed effects with AIC - as long as we keep the random effects unchanged

## fit null model with only RE's
fit_0 <- glmer(eggs ~ (1 | loc) + (1 | year),
                   family = poisson,
                   data = df_eggs)

## compare AIC
AIC(fit_0, fit_2)
##       df      AIC
## fit_0  3 1148.070
## fit_2  4 1041.932

Inference for random effects

We can test the significance of the random effects using a bootstrap LRT

A significant p-value leads us to reject the null hypothesis that the simpler model is adequate

## fit_1 and fit_2 fit on slide 7

## threshold deviance
lrt <- 2 * (logLik(fit_2) - logLik(fit_1)) 

## bootstrapped estimates
b_boot <- rep(NA, 100)
for(i in 1:100) {
  y <- unlist(simulate(fit_1))
  b_null <- update(fit_1, y ~ ., control = cntrl)
  b_alt <- update(fit_2, y ~ ., control = cntrl)
  b_boot[i] <- as.numeric(2 * (logLik(b_alt) - logLik(b_null)))
}

## approximate p-value
(p_value <- mean(b_boot > lrt))
## [1] 0

Inference

For confidence intervals, we will again use bootstrapping

  • We will cover this in lab

Overdispersion

As with GLMs, we can check GLMMs for evidence of overdispersion, which we estimate as

\[ \hat{c} = \frac{X^2}{n - k} \]


Let’s do so for our snake model applied to another data set

Overdispersion

## Pearson's X^2 statistic
X2 <- sum((eggs - fitted(snakes))^2 / fitted(snakes))

## number of parameters
k <- length(coef(snakes)) + length(ranef(snakes))

## overdispersion parameter
(c_hat <- X2 / (nn - k))
## [1] 2.180794
## test for overdispersion
pchisq(deviance(snakes), k, lower.tail = FALSE)
## [1] 0

Brown tree snakes

Negative binomial

## neg binomial model for overdispersion
fit_nb <- glmer.nb(eggs_nb ~ size + (1 | loc) + (1 | year), 
                   data = df_eggs,
                   control = cntrl)

Brown tree snakes

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(0.4683)  ( log )
## Formula: eggs_nb ~ size + (1 | loc) + (1 | year)
##    Data: df_eggs
## Control: cntrl
## 
##      AIC      BIC   logLik deviance df.resid 
##   1190.9   1208.2   -590.4   1180.9      229 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.6762 -0.6265 -0.3639  0.2628  7.3453 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  year   (Intercept) 0.0000   0.000   
##  loc    (Intercept) 0.1954   0.442   
## Number of obs: 234, groups:  year, 13; loc, 10
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.4447     0.1744   8.283  < 2e-16 ***
## size          0.7127     0.1806   3.946 7.95e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## size -0.042
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Summary

  • Diagnostics for GLMMs are an art rather than a science

  • Use LRTs to test random effects (or just leave them in)

  • Use robust inferential methods like bootstrapped confidence intervals