Sit back and reflect on how much you’ve learned
29 May 2020
By now, you should be able to:
Identify an appropriate statistical model based on the data and specific question
Understand the assumptions behind a chosen statistical model
Use R to fit a variety of linear models to data
Evaluate data support for various models and select the most parsimonious model among them
Use R Markdown to combine text, equations, code, tables, and figures into reports
The total deviations in the data equal the sum of those for the model and errors
\[ \underbrace{y_i - \bar{y}}_{\text{Total}} = \underbrace{\hat{y}_i - \bar{y}}_{\text{Model}} + \underbrace{y_i - \hat{y}_i}_{\text{Error}} \]
The sums-of-squares have the same additive property as the deviations
\[ \underbrace{\sum (y_i - \bar{y})^2}_{SSTO} = \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{SSR} + \underbrace{\sum (y_i - \hat{y}_i)^2}_{SSE} \]
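As a quick check of this identity in R, here is a minimal sketch using simulated data (all object names are hypothetical):

```r
## simulate data and fit a simple linear regression
set.seed(514)
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)
## total, model (regression), and error sums of squares
SSTO <- sum((y - mean(y))^2)
SSR <- sum((fitted(fit) - mean(y))^2)
SSE <- sum(residuals(fit)^2)
## the identity SSTO = SSR + SSE should hold to numerical precision
all.equal(SSTO, SSR + SSE)
```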
\[ y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + e_i \\ \Downarrow \\ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} \\ 1 & x_{1,2} & x_{2,2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,N} & x_{2,N} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix} \\ \Downarrow \\ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{e} \]
Rewriting our model, we have
\[ \mathbf{y} = \mathbf{X} \hat{\boldsymbol{\beta}} + \mathbf{e} \\ \Downarrow \\ \mathbf{e} = \mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}} \]
so the sum of squared errors is
\[ \mathbf{e}^{\top} \mathbf{e} = (\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}})^{\top} (\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}}) \]
Minimizing the sum of squared errors leads to
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y} \\ \Downarrow \\ \hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol{\beta}} \]
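Here is a minimal sketch of these normal equations in R, using simulated data (all names are hypothetical); the estimates should match those from lm():

```r
## simulate data with two covariates
set.seed(514)
x1 <- runif(30)
x2 <- runif(30)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(30)
## design matrix with a column of 1s for the intercept
X <- cbind(1, x1, x2)
## beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
## compare to the estimates from lm()
cbind(beta_hat, coef(lm(y ~ x1 + x2)))
## fitted values
y_hat <- X %*% beta_hat
```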
The variance of \(\hat{\boldsymbol{\beta}}\) is given by
\[ \text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^{\top} \mathbf{X})^{-1} \]
This suggests that our confidence in our estimate increases with the spread in \(\mathbf{X}\): greater variance in the predictors makes \(\mathbf{X}^{\top} \mathbf{X}\) larger, and hence \((\mathbf{X}^{\top} \mathbf{X})^{-1}\) and \(\text{Var}(\hat{\boldsymbol{\beta}})\) smaller
Consider two scenarios where the slope of the relationship is identical, but the spread in the predictor differs; the wider spread yields the more precise estimate of the slope
A CI for the mean response is given by
\[ \hat{\mathbf{y}}^* \pm ~ t^{(\alpha / 2)}_{df} \sigma \sqrt{ {\mathbf{X}^*}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^* } \]
A CI for a new observation is given by
\[ \hat{\mathbf{y}}^* \pm ~ t^{(\alpha / 2)}_{df} \sigma \sqrt{1 + {\mathbf{X}^*}^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^* } \]
This is typically referred to as the prediction interval
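In R, both intervals are available from predict() on a fitted lm() object; a minimal sketch with simulated data (all names are hypothetical):

```r
## fit a simple linear regression to simulated data
set.seed(514)
dat <- data.frame(x = runif(30, 0, 10))
dat$y <- 2 + 0.5 * dat$x + rnorm(30)
fit <- lm(y ~ x, data = dat)
## CI for the mean response at x = 5
predict(fit, newdata = data.frame(x = 5), interval = "confidence", level = 0.95)
## (wider) prediction interval for a new observation at x = 5
predict(fit, newdata = data.frame(x = 5), interval = "prediction", level = 0.95)
```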
Fixed effects describe specific levels of factors that are not part of a larger group
Random effects describe varying levels of factors drawn from a larger group
We can extend the general linear model to include both fixed and random effects
\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\alpha} + \mathbf{e} \\ ~ \\ \mathbf{e} \sim \text{MVN}(\mathbf{0}, \sigma^2 \mathbf{I}) \\ ~ \\ \boldsymbol{\alpha} \sim \text{MVN}(\mathbf{0}, \sigma^2 \mathbf{D}) \]
Estimating the parameters in a mixed effects model requires restricted maximum likelihood (REML)
REML works by
estimating the fixed effects \((\hat{\boldsymbol{\beta}})\) via ML
using the \(\hat{\boldsymbol{\beta}}\) to estimate the \(\hat{\boldsymbol{\alpha}}\)
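In practice, REML estimation is handled for us by packages like lme4; a minimal sketch, assuming a hypothetical data frame df with response y, covariate x, and grouping factor g:

```r
## linear mixed model with a random intercept for each level of g;
## lmer() uses REML by default
library(lme4)
mod_reml <- lmer(y ~ x + (1 | g), data = df, REML = TRUE)
summary(mod_reml)
```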
To use AIC, we can follow these steps
Fit a model with all of the possible fixed-effects included
Keep the fixed effects constant and search for random effects
Keep the random effects as-is and fit different combinations of fixed effects (see the sketch below)
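A minimal sketch of the last step, again assuming the hypothetical df from above; note that models with different fixed effects must be fit with ML (REML = FALSE) for their likelihoods, and hence their AIC values, to be comparable:

```r
## refit with ML so AIC is comparable across different fixed effects
m1 <- lmer(y ~ x + (1 | g), data = df, REML = FALSE)
m2 <- lmer(y ~ 1 + (1 | g), data = df, REML = FALSE)
AIC(m1, m2)
```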
Three important components
Distribution of the data \(y \sim f_{\theta}(y)\)
Link function \(g(\mu) = \eta\)
Linear predictor \(\eta = \mathbf{X} \boldsymbol{\beta}\)
Link | Link function | Mean function |
---|---|---|
Identity | \(1(\mu) = \mathbf{X} \boldsymbol{\beta}\) | \(\mu = \mathbf{X} \boldsymbol{\beta}\) |
Log | \(\log (\mu) = \mathbf{X} \boldsymbol{\beta}\) | \(\mu = \exp (\mathbf{X} \boldsymbol{\beta})\) |
Logit | \(\log \left( \frac{\mu}{1 - \mu} \right) = \mathbf{X} \boldsymbol{\beta}\) | \(\mu = \frac{\exp (\mathbf{X} \boldsymbol{\beta})}{1 + \exp (\mathbf{X} \boldsymbol{\beta})}\) |
For binary data, we need 3 things to specify our GLM
Distribution of the data: \(y \sim \text{Bernoulli}(p)\)
Link function: \(\text{logit}(p) = \log \left( \frac{p}{1 - p} \right) = \eta\)
Linear predictor: \(\eta = \mathbf{X} \boldsymbol{\beta}\)
For binomial counts of successes in \(N\) trials, we need 3 things to specify our GLM
Distribution of the data: \(y \sim \text{Binomial}(N, p)\)
Link function: \(\text{logit}(p) = \log \left( \frac{p}{1 - p} \right) = \eta\)
Linear predictor: \(\eta = \mathbf{X} \boldsymbol{\beta}\)
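A minimal sketch of fitting this model in R; the data frame df and its columns (successes, N, x) are hypothetical:

```r
## binomial GLM with a logit link; the response is successes out of N trials
fit_bin <- glm(cbind(successes, N - successes) ~ x,
               family = binomial(link = "logit"), data = df)
summary(fit_bin)
```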
Overdispersion occurs when the variance in the data is larger than expected under the assumed distribution

Overdispersion generally arises in 2 ways, both of which violate the assumption of IID errors
trials occur in groups & \(p\) is not constant among groups
trials are not independent
We can estimate the dispersion \(c\) from the deviance \(D\) as
\[ \hat{c} = \frac{D}{n - k} \]
or from Pearson’s \(\chi^2\) as
\[ \hat{c} = \frac{X^2}{n - k} \]
The point estimate \(\hat{\boldsymbol{\beta}}\) is not affected by overdispersion…

but the variance of \(\hat{\boldsymbol{\beta}}\) is inflated, such that
\[ \text{Var}(\hat{\boldsymbol{\beta}}) = \hat{c} \left( \mathbf{X}^{\top} \hat{\mathbf{W}} \mathbf{X} \right)^{-1} \]
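Both estimates of \(\hat{c}\) are easy to compute from a fitted binomial GLM in R; a minimal sketch, assuming the hypothetical fit_bin from above:

```r
## residual degrees of freedom (n - k)
df_resid <- df.residual(fit_bin)
## deviance-based estimate of the dispersion
c_hat_D <- deviance(fit_bin) / df_resid
## Pearson chi-squared estimate of the dispersion
c_hat_X2 <- sum(residuals(fit_bin, type = "pearson")^2) / df_resid
```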
Model | Pros | Cons |
---|---|---|
binomial | Easy | Underestimates variance |
binomial with VIF | Easy; estimate of variance | Ad hoc |
quasi-binomial | Easy; estimate of variance | No distribution for inference |
beta-binomial | Strong foundation | Somewhat hard to implement |
With count data, we only know the frequency of occurrence
That is, how often something occurred without knowing how often it did not occur
Counts \((y_i)\) as a function of covariates
\[ \begin{aligned} \text{data distribution:} & ~~ y_i \sim \text{Poisson}(\lambda_i) \\ \\ \text{link function:} & ~~ \text{log}(\lambda_i) = \eta_i \\ \\ \text{linear predictor:} & ~~ \eta_i = \mathbf{X} \boldsymbol{\beta} \end{aligned} \]
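A minimal sketch of a Poisson regression in R; the data frame df and its columns (count, x) are hypothetical:

```r
## Poisson GLM with a log link for count data
fit_pois <- glm(count ~ x, family = poisson(link = "log"), data = df)
summary(fit_pois)
```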
We can consider the possibility that the variance scales linearly with the mean
\[ \text{Var}(y) = c \lambda \]
If \(c\) = 1 then \(y \sim \text{Poisson}(\lambda)\)
If \(c\) > 1 the data are overdispersed
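The quasi-Poisson family in R estimates \(c\) as part of the fit; a sketch using the hypothetical df from above:

```r
## quasi-Poisson GLM; summary() reports the estimated dispersion parameter
fit_qp <- glm(count ~ x, family = quasipoisson(link = "log"), data = df)
summary(fit_qp)$dispersion
```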
Counts \((y_i)\) as a function of covariates
\[ \begin{aligned} \text{data distribution:} & ~~ y_i \sim \text{negBin}(r, \mu_i) \\ \\ \text{link function:} & ~~ \text{log}(\mu_i) = \eta_i \\ \\ \text{linear predictor:} & ~~ \eta_i = \mathbf{X} \boldsymbol{\beta} \end{aligned} \]
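A minimal sketch of a negative binomial GLM via MASS::glm.nb, which estimates the overdispersion parameter (theta) along with the regression coefficients; df is again hypothetical:

```r
## negative binomial GLM with a log link
library(MASS)
fit_nb <- glm.nb(count ~ x, data = df)
summary(fit_nb)
```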
Zero-truncated data cannot take a value of 0
Although somewhat rare in ecological studies, examples include
time a whale is at the surface before diving
herd size in elk
number of fin rays on a fish
The probabilities that \(y_i = 0\) and that \(y_i \neq 0\) are
\[ f(y_i = 0; \lambda_i) = \exp (\text{-} \lambda_i) \\ \Downarrow \\ f(y_i \neq 0; \lambda_i) = 1 - \exp (\text{-} \lambda_i) \]
We can exclude the probability that \(y_i = 0\) by dividing the pmf by the probability that \(y_i \neq 0\)
\[ f(y_i; \lambda_i) = \frac{\exp (\text{-} \lambda_i) \lambda_{i}^{y_i}}{y_i!} \\ \Downarrow \\ f(y_i; \lambda_i | y_i > 0) = \frac{\exp (\text{-} \lambda_i) \lambda_{i}^{y_i}}{y_i!} \cdot \frac{1}{1 - \exp (\text{-} \lambda_i)} \\ \Downarrow \\ \log \mathcal{L} = \left( y_i \log \lambda_i - \lambda_i \right) - \log \left( 1 - \exp (\text{-} \lambda_i) \right) \]
Similarly, for the negative binomial we divide its pmf by the probability that \(y_i \neq 0\)
\[ f(y; r, \mu) = \frac{(y+r-1) !}{(r-1) ! y !} \left( \frac{r}{\mu + r} \right)^{r} \left( \frac{\mu}{\mu + r} \right)^{y} \\ \Downarrow \\ f(y; r, \mu | y > 0) = \frac{ \frac{(y+r-1) !}{(r-1) ! y !} \left( \frac{r}{\mu + r} \right)^{r} \left( \frac{\mu}{\mu + r} \right)^{y} }{ 1 - \left( \frac{r}{\mu + r} \right)^{r} } \\ \Downarrow \\ \log \mathcal{L} = \log \mathcal{L}(\text{NB}) - \log \left( 1 - \left( \frac{r}{\mu + r} \right)^{r} \right) \]
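One option for fitting zero-truncated models in R is the VGAM package, whose pospoisson() and posnegbinomial() families implement the positive (zero-truncated) Poisson and negative binomial; a sketch with a hypothetical data frame df_pos containing only positive counts:

```r
## zero-truncated Poisson and negative binomial fits
library(VGAM)
fit_ztp  <- vglm(count ~ x, family = pospoisson(), data = df_pos)
fit_ztnb <- vglm(count ~ x, family = posnegbinomial(), data = df_pos)
```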
Lots of count data are zero-inflated
The data contain more zeros than would be expected under a Poisson or negative binomial distribution
In general, there are 4 different types of errors that cause zeros
Structural (animal absent because the habitat is unsuitable)
Design (sampling is limited temporally or spatially)
Observer error (inexperience or difficult circumstances)
Process error (habitat is suitable but unused)
There are 2 general approaches for dealing with zero-inflated data
Zero-altered (“hurdle”) models
Zero-inflated (“mixture”) models
Hurdle models do not discriminate among the 4 types of zeros
The data are treated as 2 distinct groups:
Zeros
Non-zero counts
Hurdle models consist of 2 parts
Use a binomial model to determine the probability of a zero
If non-zero (“over the hurdle”), use a truncated Poisson or negative binomial to model the positive counts
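Both parts can be fit together with pscl::hurdle; a minimal sketch with the hypothetical df from above:

```r
## hurdle model: binomial for zero vs non-zero,
## zero-truncated negative binomial for the positive counts
library(pscl)
fit_hurdle <- hurdle(count ~ x, dist = "negbin", zero.dist = "binomial", data = df)
summary(fit_hurdle)
```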
Zero-inflated (mixture) models treat the zeros as coming from 2 sources
observation errors (missed detections)
ecological processes (a function of the environment)
Zero-inflated (mixture) models consist of 2 parts
Use a binomial model to determine the probability of a zero
Use a Poisson or negative binomial to model counts, which can include zeros
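A minimal sketch via pscl::zeroinfl, where the two-part formula count ~ x | 1 puts the covariate on the count model and an intercept-only model on the zero-inflation part (df is hypothetical):

```r
## zero-inflated Poisson mixture model
library(pscl)
fit_zip <- zeroinfl(count ~ x | 1, dist = "poisson", data = df)
summary(fit_zip)
```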
Source | Reason | Overdispersion | Zero inflation | Approach |
---|---|---|---|---|
Random | Sampling variability | No | No | Poisson |
Random | Sampling variability | Yes | No | Neg binomial |
Structural | Outside count process | No | Yes | ZAP or ZIP |
Structural | Outside count process | Yes | Yes | ZANB or ZINB |
GLMMs combine the flexibility of non-normal distributions (GLMs) with the ability to address correlations among observations and nested data structures (LMMs)
The good news is that GLMMs give us this combined framework

The bad news is that
these models are on the frontier of statistical research
existing documentation is rather technical
multiple approaches for fitting models; some with different results
Just like GLMs, GLMMs have three components:
Distribution of the data \(f(y; \theta)\)
Link function \(g(\mu) = \eta\)
Linear predictor \(\eta\)
For GLMMs, our linear predictor also includes random effects
\[ \eta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \alpha_0 + \alpha_1 z_1 + \dots + \alpha_l z_l\\ \Downarrow \\ \eta = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\alpha} \]
where the \(\beta_i\) are fixed effects of the covariates \(x_i\), and the \(\alpha_j\) are random effects associated with the covariates \(z_j\)
\[ \begin{aligned} \text{data distribution:} & ~~~ y_{i,j} \sim \text{Binomial}(N_{i,j}, s_{i,j}) \\ \text{link function:} & ~~~ \text{logit}(s_{i,j}) = \text{log}\left(\frac{s_{i,j}}{1-s_{i,j}}\right) = \mu_{i,j} \\ \text{linear model:} & ~~~ \mu_{i,j} = (\beta_0 + \alpha_j) + (\beta_1 + \delta_j) x_{i,j} \\ & ~~~ \alpha_j \sim \text{N}(0, \sigma_{\alpha}^2) \\ & ~~~ \delta_j \sim \text{N}(0, \sigma_{\delta}^2) \end{aligned} \]
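This survival model maps directly onto lme4::glmer with a correlated random intercept and random slope by group; a minimal sketch, with hypothetical column names (surv successes out of N trials, covariate x, grouping factor group):

```r
## binomial GLMM: random intercept (alpha_j) and random slope (delta_j) by group
library(lme4)
fit_glmm <- glmer(cbind(surv, N - surv) ~ x + (1 + x | group),
                  family = binomial(link = "logit"), data = df)
summary(fit_glmm)
```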
Method | Advantages | Disadvantages | R functions |
---|---|---|---|
Penalized quasi-likelihood | Flexible, widely implemented | Inference may be inappropriate; potentially biased | MASS::glmmPQL |
Laplace approximation | More accurate than PQL | Slower and less flexible than PQL | lme4::glmer, glmmsr::glmm, glmmML::glmmML |
Gauss-Hermite quadrature | More accurate than Laplace | Slower than Laplace; limited random effects | lme4::glmer, glmmsr::glmm, glmmML::glmmML |
Topics we did not cover here, and courses where you can learn more about them

Generalized additive models (QERM 514 in a different year)
Occupancy models (SEFS 590)
Capture-Mark-Recapture models (SEFS 590)
Multivariate response models (FISH 560)
Time series models (FISH 507)
Spatio-temporal models (FISH 556)
Bayesian methods (FISH 558, FISH 559)