Intro to Generalized Additive Models

1 June 2026

Acknowledgments

Gavin Simpson and colleagues have produced a lot of useful teaching material on GAMs that I’ve made use of here. For example:

Gavin’s blog

Webinar 1

Webinar 2

Goals for today

Understand what a smoother is and understand the motivation for using a smoother in a model
Learn how to fit some basic GAMs with splines

Assumptions of linear models

Linear relationship between predictor and response

Observations are a random sample from the population

The predictor(s) are known without measurement error

If 2+ predictors, they are independent of each other

Errors are IID: \(\epsilon_i \sim \text{N} (0, \sigma); ~ \text{Cov}(\epsilon_i, \epsilon_{j \neq i}) = 0\)

Non-linear relationships?

For Gaussian LMs we assume a linear relationship between predictor and response

For non-Gaussian GLMs, we assume a linear relationship between the predictor(s) and the link-transformed mean response, \(g(\text{E}[y])\)

What if we think the relationship is non-linear?

Non-linear data

Sea surface temperature anomalies from the Hadley Centre (UK Met Office)

Non-linear relationships

A degree-10 (decic) polynomial fit to the data
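A polynomial fit like this can be sketched with base R's lm() and poly(). The data below are simulated stand-ins, because the sst data are not reproduced here, so `year` and `temp` are assumed names. poly() builds an orthogonal basis, which avoids the numerical trouble of raising values like 2024 to the 10th power.

```r
# Simulated stand-in for the temperature series (hypothetical data)
set.seed(1)
year <- 1850:2024
temp <- 0.5 * sin((year - 1850) / 30) + (year - 1850) / 200 +
  rnorm(length(year), sd = 0.1)

# Degree-10 polynomial using an orthogonal polynomial basis
poly_fit <- lm(temp ~ poly(year, 10))

# 11 coefficients: intercept + 10 polynomial terms
length(coef(poly_fit))
```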

Generalized additive models

We seek a model that allows for some smooth and non-linear relationship between year and temperature

A generalized additive model extends the GLM by letting the linear predictor be a sum of smooth (and possibly unsmoothed, parametric) terms. With a Gaussian response and an identity link:

\[ y_i = \beta_0 + \sum_j s_j(x_{j,i}) + \sum_k \beta_k x_{k,i} + \epsilon_i \]
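Below is a minimal sketch of this additive structure. It uses simulated data with hypothetical variables `x1` and `x2`. In mgcv::gam(), smooth terms \(s_j\) are written with s(), and ordinary linear terms \(\beta_k x_k\) are written as in lm().

```r
library(mgcv)

# Simulated (hypothetical) data: y depends smoothly on x1 and linearly on x2
set.seed(2)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
y <- sin(2 * pi * x1) + 0.5 * x2 + rnorm(n, sd = 0.2)
dat <- data.frame(y, x1, x2)

# s(x1) is a smooth term; x2 enters as a parametric term beta * x2
gam_fit <- gam(y ~ s(x1) + x2, data = dat, method = "REML")

# Estimated linear effect of x2 (the simulated true value is 0.5)
coef(gam_fit)["x2"]
```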

Types of GAMs

LOESS (LOcally Estimated Scatterplot Smoothing)

Fit many local (weighted) regressions within a sliding window

LOESS fitting is available in base R via stats::loess() and as lo() terms in gam::gam() from the gam package; mgcv::gam() does not provide LOESS smooths

Pros

Pretty simple and easy to understand

Cons

Size of the window (and hence the degree of smoothness) matters, but there isn’t much theory to help you choose

If the window is very wide, the fit approaches a single global regression; if it is very narrow, we overfit (or “under-smooth”)

LOESS isn’t very reliable near the ends of the data
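The window size corresponds to the `span` argument of base R's stats::loess(), which is the fraction of points in each local window. This sketch uses simulated data and a hypothetical helper `rss()` to compare a wide and a narrow window. The narrow one tracks the data (and its noise) much more closely.

```r
set.seed(3)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

# span = fraction of the data used in each local regression
wide   <- loess(y ~ x, span = 0.9)
narrow <- loess(y ~ x, span = 0.1)

# Hypothetical helper: residual sum of squares of a fit
rss <- function(fit) sum(residuals(fit)^2)
c(wide = rss(wide), narrow = rss(narrow))
```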

Types of GAMs

Splines

Much more flexible and adaptable than LOESS smoothing

Can be used with many non-Gaussian error distributions and with random effects

The degree of smoothness can be estimated as part of model fitting, via restricted maximum likelihood (REML) or generalized cross-validation (GCV)

What are splines?

Splines are basis expansions

For example, a polynomial is one type of basis expansion:

\[ \begin{aligned} b_0(x) &= x^0 = 1\\ b_1(x) &= x^1 = x\\ b_2(x) &= x^2\\ b_3(x) &= x^3\\ &~~\vdots \end{aligned} \]
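In R, a basis can be evaluated as a matrix whose columns are the basis functions. Here is a small sketch for the cubic polynomial basis on an arbitrary grid of x values:

```r
x <- seq(0, 1, length.out = 5)

# Column k + 1 holds basis function b_k(x) = x^k, for k = 0, ..., 3
B <- outer(x, 0:3, `^`)
colnames(B) <- paste0("b", 0:3)
B
```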

What are splines?

In a basis expansion

We call each of the simpler functions a basis function
The set of simpler functions, \(b_k\), is called a basis

What are splines?

When we model using splines, each of the \(b_k\) has a coefficient \(\beta_k\)

The spline is the sum of these functions (weighted by \(\beta_k\) and evaluated at \(x\)):

\[ s(x)=\sum_k \beta_k b_k(x) \]
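Here is a minimal sketch of this weighted sum on simulated data. It uses a B-spline basis from the splines package, chosen only because bs() returns the basis matrix directly; this is not the basis mgcv uses by default. The unpenalized weights come from lm(), and the fitted curve equals the basis matrix times the coefficients.

```r
library(splines)

set.seed(4)
x <- sort(runif(100))
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

# B-spline basis: each column is one basis function b_k evaluated at x
B <- bs(x, df = 6)

# Least-squares estimates of the weights beta_k (plus an intercept)
fit <- lm(y ~ B)
beta <- coef(fit)

# The fitted spline is the weighted sum of the basis functions
s_hat <- drop(cbind(1, B) %*% beta)
all.equal(s_hat, unname(fitted(fit)))
```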

Types of splines

One of the easiest to think about is a cubic regression spline

\(X\) is divided into intervals (defined by the number of “knots”)

In each interval, we fit a separate cubic polynomial (each interval has its own \(\beta\)s)

\[ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_{i}^2 + \beta_3 x_{i}^3 \]

The fitted values per interval are stuck together to form the curve

The intervals connect at the knots

To make the connection smooth, the pieces are constrained so that the curve and its first and second derivatives are continuous at each knot
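mgcv can construct a cubic regression spline basis without fitting a model, via smoothCon(). In this sketch (x values are arbitrary), k = 6 gives six knots spread over the range of x and six basis functions. mgcv parameterizes this basis by the spline's values at the knots. Setting every coefficient to 1 therefore gives the constant function 1, so each row of the basis matrix should sum to 1.

```r
library(mgcv)

x <- seq(0, 10, length.out = 101)

# Build (but do not fit) a cubic regression spline basis with k = 6
sm <- smoothCon(s(x, bs = "cr", k = 6), data = data.frame(x = x))[[1]]

dim(sm$X)             # 101 observations x 6 basis functions
range(rowSums(sm$X))  # basis functions sum to 1 at every x
```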

How do we go from a basis to a model fit?

We want to allow for a function that is “wiggly”, but not too wiggly

A function that is too wiggly would indicate that we are overfitting (or undersmoothing)

We seek a model that explains the general underlying process while not overfitting the nuances of an individual dataset

We need just the right level of wiggliness, but how do we measure wiggliness?

Measuring wiggliness

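One measure is the integrated squared second derivative:

\[ \int_{\mathbb{R}}[f'']^2dx \]

\(f''\) is the rate of change in the slope: a straight line has \(f'' = 0\) everywhere, while a wigglier function has larger \(|f''|\) over more of its range and therefore a larger integral

For a spline \(f(x) = \sum_k \beta_k b_k(x)\), this integral can be written as a quadratic form in the coefficients:

\[ \int_{\mathbb{R}}[f'']^2dx=\beta^TS\beta \]

where \(S\) is a penalty matrix with elements \(S_{jk} = \int b_j''(x)\, b_k''(x)\, dx\); it is derived from the basis functions alone, and the coefficients \(\beta\) enter through the quadratic form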
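An mgcv smooth object stores its penalty matrix in the `S` list. This sketch uses a hypothetical helper `wiggliness()` and arbitrary x values to check the idea numerically. Coefficients that reproduce a straight line should get a penalty of essentially zero, while coefficients for a wiggly curve should not. mgcv rescales S by default, so only the relative sizes are meaningful here.

```r
library(mgcv)

x <- seq(0, 10, length.out = 101)
sm <- smoothCon(s(x, bs = "cr", k = 8), data = data.frame(x = x))[[1]]
S <- sm$S[[1]]  # penalty matrix for this basis

# Hypothetical helper: the quadratic form t(beta) S beta
wiggliness <- function(beta) drop(t(beta) %*% S %*% beta)

# Least-squares coefficients reproducing a straight line, 2x + 1
beta_line <- qr.solve(sm$X, 2 * x + 1)
# Least-squares coefficients approximating a wiggly curve
beta_wiggly <- qr.solve(sm$X, sin(2 * x))

c(line = wiggliness(beta_line), wiggly = wiggliness(beta_wiggly))
```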

Measuring wiggliness

We now use this to penalize our likelihood function

\[ \log \mathcal{L}_p(\beta) = \log \mathcal{L}(\beta) - \lambda \beta^T S \beta \]

where \(\lambda\) is the smoothing parameter, which controls how heavily we penalize wiggliness

We will maximize this penalized likelihood
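For a Gaussian response, maximizing this penalized log-likelihood is equivalent (up to a rescaling of \(\lambda\)) to penalized least squares. The solution is \(\hat\beta = (X^TX + \lambda S)^{-1}X^Ty\). This sketch uses simulated data and a hypothetical helper `pen_fit()` to solve that system directly for a few fixed values of \(\lambda\). It reports the effective degrees of freedom (the trace of the hat matrix), which shrink as the penalty grows.

```r
library(mgcv)

set.seed(5)
x <- seq(0, 10, length.out = 101)
y <- sin(x) + rnorm(101, sd = 0.3)

sm <- smoothCon(s(x, bs = "cr", k = 20), data = data.frame(x = x))[[1]]
X <- sm$X
S <- sm$S[[1]]
XtX <- crossprod(X)

# Hypothetical helper: penalized least-squares fit for a fixed lambda
pen_fit <- function(lambda) {
  A <- XtX + lambda * S
  beta <- solve(A, crossprod(X, y))
  # Effective degrees of freedom = trace of (X'X + lambda S)^{-1} X'X
  edf <- sum(diag(solve(A, XtX)))
  list(beta = beta, edf = edf)
}

edfs <- sapply(c(0.01, 1, 100), function(l) pen_fit(l)$edf)
edfs
```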

The size of the penalty \(\lambda\) affects wiggliness

More on wiggliness

We set \(k\), the basis dimension (number of basis functions), which caps the maximum possible wiggliness; for a cubic regression spline, \(k\) is the number of knots

But what should we choose for the smoothing parameter \(\lambda\)?

Choosing \(\lambda\)

method = "REML" uses a relationship between random effects and smoothers – we can think of \(\lambda\) as a prior on the spline coefficients

We can also use generalized cross-validation

There is reason to think that “REML” works best (Wood 2011)
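In mgcv the criterion is chosen with the `method` argument of gam(). This sketch compares the two criteria on simulated data. The estimated \(\lambda\) is returned in the `sp` component, and the effective degrees of freedom per coefficient are in `edf`.

```r
library(mgcv)

set.seed(6)
n <- 200
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x, y)

fit_reml <- gam(y ~ s(x), data = dat, method = "REML")
fit_gcv  <- gam(y ~ s(x), data = dat, method = "GCV.Cp")

# Estimated smoothing parameter and total effective degrees of freedom
rbind(REML = c(sp = unname(fit_reml$sp), edf = sum(fit_reml$edf)),
      GCV  = c(sp = unname(fit_gcv$sp),  edf = sum(fit_gcv$edf)))
```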

What kind of splines?

thin plate spline s(X, bs = 'tp') [default in mgcv::gam()]

the full thin plate spline has one knot for every unique value of \(x\); mgcv uses a low-rank approximation with \(k\) basis functions

cubic regression spline s(X, bs = 'cr')

widely used; cheaper to set up than 'tp', which helps with big datasets

cyclic spline s(X, bs = 'cc')

joins the ends of the spline smoothly; useful for cyclic covariates such as day of year or time of day

splines on a sphere s(lat, lon, bs = 'sos') for data located on the globe

many others in mgcv::gam() for special situations
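Here is an example of the cyclic case, assuming simulated data with x rescaled to [0, 1] (e.g. time of year). In this sketch, two knot values are passed through gam()'s `knots` argument to mark the ends of the cycle. Predictions at the two ends should then coincide.

```r
library(mgcv)

set.seed(7)
n <- 300
x <- runif(n)  # e.g. time of year rescaled to [0, 1]
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x, y)

# Two knot values: taken as the ends of the cycle for the 'cc' basis
fit_cc <- gam(y ~ s(x, bs = "cc", k = 10), data = dat,
              knots = list(x = c(0, 1)), method = "REML")

# The smooth at x = 0 should match the smooth at x = 1
ends <- predict(fit_cc, newdata = data.frame(x = c(0, 1)))
ends
```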

Fitting a GAM

Let’s fit a GAM to the global temperature data using splines

ex_gam_fit <- mgcv::gam(temp ~ s(year), data = sst, method = 'REML')

Summary of GAM fit

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## temp ~ s(year)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.056439   0.006859  -8.229 5.36e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(year) 8.515  8.933 248.8  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.927   Deviance explained = 93.1%
## -REML = -148.15  Scale est. = 0.0082328  n = 175

GAM fit to the data
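The fitted curve and an approximate confidence band can be obtained with predict() and se.fit = TRUE. This sketch uses simulated stand-in data with assumed names rather than the sst data. Fit ± 2 standard errors gives an approximate pointwise 95% interval.

```r
library(mgcv)

# Simulated stand-in data (hypothetical)
set.seed(8)
year <- 1850:2024
temp <- 0.5 * sin((year - 1850) / 30) + (year - 1850) / 200 +
  rnorm(length(year), sd = 0.1)
sim <- data.frame(year, temp)

fit <- gam(temp ~ s(year), data = sim, method = "REML")

# Predict on a fine grid, with standard errors (identity link here)
grid <- data.frame(year = seq(min(year), max(year), length.out = 200))
pred <- predict(fit, newdata = grid, se.fit = TRUE)

# Approximate pointwise 95% interval: fit +/- 2 standard errors
grid$fit   <- as.numeric(pred$fit)
grid$lower <- as.numeric(pred$fit - 2 * pred$se.fit)
grid$upper <- as.numeric(pred$fit + 2 * pred$se.fit)
head(grid)
```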

Another example

Chinook salmon survival from homework #4

Fit a GAM

## fit GAM with spline for temp
library(mgcv)
ex_gam_fit <- gam(sar ~ s(temp),
                  data = chinook, method = "REML")

Summary of GAM fit

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## sar ~ s(temp)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.5605     0.1291   27.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df     F p-value  
## s(temp) 2.94   3.63 3.387  0.0304 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.273   Deviance explained = 34.4%
## -REML = 36.917  Scale est. = 0.51672   n = 31

Model fit and data

GLM with quadratic term

ex_glm_fit <- glm(cbind(adults, smolts - adults) ~ temp + I(temp^2),
                  family = binomial(link = "logit"),
                  data = chinook)
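The GAM above models sar as Gaussian, whereas this GLM models the adult and smolt counts directly. For a like-for-like comparison, gam() accepts the same binomial response. Here is a sketch on simulated data; the coefficients and temperature range are invented, not the homework data.

```r
library(mgcv)

# Hypothetical data shaped like the chinook example
set.seed(9)
n <- 31
temp <- runif(n, 8, 14)
smolts <- rep(1000L, n)
p <- plogis(-4 + 0.8 * (temp - 11) - 0.4 * (temp - 11)^2)
adults <- rbinom(n, smolts, p)
sim <- data.frame(temp, smolts, adults)

# Quadratic logistic GLM, as above
glm_fit <- glm(cbind(adults, smolts - adults) ~ temp + I(temp^2),
               family = binomial(link = "logit"), data = sim)

# Same binomial response, with a smooth of temp instead of a fixed quadratic
gam_bin <- gam(cbind(adults, smolts - adults) ~ s(temp, k = 5),
               family = binomial(link = "logit"), data = sim,
               method = "REML")

AIC(glm_fit, gam_bin)
```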

Quadratic model fit and data

Next time

Model checking
More on model selection
The flexibility of GAMs