Generalized Additive Model (ELMR Chapter 15)

Biostat 200C

Author

Dr. Jin Zhou @ UCLA

Published

June 4, 2025

Display system information and load tidyverse and faraway packages

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.1    fastmap_1.2.0     cli_3.6.4        
 [5] tools_4.4.1       htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10      
 [9] rmarkdown_2.29    knitr_1.49        jsonlite_1.9.1    xfun_0.51        
[13] digest_0.6.37     rlang_1.1.5       evaluate_1.0.3   
library(tidyverse)
library(faraway)

1 Additional materials about GAM

2 Additive models

  • For \(p\) predictors \(x_1, \ldots, x_p\), nonparametric regression takes the form \[ y = f(x_1, \ldots, x_p) + \epsilon. \] For \(p\) larger than two or three, it becomes impractical to fit such models due to large sample size requirements.

  • A simplified model is the additive model \[ y = \beta_0 + \sum_{j=1}^p f_j(x_j) + \epsilon, \] where \(f_j\) are smooth functions. Interaction terms are ignored. To incorporate categorical predictors, we can use the model \[ y = \beta_0 + \sum_{j=1}^p f_j(x_j) + \mathbf{z}^T \boldsymbol{\gamma} + \epsilon, \] where \(\mathbf{z}\) are categorical predictors.

  • There are several packages in R that can fit additive models: gam, mgcv (included in default installation of R), and gss.

  • gam package uses a backfitting algorithm for fitting the additive models.

    1. Initialize \(\beta_0 = \bar y\), \(f_j(x_j) = \hat \beta_j x_j\) say from the least squares fit.

    2. Cycle throught \(j=1,\ldots,p, 1,\ldots,p,\ldots\) \[ f_j = S(x_j, y - \beta_0 - \sum_{i \ne j} f_i(X_i)), \] where \(S(x,y)\) means the smooth on the data \((x, y)\). User specifies the smoother being used.

    The algorithm is iterated until convergence.

  • mgcv package employs a penalized smoothing spline approach. Suppose \[ f_j(x) = \sum_i \beta_i \phi_i(x) \] for a family of spline basis functions, \(\phi_i\). The roughness penalty \(\int [f_j''(x)]^2 \, dx\) translates to the term \(\boldsymbol{\beta}_j^T \mathbf{S}_j \boldsymbol{\beta}_j\) for a suitable \(\mathbf{S}_j\) that depends on the choice of basis. It then maximizes the penalized log-likelihood \[ \log L(\boldsymbol{\beta}) - \sum_j \lambda_j \boldsymbol{\beta}_j^T \mathbf{S}_j \boldsymbol{\beta}_j, \] where \(\lambda_j\) control the amount of smoothing for each variable. Generalized cross-validation (GCV) is used to select the \(\lambda_j\)s.

3 LA ozone concentration

  • The response is O3 (ozone level). The predictors are temp (temperature at El Monte), ibh (inversion base height at LAX), and ibt (inversion top temperature at LAX).
ozone <- as_tibble(ozone) %>% print()
# A tibble: 330 × 10
      O3    vh  wind humidity  temp   ibh   dpg   ibt   vis   doy
   <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     3  5710     4       28    40  2693   -25    87   250    33
 2     5  5700     3       37    45   590   -24   128   100    34
 3     5  5760     3       51    54  1450    25   139    60    35
 4     6  5720     4       69    35  1568    15   121    60    36
 5     4  5790     6       19    45  2631   -33   123   100    37
 6     4  5790     3       25    55   554   -28   182   250    38
 7     6  5700     3       73    41  2083    23   114   120    39
 8     7  5700     3       59    44  2654    -2    91   120    40
 9     4  5770     8       27    54  5000   -19    92   120    41
10     6  5720     3       44    51   111     9   173   150    42
# ℹ 320 more rows
  • Numerical summary.
summary(ozone)
       O3              vh            wind           humidity    
 Min.   : 1.00   Min.   :5320   Min.   : 0.000   Min.   :19.00  
 1st Qu.: 5.00   1st Qu.:5690   1st Qu.: 3.000   1st Qu.:47.00  
 Median :10.00   Median :5760   Median : 5.000   Median :64.00  
 Mean   :11.78   Mean   :5750   Mean   : 4.848   Mean   :58.13  
 3rd Qu.:17.00   3rd Qu.:5830   3rd Qu.: 6.000   3rd Qu.:73.00  
 Max.   :38.00   Max.   :5950   Max.   :11.000   Max.   :93.00  
      temp            ibh              dpg              ibt       
 Min.   :25.00   Min.   : 111.0   Min.   :-69.00   Min.   :-25.0  
 1st Qu.:51.00   1st Qu.: 877.5   1st Qu.: -9.00   1st Qu.:107.0  
 Median :62.00   Median :2112.5   Median : 24.00   Median :167.5  
 Mean   :61.75   Mean   :2572.9   Mean   : 17.37   Mean   :161.2  
 3rd Qu.:72.00   3rd Qu.:5000.0   3rd Qu.: 44.75   3rd Qu.:214.0  
 Max.   :93.00   Max.   :5000.0   Max.   :107.00   Max.   :332.0  
      vis             doy       
 Min.   :  0.0   Min.   : 33.0  
 1st Qu.: 70.0   1st Qu.:120.2  
 Median :120.0   Median :205.5  
 Mean   :124.5   Mean   :209.4  
 3rd Qu.:150.0   3rd Qu.:301.8  
 Max.   :350.0   Max.   :390.0  
  • Graphical summary.
ozone %>%
  ggplot(mapping = aes(x = temp, y = O3)) + 
  geom_point(size = 1) + 
  geom_smooth()

ozone %>%
  ggplot(mapping = aes(x = ibh, y = O3)) + 
  geom_point(size = 1) + 
  geom_smooth()

ozone %>%
  ggplot(mapping = aes(x = ibt, y = O3)) + 
  geom_point(size = 1) + 
  geom_smooth()

  • As a reference, let’s fit a linear model
olm <- lm(O3 ~ temp + ibh + ibt, data = ozone)
summary(olm)

Call:
lm(formula = O3 ~ temp + ibh + ibt, data = ozone)

Residuals:
     Min       1Q   Median       3Q      Max 
-11.3224  -3.1913  -0.2591   2.9635  13.2860 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -7.7279822  1.6216623  -4.765 2.84e-06 ***
temp         0.3804408  0.0401582   9.474  < 2e-16 ***
ibh         -0.0011862  0.0002567  -4.621 5.52e-06 ***
ibt         -0.0058215  0.0101793  -0.572    0.568    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.748 on 326 degrees of freedom
Multiple R-squared:  0.652, Adjusted R-squared:  0.6488 
F-statistic: 203.6 on 3 and 326 DF,  p-value: < 2.2e-16
  • Additive model using mgcv.
library(mgcv)

ammgcv <- gam(O3 ~ s(temp) + s(ibh) + s(ibt), data = ozone)
summary(ammgcv)

Family: gaussian 
Link function: identity 

Formula:
O3 ~ s(temp) + s(ibh) + s(ibt)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  11.7758     0.2382   49.44   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
          edf Ref.df      F  p-value    
s(temp) 3.386  4.259 20.681  < 2e-16 ***
s(ibh)  4.174  5.076  7.338 1.74e-06 ***
s(ibt)  2.112  2.731  1.400    0.214    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.708   Deviance explained = 71.7%
GCV = 19.346  Scale est. = 18.72     n = 330

Effective degrees of freedom is calculated as the trace of projection matrices. An approximate F test gives the significance of predictors.

  • We can exmine the transformations being used.
plot(ammgcv, residuals = TRUE, select = 1)

plot(ammgcv, residuals = TRUE, select = 2)

plot(ammgcv, residuals = TRUE, select = 3)

  • ibt is not significant after adjusting for the effects of temp and ibh, so we drop it in following analysis.

  • We can test the significance of the nonlinearity of a predictor by F test.

am1 <- gam(O3 ~ s(temp) + s(ibh), data = ozone)
am2 <- gam(O3 ~    temp + s(ibh), data = ozone)
anova(am2, am1, test = "F")
Analysis of Deviance Table

Model 1: O3 ~ temp + s(ibh)
Model 2: O3 ~ s(temp) + s(ibh)
  Resid. Df Resid. Dev     Df Deviance      F    Pr(>F)    
1    322.74       6950                                     
2    319.11       6054 3.6237   895.98 13.109 3.146e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • We can include functions of two variables with mgcv to incorporate interactions. In the isotropic case (same scale), we can impose same smoothing on both variables. In the anisotropic case (different scales), we use tensor product to allow different smoothings.
# te(x1, x2) play the role x1 * x2 in regular lm/glm
amint <- gam(O3 ~ te(temp, ibh), data = ozone)
summary(amint)

Family: gaussian 
Link function: identity 

Formula:
O3 ~ te(temp, ibh)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  11.7758     0.2385   49.38   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df     F p-value    
te(temp,ibh) 10.45  13.02 61.39  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.708   Deviance explained = 71.7%
GCV = 19.439  Scale est. = 18.764    n = 330
  • We can test the significance of the interactions using F test.
anova(am1, amint, test = "F")
Analysis of Deviance Table

Model 1: O3 ~ s(temp) + s(ibh)
Model 2: O3 ~ te(temp, ibh)
  Resid. Df Resid. Dev     Df Deviance      F Pr(>F)
1    319.11     6054.0                              
2    315.98     5977.3 3.1312   76.643 1.3044 0.2725
  • We can visualize the interactions
plot(amint, select = 1)

vis.gam(amint, theta = -45, color = "gray")

  • GAM can be used for prediction.
predict(am1, data.frame(temp = 60, ibh = 2000, ibt = 100), se = T)
$fit
       1 
10.60479 

$se.fit
        1 
0.7106163 
# extrapolation
predict(am1, data.frame(temp = 120, ibh = 2000, ibt = 100), se = T)
$fit
       1 
39.11324 

$se.fit
       1 
5.256285 

3.1 Generalized additive model

  • Combining GLM and additive model yields the generalized additive models (GAMs). The systematic component (linear predictor) becomes \[ \eta = \beta_0 + \sum_{j=1}^p f_j(X_j). \]

  • For example, we an fit the ozone data using a Poisson additive model. Setting scale = -1 tells the function to fit overdispersion parameter as well.

gammgcv <- gam(O3 ~ s(temp) + s(ibh) + s(ibt), family = poisson, scale = -1, data = ozone)
summary(gammgcv)

Family: poisson 
Link function: log 

Formula:
O3 ~ s(temp) + s(ibh) + s(ibt)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.29269    0.02305   99.45   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
          edf Ref.df      F p-value    
s(temp) 3.816  4.736 16.803  <2e-16 ***
s(ibh)  3.737  4.568 10.594  <2e-16 ***
s(ibt)  1.348  1.623  0.238   0.651    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.712   Deviance explained = 72.9%
GCV = 1.5062  Scale est. = 1.4585    n = 330
plot(gammgcv, residuals = TRUE, select = 1)

plot(gammgcv, residuals = TRUE, select = 2)

plot(gammgcv, residuals = TRUE, select = 3)

4 Generalized additive mixed model (GAMM)

  • GAMM combine the three major themes in this class.

  • Let’s re-analyze the eplepsy data from the GLMM chapter.

egamm <- gamm(seizures ~ offset(timeadj) + treat * expind + s(age), 
              family   = poisson, 
              random   = list(id = ~1), 
              data     = epilepsy,  
              subset   = (id != 49))

 Maximum number of PQL iterations:  20 
summary(egamm$gam)

Family: poisson 
Link function: log 

Formula:
seizures ~ offset(timeadj) + treat * expind + s(age)

Parametric coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -4.871650   0.140365 -34.707  < 2e-16 ***
treat        -0.007726   0.195377  -0.040    0.968    
expind        4.725542   0.047200 100.117  < 2e-16 ***
treat:expind -0.302384   0.070194  -4.308 2.27e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
       edf Ref.df     F p-value
s(age)   1      1 0.339   0.561

R-sq.(adj) =  0.324   
  Scale est. = 1         n = 290

5 Multivariate adaptive regression splines (MARS) (not covered in this course)

  • MARS fits data by model \[ \hat f(x) = \sum_{j=1}^k c_j B_j(x), \] where the basis functions \(B_j(x)\) are formed from products of terms \([\pm (x_i - t)]_+^q\).

  • By default, only additive (first-order) predictors are allowed.

library(earth)

mmod <- earth(O3 ~ ., data = ozone)
summary(mmod)
Call: earth(formula=O3~., data=ozone)

               coefficients
(Intercept)      14.1595171
h(5860-vh)       -0.0137728
h(wind-3)        -0.3377222
h(54-humidity)   -0.1349547
h(temp-58)        0.2791320
h(1105-ibh)      -0.0033837
h(dpg-10)        -0.0991581
h(ibt-120)        0.0326330
h(150-vis)        0.0231881
h(96-doy)        -0.1105145
h(doy-96)         0.0406468
h(doy-158)       -0.0836732

Selected 12 of 20 terms, and 9 of 9 predictors
Termination condition: Reached nk 21
Importance: temp, humidity, dpg, doy, vh, ibh, vis, ibt, wind
Number of terms at each degree of interaction: 1 11 (additive model)
GCV 14.61004    RSS 4172.671    GRSq 0.7730502    RSq 0.8023874
  • We can restrict the model size
mmod <- earth(O3 ~ ., ozone, nk = 7)
summary(mmod)
Call: earth(formula=O3~., data=ozone, nk=7)

            coefficients
(Intercept)   12.3746135
h(58-temp)    -0.1084870
h(temp-58)     0.4962117
h(ibh-1105)   -0.0012172
h(96-doy)     -0.0988262
h(doy-96)     -0.0130345

Selected 6 of 7 terms, and 3 of 9 predictors
Termination condition: Reached nk 7
Importance: temp, ibh, doy, vh-unused, wind-unused, humidity-unused, ...
Number of terms at each degree of interaction: 1 5 (additive model)
GCV 18.3112    RSS 5646.565    GRSq 0.715557    RSq 0.7325855
  • We can also include second-order (two-way) interaction term. Compare with the additive model approach. There are 9 predictors with 36 possible two-way interaction terms, which is complex to estimate and interpret.
mmod <- earth(O3 ~ ., ozone, nk = 7, degree = 2)
summary(mmod)
Call: earth(formula=O3~., data=ozone, degree=2, nk=7)

                            coefficients
(Intercept)                   10.0034999
h(58-temp)                    -0.1065473
h(temp-58)                     0.4562717
h(ibh-1105)                   -0.0011576
h(55-humidity) * h(temp-58)   -0.0130712
h(humidity-55) * h(temp-58)    0.0059811

Selected 6 of 7 terms, and 3 of 9 predictors
Termination condition: Reached nk 7
Importance: temp, ibh, humidity, vh-unused, wind-unused, dpg-unused, ...
Number of terms at each degree of interaction: 1 3 2
GCV 18.38791    RSS 5581.691    GRSq 0.7143654    RSq 0.7356579
plotmo(mmod)
 plotmo grid:    vh wind humidity temp    ibh dpg   ibt vis   doy
               5760    5       64   62 2112.5  24 167.5 120 205.5