Introduction to Generalized Additive Models with R and mgcv Git, Youtube
2 Additive models
For \(p\) predictors \(x_1, \ldots, x_p\), nonparametric regression takes the form \[
y = f(x_1, \ldots, x_p) + \epsilon.
\] For \(p\) larger than two or three, it becomes impractical to fit such models due to large sample size requirements.
A simplified model is the additive model\[
y = \beta_0 + \sum_{j=1}^p f_j(x_j) + \epsilon,
\] where \(f_j\) are smooth functions. Interaction terms are ignored. To incorporate categorical predictors, we can use the model \[
y = \beta_0 + \sum_{j=1}^p f_j(x_j) + \mathbf{z}^T \boldsymbol{\gamma} + \epsilon,
\] where \(\mathbf{z}\) are categorical predictors.
There are several packages in R that can fit additive models: gam, mgcv (included in default installation of R), and gss.
gam package uses a backfitting algorithm for fitting the additive models.
Initialize \(\beta_0 = \bar y\), \(f_j(x_j) = \hat \beta_j x_j\) say from the least squares fit.
Cycle throught \(j=1,\ldots,p, 1,\ldots,p,\ldots\)\[
f_j = S(x_j, y - \beta_0 - \sum_{i \ne j} f_i(X_i)),
\] where \(S(x,y)\) means the smooth on the data \((x, y)\). User specifies the smoother being used.
The algorithm is iterated until convergence.
mgcv package employs a penalized smoothing spline approach. Suppose \[
f_j(x) = \sum_i \beta_i \phi_i(x)
\] for a family of spline basis functions, \(\phi_i\). The roughness penalty \(\int [f_j''(x)]^2 \, dx\) translates to the term \(\boldsymbol{\beta}_j^T \mathbf{S}_j \boldsymbol{\beta}_j\) for a suitable \(\mathbf{S}_j\) that depends on the choice of basis. It then maximizes the penalized log-likelihood \[
\log L(\boldsymbol{\beta}) - \sum_j \lambda_j \boldsymbol{\beta}_j^T \mathbf{S}_j \boldsymbol{\beta}_j,
\] where \(\lambda_j\) control the amount of smoothing for each variable. Generalized cross-validation (GCV) is used to select the \(\lambda_j\)s.
3 LA ozone concentration
The response is O3 (ozone level). The predictors are temp (temperature at El Monte), ibh (inversion base height at LAX), and ibt (inversion top temperature at LAX).
O3 vh wind humidity
Min. : 1.00 Min. :5320 Min. : 0.000 Min. :19.00
1st Qu.: 5.00 1st Qu.:5690 1st Qu.: 3.000 1st Qu.:47.00
Median :10.00 Median :5760 Median : 5.000 Median :64.00
Mean :11.78 Mean :5750 Mean : 4.848 Mean :58.13
3rd Qu.:17.00 3rd Qu.:5830 3rd Qu.: 6.000 3rd Qu.:73.00
Max. :38.00 Max. :5950 Max. :11.000 Max. :93.00
temp ibh dpg ibt
Min. :25.00 Min. : 111.0 Min. :-69.00 Min. :-25.0
1st Qu.:51.00 1st Qu.: 877.5 1st Qu.: -9.00 1st Qu.:107.0
Median :62.00 Median :2112.5 Median : 24.00 Median :167.5
Mean :61.75 Mean :2572.9 Mean : 17.37 Mean :161.2
3rd Qu.:72.00 3rd Qu.:5000.0 3rd Qu.: 44.75 3rd Qu.:214.0
Max. :93.00 Max. :5000.0 Max. :107.00 Max. :332.0
vis doy
Min. : 0.0 Min. : 33.0
1st Qu.: 70.0 1st Qu.:120.2
Median :120.0 Median :205.5
Mean :124.5 Mean :209.4
3rd Qu.:150.0 3rd Qu.:301.8
Max. :350.0 Max. :390.0
We can include functions of two variables with mgcv to incorporate interactions. In the isotropic case (same scale), we can impose same smoothing on both variables. In the anisotropic case (different scales), we use tensor product to allow different smoothings.
# te(x1, x2) play the role x1 * x2 in regular lm/glmamint <-gam(O3 ~te(temp, ibh), data = ozone)summary(amint)
5 Multivariate adaptive regression splines (MARS) (not covered in this course)
MARS fits data by model \[
\hat f(x) = \sum_{j=1}^k c_j B_j(x),
\] where the basis functions \(B_j(x)\) are formed from products of terms \([\pm (x_i - t)]_+^q\).
By default, only additive (first-order) predictors are allowed.
library(earth)mmod <-earth(O3 ~ ., data = ozone)summary(mmod)
Call: earth(formula=O3~., data=ozone)
coefficients
(Intercept) 14.1595171
h(5860-vh) -0.0137728
h(wind-3) -0.3377222
h(54-humidity) -0.1349547
h(temp-58) 0.2791320
h(1105-ibh) -0.0033837
h(dpg-10) -0.0991581
h(ibt-120) 0.0326330
h(150-vis) 0.0231881
h(96-doy) -0.1105145
h(doy-96) 0.0406468
h(doy-158) -0.0836732
Selected 12 of 20 terms, and 9 of 9 predictors
Termination condition: Reached nk 21
Importance: temp, humidity, dpg, doy, vh, ibh, vis, ibt, wind
Number of terms at each degree of interaction: 1 11 (additive model)
GCV 14.61004 RSS 4172.671 GRSq 0.7730502 RSq 0.8023874
We can restrict the model size
mmod <-earth(O3 ~ ., ozone, nk =7)summary(mmod)
Call: earth(formula=O3~., data=ozone, nk=7)
coefficients
(Intercept) 12.3746135
h(58-temp) -0.1084870
h(temp-58) 0.4962117
h(ibh-1105) -0.0012172
h(96-doy) -0.0988262
h(doy-96) -0.0130345
Selected 6 of 7 terms, and 3 of 9 predictors
Termination condition: Reached nk 7
Importance: temp, ibh, doy, vh-unused, wind-unused, humidity-unused, ...
Number of terms at each degree of interaction: 1 5 (additive model)
GCV 18.3112 RSS 5646.565 GRSq 0.715557 RSq 0.7325855
We can also include second-order (two-way) interaction term. Compare with the additive model approach. There are 9 predictors with 36 possible two-way interaction terms, which is complex to estimate and interpret.
mmod <-earth(O3 ~ ., ozone, nk =7, degree =2)summary(mmod)
Call: earth(formula=O3~., data=ozone, degree=2, nk=7)
coefficients
(Intercept) 10.0034999
h(58-temp) -0.1065473
h(temp-58) 0.4562717
h(ibh-1105) -0.0011576
h(55-humidity) * h(temp-58) -0.0130712
h(humidity-55) * h(temp-58) 0.0059811
Selected 6 of 7 terms, and 3 of 9 predictors
Termination condition: Reached nk 7
Importance: temp, ibh, humidity, vh-unused, wind-unused, dpg-unused, ...
Number of terms at each degree of interaction: 1 3 2
GCV 18.38791 RSS 5581.691 GRSq 0.7143654 RSq 0.7356579