Display system information and load tidyverse
and
faraway
packages
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.31 R6_2.5.1 jsonlite_1.8.4 evaluate_0.21
## [5] cachem_1.0.8 rlang_1.1.1 cli_3.6.1 rstudioapi_0.14
## [9] jquerylib_0.1.4 bslib_0.4.2 rmarkdown_2.21 tools_4.2.3
## [13] xfun_0.39 yaml_2.3.7 fastmap_1.1.1 compiler_4.2.3
## [17] htmltools_0.5.5 knitr_1.42 sass_0.4.6
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(faraway)
So far we studied linear mixed models (LMM), where the responses are continuous and normally distributed. It is natural to extend LMM to handle nonnormally distributed responses.
Recall that, in GLM, the distribution of \(Y_i\) is from the exponential family of distributions of form \[ f(y_i \mid \theta_i, \phi) = \exp \left[ \frac{y \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right]. \] If we use the canonical link, then \[ \theta_i = g(\mu_i) = \eta_i, \] where \(\mu_i = \mathbb{E} Y_i\). Now let \[ \eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + \mathbf{z}_i^T \boldsymbol{\gamma}, \] where \(\boldsymbol{\beta}\) is the fixed effects and \(\boldsymbol{\gamma}\) is the random effects with density \(h(\gamma \mid \boldsymbol{\Sigma})\). (Typically we assume multivariate normal with mean 0 and covariance \(\boldsymbol{\Sigma}\).) Then the likelihood is \[ L(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \prod_{i=1}^n \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma} \] and the log-likelihood is \[ \ell(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \sum_{i=1}^n \log \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma}. \] This model, combining GLM and mixed effects model, is called the generalized linear mixed effects model (GLMM).
Estimation and inference of GLMM have been active research. We give an overview of a few commonly used methods.
Maximum likelihood estimate (MLE) is obtained by maximizing the log-likelihood function. However, each iteration of the optimization algorithm needs to evaluate numerical integration, which quickly becomes infeasible when the dimension of random effects \(\boldsymbol{\gamma}\) is large.
Gauss-Hermite quadrature can be used for numerical integration. It approximates the integral by a weighted sum of integrands at different knots. The more knots, the more accurate the approximation, but at higher computational expense.
Laplace approximation is a special case of Gauss-Hermite quadrature with only one knot. It is less accurate than Guass-Hermite quadrature but computationally cheaper.
Bayesian method is another approach for estimation and inference of GLMM. It’s not covered in this course due to time constraint. Interested students can study ELMR Chapter 12.
Penalized quasi-likelihood (PQL). In the IRWLS (iteratively reweighted least squares) procedure for estimating the GLM, each iteration uses the pseudo-response or working response \[ \mathbf{z}^{(t)} = \boldsymbol{\eta}^{(t)} + (\mathbf{W}^{(t)})^{-1} (\mathbf{y} - \widehat{\boldsymbol{\mu}}^{(t)}). \] It can be adapted to the mixed effects model. PQL has computational advantage but generally considered less accurate than other approaches.
Generalized estimation equations (GEE). To be discussed later.
ctsib
data in EMLR studies the balance of 40
individuals.
norm
or foam
), vision
(factor, open
or closed
or
dome
).surface
and vision
. 12 measurements per
subject.ctsib <- as_tibble(ctsib) %>%
mutate(stable = ifelse(CTSIB == 1, 1, 0)) %>%
print(n = 24)
## # A tibble: 480 × 9
## Subject Sex Age Height Weight Surface Vision CTSIB stable
## <int> <fct> <int> <dbl> <dbl> <fct> <fct> <int> <dbl>
## 1 1 male 22 176 68.2 norm open 1 1
## 2 1 male 22 176 68.2 norm open 1 1
## 3 1 male 22 176 68.2 norm closed 2 0
## 4 1 male 22 176 68.2 norm closed 2 0
## 5 1 male 22 176 68.2 norm dome 1 1
## 6 1 male 22 176 68.2 norm dome 2 0
## 7 1 male 22 176 68.2 foam open 2 0
## 8 1 male 22 176 68.2 foam open 2 0
## 9 1 male 22 176 68.2 foam closed 2 0
## 10 1 male 22 176 68.2 foam closed 2 0
## 11 1 male 22 176 68.2 foam dome 2 0
## 12 1 male 22 176 68.2 foam dome 2 0
## 13 2 male 22 181 67.6 norm open 1 1
## 14 2 male 22 181 67.6 norm open 1 1
## 15 2 male 22 181 67.6 norm closed 2 0
## 16 2 male 22 181 67.6 norm closed 2 0
## 17 2 male 22 181 67.6 norm dome 2 0
## 18 2 male 22 181 67.6 norm dome 2 0
## 19 2 male 22 181 67.6 foam open 2 0
## 20 2 male 22 181 67.6 foam open 2 0
## 21 2 male 22 181 67.6 foam closed 3 0
## 22 2 male 22 181 67.6 foam closed 3 0
## 23 2 male 22 181 67.6 foam dome 3 0
## 24 2 male 22 181 67.6 foam dome 3 0
## # ℹ 456 more rows
summary(ctsib)
## Subject Sex Age Height Weight
## Min. : 1.00 female:240 Min. :18.00 Min. :142.0 Min. : 44.20
## 1st Qu.:10.75 male :240 1st Qu.:21.75 1st Qu.:166.8 1st Qu.: 60.65
## Median :20.50 Median :25.50 Median :173.0 Median : 68.00
## Mean :20.50 Mean :26.80 Mean :172.1 Mean : 71.14
## 3rd Qu.:30.25 3rd Qu.:33.00 3rd Qu.:180.2 3rd Qu.: 83.50
## Max. :40.00 Max. :38.00 Max. :190.0 Max. :102.40
## Surface Vision CTSIB stable
## foam:240 closed:160 Min. :1.000 Min. :0.0000
## norm:240 dome :160 1st Qu.:2.000 1st Qu.:0.0000
## open :160 Median :2.000 Median :0.0000
## Mean :1.919 Mean :0.2375
## 3rd Qu.:2.000 3rd Qu.:0.0000
## Max. :4.000 Max. :1.0000
subsum <- ctsib %>%
group_by(Subject) %>%
summarise(Height = Height[1],
Weight = Weight[1],
stable = mean(stable),
Age = Age[1],
Sex = Sex[1])
subsum %>%
ggplot() +
geom_point(mapping = aes(x = Height, y = stable))
subsum %>%
ggplot() +
geom_point(mapping = aes(x = Weight, y = stable))
subsum %>%
ggplot() +
geom_point(mapping = aes(x = Age, y = stable))
subsum %>%
ggplot() +
geom_boxplot(mapping = aes(x = Sex, y = stable))
ctsib %>%
group_by(Subject, Surface) %>%
summarize(stable = mean(stable)) %>%
ggplot() +
geom_boxplot(mapping = aes(x = Surface, y = stable)) +
scale_y_log10()
## `summarise()` has grouped output by 'Subject'. You can override using the
## `.groups` argument.
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 39 rows containing non-finite values (`stat_boxplot()`).
ctsib %>%
group_by(Subject, Vision) %>%
summarize(stable = mean(stable)) %>%
ggplot() +
geom_boxplot(mapping = aes(x = Vision, y = stable)) +
scale_y_log10()
## `summarise()` has grouped output by 'Subject'. You can override using the
## `.groups` argument.
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 63 rows containing non-finite values (`stat_boxplot()`).
gf <- glm(stable ~ Sex + Age + Height + Weight + Surface + Vision,
family = binomial,
data = ctsib)
summary(gf)
##
## Call:
## glm(formula = stable ~ Sex + Age + Height + Weight + Surface +
## Vision, family = binomial, data = ctsib)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.28763 -0.54392 -0.13884 -0.05483 2.48130
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.277449 3.803987 1.913 0.055734 .
## Sexmale 1.401577 0.516231 2.715 0.006627 **
## Age 0.002521 0.024307 0.104 0.917390
## Height -0.096413 0.026837 -3.593 0.000327 ***
## Weight 0.043503 0.018002 2.417 0.015665 *
## Surfacenorm 3.967515 0.447179 8.872 < 2e-16 ***
## Visiondome 0.363753 0.383218 0.949 0.342516
## Visionopen 3.187501 0.416001 7.662 1.83e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 526.25 on 479 degrees of freedom
## Residual deviance: 295.20 on 472 degrees of freedom
## AIC: 311.2
##
## Number of Fisher Scoring iterations: 6
library(lme4)
modlap <-
glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
family = binomial,
data = ctsib)
summary(modlap)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |
## Subject)
## Data: ctsib
##
## AIC BIC logLik deviance df.resid
## 247.6 285.1 -114.8 229.6 471
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0149 -0.1266 -0.0176 -0.0005 4.8664
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 8.197 2.863
## Number of obs: 480, groups: Subject, 40
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.920823 13.091873 0.758 0.4486
## Sexmale 2.825340 1.746528 1.618 0.1057
## Age -0.003642 0.079834 -0.046 0.9636
## Height -0.151012 0.089829 -1.681 0.0927 .
## Weight 0.058925 0.061583 0.957 0.3387
## Surfacenorm 7.524402 1.164373 6.462 1.03e-10 ***
## Visiondome 0.683933 0.529835 1.291 0.1968
## Visionopen 6.321076 1.074732 5.882 4.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Sexmal Age Height Weight Srfcnr Visndm
## Sexmale 0.476
## Age -0.170 0.097
## Height -0.957 -0.392 0.050
## Weight 0.372 -0.366 -0.172 -0.560
## Surfacenorm 0.010 0.195 -0.010 -0.136 0.059
## Visiondome -0.012 0.029 0.001 -0.026 0.015 0.115
## Visionopen 0.013 0.202 -0.007 -0.135 0.051 0.857 0.362
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.128017 (tol = 0.002, component 1)
## Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
library(lme4)
modgh <-
glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
nAGQ = 25,
family = binomial,
data = ctsib)
summary(modgh)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: binomial ( logit )
## Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |
## Subject)
## Data: ctsib
##
## AIC BIC logLik deviance df.resid
## 247.9 285.5 -115.0 229.9 471
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.8841 -0.1386 -0.0197 -0.0007 4.9021
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 7.194 2.682
## Number of obs: 480, groups: Subject, 40
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 16.168911 12.719372 1.271 0.2037
## Sexmale 3.096705 1.696024 1.826 0.0679 .
## Age -0.006669 0.076455 -0.087 0.9305
## Height -0.192248 0.088937 -2.162 0.0306 *
## Weight 0.075156 0.059100 1.272 0.2035
## Surfacenorm 7.285350 1.055139 6.905 5.03e-12 ***
## Visiondome 0.675903 0.527363 1.282 0.2000
## Visionopen 6.088819 0.972383 6.262 3.81e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Sexmal Age Height Weight Srfcnr Visndm
## Sexmale 0.509
## Age -0.166 0.094
## Height -0.960 -0.429 0.047
## Weight 0.386 -0.324 -0.169 -0.570
## Surfacenorm 0.166 0.256 -0.013 -0.281 0.147
## Visiondome 0.007 0.034 0.000 -0.044 0.027 0.113
## Visionopen 0.169 0.265 -0.008 -0.280 0.138 0.829 0.382
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.289574 (tol = 0.002, component 1)
## Model is nearly unidentifiable: large eigenvalue ratio
## - Rescale variables?
library(MASS)
modpql <- glmmPQL(stable ~ Sex + Age + Height + Weight + Surface + Vision,
random = ~ 1 | Subject,
family = binomial,
data = ctsib)
## iteration 1
## iteration 2
## iteration 3
## iteration 4
## iteration 5
## iteration 6
## iteration 7
summary(modpql)
## Linear mixed-effects model fit by maximum likelihood
## Data: ctsib
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~1 | Subject
## (Intercept) Residual
## StdDev: 3.060712 0.5906232
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: stable ~ Sex + Age + Height + Weight + Surface + Vision
## Value Std.Error DF t-value p-value
## (Intercept) 15.571494 13.498304 437 1.153589 0.2493
## Sexmale 3.355340 1.752614 35 1.914478 0.0638
## Age -0.006638 0.081959 35 -0.080992 0.9359
## Height -0.190819 0.092023 35 -2.073601 0.0455
## Weight 0.069467 0.062857 35 1.105155 0.2766
## Surfacenorm 7.724078 0.573578 437 13.466492 0.0000
## Visiondome 0.726464 0.325933 437 2.228873 0.0263
## Visionopen 6.485257 0.543980 437 11.921876 0.0000
## Correlation:
## (Intr) Sexmal Age Height Weight Srfcnr Visndm
## Sexmale 0.488
## Age -0.164 0.110
## Height -0.963 -0.388 0.041
## Weight 0.368 -0.374 -0.168 -0.555
## Surfacenorm 0.051 0.116 0.023 -0.114 0.055
## Visiondome -0.003 0.011 0.004 -0.017 0.011 0.087
## Visionopen 0.056 0.125 0.026 -0.116 0.049 0.788 0.377
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -7.3825387626 -0.2333403344 -0.0233564299 -0.0004216629 9.9310682722
##
## Number of Observations: 480
## Number of Groups: 40
epilepsy
data in ELMR.
seizures
(number of seizures of each epilepsy
patient).treat
(binary, 1 = treatment or 0 = control
group), expind
(binary, 0 = baseline or 1 = experiment
phase), timeadj
(length of phase).epilepsy <- as_tibble(epilepsy) %>%
mutate(period = rep(0:4, 59),
drug = factor(c("placebo", "treatment"))[treat + 1],
phase = factor(c("baseline", "experiment"))[expind + 1]) %>%
print()
## # A tibble: 295 × 9
## seizures id treat expind timeadj age period drug phase
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <fct> <fct>
## 1 11 1 0 0 8 31 0 placebo baseline
## 2 5 1 0 1 2 31 1 placebo experiment
## 3 3 1 0 1 2 31 2 placebo experiment
## 4 3 1 0 1 2 31 3 placebo experiment
## 5 3 1 0 1 2 31 4 placebo experiment
## 6 11 2 0 0 8 30 0 placebo baseline
## 7 3 2 0 1 2 30 1 placebo experiment
## 8 5 2 0 1 2 30 2 placebo experiment
## 9 3 2 0 1 2 30 3 placebo experiment
## 10 3 2 0 1 2 30 4 placebo experiment
## # ℹ 285 more rows
summary(epilepsy)
## seizures id treat expind timeadj
## Min. : 0.00 Min. : 1 Min. :0.0000 Min. :0.0 Min. :2.0
## 1st Qu.: 3.00 1st Qu.:15 1st Qu.:0.0000 1st Qu.:1.0 1st Qu.:2.0
## Median : 6.00 Median :30 Median :1.0000 Median :1.0 Median :2.0
## Mean : 12.86 Mean :30 Mean :0.5254 Mean :0.8 Mean :3.2
## 3rd Qu.: 14.50 3rd Qu.:45 3rd Qu.:1.0000 3rd Qu.:1.0 3rd Qu.:2.0
## Max. :151.00 Max. :59 Max. :1.0000 Max. :1.0 Max. :8.0
## age period drug phase
## Min. :18.00 Min. :0 placebo :140 baseline : 59
## 1st Qu.:22.00 1st Qu.:1 treatment:155 experiment:236
## Median :28.00 Median :2
## Mean :28.85 Mean :2
## 3rd Qu.:35.00 3rd Qu.:3
## Max. :57.00 Max. :4
epilepsy %>%
group_by(drug, phase) %>%
summarize(rate = mean(seizures / timeadj)) %>%
xtabs(formula = rate ~ phase + drug)
## `summarise()` has grouped output by 'drug'. You can override using the
## `.groups` argument.
## drug
## phase placebo treatment
## baseline 3.848214 3.955645
## experiment 4.303571 3.983871
epilepsy %>%
ggplot() +
geom_line(mapping = aes(x = period, y = seizures, linetype = drug, group = id)) +
xlim(1, 4) +
scale_y_sqrt(breaks = (0:10)^2) +
theme(legend.position = "top", legend.direction = "horizontal")
ratesum <- epilepsy %>%
group_by(id, phase, drug) %>%
summarize(rate = mean(seizures / timeadj))
## `summarise()` has grouped output by 'id', 'phase'. You can override using the
## `.groups` argument.
spread(ratesum, phase, rate) %>%
ggplot() +
geom_point(mapping = aes(x = baseline, y = experiment, shape = drug)) +
scale_x_sqrt() + scale_y_sqrt() +
geom_abline(intercept = 0, slope = 1) +
theme(legend.position = "top", legend.direction = "horizontal")
epilo <- filter(epilepsy, id != 49)
modglm <- glm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat),
family = poisson,
data = epilo)
summary(modglm)
##
## Call:
## glm(formula = seizures ~ offset(log(timeadj)) + expind + treat +
## I(expind * treat), family = poisson, data = epilo)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.4725 -2.3605 -1.0290 0.9001 14.0104
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.34761 0.03406 39.566 < 2e-16 ***
## expind 0.11184 0.04688 2.386 0.017 *
## treat -0.10682 0.04863 -2.197 0.028 *
## I(expind * treat) -0.30238 0.06971 -4.338 1.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 2485.1 on 289 degrees of freedom
## Residual deviance: 2411.5 on 286 degrees of freedom
## AIC: 3449.7
##
## Number of Fisher Scoring iterations: 5
library(lme4)
modgh <- glmer(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat) + (1 | id),
nAGQ = 25,
family = poisson,
data = epilo)
summary(modgh)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
## Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
## Family: poisson ( log )
## Formula: seizures ~ offset(log(timeadj)) + expind + treat + I(expind *
## treat) + (1 | id)
## Data: epilo
##
## AIC BIC logLik deviance df.resid
## 877.7 896.1 -433.9 867.7 285
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8724 -0.8482 -0.1722 0.5697 9.8941
##
## Random effects:
## Groups Name Variance Std.Dev.
## id (Intercept) 0.515 0.7176
## Number of obs: 290, groups: id, 58
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.035998 0.141256 7.334 2.23e-13 ***
## expind 0.111838 0.046877 2.386 0.017 *
## treat -0.008152 0.196525 -0.041 0.967
## I(expind * treat) -0.302387 0.069714 -4.338 1.44e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) expind treat
## expind -0.175
## treat -0.718 0.126
## I(xpnd*trt) 0.118 -0.672 -0.173
library(MASS)
modpql <- glmmPQL(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat),
random = ~ 1 | id,
family = poisson,
data = epilo)
## iteration 1
## iteration 2
## iteration 3
## iteration 4
## iteration 5
summary(modpql)
## Linear mixed-effects model fit by maximum likelihood
## Data: epilo
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~1 | id
## (Intercept) Residual
## StdDev: 0.6820012 1.605385
##
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat)
## Value Std.Error DF t-value p-value
## (Intercept) 1.0761832 0.09990514 230 10.772050 0.0000
## expind 0.1125119 0.07412152 230 1.517939 0.1304
## I(expind * treat) -0.3037615 0.10819095 230 -2.807642 0.0054
## Correlation:
## (Intr) expind
## expind -0.198
## I(expind * treat) -0.014 -0.656
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.2934834 -0.5649468 -0.1492931 0.3224895 6.3123337
##
## Number of Observations: 290
## Number of Groups: 58
The quasi-likelihood (quasi-binomial, quasi-Poisson, etc) approach can be generalized to handle correlated, nonnormal responses.
Let \(\mathbf{Y}_i\) be the response vector of \(i\)-th individual. \[\begin{eqnarray*} \mathbb{E} \mathbf{Y}_i &=& \boldsymbol{\mu}_i \\ g(\boldsymbol{\mu}_i) &=& \mathbf{X}_i \boldsymbol{\beta} \\ \mathbf{V}_i &=& \text{Var}(\mathbf{Y}_i) = \phi \mathbf{A}_i^{1/2} \mathbf{R}_i(\boldsymbol{\alpha}) \mathbf{A}_i^{1/2}, \end{eqnarray*}\] where \(\mathbf{A}_i = \text{diag}(a(\boldsymbol{\mu}))\) captures the individual variances and \(\mathbf{R}_i(\boldsymbol{\alpha})\) is a working correlation matrix.
Commonly used working correlation are
compound symmetry, or equicorrelation, or exchangeable correlation: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \]
Autoregressive model: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & & \rho^{n-2} \\ \rho^2 & \rho & 1 & & \vdots \\ \vdots & & & \ddots & \\ \rho^{n-1} & \cdots & & \rho & 1 \end{pmatrix} \]
Unstructured correlation matrix
Given estimates of \(\phi\) and \(\boldsymbol{\alpha}\), we solve the estimation equation \[ \sum_i [D_{\boldsymbol{\beta}} \boldsymbol{\mu}_i(\boldsymbol{\beta})]^T \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0}. \]
Let’s revisit the ctsib
(stability) data set. We use
the exchangeable correlation, or equivalently, compound symmetry.
Standard errors and Wald test are constructed using the sandwich
estimators. \(\boldsymbol{\beta}\)
estimates in GEE are about half the size of those from GLMM. The
scale.fix = TRUE
instructs the function geeglm
not estimate the dispersion parameter \(\phi\), since for binary response the
dispersion is always 1.
library(geepack)
modgeep <- geeglm(stable ~ Sex + Age + Height + Weight + Surface + Vision,
id = Subject,
corstr = "exchangeable",
scale.fix = TRUE,
data = ctsib,
family = binomial(link = "logit"))
summary(modgeep)
##
## Call:
## geeglm(formula = stable ~ Sex + Age + Height + Weight + Surface +
## Vision, family = binomial(link = "logit"), data = ctsib,
## id = Subject, corstr = "exchangeable", scale.fix = TRUE)
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 8.62332 5.91992 2.122 0.1452
## Sexmale 1.64488 0.90347 3.315 0.0687 .
## Age -0.01205 0.04802 0.063 0.8019
## Height -0.10211 0.04239 5.801 0.0160 *
## Weight 0.04365 0.03399 1.649 0.1991
## Surfacenorm 3.91632 0.56682 47.738 4.87e-12 ***
## Visiondome 0.35888 0.40403 0.789 0.3744
## Visionopen 3.17990 0.46063 47.657 5.08e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = exchangeable
## Scale is fixed.
##
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.2185 0.04467
## Number of clusters: 40 Maximum cluster size: 12
epilepsy
data set.modgeep <- geeglm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind*treat),
id = id,
family = poisson,
corstr = "ar1",
data = epilepsy,
subset = (id != 49))
summary(modgeep)
##
## Call:
## geeglm(formula = seizures ~ offset(log(timeadj)) + expind + treat +
## I(expind * treat), family = poisson, data = epilepsy, subset = (id !=
## 49), id = id, corstr = "ar1")
##
## Coefficients:
## Estimate Std.err Wald Pr(>|W|)
## (Intercept) 1.3138 0.1616 66.10 4.4e-16 ***
## expind 0.1509 0.1108 1.86 0.173
## treat -0.0797 0.1983 0.16 0.688
## I(expind * treat) -0.3987 0.1745 5.22 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation structure = ar1
## Estimated Scale Parameters:
##
## Estimate Std.err
## (Intercept) 10.6 2.35
## Link = identity
##
## Estimated Correlation Parameters:
## Estimate Std.err
## alpha 0.783 0.0519
## Number of clusters: 58 Maximum cluster size: 5