So far we have studied linear mixed models (LMMs), where the responses are continuous and normally distributed. It is natural to extend LMMs to handle non-normally distributed responses.
Recall that, in GLM, the distribution of \(Y_i\) is from the exponential family of distributions of form \[
f(y_i \mid \theta_i, \phi) = \exp \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right].
\] If we use the canonical link, then \[
\theta_i = g(\mu_i) = \eta_i,
\] where \(\mu_i = \mathbb{E} Y_i\). Now let \[
\eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + \mathbf{z}_i^T \boldsymbol{\gamma}_i,
\] where \(\boldsymbol{\beta}\) collects the fixed effects and \(\boldsymbol{\gamma}_i\) the random effects, with density \(h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma})\). (Typically we assume a multivariate normal distribution with mean \(\mathbf{0}\) and covariance \(\boldsymbol{\Sigma}\).) Then the likelihood is \[
L(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \prod_{i=1}^n \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma}
\] and the log-likelihood is \[
\ell(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \sum_{i=1}^n \log \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma}.
\] This model, which combines the GLM with the mixed effects model, is called the generalized linear mixed effects model (GLMM).
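As a concrete special case (with observations \(j\) nested in subjects \(i\)), consider a Bernoulli response with the canonical (logit) link and a single random intercept \(\gamma_i \sim N(0, \sigma^2)\) per subject: \[
\operatorname{logit} P(Y_{ij} = 1 \mid \gamma_i) = \mathbf{x}_{ij}^T \boldsymbol{\beta} + \gamma_i,
\] so each subject contributes \[
\int_{-\infty}^{\infty} \prod_j p_{ij}^{y_{ij}} (1 - p_{ij})^{1 - y_{ij}} \, \phi(\gamma \mid 0, \sigma^2) \, d\gamma, \quad p_{ij} = \frac{e^{\mathbf{x}_{ij}^T \boldsymbol{\beta} + \gamma}}{1 + e^{\mathbf{x}_{ij}^T \boldsymbol{\beta} + \gamma}},
\] to the likelihood. Even this one-dimensional integral has no closed form, which motivates the numerical methods surveyed in the next section.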
2 Overview of estimation and inference methods
Estimation and inference for GLMMs have been an active research area. We give an overview of a few commonly used methods.
The maximum likelihood estimate (MLE) is obtained by maximizing the log-likelihood function. However, each iteration of the optimization algorithm must evaluate the integral numerically, which quickly becomes infeasible when the dimension of the random effects \(\boldsymbol{\gamma}\) is large.
Gauss-Hermite quadrature can be used for the numerical integration. It approximates the integral by a weighted sum of the integrand evaluated at a set of knots. More knots give a more accurate approximation, at higher computational expense.
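As an illustration (not part of the course code), the following Python sketch approximates the marginal probability \(\int \operatorname{logit}^{-1}(\beta + \gamma)\, \phi(\gamma \mid 0, \sigma^2)\, d\gamma\) of a logistic random-intercept model by Gauss-Hermite quadrature; the values \(\beta = 1\), \(\sigma = 1.5\) are arbitrary:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_marginal(beta, sigma, k):
    """Approximate E[expit(beta + gamma)] with gamma ~ N(0, sigma^2)
    by k-knot Gauss-Hermite quadrature (substitute gamma = sqrt(2)*sigma*t)."""
    t, w = hermgauss(k)  # knots/weights for the weight function e^{-t^2}
    g = 1.0 / (1.0 + np.exp(-(beta + np.sqrt(2.0) * sigma * t)))
    return (w @ g) / np.sqrt(np.pi)

# brute-force trapezoid value on a fine grid, for comparison
beta, sigma = 1.0, 1.5
gam = np.linspace(-10 * sigma, 10 * sigma, 200001)
dens = np.exp(-gam**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
f = dens / (1.0 + np.exp(-(beta + gam)))
truth = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(gam))
print(abs(gh_marginal(beta, sigma, 20) - truth))  # approximation error
```

Increasing `k` shrinks the error; in higher dimensions the number of knots grows as \(k^q\), which is why the cost escalates quickly.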
The Laplace approximation is equivalent to Gauss-Hermite quadrature with a single knot placed at the mode of the integrand. It is less accurate than Gauss-Hermite quadrature with more knots but computationally cheaper.
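As a hypothetical one-dimensional illustration: writing the integrand as \(e^{h(\gamma)}\), the Laplace approximation is \(\int e^{h(\gamma)}\, d\gamma \approx e^{h(\hat{\gamma})} \sqrt{2\pi / (-h''(\hat{\gamma}))}\), where \(\hat{\gamma}\) maximizes \(h\). The numpy sketch below applies it to the marginal likelihood of a single Bernoulli observation \(y = 1\) under a logistic random-intercept model (\(\beta = 1\), \(\sigma = 1.5\) arbitrary):

```python
import numpy as np

beta, sigma = 1.0, 1.5

def h(g):
    """log integrand: log expit(beta + g) + log N(g | 0, sigma^2)."""
    return -np.log1p(np.exp(-(beta + g))) - g**2 / (2 * sigma**2) \
           - np.log(sigma * np.sqrt(2 * np.pi))

# locate the mode of h by Newton's method (h' and h'' in closed form)
g = 0.0
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-(beta + g)))
    g -= ((1.0 - p) - g / sigma**2) / (-p * (1.0 - p) - 1.0 / sigma**2)

# Laplace: integral ~ exp(h(ghat)) * sqrt(2*pi / (-h''(ghat)))
p = 1.0 / (1.0 + np.exp(-(beta + g)))
laplace = np.exp(h(g)) * np.sqrt(2 * np.pi / (p * (1.0 - p) + 1.0 / sigma**2))

# brute-force trapezoid value on a fine grid, for comparison
gam = np.linspace(-15.0, 15.0, 200001)
f = np.exp(h(gam))
truth = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(gam))
print(laplace, truth)
```

The single-knot approximation is typically within a few percent here; its accuracy improves as the number of observations per subject grows and the integrand becomes more Gaussian.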
The Bayesian approach is another route to estimation and inference for GLMMs. It is not covered in this course due to time constraints; interested students can study ELMR Chapter 12.
Penalized quasi-likelihood (PQL). In the IRWLS (iteratively reweighted least squares) procedure for estimating the GLM, each iteration uses the pseudo-response or working response \[
\mathbf{z}^{(t)} = \boldsymbol{\eta}^{(t)} + (\mathbf{W}^{(t)})^{-1} (\mathbf{y} - \widehat{\boldsymbol{\mu}}^{(t)}).
\] This update can be adapted to the mixed effects model. PQL has a computational advantage but is generally considered less accurate than the other approaches.
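To make the working response concrete, here is a hypothetical numpy sketch of IRWLS for an ordinary logistic GLM on simulated data (PQL embeds the same update inside a linear mixed model fit):

```python
import numpy as np

# simulated logistic-regression data (illustration only)
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.5, 1.0, 0.7])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

beta = np.zeros(3)
for _ in range(25):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)            # IRWLS weights (diagonal of W)
    z = eta + (y - mu) / w         # working response z = eta + W^{-1}(y - mu)
    # weighted least squares of z on X
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

mu = 1.0 / (1.0 + np.exp(-X @ beta))
print(np.max(np.abs(X.T @ (y - mu))))  # score ~ 0 at convergence
```

PQL replaces the weighted least squares step with a weighted LMM fit of \(\mathbf{z}\) on the fixed and random effects, iterating until both sets of estimates stabilize.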
Generalized estimating equations (GEE). To be discussed later.
3 Binary response example
The ctsib data in ELMR studies the balance of 40 individuals.
Response: stable (1 = stable, 0 = unstable).
Predictors: Sex (binary), Age (continuous), Height (continuous), Surface (binary, norm or foam), Vision (factor: open, closed, or dome).
Each subject is tested twice on each combination of surface and vision, giving 12 measurements per subject.
# A tibble: 480 × 9
Subject Sex Age Height Weight Surface Vision CTSIB stable
<int> <fct> <int> <dbl> <dbl> <fct> <fct> <int> <dbl>
1 1 male 22 176 68.2 norm open 1 1
2 1 male 22 176 68.2 norm open 1 1
3 1 male 22 176 68.2 norm closed 2 0
4 1 male 22 176 68.2 norm closed 2 0
5 1 male 22 176 68.2 norm dome 1 1
6 1 male 22 176 68.2 norm dome 2 0
7 1 male 22 176 68.2 foam open 2 0
8 1 male 22 176 68.2 foam open 2 0
9 1 male 22 176 68.2 foam closed 2 0
10 1 male 22 176 68.2 foam closed 2 0
11 1 male 22 176 68.2 foam dome 2 0
12 1 male 22 176 68.2 foam dome 2 0
13 2 male 22 181 67.6 norm open 1 1
14 2 male 22 181 67.6 norm open 1 1
15 2 male 22 181 67.6 norm closed 2 0
16 2 male 22 181 67.6 norm closed 2 0
17 2 male 22 181 67.6 norm dome 2 0
18 2 male 22 181 67.6 norm dome 2 0
19 2 male 22 181 67.6 foam open 2 0
20 2 male 22 181 67.6 foam open 2 0
21 2 male 22 181 67.6 foam closed 3 0
22 2 male 22 181 67.6 foam closed 3 0
23 2 male 22 181 67.6 foam dome 3 0
24 2 male 22 181 67.6 foam dome 3 0
# ℹ 456 more rows
summary(ctsib)
Subject Sex Age Height Weight
Min. : 1.00 female:240 Min. :18.00 Min. :142.0 Min. : 44.20
1st Qu.:10.75 male :240 1st Qu.:21.75 1st Qu.:166.8 1st Qu.: 60.65
Median :20.50 Median :25.50 Median :173.0 Median : 68.00
Mean :20.50 Mean :26.80 Mean :172.1 Mean : 71.14
3rd Qu.:30.25 3rd Qu.:33.00 3rd Qu.:180.2 3rd Qu.: 83.50
Max. :40.00 Max. :38.00 Max. :190.0 Max. :102.40
Surface Vision CTSIB stable
foam:240 closed:160 Min. :1.000 Min. :0.0000
norm:240 dome :160 1st Qu.:2.000 1st Qu.:0.0000
open :160 Median :2.000 Median :0.0000
Mean :1.919 Mean :0.2375
3rd Qu.:2.000 3rd Qu.:0.0000
Max. :4.000 Max. :1.0000
Graphical summary.
subsum <- ctsib %>%
  group_by(Subject) %>%
  summarise(Height = Height[1], Weight = Weight[1], stable = mean(stable),
            Age = Age[1], Sex = Sex[1])
subsum %>% ggplot() + geom_point(mapping = aes(x = Height, y = stable))
subsum %>% ggplot() + geom_point(mapping = aes(x = Weight, y = stable))
subsum %>% ggplot() + geom_point(mapping = aes(x = Age, y = stable))
subsum %>% ggplot() + geom_boxplot(mapping = aes(x = Sex, y = stable))
4 Count response example
The epilo data (from ELMR) records seizure counts for epilepsy patients in placebo and treatment groups, over a baseline phase and four subsequent periods.
seizures id treat expind timeadj
Min. : 0.00 Min. : 1 Min. :0.0000 Min. :0.0 Min. :2.0
1st Qu.: 3.00 1st Qu.:15 1st Qu.:0.0000 1st Qu.:1.0 1st Qu.:2.0
Median : 6.00 Median :30 Median :1.0000 Median :1.0 Median :2.0
Mean : 12.86 Mean :30 Mean :0.5254 Mean :0.8 Mean :3.2
3rd Qu.: 14.50 3rd Qu.:45 3rd Qu.:1.0000 3rd Qu.:1.0 3rd Qu.:2.0
Max. :151.00 Max. :59 Max. :1.0000 Max. :1.0 Max. :8.0
age period drug phase
Min. :18.00 Min. :0 placebo :140 baseline : 59
1st Qu.:22.00 1st Qu.:1 treatment:155 experiment:236
Median :28.00 Median :2
Mean :28.85 Mean :2
3rd Qu.:35.00 3rd Qu.:3
Max. :57.00 Max. :4
Numerical summary. Is there any difference in the number of seizures per week between the placebo and treatment groups?
Linear mixed-effects model fit by maximum likelihood
Data: epilo
AIC BIC logLik
NA NA NA
Random effects:
Formula: ~1 | id
(Intercept) Residual
StdDev: 0.6820012 1.605385
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat)
Value Std.Error DF t-value p-value
(Intercept) 1.0761832 0.09990514 230 10.772050 0.0000
expind 0.1125119 0.07412152 230 1.517939 0.1304
I(expind * treat) -0.3037615 0.10819095 230 -2.807642 0.0054
Correlation:
(Intr) expind
expind -0.198
I(expind * treat) -0.014 -0.656
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.2934834 -0.5649468 -0.1492931 0.3224895 6.3123337
Number of Observations: 290
Number of Groups: 58
5 GEE (generalized estimating equations)
The quasi-likelihood approach (quasi-binomial, quasi-Poisson, etc.) can be generalized to handle correlated, non-normal responses.
Let \(\mathbf{Y}_i\) be the response vector of the \(i\)-th individual. \[\begin{eqnarray*}
\mathbb{E} \mathbf{Y}_i &=& \boldsymbol{\mu}_i \\
g(\boldsymbol{\mu}_i) &=& \mathbf{X}_i \boldsymbol{\beta} \\
\mathbf{V}_i &=& \text{Var}(\mathbf{Y}_i) = \phi \mathbf{A}_i^{1/2} \mathbf{R}_i(\boldsymbol{\alpha}) \mathbf{A}_i^{1/2},
\end{eqnarray*}\] where \(\mathbf{A}_i = \text{diag}(a(\boldsymbol{\mu}_i))\) captures the individual variances through the variance function \(a(\cdot)\), and \(\mathbf{R}_i(\boldsymbol{\alpha})\) is a working correlation matrix.
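For instance, with the Poisson variance function \(a(\mu) = \mu\) and an exchangeable working correlation, \(\mathbf{V}_i\) can be assembled as in this hypothetical numpy sketch:

```python
import numpy as np

def working_cov(mu, alpha, phi):
    """V_i = phi * A^{1/2} R(alpha) A^{1/2} with exchangeable working
    correlation R and Poisson variance function a(mu) = mu."""
    m = len(mu)
    R = np.full((m, m), alpha) + (1.0 - alpha) * np.eye(m)  # exchangeable R(alpha)
    A_half = np.diag(np.sqrt(mu))                           # A^{1/2}
    return phi * A_half @ R @ A_half

mu = np.array([2.0, 5.0, 1.0])       # made-up means for one cluster of size 3
V = working_cov(mu, alpha=0.3, phi=1.2)
print(np.round(V, 3))
```

The diagonal of \(\mathbf{V}_i\) is \(\phi\, a(\mu_{ij})\), while the off-diagonal entries \(\phi\, \alpha \sqrt{a(\mu_{ij}) a(\mu_{ik})}\) encode the assumed within-cluster correlation.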
Given estimates of \(\phi\) and \(\boldsymbol{\alpha}\), we solve the estimating equation \[
\sum_i [D_{\boldsymbol{\beta}} \boldsymbol{\mu}_i(\boldsymbol{\beta})]^T \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0}.
\]
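As a hypothetical illustration of solving this equation: with a log link, \(D_{\boldsymbol{\beta}} \boldsymbol{\mu}_i = \text{diag}(\boldsymbol{\mu}_i) \mathbf{X}_i\), and choosing the independence working correlation with \(\phi = 1\) gives \(\mathbf{V}_i = \text{diag}(\boldsymbol{\mu}_i)\), so the GEE reduces to the Poisson score equation \(\sum_i \mathbf{X}_i^T (\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0}\). A Fisher scoring sketch on simulated clustered counts (all data made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, m = 60, 4                        # 60 subjects, 4 repeated counts each
x = rng.normal(size=(n_subj, m))
b = rng.normal(scale=0.4, size=n_subj)   # shared frailty -> within-subject correlation
y = rng.poisson(np.exp(0.3 + 0.5 * x + b[:, None]))

X = np.column_stack([np.ones(n_subj * m), x.ravel()])
Y = y.ravel().astype(float)

beta = np.zeros(2)
for _ in range(50):
    mu = np.exp(X @ beta)
    # independence working R and phi = 1: the GEE sums to X^T (Y - mu) = 0
    beta += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (Y - mu))

mu = np.exp(X @ beta)
print(np.max(np.abs(X.T @ (Y - mu))))    # estimating equation ~ 0 at the solution
```

The point estimate coincides with the (quasi-)Poisson GLM fit; what changes under GEE is the sandwich variance estimate, which stays valid even though the working correlation ignores the within-subject dependence.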
Let’s revisit the ctsib (stability) data set. We use the exchangeable working correlation, or equivalently, compound symmetry. Standard errors and Wald tests are constructed using the sandwich estimator. The \(\boldsymbol{\beta}\) estimates from GEE are about half the size of those from GLMM; this is expected, since GEE estimates population-averaged (marginal) effects while GLMM estimates subject-specific (conditional) effects, and marginalizing over the random effects attenuates the coefficients toward zero. The argument scale.fix = TRUE instructs geeglm not to estimate the dispersion parameter \(\phi\), since for a binary response the dispersion is always 1.