Mixed Effects Models for Nonnormal Responses (ELMR Chapter 13)

Biostat 200C

Author

Dr. Jin Zhou @ UCLA

Published

May 27, 2025

Display system information and load tidyverse and faraway packages

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] geepack_1.3.12  MASS_7.3-65     lme4_1.1-37     Matrix_1.7-2   
 [5] faraway_1.0.9   lubridate_1.9.4 forcats_1.0.0   stringr_1.5.1  
 [9] dplyr_1.1.4     purrr_1.0.4     readr_2.1.5     tidyr_1.3.1    
[13] tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] generics_0.1.3    stringi_1.8.7     lattice_0.22-6    hms_1.1.3        
 [5] digest_0.6.37     magrittr_2.0.3    evaluate_1.0.3    grid_4.4.1       
 [9] timechange_0.3.0  fastmap_1.2.0     jsonlite_1.9.1    backports_1.5.0  
[13] scales_1.3.0      Rdpack_2.6.3      reformulas_0.4.0  cli_3.6.4        
[17] rlang_1.1.5       rbibutils_2.3     munsell_0.5.1     splines_4.4.1    
[21] withr_3.0.2       yaml_2.3.10       tools_4.4.1       tzdb_0.4.0       
[25] nloptr_2.2.1      minqa_1.2.8       colorspace_2.1-1  boot_1.3-31      
[29] broom_1.0.8       vctrs_0.6.5       R6_2.6.1          lifecycle_1.0.4  
[33] htmlwidgets_1.6.4 pkgconfig_2.0.3   pillar_1.10.1     gtable_0.3.6     
[37] glue_1.8.0        Rcpp_1.0.14       xfun_0.51         tidyselect_1.2.1 
[41] rstudioapi_0.17.1 knitr_1.49        htmltools_0.5.8.1 nlme_3.1-167     
[45] rmarkdown_2.29    compiler_4.4.1   
library(tidyverse)
library(faraway)

1 Generalized linear mixed models (GLMM)

  • So far we studied linear mixed models (LMM), where the responses are continuous and normally distributed. It is natural to extend LMM to handle nonnormally distributed responses.

  • Recall that, in GLM, the distribution of \(Y_i\) is from the exponential family of distributions of form \[ f(y_i \mid \theta_i, \phi) = \exp \left[ \frac{y \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right]. \] If we use the canonical link, then \[ \theta_i = g(\mu_i) = \eta_i, \] where \(\mu_i = \mathbb{E} Y_i\). Now let \[ \eta_i = \mathbf{x}_i^T \boldsymbol{\beta} + \mathbf{z}_i^T \boldsymbol{\gamma}_i, \] where \(\boldsymbol{\beta}\) is the fixed effects and \(\boldsymbol{\gamma}\) is the random effects with density \(h(\gamma \mid \boldsymbol{\Sigma})\). (Typically we assume multivariate normal with mean 0 and covariance \(\boldsymbol{\Sigma}\).) Then the likelihood is \[ L(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \prod_{i=1}^n \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma} \] and the log-likelihood is \[ \ell(\boldsymbol{\beta}, \phi, \boldsymbol{\Sigma}) = \sum_{i=1}^n \log \int f(y_i \mid \boldsymbol{\beta}, \phi, \boldsymbol{\gamma}) h(\boldsymbol{\gamma} \mid \boldsymbol{\Sigma}) \, d \boldsymbol{\gamma}. \] This model, combining GLM and mixed effects model, is called the generalized linear mixed effects model (GLMM).

2 Overview of estimation and inference methods

  • Estimation and inference of GLMM have been active research. We give an overview of a few commonly used methods.

  • Maximum likelihood estimate (MLE) is obtained by maximizing the log-likelihood function. However, each iteration of the optimization algorithm needs to evaluate numerical integration, which quickly becomes infeasible when the dimension of random effects \(\boldsymbol{\gamma}\) is large.

    • Gauss-Hermite quadrature can be used for numerical integration. It approximates the integral by a weighted sum of integrands at different knots. The more knots, the more accurate the approximation, but at higher computational expense.

    • Laplace approximation is a special case of Gauss-Hermite quadrature with only one knot. It is less accurate than Guass-Hermite quadrature but computationally cheaper.

  • Bayesian method is another approach for estimation and inference of GLMM. It’s not covered in this course due to time constraint. Interested students can study ELMR Chapter 12.

  • Penalized quasi-likelihood (PQL). In the IRWLS (iteratively reweighted least squares) procedure for estimating the GLM, each iteration uses the pseudo-response or working response \[ \mathbf{z}^{(t)} = \boldsymbol{\eta}^{(t)} + (\mathbf{W}^{(t)})^{-1} (\mathbf{y} - \widehat{\boldsymbol{\mu}}^{(t)}). \] It can be adapted to the mixed effects model. PQL has computational advantage but generally considered less accurate than other approaches.

  • Generalized estimation equations (GEE). To be discussed later.

3 Binary response example

  • ctsib data in EMLR studies the balance of 40 individuals.
    • Response: stability (1 = stable, 0 = instable).
    • Predictors: sex (binary), age (continuous), height (continuous), surface (binary, norm or foam), vision (factor, open or closed or dome).
    • Each subject is tested twice on a combination of surface and vision. 12 measurements per subject.
ctsib <- as_tibble(ctsib) %>%
  mutate(stable = ifelse(CTSIB == 1, 1, 0)) %>%
  print(n = 24)
# A tibble: 480 × 9
   Subject Sex     Age Height Weight Surface Vision CTSIB stable
     <int> <fct> <int>  <dbl>  <dbl> <fct>   <fct>  <int>  <dbl>
 1       1 male     22    176   68.2 norm    open       1      1
 2       1 male     22    176   68.2 norm    open       1      1
 3       1 male     22    176   68.2 norm    closed     2      0
 4       1 male     22    176   68.2 norm    closed     2      0
 5       1 male     22    176   68.2 norm    dome       1      1
 6       1 male     22    176   68.2 norm    dome       2      0
 7       1 male     22    176   68.2 foam    open       2      0
 8       1 male     22    176   68.2 foam    open       2      0
 9       1 male     22    176   68.2 foam    closed     2      0
10       1 male     22    176   68.2 foam    closed     2      0
11       1 male     22    176   68.2 foam    dome       2      0
12       1 male     22    176   68.2 foam    dome       2      0
13       2 male     22    181   67.6 norm    open       1      1
14       2 male     22    181   67.6 norm    open       1      1
15       2 male     22    181   67.6 norm    closed     2      0
16       2 male     22    181   67.6 norm    closed     2      0
17       2 male     22    181   67.6 norm    dome       2      0
18       2 male     22    181   67.6 norm    dome       2      0
19       2 male     22    181   67.6 foam    open       2      0
20       2 male     22    181   67.6 foam    open       2      0
21       2 male     22    181   67.6 foam    closed     3      0
22       2 male     22    181   67.6 foam    closed     3      0
23       2 male     22    181   67.6 foam    dome       3      0
24       2 male     22    181   67.6 foam    dome       3      0
# ℹ 456 more rows
summary(ctsib)
    Subject          Sex           Age            Height          Weight      
 Min.   : 1.00   female:240   Min.   :18.00   Min.   :142.0   Min.   : 44.20  
 1st Qu.:10.75   male  :240   1st Qu.:21.75   1st Qu.:166.8   1st Qu.: 60.65  
 Median :20.50                Median :25.50   Median :173.0   Median : 68.00  
 Mean   :20.50                Mean   :26.80   Mean   :172.1   Mean   : 71.14  
 3rd Qu.:30.25                3rd Qu.:33.00   3rd Qu.:180.2   3rd Qu.: 83.50  
 Max.   :40.00                Max.   :38.00   Max.   :190.0   Max.   :102.40  
 Surface       Vision        CTSIB           stable      
 foam:240   closed:160   Min.   :1.000   Min.   :0.0000  
 norm:240   dome  :160   1st Qu.:2.000   1st Qu.:0.0000  
            open  :160   Median :2.000   Median :0.0000  
                         Mean   :1.919   Mean   :0.2375  
                         3rd Qu.:2.000   3rd Qu.:0.0000  
                         Max.   :4.000   Max.   :1.0000  
  • Graphical summary.
subsum <- ctsib %>% 
  group_by(Subject) %>%
  summarise(Height = Height[1], 
            Weight = Weight[1], 
            stable = mean(stable), 
            Age    = Age[1], 
            Sex    = Sex[1])
subsum %>%  
  ggplot() + 
  geom_point(mapping = aes(x = Height, y = stable))

subsum %>%  
  ggplot() + 
  geom_point(mapping = aes(x = Weight, y = stable))

subsum %>%  
  ggplot() + 
  geom_point(mapping = aes(x = Age, y = stable))

subsum %>%  
  ggplot() + 
  geom_boxplot(mapping = aes(x = Sex, y = stable))

ctsib %>%
  group_by(Subject, Surface) %>%
  summarize(stable = mean(stable)) %>%
  ggplot() + 
  geom_boxplot(mapping = aes(x = Surface, y = stable)) + 
  scale_y_log10()
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Warning: Removed 39 rows containing non-finite outside the scale range
(`stat_boxplot()`).

ctsib %>%
  group_by(Subject, Vision) %>%
  summarize(stable = mean(stable)) %>%
  ggplot() + 
  geom_boxplot(mapping = aes(x = Vision, y = stable)) + 
  scale_y_log10()
Warning in scale_y_log10(): log-10 transformation introduced infinite values.
Warning: Removed 63 rows containing non-finite outside the scale range
(`stat_boxplot()`).

  • GLM analysis. What are the issues with the GLM here?
gf <- glm(stable ~ Sex + Age + Height + Weight + Surface + Vision, 
          family = binomial,
          data   = ctsib)
summary(gf)

Call:
glm(formula = stable ~ Sex + Age + Height + Weight + Surface + 
    Vision, family = binomial, data = ctsib)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  7.277449   3.803987   1.913 0.055734 .  
Sexmale      1.401577   0.516231   2.715 0.006627 ** 
Age          0.002521   0.024307   0.104 0.917390    
Height      -0.096413   0.026837  -3.593 0.000327 ***
Weight       0.043503   0.018002   2.417 0.015665 *  
Surfacenorm  3.967515   0.447179   8.872  < 2e-16 ***
Visiondome   0.363753   0.383218   0.949 0.342516    
Visionopen   3.187501   0.416001   7.662 1.83e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 526.25  on 479  degrees of freedom
Residual deviance: 295.20  on 472  degrees of freedom
AIC: 311.2

Number of Fisher Scoring iterations: 6
  • GLMM analysis. We treat subjects as random. First, estimation by Laplace approximation.
library(lme4)

modlap <- 
  glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
        family = binomial,
        data   = ctsib)
summary(modlap)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |  
    Subject)
   Data: ctsib

      AIC       BIC    logLik -2*log(L)  df.resid 
    247.6     285.1    -114.8     229.6       471 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-5.0149 -0.1266 -0.0176 -0.0005  4.8664 

Random effects:
 Groups  Name        Variance Std.Dev.
 Subject (Intercept) 8.197    2.863   
Number of obs: 480, groups:  Subject, 40

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  9.920823  13.107103   0.757   0.4491    
Sexmale      2.825340   1.746985   1.617   0.1058    
Age         -0.003642   0.079837  -0.046   0.9636    
Height      -0.151012   0.089924  -1.679   0.0931 .  
Weight       0.058925   0.061593   0.957   0.3387    
Surfacenorm  7.524401   1.164373   6.462 1.03e-10 ***
Visiondome   0.683933   0.529837   1.291   0.1968    
Visionopen   6.321076   1.074736   5.882 4.07e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) Sexmal Age    Height Weight Srfcnr Visndm
Sexmale      0.476                                          
Age         -0.170  0.096                                   
Height      -0.957 -0.392  0.050                            
Weight       0.373 -0.366 -0.172 -0.561                     
Surfacenorm  0.010  0.195 -0.010 -0.135  0.059              
Visiondome  -0.012  0.029  0.001 -0.026  0.015  0.115       
Visionopen   0.013  0.202 -0.007 -0.134  0.051  0.857  0.362
optimizer (Nelder_Mead) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.128162 (tol = 0.002, component 1)
Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
  • GLMM estimation by Gauss-Hermite quadrature with 25 knots.
library(lme4)

modgh <- 
  glmer(stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 | Subject),
        nAGQ   = 25,
        family = binomial,
        data   = ctsib)
summary(modgh)
Generalized linear mixed model fit by maximum likelihood (Adaptive
  Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
 Family: binomial  ( logit )
Formula: stable ~ Sex + Age + Height + Weight + Surface + Vision + (1 |  
    Subject)
   Data: ctsib

      AIC       BIC    logLik -2*log(L)  df.resid 
    247.9     285.5    -115.0     229.9       471 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.8842 -0.1386 -0.0197 -0.0007  4.9019 

Random effects:
 Groups  Name        Variance Std.Dev.
 Subject (Intercept) 7.195    2.682   
Number of obs: 480, groups:  Subject, 40

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) 16.171235  12.722352   1.271   0.2037    
Sexmale      3.096645   1.696160   1.826   0.0679 .  
Age         -0.006672   0.076457  -0.087   0.9305    
Height      -0.192263   0.088956  -2.161   0.0307 *  
Weight       0.075159   0.059103   1.272   0.2035    
Surfacenorm  7.285384   1.055154   6.905 5.04e-12 ***
Visiondome   0.675899   0.527365   1.282   0.2000    
Visionopen   6.088911   0.972404   6.262 3.81e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) Sexmal Age    Height Weight Srfcnr Visndm
Sexmale      0.509                                          
Age         -0.166  0.094                                   
Height      -0.960 -0.429  0.047                            
Weight       0.386 -0.324 -0.169 -0.570                     
Surfacenorm  0.166  0.256 -0.013 -0.281  0.147              
Visiondome   0.007  0.034  0.000 -0.044  0.027  0.113       
Visionopen   0.169  0.265 -0.008 -0.280  0.138  0.829  0.382
optimizer (Nelder_Mead) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.288533 (tol = 0.002, component 1)
Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
  • GLMM estimation by PQL.
library(MASS)

modpql <- glmmPQL(stable ~ Sex + Age + Height + Weight + Surface + Vision,
                  random = ~ 1 | Subject,
                  family = binomial,
                  data   = ctsib)
summary(modpql)
Linear mixed-effects model fit by maximum likelihood
  Data: ctsib 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:    3.060712 0.5906232

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects:  stable ~ Sex + Age + Height + Weight + Surface + Vision 
                Value Std.Error  DF   t-value p-value
(Intercept) 15.571494 13.498304 437  1.153589  0.2493
Sexmale      3.355340  1.752614  35  1.914478  0.0638
Age         -0.006638  0.081959  35 -0.080992  0.9359
Height      -0.190819  0.092023  35 -2.073601  0.0455
Weight       0.069467  0.062857  35  1.105155  0.2766
Surfacenorm  7.724078  0.573578 437 13.466492  0.0000
Visiondome   0.726464  0.325933 437  2.228873  0.0263
Visionopen   6.485257  0.543980 437 11.921876  0.0000
 Correlation: 
            (Intr) Sexmal Age    Height Weight Srfcnr Visndm
Sexmale      0.488                                          
Age         -0.164  0.110                                   
Height      -0.963 -0.388  0.041                            
Weight       0.368 -0.374 -0.168 -0.555                     
Surfacenorm  0.051  0.116  0.023 -0.114  0.055              
Visiondome  -0.003  0.011  0.004 -0.017  0.011  0.087       
Visionopen   0.056  0.125  0.026 -0.116  0.049  0.788  0.377

Standardized Within-Group Residuals:
          Min            Q1           Med            Q3           Max 
-7.3825387635 -0.2333403346 -0.0233564300 -0.0004216629  9.9310682774 

Number of Observations: 480
Number of Groups: 40 

4 Count response example

  • epilepsy data in ELMR.
    • Response: seizures (number of seizures of each epilepsy patient).
    • Variables: treat (binary, 1 = treatment or 0 = control group), expind (binary, 0 = baseline or 1 = experiment phase), timeadj (length of phase).
epilepsy <- as_tibble(epilepsy) %>% 
  mutate(period = rep(0:4, 59), 
         drug   = factor(c("placebo", "treatment"))[treat + 1],
         phase  = factor(c("baseline", "experiment"))[expind + 1]) %>%
  print()
# A tibble: 295 × 9
   seizures    id treat expind timeadj   age period drug    phase     
      <dbl> <int> <dbl>  <dbl>   <dbl> <dbl>  <int> <fct>   <fct>     
 1       11     1     0      0       8    31      0 placebo baseline  
 2        5     1     0      1       2    31      1 placebo experiment
 3        3     1     0      1       2    31      2 placebo experiment
 4        3     1     0      1       2    31      3 placebo experiment
 5        3     1     0      1       2    31      4 placebo experiment
 6       11     2     0      0       8    30      0 placebo baseline  
 7        3     2     0      1       2    30      1 placebo experiment
 8        5     2     0      1       2    30      2 placebo experiment
 9        3     2     0      1       2    30      3 placebo experiment
10        3     2     0      1       2    30      4 placebo experiment
# ℹ 285 more rows
summary(epilepsy)
    seizures            id         treat            expind       timeadj   
 Min.   :  0.00   Min.   : 1   Min.   :0.0000   Min.   :0.0   Min.   :2.0  
 1st Qu.:  3.00   1st Qu.:15   1st Qu.:0.0000   1st Qu.:1.0   1st Qu.:2.0  
 Median :  6.00   Median :30   Median :1.0000   Median :1.0   Median :2.0  
 Mean   : 12.86   Mean   :30   Mean   :0.5254   Mean   :0.8   Mean   :3.2  
 3rd Qu.: 14.50   3rd Qu.:45   3rd Qu.:1.0000   3rd Qu.:1.0   3rd Qu.:2.0  
 Max.   :151.00   Max.   :59   Max.   :1.0000   Max.   :1.0   Max.   :8.0  
      age            period         drug            phase    
 Min.   :18.00   Min.   :0   placebo  :140   baseline  : 59  
 1st Qu.:22.00   1st Qu.:1   treatment:155   experiment:236  
 Median :28.00   Median :2                                   
 Mean   :28.85   Mean   :2                                   
 3rd Qu.:35.00   3rd Qu.:3                                   
 Max.   :57.00   Max.   :4                                   
  • Numerical summary. Is there any difference in the number of seizures per week in the placebo and treatment groups?
epilepsy %>%
  group_by(drug, phase) %>%
  summarize(rate = mean(seizures / timeadj)) %>%
  xtabs(formula = rate ~ phase + drug)
            drug
phase         placebo treatment
  baseline   3.848214  3.955645
  experiment 4.303571  3.983871
table(epilepsy$treat, epilepsy$phase)
   
    baseline experiment
  0       28        112
  1       31        124
  • Graphical summary.
epilepsy %>%
  ggplot() + 
  geom_line(mapping = aes(x = period, y = seizures, linetype = drug, group = id)) +
  xlim(1, 4) + 
  scale_y_sqrt(breaks = (0:10)^2) + 
  theme(legend.position = "top", legend.direction = "horizontal")

ratesum <- epilepsy %>%
  group_by(id, phase, drug) %>%
  summarize(rate = mean(seizures / timeadj))

spread(ratesum, phase, rate) %>%
  ggplot() + 
  geom_point(mapping = aes(x = baseline, y = experiment, shape = drug)) + 
  scale_x_sqrt() + scale_y_sqrt() + 
  geom_abline(intercept = 0, slope = 1) + 
  theme(legend.position = "top", legend.direction = "horizontal")

  • In following analysis we remove patient 49, who seems to have unsually high rate of seizures.
epilo <- filter(epilepsy, id != 49)
  • We start with GLM analysis even though the model is not correct due to the grouping of observations.
modglm <- glm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat), 
              family   = poisson,
              data     = epilo)
summary(modglm)

Call:
glm(formula = seizures ~ offset(log(timeadj)) + expind + treat + 
    I(expind * treat), family = poisson, data = epilo)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        1.34761    0.03406  39.566  < 2e-16 ***
expind             0.11184    0.04688   2.386    0.017 *  
treat             -0.10682    0.04863  -2.197    0.028 *  
I(expind * treat) -0.30238    0.06971  -4.338 1.44e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2485.1  on 289  degrees of freedom
Residual deviance: 2411.5  on 286  degrees of freedom
AIC: 3449.7

Number of Fisher Scoring iterations: 5
  • GLMM analysis by Gauss-Hermite quadrature with 25 knots.
library(lme4)

modgh <- glmer(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat) + (1 | id), 
               nAGQ     = 25,
               family   = poisson,
               data     = epilo)
summary(modgh)
Generalized linear mixed model fit by maximum likelihood (Adaptive
  Gauss-Hermite Quadrature, nAGQ = 25) [glmerMod]
 Family: poisson  ( log )
Formula: seizures ~ offset(log(timeadj)) + expind + treat + I(expind *  
    treat) + (1 | id)
   Data: epilo

      AIC       BIC    logLik -2*log(L)  df.resid 
    877.7     896.1    -433.9     867.7       285 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.8724 -0.8482 -0.1722  0.5697  9.8941 

Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 0.515    0.7176  
Number of obs: 290, groups:  id, 58

Fixed effects:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        1.035998   0.141256   7.334 2.23e-13 ***
expind             0.111838   0.046877   2.386    0.017 *  
treat             -0.008152   0.196524  -0.041    0.967    
I(expind * treat) -0.302387   0.069714  -4.338 1.44e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) expind treat 
expind      -0.175              
treat       -0.718  0.126       
I(xpnd*trt)  0.118 -0.672 -0.173
  • GLMM estimation by PQL.
library(MASS)

modpql <- glmmPQL(seizures ~ offset(log(timeadj)) + expind + treat + I(expind * treat),
                  random = ~ 1 | id,
                  family = poisson,
                  data   = epilo)
summary(modpql)
Linear mixed-effects model fit by maximum likelihood
  Data: epilo 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | id
        (Intercept) Residual
StdDev:   0.6820012 1.605385

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects:  seizures ~ offset(log(timeadj)) + expind + treat + I(expind *      treat) 
                       Value  Std.Error  DF   t-value p-value
(Intercept)        1.0761832 0.09990514 230 10.772050  0.0000
expind             0.1125119 0.07412152 230  1.517939  0.1304
I(expind * treat) -0.3037615 0.10819095 230 -2.807642  0.0054
 Correlation: 
                  (Intr) expind
expind            -0.198       
I(expind * treat) -0.014 -0.656

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.2934834 -0.5649468 -0.1492931  0.3224895  6.3123337 

Number of Observations: 290
Number of Groups: 58 

5 GEE (generalized estimation equation)

  • The quasi-likelihood (quasi-binomial, quasi-Poisson, etc) approach can be generalized to handle correlated, nonnormal responses.

  • Let \(\mathbf{Y}_i\) be the response vector of \(i\)-th individual. \[\begin{eqnarray*} \mathbb{E} \mathbf{Y}_i &=& \boldsymbol{\mu}_i \\ g(\boldsymbol{\mu}_i) &=& \mathbf{X}_i \boldsymbol{\beta} \\ \mathbf{V}_i &=& \text{Var}(\mathbf{Y}_i) = \phi \mathbf{A}_i^{1/2} \mathbf{R}_i(\boldsymbol{\alpha}) \mathbf{A}_i^{1/2}, \end{eqnarray*}\] where \(\mathbf{A}_i = \text{diag}(a(\boldsymbol{\mu}))\) captures the individual variances and \(\mathbf{R}_i(\boldsymbol{\alpha})\) is a working correlation matrix.

  • Commonly used working correlation are

    • compound symmetry, or equicorrelation, or exchangeable correlation: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & & \rho \\ \vdots & & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \]

    • Autoregressive model: \[ \mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & & \rho^{n-2} \\ \rho^2 & \rho & 1 & & \vdots \\ \vdots & & & \ddots & \\ \rho^{n-1} & \cdots & & \rho & 1 \end{pmatrix} \]

    • Unstructured correlation matrix

  • Given estimates of \(\phi\) and \(\boldsymbol{\alpha}\), we solve the estimation equation \[ \sum_i [D_{\boldsymbol{\beta}} \boldsymbol{\mu}_i(\boldsymbol{\beta})]^T \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0}. \]

  • Let’s revisit the ctsib (stability) data set. We use the exchangeable correlation, or equivalently, compound symmetry. Standard errors and Wald test are constructed using the sandwich estimators. \(\boldsymbol{\beta}\) estimates in GEE are about half the size of those from GLMM. The scale.fix = TRUE instructs the function geeglm not estimate the dispersion parameter \(\phi\), since for binary response the dispersion is always 1.

library(geepack)

modgeep <- geeglm(stable    ~ Sex + Age + Height + Weight + Surface + Vision,
                  id        = Subject, 
                  corstr    = "exchangeable",
                  scale.fix = TRUE,
                  data      = ctsib,
                  family    = binomial(link = "logit"))
summary(modgeep)

Call:
geeglm(formula = stable ~ Sex + Age + Height + Weight + Surface + 
    Vision, family = binomial(link = "logit"), data = ctsib, 
    id = Subject, corstr = "exchangeable", scale.fix = TRUE)

 Coefficients:
            Estimate  Std.err   Wald Pr(>|W|)    
(Intercept)  8.62332  5.91992  2.122   0.1452    
Sexmale      1.64488  0.90347  3.315   0.0687 .  
Age         -0.01205  0.04802  0.063   0.8019    
Height      -0.10211  0.04239  5.801   0.0160 *  
Weight       0.04365  0.03399  1.649   0.1991    
Surfacenorm  3.91632  0.56682 47.738 4.87e-12 ***
Visiondome   0.35888  0.40403  0.789   0.3744    
Visionopen   3.17990  0.46063 47.657 5.08e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = exchangeable 
Scale is fixed.

  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha   0.2185 0.04467
Number of clusters:   40  Maximum cluster size: 12 
  • Let’s revisit the epilepsy data set.
modgeep <- geeglm(seizures ~ offset(log(timeadj)) + expind + treat + I(expind*treat),
                  id       = id, 
                  family   = poisson,
                  corstr   = "ar1", 
                  data     = epilepsy,
                  subset  = (id != 49))
summary(modgeep)

Call:
geeglm(formula = seizures ~ offset(log(timeadj)) + expind + treat + 
    I(expind * treat), family = poisson, data = epilepsy, subset = (id != 
    49), id = id, corstr = "ar1")

 Coefficients:
                  Estimate Std.err  Wald Pr(>|W|)    
(Intercept)         1.3138  0.1616 66.10  4.4e-16 ***
expind              0.1509  0.1108  1.86    0.173    
treat              -0.0797  0.1983  0.16    0.688    
I(expind * treat)  -0.3987  0.1745  5.22    0.022 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = ar1 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     10.6    2.35
  Link = identity 

Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.783  0.0519
Number of clusters:   58  Maximum cluster size: 5