Display system information and load tidyverse and faraway packages
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.4 compiler_4.4.1 fastmap_1.2.0 cli_3.6.4
[5] tools_4.4.1 htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10
[9] rmarkdown_2.29 knitr_1.49 jsonlite_1.9.1 xfun_0.51
[13] digest_0.6.37 rlang_1.1.5 evaluate_1.0.3
library(tidyverse)
library(faraway)
1 Introduction
The pulp data set comes from an experiment that measured paper brightness for four different shift operators.
pulp <- as_tibble(pulp) %>% print(n = Inf)
# A tibble: 20 × 2
bright operator
<dbl> <fct>
1 59.8 a
2 60 a
3 60.8 a
4 60.8 a
5 59.8 a
6 59.8 b
7 60.2 b
8 60.4 b
9 59.9 b
10 60 b
11 60.7 c
12 60.7 c
13 60.5 c
14 60.9 c
15 60.3 c
16 61 d
17 60.8 d
18 60.6 d
19 60.5 d
20 60.5 d
Graphical summary:
pulp %>%
  ggplot(mapping = aes(x = operator, y = bright)) +
  geom_jitter(width = 0.1, height = 0.0) +
  labs(x = "Operator", y = "Bright")
The research question is whether there is any difference in the paper brightness produced by the different shift operators.
We can answer this question using the classical ANOVA (analysis of variance) model \[
y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1,\ldots,4, j = 1,\ldots,5,
\] where \(\epsilon_{ij}\) are assumed to be iid \(N(0,\sigma^2)\).
options(contrasts = c("contr.sum", "contr.poly"))
lmod <- lm(bright ~ operator, data = pulp)
summary(lmod)
Call:
lm(formula = bright ~ operator, data = pulp)
Residuals:
Min 1Q Median 3Q Max
-0.440 -0.195 -0.070 0.175 0.560
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.40000 0.07289 828.681 <2e-16 ***
operator1 -0.16000 0.12624 -1.267 0.223
operator2 -0.34000 0.12624 -2.693 0.016 *
operator3 0.22000 0.12624 1.743 0.101
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.326 on 16 degrees of freedom
Multiple R-squared: 0.4408, Adjusted R-squared: 0.3359
F-statistic: 4.204 on 3 and 16 DF, p-value: 0.02261
The contrast contr.sum forces the coefficients for the 4 operators to sum to 0 (when using the dummy/one-hot coding). So the coefficient for the 4th operator is \(-(-0.16 - 0.34 + 0.22) = 0.28\).
The factor operator is significant with p-value 0.023.
anova(lmod, test ="F")
Analysis of Variance Table
Response: bright
Df Sum Sq Mean Sq F value Pr(>F)
operator 3 1.34 0.44667 4.2039 0.02261 *
Residuals 16 1.70 0.10625
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The hypothesis being tested in the ANOVA is whether these four particular operators produce the same mean paper brightness. This is called a fixed effects model because the effects \(\alpha_i\) are assumed to be fixed.
Suppose instead the research question is: based on this data set, are the operators in the population all the same? We then arrive at a random effects model \[
y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1,\ldots,4, j = 1,\ldots,5,
\] where we assume
\(\alpha_i\) are iid from \(N(0, \sigma_{\alpha}^2)\),
\(\epsilon_{ij}\) are iid \(N(0,\sigma_{\epsilon}^2)\), and
\(\alpha_i\) and \(\epsilon_{ij}\) are jointly independent.
This is called a random effects model because now the effects \(\alpha_i\) are assumed to be random. The parameters in this random effects model are \(\mu\), \(\sigma_\alpha^2\), and \(\sigma_\epsilon^2\). The latter two are also called the variance component parameters. To test the research hypothesis, we would test \(H_0: \sigma_\alpha^2=0\).
Differences between a fixed effects model and a random effects model:
the interpretation is different,
a random effects model can be more parsimonious (fewer parameters) than the corresponding fixed effects model, and
observations in a random effects model are correlated.
In this chapter, we study estimation and inference for mixed effects models.
2 Mixed effects model
Traditionally, linear models can be divided into three categories:
Fixed effects model: \(\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}\), where \(\boldsymbol{\beta}\) is fixed.
Random effects model: \(\mathbf{y} = \mathbf{Z} \boldsymbol{\gamma} + \boldsymbol{\epsilon}\), where \(\boldsymbol{\gamma}\) is random.
Mixed effects model or mixed model: \(\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma} + \boldsymbol{\epsilon}\), where \(\boldsymbol{\beta}\) is fixed and \(\boldsymbol{\gamma}\) is random.
In a mixed effects model \[\begin{eqnarray*}
\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma} + \boldsymbol{\epsilon}
\end{eqnarray*}\] where
\(\mathbf{X} \in \mathbb{R}^{n \times p}\) is a design matrix for fixed effects \(\boldsymbol{\beta} \in \mathbb{R}^p\),
\(\mathbf{Z} \in \mathbb{R}^{n \times q}\) is a design matrix for random effects \(\boldsymbol{\gamma} \in \mathbb{R}^q\),
The most common assumption is \(\boldsymbol{\epsilon} \sim N(\mathbf{0}_n, \sigma^2 \mathbf{I})\), \(\boldsymbol{\gamma} \sim N(\mathbf{0}_q, \boldsymbol{\Sigma})\), and \(\boldsymbol{\epsilon}\) is independent of \(\boldsymbol{\gamma}\).
The primary goals of the mixed model (aka variance components model) are
estimation and testing of the fixed effects \(\boldsymbol{\beta}\),
estimation and testing of the variance component parameters, and
prediction.
3 Example: one-way ANOVA
For the one-way ANOVA random effects model with \(a\) levels and \(n_i\) observations in level \(i\), \[
y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i=1,\ldots,a, \; j = 1,\ldots,n_i,
\] we recognize it as a mixed effects model \[\begin{eqnarray*}
\mathbf{y} = \mathbf{1}_{\sum_i n_i} \mu + \begin{pmatrix}
\mathbf{1}_{n_1} & & \\
& \vdots & \\
& & \mathbf{1}_{n_a}
\end{pmatrix} \boldsymbol{\gamma} + \boldsymbol{\epsilon},
\end{eqnarray*}\] where \(\boldsymbol{\gamma} = (\alpha_1, \ldots, \alpha_a)^T \sim N(\mathbf{0}_a, \sigma_{\alpha}^2 \mathbf{I}_a)\) and \(\boldsymbol{\epsilon} \sim N(\mathbf{0}_{\sum_i n_i}, \sigma_{\epsilon}^2 \mathbf{I})\) are independent. Note in \(\mathbf{Z}\) we have one column for each level.
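This \(\mathbf{Z}\) can be constructed explicitly in R; a small sketch for the pulp layout (\(a = 4\) operators, \(n_i = 5\); the variable names are illustrative):

```r
# Build the random-effects design matrix Z for the one-way layout:
# one indicator column per operator level.
operator <- factor(rep(letters[1:4], each = 5))
Z <- model.matrix(~ operator - 1) # 20 x 4 indicator matrix
dim(Z)      # 20 rows (observations), 4 columns (levels)
colSums(Z)  # 5 observations per level
```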
How do we estimate parameters \(\mu\), \(\sigma_{\alpha}^2\), and \(\sigma_{\epsilon}^2\)?
4 Estimation
4.1 ANOVA estimator
Consider the one-way ANOVA case and assume a balanced design: \(n_1 = \cdots = n_a = n\). That is, each level has the same number of observations \(n\); for example, \(n = 5\) in the pulp example.
Because \(\mathbb{E}\, Y_{ij} = \mu\), we can estimate \(\mu\) by the average of the \(y_{ij}\):
mean(pulp$bright)
[1] 60.4
To estimate the variance component parameters \(\sigma_\alpha^2\) and \(\sigma_\epsilon^2\), the familiar ANOVA table gives the partition \[\begin{eqnarray*}
\text{SST} &=& \text{SSE} + \text{SSA} \\
\sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{\cdot \cdot})^2 &=& \sum_{i=1}^a \sum_{j=1}^n (y_{ij} - \bar{y}_{i \cdot})^2 + \sum_{i=1}^a \sum_{j=1}^n (\bar{y}_{i \cdot} - \bar{y}_{\cdot \cdot})^2.
\end{eqnarray*}\] Now we have (show in HW5) \[\begin{eqnarray*}
\mathbb{E} (\text{SSE}) &=& a(n-1) \sigma_{\epsilon}^2 \\
\mathbb{E} (\text{SSA}) &=& (a-1)(n \sigma_{\alpha}^2 + \sigma_{\epsilon}^2),
\end{eqnarray*}\] which can be solved to obtain estimators \[\begin{eqnarray*}
\widehat{\sigma}_{\epsilon}^2 &=& \frac{\text{SSE}}{a(n-1)} = \text{MSE}, \\
\widehat{\sigma}_{\alpha}^2 &=& \frac{\text{SSA}/(a-1) - \widehat{\sigma}_{\epsilon}^2}{n} = \frac{\text{MSA} - \text{MSE}}{n}.
\end{eqnarray*}\]
For the pulp data, the ANOVA table is
(aovmod <- aov(bright ~ operator, data = pulp) %>% summary())
Df Sum Sq Mean Sq F value Pr(>F)
operator 3 1.34 0.4467 4.204 0.0226 *
Residuals 16 1.70 0.1062
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
str(aovmod)
List of 1
$ :Classes 'anova' and 'data.frame': 2 obs. of 5 variables:
..$ Df : num [1:2] 3 16
..$ Sum Sq : num [1:2] 1.34 1.7
..$ Mean Sq: num [1:2] 0.447 0.106
..$ F value: num [1:2] 4.2 NA
..$ Pr(>F) : num [1:2] 0.0226 NA
- attr(*, "class")= chr [1:2] "summary.aov" "listof"
When MSA < MSE, we obtain a negative \(\widehat{\sigma}_{\alpha}^2\).
The ANOVA approach is hard to generalize to unbalanced ANOVA or more complicated designs.
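As a check, the ANOVA estimators can be computed by hand from the raw pulp measurements (hardcoded below so the snippet is self-contained):

```r
# ANOVA estimators for the balanced one-way layout:
# sigma_eps^2-hat = MSE, sigma_alpha^2-hat = (MSA - MSE) / n
y <- matrix(c(59.8, 60.0, 60.8, 60.8, 59.8,   # operator a
              59.8, 60.2, 60.4, 59.9, 60.0,   # operator b
              60.7, 60.7, 60.5, 60.9, 60.3,   # operator c
              61.0, 60.8, 60.6, 60.5, 60.5),  # operator d
            nrow = 5)
a <- ncol(y); n <- nrow(y)
ybar_i <- colMeans(y)                  # level means
sse <- sum(sweep(y, 2, ybar_i)^2)      # within-level sum of squares: 1.70
ssa <- n * sum((ybar_i - mean(y))^2)   # between-level sum of squares: 1.34
mse <- sse / (a * (n - 1))             # 0.10625
msa <- ssa / (a - 1)                   # 0.44667
(sigma2_alpha <- (msa - mse) / n)      # about 0.0681
```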
4.2 Maximum likelihood estimation (MLE)
For the mixed effects model \[\begin{eqnarray*}
\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma} + \boldsymbol{\epsilon},
\end{eqnarray*}\] where \(\boldsymbol{\epsilon} \sim N(\mathbf{0}_n, \sigma^2 \mathbf{I})\) and \(\boldsymbol{\gamma} \sim N(\mathbf{0}_q, \boldsymbol{\Sigma})\) are independent of each other, we have \[
\mathbf{Y} \sim N(\mathbf{X} \boldsymbol{\beta}, \mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I}).
\] So the likelihood is \[
\frac{1}{(2\pi)^{n/2} \det(\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I})^{1/2}} e^{- \frac 12 (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})^T (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I})^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})}
\] and the log-likelihood is \[
\ell(\boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2) = - \frac n2 \log(2\pi) - \frac 12 \log \det (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I}) - \frac 12 (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})^T (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I})^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}).
\]
We can maximize the log-likelihood function to obtain MLE.
Exercise (HW5): Derive the MLE for the balanced one-way ANOVA example.
One drawback of MLE is that it produces biased estimates of the variance component parameters \(\sigma^2\) and \(\boldsymbol{\Sigma}\). E.g., in the linear regression case (fixed effects model), the MLE of \(\sigma^2\) is \[
\widehat{\sigma}_{\text{MLE}}^2 = \frac{\text{RSS}}{n},
\] whereas an unbiased estimator is \[
\widehat{\sigma}^2 = \frac{\text{RSS}}{n-p},
\] where \(p\) is the number of parameters.
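To see this bias numerically, here is a small simulation sketch (illustrative only; not part of the pulp analysis):

```r
# Simulate linear regressions with true sigma^2 = 1 and compare the
# average of the MLE RSS/n with the unbiased RSS/(n - p).
set.seed(1)
n <- 20; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
est <- replicate(2000, {
  y <- X %*% rnorm(p) + rnorm(n)        # true sigma^2 = 1
  rss <- sum(lm.fit(X, y)$residuals^2)
  c(mle = rss / n, unbiased = rss / (n - p))
})
rowMeans(est) # MLE averages near (n - p)/n = 0.8; unbiased near 1
```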
Let’s find the MLE for the pulp data using the lme4 package. The syntax (1 | operator) specifies that the data are grouped by operator, and the 1 indicates that the random effect is constant within each group. The default method is REML, so we set REML = FALSE to compute the MLE instead.
library(lme4)
smod <- lmer(bright ~ 1 + (1 | operator), data = pulp, REML = FALSE)
summary(smod)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: bright ~ 1 + (1 | operator)
Data: pulp
AIC BIC logLik -2*log(L) df.resid
22.5 25.5 -8.3 16.5 17
Scaled residuals:
Min 1Q Median 3Q Max
-1.50554 -0.78116 -0.06353 0.65850 1.56232
Random effects:
Groups Name Variance Std.Dev.
operator (Intercept) 0.04575 0.2139
Residual 0.10625 0.3260
Number of obs: 20, groups: operator, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 60.4000 0.1294 466.7
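As a check of the log-likelihood formula above, we can plug the reported MLE (\(\widehat{\mu} = 60.4\), \(\widehat{\sigma}_{\alpha}^2 = 0.04575\), \(\widehat{\sigma}_{\epsilon}^2 = 0.10625\)) into \(\ell\) directly (a self-contained sketch with the pulp data hardcoded):

```r
# Evaluate the marginal log-likelihood at the MLE from the lmer output
y <- c(59.8, 60.0, 60.8, 60.8, 59.8, 59.8, 60.2, 60.4, 59.9, 60.0,
       60.7, 60.7, 60.5, 60.9, 60.3, 61.0, 60.8, 60.6, 60.5, 60.5)
Z <- kronecker(diag(4), matrix(1, 5, 1))     # one indicator column per operator
mu <- 60.4; s2a <- 0.04575; s2e <- 0.10625   # MLE from the summary above
V <- s2a * Z %*% t(Z) + s2e * diag(20)       # Var(y) = Z Sigma Z^T + sigma^2 I
r <- y - mu
ld <- as.numeric(determinant(V)$modulus)     # log det of V
ll <- -0.5 * (20 * log(2 * pi) + ld + sum(r * solve(V, r)))
ll  # about -8.26, matching logLik in the summary above
```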
4.3 Restricted MLE (REML)
The restricted maximum likelihood (REML) method tries to reduce the bias in variance component estimates.
Assume \(\mathbf{X}\) has full column rank \(p\). Let \(\mathbf{K} \in \mathbb{R}^{n \times (n-p)}\) be a basis of the space \({\cal N}(\mathbf{X}^T)\), which is orthogonal to \({\cal C}(\mathbf{X})\). Then \[
\mathbf{K}^T \mathbf{Y} \sim N(\mathbf{0}_{n-p}, \mathbf{K}^T (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I}) \mathbf{K}).
\] We estimate the variance component parameters \((\sigma^2, \boldsymbol{\Sigma})\) by MLE based on the transformed data \((\mathbf{K}^T \mathbf{y}, \mathbf{K}^T \mathbf{Z})\), and then estimate the fixed effects \(\boldsymbol{\beta}\) by generalized least squares. It can be shown that the REML estimate does not depend on the choice of \(\mathbf{K}\).
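One concrete choice of \(\mathbf{K}\) (illustrative; the REML estimate does not depend on it) takes the last \(n - p\) columns of the full QR decomposition of \(\mathbf{X}\). A sketch for the pulp model, where \(\mathbf{X} = \mathbf{1}_{20}\):

```r
# Columns of K form an orthonormal basis of N(X^T), orthogonal to C(X).
n <- 20; p <- 1
X <- matrix(1, n, p)                # intercept-only design
Q <- qr.Q(qr(X), complete = TRUE)   # full n x n orthogonal factor
K <- Q[, (p + 1):n, drop = FALSE]   # n x (n - p)
max(abs(crossprod(K, X)))           # essentially 0
```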
Let’s find the REML for the pulp data using lme4.
library(lme4)
mmod <- lmer(bright ~ 1 + (1 | operator), data = pulp, REML = TRUE)
summary(mmod)
Linear mixed model fit by REML ['lmerMod']
Formula: bright ~ 1 + (1 | operator)
Data: pulp
REML criterion at convergence: 18.6
Scaled residuals:
Min 1Q Median 3Q Max
-1.4666 -0.7595 -0.1244 0.6281 1.6012
Random effects:
Groups Name Variance Std.Dev.
operator (Intercept) 0.06808 0.2609
Residual 0.10625 0.3260
Number of obs: 20, groups: operator, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 60.4000 0.1494 404.2
We find that the REML estimate is exactly the same as the ANOVA estimate. Exercise (HW5): show this.
5 Inference
5.1 LRT and adjusted F test for fixed effects
If we compare two nested models that differ only in their fixed effects, we can use the standard likelihood ratio test (LRT). Remember that we have to use the MLE (not REML) for the LRT.
F tests for fixed effects need adjusted degrees of freedom (the Kenward-Roger adjusted F test for REML fits), as implemented in the pbkrtest package.
5.2 Parametric bootstrap
Testing the variance component parameters, e.g., \(H_0: \sigma_{\alpha}^2 = 0\), can be hard because of the boundary condition (the estimator has to be nonnegative), so the conventional \(\chi^2\) null distribution for the LRT can be wrong. E.g., for the pulp data, the LRT using the conventional \(\chi_1^2\) null distribution gives a non-significant p-value.
# fishy result using LRT
pchisq(lrtstat, 1, lower.tail = FALSE)
[1] 0.1090199
The idea of the parametric bootstrap is to generate new \(y\) from the fitted null model many times. For each simulation replicate, we calculate the LRT statistic. The p-value is then estimated by the proportion of simulation replicates whose LRT statistic exceeds the observed value \(2.5684\).
B <- 1000
lrstat <- numeric(B)
set.seed(123) # for reproducibility
for (i in 1:B) {
  by <- unlist(simulate(nullmod))
  bnull <- lm(by ~ 1)
  # balt <- refitML(smod, by)
  balt <- suppressMessages(lmer(by ~ 1 + (1 | operator), pulp, REML = FALSE))
  lrstat[i] <- as.numeric(2 * (logLik(balt, REML = FALSE) - logLik(bnull)))
}
Advantages of the parametric bootstrap:
It is not restricted to the MLE; it applies to many other estimators.
It does not rely on the dubious asymptotic \(\chi^2\) distribution.
It can be used to generate confidence intervals.
It applies to inference on both \(\boldsymbol{\beta}\) and the variance component parameters.
Drawback of the parametric bootstrap: it is computationally intensive.
5.3 Exact LRT and RLRT test
Another simulation-based method is provided by the RLRsim package.
library(RLRsim)
exactLRT(smod, nullmod)
simulated finite sample distribution of LRT. (p-value based on 10000
simulated values)
data:
LRT = 2.5684, p-value = 0.0241
exactRLRT(mmod)
simulated finite sample distribution of RLRT.
(p-value based on 10000 simulated values)
data:
RLRT = 3.4701, p-value = 0.0224
6 Estimate random effects
Consider the mixed effects model \[
\begin{eqnarray*}
\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma} + \boldsymbol{\epsilon},
\end{eqnarray*}
\] where \(\boldsymbol{\epsilon} \sim N(\mathbf{0}_n, \sigma^2 \mathbf{I})\) and \(\boldsymbol{\gamma} \sim N(\mathbf{0}_q, \boldsymbol{\Sigma})\) are independent of each other. The random effects \(\boldsymbol{\gamma}\) are random variables. It does not make sense to estimate \(\boldsymbol{\gamma}\).
However, if we take the Bayesian point of view, we can estimate it by its posterior mean. We have the likelihood \[
\mathbf{y} \mid \boldsymbol{\gamma} \sim N(\mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \boldsymbol{\gamma}, \sigma^2 \mathbf{I}_n)
\] and a prior distribution \[
\boldsymbol{\gamma} \sim N(\mathbf{0}_q, \boldsymbol{\Sigma}).
\] Then by the Bayes theorem, the posterior distribution is \[\begin{eqnarray*}
f(\boldsymbol{\gamma} \mid \mathbf{y}) &=& \frac{f(\mathbf{y} \mid \boldsymbol{\gamma}) \times f(\boldsymbol{\gamma})}{f(\mathbf{y})} \\
&=& ...
\end{eqnarray*}\] is a multivariate normal with mean (show in HW5) \[
\mathbb{E} (\boldsymbol{\gamma} \mid \mathbf{y}) = \boldsymbol{\Sigma} \mathbf{Z}^T (\mathbf{Z} \boldsymbol{\Sigma} \mathbf{Z}^T + \sigma^2 \mathbf{I})^{-1} (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}).
\]
We can use this posterior mean to estimate the random effects. For the pulp data, the estimates of the random effects are obtained by
ranef(mmod)$operator
(Intercept)
a -0.1219403
b -0.2591231
c 0.1676679
d 0.2133955
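As a sanity check, the posterior mean formula can be evaluated directly for the pulp data, plugging in the REML variance estimates from the earlier output (values hardcoded so the snippet is self-contained):

```r
# E(gamma | y) = Sigma Z^T (Z Sigma Z^T + sigma^2 I)^{-1} (y - X beta-hat)
y <- c(59.8, 60.0, 60.8, 60.8, 59.8,   # operator a
       59.8, 60.2, 60.4, 59.9, 60.0,   # operator b
       60.7, 60.7, 60.5, 60.9, 60.3,   # operator c
       61.0, 60.8, 60.6, 60.5, 60.5)   # operator d
Z <- kronecker(diag(4), matrix(1, 5, 1))       # one indicator column per level
Sigma <- 0.06808 * diag(4)                     # REML sigma_alpha^2-hat
V <- Z %*% Sigma %*% t(Z) + 0.10625 * diag(20) # REML sigma_eps^2-hat
gamma_hat <- Sigma %*% t(Z) %*% solve(V, y - mean(y))
round(drop(gamma_hat), 4) # -0.1219 -0.2591 0.1677 0.2134, matching ranef(mmod)
```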
Compare these to the coefficients from fixed effects model:
(cc <- model.tables(aov(bright ~ operator, data = pulp)))
Tables of effects
operator
operator
a b c d
-0.16 -0.34 0.22 0.28
We find that the estimated random effects are uniformly smaller in magnitude than the corresponding fixed effect estimates:
cc[[1]]$operator / ranef(mmod)$operator
(Intercept)
a 1.312117
b 1.312117
c 1.312117
d 1.312117
That’s why Bayesian estimates are often called shrinkage estimators. Exercise: prove this in HW5 for the balanced one-way ANOVA example.
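Numerically, the constant ratio above agrees with the balanced-case shrinkage factor \(n \widehat{\sigma}_{\alpha}^2 / (n \widehat{\sigma}_{\alpha}^2 + \widehat{\sigma}_{\epsilon}^2)\); a quick check with the REML estimates (a numerical illustration, not the HW5 proof):

```r
n <- 5                                          # observations per operator
sigma2_alpha <- 0.06808; sigma2_eps <- 0.10625  # REML estimates
shrink <- n * sigma2_alpha / (n * sigma2_alpha + sigma2_eps)
c(shrink, 1 / shrink)  # about 0.7621 and 1.3121, matching the ratio above
```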
7 Prediction
For the observed levels (operators we already know), the best linear unbiased predictors (BLUPs) are obtained by
fixef(mmod) + ranef(mmod)$operator
(Intercept)
a 60.27806
b 60.14088
c 60.56767
d 60.61340
or
predict(mmod, newdata = data.frame(operator = "a"))
1
60.27806
For a new operator, the best prediction we can make is the fixed-effect intercept.
Diagnostic plots for random effects models usually use the residuals calculated from the predicted random effects. These residuals are regarded as estimates of \(\epsilon\).
In the penicillin data set, we want to study how yield depends on treat (the production process) and blend (the corn steep liquor blend). It is natural to treat treat as a fixed effect and blend as a blocking variable (random effect).
penicillin
treat blend yield
1 A Blend1 89
2 B Blend1 88
3 C Blend1 97
4 D Blend1 94
5 A Blend2 84
6 B Blend2 77
7 C Blend2 92
8 D Blend2 79
9 A Blend3 81
10 B Blend3 87
11 C Blend3 87
12 D Blend3 85
13 A Blend4 87
14 B Blend4 92
15 C Blend4 89
16 D Blend4 84
17 A Blend5 79
18 B Blend5 81
19 C Blend5 80
20 D Blend5 88
Graphical summary
penicillin %>%
  ggplot() +
  geom_point(mapping = aes(x = treat, y = yield, color = blend))
penicillin %>%
  ggplot() +
  geom_point(mapping = aes(x = blend, y = yield, color = treat))
Classical two-way ANOVA:
op <- options(contrasts = c("contr.sum", "contr.poly"))
lmod <- aov(yield ~ blend + treat, data = penicillin)
summary(lmod)
Df Sum Sq Mean Sq F value Pr(>F)
blend 4 264 66.00 3.504 0.0407 *
treat 3 70 23.33 1.239 0.3387
Residuals 12 226 18.83
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p-value estimated by parametric bootstrap
(pval <- mean(lrstat > 4.0474))
[1] 0.352
with standard error
sqrt(pval * (1 - pval) / B)
[1] 0.01510285
The pbkrtest package automates this parametric bootstrap procedure (PBtest), along with the LRT and other tests for fixed effects. The LRT and parametric bootstrap results are similar to what we obtained. Note the F statistic here is the LRT statistic divided by its degrees of freedom, which is assumed to be F-distributed.