rm(list = ls())
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.4 compiler_4.4.1 fastmap_1.2.0 cli_3.6.4
[5] tools_4.4.1 htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10
[9] rmarkdown_2.29 knitr_1.49 jsonlite_1.9.1 xfun_0.51
[13] digest_0.6.37 rlang_1.1.5 evaluate_1.0.3
library(tidyverse)
library(faraway)
library(gtsummary)
Binomial model
We model \(Y_i\) as a Binomial random variable with batch size \(m_i\) and “success” probability \(p_i\): \[
\mathbb{P}(Y_i = y_i) = \binom{m_i}{y_i} p_i^{y_i} (1 - p_i)^{m_i - y_i}.
\]
The parameter \(p_i\) is linked to the predictors \(X_1, \ldots, X_{q}\) via an inverse link function \[
p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}},
\] where \(\eta_i\) is the linear predictor or systematic component \[
\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{q} x_{iq} = \mathbf{x}_i^T \boldsymbol{\beta}.
\]
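As a quick sanity check on the inverse link above, we can compute \(p_i = e^{\eta_i} / (1 + e^{\eta_i})\) by hand and compare it with R's built-in `plogis()` (a minimal sketch; the `eta` values here are arbitrary):

```r
# Inverse logit link: p = exp(eta) / (1 + exp(eta)); R provides this as plogis()
eta <- c(-2, 0, 2)                      # arbitrary linear-predictor values
p_manual <- exp(eta) / (1 + exp(eta))   # direct formula
p_plogis <- plogis(eta)                 # built-in equivalent
cbind(eta, p_manual, p_plogis)          # the two columns agree
```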
The log-likelihood is \[\begin{eqnarray*}
\ell(\boldsymbol{\beta}) &=& \sum_{i=1}^n \left[ y_i \log p_i + (m_i - y_i) \log (1 - p_i) + \log \binom{m_i}{y_i} \right] \\
&=& \sum_{i=1}^n \left[ y_i \eta_i - m_i \log ( 1 + e^{\eta_i}) + \log \binom{m_i}{y_i} \right] \\
&=& \sum_{i=1}^n \left[ y_i \cdot \mathbf{x}_i^T \boldsymbol{\beta} - m_i \log ( 1 + e^{\mathbf{x}_i^T \boldsymbol{\beta}}) + \log \binom{m_i}{y_i} \right].
\end{eqnarray*}\]
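This log-likelihood can be coded directly and checked against the value `glm()` reports at the MLE (a sketch assuming the `faraway::orings` data, where every launch has batch size \(m_i = 6\); the function name `loglik` is ours):

```r
library(faraway)   # provides the orings data (23 launches, 6 o-rings each)

# Binomial log-likelihood ell(beta) for the o-ring data, batch size m = 6
loglik <- function(beta, x = orings$temp, y = orings$damage, m = 6) {
  eta <- beta[1] + beta[2] * x                          # linear predictor
  sum(y * eta - m * log(1 + exp(eta)) + lchoose(m, y))  # sum over launches
}

fit <- glm(cbind(damage, 6 - damage) ~ temp, family = binomial, data = orings)
loglik(coef(fit))   # should agree with logLik(fit)
```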
The Bernoulli model in ELMR 2 is a special case with all batch sizes \(m_i = 1\). Conversely, the Binomial model is equivalent to a Bernoulli model with \(\sum_i m_i\) observations, or to a Bernoulli model with observation weights \((y_i, m_i - y_i)\).
Below we re-fit the Binomial model as an equivalent (weighted) Bernoulli model.
obs_wt <- c(rbind(orings$damage, 6 - orings$damage))
(orings_long <- orings %>%
  slice(rep(1:n(), each = 2)) %>%        # replicate each row twice
  mutate(damage = rep(c(1, 0), 23)) %>%  # alternating binary outcome per pair
  mutate(obs_wt = obs_wt) %>%            # weights (y_i, 6 - y_i)
  mutate(rn = row_number()) %>%
  rowwise() %>%
  slice(rep(1, obs_wt)) %>%              # expand each row obs_wt times
  select(-rn))
# A tibble: 138 × 3
# Rowwise:
temp damage obs_wt
<dbl> <dbl> <dbl>
1 53 1 5
2 53 1 5
3 53 1 5
4 53 1 5
5 53 1 5
6 53 0 1
7 57 1 1
8 57 0 5
9 57 0 5
10 57 0 5
# ℹ 128 more rows
glm(damage ~ temp, family = binomial, data = orings_long) %>%
  tbl_regression(intercept = TRUE)
Characteristic   log(OR)   95% CI         p-value
(Intercept)      12        5.6, 19        <0.001
temp             -0.22     -0.33, -0.12   <0.001
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
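The same coefficients can be obtained without expanding the data set row by row: fit the Binomial model directly via `cbind()`, or fit a weighted Bernoulli model using `glm()`'s `weights` argument on the two-rows-per-launch data (a sketch assuming `faraway::orings`; the names `fit_binom`, `fit_wt`, and `orings2` are ours):

```r
library(faraway)

# (1) Binomial model on the 23 launches: successes/failures via cbind()
fit_binom <- glm(cbind(damage, 6 - damage) ~ temp,
                 family = binomial, data = orings)

# (2) Weighted Bernoulli model: two rows per launch, weights (y_i, 6 - y_i)
orings2 <- data.frame(temp   = rep(orings$temp, each = 2),
                      damage = rep(c(1, 0), nrow(orings)),
                      wt     = c(rbind(orings$damage, 6 - orings$damage)))
fit_wt <- glm(damage ~ temp, family = binomial, weights = wt, data = orings2)

all.equal(coef(fit_binom), coef(fit_wt))   # the two fits agree
```

Zero-weight rows (launches with no damaged o-rings, or with all six damaged) contribute nothing to the likelihood, so they can stay in the data frame.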
We write out the log-likelihood for the reformatted data with binary outcomes. Given \(n = \sum_{i} m_i\) (where \(m_i\) is the batch size) data points \((y_i, \mbox{temp}_i)\), \(i = 1, \ldots, n\), the log-likelihood is \[\begin{eqnarray*}
\ell(\boldsymbol{\beta}) &=& \sum_i \log \left[p_i^{y_i} (1 - p_i)^{1 - y_i}\right] \\
&=& \sum_i \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \\
&=& \sum_i \left[ y_i \log \frac{e^{\eta_i}}{1 + e^{\eta_i}} + (1 - y_i) \log \frac{1}{1 + e^{\eta_i}} \right] \\
&=& \sum_i \left[ y_i \eta_i - \log (1 + e^{\eta_i}) \right] \\
&=& \sum_i \left[ y_i \cdot (\beta_0 + \beta_1 \mbox{temp}_i) - \log (1 + e^{\beta_0 + \beta_1 \mbox{temp}_i}) \right].
\end{eqnarray*}\]
Therefore,
\[\begin{eqnarray*}
\ell(\boldsymbol{\beta})
&=& \sum_i \left[ y_i \cdot (\beta_0 + \beta_1 \mbox{temp}_i) - \log (1 + e^{\beta_0 + \beta_1 \mbox{temp}_i}) \right]\\
&=& \mbox{obs\_wt}_1 * \left[ \mathbf{1} \cdot (\beta_0 + \beta_1 \mbox{temp}_1) - \log (1 + e^{\beta_0 + \beta_1 \mbox{temp}_1}) \right] + \\
&& \mbox{obs\_wt}_2 * \left[ \mathbf{0} \cdot (\beta_0 + \beta_1 \mbox{temp}_2) - \log (1 + e^{\beta_0 + \beta_1 \mbox{temp}_2}) \right] + \\
&& \cdots + \\
&& \mbox{obs\_wt}_{45} * \left[ \mathbf{1} \cdot (\beta_0 + \beta_1 \mbox{temp}_{45}) - \log (1 + e^{\beta_0 + \beta_1 \mbox{temp}_{45}}) \right] + \\
&& \mbox{obs\_wt}_{46} * \left[ \mathbf{0} \cdot (\beta_0 + \beta_1 \mbox{temp}_{46}) - \log (1 + e^{\beta_0 + \beta_1 \mbox{temp}_{46}}) \right].
\end{eqnarray*}\]
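The weighted sum written out above can be evaluated directly; up to the constant \(\sum_i \log \binom{6}{y_i}\), which does not involve \(\boldsymbol{\beta}\), it reproduces the Binomial log-likelihood (a sketch assuming `faraway::orings`; the names `bern_ll`, `wt`, `y`, and `temp` are ours):

```r
library(faraway)

# Weighted Bernoulli log-likelihood, matching the term-by-term expansion above
wt   <- c(rbind(orings$damage, 6 - orings$damage))  # weights (y_i, 6 - y_i)
y    <- rep(c(1, 0), nrow(orings))                  # alternating binary outcome
temp <- rep(orings$temp, each = 2)                  # each launch temp, twice

bern_ll <- function(beta) {
  eta <- beta[1] + beta[2] * temp
  sum(wt * (y * eta - log(1 + exp(eta))))           # sum of 46 weighted terms
}

fit <- glm(cbind(damage, 6 - damage) ~ temp, family = binomial, data = orings)
# Adding back the constant sum(lchoose(6, y_i)) recovers logLik(fit)
bern_ll(coef(fit)) + sum(lchoose(6, orings$damage))
```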