Bonus question #2

Biostat 200C

Author

Dr. Jin Zhou @ UCLA

Published

April 21, 2025

rm(list=ls())
sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.3.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.4.1    fastmap_1.2.0     cli_3.6.4        
 [5] tools_4.4.1       htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10      
 [9] rmarkdown_2.29    knitr_1.49        jsonlite_1.9.1    xfun_0.51        
[13] digest_0.6.37     rlang_1.1.5       evaluate_1.0.3   
library(tidyverse)
library(faraway)
library(gtsummary)

1 Binomial model

  • We model \(Y_i\) as a Binomial random variable with batch size \(m_i\) and “success” probability \(p_i\) \[ \mathbb{P}(Y_i = y_i) = \binom{m_i}{y_i} p_i^{y_i} (1 - p_i)^{m_i - y_i}. \]

  • The parameter \(p_i\) is linked to the predictors \(X_1, \ldots, X_{q}\) via an inverse link function \[ p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}, \] where \(\eta_i\) is the linear predictor or systematic component \[ \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{q} x_{iq} = \mathbf{x}_i^T \boldsymbol{\beta} \]

  • The log-likelihood is \[\begin{eqnarray*} \ell(\boldsymbol{\beta}) &=& \sum_{i=1}^n \left[ y_i \log p_i + (m_i - y_i) \log (1 - p_i) + \log \binom{m_i}{y_i} \right] \\ &=& \sum_{i=1}^n \left[ y_i \eta_i - m_i \log ( 1 + e^{\eta_i}) + \log \binom{m_i}{y_i} \right] \\ &=& \sum_{i=1}^n \left[ y_i \cdot \mathbf{x}_i^T \boldsymbol{\beta} - m_i \log ( 1 + e^{\mathbf{x}_i^T \boldsymbol{\beta}}) + \log \binom{m_i}{y_i} \right]. \end{eqnarray*}\]
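The log-likelihood above translates directly into code. The following is a minimal sketch; the helper name `binom_loglik` and the toy design matrix are hypothetical, introduced only for illustration.

```r
# Sketch: a hypothetical helper that evaluates the Binomial
# log-likelihood at a coefficient vector beta.
binom_loglik <- function(beta, X, y, m) {
  eta <- drop(X %*% beta)  # linear predictor x_i' beta
  # y_i * eta_i - m_i * log(1 + exp(eta_i)) + log choose(m_i, y_i)
  sum(y * eta - m * log1p(exp(eta)) + lchoose(m, y))
}

# Toy check: at beta = 0 every p_i = 1/2, so the log-likelihood
# reduces to sum(-m_i * log(2) + log choose(m_i, y_i))
X <- cbind(1, c(0, 1)); y <- c(1, 2); m <- c(2, 3)
binom_loglik(c(0, 0), X, y, m)
```

Using `log1p(exp(eta))` rather than `log(1 + exp(eta))` is slightly more stable numerically for large negative `eta`.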

2 Binomial model vs Bernoulli model

  • The Bernoulli model in ELMR 2 is a special case with all batch sizes \(m_i = 1\).

  • Conversely, the Binomial model is equivalent to a Bernoulli model with \(\sum_i m_i\) observations, or a Bernoulli model with observation weights \((y_i, m_i - y_i)\).

  • Q1. Reformat the data to have \(n = \sum_i m_i\) rows, with a binary outcome in each row, so that the data represent \(n = \sum_i m_i\) individual Bernoulli trials.

  • Q2. Refit the model using logistic regression (glm) on the reformatted data above (no weights are needed), and show it is equivalent to the Binomial model and to the Bernoulli model with weights.

  • Q3. Write out the log-likelihood for the above model and show it is equivalent to those of the Binomial model and the Bernoulli model with weights.
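The equivalence in Q1–Q2 can be sketched on a toy grouped dataset (toy numbers, not the beetle data below; the names `grouped`, `bernoulli`, and `outcome` are hypothetical): expand each covariate pattern into \(y_i\) ones and \(m_i - y_i\) zeros, then fit both ways.

```r
# Toy grouped data: two covariate patterns, 10 trials each
grouped <- data.frame(x = c(0, 1), y = c(3, 7), m = c(10, 10))

# One Bernoulli row per trial: y_i successes then m_i - y_i failures
bernoulli <- data.frame(
  x       = rep(grouped$x, times = grouped$m),
  outcome = unlist(Map(function(y, m) rep(c(1L, 0L), c(y, m - y)),
                       grouped$y, grouped$m)))

# Grouped Binomial fit vs ungrouped Bernoulli fit
fit_grouped   <- glm(cbind(y, m - y) ~ x, family = binomial, data = grouped)
fit_bernoulli <- glm(outcome ~ x, family = binomial, data = bernoulli)
all.equal(coef(fit_grouped), coef(fit_bernoulli))
```

The coefficient estimates agree because the two log-likelihoods differ only by the constant \(\sum_i \log \binom{m_i}{y_i}\), which does not involve \(\boldsymbol{\beta}\).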

3 Binomial model fit

The following table shows the numbers of beetles dead after five hours' exposure to gaseous carbon disulphide at various concentrations.

(beetle <- tibble(dose = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
                 beetles = c(59, 60, 62, 56, 63, 59, 62, 60),
                 killed = c(6, 13, 18, 28, 52, 53, 61, 60)))
# A tibble: 8 × 3
   dose beetles killed
  <dbl>   <dbl>  <dbl>
1  1.69      59      6
2  1.72      60     13
3  1.76      62     18
4  1.78      56     28
5  1.81      63     52
6  1.84      59     53
7  1.86      62     61
8  1.88      60     60
  1. Let \(x_i\) be dose, \(n_i\) be the number of beetles, and \(y_i\) be the number killed. Plot the proportions \(p_i = y_i/n_i\) against dose \(x_i\).

  2. We fit a logistic model to understand the relationship between dose and the probability of being killed. Write out the logistic model and the associated log-likelihood function.

  3. Derive the scores, \(\mathbf{U}\), with respect to the parameters in the above logistic model. (Hint: there are two parameters.)

  4. Derive the information matrix, \(\mathcal{I}\). (Hint: it is a \(2 \times 2\) matrix.)

  5. We use the Newton-Raphson method to obtain the maximum likelihood estimates (MLEs). Show that the MLEs are obtained by solving the iterative equation

\[ \mathcal{I}^{(m-1)}\mathbf{b}^{(m)} = \mathcal{I}^{(m-1)}\mathbf{b}^{(m-1)}+ \mathbf{U}^{(m-1)} \] where \(\mathbf{b}\) is the vector of estimates.

  6. Starting with \(\mathbf{b}^{(0)} = \mathbf{0}\), implement this algorithm and show that successive iterations are
Iteration   \(\beta_1\)   \(\beta_2\)   log-likelihood
        0         0            0           -333.404
        1       -37.856       21.337       -200.010
        2       -53.853       30.384       -187.274
        3
        4
        5
        6       -60.717       34.270       -186.235
  • Suppose the model has converged after 6 steps. For this final model, calculate the deviance. What distribution does the deviance follow?

  • Does the model fit the data well? Justify your answer.
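The iterative equation above can be sketched as follows. This is a minimal implementation under the stated setup (intercept \(\beta_1\) and dose slope \(\beta_2\), starting from \(\mathbf{b}^{(0)} = \mathbf{0}\)); for the canonical logit link the observed and expected information coincide, so Newton-Raphson and Fisher scoring are the same here. The beetle data are restated so the chunk is self-contained.

```r
# Beetle data (same as the tibble above)
beetle <- data.frame(
  dose    = c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839),
  beetles = c(59, 60, 62, 56, 63, 59, 62, 60),
  killed  = c(6, 13, 18, 28, 52, 53, 61, 60))

X <- cbind(1, beetle$dose)   # design matrix: intercept and dose
y <- beetle$killed
m <- beetle$beetles

b <- c(0, 0)                 # starting value b^(0) = 0
for (iter in 1:6) {
  p    <- plogis(drop(X %*% b))                   # fitted probabilities
  U    <- crossprod(X, y - m * p)                 # score vector
  info <- crossprod(X, X * (m * p * (1 - p)))     # information matrix X' W X
  b    <- drop(b + solve(info, U))                # b^(m) = b^(m-1) + I^{-1} U
}
b   # compare with the iteration-6 row of the table above
```

The final iterate should agree closely with `glm(cbind(killed, beetles - killed) ~ dose, family = binomial, data = beetle)`, whose deviance and residual degrees of freedom then answer the last two questions.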