We model \(Y_i\) as a Binomial random variable with batch size \(m_i\) and “success” probability \(p_i\): \[
\mathbb{P}(Y_i = y_i) = \binom{m_i}{y_i} p_i^{y_i} (1 - p_i)^{m_i - y_i}.
\]
The parameter \(p_i\) is linked to the predictors \(X_1, \ldots, X_{q}\) via the inverse logit link function: \[
p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}},
\] where \(\eta_i\) is the linear predictor (systematic component): \[
\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{q} x_{iq} = \mathbf{x}_i^T \boldsymbol{\beta}.
\]
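As a minimal sketch of the model just defined, the following Python snippet evaluates the inverse logit link and the Binomial log-probability for a single observation. The function names, the covariate vector `x_i` (with a leading 1 for the intercept), and the coefficient values are illustrative assumptions, not part of the assignment.

```python
import math

def inv_logit(eta):
    """Inverse logit link: maps a linear predictor eta to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def binomial_log_pmf(y, m, x, beta):
    """log P(Y = y) for Y ~ Binomial(m, p) with p = inv_logit(x' beta)."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor
    p = inv_logit(eta)
    return (math.log(math.comb(m, y))
            + y * math.log(p)
            + (m - y) * math.log(1.0 - p))

# Hypothetical example: intercept plus one predictor.
x_i = [1.0, 0.5]    # the leading 1.0 multiplies the intercept beta_0
beta = [-1.0, 2.0]  # made-up coefficients, chosen so that eta_i = 0
log_prob = binomial_log_pmf(3, 10, x_i, beta)
print(log_prob)
```

With these made-up values the linear predictor is zero, so \(p_i = 0.5\) and the result reduces to the log of a fair-coin Binomial probability.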
The Bernoulli model in ELMR 2 is a special case with all batch sizes \(m_i = 1\).
Conversely, the Binomial model is equivalent to a Bernoulli model with \(\sum_i m_i\) observations, or a Bernoulli model with observation weights \((y_i, m_i - y_i)\).
Q1. Reformat the data to have \(n = \sum_i m_i\) rows, one per Bernoulli trial, each with a binary outcome.
Q2. Refit the model by logistic regression (glm) on the reformatted data above (no weights are needed) and show that the fit is equivalent to the Binomial model and to the Bernoulli model with weights.
Q3. Write out the log-likelihood for the above model and show that it is equivalent to that of the Binomial model and of the Bernoulli model with weights.
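The idea behind these questions can be sketched numerically in Python. The snippet expands grouped rows into one row per trial and checks that the Binomial and Bernoulli log-likelihoods differ only by \(\sum_i \log \binom{m_i}{y_i}\), which does not involve \(\boldsymbol{\beta}\). The data, the single-predictor model, and all function names are hypothetical; this is an illustration under those assumptions, not a substitute for the glm fits the questions ask for.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def expand_to_bernoulli(x, m, y):
    """Turn grouped rows (x_i, m_i, y_i) into sum(m_i) rows (x_i, 0 or 1)."""
    rows = []
    for xi, mi, yi in zip(x, m, y):
        rows.extend([(xi, 1)] * yi)         # y_i successes
        rows.extend([(xi, 0)] * (mi - yi))  # m_i - y_i failures
    return rows

def binomial_loglik(beta, x, m, y):
    """Grouped log-likelihood, including the binomial coefficients."""
    total = 0.0
    for xi, mi, yi in zip(x, m, y):
        p = inv_logit(beta[0] + beta[1] * xi)
        total += (math.log(math.comb(mi, yi))
                  + yi * math.log(p) + (mi - yi) * math.log(1.0 - p))
    return total

def bernoulli_loglik(beta, rows):
    """Ungrouped log-likelihood, one term per Bernoulli trial."""
    total = 0.0
    for xi, zi in rows:
        p = inv_logit(beta[0] + beta[1] * xi)
        total += math.log(p) if zi == 1 else math.log(1.0 - p)
    return total

# Hypothetical grouped data (not the data from this assignment).
x = [0.0, 1.0, 2.0]
m = [4, 5, 6]
y = [1, 3, 5]
rows = expand_to_bernoulli(x, m, y)

# The gap between the two log-likelihoods should be the same for every beta.
constant = sum(math.log(math.comb(mi, yi)) for mi, yi in zip(m, y))
gaps = [binomial_loglik(b, x, m, y) - bernoulli_loglik(b, rows)
        for b in ([0.0, 0.0], [-1.0, 0.8], [0.3, -0.5])]
print(len(rows), constant, gaps)
```

Because the gap is constant in \(\boldsymbol{\beta}\), both likelihoods are maximized at the same estimates, which is why the two fits should agree on coefficients and standard errors even though their reported log-likelihoods differ.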
3 Binomial model fit
The following table shows the number of beetles dead after five hours of exposure to gaseous carbon disulphide at various concentrations.
Let \(x_i\) be the dose, \(n_i\) the number of beetles, and \(y_i\) the number killed. Plot the proportions \(p_i = y_i/n_i\) against dose \(x_i\).
We fit a logistic model to understand the relationship between dose and the probability of being killed. Write out the logistic model and the associated log-likelihood function.
Derive the scores, \(\mathbf{U}\), with respect to the parameters in the above logistic model. (Hint: there are two parameters.)
Derive the information matrix, \(\mathcal{I}\). (Hint: it is a \(2\times 2\) matrix.)
We use the Newton-Raphson method to obtain the maximum likelihood estimates (MLEs). Show that the MLEs are obtained by solving the iterative equation
\[
\mathcal{I}^{(m-1)}\mathbf{b}^{(m)} = \mathcal{I}^{(m-1)}\mathbf{b}^{(m-1)}+ \mathbf{U}^{(m-1)}
\] where \(\mathbf{b}\) is the vector of estimates.
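A brief sketch of where this equation comes from, assuming the usual Newton-Raphson update for maximizing the log-likelihood (for the logistic model with its canonical link, the observed and expected information coincide, so the same update is also Fisher scoring): \[
\mathbf{b}^{(m)} = \mathbf{b}^{(m-1)} + \left[\mathcal{I}^{(m-1)}\right]^{-1} \mathbf{U}^{(m-1)}.
\] Premultiplying both sides by \(\mathcal{I}^{(m-1)}\) gives the displayed form, which can be solved as a linear system for \(\mathbf{b}^{(m)}\) without explicitly inverting \(\mathcal{I}^{(m-1)}\).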
Starting with \(\mathbf{b}^{(0)} = 0\), implement this algorithm to show that the successive iterations are:
| Iterations | \(\beta_1\) | \(\beta_2\) | log-likelihood |
|---|---|---|---|
| 0 | 0 | 0 | -333.404 |
| 1 | -37.856 | 21.337 | -200.010 |
| 2 | -53.853 | 30.384 | -187.274 |
| 3 | | | |
| 4 | | | |
| 5 | | | |
| 6 | -60.717 | 34.270 | -186.235 |
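The iteration above can be sketched in plain Python for a two-parameter logistic model (intercept and dose). This is one possible implementation under stated assumptions, not the required solution: the function names are invented, the data are hypothetical stand-ins rather than the beetle table, and the log-likelihood is computed without the constant \(\sum_i \log \binom{n_i}{y_i}\), which is an assumption about how the table's last column is defined.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def loglik_kernel(b, x, n, y):
    """Binomial log-likelihood without the constant sum of log C(n_i, y_i)."""
    total = 0.0
    for xi, ni, yi in zip(x, n, y):
        eta = b[0] + b[1] * xi
        total += yi * eta - ni * math.log(1.0 + math.exp(eta))
    return total

def score_and_information(b, x, n, y):
    """Score vector U and 2x2 information matrix I evaluated at b."""
    u = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for xi, ni, yi in zip(x, n, y):
        p = inv_logit(b[0] + b[1] * xi)
        resid = yi - ni * p        # observed minus expected count
        w = ni * p * (1.0 - p)     # binomial variance weight
        u[0] += resid
        u[1] += resid * xi
        info[0][0] += w
        info[0][1] += w * xi
        info[1][1] += w * xi * xi
    info[1][0] = info[0][1]
    return u, info

def newton_raphson(x, n, y, steps=6):
    """Iterate I b_new = I b_old + U from b = (0, 0); return every iterate."""
    b = [0.0, 0.0]
    history = [(0, b[0], b[1], loglik_kernel(b, x, n, y))]
    for step in range(1, steps + 1):
        u, info = score_and_information(b, x, n, y)
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        # Solve the 2x2 system I * delta = U, then set b_new = b_old + delta.
        d0 = (info[1][1] * u[0] - info[0][1] * u[1]) / det
        d1 = (info[0][0] * u[1] - info[1][0] * u[0]) / det
        b = [b[0] + d0, b[1] + d1]
        history.append((step, b[0], b[1], loglik_kernel(b, x, n, y)))
    return history

# Hypothetical data for illustration only (NOT the beetle table).
dose = [0.0, 1.0, 2.0, 3.0]
batch = [10, 10, 10, 10]
killed = [2, 4, 6, 8]
history = newton_raphson(dose, batch, killed)
for step, b0, b1, ll in history:
    print(f"{step}  {b0: .3f}  {b1: .3f}  {ll: .3f}")
```

Substituting the dose, batch size, and kill counts from the assignment's table for the hypothetical lists should produce rows comparable to the table of iterations. A fixed number of steps is used here to mirror that table; a convergence check on the change in \(\mathbf{b}\) or in the log-likelihood would be a natural alternative.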
Suppose the model has converged after 6 steps. For this final model, calculate the deviance. What distribution does the deviance follow?
Does the model fit the data well? Justify your answer.
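For the deviance calculation, a small Python sketch of the Binomial deviance \(D = 2\sum_i \left[ y_i \log\frac{y_i}{\hat{y}_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i - \hat{y}_i} \right]\), with \(\hat{y}_i = n_i \hat{p}_i\), is given below. The counts and fitted probabilities are hypothetical placeholders, and the function names are invented for illustration.

```python
import math

def xlogy(a, b):
    """a * log(a / b), using the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def binomial_deviance(y, n, fitted_p):
    """Deviance of a Binomial model relative to the saturated model."""
    d = 0.0
    for yi, ni, pi in zip(y, n, fitted_p):
        fit = ni * pi  # fitted number of successes
        d += xlogy(yi, fit) + xlogy(ni - yi, ni - fit)
    return 2.0 * d

# Hypothetical counts and fitted probabilities (NOT the beetle results).
killed = [2, 4, 6, 8]
batch = [10, 10, 10, 10]
fitted = [0.25, 0.40, 0.60, 0.75]  # stand-in for a two-parameter fit

deviance = binomial_deviance(killed, batch, fitted)
df = len(killed) - 2  # number of groups minus number of parameters
print(deviance, df)
```

Under the usual large-sample approximation, and provided the batch sizes are reasonably large, the deviance of a correctly specified model is compared with a chi-squared distribution on `df` degrees of freedom; a value far in the upper tail suggests lack of fit.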