Display system information and load the tidyverse and faraway packages
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.31 R6_2.5.1 jsonlite_1.8.4 evaluate_0.21
## [5] cachem_1.0.8 rlang_1.1.1 cli_3.6.1 rstudioapi_0.14
## [9] jquerylib_0.1.4 bslib_0.4.2 rmarkdown_2.21 tools_4.2.3
## [13] xfun_0.39 yaml_2.3.7 fastmap_1.1.1 compiler_4.2.3
## [17] htmltools_0.5.5 knitr_1.42 sass_0.4.6
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(faraway)
A saturated model
Let \(L(\mathbf{b})\) be the maximum value of the likelihood function for the model of interest, and let \(L(\mathbf{b}_{max})\) be the maximum value attainable, which is achieved by the saturated model.
Then the likelihood ratio \[ \lambda = \frac{L(\mathbf{b}_{max})}{L(\mathbf{b})} \] provides a way of assessing the goodness of fit of the model.
In terms of log-likelihoods, \[ \log \lambda = \ell(\mathbf{b}_{max})- \ell(\mathbf{b}) \]
Twice this difference is called the deviance: \[ D = 2 \log \frac{L (\mathbf{b}_{max})}{L(\mathbf{b})} = 2[\ell(\mathbf{b}_{max})- \ell(\mathbf{b})] \]
In linear models under the normality assumption, for \(i=1,\ldots, n\), \[ \mbox{E}(Y_i) = \mu_i = \mathbf{x}_i'\boldsymbol{\beta}; \quad Y_i \sim N(\mu_i, \sigma^2) \] where the \(Y_i\) are independent. The log-likelihood function is \[ \ell(\boldsymbol{\beta}) = -\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\mu_i)^2-\frac{n}{2}\log(2\pi\sigma^2) \]
For the saturated model, we use \(y_i\) itself as the fitted value of \(\mu_i\), so the sum-of-squares term vanishes and \[ \ell(\mathbf{b}_{max}) = -\frac{n}{2}\log(2\pi \sigma^2) \]
For a proposed model with \(p<n\) parameters, the maximum likelihood estimator is \[ \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \]
Therefore the deviance is \[ D = 2[\ell(\mathbf{b}_{max}) - \ell(\mathbf{b})] = \frac{1}{\sigma^2}\sum_{i=1}^n (y_i -\mathbf{x}_i' \mathbf{b})^2 = \frac{1}{\sigma^2} (\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) \] And we know \[ \mathbf{y}-\mathbf{X}\mathbf{b} = (\mathbf{I}-\mathbf{H})\mathbf{y} \] where \(\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\) is the hat matrix, and \[ (\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b}) = \mathbf{y}'(\mathbf{I}-\mathbf{H})\mathbf{y} \] Since the rank of \(\mathbf{I}-\mathbf{H}\) is \(n-p\), \(D\) has a chi-squared distribution with \(n-p\) degrees of freedom and non-centrality parameter \((\mathbf{X}\boldsymbol{\beta})'(\mathbf{I}-\mathbf{H})(\mathbf{X}\boldsymbol{\beta})/\sigma^2\). But \((\mathbf{I}-\mathbf{H})\mathbf{X} = \mathbf{0}\), so \(D\) has the central \(\chi^2(n-p)\) distribution exactly.
The quantity \[ \sigma^2D = \sum_{i=1}^n(y_i-\widehat{\mu}_i)^2 \] where \(\widehat{\mu}_i= \mathbf{x}_i' \mathbf{b}\), is the residual sum of squares; \(D\) itself, with the dispersion divided out, is what is sometimes called the scaled deviance. So if the model fits the data well, then \(D\sim \chi^2(n-p)\). The expected value of a random variable with the \(\chi^2(n-p)\) distribution is \(n-p\), so the expected value of \(D\) is \(n-p\).
This provides an estimate of \(\sigma^2\): \[ \widehat{\sigma}^2 = \frac{\sum_{i=1}^{n}(y_i-\widehat{\mu}_i)^2}{n-p} \] or, as our lecture notes put it, “an unbiased estimate of the error variance \(\sigma^2\) is” \(\text{RSS}/\text{df}\), giving \[ \widehat{\sigma} = \sqrt{\frac{\text{RSS}}{\text{df}}} \] Some programs output the scaled deviance for a Normal linear model and call \(\widehat{\sigma}^2\) the scale parameter.
The deviance is also related to the sum of squares of the standardized residuals: \[ \sum_{i=1}^n r_i^2 = \frac{1}{\widehat{\sigma}^2}\sum_{i=1}^n(y_i-\widehat{\mu}_i)^2 \] where \(\widehat{\sigma}^2\) is an estimate of \(\sigma^2\). This provides a rough rule of thumb for the overall magnitude of the standardized residuals: if the model fits well, so that \(D\sim \chi^2(n-p)\), you would expect \(\sum_{i=1}^n r_i^2 \approx n-p\). We illustrate with the gavote election data from the faraway package.
# Derive the undercount and the Gore/Bush vote shares
gavote <- gavote %>%
  mutate(undercount = (ballots - votes) / ballots) %>%
  rename(usage = rural) %>%
  mutate(pergore = gore / votes,
         perbush = bush / votes)
lmod <- lm(undercount ~ pergore + perAA, gavote)
sumlmod <- summary(lmod)
The deviance of a linear model is \(\sigma^2 D\), the residual sum of squares:
RSS <- sum(residuals(sumlmod)^2)
RSS
## [1] 0.09324918
deviance(lmod)
## [1] 0.09324918
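As a sanity check on the derivation above, we can form the hat matrix explicitly and confirm that \(\mathbf{y}'(\mathbf{I}-\mathbf{H})\mathbf{y}\) reproduces the RSS. A minimal sketch; the names X, H, and y are ours:
X <- model.matrix(lmod)
H <- X %*% solve(t(X) %*% X) %*% t(X)        # hat matrix
y <- gavote$undercount
drop(t(y) %*% (diag(nrow(X)) - H) %*% y)     # same value as RSS above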
We can recover \(\widehat{\sigma}\) from the lm fit:
sigma <- sqrt(RSS / df.residual(lmod))
sigma
## [1] 0.02444895
sumlmod$sigma
## [1] 0.02444895
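Echoing the rule of thumb above, the sum of squared standardized residuals should be close to \(n-p\). A quick sketch using base R's rstandard(), which standardizes each residual by \(\widehat{\sigma}\sqrt{1-h_{ii}}\):
sum(rstandard(lmod)^2)  # roughly n - p if the model fits
df.residual(lmod)       # n - p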
Now consider binomial responses: if \(Y_1, \ldots, Y_n\) are independent with \(Y_i\sim \text{Bin}(m_i, p_i)\), then the log-likelihood is \[ \ell(\boldsymbol{\beta}) = \sum_{i=1}^n \left[y_i\log p_i - y_i\log(1-p_i) + m_i\log(1-p_i) + \log {m_i\choose y_i}\right] \]
For the saturated model, the \(y_i\)’s are independent and each \(p_i\in (0,1)\) is unrestricted, so there are \(n\) parameters, i.e., \(\boldsymbol{\beta} = (p_1,\ldots, p_n)'\). The MLEs are \(\widehat{p}_i = y_i/m_i\), so \(\mathbf{b}_{max} = (y_1/m_1,\ldots, y_n/m_n)'\).
For a proposed logistic model, the \(y_i\)’s are independent and \[ p_i = \frac{e^{\mathbf{x}_i'\boldsymbol{\beta}}}{1+e^{\mathbf{x}_i'\boldsymbol{\beta}}} \] so there are \(q+1\) parameters in total (an intercept plus \(q\) predictors). The MLE of each \(p_i\) is \[ \widehat{p}_i = \frac{e^{\mathbf{x}_i'\mathbf{b}}}{1+e^{\mathbf{x}_i'\mathbf{b}}} \]
So the deviance is
\[\begin{eqnarray*}
D &=& 2 \sum_i \left[ y_i \log(y_i/m_i) + (m_i - y_i) \log(1 - y_i/m_i) \right] \\
& & {} - 2 \sum_i \left[ y_i \log \widehat{p}_i + (m_i - y_i) \log(1 - \widehat{p}_i) \right] \\
&=& 2 \sum_i \left[ y_i \log \frac{y_i}{\widehat{y}_i} + (m_i - y_i) \log \frac{m_i - y_i}{m_i - \widehat{y}_i} \right],
\end{eqnarray*}\]
where \(\widehat{y}_i = m_i \widehat{p}_i\) are the fitted values from the model.
When the \(Y_i\) are truly binomial and the \(m_i\) are relatively large, the deviance \(D\) is approximately \(\chi_{n-q-1}^2\) if the model is correct. A rule of thumb for using this asymptotic approximation is \(m_i \ge 5\). A large p-value from comparing \(D\) to \(\chi_{n-q-1}^2\) indicates that the model fits adequately.
We can define the Pearson residuals as \[ r_i^{\text{P}} = \frac{y_i - m_i \widehat{p}_i}{\sqrt{m_i \widehat{p}_i (1-\widehat{p}_i)}}, \] where the denominator is the estimated standard deviation of \(Y_i\). Then \[ X^2 = \sum_i \left(r_i^{\text{P}} \right)^2 \] in analogy to \(\text{RSS} = \sum_i r_i^2\) in linear regression.
Similarly, we can define deviance residuals, as in binary outcome models: \[ r_i^{\text{D}} = \text{sign}(y_i - m_i\widehat{p}_i)\sqrt{2\left[y_i\ln\left(\frac{y_i}{m_i\widehat{p}_i}\right) + (m_i-y_i)\ln\left(\frac{m_i-y_i}{m_i-m_i\widehat{p}_i}\right)\right]} \] so that \(\sum_i (r_i^{\text{D}})^2 = D\). We illustrate with the O-ring damage data (orings) from the faraway package.
binmod <- glm(cbind(damage, 6 - damage) ~ temp, family = binomial, data = orings)
sumbinmod <- summary(binmod)
sumbinmod
##
## Call:
## glm(formula = cbind(damage, 6 - damage) ~ temp, family = binomial,
## data = orings)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9529 -0.7345 -0.4393 -0.2079 1.9565
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 11.66299 3.29626 3.538 0.000403 ***
## temp -0.21623 0.05318 -4.066 4.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 38.898 on 22 degrees of freedom
## Residual deviance: 16.912 on 21 degrees of freedom
## AIC: 33.675
##
## Number of Fisher Scoring iterations: 6
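As a check, we can compute the residual deviance by hand from the formula above, using the convention \(0 \log 0 = 0\) for groups with \(y_i = 0\). A minimal sketch; the helper dterm and the other names are ours:
y <- orings$damage
m <- 6                                     # six O-rings per launch
yhat <- m * fitted(binmod)                 # fitted counts m_i * p-hat_i
dterm <- function(a, b) ifelse(a == 0, 0, a * log(a / b))
2 * sum(dterm(y, yhat) + dterm(m - y, m - yhat))  # should match the residual deviance above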
Plotting the Pearson residuals against the deviance residuals, with the 45-degree line for reference:
tibble(rpearson = residuals(binmod, type = "pearson"),
       rdeviance = residuals(binmod, type = "deviance")) %>%
  ggplot() +
  geom_point(mapping = aes(x = rpearson, y = rdeviance)) +
  geom_abline(intercept = 0, slope = 1) +
  xlab("Pearson residual") +
  ylab("Deviance residual")