knitr::opts_chunk$set(echo = TRUE, cache = FALSE)
library(knitr)
library(ggplot2)
library(tidyverse)

Diagnostic of Logistic Regression
Logistic Regression with a Quadratic Term
Suppose the linear predictor in a logistic regression model has the following form:
\[ \eta = \beta_0 + \beta_1 x_1^2 + \beta_2 x_2 \] We set \(\beta_0 = -1\), \(\beta_1 = 3\), and \(\beta_2 = 5\), and generate a dataset of \(n = 1000\) observations from this model, where \(x_1\) and \(x_2\) are independent standard normal random variables. The response \(y_i\) is drawn from a Bernoulli distribution with success probability \(p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}\); no additional noise is added to the linear predictor.
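The success probability is the inverse logit of \(\eta\); in R this transform is also available as the built-in plogis, which handles large \(|\eta|\) more safely than the explicit ratio. A quick standalone check (illustrative only, separate from the simulation below):

```r
## plogis(eta) computes exp(eta) / (1 + exp(eta)), the inverse logit
eta_demo <- c(-2, 0, 3)
p_manual <- exp(eta_demo) / (1 + exp(eta_demo))
stopifnot(isTRUE(all.equal(p_manual, plogis(eta_demo))))
```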
### Generate data
## Set seed
set.seed(1017)
## Set parameters
n_1 <- 1000
beta_0 <- -1
beta_1 <- 3
beta_2 <- 5
## Generate data
x_1 <- rnorm(n_1)
x_2 <- rnorm(n_1)
eta <- beta_0 + beta_1 * x_1^2 + beta_2 * x_2
p <- exp(eta) / (1 + exp(eta))
y <- rbinom(n_1, 1, p)

Binned deviance residuals against predictor \(X_1\) assuming the true model is linear
Now we fit a logistic regression model to this dataset, but we assume that the true model is linear, i.e., \(\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2\). We fit the model using the glm function.
### model the systematic component as a linear function of the predictors
linear_glm <- glm(y ~ x_1 + x_2, family = binomial)
summary(linear_glm)
Call:
glm(formula = y ~ x_1 + x_2, family = binomial)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.73599 0.09556 7.702 1.34e-14 ***
x_1 -0.03702 0.08847 -0.419 0.676
x_2 2.61156 0.16537 15.793 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1352.99 on 999 degrees of freedom
Residual deviance: 787.84 on 997 degrees of freedom
AIC: 793.84
Number of Fisher Scoring iterations: 6
## Create a data frame
data_1 <- data.frame(x_1, x_2, y, linpred = predict(linear_glm),
predprob = predict(linear_glm, type = "response"),
devres = residuals(linear_glm))
### Deviance residuals against predictor X_1
ggplot(data_1) +
geom_point(mapping = aes(x = x_1, y = devres)) +
labs(x = "Predictor X_1", y = "Deviance residuals") +
theme_minimal() +
labs(title = "Dev_res against predictor X_1 when model is wrong")

### Binned deviance residuals against predictor X_1
data_1 |>
group_by(cut(x_1, breaks = unique(quantile(x_1, (1:40)/41)))) |>
summarize(devres = mean(devres), x_1 = mean(x_1)) |>
ggplot() +
geom_point(mapping = aes(x = x_1, y = devres)) +
labs(x = "Predictor X_1", y = "Binned deviance residual") +
theme_minimal() +
labs(title = "Binned Dev_res against predictor X_1 when model is wrong")

From the binned plot we can see a clear quadratic relationship between the deviance residuals and the predictor \(x_1\). This suggests that the true model is not linear in \(x_1\).
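The same diagnostic can be reproduced in a self-contained snippet without ggplot: simulate a quadratic truth, fit a misspecified linear model, and bin the deviance residuals with tapply. All names and parameter values here are illustrative and not tied to the data above.

```r
set.seed(1)
x <- rnorm(2000)
y_sim <- rbinom(2000, 1, plogis(-1 + 2 * x^2))   # quadratic truth
wrong_fit <- glm(y_sim ~ x, family = binomial)   # misspecified linear fit
bins <- cut(x, quantile(x, 0:20 / 20), include.lowest = TRUE)
binned_res <- tapply(residuals(wrong_fit), bins, mean)
## The binned means dip near x = 0 and rise in both tails, tracing the
## omitted x^2 shape.
binned_res
```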
Binned deviance residuals against the quadratic term \(X_1^2\) assuming the true model is quadratic
Now we fit the correctly specified model, which is quadratic in \(x_1\): \(\eta = \beta_0 + \beta_1 x_1^2 + \beta_2 x_2\). Again we fit the model using the glm function.
### model the systematic component with a quadratic term in x_1
quadratic_glm <- glm(y ~ I(x_1^2) + x_2, family = binomial)

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(quadratic_glm)
Call:
glm(formula = y ~ I(x_1^2) + x_2, family = binomial)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2621 0.1898 -6.649 2.95e-11 ***
I(x_1^2) 3.1474 0.2717 11.584 < 2e-16 ***
x_2 5.1159 0.3845 13.304 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1352.99 on 999 degrees of freedom
Residual deviance: 405.71 on 997 degrees of freedom
AIC: 411.71
Number of Fisher Scoring iterations: 8
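The glm.fit warning issued when fitting this model is expected here: with \(\beta_2 = 5\) and a quadratic term, \(|\eta|\) can easily exceed 30–40, at which point the fitted probability is indistinguishable from 1 in double precision. A standalone illustration:

```r
## exp(-40) is far below the double-precision epsilon (~2.2e-16), so
## 1 / (1 + exp(-40)) rounds to exactly 1 -- hence "fitted probabilities
## numerically 0 or 1 occurred".
exp(-40) < .Machine$double.eps   # TRUE
plogis(40) == 1                  # TRUE
```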
## Create a data frame
data_2 <- data.frame(x_1, x_2, y, linpred = predict(quadratic_glm),
predprob = predict(quadratic_glm, type = "response"),
devres = residuals(quadratic_glm))
### Deviance residuals against quadratic term X_1^2
ggplot(data_2) +
geom_point(mapping = aes(x = x_1^2, y = devres)) +
labs(x = "Quadratic term X_1^2", y = "Deviance residuals") +
theme_minimal() +
labs(title = "Dev_res against quadratic term X_1^2 when model is correct")

### Binned deviance residuals against quadratic term X_1^2
data_2 |>
mutate(x_1_sq = x_1^2) |>
group_by(cut(x_1_sq, breaks = unique(quantile(x_1_sq, (1:40)/41)))) |>
summarize(devres = mean(devres), x_1_sq = mean(x_1_sq)) |>
ggplot() +
geom_point(mapping = aes(x = x_1_sq, y = devres)) +
labs(x = "Quadratic term X_1^2", y = "Binned deviance residual") +
theme_minimal() +
labs(title = "Binned Dev_res against quadratic term X_1^2 when model is correct")

Binned deviance residuals against the fitted linear predictor \(\hat{\eta}\) assuming the true model is quadratic
### Deviance residuals against fitted value
ggplot(data_2) +
geom_point(mapping = aes(x = linpred, y = devres)) +
labs(x = "Linear predictor", y = "Deviance residuals") +
theme_minimal() +
labs(title = "Dev_res against linear predictor when model is correct")

### Binned deviance residuals against fitted value
data_2 |>
group_by(cut(linpred, breaks = unique(quantile(linpred, (1:40)/41)))) |>
summarize(devres = mean(devres), linpred = mean(linpred)) |>
ggplot() +
geom_point(mapping = aes(x = linpred, y = devres)) +
labs(x = "Linear predictor", y = "Binned deviance residual") +
theme_minimal() +
labs(title = "Binned Dev_res against linear predictor when model is correct")

Binned deviance residuals against the fitted linear predictor \(\hat{\eta}\) assuming the true model is linear
### Deviance residuals against fitted value
ggplot(data_1) +
geom_point(mapping = aes(x = linpred, y = devres)) +
labs(x = "Linear predictor", y = "Deviance residuals") +
theme_minimal() +
labs(title = "Dev_res against linear predictor when model is wrong")

### Binned deviance residuals against fitted value
data_1 |>
group_by(cut(linpred, breaks = unique(quantile(linpred, (1:40)/41)))) |>
summarize(devres = mean(devres), linpred = mean(linpred)) |>
ggplot() +
geom_point(mapping = aes(x = linpred, y = devres)) +
labs(x = "Linear predictor", y = "Binned deviance residual") +
theme_minimal() +
labs(title = "Binned Dev_res against linear predictor when model is wrong")

Scatter plot of logit (binned \(Y\)) and \(X_1^2\)
data_1 |>
mutate(x_1_sq = x_1^2) |>
group_by(cut(x_1_sq, breaks = unique(quantile(x_1_sq, (1:40)/41))) ) |>
summarize(y = mean(y), x_1_sq = mean(x_1_sq)) |>
ggplot() +
geom_point(mapping = aes(x = x_1_sq, y = log(y / (1 - y))) ) +
labs(x = "X_1^2", y = "logit(Y)") +
theme_minimal() +
labs(title = "Scatter plot of logit(binned Y) and X_1^2") +
ylim(-2, 5)

Scatter plot of logit (binned \(Y\)) and \(\hat \eta\)
data_1 |>
group_by(cut(linpred, breaks = unique(quantile(linpred, (1:40)/41))) ) |>
summarize(y = mean(y), linpred = mean(linpred)) |>
ggplot() +
geom_point(mapping = aes(x = linpred, y = log(y / (1 - y))) ) +
labs(x = "Linear predictor", y = "logit(Y)") +
theme_minimal() +
labs(title = "Scatter plot of logit(binned Y) and linear predictor when model is wrong") +
ylim(-3, 5)

Warning: Removed 1 row containing missing values or values outside the scale range (`geom_point()`).
data_2 |>
group_by(cut(linpred, breaks = unique(quantile(linpred, (1:40)/41))) ) |>
summarize(y = mean(y), linpred = mean(linpred)) |>
ggplot() +
geom_point(mapping = aes(x = linpred, y = log(y / (1 - y))) ) +
labs(x = "Linear predictor", y = "logit(Y)") +
theme_minimal() +
labs(title = "Scatter plot of logit(binned Y) and linear predictor when model is correct") +
ylim(-3, 5)

Warning: Removed 2 rows containing missing values or values outside the scale range (`geom_point()`).
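The removed-row warnings above come from bins whose mean response is 0 or 1, where \(\log(y/(1-y))\) is infinite and the point falls outside the axis range. One common remedy is the empirical logit, which adds 1/2 to each count; the helper below is our own illustrative function, not from any package.

```r
## Empirical logit: finite even when all (or none) of the responses in a
## bin are successes.
empirical_logit <- function(successes, n) {
  log((successes + 0.5) / (n - successes + 0.5))
}
empirical_logit(0, 25)   # about -3.93, where log(0 / 25) would be -Inf
```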
Scatter plot of logit (binned \(Y\)) and \(X_1\)
data_1 |>
group_by(cut(x_1, breaks = unique(quantile(x_1, (1:40)/41))) ) |>
summarize(y = mean(y), x_1 = mean(x_1)) |>
ggplot() +
geom_point(mapping = aes(x = x_1, y = log(y / (1 - y))) ) +
labs(x = "X_1", y = "logit(Y)") +
theme_minimal() +
labs(title = "Scatter plot of logit(binned Y) and X_1") +
ylim(-2, 5)

Further exploration
Harrell, F. E., Jr. (2016). Regression modeling strategies. Springer International Publishing.
Page 235
Several types of residuals can be computed for binary logistic model fits. Many of these residuals are used to examine the influence of individual observations on the fit. The partial residual can be used for directly assessing how each predictor should be transformed. For the ith observation, the partial residual for the \(j\)th element of \(X\) is defined by \[ r_{ij} = \hat\beta_j X_{ij} + \frac{Y_i - \hat{p}_i}{\hat{p}_i (1 - \hat{p}_i)} \] where \(X_{ij}\) is the value of the \(j\)th variable in the \(i\)th observation, \(Y_i\) is the corresponding value of the response, and \(\hat p_i\) is the predicted probability that \(Y_i=1\). A smooth plot (using, e.g. loess) of \(X_{ij}\) against \(r_{ij}\) will provide an estimate of how \(X_{j}\) should be transformed, adjusting for other \(X\)s (using their current transformations).
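As a self-contained sketch, Harrell's formula can be wrapped in a small helper that works on any binomial glm fit. The function name and the toy data used to exercise it are ours, purely illustrative:

```r
## Partial residual for the j-th (non-intercept) term of a binomial glm:
## r_ij = beta_j * x_ij + (y_i - p_i) / (p_i * (1 - p_i))
partial_resid <- function(fit, j) {
  p_hat <- fitted(fit)                 # predicted P(Y = 1)
  x_j   <- model.matrix(fit)[, j + 1]  # +1 skips the intercept column
  coef(fit)[j + 1] * x_j + (fit$y - p_hat) / (p_hat * (1 - p_hat))
}

## Toy check on simulated data
set.seed(2)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(x))
fit <- glm(y ~ x, family = binomial)
pr <- partial_resid(fit, 1)
```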
### Partial residual plot for linear predictor X_1
data_1 |>
mutate(partial_res = linear_glm$coefficients[[2]] * x_1 +
(y - predprob)/(predprob * ( 1 - predprob))) |>
ggplot() +
geom_point(mapping = aes(x = x_1, y = partial_res)) +
geom_smooth(mapping = aes(x = x_1, y = partial_res), method = "loess") +
theme_minimal()

### Partial residual plot for quadratic term X_1^2
data_2 |>
mutate(partial_res = quadratic_glm$coefficients[[2]] * x_1^2 +
(y - predprob)/(predprob * ( 1 - predprob))) |>
ggplot() +
geom_point(mapping = aes(x = x_1^2, y = partial_res)) +
geom_smooth(mapping = aes(x = x_1^2, y = partial_res), method = "loess") +
theme_minimal()