Title: | Influence Diagnostics in Statistical Models |
---|---|
Description: | Set of routines for influence diagnostics by using case-deletion in ordinary least squares, ridge estimation [Walker and Birch (1988). <doi:10.1080/00401706.1988.10488370>] and least absolute deviations (LAD) regression [Sun and Wei (2004). <doi:10.1016/j.spl.2003.08.018>]. |
Authors: | Felipe Osorio [aut, cre] |
Maintainer: | Felipe Osorio <[email protected]> |
License: | GPL-3 |
Version: | 0.1 |
Built: | 2024-10-30 04:29:57 UTC |
Source: | https://github.com/cran/india |
Cook's distance measures the influence of the ith observation on the model parameter estimates. This function computes Cook's distance based on leave-one-out case deletion for ordinary least squares, LAD and ridge regression.
## S3 method for class 'lad'
cooks.distance(model, ...)

## S3 method for class 'ols'
cooks.distance(model, ...)

## S3 method for class 'ridge'
cooks.distance(model, type = "cov", ...)
model | a fitted model object of class "lad", "ols" or "ridge". |
type | only required for "ridge" objects; the default is "cov". |
... | further arguments passed to or from other methods. |
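A vector whose ith element contains the Cook's distance,

$$D_i(\mathbf{M}, c) = \frac{(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})^\top \mathbf{M}\, (\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})}{c},$$

for $i = 1, \dots, n$, with $\mathbf{M}$ a positive definite matrix and $c > 0$. Here $\hat{\boldsymbol{\beta}}_{(i)}$ denotes the estimate of $\boldsymbol{\beta}$ obtained when the ith observation is removed from the dataset. Specific choices of $\mathbf{M}$ and $c$ are made for objects of class ols, lad and ridge.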
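The case-deletion definition can be illustrated with a brute-force sketch in base R. It is not the package's implementation: it assumes the usual OLS choices $\mathbf{M} = \mathbf{X}^\top\mathbf{X}$ and $c = p\,s^2$, uses `lm` instead of `ols`, and refits the model once per deleted case.

```r
# Sketch only: brute-force case deletion with base R (not the india code).
fit <- lm(stack.loss ~ ., data = stackloss)
X <- model.matrix(fit)
n <- nrow(X)
p <- ncol(X)
beta_hat <- coef(fit)
s2 <- sum(residuals(fit)^2) / (n - p)   # residual variance estimate

# Assumed choices for OLS: M = X'X and c = p * s2
M <- crossprod(X)
c0 <- p * s2

cook_loo <- vapply(seq_len(n), function(i) {
  fit_i <- lm(stack.loss ~ ., data = stackloss[-i, ])  # refit without case i
  d <- coef(fit_i) - beta_hat                          # change in the estimate
  drop(t(d) %*% M %*% d) / c0
}, numeric(1))

# The refit-based values agree with the closed form in stats::cooks.distance()
all.equal(unname(cooks.distance(fit)), cook_loo)
```

Refitting n times is only practical for small datasets; closed-form expressions based on residuals and leverages give the same values without refitting.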
Cook, R.D., Weisberg, S. (1980). Characterizations of an empirical influence function for detecting influential cases in regression. Technometrics 22, 495-508. doi:10.1080/00401706.1980.10486199
Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, London.
Sun, R.B., Wei, B.C. (2004). On influence assessment for LAD regression. Statistics & Probability Letters 67, 97-110. doi:10.1016/j.spl.2003.08.018.
Walker, E., Birch, J.B. (1988). Influence measures in ridge regression. Technometrics 30, 221-227. doi:10.1080/00401706.1988.10488370
# Cook's distances for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
CD <- cooks.distance(fm)
plot(CD, ylab = "Cook's distances", ylim = c(0,0.8))
text(21, CD[21], label = as.character(21), pos = 3)

# Cook's distances for LAD regression
fm <- lad(stack.loss ~ ., data = stackloss)
CD <- cooks.distance(fm)
plot(CD, ylab = "Cook's distances", ylim = c(0,0.4))
text(17, CD[17], label = as.character(17), pos = 3)

# Cook's distances for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
CD <- cooks.distance(fm)
plot(CD, ylab = "Cook's distances", ylim = c(0,0.5))
text(8, CD[8], label = as.character(8), pos = 3)
Computes leverage measures from a fitted model object.
leverages(model, ...)

## S3 method for class 'lm'
leverages(model, infl = lm.influence(model, do.coef = FALSE), ...)

## S3 method for class 'ols'
leverages(model, ...)

## S3 method for class 'ridge'
leverages(model, ...)

## S3 method for class 'ols'
hatvalues(model, ...)

## S3 method for class 'ridge'
hatvalues(model, ...)
model | a fitted model object of class "lm", "ols" or "ridge". |
infl | influence structure as returned by lm.influence. |
... | further arguments passed to or from other methods. |
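A vector containing the diagonal of the prediction (or 'hat') matrix.

For linear regression (i.e., for "lm" or "ols" objects) the prediction matrix assumes the form

$$\mathbf{H} = \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top,$$

in which case $h_{ii} = \mathbf{x}_i^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_i$ for $i = 1, \dots, n$, where $\mathbf{x}_i^\top$ denotes the ith row of $\mathbf{X}$. For ridge regression, the prediction matrix is given by

$$\mathbf{H}(\lambda) = \mathbf{X}(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top,$$

where $\lambda$ represents the ridge parameter. Thus, the diagonal elements of $\mathbf{H}(\lambda)$ are $h_{ii}(\lambda) = \mathbf{x}_i^\top(\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{x}_i$, $i = 1, \dots, n$.

This function never forms the prediction matrix; it obtains only its diagonal elements from the singular value decomposition of $\mathbf{X}$.

Function hatvalues is only a wrapper for function leverages.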
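How the diagonal can be obtained from the singular value decomposition without forming the full n x n matrix is sketched below in base R. This is an illustration, not the package's code: the value of `lambda` is hypothetical, and the model matrix is used as is (intercept included, no centering or scaling), which may differ from what `ridge` does internally.

```r
# Sketch only: leverages from the SVD of X (base R, not the india code).
X <- model.matrix(stack.loss ~ ., data = stackloss)
sv <- svd(X)                 # X = U diag(d) V'
U <- sv$u
d <- sv$d

# OLS: H = U U', so h_ii is the squared norm of the ith row of U
lev_ols <- rowSums(U^2)

# Ridge: H(lambda) = U diag(d^2 / (d^2 + lambda)) U'
lambda <- 0.5                # hypothetical ridge parameter, for illustration
shrink <- d^2 / (d^2 + lambda)
lev_ridge <- drop(U^2 %*% shrink)

# Checks against base R and against the explicit matrix formula
fit <- lm(stack.loss ~ ., data = stackloss)
all.equal(unname(hatvalues(fit)), lev_ols)
H_ridge <- X %*% solve(crossprod(X) + lambda * diag(ncol(X)), t(X))
all.equal(unname(diag(H_ridge)), lev_ridge)
```

Because each shrinkage factor is below one, every ridge leverage in this sketch is smaller than its OLS counterpart.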
Chatterjee, S., Hadi, A.S. (1988). Sensitivity Analysis in Linear Regression. Wiley, New York.
Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, London.
Walker, E., Birch, J.B. (1988). Influence measures in ridge regression. Technometrics 30, 221-227. doi:10.1080/00401706.1988.10488370.
# Leverages for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
lev <- leverages(fm)
cutoff <- 2 * mean(lev)
plot(lev, ylab = "Leverages", ylim = c(0,0.45))
abline(h = cutoff, lty = 2, lwd = 2, col = "red")
text(17, lev[17], label = as.character(17), pos = 3)

# Leverages for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
lev <- leverages(fm)
cutoff <- 2 * mean(lev)
plot(lev, ylab = "Leverages", ylim = c(0,0.7))
abline(h = cutoff, lty = 2, lwd = 2, col = "red")
text(10, lev[10], label = as.character(10), pos = 3)
Computes the likelihood displacement influence measure based on leave-one-out case deletion for linear models, LAD and ridge regression.
logLik.displacement(model, ...)

## S3 method for class 'lm'
logLik.displacement(model, pars = "full", ...)

## S3 method for class 'ols'
logLik.displacement(model, pars = "full", ...)

## S3 method for class 'lad'
logLik.displacement(model, method = "quasi", pars = "full", ...)

## S3 method for class 'ridge'
logLik.displacement(model, pars = "full", ...)
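model | a fitted model object of class "lm", "ols", "lad" or "ridge". |
pars | whether to consider the whole vector of parameters (pars = "full") or only the vector of coefficients (pars = "coef"). |
method | only required for "lad" objects; the default is "quasi". |
... | further arguments passed to or from other methods. |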
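A vector whose ith element contains the distance between the likelihood functions,

$$LD_i(\boldsymbol{\beta}, \sigma^2) = 2\{\ell(\hat{\boldsymbol{\beta}}, \hat{\sigma}^2) - \ell(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\sigma}^2_{(i)})\},$$

for pars = "full", where $\hat{\boldsymbol{\beta}}_{(i)}$ and $\hat{\sigma}^2_{(i)}$ denote the estimates of $\boldsymbol{\beta}$ and $\sigma^2$ when the ith observation is removed from the dataset. If we are interested only in $\boldsymbol{\beta}$ (i.e. pars = "coef") the likelihood displacement becomes

$$LD_i(\boldsymbol{\beta} \mid \sigma^2) = 2\{\ell(\hat{\boldsymbol{\beta}}, \hat{\sigma}^2) - \max_{\sigma^2} \ell(\hat{\boldsymbol{\beta}}_{(i)}, \sigma^2)\}.$$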
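For a Gaussian linear model, both versions of the likelihood displacement can be sketched in base R by refitting without each case. This is an illustration under stated assumptions rather than the package's implementation: it uses `lm`, takes maximum likelihood estimates of $\sigma^2$ (divisor n, or n - 1 after deletion), and evaluates the full-data log-likelihood at the case-deleted estimates.

```r
# Sketch only: likelihood displacement by refitting (Gaussian linear model).
fit <- lm(stack.loss ~ ., data = stackloss)
X <- model.matrix(fit)
y <- stackloss$stack.loss
n <- nrow(X)

# Full-data Gaussian log-likelihood evaluated at (beta, sigma2)
loglik <- function(beta, sigma2) {
  rss <- sum((y - X %*% beta)^2)
  -0.5 * n * log(2 * pi * sigma2) - rss / (2 * sigma2)
}

beta_hat <- coef(fit)
sigma2_hat <- sum(residuals(fit)^2) / n          # ML estimate of sigma^2
l_hat <- loglik(beta_hat, sigma2_hat)

LD <- t(vapply(seq_len(n), function(i) {
  fit_i <- lm(stack.loss ~ ., data = stackloss[-i, ])
  beta_i <- coef(fit_i)
  sigma2_i <- sum(residuals(fit_i)^2) / (n - 1)  # ML estimate without case i
  rss_i <- sum((y - X %*% beta_i)^2)             # full-data RSS at beta_i
  c(full = 2 * (l_hat - loglik(beta_i, sigma2_i)),
    coef = 2 * (l_hat - loglik(beta_i, rss_i / n)))  # sigma^2 profiled out
}, numeric(2)))

head(LD)
```

In the "coef" column the maximization over $\sigma^2$ has the closed form $\sigma^2 = \mathrm{RSS}(\hat{\boldsymbol{\beta}}_{(i)})/n$, so that column is never larger than the "full" column.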
Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman and Hall, London.
Cook, R.D., Pena, D., Weisberg, S. (1988). The likelihood displacement: A unifying principle for influence measures. Communications in Statistics - Theory and Methods 17, 623-640. doi:10.1080/03610928808829645.
Elian, S.N., Andre, C.D.S., Narula, S.C. (2000). Influence measure for the L1 regression. Communications in Statistics - Theory and Methods 29, 837-849. doi:10.1080/03610920008832518.
Sun, R.B., Wei, B.C. (2004). On influence assessment for LAD regression. Statistics & Probability Letters 67, 97-110. doi:10.1016/j.spl.2003.08.018.
# Likelihood displacement for linear regression
fm <- ols(stack.loss ~ ., data = stackloss)
LD <- logLik.displacement(fm)
plot(LD, ylab = "Likelihood displacement", ylim = c(0,9))
text(21, LD[21], label = as.character(21), pos = 3)

# Likelihood displacement for LAD regression
fm <- lad(stack.loss ~ ., data = stackloss)
LD <- logLik.displacement(fm)
plot(LD, ylab = "Likelihood displacement", ylim = c(0,1.5))
text(17, LD[17], label = as.character(17), pos = 3)

# Likelihood displacement for ridge regression
data(portland)
fm <- ridge(y ~ ., data = portland)
LD <- logLik.displacement(fm)
plot(LD, ylab = "Likelihood displacement", ylim = c(0,4))
text(8, LD[8], label = as.character(8), pos = 3)
This dataset comes from an experimental investigation of the heat evolved during the setting and hardening of Portland cements of varied composition and the dependence of this heat on the percentages of four compounds in the clinkers from which the cement was produced.
data(portland)
A data frame with 13 observations on the following 5 variables.
y | The heat evolved after 180 days of curing, measured in calories per gram of cement. |
x1 | Tricalcium aluminate. |
x2 | Tricalcium silicate. |
x3 | Tetracalcium aluminoferrite. |
x4 | β-dicalcium silicate. |
Kaciranlar, S., Sakallioglu, S., Akdeniz, F., Styan, G.P.H., Werner, H.J. (1999). A new biased estimator in linear regression and a detailed analysis of the widely-analysed dataset on Portland cement. Sankhya, Series B 61, 443-459.
Compute the relative condition index to identify collinearity-influential points in linear models.
relative.condition(x)
x | the model matrix. |
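To assess the influence of the ith row of $\mathbf{X}$ on the condition index of $\mathbf{X}$, Hadi (1988) proposed the relative change,

$$\delta_i = \frac{\kappa_{(i)} - \kappa}{\kappa},$$

for $i = 1, \dots, n$, where $\kappa$ and $\kappa_{(i)}$ denote the (scaled) condition index for $\mathbf{X}$ and $\mathbf{X}_{(i)}$, respectively.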
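The relative change can be sketched in base R by deleting one row at a time. This is an illustration only: it assumes that "scaled" means each column of the model matrix is scaled to unit Euclidean length before taking the ratio of the largest to the smallest singular value, and the helper `scaled_condition` is a hypothetical name, not a function from the package.

```r
# Sketch only: relative change in the scaled condition index under row deletion.
# Assumption: columns are scaled to unit Euclidean length before the SVD.
scaled_condition <- function(x) {
  x <- sweep(x, 2, sqrt(colSums(x^2)), "/")
  d <- svd(x, nu = 0, nv = 0)$d      # singular values only
  max(d) / min(d)
}

X <- model.matrix(stack.loss ~ ., data = stackloss)
kappa_full <- scaled_condition(X)

rel <- vapply(seq_len(nrow(X)), function(i) {
  # condition index of X with the ith row removed, relative to the full matrix
  (scaled_condition(X[-i, , drop = FALSE]) - kappa_full) / kappa_full
}, numeric(1))

round(rel, 3)
```

A positive value indicates that deleting the row worsens the conditioning of the remaining matrix, and a negative value indicates that deleting it improves the conditioning.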
Chatterjee, S., Hadi, A.S. (1988). Sensitivity Analysis in Linear Regression. Wiley, New York.
Hadi, A.S. (1988). Diagnosing collinearity-influential observations. Computational Statistics & Data Analysis 7, 143-159. doi:10.1016/0167-9473(88)90089-8.
data(portland)
fm <- ridge(y ~ ., data = portland, x = TRUE)
x <- fm$x
rel <- relative.condition(x)
plot(rel, ylab = "Relative condition number", ylim = c(-0.1,0.4))
abline(h = 0, lty = 2, lwd = 2, col = "red")
text(3, rel[3], label = as.character(3), pos = 3)