Title: | Estimation and Testing for the Multivariate t-Distribution |
---|---|
Description: | Routines to perform estimation and inference under the multivariate t-distribution <doi:10.1007/s10182-022-00468-2>. Currently, the following methodologies are implemented: multivariate mean and covariance estimation, hypothesis testing about equicorrelation and homogeneity of variances, the Wilson-Hilferty transformation, QQ-plots with envelopes and random variate generation. |
Authors: | Felipe Osorio [aut, cre] |
Maintainer: | Felipe Osorio <[email protected]> |
License: | GPL-3 |
Version: | 0.3-81 |
Built: | 2024-10-25 03:46:01 UTC |
Source: | https://github.com/cran/MVT |
Data extracted from Standard & Poor's Compustat PC Plus. This dataset has been used to illustrate some influence diagnostic techniques.
data(companies)
data(companies)
A data frame with 26 observations on the following 3 variables.
book value in dollars per share at the end of 1992.
net sales in millions of dollars in 1992.
sales to assets ratio in 1992.
Hadi, A.S., and Nyquist, H. (1999). Frechet distance as a tool for diagnosing multivariate data. Linear Algebra and Its Applications 289, 183-201.
Hadi, A.S., and Son, M.S. (1997). Detection of unusual observations in regression and multivariate data. In: A. Ullah, D.E.A. Giles (Eds.) Handbook of Applied Economic Statistics. Marcel Dekker, New York. pp. 441-463.
Measurements of the weight of cork borings taken from the north (N), east (E), south (S), and west (W) directions of 28 trees. It is of interest to compare the bark thickness (and hence weight) in the four directions.
data(cork)
data(cork)
A data frame with 28 observations on the following 4 variables.
north.
east.
south.
west.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979). Multivariate Analysis. Academic Press, London.
Constructs a normal QQ-plot using a Wilson-Hilferty transformation for the estimated Mahalanobis distances obtained from the fitting procedure.
envelope.student(object, reps = 50, conf = 0.95, plot.it = TRUE)
envelope.student(object, reps = 50, conf = 0.95, plot.it = TRUE)
object |
an object of class |
reps |
number of simulated point patterns to be generated when computing the envelopes. The default number is 50. |
conf |
the confidence level of the envelopes required. The default is to find 95% confidence envelopes. |
plot.it |
if TRUE it will draw the corresponding plot, if FALSE it will only return the computed values. |
A list with the following components :
transformed |
a vector with the |
envelope |
a matrix with two columns corresponding to the values of the lower and upper pointwise confidence envelope. |
Atkinson, A.C. (1985). Plots, Transformations and Regression. Oxford University Press, Oxford.
Osorio, F., Galea, M., Henriquez, C., Arellano-Valle, R. (2023). Addressing non-normality in multivariate analysis using the t-distribution. AStA Advances in Statistical Analysis 107, 785-813.
data(PSG) fit <- studentFit(~ manual + automated, data = PSG, family = Student(eta = 0.25)) envelope.student(fit, reps = 500, conf = 0.95)
data(PSG) fit <- studentFit(~ manual + automated, data = PSG, family = Student(eta = 0.25)) envelope.student(fit, reps = 500, conf = 0.95)
Performs several test for testing that the covariance matrix follows an equicorrelation (or compound symmetry) structure. Likelihood ratio test, score, Wald and gradient can be used as a test statistic.
equicorrelation.test(object, test = "LRT")
equicorrelation.test(object, test = "LRT")
object |
object of class |
test |
test statistic to be used. One of "LRT" (default), "Wald", "score" or "gradient". |
A list of class 'equicorrelation.test' with the following elements:
statistic |
value of the statistic, i.e. the value of either Likelihood ratio test, Wald, score or gradient test. |
parameter |
the degrees of freedom for the test statistic, which is chi-square distributed. |
p.value |
the p-value for the test. |
estimate |
the estimated covariance matrix. |
null.value |
the hypothesized value for the covariance matrix. |
method |
a character string indicating what type of test was performed. |
null.fit |
a list representing the fitted model under the null hypothesis. |
data |
name of the data used in the test. |
Sutradhar, B.C. (1993). Score test for the covariance matrix of the elliptical t-distribution. Journal of Multivariate Analysis 46, 1-12.
data(examScor) fit <- studentFit(examScor, family = Student(eta = .25)) fit z <- equicorrelation.test(fit, test = "LRT") z
data(examScor) fit <- studentFit(examScor, family = Student(eta = .25)) fit z <- equicorrelation.test(fit, test = "LRT") z
Dataset from Mardia, Kent and Bibby on 88 students who took examinations in five subjects. The first two subjects were tested with closed book exams and the last three were tested with open book exams.
data(examScor)
data(examScor)
A data frame with 88 observations on the following 5 variables.
mechanics, closed book exam.
vectors, closed book exam.
algebra, open book exam.
analysis, open book exam.
statistics, open book exam.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979). Multivariate Analysis. Academic Press, London.
Performs several test for testing equality of correlated variables. Likelihood ratio test,
score, Wald and gradient can be used as a test statistic.
homogeneity.test(object, test = "LRT")
homogeneity.test(object, test = "LRT")
object |
object of class |
test |
test statistic to be used. One of "LRT" (default), "Wald", "score" or "gradient". |
A list of class 'homogeneity.test' with the following elements:
statistic |
value of the statistic, i.e. the value of either Likelihood ratio test, Wald, score or gradient test. |
parameter |
the degrees of freedom for the test statistic, which is chi-square distributed. |
p.value |
the p-value for the test. |
estimate |
the estimated covariance matrix. |
null.value |
the hypothesized value for the covariance matrix. |
method |
a character string indicating what type of test was performed. |
null.fit |
a list representing the fitted model under the null hypothesis. |
data |
name of the data used in the test. |
Harris, P. (1985). Testing the variance homogeneity of correlated variables. Biometrika 72, 103-107.
Modarres, R. (1993). Testing the equality of dependent variables. Biometrical Journal 7, 785-790.
data(examScor) fit <- studentFit(examScor, family = Student(eta = .25)) fit z <- homogeneity.test(fit, test = "LRT") z
data(examScor) fit <- studentFit(examScor, family = Student(eta = .25)) fit z <- homogeneity.test(fit, test = "LRT") z
This function computes the kurtosis of a multivariate distribution and estimates the kurtosis parameter for the t-distribution using the method of moments.
kurtosis.student(x)
kurtosis.student(x)
x |
vector or matrix of data with, say, p columns. |
A list with the following components :
kurtosis |
returns the value of Mardia's multivariate kurtosis. |
kappa |
returns the excess kurtosis related to a multivariate t-distribution. |
eta |
estimated shape (kurtosis) parameter using the methods of moments, only valid if |
Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57, 519-530.
Osorio, F., Galea, M., Henriquez, C., Arellano-Valle, R. (2023). Addressing non-normality in multivariate analysis using the t-distribution. AStA Advances in Statistical Analysis 107, 785-813.
data(companies) kurtosis.student(companies)
data(companies) kurtosis.student(companies)
These functions provide the density and random number generation from the multivariate Student-t distribution.
dmt(x, mean = rep(0, nrow(Sigma)), Sigma = diag(length(mean)), eta = 0.25, log = FALSE) rmt(n = 1, mean = rep(0, nrow(Sigma)), Sigma = diag(length(mean)), eta = 0.25)
dmt(x, mean = rep(0, nrow(Sigma)), Sigma = diag(length(mean)), eta = 0.25, log = FALSE) rmt(n = 1, mean = rep(0, nrow(Sigma)), Sigma = diag(length(mean)), eta = 0.25)
x |
vector or matrix of data. |
n |
the number of samples requested. |
mean |
a vector giving the means of each variable |
Sigma |
a positive-definite covariance matrix |
eta |
shape parameter (must be in |
log |
logical; if TRUE, the logarithm of the density function is returned. |
A random vector has a multivariate t distribution,
with a
mean vector, covariance matrix
, and
shape parameter, if its density function is given by
where
with . This parameterization of the multivariate t distribution
is introduced mainly because
and
correspond to the mean vector
and covariance matrix, respectively.
The function rmt
is an interface to C routines, which make calls to subroutines from LAPACK.
The matrix decomposition is internally done using the Cholesky decomposition. If Sigma
is not
non-negative definite then there will be a warning message.
This parameterization of the multivariate-t includes the normal distribution as a particular
case when eta = 0
.
If x
is a matrix with rows, then
dmt
returns a
vector considering each row of
x
as a copy from the multivariate t distribution.
If n = 1
, then rmt
returns a vector of the same length as mean
, otherwise
a matrix of n
rows of random vectors.
Fang, K.T., Kotz, S., Ng, K.W. (1990). Symmetric Multivariate and Related Distributions. Chapman & Hall, London.
Gomez, E., Gomez-Villegas, M.A., Marin, J.M. (1998). A multivariate generalization of the power exponential family of distributions. Communications in Statistics - Theory and Methods 27, 589-600.
# covariance matrix Sigma <- matrix(c(10,3,3,2), ncol = 2) Sigma # generate the sample y <- rmt(n = 1000, Sigma = Sigma) # scatterplot of a random bivariate t sample with mean vector # zero and covariance matrix 'Sigma' par(pty = "s") plot(y, xlab = "", ylab = "") title("bivariate t sample (eta = 0.25)", font.main = 1)
# covariance matrix Sigma <- matrix(c(10,3,3,2), ncol = 2) Sigma # generate the sample y <- rmt(n = 1000, Sigma = Sigma) # scatterplot of a random bivariate t sample with mean vector # zero and covariance matrix 'Sigma' par(pty = "s") plot(y, xlab = "", ylab = "") title("bivariate t sample (eta = 0.25)", font.main = 1)
Allows users to set control parameters for the estimation routine available in MVT
.
MVT.control(maxiter = 2000, tolerance = 1e-6, fix.shape = FALSE)
MVT.control(maxiter = 2000, tolerance = 1e-6, fix.shape = FALSE)
maxiter |
maximum number of iterations. The default is 2000. |
tolerance |
the relative tolerance in the iterative algorithm. |
fix.shape |
whether the shape parameter should be kept fixed in
the fitting processes. The default is |
A list of control arguments to be used in a call to studentFit
.
A call to MVT.control
can be used directly in the control
argument
of the call to studentFit
.
ctrl <- MVT.control(maxiter = 500, tol = 1e-04, fix.shape = TRUE) data(PSG) studentFit(~ manual + automated, data = PSG, family = Student(eta = 0.25), control = ctrl)
ctrl <- MVT.control(maxiter = 500, tol = 1e-04, fix.shape = TRUE) data(PSG) studentFit(~ manual + automated, data = PSG, family = Student(eta = 0.25), control = ctrl)
Clinical study designed to compare the automated and semi-automated scoring of Polysomnographic (PSG) recordings used to diagnose transient sleep disorders. The study considered 82 patients who were given a sleep-inducing drug (Zolpidem 10 mg). Measurements of latency to persistent sleep (LPS: lights out to the beginning of 10 consecutive minutes of uninterrupted sleep) were obtained using six different methods.
data(PSG)
data(PSG)
A data frame with 82 observations on the following 3 variables.
fully manual scoring.
automated scoring by the Morpheus software.
Morpheus automated scoring with manual review.
Svetnik, V., Ma, J., Soper, K.A., Doran, S., Renger, J.J., Deacon, S., Koblan, K.S. (2007). Evaluation of automated and semi-automated scoring of polysomnographic recordings from a clinical trial using zolpidem in the treatment of insomnia. SLEEP 30, 1562-1574.
Provide a convenient way to specify the details of the model used by function studentFit
.
Student(eta = .25)
Student(eta = .25)
eta |
shape parameter for the multivariate t-distribution, must be confined to |
Student
is a generic function to create info about the t-distribution which
is passed to the estimation algorithm.
MyFmly <- Student(eta = .4) MyFmly
MyFmly <- Student(eta = .4) MyFmly
Estimates the mean vector and covariance matrix assuming the data came from a multivariate t-distribution: this provides some degree of robustness to outlier without giving a high breakdown point.
studentFit(x, data, family = Student(eta = .25), covStruct = "UN", subset, na.action, control)
studentFit(x, data, family = Student(eta = .25), covStruct = "UN", subset, na.action, control)
x |
a formula or a numeric matrix or an object that can be coerced to a numeric matrix. |
data |
an optional data frame (or similar: see |
family |
a description of the error distribution to be used in the model.
By default the multivariate t-distribution with 0.25 as shape parameter is considered
(using |
covStruct |
a character string specifying the type of covariance structure. The options
available are: |
subset |
an optional expression indicating the subset of the rows of data that should be used in the fitting process. |
na.action |
a function that indicates what should happen when the data contain NAs. |
control |
a list of control values for the estimation algorithm to replace
the default values returned by the function |
A list with class 'studentFit'
containing the following components:
call |
a list containing an image of the |
family |
the |
center |
final estimate of the location vector. |
Scatter |
final estimate of the scale matrix. |
logLik |
the log-likelihood at convergence. |
numIter |
the number of iterations used in the iterative algorithm. |
weights |
estimated weights corresponding to the assumed heavy-tailed distribution. |
distances |
estimated squared Mahalanobis distances. |
eta |
final estimate of the shape parameter, if requested. |
Generic function print
show the results of the fit.
Kent, J.T., Tyler, D.E., Vardi, Y. (1994). A curious likelihood identity for the multivariate t-distribution. Communications in Statistics: Simulation and Computation 23, 441-453.
Lange, K., Little, R.J.A., Taylor, J.M.G. (1989). Robust statistical modeling using the t distribution. Journal of the American Statistical Association 84, 881-896.
Osorio, F., Galea, M., Henriquez, C., Arellano-Valle, R. (2023). Addressing non-normality in multivariate analysis using the t-distribution. AStA Advances in Statistical Analysis 107, 785-813.
cov
, cov.rob
and cov.trob
in package MASS.
data(PSG) fit <- studentFit(~ manual + automated, data = PSG, family = Student(eta = 0.25)) fit
data(PSG) fit <- studentFit(~ manual + automated, data = PSG, family = Student(eta = 0.25)) fit
Returns the Wilson-Hilferty transformation of random variables with distribution.
WH.student(x, center, cov, eta = 0)
WH.student(x, center, cov, eta = 0)
x |
object of class |
center |
mean vector of the distribution or second data vector of length |
cov |
covariance matrix ( |
eta |
shape parameter of the multivariate t-distribution. By default the multivariate normal ( |
Let the following random variable:
where denotes the squared Mahalanobis distance defined as
Thus the Wilson-Hilferty transformation is given by
and is approximately distributed as a standard normal distribution. This is useful, for instance, in the construction of
QQ-plots.
For eta = 0
, we obtain
which is the Wilson-Hilferty transformation for chi-square variables.
Osorio, F., Galea, M., Henriquez, C., Arellano-Valle, R. (2023). Addressing non-normality in multivariate analysis using the t-distribution. AStA Advances in Statistical Analysis 107, 785-813.
Wilson, E.B., and Hilferty, M.M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences of the United States of America 17, 684-688.
cov
, mahalanobis
, envelope.student
data(companies) x <- companies z <- WH.student(x, center = colMeans(x), cov = cov(x)) par(pty = "s") qqnorm(z, main = "Transformed distances Q-Q plot") abline(c(0,1), col = "red", lwd = 2)
data(companies) x <- companies z <- WH.student(x, center = colMeans(x), cov = cov(x)) par(pty = "s") qqnorm(z, main = "Transformed distances Q-Q plot") abline(c(0,1), col = "red", lwd = 2)
This dataset consists of 278 hourly average wind speed in the Pacific North-West of the United States collected at three meteorological towers approximately located on a line and ordered from west to east: Goodnoe Hills (gh), Kennewick (kw), and Vansycle (vs). The data were collected from 25 February to 30 November 2003 recorded at midnight, a time when wind speeds tend to peak.
data(WindSpeed)
data(WindSpeed)
A data frame with 278 observations on the following 3 variables.
Goodnoe Hills.
Kennewick.
Vansycle.
Azzalini, A., Genton, M.G. (2008). Robust likelihood methods based on the skew-t and related distributions. International Statistical Review 76, 106-129.