Module heavytail.tyler

Implementation of Tyler's estimator of the covariance matrix for heavy-tailed data.

Author: Mohammadjavad Vakili

This module implements Tyler's estimator, a robust estimator of the covariance matrix for heavy-tailed distributions. It is particularly useful where traditional estimators fail due to outliers or non-normality in the data.

Source: Tyler, D. E. (1987). A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, 15(1), 234–251.

Functions

def tyler_covariance(data: numpy.ndarray,
epsilon: float = 1e-06,
tol: float = 1e-06,
max_iter: int = 100) -> numpy.ndarray
def tyler_covariance(
    data: np.ndarray, epsilon: float = 1e-6, tol: float = 1e-6, max_iter: int = 100
) -> np.ndarray:
    r"""Compute Tyler's estimator of the covariance matrix.

    Let $\mathbf{X}$ be the input data matrix of shape $(n_{samples}, n_{features})$ with zero location
    (this function centers the data internally by subtracting the column-wise median),
    where $n_{samples}$ is the number of observations and $n_{features}$ is the number of features.
    For brevity of notation, we denote $T = n_{samples}$ and $N = n_{features}$.
    In the context of multivariate time-series data, $T$ is the length of the time series
    and $N$ is the number of variables.

    The input data matrix $\mathbf{X}$ is defined as follows:

    $$
    \mathbf{X} = \big[ \mathbf{x}_1 \; \mathbf{x}_2 \; \ldots \; \mathbf{x}_T \big]
    $$

    where $\mathbf{x}_t$ is the $t$-th observation vector of length $N$.
    For each observation $\mathbf{x}_t$, we can define a normalized version as follows:

    $$
    {\mathbf{s}}_t = \frac{\mathbf{x}_t}{\|\mathbf{x}_t\|} \quad \text{for } t = 1, \ldots, T
    $$

    where $\|\mathbf{x}_t\| = \sqrt{\mathbf{x}_t^T \mathbf{x}_t}$ is the Euclidean norm of the vector $\mathbf{x}_t$.

    Tyler demonstrated that the probability density function (pdf) of the normalized observations,
    $p(\mathbf{s}_t)$, follows an angular distribution regardless of the tail behavior of $\mathbf{x}_t$:

    $$
    p(\mathbf{s}_t) \propto \frac{1}{| \Sigma |^{1/2}} \big( \mathbf{s}_t^T \Sigma^{-1} \mathbf{s}_t \big)^{-\frac{N}{2}},
    $$

    where $| \Sigma |$ is the determinant of the scatter matrix $\Sigma$.

    Taking the logarithm of the pdf, we have:
    $$
    \log p(\mathbf{s}_t) = -\frac{N}{2} \log \big( \mathbf{s}_t^T \Sigma^{-1} \mathbf{s}_t \big) - \frac{1}{2} \log | \Sigma |
    $$

    Substituting $\mathbf{s}_t = \mathbf{x}_t / \|\mathbf{x}_t\|$ into the log-pdf, we obtain:
    $$
    \log p(\mathbf{s}_t) = -\frac{N}{2} \Big( \log \big( \mathbf{x}_t^T \Sigma^{-1} \mathbf{x}_t \big) - \log \|\mathbf{x}_t\|^2 \Big) - \frac{1}{2} \log | \Sigma |
    $$

    We obtain the negative log-likelihood of the scatter matrix $\Sigma$ by summing over all observations
    and dropping the $\log \|\mathbf{x}_t\|^2$ terms, which do not depend on $\Sigma$:

    $$
    \mathcal{L}(\Sigma) = -\sum_{t=1}^{T} \log p(\mathbf{s}_t) = \frac{N}{2} \sum_{t=1}^{T} \log \big( \mathbf{x}_t^T \Sigma^{-1} \mathbf{x}_t \big) + \frac{1}{2} T \log | \Sigma |
    $$

    Therefore, the maximum likelihood estimator (MLE) of the scatter matrix $\Sigma$ can be obtained by
    minimizing the following expression (scaled by $2/T$, which does not change the minimizer):

    $$
    \hat{\Sigma} = \arg\min_{\Sigma} \mathcal{L}(\Sigma) = \arg\min_{\Sigma} \Big[ \frac{N}{T} \sum_{t=1}^{T} \log \big( \mathbf{x}_t^T \Sigma^{-1} \mathbf{x}_t \big) + \log | \Sigma | \Big]
    $$

    A solution to this optimization problem can be obtained with a fixed-point iteration, where we update
    the estimate of the scatter matrix $\Sigma$ until convergence:


    $$
    \hat{\Sigma}^{(k+1)} = \frac{N}{T} \sum_{t=1}^{T} \frac{\mathbf{x}_t \mathbf{x}_t^T}{\mathbf{x}_t^T \big( \hat{\Sigma}^{(k)} \big)^{-1} \mathbf{x}_t}
    $$

    Existence of the solution is guaranteed as long as $T > N$.

    The parameter $\Sigma$ can only be estimated up to a scale factor. This scale factor can be fixed heuristically
    by robustly estimating the variances of the $N$ variables and requiring the diagonal elements of the covariance
    matrix to equal the estimated variances. This is done by scaling the covariance matrix $\hat{\Sigma}$ as follows:

    First, we estimate the variances of the $N$ variables from the median absolute deviation (MAD),
    scaled by 1.4826 so that it is consistent with the standard deviation under normality:
    $$
    \hat{\sigma}_i^2 = \big( 1.4826 \cdot \text{MAD}(\mathbf{x}_i) \big)^2
    = \big( 1.4826 \cdot \text{median}(|\mathbf{x}_i - \text{median}(\mathbf{x}_i)|) \big)^2
    $$

    Next, we scale the covariance matrix $\hat{\Sigma}$ by requiring the diagonal elements to be equal to the estimated variances:
    $$
    \hat{\Sigma}_{ii} = \hat{\sigma}_i^2, \quad i = 1, \ldots, N
    $$
    This is achieved by rescaling each column of the covariance matrix,

    $$
    \hat{\Sigma}_{ij} \rightarrow \hat{\sigma}_j^2 \, \frac{\hat{\Sigma}_{ij}}{\hat{\Sigma}_{jj}},
    $$

    followed by symmetrization, $\hat{\Sigma} \rightarrow \tfrac{1}{2} \big( \hat{\Sigma} + \hat{\Sigma}^T \big)$.

    Parameters
    ----------
    data : ndarray
        Input data matrix of shape (n_samples, n_features).
    epsilon : float, optional
        Small regularization value added to the diagonal of the covariance estimate at each iteration,
        also used as a floor on the quadratic forms to avoid division by zero.
    tol : float, optional
        Tolerance for convergence of the fixed-point iteration.
    max_iter : int, optional
        Maximum number of iterations.

    Returns
    -------
    C : ndarray
        Estimated covariance matrix of shape (n_features, n_features).

    Raises
    ------
    ValueError
        If the estimated covariance matrix is not positive definite.

    References
    ----------
    Tyler, D. E. (1987). A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, 15(1), 234–251.

    https://bookdown.org/palomar/portfoliooptimizationbook/3.5-heavy-tail-ML.html#tylers-estimator
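
    Examples
    --------
    A minimal usage sketch on synthetic heavy-tailed data (illustrative only; the
    random data and printed results below are assumptions, not library guarantees):

    >>> import numpy as np
    >>> rng = np.random.default_rng(0)
    >>> data = rng.standard_t(df=3, size=(500, 4))  # T = 500 samples, N = 4 features
    >>> cov = tyler_covariance(data)
    >>> cov.shape
    (4, 4)
    >>> bool(np.allclose(cov, cov.T))  # output is symmetrized
    True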

    """  # noqa: E501
    n_samples, n_features = data.shape
    mu = np.median(data, axis=0)  # Shape (N,)
    # Center the data by subtracting the median
    data_cen = data - mu  # Shape (T, N)
    # Initial guess for covariance matrix
    cov = np.cov(data_cen, rowvar=False)  # Shape (N, N)
    cov /= np.trace(cov)  # Normalize to have unit trace

    # Iterative Tyler's estimator of covariance matrix
    for _ in range(max_iter):
        cov_prev = cov.copy()
        cov_prev += epsilon * np.eye(n_features)
        z = solve(cov_prev, data_cen.T).T  # Shape (T, N)
        data_inv_cov_data = np.sum(data_cen * z, axis=1)  # Shape (T,)
        weights = 1 / np.maximum(
            data_inv_cov_data, epsilon
        )  # Shape (T,). Avoid division by zero
        # Update covariance matrix with the fixed-point step, scaled by N / T
        cov = (n_features / n_samples) * (
            data_cen.T @ (weights[:, None] * data_cen)
        )  # Shape (N, N)
        # Check for convergence
        if _relative_error_array(x=cov, y=cov_prev) < tol:
            break
    # Estimate the variances of the features using median absolute deviation
    sigma2 = _median_absolute_deviation(data)
    # Scale the covariance matrix to match the estimated variances
    cov = np.outer(np.ones(n_features), sigma2) * cov / np.diag(cov)
    cov = _symmetrizer(cov)  # Ensure symmetry
    if not _check_positive_definite(cov):
        msg = "Estimated covariance matrix is not positive definite."
        raise ValueError(msg)
    return cov
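
The listing above calls four private helpers (_relative_error_array, _median_absolute_deviation, _symmetrizer, _check_positive_definite) that are defined elsewhere in the module and not shown on this page. The sketch below is a plausible reconstruction consistent with how they are used in tyler_covariance; the bodies are assumptions, not the module's actual code:

import numpy as np


def _relative_error_array(x: np.ndarray, y: np.ndarray) -> float:
    # Relative change between successive iterates in the Frobenius norm
    return np.linalg.norm(x - y) / max(np.linalg.norm(y), np.finfo(float).eps)


def _median_absolute_deviation(data: np.ndarray) -> np.ndarray:
    # Per-column robust variance estimate: (1.4826 * MAD)^2
    med = np.median(data, axis=0)
    mad = np.median(np.abs(data - med), axis=0)
    return (1.4826 * mad) ** 2


def _symmetrizer(a: np.ndarray) -> np.ndarray:
    # Enforce exact symmetry
    return 0.5 * (a + a.T)


def _check_positive_definite(a: np.ndarray) -> bool:
    # A symmetric matrix is positive definite iff Cholesky succeeds
    try:
        np.linalg.cholesky(a)
    except np.linalg.LinAlgError:
        return False
    return True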

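A quick demonstration of the estimator's robustness (a sketch on assumed synthetic data; only heavytail.tyler.tyler_covariance is taken from this module, everything else is illustrative):

import numpy as np

from heavytail.tyler import tyler_covariance

rng = np.random.default_rng(42)
n_samples, n_features = 2000, 3
true_scatter = np.array([[1.0, 0.5, 0.2],
                         [0.5, 1.0, 0.3],
                         [0.2, 0.3, 1.0]])
chol = np.linalg.cholesky(true_scatter)
gauss = rng.standard_normal((n_samples, n_features)) @ chol.T
# Dividing by chi-distributed radii turns the Gaussian samples into
# multivariate t samples (df=3) with the same scatter structure
radii = np.sqrt(rng.chisquare(df=3, size=n_samples) / 3)
data = gauss / radii[:, None]


def corr(c):
    # Reduce a covariance matrix to its implied correlation matrix
    d = np.sqrt(np.diag(c))
    return c / np.outer(d, d)


# Under heavy tails, Tyler's correlation estimate is typically much more
# stable across reruns than the one implied by the sample covariance
print(np.round(corr(tyler_covariance(data)), 2))
print(np.round(corr(np.cov(data, rowvar=False)), 2))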
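
Finally, a small sketch illustrating the angular-distribution claim behind the estimator: once observations are projected onto the unit sphere, the tail behavior of the generating distribution no longer matters. The data, seed, and comparison below are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(7)
scatter_chol = np.linalg.cholesky(np.array([[1.0, 0.6],
                                            [0.6, 1.0]]))


def normalized_second_moment(samples: np.ndarray) -> np.ndarray:
    # Project each observation onto the unit sphere, s_t = x_t / ||x_t||,
    # and estimate E[s s^T]
    s = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    return s.T @ s / len(s)


# Gaussian and heavy-tailed t (df=2) samples sharing the same scatter matrix
gauss = rng.standard_normal((100_000, 2)) @ scatter_chol.T
radii = np.sqrt(rng.chisquare(df=2, size=100_000) / 2)
heavy = gauss / radii[:, None]

# The two second moments agree closely, despite very different tails
print(np.round(normalized_second_moment(gauss), 3))
print(np.round(normalized_second_moment(heavy), 3))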