Kwangmin Kim - Basics (3) - Special Matrices

1 Special Matrix

1.1 Square Matrix

A square matrix \(\mathbf{A}\) is a matrix with the same number of rows and columns, i.e., \(\mathbf{A}\) is an \(n \times n\) matrix.

For example, the following is a \(3 \times 3\) square matrix:

\[ \mathbf A= \begin{bmatrix} 1 & 4 & 7\\ 2 & 5 & 8\\ 3 & 6 & 9 \end{bmatrix} \]

1.1.1 Properties

Let \(\mathbf{A}\) be an \(n\times n\) square matrix. Then, the following properties hold:

\(\mathbf{A}\) is invertible if and only if \(\text{det}(\mathbf{A}) \neq 0\).
The trace of \(\mathbf{A}\) is defined as \(\text{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii}\), where \(a_{ii}\) is the \(i\)th diagonal element of \(\mathbf{A}\).
If \(\mathbf{A}\) is real and symmetric, then it has \(n\) real eigenvalues (counted with multiplicity) and an orthonormal set of \(n\) eigenvectors.
If \(\mathbf{A}\) is diagonalizable, then \(\mathbf{A}\) can be written as \(\mathbf{A} = \mathbf{PDP}^{-1}\), where \(\mathbf{P}\) is the matrix whose columns are \(n\) linearly independent eigenvectors of \(\mathbf{A}\), and \(\mathbf{D}\) is the diagonal matrix whose diagonal elements are the corresponding eigenvalues.
The transpose of \(\mathbf{A}\), denoted \(\mathbf{A}^\top\), is obtained by reflecting \(\mathbf{A}\) across its main diagonal. That is, \((\mathbf{A}^\top)_{ij} = a_{ji}\).

Consider the same \(3 \times 3\) matrix as before:

\[ \mathbf A= \begin{bmatrix} 1 & 4 & 7\\ 2 & 5 & 8\\ 3 & 6 & 9 \end{bmatrix} \]

For this matrix, \(\text{det}(\mathbf{A}) = 0\), so \(\mathbf{A}\) is not invertible. The trace of \(\mathbf{A}\) is \(\text{tr}(\mathbf{A}) = 1 + 5 + 9 = 15\). Since \(\mathbf{A}\) is not symmetric, the symmetric-matrix property above does not guarantee real eigenvalues or an orthonormal set of eigenvectors. Nevertheless, the characteristic polynomial is \(\det(\lambda\mathbf{I} - \mathbf{A}) = \lambda^3 - 15\lambda^2 - 18\lambda\), whose roots are \(\lambda = \frac{15 \pm \sqrt{297}}{2} \approx 16.1168, -1.1168\) and \(\lambda = 0\). Because these three eigenvalues are distinct, \(\mathbf{A}\) is diagonalizable, and \(\mathbf{A} = \mathbf{PDP}^{-1}\) with the unit-length eigenvectors (whose signs are arbitrary) as the columns of \(\mathbf{P}\):

\[ \mathbf{P}=\begin{pmatrix} -0.4645 & -0.8829 & 0.4082\\ -0.5708 & -0.2395 & -0.8165\\ -0.6770 & 0.4039 & 0.4082 \end{pmatrix}, \quad \mathbf{D}=\begin{pmatrix} 16.1168 & 0 & 0\\ 0 & -1.1168 & 0\\ 0 & 0 & 0 \end{pmatrix} \]
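
As a quick numerical check, the following base-R sketch recomputes the determinant, trace, and eigendecomposition of this example. It assumes only base R; `eigen()` returns unit-length eigenvectors, but their signs may differ from a printed \(\mathbf P\), which leaves \(\mathbf{PDP}^{-1}\) unchanged.

```r
# The 3 x 3 example matrix. matrix() fills column by column, so this
# reproduces the rows (1, 4, 7), (2, 5, 8), (3, 6, 9).
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)

det(A)          # numerically 0 (tiny floating-point residue)
sum(diag(A))    # trace: 1 + 5 + 9 = 15

e <- eigen(A)   # general (non-symmetric) eigensolver
e$values        # approx. 16.1168, -1.1168, 0
P <- e$vectors  # unit-length eigenvectors in the columns
D <- diag(e$values)

# Rebuild A from its eigendecomposition A = P D P^{-1}
A_rebuilt <- P %*% D %*% solve(P)
all.equal(A_rebuilt, A)   # TRUE up to numerical tolerance
```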

1.2 Diagonal Matrix

A diagonal matrix is a square matrix in which all the off-diagonal elements are zero. The diagonal elements can be any scalar value.

\[ \begin{equation*} \mathbf{D} = \begin{pmatrix} d_{1} & 0 & 0 & \cdots & 0 \\ 0 & d_{2} & 0 & \cdots & 0 \\ 0 & 0 & d_{3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & d_{n} \end{pmatrix} \end{equation*} \]

Here, \(\mathbf{D}\) is an \(n \times n\) diagonal matrix with diagonal elements \(d_1, d_2, \ldots, d_n\). An example of a \(3 \times 3\) diagonal matrix is:

\[ \begin{equation*} \mathbf{D} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 4 \end{pmatrix} \end{equation*} \]

1.2.1 Properties

A diagonal matrix is a square matrix in which all the off-diagonal elements are zero, i.e., \(a_{ij} = 0\) for \(i \neq j\). Some properties of a diagonal matrix are:

For two \(n \times n\) diagonal matrices \(\mathbf D\) and \(\mathbf E\) with diagonal elements \(d_1, \ldots, d_n\) and \(e_1, \ldots, e_n\), the product is again diagonal, with the entrywise products on the diagonal (so \(\mathbf{DE} = \mathbf{ED}\)): \[ \begin{equation*} \mathbf{DE}= \begin{pmatrix} d_{1}e_{1} & 0 & 0 & \cdots & 0 \\ 0 & d_{2}e_{2} & 0 & \cdots & 0 \\ 0 & 0 & d_{3}e_{3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & d_{n}e_{n} \end{pmatrix} \end{equation*} \]
The determinant of a diagonal matrix is the product of its diagonal entries.
The trace of a diagonal matrix is the sum of its diagonal entries.
A diagonal matrix is non-singular if and only if all its diagonal entries are nonzero; its inverse is then the diagonal matrix whose diagonal entries are the reciprocals \(1/d_1, \ldots, 1/d_n\).
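
The diagonal-matrix properties can be checked numerically with a small R sketch; `D` is the \(3 \times 3\) example above and `E` is an arbitrary second diagonal matrix chosen only for illustration.

```r
# Two 3 x 3 diagonal matrices, built from their diagonal entries
D <- diag(c(2, -1, 4))
E <- diag(c(3, 5, -2))

DE <- D %*% E
diag(DE)                            # entrywise products d_i * e_i: 6 -5 -8
all(DE[row(DE) != col(DE)] == 0)    # off-diagonal entries remain zero
all.equal(DE, E %*% D)              # diagonal matrices commute

det(D)                              # 2 * (-1) * 4 = -8 (up to rounding)
sum(diag(D))                        # 2 + (-1) + 4 = 5
all.equal(solve(D), diag(1 / c(2, -1, 4)))  # reciprocals on the diagonal
```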

1.3 Identity Matrix

An identity matrix is a square matrix in which all the diagonal elements are equal to \(1\) and all the off-diagonal elements are equal to 0. The notation for an identity matrix of size \(n\) is \(\mathbf{I}_n\).

1.3.1 Example

Example of a \(3\times 3\) identity matrix, \(\mathbf{I}_3\): \[ \begin{equation} \mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{equation} \]

1.3.2 Properties

Some properties of an identity matrix include:

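Multiplying any matrix by an identity matrix of compatible size leaves it unchanged: for an \(m \times n\) matrix \(\mathbf{A}\), \(\mathbf{I}_m \mathbf{A} = \mathbf{A} \mathbf{I}_n = \mathbf{A}\); for a square \(\mathbf{A}\), \(\mathbf{A} \mathbf{I} = \mathbf{I} \mathbf{A} = \mathbf{A}\).
The product of an invertible matrix and its inverse is an identity matrix: \(\mathbf{A} \mathbf{A}^{-1} = \mathbf{A}^{-1} \mathbf{A} = \mathbf{I}\).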
The determinant of an identity matrix is 1: \(\det(\mathbf{I}) = 1\).
An identity matrix is symmetric: \(\mathbf{I} = \mathbf{I}^T\).
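
A minimal R sketch of the identity-matrix properties, assuming a hypothetical invertible matrix `A` and a non-square matrix `B` chosen only for illustration:

```r
I3 <- diag(3)                                  # 3 x 3 identity matrix
A  <- matrix(c(2, 1, 0,
               1, 3, 1,
               0, 1, 4), nrow = 3, byrow = TRUE)  # hypothetical invertible matrix (det = 18)

all.equal(A %*% I3, A)           # AI = A
all.equal(I3 %*% A, A)           # IA = A
all.equal(A %*% solve(A), I3)    # A A^{-1} = I
all.equal(solve(A) %*% A, I3)    # A^{-1} A = I
det(I3)                          # 1
isSymmetric(I3)                  # TRUE

# For a non-square 2 x 3 matrix B, the identity must match each side: I_2 B = B I_3 = B
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
all.equal(diag(2) %*% B, B)
all.equal(B %*% diag(3), B)
```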

1.4 Symmetric Matrix

A symmetric matrix is a square matrix that is equal to its own transpose, i.e., \(\mathbf{A} = \mathbf{A}^T\). Let \(\mathbf{A}\) be an \(n \times n\) matrix; then \(\mathbf{A}\) is symmetric if and only if \(a_{ij} = a_{ji}\) for all \(i, j\) with \(1 \le i, j \le n\).

Here’s an example of a symmetric matrix: \[ \mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} \]

1.4.1 Properties

A symmetric matrix is a square matrix that is equal to its own transpose. Some properties of real symmetric matrices include:

The diagonal entries are real numbers.
The matrix is orthogonally diagonalizable: it can be written as \(\mathbf{A} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T\), where \(\mathbf{Q}\) is an orthogonal matrix whose columns are eigenvectors of \(\mathbf{A}\) and \(\mathbf{\Lambda}\) is the diagonal matrix of the corresponding eigenvalues.
The eigenvalues of a symmetric matrix are real numbers.
The eigenvectors corresponding to different eigenvalues are orthogonal.
The sum and difference of two symmetric matrices of the same size are also symmetric.
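
These properties can be illustrated on the example symmetric matrix above with base R. This is a sketch: `eigen(..., symmetric = TRUE)` is R's eigensolver for symmetric input, and `B` is an arbitrary second symmetric matrix.

```r
A <- matrix(c(1, 2, 3,
              2, 4, 5,
              3, 5, 6), nrow = 3, byrow = TRUE)
isSymmetric(A)                       # TRUE: A equals t(A)

e <- eigen(A, symmetric = TRUE)      # symmetric solver returns real results
lambda <- e$values                   # real eigenvalues
Q <- e$vectors                       # orthonormal eigenvectors in the columns

all.equal(t(Q) %*% Q, diag(3))                # Q^T Q = I
all.equal(Q %*% diag(lambda) %*% t(Q), A)     # A = Q Lambda Q^T

B <- diag(c(1, 2, 3))                         # another symmetric matrix
isSymmetric(A + B) && isSymmetric(A - B)      # TRUE
```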

1.5 Idempotent Matrix

An idempotent matrix is a square matrix that when multiplied by itself yields itself. In other words, an idempotent matrix \(\mathbf{P}\) satisfies \(\mathbf{P}^2 = \mathbf{P}\).

An example of an idempotent matrix is:

\[ \begin{equation} \mathbf{P} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{equation} \]

1.5.1 Properties

\(\mathbf{A}^2 = \mathbf A\).
If \(\mathbf{A}\) is an idempotent matrix, \(\mathbf{I_n}-\mathbf{A}\) is also an idempotent matrix. \[ \begin{aligned} (\mathbf{I_n}-\mathbf{A})^2&=(\mathbf{I_n}-\mathbf{A})(\mathbf{I_n}-\mathbf{A})\\ &=\mathbf{I_n}\mathbf{I_n}-\mathbf{I_n}\mathbf{A}-\mathbf{A}\mathbf{I_n}+\mathbf{A}\mathbf{A}\\ &=\mathbf{I_n}\mathbf{I_n}-\mathbf{A}-\mathbf{A}+\mathbf{A} \quad \because \text{A is idempotent}\\ &=\mathbf{I_n}\mathbf{I_n}-\mathbf{A}\\ &=\mathbf{I_n}-\mathbf{A} \end{aligned} \]
The determinant of \(\mathbf{A}\) is either 0 or 1, because \(\det(\mathbf{A})^2 = \det(\mathbf{A}^2) = \det(\mathbf{A})\).
If \(\mathbf{A}\) is symmetric, \(\mathbf{A}\) is idempotent if and only if every eigenvalue of \(\mathbf{A}\) is either \(0\) or \(1\).
The rank of \(\mathbf A\) is equal to the trace of \(\mathbf A\), which is the sum of the diagonal elements of \(\mathbf A\).
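
For the example matrix \(\mathbf P\) above, the listed properties can be verified in R roughly as follows (a sketch using base R; `qr()$rank` gives the numerical rank):

```r
P  <- diag(c(1, 0, 1))    # the example idempotent matrix
I3 <- diag(3)

all.equal(P %*% P, P)                       # P^2 = P
all.equal((I3 - P) %*% (I3 - P), I3 - P)    # I - P is idempotent as well
det(P)                                      # 0
eigen(P, symmetric = TRUE)$values           # 1 1 0
qr(P)$rank == sum(diag(P))                  # rank = trace = 2
```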

1.6 Ones Matrix

The ones matrix, denoted as \(\mathbf{J}\), is a matrix in which every entry is equal to 1.

Warning

The letter \(\mathbf{J}\) here is unrelated to the Jacobian matrix; using \(\mathbf{J}\) for the ones matrix is simply a convention.

Here is an example of a \(3\times 3\) ones matrix:

\[ \mathbf{J} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \]

1.6.1 Properties

\(\mathbf J\) is not necessarily square: \(\mathbf J_{m\times n}\) can have any shape. The properties below, except the outer-product form, are stated for the square \(n\times n\) case \(\mathbf J_n\).
\(\mathbf J\) is symmetric
\(\mathbf J\) has rank 1
Trace: \(\operatorname{tr}(\mathbf J_n) = n\), the number of rows (or columns) of the matrix.
Matrix multiplication: For an \(n \times p\) matrix \(\mathbf{A}\), every row of \(\mathbf{JA}\) is the row vector of column sums of \(\mathbf{A}\), i.e., \((\mathbf{JA})_{ij} = \sum_{k=1}^{n} a_{kj}\). For a \(p \times n\) matrix \(\mathbf{A}\), every column of \(\mathbf{AJ}\) is the column vector of row sums of \(\mathbf{A}\), i.e., \((\mathbf{AJ})_{ij} = \sum_{k=1}^{n} a_{ik}\).
- In other words, multiplying by \(\mathbf{J}\) on the left sums the columns of \(\mathbf{A}\), and multiplying on the right sums its rows.
- In general \(\mathbf{AJ} \neq \mathbf{JA}\). For an \(n \times n\) matrix \(\mathbf A\), sandwiching gives \(\mathbf{JAJ} = (\mathbf 1_n^T\mathbf A\mathbf 1_n)\mathbf J\), where \(\mathbf 1_n^T\mathbf A\mathbf 1_n\) is the sum of all entries of \(\mathbf A\); in particular, \(\mathbf J^2 = n\mathbf J\).
\(\mathbf J_{m\times n}\) can be written as the outer product of two ones vectors, \(\mathbf J_{m\times n} = \mathbf 1_{m} \mathbf 1_{n}^T\), for any \(m\) and \(n\).
Eigenvalues and eigenvectors: \(\mathbf J_n\) has the eigenvalue \(n\) with the all-ones eigenvector \(\mathbf 1_n\); because its rank is 1, the remaining \(n-1\) eigenvalues are \(0\).
Inverse: For \(n \ge 2\), \(\mathbf J_n\) is singular and has no inverse, since all its rows (and columns) are identical and hence linearly dependent.
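
The following R sketch checks these properties for \(n = 3\); the matrix `A` is the \(3 \times 3\) example from the square-matrix section, used here only for illustration.

```r
n <- 3
J <- matrix(1, nrow = n, ncol = n)    # 3 x 3 ones matrix
ones <- rep(1, n)                     # the ones vector 1_n

isSymmetric(J)                        # TRUE
qr(J)$rank                            # 1
sum(diag(J))                          # trace = n = 3
all.equal(J, ones %*% t(ones))        # J = 1 1^T
eigen(J, symmetric = TRUE)$values     # 3, then zeros (up to rounding)
all.equal(J %*% J, n * J)             # J^2 = nJ

# Non-square case: J_{2x3} = 1_2 1_3^T
all.equal(matrix(1, 2, 3), rep(1, 2) %*% t(rep(1, 3)))

A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
J %*% A          # every row is colSums(A) = (6, 15, 24)
A %*% J          # every column is rowSums(A) = (12, 15, 18)
J %*% A %*% J    # every entry is sum(A) = 45
```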

1.6.2 Applications

calculate a sum using a \(\mathbf 1_n\) vector for \(\mathbf x_n\): \[ \mathbf 1^T\mathbf x =\sum_{i=1}^{n}1\times x_i=x_1+x_2+\dots+x_n \]
calculate a mean using a \(\mathbf 1_n\) vector for \(\mathbf x_n\):

\[ \bar{x}=\frac{1}{n}\mathbf 1^T\mathbf x =\frac{1}{n}\sum_{i=1}^{n}1\times x_i=\frac{1}{n}(x_1+x_2+\dots+x_n) \]

calculate the \(n\) column means of a dataset using a \(\mathbf 1_m\) vector for \(\mathbf X_{m\times n}\) (without the factor \(\frac{1}{m}\), \(\mathbf X^T\mathbf 1_m\) gives the \(n\) column sums): \[ \bar{\mathbf x}=\frac{1}{m}\mathbf X^T\mathbf 1_m \] \[ \begin{aligned} \bar{\mathbf x}&=\frac{1}{m}\mathbf X^T\mathbf 1_m \\ &=\frac{1}{m} \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{m1} \\ x_{12} & x_{22} & \cdots & x_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{mn} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\\ &=\frac{1}{m} \begin{bmatrix} x_{11} + x_{21} + \cdots + x_{m1} \\ x_{12} + x_{22} + \cdots + x_{m2} \\ \vdots \\ x_{1n} + x_{2n} + \cdots + x_{mn} \end{bmatrix} \\ &=\frac{1}{m} \begin{bmatrix} \sum_{i=1}^{m}x_{i1} \\ \sum_{i=1}^{m}x_{i2} \\ \vdots \\ \sum_{i=1}^{m}x_{in} \end{bmatrix} \\ &= \begin{bmatrix} \bar{x}_{1} \\ \bar{x}_{2} \\ \vdots \\ \bar{x}_{n} \end{bmatrix} \\ &=\bar{\mathbf x} \end{aligned} \]
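
These three computations translate directly into R. In this sketch the vector `x` is hypothetical sample data, and the mtcars data matrix plays the role of \(\mathbf X_{m\times n}\):

```r
x <- c(4, 8, 15, 16, 23, 42)            # hypothetical sample x_n
ones_n <- rep(1, length(x))

total <- drop(t(ones_n) %*% x)          # 1^T x = 108
x_mean <- total / length(x)             # (1/n) 1^T x = 18

X <- as.matrix(mtcars)                  # m x n data matrix (32 x 11)
m <- nrow(X)
ones_m <- rep(1, m)
x_bar <- drop(t(X) %*% ones_m) / m      # (1/m) X^T 1_m, the n column means
all.equal(x_bar, colMeans(X))           # TRUE
```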

1.7 Centering Matrix

A centering matrix is a square matrix used in multivariate statistical analysis to center data, i.e., to subtract the mean of each variable from every observation of that variable. Multiplying a data matrix on the left by the centering matrix yields the centered data matrix, whose column means are all \(0\). The centering matrix is defined as:

\[ \begin{equation} \mathbf C = \mathbf I - \frac{1}{m}\mathbf J \end{equation} \]

where \(\mathbf I\) is the \(m \times m\) identity matrix, \(\mathbf J\) is the \(m \times m\) matrix of ones, and \(m\) is the number of observations.

1.7.1 Example

Here is an example of a centering matrix of size \(3 \times 3\):

\[ \begin{equation*} \mathbf C = \frac{1}{3} \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \end{equation*} \]

1.7.2 Properties

The centering matrix is often used in multivariate statistical analysis, for example as the first step of principal component analysis (PCA): the data are centered with \(\mathbf C\), and PCA then rotates the centered data onto the eigenvectors of the covariance matrix, a coordinate system in which the variance along each axis equals the corresponding eigenvalue.

A centering matrix is a square matrix.
A centering matrix is a symmetric matrix.
A centering matrix is an idempotent matrix: \(\mathbf C^2 = \mathbf I - \frac{2}{m}\mathbf J + \frac{1}{m^2}\mathbf J^2 = \mathbf I - \frac{1}{m}\mathbf J = \mathbf C\), using \(\mathbf J^2 = m\mathbf J\).
The diagonal elements of a centering matrix are all equal to \(1-\frac{1}{m}\), where \(m\) is the size of the matrix.
The off-diagonal elements of a centering matrix are all equal to \(-\frac{1}{m}\).
Multiplying a matrix \(\mathbf A\) with \(m\) rows on the left by \(\mathbf C\) subtracts from each column of \(\mathbf A\) that column's mean, so every column of \(\mathbf{CA}\) has mean \(0\).
Multiplying a matrix \(\mathbf A\) with \(m\) columns on the right by \(\mathbf C\) subtracts from each row of \(\mathbf A\) that row's mean, so every row of \(\mathbf{AC}\) has mean \(0\).
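
A small R sketch of these properties, assuming \(m = 4\) and a hypothetical \(4 \times 3\) matrix `A`; `sweep()` subtracts a vector of means along the given margin:

```r
m <- 4
C <- diag(m) - matrix(1, m, m) / m     # C = I - (1/m) J

isSymmetric(C)                         # symmetric
all.equal(C %*% C, C)                  # idempotent
diag(C)                                # 1 - 1/m = 0.75 on the diagonal
C[1, 2]                                # -1/m = -0.25 off the diagonal

A <- matrix(c(3, 1, 4, 1,
              5, 9, 2, 6,
              5, 3, 5, 8), nrow = 4)    # hypothetical 4 x 3 matrix (m = 4 rows)
all.equal(C %*% A, sweep(A, 2, colMeans(A)))   # each column minus its own mean
colMeans(C %*% A)                              # all (numerically) 0

B <- t(A)                                      # 3 x 4 matrix (m = 4 columns)
all.equal(B %*% C, sweep(B, 1, rowMeans(B)))   # each row minus its own mean
```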

1.7.3 Applications

find a centered matrix, \(\tilde{\mathbf X}\) using a \(\mathbf 1_m\) vector for \(\mathbf X_{m\times n}\):

\[ \mathbf C = \mathbf I - \frac{1}{m}\mathbf J \] \[ \begin{aligned} \mathbf X&= \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \text{ } \\ \mathbf {1_m\bar{x}^T}&= \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \begin{bmatrix} \bar{x}_{1} & \bar{x}_{2} & \dots & \bar{x}_{n} \end{bmatrix}\\ &=\begin{bmatrix} \bar{x}_{1} & \bar{x}_{2} & \dots & \bar{x}_{n}\\ \bar{x}_{1} & \bar{x}_{2} & \dots & \bar{x}_{n}\\ \vdots & \vdots & \ddots & \vdots\\ \bar{x}_{1} & \bar{x}_{2} & \dots & \bar{x}_{n} \end{bmatrix}_{m\times n}\\ \\ \tilde{\mathbf X}&=\mathbf X -{\mathbf{1}}_{m}\bar{\mathbf x}^T\\ &=\mathbf X -\mathbf{1}_{m}(\frac{1}{m}\mathbf X^T \mathbf{1}_m)^T\\ &=\mathbf X -\mathbf{1}_{m}\mathbf{1}_m^T\frac{1}{m}\mathbf X\\ &=(\mathbf I -\frac{1}{m}\mathbf J)\mathbf X\\ &=\mathbf C \mathbf X\\ \end{aligned} \]
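
The derivation gives three equivalent ways to center a data matrix. The R sketch below compares them on mtcars; `scale(X, center = TRUE, scale = FALSE)` is base R's column centering, and the comparisons ignore dimnames and attributes, which differ between the three results.

```r
X <- as.matrix(mtcars)
m <- nrow(X)
C <- diag(m) - matrix(1, m, m) / m                   # m x m centering matrix

X_tilde_1 <- X - rep(1, m) %*% t(colMeans(X))        # X - 1_m xbar^T
X_tilde_2 <- C %*% X                                 # C X
X_tilde_3 <- scale(X, center = TRUE, scale = FALSE)  # base R column centering

# Dimnames and the "scaled:center" attribute differ, so compare values only
all.equal(X_tilde_1, X_tilde_2, check.attributes = FALSE)
all.equal(X_tilde_2, X_tilde_3, check.attributes = FALSE)
max(abs(colMeans(X_tilde_2)))                        # numerically 0
```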

1.8 Covariance Matrix

The covariance matrix of a dataset matrix \(\mathbf{X}\) with \(m\) observations and \(n\) variables is a symmetric \(n \times n\) matrix. It can be written entry by entry (algebraically) or compactly in matrix form, as shown below.

1.8.1 Algebraic Expression

For two random variables \(X_1, X_2\), suppose we observe \(m\) samples (realized values). The sample data can be written as:

\[ (x_{11},x_{12}),(x_{21},x_{22}),\dots,(x_{m1},x_{m2}) \]

Then, the sample covariance between the two random variables is:

\[ s_{x_1x_2} = \frac{\sum_{i=1}^{m}(x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2)}{m-1} \]

Sample Variance

For each variate \(X_j\), \(j=1,\dots, n\),

\[ s^2_{x_j}=s_{x_jx_j} = \frac{\sum_{i=1}^{m}(x_{ij}-\bar{x}_j)^2}{m-1} \]

In the usual notation, the sample variance of a single random variable is written with a square, \(s^2_{x_j}\), whereas the sample covariance between two random variables, \(s_{x_1x_2}\), is written without one.

For two random variables \(X_1,X_2\),

\[ \begin{aligned} \mathbf{S} &=\begin{bmatrix} s_{11}&s_{12}\\ s_{21}&s_{22} \end{bmatrix}\\ &=\begin{bmatrix} s_{x_1x_1}&s_{x_1x_2}\\ s_{x_2x_1}&s_{x_2x_2} \end{bmatrix}\\ &=\frac{1}{m-1} \begin{bmatrix} \sum_{i=1}^{m}(x_{i1}-\bar{x}_1)(x_{i1}-\bar{x}_1)&\sum_{i=1}^{m}(x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2)\\ \sum_{i=1}^{m}(x_{i2}-\bar{x}_2)(x_{i1}-\bar{x}_1)&\sum_{i=1}^{m}(x_{i2}-\bar{x}_2)(x_{i2}-\bar{x}_2) \end{bmatrix} \end{aligned} \]

For \(n\) random variables \(X_1,\dots,X_n\) observed on \(m\) samples, the sample covariance matrix can be represented as:

\[ \mathbf{S}=\begin{bmatrix} s_{11}&s_{12}&\dots&s_{1n}\\ s_{21}&s_{22}&\dots&s_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ s_{n1}&s_{n2}&\dots&s_{nn} \end{bmatrix} \]

where \(s_{jk} = \frac{\sum_{i=1}^{m}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k)}{m-1}\). The diagonal entries are the sample variances of the individual random variables, and the off-diagonal entries are the sample covariances between pairs of different random variables.
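
The entry-wise definition of \(s_{jk}\) can be coded as a double loop and compared with R's built-in `cov()`. This is a sketch on three mtcars columns chosen only for illustration:

```r
X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])   # m = 32 observations, n = 3 variables
m <- nrow(X)
n <- ncol(X)
x_bar <- colMeans(X)

S <- matrix(0, n, n, dimnames = list(colnames(X), colnames(X)))
for (j in 1:n) {
  for (k in 1:n) {
    # s_jk = sum_i (x_ij - xbar_j)(x_ik - xbar_k) / (m - 1)
    S[j, k] <- sum((X[, j] - x_bar[j]) * (X[, k] - x_bar[k])) / (m - 1)
  }
}
S
all.equal(S, cov(X))   # TRUE
```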

1.8.2 Vector Form of Sample Variance in Linear Algebra

The algebraic notation above can be written more compactly with linear algebra. To do so, let the sample vector \(\mathbf x =\begin{bmatrix}x_{1}\\x_{2}\\ \vdots\\x_{n}\end{bmatrix}\) hold one record, i.e., one row of the dataset: the observations of all \(n\) variables for a single sample. Stacking the \(m\) records as rows gives the dataset matrix:

\[ \mathbf{X}= \begin{bmatrix} \mathbf{x}_{1}^T\\ \mathbf{x}_{2}^T\\ \vdots\\ \mathbf{x}_{m}^T \end{bmatrix} =\begin{bmatrix} x_{11}&x_{12}&\dots&x_{1n}\\ x_{21}&x_{22}&\dots&x_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ x_{m1}&x_{m2}&\dots&x_{mn} \end{bmatrix} \]

\[ \begin{aligned} \mathbf S &= \operatorname{Cov}(\mathbf{X}) \\ &= \frac{1}{m-1}\sum_{i=1}^{m}(\mathbf x_i - \bar{\mathbf x}_i)(\mathbf x_i - \bar{\mathbf x}_i)^T\\ &= \frac{1}{m-1}\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}\\ &= \frac{1}{m-1}(\mathbf{CX})^T(\mathbf{CX})\\ &= \frac{1}{m-1}\mathbf{X}^T\mathbf{C}^T\mathbf{CX}\\ &= \frac{1}{m-1}\mathbf{X}^T\mathbf{C}\mathbf{C}\mathbf{X}\quad (\because \mathbf{C}\text{ is symmetric})\\ &= \frac{1}{m-1}\mathbf{X}^T\mathbf{C}\mathbf{X}\quad (\because \mathbf{C}\text{ is an idempotent matrix}) \end{aligned} \]

where \(\mathbf{X}\) is the dataset matrix, \(\mathbf{x}_i=\begin{bmatrix}x_{i1}\\x_{i2}\\ \vdots\\x_{in}\end{bmatrix}\) is the \(i\)th observation (the \(i\)th row of \(\mathbf X\) written as a column vector), \(\bar{\mathbf{x}}_i=\begin{bmatrix}\bar{x}_{1}\\\bar{x}_{2}\\ \vdots\\\bar{x}_{n}\end{bmatrix}\) is the vector of column means (the same for every \(i\)), and \((\mathbf{x}_i - \bar{\mathbf{x}}_i)^T\) is the \(i\)th row of the centered dataset matrix \(\tilde{\mathbf{X}} = \mathbf{CX}\).

Each term of the sum has size \((n \times 1)(1 \times n)=(n \times n)\), and likewise \(\tilde{\mathbf X}^T\tilde{\mathbf X}\) has size \((n \times m)(m \times n) = (n \times n)\), so \(\mathbf S\) is \(n \times n\).

Hint

For the \(i\) th observation of \(m\) observations and the \(j\) th variable of \(n\) variables in a dataset, \(\mathbf{X}\), the outer product of \((\mathbf x_i - \bar{\mathbf x}_i)(\mathbf x_i - \bar{\mathbf x}_i)^T\) generates the following \(n \times n\) matrix.

\[ \begin{aligned} (\mathbf x_i - \bar{\mathbf x}_i)(\mathbf x_i - \bar{\mathbf x}_i)^T &= \begin{bmatrix} x_{i1} -\bar{x}_{i1}\\ x_{i2} -\bar{x}_{i2}\\ \vdots \\ x_{in} -\bar{x}_{in} \end{bmatrix}\begin{bmatrix} x_{i1} -\bar{x}_{i1}&x_{i2} -\bar{x}_{i2}&\dots&x_{in} -\bar{x}_{in} \end{bmatrix} \\ &=\begin{bmatrix} (x_{i1}-\bar{x}_{i1})(x_{i1}-\bar{x}_{i1}) & (x_{i1}-\bar{x}_{i1})(x_{i2}-\bar{x}_{i2}) & \dots & (x_{i1}-\bar{x}_{i1})(x_{in}-\bar{x}_{in})\\ (x_{i2}-\bar{x}_{i2})(x_{i1}-\bar{x}_{i1}) & (x_{i2}-\bar{x}_{i2})(x_{i2}-\bar{x}_{i2}) & \dots & (x_{i2}-\bar{x}_{i2})(x_{in}-\bar{x}_{in})\\ \vdots&\vdots&\ddots&\vdots\\ (x_{in}-\bar{x}_{in})(x_{i1}-\bar{x}_{i1})&(x_{in}-\bar{x}_{in})(x_{i2}-\bar{x}_{i2})&\dots&(x_{in}-\bar{x}_{in})(x_{in}-\bar{x}_{in}) \end{bmatrix}_{n\times n} \end{aligned} \]

The \(n \times n\) matrix above is generated once for each of the \(m\) observations of \(\mathbf{X}_{m \times n}\), and \(\operatorname{Cov}(\mathbf X)\) is the sum of these \(m\) matrices divided by \(m-1\). In other words,

\[ \frac{1}{m-1}\sum_{i=1}^{m}(\mathbf x_i - \bar{\mathbf x}_i)(\mathbf x_i - \bar{\mathbf x}_i)^T \]

1.8.3 Example

Example 1

Suppose we have a dataset with three variables, \(X_1, X_2, \text{ and } X_3\), with \(m\) observations. The dataset can be represented as a matrix \(\mathbf{X}\) with dimensions \(m \times 3\), where each row represents an observation and each column represents a variable. The covariance matrix \(\operatorname{Cov}(\mathbf{X})\) of the dataset can be computed as follows:

\[ \begin{aligned} \operatorname{Cov}(\mathbf{X}) &= \frac{1}{m-1}\sum_{i=1}^{m}(\mathbf x_i - \bar{\mathbf x}_i)(\mathbf x_i - \bar{\mathbf x}_i)^T\\ &=\frac{\tilde{\mathbf X}^T\tilde{\mathbf X}}{m-1} \end{aligned} \]

where \(\mathbf{X}\) is the dataset matrix, \(\bar{\mathbf x}_i\) is the vector of variable means (each computed down a column of \(\mathbf X\)), and \((\mathbf{x}_i - \bar{\mathbf{x}}_i)^T\) is the \(i\)th row of the centered dataset matrix \(\tilde{\mathbf X}\). The covariance matrix \(\operatorname{Cov}(\mathbf{X})\) is a \(3 \times 3\) matrix whose \((j, k)\) entry is the covariance between the \(j\)th and \(k\)th variables in the dataset.

Example 2

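The second example calculates \(\operatorname{Cov}(\mathbf{X})\) in R, where \(\mathbf{X}\) is the mtcars dataset (\(m = 32\) observations of \(n = 11\) variables): first as a sum of outer products, then as \(\frac{1}{m-1}\mathbf X^T\mathbf C\mathbf X\), and finally with var().

X <- as.matrix(mtcars)          # 32 x 11 data matrix
x_bar_vec <- colMeans(X)        # vector of the 11 column means

# Outer product (x_i - x_bar)(x_i - x_bar)^T for single observations (11 x 11 each)
(X[1, ] - x_bar_vec) %*% t(X[1, ] - x_bar_vec) # for the 1st row
(X[2, ] - x_bar_vec) %*% t(X[2, ] - x_bar_vec) # for the 2nd row

# Sum the m outer products and divide by m - 1
X_sweeped <- sweep(X, MARGIN = 2, STATS = x_bar_vec, FUN = "-")  # centered data
result <- matrix(0, nrow = ncol(X), ncol = ncol(X))
for (i in seq_len(nrow(X))) {
  result <- result + X_sweeped[i, ] %*% t(X_sweeped[i, ])
}
result / (nrow(X) - 1)

# Matrix form with the centering matrix
m <- nrow(X)
J <- matrix(1, ncol = m, nrow = m)  # m x m ones matrix
C <- diag(m) - J / m                # centering matrix C = I - (1/m) J
(1 / (m - 1)) * t(X) %*% C %*% X
var(X)  # same result (var(X) is identical to cov(X) for a matrix)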

1.8.4 Properties

Symmetry: \(\mathbf S = \mathbf S^T\), since \(s_{jk} = s_{kj}\).
Diagonal Entries: The diagonal entries of the covariance matrix are the variances of the individual variables.
Linearity: Covariance is linear in each argument, so the covariance of a linear combination of variables can be expressed as a linear combination of their covariances. Mathematically, if \(a\) and \(b\) are constants and \(X\), \(Y\), and \(Z\) are random variables, then \(\operatorname{Cov}(aX + bY, Z) = a\operatorname{Cov}(X, Z) + b\operatorname{Cov}(Y, Z)\).
Scale Dependence: Covariance is not invariant to changes in scale or units of measurement: \(\operatorname{Cov}(aX, bY) = ab\operatorname{Cov}(X, Y)\). For example, recording a variable in grams instead of kilograms multiplies its covariances with other variables by 1000. The correlation matrix, which standardizes each variable, is unaffected by such positive rescaling.
Independence: If two variables \(X_j\) and \(X_k\) are independent, their covariance \(\operatorname{Cov}(X_j, X_k)\) is zero. The converse does not hold in general: zero covariance only means there is no linear association.
Positive Semidefiniteness: The covariance matrix is positive semidefinite, i.e., \(\mathbf a^T\mathbf S\mathbf a \ge 0\) for every vector \(\mathbf a\), which means that all of its eigenvalues are non-negative. This follows from the matrix form: \(\mathbf a^T\mathbf S\mathbf a = \frac{1}{m-1}\mathbf a^T\mathbf X^T\mathbf C^T\mathbf C\mathbf X\mathbf a = \frac{1}{m-1}\lVert\mathbf{CXa}\rVert^2 \ge 0\), which is the sample variance of the linear combination \(\mathbf{Xa}\).
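
The R sketch below illustrates these properties on three mtcars columns (an arbitrary choice). It relies on the documented mtcars units, in which `wt` is recorded in 1000 lbs, and uses hypothetical constants `a` and `b` for the linearity check:

```r
X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
S <- cov(X)

isSymmetric(S)                                           # symmetry
all.equal(unname(diag(S)), unname(apply(X, 2, var)))     # diagonal = variances

# Linearity: Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z)
a <- 2; b <- -3
all.equal(cov(a * X[, "mpg"] + b * X[, "hp"], X[, "wt"]),
          a * S["mpg", "wt"] + b * S["hp", "wt"])

# Not scale invariant: wt is recorded in 1000 lbs; convert it to lbs
X_lbs <- X
X_lbs[, "wt"] <- 1000 * X_lbs[, "wt"]
cov(X_lbs)["mpg", "wt"] / S["mpg", "wt"]    # 1000 (up to rounding)
all.equal(cor(X_lbs), cor(X))               # correlations are unchanged

# Positive semidefinite: no negative eigenvalues
min(eigen(S, symmetric = TRUE)$values) >= 0
```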

Note

For details on positive definite matrices, see the Ch.6 §6.5 Positive Definite Matrices post.