17.7.4.3 Algorithms (Discriminant Analysis)

Discriminant Analysis is used to allocate observations to groups using information from observations whose group memberships are known (i.e., training data).

Let X_t\ be the training data with n observations and p variables on n_g groups. \bar{x}_j\ is a row vector of the sample means for the jth group, and n_j\ is the number of observations in the jth group. Let X_{tj}\ denote the n_j rows of X_t\ that belong to group j. The within-group covariance matrix for group j can then be expressed as:

S_j=\frac{1}{n_j-1}\cdot (X_{tj}-\bar{x}_j)^T(X_{tj}-\bar{x}_j)

The pooled within-group covariance matrix is:

S=\frac{1}{n-n_g}\cdot\sum_{j=1}^{n_g} (X_{tj}-\bar{x}_j)^T(X_{tj}-\bar{x}_j)

Note that missing values are excluded listwise (i.e., an observation containing one or more missing values is excluded from the analysis).
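As an illustration, these two covariance calculations can be sketched in NumPy as follows (the function name group_covariances and the label vector g are illustrative, not part of Origin):

import numpy as np

def group_covariances(Xt, g):
    # Within-group covariance matrices S_j and the pooled matrix S.
    # Xt: (n, p) training data; g: (n,) group labels.
    # Rows containing NaN should be dropped (listwise) beforehand.
    groups = np.unique(g)
    n, p = Xt.shape
    n_g = len(groups)
    S_j = {}
    pooled = np.zeros((p, p))
    for j in groups:
        Xj = Xt[g == j]                # observations X_tj in group j
        Xc = Xj - Xj.mean(axis=0)      # subtract the group mean
        S_j[j] = Xc.T @ Xc / (len(Xj) - 1)
        pooled += Xc.T @ Xc            # accumulate within-group SSQ
    S = pooled / (n - n_g)
    return S_j, S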

Test for Equality of Within-group Covariance Matrices

If training data are assumed to follow a multivariate normal distribution, the following likelihood-ratio test statistic G can be used to test for equality of within-group covariance matrices.

G=C\left[(n-n_g) \mathrm{log} |S|-\sum_{j=1}^{n_g} (n_j-1) \mathrm{log} |S_j|\right]

where

C=1-\frac{2p^2+3p-1}{6(p+1)(n_g-1)}\cdot(\sum_{j=1}^{n_g} \frac{1}{n_j-1} -\frac{1}{n-n_g})

For large n, G is approximately distributed as a \chi^2\ variable with \frac{1}{2}\cdot p(p+1)(n_g-1) degrees of freedom.
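A sketch of this test, assuming SciPy for the \chi^2\ tail probability (function and argument names are illustrative):

import numpy as np
from scipy.stats import chi2

def covariance_equality_test(S_list, S, n_list, p):
    # Likelihood-ratio statistic G for equal within-group covariances.
    # S_list: within-group covariance matrices S_j; n_list: group sizes n_j.
    n, n_g = sum(n_list), len(n_list)
    C = 1 - (2*p**2 + 3*p - 1) / (6*(p + 1)*(n_g - 1)) * (
        sum(1.0/(nj - 1) for nj in n_list) - 1.0/(n - n_g))
    G = C * ((n - n_g)*np.log(np.linalg.det(S))
             - sum((nj - 1)*np.log(np.linalg.det(Sj))
                   for nj, Sj in zip(n_list, S_list)))
    df = p*(p + 1)*(n_g - 1)/2
    return G, chi2.sf(G, df)           # statistic and approximate p-value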

Canonical Discriminant Analysis

Canonical discriminant analysis is used to find the linear combination of the p variables that maximizes the ratio of between-group to within-group variation. The formed canonical variates can then be used to discriminate between groups.

Let X be the training data with the total means subtracted, and let k be its rank. An orthogonal matrix Q can be obtained from the QR decomposition of X (when X has full column rank) or from its SVD, and Q_X\ is the first k columns of Q. Let Q_g\ be an n by n_g-1 orthogonal matrix that defines the groups. Then the k by n_g-1\ matrix V is defined as

V=Q_X^TQ_g

The SVD of V is:

V=U_X \Delta U_g^T

Non-zero diagonal elements of the matrix \Delta are the l canonical correlations \delta_i\ , i=1,2,...,l, associated with the l canonical variates, where l=\mathrm{min}(k, n_g-1)\ .

Eigenvalues of the within-group sums of squares matrix are:

\lambda_i=\frac{\delta_i^2}{1-\delta_i^2}


  • Wilks' Lambda
Testing for a significant dimensionality greater than i,
\Lambda_i=\prod_{j=i+1}^{l} \frac{1}{1+\lambda_j}
A \chi^2\ statistic with (k-i)(n_g-1-i)\ degrees of freedom is used:
\left(n-1-n_g-\frac{1}{2}(k-n_g)\right)\sum_{j=i+1}^{l} \mathrm{log}(1+\lambda_j),\quad i=0,1,...,l-1
  • Unstandardized Canonical Coefficients
The loading matrix B for the canonical variates can be calculated from U_X\ . It is scaled so that the canonical variates have unit pooled within-group variance, i.e.,
B^TSB=I\
Note that the sign of each singular vector in the SVD result is not unique, which means each column in B can be multiplied by -1. Origin normalizes the sign by forcing the sum of each column in RB\ to be positive, where R is the Cholesky factor of S. (A numerical sketch of the whole computation follows this list.)
The constant terms can be calculated as follows.
C_0=-X_mB\
where X_m\ is a row vector of the means of the variables.
  • Standardized Canonical Coefficients
D=S_aB\
where S_a\ is a diagonal matrix whose diagonal elements are the square roots of the diagonal elements of the pooled within-group covariance matrix S.
  • Canonical Structure Matrix
C=S_a^{-1}SB\
  • Canonical Group Means
M_j=C_0+\bar{x}_jB\
where M_j\ and \bar{x}_j\ are row vectors of the canonical group mean and group mean for the jth group, respectively.
  • Canonical Scores
A_i=C_0+X_iB\
where A_i\ is the canonical score for the ith observation X_i\ .
Note that the ith observation here can come from either the training data or the test data.
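The full canonical computation can be sketched as below, assuming NumPy, full column rank (k = p), and the QR/SVD route described above; this is an illustrative reconstruction, not Origin's implementation, and it omits the Cholesky-based sign normalization:

import numpy as np

def canonical_da(Xt, g):
    # Returns canonical correlations delta, loadings B (B^T S B = I),
    # and the constant terms C0.
    groups, idx = np.unique(g, return_inverse=True)
    n, p = Xt.shape
    n_g = len(groups)
    X = Xt - Xt.mean(axis=0)               # subtract total means
    Q_X, R = np.linalg.qr(X)               # X = Q_X R (full column rank)

    # Orthonormal basis Q_g for the centered group-indicator space
    G = np.zeros((n, n_g))
    G[np.arange(n), idx] = 1.0
    Gc = G - G.mean(axis=0)                # rank n_g - 1
    Q_g = np.linalg.svd(Gc, full_matrices=False)[0][:, :n_g - 1]

    U_X, delta, _ = np.linalg.svd(Q_X.T @ Q_g)   # V = U_X Delta U_g^T
    l = min(p, n_g - 1)
    delta = delta[:l]

    # Scale columns so the canonical variates have unit pooled
    # within-group variance, giving B^T S B = I
    scale = np.sqrt((n - n_g) / (1.0 - delta**2))
    B = np.linalg.solve(R, U_X[:, :l]) * scale
    C0 = -Xt.mean(axis=0) @ B              # C_0 = -X_m B
    return delta, B, C0

Canonical scores for any data Y then follow as C0 + Y @ B, and the Wilks' Lambda chain can be evaluated from delta via \lambda_i=\delta_i^2/(1-\delta_i^2)\ .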

Mahalanobis Distance

Mahalanobis distance is a measure of the distance of an observation from a group. It has two forms. For an observation x_i\ , the squared distance from the jth group is:

  • Using within-group covariance matrix
D_{ij}^2=(x_i-\bar{x}_j)S_j^{-1}(x_i-\bar{x}_j)^T
  • Using pooled within-group covariance matrix
D_{ij}^2=(x_i-\bar{x}_j)S^{-1}(x_i-\bar{x}_j)^T
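Both forms reduce to a single linear solve against the chosen covariance matrix; a minimal NumPy sketch:

import numpy as np

def mahalanobis_sq(x, xbar_j, S):
    # Squared Mahalanobis distance of row vector x from group mean xbar_j,
    # where S is either the group matrix S_j or the pooled matrix.
    d = x - xbar_j
    return float(d @ np.linalg.solve(S, d))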

Classify

Prior Probabilities

The prior probabilities reflect the user’s view as to the likelihood of the observations coming from the different groups. Origin supports two kinds of prior probabilities:

  • Equal
\pi_j=1/n_g\
  • Proportional to Group Size
\pi_j=n_j/n\
where n_j\ is the number of observations in the jth group of the training data.

Posterior Probability

The p variables of an observation are assumed to follow a multivariate normal distribution with mean \mu_j\ and covariance matrix \Sigma_j\ if the observation comes from the jth group. If p(x_i|\mu_j,\Sigma_j)\ is the probability of observing x_i\ given that it comes from group j, then the posterior probability of belonging to group j is:

q_j=p(j|x_i,\mu_j,\Sigma_j)\propto p(x_i|\mu_j,\Sigma_j)\pi_j

The parameters \mu_j\ and \Sigma_j\ are estimated from the training data X_t\ , and the observation is allocated to the group with the highest posterior probability. Origin provides two methods to calculate the posterior probability.

  • Linear Discriminant Function
Within-group covariance matrices are assumed equal.
\mathrm{log}(q_j)=-\frac{1}{2}D_{ij}^2+\mathrm{log}(\pi_j)+c_0
where D_{ij}^2\ is the Mahalanobis distance of the ith observation from the jth group using the pooled within-group covariance matrix, and c_0\ is a constant.
  • Quadratic Discriminant Function
Within-group covariance matrices are not assumed equal.
\mathrm{log}(q_j)=-\frac{1}{2}D_{ij}^2+\mathrm{log}(\pi_j)-\frac{1}{2}\mathrm{log}|S_j|+c_0
where D_{ij}^2\ is the Mahalanobis distance of the ith observation from the jth group using the within-group covariance matrix S_j\ , and c_0\ is a constant.

The q_j\ are standardized so that they sum to one, and c_0\ is determined from this standardization:

\sum_{j=1}^{n_g} q_j=1
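A sketch of the posterior calculation in log space, assuming the squared distances and priors are already available (names are illustrative); c_0\ is absorbed by the final normalization:

import numpy as np

def posterior_probs(D2, priors, logdet=None):
    # D2: distances of one observation from each group mean (pooled S
    # for the linear rule, per-group S_j for the quadratic rule).
    # logdet: log|S_j| values for the quadratic rule, None for the linear.
    logq = -0.5*np.asarray(D2) + np.log(priors)
    if logdet is not None:                 # quadratic discriminant rule
        logq -= 0.5*np.asarray(logdet)
    logq -= logq.max()                     # guard against underflow
    q = np.exp(logq)
    return q / q.sum()                     # enforce sum_j q_j = 1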

Atypicality Index

The Atypicality Index I_j(x_i)\ indicates the probability of obtaining an observation more typical of group j than the ith observation. If it is close to 1 for all groups, the observation may come from a grouping not represented in the training data. The Atypicality Index is calculated as:

I_j(x_i)=P(B\le z:\frac{1}{2}p,\frac{1}{2}(n_j-d))

where P(B\le \beta:\ a, b)\ is the lower-tail probability from a beta distribution with parameters a and b. For equal within-group covariance matrices,

z=D_{ij}^2/(D_{ij}^2+(n-n_g)(n_j-1)/n_j)

and for unequal within-group covariance matrices,

z=D_{ij}^2/(D_{ij}^2+(n_j^2-1)/n_j)
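A sketch using the beta lower-tail probability from SciPy; the degrees-of-freedom quantity d is passed through exactly as it appears in the formula above:

from scipy.stats import beta

def atypicality(D2, n, n_g, n_j, p, d, equal_cov=True):
    # Atypicality index I_j(x_i) from the squared Mahalanobis distance D2.
    if equal_cov:
        z = D2 / (D2 + (n - n_g)*(n_j - 1)/n_j)
    else:
        z = D2 / (D2 + (n_j**2 - 1)/n_j)
    return beta.cdf(z, 0.5*p, 0.5*(n_j - d))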

Linear Discriminant Function Coefficients

The linear discriminant functions (also known as Fisher's linear discriminant functions) can be calculated as:

  • Linear Coefficient for the jth Group.
b_j=S^{-1}\bar{x}_j^T
where b_j\ is a column vector of length p.
  • Constant Coefficient for the jth Group.
a_j=\bar{x}_jb_j
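A sketch of both coefficients for one group, evaluated exactly as the formulas above are written:

import numpy as np

def fisher_coefficients(S, xbar_j):
    # S: pooled within-group covariance; xbar_j: (p,) group mean.
    b_j = np.linalg.solve(S, xbar_j)   # b_j = S^{-1} xbar_j^T
    a_j = xbar_j @ b_j                 # constant coefficient a_j
    return b_j, a_j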

Classify Training Data

Each observation in training data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability). Squared Mahalanobis distance from each group and Atypicality Index of each group can also be calculated.

The classification result for the training data is summarized by comparing the given group membership with the predicted group membership. The misclassification error rate is calculated as the percentage of misclassified observations weighted by the prior probabilities of the groups, i.e.

E=\sum_{j=1}^{n_g} e_j\pi_j

where e_j\ is the percentage of misclassified observations for the jth group.
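The weighted error rate is a direct sum; a minimal sketch, assuming NumPy arrays of given and predicted labels:

import numpy as np

def error_rate(y_true, y_pred, priors, groups):
    # E = sum_j e_j * pi_j, with e_j the fraction misclassified in group j
    E = 0.0
    for j, pi_j in zip(groups, priors):
        mask = (y_true == j)
        e_j = np.mean(y_pred[mask] != j)
        E += e_j * pi_j
    return E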

Cross Validation for Training Data

It follows the same procedure as Classify Training Data, except that when predicting the membership of an observation in the training data, that observation is excluded from the calculation of the within-group covariance matrices or the pooled within-group covariance matrix.
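A leave-one-out sketch, where classify stands for a hypothetical routine that estimates the covariance matrices from the supplied rows and returns the predicted group of a single observation:

import numpy as np

def loo_predict(Xt, g, classify):
    preds = []
    for i in range(len(Xt)):
        keep = np.arange(len(Xt)) != i     # exclude observation i
        preds.append(classify(Xt[keep], g[keep], Xt[i]))
    return np.array(preds)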

Classify Test Data

The within-group covariance matrices and the pooled within-group covariance matrix are calculated from the training data. Each observation in the test data is then classified by posterior probability (i.e., it is allocated to the group with the highest posterior probability).