# 17.7.2.3 Algorithms (Partial Least Squares)

Partial Least Squares (PLS) is used to construct a model when there is a large number of correlated predictor variables, or when the number of predictor variables exceeds the number of observations. In these cases, multiple linear regression techniques often fail to produce a useful predictive model because of over-fitting. Partial least squares is widely used for modeling industrial processes and for tasks such as calibrating and predicting component amounts in spectral analysis.

Partial least squares extracts factors by linear combinations of predictor variables, and projects predictor variables and response variables onto the extracted factor space.

An observation containing one or more missing values is excluded from the analysis, i.e., missing values are handled by listwise deletion.

Let the numbers of observations, predictor variables, and response variables be n, m, and r respectively. The predictor variables are denoted by the matrix X of size $n \times m$, and the response variables by Y of size $n \times r$. Subtracting the column means from X and Y gives the centered matrices $X_0$ and $Y_0$.

• Scale Variables

Each column in the matrix $X_0$ is divided by its standard deviation.
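The preprocessing steps above (listwise deletion, centering, and optional scaling) can be sketched in NumPy; the data values here are purely illustrative:

```python
import numpy as np

# Toy data: 6 observations, 3 predictors, 2 responses (values are illustrative).
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 1.0],
              [np.nan, 3.0, 1.5],   # observation with a missing value
              [3.0, 2.5, 2.0],
              [4.0, 3.5, 2.5],
              [5.0, 4.0, 3.0]])
Y = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [3.0, 1.5],
              [4.0, 2.0],
              [5.0, 2.5],
              [6.0, 3.0]])

# Listwise deletion: drop any observation with a missing value in X or Y.
keep = ~(np.isnan(X).any(axis=1) | np.isnan(Y).any(axis=1))
X, Y = X[keep], Y[keep]

# Center each column to obtain X0 and Y0.
X0 = X - X.mean(axis=0)
Y0 = Y - Y.mean(axis=0)

# Optional scale step from the text: divide each X0 column by its standard deviation.
X0 = X0 / X0.std(axis=0, ddof=1)
```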

## Partial Least Squares Method

Origin supports two methods to compute extracted factors: Wold's Iterative and Singular Value Decomposition (SVD).

### Wold's Iterative

Choose an initial vector u. If r = 1, initialize $u=Y_0$; otherwise u can be a vector of random values.

• Repeat each iteration until w converges.
$w=X_0^Tu$, and normalize w by $w=w/|| w ||$
$t=X_0w$, and normalize t by $t=t/|| t ||$
$q=Y_0^Tt$, and normalize q by $q=q/|| q ||$
$u=Y_0q$

After w converges, update

$t=X_0w$, and normalize t by $t=t/|| t ||$
$p=X_0^Tt$
$q=Y_0^Tt$
$u=Y_0q$
• Repeat the above process with the residual (deflated) matrices k times:
$\hat{X}_0=X_0-tp^T$
$\hat{Y}_0=Y_0-tq^T$

so that k factors are constructed. The x weights, x scores, y scores, x loadings, and y loadings for the k factors are collected in the matrices W, T, U, P, and Q, respectively.

Note that in Origin the signs of the x scores, y scores, x loadings, and y loadings for each factor are normalized by forcing the sum of the x weights for each factor to be positive.
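A minimal NumPy sketch of Wold's iterative method for a single factor; for determinism this sketch initializes u from the first column of $Y_0$ (an assumption; the text allows random values when r > 1):

```python
import numpy as np

def nipals_factor(X0, Y0, tol=1e-10, max_iter=500):
    """One PLS factor by Wold's iterative method, following the steps in the text."""
    u = Y0[:, 0].copy()          # initial u (first Y column; assumption for determinism)
    w = np.zeros(X0.shape[1])
    for _ in range(max_iter):
        w_new = X0.T @ u
        w_new /= np.linalg.norm(w_new)       # w = X0^T u, normalized
        t = X0 @ w_new
        t /= np.linalg.norm(t)               # t = X0 w, normalized
        q = Y0.T @ t
        q /= np.linalg.norm(q)               # q = Y0^T t, normalized
        u = Y0 @ q                           # u = Y0 q
        if np.linalg.norm(w_new - w) < tol:  # stop when w has converged
            w = w_new
            break
        w = w_new
    # After convergence, update scores and loadings.
    t = X0 @ w
    t /= np.linalg.norm(t)
    p = X0.T @ t
    q = Y0.T @ t
    u = Y0 @ q
    # Sign convention: force the sum of x weights to be positive.
    if w.sum() < 0:
        w, t, p, q, u = -w, -t, -p, -q, -u
    return w, t, p, q, u

# To extract k factors, deflate and repeat:
#   X0 = X0 - np.outer(t, p);  Y0 = Y0 - np.outer(t, q)
```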

### SVD

• X Weights for the First Factor

w is the normalized first left singular vector of $X_0^TY_0$, and,

$t=X_0w$, and normalize t by $t=t/|| t ||$
$p=X_0^Tt$
$q=Y_0^Tt$
$u=Y_0q$
• Repeat the above process with the residual matrices k times, so that k factors are extracted.
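The SVD variant for one factor might be sketched as follows; `np.linalg.svd` returns the left singular vectors as the columns of its first output:

```python
import numpy as np

def svd_factor(X0, Y0):
    """One PLS factor via SVD: w is the normalized first left singular
    vector of X0^T Y0, then scores and loadings follow as in the text."""
    U, s, Vt = np.linalg.svd(X0.T @ Y0, full_matrices=False)
    w = U[:, 0]                  # already unit length
    t = X0 @ w
    t /= np.linalg.norm(t)       # t = X0 w, normalized
    p = X0.T @ t                 # x loadings
    q = Y0.T @ t                 # y loadings
    u = Y0 @ q                   # y scores
    return w, t, p, q, u

# As with the iterative method, deflate X0 and Y0 and repeat k times.
```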

## Cross Validation

Origin uses "leave-one-out" cross validation to find the optimal number of factors: one observation is left out at a time, and the remaining observations are used to construct the model and predict the response for the left-out observation.

• PRESS

PRESS is the predicted residual sum of squares. It can be calculated by:

$\text{PRESS} = \sum_{i=1}^n \sum_{j=1}^r (Y_{ij} - \hat{Y}_{ij})^2$
where $\hat{Y}_{ij}$ is the predicted Y value by leave-one-out.

Note that if variables are scaled, PRESS is the scaled result.

If the maximum number of factors is k, PRESS is calculated for 0, 1, ..., k factors. For 0 factors,

$\text{PRESS} = \sum_{i=1}^n \sum_{j=1}^r (Y_{ij} - \bar{Y}_{j})^2$
where $\bar{Y}_{j}$ is the mean value for jth Y variable.
• Root Mean PRESS

Root Mean PRESS, the root mean of PRESS, is defined by:

$\text{Root Mean PRESS} = \sqrt{ \frac{\text{PRESS}}{ (n-1)r } }$

Origin uses the minimum Root Mean PRESS to find the optimal number of factors in Cross Validation.
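The 0-factor PRESS and Root Mean PRESS reduce to a few lines; this sketch follows the formulas in the text, with n observations and r responses:

```python
import numpy as np

def press_zero_factors(Y):
    """PRESS for the 0-factor model: each response is predicted
    by its column mean, as in the formula in the text."""
    return np.sum((Y - Y.mean(axis=0)) ** 2)

def root_mean_press(press, n, r):
    """Root Mean PRESS = sqrt(PRESS / ((n - 1) r))."""
    return np.sqrt(press / ((n - 1) * r))
```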

## Response Prediction

Once the model is constructed, responses can be predicted using the coefficients of the fitted model. The coefficient matrix is calculated from the weights and loadings matrices:

$C = W(P^TW)^{-1}Q^T$

and the predicted responses are calculated as:

$\hat{Y}_0 = X_{0} C$

Note that here variables are centered. If variables are also scaled, responses should be scaled back.
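A sketch of the prediction step under these definitions. With k = 1 and normalized scores, the predicted centered responses equal $tq^T$, which makes a convenient sanity check:

```python
import numpy as np

def pls_coefficients(W, P, Q):
    """Coefficient matrix C = W (P^T W)^{-1} Q^T (m x r),
    mapping centered predictors to centered responses."""
    return W @ np.linalg.inv(P.T @ W) @ Q.T

def predict(X0, C, Y_mean):
    """Predicted responses: X0 C is on the centered scale, so the
    Y column means are added back (scaling back is analogous)."""
    return X0 @ C + Y_mean
```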

## Quantities

• Variance Explained for X Effects

Variance Explained for the lth X variable,

$\frac{ \sum_{j=1}^{k} P_{lj}^2 }{ \sum_{i=1}^{n} {X_0}_{il}^2 }$

Variance Explained for X variables,

$\frac{ \sum_{l=1}^{m} \sum_{j=1}^{k} P_{lj}^2 }{ \sum_{l=1}^{m} \sum_{i=1}^{n} {X_0}_{il}^2 }$
• Variance Explained for Y Responses

Variance Explained for the lth Y variable,

$\frac{ \sum_{j=1}^{k} Q_{lj}^2 }{ \sum_{i=1}^{n} {Y_0}_{il}^2 }$

Variance Explained for Y variables,

$\frac{ \sum_{l=1}^{r} \sum_{j=1}^{k} Q_{lj}^2 }{ \sum_{l=1}^{r} \sum_{i=1}^{n} {Y_0}_{il}^2 }$
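A sketch of the variance-explained formulas, valid when the x scores are normalized during extraction as in the steps above:

```python
import numpy as np

def variance_explained_x(P, X0):
    """Per-variable fraction of X variance explained by the k factors:
    sum_j P[l,j]^2 / sum_i X0[i,l]^2 for each variable l."""
    return (P ** 2).sum(axis=1) / (X0 ** 2).sum(axis=0)

def variance_explained_y(Q, Y0):
    """Per-variable fraction of Y variance explained by the k factors."""
    return (Q ** 2).sum(axis=1) / (Y0 ** 2).sum(axis=0)
```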
• VIP Statistic

VIP (variable influence on projection) summarizes the influence of each predictor variable on the responses, weighting the x weights of each factor by the amount of response variance that the factor explains.

• Residual

X Residuals,

$X_r = X_0 - TP^T$

Y Residuals,

$Y_r = Y_0 - TQ^T$

When variables are scaled, residuals should be scaled back.

• Distances

Distances to X model for the ith observation,

$\text{Dist}_x = \sqrt{ \sum_{j=1}^m X_{rij}^2 }$

Distances to Y model for the ith observation,

$\text{Dist}_y = \sqrt{ \sum_{j=1}^r Y_{rij}^2 }$
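The residual and distance quantities can be sketched as:

```python
import numpy as np

def residuals(X0, Y0, T, P, Q):
    """X and Y residuals after removing the k extracted factors."""
    Xr = X0 - T @ P.T
    Yr = Y0 - T @ Q.T
    return Xr, Yr

def distances(R):
    """Per-observation distance to the model: the row-wise
    Euclidean norm of the residual matrix."""
    return np.sqrt((R ** 2).sum(axis=1))
```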
• T Square

T Square for the ith observation,

$T^2=\sum_{j=1}^k \frac{T_{ij}^2}{\text{Var}_j}$

where $\text{Var}_j$ is the variance for X scores of the jth factor.

• Control Limit for T Square
$\frac{(n-1)^2}{n}\,\text{betainv}(0.95,\,k/2,\,(n-k-1)/2)$
• Radius for Confidence Ellipse in Scores Plot
$\sqrt{\frac{(n-1)^2}{n}\,\text{betainv}(0.95,\,1,\,(n-3)/2)\cdot \text{Var}_j}$

where $\text{Var}_j$ is the variance for X scores or Y scores of the jth factor.
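A sketch of the T Square computation from the X-score matrix T (n rows, k columns); the control limit additionally requires the inverse beta CDF ("betainv"), available as `scipy.stats.beta.ppf`:

```python
import numpy as np

def t_square(T):
    """Hotelling-style T^2 per observation: sum over factors of the
    squared score divided by that factor's score variance."""
    var = T.var(axis=0, ddof=1)          # Var_j: variance of each factor's scores
    return ((T ** 2) / var).sum(axis=1)

# The control limit combines the betainv value from the text with
# the sample-size factor; scipy.stats.beta.ppf implements betainv.
```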