15.2.5 Algorithms (Linear Regression)

The Linear Regression Model

Simple Linear Regression Model

For a given dataset (x_i,y_i),i=1,2,\ldots n -- where x is the independent variable and y is the dependent variable, \beta_0 and \beta_1 are parameters, and \varepsilon_i is a random error term with mean E\left \{\varepsilon_i\right \}=0 and variance Var\left \{\varepsilon_i\right \}=\sigma^2 -- linear regression fits the data to a model of the following form:

y_i=\beta _0+\beta _1x_i+\varepsilon_i

(1)

Least squares estimation minimizes the sum of the n squared deviations:

\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2

(2)

The estimated parameters of the linear model can be computed as:

\hat\beta _1=\frac{SXY}{SXX}

(3)

\hat\beta _0=\bar y-\hat\beta _1\bar x

(4)

where:

\bar x=\frac {1}{n}\sum_{i=1}^nx_i,\bar y=\frac {1}{n}\sum_{i=1}^ny_i

(5)

and

SXY=\sum_{i=1}^nx_iy_i\; \; \; \; \; \; \; SXX=\sum_{i=1}^nx_i^2 (uncorrected)

(6)

SXY=\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)\; \; \; \; \; \; \; SXX=\sum_{i=1}^n(x_i-\bar x)^2 (corrected)

(7)

Note: When the intercept is excluded from the model, the coefficients are calculated using the uncorrected formula.

Therefore, we estimate the regression function as follows:

\hat{y}=\hat{\beta_0}+\hat{\beta_1}x

(8)

The residual res_i is defined as:

res_i=y_i-\hat{y_i}

(9)

When the least squares estimators \hat{\beta_0} and \hat{\beta_1} are used for estimating \beta_0 and \beta_1, the minimized value of formula (2) is the residual sum of squares:

RSS=\sum_{i=1}^nres_i^2

(10)
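The following is a minimal sketch of equations (3)-(10) in Python (NumPy), assuming the corrected formulas and a model with intercept; the arrays x and y are hypothetical sample data.

import numpy as np

# Hypothetical sample data: x is the independent variable, y the dependent variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()            # equation (5)

SXY = np.sum((x - x_bar) * (y - y_bar))      # corrected sums, equation (7)
SXX = np.sum((x - x_bar) ** 2)

beta1_hat = SXY / SXX                        # equation (3)
beta0_hat = y_bar - beta1_hat * x_bar        # equation (4)

y_hat = beta0_hat + beta1_hat * x            # equation (8)
res = y - y_hat                              # equation (9)
RSS = np.sum(res ** 2)                       # equation (10)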

Fit Control

Errors as Weight

In the above section, we assumed constant variance in the errors. However, when fitting experimental data, we may need to take the instrument error (which reflects the accuracy and precision of a measuring instrument) into account in the fitting process, so the assumption of constant error variance is violated. In that case, we assume \varepsilon_i to be normally distributed with nonconstant variance, and the instrumental errors \sigma_i are used to construct weights for the fit. The weight matrix is defined as:

W=\begin{bmatrix}
w_1 & 0 & \dots & 0 \\
0 & w_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & w_n
\end{bmatrix}

The fitting model is changed into:

\sum_{i=1}^n w_i (y_i-\hat y_i)^2=\sum_{i=1}^n w_i [y_i-(\hat{\beta _0}+\hat{\beta _1}x_i)]^2

(11)

The weight factors w_i can be assigned in one of three ways:

No Weighting

The error bar will not be treated as weight in calculation.

Direct Weighting

w_i=\sigma_i

(12)

Instrumental

For Instrumental weighting, the weight is inversely proportional to the square of the instrumental error, so a measurement with a small error receives a larger weight because it is more precise than measurements with larger errors.

w_i=\frac 1{\sigma_i^2}

(13)

Note: The errors used as weights should be designated as a "YError" column in the worksheet.
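As a sketch of equation (11) with instrumental weights from equation (13): the array sigma below is a hypothetical YError column, and the weighted normal equations X'WXb = X'Wy are solved directly (an illustrative implementation, not necessarily the one used internally).

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
sigma = np.array([0.1, 0.2, 0.1, 0.3, 0.2])    # hypothetical instrumental errors (YError column)

w = 1.0 / sigma ** 2                           # instrumental weights, equation (13)

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
XtWX = X.T @ (w[:, None] * X)                  # X'WX
XtWy = X.T @ (w * y)                           # X'Wy
beta0_hat, beta1_hat = np.linalg.solve(XtWX, XtWy)

# Weighted residual sum of squares being minimized, equation (11)
RSS_w = np.sum(w * (y - (beta0_hat + beta1_hat * x)) ** 2)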

Fix Intercept (at)

Fixing the intercept sets the y-intercept \beta_0 to a fixed value; the total degrees of freedom then become n*=n-1 because the intercept is fixed.

Scale Error with sqrt(Reduced Chi-Sqr)

Scale Error with sqrt(Reduced Chi-Sqr) is available when fitting with weights. This option only affects the errors reported for the parameters; it does not affect the fitting process or the data in any way. By default it is checked, and \sigma^2 is taken into account when calculating the parameter errors; otherwise, \sigma^2 is not taken into account in the error calculation. Take the Covariance Matrix as an example.

Scale Error with sqrt(Reduced Chi-Sqr):

Cov(\beta _i,\beta _j)=\sigma^2 (X^{\prime }X)^{-1}
\sigma^2=\frac{RSS}{n^{*}-1}

(14)

Do not Scale Error with sqrt(Reduced Chi-Sqr):

Cov(\beta _i,\beta _j)=(X'X)^{-1}\,\!

(15)

For weighted fitting, (X'WX)^{-1}\,\! is used instead of (X'X)^{-1}\,\!.

Fit Results

When you perform a linear fit, you generate an analysis report sheet listing computed quantities. The Parameters table reports model slope and intercept (numbers in parentheses show how the quantities are derived):

Fit Parameters


Fitted value

See formulas (3) and (4).

The Parameter Standard Errors

For each parameter, the standard error can be obtained by:

\varepsilon _{\hat \beta _0}=s_\varepsilon \sqrt{\frac{\sum x_i^2}{nSXX}}

(16)

\varepsilon _{\hat \beta _1}=\frac{s_\varepsilon }{\sqrt{SXX}}

(17)

where the sample variance s_\varepsilon ^2 (or error mean square, MSE) can be estimated as follows:

s_\varepsilon ^2=\frac{RSS}{df_{Error}}=\frac{\sum_{i=1}^n (y_i-\hat y_i)^2}{n^{*}-1}

(18)

RSS is the residual sum of squares (or error sum of squares, SSE), which is the sum of the squared vertical deviations from each data point to the fitted line. It can be computed as:

RSS=\sum_{i=1}^n w_ie_i^2=\sum_{i=1}^n w_i (y_i-\hat y_i)^2=\sum_{i=1}^n w_i [y_i-(\hat\beta _0+\hat\beta _1x_i)]^2

(19)

Note: Regarding n*, if the intercept is included in the model, n*=n-1; otherwise, n*=n.
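A minimal sketch of equations (16)-(18) for the unweighted case with an intercept (so w_i = 1 and n* = n-1), using hypothetical sample data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()
SXX = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / SXX
beta0_hat = y_bar - beta1_hat * x_bar

RSS = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)     # equation (19) with w_i = 1
df_error = (n - 1) - 1                                   # n* - 1, intercept included
s_eps = np.sqrt(RSS / df_error)                          # equation (18), sqrt(MSE)

se_beta0 = s_eps * np.sqrt(np.sum(x ** 2) / (n * SXX))   # equation (16)
se_beta1 = s_eps / np.sqrt(SXX)                          # equation (17)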

t-Value and Confidence Level

If the regression assumptions hold, we have:

\frac{{\hat \beta _0}-\beta _0}{\varepsilon _{\hat \beta _0}}\sim t_{n^{*}-1} and \frac{{\hat \beta _1}-\beta _1}{\varepsilon _{\hat \beta _1}}\sim t_{n^{*}-1}

(20)

The t-test can be used to examine whether the fitting parameters are significantly different from zero, which means that we can test whether \beta _0= 0\,\! (if true, this means that the fitted line passes through the origin) or \beta _1= 0\,\!. The hypotheses of the t-tests are:

H_0 : \beta _0= 0\,\! H_0 : \beta _1= 0\,\!
H_\alpha  : \beta _0  \neq 0\,\! H_\alpha  : \beta _1 \neq  0\,\!

The t-values can be computed by:

t_{\hat \beta _0}=\frac{{\hat \beta _0}-0}{\varepsilon _{\hat \beta _0}} and t_{\hat \beta _1}=\frac{{\hat \beta _1}-0}{\varepsilon _{\hat \beta _1}}

(21)

With the computed t-value, we can decide whether or not to reject the corresponding null hypothesis. Usually, for a given significance level \alpha\,\!, we can reject H_0 \,\! when |t|>t_{\frac \alpha 2}. Additionally, the p-value is reported with the t-test; we also reject the null hypothesis H_0 \,\! if the p-value is less than \alpha\,\!.

Prob>|t|

The p-value of the t-test above, i.e. the probability of observing a t statistic at least as extreme as the computed one if H_0 \,\! were true.

prob=2(1-tcdf(|t|,df_{Error}))\,\!

(22)

where tcdf(t, df) computes the lower tail probability for the Student's t distribution with df degrees of freedom.
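A sketch of equations (21)-(22); the estimates, standard errors, and degrees of freedom below are hypothetical placeholders (in practice they come from the fit), and scipy.stats.t.cdf plays the role of tcdf:

from scipy import stats

# Hypothetical values taken from a fit: estimates, standard errors, error degrees of freedom
beta0_hat, beta1_hat = 0.10, 1.96
se_beta0, se_beta1 = 0.21, 0.06
df_error = 3                                         # n* - 1

t0 = (beta0_hat - 0.0) / se_beta0                    # equation (21)
t1 = (beta1_hat - 0.0) / se_beta1

p0 = 2.0 * (1.0 - stats.t.cdf(abs(t0), df_error))    # equation (22), Prob > |t|
p1 = 2.0 * (1.0 - stats.t.cdf(abs(t1), df_error))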

LCL and UCL

From the t-value, we can calculate the (1-\alpha )\times 100\% Confidence Interval for each parameter by:

\hat \beta _j-t_{(\frac \alpha 2,n^{*}-k)}\varepsilon _{\hat \beta _j}\leq \beta _j\leq \hat \beta _j+t_{(\frac \alpha 2,n^{*}-k)}\varepsilon _{\hat \beta _j}

(23)

where UCL and LCL are short for the upper confidence limit and lower confidence limit, respectively.

CI Half Width

The Confidence Interval Half Width is:

CI=\frac{UCL-LCL}2

(24)

where UCL and LCL are the upper and lower confidence limits, respectively.
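A sketch of equations (23)-(24) for one parameter, with hypothetical estimate, standard error, and degrees of freedom; scipy.stats.t.ppf supplies the critical t-value:

from scipy import stats

beta1_hat, se_beta1 = 1.96, 0.06        # hypothetical estimate and its standard error
df = 3                                  # n* - k degrees of freedom
alpha = 0.05                            # for a 95% confidence interval

t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
LCL = beta1_hat - t_crit * se_beta1     # equation (23)
UCL = beta1_hat + t_crit * se_beta1
half_width = (UCL - LCL) / 2.0          # equation (24)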

Fit Statistics

Key linear fit statistics are summarized in the Statistics table (numbers in parentheses show how quantities are computed):


Degrees of Freedom

The Error degrees of freedom. Please refer to the ANOVA table for more details.

Residual Sum of Squares

The residual sum of squares, see formula (19).

Reduced Chi-Sqr

See formula (14)

R-Square (COD)

The quality of linear regression can be measured by the coefficient of determination (COD), or R^2, which can be computed as:

R^2=\frac{SXY^2}{SXX\cdot TSS}=1-\frac{RSS}{TSS}

(25)

TSS=\sum(y_i-\bar{y})^2

where TSS is the total sum of squares and RSS is the residual sum of squares. R^2 is a value between 0 and 1. Generally speaking, if it is close to 1, the relationship between X and Y is regarded as very strong, and we can have a high degree of confidence in the regression model.

Adj. R-Square

We can further calculate the adjusted R^2 as

{\bar R}^2=1-\frac{RSS/df_{Error}}{TSS/df_{Total}}

(26)

R Value

The R value is the square root of R^2:

R=\sqrt{R^2}

(27)

Pearson's r

In simple linear regression, the correlation coefficient between x and y, denoted by r, equals:

r=R\,\! if \beta _1\,\! is positive

(28)

r=-R\,\! if \beta _1\,\! is negative

Root-MSE (SD)

Root mean square of the error, or residual standard deviation, which equals:

RootMSE=\sqrt{\frac{RSS}{df_{Error}}}

(29)

Norm of Residuals

Equal to the square root of RSS:

Norm \,of \,Residuals=\sqrt{RSS}

(30)
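A sketch of the statistics in equations (25)-(30) for the unweighted case with an intercept, using hypothetical sample data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

RSS = np.sum((y - y_hat) ** 2)
TSS = np.sum((y - y.mean()) ** 2)
df_error, df_total = n - 2, n - 1

R2 = 1.0 - RSS / TSS                                   # equation (25)
adj_R2 = 1.0 - (RSS / df_error) / (TSS / df_total)     # equation (26)
R = np.sqrt(R2)                                        # equation (27)
pearson_r = np.sign(beta1_hat) * R                     # equation (28)
root_mse = np.sqrt(RSS / df_error)                     # equation (29)
norm_of_residuals = np.sqrt(RSS)                       # equation (30)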

ANOVA Table

The ANOVA table of linear fitting is:

        DF        Sum of Squares         Mean Square               F Value          Prob > F
Model   1         SS_{reg} = TSS - RSS   MS_{reg} = SS_{reg} / 1   MS_{reg} / MSE   p-value
Error   n* - 1    RSS                    MSE = RSS / (n* - 1)
Total   n*        TSS
Note: If intercept is included in the model, n*=n-1. Otherwise, n*=n and the total sum of squares is uncorrected. If the slope is fixed, df_{Model} = 0.

Where the total sum of square, TSS, is:

TSS=\sum_{i=1}^nw_i\left(y_i -\frac{\sum_{i=1}^n w_i y_i} {\sum_{i=1}^n w_i}\right)^2 (corrected)

TSS=\sum_{i=1}^n w_iy_i^2 (uncorrected)

(31)

The F value here is a test of whether the fitting model differs significantly from the model y=constant.

The p-value, or significance level, is reported with an F-test. If the p-value is less than \alpha\,\!, the fitting model differs significantly from the model y=constant.

If the intercept is fixed at a certain value, the p-value for the F-test is not meaningful, and it differs from the p-value in linear regression without the intercept constraint.
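A sketch of the ANOVA quantities and the F-test above for the unweighted case with an intercept (n* = n-1); scipy.stats.f.sf gives Prob > F. The data are hypothetical:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

RSS = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
TSS = np.sum((y - y.mean()) ** 2)              # corrected TSS, equation (31), w_i = 1

df_model, df_error = 1, (n - 1) - 1            # n* = n - 1 with intercept included
SS_reg = TSS - RSS
MS_reg = SS_reg / df_model
MSE = RSS / df_error

F = MS_reg / MSE
prob_F = stats.f.sf(F, df_model, df_error)     # Prob > F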

Lack of fit table

To run the lack-of-fit test, you need repeated observations ("replicate data"), so that at least one of the X values is repeated within the dataset, or within multiple datasets when the concatenate fit mode is selected.

Notation used for a fit with replicate data:

y_{ij} is the jth measurement made at the ith x-value in the data set
\bar{y}_{i} is the average of all of the y values at the ith x-value
\hat{y}_{ij} is the predicted response for the jth measurement made at the ith x-value

The sums of squares in the table below are expressed by:

RSS=\sum_{i}\sum_{j}(y_{ij}-\hat{y}_{ij})^2
LFSS=\sum_{i}\sum_{j}(\bar{y}_{i}-\hat{y}_{ij})^2
PESS=\sum_{i}\sum_{j}(y_{ij}-\bar{y}_{i})^2

The Lack of fit table of linear fitting is:

              DF       Sum of Squares   Mean Square             F Value       Prob > F
Lack of Fit   c - 2    LFSS             MSLF = LFSS / (c - 2)   MSLF / MSPE   p-value
Pure Error    n - c    PESS             MSPE = PESS / (n - c)
Error         n* - 1   RSS
Note:

If intercept is included in the model, n*=n-1. Otherwise, n*=n and the total sum of squares is uncorrected. If the slope is fixed, df_{Model} = 0.

c denotes the number of distinct x values. If intercept is fixed, DF for Lack of Fit is c-1.
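A sketch of the lack-of-fit decomposition and F-test, assuming an intercept in the model and the hypothetical replicate data shown; scipy.stats.f.sf gives Prob > F:

import numpy as np
from scipy import stats

# Hypothetical replicate data: repeated y measurements at some x values
x = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 4.0])
y = np.array([2.0, 2.2, 4.1, 5.9, 6.1, 8.2])

n = len(x)
c = len(np.unique(x))                              # number of distinct x values

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x
group_mean = np.array([y[x == xi].mean() for xi in x])   # \bar{y}_i for each observation

RSS = np.sum((y - y_hat) ** 2)
PESS = np.sum((y - group_mean) ** 2)               # pure error sum of squares
LFSS = np.sum((group_mean - y_hat) ** 2)           # lack-of-fit sum of squares

MSLF = LFSS / (c - 2)
MSPE = PESS / (n - c)
F = MSLF / MSPE
prob_F = stats.f.sf(F, c - 2, n - c)               # Prob > F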

Covariance and Correlation Matrix

The Covariance matrix of linear regression is calculated by:


\begin{pmatrix}
Cov(\beta _0,\beta _0) & Cov(\beta _0,\beta _1)\\
Cov(\beta _1,\beta _0) & Cov(\beta _1,\beta _1)
\end{pmatrix}=\sigma ^2\frac 1{SXX}\begin{pmatrix} \sum \frac{x_i^2}n & -\bar x \\-\bar x & 1 \end{pmatrix}

(32)

The correlation between any two parameters is:


\rho (\beta _i,\beta _j)=\frac{Cov(\beta _i,\beta _j)}{\sqrt{Cov(\beta _i,\beta _i)}\sqrt{Cov(\beta _j,\beta _j)}}

(33)
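A sketch of equations (32)-(33) for the unweighted case, with \sigma^2 estimated as RSS/(n-2); the data are hypothetical:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
x_bar = x.mean()
SXX = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / SXX
beta0_hat = y.mean() - beta1_hat * x_bar

sigma2 = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2) / (n - 2)   # RSS / (n* - 1)

cov = sigma2 / SXX * np.array([[np.sum(x ** 2) / n, -x_bar],        # equation (32)
                               [-x_bar, 1.0]])

d = np.sqrt(np.diag(cov))
corr = cov / np.outer(d, d)                                         # equation (33)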

Outliers

Outliers are points whose absolute values in the Studentized Residual plot are larger than 2:

abs(Studentized Residual)>2

Studentized Residual is introduced in Detecting outliers by transforming residuals.

Residual Analysis

r_i stands for the Regular Residual res_i.

Standardized

r_i^{\prime }=\frac{r_i}{s_\varepsilon }

(34)

Studentized

Also known as internally studentized residual.

r_i^{\prime }=\frac{r_i}{s_\varepsilon\sqrt{1-h_i}}

(35)

Studentized deleted

Also known as externally studentized residual.

r_i^{\prime }=\frac{r_i}{s_{\varepsilon-i}\sqrt{1-h_i}}

(36)

In the equations for the Studentized and Studentized deleted residuals, h_i is the ith diagonal element of the matrix P:

P=X(X'X)^{-1}X^{\prime }

(37)

s_{\varepsilon-i} means the residual standard deviation calculated from all points except the ith.
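A sketch of equations (34)-(37): the hat matrix P gives the leverages h_i, and the leave-one-out standard deviation s_{\varepsilon-i} is obtained from the standard identity s_{\varepsilon-i}^2 = (RSS - r_i^2/(1-h_i))/(n-k-1). The data and the two-parameter design matrix are hypothetical:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n, k = len(x), 2                                   # k parameters: intercept and slope
X = np.column_stack([np.ones_like(x), x])

P = X @ np.linalg.inv(X.T @ X) @ X.T               # equation (37)
h = np.diag(P)                                     # leverages h_i

beta = np.linalg.solve(X.T @ X, X.T @ y)
r = y - X @ beta                                   # regular residuals r_i
RSS = np.sum(r ** 2)
s_eps = np.sqrt(RSS / (n - k))

standardized = r / s_eps                           # equation (34)
studentized = r / (s_eps * np.sqrt(1.0 - h))       # equation (35)

s_eps_i = np.sqrt((RSS - r ** 2 / (1.0 - h)) / (n - k - 1))   # leave-one-out identity
studentized_deleted = r / (s_eps_i * np.sqrt(1.0 - h))        # equation (36)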

Confidence and Prediction Bands

For a particular value x_p\,\!, the 100(1-\alpha )\% confidence interval for the mean value of y\,\! at x=x_p\,\! is:

\hat y\pm t_{(\frac \alpha 2,n^{*}-1)}s_\varepsilon \sqrt{\frac 1n+\frac{(x_p-\bar x)^2}{SXX}}

(38)

And the 100(1-\alpha )\% prediction interval for a new observation of y\,\! at x=x_p\,\! is:

\hat y\pm t_{(\frac \alpha 2,n^{*}-1)}s_\varepsilon \sqrt{1+\frac 1n+\frac{(x_p-\bar x)^2}{SXX}}

(39)
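A sketch of equations (38)-(39) evaluated at a single point x_p, for the unweighted case with an intercept; the data, x_p, and \alpha are hypothetical:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
x_p, alpha = 2.5, 0.05                              # evaluation point and significance level

n = len(x)
x_bar = x.mean()
SXX = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / SXX
beta0_hat = y.mean() - beta1_hat * x_bar

s_eps = np.sqrt(np.sum((y - (beta0_hat + beta1_hat * x)) ** 2) / (n - 2))
t_crit = stats.t.ppf(1.0 - alpha / 2.0, n - 2)
y_p = beta0_hat + beta1_hat * x_p

conf_half = t_crit * s_eps * np.sqrt(1.0 / n + (x_p - x_bar) ** 2 / SXX)        # equation (38)
pred_half = t_crit * s_eps * np.sqrt(1.0 + 1.0 / n + (x_p - x_bar) ** 2 / SXX)  # equation (39)
# Confidence band: y_p +/- conf_half; prediction band: y_p +/- pred_half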

Confidence Ellipses

Assuming the pair of variables (X, Y) conforms to a bivariate normal distribution, we can examine the correlation between the two variables using a confidence ellipse. The confidence ellipse is centered at (\bar x,\bar y ), and the major semiaxis a and minor semiaxis b can be expressed as follows:

a=c\sqrt{\frac{\sigma _x^2+\sigma _y^2+\sqrt{(\sigma _x^2-\sigma _y^2)^2+4r^2\sigma _x^2\sigma _y^2}}2}
b=c\sqrt{\frac{\sigma _x^2+\sigma _y^2-\sqrt{(\sigma _x^2-\sigma _y^2)^2+4r^2\sigma _x^2\sigma _y^2}}2}

(40)

For a given confidence level of  (1-\alpha )\,\! :

  • The confidence ellipse for the population mean is defined as:
 c=\sqrt{\frac{2(n-1)}{n(n-2)}(\alpha ^{\frac 2{2-n}}-1)}

(41)

  • The confidence ellipse for prediction is defined as:
 c=\sqrt{\frac{2(n+1)(n-1)}{n(n-2)}(\alpha ^{\frac 2{2-n}}-1)}

(42)

  • The inclination angle of the ellipse is defined as:
\beta =\frac 12\arctan \frac{2r\sqrt{\sigma _x^2\sigma _y^2}}{\sigma _x^2-\sigma _y^2}

(43)
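A sketch of equations (40)-(43) for the confidence ellipse of the population mean (equation (41)), using hypothetical data; np.arctan2 is used in place of arctan so the inclination is defined even when \sigma_x^2 = \sigma_y^2:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
alpha = 0.05                                       # 1 - confidence level

n = len(x)
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)            # sample variances
r = np.corrcoef(x, y)[0, 1]                        # Pearson's r

# Scale factor for the ellipse of the population mean, equation (41)
c = np.sqrt(2.0 * (n - 1) / (n * (n - 2)) * (alpha ** (2.0 / (2.0 - n)) - 1.0))

disc = np.sqrt((sx2 - sy2) ** 2 + 4.0 * r ** 2 * sx2 * sy2)
a = c * np.sqrt((sx2 + sy2 + disc) / 2.0)          # major semiaxis, equation (40)
b = c * np.sqrt((sx2 + sy2 - disc) / 2.0)          # minor semiaxis

# Inclination angle, equation (43)
beta_angle = 0.5 * np.arctan2(2.0 * r * np.sqrt(sx2 * sy2), sx2 - sy2)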

Finding Y/X from X/Y

Residual Plots

Residual Type

Select one residual type for the plots: Regular, Standardized, Studentized, or Studentized Deleted.

Residual vs. Independent

Scatter plot of the residuals res vs. the independent variables x_1,x_2,\dots,x_k; each plot is shown in a separate graph.

Residual vs. Predicted Value

Scatter plot of the residuals res vs. the fitted values \hat{y_i}.

Residual vs. Order of the Data

res_i vs. sequence number i

Histogram of the Residual

The Histogram plot of the Residual

Residual Lag Plot

Residuals res_i vs. lagged residuals res_{i-1}.

Normal Probability Plot of Residuals

A normal probability plot of the residuals can be used to check whether the error terms are normally distributed. If the resulting plot is approximately linear, we proceed under the assumption that the error terms are normally distributed. The plot is based on the percentiles versus the ordered residuals, where the percentiles are estimated by

\frac{(i-\frac{3}{8})}{(n+\frac{1}{4})}

where n is the total number of data points and i is the index of the ith point. Also refer to Probability Plot and Q-Q Plot.
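A sketch of the plotting positions above, paired with normal quantiles via scipy.stats.norm.ppf; the residuals are hypothetical:

import numpy as np
from scipy import stats

res = np.array([0.12, -0.31, 0.05, 0.22, -0.08])    # hypothetical residuals

n = len(res)
ordered = np.sort(res)                              # ordered residuals
i = np.arange(1, n + 1)
percentiles = (i - 3.0 / 8.0) / (n + 1.0 / 4.0)     # estimated percentiles

theoretical = stats.norm.ppf(percentiles)           # normal quantiles for the plot axis
# Plotting `ordered` against `theoretical`: an approximately straight line supports normality.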