15.2.8 Algorithms (Polynomial Regression)



The Polynomial Model

For a given dataset (x_i , y_i), i = 1,2, ..., n, where x is the independent variable and y is the dependent variable, a polynomial regression fits data to a model of the following form:

 y_i=\beta _0+\beta _1x_i+\beta_2x_i^2+\beta_3x_i^3+...+\beta_kx_i^k+\varepsilon_i

(1)

where k is the polynomial order. In Origin, k is a positive integer less than 10. Parameters are estimated using a weighted least-squares method, which minimizes the sum of the squared deviations between the theoretical curve and the experimental points over the range of the independent variable. After fitting, the model can be evaluated using hypothesis tests and by plotting residuals.

\beta _0\,\! is the y-intercept and the parameters \beta _1\,\!, \beta _2\,\!, ... , \beta _k\,\! are called the "partial coefficients" (or "partial slopes"). The model can be written in matrix form:

Y=XB+E\,\!

(2)

and

Y=\begin{bmatrix}
y_1\\
y_2\\
\vdots \\
y_n
\end{bmatrix}
, X=\begin{bmatrix}
1 & x_{1} & x_{1}^2 & \cdots & x_{1}^k\\
1 & x_{2} & x_{2}^2 & \cdots & x_{2}^k\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & x_{n} & x_{n}^2 & \cdots & x_{n}^k
\end{bmatrix}
, B=\begin{bmatrix}
\beta_0 \\
\beta_1 \\
\vdots \\
\beta_k
\end{bmatrix}
, E=\begin{bmatrix}
\varepsilon _1\\
\varepsilon _2\\
\vdots \\
\varepsilon _n
\end{bmatrix}

Assume that the \varepsilon_i are independent and identically distributed normal random variables with \bar{E}=0 and Var[E]=\sigma^2. To minimize \left \|E\right \| with respect to B, we solve the equation:

\frac{\partial E'E}{\partial B}=0

(3)

The result \hat B is the least-squares estimate of the vector B, and it is the solution of the linear equations, which can be expressed as:

\hat B=\begin{bmatrix}
\hat \beta_0 \\
\hat \beta_1 \\
\vdots \\
\hat \beta_k
\end{bmatrix}=(X'X)^{-1}X^{\prime }Y

(4)

where X' is the transpose of X. The predicted value of Y for a given X is:

\hat{Y}=X\hat{B}

(5)

Substituting (4) into (5), we can define the matrix P:

\hat{Y}=[X(X'X)^{-1}X']Y=PY

(6)

The residuals are defined as:

res_i=y_i-\hat{y}_i

(7)

and the residual sum of squares can be written as:

RSS=\left \|E \right \|^2={Y}'Y-\hat{B}'X'X\hat{B}

(8)
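As a concrete illustration of equations (4), (6), (7) and (8), the following minimal numpy sketch builds the design matrix, computes the least-squares estimate, the fitted values and the residual sum of squares; the dataset and the order k are illustrative assumptions, not values produced by Origin.

import numpy as np

# Illustrative data (not from Origin); k is the polynomial order
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 0.9, 2.8, 7.2, 13.1, 21.0])
k = 2

# Design matrix X with columns 1, x, x^2, ..., x^k
X = np.vander(x, k + 1, increasing=True)

# Equation (4): B_hat = (X'X)^{-1} X'Y, solved via lstsq for numerical stability
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Equation (6): fitted values; equations (7)-(8): residuals and RSS
y_hat = X @ B_hat
res = y - y_hat
RSS = np.sum(res**2)
print(B_hat, RSS)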

Note: It is worth noting that the higher order terms in the polynomial equation have the greatest effect on the dependent variable. Consequently, models with high order terms (higher than 4) are extremely sensitive to the precision of the coefficient values, where small differences in the coefficient values can result in large differences in the computed y value. We mention this because, by default, the polynomial fitting results are rounded to 5 decimal places. If you manually plug these reported worksheet values back into the fitted curve, the slight loss of precision that occurs in rounding will have a marked effect on the higher order terms, possibly leading you to conclude, wrongly, that your model is faulty. If you wish to perform manual calculations using your best-fit parameter estimates, make sure that you use full-precision values, not rounded values. Note that while Origin may round reported values to 5 decimal places (or other), these values are only for display purposes. Origin always uses full precision (double(8)) in mathematical calculations unless you have specified otherwise. For more information, see Numbers in Origin in the Origin Help file.

Generally speaking, any continuous function can be fitted to a higher order polynomial model. However, higher order terms may not have much practical significance.

Fit Control

Errors as Weight

In the above section, we assumed that the errors have constant variance. However, when fitting experimental data, we may need to take the instrumental error (which reflects the accuracy and precision of a measuring instrument) into account in the fitting process. In that case, the assumption of constant variance in the errors is violated. We instead assume the \varepsilon_i to be normally distributed with nonconstant variance, and the instrumental errors, which act as estimates of \sigma_i, can be used as weights in the fit. The weight matrix is defined as:

W=\begin{bmatrix}
w_1 & 0 & \dots & 0 \\
0 & w_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & w_n
\end{bmatrix}

The quantity minimized in the fit then becomes:

\sum_{i=1}^n w_i (y_i-\hat y_i)^2=\sum_{i=1}^n w_i [y_i-(\hat{\beta _0}+\hat{\beta _1}x_i+\cdots+\hat{\beta _k}x_i^k)]^2

(9)

The weight factors w_i can be determined in one of three ways:

No Weighting

The error bars are not used as weights in the calculation.

Direct Weighting

w_i=\sigma_i

(10)

Instrumental

For instrumental weighting, the weight is inversely proportional to the square of the instrumental error, so a measurement with a small error receives a large weight, because it is more precise than measurements with larger errors.

w_i=\frac 1{\sigma_i^2}

(11)

Note: the errors used as weights should be designated as a "YError" column in the worksheet.
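The following sketch shows how instrumental weights (equation (11)) enter the weighted normal equations; the data, the sigma values standing in for the YError column, and the order k are illustrative assumptions.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.9, 9.2, 19.1, 32.8])
sigma = np.array([0.1, 0.2, 0.3, 0.3, 0.5])   # instrumental errors (YError column)
k = 2

X = np.vander(x, k + 1, increasing=True)
W = np.diag(1.0 / sigma**2)                   # instrumental weights, equation (11)

# Weighted least-squares estimate: B_hat = (X'WX)^{-1} X'WY
B_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Weighted residual sum of squares, equation (9)
wrss = np.sum((1.0 / sigma**2) * (y - X @ B_hat)**2)
print(B_hat, wrss)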

Fix Intercept (at)

Fix Intercept sets the y-intercept \beta_0 to a fixed value; meanwhile, the total degrees of freedom becomes n*=n-1 due to the fixed intercept.

Scale Error with sqrt(Reduced Chi-Sqr)

Scale Error with sqrt(Reduced Chi-Sqr) is available when fitting with weights. This option affects only the errors reported for the parameters; it does not affect the fitting process or the data in any way. By default, it is checked, and \sigma^2, the variance of E, is taken into account when calculating the errors on the parameters; otherwise, the variance of E is not used in the error calculation. Take the Covariance Matrix as an example:

Scale Error with sqrt(Reduced Chi-Sqr)

Cov(\beta _i,\beta _j)=\sigma^2 (X^{\prime }X)^{-1}

(12)

\sigma^2=\frac{RSS}{n^{*}-k}

Do not Scale Error with sqrt(Reduced Chi-Sqr)

Cov(\beta _i,\beta _j)=(X'X)^{-1}\,\!

(13)

For weighted fitting, (X'WX)^{-1}\,\! is used instead of (X'X)^{-1}\,\!.
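A small sketch of the difference between (12) and (13): the same (X'X)^{-1} matrix is either scaled by the reduced chi-square or left unscaled. The data are illustrative and the fit is unweighted for brevity.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.8, 2.1, 9.3, 18.6, 33.2, 51.0])
k = 2
X = np.vander(x, k + 1, increasing=True)
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

RSS = np.sum((y - X @ B_hat)**2)
n_star = len(y) - 1                     # intercept included
sigma2 = RSS / (n_star - k)             # reduced chi-square

XtX_inv = np.linalg.inv(X.T @ X)
cov_scaled = sigma2 * XtX_inv           # equation (12): option checked
cov_unscaled = XtX_inv                  # equation (13): option unchecked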

Fitting Results

Fit Parameters


The Fitted Values

The fitted values of the parameters are given by formula (4).

The Parameter Standard Errors

For each parameter, the standard error can be obtained by:

s_{\hat \beta _j}=s_\varepsilon \sqrt{C_{jj}}

(14)

where C_{jj} is the jth diagonal element of (X'X)^{-1} (note that (X'WX)^{-1} is used for weighted fitting). The residual standard deviation s_\varepsilon (also called "std dev", "standard error of estimate", or "root MSE") is computed as:

s_\varepsilon =\sqrt{\frac{RSS}{df_{Error}}}=\sqrt{\frac{RSS}{n^{*}-k}}

(15)

s_\varepsilon^2 is an estimate of \sigma ^2, the variance of E.

Note: Please see the ANOVA Table for more details about the degrees of freedom (df), df_{Error}.
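Equations (14) and (15) can be sketched as follows; the dataset is illustrative, and the intercept is assumed to be estimated (n* = n - 1).

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 0.9, 2.8, 7.2, 13.1, 21.0])
k = 2
X = np.vander(x, k + 1, increasing=True)
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

RSS = np.sum((y - X @ B_hat)**2)
df_error = (len(y) - 1) - k                     # n* - k
s_eps = np.sqrt(RSS / df_error)                 # equation (15)

C = np.linalg.inv(X.T @ X)
se = s_eps * np.sqrt(np.diag(C))                # equation (14), one value per parameter
print(se)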

t-Value and Confidence Level

If the regression assumptions hold, we can perform t-tests for the regression coefficients with the null hypothesis and the alternative hypothesis:

H_0:\beta _j=0\,\!

H_\alpha :\beta _j\neq 0

The t-values can be computed as:

t=\frac{\hat \beta _j-0}{s_{\hat \beta _j}}

(16)

With the t-value, we can decide whether or not to reject the corresponding null hypothesis. Usually, for a given Confidence Level for Parameters, \alpha\,\!, we can reject H_0\,\! when |t|>t_{\frac \alpha 2} or, equivalently, when the p-value is less than \alpha\,\!.

Prob>|t|

The p-value for the t-test, i.e., the probability of obtaining a |t| value at least as large as the observed one if H_0 were true:

prob=2(1-tcdf(|t|,df_{Error}))\,\!

(17)

where tcdf(|t|,df_{Error}) computes the cumulative distribution function of Student's t distribution at the value |t|, with the error degrees of freedom df_{Error}.
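A short sketch of (16) and (17) using scipy's Student's t distribution; the parameter estimates, standard errors and df_{Error} below are placeholder numbers, not fitted values.

import numpy as np
from scipy import stats

B_hat = np.array([1.02, -0.15, 0.83])                # illustrative estimates
se = np.array([0.40, 0.35, 0.07])                    # illustrative standard errors
df_error = 3

t = B_hat / se                                       # equation (16)
prob = 2 * (1 - stats.t.cdf(np.abs(t), df_error))    # equation (17), Prob>|t|
print(t, prob)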

LCL and UCL

From the t-value, we can calculate the (1-\alpha )\times 100\% Confidence Interval for each parameter by:

\hat \beta _j-t_{(\frac \alpha 2,n^{*}-k)}s_{\hat \beta _j}\leq \beta _j\leq \hat \beta _j+t_{(\frac \alpha 2,n^{*}-k)}s_{\hat \beta _j}

(18)

where UCL and LCL are short for the upper confidence limit and lower confidence limit, respectively.

CI Half Width

The Confidence Interval Half Width is:

CI=\frac{UCL-LCL}2

(19)
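The confidence limits (18) and the half width (19) follow directly from the t critical value; this sketch reuses the illustrative estimates from the previous example and assumes alpha = 0.05.

import numpy as np
from scipy import stats

B_hat = np.array([1.02, -0.15, 0.83])
se = np.array([0.40, 0.35, 0.07])
df_error = 3
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df_error)
LCL = B_hat - t_crit * se                       # lower confidence limit, equation (18)
UCL = B_hat + t_crit * se                       # upper confidence limit, equation (18)
half_width = (UCL - LCL) / 2                    # equation (19)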

Fit Statistics

Some fit statistics formulas are summarized here:


Degree of Freedom

The degrees of freedom of the error variation. Please refer to the ANOVA table for more details.

Reduced Chi-Sqr

\sigma^2=\frac{RSS}{n^{*}-k}

(20)

Residual Sum of Squares

The residual sum of squares, see formula (8).

R-Square (COD)

The goodness of fit can be evaluated by Coefficient of Determination (COD), R^2, which is given by:

R^2=\frac{Explained\, variation}{Total \, variation}=1-\frac{RSS}{TSS}

(21)

Adj. R-Square

The adjusted R^2 adjusts the R^2 value for the degrees of freedom. It can be computed as:

\bar R^2=1-\frac{RSS/df_{Error}}{TSS/df_{Total}}

(22)

R Value

Then we can compute the R-value, which is simply the square root of R^2:

R=\sqrt{R^2}

(23)

Root-MSE (SD)

Root mean square of the error, or residual standard deviation, which equals:

RootMSE=\sqrt{\frac{RSS}{n^*-k}}

(24)

Norm of Residuals

Equal to the square root of RSS:

Norm \,of \,Residuals=\sqrt{RSS}

(25)
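The fit statistics (20) through (25) can be reproduced from RSS and TSS as in this sketch; the data are illustrative and an unweighted fit with an estimated intercept is assumed.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 0.9, 2.8, 7.2, 13.1, 21.0])
k = 2
X = np.vander(x, k + 1, increasing=True)
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

RSS = np.sum((y - X @ B_hat)**2)
TSS = np.sum((y - y.mean())**2)                 # corrected total sum of squares
n_star = len(y) - 1
df_error, df_total = n_star - k, n_star

reduced_chi_sq = RSS / df_error                            # (20)
r_squared = 1 - RSS / TSS                                  # (21)
adj_r_squared = 1 - (RSS / df_error) / (TSS / df_total)    # (22)
r_value = np.sqrt(r_squared)                               # (23)
root_mse = np.sqrt(RSS / df_error)                         # (24)
norm_of_residuals = np.sqrt(RSS)                           # (25)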

ANOVA Table

The ANOVA table of polynomial fitting is:

              df        Sum of Squares          Mean Square                F Value          Prob > F
Model         k         SS_{reg} = TSS - RSS    MS_{reg} = SS_{reg} / k    MS_{reg} / MSE   p-value
Error         n* - k    RSS                     MSE = RSS / (n* - k)
Total         n*        TSS
Note: If intercept is included in the model, n*=n-1. Otherwise, n*=n and the total sum of squares is uncorrected.

Where the total sum of squares, TSS, is:

TSS=\sum_{i=1}^nw_i(y_i -\frac{\sum_{i=1}^n w_i y_i} {\sum_{i=1}^n w_i})^2 (corrected)

(26)

TSS=\sum_{i=1}^n w_iy_i^2 (uncorrected)

(27)

The F value here is a test of whether the fitting model differs significantly from the model y=constant.

Additionally, the p-value, or significance level, is reported with an F-test. We can reject the null hypothesis if the p-value is less than \alpha\,\!, which means that the fitting model differs significantly from the model y=constant.

If the intercept is fixed at a certain value, the p-value of the F-test is not meaningful, and it differs from that of a polynomial regression without the intercept constraint.
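The overall F value and Prob > F from the ANOVA table can be sketched as below; RSS, TSS, k and n* are placeholder values standing in for the quantities defined above.

from scipy import stats

RSS, TSS, k, n_star = 2.1, 310.0, 2, 5          # illustrative values

SS_reg = TSS - RSS
MS_reg = SS_reg / k
MSE = RSS / (n_star - k)
F = MS_reg / MSE                                # F value
prob_F = stats.f.sf(F, k, n_star - k)           # Prob > F (upper tail of F distribution)
print(F, prob_F)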

Lack of fit table

To run the lack-of-fit test, you need to have repeated observations, namely "replicate data", so that at least one of the X values is repeated within the dataset, or within multiple datasets when the concatenate fit mode is selected.

Notations used for fit with replicates data:

y_{ij} is the jth measurement made at the ith x-value in the dataset
\bar{y}_{i} is the average of all of the y values at the ith x-value
\hat{y}_{ij} is the predicted response for the jth measurement made at the ith x-value

The sums of squares in the table below are expressed as:

RSS=\sum_{i}\sum_{j}(y_{ij}-\hat{y}_{ij})^2
LFSS=\sum_{i}\sum_{j}(\bar{y}_{i}-\hat{y}_{ij})^2
PESS=\sum_{i}\sum_{j}(y_{ij}-\bar{y}_{i})^2

The lack-of-fit table of polynomial fitting is:

               DF           Sum of Squares   Mean Square                  F Value        Prob > F
Lack of Fit    c - k - 1    LFSS             MSLF = LFSS / (c - k - 1)    MSLF / MSPE    p-value
Pure Error     n - c        PESS             MSPE = PESS / (n - c)
Error          n* - k       RSS
Note:

If intercept is included in the model, n*=n-1. Otherwise, n*=n and the total sum of squares is uncorrected. If the slope is fixed, df_{Model} = 0.

c denotes the number of distinct x values. If the intercept is fixed, the DF for Lack of Fit is c-k.
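With replicate data, the lack-of-fit and pure-error sums of squares can be computed directly from their definitions, as in this sketch; the replicate dataset and order k are illustrative.

import numpy as np

x = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 3.0, 3.0, 4.0])   # repeated x values
y = np.array([1.0, 1.2, 2.9, 3.1, 9.2, 19.0, 19.4, 33.1])
k = 2
X = np.vander(x, k + 1, increasing=True)
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ B_hat

y_bar = np.array([y[x == xi].mean() for xi in x])   # group mean at each x value
RSS = np.sum((y - y_hat)**2)
PESS = np.sum((y - y_bar)**2)                       # pure-error sum of squares
LFSS = np.sum((y_bar - y_hat)**2)                   # lack-of-fit sum of squares
c = len(np.unique(x))                               # number of distinct x values
print(RSS, LFSS + PESS, c)                          # RSS = LFSS + PESS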

Covariance and Correlation Matrix

The covariance matrix for the multiple linear regression can be calculated as

Cov(\beta _i,\beta _j)=\sigma ^2(X^{\prime }X)^{-1}

(28)

In particular, the covariance matrix for simple linear regression (k = 1) can be calculated as


\begin{pmatrix}
Cov(\beta _0,\beta _0) & Cov(\beta _0,\beta _1)\\
Cov(\beta _1,\beta _0) & Cov(\beta _1,\beta _1)
\end{pmatrix}=\sigma ^2\frac 1{SXX}\begin{pmatrix} \sum \frac{x_i^2}n & -\bar x \\-\bar x & 1 \end{pmatrix}

(29)

The correlation between any two parameters is:

\rho (\beta _i,\beta _j)=\frac{Cov(\beta _i,\beta _j)}{\sqrt{Cov(\beta _i,\beta _i)}\sqrt{Cov(\beta _j,\beta _j)}}

(30)
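The covariance matrix (28) and correlation matrix (30) of the parameter estimates can be obtained as in this sketch; it continues the illustrative unweighted example used above.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 0.9, 2.8, 7.2, 13.1, 21.0])
k = 2
X = np.vander(x, k + 1, increasing=True)
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

RSS = np.sum((y - X @ B_hat)**2)
sigma2 = RSS / ((len(y) - 1) - k)
cov = sigma2 * np.linalg.inv(X.T @ X)           # equation (28)

d = np.sqrt(np.diag(cov))
corr = cov / np.outer(d, d)                     # equation (30), element-wise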

Confidence and Prediction Bands

Confidence Band

The confidence interval for the fitting function describes how good your estimate of the value of the fitting function is at particular values of the independent variables. You can claim with 100\alpha\% confidence that the correct value for the fitting function lies within the confidence interval, where \alpha is the desired level of confidence. This confidence interval for the fitting function is computed as:

\hat{Y}\pm t_{(\alpha/2,dof)}{\left [\chi^2\mathbf{x}\mathbf{C}\mathbf{x}' \right ]}^{\frac{1}{2}}

(31)

where

\mathbf{x}=\left [ \frac{\partial y}{\partial \beta_0},\frac{\partial y}{\partial \beta_1},\frac{\partial y}{\partial \beta_2},...,\frac{\partial y}{\partial \beta_k} \right ]

(32)

and C is the Covariance Matrix.

Prediction Band

The prediction interval for the desired confidence level \alpha is the interval within which 100\alpha\% of all the experimental points in a series of repeated measurements are expected to fall at particular values of the independent variables. This prediction interval for the fitting function is computed as:

\hat{Y}\pm t_{(\alpha/2,dof)}{\left [\chi^2(1+\mathbf{x}\mathbf{C}\mathbf{x}') \right ]}^{\frac{1}{2}}

(33)
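The sketch below evaluates pointwise confidence and prediction bands under the reading that \chi^2 is the reduced chi-square \sigma^2 and C is the unscaled (X'X)^{-1}, so that \chi^2\mathbf{x}\mathbf{C}\mathbf{x}' reduces to \sigma^2\mathbf{x}(X'X)^{-1}\mathbf{x}'; that interpretation, the data, and the significance level are assumptions for illustration.

import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 0.9, 2.8, 7.2, 13.1, 21.0])
k, alpha = 2, 0.05
X = np.vander(x, k + 1, increasing=True)
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

dof = (len(y) - 1) - k
sigma2 = np.sum((y - X @ B_hat)**2) / dof       # reduced chi-square
C = np.linalg.inv(X.T @ X)                      # assumed unscaled covariance factor
t_crit = stats.t.ppf(1 - alpha / 2, dof)

xg = np.linspace(x.min(), x.max(), 50)
Xg = np.vander(xg, k + 1, increasing=True)      # rows are [1, x, x^2, ..., x^k]
y_fit = Xg @ B_hat
var_fit = sigma2 * np.einsum('ij,jk,ik->i', Xg, C, Xg)   # sigma^2 * x C x' per point

conf_half = t_crit * np.sqrt(var_fit)            # half width of the confidence band (31)
pred_half = t_crit * np.sqrt(sigma2 + var_fit)   # half width of the prediction band (33)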

Finding Y/X from X/Y

Residual Plots

Residual Type

Select one residual type among Regular, Standardized, Studentized, and Studentized Deleted for the plots.

Residual vs. Independent

Scatter plots of the residuals res vs. the independent variables x_1,x_2,\dots,x_k; each plot is located in a separate graph.

Residual vs. Predicted Value

Scatter plot of the residuals res vs. the fitted values \hat{y}_i

Residual vs. Order of the Data

res_i vs. sequence number i

Histogram of the Residual

Histogram of the residuals res_i

Residual Lag Plot

Residuals res_i vs. lagged residuals res_{i-1}.

Normal Probability Plot of Residuals

A normal probability plot of the residuals can be used to check whether the error terms are normally distributed. If the resulting plot is approximately linear, we proceed on the assumption that the error terms are normally distributed. The plot is based on the percentiles versus the ordered residuals; the percentiles are estimated by

\frac{(i-\frac{3}{8})}{(n+\frac{1}{4})}

where n is the total number of dataset and i is the ith data. Also refer to Probability Plot and Q-Q Plot