2.4.8 fitpoly

Brief Information

Polynomial regression

Additional Information

Minimum Origin Version Required for all features: Origin 9.0

Command Line Usage

1. fitpoly iy:=(1,2) polyorder:=2 fixint:=0 intercept:=0 coef:=3 oy:=(4,5);
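For instance, a hypothetical variant of the usage above that fixes the intercept at zero and also captures the number of fitted points and the adjusted R^2 (the receiving variable names nn and adjr are illustrative):

2. fitpoly iy:=(1,2) polyorder:=3 fixint:=1 intercept:=0 coef:=3 oy:=(4,5) N:=nn AdjRSq:=adjr;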

X-Function Execution Options

Please refer to the X-Function Execution Options page for additional option switches when accessing the X-Function from script.

Variables

Input
  Variable Name: iy
  I/O and Type:  Input, XYRange
  Default Value: <active>
  Description:   Specifies the input data range.

Polynomial Order
  Variable Name: polyorder
  I/O and Type:  Input, int
  Default Value: 2
  Description:   Specifies the order of the polynomial to be fitted.

Fix Intercept
  Variable Name: fixint
  I/O and Type:  Input, int
  Default Value: 0
  Description:   A value of 1 (checked in the dialog) indicates a fixed intercept.

Fix Intercept At
  Variable Name: intercept
  I/O and Type:  Input, double
  Default Value: 0
  Description:   Specifies the value of the fixed intercept. If fixint is 0, this value is ignored.

Polynomial Coefficients
  Variable Name: coef
  I/O and Type:  Output, vector
  Default Value: <optional>
  Description:   Specifies the column or dataset variable to receive the polynomial coefficients, e.g. coef:=3, which outputs the polynomial coefficients to column 3.

Output
  Variable Name: oy
  I/O and Type:  Output, XYRange
  Default Value: <optional>
  Description:   Specifies the output range to receive the polynomial fit curve.

Number of Points
  Variable Name: N
  I/O and Type:  Output, int
  Default Value: <unassigned>
  Description:   Specifies the variable to receive the number of points in the fit.

Adjusted residual sum of squares
  Variable Name: AdjRSq
  I/O and Type:  Output, double
  Default Value: <unassigned>
  Description:   Specifies the variable to receive the adjusted coefficient of determination (adjusted R^2).

Coefficient of determination (R^2)
  Variable Name: RSqCOD
  I/O and Type:  Output, double
  Default Value: <unassigned>
  Description:   Specifies the variable to receive the coefficient of determination (R^2).

Polynomial Coefficients Errors
  Variable Name: err
  I/O and Type:  Output, vector
  Default Value: <optional>
  Description:   Specifies the column or dataset variable to receive the standard errors of the polynomial coefficients.
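The sketch below shows how these input and output variables fit together in a single LabTalk call. It is a minimal, illustrative sketch only: it assumes the active worksheet holds XY data in columns 1 and 2, and the variable names cc, ee, adjr, rsq and nn are arbitrary.

// Declare receiving variables, then fit a quadratic
dataset cc, ee;    // coefficients and their standard errors
double adjr, rsq;
int nn;
fitpoly iy:=(1,2) polyorder:=2 coef:=cc err:=ee AdjRSq:=adjr RSqCOD:=rsq N:=nn;
// Report the fit statistics and each coefficient with its standard error
type "N = $(nn), R^2 = $(rsq, S*6), Adj. R^2 = $(adjr, S*6)";
loop(jj,1,3) { type "b$(jj-1) = $(cc[jj], S*6) +/- $(ee[jj], S*6)"; }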

Description

Polynomial regression fits a given data set to the following model:

y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \epsilon

where \beta_i are the coefficients and \epsilon is the error term. The error term represents the unexpected or unexplained variation in the dependent variable. It is assumed that the mean of the random variable \epsilon is equal to zero.

Parameters are estimated using a weighted least-square method. This method minimizes the sum of the squares of the deviations between the theoretical curve and the experimental points for a range of independent variables. After fitting, the model can be evaluated using hypothesis tests and by plotting residuals.

It is worth noting that the higher-order terms in a polynomial equation have the greatest effect on the dependent variable. Consequently, models with high-order terms (higher than 4) are extremely sensitive to the precision of the coefficient values: small differences in the coefficient values can result in large differences in the computed y value. We mention this because, by default, the polynomial fitting results are rounded to 5 decimal places. If you manually plug these reported worksheet values back into the fitted curve, the slight loss of precision that occurs in rounding will have a marked effect on the higher-order terms, possibly leading you to conclude, wrongly, that your model is faulty. If you wish to perform manual calculations using your best-fit parameter estimates, make sure that you use full-precision values, not rounded values. Note that while Origin may round reported values to 5 decimal places (or other), these values are only for display purposes. Origin always uses full precision (double(8)) in mathematical calculations unless you have specified otherwise. For more information, see Numbers in Origin.
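As a quick sketch of this display-versus-storage distinction, the following fragment prints the same stored coefficient at two display precisions. The column indices are illustrative, and the S*n significant-digit format simply extends the S*6 notation used in the Examples section below.

// Fit a 5th-order polynomial, then print one stored coefficient
// at two different display precisions
dataset cc;
fitpoly iy:=(1,2) polyorder:=5 coef:=cc;
type "6 significant digits:  $(cc[6], S*6)";
type "14 significant digits: $(cc[6], S*14)";
// The stored value is a full-precision double either way;
// only the printed representation changes.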

Generally speaking, any continuous function on a finite interval can be approximated by a polynomial of sufficiently high order. However, the higher-order terms may not have much practical significance.

Examples

  • Code Sample
// This example shows how to use the fitpoly Polynomial Fit function and access the results.
// Get some sample data
newbook name:="Linear Regression Sample" sheet:=1 result:=ResultBook$;
impfile fname:=system.path.program$+"Samples\Curve Fitting\Multiple Gaussians.dat";
// Declare variables for coefficients and Adjusted R^2
dataset ds;	// vector argument requires dataset
double  MyR;
// Setup table for output
type Dataset\tA0\tA1\tA2\tA3\tAdjR^2;
separator 6;

// Now loop through all four curves, fit and report
loop(ii,2,5) {
    fitpoly iy:=(1,$(ii)) polyorder:=3 coef:=ds AdjRSq:=MyR;
    %N = wks.col$(ii).name$;
    type %N\t$(ds[1],S*6)\t$(ds[2],S*6)\t$(ds[3],S*6)\t$(ds[4],S*6)\t$(MyR,S*6);
}
separator 6;

// Now loop through the four curves again, this time fixing the intercept at zero
type Dataset\tA0\tA1\tA2\tA3\tAdjR^2;
loop(ii,2,5) {
    fitpoly iy:=(1,$(ii)) polyorder:=3 coef:=ds fixint:=1 intercept:=0 AdjRSq:=MyR;
    %N = wks.col$(ii).name$;
    type %N\t$(ds[1],S*6)\t$(ds[2],S*6)\t$(ds[3],S*6)\t$(ds[4],S*6)\t$(MyR,S*6);
}

Algorithm

Regression model:

For a given dataset (x_i, y_i), i = 1, 2, ..., n, where x is the independent variable and y is the dependent variable, polynomial regression fits the data to a model of the following form:

y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \epsilon_i

where k is the degree of the polynomial; in Origin, k is a positive integer less than 10. The error term \epsilon is assumed to be independent and normally distributed, N(0, \sigma^2).

To fit the model, assume that the residuals

r_i = y_i - \hat{y}_i = y_i - \left( \hat{\beta}_0 + \hat{\beta}_1 x_i + \cdots + \hat{\beta}_k x_i^k \right)

are normally distributed with mean equal to 0 and variance equal to \sigma_i^2. Then the maximum likelihood estimates for the parameters \beta_i can be obtained by minimizing the Chi-square, which is defined as:

\chi^2 = \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{\sigma_i} \right)^2

where \sigma_i are the measurement errors. If they are unknown, they should all be set to 1. If the error is treated as a weight, the Chi-square minimizing equation can be written as:

\chi^2 = \sum_{i=1}^{n} w_i \left( y_i - \hat{y}_i \right)^2, \quad w_i = \frac{1}{\sigma_i^2}
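Setting the partial derivatives of the weighted Chi-square with respect to each coefficient to zero yields the normal equations that the matrix method below solves:

\frac{\partial \chi^2}{\partial \beta_j} = -2 \sum_{i=1}^{n} w_i \left( y_i - \sum_{m=0}^{k} \beta_m x_i^m \right) x_i^j = 0, \quad j = 0, 1, \ldots, k

a linear system of k + 1 equations in the k + 1 unknown coefficients \beta_0, \ldots, \beta_k.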

Coefficient estimation by matrix calculation:

The calculation of the estimated coefficients is a matrix computation. First, we rewrite the regression model in matrix form:

Y = X\beta + \epsilon

where:

Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}

The estimate of the vector \beta is the solution to these linear equations and can be expressed as:

\hat{B} = (X'X)^{-1} X'Y

where X' is the transpose of X.
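As a small worked example (illustrative data), take degree k = 1 and the three points (0, 1), (1, 3), (2, 5):

X = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad X'X = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, \quad X'Y = \begin{pmatrix} 9 \\ 13 \end{pmatrix}

\hat{B} = (X'X)^{-1} X'Y = \frac{1}{6} \begin{pmatrix} 5 & -3 \\ -3 & 3 \end{pmatrix} \begin{pmatrix} 9 \\ 13 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}

so the fitted line is y = 1 + 2x, which here passes through all three points exactly.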

Inference in polynomial regression:

The ANOVA for the polynomial regression is summarized in the following table:

            df        Sum of Squares       Mean Square             F Value        Prob > F
Model       k         SSreg = TSS - RSS    MSreg = SSreg / k       MSreg / MSE    p-value
Error       n* - k    RSS                  MSE = RSS / (n* - k)
Total       n*        TSS

(Note: If the intercept is included in the model, n* = n - 1. Otherwise, n* = n and the total sum of squares is uncorrected.)

where the total sum of squares (TSS) is:

TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad (for the corrected model)

TSS = \sum_{i=1}^{n} y_i^2 \quad (for the uncorrected model)

The residual sum of squares (RSS), also called the sum of squared error (SSE), is the sum of the squares of the vertical deviations from each data point to the fitted curve. It can be computed as:

RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
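To make these quantities concrete, consider a linear fit (k = 1) to the illustrative points (0, 1), (1, 2), (2, 5); the matrix method above gives the fitted line \hat{y} = 2/3 + 2x, and:

\bar{y} = \frac{8}{3}, \quad TSS = \left(1 - \tfrac{8}{3}\right)^2 + \left(2 - \tfrac{8}{3}\right)^2 + \left(5 - \tfrac{8}{3}\right)^2 = \frac{26}{3}

RSS = \left(\tfrac{1}{3}\right)^2 + \left(-\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2 = \frac{2}{3}, \quad SS_{reg} = TSS - RSS = 8

so with n^* = n - 1 = 2 and k = 1, F = \frac{8/1}{(2/3)/1} = 12.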

The result of the F-test is presented in the ANOVA table. The null hypothesis of the F-test is that all of the partial coefficients are equal to zero, i.e.

H_0: \beta_0 = \beta_1 = \beta_2 = ... = \beta_k = 0

Thus, the alternative hypothesis is:

H_\alpha: At least one \beta_j \ne 0

With the computed F-value, we can decide whether or not to reject the null hypothesis. Usually, for a given significance level \alpha, we can reject H_0 when F > F_\alpha, or when the significance of F (the computed p-value) is less than \alpha.

For the inference, we need the standard errors of the partial slopes, which may be computed as:

s_{\hat{\beta}_j} = s_\epsilon \sqrt{c_{jj}}

where c_{jj} is the jth diagonal element of (X'X)^{-1}, and s_\epsilon is the residual standard deviation (also called std dev, standard error of estimate, or root MSE), computed as:

s_\epsilon = \sqrt{\frac{RSS}{n^* - k}}

If the regression assumptions hold, we can perform the t-tests for the regression coefficients with the null hypotheses and the alternative hypotheses:

H_0: \beta_j = 0
H_\alpha: \beta_j \ne 0

The t-value can be computed as:

t = \frac{\hat{\beta}_j}{s_{\hat{\beta}_j}}

With the t-values, we can decide whether or not to reject the null hypotheses. Usually, for a given significance level \alpha, we can reject H_0 when |t| > t_{\alpha/2}, or when the p-value is less than \alpha.
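Continuing the illustrative three-point example (0, 1), (1, 2), (2, 5) from the ANOVA discussion, the slope's t-value can be checked by hand:

s_\epsilon = \sqrt{\frac{RSS}{n^* - k}} = \sqrt{\tfrac{2}{3}}, \quad c_{11} = \tfrac{1}{2} \;\Rightarrow\; s_{\hat{\beta}_1} = \sqrt{\tfrac{2}{3}} \cdot \sqrt{\tfrac{1}{2}} = \sqrt{\tfrac{1}{3}}

t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{2}{\sqrt{1/3}} = 2\sqrt{3} \approx 3.46

and, as expected with a single predictor, t^2 = 12 matches the F-value computed earlier.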

Confidence and Prediction interval:

For a particular value x_p, the 100(1-\alpha)% confidence interval for the mean value of y at x = x_p is:

\hat{y}_p \pm t_{\alpha/2, n^*-k} \, s_\epsilon \sqrt{\mathbf{x}_p' (X'X)^{-1} \mathbf{x}_p}

where \mathbf{x}_p = (1, x_p, x_p^2, \ldots, x_p^k)'. The 100(1-\alpha)% prediction interval for a single new observation of y at x = x_p is:

\hat{y}_p \pm t_{\alpha/2, n^*-k} \, s_\epsilon \sqrt{1 + \mathbf{x}_p' (X'X)^{-1} \mathbf{x}_p}

Coefficient of Determination:

The goodness of fit can be evaluated by the coefficient of determination, R^2, which is given by:

R^2 = 1 - \frac{RSS}{TSS}

The adjusted R^2 adjusts the R^2 value for the degrees of freedom. It can be computed as:

\bar{R}^2 = 1 - \frac{RSS / (n^* - k)}{TSS / n^*}

Then we can compute the R-value, which is simply the square root of R^2:

R = \sqrt{R^2}
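For the same illustrative three points, with RSS = 2/3 and TSS = 26/3:

R^2 = 1 - \frac{2/3}{26/3} = \frac{12}{13} \approx 0.923, \quad \bar{R}^2 = 1 - \frac{(2/3)/1}{(26/3)/2} = \frac{11}{13} \approx 0.846, \quad R \approx 0.961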

Covariance and Correlation matrix:

The covariance matrix of the polynomial regression can be calculated as:

\mathrm{Cov}(\hat{\beta}) = s_\epsilon^2 (X'X)^{-1}

And the correlation between any two parameters is:

\rho_{ij} = \frac{\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_j)}{\sqrt{\mathrm{Cov}(\hat{\beta}_i, \hat{\beta}_i) \, \mathrm{Cov}(\hat{\beta}_j, \hat{\beta}_j)}}
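Completing the illustrative three-point example, with s_\epsilon^2 = 2/3:

\mathrm{Cov}(\hat{\beta}) = \frac{2}{3} \cdot \frac{1}{6} \begin{pmatrix} 5 & -3 \\ -3 & 3 \end{pmatrix} = \begin{pmatrix} 5/9 & -1/3 \\ -1/3 & 1/3 \end{pmatrix}, \quad \rho_{01} = \frac{-1/3}{\sqrt{(5/9)(1/3)}} = -\sqrt{\tfrac{3}{5}} \approx -0.775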


References

1. Bruce Bowerman and Richard T. O'Connell. 1997. Applied Statistics: Improving Business Processes. The McGraw-Hill Companies, Inc.

2. Sanford Weisberg. 2005. Applied Linear Regression, 2nd ed. John Wiley & Sons, Inc., Hoboken, New Jersey.

3. William H. Press et al. 2002. Numerical Recipes in C++, 2nd ed. Cambridge University Press, New York.

Related X-Functions

fitLR, nlfit


Keywords: curve fitting