# 16.2 Interpolate/Extrapolate Y from X

## Overview

Interpolation is a method of estimating and constructing new data points from a discrete set of known data points. Given an X vector, this function interpolates a vector Y based on the input curve (XY range). Origin provides four options for data interpolation: Linear, Cubic Spline, Cubic B-Spline, and Akima Spline.

Linear interpolation is the simplest and fastest data interpolation method. In linear interpolation, the interpolated value lies on the straight line joining the two adjacent data points. It is therefore a weighted average of their Y values, in which the nearer point receives the larger weight. This method is useful when low precision can be tolerated. Linear interpolation is also useful for extremely large data sets, because the calculation is neither time- nor compute-intensive.

The generalization of linear interpolation is polynomial interpolation. Polynomial interpolation requires much more computation than linear interpolation, and when the polynomial order is high, the interpolant can oscillate wildly between the data points. These disadvantages can be avoided by using low-order polynomial fitting or spline interpolation.
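The oscillation problem can be seen in a small experiment. The Python sketch below interpolates Runge's function $f(x)=1/(1+25x^2)$ (a classic test case chosen here for illustration; it is not part of Origin) with a single polynomial through equally spaced nodes on $[-1, 1]$, then reports the largest error on a fine grid. The helper names are hypothetical. As more nodes are added, the polynomial degree rises and the error near the ends of the interval grows instead of shrinking.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through all points (xs[k], ys[k]) at x, in Lagrange form."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != k:
                basis *= (x - xj) / (xk - xj)
        total += yk * basis
    return total


def runge(x):
    """Runge's function, a standard example for high-order polynomial interpolation."""
    return 1.0 / (1.0 + 25.0 * x * x)


def max_error(num_nodes, num_samples=2001):
    """Largest |p(x) - f(x)| on [-1, 1] for num_nodes equally spaced nodes."""
    xs = [-1.0 + 2.0 * k / (num_nodes - 1) for k in range(num_nodes)]
    ys = [runge(x) for x in xs]
    worst = 0.0
    for s in range(num_samples):
        x = -1.0 + 2.0 * s / (num_samples - 1)
        worst = max(worst, abs(lagrange_eval(xs, ys, x) - runge(x)))
    return worst


if __name__ == "__main__":
    for n in (5, 9, 13):
        print(f"{n} nodes (degree {n - 1}): max error {max_error(n):.3f}")
```

A piecewise method avoids this growth because each piece only depends on a few nearby points.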

The Cubic Spline method uses third-order polynomials and fits the data in a piecewise fashion. Spline interpolation typically incurs less error than linear interpolation, and the interpolant is smoother.

Similar to Cubic spline interpolation, Cubic B-spline interpolation also fits the data in a piecewise fashion, but it uses 3rd order Bezier splines to approximate the data. Cubic B-Splines allow the accurate modeling of more general classes of geometry.

##### To Interpolate Y from X
1. Create a new worksheet with input data.
2. Select desired data.
3. Select Analysis: Mathematics: Interpolate/Extrapolate Y from X. This opens the interp1 dialog.

The interp1 X-Function is called to perform the calculation.

Note: To generate uniform linearly spaced interpolated values, use the Interpolate/Extrapolate... menu command.

## Dialog Options

- Recalculate: Controls recalculation of analysis results: None, Auto, or Manual. For more information, see: Recalculating Analysis Results.
- X Values to Interpolate: The vector to interpolate on.
- Input: The XY range to be interpolated. For help with range controls, see: Specifying Your Input Data.
- Method: Specifies the interpolation method.
  - Linear: A fast method that estimates a data point by constructing a line between the two neighboring data points. The resulting point may not be an accurate estimate of the missing data.
  - Cubic Spline: Splits the input data into pieces at the data points and fits each segment with a cubic polynomial. Adjacent cubics join with continuous first and second derivatives, and a boundary condition at the two ends (see Boundary) completes the system, so the entire function can be constructed piecewise.
  - Cubic B-Spline: Also splits the input data into pieces; each segment is fitted with Bezier splines.
  - Akima Spline: Based on a piecewise function composed of a set of polynomials. Because each piece depends only on nearby points, Akima interpolation is less sensitive to outliers.
- Extrapolate: When part of the range specified by X Values to Interpolate lies outside the X range specified in Input, that part is treated as the extrapolated range, because its Y values must be computed by extrapolation. This option specifies how those Y values are obtained.
  - Extrapolate: Extrapolate Y using the last two points.
  - Set missing: Set all Y values in the extrapolated range to missing values.
  - Repeat the last value: Use the Y value of the closest input X value for all values in the extrapolated range.
- Boundary: The boundary condition; only available for the Cubic Spline method.
  - Natural: The second derivatives are 0 at both ends.
  - Not-A-Knot: The third derivative is continuous at the second and second-to-last points.
- Smoothing: Only available for the Cubic B-Spline method.
- Output: The output vector.
- Coefficients: Whether to output the coefficients for the Cubic Spline or Cubic B-Spline method, and if so, in which column.
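The three extrapolation choices can be sketched in Python as below. This is a minimal illustration, assuming linear interpolation inside the input range and NaN as a stand-in for Origin's missing value. The function name `interp_with_mode` and the mode strings are hypothetical and are not options of the interp1 X-Function.

```python
import bisect
import math


def _line(x0, y0, x1, y1, x):
    """Y on the straight line through (x0, y0) and (x1, y1), evaluated at x."""
    return y0 + (y1 - y0) / (x1 - x0) * (x - x0)


def interp_with_mode(xs, ys, x, mode="extrapolate"):
    """Linear interpolation inside [xs[0], xs[-1]]; `mode` decides what happens outside.

    Hypothetical mode names mirroring the dialog choices:
      "extrapolate" - extend the line through the two nearest end points
      "missing"     - return NaN, standing in for a missing value
      "repeat"      - return the Y value of the closest input X
    xs must be strictly increasing with at least two points.
    """
    if xs[0] <= x <= xs[-1]:
        # Interval [xs[i], xs[i + 1]] containing x (clamped for x == xs[-1]).
        i = min(bisect.bisect_right(xs, x) - 1, len(xs) - 2)
        return _line(xs[i], ys[i], xs[i + 1], ys[i + 1], x)
    if mode == "missing":
        return math.nan
    if mode == "repeat":
        return ys[0] if x < xs[0] else ys[-1]
    if mode == "extrapolate":
        if x < xs[0]:
            return _line(xs[0], ys[0], xs[1], ys[1], x)
        return _line(xs[-2], ys[-2], xs[-1], ys[-1], x)
    raise ValueError(f"unknown mode: {mode!r}")


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 3.0]
for mode in ("extrapolate", "missing", "repeat"):
    print(mode, [interp_with_mode(xs, ys, x, mode) for x in (-1.0, 1.5, 4.0)])
```

Only X values outside the input range are affected by the mode; values inside the range are interpolated the same way in all three cases.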

## Algorithm

Given a sequence of distinct data pairs $(x_i, y_i)$, where $i = 0, 1, \ldots, n-1$ and $x_0 < x_1 < \cdots < x_{n-1}$, we look for the interpolated value $y$ at $x$ using the following methods:

1. Linear interpolation (interp1q)

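For $x<x_0$, $y=y_0+\frac{y_1-y_0}{x_1-x_0}\times (x-x_0)$

For $x>x_{n-1}$, $y=y_{n-1}+\frac{y_{n-1}-y_{n-2}}{x_{n-1}-x_{n-2}}\times (x-x_{n-1})$

For $x_i\leq x\leq x_{i+1}$, $y=y_i+\frac{y_{i+1}-y_i}{x_{i+1}-x_i}\times (x-x_i)$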
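The linear cases translate directly into a short Python sketch. It assumes strictly increasing `xs` with at least two points; `interp1_linear` is a hypothetical name, not an Origin function. For $x>x_{n-1}$ it evaluates the same line through the last two points, anchored at $x_{n-2}$ instead of $x_{n-1}$, which gives the same result.

```python
import bisect


def interp1_linear(xs, ys, x_new):
    """Linearly interpolate (or extrapolate) Y at every value in x_new.

    xs must be strictly increasing and hold at least two points.
    """
    n = len(xs)
    out = []
    for x in x_new:
        if x < xs[0]:
            i = 0          # x < x_0: line through the first two points
        elif x > xs[n - 1]:
            i = n - 2      # x > x_{n-1}: line through the last two points
        else:
            # x_i <= x <= x_{i+1}; clamp so that i + 1 stays a valid index.
            i = min(bisect.bisect_right(xs, x) - 1, n - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        out.append(ys[i] + slope * (x - xs[i]))
    return out


print(interp1_linear([0, 1, 2, 4], [1, 3, 2, 6], [-1, 0.5, 3, 5]))
# [-1.0, 2.0, 4.0, 8.0]
```

The binary search keeps each lookup at $O(\log n)$, which matters for the large data sets that linear interpolation is suited to.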

2. Cubic spline (spline)

On the interval $[x_i, x_{i+1}]$ that contains $x$, Origin uses the natural cubic spline to interpolate: $y=Ay_i+By_{i+1}+Cy_i^{''}+Dy_{i+1}^{''}$

where: $A\equiv \frac{x_{i+1}-x}{x_{i+1}-x_i},B\equiv 1-A,C\equiv \frac 16\left( A^3-A\right) \left( x_{i+1}-x_i\right) ^2,D\equiv \frac 16(B^3-B)(x_{i+1}-x_i)^2$

The second derivatives $y_i^{''}$ are obtained by solving, for $i=1, \ldots, n-2$: $\frac{x_i-x_{i-1}}6y_{i-1}^{''}+\frac{x_{i+1}-x_{i-1}}3y_i^{''}+\frac{x_{i+1}-x_i}6y_{i+1}^{''}=\frac{y_{i+1}-y_i}{x_{i+1}-x_i}-\frac{y_i-y_{i-1}}{x_i-x_{i-1}}$

For the boundary points, we set $y_0^{''}$ and $y_{n-1}^{''}$ equal to zero, which closes this tridiagonal system.
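The natural cubic spline can be sketched in Python in two steps: solve the tridiagonal system for the $y_i^{''}$ (here with the Thomas algorithm), then evaluate $y=Ay_i+By_{i+1}+Cy_i^{''}+Dy_{i+1}^{''}$ on the interval containing $x$. This is an illustrative sketch assuming strictly increasing X values and $x$ inside $[x_0, x_{n-1}]$. Outside that range it simply continues the end cubic rather than applying the Extrapolate option. The function names are hypothetical.

```python
import bisect


def natural_spline_second_derivs(xs, ys):
    """Solve for y''_i (i = 1..n-2) with y''_0 = y''_{n-1} = 0 (natural spline)."""
    n = len(xs)
    m = [0.0] * n
    if n < 3:
        return m  # two points: the spline is a straight line
    sub, diag, sup, rhs = [], [], [], []
    for i in range(1, n - 1):
        h_prev = xs[i] - xs[i - 1]
        h_next = xs[i + 1] - xs[i]
        sub.append(h_prev / 6.0)
        diag.append((xs[i + 1] - xs[i - 1]) / 3.0)
        sup.append(h_next / 6.0)
        rhs.append((ys[i + 1] - ys[i]) / h_next - (ys[i] - ys[i - 1]) / h_prev)
    # Thomas algorithm. sub[0] and sup[-1] multiply y''_0 and y''_{n-1}, which are 0.
    k = len(diag)
    for j in range(1, k):
        w = sub[j] / diag[j - 1]
        diag[j] -= w * sup[j - 1]
        rhs[j] -= w * rhs[j - 1]
    # Back substitution: unknown j corresponds to m[j + 1].
    m[k] = rhs[k - 1] / diag[k - 1]
    for j in range(k - 2, -1, -1):
        m[j + 1] = (rhs[j] - sup[j] * m[j + 2]) / diag[j]
    return m


def natural_spline_eval(xs, ys, m, x):
    """y = A*y_i + B*y_{i+1} + C*y''_i + D*y''_{i+1} on the interval holding x."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    a = (xs[i + 1] - x) / h
    b = 1.0 - a
    c = (a ** 3 - a) * h * h / 6.0
    d = (b ** 3 - b) * h * h / 6.0
    return a * ys[i] + b * ys[i + 1] + c * m[i] + d * m[i + 1]


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0]
m = natural_spline_second_derivs(xs, ys)
print(m, natural_spline_eval(xs, ys, m, 0.5))
```

The system is tridiagonal, so it is solved in $O(n)$ operations. This is one reason the cubic spline stays practical for long data sets.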

3. Cubic B-spline (bspline)

For $x$ outside the range of the input X values, linear interpolation (extrapolation) is performed, as in method 1.

Within the range of the input X values, the curve is represented as

$y(x)=\sum_{i=1}^{n-4}c_iN_i(x)$

This subsection follows the notation of the NAG function e02bec (reference 3): the $m$ input points are written $(x_r, y_r)$, $r=1, \ldots, m$, and $n$ denotes the number of knots, not the number of data points.

Here, $N_i(x)$ denotes the normalized cubic B-spline defined upon the knots $\lambda_i, \lambda_{i+1}, \ldots, \lambda_{i+4}$, and $c_i$ denotes the coefficient of the corresponding function.

The total number $n$ of these knots and their values $\lambda_1, \ldots, \lambda_n$ are chosen automatically by the function. The knots $\lambda_5, \ldots, \lambda_{n-4}$ are the interior knots; they divide the approximation interval $[x_1, x_m]$ into $n-7$ sub-intervals. The coefficients $c_1, c_2, \ldots, c_{n-4}$ are then determined as the solution of the following constrained minimization problem:

minimize $\eta =\sum_{i=5}^{n-4}\delta _i^2$

subject to the constraint $\theta =\sum_{r=1}^m\varepsilon _r^2\leq S$

where $\delta_i$ stands for the discontinuity jump in the third-order derivative of $y$ at the interior knot $\lambda_i$, $\varepsilon_r$ denotes the weighted residual $w_r(y_r-y(x_r))$ with weight $w_r$ for point $r$, and $S$ is a non-negative number specified by the user.

The quantity $\eta$ can be seen as a measure of the (lack of) smoothness of $y$, while closeness of fit is measured through $\theta$. By means of the parameter $S$, "the smoothing factor", the user controls the balance between these two (usually conflicting) properties. If $S$ is too large, the spline will be too smooth and signal will be lost (underfit); if $S$ is too small, the spline will pick up too much noise (overfit). In the extreme cases, the function returns an interpolating spline ($\theta=0$) if $S$ is set to zero, and the weighted least-squares cubic polynomial ($\eta=0$) if $S$ is set very large. Experimenting with $S$ values between these two extremes should result in a good compromise.
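The smoothing fit itself is computed by the NAG routine. The model it produces, however, is a sum of normalized cubic B-splines $N_i(x)$ weighted by coefficients $c_i$, and it is easy to evaluate once the knots and coefficients are known. The Python sketch below evaluates that sum with the Cox-de Boor recursion, a standard technique assumed here rather than taken from Origin's code. It does not choose knots or solve the minimization problem. The function names are hypothetical, and the example knot vector with four-fold end knots is an assumption made for illustration.

```python
def bspline_basis(knots, i, k, x):
    """Normalized B-spline N_{i,k}(x) of degree k by the Cox-de Boor recursion.

    Indices are 0-based. The cubic N_{i,3} is nonzero only on knots[i]..knots[i + 4],
    i.e. it is defined on five consecutive knots. Terms whose knot span is empty
    are treated as zero. Because the degree-0 test is half-open, x equal to the
    last knot gives 0 in this sketch.
    """
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    span = knots[i + k] - knots[i]
    if span > 0.0:
        value += (x - knots[i]) / span * bspline_basis(knots, i, k - 1, x)
    span = knots[i + k + 1] - knots[i + 1]
    if span > 0.0:
        value += (knots[i + k + 1] - x) / span * bspline_basis(knots, i + 1, k - 1, x)
    return value


def cubic_spline_value(knots, coeffs, x):
    """y(x) = sum_i c_i N_i(x); a cubic spline on n knots has n - 4 coefficients."""
    assert len(coeffs) == len(knots) - 4
    return sum(c * bspline_basis(knots, i, 3, x) for i, c in enumerate(coeffs))


# Assumed example: n = 10 knots on [0, 3] with four-fold end knots, so the
# interior knots 1 and 2 split the interval into n - 7 = 3 pieces.
knots = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
coeffs = [0.0, 1.0, 3.0, 2.0, 0.5, 1.0]
print([round(cubic_spline_value(knots, coeffs, x), 4) for x in (0.0, 0.5, 1.5, 2.5)])
```

The basis functions sum to 1 inside the interval, so setting every $c_i$ to the same value reproduces a constant. The smoothing factor $S$ only changes which coefficients the fit selects, not this form of the model.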

4. Akima Spline (akima)

The Akima interpolation method is based on a piecewise function composed of a set of polynomials (of third degree at most). This piecewise function is applied to successive intervals of the given XY points. The slope of the curve at each given point is assumed to be determined by the XY coordinates of its four neighboring points and the point itself. Then, from the coordinates and slopes of two consecutive points, a third-degree polynomial is calculated that represents the curve on the interval between them, and the interpolation is carried out by combining these polynomials. An additional estimation is needed when calculating the polynomials for the end points.

First, the curve slope $t$ at each given point is calculated. For a given point (point 3), consider the five data points 1, 2, 3, 4, 5, and let $m_{1}, m_{2}, m_{3}, m_{4}$ be the slopes of the line segments $\overline{12}, \overline{23}, \overline{34}, \overline{45}$ respectively, where $m_i=(y_{i+1}-y_i)/(x_{i+1}-x_i)$. The curve slope $t$ is then determined by the following equations under different conditions:

When $m_{1}\neq m_{2}$ or $m_{3}\neq m_{4}$, $t = \left ( \left | m_{4} - m_{3} \right |m_{2} + \left | m_{2} - m_{1} \right |m_{3} \right )/\left ( \left | m_{4} - m_{3} \right | + \left | m_{2} - m_{1} \right |\right )$

When $m_{1} = m_{2}$ and $m_{3} = m_{4}$, $t = \frac{(x_4-x_3)m_2 + (x_3-x_2)m_3}{x_4-x_2}$
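The two cases translate into the Python sketch below, which computes $t$ for the middle point of five consecutive points. The function name and argument layout are hypothetical. The weighted formula applies exactly when its denominator $|m_4-m_3|+|m_2-m_1|$ is nonzero, which is the same as the condition $m_1\neq m_2$ or $m_3\neq m_4$.

```python
def akima_slope(px, py):
    """Curve slope t at the middle of five consecutive points.

    px, py hold points 1..5 of the text at indices 0..4, so the slope is
    computed for point 3 = (px[2], py[2]).
    """
    m1, m2, m3, m4 = [(py[k + 1] - py[k]) / (px[k + 1] - px[k]) for k in range(4)]
    w2 = abs(m4 - m3)  # weight on m2
    w3 = abs(m2 - m1)  # weight on m3
    if w2 + w3 > 0.0:  # equivalent to: m1 != m2 or m3 != m4
        return (w2 * m2 + w3 * m3) / (w2 + w3)
    # m1 == m2 and m3 == m4: average of m2 and m3 weighted by the point spacings
    x2, x3, x4 = px[1], px[2], px[3]
    return ((x4 - x3) * m2 + (x3 - x2) * m3) / (x4 - x2)


print(akima_slope([0, 1, 2, 3, 4], [0, 0, 0, 5, 5]))  # flat run before a step
# 0.0
```

In this example the point just before the step keeps a zero slope, so the curve does not overshoot below the flat data. This local behavior is what makes the method robust near outliers.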

At each end of the curve, the slope of the end point must be estimated separately. To do this, a parabola is interpolated through the end point and its two adjacent points, and the slope is taken as the derivative of this parabola at the end point. For example, the first point's slope is the derivative, at the first point, of the parabola through the first three points.
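The end-point estimate can be sketched in Python as the derivative of the parabola through three points, written here in Newton divided-difference form. The function name is hypothetical. The text does not say how any other points near the ends are handled, so only the end point itself is shown. Divided differences do not depend on point order, so the same function gives the last point's slope when called with the last three points in reverse order.

```python
def parabola_end_slope(p0, p1, p2):
    """Derivative at p0's X of the parabola through points p0, p1, p2."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d01 = (y1 - y0) / (x1 - x0)      # first divided difference f[x0, x1]
    d12 = (y2 - y1) / (x2 - x1)      # first divided difference f[x1, x2]
    d012 = (d12 - d01) / (x2 - x0)   # second divided difference f[x0, x1, x2]
    # p(x) = y0 + d01 (x - x0) + d012 (x - x0)(x - x1)  =>  p'(x0) = d01 + d012 (x0 - x1)
    return d01 + d012 * (x0 - x1)


print(parabola_end_slope((0, 0), (1, 1), (3, 9)))  # slope of y = x^2 at x = 0
# 0.0
```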

Then the polynomial for an interval $[x_i, x_{i+1}]$ between two consecutive data points $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ is determined by the following four conditions: $y|_{x=x_i} = y_i$, $y'|_{x=x_i}=t_i$, $y|_{x=x_{i+1}} = y_{i+1}$, $y'|_{x=x_{i+1}}=t_{i+1}$

where $t_i$ and $t_{i+1}$ are the slopes at the two points.
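One standard way to write the unique cubic that satisfies these four conditions is the cubic Hermite form, sketched below in Python. Origin's internal representation may differ. The function name is hypothetical, and the slopes $t_i$, $t_{i+1}$ would come from the Akima slope estimates described above.

```python
def hermite_cubic(xi, yi, ti, xj, yj, tj, x):
    """Cubic on [xi, xj] with y(xi)=yi, y'(xi)=ti, y(xj)=yj, y'(xj)=tj."""
    h = xj - xi
    s = (x - xi) / h            # local coordinate: 0 at xi, 1 at xj
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    # The factor h converts the slopes from d/dx to d/ds.
    return h00 * yi + h10 * h * ti + h01 * yj + h11 * h * tj


# y = x^3 has y(1)=1, y'(1)=3, y(2)=8, y'(2)=12; the cubic is reproduced exactly.
print(hermite_cubic(1.0, 1.0, 3.0, 2.0, 8.0, 12.0, 1.5))
```

Because each interval uses only its own two end values and slopes, changing one data point affects only the nearby pieces of the Akima curve.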

## References

1. Michelle Schatzman. Numerical Analysis: A Mathematical Introduction, Chapters 4 and 6. Clarendon Press, Oxford (2002).

2. William H. Press et al. Numerical Recipes in C++, 2nd Edition. Cambridge University Press (2002).

3. NAG C Library Function Document, nag_1d_spline_fit (e02bec).

4. Hiroshi Akima. A New Method of Interpolation and Smooth Curve Fitting Based on Local Procedures. Journal of the Association for Computing Machinery, Vol. 17, No. 4 (1970).