Best Subset Selection

Author: OriginLab Technical Support
Date Added: 3/21/2025
Last Update: 6/13/2025
Downloads (90 Days): 9
Total Ratings: 0
File Size: 449 KB
File Name: Best_Subse...on.opx
File Version: 1.00
Type: App
Summary:

Compare all possible multiple linear regression models for given independent variables, and display optimal subsets of independent variables for different statistics.

Description:

PURPOSE
This app fits the input data with all possible multiple linear regression models, compares these models, and finds the optimal subsets of independent variables under different statistical criteria.

INSTALLATION
Download the file Best_Subset_Selection.opx, and then drag and drop it onto the Origin workspace. An icon will appear in the Apps Gallery window.
NOTE: This tool requires OriginPro.

REQUIRED PACKAGES
This app requires the orgutils app.

OPERATION
Activate the worksheet that contains the input data. Click the Best Subset Selection icon in the Apps Gallery window. A dialog will open. Dialog settings include:

  • Input tab
    Dependent Variable: Specify the dependent variable for the regression.
    Independent Variables
      Free Independent Variables: Specify the candidate independent variables for the regression. Subsets of these variables form the regression models. The maximum number of variables is 28.
      Independent Variables in All Models: Specify the independent variables that are included in every regression model. These are also called Forced Variables.
  • Settings tab
    Number of Free Independent Variables
      Minimum: Only compare and show models whose number of free independent variables is no less than the Minimum value.
      Maximum: Only compare and show models whose number of free independent variables is no more than the Maximum value.
    Number of Models to Show for Each Size: Specify the maximum number of models to show for each number of independent variables. For each size, the models with the highest \(R^2\) values are chosen.
    Include Intercept: Specify whether to include an intercept in all regression models.
  • Output tab
    Summary: Report sheet that shows the statistics for the regression models and lists the free independent variables chosen in each model. Each row in the report sheet represents one model. In each statistics column, the best model is marked in red.
    Fit Data: Report data that lists the dependent variable, free independent variables, and forced independent variables. Missing values are removed.
  • Mini Toolbar
    Right-click a row in the report sheet. A mini toolbar will appear. Click the Multiple Linear Regression button in the toolbar to perform multiple linear regression with the model specified by that row and generate a multiple linear regression report.
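
The comparison controlled by the Input and Settings tabs can be illustrated with a brute-force sketch in Python with NumPy. This is not the app's implementation: the function names `r_squared` and `best_subsets` and their parameters are hypothetical, an intercept is assumed to be included, and every subset is fitted from scratch. With the maximum of 28 free variables there are 2^28 (about 268 million) subsets, so a sketch like this is only practical for small inputs.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit; X must already contain the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / tss


def best_subsets(y, free, forced=None, min_size=1, max_size=None, n_show=3):
    """Return {size: [(R^2, column indices), ...]} with the top n_show models per size.

    free   : (n, k) array of free independent variables
    forced : optional (n, m) array of variables included in every model
    """
    n, k = free.shape
    max_size = k if max_size is None else max_size
    base = [np.ones((n, 1))]  # intercept column (assumes "Include Intercept" is on)
    if forced is not None:
        base.append(forced)
    results = {}
    for size in range(min_size, max_size + 1):
        scored = []
        for cols in combinations(range(k), size):
            X = np.hstack(base + [free[:, list(cols)]])
            scored.append((r_squared(X, y), cols))
        # Keep the models with the highest R^2 for this size
        scored.sort(key=lambda item: item[0], reverse=True)
        results[size] = scored[:n_show]
    return results


# Synthetic data: y depends only on free variables 0 and 2
rng = np.random.default_rng(0)
free = rng.normal(size=(60, 4))
y = 2.0 * free[:, 0] - 1.5 * free[:, 2] + rng.normal(scale=0.1, size=60)
top = best_subsets(y, free, n_show=2)
```

Here `top[2][0]` holds the best two-variable model, and `min_size`, `max_size`, and `n_show` play the roles of the Minimum, Maximum, and Number of Models to Show for Each Size settings.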

SAMPLE OPJU FILE
This app provides a sample OPJU file. Right-click the Best Subset Selection icon in the Apps Gallery window, and choose Show Samples Folder from the shortcut menu. A folder will open. Drag and drop the project file BestSubsetEx.opju from the folder onto Origin. The Notes window in the project shows detailed steps.
Note: If you wish to save the OPJU after making changes, it is recommended that you save it to a different folder location (e.g. User Files Folder).

ALGORITHM
R-Square (COD), Adj. R-Square, and Root-MSE (SD) are defined in the same way as in Origin's built-in Multiple Linear Regression tool.
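
For orientation, the following Python sketch computes these three statistics using the standard textbook definitions for a least-squares fit with an intercept. It is an assumption that these match Origin's definitions in every case (for example, models without an intercept may be handled differently), and the function name `basic_fit_stats` is hypothetical.

```python
import numpy as np


def basic_fit_stats(X, y):
    """R-Square, Adj. R-Square and Root-MSE for an OLS fit with intercept.

    X : (n, p) design matrix whose first column is all ones, so p counts the intercept.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid                      # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares about the mean
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (rss / (n - p)) / (tss / (n - 1))
    root_mse = np.sqrt(rss / (n - p))
    return r2, adj_r2, root_mse


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])
r2, adj_r2, root_mse = basic_fit_stats(X, y)
```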

  • PRESS
    \(\text{PRESS} = \sum \left( \frac{e_i}{1 - h_i} \right)^2\)
    where \(e_i\) is the ith residual and \(h_i\) is the ith diagonal element of \(X(X'X)^{-1}X'\). The smaller the value is, the better the model is.
  • Pred. R-Square
    \(\text{Pred}.\ R^2 = 1 - \frac{ \text{PRESS} }{ TSS }\)
    where \(TSS = \sum (y_i - \bar{y})^2\), \(y_i\) is the input data for the dependent variable, and \(\bar{y}\) is the mean of the dependent variable. The larger the value is, the better the model is.
  • Mallows' Cp
    \(C_p = \displaystyle \frac{RSS}{\hat{\sigma}^2} - (n-2p)\)
    where n is the number of observations, p is the number of parameters in the model, \(RSS = \sum e_i^2\), and \(\hat{\sigma}^2\) is the Reduced Chi-Sqr of the full model. A model is considered good when its \(C_p\) is close to p. The full model is excluded from this comparison because its \(C_p\) always equals p.
  • AICc (Akaike's Corrected Information Criterion)
    \(\text{AICc} = -2 \ln (\text{Likelihood}) + 2(p+1) + \displaystyle \frac{2(p+1)(p+2)}{n-p-2}\)
    where \(-2 \ln (\text{Likelihood}) = n \ln (RSS/n) + n + n \ln(2 \pi)\). The smaller the value is, the better the model is.
  • BIC (Bayesian Information Criterion)
    \(\text{BIC} = -2 \ln (\text{Likelihood}) + (p+1) \ln (n)\)
    The smaller the value is, the better the model is.
  • Condition number
    \(C = \displaystyle \frac{\lambda_{\max}}{\lambda_{\min}}\)
    where \(\lambda_{\max}\) and \(\lambda_{\min}\) are the largest and smallest eigenvalues of the correlation matrix of the independent variables in the model. The smaller the value is, the better the model is.
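
The formulas above can be checked numerically with a short Python sketch. It is illustrative only and is not the code used by the app: the function names `selection_criteria` and `condition_number` are hypothetical, p is taken to count the intercept, and the hat-matrix diagonal is obtained through a pseudoinverse for convenience.

```python
import numpy as np


def selection_criteria(X, y, sigma2_full):
    """Statistics for one candidate model, following the formulas listed above.

    X           : (n, p) design matrix including the intercept column
    sigma2_full : Reduced Chi-Sqr, RSS / (n - p_full), of the full model
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    rss = e @ e
    tss = np.sum((y - y.mean()) ** 2)

    # Leverages h_i: diagonal of the hat matrix X (X'X)^{-1} X'
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    press = np.sum((e / (1.0 - h)) ** 2)

    neg2_loglik = n * np.log(rss / n) + n + n * np.log(2.0 * np.pi)
    return {
        "PRESS": press,
        "Pred R2": 1.0 - press / tss,
        "Cp": rss / sigma2_full - (n - 2 * p),
        "AICc": neg2_loglik + 2 * (p + 1) + 2 * (p + 1) * (p + 2) / (n - p - 2),
        "BIC": neg2_loglik + (p + 1) * np.log(n),
    }


def condition_number(V):
    """Largest over smallest eigenvalue of the correlation matrix of V's columns."""
    corr = np.atleast_2d(np.corrcoef(V, rowvar=False))
    eig = np.linalg.eigvalsh(corr)  # eigenvalues in ascending order
    return eig[-1] / eig[0]


# Synthetic data: only the first of three variables drives y
rng = np.random.default_rng(1)
V = rng.normal(size=(40, 3))
y = 1.0 + 2.0 * V[:, 0] + rng.normal(scale=0.5, size=40)
X_full = np.column_stack([np.ones(40), V])
e_full = y - X_full @ np.linalg.lstsq(X_full, y, rcond=None)[0]
rss_full = e_full @ e_full
sigma2_full = rss_full / (40 - X_full.shape[1])

full = selection_criteria(X_full, y, sigma2_full)
small = selection_criteria(X_full[:, :2], y, sigma2_full)
cond = condition_number(V)
```

For the full model, \(RSS/\hat{\sigma}^2 = n - p\), so `full["Cp"]` reduces to p (here 4), which is why the full model is left out when judging \(C_p\).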

Reference

  1. nag_all_regsn (g02eac)

Related Apps

  1. General Linear Regression
