Best Subset Selection - File Exchange

Author:

OriginLab Technical Support

Date Added:

3/21/2025

Last Update:

6/13/2025

File Size:

449 KB

File Name:

Best_Subse...on.opx

File Version:

1.00

Minimum Versions:

2025b (10.25)

Type:

App

Summary:

Compare all possible multiple linear regression models for given independent variables, and display optimal subsets of independent variables for different statistics.

Description:

PURPOSE
This app fits every possible multiple linear regression model that can be formed from the given independent variables, compares the models, and reports the optimal subsets of independent variables under several statistical criteria.

INSTALLATION
Download the file Best_Subset_Selection.opx, then drag and drop it onto the Origin workspace. An icon will appear in the Apps Gallery window.
NOTE: This tool requires OriginPro.

REQUIRED PACKAGES
This app requires the orgutils app.

OPERATION
Activate the worksheet that contains the input data. Click the Best Subset Selection icon in the Apps Gallery window to open the dialog. The dialog settings are:

Input tab

Input		Description
Dependent Variable		Specify the dependent variable for the regression.
Independent Variables	Free Independent Variables	Specify the candidate independent variables for the regression; subsets of these variables form the regression models. The maximum number of variables is 28.
Independent Variables	Independent Variables in All Models	Specify the independent variables that are included in every regression model. These are also called forced variables.

Settings tab

Settings		Description
Number of Free Independent Variables	Minimum	Only compare and show models whose number of free independent variables is no less than the Minimum value.
Number of Free Independent Variables	Maximum	Only compare and show models whose number of free independent variables is no more than the Maximum value.
Number of Models to Show for Each Size		Specify the maximum number of models to show for each number of independent variables. For each size, the models with the highest \(R^2\) values are chosen.
Include Intercept		Specify whether to include an intercept in all regression models.
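
The settings above can be illustrated with a minimal Python sketch of exhaustive subset enumeration. It is not the app's implementation: the function names `r_squared` and `best_subsets` and all parameter names are hypothetical, and the treatment of \(R^2\) without an intercept (uncentered total sum of squares) is an assumption.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y, include_intercept=True):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    if include_intercept:
        X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # Assumption: without an intercept, use the uncentered total sum of squares.
    tss = np.sum((y - y.mean()) ** 2) if include_intercept else np.sum(y ** 2)
    return 1.0 - rss / tss


def best_subsets(X_free, y, X_forced=None, min_size=1, max_size=None,
                 n_show=3, include_intercept=True):
    """Return {size: [(r2, free_column_indices), ...]}, best n_show models per size."""
    n, k = X_free.shape
    max_size = k if max_size is None else max_size
    if X_forced is None:
        X_forced = np.empty((n, 0))  # no forced variables
    results = {}
    for size in range(min_size, max_size + 1):
        scored = []
        for cols in combinations(range(k), size):
            # Forced variables appear in every model; free ones vary by subset.
            X = np.column_stack([X_forced, X_free[:, list(cols)]])
            scored.append((r_squared(X, y, include_intercept), cols))
        scored.sort(key=lambda t: t[0], reverse=True)
        results[size] = scored[:n_show]
    return results
```

With k free variables, the loop visits every subset whose size lies between the minimum and maximum, so the work grows exponentially in k. This is presumably why the number of free variables is capped.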

Output tab

Output	Description
Summary	Report sheet that shows the statistics for each regression model and lists the free independent variables chosen in it. Each row in the report sheet represents one model. In each statistics column, the best model is marked in red.
Fit Data	Report data listing the dependent variable, the free independent variables, and the forced independent variables. Missing values are removed.
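
As a rough illustration of the missing-value handling mentioned for Fit Data, the sketch below drops every row that has a missing value in any selected column. Row-wise (listwise) deletion is an assumption about what "removed" means here, and `drop_missing_rows` is a hypothetical name.

```python
import numpy as np


def drop_missing_rows(y, X):
    """Keep only rows where y and every column of X are present (not NaN)."""
    data = np.column_stack([y, X])
    keep = ~np.isnan(data).any(axis=1)  # True for complete rows
    return y[keep], X[keep]
```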

Mini Toolbar
Right-click a row in the report sheet and a mini toolbar will appear. Click the Multiple Linear Regression button in the toolbar to fit the model specified by that row and generate a multiple linear regression report.

SAMPLE OPJU FILE
This app provides a sample OPJU file. Right-click the Best Subset Selection icon in the Apps Gallery window and choose Show Samples Folder from the shortcut menu. A folder will open. Drag and drop the project file BestSubsetEx.opju from the folder onto Origin. The Notes window in the project shows detailed steps.
Note: If you wish to save the OPJU after making changes, it is recommended that you save it to a different folder location (e.g. User Files Folder).

ALGORITHM
R-Square (COD), Adj. R-Square, and Root-MSE (SD) are defined in the same way as in Origin's built-in Multiple Linear Regression tool.

PRESS
\(\text{PRESS} = \sum \left( \frac{e_i}{1 - h_i} \right)^2\)
where \(e_i\) is the ith residual and \(h_i\) is the ith diagonal element of \(X(X'X)^{-1}X'\). The smaller the value, the better the model.
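
The PRESS formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's code: `press` is a hypothetical name, and the design matrix `X` is assumed to already contain the intercept column when an intercept is used.

```python
import numpy as np


def press(X, y):
    """PRESS = sum((e_i / (1 - h_i))**2) for an ordinary least-squares fit."""
    # Hat matrix H = X (X'X)^{-1} X'; pinv is used for numerical robustness.
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    e = y - H @ y     # ordinary residuals e_i
    h = np.diag(H)    # leverages h_i
    return float(np.sum((e / (1.0 - h)) ** 2))
```

Each term \(e_i / (1 - h_i)\) equals the prediction error for observation i when the model is refitted without that observation, so PRESS is a leave-one-out error computed from a single fit.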
Pred. R-Square
\(\text{Pred}.\ R^2 = 1 - \frac{ \text{PRESS} }{ TSS }\)
where \(TSS = \sum (y_i - \bar{y})^2\), \(y_i\) is the ith observation of the dependent variable, and \(\bar{y}\) is the mean of the dependent variable. The larger the value, the better the model.
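
A minimal Python sketch of the predicted R-square formula, again with a hypothetical function name and with `X` assumed to include the intercept column:

```python
import numpy as np


def predicted_r_squared(X, y):
    """Pred. R^2 = 1 - PRESS / TSS for an ordinary least-squares fit."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T          # hat matrix
    e = y - H @ y                                   # ordinary residuals
    press = np.sum((e / (1.0 - np.diag(H))) ** 2)   # leave-one-out error sum
    tss = np.sum((y - y.mean()) ** 2)               # total sum of squares
    return float(1.0 - press / tss)
```

Because PRESS is never smaller than the residual sum of squares, this value never exceeds the ordinary \(R^2\) of the same model, and it can be negative for a model that predicts poorly.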
Mallows' Cp
\(C_p = \displaystyle \frac{RSS}{\hat{\sigma}^2} - (n-2p)\)
where n is the number of observations, p is the number of parameters in the model, \(RSS = \sum e_i^2\), and \(\hat{\sigma}^2\) is the Reduced Chi-Sqr of the full model. A \(C_p\) value close to p indicates a good model. The full model is excluded from this comparison because its \(C_p\) always equals p.
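
The following Python sketch shows one way to evaluate Mallows' \(C_p\). The names `rss` and `mallows_cp` are hypothetical, and two assumptions are made: both design matrices include the intercept column (so p counts the intercept), and Reduced Chi-Sqr is taken as the full model's RSS divided by its residual degrees of freedom.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))


def mallows_cp(X_sub, X_full, y):
    """C_p = RSS_sub / sigma2_full - (n - 2p), with p = columns of X_sub."""
    n, p = X_sub.shape
    p_full = X_full.shape[1]
    # Assumed definition of Reduced Chi-Sqr for the full model.
    sigma2_full = rss(X_full, y) / (n - p_full)
    return rss(X_sub, y) / sigma2_full - (n - 2 * p)
```

Passing the full model as `X_sub` gives \((n - p) - (n - 2p) = p\), which is why the full model carries no information in this comparison.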
AICc (Akaike's Corrected Information Criterion)
\(\text{AICc} = -2 \ln (\text{Likelihood}) + 2(p+1) + \displaystyle \frac{2(p+1)(p+2)}{n-p-2}\)
where \(-2 \ln (\text{Likelihood}) = n \ln (RSS/n) + n + n \ln(2 \pi)\). The smaller the value, the better the model.
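
The AICc expression translates directly into a few lines of Python. The function names are hypothetical; `rss`, `n`, and `p` correspond to RSS, n, and p in the formulas above.

```python
import math


def neg2_log_likelihood(rss, n):
    """-2 ln(Likelihood) = n ln(RSS/n) + n + n ln(2 pi)."""
    return n * math.log(rss / n) + n + n * math.log(2.0 * math.pi)


def aicc(rss, n, p):
    """AICc with k = p + 1 estimated quantities (p coefficients plus the variance)."""
    k = p + 1
    # The correction term 2k(k+1)/(n-k-1) equals 2(p+1)(p+2)/(n-p-2).
    return neg2_log_likelihood(rss, n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
```

The correction term requires n > p + 2 and grows quickly as p approaches that limit, which penalizes large models fitted to small samples.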
BIC (Bayesian Information Criterion)
\(\text{BIC} = -2 \ln (\text{Likelihood}) + (p+1) \ln (n)\)
The smaller the value, the better the model.
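
A corresponding Python sketch for BIC, using the same likelihood term as AICc (the function name `bic` is hypothetical):

```python
import math


def bic(rss, n, p):
    """BIC = -2 ln(Likelihood) + (p + 1) ln(n)."""
    neg2_loglik = n * math.log(rss / n) + n + n * math.log(2.0 * math.pi)
    return neg2_loglik + (p + 1) * math.log(n)
```

Each additional parameter costs \(\ln(n)\) here, which exceeds 2 once n is 8 or more, so BIC tends to favor smaller models than criteria with a penalty of 2 per parameter.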
Condition number
\(C = \displaystyle \frac{\lambda_{\max}}{\lambda_{\min}}\)
where \(\lambda_{\max}\) and \(\lambda_{\min}\) are the largest and smallest eigenvalues of the correlation matrix of the independent variables in the model. The smaller the value, the better the model.
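
The condition number can be sketched in Python as below. The name `condition_number` is hypothetical, and `X` is assumed to hold only the independent variables of the model, one per column, without an intercept column.

```python
import numpy as np


def condition_number(X):
    """Ratio of the largest to the smallest eigenvalue of the correlation matrix."""
    corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    eig = np.linalg.eigvalsh(corr)  # correlation matrix is symmetric
    return float(eig.max() / eig.min())
```

Uncorrelated variables give a value near 1, while nearly collinear variables push the smallest eigenvalue toward zero and the ratio toward very large values.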

Reference

nag_all_regsn (g02eac)

Related Apps

General Linear Regression
