Response |
Specify the data for the response variable. The response must be categorical with a finite number of categories, and its values can be text or numeric. |
Binary or Multinomial |
Options include:
- Binary Response
When this item is selected, the response should contain exactly two categories.
- Multinomial Response
When this item is selected, the response should contain more than two categories.
|
Response Event |
Specify the event for the response variable. This option is available only for a binary response. The options are the categories of the actual response variable. |
Continuous Predictors |
Specify the continuous variables that may explain or predict changes in the response. The values of continuous predictors must be numeric. |
Categorical Predictors |
Specify the categorical variables that may explain or predict changes in the response. The values of categorical predictors can be text or numeric. |
Prior Probabilities |
Specify how to calculate the prior probability for each category of the response. Options include:
- Equal Probability
Each response category has the same prior probability.
- Computed From Sample Frequencies
The prior probabilities are computed from the sample proportions.
- Specified
The prior probabilities are specified by the user.
|
Prior Probabilities (Separated by Space) |
Available when Prior Probabilities is Specified. Specify one prior probability for each response category. The values must sum to 1. |
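The three prior options above can be illustrated with a minimal Python sketch. The function name `prior_probabilities`, the method keywords, and the rule that specified values map to categories in sorted order are assumptions made for illustration only; they do not describe how the tool itself assigns the values.

```python
from collections import Counter

def prior_probabilities(labels, method="frequencies", specified=None):
    """Return {category: prior} for one of the three options (hypothetical helper)."""
    # Assumption: categories are ordered by their string form.
    categories = sorted(set(labels), key=str)
    if method == "equal":
        # Equal Probability: every category gets 1 / (number of categories).
        return {c: 1.0 / len(categories) for c in categories}
    if method == "frequencies":
        # Computed From Sample Frequencies: use the sample proportions.
        counts = Counter(labels)
        return {c: counts[c] / len(labels) for c in categories}
    if method == "specified":
        # Specified: parse a space-separated string such as "0.3 0.7".
        values = [float(v) for v in specified.split()]
        if len(values) != len(categories):
            raise ValueError("one prior per response category is required")
        if abs(sum(values) - 1.0) > 1e-9:
            raise ValueError("prior probabilities must sum to 1")
        return dict(zip(categories, values))
    raise ValueError(f"unknown method: {method}")

labels = ["yes", "no", "no", "no"]
equal = prior_probabilities(labels, "equal")
freq = prior_probabilities(labels, "frequencies")
spec = prior_probabilities(labels, "specified", "0.3 0.7")
```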
Misclassification Costs (Separated by Space) |
Specify the misclassification costs for the response categories. The costs form a square matrix whose two dimensions both equal the number of response categories, and whose diagonal elements are zero. |
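As a rough illustration of the cost matrix, the sketch below parses space-separated values into a square matrix and uses it to pick the prediction with the lowest expected cost. The row-major ordering, the convention that rows are actual categories and columns are predicted categories, and the helper names are all assumptions, not documented behavior of the tool.

```python
def parse_cost_matrix(text, n_categories):
    """Parse space-separated costs into an n x n matrix (row-major, assumed)."""
    values = [float(v) for v in text.split()]
    if len(values) != n_categories ** 2:
        raise ValueError("expected n_categories * n_categories values")
    matrix = [values[i * n_categories:(i + 1) * n_categories]
              for i in range(n_categories)]
    # A correct classification costs nothing, so the diagonal must be zero.
    if any(matrix[i][i] != 0 for i in range(n_categories)):
        raise ValueError("diagonal elements must be zero")
    return matrix

def expected_cost(matrix, class_probs, predicted):
    """Expected cost of predicting `predicted` given per-class probabilities."""
    return sum(p * matrix[actual][predicted]
               for actual, p in enumerate(class_probs))

# Misclassifying category 1 as category 0 is five times as costly as the reverse.
costs = parse_cost_matrix("0 1 5 0", 2)
class_probs = [0.7, 0.3]
best = min(range(2), key=lambda j: expected_cost(costs, class_probs, j))
```

With these example costs, the cheaper prediction is category 1 even though category 0 is more probable, which is the kind of shift an asymmetric cost matrix is meant to produce.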
Validation Method |
Specify the validation method to test the model. Options include:
- K-Fold Validation
Use K-fold cross-validation to validate the model.
- Test Set
Split the data set into two parts: one for training and the other for testing.
- None
No validation is performed.
|
Number of Fold (K) |
When Validation Method is K-Fold Cross Validation, specify the number of folds.
|
Fraction of Rows as Test Set |
When Validation Method is Test Set, specify the fraction of rows to use as testing data.
|
Random Number Seed |
For K-Fold Cross Validation or Test Set, samples are randomly assigned to each fold or to the testing data. This is the seed for the random number generator.
|
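The two validation methods and the role of the seed can be sketched with Python's standard `random` module. The helper names and the exact assignment scheme (shuffle, then slice) are assumptions for illustration; the tool may partition rows differently.

```python
import random

def train_test_split(n_rows, test_fraction, seed):
    """Test Set: randomly hold out a fraction of the rows for testing."""
    rng = random.Random(seed)          # same seed gives the same split
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])

def k_fold_indices(n_rows, k, seed):
    """K-fold: randomly assign every row to exactly one of k folds."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Fold i takes every k-th shuffled row starting at position i.
    return [sorted(indices[i::k]) for i in range(k)]

train, test = train_test_split(10, 0.3, seed=1)
folds = k_fold_indices(10, 5, seed=1)

# In K-fold validation, each fold serves once as the test set while the
# remaining folds form the training set.
for i, test_fold in enumerate(folds):
    train_rows = [r for j, f in enumerate(folds) if j != i for r in f]
```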
Method to Split Node |
Specify the splitting method for creating the decision tree. Options include:
- Gini
- Entropy
- Twoing
This method is available only when the response is multinomial.
|
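For reference, the three splitting methods can be sketched using their usual textbook (CART) definitions. This is an illustrative sketch, not the tool's implementation: the base-2 logarithm for entropy and the helper names are assumptions.

```python
import math
from collections import Counter

def proportions(labels):
    """Map each category to its share of the samples in a node."""
    n = len(labels)
    return {c: count / n for c, count in Counter(labels).items()}

def gini(labels):
    """Gini impurity: 1 - sum of squared category proportions."""
    return 1.0 - sum(p * p for p in proportions(labels).values())

def entropy(labels):
    """Entropy impurity in bits (base-2 logarithm is an assumption)."""
    return -sum(p * math.log2(p) for p in proportions(labels).values())

def impurity_decrease(impurity, left, right):
    """Parent impurity minus the size-weighted impurity of the children."""
    parent = left + right
    w_left = len(left) / len(parent)
    return (impurity(parent)
            - w_left * impurity(left)
            - (1.0 - w_left) * impurity(right))

def twoing(left, right):
    """Twoing score: (pL * pR / 4) * (sum_j |p(j|L) - p(j|R)|)^2."""
    n = len(left) + len(right)
    p_left, p_right = len(left) / n, len(right) / n
    pl, pr = proportions(left), proportions(right)
    diff = sum(abs(pl.get(c, 0.0) - pr.get(c, 0.0)) for c in set(pl) | set(pr))
    return p_left * p_right / 4.0 * diff ** 2

left = ["a", "a", "a", "b"]
right = ["b", "b", "c", "c"]
gini_gain = impurity_decrease(gini, left, right)
entropy_gain = impurity_decrease(entropy, left, right)
twoing_score = twoing(left, right)
```

In each case a candidate split with a larger value is preferred. Gini and entropy measure the impurity of a node, while twoing scores the split directly by how differently the categories are distributed between the two child nodes.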
Optimal Tree Selection Criterion |
Select the criterion for choosing the optimal tree. Options include:
- Minimum Misclassification Cost
The optimal tree has the minimum misclassification cost.
- Within K Standard Errors of Minimum Misclassification Cost
The optimal tree is the one with the fewest leaf nodes among the trees whose misclassification cost is within K standard errors of the minimum misclassification cost.
|
K = |
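Specify K, the number of standard errors used by the Within K Standard Errors of Minimum Misclassification Cost criterion.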
|
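The two selection criteria can be sketched as follows. The candidate tuple layout, the function name, and the use of the minimum-cost tree's own standard error to set the threshold are assumptions made for illustration, and the numbers are invented example values.

```python
def select_optimal_tree(candidates, k=0.0):
    """Pick a tree from (n_leaves, cost, std_error) tuples.

    k = 0 corresponds to Minimum Misclassification Cost; k > 0 selects the
    smallest tree whose cost is within k standard errors of the minimum.
    """
    best = min(candidates, key=lambda t: t[1])
    # Assumption: the threshold uses the standard error of the minimum-cost tree.
    threshold = best[1] + k * best[2]
    eligible = [t for t in candidates if t[1] <= threshold]
    # Prefer the fewest leaves; break ties by lower cost.
    return min(eligible, key=lambda t: (t[0], t[1]))

# Hypothetical candidate subtrees: (number of leaves, cost, standard error).
candidates = [
    (2, 0.30, 0.04),
    (4, 0.22, 0.03),
    (7, 0.20, 0.03),
    (12, 0.21, 0.03),
]
min_cost_tree = select_optimal_tree(candidates, k=0.0)
one_se_tree = select_optimal_tree(candidates, k=1.0)
```

In this example the minimum-cost criterion keeps the 7-leaf tree, while K = 1 accepts the simpler 4-leaf tree because its cost falls within one standard error of the minimum.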
Number of Surrogate Splits |
Specify how many surrogate splits to search for when a predictor has missing values.
|
Minimum Samples to Split Branch Node |
This is one of the conditions for deciding whether to split a branch node. If a node contains fewer samples than this value, it becomes a leaf node.
|
Minimum Samples Allowed for Leaf Node |
Specify the minimum number of samples that a leaf node can contain. If splitting a branch node would produce a leaf node with fewer samples than this value, the split is not allowed.
|
Maximum Tree Depth (Root Node is 1) |
Specify the maximum depth of the tree. The root node has a depth of 1.
|
Maximum Number of Leaf Nodes |
Specify the maximum number of leaf nodes that the tree can have.
|
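One way the four growth limits above could combine is sketched below. The class and function names are hypothetical, and the order in which the checks are applied is an assumption; the tool may evaluate them differently.

```python
from dataclasses import dataclass

@dataclass
class TreeLimits:
    min_samples_split: int   # Minimum Samples to Split Branch Node
    min_samples_leaf: int    # Minimum Samples Allowed for Leaf Node
    max_depth: int           # Maximum Tree Depth (root node is 1)
    max_leaf_nodes: int      # Maximum Number of Leaf Nodes

def may_split(limits, n_samples, depth, current_leaves, n_left, n_right):
    """Return True if a node may be split into children of the given sizes."""
    if n_samples < limits.min_samples_split:
        return False                      # too few samples: stays a leaf
    if depth >= limits.max_depth:
        return False                      # children would exceed the depth limit
    if current_leaves + 1 > limits.max_leaf_nodes:
        return False                      # a split replaces one leaf with two
    if min(n_left, n_right) < limits.min_samples_leaf:
        return False                      # a child would be too small
    return True

limits = TreeLimits(min_samples_split=10, min_samples_leaf=5,
                    max_depth=3, max_leaf_nodes=4)
root_ok = may_split(limits, n_samples=40, depth=1, current_leaves=1,
                    n_left=25, n_right=15)
small_child = may_split(limits, n_samples=12, depth=2, current_leaves=2,
                        n_left=9, n_right=3)
too_deep = may_split(limits, n_samples=40, depth=3, current_leaves=2,
                     n_left=20, n_right=20)
```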
Weights |
Specify a weight for each response sample. If no weights are specified, every sample has a weight of 1.
|
Predict |
Use this tab to specify the data for prediction. The continuous and categorical predictors should have the same structure as the predictors used for training.
|
Output |
Specify where to output the report table and result data.
|