17.7.3.4 The K-Means Cluster Analysis Dialog Box


Variables

Variables Select data for the K-Means Cluster Analysis. Data in each column corresponds to a variable and each row to an observation.
Observation Labels Choose a column for labeling each observation (optional).

Options

Specify the settings for the K-Means Cluster Analysis.

Standardize Variables
  • None
Variables are not standardized.
  • Z scores (standardize to N(0, 1))
Variables are standardized with zero mean and unit standard deviation.
  • Normalize to (0,1)
Variables are rescaled to the range from 0 to 1.

Note: When you choose to standardize variables, cluster centers and distances are calculated from the standardized data, but Descriptive Statistics and ANOVA are calculated from the original data.
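The two standardization options can be illustrated with a minimal Python sketch. It is not the dialog's own implementation: the function names are hypothetical, the z-score version assumes the sample standard deviation (n - 1 denominator), and the (0,1) version assumes ordinary min-max rescaling.

```python
from statistics import mean, stdev

def z_scores(column):
    """Center a variable to mean 0 and scale it to unit standard deviation.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    A constant column would cause a division by zero here.
    """
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

def normalize_01(column):
    """Rescale a variable linearly so its minimum maps to 0 and its maximum to 1.

    Assumption: plain min-max rescaling; a constant column is not handled.
    """
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]
z = z_scores(heights)
unit = normalize_01(heights)
print(z)
print(unit)
```

Standardizing matters mostly when the variables are measured in different units, because otherwise the variable with the largest spread dominates the distance calculation.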

Number of Clusters Specify the number of clusters. This is enabled only when Specify Initial Cluster Center is not selected. The value must be greater than 0 and no greater than the number of effective observations.
Specify Initial Cluster Center Choose whether to specify initial cluster centers or use the default initial values. When Specify Initial Cluster Center is selected, Initial Cluster Centers becomes available so that you can choose data from a sheet as the initial cluster centers.

To learn how the default initial cluster centers are chosen, see the algorithm for initial cluster centers from observations.

Initial Cluster Centers Specify initial cluster centers from data in a sheet. This is available only when Specify Initial Cluster Center is selected. The number of clusters will be the number of effective rows selected in Initial Cluster Centers. The data selected in Initial Cluster Centers must contain the same number of variables as the data selected in Variables.
Maximum Number of Iterations Specify the maximum number of iterations allowed in the analysis. The default value is 10.
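To show how the number of clusters, the initial cluster centers, and the iteration limit interact, here is a minimal Python sketch of a textbook (Lloyd-style) k-means loop. It is an illustration under stated assumptions, not the algorithm used by the dialog: the function name `kmeans` is hypothetical, distances are assumed to be Euclidean, and the initial centers are always passed in explicitly, so the number of clusters equals the number of center rows.

```python
import math

def kmeans(data, initial_centers, max_iter=10):
    """Lloyd-style k-means starting from the given centers.

    data and initial_centers are lists of equal-length numeric rows.
    Returns (centers, membership), where membership[i] is the 0-based
    index of the center closest to data[i].
    """
    centers = [list(c) for c in initial_centers]
    membership = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest center.
        new_membership = [
            min(range(len(centers)), key=lambda k: math.dist(row, centers[k]))
            for row in data
        ]
        if new_membership == membership:
            break  # converged before reaching max_iter
        membership = new_membership
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [row for row, m in zip(data, membership) if m == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, membership

observations = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.0],
                [8.0, 8.0], [9.0, 9.0], [8.0, 9.5]]
start = [[0.0, 0.0], [10.0, 10.0]]  # two rows, so two clusters
centers, membership = kmeans(observations, start, max_iter=10)
print(centers)
print(membership)
```

If the loop stops because it reaches `max_iter` rather than because the memberships stopped changing, the result may not have converged, which is the usual reason to raise the iteration limit.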

Quantities

Specify the quantities to calculate for the K-Means Cluster Analysis.

Initial Cluster Centers Specify whether to show initial cluster centers in the report.
ANOVA Specify whether to perform ANOVA on the clustering result.
Cluster Membership Specify whether to output cluster membership in a sheet.
Distance from Clusters Specify whether to calculate the distance between each observation and the center of its allocated cluster.
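As a rough illustration of two of these quantities, the Python sketch below computes the distance from each observation to the center of its allocated cluster, and a one-way ANOVA F statistic for a single variable with the clusters as groups. The function names are hypothetical, the distance is assumed to be Euclidean, and the F statistic follows the standard textbook formula; the report produced by the dialog may contain additional quantities or use different conventions.

```python
import math

def distances_to_own_center(data, centers, membership):
    """Euclidean distance from each observation to its allocated cluster center."""
    return [math.dist(row, centers[m]) for row, m in zip(data, membership)]

def anova_f(values, membership):
    """One-way ANOVA F statistic for one variable, with clusters as groups."""
    groups = {}
    for v, m in zip(values, membership):
        groups.setdefault(m, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-cluster sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-cluster sum of squares: values around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

points = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
point_centers = [[0.0, 0.0], [10.0, 10.0]]
point_membership = [0, 0, 1]
dists = distances_to_own_center(points, point_centers, point_membership)
print(dists)

variable = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
variable_membership = [0, 0, 0, 1, 1, 1]
f_stat = anova_f(variable, variable_membership)
print(f_stat)
```

Because the clusters are formed to maximize the separation between groups, an F statistic computed this way is descriptive; it indicates which variables contribute most to the separation rather than serving as a formal hypothesis test.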

Plot

Specify which plots to create for the K-Means Cluster Analysis.

Cluster Plot Create the cluster plot with X range = Principal Component 1 (PC 1) and Y range = Principal Component 2 (PC 2).

Additionally, when the box is checked, the quantities PC 1, PC 2, Observation Label (optional) and Membership are output to the K-Means Cluster Plot data sheet. In the resulting cluster plot, the options Show Centroid Point for Subset, Connect to Data Points and Show Ellipse are enabled (see the Centroid (Pro) tab of Plot Details).
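For readers who want to see where PC 1 and PC 2 coordinates come from, the following Python sketch projects observations onto the first two principal components. It is only an illustration: it assumes the components are taken from the covariance matrix of the centered data, finds them by power iteration with deflation (a simple technique chosen to avoid external libraries), and uses hypothetical function names. The sign of each component is arbitrary, so a plot produced elsewhere may appear mirrored.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centered_and_covariance(data):
    """Return the mean-centered rows and their sample covariance matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    return centered, cov

def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenvalue and unit eigenvector.

    The start vector (1, 2, ..., p) is an arbitrary choice; the method can
    fail if it happens to be orthogonal to the dominant eigenvector, and it
    converges slowly when the two largest eigenvalues are nearly equal.
    """
    p = len(matrix)
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return eigenvalue, v

def first_two_pc_scores(data):
    """Return a (PC 1, PC 2) coordinate pair for every observation."""
    centered, cov = centered_and_covariance(data)
    l1, v1 = leading_eigenpair(cov)
    p = len(cov)
    # Deflation: subtract the first component so the second becomes dominant.
    deflated = [[cov[a][b] - l1 * v1[a] * v1[b] for b in range(p)]
                for a in range(p)]
    _, v2 = leading_eigenpair(deflated)
    return [(dot(r, v1), dot(r, v2)) for r in centered]

sample = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
scores = first_two_pc_scores(sample)
print(scores)
```

In this small example the data already vary most along the first axis, so the PC 1 and PC 2 coordinates coincide with the original coordinates up to rounding.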

Additional Group Graph Specify whether to show the group graph where observations are grouped by the cluster membership. When selected, the Select Variables for Plot branch will be shown.
Select Variables for Plot Select variables as x and y for the group graph.
  • X Range
Select the variable from the sheet as x axis for the group graph.
  • Y Range
Select the variable from the sheet as y axis for the group graph.

Note that variables in the group graph can be different from those for the K-Means Cluster Analysis.
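The idea behind the group graph can be sketched in a few lines of Python: two plotting variables, which need not be among the clustered variables, are split into one series per cluster. The function name and the example values are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def group_xy_by_cluster(x, y, membership):
    """Split (x, y) pairs into one list per cluster label."""
    groups = defaultdict(list)
    for xi, yi, m in zip(x, y, membership):
        groups[m].append((xi, yi))
    return dict(groups)

# x and y are any two columns from the sheet; membership comes from the analysis.
x_values = [1.0, 2.0, 3.0]
y_values = [4.0, 5.0, 6.0]
cluster_labels = [1, 2, 1]
series = group_xy_by_cluster(x_values, y_values, cluster_labels)
print(series)
```

Each entry of the resulting dictionary corresponds to one group in the graph, which can then be drawn with its own color or symbol.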

Output Settings

Specify the destination of output results for the K-Means Cluster Analysis.

K-Means Report Specify the sheet for the K-Means Cluster Analysis report. The default value is a new sheet in the input data workbook.
Cluster Membership Specify the sheet for the cluster membership and distance from cluster. The default value is a new sheet in the input data workbook. Note that it will be disabled if neither Cluster Membership nor Distance from Cluster is selected in the Quantities group.

Recalculate

Specify the way to recalculate and update the result if there is any change in the input data or settings.

None The output is not connected to the source data, so changes to the source data or to the settings do not update the result.
Auto The result updates automatically when the source data changes. You can also change settings to recalculate the result.
Manual The result will not automatically update when the source data changes. Manually activate the update by clicking the Recalculate button in the Standard toolbar. You can also change settings to recalculate the result.