2.13.3.2 kmeans(Pro)


Menu Information

Statistics: Multivariate Analysis: K-Means Cluster Analysis

Brief Information

Perform K-Means clustering.

Additional Information

This feature is for OriginPro only.

Minimum Origin Version Required: 8.6

Command Line Usage

1. kmeans ir:=1:end num:=3;

2. kmeans ir:=1:end num:=3 plot:=1 iy:=(1,3);

3. kmeans ir:=1:4 specify:=1 iinitial:=[book2]1!1:4;

X-Function Execution Options

Please refer to the X-Function Execution Options page for additional option switches when accessing the X-Function from script.

Variables

Display Name | Variable Name | I/O and Type | Default Value | Description
Variables ir

Input

Range

<active>
Observations to cluster. Beginning with Origin 2020b, a shortened syntax of the form [Book]Sheet!(N1:N2) is available, where N1 is the index of the first column and N2 is the index of the last column in a contiguous range of columns. More complex strings for non-contiguous data, of the form [Book]Sheet!([Book]Sheet!N1:N2,[Book]Sheet!N3:N4), are also possible.
Observation Labels labelr

Input

Range

<optional>
Select labels for observations. For the range syntax, see the description of the Variables (ir) parameter.
Standardize Variables std

Input

int

0
Specify the method to standardize variables.

Option list:

  • none:None
    Variables are not standardized.
  • snd:Z scores (standardize to N(0, 1))
    Variables are standardized to Z scores, so that each variable has mean 0 and standard deviation 1.
  • range:Normalize to (0, 1)
    Variables are rescaled to the range from 0 to 1.
Number of Clusters num

Input

int

2
Number of clusters into which the observations are classified. This option is not available when Specify Initial Cluster Centers (specify) is set to 1.
Specify Initial Cluster Centers specify

Input

int

0
Specify whether to supply initial cluster centers (1) or to use Number of Clusters (0).
Initial Cluster Centers iinitial

Input

Range

Range containing the user-specified initial cluster centers.
Maximum Number of Iterations iter

Input

int

10
Specify the maximum number of iterations allowed in the analysis.
Initial Cluster Centers oinitial

Input

int

1
Specify whether (1) or not (0) to report initial cluster centers.
ANOVA anova

Input

int

0
Specify whether (1) or not (0) to report ANOVA.
Cluster Membership member

Input

int

1
Specify whether (1) or not (0) to output cluster membership.
Distance from Cluster distance

Input

int

0
Specify whether (1) or not (0) to calculate the distance between each observation and its corresponding cluster center.
Cluster Plot clusterPlot

Input

int

1
Specify whether (1) or not (0) to create a cluster plot.
Additional Group Graph plot

Input

int

0
Specify whether (1) or not (0) to create an additional group graph.
Select Variables for Plot iy

Input

Range

Range containing the data to be plotted by group in the additional group graph. This variable is available only when plot is 1.
  • X Range
    Select the range as x axis for the group graph.
  • Y Range
    Select the range as y axis for the group graph.
K-Means Report rt

Output

ReportTree

[<input>]<new>
Specify the location of the output report tree.
Cluster Membership rd

Output

ReportData

<new>
Specify the location for the cluster membership and distance from cluster.
Plot Data rdplot

Output

ReportData

<new>
Specify the sheet for plot data. This variable is hidden in the dialog.
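
The variables above can be combined in a single call. The following is a minimal sketch, not an authoritative reference: the workbook and sheet names (Book1, Sheet1) are hypothetical, and the integer code used for std is an assumption based on the order of the option list (none, snd, range) and the default value of 0.

```labtalk
// Hypothetical workbook and sheet names; substitute your own.
// Shortened range syntax (Origin 2020b and later): columns 1 to 4 of Sheet1.
kmeans ir:=[Book1]Sheet1!(1:4) num:=3;

// Assumption: std takes the 0-based position in the option list
// (0 = none, 1 = snd, 2 = range). Check the dialog before relying on this.
kmeans ir:=1:4 num:=3 std:=1;

// Report ANOVA and the distance from each observation to its cluster center,
// and allow up to 20 iterations instead of the default 10.
kmeans ir:=1:4 num:=3 iter:=20 anova:=1 distance:=1;
```

Each line is an independent call on the active worksheet, so in practice only the variant that matches the analysis would be run.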

Description

This function performs K-Means cluster analysis on range data. For more information, see Cluster Analysis.

Examples

  1. Import the data file \Samples\Statistics\Fisher's Iris Data.dat.
  2. Run the following script:
kmeans ir:=1:4 num:=3 -r 2;
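
The two steps above can also be carried out entirely from script. This is a sketch under stated assumptions: it assumes the sample file sits under the Origin program folder (system.path.program$), and the variable name fn$ is chosen here for illustration.

```labtalk
// Assumption: the sample file is installed under the Origin program folder.
newbook;
string fn$ = system.path.program$ + "Samples\Statistics\Fisher's Iris Data.dat";
impasc fname:=fn$;

// Cluster the four measurement columns into three groups.
// The -r 2 switch is understood to set the recalculate mode to Manual;
// see the X-Function Execution Options page to confirm.
kmeans ir:=1:4 num:=3 -r 2;
```

Running the import and the analysis together keeps the example reproducible, since ir:=1:4 refers to the columns of whichever worksheet is active when kmeans is called.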

Algorithm

See the algorithm of K-Means Cluster Analysis.

References

See the references for Cluster Analysis.

Related X-Functions

pca, hcluster, discrim