2.13.3.3 hcluster(Pro)

Menu Information

Statistics: Multivariate Analysis: Hierarchical Cluster Analysis

Brief Information

Perform hierarchical cluster analysis

Additional Information

This feature is for OriginPro only.

Minimum Origin Version Required: 8.6

Command Line Usage

  1. hcluster irng:=2:5 label:=1 number:=2;
  2. hcluster irng:=4:15 obj:=1 number:=3;

X-Function Execution Options

Please refer to the X-Function Execution Options page for additional option switches when accessing the X-Function from script.

Variables

Display Name | Variable Name | I/O and Type | Default Value | Description
Variables irng

Input

Range

<active>
Select the data range for the hierarchical cluster analysis. Beginning with Origin 2020b, a shortened syntax of the form [Book]Sheet!(N1:N2) is available, where N1 is the beginning column index and N2 is the ending column index of a contiguous range of columns. More complex strings for non-contiguous data, of the form [Book]Sheet!([Book]Sheet!N1:N2,[Book]Sheet!N3:N4), are also possible.
Observation Labels label

Input

Range

<optional>
Select labels for observations. If labels are chosen, they are shown as tick labels on the X axis of the dendrogram. This option is enabled only when obj is Observations.
Cluster obj

Input

int

0
Specify the type of objects to cluster. Values start from 0.

Option list:

  • Observations
    Cluster observations.
  • Variables
    Cluster variables.
Cluster Method link

Input

int

2
Select the linkage method to calculate the distance between a cluster and a new cluster. Values start from 0, but string values (such as near) are recommended for clarity.

Option list:

  • near:Nearest neighbor
    The distance from a cluster to a newly merged cluster is the minimum of its distances to the two clusters that were merged. Also called single linkage.
  • furth:Furthest neighbor
    The distance from a cluster to a newly merged cluster is the maximum of its distances to the two clusters that were merged. Also called complete linkage.
  • group:Group average
    The distance from a cluster to a newly merged cluster is the mean of its distances to the two clusters that were merged.
  • centroid:Centroid
    Clusters are produced that maximize the distance between the centers of clusters.
  • median:Median
    The median distance between an item in one cluster and an item in the other cluster.
  • ward:Ward
    Clusters are produced that minimize the within-cluster variance.

To learn more about linkage methods, see the algorithm of linkage methods.
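The update rules for the first three linkage methods can be illustrated with a short Python sketch. This is not Origin's implementation: the function name `merged_distance` and the cluster-size arguments `n_i` and `n_j` are hypothetical, and the group average is assumed here to be weighted by cluster size (it reduces to the plain mean when both merged clusters have the same size). The centroid, median, and Ward methods are omitted.

```python
def merged_distance(d_ki, d_kj, method, n_i=1, n_j=1):
    """Distance from cluster k to the cluster formed by merging i and j.

    d_ki, d_kj: distances from k to i and from k to j before the merge.
    n_i, n_j:   sizes of clusters i and j (used by "group" only).
    """
    if method == "near":    # nearest neighbor (single linkage)
        return min(d_ki, d_kj)
    if method == "furth":   # furthest neighbor (complete linkage)
        return max(d_ki, d_kj)
    if method == "group":   # group average; size weighting is an assumption
        return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
    raise ValueError(f"unsupported method: {method}")


print(merged_distance(2.0, 5.0, "near"))   # 2.0
print(merged_distance(2.0, 5.0, "furth"))  # 5.0
print(merged_distance(2.0, 5.0, "group"))  # 3.5
```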

Distance Type dist1

Input

int

0
Select a distance type in the hierarchical cluster analysis when obj is Observations. Values start from 0, but string values (such as euc) are recommended for clarity.

Option list:

  • euc:Euclidean
    The square root of the sum of the squared differences between two observations.
  • squ:Squared Euclidean
    The sum of the squared differences between two observations.
  • city:City block
    The sum of the absolute differences between two observations. Also known as Manhattan distance.
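As an illustration only, the three distance types can be written in a few lines of Python. The function names and the sample observations `p` and `q` are made up for this sketch and are not part of the X-Function.

```python
import math


def euclidean(a, b):
    """euc: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """squ: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def city_block(a, b):
    """city: sum of absolute differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Two hypothetical observations measured on three variables.
p = [1.0, 2.0, 3.0]
q = [4.0, 6.0, 3.0]

print(euclidean(p, q))          # 5.0
print(squared_euclidean(p, q))  # 25.0
print(city_block(p, q))         # 7.0
```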
Distance Type dist2

Input

int

0
Select a distance type in the hierarchical cluster analysis when obj is Variables. Values start from 0, but string values (such as corr) are recommended for clarity.

Option list:

  • corr:Correlation
    The difference between 1 and the correlation of two variables.
  • abs:Absolute correlation
    The difference between 1 and the absolute correlation of two variables.
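The difference between the two options is easiest to see with a pair of negatively correlated variables. The Python sketch below assumes the Pearson correlation coefficient, which this page does not state explicitly. The function names and sample data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_distance(x, y):
    """corr: 1 minus the correlation."""
    return 1 - pearson(x, y)


def abs_corr_distance(x, y):
    """abs: 1 minus the absolute correlation."""
    return 1 - abs(pearson(x, y))


# Two perfectly anti-correlated variables (hypothetical data).
x = [1.0, 2.0, 3.0, 4.0]
y = [8.0, 6.0, 4.0, 2.0]

print(corr_distance(x, y))      # 2.0: treated as far apart
print(abs_corr_distance(x, y))  # 0.0: treated as identical
```

With corr, strongly negatively correlated variables end up far apart. With abs, they are grouped together.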
Standardize Variables std

Input

int

0
Specify the method to standardize variables. It is available only when obj is Observations. Values start from 0, but string values (such as snd) are recommended for clarity.

Option list:

  • none:None
    Variables are not standardized.
  • snd:Z scores (standardize to N(0, 1))
    Variables are transformed to have mean 0 and standard deviation 1.
  • range:Normalize to (0, 1)
    Variables are transformed to the range from 0 to 1.
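A minimal Python sketch of the two standardization options, applied to one variable (column) at a time. It assumes the sample standard deviation (n - 1 denominator) for the Z scores; this page does not say which denominator Origin uses. The function names are hypothetical.

```python
import statistics


def z_scores(values):
    """snd: subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return [(v - mean) / sd for v in values]


def normalize_01(values):
    """range: rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


column = [2.0, 4.0, 6.0]  # hypothetical variable
print(z_scores(column))      # [-1.0, 0.0, 1.0]
print(normalize_01(column))  # [0.0, 0.5, 1.0]
```

Both functions divide by zero for a constant column, so such a column would need separate handling.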
Number of Clusters number

Input

int

1
Specify the number of clusters.
Find Clustroid by stat

Input

int

0
Specify the method to find the clustroid: the most/least representative variable/observation.

Option list:

  • sd:Sum of distances
    Find the clustroid using the sum of the distances measured from all other observations/variables in the cluster.
  • md:Maximum distance
    Find the clustroid using the maximum of the distances measured from all other observations/variables in the cluster.
  • ssd:Sum of squares of distances
    Find the clustroid using the sum of the squares of the distances measured from all other observations/variables in the cluster.
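The three statistics can be sketched in Python as follows. The sketch assumes that the most representative member is the one with the smallest score and the least representative the one with the largest. The function name `clustroid_scores` and the distance matrix are hypothetical.

```python
def clustroid_scores(dist, stat="sd"):
    """Score each cluster member from its distances to the other members.

    dist: symmetric distance matrix (list of lists) for one cluster.
    stat: "sd" (sum), "md" (maximum), or "ssd" (sum of squares).
    """
    scores = []
    for i, row in enumerate(dist):
        others = [d for j, d in enumerate(row) if j != i]
        if stat == "sd":
            scores.append(sum(others))
        elif stat == "md":
            scores.append(max(others))
        elif stat == "ssd":
            scores.append(sum(d * d for d in others))
        else:
            raise ValueError(f"unknown stat: {stat}")
    return scores


# Hypothetical distances among three members of one cluster.
dist = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]

scores = clustroid_scores(dist, "sd")
print(scores)                     # [5.0, 3.0, 6.0]
print(scores.index(min(scores)))  # 1: most representative (assumed)
print(scores.index(max(scores)))  # 2: least representative (assumed)
```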
Dissimilarity Matrix dissimilarity

Input

int

0
Specify whether to output the distance matrix. For a large number of objects, the distance matrix will be shown in a sheet instead of the report. 1 = Yes, 0 = No.
Cluster Stages stage

Input

int

1
Specify whether to output the cluster stages. 1 = Yes, 0 = No.
Cluster Center center

Input

int

0
Specify whether to calculate cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
Distance between Cluster Centers distc2c

Input

int

0
Specify whether to calculate the distances between cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
Distance between Observations and Clusters disto2c

Input

int

0
Specify whether to calculate the distance between each observation and cluster centers. It is available only when obj is Observations. 1 = Yes, 0 = No.
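To make the three center-related quantities concrete, the Python sketch below assumes that a cluster center is the per-variable mean of the observations in the cluster and that distances to and between centers are Euclidean. Neither assumption is confirmed on this page. All names and data are hypothetical.

```python
import math


def cluster_center(observations):
    """Per-variable mean of the observations in one cluster (assumed)."""
    n = len(observations)
    return [sum(column) / n for column in zip(*observations)]


def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical clusters of observations on two variables.
cluster_a = [[1.0, 2.0], [3.0, 4.0]]
cluster_b = [[7.0, 9.0], [9.0, 13.0]]

center_a = cluster_center(cluster_a)
center_b = cluster_center(cluster_b)
print(center_a)  # [2.0, 3.0]
print(center_b)  # [8.0, 11.0]

# Distance between cluster centers (compare distc2c).
print(euclidean(center_a, center_b))  # 10.0

# Distance from each observation in cluster_a to both centers (compare disto2c).
for obs in cluster_a:
    print(obs, euclidean(obs, center_a), euclidean(obs, center_b))
```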
Dendrogram dendrogram

Input

int

1
Specify whether to show the dendrogram. 1 = Yes, 0 = No.
Show Dendrogram ngraph

Input

int

0
Specify whether to show the dendrogram in a single graph or in separate graphs for clusters. It is enabled only when dendrogram is 1. Values start from 0.

Option list:

  • Show the dendrogram in a single graph. Different clusters are shown in different colors.
  • Show the dendrogram in separate graphs for clusters. Each graph represents a cluster.
Orientation orient

Input

int

0
Specify the orientation of the dendrogram. It is enabled only when dendrogram is 1.

Option list:

  • 0: Vertical
    Plot the dendrogram vertically.
  • 1: Horizontal
    Plot the dendrogram horizontally.
  • 2: Circular
    Plot a circular dendrogram.
Cluster Report rt

Output

ReportTree

<new>
Specify the sheet for the hierarchical cluster analysis report.
Cluster Membership rd

Output

ReportData

<new>
Specify the sheet for cluster membership and distance between observations and clusters.
Distance Matrix rddist

Output

ReportData

<new>
Specify the sheet for the distance matrix when the number of objects to cluster is very large. This variable is hidden in the dialog.
Plot Data rdplot

Output

ReportData

<new>
Specify the sheet for plot data. This variable is hidden in the dialog.
Clustroid Info clustroid

Input

int

1
Specify the method to find the Clustroid Info: the most/least representative variable/observation.

Description

This function performs hierarchical cluster analysis on the selected data range. For more information, see Cluster Analysis.

Examples

  1. Import the data file \Samples\Graphing\US Mean Temperature.dat.
  2. Run the script:
     hcluster irng:=4[1]:15[100] number:=5 rd:=[<input>]<input> -r 2;

Algorithm

See the algorithm of Hierarchical Cluster Analysis.

References

See the reference of Cluster Analysis.

Related X-Functions

pca, kmeans, discrim