# 5.6.2 Cluster Analysis

## Summary

We will perform a cluster analysis of the mean temperatures of US cities over a 3-year period.

The starting point is a hierarchical cluster analysis on a subset of the data (rows 1 to 100), used to find the best clustering method. K-means cluster analysis, a fast non-hierarchical clustering method, is then performed on the entire original dataset.

Minimum Origin Version Required: OriginPro 8.6 SR0

## Hierarchical Cluster Analysis

1. Start with a new project or a new workbook. Import the data file \Samples\Graphing\US Mean Temperature.dat.
2. Highlight Column D through Column O.
3. Select Statistics: Multivariate Analysis: Hierarchical Cluster Analysis.
4. Select the Input tab, click the triangle button next to Variables, and then click Select Columns... in the context menu.
5. In the lower panel of the Column Browser dialog, click the ... button. Set the data range from 1 to 100. Click OK.
6. In the dialog, go to the Settings tab and make sure Cluster is set to Observations and Number of Clusters is 1. Select Furthest Neighbor for Cluster Method, and then click OK.
7. Go to the Cluster 1 sheet. Based on the resulting dendrogram, we choose to cluster the data into 5 groups.
8. Click the lock icon in the dendrogram or the result tree, and then click Change Parameters in the context menu.
9. Set Number of Clusters to 5 in the Settings tab and then select the Cluster Center check box in the Quantities tab. Click OK.
10. In the resulting dendrogram, we can clearly see how the observations are clustered. (Note: you can double-click the dendrogram to open and customize it.)
11. Due to the large number of observations, tick labels overlap in this dendrogram. Use the Scale In tool to select an area to magnify.
 Note: Beginning with Origin 2019b, the Plot tab has a radio button for displaying Similarity instead of Distance on the Y axis of the dendrogram (Distance is the default).
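The Furthest Neighbor method selected in step 6 is usually called complete linkage: at each stage, the two clusters whose largest pairwise row distance is smallest are merged, and stopping at a chosen number of clusters corresponds to cutting the dendrogram at that level. The following is a minimal pure-Python sketch for illustration only, not Origin's implementation. It assumes Euclidean distance between rows and assumes the cluster center is the per-variable mean of each cluster's rows. The function names and toy values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two rows (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with furthest-neighbor (complete) linkage.

    Starts with every row in its own cluster and repeatedly merges the two
    clusters whose largest pairwise row distance is smallest, stopping when
    n_clusters remain. Returns clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a, b in combinations(range(len(clusters)), 2):
            # Furthest neighbor: the distance between two clusters is the
            # maximum distance between any row of one and any row of the other.
            d = max(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because combinations() guarantees a < b
    return clusters


def cluster_centers(points, clusters):
    """Per-variable mean of the rows in each cluster (assumed definition)."""
    return [[sum(col) / len(c) for col in zip(*(points[i] for i in c))]
            for c in clusters]


if __name__ == "__main__":
    # Toy rows with two variables each; not values from the sample file.
    rows = [[1.0, 2.0], [1.2, 1.8], [8.0, 8.5], [8.3, 8.1], [4.0, 15.0]]
    groups = complete_linkage(rows, 3)
    print(groups)  # [[0, 1], [2, 3], [4]]
    print(cluster_centers(rows, groups))
```

Stopping the merges at `n_clusters` plays the role of setting Number of Clusters in step 9, and the centers computed here correspond to the kind of table the Cluster Center check box requests.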

## Analyzing Original Data with K-Means Cluster

1. Right-click on Cluster Center and select Create Copy as New Sheet in the context menu. We are going to use the newly created Cluster Center as the Initial Cluster Centers in our k-means cluster analysis.
2. Go back to the worksheet with the source data (US Mean Temperature), and highlight col(D) through col(O). Select Statistics: Multivariate Analysis: K-Means Cluster Analysis.
3. Select the Specify Initial Cluster Centers check box in the Options tab. Click the interactive button next to Initial Cluster Centers. The dialog will "roll up".
4. Go to the Cluster Center sheet and highlight Col(D) through Col(O). Click the button on the rolled-up dialog to restore the dialog.
5. In the Plot tab, select Group Graph. Click the interactive button next to X Range. The dialog will "roll up". Go back to the source worksheet US Mean Temperature and highlight Col(B): Longitude. Click the button on the rolled-up dialog to restore the dialog.
6. Click the triangle button next to Y Range, and then select C(Y), Latitude. Click OK.
7. Activate the worksheet K-Means Plot Data1. Observe that the data has been clustered into 5 groups corresponding to the latitudes of the cities.
 You can also choose the output destination of the Cluster Membership column (for example, next to the input data) if you need it for further operations.
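Seeding k-means with the centers from the hierarchical step (step 3) means the iterations start from an informed partition instead of arbitrary starting points. The sketch below shows a standard Lloyd-style k-means with user-supplied initial centers. It is an assumed, illustrative algorithm in pure Python, not Origin's internal implementation, and the function names and toy values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two rows (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd-style k-means seeded with user-supplied initial centers.

    Returns (labels, centers), where labels[i] is the 0-based cluster index
    of points[i].
    """
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its member rows.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    # Toy rows and seeds; the seeds stand in for hierarchical cluster centers.
    points = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2],
              [8.0, 8.5], [8.3, 8.1], [7.9, 8.0]]
    seeds = [[1.1, 1.9], [8.15, 8.3]]
    labels, centers = kmeans(points, seeds)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

The returned `labels` list plays the same role as a cluster membership column: one group index per input row. Because the seeds fix the starting partition, the resulting group numbering follows the order of the supplied initial centers.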