ML-Hierarchical Clustering

Hierarchical clustering produces a nested sequence of clusters, organized as a tree called a dendrogram.

Types of Hierarchical Clustering

Agglomerative (bottom-up) clustering: builds the dendrogram (tree) from the bottom level. It starts with every data point as its own cluster, then

  • merges the most similar (or nearest) pair of clusters
  • stops when all the data points are merged into a single cluster (i.e., the root cluster).
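
The agglomerative loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the notes do not say how the distance between two clusters is measured, so the sketch assumes single linkage (distance between the closest pair of points) with Euclidean distance, and the names `single_linkage`, `agglomerative`, and the sample `points` are made up for the example.

```python
import math
from itertools import combinations


def single_linkage(a, b, points):
    """Cluster distance = distance between the closest pair of points (assumed rule)."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points):
    """Return the merge history as a list of (cluster_a, cluster_b, distance)."""
    # Bottom level: every data point is its own singleton cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the nearest pair of clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        distance = single_linkage(a, b, points)
        # Merge the pair into one cluster.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, distance))
    # Only the root cluster is left at this point.
    return merges


# Hypothetical sample data: five points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = agglomerative(points)
for a, b, distance in merges:
    print(a, "+", b, "at distance", round(distance, 2))
```

Each entry in `merges` corresponds to one internal node of the dendrogram, so n points always yield n - 1 merges. The sketch recomputes every pairwise cluster distance on each pass, which keeps it short but makes it slow on anything beyond toy data.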

Divisive (top-down) clustering: starts with all data points in one cluster, the root, then

  • splits the root into a set of child clusters, each of which is recursively divided further
  • stops when only singleton clusters remain, i.e., each cluster contains a single data point.
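
The recursive splitting can be sketched in Python as well. The notes do not specify how a cluster is split, so the rule used here is purely an assumption for illustration: take the two points that are farthest apart as seeds and assign every other point to the nearer seed. The names `split`, `divisive`, and the sample `points` are likewise made up for the example.

```python
import math
from itertools import combinations


def split(cluster, points):
    """Split a cluster in two around its two farthest-apart points (assumed rule)."""
    s1, s2 = max(
        combinations(cluster, 2),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    # s2 always goes right and s1 always goes left, so neither side is empty.
    left = [
        i for i in cluster
        if i != s2
        and math.dist(points[i], points[s1]) <= math.dist(points[i], points[s2])
    ]
    right = [i for i in cluster if i not in left]
    return left, right


def divisive(cluster, points):
    """Return the dendrogram as nested tuples; leaves are point indices."""
    if len(cluster) == 1:
        # Singleton cluster: stop dividing.
        return cluster[0]
    left, right = split(cluster, points)
    return (divisive(left, points), divisive(right, points))


# Hypothetical sample data: five points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
tree = divisive(list(range(len(points))), points)
print(tree)
```

This sketch always produces two children per split; the description above allows a set of child clusters, so a split into more than two parts would be equally valid.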

Pros and Cons

Pros:

  • Simple to implement and easy to understand

Cons:

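  • Choosing the merge or split points is not easy
  • A merge or split cannot be undone once it has been performed
  • Not well suited to large data sets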
  • Low execution efficiency: the cost grows with both the number of iterations and the number of sample points