date: 2024-12-24
title: ML-K means
status: DONE
author:
- AllenYGY
tags:
- NOTE
- K-Means
- ML
- Clustering
publish: True
ML-K means
Simple: easy to understand and to implement
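Efficient: the time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
Since both k and t are usually small compared with n, k-means is considered a linear algorithm.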
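A minimal pure-Python sketch of the standard (Lloyd's) k-means loop makes the O(tkn) cost visible: the outer loop runs at most t times, and each pass compares all n points with all k centroids. The function name `kmeans`, its parameters, and the way empty clusters are handled are illustrative assumptions, not something this note specifies.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means on a list of coordinate tuples; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):                      # at most t iterations
        # Assignment step: n points x k centroids distance computations.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:                   # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                            # assumption: keep old centroid if empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

For example, `kmeans([(0.0,), (1.0,), (10.0,), (11.0,)], [(0.0,), (10.0,)])` groups the first two and the last two points together.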
K-means is the most popular clustering algorithm.
Note that k-means terminates at a local optimum when SSE (sum of squared errors) is used as the objective. The global optimum is hard to find because of the complexity of the problem.
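The local-optimum behaviour can be seen on a small example. The sketch below computes SSE as the sum of squared Euclidean distances from each point to its cluster centroid, and checks two different partitions of the same data. The four-point data set and the helper names `sse` and `is_fixed_point` are made up for illustration.

```python
import math

def sse(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def is_fixed_point(points, centroids, labels):
    """True if every point is already assigned to its nearest centroid."""
    return all(
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])) == lab
        for p, lab in zip(points, labels)
    )

points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Partition A: left pair vs. right pair (centroids are the cluster means).
cent_a, lab_a = [(0, 0.5), (4, 0.5)], [0, 0, 1, 1]
# Partition B: bottom pair vs. top pair (centroids are the cluster means).
cent_b, lab_b = [(2, 0), (2, 1)], [0, 1, 0, 1]

# k-means would stop at either partition, but their SSE values differ.
stable_a = is_fixed_point(points, cent_a, lab_a)
stable_b = is_fixed_point(points, cent_b, lab_b)
sse_a = sse(points, cent_a, lab_a)
sse_b = sse(points, cent_b, lab_b)
print(stable_a, stable_b, sse_a, sse_b)
```

Both partitions are stable, yet partition B has a much larger SSE, so the result depends on which initial centroids were chosen.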
The algorithm is only applicable if the mean is defined.
K-means is sensitive to outliers. One way to handle them is to remove, during the clustering process, data points that are much farther from the centroids than the other data points.
Another way is random sampling: since sampling selects only a small subset of the data points, the chance of selecting an outlier is very small.
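Both ideas can be sketched in a few lines. The cutoff rule (mean distance plus `n_std` standard deviations), the default values, and the function names `drop_far_points` and `sample_points` are assumptions chosen for illustration; the note does not prescribe a particular threshold or sampling rate.

```python
import math
import random
import statistics

def drop_far_points(points, centroids, labels, n_std=2.0):
    """Keep points whose distance to their centroid is at most mean + n_std * stdev."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = statistics.mean(dists) + n_std * statistics.pstdev(dists)
    return [p for p, d in zip(points, dists) if d <= cutoff]

def sample_points(points, fraction=0.1, seed=0):
    """Draw a random subset of the data on which to run k-means."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    size = max(1, int(len(points) * fraction))
    return rng.sample(points, size)
```

A fixed cutoff like this only works when the outliers are few relative to the cluster size; with very small clusters a single outlier can inflate the standard deviation enough to escape the filter.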
Cluster center initialization
Use the centroid of each cluster to represent the cluster.
Use a classification model
Use frequent values to represent the cluster
Clusters of arbitrary shapes
Given a set of 5 samples: (0, 2), (0, 0), (1, 0), (5, 0), (5, 2).
Use the k-means clustering algorithm to cluster the samples into 2 classes.
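Iteration 1: take the data points (0, 0) and (5, 0) as the initial centroids.
Data Point | Distance to (0, 0) | Distance to (5, 0) | Cluster |
---|---|---|---|
(0, 2) | 2 | $\sqrt{29} \approx 5.39$ | (0,0) |
(0, 0) | 0 | 5 | (0,0) |
(1, 0) | 1 | 4 | (0,0) |
(5, 0) | 5 | 0 | (5,0) |
(5, 2) | $\sqrt{29} \approx 5.39$ | 2 | (5,0) |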
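Iteration 2: recompute each centroid as the mean of its cluster. The cluster labelled (0,0) gets the centroid $(\frac{1}{3}, \frac{2}{3})$ and the cluster labelled (5,0) gets the centroid (5, 1).
Data Point | Distance to $(\frac{1}{3}, \frac{2}{3})$ | Distance to (5, 1) | Cluster |
---|---|---|---|
(0, 2) | $\frac{\sqrt{17}}{3} \approx 1.37$ | $\sqrt{26} \approx 5.10$ | (0,0) |
(0, 0) | $\frac{\sqrt{5}}{3} \approx 0.75$ | $\sqrt{26} \approx 5.10$ | (0,0) |
(1, 0) | $\frac{2\sqrt{2}}{3} \approx 0.94$ | $\sqrt{17} \approx 4.12$ | (0,0) |
(5, 0) | $\frac{10\sqrt{2}}{3} \approx 4.71$ | 1 | (5,0) |
(5, 2) | $\frac{2\sqrt{53}}{3} \approx 4.85$ | 1 | (5,0) |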
The cluster assignments do not change in the second iteration, so the algorithm has converged. The final cluster assignments are:
Class-1: (0, 2), (0, 0), (1, 0)
Class-2: (5, 0), (5, 2)
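The worked example can be checked with a short script. This is a sketch that assumes Euclidean distance and the initial centroids (0, 0) and (5, 0) from the first table; the function name `run_example` is hypothetical.

```python
import math

def run_example():
    """Run k-means (k = 2) on the five samples and return the two classes."""
    points = [(0, 2), (0, 0), (1, 0), (5, 0), (5, 2)]
    centroids = [(0, 0), (5, 0)]          # initial centroids from the first table
    labels = None
    while True:
        # Assign every point to its nearest centroid.
        new_labels = [
            min((0, 1), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:          # no change: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == j]
            centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [[p for p, lab in zip(points, labels) if lab == j] for j in (0, 1)]

classes = run_example()
for j, members in enumerate(classes, start=1):
    print(f"Class-{j}:", members)
```

The loop stops on its second pass because the assignments match those of the first pass, which agrees with the tables.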