Clustering (Part 2c of 6) - Simple K-means
Let us see how the simple k-means clustering algorithm works..
The input to the k-means clustering algorithm is k, the number of clusters to be identified. Critics of k-means often feel that the need to specify k is a disadvantage. Other disadvantages includes how the initial k cluster centers are selected. Many DBMS researchers have derived variants of k-means to address some of these problems.
K-means work as follows.
Given N data objects,
Step1 :
Randomly select k objects and make them the cluster centers.
(i.e. These initial k clusters each have 1 object, which is also the center of the cluster!)
Step2:
Consider the remaining (N - k) data objects. Assign each of these data objects to one of the k-clusters of which it is nearest to (e.g. based on its Euclidean distance).
Re-compute the cluster center for the cluster assigned.
Continue the process until all the (N-k) data objects are assigned.
Step3:
For all the k clusters, update their new cluster mean (i.e. center).
Re-assign the N to the clusters based on proximity to the cluster center.
Repeat until there are no more changes to the assignment (i.e. the algorithm converges!)
The input to the k-means clustering algorithm is k, the number of clusters to be identified. Critics of k-means often feel that the need to specify k is a disadvantage. Other disadvantages includes how the initial k cluster centers are selected. Many DBMS researchers have derived variants of k-means to address some of these problems.
K-means work as follows.
Given N data objects,
Step1 :
Randomly select k objects and make them the cluster centers.
(i.e. These initial k clusters each have 1 object, which is also the center of the cluster!)
Step2:
Consider the remaining (N - k) data objects. Assign each of these data objects to one of the k-clusters of which it is nearest to (e.g. based on its Euclidean distance).
Re-compute the cluster center for the cluster assigned.
Continue the process until all the (N-k) data objects are assigned.
Step3:
For all the k clusters, update their new cluster mean (i.e. center).
Re-assign the N to the clusters based on proximity to the cluster center.
Repeat until there are no more changes to the assignment (i.e. the algorithm converges!)
0 Comments:
Post a Comment
<< Home