SQL Server Gems: Clustering (Part 2c of 6)

Thursday, January 26, 2006

Clustering (Part 2c of 6) - Simple K-means

Let us see how the simple k-means clustering algorithm works..

The input to the k-means clustering algorithm is k, the number of clusters to be identified. Critics of k-means often feel that the need to specify k is a disadvantage. Other disadvantages includes how the initial k cluster centers are selected. Many DBMS researchers have derived variants of k-means to address some of these problems.

K-means work as follows.

Given N data objects,

Step1 :
Randomly select k objects and make them the cluster centers.
(i.e. These initial k clusters each have 1 object, which is also the center of the cluster!)

Step2:
Consider the remaining (N - k) data objects. Assign each of these data objects to one of the k-clusters of which it is nearest to (e.g. based on its Euclidean distance).

Re-compute the cluster center for the cluster assigned.
Continue the process until all the (N-k) data objects are assigned.

Step3:
For all the k clusters, update their new cluster mean (i.e. center).
Re-assign the N to the clusters based on proximity to the cluster center.
Repeat until there are no more changes to the assignment (i.e. the algorithm converges!)

About

Little gems and nuggets of information on SQL Server and my thoughts on the evolution of the role of DBMS over the years...

Name: SQL Server Gems

Location: Singapore

My research interests include query processing and optimization for conventional database management system (DBMS) and data streams management systems (DSMS). In the past, I have also worked on projects which extend the Predator DBMS (a research prototype for an EADT-based database management system) to support the symmetric hash join and data mining primitives. In addition, I have also added query operators for querying a network of sensors.

View my complete profile

SQL Server Gems

Thursday, January 26, 2006

Clustering (Part 2c of 6) - Simple K-means

0 Comments:

About

About Me

Theory and Practicals

Previous