SQL Server Gems

Thursday, February 16, 2006

Feature Selection

An interesting discussion on Feature Selection in SQL Server 2005 by Peter Kim.

"....

These are parameters for automatic feature selection by the algorithms. Depending on the algorithm, the feature selection method may differ. For Naive Bayes and Clustering, we use an entropy-based interestingness score, which measures how "interesting" an attribute is. For instance, customer phone numbers would be less interesting than gender. The interestingness score is calculated as I(A) = -(m - E(A))^2, where E(A) = -sum_i p_i ln(p_i) is the entropy of attribute A and m is a magic number.

For Decision Trees, we use the same interestingness score for output attribute feature selection. Then we calculate a split score for each input attribute against the selected output attributes. The input feature selection is based on the calculated split score. This effectively determines which input attributes are worth considering and which are not, given the selected output attributes. "
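
To make the quoted entropy-based score concrete, here is a minimal Python sketch under stated assumptions: the probabilities p_i are estimated from value frequencies, and the magic number m is an arbitrary placeholder, since the post doesn't say how SQL Server chooses it.

import math
from collections import Counter

def entropy(values):
    # E(A) = -sum_i p_i * ln(p_i), with p_i estimated from value frequencies
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def interestingness(values, m=1.0):
    # I(A) = -(m - E(A))^2; m = 1.0 is an assumed placeholder here.
    # Scores closer to 0 (entropy near m) count as more interesting.
    return -(m - entropy(values)) ** 2

# A near-unique attribute such as phone numbers has very high entropy,
# while gender has moderate entropy, so gender scores as more interesting.
phones = ["555-%04d" % i for i in range(100)]
gender = ["M", "F"] * 50
print(interestingness(phones))  # about -13.0: entropy far from m
print(interestingness(gender))  # about -0.09: entropy close to m

Note that the score is always non-positive and peaks at 0 when E(A) = m, so m acts as a target entropy: attributes whose entropy is near m rank highest, while both near-constant and near-unique attributes (like phone numbers) rank low.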
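
The post doesn't name the split score used for Decision Trees, so the following sketch substitutes information gain (the reduction in output entropy when splitting on an input attribute) as a stand-in, reusing the entropy helper above; the point is the ranking scheme, not the exact score SQL Server uses.

def information_gain(inputs, outputs):
    # Stand-in split score: output entropy minus the weighted entropy
    # of the output within each group induced by the input attribute.
    n = len(outputs)
    groups = {}
    for x, y in zip(inputs, outputs):
        groups.setdefault(x, []).append(y)
    split_entropy = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(outputs) - split_entropy

def select_inputs(columns, output, top_k=1):
    # Rank candidate input attributes by split score against the
    # selected output attribute and keep the top_k best of them.
    ranked = sorted(columns,
                    key=lambda name: information_gain(columns[name], output),
                    reverse=True)
    return ranked[:top_k]

output = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
columns = {
    "age_band": ["young", "young", "old", "old", "young", "old", "young", "old"],
    "region":   ["n", "s", "n", "s", "n", "s", "n", "s"],
}
print(select_inputs(columns, output))  # ['age_band'] splits the output cleanly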
