Thursday, October 9, 2008

So...What's important??

One step of the Knowledge Discovery process is to perform what is known as Feature Selection: the identification of a subset of features with high predictive value.

Feature selection can potentially increase the accuracy of prediction models. Methods such as Naive Bayes can perform better when presented with a subset of selected features rather than the whole feature set, because they are sensitive to feature redundancy.
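To see why redundancy hurts Naive Bayes, here is a minimal sketch (not WEKA code; the two class means and the observed value are made up for illustration). Because the model treats every feature as independent evidence, a duplicated feature gets counted once per copy, inflating the class log-odds:

```python
import math

def log_gauss(x, mu, sigma):
    """Log-density of a Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Two classes with equal priors and one informative feature (toy numbers):
# the feature tends to be near +1 when the stock goes up, -1 when it goes down.
mu = {"up": 1.0, "down": -1.0}
x = 0.5  # observed feature value

def log_odds(n_copies):
    # Naive Bayes multiplies per-feature likelihoods, so n_copies identical
    # copies of the same feature contribute n_copies times in log space.
    up = n_copies * log_gauss(x, mu["up"], 1.0)
    down = n_copies * log_gauss(x, mu["down"], 1.0)
    return up - down

print(log_odds(1))  # evidence from the feature counted once
print(log_odds(3))  # same feature duplicated twice: the log-odds triple
```

The data carry no extra information in the duplicated case, yet the classifier becomes three times as confident; removing redundant features avoids this double counting.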

Even if feature selection does not improve accuracy much, it is important to know the predictive power of each feature. There are numerous methods for this and, as is usually the case, no single method is universally best. The following is a representation of the Feature Selection methods available in WEKA:




Let us stick to our stock example, to make things clearer. Suppose that I would like to know which features seem to be important for predicting the behavior of a stock. For our example, we will examine how the stock of NBG reacts.

By using a feature selection method we extract the following information:



The feature selection method above shows us how many times each attribute was selected during a 10-fold cross-validation. We can see that some attributes are selected in more folds than others. For example:

realTimeDax
aseStockExchangeIndex
xaaPersonalHouseProducts
xaaTechnology
bankAgrotiki
bankAlpha
bankPiraeus
bankEuro


are present in all 10 folds of our cross-validation, hence the 10 (100%) entry. The xaaFinancialServices index has been selected fewer times (8 out of 10), hence the 8 (80%) entry. Other features do not appear in any of the cross-validation folds.
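The fold-counting idea can be sketched in a few lines of Python. This is an illustration, not WEKA's implementation: the data are synthetic, the feature name realTimeDax is borrowed from the list above ("noise" is made up), and selection uses a simple correlation threshold in place of WEKA's subset evaluators:

```python
import random

random.seed(0)

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Synthetic data: "realTimeDax" tracks the target closely, "noise" does not.
n = 200
target = [random.gauss(0, 1) for _ in range(n)]
features = {
    "realTimeDax": [t + random.gauss(0, 0.3) for t in target],
    "noise": [random.gauss(0, 1) for _ in range(n)],
}

# For each of 10 folds, run selection on the training part only and count
# how often each feature makes the cut.
folds, fold_size = 10, n // 10
counts = {name: 0 for name in features}
for k in range(folds):
    train = [i for i in range(n) if not (k * fold_size <= i < (k + 1) * fold_size)]
    for name, values in features.items():
        r = pearson([values[i] for i in train], [target[i] for i in train])
        if abs(r) > 0.3:  # arbitrary selection threshold for the sketch
            counts[name] += 1

for name, c in counts.items():
    print(f"{name}: {c} ({100 * c // folds}%)")
```

A feature selected in all 10 folds (like the indices and bank stocks listed above) is robustly predictive; one selected in few or no folds is likely noise or an artifact of a particular data split.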

Of course, feature selection does not stop here, and there are many ways to enhance the process; Data Mining is both an art and a science. For our purpose, however, we were able to identify the attributes that seem to be important for predicting the NBG stock. We immediately see, for example, that the DAX index and the Athens Stock Exchange index are two important features, along with the stocks of four specific banks. Other feature selection methods produce weights that essentially rank the importance of each attribute for class prediction.
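A rough illustration of such ranker-style output, again on synthetic data: here a plain absolute-correlation score stands in for the weights produced by WEKA's ranker evaluators (such as information gain). Only aseStockExchangeIndex and xaaFinancialServices are names from the post; "unrelated" is made up:

```python
import random

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Synthetic features with decreasing amounts of signal about the target.
n = 300
target = [random.gauss(0, 1) for _ in range(n)]
features = {
    "aseStockExchangeIndex": [t + random.gauss(0, 0.2) for t in target],  # strong
    "xaaFinancialServices":  [t + random.gauss(0, 1.5) for t in target],  # weaker
    "unrelated":             [random.gauss(0, 1) for _ in range(n)],      # noise
}

# Weight each attribute by |correlation with the class|, then rank.
weights = {name: abs(pearson(vals, target)) for name, vals in features.items()}
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking)
```

Unlike the subset-selection counts above, a ranker assigns every attribute a weight, so even weak features get an ordering rather than being dropped outright.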

