Wednesday, 12 November 2008

Model Testing

Once a model has been created (such as the decision tree in our example), the analyst must test it. During model testing, the analyst performs specific tests that reveal the actual predictive power of the model.

Many methods can be used for model testing, depending on the problem. For our example, since the available volume of data is sufficiently large, the model training and testing methodology I used was as follows:

1) 50% of the data were used for model training.
2) 25% of the data were used for model validation and fine-tuning.
3) 25% of the data were used for testing the model.

In other words, 75% of the data were used to train the algorithm and to assess the impact that changes to algorithm parameters have on the accuracy of the model. For a decision tree algorithm (and depending on the type of decision tree used), an analyst might try different settings for the splitting criterion and/or the minimum number of cases per branch.
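The split and tuning loop described above can be sketched as follows. This is a minimal illustration using scikit-learn and synthetic data; the original analysis may well have used a different tool, and the specific parameter values tried here are assumptions, not the author's actual settings.

```python
# Sketch of the 50/25/25 split plus decision-tree parameter tuning.
# Data and parameter grids are synthetic/hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                               # synthetic features
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# 50% training, then split the remaining half into 25% validation / 25% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1)

# Try different splitting criteria and minimum cases per branch,
# keeping whichever setting scores best on the validation set.
best = None
for criterion in ("gini", "entropy"):
    for min_leaf in (1, 5, 20):
        tree = DecisionTreeClassifier(criterion=criterion,
                                      min_samples_leaf=min_leaf,
                                      random_state=1).fit(X_train, y_train)
        acc = accuracy_score(y_val, tree.predict(X_val))
        if best is None or acc > best[0]:
            best = (acc, criterion, min_leaf, tree)

val_acc, best_criterion, best_min_leaf, model = best
# Only now is the untouched test set used, as a final check on unseen cases.
test_acc = accuracy_score(y_test, model.predict(X_test))
```

The key discipline is that the test set is consulted exactly once, after all tuning decisions have been made on the validation set.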

Unfortunately, an analyst often finds that the accuracy estimated during the training and validation phases (i.e., steps 1 and 2 above) is in no way representative of the model's performance on unseen cases (step 3).

During my analysis, numerous models showed an estimated accuracy of 85% or more, but when they were applied to unseen data, the accuracy dropped to 50-53%, suggesting that overfitting was present. Consequently, using these biased models to predict new cases would have had detrimental effects in actual stock trading.
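The overfitting symptom described above is easy to reproduce. In this small sketch (synthetic data, not the author's dataset), the class labels carry no information at all, yet an unconstrained decision tree still memorises the training set, producing high apparent accuracy that collapses toward chance on held-out cases:

```python
# Illustration of overfitting: random labels, so there is no real signal.
# An unpruned tree fits the training data perfectly but generalises
# no better than coin-flipping on the held-out test portion.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)       # labels are pure noise

X_train, y_train = X[:750], y[:750]     # training + validation portion
X_test, y_test = X[750:], y[750:]       # held-out unseen cases

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = accuracy_score(y_train, tree.predict(X_train))  # essentially 1.0
test_acc = accuracy_score(y_test, tree.predict(X_test))     # near 0.5 (chance)
```

A large gap between training accuracy and test accuracy is exactly the warning sign reported in the paragraph above.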

When all the models have been built, the analyst should choose a model (when there is a requirement to use only one model) according to:

1) (Statistically significant) best accuracy.
2) Misclassification costs, if these are not taken into account during the model building process.
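Criterion (2) can be illustrated with a small sketch. The confusion matrices and cost figures below are entirely hypothetical; the point is only that in trading, a false "up" signal (a bad buy) may cost far more than a missed opportunity, so the cheaper model need not be the more accurate one:

```python
# Comparing two hypothetical models by expected misclassification cost
# rather than raw accuracy. All numbers are invented for illustration.
import numpy as np

# Confusion matrices on a 200-case test set:
# rows = actual class (0 = down, 1 = up), columns = predicted class.
conf_a = np.array([[80, 20],
                   [30, 70]])    # accuracy: 150/200 = 0.750
conf_b = np.array([[95,  5],
                   [50, 50]])    # accuracy: 145/200 = 0.725

# Cost of predicting column j when row i is true; correct predictions cost 0.
costs = np.array([[0, 5],        # false "up" (bad buy) is expensive
                  [1, 0]])       # false "down" (missed trade) is cheap

def expected_cost(conf, costs):
    """Average misclassification cost per test case."""
    return (conf * costs).sum() / conf.sum()

cost_a = expected_cost(conf_a, costs)   # (20*5 + 30*1) / 200 = 0.650
cost_b = expected_cost(conf_b, costs)   # (5*5 + 50*1) / 200 = 0.375
# Model B is slightly less accurate yet much cheaper per prediction,
# so under these costs it would be the better choice.
```

If the learning algorithm already accepts a cost matrix during training, this post-hoc comparison is unnecessary, which is why the criterion applies only when costs were not used during model building.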


In the next post we will see how text mining may help us make better predictions for the markets.


