Wednesday, February 16, 2011

Effect of training data size

Looking for papers describing the effect of the amount of training data on the accuracy of different classifier learning algorithms ... | LinkedIn

A very nice discussion of the effect of training data size on classification:
Does anyone know of any papers that describe the effect that the amount of training data has on the relative effectiveness of different classifier learning algorithms? For example, say I have a binary classification problem and a choice between two different learning algorithms A and B, where A generally learns a more accurate classifier than B on this task, but A is also more computationally expensive than B (I'm thinking of something like SVM vs Naive Bayes).
My assumption is that the choice of whether to use A or B depends on how much training data I have. If I don't have a lot of training data, I'd choose A because of the better accuracy of the classifier it learns. However, as I increase the amount of training data, the accuracy of the classifier learnt by B approaches the accuracy of that learnt by A. At some point, I'd switch to algorithm B because it will learn a classifier of similar accuracy to that learnt by A, but training will be faster because of B's lower computational cost. As I say, this is just an assumption at the moment; does anyone know of any papers which address this topic?

This sounds like meta-learning, but I'm not sure.
The discussion contains a lot of useful information; I will excerpt some of it here soon.
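In the meantime, here is a minimal sketch (my own, not from the thread) of how one could test the assumption empirically by comparing learning curves with scikit-learn. The synthetic dataset and the particular models, an RBF-kernel SVM and Gaussian Naive Bayes, are just illustrative stand-ins for the A and B of the question.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic binary classification task; stands in for a real problem.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)

# Evaluate each learner on increasing fractions of the training split.
train_sizes = np.linspace(0.05, 1.0, 8)

for name, clf in [("SVM (RBF)", SVC(kernel="rbf")),
                  ("Naive Bayes", GaussianNB())]:
    # learning_curve refits the model on growing subsets, with 5-fold CV.
    sizes, _, test_scores = learning_curve(
        clf, X, y, train_sizes=train_sizes, cv=5, scoring="accuracy")
    for n, scores in zip(sizes, test_scores):
        print(f"{name:11s}  n={n:4d}  accuracy={scores.mean():.3f}")

The training-set size at which the accuracy gap between the two curves becomes negligible is roughly the point where switching to the cheaper learner starts to pay off.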
