Machine learning - bias when not using a development set
I am doing a small classification task in Weka using 10-fold cross-validation. Feature selection and adjusting the parameters of the classifier were done on a small subset of the data. I know that a train and a development set should be used, but due to lack of time and computational power I used a small subset of the data. Will this in any way bias my results? Thanks!
Yes. By the definition of bias, using a small set means you will have a higher bias towards whatever section you decide to train on. If you randomly choose the small sample from the larger data set, though, that bias is minimized.
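To make the random-sampling point concrete, here is a minimal sketch in plain Python (not Weka's API). The function name `stratified_subsample`, the 10% fraction, and the 900/100 class split are all made up for illustration. It draws the subset at random within each class, so the sample keeps roughly the same class balance as the full set instead of favouring one section of the file.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a random subsample that keeps class proportions.

    labels   -- list of class labels, one per instance
    fraction -- share of each class to keep, between 0 and 1
    """
    rng = random.Random(seed)

    # Group instance indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for indices in by_class.values():
        # Keep at least one instance per class so no class disappears.
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Hypothetical data set: 900 "neg" and 100 "pos" instances.
labels = ["neg"] * 900 + ["pos"] * 100
subset = stratified_subsample(labels, fraction=0.1)
```

Taking the first 100 rows instead would give you only "neg" instances here, which is exactly the kind of bias towards one section described above.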
If you're only using a small subset of the data, 10-fold cross-validation doesn't make much sense. The idea of cross-validation is to chunk a data set into multiple parts, with the parts alternating between serving as training and test sets, iterating until every part has been used for testing, so that you can find the solution that does best across the cross-validation sets.
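The chunking described above can be sketched in a few lines of plain Python. This is only an illustration of the splitting step, not how Weka implements it; `k_fold_splits` and the 50-instance example are assumptions of mine.

```python
import random


def k_fold_splits(n_instances, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)

    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        test = folds[i]
        # Every fold except fold i goes into the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical example: 50 instances, 10 folds.
splits = list(k_fold_splits(n_instances=50, k=10))
```

Each instance ends up in the test set exactly once. You would train and score a model on every pair and compare candidates by their average score across the folds.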
If you don't have the time or computational power, I would first suggest stepping down from 10-fold CV checking. That would mean you can include a larger portion of the data set in the same computation time, and evaluate a number of different models in Weka before deciding which one has more promise to move forward with.
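As rough arithmetic for that trade-off, here is a back-of-the-envelope sketch. It assumes equal-sized folds and a training cost that grows linearly with the number of training instances, which is a simplification (many classifiers scale worse than that). The function names and the 1,000-instance figure are hypothetical.

```python
def cv_training_cost(n_instances, k):
    """Total training instances processed in one k-fold run.

    Each of the k folds trains on (k - 1) / k of the data,
    so the total is k * n * (k - 1) / k = (k - 1) * n.
    """
    return (k - 1) * n_instances


def max_instances_for_budget(budget, k):
    """Largest data set a k-fold run can cover within the same budget."""
    return budget // (k - 1)


# Budget spent by 10-fold CV on a hypothetical 1,000-instance subset.
budget = cv_training_cost(1000, 10)            # 9000
with_3_folds = max_instances_for_budget(budget, 3)  # 4500
with_5_folds = max_instances_for_budget(budget, 5)  # 2250
```

Under those assumptions, dropping from 10 folds to 3 lets you use about 4.5 times as much data for the same training effort.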
If you have the option, for the final data run I highly suggest using the full set, but while you're deciding which algorithm to use it's not terrible to use a portion of the full set.