>> Noel O'Boyle

Cross-validation

k-fold cross-validation

ipred

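library(ipred)  # provides errorest()
library(tree)   # provides tree()
# errorest() needs a predict function that returns class labels
mypredict.tree <- function(fit, newdata) {predict(fit, newdata, type="class")}
errorest(Species ~ ., data=iris, model=tree, estimator="cv", predict=mypredict.tree)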
# Gives      
#               10-fold cross-validation estimator of misclassification error
#                
#                Misclassification error:  0.04

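10-fold CV is the default, and by default the data are randomly shuffled before being split into folds. Stratified sampling is also possible (strat=TRUE in control.errorest): the folds are then drawn so that each one contains roughly the same proportion of each class as the full data set. The predicted categories are not returned by default. Alternative options may be specified as follows: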

# Use ?control.errorest for info on the parameters
ans <- errorest(......., est.para=control.errorest(k=20, predictions=TRUE), ....)
ans
#         20-fold cross-validation estimator of misclassification error
#          
#          Misclassification error:  0.0533
table(ans$predictions, iris$Species)
#             setosa versicolor virginica
#  setosa     50      0          0
#  versicolor  0     46          4
#  virginica   0      4         46
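
To make the idea of stratified sampling concrete, here is a minimal base-R sketch of how stratified folds could be assigned by hand. It is only an illustration of the concept, not how ipred implements strat=TRUE internally; the names k and folds are made up for this example.

```r
# Illustration only: assign each row of iris to one of k folds,
# shuffling separately within each species so that every fold
# receives the same number of samples from each class.
set.seed(1)  # fixed seed so the fold assignment is reproducible
k <- 10
folds <- ave(seq_len(nrow(iris)), iris$Species,
             FUN = function(idx) sample(rep(seq_len(k), length.out = length(idx))))

# Rows are folds, columns are species; with 50 samples per species
# and k = 10, every cell holds 5 samples.
table(folds, iris$Species)
```

With unstratified random folds the per-class counts in each fold would vary around 5 rather than being exactly 5.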

The misclassification error varied considerably from run to run (remember, the folds are chosen by random sampling), so I am not convinced it is a good idea to quote a single number without also reporting its spread, for example over several repeated runs. Leave-one-out CV can be done by setting k equal to n, the number of samples. Leave-p-out CV (p>1) is not possible using this package.
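
One way to get a feel for the run-to-run variation is simply to repeat the cross-validation several times and look at the spread of the estimates. The sketch below assumes the ipred and tree packages are installed; the number of repeats (20) and the names errs and loo are arbitrary choices for illustration. It also shows leave-one-out CV by setting k to the number of samples.

```r
library(ipred)
library(tree)

# Same helper as above: make predict() return class labels
mypredict.tree <- function(fit, newdata) predict(fit, newdata, type = "class")

# Repeat 10-fold CV 20 times; each run draws different random folds.
# The estimate is stored in the $error element of the returned object.
set.seed(42)
errs <- replicate(20, errorest(Species ~ ., data = iris, model = tree,
                               estimator = "cv",
                               predict = mypredict.tree)$error)
summary(errs)  # range and quartiles of the 20 estimates
sd(errs)       # spread from run to run

# Leave-one-out CV: one fold per sample
loo <- errorest(Species ~ ., data = iris, model = tree, estimator = "cv",
                est.para = control.errorest(k = nrow(iris)),
                predict = mypredict.tree)
loo$error
```

Reporting the mean and standard deviation of errs, rather than a single run, gives a more honest picture of the estimate.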