In machine learning, data is usually divided into three sets: the training set, the validation set, and the test set. B.D. Ripley gives the following definitions in his book 'Pattern Recognition and Neural Networks' (Cambridge University Press, 1996, ISBN 0-521-46086-7).
Training set: A set of examples used for learning, that is to fit the parameters [i.e., weights] of the classifier. In short, it is used to train the model, that is, to fit the model's parameters.
Validation set: A set of examples used to tune the parameters [i.e., architecture, not weights] of a classifier, for example to choose the number of hidden units in a neural network. In short, it is used to optimize and settle on the model and its structural parameters (hyperparameters).
Test set: A set of examples used only to assess the performance [generalization] of a fully specified classifier. In short, it is used purely to test the predictive ability of the model once it has been built.
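The three-way division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are hypothetical choices made for this example and do not come from Ripley's book.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle examples and split them into (train, validation, test) lists.

    The fractions and the seed are illustrative assumptions, not recommendations.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    # Copy before shuffling so the caller's data is left untouched.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)

    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)

    # Test set: held back and used only for the final performance assessment.
    test = shuffled[:n_test]
    # Validation set: used to choose the architecture / hyperparameters.
    validation = shuffled[n_test:n_test + n_val]
    # Training set: everything else, used to fit the weights.
    train = shuffled[n_test + n_val:]
    return train, validation, test


examples = list(range(100))  # stand-in for 100 labelled examples
train, validation, test = split_dataset(examples)
print(len(train), len(validation), len(test))  # prints: 60 20 20
```

In a typical workflow, each candidate model is fitted on `train`, the candidate that scores best on `validation` is kept, and `test` is consulted once at the end to estimate how the chosen model generalizes.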