Training & Validation Datasets

How do we know whether the scorecard will perform as well on new borrowers as it does on historical data?
To answer this question, the credit portfolio data are divided into two parts: a training dataset and a validation dataset.
The training dataset is used to train the scorecard, while the validation dataset is used exclusively for the purpose of validation.
As a rule, 80% of the available data is used to train the scorecard, while the remaining 20% is held out for validation. If a large amount of data is available, the validation dataset can comprise as much as 50% of it.
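
The 80/20 division described above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function name `split_portfolio`, the `validation_share` and `seed` parameters, and the synthetic borrower records are assumptions made for the example, not part of any particular scoring product.

```python
import random


def split_portfolio(records, validation_share=0.2, seed=42):
    """Randomly split records into (training, validation) lists.

    validation_share is the fraction held out for validation
    (0.2 for an 80/20 split, up to 0.5 for large portfolios).
    """
    if not 0.0 < validation_share < 1.0:
        raise ValueError("validation_share must be between 0 and 1")

    # Shuffle a copy so the caller's data is left untouched;
    # a fixed seed makes the split reproducible.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    n_validation = round(len(shuffled) * validation_share)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation


# Hypothetical portfolio: 1000 borrowers, every tenth one defaulted.
borrowers = [{"id": i, "defaulted": i % 10 == 0} for i in range(1000)]

training, validation = split_portfolio(borrowers, validation_share=0.2)
print(len(training), len(validation))
```

Fixing the seed keeps the split reproducible, so the scorecard is always validated on the same held-out borrowers.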





If a certain category of borrowers is underrepresented in the credit portfolio, its allocation to the training and validation datasets must be controlled explicitly, so that the category appears in both datasets in the same proportion as in the full portfolio.
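
One common way to control this allocation is a stratified split: the portfolio is grouped by category, and each group is divided separately using the same share. The sketch below assumes a hypothetical `employment` field and a hypothetical helper named `stratified_split`; it illustrates the idea and is not a description of how any specific scoring system does it.

```python
import random
from collections import defaultdict


def stratified_split(records, category_of, validation_share=0.2, seed=42):
    """Split records so each category keeps its share in both datasets.

    category_of is a function returning the category of one record.
    """
    rng = random.Random(seed)

    # Group the portfolio by borrower category.
    groups = defaultdict(list)
    for record in records:
        groups[category_of(record)].append(record)

    training, validation = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_validation = round(len(members) * validation_share)
        # For groups with at least two members, keep at least one
        # record on each side so the category is never missing.
        if len(members) >= 2:
            n_validation = min(max(n_validation, 1), len(members) - 1)
        validation.extend(members[:n_validation])
        training.extend(members[n_validation:])
    return training, validation


# Hypothetical portfolio: 950 salaried and 50 self-employed borrowers.
portfolio = (
    [{"id": i, "employment": "salaried"} for i in range(950)]
    + [{"id": 950 + i, "employment": "self_employed"} for i in range(50)]
)

train_set, valid_set = stratified_split(
    portfolio, category_of=lambda r: r["employment"]
)

rare_in_train = sum(r["employment"] == "self_employed" for r in train_set)
rare_in_valid = sum(r["employment"] == "self_employed" for r in valid_set)
print(rare_in_train, rare_in_valid)
```

With a plain random split, a small category could by chance land almost entirely in one dataset; splitting each category separately rules that out.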





