
Binning

What is binning

Binning is the process of transforming a numeric characteristic into a categorical one, as well as regrouping and consolidating the values of categorical characteristics.
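Both kinds of binning can be illustrated with a minimal Python sketch. The cut points, labels, and category names below are hypothetical and chosen only for illustration; they are not taken from any real scorecard.

```python
from bisect import bisect_right

def assign_bin(value, edges):
    """Return the index of the bin that contains `value`.

    `edges` are sorted interior cut points: bin 0 holds values below
    edges[0], and bin i holds values in [edges[i-1], edges[i]).
    """
    return bisect_right(edges, value)

# Hypothetical cut points and labels for a numeric "age" characteristic.
AGE_EDGES = [25, 35, 50]
AGE_LABELS = ["<25", "25-34", "35-49", "50+"]

def age_category(age):
    """Transform a numeric age into a categorical bin label."""
    return AGE_LABELS[assign_bin(age, AGE_EDGES)]

# Hypothetical regrouping of a categorical characteristic:
# rare or similar values are consolidated into broader groups.
EMPLOYMENT_GROUPS = {
    "student": "not employed",
    "unemployed": "not employed",
    "part-time": "employed",
    "full-time": "employed",
}

def employment_category(raw_value):
    """Map a raw category to its consolidated group ("other" if unknown)."""
    return EMPLOYMENT_GROUPS.get(raw_value, "other")
```

Here each cut point is the inclusive lower bound of the bin to its right, so an age of exactly 25 falls into "25-34".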

Why binning is required

  • Increases scorecard stability: some characteristic values occur rarely and cause instability unless they are grouped together.
  • Improves quality: grouping attributes with similar predictive strength increases scorecard accuracy.
  • Makes the logical trend of “Good/Bad” deviations for each characteristic easier to understand.
  • Prevents scorecard impairment that could otherwise result from rare reversal patterns and extreme values.
  • Prevents the overfitting (overtraining) that is possible with numerical variables.


Automatic binning

The most widely used automatic binning algorithm is Chi-merge. Chi-merge divides the range of a characteristic into intervals (bins) so that neighboring bins differ from each other as much as possible in their ratio of “Good” to “Bad” records. WOE (Weight of Evidence) values can be used for visual cross-verification of automatic binning results (Fig. 1).
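The idea can be sketched in Python as follows. This is a simplified illustration, not the implementation used by any particular product: it starts with one bin per distinct value and repeatedly merges the pair of neighboring bins with the lowest chi-square statistic until a target number of bins remains (the stopping rule `max_bins` is an assumption; a chi-square significance threshold is another common choice). The WOE helper assumes the common definition WOE = ln(share of all “Good” records in the bin / share of all “Bad” records in the bin); some sources use the opposite sign.

```python
import math

def chi2_pair(a, b):
    """Chi-square statistic for two neighboring bins, each given as [good, bad]."""
    total = sum(a) + sum(b)
    chi2 = 0.0
    for row in (a, b):
        for j in range(2):
            col_total = a[j] + b[j]
            expected = sum(row) * col_total / total
            if expected > 0:  # skip empty columns to avoid division by zero
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2

def chi_merge(values, is_bad, max_bins=5):
    """Merge the most similar neighboring bins until `max_bins` remain.

    Returns a list of [lower_value, good_count, bad_count], one per bin.
    """
    counts = {}
    for v, bad in zip(values, is_bad):
        good_bad = counts.setdefault(v, [0, 0])
        good_bad[1 if bad else 0] += 1
    # Start with one bin per distinct value, sorted by value.
    bins = [[v, c[0], c[1]] for v, c in sorted(counts.items())]
    while len(bins) > max_bins:
        scores = [chi2_pair(bins[i][1:], bins[i + 1][1:])
                  for i in range(len(bins) - 1)]
        i = scores.index(min(scores))  # the least different neighbors
        bins[i][1] += bins[i + 1][1]
        bins[i][2] += bins[i + 1][2]
        del bins[i + 1]
    return bins

def woe_per_bin(bins):
    """WOE for each bin, assuming WOE = ln(%Good / %Bad)."""
    total_good = sum(b[1] for b in bins)
    total_bad = sum(b[2] for b in bins)
    result = []
    for _, good, bad in bins:
        if good == 0 or bad == 0:
            raise ValueError("WOE is undefined for a bin with no Good or no Bad records")
        result.append(math.log((good / total_good) / (bad / total_bad)))
    return result
```

Because only the least different neighbors are merged, the boundaries that survive are the ones where the “Good”/“Bad” ratio changes the most. Plotting the output of `woe_per_bin` against the bin order gives the kind of WOE graph used for visual verification.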

Analysis and manual correction of automatic binning

Sometimes, due to particularities in the data distribution, automatic binning needs to be corrected manually.

The example below shows a range divided into 5 bins by automatic binning (Fig. 1); only the bin boundaries now need to be adjusted manually.

For example, move the second boundary of the range a few values to the left, from 5.02 to 4.94 (Fig. 2), and recalculate the WOE values.

As a result, we get a smoothly decreasing WOE curve, indicating that the values are distributed correctly within the ranges.
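Whether a WOE curve is monotonic can also be checked programmatically rather than only by eye. The following is a minimal sketch; the function names are hypothetical, and the input is simply the list of WOE values in bin order.

```python
def is_monotonic(woe_values):
    """True if the WOE sequence never increases or never decreases."""
    pairs = list(zip(woe_values, woe_values[1:]))
    return all(a >= b for a, b in pairs) or all(a <= b for a, b in pairs)

def reversals(woe_values):
    """Indices of bins at which the WOE trend changes direction."""
    diffs = [b - a for a, b in zip(woe_values, woe_values[1:])]
    return [i + 1 for i in range(len(diffs) - 1)
            if diffs[i] * diffs[i + 1] < 0]
```

A non-empty result from `reversals` points to the bins whose boundaries are candidates for manual adjustment.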

Sometimes, for easier analysis, automatic binning ranges should be adjusted to logical boundaries. For example, the boundaries for Age or Job Time can be adjusted to integers.
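Snapping boundaries to integers can be sketched as below. The helper name and the sample cut points are hypothetical; WOE values should be recalculated after any such adjustment, as with the manual correction described above.

```python
def snap_to_integers(edges):
    """Round each cut point to the nearest integer and drop duplicates.

    Note: Python's round() rounds exact halves to the nearest even
    integer (for example, round(24.5) == 24).
    """
    return sorted({round(e) for e in edges})

# Hypothetical automatic cut points for an "age" characteristic.
auto_edges = [24.6, 31.2, 44.8]
logical_edges = snap_to_integers(auto_edges)
```

If two automatic cut points round to the same integer, they collapse into a single boundary, which reduces the number of bins by one.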


Fig. 1 - Sharply-varied and illogical WOE graph after automatic binning



Fig. 2 - Smooth and logical WOE decline after manual correction






