Why binning is required
- Increases scorecard stability: some characteristic values occur
rarely and cause instability unless they are grouped together.
- Improves quality: grouping attributes with similar predictive
strength increases scorecard accuracy.
- Makes the logical trend of “Good/Bad” deviations visible for
each characteristic.
- Prevents scorecard impairment that could otherwise result from
rare reversal patterns and extreme values.
- Prevents the overfitting (overtraining) that is possible with
numerical variables.
Automatic binning
One of the most widely used automatic binning algorithms is
Chi-merge. Chi-merge divides the range of a characteristic into
intervals (bins) so that neighboring bins differ from each other as
much as possible in their ratio of “Good” to “Bad” records.
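The merging idea can be sketched in Python as follows. This is a
simplified illustration, not a reference implementation: it assumes
the data has already been split into fine initial bins given as
(good, bad) counts, and it stops at a target number of bins (the
hypothetical parameter max_bins) rather than at a chi-square
significance threshold. The names chi2_pair and chi_merge and the
counts are made up for the example.

```python
def chi2_pair(a, b):
    """Chi-square statistic for two adjacent bins, each given as (good, bad)."""
    total = a[0] + a[1] + b[0] + b[1]
    chi2 = 0.0
    for row in (a, b):
        row_total = row[0] + row[1]
        for j in (0, 1):
            col_total = a[j] + b[j]
            expected = row_total * col_total / total
            if expected > 0:  # skip empty cells to avoid division by zero
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def chi_merge(bins, max_bins):
    """Merge the most similar adjacent bins until only max_bins remain."""
    bins = [tuple(b) for b in bins]
    while len(bins) > max_bins:
        # A low chi-square means the two neighbors have a similar Good/Bad ratio.
        scores = [chi2_pair(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        i = scores.index(min(scores))
        merged = (bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1])
        bins[i:i + 2] = [merged]
    return bins


# Hypothetical (good, bad) counts for five fine initial bins.
fine_bins = [(90, 10), (88, 12), (60, 40), (58, 42), (20, 80)]
merged_bins = chi_merge(fine_bins, max_bins=3)
```

Merging the pair with the lowest chi-square first removes the
boundaries that separate the least, so the bins that remain are the
ones whose Good/Bad ratios differ most.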
To cross-check the results of automatic binning visually, one can
plot the WOE (Weight of Evidence) value of each bin (Fig. 1).
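As a minimal sketch of how the WOE value of each bin can be
computed, the Python snippet below assumes the common convention
WOE = ln(share of Goods in the bin / share of Bads in the bin).
Some sources use the opposite sign, so this is an assumption rather
than the definition used for the figures. The function name
woe_per_bin and the counts are hypothetical.

```python
import math


def woe_per_bin(good_counts, bad_counts):
    """Return WOE for each bin as ln(share of goods / share of bads)."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woe = []
    for g, b in zip(good_counts, bad_counts):
        # Assumes every bin holds at least one Good and one Bad record;
        # otherwise the logarithm is undefined.
        woe.append(math.log((g / total_good) / (b / total_bad)))
    return woe


# Hypothetical counts for 5 bins (illustrative only, not taken from Fig. 1).
good = [400, 300, 200, 70, 30]
bad = [10, 15, 20, 25, 30]
woe_values = woe_per_bin(good, bad)
```

Plotting woe_values against the bin index gives the kind of curve
used for the visual check.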
Analysis and manual correction of automatic binning
Sometimes, because of particularities in the data distribution,
automatic binning needs to be corrected manually.
The example shows a range divided into 5 bins by automatic binning
(Fig. 1); only the band boundaries need to be adjusted manually.
For example, we move the second boundary of the range several
values to the left, from 5.02 to 4.94 (Fig. 2), and recalculate
the WOE values.
As a result, we get a smoothly decreasing WOE curve, which
indicates that values are distributed correctly across the ranges.
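The adjustment can be sketched in Python as follows. Only the shift
of the second boundary from 5.02 to 4.94 comes from the example;
the other cut points, the records, and the names woe_for_edges and
is_decreasing are hypothetical, and WOE is again assumed to be
ln(share of Goods / share of Bads).

```python
import math
from bisect import bisect_right


def woe_for_edges(records, edges):
    """WOE per bin; records are (value, goods, bads), edges are inner cut points."""
    n_bins = len(edges) + 1
    good = [0] * n_bins
    bad = [0] * n_bins
    for value, g, b in records:
        # Bin k holds values with edges[k-1] <= value < edges[k].
        k = bisect_right(edges, value)
        good[k] += g
        bad[k] += b
    total_good, total_bad = sum(good), sum(bad)
    # Assumes no bin ends up with zero Goods or zero Bads.
    return [math.log((g / total_good) / (b / total_bad))
            for g, b in zip(good, bad)]


def is_decreasing(values):
    """True if every value is strictly smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Hypothetical aggregated records: (characteristic value, goods, bads).
records = [
    (2.00, 300, 10),
    (4.00, 200, 10),
    (4.96, 40, 20),   # lies between the old and the new second boundary
    (5.50, 200, 12),
    (7.00, 150, 24),
    (9.00, 110, 24),
]

auto_edges = [3.10, 5.02, 6.50, 8.00]      # automatic binning
manual_edges = [3.10, 4.94, 6.50, 8.00]    # second boundary moved left

woe_before = woe_for_edges(records, auto_edges)
woe_after = woe_for_edges(records, manual_edges)
```

With these made-up counts, is_decreasing(woe_before) is False
because the third bin has a higher WOE than the second, while
is_decreasing(woe_after) is True once the records around 4.96 fall
into the third bin.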
Sometimes automatic binning ranges should be adjusted to logical
boundaries to make analysis easier. For example, the boundaries
for Age or Job Time can be rounded to integers.
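A minimal Python sketch of such rounding is shown below. The
function name round_edges and the Age cut points are hypothetical;
WOE values should be recalculated after the boundaries change.

```python
def round_edges(edges):
    """Round cut points to the nearest integer and drop duplicates."""
    # A set removes boundaries that collapse onto the same integer.
    return sorted({round(e) for e in edges})


# Hypothetical automatic cut points for Age, in years.
age_edges = [24.6, 31.2, 38.9, 47.4]
logical_edges = round_edges(age_edges)
```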