Analysis and Implementation of Risk Control Evaluation Indicators: KS, WOE, IV

In risk control, feature engineering, feature selection, and modeling often involve a few evaluation indicators. Here is a brief summary.

KS (Kolmogorov-Smirnov) is an evaluation metric that measures how well positive and negative samples are separated. In short, it is the gap between the cumulative proportion of good users and the cumulative proportion of bad users.

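$$KS_i = \left| \frac{\sum_{j \le i} G_j}{G_T} - \frac{\sum_{j \le i} B_j}{B_T} \right|$$

Here $i$ denotes the $i$-th segment, $G_j$ and $B_j$ are the numbers of good and bad samples in segment $j$, and $G_T$ and $B_T$ are the total numbers of good and bad samples. The KS of a feature or model is usually taken as the largest $KS_i$.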

In the KS plot, the length of the green dotted line is the KS value of the current segment.
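The per-segment KS described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `ks_by_segment` and the counts are made up for the example, and the segments are assumed to be already ordered by the feature or score.

```python
def ks_by_segment(good_counts, bad_counts):
    """Return the KS value of every segment and the overall KS (their maximum).

    good_counts[i] and bad_counts[i] are the numbers of good and bad samples
    in segment i; segments must be ordered by the feature or score.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad sample")

    ks_values = []
    cum_good = 0
    cum_bad = 0
    for good, bad in zip(good_counts, bad_counts):
        cum_good += good
        cum_bad += bad
        # Gap between the cumulative good rate and the cumulative bad rate
        ks_values.append(abs(cum_good / total_good - cum_bad / total_bad))
    return ks_values, max(ks_values)


# Hypothetical counts for five score segments, lowest scores first
good = [10, 20, 30, 40, 100]
bad = [40, 30, 15, 10, 5]
per_segment, ks = ks_by_segment(good, bad)
print([round(v, 3) for v in per_segment], round(ks, 3))
```

The last segment always has a KS of 0, because both cumulative rates reach 1 there.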

In principle, the higher the KS, the better the feature or model distinguishes risk, and ideally the credit scores follow a normal distribution. In practice, a KS above 0.9 is too high to be believable, and such a model is unlikely to be representative. If the model KS is that high, check whether the model is overfitted. Generally speaking, a KS above 0.3 is barely usable in production, and the risk differentiation at that level is only average.

Monitoring KS after deployment is just as important. If KS keeps declining in follow-up monitoring, the market may have changed, the customer base may have shifted, or the model itself may be unstable. For the same reason, compare KS on the training set and the validation set when training the model: a large gap between the two indicates that the model is overfitted or generalizes poorly.
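As a rough sketch of the training versus validation comparison, the snippet below computes KS directly from model scores and flags a large gap. The function name `ks_from_scores`, the label convention (1 = bad, 0 = good), the toy data, and the `max_gap` threshold of 0.1 are all assumptions made for illustration; the threshold in particular is arbitrary and not a recommended value.

```python
def ks_from_scores(scores, labels):
    """KS of a set of model scores; labels use 1 = bad, 0 = good (assumed)."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad sample")

    pairs = sorted(zip(scores, labels))
    cum_good = 0
    cum_bad = 0
    ks = 0.0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # Evaluate the gap only after all samples sharing this score are counted
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        ks = max(ks, abs(cum_good / total_good - cum_bad / total_bad))
    return ks


# Toy data: the model separates the training set perfectly but not the validation set
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
valid_labels = [0, 0, 1, 0, 1, 0, 1, 1]

ks_train = ks_from_scores(scores, train_labels)
ks_valid = ks_from_scores(scores, valid_labels)

max_gap = 0.1  # hypothetical tolerance, chosen only for this example
possibly_overfitted = (ks_train - ks_valid) > max_gap
print(ks_train, ks_valid, possibly_overfitted)
```

The same function can be rerun on later production samples to watch for a steady decline in KS.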

WOE stands for Weight of Evidence. It is used in risk assessment, credit scorecards, and similar applications.

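$$WOE_i = \ln\left(\frac{G_i / G_T}{B_i / B_T}\right)$$

Here $i$ denotes the $i$-th segment, $G_i$ and $B_i$ are the numbers of good and bad users in segment $i$, and $G_T$ and $B_T$ are the total numbers of good and bad users. After rearranging, this can also be written as

$$WOE_i = \ln\left(\frac{G_i / B_i}{G_T / B_T}\right)$$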

From the formula above, WOE measures the difference between "the ratio of good users to bad users in this segment" and "the ratio of total good users to total bad users". The larger the WOE, the larger this difference, and the more likely users in the segment are to be good.
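A minimal sketch of this calculation, using the good-over-bad convention described above. The function name `woe_by_segment` and the counts are hypothetical. The sketch raises an error for a segment with no good or no bad users; in practice people often merge bins or add a small smoothing constant instead, which is a design choice not covered here.

```python
import math


def woe_by_segment(good_counts, bad_counts):
    """WOE of each segment: ln((good_i / total_good) / (bad_i / total_bad))."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woe_values = []
    for good, bad in zip(good_counts, bad_counts):
        if good == 0 or bad == 0:
            raise ValueError("each segment needs at least one good and one bad sample")
        woe_values.append(math.log((good / total_good) / (bad / total_bad)))
    return woe_values


# Hypothetical counts for three segments
good = [10, 40, 50]
bad = [30, 40, 30]
woe = woe_by_segment(good, bad)

# Equivalent form: segment good/bad ratio relative to the overall good/bad ratio
woe_alt = [math.log((g / b) / (sum(good) / sum(bad))) for g, b in zip(good, bad)]
print([round(w, 4) for w in woe])
```

The first segment has relatively more bad users than the population, so its WOE is negative; the last has relatively more good users, so its WOE is positive.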

WOE transformation is also widely used in feature engineering. When we bin a feature with equal-frequency or equal-width bins, we often find that the WOE does not change monotonically across the ordered segments (most such features are discrete variables). In that case we apply a WOE transformation: each original feature value is replaced by the WOE of the segment it falls into, after which the transformed feature varies monotonically with the target.

The advantage of the WOE transformation is that, with the WOE curve kept monotonic, the feature values are positively (or negatively) correlated with y. For example, when bad users are labeled 1, and since a larger WOE here means a higher share of good users, a larger transformed feature value means a lower probability of being predicted as a bad user. If WOE is instead defined as bad over good, the direction flips.
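The transformation can be sketched as two small steps: fit one WOE value per bin, then replace each raw value with the WOE of its bin. Everything specific here is an assumption for illustration: the function names `fit_woe_map` and `woe_transform`, the "age" feature, the cut points, and the label convention (1 = bad, 0 = good).

```python
import math
from bisect import bisect_right


def fit_woe_map(values, labels, cut_points):
    """Return one WOE value per bin; bin k covers [cut_points[k-1], cut_points[k])."""
    n_bins = len(cut_points) + 1
    good = [0] * n_bins
    bad = [0] * n_bins
    for value, label in zip(values, labels):
        idx = bisect_right(cut_points, value)
        if label == 1:
            bad[idx] += 1
        else:
            good[idx] += 1
    if 0 in good or 0 in bad:
        raise ValueError("every bin needs at least one good and one bad sample")
    total_good, total_bad = sum(good), sum(bad)
    return [math.log((g / total_good) / (b / total_bad)) for g, b in zip(good, bad)]


def woe_transform(values, cut_points, woe_map):
    """Replace each raw value with the WOE of the bin it falls into."""
    return [woe_map[bisect_right(cut_points, v)] for v in values]


# Hypothetical feature whose risk is not monotonic in the raw value
ages = [20, 22, 25, 28, 35, 40, 45, 48, 55, 60, 65, 70]
labels = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0]
cut_points = [30, 50]

woe_map = fit_woe_map(ages, labels, cut_points)
encoded = woe_transform(ages, cut_points, woe_map)
print([round(w, 3) for w in woe_map])
```

In this toy data the WOE across the age bins goes down, up, then back to zero, so it is not monotonic in age. After encoding, the bad rate falls steadily as the encoded value rises (0.75, 0.5, 0.25 for the three distinct WOE values).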

IV stands for Information Value. It measures a feature's predictive power for the model and is often used as a reference for feature selection before training.

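IV can be calculated from WOE. For the $i$-th segment,

$$IV_i = \left(\frac{G_i}{G_T} - \frac{B_i}{B_T}\right) \times WOE_i$$

where $G_i$ and $B_i$ are the numbers of good and bad users in segment $i$, and $G_T$ and $B_T$ are the total numbers of good and bad users.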

The IV of the whole feature is the sum of the IV values of its segments:

$$IV = \sum_{i=1}^{n} IV_i$$

where $n$ is the number of segments.

The larger a feature's IV, the more information it carries and the more it contributes to distinguishing good customers from bad ones, so the feature is a better candidate for the model.
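The IV calculation described above, sketched in plain Python. The function name `iv_by_segment` and the counts are hypothetical, and the sketch assumes WOE is defined as good share over bad share.

```python
import math


def iv_by_segment(good_counts, bad_counts):
    """Return the IV contribution of every segment and the feature's total IV."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv_parts = []
    for good, bad in zip(good_counts, bad_counts):
        if good == 0 or bad == 0:
            raise ValueError("each segment needs at least one good and one bad sample")
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)
        # (good share - bad share) and WOE always share the same sign,
        # so each contribution is non-negative
        iv_parts.append((good_share - bad_share) * woe)
    return iv_parts, sum(iv_parts)


# Hypothetical counts for three segments
good = [10, 40, 50]
bad = [30, 40, 30]
iv_parts, iv = iv_by_segment(good, bad)
print([round(p, 4) for p in iv_parts], round(iv, 4))
```

Running the same function over every candidate feature and sorting by the total gives a simple ranking for feature selection.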

We usually use IV rather than WOE to judge a feature's predictive power, because WOE can be positive or negative, while IV is never negative. More importantly, WOE does not reflect how large a share of the population falls in the current segment. For example, a segment may have a very large WOE but contain only a tiny fraction of all individuals. That WOE does not represent the whole, because the segment contributes so little, and its IV will be very small. This is why IV is used as the measure of predictive power.
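A small numeric illustration of this point, with invented counts: a tiny segment can have a WOE of large magnitude and still contribute little IV, while a big segment with a milder WOE contributes much more. The helper name `woe_and_iv` and all numbers are assumptions for the example.

```python
import math


def woe_and_iv(good, bad, total_good, total_bad):
    """WOE and IV contribution of a single segment (good-over-bad convention)."""
    good_share = good / total_good
    bad_share = bad / total_bad
    woe = math.log(good_share / bad_share)
    return woe, (good_share - bad_share) * woe


# Hypothetical population: 10,000 good users and 1,000 bad users
total_good, total_bad = 10_000, 1_000

# Tiny segment: only 11 users, almost all of them bad
woe_small, iv_small = woe_and_iv(1, 10, total_good, total_bad)

# Large segment: 2,500 users with a milder skew toward bad
woe_large, iv_large = woe_and_iv(2_000, 500, total_good, total_bad)

print(round(woe_small, 3), round(iv_small, 3))
print(round(woe_large, 3), round(iv_large, 3))
```

The tiny segment has the more extreme WOE, but the large segment has the larger IV contribution, because IV weights WOE by the difference in population shares.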

These indicators have to be computed often, and rerunning them feature after feature was painful, so I needed something more efficient. I wrapped the indicators into a library that returns the results with a single call. Later I got even lazier and put a graphical interface on top, so a few light mouse clicks are all it takes. Laziness truly is the engine of human progress.

Source code: /lianxiangtao/KS_IV

If this article helped you, please don't be stingy with your likes. It will put me in a great mood.

WX: xianyu_splash. I use this WeChat official account to record my learning process and basic techniques, and to share daily inspiration and quality tools. You are welcome to follow!