CN110163743A

CN110163743A - A credit scoring method based on hyperparameter optimization

Info

Publication number: CN110163743A
Application number: CN201910347744.3A
Authority: CN
Inventors: 郭锐; 张祥; 赵熙
Original assignee: Titanium Rong Intelligent Technology (suzhou) Co Ltd
Current assignee: Titanium Rong Intelligent Technology (suzhou) Co Ltd
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2019-08-23

Abstract

The present invention relates to a credit scoring method based on hyperparameter optimization, comprising the steps of: S1, collecting information data on the scored subjects, preprocessing the data and performing feature selection, and producing a training dataset and a test dataset; S2, establishing a credit scoring model with the XGBoost algorithm, and optimizing the algorithm's hyperparameters using a Gaussian process combined with Bayesian optimization; S3, fixing the XGBoost algorithm to the optimal hyperparameter set and training the credit scoring model on the training dataset; S4, predicting with and evaluating the credit scoring model on the test dataset, and computing the credit score by the formula score = A - B*ln(p/(1-p)). During hyperparameter optimization, when the shape of the objective function curve cannot be determined, the invention makes an informed assumption that the objective function follows a multivariate Gaussian distribution and then further refines this assumed model. This improves the efficiency and reliability of hyperparameter optimization, speeds up model building, helps companies replace and iterate models more efficiently, and strengthens risk control capability.

Description

A credit scoring method based on hyperparameter optimization

Technical field

The present invention relates to the field of computer credit scoring technology, and in particular to a credit scoring method based on hyperparameter optimization.

Background art

With the rapid development of the internet lending industry, risk problems keep emerging. Controlling risk with models has become the preferred option for many enterprises: using a model for credit evaluation can greatly improve approval efficiency and save labor costs.

In the modeling process, tuning a model's hyperparameters consumes a great deal of time. The usual parameter optimization methods are grid search and random (Monte Carlo) search. Grid search cannot handle continuous parameters, and once the number of parameter combinations grows, the traversal time grows exponentially and becomes very costly; random search, for its part, cannot use prior knowledge to choose the next set of hyperparameters.
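
To make the exponential-growth claim concrete, the following Python sketch (standard library only) counts the model fits a full grid search would need. Only the four hyperparameter names and their ranges come from the embodiment later in this document; the grid step sizes are hypothetical. Note that learning_rate is continuous, so any grid can only sample a handful of its values.

```python
from itertools import product

# Hypothetical grids: names and ranges follow the embodiment, step sizes are assumptions.
GRIDS = {
    "min_child_weight": [1, 2, 3, 4, 5],
    "learning_rate": [0.01, 0.03, 0.05, 0.07, 0.1],  # continuous in reality; only 5 samples here
    "max_depth": list(range(1, 101, 10)),            # 1, 11, ..., 91
    "n_estimators": list(range(10, 501, 50)),        # 10, 60, ..., 460
}

def grid_size(grids):
    """Number of model fits a full grid search needs: the product of the grid lengths."""
    size = 1
    for values in grids.values():
        size *= len(values)
    return size

def all_combinations(grids):
    """Every hyperparameter combination, as dicts, in grid-search order."""
    names = list(grids)
    return [dict(zip(names, combo)) for combo in product(*grids.values())]

if __name__ == "__main__":
    print(grid_size(GRIDS))  # prints 2500 (5 * 5 * 10 * 10)
    # One more 10-value hyperparameter would multiply the count by 10 again.
```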

Summary of the invention

The purpose of the present invention is to provide a credit scoring method based on hyperparameter optimization, aiming to solve the problem that tuning model hyperparameters during modeling consumes a great deal of time.

To achieve the above object, the technical solution of the present invention is as follows:

A credit scoring method based on hyperparameter optimization, comprising the following steps:

S1, collecting information data on the scored subjects, preprocessing the data and performing feature selection, and producing a training dataset and a test dataset;

S2, modeling with the XGBoost algorithm, assuming that the evaluation function is a Gaussian process function, and selecting the optimal hyperparameter set using the EI (expected improvement) optimization criterion;

S3, training the credit scoring model with the selected optimal hyperparameter set on the training dataset;

S4, predicting with and evaluating the credit scoring model on the test dataset, and computing the credit score by the formula score = A - B*ln(p/(1-p)), where p is the model's predicted probability and A and B are constants.

In step S1, the information data on the scored subjects include application information, carrier (mobile operator) data, UnionPay user profile data and main strategy data; data preprocessing includes desensitization and WOE (weight of evidence) encoding, after which the training dataset and the test dataset are produced.
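
As an illustration of the WOE encoding step, here is a minimal Python sketch under stated assumptions: features have already been binned, labels follow the convention used in the embodiment (1 = bad/overdue, 0 = good), and WOE is defined as ln(share of goods in the bin / share of bads in the bin). Some practitioners use the inverse ratio, which only flips the sign. The smoothing constant eps is an added assumption that keeps bins with no goods or no bads from producing log(0).

```python
import math
from collections import Counter

def woe_table(bins, labels, eps=0.5):
    """WOE per bin: ln((goods in bin / all goods) / (bads in bin / all bads)).

    bins   : bin identifier for each sample (e.g. an age band)
    labels : 1 for a bad (overdue) sample, 0 for a good sample
    eps    : additive smoothing per bin; eps=0 gives the unsmoothed definition
    """
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    categories = sorted(set(bins))
    k = len(categories)
    total_good = sum(good.values()) + eps * k
    total_bad = sum(bad.values()) + eps * k
    return {
        b: math.log(((good[b] + eps) / total_good) / ((bad[b] + eps) / total_bad))
        for b in categories
    }

def woe_encode(bins, table):
    """Replace each raw bin value with its WOE, as a model-ready numeric feature."""
    return [table[b] for b in bins]
```

Riskier bins (relatively more bads) receive lower WOE values, which keeps the encoded feature monotone with respect to risk.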

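The assumed evaluation function is

$$y \sim \mathcal{N}\left(f(x), \sigma^2\right), \qquad f \sim \mathcal{GP}\left(m(x), k(x, x')\right),$$

where y is the output value, f is the Gaussian process function, $\mathcal{N}$ denotes a Gaussian distribution, f(x) is the prior probability model and σ² is the variance. For a set of data points $x_{1:n} = \{x_1, \dots, x_n\}$, the values of the assumed evaluation function $f_{1:n} = \{f(x_1), \dots, f(x_n)\}$ are described by a Gaussian distribution,

$$f_{1:n} \sim \mathcal{N}\left(m(x_{1:n}), \mathbf{K}\right), \qquad \mathbf{K}_{ij} = k(x_i, x_j),$$

where $x_n$ is the n-th input vector, $x_{1:n}$ is the matrix of input vectors 1 through n, $f(x_n)$ is the evaluation function value corresponding to $x_n$, $f_{1:n}$ is the set of evaluation function values for $x_1$ through $x_n$, m is the mean function and $\mathbf{K}$ is the covariance matrix built from the covariance function k.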
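
The Gaussian process prior becomes useful once it is conditioned on observed evaluations. The following Python/NumPy sketch computes the posterior mean μ(x) and standard deviation σ(x) at new points. It assumes a zero prior mean m(x) = 0, a squared-exponential (RBF) covariance k with unit signal variance, and a small noise variance; the patent does not specify the mean or covariance function, so these choices are illustrative only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance k(x, x') = exp(-||x - x'||^2 / (2 l^2)) between rows of a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_obs, y_obs, x_new, length_scale=1.0, noise_var=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at the rows of x_new.

    x_obs : (n, d) observed inputs x_1..x_n
    y_obs : (n,)   observed evaluation-function values
    x_new : (m, d) query inputs
    """
    y = np.asarray(y_obs, dtype=float)
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise_var * np.eye(len(y))
    k_star = rbf_kernel(x_obs, x_new, length_scale)          # (n, m) cross-covariances
    L = np.linalg.cholesky(K)                                # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))      # K^{-1} y
    mu = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = 1.0 - (v**2).sum(axis=0)                           # prior variance k(x, x) = 1
    return mu, np.sqrt(np.maximum(var, 0.0))
```

Near observed points σ(x) shrinks towards zero and μ(x) reproduces the observations; far from them the posterior falls back to the prior (mean 0, standard deviation 1).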

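According to the formula

$$\mathrm{EI}(x) = \mathbb{E}\left[\max\left(f(x) - f(x^+), 0\right)\right] = \begin{cases} \left(\mu(x) - f(x^+)\right)\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}, \qquad Z = \frac{\mu(x) - f(x^+)}{\sigma(x)},$$

where Φ(Z) is the standard normal cumulative distribution function, φ(Z) is the standard normal probability density function, $f(x^+)$ is the Gaussian process function value at the current best hyperparameter set $x^+$, f(x) is the Gaussian process function value, $\mathbb{E}$ denotes expectation, μ(x) is the mean of the Gaussian process function at x and σ(x) is its standard deviation at x. (When the quantity being optimized is a loss to be minimized, f is taken as the negated loss.) The EI optimization process is as follows: for given evaluation function inputs x, the Gaussian process function is used to update the posterior estimate of the assumed evaluation function; a new input value x is obtained by maximizing EI; the output y of the assumed evaluation function is recomputed at the new x; and this is repeated for a fixed number of iterations until convergence.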
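
The expected improvement criterion can be transcribed into Python (standard library only) as below, in its maximization form EI(x) = (μ - f⁺)Φ(Z) + σφ(Z) with Z = (μ - f⁺)/σ, and EI = 0 when σ = 0. The helper names, and the pick_next function that selects the EI-maximizing candidate from a finite list, are illustrative assumptions rather than part of the patent. When tuning a loss, pass negated losses so that f_best is the negated best loss.

```python
import math

def norm_pdf(z):
    """Standard normal probability density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    """EI(x) = (mu - f_best) * Phi(Z) + sigma * phi(Z), with Z = (mu - f_best) / sigma.

    mu, sigma : GP posterior mean and standard deviation at a candidate x
    f_best    : best value f(x+) observed so far (maximization convention)
    Returns 0 when sigma == 0, following the piecewise definition.
    """
    if sigma <= 0.0:
        return 0.0
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm_cdf(z) + sigma * norm_pdf(z)

def pick_next(candidates, posterior, f_best):
    """Return the candidate x that maximizes EI; posterior(x) must return (mu, sigma)."""
    return max(candidates, key=lambda x: expected_improvement(*posterior(x), f_best))
```

EI rewards both a high predicted mean (exploitation) and a high uncertainty (exploration): at equal mean, the candidate with larger σ(x) has larger EI.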

With the credit scoring method based on hyperparameter optimization of the present invention, when the shape of the objective function curve cannot be determined, an informed assumption is made that the objective function follows a multivariate Gaussian distribution, and this assumed model is then progressively refined. This improves the efficiency and reliability of hyperparameter optimization, speeds up model building, helps companies replace and iterate models more efficiently, and strengthens risk control capability.

Brief description of the drawings

Fig. 1 is a flow chart of the credit scoring method based on hyperparameter optimization in one embodiment of the present invention.

Detailed description of the embodiments

The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.

As shown in Fig. 1, the credit scoring method based on hyperparameter optimization of the present invention comprises the following steps:

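S1, collecting information data on the scored subjects, preprocessing the data and performing feature selection, and producing a training dataset and a test dataset;

S2, establishing a credit scoring model with the XGBoost algorithm, assuming that the evaluation function is a Gaussian process function, and selecting the optimal hyperparameter set using the EI optimization criterion;

S3, training the credit scoring model with the optimal hyperparameter set on the training dataset;

S4, predicting with and evaluating the credit scoring model on the test dataset, and computing the credit score.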

The information data on the scored subjects include application information, carrier data, UnionPay user profile data and main strategy data. Data preprocessing is carried out by desensitizing the data and applying WOE encoding. Of the processed data, 70% is used to train the model (the training dataset) and 30% to test it (the test dataset).
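
The 70/30 split can be sketched with the Python standard library as follows. The fixed random seed and the plain shuffle are assumptions made for reproducibility; the patent only states the proportions. With strongly imbalanced good/bad labels, a stratified split (splitting goods and bads separately with the same ratio) is a common alternative.

```python
import random

def split_70_30(samples, train_ratio=0.7, seed=42):
    """Shuffle preprocessed samples and split them into a training set and a test set."""
    rng = random.Random(seed)            # seeded so the split is reproducible
    shuffled = list(samples)             # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    cut = round(train_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```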

The specific method of optimizing the algorithm's hyperparameters with a Gaussian process combined with Bayesian optimization is as follows: assume that the evaluation function is a Gaussian process function, obtain a multivariate Gaussian distribution model by specifying its mean (expectation) function and covariance function, and then select the optimal hyperparameter set using the EI optimization criterion.

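The assumed evaluation function is

$$y \sim \mathcal{N}\left(f(x), \sigma^2\right), \qquad f \sim \mathcal{GP}\left(m(x), k(x, x')\right),$$

where y is the output value of the evaluation function, f is the Gaussian process function, $\mathcal{N}$ denotes a Gaussian distribution, f(x) is the prior probability model and σ² is the variance. For a set of data points $x_{1:n} = \{x_1, \dots, x_n\}$, the values of the assumed evaluation function $f_{1:n} = \{f(x_1), \dots, f(x_n)\}$ are described by a Gaussian distribution,

$$f_{1:n} \sim \mathcal{N}\left(m(x_{1:n}), \mathbf{K}\right), \qquad \mathbf{K}_{ij} = k(x_i, x_j),$$

where $x_n$ is the n-th input vector, $x_{1:n}$ is the matrix of input vectors 1 through n, $f(x_n)$ is the evaluation function value corresponding to $x_n$, $f_{1:n}$ is the set of evaluation function values for $x_1$ through $x_n$, m is the mean function and $\mathbf{K}$ is the covariance matrix built from the covariance function k.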

According to the formula

$$\mathrm{EI}(x) = \mathbb{E}\left[\max\left(f(x) - f(x^+), 0\right)\right] = \begin{cases} \left(\mu(x) - f(x^+)\right)\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}, \qquad Z = \frac{\mu(x) - f(x^+)}{\sigma(x)},$$

where x is the input value, $x^+$ is the current best hyperparameter set, Φ(Z) is the standard normal cumulative distribution function, φ(Z) is the standard normal probability density function, $f(x^+)$ is the Gaussian process function value at $x^+$, f(x) is the Gaussian process function value, $\mathbb{E}$ denotes expectation, μ(x) is the mean of the Gaussian process function at x and σ(x) is its standard deviation at x. The EI optimization process is as follows: for given evaluation function inputs x, the Gaussian process function is used to update the posterior estimate of the assumed evaluation function; a new x is obtained by maximizing EI; the output y of the assumed evaluation function is recomputed at the new x; and this is repeated for a fixed number of iterations until it converges to the optimal hyperparameter set.

Finally, the credit scoring model is validated on the test dataset, using evaluation metrics that include the ROC curve, the KS curve and the confusion matrix, and the credit score is computed by the formula score = A - B*ln(p/(1-p)), where p is the model's predicted probability and A and B are constants. A and B are obtained by substituting two known or assumed scores into the formula. Usually two assumptions are needed: first, the score expected at a specific odds ratio; second, the number of points by which the score changes when the odds double.
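
A worked sketch of how A and B follow from the two assumptions, in Python. The example numbers (600 points at bad:good odds of 1:50, and 20 points to double the odds, often called PDO) are hypothetical, not values from the patent. Since odds = p/(1-p) and p is the predicted probability of a bad (label 1) sample, doubling the odds lowers the score by B*ln 2, so B = PDO/ln 2 and A = base_score + B*ln(base_odds).

```python
import math

def scaling_constants(base_score, base_odds, pdo):
    """Solve A and B in score = A - B*ln(odds) from two assumptions:
    score == base_score when odds == base_odds, and doubling the odds costs pdo points."""
    b = pdo / math.log(2.0)
    a = base_score + b * math.log(base_odds)
    return a, b

def credit_score(p, a, b):
    """Credit score for a predicted bad probability p, with 0 < p < 1."""
    return a - b * math.log(p / (1.0 - p))

# Hypothetical calibration: 600 points at odds 1:50, 20 points to double the odds.
A, B = scaling_constants(600, 1 / 50, 20)
```

With these constants, p = 1/51 (odds 1:50) scores 600 and p = 2/52 (odds 2:50) scores 580; a higher bad probability always gives a lower score.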

In one embodiment, the information data on the scored subjects cover two kinds of dimensions: credit dimensions, namely age, job stability and credit card limit; and a risk dimension, namely the overdue label. Overdue samples are bad samples and non-overdue samples are good samples; good samples are labeled 0 and bad samples are labeled 1, and these labels are used as the target for model training.

A credit scoring model is established with the XGBoost algorithm, and model performance under different parameters is evaluated with the logarithmic loss

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\right],$$

where N is the number of samples, $y_i$ is the true value of the i-th sample and $p_i$ is the predicted value of the i-th sample. The hyperparameters with the greatest influence on the algorithm are selected for optimization, namely min_child_weight, learning_rate, max_depth and n_estimators, with the parameter ranges min_child_weight: (1, 5), learning_rate: (0.01, 0.1), max_depth: (1, 100) and n_estimators: (10, 500).
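
For reference, the log loss and the search space can be written in Python as follows. The vector order (min_child_weight, learning_rate, max_depth, n_estimators) is inferred from the example points in the next paragraph. The four names are real keyword arguments of XGBoost's scikit-learn wrapper XGBClassifier, but the to_params helper and its rule of rounding the two integer-valued hyperparameters are assumptions for illustration.

```python
import math

# Search ranges (lower, upper) from the embodiment; dict order defines the vector x.
SEARCH_SPACE = {
    "min_child_weight": (1, 5),
    "learning_rate": (0.01, 0.1),
    "max_depth": (1, 100),
    "n_estimators": (10, 500),
}

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss: -(1/N) * sum(y_i*ln(p_i) + (1 - y_i)*ln(1 - p_i))."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)          # clip so ln never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def to_params(x):
    """Map x = (min_child_weight, learning_rate, max_depth, n_estimators) to named parameters."""
    params = dict(zip(SEARCH_SPACE, x))
    for name in ("max_depth", "n_estimators"):   # integer-valued in XGBoost
        params[name] = int(round(params[name]))
    return params
```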

The Gaussian process function is initialized by choosing initial x values, for example x1 = (1, 0.01, 1, 10) and x2 = (1, 0.03, 2, 12), where each vector lists (min_child_weight, learning_rate, max_depth, n_estimators). The Gaussian process function is used to update the posterior estimate of the assumed evaluation function; then, under the EI optimization criterion, the x that maximizes EI is computed and the function f is evaluated at that x. Whether the target is met is decided according to whether the loss function has reached its minimum or the iteration limit has been reached: if it is met, x is saved; if not, the Gaussian process function is updated again. This yields the best hyperparameter values, for example x = (1, 0.01, 1, 10). Finally, the XGBoost parameters are fixed to the best hyperparameter values x = (1, 0.01, 1, 10), and the credit scoring model is trained on the training dataset.
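
The loop described above can be sketched end to end in Python/NumPy. To keep the sketch self-contained and runnable, the real objective (train XGBoost with the candidate hyperparameters and measure its log loss) is replaced by a hypothetical stand-in, toy_loss. Other assumptions not fixed by the patent: inputs are rescaled to the unit cube, the GP uses a squared-exponential kernel with a fixed length scale and the sample mean as prior mean, EI is maximized by scoring random candidate points, and the loss is negated so that the maximization form of EI applies. The two starting points are x1 and x2 from the text.

```python
import numpy as np
from math import erf, pi, sqrt

# Bounds in the order (min_child_weight, learning_rate, max_depth, n_estimators).
LOWER = np.array([1.0, 0.01, 1.0, 10.0])
UPPER = np.array([5.0, 0.1, 100.0, 500.0])

def to_unit(x):
    """Rescale a hyperparameter vector to [0, 1]^4 so one kernel length scale fits all axes."""
    return (np.asarray(x, dtype=float) - LOWER) / (UPPER - LOWER)

def from_unit(u):
    return LOWER + np.asarray(u, dtype=float) * (UPPER - LOWER)

def toy_loss(x):
    """Hypothetical stand-in for the validation log loss of XGBoost trained with x."""
    u = to_unit(x)
    return float(0.3 + np.sum((u - np.array([0.2, 0.5, 0.1, 0.6])) ** 2))

def gp_posterior(u_obs, y_obs, u_new, length_scale=0.3, noise_var=1e-6):
    """GP posterior mean/std at u_new; prior mean = sample mean, RBF kernel, unit variance."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / length_scale**2)
    K = kernel(u_obs, u_obs) + noise_var * np.eye(len(u_obs))
    k_star = kernel(u_obs, u_new)
    y_mean = y_obs.mean()
    mu = y_mean + k_star.T @ np.linalg.solve(K, y_obs - y_mean)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))   # floor avoids division by zero in EI

_erf = np.vectorize(erf)

def expected_improvement(mu, sigma, f_best):
    """Vectorized EI for maximization."""
    z = (mu - f_best) / sigma
    cdf = 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    return (mu - f_best) * cdf + sigma * pdf

def bayes_opt(loss, init_points, n_iter=25, n_candidates=2000, target=None, seed=0):
    """Minimize loss with GP + EI; returns (best x, best loss, number of evaluations)."""
    rng = np.random.default_rng(seed)
    U = np.array([to_unit(x) for x in init_points])
    y = np.array([-loss(x) for x in init_points])         # maximize the negated loss
    for _ in range(n_iter):                                # fixed iteration budget
        cand = rng.random((n_candidates, U.shape[1]))      # random candidates in the unit cube
        mu, sigma = gp_posterior(U, y, cand)
        u_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
        U = np.vstack([U, u_next])
        y = np.append(y, -loss(from_unit(u_next)))
        if target is not None and -y[-1] <= target:        # stop early once the loss target is met
            break
    best = int(np.argmax(y))
    return from_unit(U[best]), float(-y[best]), len(y)
```

In practice toy_loss would be replaced by a function that, for example, builds an XGBoost classifier with these four parameters (rounding max_depth and n_estimators to integers), fits it on the training data and returns the log loss on held-out data.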

In the credit scoring method based on hyperparameter optimization of the present invention, the evaluation function is assumed to follow a Gaussian distribution during hyperparameter optimization; input values x are chosen by maximizing EI, the Gaussian process function value is computed, whether the target is met is checked, and the Gaussian process function is updated iteratively. Thus, when the shape of the objective function curve cannot be determined, the method makes an informed assumption that the objective function follows a multivariate Gaussian distribution and then progressively refines this assumed model.

The embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above is only a specific embodiment of the present invention and is not intended to limit it; any modification, supplement or equivalent replacement made within the spirit of the present invention shall fall within the protection scope of the present invention.

Claims

1. A credit scoring method based on hyperparameter optimization, characterized by comprising the following steps:

S1, collecting information data on the scored subjects, preprocessing the data and performing feature selection, and producing a training dataset and a test dataset;

S2, establishing a credit scoring model with the XGBoost algorithm, assuming that the hyperparameter evaluation function is a Gaussian process function, and selecting the optimal hyperparameter set using the EI optimization criterion;

S3, training the credit scoring model with the optimal hyperparameter set on the training dataset;

S4, predicting with and evaluating the credit scoring model on the test dataset, and computing the credit score by the formula score = A - B*ln(p/(1-p)), where p is the model's predicted probability and A and B are constants.

2. The credit scoring method based on hyperparameter optimization according to claim 1, characterized in that: in step S1, the information data on the scored subjects include application information, carrier data, UnionPay user profile data and main strategy data; data preprocessing includes desensitization and WOE encoding; and feature selection means selecting, from all fields, the fields used for modeling, including age and education from the application information and, from the carrier data, for example network tenure and call detail records for the most recent month.

3. The credit scoring method based on hyperparameter optimization according to claim 1, characterized in that: the assumed hyperparameter evaluation function is

$$y \sim \mathcal{N}\left(f(x), \sigma^2\right), \qquad f \sim \mathcal{GP}\left(m(x), k(x, x')\right),$$

where y is the value of the evaluation function, f is the Gaussian process function, $\mathcal{N}$ denotes a Gaussian distribution, f(x) is the prior probability model and σ² is the variance; for a set of data points $x_{1:n} = \{x_1, \dots, x_n\}$, the values of the assumed evaluation function $f_{1:n} = \{f(x_1), \dots, f(x_n)\}$ are described by a Gaussian distribution,

$$f_{1:n} \sim \mathcal{N}\left(m(x_{1:n}), \mathbf{K}\right), \qquad \mathbf{K}_{ij} = k(x_i, x_j),$$

where $x_n$ is the n-th input vector, $x_{1:n}$ is the matrix of input vectors 1 through n, $f(x_n)$ is the evaluation function value corresponding to $x_n$, $f_{1:n}$ is the set of evaluation function values for $x_1$ through $x_n$, m is the mean function and $\mathbf{K}$ is the covariance matrix built from the covariance function k.

4. The credit scoring method based on hyperparameter optimization according to claim 3, characterized in that: according to the formula

$$\mathrm{EI}(x) = \mathbb{E}\left[\max\left(f(x) - f(x^+), 0\right)\right] = \begin{cases} \left(\mu(x) - f(x^+)\right)\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}, \qquad Z = \frac{\mu(x) - f(x^+)}{\sigma(x)},$$

where $f(x^+)$ is the Gaussian process function value at the current best hyperparameter set $x^+$, f(x) is the Gaussian process function value, $\mathbb{E}$ denotes expectation, μ(x) is the mean of the Gaussian process function at x, σ(x) is its standard deviation at x, Φ(Z) is the standard normal cumulative distribution function and φ(Z) is the standard normal probability density function; for given evaluation function inputs x, the Gaussian process function is used to update the posterior estimate of the assumed evaluation function, a new evaluation function input x is obtained by maximizing EI, the output y of the assumed evaluation function is recomputed at the new input x, and this is repeated for a fixed number of iterations until convergence.