CN110866819A - Automatic credit scoring card generation method based on meta-learning - Google Patents
- Publication number: CN110866819A
- Application number: CN201910991618.1A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The invention discloses an automatic credit scoring card generation method based on meta-learning, comprising the following steps. First: clean and normalize the data, establish the association relationships among the multiple tables of internal and external data through the association variables shared between data tables, and separate the test data from the training data. Second: calculate the meta-features of the input data set, find the data sets similar to the input data set, and initialize the search space with the parameters output from their corresponding parameter configurations. Third: run a parameter tuning strategy based on the Hyperband method, sampling in the search space to generate a set of parameter samples. Fourth: perform a binning operation on each variable, apply the WOE transformation to the input data based on the binning result, fit the training data with logistic regression, and construct the final score calculation of the scoring card from the fitted parameters. The invention reduces the manual effort required in the modeling process, improves modeling efficiency, and ensures the prediction accuracy of the model.
Description
Technical Field
The invention relates to an automatic credit scoring card generation method based on meta-learning, in particular to an automatic modeling method for pre-loan application scoring cards. It belongs to the field of machine learning and data mining and specifically concerns a modeling method based on the logistic regression algorithm and automated machine learning.
Background
In the financial field, whether in investment and financing or in lending, risk control is always the foundation of the business. Consumer finance mainly serves customers characterized by small loan amounts, a large population, and short terms, so it is recognized as one of the highest-risk market segments. As technologies such as artificial intelligence and big data continue to spread, financial technology is used to actively collect, analyze, and organize financial data and to provide more accurate risk control services for segmented customer groups, which has become an effective way to address risk control in consumer finance. Anti-fraud is a key part of risk control: once the anti-fraud step fails, the economic loss can be very large. Existing anti-fraud strategies are generated by manual, experience-based judgment. However, with the rapid growth in applicants and the continuous expansion of the dimensions of application data, purely manual methods find it increasingly difficult to produce effective anti-fraud strategies. With the development of artificial intelligence and the arrival of the data era, data-driven methods will become the mainstream way to generate anti-fraud strategies.
The credit scoring card model is an advanced technique in consumer credit management and one of the core management technologies of businesses involved in consumer credit, such as banks, credit card companies, personal consumer credit companies, telecommunication companies, and insurance companies. It is widely applied in credit card life-cycle management, automobile loan management, housing loan management, personal loan management, and other consumer credit management, and plays an important role in marketing, credit approval, risk management, account management, customer relationship management, and related areas.
The credit scoring model uses data mining techniques and statistical analysis to systematically analyze large volumes of data such as consumers' demographic characteristics, credit history, and transaction records, to mine the behavior patterns and credit characteristics contained in the data, and to capture the relationship between historical information and future credit performance. A predictive model is then established, and a credit score is used to give an overall assessment of a consumer's future credit performance. Credit scoring is essentially a classification problem in pattern recognition: businesses or individual consumers are divided into two classes, those who repay on schedule ("good" customers) and those who default ("bad" customers). From historical samples of each class (for example, repaid on time and defaulted), the characteristics of defaulters and non-defaulters are identified in the known data, classification rules are derived, and a machine learning model is built to measure the borrower's default risk (or default probability) and to provide a basis for consumer credit decisions.
Because logistic regression models are interpretable, building the scoring card model with logistic regression is currently the common solution in the industry. A typical modeling flow has six steps. First, problem preparation: the definitions of default and normal users, the scope and source of the data, and so on are determined from the historical performance of the specific credit product. Second, data preparation: the data needed for modeling is acquired; besides application data and internal enterprise data, external data such as credit bureau data and external scores can be obtained. As the number of external data sources grows, selecting the most suitable and most valuable external resources becomes more important. To examine the data and understand its characteristics, modelers usually carry out a series of exploratory data analysis (EDA) tasks: evaluating the univariate statistics of the candidate predictive variables and the distribution of their values over the variable range; calculating the default rate distribution within each category or segment of each candidate predictive variable; and determining the relationships among variables through contingency tables, association tables, correlation indices, and the like. Third, data preprocessing: a large amount of data cleaning and transformation is usually required to produce a single data set containing all elements needed to develop the scoring card and to create predictors (independent variables) with strong predictive power.
Weight of evidence (WOE) transformation is a data preparation step specific to scoring card development. All variables used in the scoring card must be WOE-transformed: the cardinality of categorical variables is reduced, continuous numerical variables are segmented, and so on, which amounts to a coarse classing of all variables in the data. Fourth, variable selection: the result of data preparation is a modeling view containing many candidate independent variables, but not all of them can be used in the model in practice. Most credit providers have abundant data and therefore need to screen the variables with the strongest predictive power out of hundreds or even thousands of candidates. Fifth, model development: the standard scoring card is based on a logistic regression model. Logistic regression is essentially an extension of linear regression; it predicts default status by fitting the WOE-transformed independent variables, from which the final score is obtained. Sixth, model validation: the constructed prediction model must reach an acceptable level of accuracy and must be robust enough to adapt to a wider range of data sets, and it should also be checked in terms of business variables and periodic predicted values. The constructed model therefore needs to go through multiple validation operations.
Traditional high-quality manual modeling relies heavily on human intervention, including understanding of the data, substantial domain expertise, and sufficient modeling experience. Data preparation, feature engineering, and similar steps also consume a great deal of time and effort. As hardware computing speed increases and machine learning algorithms improve, the demand for data in every industry grows, and the requirements for data processing and analysis become more and more stringent.
To solve these problems, an automatic credit scoring card modeling tool is provided that combines automated machine learning with the credit scoring card. It uses meta-learning, an automatic feature derivation module, and related methods to automate the steps of scoring card modeling, including data processing, feature engineering, model selection, hyperparameter tuning, model building, and scoring card construction. This greatly reduces the time and effort users spend on repetitive, time-consuming work and improves modeling efficiency. It also lowers the modeling threshold, so that a well-performing scoring card model can be built without machine learning expertise and the business can be better developed.
Disclosure of Invention
In view of these problems, the invention provides an automatic credit scoring card generation method based on meta-learning. It builds on automated machine learning theory, incorporates practical business experience, and applies machine learning algorithms to the business scenario of pre-loan scoring cards. Given an input data set that follows a specified format and contains association relationships, it automatically performs data preprocessing, feature engineering, hyperparameter optimization, model selection, and related functions, and outputs a binary classification prediction for the data. A scoring model is then built using the scoring card functions, and the key files produced during modeling are assembled automatically into a scoring card report.
To achieve this purpose, the automatic credit scoring card generation method based on meta-learning adopts the following technical scheme:
The invention is essentially a process for constructing a credit scoring business logic model with automated machine learning, and its core is to automate each step of scoring card modeling. The main processes of the invention are data preprocessing, meta-learner construction, controller construction, and scoring card construction; the specific implementation process is shown in FIG. 1.
The first step: data preprocessing. The following operations are carried out: (1) identify the type of each variable in each table; (2) find the association variables and association relationships among the tables, which fall into four types: one-to-one, one-to-many, many-to-one, and many-to-many; (3) identify and separate the training data and the test data in the input data; (4) apply missing value handling, outlier detection, and data standardization appropriate to each variable type.
The input data set contains a main table and several associated tables, with time-stamped variables. The associated tables contain valuable auxiliary information about the instances in the main table and can be used to improve the predictive performance of the model. Any two tables (main or associated) may be related, and any pair of tables has at most one relationship. The preprocessing stage cleans and normalizes the data, then establishes the association relationships among the multiple tables of internal and external data through the association variables shared between data tables, and separates test data from training data.
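Operation (2) above, classifying the relationship between two tables, can be illustrated with a minimal sketch. It assumes each table is joined on a single key column and that the relationship type is decided by whether key values repeat on each side; the function name `relationship_type` and the sample tables are hypothetical and do not come from the patent.

```python
from collections import Counter


def relationship_type(left_keys, right_keys):
    """Classify the relationship between two tables joined on one key column.

    left_keys / right_keys are the key values found in each table
    (one entry per row). Only keys present in both tables are considered.
    """
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    shared = set(left_counts) & set(right_counts)
    if not shared:
        return "none"
    left_unique = all(left_counts[k] == 1 for k in shared)
    right_unique = all(right_counts[k] == 1 for k in shared)
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


# Hypothetical example: one application row per user, several loan rows per user.
applications = ["u1", "u2", "u3"]
loan_history = ["u1", "u1", "u2", "u3", "u3"]
print(relationship_type(applications, loan_history))
```

A real implementation would also have to discover which columns are candidate keys; this sketch takes the key column as given.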
The second step: constructing the meta-learner. The meta-features of the input data set are calculated, the data sets similar to the input data set are found by the meta-learner, and the search space is initialized with the parameters output from their corresponding parameter configurations.
The meta-learner provides experience-based hyperparameter guidance: it initializes the range of the subsequent parameter search and improves tuning efficiency by "warm starting" the tuning process. The meta-learner stores different data sets from the same business scenario together with their meta-feature data and the parameter configurations under which the model performed well. Given the meta-features of the data set to be modeled, it finds the similar data sets with the K-Nearest Neighbors method and forms the parameter search space from their parameter configurations. The specific process is as follows:
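The distance between the meta-feature vectors of two data sets is first calculated as the Euclidean distance:
$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{m} \left(x_i^{(l)} - x_j^{(l)}\right)^2}$$
where $x_i^{(l)}$ is the $l$-th of the $m$ meta-features of data set $i$.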
The resulting similar sample set is $N_k(x)$, and the parameter set is obtained by the following formula:
The parameters provided by the meta-learner include the depth max_depth of the subsequent feature synthesis, the aggregation primitives agg_primitives used for feature synthesis, and so on.
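The warm-start lookup described above can be sketched as a nearest-neighbour search over stored meta-feature vectors. This is only an illustration under stated assumptions: the stored records in `knowledge_base` are invented, and merging the neighbours' configurations by taking the union of their values is an assumption, since the combining formula is not reproduced in the text.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long meta-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def warm_start_space(query, knowledge_base, k=2):
    """Build an initial search space from the k data sets nearest to `query`.

    Assumption: the neighbours' parameter configurations are merged by
    taking the union of their values.
    """
    ranked = sorted(knowledge_base, key=lambda rec: euclidean(query, rec["meta"]))
    neighbours = ranked[:k]
    depths = sorted({rec["params"]["max_depth"] for rec in neighbours})
    prims = sorted({p for rec in neighbours for p in rec["params"]["agg_primitives"]})
    return {"max_depth": depths, "agg_primitives": prims}


# Hypothetical stored experience: meta-features plus a well-performing configuration.
knowledge_base = [
    {"meta": [0.003, 1.0, 0.5], "params": {"max_depth": 2, "agg_primitives": ["mean", "max"]}},
    {"meta": [0.004, 0.9, 0.45], "params": {"max_depth": 4, "agg_primitives": ["skew", "mean"]}},
    {"meta": [0.5, 0.1, 0.1], "params": {"max_depth": 6, "agg_primitives": ["std"]}},
]
space = warm_start_space([0.003, 1.0, 0.5], knowledge_base, k=2)
print(space)
```

In practice the meta-features would usually be rescaled before computing distances, because raw values such as the number of instances and class frequencies live on very different scales.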
The third step: constructing the controller. The controller runs a parameter optimization strategy based on the Hyperband method: it samples in the search space to generate a set of parameter samples, screens the samples according to model performance in each evaluation round, and selects the parameter combination that gives the best model performance. Each evaluation round consists of constructing features with Featuretools, screening features with a random forest, training the model with LightGBM, and evaluating the performance of the parameters.
Hyperband extends the Successive Halving algorithm proposed by Jamieson & Talwalkar (2015). Successive Halving works as follows: given n hyperparameter combinations, the budget is distributed uniformly over them and each is validated; the worse-performing half is eliminated according to the validation results; and the process is repeated until a final best combination is found. Hyperband improves on this idea: it completes tuning under a resource constraint and allocates more resources in each round to the hyperparameter combinations that perform better. The output of the meta-learner is used as the initial search space of Hyperband. Parameter samples are drawn from the search space and screened according to model performance in each evaluation round, and the search space is narrowed step by step until the parameter combination that gives the best model performance is selected. The evaluation of each parameter sample includes feature synthesis, feature screening, and related operations. Feature synthesis is based on Featuretools, a framework for automated feature generation that converts a data set into a feature matrix usable for machine learning. Deep feature synthesis stacks multiple transformation and aggregation operations, called feature primitives in Featuretools terminology, to construct new features from data spread across multiple tables.
In this way all of a user's information can be combined in one table; for example, with the user name as the association key, features can be derived from the user's application information, historical loan information, credit card repayment information, and so on. Featuretools can automatically derive a large number of features, some of which may have little modeling value, and a data set with too many features is prone to overfitting. After training, a random forest yields an importance value for each feature; a threshold is determined from these importances, the features most helpful for model training are selected, and the model is trained on the retained variables. LightGBM is then used to train the model and evaluate its predictions, and the parameters are screened according to predictive performance. Finally the controller outputs the feature synthesis depth max_depth, the feature synthesis primitives agg_primitives, and the screened feature variables that gave the best model performance during evaluation.
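The Successive Halving procedure that Hyperband builds on can be sketched in a few lines. This is a simplified, single-bracket illustration, not the patent's controller: `toy_evaluate` is a contrived stand-in for the real evaluation round (feature synthesis, feature screening, LightGBM training), and its scoring rule is invented purely so the example is deterministic.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations each round.

    The budget passed to `evaluate` is multiplied by eta after every round,
    so surviving configurations receive more resources.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0]


def toy_evaluate(cfg, budget):
    """Hypothetical score; a real round would train a model and return e.g. AUC."""
    return 0.8 - 0.05 * abs(cfg["max_depth"] - 2) + 0.01 * len(cfg["agg_primitives"])


# Candidate parameter samples drawn from a small, made-up search space.
configs = [
    {"max_depth": d, "agg_primitives": p}
    for d in (2, 3, 4)
    for p in (["mean"], ["skew", "mean"])
]
best = successive_halving(configs, toy_evaluate)
print(best)
```

Hyperband runs several such brackets, each trading off the number of sampled configurations against the starting budget per configuration; only one bracket is shown here.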
The fourth step: constructing the scoring card. The scoring card section first performs a binning operation on each variable. The WOE transformation is then applied to the input data based on the binning result, the training data is fitted with logistic regression, and the final score calculation of the scoring card is constructed from the fitted parameters.
Binning discretizes continuous numerical variables into segments and merges the categories of high-cardinality categorical variables. Chi-square binning is a bottom-up, merge-based data discretization method that relies on the chi-square test: the adjacent bins with the smallest chi-square value are merged, repeatedly, until a stopping criterion is met.
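The merge loop of chi-square binning can be sketched as follows. The sketch assumes the variable has already been cut into fine initial bins with good/bad counts, and it uses a maximum number of bins as the stopping criterion; the patent does not state which stopping criterion it uses, and the counts below are invented.

```python
def chi2_pair(a, b):
    """Chi-square statistic of the 2x2 table formed by two adjacent bins.

    a and b are (good_count, bad_count) pairs.
    """
    total = sum(a) + sum(b)
    chi2 = 0.0
    for col in (0, 1):
        col_total = a[col] + b[col]
        for row in (a, b):
            expected = sum(row) * col_total / total
            if expected > 0:
                chi2 += (row[col] - expected) ** 2 / expected
    return chi2


def chi_merge(bins, max_bins):
    """Merge adjacent bins with the smallest chi-square until max_bins remain.

    bins: list of (upper_edge, good_count, bad_count), sorted by upper_edge.
    """
    bins = [list(b) for b in bins]
    while len(bins) > max_bins:
        scores = [chi2_pair(bins[i][1:], bins[i + 1][1:]) for i in range(len(bins) - 1)]
        i = scores.index(min(scores))
        right = bins.pop(i + 1)
        # The merged bin keeps the right-hand edge and the summed counts.
        bins[i] = [right[0], bins[i][1] + right[1], bins[i][2] + right[2]]
    return [tuple(b) for b in bins]


# Hypothetical initial bins of an "age" variable: (upper edge, good, bad).
initial = [(20, 10, 10), (30, 12, 11), (40, 30, 5), (50, 28, 6)]
merged = chi_merge(initial, max_bins=2)
print(merged)
```

Bins with similar good/bad ratios produce small chi-square values and are merged first, so the surviving boundaries sit where the default rate actually changes.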
Based on the binning results, a logistic model is constructed using the weight of evidence (WOE) of the bin to which each variable value belongs. The expression of WOE is:
where $P_{good}$ is the proportion of good users and $P_{bad}$ is the proportion of bad users.
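As an illustration of how a per-bin WOE value can be computed from the two proportions named above, the sketch below assumes one widely used convention, WOE = ln(P_good / P_bad), where each proportion is the bin's share of all good (or all bad) users. The sign convention and the sample counts are assumptions; the patent's own expression is not reproduced in the text.

```python
import math


def woe_per_bin(bins):
    """Return the WOE of each bin, assuming WOE = ln(P_good / P_bad).

    bins: list of (good_count, bad_count), one pair per bin.
    P_good is the bin's share of all good users, P_bad its share of all
    bad users. Bins with a zero count would need smoothing, which this
    sketch does not handle.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    result = []
    for good, bad in bins:
        p_good = good / total_good
        p_bad = bad / total_bad
        result.append(math.log(p_good / p_bad))
    return result


# Hypothetical counts for two bins of one variable.
woe = woe_per_bin([(22, 21), (58, 11)])
print(woe)
```

Under this convention a bin with relatively more bad users than good users gets a negative WOE, and a bin whose good and bad shares are equal gets a WOE of zero.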
The result of the model regression is
$$\ln(\text{odds}) = \beta_0 + \beta_1 \text{WOE}_1 + \dots + \beta_n \text{WOE}_n$$
The score of the scoring card can be expressed as a linear function of the log odds:
$$\text{Score}_{total} = A + B \ln(\text{odds})$$
Given PDO, the increase in score when the odds double:
$$\text{Score}_{total} + \text{PDO} = A + B \ln(2 \cdot \text{odds})$$
the base score A and the score coefficient B are obtained by solving these two simultaneous equations.
The formula for obtaining the final score of the scoring card from the logistic regression result is:
$$\text{Score}_{total} = A + B(\beta_0 + \beta_1 \text{WOE}_1 + \dots + \beta_n \text{WOE}_n)$$
where A and B are known constants, $\beta_i$ are the logistic regression coefficients, and $\text{WOE}_i$ is the WOE value of the bin into which variable $i$ falls.
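The scaling constants can be worked out directly from the two score equations: subtracting the first from the second gives PDO = B ln 2, so B = PDO / ln 2, and A follows once one reference point is fixed. The sketch below assumes such a reference point (a chosen score at chosen odds); the parameters `base_score` and `base_odds` and all numeric values are hypothetical and are not given in the patent.

```python
import math


def scaling_constants(pdo, base_score, base_odds):
    """Solve Score = A + B*ln(odds) for A and B.

    pdo: points added when the odds double, so B = pdo / ln(2).
    base_score / base_odds: an assumed reference point on the score scale.
    """
    b = pdo / math.log(2)
    a = base_score - b * math.log(base_odds)
    return a, b


def total_score(a, b, intercept, coefs, woes):
    """Score = A + B * (beta_0 + sum_i beta_i * WOE_i)."""
    log_odds = intercept + sum(c * w for c, w in zip(coefs, woes))
    return a + b * log_odds


# Hypothetical calibration: 600 points at odds 50, 20 points to double the odds.
a, b = scaling_constants(pdo=20, base_score=600, base_odds=50)
print(total_score(a, b, intercept=1.2, coefs=[0.8, 1.1], woes=[0.75, -0.4]))
```

Because the score is linear in the WOE terms, each bin's contribution B * beta_i * WOE_i can be precomputed and listed on the card as a fixed number of points.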
The evaluation part of the scoring card model uses several evaluation criteria, such as:
(1) Discrimination
KS value: KS = max(TPR - FPR). The larger the KS value, the stronger the model's ability to separate positive and negative samples.
(2) Accuracy
Confusion matrix, ROC curve, and AUC value: the higher the AUC value, the more accurate the model's risk prediction.
(3) Stability of the model
PSI: measures the change in the population distribution across score intervals over time; the smaller the change, the more stable the model.
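Two of the criteria above can be sketched in plain Python. The KS function follows the definition given in the text, KS = max(TPR - FPR), evaluated at every score threshold. The PSI function uses a commonly used definition, the sum over score intervals of (actual - expected) * ln(actual / expected); that formula is an assumption, since the patent does not spell it out, and the sample data is invented.

```python
import math


def ks_statistic(labels, scores):
    """KS = max over thresholds of (TPR - FPR).

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher means more likely positive. Both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Consume all samples sharing the same score before measuring.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        best = max(best, tp / pos - fp / neg)
        i = j
    return best


def psi(expected_share, actual_share):
    """Population stability index over matching score intervals.

    Both arguments are lists of proportions that are strictly positive.
    """
    return sum((a - e) * math.log(a / e) for e, a in zip(expected_share, actual_share))


ks = ks_statistic([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
shift = psi([0.5, 0.5], [0.6, 0.4])
print(ks, shift)
```

A PSI of zero means the two populations are distributed identically across the score intervals; larger values indicate a larger shift.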
The scoring card section can generate model report files such as the binning results, the scoring results, and a model performance evaluation report.
Compared with the prior art, the automatic credit scoring card generation method based on meta-learning has the following advantages and effects: (1) each modeling step is carried out with automated machine learning, which simplifies the modeling flow and reduces the manual effort required; (2) a meta-learner is added to initialize the parameter tuning search space, and this warm start saves tuning time and improves modeling efficiency; (3) the feature synthesis parameters are evaluated with the Hyperband tuning method, which ensures the prediction accuracy of the model.
Drawings
FIG. 1 is a flow chart of the automatic scoring card generation method of the present invention.
FIG. 2 shows the scoring card model evaluation: ROC plot.
FIG. 3 shows the scoring card model evaluation: PSI plot.
FIG. 4 shows the scoring card model evaluation: KS plot.
Table 1 shows the meta-feature calculation results (partial).
Table 2 shows the parameter space output results.
Table 3 shows the features generated by the controller (partial).
Table 4 shows the variable binning results (partial).
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and examples. The disclosed embodiments are merely examples of the invention and do not limit its full scope. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Example:
The specific implementation process of the invention is explained below, taking the credit scoring card application data of a finance company as an example.
First, data preprocessing
The input data set typically comprises an application main table and several associated tables that provide external data. The preprocessing part reads the variables of each table and performs the following operations: (1) identify the type of each variable in each table, such as categorical, numerical, or timestamp; (2) find the association variables and association relationships among the tables, which fall into four types: one-to-one, one-to-many, many-to-one, and many-to-many; (3) identify and separate the training data and the test data in the input data; (4) apply missing value handling, outlier detection, data standardization, and similar operations appropriate to each variable type.
Second, constructing the meta-learner
The meta-learner first calculates the meta-features of the input data and then initializes the search space with the parameters output from the parameter configurations of the similar data sets it stores. The meta-feature calculation results for the input data are shown in Table 1, and the parameter space output by the meta-learner is shown in Table 2.
Meta-feature name | Meta-feature value |
attr_to_inst | 0.003 |
cat_to_num | 1.0 |
freq_class.mean | 0.5 |
inst_to_attr | 333.3 |
nr_attr | 6 |
nr_cat | 3 |
nr_class | 2 |
nr_inst | 2000 |
nr_num | 3 |
num_to_cat | 1.0 |
TABLE 1
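The ratio entries in Table 1 follow from the count entries, which the short sketch below reproduces. The function name and the class counts are hypothetical: for a two-class problem the mean class frequency is 0.5 whatever the class balance, so the actual class counts cannot be read off the table.

```python
def simple_meta_features(n_rows, n_numeric, n_categorical, class_counts):
    """Compute the simple meta-features listed in Table 1 from basic counts."""
    nr_attr = n_numeric + n_categorical
    return {
        "nr_inst": n_rows,
        "nr_attr": nr_attr,
        "nr_num": n_numeric,
        "nr_cat": n_categorical,
        "nr_class": len(class_counts),
        "attr_to_inst": nr_attr / n_rows,
        "inst_to_attr": n_rows / nr_attr,
        "cat_to_num": n_categorical / n_numeric,
        "num_to_cat": n_numeric / n_categorical,
        "freq_class.mean": sum(c / n_rows for c in class_counts) / len(class_counts),
    }


# Counts taken from Table 1; the 1000/1000 class split is an assumed placeholder.
mf = simple_meta_features(n_rows=2000, n_numeric=3, n_categorical=3, class_counts=[1000, 1000])
print(mf)
```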
Parameter name | Parameter value |
max_depth | [2,4] |
agg_primitives | ['skew','mode','max','mean','min'] |
TABLE 2
Third, constructing the controller
The controller runs a parameter optimization strategy based on the Hyperband method: it samples in the search space to generate a set of parameter samples, screens the samples according to model performance in each evaluation round, and selects the parameter combination that gives the best model performance. Each evaluation round includes constructing features with Featuretools, screening features with a random forest, training a model with LightGBM, and evaluating the performance of the parameters. Evaluation shows that the parameter configuration max_depth = 2, agg_primitives = ['skew', 'mean'] performs best; the output of the feature engineering part is shown in Table 3.
TABLE 3
Fourth, constructing the scoring card
The scoring card section first bins each variable; partial binning results are shown in Table 4. The WOE transformation is then applied to the input data based on the binning result, the training data is fitted with logistic regression, and the final score calculation of the scoring card is constructed from the fitted parameters. The evaluation plots of the scoring card model are shown in FIGS. 2, 3 and 4.
TABLE 4
According to the binning and model evaluation results, the scoring card model reaches a predicted AUC of 0.7805, the PSI between the training data and the test data is 0.0276, and the KS value of the model is 0.2125, so the requirements on risk prediction accuracy, model stability, and the ability to separate positive and negative samples can be met.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the technical scope of the present invention, so that any minor modifications, equivalent changes and modifications made to the above embodiment according to the technical spirit of the present invention are within the technical scope of the present invention.
Claims (5)
1. An automatic credit scoring card generation method based on meta-learning, characterized in that the method comprises the following steps:
the first step: data preprocessing: cleaning and organizing the data, then establishing the association relationships among the multiple tables of internal and external data through the association variables between data tables, and separating test data from training data;
the second step: constructing a meta-learner: calculating the meta-features of the input data set, finding the data sets similar to the input data set with the meta-learner, and initializing the search space with the parameters output from the corresponding parameter configurations;
the third step: constructing a controller: the controller runs a parameter tuning strategy based on the Hyperband method, samples in the search space to generate a set of parameter samples, screens the parameter samples according to model performance in each evaluation round, and selects the parameter combination that gives the best model performance; each evaluation round comprises constructing features with Featuretools, screening features with a random forest, training a model with LightGBM, and evaluating the performance of the parameters;
the fourth step: constructing a scoring card: the scoring card section first performs a binning operation on each variable, then applies the WOE transformation to the input data based on the binning result, fits the training data with logistic regression, and constructs the final score calculation of the scoring card from the fitted parameters.
2. The automatic credit scoring card generation method based on meta-learning of claim 1, wherein the data preprocessing comprises: (1) identifying the type of each variable in each table; (2) finding the association variables and association relationships among the tables, the relationships being of four types: one-to-one, one-to-many, many-to-one, and many-to-many; (3) identifying and separating the training data and the test data in the input data; (4) applying missing-value handling, outlier detection, and data standardization appropriate to each variable type.
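As an illustration of item (4) only, type-dependent cleaning of a single column might look like the sketch below. The specific choices (mode imputation for categorical variables, median imputation, 1.5×IQR clipping, and z-score standardization for numeric variables) are assumptions made for the example; the claim does not name particular techniques, and `preprocess_column` and its `kind` labels are hypothetical.

```python
from statistics import mean, median, mode, pstdev, quantiles


def preprocess_column(values, kind):
    """Clean one column; `kind` is "numeric" or "categorical" (hypothetical labels)."""
    present = [v for v in values if v is not None]
    if kind == "categorical":
        fill = mode(present)  # impute with the most frequent category
        return [fill if v is None else v for v in values]
    fill = median(present)  # impute numeric gaps with the median
    filled = [fill if v is None else v for v in values]
    # Treat values outside 1.5 * IQR of the quartiles as outliers and clip them.
    q1, _, q3 = quantiles(filled, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    clipped = [min(max(v, low), high) for v in filled]
    # Standardize to zero mean and unit variance.
    mu, sigma = mean(clipped), pstdev(clipped)
    if sigma == 0:
        return [0.0 for _ in clipped]
    return [(v - mu) / sigma for v in clipped]


income_clean = preprocess_column([10, 12, None, 11, 13, 1000], "numeric")
channel_clean = preprocess_column(["a", None, "a", "b"], "categorical")
```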
3. The automatic credit scoring card generation method based on meta-learning of claim 1, wherein the specific process of computing the similar datasets of the input dataset and initializing the search space with the parameters output from the corresponding parameter configurations is as follows:
the distance between dataset vectors is first calculated as the Euclidean distance:
the resulting set of similar samples is $N_k(x)$, and the parameter set is obtained by the following formula:
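A minimal sketch of the nearest-neighbour lookup in claim 3 is given below for illustration. The Euclidean distance is computed with the standard library's `math.dist`. The rule for combining the neighbours' configurations is not recoverable from this text, so the sketch assumes the search space is the range spanned by the neighbours' stored parameter values; that rule, the `repository` layout, and all names and numbers are hypothetical.

```python
import math


def nearest_datasets(query, repository, k=2):
    """Return the k repository entries whose meta-feature vectors are closest to `query`.

    `repository` is a list of (name, meta_feature_vector, best_config) tuples.
    """
    ranked = sorted(repository, key=lambda entry: math.dist(query, entry[1]))
    return ranked[:k]


def initial_search_space(neighbours):
    """Assumed rule: each parameter ranges over the values seen among the neighbours."""
    seen = {}
    for _, _, config in neighbours:
        for name, value in config.items():
            seen.setdefault(name, []).append(value)
    return {name: (min(vals), max(vals)) for name, vals in seen.items()}


repository = [
    ("A", (0.0, 0.0), {"learning_rate": 0.05, "num_leaves": 31}),
    ("B", (1.0, 0.0), {"learning_rate": 0.1, "num_leaves": 63}),
    ("C", (10.0, 10.0), {"learning_rate": 0.3, "num_leaves": 15}),
]
neighbours = nearest_datasets((0.2, 0.1), repository, k=2)
space = initial_search_space(neighbours)
```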
4. The automatic credit scoring card generation method based on meta-learning of claim 1, wherein in the fourth step the variables are binned automatically by the chi-square binning method, i.e. the adjacent bins with the smallest chi-square value are merged repeatedly until a given stopping criterion is met.
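The merging loop described in claim 4 can be sketched as follows, for illustration only. Each bin is represented here as a hypothetical tuple `(upper_edge, good_count, bad_count)`, and the stopping criterion is assumed to be a maximum number of bins; the claim leaves the criterion open, and a chi-square threshold would be an equally valid choice.

```python
def chi2_pair(a, b):
    """Chi-square statistic of the 2x2 table formed by two adjacent bins.

    Each bin is (upper_edge, good_count, bad_count).
    """
    rows = (a[1:], b[1:])
    total = sum(rows[0]) + sum(rows[1])
    if total == 0:
        return 0.0
    chi2 = 0.0
    for row in rows:
        for j in range(2):
            expected = sum(row) * (rows[0][j] + rows[1][j]) / total
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def chimerge(bins, max_bins):
    """Merge the most similar adjacent bins until at most `max_bins` remain."""
    bins = [tuple(b) for b in bins]
    while len(bins) > max_bins:
        scores = [chi2_pair(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        i = scores.index(min(scores))  # adjacent pair with the smallest chi-square
        left, right = bins[i], bins[i + 1]
        # The merged bin keeps the right-hand upper edge and the summed counts.
        bins[i:i + 2] = [(right[0], left[1] + right[1], left[2] + right[2])]
    return bins


# The first two bins have the same good/bad ratio, so they are merged first.
merged = chimerge([(20, 10, 10), (30, 20, 20), (40, 30, 5), (50, 40, 2)], max_bins=3)
```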
5. The automatic credit scoring card generation method based on meta-learning of claim 1, wherein in the fourth step, based on the binning result, the weight of evidence (WOE) of the bin into which each variable falls is used to build the logistic regression model; the expression of WOE is:
where $P_{good}$ is the proportion of good users and $P_{bad}$ is the proportion of bad users;
the result of the model regression is
$$\ln(\mathrm{odds}) = \beta_0 + \beta_1 \mathrm{WOE}_1 + \dots + \beta_n \mathrm{WOE}_n$$
the score of the scoring card can be expressed as a linear function of the log odds:
$$\mathrm{Score}_{total} = A + B \ln(\mathrm{odds})$$
given that the score increases by PDO when the odds are doubled:
$$\mathrm{Score}_{total} + \mathrm{PDO} = A + B \ln(2 \cdot \mathrm{odds})$$
solving these two simultaneous equations gives the base score A and the scoring coefficient B;
the final score of the scoring card is obtained from the logistic regression result by the formula:
$$\mathrm{Score}_{total} = A + B(\beta_0 + \beta_1 \mathrm{WOE}_1 + \dots + \beta_n \mathrm{WOE}_n)$$
where A and B are known constants, $\beta_i$ are the logistic regression coefficients, and $\mathrm{WOE}_i$ is the WOE value of the bin into which variable i falls.
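The scaling in claim 5 can be worked through numerically; the sketch below is illustrative only. Subtracting the first score equation from the PDO equation gives PDO = B·ln 2, so B = PDO / ln 2, and A then follows from one reference point. The reference point (600 points at odds of 50) and PDO of 20 are hypothetical example values, and the WOE orientation ln(P_good / P_bad) is an assumption, since the WOE expression is not reproduced in this text.

```python
import math


def scaling_constants(base_score, base_odds, pdo):
    """Solve Score = A + B*ln(odds) and Score + PDO = A + B*ln(2*odds) for A and B."""
    b = pdo / math.log(2)  # from subtracting the two equations
    a = base_score - b * math.log(base_odds)  # from the reference point
    return a, b


def woe(good_in_bin, bad_in_bin, good_total, bad_total):
    """Assumed orientation: ln of (share of good users) over (share of bad users)."""
    return math.log((good_in_bin / good_total) / (bad_in_bin / bad_total))


def score(a, b, intercept, coefs, woes):
    """Score_total = A + B * (beta_0 + sum of beta_i * WOE_i)."""
    log_odds = intercept + sum(c * w for c, w in zip(coefs, woes))
    return a + b * log_odds


A, B = scaling_constants(base_score=600, base_odds=50, pdo=20)
example_woe = woe(good_in_bin=50, bad_in_bin=10, good_total=100, bad_total=100)
example_score = score(A, B, intercept=math.log(50), coefs=[1.0], woes=[0.0])
```

With these example values, an applicant whose fitted log odds equal ln(50) scores 600, and doubling the odds to 100 raises the score by the PDO of 20.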
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910991618.1A CN110866819A (en) | 2019-10-18 | 2019-10-18 | Automatic credit scoring card generation method based on meta-learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910991618.1A CN110866819A (en) | 2019-10-18 | 2019-10-18 | Automatic credit scoring card generation method based on meta-learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110866819A (en) | 2020-03-06 |
Family
ID=69652318
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910991618.1A Pending CN110866819A (en) | 2019-10-18 | 2019-10-18 | Automatic credit scoring card generation method based on meta-learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110866819A (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444094A (en) * | 2020-03-25 | 2020-07-24 | 中国邮政储蓄银行股份有限公司 | Test data generation method and system |
CN113449873A (en) * | 2020-03-25 | 2021-09-28 | 北京同邦卓益科技有限公司 | Data processing method and device, electronic equipment and computer storage medium |
CN111311128A (en) * | 2020-03-30 | 2020-06-19 | 百维金科(上海)信息科技有限公司 | Consumption financial credit scoring card development method based on third-party data |
CN111582646A (en) * | 2020-04-09 | 2020-08-25 | 上海淇毓信息科技有限公司 | User policy risk early warning method and device and electronic equipment |
CN111767277A (en) * | 2020-07-08 | 2020-10-13 | 深延科技(北京)有限公司 | Data processing method and device |
CN111898675A (en) * | 2020-07-30 | 2020-11-06 | 北京云从科技有限公司 | Credit risk control model generation method and device, scoring card generation method, machine readable medium and equipment |
CN112184412A (en) * | 2020-09-22 | 2021-01-05 | 中国建设银行股份有限公司 | Modeling method, device, medium and electronic equipment of credit rating card model |
WO2022062193A1 (en) * | 2020-09-28 | 2022-03-31 | 南京博雅区块链研究院有限公司 | Individual credit assessment and explanation method and apparatus based on time sequence attribution analysis, and device and storage medium |
CN112330280A (en) * | 2020-11-04 | 2021-02-05 | 山大地纬软件股份有限公司 | Method and system for inquiring credit of human resource market main body |
CN112330047A (en) * | 2020-11-18 | 2021-02-05 | 交通银行股份有限公司 | Credit card repayment probability prediction method based on user behavior characteristics |
CN112446438A (en) * | 2020-12-16 | 2021-03-05 | 常州微亿智造科技有限公司 | Intelligent model training method under industrial Internet of things |
CN113516547A (en) * | 2021-04-23 | 2021-10-19 | 武汉赢联数据技术股份有限公司 | Voice broadcast type graded credit card client risk early warning system |
CN113516547B (en) * | 2021-04-23 | 2023-10-03 | 武汉赢联数据技术股份有限公司 | Voice broadcast type hierarchical credit card customer risk early warning system |
CN113724876A (en) * | 2021-09-10 | 2021-11-30 | 南昌大学第二附属医院 | Intra-stroke hospital complication prediction model based on multi-mode fusion and DFS-LLE algorithm |
EP4105872A3 (en) * | 2021-11-10 | 2023-05-03 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Data processing method and apparatus |
CN114154406A (en) * | 2021-11-22 | 2022-03-08 | 厦门深度赋智科技有限公司 | AI model automatic modeling system based on black box optimizer |
CN116542511A (en) * | 2022-02-08 | 2023-08-04 | 百融云创科技股份有限公司 | Risk control model creation method and device, electronic equipment and storage medium |
CN114783007A (en) * | 2022-06-22 | 2022-07-22 | 成都新希望金融信息有限公司 | Equipment fingerprint identification method and device and electronic equipment |
CN114783007B (en) * | 2022-06-22 | 2022-09-27 | 成都新希望金融信息有限公司 | Equipment fingerprint identification method and device and electronic equipment |
CN117556225A (en) * | 2024-01-12 | 2024-02-13 | 杭银消费金融股份有限公司 | People's Bank of China credit data risk management system |
CN117556225B (en) * | 2024-01-12 | 2024-04-05 | 杭银消费金融股份有限公司 | People's Bank of China credit data risk management system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110866819A (en) | Automatic credit scoring card generation method based on meta-learning | |
CN109255506B (en) | Internet financial user loan overdue prediction method based on big data | |
CN108898479B (en) | Credit evaluation model construction method and device | |
CN109657947B (en) | Enterprise industry classification-oriented anomaly detection method | |
Utari et al. | Implementation of data mining for drop-out prediction using random forest method | |
CN112700324A (en) | User loan default prediction method based on combination of Catboost and restricted Boltzmann machine | |
CN113537807B (en) | Intelligent risk control method and equipment for enterprises | |
CN113256409A (en) | Bank retail customer attrition prediction method based on machine learning | |
CN107392217B (en) | Computer-implemented information processing method and device | |
Biecek et al. | Enabling machine learning algorithms for credit scoring--explainable artificial intelligence (XAI) methods for clear understanding complex predictive models | |
Rofik et al. | The Optimization of Credit Scoring Model Using Stacking Ensemble Learning and Oversampling Techniques | |
CN113591947A (en) | Power data clustering method and device based on power consumption behaviors and storage medium | |
CN112163731A (en) | Special transformer user electric charge recovery risk identification method based on weighted random forest | |
CN115271442A (en) | Modeling method and system for evaluating enterprise growth based on natural language | |
Raei et al. | A hybrid model for estimating the probability of default of corporate customers | |
Islam et al. | Forecasting of bank performance using hybrid machine learning techniques | |
KR20220074327A (en) | Loan regular auditing system using artificia intellicence | |
Zeng | A comparison study on the era of internet finance China construction of credit scoring system model | |
Ragab et al. | Intelligent data mining For automatic face recognition | |
Dhandayudam et al. | Rough set approach for characterizing customer behavior | |
AlSaif | Large scale data mining for banking credit risk prediction | |
CN113435655B (en) | Sector dynamic management decision method, server and system | |
Sayan et al. | A Review of Customer Segmentation Methods: The Case of Investment Sector | |
Bisht et al. | Principal component analysis and correlation coefficient-based decision-making approach for stock portfolio selection | |
Wen | Application of Clustering Algorithm in Corporate Strategy and Risk |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200306 |