CN112801775A - Client credit evaluation method and device - Google Patents


Info

Publication number
CN112801775A
Authority
CN
China
Prior art keywords
data
credit evaluation
model
credit
training
Legal status
Pending
Application number
CN202110123755.0A
Other languages
Chinese (zh)
Inventor
刘吉超
王文春
侯海波
吴欢
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110123755.0A
Publication of CN112801775A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention provides a client credit evaluation method and device. The method comprises: collecting credit evaluation data of a client to be evaluated; and performing credit evaluation on the client according to the credit evaluation data and a pre-established credit evaluation model. The credit evaluation model is obtained by model training on input feature data, and the input feature data are determined, according to a preset information value threshold, from the credit evaluation data preset for model training after feature selection and discretization. The invention belongs to the field of the Internet of Things and can also be used in the field of finance. It provides an effective customer credit evaluation method for the credit business of a bank, which effectively improves the bank's risk control capability, reduces its review costs, and improves the efficiency and accuracy of its risk control.

Description

Client credit evaluation method and device
Technical Field
The invention relates to data processing technology, and in particular to a client credit evaluation method and a client credit evaluation device.
Background
With the development of the economy and society, banks' Internet financial products are growing rapidly. When providing credit services over the Internet, a bank needs to review the information of credit customers quickly and comprehensively in order to reduce the risk of customer default. Under these conditions, traditional manual review can no longer meet practical requirements, and credit scoring methods trained with machine learning, which are fast and efficient, have gradually taken over the market thanks to their strong performance.
In the prior art, most common customer credit scoring methods rely on a single model algorithm, such as an SVM, a neural network, or a tree-based model. Models trained with these algorithms currently have several problems. First, the algorithms fit user data with nonlinear, discontinuous functions, which reduces model interpretability. Second, in the feature engineering stage of model construction, these methods do not fully exploit the useful information in the data, which wastes information and reduces prediction accuracy. Finally, a traditional credit scoring method based on a single model cannot portray a user's overall credit profile comprehensively, and a single model also generalizes poorly.
Disclosure of Invention
To address these shortcomings of prior-art credit evaluation, the invention provides a client credit evaluation method comprising the following steps:
collecting credit evaluation data of a client to be evaluated;
performing credit evaluation on the client to be evaluated according to the credit evaluation data and a pre-established credit evaluation model, wherein the credit evaluation model is obtained by model training on input feature data, and the input feature data are determined, according to a preset information value threshold, from the credit evaluation data preset for model training after feature selection and discretization.
In an embodiment of the present invention, the credit evaluation data include: client personal information feature data, client credit transaction feature data, client property status feature data, client behavior preference feature data, client mobile operator information feature data, and overdue identification data.
In an embodiment of the present invention, the method further comprises training a preset initial credit evaluation model with preset credit evaluation data to determine the established credit evaluation model, which comprises:
acquiring preset credit evaluation data for model training;
performing feature selection on the credit evaluation data to determine candidate feature data;
discretizing the candidate feature data;
determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold;
training the preset initial credit evaluation model with the determined input feature data to determine the established credit evaluation model.
In the embodiment of the present invention, before performing feature selection on the credit evaluation data to determine candidate feature data, the method includes:
preprocessing the credit evaluation data by deleting duplicate data and abnormal data from it.
In the embodiment of the present invention, the determining candidate feature data by performing feature selection on the credit evaluation data includes:
determining first candidate feature data from the credit evaluation data using a supervised machine learning algorithm;
determining second candidate feature data from the credit evaluation data using an unsupervised machine learning algorithm;
taking the first candidate feature data and the second candidate feature data together as the candidate feature data.
In this embodiment of the present invention, the discretizing the candidate feature data includes:
discretizing the candidate feature data using chi-square binning.
In an embodiment of the present invention, the determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold includes:
determining the proportion of overdue customers and the proportion of non-overdue customers in the discretized candidate feature data;
determining the information value of the discretized candidate feature data according to their weight of evidence (WOE) values and the proportions of overdue and non-overdue customers;
taking the discretized candidate feature data whose information value is greater than the preset information value threshold as first input feature data;
training a LightGBM model on the discretized candidate feature data whose information value is not greater than the preset information value threshold, and taking the resulting default probability value of each sample as second input feature data;
taking the first input feature data and the second input feature data together as the determined input feature data.
In an embodiment of the present invention, the training of the preset initial credit evaluation model according to the determined input feature data to determine the established credit evaluation model includes:
obtaining training set data and test set data from the input feature data by stratified sampling;
training and validating the preset initial credit evaluation model with the training set data and the test set data;
taking the model parameters corresponding to the maximum KS statistic as the parameters of the established credit evaluation model, thereby determining the established credit evaluation model, wherein the KS statistic is the maximum difference between the cumulative distribution proportions of non-overdue accounts and overdue accounts.
The invention further provides a client credit evaluation device, comprising:
the data acquisition module is used for acquiring credit evaluation data of the client to be evaluated;
the evaluation module, used for performing credit evaluation on the client to be evaluated according to the credit evaluation data and a pre-established credit evaluation model, wherein the credit evaluation model is obtained by model training on input feature data, and the input feature data are determined, according to a preset information value threshold, from the credit evaluation data preset for model training after feature selection and discretization.
In an embodiment of the present invention, the device further comprises a model building module, used for training a preset initial credit evaluation model with preset credit evaluation data to determine the established credit evaluation model. The model building module comprises:
the training data acquisition unit, used for acquiring the credit evaluation data preset for model training;
the candidate feature unit, used for performing feature selection on the credit evaluation data to determine candidate feature data;
the discretization unit, used for discretizing the candidate feature data;
the input feature determining unit, used for determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold;
the training unit, used for training the preset initial credit evaluation model with the determined input feature data to determine the established credit evaluation model.
The invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the computer program.
The invention further provides a computer-readable storage medium storing a computer program for executing the above method.
In the client credit evaluation method provided by the invention, a client to be evaluated is evaluated according to the client's credit evaluation data and a pre-established credit evaluation model. Input feature data are determined, according to a preset information value threshold, from the credit evaluation data preset for model training after feature selection and discretization, and the credit evaluation model is obtained by training on these input feature data. Through this multi-model fusion approach, an effective customer credit evaluation method is achieved.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a method for evaluating customer credit provided by the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention;
FIG. 3 is a block diagram of a client credit evaluation device provided by the present invention;
FIG. 4 is a schematic diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Credit is an important business segment for a bank. As demand for credit grows, effectively reducing users' overdue risk in the credit business has become an urgent problem. To address the shortcomings of existing credit scoring methods, the invention provides a client credit evaluation method which, as shown in FIG. 1, comprises the following steps:
Step S101: collecting credit evaluation data of a client to be evaluated;
Step S102: performing credit evaluation on the client to be evaluated according to the credit evaluation data and a pre-established credit evaluation model, wherein the credit evaluation model is obtained by model training on input feature data, and the input feature data are determined, according to a preset information value threshold, from the credit evaluation data preset for model training after feature selection and discretization.
In the client credit evaluation method provided by the invention, the credit evaluation data used for model training undergo feature selection and discretization, the input feature data are determined according to the preset information value threshold, and the credit evaluation model is determined by training on these input feature data. Because several methods are combined during feature selection, redundant features are removed more comprehensively and effectively, laying a solid foundation for subsequent model construction. Compared with existing single-model methods, the multi-model fusion scheme mines the information in user data more fully, improves the accuracy of customer credit scoring, and substantially improves generalization.
In an embodiment of the present invention, the credit evaluation data include: client personal information feature data, client credit transaction feature data, client property status feature data, client behavior preference feature data, client mobile operator information feature data, and overdue identification data.
Because the user data collected by this credit evaluation method are more comprehensive and its design is more reasonable, the method yields more effective, comprehensive, and accurate evaluations, gives a fuller picture of the client, supports more objective analysis of the data, and thereby facilitates credit evaluation of the client.
Specifically, the information data of the client, i.e., the credit evaluation data, collected in the embodiment of the present invention mainly includes the following aspects:
client personal information feature data, such as the customer's age, home address, and mobile phone number;
client credit transaction feature data, such as credit repayment records and credit account history over the last N years (N ≥ 1);
client property status feature data, such as the customer's deposits, real estate assets, and loan amounts;
client behavior preference feature data, such as the customer's shopping, wealth management, and payment activities;
client mobile operator information feature data, such as recent call activity and call duration;
overdue identification data, i.e., a label indicating whether the customer is overdue: 0 means no (a non-overdue customer, called a "good user", or good, in this embodiment) and 1 means yes (an overdue customer, called a "bad user", or bad, in this embodiment).
In an embodiment of the present invention, the method further comprises training a preset initial credit evaluation model with preset credit evaluation data to determine the established credit evaluation model, which comprises:
acquiring preset credit evaluation data for model training;
performing feature selection on the credit evaluation data to determine candidate feature data;
discretizing the candidate feature data;
determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold;
training the preset initial credit evaluation model with the determined input feature data to determine the established credit evaluation model.
Specifically, in the embodiment of the present invention, before performing feature selection on the credit evaluation data to determine candidate feature data, the method includes:
preprocessing the credit evaluation data by deleting duplicate data and abnormal data from it.
Data quality plays a critical role in model accuracy. Real data, however, are usually "dirty": they may contain incomplete, inconsistent, unreasonable, or duplicate records. Therefore, in the embodiment of the invention, the raw data are preprocessed to improve their quality, and the distribution of each feature in the data is analyzed.
In the embodiment of the invention, duplicate data and abnormal data are both handled with a deletion strategy; categorical (discrete) features are encoded with label encoding; and the whole data set is then normalized.
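The preprocessing step can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: records are assumed to be dicts, the rule for what counts as "abnormal" is left to a caller-supplied predicate (the source does not define it), label encoding maps sorted distinct values to 0, 1, 2, ... (the same convention as scikit-learn's `LabelEncoder`), and "normalization" is assumed to mean min-max scaling to [0, 1]. The function name `preprocess` is hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def preprocess(records: List[Record],
               categorical: List[str],
               numeric: List[str],
               is_abnormal: Callable[[Record], bool]) -> List[Record]:
    """Hypothetical preprocessing sketch: dedupe, drop abnormal rows,
    label-encode categorical columns, then min-max normalize features."""
    # 1. Delete exact duplicate records, keeping the first occurrence.
    seen = set()
    unique: List[Record] = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 2. Delete abnormal records (rule supplied by the caller; an assumption).
    clean = [dict(r) for r in unique if not is_abnormal(r)]
    if not clean:
        return clean

    # 3. Label encoding: sorted distinct values -> 0, 1, 2, ...
    for col in categorical:
        classes = sorted({str(r[col]) for r in clean})
        code = {c: i for i, c in enumerate(classes)}
        for r in clean:
            r[col] = code[str(r[col])]

    # 4. Min-max normalization of every listed feature column to [0, 1];
    #    the label column is not listed and is left untouched.
    for col in categorical + numeric:
        values = [float(r[col]) for r in clean]
        lo, hi = min(values), max(values)
        for r in clean:
            r[col] = (float(r[col]) - lo) / (hi - lo) if hi > lo else 0.0
    return clean
```

For example, with `is_abnormal=lambda r: not 18 <= r["age"] <= 100`, a record with age 250 is dropped before encoding, so it cannot stretch the min-max range of the remaining ages.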
In the embodiment of the present invention, the determining candidate feature data by performing feature selection on the credit evaluation data includes:
determining first candidate feature data from the credit evaluation data using a supervised machine learning algorithm;
determining second candidate feature data from the credit evaluation data using an unsupervised machine learning algorithm;
taking the first candidate feature data and the second candidate feature data together as the candidate feature data.
The feature set plays a crucial role in the performance of the final model. Even after the preprocessing described above, the data remain high-dimensional, which makes model training challenging, and high-dimensional data contain redundant features and features with low correlation to the target. Feature dimensionality reduction, i.e., feature selection, is therefore required.
For feature selection, this embodiment uses both a supervised method and an unsupervised method.
Supervised method: the data set is split into a training set and a test set by stratified sampling, and both are fed to a LightGBM model. The model is trained, its parameters are tuned while its evaluation metrics are monitored, and once it reaches its optimal state the importance of each feature is output. A threshold is then set, and features whose importance is above it are selected as candidate features.
Unsupervised method: the correlation between each feature and the label is measured with the statistical F-test (joint hypothesis test). The F-test yields a p-value for each feature; in this embodiment, a feature with a p-value below 0.05 is considered strongly correlated with the label and is therefore selected as a candidate feature.
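The statistical test in the unsupervised method can be sketched as below, assuming it is the one-way ANOVA F-test between a feature and the binary overdue label (the statistic that scikit-learn's `sklearn.feature_selection.f_classif` computes). To stay dependency-free, the sketch obtains the p-value from the upper tail of the F distribution via the regularized incomplete beta function, evaluated with the standard continued fraction (modified Lentz method). All function names are hypothetical, the 0.05 cut-off comes from the text, and the supervised LightGBM candidates are assumed to arrive as a precomputed set of feature names.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def _betacf(a: float, b: float, x: float,
            max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1: converged
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry I_x(a,b) = 1 - I_{1-x}(b,a)


def anova_f_test(feature: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """One-way ANOVA F statistic of a feature grouped by label, and its p-value."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(feature, labels):
        groups.setdefault(y, []).append(float(v))
    n, k = len(feature), len(groups)
    grand = sum(float(v) for v in feature) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ssb = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ssw = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    df1, df2 = k - 1, n - k
    if ssw == 0.0:  # no within-group spread
        return (math.inf, 0.0) if ssb > 0.0 else (0.0, 1.0)
    f = (ssb / df1) / (ssw / df2)
    # Upper tail of F(df1, df2): P(F > f) = I_{df2/(df2 + df1*f)}(df2/2, df1/2).
    return f, betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_test_candidates(columns: Dict[str, Sequence[float]], labels: Sequence[int],
                      alpha: float = 0.05) -> Set[str]:
    """Names of features whose F-test p-value is below alpha."""
    return {name for name, col in columns.items()
            if anova_f_test(col, labels)[1] < alpha}
```

The candidate set is then the union of both methods, e.g. `supervised_candidates | f_test_candidates(columns, labels)`.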
In the embodiment of the present invention, discretizing the candidate feature data includes:
discretizing the candidate feature data using chi-square binning.
Chi-square binning repeatedly merges the pair of adjacent bins with the smallest chi-square value: if two adjacent bins have very similar class distributions, they can be merged; otherwise they should remain separate.
Binned features are more robust to abnormal data, which improves the generalization ability of the model. For example, abnormal age values such as 200 or 300 simply fall into a ">80" bin, whereas feeding them directly into model training would seriously distort the model.
Binning also converts variables to comparable scales. For example, incomes of 1,000, 10,000, or 1,000,000 can be discretized into 0 (low income), 1 (medium income), 2 (high income), and so on.
Binned (discrete) variables also make feature crossing easier, for example by grouping a continuous variable by a discrete feature and constructing per-group means, variances, and similar statistics. Finally, binning handles missing and abnormal values effectively: missing values can be gathered into a bin of their own.
In an embodiment of the present invention, the determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold includes:
determining the proportion of overdue customers and the proportion of non-overdue customers in the discretized candidate feature data;
determining the information value of the discretized candidate feature data according to their weight of evidence (WOE) values and the proportions of overdue and non-overdue customers;
taking the discretized candidate feature data whose information value is greater than the preset information value threshold as first input feature data;
training a LightGBM model on the discretized candidate feature data whose information value is not greater than the preset information value threshold, and taking the resulting default probability value of each sample as second input feature data;
taking the first input feature data and the second input feature data together as the determined input feature data.
In the embodiment of the invention, after binning is complete, the information value (IV) of each feature is calculated. IV measures the amount of information a feature carries, and hence its predictive power.
In the embodiment of the present invention, to calculate the IV of a feature, the weight of evidence (WOE) of each of its bins must first be computed. Let good denote good customers (non-overdue customers) and bad denote bad customers (overdue customers). The WOE of each bin i after binning is calculated by formula (1):
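$$\mathrm{WOE}_i = \ln\left(\frac{G_i / G_T}{B_i / B_T}\right) \qquad (1)$$
where G_i and B_i are the numbers of good and bad customers in bin i, and G_T and B_T are the total numbers of good and bad customers.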
That is, WOE is the logarithm of the ratio between the proportion of all good customers that fall in the bin (the number of good customers in the bin divided by the total number of good customers) and the proportion of all bad customers that fall in the bin. WOE therefore reflects how the good/bad composition of each bin of the variable differs from that of the whole population, and can be regarded as capturing the influence of the variable's value on the target variable.
IV (information value) measures the amount of information carried by a feature and is calculated by formula (2):
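$$\mathrm{IV} = \sum_{i=1}^{n} \left(\frac{G_i}{G_T} - \frac{B_i}{B_T}\right) \times \mathrm{WOE}_i \qquad (2)$$
where n is the number of bins of the feature, G_i/G_T and B_i/B_T are the proportions of all good and all bad customers that fall in bin i, and WOE_i is given by formula (1).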
As formula (2) shows, for each bin the proportion of bad customers is subtracted from the proportion of good customers, the difference is multiplied by that bin's WOE, and the results for all bins are summed to give the IV. The IV is thus a weighted sum of WOE values; the weighting mainly eliminates the bias caused by differences in sample counts between bins. The IV of a feature can therefore be taken as a measure of its predictive power.
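As a worked illustration of formulas (1) and (2), consider a hypothetical feature with two bins over 400 good and 100 bad customers, where bin 1 holds 300 good and 40 bad customers and bin 2 holds 100 good and 60 bad customers (the counts are invented for illustration):

```latex
\begin{aligned}
\mathrm{WOE}_1 &= \ln\frac{300/400}{40/100} = \ln\frac{0.75}{0.40} = \ln 1.875 \approx 0.629,\\
\mathrm{WOE}_2 &= \ln\frac{100/400}{60/100} = \ln\frac{0.25}{0.60} \approx -0.875,\\
\mathrm{IV} &= (0.75 - 0.40)(0.629) + (0.25 - 0.60)(-0.875) \approx 0.220 + 0.306 = 0.526.
\end{aligned}
```

Each term is non-negative, because a bin's difference in proportions always has the same sign as its WOE; this feature's IV is far above the 0.02 threshold used in this embodiment.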
After binning, the IV of each feature can be calculated with the formulas above. Features whose IV is greater than 0.02 (treated as strongly correlated features in this embodiment) are taken as input features of the logistic regression model, i.e., the first input feature data.
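Formulas (1) and (2) and the 0.02 split can be sketched as follows. Bins are given as integer bin indices per sample (for example, as produced by a chi-square binning step), labels use 0 for good and 1 for bad, and a zero count in a bin is replaced by 0.5 before taking logarithms; that smoothing constant is an assumption, since the source does not say how empty cells are handled. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def woe_iv(bin_ids: Sequence[int], labels: Sequence[int],
           smoothing: float = 0.5) -> Tuple[Dict[int, float], float]:
    """Per-bin WOE (formula 1, good/bad convention) and total IV (formula 2)."""
    good_total = sum(1 for y in labels if y == 0)
    bad_total = sum(1 for y in labels if y == 1)
    counts: Dict[int, List[int]] = {}
    for b, y in zip(bin_ids, labels):
        counts.setdefault(b, [0, 0])[y] += 1  # [good, bad]
    woe: Dict[int, float] = {}
    iv = 0.0
    for b, (good, bad) in sorted(counts.items()):
        g_share = (good or smoothing) / good_total  # share of all good customers
        b_share = (bad or smoothing) / bad_total    # share of all bad customers
        woe[b] = math.log(g_share / b_share)
        iv += (g_share - b_share) * woe[b]
    return woe, iv


def split_by_iv(iv_by_feature: Dict[str, float],
                threshold: float = 0.02) -> Tuple[List[str], List[str]]:
    """(first input features with IV > threshold, remaining features for LightGBM)."""
    strong = sorted(f for f, v in iv_by_feature.items() if v > threshold)
    weak = sorted(f for f, v in iv_by_feature.items() if v <= threshold)
    return strong, weak
```

A feature exactly at the threshold goes to the LightGBM group, matching the "not greater than" wording of the method.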
To make full use of the data, mine its useful information, and improve the generalization ability of the model, the embodiment of the invention builds a new data set from the features whose IV is not greater than 0.02. When this data set is divided into a training set and a test set, their distributions should stay as consistent as possible with the original data, so as to reduce the influence of extra bias introduced by the split on the training result. The division therefore uses stratified sampling: 70% of the good customers and 70% of the bad customers in the original data are randomly selected as the training set, and the remaining 30% of each form the test set. The training set is fed into a LightGBM model for training, and the probability value output by the final trained model is taken as the default probability of each sample; this probability is used as an input feature of the logistic regression model, i.e., the second input feature data.
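The stratified 70/30 split can be sketched as follows: each class (good = 0, bad = 1) is shuffled and split separately, so both sets keep roughly the original good/bad ratio. The function name and the fixed seed are assumptions added for reproducibility. In practice the LightGBM model would then be fit on the training indices (for example with `lightgbm.LGBMClassifier`, whose `predict_proba` returns per-class probabilities), and the bad-class probability of every sample would be appended as the second input feature of the logistic regression model.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train: List[int] = []
    test: List[int] = []
    for idx in by_class.values():
        rng.shuffle(idx)  # randomize within the class
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)
```

With 90 good and 10 bad customers, the training set receives 63 good and 7 bad customers, preserving the 9:1 ratio of the original data.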
In an embodiment of the present invention, the training of the preset initial credit evaluation model according to the determined input feature data to determine the established credit evaluation model includes:
obtaining training set data and test set data from the input feature data by stratified sampling;
training and validating the preset initial credit evaluation model with the training set data and the test set data;
taking the model parameters corresponding to the maximum KS statistic as the parameters of the established credit evaluation model, thereby determining the established credit evaluation model, wherein the KS statistic is the maximum difference between the cumulative distribution proportions of non-overdue accounts and overdue accounts.
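The fusion stage can be sketched as a plain logistic regression trained by batch gradient descent on the concatenation of the first input features and the LightGBM default probability. This is an illustrative stand-in, not the patent's implementation: it assumes the strong features are fed as their bin WOE values (common scorecard practice, not stated in the source), uses label 1 for overdue, and its learning rate, epoch count, and L2 penalty are arbitrary. All names are hypothetical; a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple


def fuse_features(first_inputs: Sequence[Sequence[float]],
                  lgbm_probs: Sequence[float]) -> List[List[float]]:
    """Append each sample's LightGBM default probability to its first input features."""
    return [list(row) + [p] for row, p in zip(first_inputs, lgbm_probs)]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5,
                 epochs: int = 2000, l2: float = 0.0) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss (plus optional L2 on weights)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_default(w: Sequence[float], b: float,
                    X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted probability of default (label 1) for each sample."""
    return [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
```

Keeping the final stage a logistic regression preserves the interpretability the background section asks for: each fused input gets one signed coefficient.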
In the embodiment of the invention, ten-fold cross-validation is performed on the training and test sets, and the parameters of the logistic regression model are tuned while observing the KS statistic (the maximum difference between the cumulative distribution proportions of good and bad accounts; the larger the separation between good and bad accounts, the stronger the model's discriminating power). After tuning, the model with the largest KS statistic is taken as the final credit scoring model.
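The KS statistic used for model selection can be computed directly from predicted default probabilities, as sketched below: samples are sorted by score, and after each distinct score the gap between the cumulative share of good accounts and the cumulative share of bad accounts is measured; KS is the largest gap. The helper `select_by_ks`, which picks the parameter setting whose validation scores (for example, pooled out-of-fold predictions from the ten folds) give the largest KS, is a hypothetical illustration. Labels are assumed to be 0 for good and 1 for bad, with both classes present.

```python
from typing import Dict, Sequence, Tuple


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Max gap between cumulative good and bad shares over score thresholds."""
    good_total = sum(1 for y in labels if y == 0)
    bad_total = len(labels) - good_total
    pairs = sorted(zip(scores, labels))
    cum_good = cum_bad = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every sample tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 0:
                cum_good += 1
            else:
                cum_bad += 1
            i += 1
        ks = max(ks, abs(cum_good / good_total - cum_bad / bad_total))
    return ks


def select_by_ks(results: Dict[str, Tuple[Sequence[float], Sequence[int]]]) -> str:
    """Return the parameter setting whose (scores, labels) give the largest KS."""
    return max(results, key=lambda k: ks_statistic(*results[k]))
```

A KS of 1.0 means some score threshold separates all good accounts from all bad ones, while 0.0 means the scores carry no ranking information.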
Compared with the prior art, the client credit evaluation method provided by the invention collects user data more comprehensively and has a more reasonable design, so it yields more effective, comprehensive, and accurate evaluations, gives a fuller picture of the client, supports more objective analysis of the data, and facilitates credit evaluation of the client. The multiple methods combined in the feature selection stage of feature engineering remove redundant features from the data more comprehensively and effectively, laying a solid foundation for subsequent model construction. Compared with existing single-model methods, the multi-model fusion scheme mines the information in user data more fully, improves the accuracy of customer credit scores, and substantially improves generalization.
To address the shortcomings of existing credit scoring methods, the embodiment of the invention designs an effective customer credit scoring method based on multi-model fusion, which, for the bank's credit business, effectively improves the bank's risk control capability, reduces its review costs, and improves the efficiency and accuracy of its risk control.
FIG. 2 is a flow chart of the multi-model-fusion customer credit scoring method for a business process according to an embodiment of the present invention.
The scheme mainly comprises the following steps:
Step 1: data preprocessing;
Data and features determine the upper limit of what a model can achieve; models and algorithms merely approach that upper bound. Data quality plays a critical role in model accuracy. Real data, however, are usually "dirty": they may contain incomplete, inconsistent, unreasonable, or duplicate records. This embodiment therefore preprocesses the raw data to improve their quality and analyzes the distribution of each feature in the data.
In this step, duplicate data and abnormal data are both handled with a deletion strategy; categorical (discrete) features are encoded with label encoding; and the whole data set is then normalized.
Step 2: feature selection;
The feature set plays a crucial role in the performance of the final model. Even after preprocessing, the data remain high-dimensional, which makes model training harder, and high-dimensional data contain redundant features and weakly correlated features. Therefore, the embodiment of the present invention performs feature dimension reduction, that is, feature selection.
In the feature selection part, this embodiment uses two types of machine learning methods, supervised learning and unsupervised learning, to select features:
The supervised method: the data set is split into a training set and a test set by stratified (hierarchical) sampling and fed into a LightGBM model. The model is trained and its parameters are tuned while the model metrics are observed; once the model reaches its optimal state, the importance of each feature is output. In the embodiment of the invention, a threshold is set, and features whose importance is above the threshold are selected as candidate features.
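The stratified split and the importance threshold can be sketched as below. This is a minimal sketch under stated assumptions: the importance scores would come from a trained LightGBM model (for example the `feature_importances_` attribute of LightGBM's scikit-learn-style `LGBMClassifier`), but here they are made-up numbers, and the threshold value, the function names and the feature names are hypothetical because the source does not give them.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], train_frac: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that each class keeps the same train/test proportion."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def select_by_importance(importance: Dict[str, float], threshold: float) -> List[str]:
    """Keep the features whose importance score is above the threshold."""
    return [name for name, score in importance.items() if score > threshold]


labels = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_split(labels)
# Made-up importances standing in for a trained model's output.
candidates = select_by_importance({"age": 120.0, "income": 35.0, "city": 4.0}, threshold=10.0)
```

Sampling each class separately is what keeps the good/bad ratio of the training and test sets equal to that of the original data.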
The unsupervised method: the correlation between each feature and the label is calculated with the statistical F-test (joint hypothesis test). An F-test p-value is computed for each feature; in the embodiment of the invention, a feature with a p-value less than 0.05 is considered strongly correlated with the label and is selected as a candidate feature.
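A dependency-free sketch of the F-test filter follows. It assumes the F-test is the one-way ANOVA F-test between label groups (the statistic scikit-learn's `f_classif` computes); the source does not specify the variant. The p-value uses the standard identity P(F > f) = I_{d2/(d2+d1 f)}(d2/2, d1/2) with a continued-fraction evaluation of the regularized incomplete beta function. The function and feature names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny, eps = 1e-300, 3e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log1p(-x)
    )
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_test_pvalue(feature: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-test of one feature across label groups; returns the p-value."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(feature, labels):
        groups.setdefault(y, []).append(float(v))
    n, k = len(feature), len(groups)
    grand_mean = sum(float(v) for v in feature) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    df1, df2 = k - 1, n - k
    if ss_within == 0.0:
        return 0.0 if ss_between > 0.0 else 1.0
    f_stat = (ss_between / df1) / (ss_within / df2)
    # Survival function of F(df1, df2): P(F > f) = I_{df2/(df2+df1*f)}(df2/2, df1/2).
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


labels = [0] * 5 + [1] * 5
features = {
    "overdue_count": [1, 2, 3, 4, 5, 11, 12, 13, 14, 15],  # separates the classes
    "noise": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],                # same distribution in both
}
candidates = [name for name, col in features.items() if f_test_pvalue(col, labels) < 0.05]
```

Only the feature whose group means differ clearly passes the 0.05 cut.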
Step 3: feature binning;
Feature binning discretizes continuous variables and merges multi-state discrete variables into a few states. Binning handles missing and abnormal values in features effectively, reduces the risk of model overfitting, and improves the generalization ability of the model. The model in the embodiment of the present invention uses chi-square binning.
Feature binning has the following advantages:
1. Binned features are more robust to abnormal data, which improves the generalization ability of the model. For example, if the age feature contains values such as 200 or 300, they can be placed in the "over 80" bin, whereas passing them directly into model training would severely disturb the model;
2. Feature binning brings variables onto similar scales. For example, incomes of 1,000, 10,000, 1,000,000 and so on can be discretized into 0 (low income), 1 (medium income), 2 (high income), etc.;
3. Discretized variables make it convenient to construct cross features in the embodiments of the invention, for example by grouping continuous variables by a discrete feature and computing the group mean, variance, etc.;
4. Missing and abnormal values can be handled effectively; for example, all missing values can be merged into one bin.
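Advantages 1, 3 and 4 can be illustrated with a small sketch. The cut points, the missing-value bin index of -1, the function names and the sample values are all hypothetical. Values at or above the last cut point fall into the top bin, so an abnormal age of 200 lands in the "over 80" bin, and every missing value shares one bin. `group_mean` builds a simple cross feature, the mean of a continuous variable within each bin.

```python
import bisect
import math
from typing import Dict, Optional, Sequence

MISSING_BIN = -1  # hypothetical convention: every missing value shares this bin


def assign_bin(value: Optional[float], cut_points: Sequence[float]) -> int:
    """Return the bin index of value for sorted cut points.

    Values below cut_points[0] fall in bin 0 and values >= cut_points[-1] fall in the
    last bin, so an abnormal age such as 200 is absorbed by the top ("over 80") bin.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING_BIN
    return bisect.bisect_right(cut_points, value)


def group_mean(bins: Sequence[int], values: Sequence[float]) -> Dict[int, float]:
    """Cross feature: mean of a continuous variable within each bin of a discrete one."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for b, v in zip(bins, values):
        sums[b] = sums.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


AGE_CUTS = [18, 30, 45, 60, 80]  # hypothetical cut points
ages = [25, 27, 200, None]
incomes = [3000.0, 5000.0, 9000.0, 4000.0]
age_bins = [assign_bin(a, AGE_CUTS) for a in ages]
income_by_age_bin = group_mean(age_bins, incomes)
```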
The model of the embodiment of the invention adopts chi-square binning, whose basic idea is that the adjacent bins with the smallest chi-square value are merged: if two adjacent bins have very similar class distributions, they can be merged; otherwise they should remain separate.
Step 4: data set reconstruction;
After binning is completed, the IV (information value) of each feature is calculated; the IV measures the predictive power of a feature.
In the embodiment of the present invention, calculating the IV of a feature first requires the WOE (weight of evidence) of each bin of that feature. Let good denote good customers and bad denote bad customers; the WOE of each bin after binning is calculated by the following formula (1):
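$$\mathrm{WOE}_i = \ln\left(\frac{good_i / good_T}{bad_i / bad_T}\right) \qquad (1)$$
where good_i and bad_i are the numbers of good and bad customers in bin i, and good_T and bad_T are the total numbers of good and bad customers.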
WOE is the natural logarithm of the share of good customers in the bin (the number of good customers in the bin divided by the total number of good customers) divided by the share of bad customers in the bin (the number of bad customers in the bin divided by the total number of bad customers). WOE reflects the difference between the ratio of defaulting to normal users within each bin of the variable and that ratio in the whole population; WOE can therefore be intuitively regarded as capturing the influence of the variable's value on the target variable.
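A worked instance of formula (1), using made-up counts and the natural logarithm: suppose a bin contains 30 of the 700 good customers and 20 of the 300 bad customers.

```latex
\mathrm{WOE}_i = \ln\left(\frac{30/700}{20/300}\right)
               = \ln\left(\frac{0.042857}{0.066667}\right)
               = \ln(0.642857) \approx -0.4418
```

The negative WOE indicates that this bin holds proportionally more bad customers than the population as a whole.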
IV (information value) measures the amount of information carried by a feature and is calculated by the following formula (2):
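$$\mathrm{IV} = \sum_{i}\left(\frac{good_i}{good_T} - \frac{bad_i}{bad_T}\right) \times \mathrm{WOE}_i \qquad (2)$$
where good_i/good_T and bad_i/bad_T are the shares of all good and all bad customers that fall in bin i, and WOE_i is the WOE of bin i.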
Formula (2) shows that, for each bin, the bad-customer share is subtracted from the good-customer share and the difference is multiplied by the bin's WOE; the per-bin values are then summed to give the IV. The IV is thus a weighted sum of the WOE values, where the weighting mainly eliminates the error caused by differences in the number of samples per bin. Therefore, the magnitude of a feature's IV can be regarded as the magnitude of its predictive power.
After binning is completed and the above formulas are applied, the IV of each feature can be calculated; features with an IV greater than 0.02 (which the embodiment of the present invention treats as strongly correlated features) are then used as the input features of the logistic model.
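Formulas (1) and (2) and the 0.02 cut can be sketched as follows, assuming label 0 is a good customer and 1 a bad (overdue) one. A bin with no good or no bad customers would make the logarithm undefined; the +0.5 adjustment used for that case is a hypothetical choice not taken from the source. The function names are also hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def woe_iv(bins: Sequence[int], labels: Sequence[int]) -> Tuple[Dict[int, float], float]:
    """Per-bin WOE as in formula (1) and the feature's IV as in formula (2).

    labels: 0 = good customer, 1 = bad (overdue) customer.
    """
    good_total = sum(1 for y in labels if y == 0)
    bad_total = len(labels) - good_total
    counts: Dict[int, List[int]] = {}
    for b, y in zip(bins, labels):
        counts.setdefault(b, [0, 0])[y] += 1
    woe: Dict[int, float] = {}
    iv = 0.0
    for b, (good, bad) in counts.items():
        if good == 0 or bad == 0:
            # Hypothetical adjustment so that an all-good or all-bad bin has a finite WOE.
            good, bad = good + 0.5, bad + 0.5
        good_share, bad_share = good / good_total, bad / bad_total
        woe[b] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[b]
    return woe, iv


def select_by_iv(ivs: Dict[str, float], threshold: float = 0.02) -> List[str]:
    """Features with an IV above the threshold become direct inputs of the logistic model."""
    return [name for name, value in ivs.items() if value > threshold]


# Hypothetical feature: bin 0 holds 30 of 700 good and 20 of 300 bad customers.
labels = [0] * 30 + [1] * 20 + [0] * 670 + [1] * 280
bins = [0] * 50 + [1] * 950
woe, iv = woe_iv(bins, labels)
```

For this feature the IV comes out at roughly 0.011, below 0.02, so it would not be a direct logistic input.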
Step 5: model construction;
To make full use of the data, mine its useful information, and improve the generalization ability of the model, the embodiment of the invention reconstructs a data set from the features whose IV is less than 0.02. When this data set is divided into a training set and a test set, the split should keep the distribution as consistent with the original data as possible, to reduce the effect on the training result of extra bias introduced by the split. Therefore, stratified (hierarchical) sampling is used: 70% of the good customers and 70% of the bad customers in the original data are randomly selected as the training set, and the remaining 30% of each serve as the test set. These sets are used to train a LightGBM model; the probability that the trained model outputs for each sample can be taken as that sample's default probability, and this probability is also used as an input feature of the logistic model.
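The two-stage fusion can be sketched as follows, assuming the high-IV features are already WOE-encoded and the stage-1 (LightGBM) model is available as a callable that returns one default probability per row. With LightGBM's scikit-learn interface such a callable might wrap `LGBMClassifier.predict_proba(X)[:, 1]`, but that wiring, the function name `fuse_features`, and the stand-in model in the example are hypothetical.

```python
from typing import Callable, List, Sequence


def fuse_features(
    high_iv_rows: Sequence[Sequence[float]],
    low_iv_rows: Sequence[Sequence[float]],
    stage1_proba: Callable[[Sequence[Sequence[float]]], Sequence[float]],
) -> List[List[float]]:
    """Append the stage-1 default probability as an extra logistic-model input.

    high_iv_rows: WOE-encoded features with IV above 0.02 (direct logistic inputs).
    low_iv_rows:  the remaining low-IV features, scored by the stage-1 model.
    stage1_proba: any callable returning P(default) for each row.
    """
    probs = list(stage1_proba(low_iv_rows))
    if len(probs) != len(high_iv_rows):
        raise ValueError("row count mismatch between feature groups")
    return [list(row) + [p] for row, p in zip(high_iv_rows, probs)]


# Stand-in for a trained stage-1 model, for illustration only.
def fake_stage1(rows: Sequence[Sequence[float]]) -> List[float]:
    return [min(1.0, sum(r) / 10) for r in rows]


fused = fuse_features([[0.1, -0.2], [0.3, 0.4]], [[1, 2], [5, 6]], fake_stage1)
```

Each fused row is the sample's high-IV WOE features followed by its stage-1 default probability.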
Then, the training set and test set are divided with the same stratified sampling method and ten-fold cross-validation is performed. The logistic model parameters are tuned by observing the KS index (the maximum difference between the cumulative distribution proportions of good and bad accounts; the larger this gap, the stronger the model's ability to discriminate between them), and the model with the largest KS index is finally taken as the credit scoring model.
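A minimal KS computation, assuming higher model scores mean a higher chance of default and label 1 marks a bad account; the function name is hypothetical. Samples are sorted by score, tied scores are processed together, and the largest gap between the cumulative bad and good proportions is returned. Both classes must be present.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """KS = max over score thresholds of |cumulative bad share - cumulative good share|."""
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_bad = sum(1 for _, y in pairs if y == 1)
    n_good = len(pairs) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Consume every sample sharing this score so that ties are never split.
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


perfect = ks_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
partial = ks_statistic([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0])
```

In the ten-fold cross-validation described above, this value would be computed on each validation fold, and the parameter setting with the largest KS kept.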
The scorecard formula is as follows; it converts the probability output by the logistic model into a credit score:
$$\mathrm{Score} = A - B\log(\mathrm{Odds}), \qquad \mathrm{Odds} = \frac{p}{1-p}$$
where p is the probability output by the logistic regression model, A and B are preset adjustment parameters, and Score is the final credit score, which is used for the credit evaluation of the client.
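A sketch of the score conversion, assuming `log` is the natural logarithm and p is the probability of default. The source only says that A and B are preset. The `calibrate` helper shows one common way of choosing them: a base score at given base odds, plus "points to double the odds" (PDO). The specific numbers and function names are hypothetical.

```python
import math
from typing import Tuple


def calibrate(base_score: float, base_odds: float, pdo: float) -> Tuple[float, float]:
    """Hypothetical choice of A and B: base_score at base_odds, pdo points lost per doubling of odds."""
    b = pdo / math.log(2.0)
    a = base_score + b * math.log(base_odds)
    return a, b


def probability_to_score(p: float, a: float, b: float) -> float:
    """Score = A - B * ln(Odds), with Odds = p / (1 - p) and p the probability of default."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return a - b * math.log(p / (1.0 - p))


A, B = calibrate(base_score=600.0, base_odds=1 / 50, pdo=20.0)
s_base = probability_to_score(1 / 51, A, B)    # default odds of 1:50
s_double = probability_to_score(1 / 26, A, B)  # default odds doubled to 2:50
```

Because B is positive, higher default odds always give a lower score.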
Compared with the prior art, the invention has the following advantages:
1. The method described herein acquires more comprehensive user data and has a more reasonable design, so it yields a more effective, comprehensive and accurate assessment, gives a more complete picture of the user, analyzes the data more objectively, and facilitates the user's credit scoring.
2. In the method described herein, the multiple methods used in feature engineering remove redundant features from the data more comprehensively and effectively, laying a solid foundation for subsequent model construction.
3. Compared with existing single-model methods, the multi-model fusion scheme better mines the information in the user data, improves the accuracy of the client's credit score, and greatly improves the generalization performance of the scheme.
The present invention also provides a client credit evaluation device which, as shown in fig. 3, includes:
the data acquisition module 301 is used for acquiring credit evaluation data of a client to be evaluated;
the evaluation module 302 is used for performing credit evaluation on the client to be evaluated according to the credit evaluation data and a pre-established credit evaluation model; the credit evaluation model is obtained by model training with input feature data, and the input feature data are determined, according to a preset information value threshold, from preset credit evaluation data for model training after feature selection and discretization.
In the embodiment of the present invention, the device further includes a model building module, which is used to train a preset initial credit evaluation model with preset credit evaluation data to determine the established credit evaluation model; the model building module comprises:
the training data acquisition unit is used for acquiring preset credit evaluation data for model training;
the candidate feature unit is used for performing feature selection on the credit evaluation data to determine candidate feature data;
the discretization unit is used for discretizing the candidate feature data;
the input feature determining unit is used for determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold;
and the training unit is used for training a preset initial credit evaluation model according to the determined input feature data to determine the established credit evaluation model.
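The module structure above can be mirrored in code as a composition of callables, one per unit. This is only an architectural sketch: the class names, field names and the 0.02 default (taken from the method embodiment) are assumptions, and each callable would wrap the corresponding step of the method.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ModelBuildingModule:
    """Model building module; each field stands in for one unit (all names hypothetical)."""
    acquire_training_data: Callable[[], Any]        # training data acquisition unit
    select_candidates: Callable[[Any], Any]         # candidate feature unit
    discretize: Callable[[Any], Any]                # discretization unit
    determine_inputs: Callable[[Any, float], Any]   # input feature determining unit
    train: Callable[[Any], Callable[[Any], float]]  # training unit, returns a fitted model
    iv_threshold: float = 0.02

    def build(self) -> Callable[[Any], float]:
        data = self.acquire_training_data()
        candidates = self.select_candidates(data)
        binned = self.discretize(candidates)
        inputs = self.determine_inputs(binned, self.iv_threshold)
        return self.train(inputs)


@dataclass
class ClientCreditEvaluationDevice:
    acquire: Callable[[str], Any]  # data acquisition module 301
    model: Callable[[Any], float]  # pre-established credit evaluation model

    def evaluate(self, client_id: str) -> float:  # evaluation module 302
        return self.model(self.acquire(client_id))


# Trivial stand-ins, only to show how the pieces connect.
builder = ModelBuildingModule(
    acquire_training_data=lambda: [1, 2, 3],
    select_candidates=lambda d: d,
    discretize=lambda d: [x * 10 for x in d],
    determine_inputs=lambda d, t: [x for x in d if x > 15],
    train=lambda inputs: (lambda features: float(len(inputs)) + features),
)
device = ClientCreditEvaluationDevice(acquire=lambda cid: 0.5, model=builder.build())
```

Keeping each unit as a separate callable lets the feature-selection, binning and training steps be swapped independently.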
For those skilled in the art, the implementation of the client credit evaluation apparatus provided by the present invention can be clearly understood through the description of the foregoing embodiments, and details are not repeated herein.
The client credit evaluation method and device disclosed herein may be applied to evaluating client credit in the financial field, and may also be applied to evaluation in any field other than finance.
The present embodiment also provides an electronic device, which may be a desktop computer, a tablet computer, a mobile terminal, or the like, but is not limited thereto. For this electronic device, reference may be made to the embodiments of the method and the apparatus, the contents of which are incorporated herein; repeated descriptions are omitted.
Fig. 4 is a schematic block diagram of a system configuration of an electronic apparatus 600 according to an embodiment of the present invention. As shown in fig. 4, the electronic device 600 may include a central processor 100 and a memory 140; the memory 140 is coupled to the central processor 100. Notably, this diagram is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the client credit evaluation function may be integrated into the central processor 100, which may be configured to perform the following control:
collecting credit evaluation data of a client to be evaluated;
performing credit evaluation on the client to be evaluated according to the credit evaluation data and a pre-established credit evaluation model; the credit evaluation model is obtained by model training with input feature data, and the input feature data are determined, according to a preset information value threshold, from preset credit evaluation data for model training after feature selection and discretization.
In another embodiment, the client credit evaluation device may be configured separately from the central processor 100, for example, the client credit evaluation device may be configured as a chip connected to the central processor 100, and the client credit evaluation function is realized by the control of the central processor.
As shown in fig. 4, the electronic device 600 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in fig. 4; furthermore, the electronic device 600 may also comprise components not shown in fig. 4, which may be referred to in the prior art.
As shown in fig. 4, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store information relating to faults as well as programs for processing such information, and the central processor 100 may execute the programs stored in the memory 140 to store or process information.
The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 140 may be a solid-state memory such as read-only memory (ROM), random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information when powered off, can be selectively erased, and can be provided with more data; an example of such a memory is sometimes called an EPROM. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, which stores application programs and function programs, or the flow by which the central processor 100 executes the operations of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a Bluetooth module and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130, to provide audio output via the speaker 131 and receive audio input from the microphone 132, thereby implementing general telecommunications functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100, so that sound can be recorded locally through the microphone 132 and locally stored sound can be played through the speaker 131.
An embodiment of the present invention further provides a computer-readable program, where when the program is executed in an electronic device, the program causes a computer to execute the client credit evaluation method in the electronic device according to the above embodiment.
An embodiment of the present invention further provides a storage medium storing a computer-readable program, where the computer-readable program causes a computer to execute, in an electronic device, the client credit evaluation method described in the above embodiment.
Credit is an important business segment for banks. As people's demand for credit grows, effectively reducing the risk of users becoming overdue in credit business is an urgent problem. To address the shortcomings of existing credit scoring methods, the invention designs an effective customer credit scoring method based on multi-model fusion, which improves the bank's risk-control capability, reduces its audit costs, and improves the efficiency and accuracy of risk control for its credit business.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings. The many features and advantages of the embodiments are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the embodiments that fall within the true spirit and scope thereof. Further, since numerous modifications and changes will readily occur to those skilled in the art, it is not desired to limit the embodiments of the invention to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope thereof.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (12)

1. A method for evaluating customer credit, the method comprising:
collecting credit evaluation data of a client to be evaluated;
performing credit evaluation on the client to be evaluated according to the credit evaluation data and a pre-established credit evaluation model; the credit evaluation model is obtained by model training with input feature data, and the input feature data are determined, according to a preset information value threshold, from preset credit evaluation data for model training after feature selection and discretization.
2. The customer credit evaluation method of claim 1, wherein the credit evaluation data comprise: client personal information feature data, client credit transaction feature data, client property status feature data, client behavior preference feature data, client telecom operator information feature data, and overdue flag data.
3. The customer credit evaluation method of claim 1, further comprising training a preset initial credit evaluation model with preset credit evaluation data to determine the established credit evaluation model, which comprises:
acquiring preset credit evaluation data for model training;
performing feature selection on the credit evaluation data to determine candidate feature data;
discretizing the candidate feature data;
determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold;
and training a preset initial credit evaluation model according to the determined input feature data to determine the established credit evaluation model.
4. The customer credit evaluation method of claim 3, wherein the performing feature selection on the credit evaluation data to determine candidate feature data comprises:
preprocessing the credit evaluation data by deleting duplicate data and abnormal data in the credit evaluation data.
5. The customer credit evaluation method of claim 3, wherein the performing feature selection on the credit evaluation data to determine candidate feature data comprises:
determining first candidate feature data from the credit evaluation data using a supervised machine learning algorithm;
determining second candidate feature data from the credit evaluation data using an unsupervised machine learning algorithm;
and taking the first candidate feature data and the second candidate feature data as the candidate feature data.
6. The customer credit evaluation method of claim 3, wherein the discretizing the candidate feature data comprises:
discretizing the candidate feature data using chi-square binning.
7. The customer credit evaluation method of claim 3, wherein the determining input feature data from the candidate feature data based on the information value of the discretized candidate feature data and a predetermined information value threshold comprises:
determining the proportion of overdue customers and the proportion of non-overdue customers in the discretized candidate feature data;
determining the information value of the discretized candidate feature data according to the weight of evidence (WOE) value of the discretized candidate feature data and the proportions of overdue and non-overdue customers in the candidate feature data;
taking the discretized candidate feature data whose information value is greater than the preset information value threshold as first input feature data;
training a LightGBM model with the discretized candidate feature data whose information value is not greater than the preset information value threshold, and taking the default probability value it determines for each sample as second input feature data;
and taking the first input feature data and the second input feature data as the determined input feature data.
8. The method as claimed in claim 3, wherein the step of training a predetermined initial credit evaluation model based on the determined input feature data to determine a built credit evaluation model comprises:
acquiring training set data and test set data from the input feature data by utilizing hierarchical sampling;
performing model training and verification on a preset initial credit evaluation model by using the training set data and the test set data;
taking the credit evaluation model parameter corresponding to the KS index maximum value as the established credit evaluation model parameter to determine the established credit evaluation model; and the KS index is the maximum difference value between the cumulative distribution ratios of the non-overdue accounts and the overdue accounts.
9. A client credit evaluation device, the device comprising:
the data acquisition module, used for acquiring credit evaluation data of the client to be evaluated;
the evaluation module, used for performing credit evaluation on the client to be evaluated according to the credit evaluation data and a pre-established credit evaluation model; the credit evaluation model is obtained by model training with input feature data, and the input feature data are determined, according to a preset information value threshold, from preset credit evaluation data for model training after feature selection and discretization.
10. The client credit evaluation device of claim 9, further comprising a model building module, used to train a preset initial credit evaluation model with preset credit evaluation data to determine the established credit evaluation model; wherein the model building module comprises:
the training data acquisition unit, used for acquiring preset credit evaluation data for model training;
the candidate feature unit, used for performing feature selection on the credit evaluation data to determine candidate feature data;
the discretization unit, used for discretizing the candidate feature data;
the input feature determining unit, used for determining input feature data from the candidate feature data according to the information value of the discretized candidate feature data and a preset information value threshold;
and the training unit, used for training a preset initial credit evaluation model according to the determined input feature data to determine the established credit evaluation model.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 8 when executing the computer program.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 8.
CN202110123755.0A 2021-01-29 2021-01-29 Client credit evaluation method and device Pending CN112801775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110123755.0A CN112801775A (en) 2021-01-29 2021-01-29 Client credit evaluation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110123755.0A CN112801775A (en) 2021-01-29 2021-01-29 Client credit evaluation method and device

Publications (1)

Publication Number Publication Date
CN112801775A true CN112801775A (en) 2021-05-14

Family

ID=75812718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110123755.0A Pending CN112801775A (en) 2021-01-29 2021-01-29 Client credit evaluation method and device

Country Status (1)

Country Link
CN (1) CN112801775A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898476A (en) * 2018-06-14 2018-11-27 中国银行股份有限公司 A kind of loan customer credit-graded approach and device
CN110909963A (en) * 2018-09-14 2020-03-24 中国软件与技术服务股份有限公司 Credit scoring card model training method and taxpayer abnormal risk assessment method
CN111191731A (en) * 2020-01-02 2020-05-22 同盾控股有限公司 Data processing method and device, storage medium and electronic equipment
CN111311128A (en) * 2020-03-30 2020-06-19 百维金科(上海)信息科技有限公司 Consumption financial credit scoring card development method based on third-party data
CN111695084A (en) * 2020-04-26 2020-09-22 北京奇艺世纪科技有限公司 Model generation method, credit score generation method, device, equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379528A (en) * 2021-05-25 2021-09-10 杭州搜车数据科技有限公司 Wind control model establishing method and device and risk control method
CN113554334A (en) * 2021-08-02 2021-10-26 上海明略人工智能(集团)有限公司 Method, system, device, server and storage medium for evaluating user recording behaviors
CN115086343A (en) * 2022-06-29 2022-09-20 青岛华正信息技术股份有限公司 Internet of things data interaction method and system based on artificial intelligence
CN115086343B (en) * 2022-06-29 2023-02-28 青岛华正信息技术股份有限公司 Internet of things data interaction method and system based on artificial intelligence
CN116091206A (en) * 2023-01-31 2023-05-09 金电联行(北京)信息技术有限公司 Credit evaluation method, credit evaluation device, electronic equipment and storage medium
CN116091206B (en) * 2023-01-31 2023-10-20 金电联行(北京)信息技术有限公司 Credit evaluation method, credit evaluation device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112801775A (en) Client credit evaluation method and device
Koh et al. A two-step method to construct credit scoring models with data mining techniques
CN110909984B (en) Business data processing model training method, business data processing method and device
CN111275546B (en) Financial customer fraud risk identification method and device
KR102009309B1 (en) Management automation system for financial products and management automation method using the same
CN111932268A (en) Enterprise risk identification method and device
CN112598294A (en) Method, device, machine readable medium and equipment for establishing scoring card model on line
Moreno-Moreno et al. Success factors in peer-to-business (P2B) crowdlending: A predictive approach
CN111882420A (en) Generation method of response rate, marketing method, model training method and device
CN112232947A (en) Loan risk prediction method and device
CN110634060A (en) User credit risk assessment method, system, device and storage medium
CN115409518A (en) User transaction risk early warning method and device
CN112232950A (en) Loan risk assessment method and device, equipment and computer-readable storage medium
CN112116454A (en) Credit evaluation method and device
Chen Empirical analysis of bitcoin price
Kaniovski et al. Risk assessment for credit portfolios: a coupled Markov chain model
CN116361542A (en) Product recommendation method, device, computer equipment and storage medium
JP7344609B2 (en) Data quantification method based on confirmed and estimated values
CN112101950B (en) Feature extraction method and device for a suspicious transaction monitoring model
CN114565450A (en) Overdue common debt-based collection strategy determination method and related equipment
Bernhardt et al. Profiting from the poor in competitive lending markets with adverse selection
CN112085497A (en) User account data processing method and device
CN111768306A (en) Risk identification method and system based on intelligent data analysis
CN111932018B (en) Bank business performance contribution information prediction method and device
Purda et al. Consumer Credit Assessments in the Age of Big Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination