CN112613977A - Personal credit loan admission credit granting method and system based on government affair data - Google Patents

Personal credit loan admission credit granting method and system based on government affair data

Info

Publication number
CN112613977A
CN112613977A CN202011498280.5A CN202011498280A CN112613977A CN 112613977 A CN112613977 A CN 112613977A CN 202011498280 A CN202011498280 A CN 202011498280A CN 112613977 A CN112613977 A CN 112613977A
Authority
CN
China
Prior art keywords
credit
government affair
client
model
admission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011498280.5A
Other languages
Chinese (zh)
Inventor
许晴
陈圳渠
姜晓楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The invention discloses a method and a system for personal credit loan admission and credit granting based on government affair data. The method comprises the following steps: screening clients according to the government affair data by using an admission strategy set and differentiation rules of the model to obtain admitted clients; training the relationship between the government affair data and credit risk with an ensemble machine learning algorithm to construct an admission model; performing credit risk assessment on the admitted clients with the admission model to obtain a client credit score; training the relationship between the government affair data and the credit line with an ensemble machine learning algorithm to construct a credit granting model; calculating the client's basic credit line with the credit granting model; performing risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a risk coefficient; and obtaining the client's final credit line from the risk coefficient and the basic credit line. The invention can reduce the pressure of admission approval, make full use of the value of government affair data, and accurately control the financial institution's own risk while expanding inclusive finance business.

Description

Personal credit loan admission credit granting method and system based on government affair data
Technical Field
The invention relates to the field of Internet finance, in particular to a method and a system for personal credit loan admission and credit granting based on government affair data.
Background
In recent years, China has attached great importance to inclusive finance and has issued a number of relevant policy documents calling for its vigorous development, so that farmers, low-income urban residents, the poor, the disabled, the elderly and other groups can obtain reasonably priced, convenient and safe financial services in a timely manner. Against this background, more and more financial institutions are expanding their inclusive finance business and opening up a new model of cooperation with governments: the government provides government affair data, and the financial institution uses that data to evaluate the credit of individual customers and then offers online credit loan services on an inclusive finance platform.
Most current loan services are characterized by a large number of applications and a small amount per loan, which calls for online application and real-time online approval. Government affair data generally comprises data from financial institutions, market supervision authorities, committees, national and local resource authorities, tax authorities and the like, so it suffers from problems such as disorganized data and poor quality in part of the data. Most loan products approved online still use the traditional scorecard method, but applying it to government affair data raises several difficulties: because scoring is subjective, it is hard to extract useful information from multi-source government affair data and to measure the importance of different features; and because client credit risk can be evaluated with only limited data, a comprehensive evaluation of risk cannot be achieved.
At present, the common means of personal credit loan admission and credit granting are mainly the traditional scorecard method and rules plus logistic regression. The scorecard method has difficulty extracting useful information from multi-source government affair data and can evaluate risk with only limited data; the rules plus logistic regression method handles multi-category features poorly and fits nonlinear relationships poorly. Neither method can accurately control credit risk, and both make it difficult to evaluate a customer's credit risk comprehensively.
In view of the above, there is a need for a technical solution that overcomes the above problems and realizes a comprehensive evaluation of the client's credit risk.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and a system for personal credit loan admission and credit granting based on government affair data, which can expand inclusive finance business on the basis of government affair data while effectively controlling customer risk and granting credit accurately.
In a first aspect of an embodiment of the present invention, a personal credit loan admission and credit granting method based on government affair data is provided, where the method includes:
acquiring government affair data;
screening clients according to the government affair data by using an admission strategy set and differentiation rules of the model to obtain admitted clients;
training the relationship between the government affair data and credit risk with an ensemble machine learning algorithm according to the government affair data to construct an admission model;
performing credit risk assessment on the admitted clients by using the admission model to obtain a client credit score;
training the relationship between the government affair data and the credit line with an ensemble machine learning algorithm according to the government affair data to construct a credit granting model;
calculating the basic credit line of the client by using the credit granting model according to the government affair data;
performing risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient;
and obtaining the final credit line of the client by using the risk coefficient and the basic credit line.
Further, when screening the clients according to the government affair data by using the admission strategy set and the differentiation rules of the model to obtain the admitted clients, the method further comprises:
performing personal data cleaning, personal feature derivation and personal feature screening on the government affair data to obtain processed government affair data.
Further, screening the clients according to the government affair data by using the admission strategy set and the differentiation rules of the model to obtain the admitted clients comprises:
setting a first differentiation rule set and a second differentiation rule set;
judging whether the processed government affair data matches the first differentiation rule set; if so, rejecting the corresponding client, and if not, retaining the corresponding client as an admitted client;
and judging, for the admitted clients, whether the government affair data matches the second differentiation rule set, and if so, annotating prompt information in the government affair data.
Further, performing personal data cleaning on the government affair data comprises:
removing all fields with empty values from the data table, deleting records with abnormal values, and unifying the units of fields whose units are not uniform, to obtain the cleaned features.
Further, performing personal feature derivation on the government affair data comprises:
aggregating and transforming the government affair data, and deriving new features, including sum, mean, maximum and minimum values, from the original features.
Further, performing personal feature screening on the government affair data comprises:
removing features whose missing-value ratio is greater than a first threshold, and deleting features whose correlation is greater than a second threshold, to obtain the screened features.
Further, training the relationship between the government affair data and credit risk with an ensemble machine learning algorithm according to the government affair data to construct the admission model comprises:
defining bad samples and good samples according to personal credit information;
extracting, from the government affair data, a modeling sample of bad samples and good samples in a certain proportion;
randomly sampling the modeling sample in a preset proportion to obtain a training set and a test set;
and training the admission model with the training set and verifying it with the test set, so that the relationship between the government affair data and credit risk is learned and the trained admission model is obtained.
Further, performing credit risk assessment on the admitted clients by using the admission model to obtain a client credit score comprises:
predicting, with the trained admission model, the probability that a client to be admitted is a bad sample, and converting the predicted probability into a client credit score.
Further, training the relationship between the government affair data and the credit line with an ensemble machine learning algorithm according to the government affair data to construct the credit granting model comprises:
performing feature cleaning on the modeling sample, and dividing the cleaned modeling sample in a preset proportion to obtain a training set and a test set;
training the credit granting model with the training set and verifying it with the test set, so that the relationship between the government affair data and the credit line is learned and the trained credit granting model is obtained.
Further, performing risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient comprises:
dividing the client credit score into a plurality of grades by score, and setting, within each grade, the corresponding risk coefficient according to the client's number of co-debt institutions.
Further, obtaining the final credit line of the client by using the risk coefficient and the basic credit line comprises:
calculating the final credit line of the client by the following formula:
L = L1 × C;
wherein L is the final credit line of the client; L1 is the basic credit line; and C is the risk coefficient.
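As an illustration of how a grade-based risk coefficient and the formula L = L1 × C might fit together, a minimal Python sketch follows. The score cut-offs, the institution-count buckets and every coefficient value are hypothetical placeholders chosen for the example; the source does not disclose the actual grades or coefficients.

```python
from bisect import bisect_right

# Hypothetical score cut-offs (ascending) that split scores into grades 0..3.
# These values are illustrative assumptions, not taken from the patent.
GRADE_CUTOFFS = [500, 600, 700]

# Hypothetical risk coefficients C. Rows are grades (lowest to highest);
# columns are buckets of the number of co-debt institutions (0-1, 2-3, 4+).
RISK_COEFFICIENTS = [
    [0.4, 0.3, 0.2],
    [0.6, 0.5, 0.4],
    [0.8, 0.7, 0.6],
    [1.0, 0.9, 0.8],
]


def institution_bucket(n_institutions):
    """Map a count of co-debt institutions to a column index (assumed buckets)."""
    if n_institutions <= 1:
        return 0
    if n_institutions <= 3:
        return 1
    return 2


def risk_coefficient(credit_score, n_institutions):
    """Look up C from the credit-score grade and the co-debt institution bucket."""
    grade = bisect_right(GRADE_CUTOFFS, credit_score)
    return RISK_COEFFICIENTS[grade][institution_bucket(n_institutions)]


def final_credit_line(base_line, credit_score, n_institutions):
    """Apply L = L1 * C, where L1 is the basic credit line."""
    return base_line * risk_coefficient(credit_score, n_institutions)
```

With these placeholder tables, a client with a score of 650 and debt at two institutions falls in grade 2 and the middle bucket, so a basic credit line of 100000 is scaled by 0.7.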
In a second aspect of the embodiments of the present invention, a personal credit loan admission and credit granting system based on government affair data is provided, the system including:
a data acquisition module, used for acquiring government affair data;
an admission screening module, used for screening clients according to the government affair data by using the admission strategy set and the differentiation rules of the model to obtain admitted clients;
an admission model building module, used for training the relationship between the government affair data and credit risk with an ensemble machine learning algorithm according to the government affair data to build an admission model;
a credit evaluation module, used for performing credit risk assessment on the admitted clients by using the admission model to obtain a client credit score;
a credit granting model building module, used for training the relationship between the government affair data and the credit line with an ensemble machine learning algorithm according to the government affair data to build a credit granting model;
a basic credit line calculation module, used for calculating the basic credit line of the client by using the credit granting model according to the government affair data;
a risk adjustment module, used for performing risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient;
and a final credit line calculation module, used for obtaining the final credit line of the client by using the risk coefficient and the basic credit line.
In a third aspect of the embodiments of the present invention, a computer device is provided, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the personal credit loan admission and credit granting method based on government affair data when executing the computer program.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the personal credit loan admission and credit granting method based on government affair data.
According to the personal credit loan admission and credit granting method and system based on government affair data provided by the invention, government affair data is acquired; clients are screened according to the government affair data by using the admission strategy set and the differentiation rules of the model to obtain admitted clients; the relationship between the government affair data and credit risk is trained with an ensemble machine learning algorithm to construct an admission model; credit risk assessment is performed on the admitted clients with the admission model to obtain a client credit score; the relationship between the government affair data and the credit line is trained with an ensemble machine learning algorithm to construct a credit granting model; the client's basic credit line is calculated with the credit granting model according to the government affair data; risk adjustment is performed according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient; and the client's final credit line is obtained from the risk coefficient and the basic credit line. In this way, the pressure of admission approval is reduced, the value of government affair data is fully utilized, the client's credit risk, assets, repayment capacity and repayment willingness are accurately evaluated, and the financial institution's risk is accurately controlled while inclusive finance business is expanded.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a personal credit loan admission and credit granting method based on government affair data according to an embodiment of the invention.
Fig. 2 is a schematic diagram of a personal loan admission process according to an embodiment of the invention.
Fig. 3 is a schematic diagram of the indexes on the development training set according to an embodiment of the invention.
Fig. 4 is a schematic diagram of the indexes on the development test set according to an embodiment of the invention.
Fig. 5 is a schematic diagram of the proportion of bad clients in each scoring interval on the development test set according to an embodiment of the invention.
Fig. 6 is a schematic flow chart of personal loan credit granting according to an embodiment of the invention.
Fig. 7 is a schematic diagram of the architecture of a personal credit loan admission and credit granting system based on government affair data according to an embodiment of the invention.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiments of the invention, a personal credit loan admission and credit granting method and system based on government affair data are provided. As a first line of defense, applying clients undergo a preliminary admission screening against a public admission rule set of bottom-line and non-bottom-line rules. Models are then built with machine learning for the two links of admission and credit granting: the admission model calculates a client's risk score from the government affair data and accurately evaluates the client's credit risk, and the credit granting model calculates the client's final credit line from the government affair data and the client's risk score and accurately evaluates the client's assets, repayment capacity and repayment willingness. In implementation, the admission rule set reduces the pressure of admission approval, and the two machine learning models for admission and credit granting make full use of the value of government affair data, effectively reduce the potential risk of loans, and accurately control the financial institution's risk while inclusive finance is expanded.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Fig. 1 is a schematic flow chart of a method for granting credit for personal credit loan based on government affairs data according to an embodiment of the invention. As shown in fig. 1, the method includes:
Step S1, acquiring government affair data;
Step S2, screening clients according to the government affair data by using the admission strategy set and the differentiation rules of the model to obtain admitted clients;
Step S3, training the relationship between the government affair data and credit risk with an ensemble machine learning algorithm according to the government affair data to construct an admission model;
Step S4, performing credit risk assessment on the admitted clients by using the admission model to obtain a client credit score;
Step S5, training the relationship between the government affair data and the credit line with an ensemble machine learning algorithm according to the government affair data to construct a credit granting model;
Step S6, calculating the basic credit line of the client by using the credit granting model according to the government affair data;
Step S7, performing risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient;
Step S8, obtaining the final credit line of the client by using the risk coefficient and the basic credit line.
It should be noted that although the operations of the method of the present invention have been described in the above embodiments and the accompanying drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the operations shown must be performed, to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
For a more clear explanation of the above method for granting credit for personal credit based on government affairs data, the following description will be made in detail with reference to each step.
Referring to fig. 2, a schematic diagram of a personal loan admission process according to an embodiment of the invention is shown. As shown in fig. 2, the individual loan admission is explained in connection with steps S1 to S4.
In step S1, government affair data is acquired.
Government affair data may be used for the admission of personal loans, and the data may come from multiple departments such as a reform agency, a national office, a financial institution, a cell phone carrier, and the like.
Data quality is uneven: for example, some data is seriously misplaced (sex information is stored in the identity card field), some units are not uniform (fields such as annual income and mortgage value appear in both yuan and ten-thousand yuan), and the missing rate of some fields exceeds 90%. After analysis and processing, the data finally used is shown in Table 1 and mainly relates to the following aspects:
Table 1. Data table statistics
[Table 1 appears as an image in the original publication: Figure BDA0002842846020000071]
Step S2, screening clients according to the government affair data by using the admission strategy set and the differentiation rules of the model to obtain admitted clients.
Specifically, this step includes:
Step S21, performing personal data cleaning, personal feature derivation and personal feature screening on the government affair data to obtain the processed government affair data.
Step S22, setting a first differentiation rule set and a second differentiation rule set; judging whether the processed government affair data matches the first differentiation rule set, and if so, rejecting the corresponding client, and if not, retaining the corresponding client as an admitted client; and judging, for the admitted clients, whether the government affair data matches the second differentiation rule set, and if so, annotating prompt information in the government affair data.
Personal data cleaning: personal data cleaning mainly covers three areas. First, all fields with null values in the data table are removed. Second, records with outliers, such as an individual annual income greater than 500 billion, are deleted. Finally, fields with non-uniform units, such as yuan and ten-thousand yuan, are unified into yuan.
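The three cleaning operations can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: records are dictionaries, a field is treated as "null" when it is empty in every record, each amount field has a companion `<field>_unit` field, and units are unified before the outlier check so that the 500 billion cap is applied in yuan. The field names and the unit labels "yuan" and "wan_yuan" are hypothetical.

```python
WAN = 10_000  # one ten-thousand-yuan unit expressed in yuan


def clean_records(records, amount_fields, income_cap=500e9):
    """Return cleaned copies of `records` (a list of dicts).

    Assumed conventions (not from the source): missing values are None, and
    each amount field `f` has a unit stored under the key `f + "_unit"`.
    """
    # 1. Find fields that are empty (None) in every record and drop them.
    all_fields = {key for record in records for key in record}
    empty_fields = {
        f for f in all_fields if all(r.get(f) is None for r in records)
    }

    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in empty_fields}

        # 2. Unify units: convert ten-thousand-yuan amounts to yuan.
        for field in amount_fields:
            unit_key = field + "_unit"
            if row.get(field) is not None and row.get(unit_key) == "wan_yuan":
                row[field] = row[field] * WAN
            if unit_key in row:
                row[unit_key] = "yuan"

        # 3. Delete records with outliers, here an implausible annual income.
        income = row.get("annual_income")
        if income is not None and income > income_cap:
            continue
        cleaned.append(row)
    return cleaned
```

Converting units before checking the cap is a design choice of this sketch; the source lists the three operations without fixing their order.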
Personal feature derivation: the technique used in this step is feature engineering. New features such as sum, mean, maximum and minimum values are derived from the original features, mainly through aggregation and transformation, so that the model learns better once they are fed in. For feature derivation on personal data, the following derivation methods are used:
1. Pass-through: pass-through means that a feature is fed into the model directly, without any processing. Features for which each sample has only a single record, such as age and gender, need no multi-value aggregation and can be used by the model as they are.
2. Synthesis: synthesis means combining individual features to form a composite feature. Features can be multiplied, divided, or combined by Cartesian product to obtain new combined features. In addition, categorical variables such as profession and employment usually contain many categories, sometimes thousands, so rare categories are usually merged before the data enters the model, which is also a form of synthesis.
3. Aggregation: aggregation applies when each person may have multiple records. A typical example is personal credit data, where each client may have credit records at different institutions in a given month. To compute a user's credit line, the credit line fields across the multiple records can be summed, averaged, or reduced to their extremes, and the number of institutions granting credit to the client can be counted at the same time.
4. Cross comparison: cross comparison is mostly used to calculate similarity, for example the similarity between the home address in the credit information and the home address in the basic information. When one person has multiple records, each record can be cross-compared and the similarities then aggregated, for example by summation, averaging, or taking extremes.
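The aggregation method in item 3 can be illustrated with a short Python sketch. The record layout `(client_id, institution, credit_line)` and the output feature names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_credit_lines(records):
    """Derive per-client features from multiple credit records.

    `records` is an iterable of (client_id, institution, credit_line) tuples;
    this layout is a hypothetical stand-in for the personal credit data.
    """
    by_client = defaultdict(list)
    for client_id, institution, credit_line in records:
        by_client[client_id].append((institution, credit_line))

    features = {}
    for client_id, rows in by_client.items():
        lines = [line for _, line in rows]
        features[client_id] = {
            "credit_line_sum": sum(lines),
            "credit_line_mean": mean(lines),
            "credit_line_max": max(lines),
            "credit_line_min": min(lines),
            # Number of distinct institutions that granted credit to the client.
            "n_institutions": len({inst for inst, _ in rows}),
        }
    return features
```

The institution count computed here is the same kind of quantity that the later risk adjustment step relies on.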
Personal feature screening: screening at this stage involves two aspects. First, features whose missing-value ratio exceeds 90% are removed, so that they do not limit the model's discriminating ability. Second, features with a correlation greater than 0.9 are deleted, which reduces collinearity in the model.
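A minimal sketch of the two screening criteria is given below. It assumes features are stored as columns (name mapped to a list of values, with None for missing), uses the Pearson coefficient as the correlation measure, and keeps the first feature of any highly correlated pair; the source specifies the two 0.9 thresholds but not the correlation measure or which feature of a pair to drop.

```python
from math import sqrt


def missing_ratio(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def screen_features(columns, missing_threshold=0.9, corr_threshold=0.9):
    """Return the names of features that survive both screening criteria."""
    kept = []
    for name, values in columns.items():
        # Criterion 1: drop features that are mostly missing.
        if missing_ratio(values) > missing_threshold:
            continue
        # Criterion 2: drop a feature highly correlated with one already kept.
        if any(abs(pearson(values, columns[k])) > corr_threshold for k in kept):
            continue
        kept.append(name)
    return kept
```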
Specifically, the detailed process of determining the rule set in step S22 is as follows:
Before entering the model, a user is first checked against the admission rule set, which is divided into strong rules (the first differentiation rule set) and weak rules (the second differentiation rule set). When a user hits a strong rule, the user fails the admission requirement in that respect and is rejected directly: an admission score of 0 is output together with the rule that was hit. When a user hits a weak rule, the user is not rejected and the score is unaffected; the rule is only output as prompt information when the admission score is subsequently output. The strong and weak rules covered by the personal admission rule set are shown in Tables 2 and 3:
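Table 2. Strong rule set
[Table 2 appears as an image in the original publication: Figure BDA0002842846020000091]
Table 3. Weak rule set
[Table 3 appears as an image in the original publication: Figure BDA0002842846020000092]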
When the user hits no rule, the prompt message 'passing the admission rule' is output, and the user enters the admission model directly to obtain the corresponding admission score.
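The strong and weak rule handling described above might be sketched as follows. Because the actual rules in Tables 2 and 3 are not reproduced in this text, the two example rules and the client field names are purely hypothetical; only the control flow follows the description.

```python
def apply_admission_rules(client, strong_rules, weak_rules):
    """Check a client against strong rules first, then weak rules.

    Each rule is a (name, predicate) pair; a predicate returning True means
    the client hits the rule.
    """
    strong_hits = [name for name, hit in strong_rules if hit(client)]
    if strong_hits:
        # A strong rule rejects the client: score 0 plus the rules that were hit.
        return {"admitted": False, "score": 0, "messages": strong_hits}

    weak_hits = [name for name, hit in weak_rules if hit(client)]
    # Weak rules never reject; they are only reported as prompt information.
    messages = weak_hits or ["passing the admission rule"]
    return {"admitted": True, "score": None, "messages": messages}


# Hypothetical example rules, not taken from Tables 2 and 3.
STRONG_RULES = [("age below 18", lambda c: c["age"] < 18)]
WEAK_RULES = [("no social security record", lambda c: not c["has_social_security"])]
```

In this sketch the score of an admitted client is left as None, since it is produced later by the admission model.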
Step S3, training the relationship between the government affair data and credit risk with an ensemble machine learning algorithm according to the government affair data to construct an admission model.
Specifically, this step includes:
Step S31, defining bad samples and good samples according to personal credit information;
Step S32, extracting, from the government affair data, a modeling sample of bad samples and good samples in a certain proportion;
Step S33, randomly sampling the modeling sample in a preset proportion to obtain a training set and a test set;
Step S34, training the admission model with the training set and verifying it with the test set, so that the relationship between the government affair data and credit risk is learned and the trained admission model is obtained.
Step S4, performing credit risk assessment on the admitted clients by using the admission model to obtain a client credit score.
Specifically, this step includes:
predicting, with the trained admission model, the probability that a client to be admitted is a bad sample, and converting the predicted probability into a client credit score.
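The source does not state how the predicted probability is converted into a score. One common convention, shown here only as an assumed example, is the scorecard-style log-odds scaling, in which a reference score corresponds to reference good-to-bad odds and a fixed number of points doubles the odds. The parameters `base_score`, `base_odds` and `pdo` are hypothetical.

```python
from math import log


def probability_to_score(p_bad, base_score=600.0, base_odds=20.0, pdo=50.0):
    """Convert a predicted bad probability into a credit score.

    Assumed scaling (not from the source): `base_score` points at good:bad
    odds of `base_odds`, and every `pdo` points doubles the odds.
    """
    # Clip to avoid division by zero or log(0) at the extremes.
    p = min(max(p_bad, 1e-9), 1 - 1e-9)
    odds_good = (1 - p) / p
    factor = pdo / log(2)
    offset = base_score - factor * log(base_odds)
    return offset + factor * log(odds_good)
```

Under this scaling a lower predicted bad probability yields a higher score, so the score is monotone in the model output and any grade boundaries can be set directly on it.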
The invention adopts the LightGBM algorithm. LightGBM is a recent member of the boosting family of ensemble models; its main idea is to train weak classifiers (decision trees) iteratively to obtain an optimal model. The algorithm offers good training results, high training efficiency, high accuracy, and support for parallel learning.
First, when constructing the LightGBM admission model, good customers and bad customers need to be defined:
Defining good and bad customers means defining the model label, that is, the value the model predicts is whether a customer is good or bad. The definition is derived from fields in the personal credit information. Specifically, a customer is bad if, in some month, the sum of the customer's credit business balances in the special-mention, substandard, doubtful and loss categories (hereinafter, the sum of balances) is greater than 0 while the sum of balances in the previous month equals 0. For example, a customer whose sum of balances equals 0 in month 4 and exceeds 0 in months 5 and 6 is considered a bad customer. Conversely, a customer whose sum of balances equals 0 in every month is defined as a good customer.
In this embodiment, to ensure the training effect of the model, samples in the following three cases are excluded: customers whose sum of balances is greater than 0 in the first month's batch data; customers whose sum of balances is greater than 0 in every month's batch data; and customers who change from bad to good across the monthly batch data.
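The label definition and the three exclusion cases can be expressed as a small function over a customer's monthly sums of balances. This is a sketch: the input is assumed to be a chronologically ordered list of non-negative numbers, one per monthly batch, and excluded customers are marked with None.

```python
def label_customer(monthly_balance_sums):
    """Return "good", "bad", or None (excluded from modeling).

    `monthly_balance_sums` is an assumed representation: the sum of balances
    for each monthly batch, in chronological order, all non-negative.
    """
    balances = list(monthly_balance_sums)
    if not balances:
        return None
    # Excluded: balance already above 0 in the first month. This also covers
    # customers whose balance is above 0 in every month.
    if balances[0] > 0:
        return None
    # Good: the sum of balances is 0 in every month.
    if all(b == 0 for b in balances):
        return "good"
    first_bad = next((i for i, b in enumerate(balances) if b > 0), None)
    if first_bad is None:
        return None
    # Excluded: the customer turns bad and later returns to a zero balance.
    if any(b == 0 for b in balances[first_bad:]):
        return None
    # Bad: the balance rises above 0 after a month at 0 and stays above 0.
    return "bad"
```

For the example in the text, a customer with sums of 0 through month 4 and positive sums in months 5 and 6 is labeled bad.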
In this embodiment, to ensure that the model can learn from enough bad samples, the modeling sample is drawn from the full sample at a good-to-bad ratio of 9:1.
In this embodiment, the sampled modeling samples are randomly split at a ratio of 7:3 into a development training set and a development test set; the former is used to train the admission model and the latter to verify it.
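The 9:1 sampling and the 7:3 split might be implemented along the following lines. The sketch assumes that all bad samples are kept and good samples are randomly down-sampled to reach the ratio; the source states the ratio but not the sampling procedure. The function names and seed handling are illustrative.

```python
import random


def build_modeling_sample(good, bad, good_per_bad=9, seed=0):
    """Keep every bad sample and draw good samples at `good_per_bad`:1."""
    rng = random.Random(seed)
    n_good = min(len(good), good_per_bad * len(bad))
    return rng.sample(good, n_good) + list(bad)


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into development training and test sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed keeps the split reproducible, which helps when comparing indexes between the development training set and the development test set.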
The model and its evaluation indices are described below.
LightGBM is an advanced ensemble learning framework. Compared with other ensemble models, it directly supports categorical features, is optimized for multithreading, and supports efficient parallelism. At its core, LightGBM is an ensemble learning model that uses decision trees as base classifiers, namely the nonlinear GBDT (Gradient Boosting Decision Tree) algorithm. GBDT improves the ensemble step by step through an iterative process over multiple decision trees until the number of base learners reaches a target value. To address the low efficiency and poor scalability of GBDT, LightGBM adopts the Gradient-based One-Side Sampling (GOSS) algorithm and the Exclusive Feature Bundling (EFB) algorithm. The former reduces the number of samples by excluding most samples with small gradients; the latter reduces the number of features by bundling mutually exclusive features. Used together, the two ensure an efficient implementation.
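To make the GOSS idea concrete, the following standalone sketch selects samples the way the technique is usually described: keep the samples with the largest gradient magnitudes, randomly sample a fraction of the remainder, and up-weight the sampled remainder so the gradient statistics stay approximately unbiased. It is an illustration only, not LightGBM's internal code; the function name and the default rates are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def goss_sample(
    gradients: Sequence[float],
    top_rate: float = 0.2,
    other_rate: float = 0.1,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (sample index, weight) pairs chosen by one-side sampling."""
    n = len(gradients)
    # Rank samples by gradient magnitude, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(n * top_rate)
    n_other = int(n * other_rate)
    top, rest = order[:n_top], order[n_top:]
    # Randomly keep a small part of the small-gradient samples.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Up-weight the sampled small-gradient samples to compensate for the drop.
    weight = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, weight) for i in sampled]
```

With 100 samples and the default rates, 20 large-gradient samples are kept with weight 1 and 10 of the remaining 80 are kept with weight 8.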
Because the model addresses a binary classification problem, namely predicting whether a user's label is good or bad, the KS value (Kolmogorov-Smirnov), i.e., the maximum difference between the cumulative distributions of good and bad samples, and the AUC value (Area Under the ROC Curve) can be used as evaluation indices. The larger the KS and AUC values, the stronger the model's risk discrimination ability. FIG. 3 and FIG. 4 are schematic diagrams of these indices on the development training set and the development test set, respectively.
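As a hedged illustration of the two indices, the sketch below computes KS and AUC from labels and predicted bad-sample probabilities using only the standard library. The function name and the label encoding (1 for bad, 0 for good) are assumptions; in practice a library routine would normally be used instead.

```python
from typing import Sequence, Tuple


def ks_and_auc(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """labels: 1 = bad, 0 = good; scores: predicted probability of being bad."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both good and bad samples are required")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ks = 0.0
    auc_sum = 0.0
    cum_bad = cum_good = 0
    i = 0
    while i < len(pairs):
        # Process all samples that share the same score as one group (ties).
        j = i
        bad_here = good_here = 0
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                bad_here += 1
            else:
                good_here += 1
            j += 1
        # Each good sample counts the bad samples scored above it; ties count half.
        auc_sum += good_here * (cum_bad + bad_here / 2.0)
        cum_bad += bad_here
        cum_good += good_here
        # KS: largest gap between the cumulative bad and good distributions.
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks, auc_sum / (n_bad * n_good)
```

A model that ranks every bad sample above every good sample gives KS = 1 and AUC = 1; a model that assigns the same score to everyone gives KS = 0 and AUC = 0.5.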
In addition to these two indices, feature importance, a method of scoring the input features of a prediction model, reveals the relative importance of each feature in making predictions. Table 4 lists the ten most important features.
TABLE 4 Top ten features and their importance
Feature name | Feature importance
Sum of loan balances | 0.1759
Mean credit line | 0.1287
Number of institutions with credit line > 0 | 0.09
Mean personal income (farmer household) | 0.0879
Maximum credit line | 0.0721
Mean loan balance | 0.0661
Personal annual income (worker) | 0.0599
Occupation information | 0.0491
Family size (farmer household) | 0.0462
Age | 0.0383
Applying the admission model to the users in the development test set yields, for each sample, the probability that it is a bad sample. To improve the readability and interpretability of the model output, the invention converts the predicted probability into a score, so that business personnel at the financial institution can assess it quickly.
When converting to a score, the base score is first set to 575 (p0 = 575) and the number of points needed to double the odds of a good sample is set to 70 (pdo = 70). The base score is then adjusted according to the following formula and updated to p1:
Figure BDA0002842846020000111
The score is calculated according to the following formula:
Figure BDA0002842846020000112
where ln(odds) is the natural logarithm of the ratio of the probability that a user is a good sample to the probability that the user is a bad sample. Using x to denote the probability that the user is a bad sample, the formula is:
ln(odds) = ln((1 - x) / x)
The score is limited to the interval [300, 850]; scores outside this range are mapped to 300 or 850.
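The formula images for p1 and for the score are not reproduced in this text, so the sketch below should be read as an assumption rather than as the patent's exact formulas. It uses the commonly used points-to-double-the-odds scaling, with factor = pdo / ln 2, an adjusted base p1 = p0 - factor × ln(base_odds), and score = p1 + factor × ln(odds). The parameter `base_odds` is hypothetical (the text does not state it); only p0 = 575, pdo = 70, the definition of ln(odds), and the [300, 850] limits come from the description.

```python
import math


def probability_to_score(
    bad_probability: float,
    base_score: float = 575.0,   # p0 from the description
    pdo: float = 70.0,           # points to double the odds, from the description
    base_odds: float = 1.0,      # hypothetical: good-to-bad odds at the base score
    low: float = 300.0,
    high: float = 850.0,
) -> float:
    """Convert a predicted bad-sample probability x into a clipped score."""
    eps = 1e-12
    x = min(max(bad_probability, eps), 1.0 - eps)  # keep the logarithm finite
    factor = pdo / math.log(2.0)
    adjusted_base = base_score - factor * math.log(base_odds)  # assumed form of p1
    log_odds = math.log((1.0 - x) / x)  # ln(odds) with x = P(bad)
    score = adjusted_base + factor * log_odds
    return min(max(score, low), high)  # map out-of-range scores to 300 or 850
```

Under these assumptions, a bad-sample probability of 0.5 maps to 575, and halving the odds of being bad relative to good (x = 1/3) adds 70 points, giving 645.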
FIG. 5 shows the proportion of bad clients in each score interval on the development test set. The proportion of bad samples generally decreases as the score interval increases, and the decrease is especially pronounced in the lower score intervals.
With the admission process described, the credit granting process is detailed below.
Referring to FIG. 6, a flow chart of credit granting for personal loans is shown. As shown in FIG. 6, the credit granting model uses the modeling samples from the admission scoring model and, based on government affair data features, builds a credit line calculation model that outputs a basic credit line for each client, completing the preliminary calculation for the individual client. A risk coefficient matrix is then constructed from the credit score output by the admission model and the client's current number of co-debt institutions (institutions at which the client concurrently holds debt). The risk coefficient of each client is looked up in this matrix, and the client's basic credit line is multiplied by the risk coefficient to obtain the final calculated credit line.
The following describes the individual loan granting credit with reference to fig. 6 and steps S5 to S8.
Step S5: according to the government affair data, train the relationship between the government affair data and the credit line using a machine learning ensemble algorithm, and construct a credit granting model.
Specifically, this step includes:
Step S51: perform feature cleaning on the modeling samples, and divide the cleaned modeling samples according to a preset ratio to obtain a training set and a test set.
The personal credit granting model is built on the modeling samples used for the admission scoring model. The samples are divided into a training set and a test set at a ratio of 7:3, and the "current average credit line of financial institutions" field in the credit table is used as the label.
Because mortgage features and credit features are highly correlated with the label, these features are filtered out first. Second, the number of family members and annual income contain outliers, and data far outside the normal range may make the model unstable, so these two features are binned. Finally, as in the admission flow, features with a correlation greater than 0.9 are deleted.
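The correlation-based deletion can be sketched as below. The text does not say which feature of a highly correlated pair is removed, nor whether the correlation is signed; this sketch assumes Pearson correlation in absolute value and keeps the feature that appears first. The function names and the sample feature names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long columns (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Return the names of features kept after removing highly correlated ones."""
    kept: List[str] = []
    for name, values in features.items():
        # Keep a feature only if it is not too correlated with any kept feature.
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For example, if a hypothetical `income_copy` column is almost a multiple of `income`, only `income` survives, while an unrelated `family_size` column is kept.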
Step S52: train the credit granting model using the training set and verify it using the test set, learning the relationship between government affair data and credit line, to obtain the trained credit granting model.
In the invention, LightGBM is also selected as the model for credit granting, and training begins after sample selection and feature screening. Because the final output is a basic credit line, which is a numerical value, a LightGBM regression model is trained on the training set. The feature importance of the model is shown in Table 5.
TABLE 5 Top ten features and their importance
Feature name | Feature importance
Age | 4122
Income | 1810
Total investment amount | 1583
Sex | 1445
Number of family members | 1386
Number of invested enterprises | 1345
Education level | 616
Industry | 589
Number of income items | 544
Occupation | 510
Step S6: calculate the client's basic credit line using the credit granting model according to the government affair data.
Step S7: perform risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain the corresponding risk coefficient.
The client credit score can be divided into several ratings by score, and within each rating a risk coefficient is set according to the client's number of co-debt institutions.
In this embodiment, the risk adjustment may be performed using a risk coefficient matrix. The matrix is determined by two factors: the credit score output by the admission model and the client's current number of co-debt institutions. As shown in Table 6, the credit score is divided into five bands with ratings A, B, C, D, and E: A (700 points or above), B (600 to 700 points), C (500 to 600 points), D (400 to 500 points), and E (400 points or below). The current number of co-debt institutions is divided into four bands with ratings A, B, C, and D: A (0 to 1 institution), B (2 institutions), C (3 to 4 institutions), and D (more than 4 institutions).
The risk coefficient for each client is obtained from the risk coefficient matrix, determined jointly by the two ratings.
TABLE 6 Risk coefficients matrix
Figure BDA0002842846020000131
Step S8: obtain the client's final credit line using the risk coefficient and the basic credit line.
Specifically, the client's final credit line is calculated as follows:
L=L1×C;
where L is the client's final credit line, L1 is the basic credit line, and C is the risk coefficient.
The final credit line is obtained from this formula and output, completing the admission and credit granting process for the personal client.
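Steps S7 and S8 can be illustrated with the sketch below. The rating bands follow the description, but the coefficient values in `RISK_MATRIX` are invented placeholders, because Table 6 is only available as an image. The handling of boundary scores (for example, whether exactly 600 points is rated B or C) is also an assumption; here each lower bound is treated as inclusive.

```python
from typing import Dict

# Hypothetical coefficients for illustration only; the values of Table 6
# are not reproduced in the text. Outer key: score rating, inner key: debt rating.
RISK_MATRIX: Dict[str, Dict[str, float]] = {
    "A": {"A": 1.0, "B": 0.9, "C": 0.8, "D": 0.6},
    "B": {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.5},
    "C": {"A": 0.8, "B": 0.7, "C": 0.6, "D": 0.4},
    "D": {"A": 0.6, "B": 0.5, "C": 0.4, "D": 0.2},
    "E": {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.0},
}


def score_rating(score: float) -> str:
    """Rate the credit score: A (>= 700), B, C, D, E (< 400)."""
    if score >= 700:
        return "A"
    if score >= 600:
        return "B"
    if score >= 500:
        return "C"
    if score >= 400:
        return "D"
    return "E"


def debt_rating(n_institutions: int) -> str:
    """Rate the number of co-debt institutions: A (0-1), B (2), C (3-4), D (> 4)."""
    if n_institutions <= 1:
        return "A"
    if n_institutions == 2:
        return "B"
    if n_institutions <= 4:
        return "C"
    return "D"


def final_credit_line(base_line: float, score: float, n_institutions: int) -> float:
    """L = L1 x C, with C looked up from the two ratings."""
    coefficient = RISK_MATRIX[score_rating(score)][debt_rating(n_institutions)]
    return base_line * coefficient
```

With these placeholder coefficients, a client with a basic credit line of 10,000, a score of 650, and 3 co-debt institutions would receive a final credit line of about 7,000.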
In performing admission and credit granting for personal clients, the invention innovates in several respects. In algorithm selection, the machine learning algorithm LightGBM is used in both the admission and credit granting stages, so that complex relationships between dependent and independent variables can be captured more efficiently and accurately, enabling more reliable Internet credit risk control. In data utilization, the invention builds on government affair big data and makes full use of its high precision and high quality: data from different institutions are analyzed jointly, and feature derivation techniques produce new features with business meaning from the original data, so that the value of government affair big data is fully mined. The overall flow is also substantially improved: the admission rules, the admission model, and the credit granting model provide layer-by-layer screening and control, so that individual fraud risk can be effectively identified while the value of the data is maximized. In the risk adjustment step, the scores output by the admission model are graded and matched with corresponding risk coefficients, and the final credit line is the output of the credit granting model multiplied by the risk adjustment coefficient, which controls the risk of bad debt on the credit line.
Compared with the prior art, the personal credit loan admission and credit granting based on government affair data provided by the invention has at least the following advantages:
The model is more reliable: compared with traditional expert scorecards, linear regression, and logistic regression, LightGBM, as an advanced ensemble machine learning model, can better capture nonlinear relationships and thus achieve higher accuracy; it also handles large volumes of data well and allows highly efficient model development.
The safety factor is high: with a three-layer mechanism working in series plus risk coefficient adjustment, risk identification and control capabilities are greatly improved compared with a traditional single-model system.
Inclusive financial services are promoted: the invention aims to provide the general public, such as farmers and workers, with online personal credit loan services that require no mortgage and no guarantee. Compared with traditional loans secured by physical collateral or private lending, it is more reliable and more convenient, and it contributes to the development of inclusive finance.
The service efficiency of financial institutions is improved: compared with traditional offline loan services, the invention makes full use of advanced technologies such as artificial intelligence and, based on the exploration of multidimensional government affair data, creates for financial institutions a streamlined service system covering admission scoring, credit line calculation, and so on. This avoids offline manual review, reduces workload, and greatly improves the service efficiency of financial institutions.
Having described the method of an exemplary embodiment of the present invention, a personal credit loan admission credit system based on government data of an exemplary embodiment of the present invention will be described next with reference to fig. 7.
The implementation of the personal credit loan admission and credit granting system based on the government affair data can be referred to the implementation of the method, and repeated details are omitted. The term "module" or "unit" used hereinafter may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Based on the same inventive concept, the invention also provides a personal credit loan admission and credit granting system based on government affair data. As shown in fig. 7, the system comprises:
a data obtaining module 710, configured to obtain government affair data;
an admission screening module 720, configured to screen clients by using the admission strategy set and the differentiation rules of the model according to the government affair data to obtain admitted clients;
an admission model building module 730, configured to train the relationship between the government affair data and credit risk by using a machine learning ensemble algorithm according to the government affair data to build an admission model;
a credit evaluation module 740, configured to perform credit risk evaluation on the client by using the admission model to obtain a client credit score;
a credit granting model building module 750, configured to train the relationship between the government affair data and the credit line by using a machine learning ensemble algorithm according to the government affair data to build a credit granting model;
a basic credit calculation module 760, configured to calculate the basic credit line of the client by using the credit granting model according to the government affair data;
a risk adjustment module 770, configured to perform risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient;
and a final credit calculation module 780, configured to obtain the final credit line of the client by using the risk coefficient and the basic credit line.
It should be noted that although several modules of the personal credit loan admission and credit granting system based on government affair data are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the modules described above may be embodied in one module according to embodiments of the invention. Conversely, the features and functions of one module described above may be further divided and embodied by a plurality of modules.
In this embodiment, the admission screening module 720 is specifically configured to:
performing personal data cleaning, personal feature derivation, and personal feature screening on the government affair data to obtain the processed government affair data.
In this embodiment, the admission screening module 720 is specifically configured to:
setting a first differentiation rule set and a second differentiation rule set;
judging whether the processed government affair data conforms to the first differentiation rule set; if so, rejecting the corresponding client, and if not, retaining the corresponding client as an admitted client;
and judging, among the admitted clients, whether the government affair data conforms to the second differentiation rule set, and if so, annotating prompt information in the government affair data.
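The two-stage rule screening performed by the admission screening module can be sketched as follows. The rule sets themselves are not given in this passage, so the example rules, field names, and the `notes` key used for the prompt information are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Tuple[str, Callable[[Record], bool]]  # (rule name, predicate on a record)


def screen_clients(
    records: List[Record],
    first_rule_set: List[Rule],
    second_rule_set: List[Rule],
) -> List[Record]:
    """Reject clients hitting the first rule set; annotate hits on the second."""
    admitted: List[Record] = []
    for record in records:
        # First differentiation rule set: any hit rejects the client.
        if any(predicate(record) for _, predicate in first_rule_set):
            continue
        # Second differentiation rule set: hits only add prompt information.
        notes = [name for name, predicate in second_rule_set if predicate(record)]
        admitted.append({**record, "notes": notes})
    return admitted


# Hypothetical rules for illustration only.
FIRST_RULES: List[Rule] = [("under_18", lambda r: int(r["age"]) < 18)]
SECOND_RULES: List[Rule] = [("no_income_record", lambda r: r.get("income") is None)]
```

Calling `screen_clients(records, FIRST_RULES, SECOND_RULES)` returns only the admitted clients, each carrying a list of the second-set rules it triggered so that reviewers can see the prompt information.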
In this embodiment, the admission model building module 730 is specifically configured to:
defining bad samples and good samples according to personal credit information;
extracting modeling samples of bad samples and good samples at a certain ratio according to the government affair data;
randomly sampling the modeling samples according to a preset ratio to obtain a training set and a test set;
and training the admission model using the training set and verifying it using the test set, learning the relationship between government affair data and credit risk, to obtain the trained admission model.
In this embodiment, the credit evaluation module 740 is specifically configured to:
predicting, using the trained admission model, the probability that an admitted client is a bad sample, and converting the predicted probability into a client credit score.
In this embodiment, the trust model building module 750 is specifically configured to:
carrying out feature cleaning on the modeling sample, and dividing the cleaned modeling sample according to a preset proportion to obtain a training set and a testing set;
training the credit granting model by using the training set, verifying the credit granting model by using the test set, and training the relationship between government affair data and credit granting amount to obtain the trained credit granting model.
In this embodiment, the risk adjustment module 770 is specifically configured to:
dividing the client credit score into several ratings by score, wherein within each rating a corresponding risk coefficient is set according to the client's number of co-debt institutions.
In this embodiment, the final credit calculation module 780 calculates the final credit amount of the client by using the following calculation formula:
L=L1×C;
where L is the client's final credit line, L1 is the basic credit line, and C is the risk coefficient.
Based on the above inventive concept, as shown in fig. 8, the present invention further provides a computer device 800, which includes a memory 810, a processor 820 and a computer program 830 stored in the memory 810 and operable on the processor 820, wherein the processor 820 executes the computer program 830 to implement the above-mentioned method for credit admission and credit granting based on government affairs data.
Based on the above inventive concept, the present invention provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the above method for granting credit for personal credit loan based on government affairs data.
According to the personal credit loan admission credit granting method and system based on government affair data provided by the invention, government affair data is acquired, and clients are screened by using the admission strategy set and the differentiation rules of the model according to the government affair data to obtain admitted clients. The relationship between the government affair data and credit risk is trained using a machine learning ensemble algorithm to construct an admission model, and the admission model is used to perform credit risk assessment on the admitted clients to obtain a client credit score. The relationship between the government affair data and the credit line is trained using a machine learning ensemble algorithm to construct a credit granting model, and the credit granting model is used to calculate the client's basic credit line according to the government affair data. Risk adjustment is performed according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient, and the client's final credit line is obtained from the risk coefficient and the basic credit line. In this way, the pressure of admission approval is reduced, the value of government affair data is fully utilized, the client's credit risk, assets, repayment capacity, and repayment willingness are accurately evaluated, and the risk borne by the financial institution is accurately controlled while inclusive financial business is expanded.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A personal credit loan admission credit granting method based on government affair data is characterized by comprising the following steps:
acquiring government affair data;
screening clients by using an admission strategy set and differentiation rules of the model according to the government affair data to obtain admitted clients;
training the relationship between the government affair data and credit risk by using a machine learning ensemble algorithm according to the government affair data to construct an admission model;
performing credit risk assessment on the client by using the admission model to obtain a client credit score;
training the relationship between the government affair data and the credit line by using a machine learning ensemble algorithm according to the government affair data to construct a credit granting model;
calculating the basic credit line of the client by using the credit granting model according to the government affair data;
performing risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient;
and obtaining the final credit line of the client by using the risk coefficient and the basic credit line.
2. The personal credit loan admission credit granting method based on government affair data as claimed in claim 1, wherein according to the government affair data, the client is screened by using the admission strategy set and the differentiation rules of the model to obtain the admitted client, further comprising:
and carrying out personal data cleaning, personal characteristic derivation and personal characteristic screening on the government affair data to obtain the processed government affair data.
3. The method for personal credit loan admission and credit granting based on government affair data as claimed in claim 2, wherein the step of screening the clients by using the admission strategy set and the differentiation rules of the model according to the government affair data to obtain the admitted clients comprises:
setting a first differentiation rule set and a second differentiation rule set;
judging whether the processed government affair data conforms to the first differentiation rule set; if so, rejecting the corresponding client, and if not, retaining the corresponding client as an admitted client;
and judging, among the admitted clients, whether the government affair data conforms to the second differentiation rule set, and if so, annotating prompt information in the government affair data.
4. The method for personal credit loan admission and credit granting based on government affairs data as claimed in claim 2, wherein the personal data cleaning of the government affairs data comprises:
and eliminating all fields with empty values in the data table, deleting records with abnormal values, and performing unit unified processing on fields with non-unified units to obtain the cleaned characteristics.
5. The method for personal credit loan admission credit based on government affairs data as claimed in claim 2, wherein the personal characteristic derivation of the government affairs data comprises:
and aggregating and converting the government affair data, and deriving new characteristics including a summary value, a mean value, a maximum value and a minimum value from the original characteristics.
6. The method for personal credit loan admission and credit granting based on government affairs data as claimed in claim 2, wherein the personal characteristic screening of the government affairs data comprises:
and removing the features with the missing value ratio larger than the first threshold value, and deleting the features with the correlation larger than the second threshold value to obtain the screened features.
7. The method for personal credit loan admission and credit granting based on government affair data as claimed in claim 1, wherein training the relationship between the government affair data and credit risk by using a machine learning ensemble algorithm according to the government affair data to construct the admission model comprises:
setting a bad sample and a good sample according to personal credit information;
according to the government affair data, a modeling sample of a bad sample and a good sample is extracted according to a certain proportion;
randomly sampling modeling samples according to a preset proportion to obtain a training set and a test set;
and training the admission model by using the training set, verifying the admission model by using the test set, and training the relationship between government affair data and credit risk to obtain the trained admission model.
8. The personal credit loan admission credit granting method based on government affair data according to claim 7, wherein performing credit risk assessment on the admitted clients by using the admission model to obtain a client credit score comprises:
predicting, using the trained admission model, the probability that an admitted client is a bad sample, and converting the predicted probability into a client credit score.
9. The method for personal credit loan admission and credit granting based on government affair data as claimed in claim 7, wherein training the relationship between the government affair data and the credit line by using a machine learning ensemble algorithm according to the government affair data to construct a credit granting model comprises:
carrying out feature cleaning on the modeling sample, and dividing the cleaned modeling sample according to a preset proportion to obtain a training set and a testing set;
training the credit granting model by using the training set, verifying the credit granting model by using the test set, and training the relationship between government affair data and credit granting amount to obtain the trained credit granting model.
10. The method of claim 1, wherein the risk adjustment is performed according to the credit score of the client and the number of institutions sharing the debt of the client to obtain the corresponding risk coefficient, and the method comprises:
dividing the client credit score into several ratings by score, wherein within each rating a corresponding risk coefficient is set according to the client's number of co-debt institutions.
11. The method of claim 10, wherein the obtaining of the final credit line of the client using the risk factor and the basic credit line comprises:
the final credit line calculation formula of the client is as follows:
L=L1×C;
where L is the client's final credit line, L1 is the basic credit line, and C is the risk coefficient.
12. A personal credit loan admission credit system based on government affairs data, the system comprising:
the data acquisition module is used for acquiring government affair data;
the admission screening module is used for screening clients by using the admission strategy set and the differentiation rules of the model according to the government affair data to obtain admitted clients;
the admission model building module is used for training the relationship between the government affair data and credit risk by using a machine learning ensemble algorithm according to the government affair data to build an admission model;
the credit evaluation module is used for performing credit risk evaluation on the client by using the admission model to obtain a client credit score;
the credit granting model building module is used for training the relationship between the government affair data and the credit line by using a machine learning ensemble algorithm according to the government affair data to build a credit granting model;
the basic credit calculation module is used for calculating the basic credit line of the client by using the credit granting model according to the government affair data;
the risk adjustment module is used for performing risk adjustment according to the client credit score and the client's number of co-debt institutions to obtain a corresponding risk coefficient;
and the final credit calculation module is used for obtaining the final credit line of the client by utilizing the risk coefficient and the basic credit line.
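The way the modules of claim 12 hand data to one another can be sketched as a simple pipeline. This is a hedged illustration rather than the claimed system: the class name `CreditPipeline`, its field names, and the record format are hypothetical, and the two trained models are represented by plain callables because the source does not specify a particular algorithm or library. Model training is left out of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

Record = Mapping[str, float]  # hypothetical per-client government affair data


@dataclass
class CreditPipeline:
    passes_admission: Callable[[Record], bool]        # admission screening
    credit_score: Callable[[Record], float]           # admission model
    basic_line: Callable[[Record], float]             # credit granting model
    risk_coefficient: Callable[[float, int], float]   # risk adjustment

    def evaluate(self, record: Record, n_institutions: int) -> Optional[float]:
        """Return the final credit line, or None if the client is rejected."""
        if not self.passes_admission(record):
            return None
        score = self.credit_score(record)
        l1 = self.basic_line(record)
        c = self.risk_coefficient(score, n_institutions)
        return l1 * c  # final credit line L = L1 x C
```

Each callable can be replaced independently, which mirrors the claim's separation of screening, scoring, basic line calculation, and risk adjustment into distinct modules.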
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 11 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 11.
CN202011498280.5A 2020-12-17 2020-12-17 Personal credit loan admission credit granting method and system based on government affair data Pending CN112613977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011498280.5A CN112613977A (en) 2020-12-17 2020-12-17 Personal credit loan admission credit granting method and system based on government affair data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011498280.5A CN112613977A (en) 2020-12-17 2020-12-17 Personal credit loan admission credit granting method and system based on government affair data

Publications (1)

Publication Number Publication Date
CN112613977A (en) 2021-04-06

Family ID: 75240280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011498280.5A Pending CN112613977A (en) 2020-12-17 2020-12-17 Personal credit loan admission credit granting method and system based on government affair data

Country Status (1)

Country Link
CN (1) CN112613977A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990369A (en) * 2021-04-26 2021-06-18 四川新网银行股份有限公司 Social network-based method and system for identifying waste escaping and debt behaviors
CN113256402A (en) * 2021-06-03 2021-08-13 上海冰鉴信息科技有限公司 Risk control rule determination method and device and electronic equipment
CN113362176A (en) * 2021-06-29 2021-09-07 中国农业银行股份有限公司 Data processing method and data processing device
CN113362176B (en) * 2021-06-29 2024-03-22 中国农业银行股份有限公司 Data processing method and data processing device
CN113919933A (en) * 2021-08-25 2022-01-11 北京睿知图远科技有限公司 Client scoring verification method based on quality label
CN117114858A (en) * 2023-10-19 2023-11-24 湖南三湘银行股份有限公司 Collocation realization method of calculation checking formula based on averator expression
CN117114858B (en) * 2023-10-19 2024-03-19 湖南三湘银行股份有限公司 Collocation realization method of calculation checking formula based on averator expression

Similar Documents

Publication Publication Date Title
CN112613977A (en) Personal credit loan admission credit granting method and system based on government affair data
CN111652710B (en) Personal credit risk assessment method based on integrated tree feature extraction and Logistic regression
CN112700319A (en) Enterprise credit line determination method and device based on government affair data
CN112348654A (en) Automatic assessment method, system and readable storage medium for enterprise credit line
CN112541817A (en) Marketing response processing method and system for potential customers of personal consumption loan
CN112102073A (en) Credit risk control method and system, electronic device and readable storage medium
CN111401600A (en) Enterprise credit risk evaluation method and system based on incidence relation
CN112598500A (en) Credit processing method and system for non-limit client
CN111476660A (en) Intelligent wind control system and method based on data analysis
CN108492001A (en) A method of being used for guaranteed loan network risk management
CN114328461A (en) Big data analysis-based enterprise innovation and growth capacity evaluation method and system
CN113095927A (en) Method and device for identifying suspicious transactions of anti-money laundering
CN111062597A (en) Method and device for detecting criminal suspicion of financial statement of listed company
CN111951093A (en) Personal credit score scoring method
CN110796539A (en) Credit investigation evaluation method and device
CN112037006A (en) Credit risk identification method and device for small and micro enterprises
WO2022143431A1 (en) Method and apparatus for training anti-money laundering model
CN113177733B (en) Middle and small micro enterprise data modeling method and system based on convolutional neural network
CN115564591A (en) Financing product determination method and related equipment
Mao et al. Information system construction and research on preference of model by multi-class decision tree regression
CN114626940A (en) Data analysis method and device and electronic equipment
CN114154682A (en) Customer loan yield grade prediction method and system
CN114693428A (en) Data determination method and device, computer readable storage medium and electronic equipment
CN112633709A (en) Enterprise credit investigation evaluation method and device
Bakhshi et al. Developing a hybrid approach to credit priority based on accounting variables (using analytical network process (ANP) and multi-criteria decision-making)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination