CN112241916A - Personal credit risk default early warning method, device, equipment and storage medium - Google Patents
- Publication number
- CN112241916A (application number CN202011141264.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
Abstract
The invention provides a personal credit risk default early warning method, device, equipment and storage medium. The method comprises: collecting samples, the samples comprising positive samples and negative samples; completing the missing features of positive samples that have missing features through feature fusion; grouping the samples with a classification algorithm to obtain a plurality of sample groups; calculating the risk levels of the different overdue stages of each sample group; adding risk level labels to the samples in each sample group; training a model on each sample group separately to obtain a plurality of trained risk default early warning models in one-to-one correspondence with the sample groups; and inputting a sample to be warned into the selected risk default early warning model to obtain the risk level corresponding to that sample. The method can warn of the borrower's risk level in the presentation period, thereby quantifying and grading the risk and helping the lending institution formulate a more effective collection strategy.
Description
Technical Field
The invention relates to the field of credit risk early warning, and in particular to a personal credit risk default early warning method and device.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Credit risk identification and early warning are essential for lenders. Early risk warning gives lenders more time for debt recovery, which reduces default losses and helps avoid systemic risk across the whole structure. It also supports better customer segment management: borrower groups with different risk profiles can be served differently and offered suitable products in a targeted way.
In essence, credit risk is an assessment, formed from a comprehensive judgment of the borrower's actual willingness and actual capacity to repay, of whether the borrower will repay in full and on time. With the development of the market economy since the reform and opening-up, the traditional lending model, in which close personal relationships served as the guarantee, has been broken, and fast-moving market trade has evolved into a credit economy based on personal credit. Banks have long been the main lenders to individuals. More recently, many P2P companies developed rapidly, went through periods of rapid growth, intense competition and a wave of platform collapses, and are now in a period of tightening regulation. The core problem of these P2P companies was their inability to identify loan risks effectively, which in turn led to widespread losses.
To address defaults caused by credit risk, many evaluation models based on machine learning algorithms have been applied to personal credit reporting. Most of these models process information about many borrowers into features, form a training data set with whether default occurred as the target value, and train the model with machine learning algorithms such as logistic regression, decision trees, gradient boosting trees and support vector machines. The features of a borrower to be predicted are then input into the trained model, and whether the borrower will default is judged from the output.
However, default is an outcome that manifests some time after the observation point; the collected features do not fully determine whether the borrower will default in a future period, which is also affected by other uncontrollable factors. Using only a binary default prediction as the warning indicator therefore cannot provide the richer warning levels the business requires, nor can it assess whether the risk is within a controllable range, which makes it harder for risk-handling staff to take differentiated measures.
Disclosure of Invention
To overcome the shortcomings of existing personal credit risk default early warning methods, a first aspect of the invention provides a novel personal credit risk default early warning method. The specific technical solution is as follows:
a personal credit risk default early warning method, comprising:
collecting samples, wherein the samples comprise positive samples and negative samples, the positive samples being samples with overdue records and the negative samples being samples without overdue records;
completing, through feature fusion, the missing features of positive samples that have missing features;
grouping the samples with a classification algorithm to obtain a plurality of sample groups;
calculating the risk levels of the different overdue stages of each sample group;
adding a risk level label to the samples in each sample group, wherein the risk level label is constructed from the overdue features of the sample in the observation period and the presentation period and from the risk levels of the sample group to which the sample belongs;
training a model on each sample group separately to obtain a plurality of trained risk default early warning models in one-to-one correspondence with the sample groups;
and inputting a sample to be warned into the selected risk default early warning model to obtain the risk level corresponding to that sample.
A second aspect of the invention provides a personal credit risk default early warning device, comprising:
a sample acquisition module, configured to acquire samples, wherein the samples comprise positive samples and negative samples, the positive samples being samples with overdue records and the negative samples being samples without overdue records;
a feature fusion module, configured to complete, through feature fusion, the missing features of positive samples that have missing features;
a grouping module, configured to group the samples with a classification algorithm to obtain a plurality of sample groups;
a risk level calculation module, configured to calculate the risk levels of the different overdue stages of each sample group;
a label adding module, configured to add a risk level label to the samples in each sample group, wherein the risk level label is constructed from the overdue features of the sample in the observation period and the presentation period and from the risk levels of the sample group to which the sample belongs;
a training module, configured to train a model on each sample group separately to obtain a plurality of trained risk default early warning models in one-to-one correspondence with the sample groups;
and an early warning module, configured to input a sample to be warned into the selected risk default early warning model to obtain the risk level corresponding to that sample.
A third aspect of the invention provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the personal credit risk default early warning method of the first aspect of the invention.
A fourth aspect of the invention provides a storage medium storing a computer program which, when executed by a processor, implements the personal credit risk default early warning method of the first aspect of the invention.
Compared with the prior art, the personal credit risk default early warning method can warn of the borrower's risk level in the presentation period, thereby quantifying and grading the risk and helping the lending institution formulate a more effective collection strategy. In addition, because the training samples are grouped and a separate risk default early warning model is trained for each group, the prediction accuracy of the risk level is improved.
Drawings
FIG. 1 is a flow chart of a personal credit risk default early warning method in an embodiment of the invention;
FIG. 2 is a flow chart of a personal credit risk default early warning method in another embodiment of the invention;
FIG. 3 is a flow chart of a personal credit risk default early warning method in another embodiment of the invention;
FIG. 4 is a flow chart of a personal credit risk default early warning method in another embodiment of the invention;
FIG. 5 is a logical structure diagram of a personal credit risk default early warning device in an embodiment of the invention;
FIG. 6 is a logical structure diagram of an electronic device in an embodiment of the invention.
Detailed Description
To make the above objects, features and advantages of the invention easier to understand, embodiments are described in further detail below with reference to the drawings.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Summary of the application
As described in the Background, most prior-art evaluation models process information about many borrowers into features, form a training data set with default as the target value, and train the model with machine learning algorithms such as logistic regression, decision trees, gradient boosting trees and support vector machines. The features of a borrower to be predicted are then input into the trained model, and whether the borrower will default is judged from the output.
However, default is an outcome that manifests some time after the observation point; the collected features do not fully determine whether the borrower will default in a future period, which is also affected by other uncontrollable factors. Using only a binary default prediction as the warning indicator therefore cannot provide the richer warning levels the business requires, nor can it assess whether the risk is within a controllable range, which makes it harder for risk-handling staff to take differentiated measures.
It is therefore necessary to quantify and grade the borrower's default risk, ultimately providing graded warnings of the borrower's risk level in a future period so that corresponding collection measures can be taken.
With this in mind, the invention first provides a risk quantification index.
In a personal credit loan transaction, the lending institution and the borrower agree on a predetermined number of repayment periods, each with a corresponding repayment amount.
The overdue stage (M) is determined by the number of days between the due date and the actual payment date, divided into segments that define the overdue state. Different lending institutions may define the segments differently; for example, with 30 days per segment, the overdue stages M0 to M7 are:
M0: not overdue;
M1: 1-30 days overdue;
M2: 31-60 days overdue;
M3: 61-90 days overdue;
M4: 91-120 days overdue;
M5: 121-150 days overdue;
M6: 151-180 days overdue;
M7: more than 180 days overdue.
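The 30-day bucketing above can be expressed as a small mapping function. This is a minimal Python sketch assuming the 30-day segments and the M0 to M7 cap listed here; `overdue_bucket` and its parameters are hypothetical names, and an institution with a different segment length would change `bucket_days`.

```python
def overdue_bucket(days_overdue: int, bucket_days: int = 30, max_bucket: int = 7) -> str:
    """Map days past the due date to an overdue stage label M0..M7."""
    if days_overdue < 0:
        raise ValueError("days_overdue must be non-negative")
    if days_overdue == 0:
        return "M0"
    # Ceiling division: 1-30 -> 1, 31-60 -> 2, ..., 151-180 -> 6.
    index = -(-days_overdue // bucket_days)
    # Everything beyond 180 days collapses into the last stage, M7.
    return f"M{min(index, max_bucket)}"
```

For example, 45 days overdue falls into M2, and any value above 180 days maps to M7.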
All stages beyond a given overdue stage are denoted Mx+; for example, the stages beyond M4 can be denoted M4+. Extensive experience shows that if a borrower who does not repay in stage M1 rolls into stage M2 and no effective collection mechanism is in place, the borrower is very likely to remain unpaid through stages M3 and M4.
From the definition of overdue stages, transitions between stages can be summarized. For example, M0->M1 indicates that a borrower moved from not overdue to 1-30 days overdue during the observation period, and M1->M2 indicates that a borrower moved from 1-30 days overdue to 31-60 days overdue.
The risk quantification index provided by the invention is labelled by overdue-stage migration within the observation period, specifically:
1) Within the observation period, using the collected borrower sample set, compute $P_{01}$ = (number of borrowers entering M1 in the current month) / (number of borrowers in M0 at the end of the previous month); compute $P_{12}$, $P_{23}$, $P_{34}$ and $P_{4+}$ in the same way;
2) The total risk amount of the sample set is:
$$T = \sum_{j=1}^{5} \prod_{i=0}^{j-1} P_{i(i+1)}$$
where $i$ starts at 0, i.e., from stage M0, $j$ denotes the overdue stage newly entered from the previous stage, and $P_{45}$ stands for $P_{4+}$. On the overdue time scale, risk levels are ordered from short to long overdue time: a shorter overdue time corresponds to a lower risk level and a longer overdue time to a higher one.
The share of each overdue stage's risk amount in the total risk amount is then calculated:
$$R_j = \frac{1}{T}\prod_{i=0}^{j-1} P_{i(i+1)}$$
For example, the risk ratio of stage M1 is $R_1 = P_{01}/T$, that of stage M2 is $R_2 = P_{01}P_{12}/T$, that of stage M3 is $R_3 = P_{01}P_{12}P_{23}/T$, and so on.
The risk ratio indicates the risk level of each overdue stage.
3) Within the selected time period, a borrower sample that finally reaches overdue stage $M_j$ has risk $R_j$, which is the sample's risk target value.
4) Given a total risk level value $U$ set by the business, the risk level of the borrower sample is $R_j \cdot U$.
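The quantification steps can be sketched in Python as follows. The sketch assumes the migration rates are supplied in the order $P_{01}, P_{12}, P_{23}, P_{34}, P_{4+}$ and that the total risk amount $T$ is the sum of the stage risk amounts, so that the ratios $R_1 = P_{01}/T$, $R_2 = P_{01}P_{12}/T, \ldots$ sum to 1. The function names are illustrative, not taken from the patent.

```python
from math import prod


def migration_rate(entered_next: int, previous_stock: int) -> float:
    """P_{i(i+1)}: borrowers who rolled into stage i+1 this month divided by
    borrowers who were in stage i at the end of the previous month."""
    return entered_next / previous_stock if previous_stock else 0.0


def stage_risk_ratios(rates: list[float]) -> list[float]:
    """Map [P01, P12, P23, P34, P4+] to [R1, ..., R5].

    Stage j's risk amount is P01 * P12 * ... * P_{(j-1)j}; the total T is
    assumed to be the sum of these amounts, so the ratios sum to 1.
    """
    amounts = [prod(rates[:j]) for j in range(1, len(rates) + 1)]
    total = sum(amounts)
    return [a / total for a in amounts] if total else [0.0] * len(rates)


def risk_level(ratios: list[float], stage_j: int, U: float) -> float:
    """Risk level R_j * U of a sample that finally reaches stage M_j (j >= 1)."""
    return ratios[stage_j - 1] * U
```

For example, `stage_risk_ratios([0.5] * 5)` yields ratios that halve from one stage to the next.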
Method embodiment
As shown in FIG. 1, the personal credit risk default early warning method provided by the embodiment of the invention comprises the following steps:
S100, collecting samples, wherein the samples comprise positive samples and negative samples, the positive samples being samples with overdue records and the negative samples being samples without overdue records.
Building the risk early warning model first requires collecting training sample data, i.e., data related to the borrowers. Sources of sample data include the lending institution's internal databases, data from various internet platforms, data shared by external institutions, and the like.
The collected sample data include samples with overdue records (an unpaid record exists within some overdue stage), i.e., positive samples, and samples without overdue records, i.e., negative samples. Naturally, positive samples are far fewer than negative samples.
Optionally, the collected sample data are stored in a pre-allocated storage space and logically divided into a fact data layer, a dimension data layer and a statistical data layer. The fact data layer holds transactional data, including credit card transaction data, borrowers' credit record data, and credit inquiry and approval records; it is characterized by a large volume of transaction data containing few effective features. The dimension data layer holds data with dimensional attributes, such as borrowers' personal information; it is small in volume but rich in effective features. The statistical data layer holds statistics computed for certain indicators from the other data layers.
S200, completing the missing features of positive samples through feature fusion.
Because positive samples are inherently far fewer than negative samples, as many of the collected positive samples as possible should end up usable as model training data.
However, as is known to those of ordinary skill in the art, for a selected model, a sample whose proportion of missing features exceeds a predetermined ratio (e.g., 10%) cannot be used as training data for that model. Positive samples with missing features therefore need a suitable feature fusion strategy to fill in the missing features so that they meet the training requirements. The filled-in features should, of course, match the positive sample as closely as possible.
As shown in FIG. 2, optionally, completing the missing features of positive samples through feature fusion specifically comprises:
S201, filling in the missing features of a positive sample with external data corresponding to that sample.
For example, each sample may carry a unique identifier (ID), such as the identification number of the corresponding borrower. Using the ID of the current positive sample with missing features, data related to that borrower are collected from external data sources, and the missing feature values are filled in with the corresponding values from the collected external data, reducing the number of missing features.
After completion with external data, if the proportion of missing features of the positive sample is below the predetermined ratio, the completion process ends; otherwise, step S202 is performed.
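A minimal sketch of step S201, assuming samples are plain dictionaries with an `id` key, missing values stored as `None`, and external data already fetched into a dictionary keyed by ID. The 10% threshold mirrors the example ratio above; all names here are hypothetical.

```python
def missing_ratio(sample: dict, features: list[str]) -> float:
    """Share of the listed features whose value is missing (None)."""
    return sum(sample.get(f) is None for f in features) / len(features)


def fill_from_external(sample: dict, external_by_id: dict, features: list[str],
                       max_missing: float = 0.10) -> tuple[dict, bool]:
    """S201 sketch: fill None features from the external record with the same ID.

    Returns the completed copy of the sample and whether S202 is still needed,
    i.e. whether the missing ratio is still at or above max_missing.
    """
    filled = dict(sample)  # leave the original sample untouched
    external = external_by_id.get(sample["id"], {})
    for f in features:
        if filled.get(f) is None and external.get(f) is not None:
            filled[f] = external[f]
    return filled, missing_ratio(filled, features) >= max_missing
```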
S202, obtaining at least one similar sample of the positive sample with missing features through correlation analysis, and extracting the corresponding features of the similar sample(s) to complete the missing features of the positive sample.
Optionally, as shown in FIG. 3, step S202 comprises the following sub-steps:
S2021, classifying the sample features to obtain a plurality of feature categories.
Optionally, the features are divided into five categories: personal finance features, qualification features (education, position, etc.), credit features, consumption features and behavior features.
S2022, performing principal component analysis on the features in each feature category to simplify the features.
Each category may contain a large number of features, many of which may be interdependent. To reduce the computational complexity of the correlation analysis and improve its effectiveness, principal component analysis (PCA) is used to extract the principal components of the features in each category, thereby reducing the features.
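Per-category PCA can be sketched with NumPy's SVD (`numpy.linalg.svd`). The column layout passed in `categories` and the number of retained components are assumptions for illustration; the text does not specify how many principal components are kept per category.

```python
import numpy as np


def pca_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of X (samples x features of one category) onto the
    leading principal components."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def reduce_by_category(X: np.ndarray, categories: dict[str, list[int]],
                       n_components: int) -> dict[str, np.ndarray]:
    """Apply PCA separately to each feature category (hypothetical column lists)."""
    return {
        name: pca_reduce(X[:, cols], min(n_components, len(cols)))
        for name, cols in categories.items()
    }
```

Reducing each category independently keeps the later per-category Pearson distances comparable, since every category is summarized by its own leading components.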
S2023, calculating, under each feature category, the Pearson distance between the positive sample with missing features and each other sample; a sample for which a predetermined number of these Pearson distances are smaller than a predetermined threshold is a similar sample.
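The Pearson correlation coefficient is:
$$r_{X,Y} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
where $X_i$ denotes the $i$-th feature of sample $X$, $Y_i$ denotes the $i$-th feature of sample $Y$, $\bar{X}$ is the mean of the features of sample $X$, $\bar{Y}$ is the mean of the features of sample $Y$, and $n$ is the number of features in the category.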
The Pearson distance is then $d_{X,Y} = 1 - r_{X,Y}$.
Since the Pearson correlation coefficient lies in $[-1, 1]$, the Pearson distance $d_{X,Y}$ lies in $[0, 2]$.
Taking five feature categories as an example, denote the five Pearson distances between the current positive sample with missing features and a sample in the sample set by $d_1, d_2, \ldots, d_5$. If at least three of them are smaller than a predetermined threshold $d_{\text{threshold}}$, that sample is regarded as a similar sample of the current positive sample. Traversing the sample set in this way yields a number of similar samples.
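The similarity rule of S2023 can be sketched in plain Python. It assumes each sample is given as a dictionary mapping a feature-category name to a vector of (reduced) feature values observed in both samples, and that "a predetermined number" is 3, as in the five-category example. Treating constant vectors as distance 1 is an assumption, since the correlation is undefined there.

```python
from math import sqrt


def pearson_distance(x: list[float], y: list[float]) -> float:
    """d = 1 - r, where r is the Pearson correlation of two feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # correlation undefined; treated as uncorrelated (assumption)
    return 1 - cov / (sx * sy)


def is_similar(target_by_cat: dict[str, list[float]],
               other_by_cat: dict[str, list[float]],
               threshold: float, min_hits: int = 3) -> bool:
    """Similar if at least min_hits categories have distance below threshold."""
    hits = sum(
        pearson_distance(target_by_cat[c], other_by_cat[c]) < threshold
        for c in target_by_cat
    )
    return hits >= min_hits
```

Requiring several categories to agree, rather than one very close category, is what guards against a sample being judged similar on a handful of features only.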
After several similar samples have been obtained, a missing feature can be completed with the mode or the mean of the corresponding feature values of the similar samples. If only one similar sample is obtained, its feature values are used directly for completion.
This method ensures that similar samples are similar across several feature categories, so that a sample is not judged similar merely because a few of its features are extremely close; the selected similar samples therefore genuinely resemble the positive sample on its non-missing features.
For each missing feature, the values of that feature in the obtained similar samples are used: discrete features take the mode of the similar samples' values, and continuous numerical features take their mean. This fills in as many missing features as possible and increases the number of positive samples included in the training data.
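The completion rule (mode for discrete features, mean for continuous ones) can be sketched with Python's `statistics` module. The set `discrete_features` is a hypothetical input naming which features are discrete; a feature that no similar sample provides is left missing.

```python
from statistics import mean, mode


def complete_from_similar(sample: dict, similar: list[dict],
                          discrete_features: set[str]) -> dict:
    """Fill None features from the similar samples: mode for discrete features,
    mean for continuous ones. With one similar sample both reduce to its value."""
    filled = dict(sample)
    for feature, value in sample.items():
        if value is not None:
            continue
        donors = [s[feature] for s in similar if s.get(feature) is not None]
        if donors:  # leave the gap if no similar sample has this feature
            filled[feature] = mode(donors) if feature in discrete_features else mean(donors)
    return filled
```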
S300, grouping the samples with a classification algorithm to obtain a plurality of sample groups.
To improve the accuracy of personal credit risk warnings, the warning model should better reflect the risk characteristics of the borrowers. Borrowers therefore need to be grouped into customer segments, and a separate risk default early warning model is trained for each customer group.
Optionally, as shown in FIG. 4, step S300 comprises the following sub-steps:
S301, selecting the grouping features.
Optionally, the sample data are preprocessed to construct suitable clustering features.
For example, predetermined index features are computed from the samples' original features and used as clustering features. One is an income index, derived from the borrower's consumption data from multiple sources, a consumption-income factor and the local average income, where the consumption-income factor is the annual income-to-expenditure ratio of the borrower's region. Another is a growth index, computed as log(number of working years since graduation) plus the employment index published for the previous year.
S302, grouping the samples with a clustering algorithm based on the grouping features to obtain a plurality of sample groups.
Optionally, a Gaussian mixture model (GMM) clustering algorithm is used: the samples are processed iteratively with the EM algorithm, and iteration stops when the number of iterations reaches a threshold or the change in parameters falls below a threshold, yielding a number of sample groups.
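The GMM/EM grouping of S302 can be sketched in NumPy as below. This is a simplified illustration, not the patent's implementation: it assumes diagonal covariances and a farthest-point initialization, and it uses the gain in log-likelihood as the "parameter change" stopping criterion. In practice a library implementation such as scikit-learn's `GaussianMixture` (with its `max_iter` and `tol` parameters) would typically be used instead.

```python
import numpy as np


def gmm_em(X: np.ndarray, k: int, max_iter: int = 100, tol: float = 1e-4, seed: int = 0):
    """Fit a diagonal-covariance Gaussian mixture with EM; return (labels, means).

    Stops when max_iter is reached or the log-likelihood gain drops below tol.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Farthest-point initialisation of the component means (an assumption).
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        dist = np.min([((X - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(X[dist.argmax()])
    means = np.array(means, dtype=float)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: log pi_k + log N(x | mu_k, diag(var_k)) for each sample/component.
        diff2 = (X[:, None, :] - means) ** 2
        log_p = -0.5 * (diff2 / variances + np.log(2 * np.pi * variances)).sum(axis=2)
        log_p += np.log(weights)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and variances from responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-6)
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return resp.argmax(axis=1), means
```

Each sample is assigned to the component with the highest responsibility, which gives the sample groups that the PSI check then keeps or merges.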
S303, calculating the population stability index of each sample group.
S304, deciding, based on its population stability index, whether to keep each sample group or merge it with other sample groups.
The population stability index (PSI) of each sample group characterizes the group's stability and is defined as:
$$\mathrm{PSI} = \sum_{i} (A_i - E_i)\ln\frac{A_i}{E_i}$$
where $A_i$ is the actual distribution of the customer samples (their proportion in the $i$-th bin) and $E_i$ is the expected distribution. In modeling, the training sample (in-sample, INS) is usually taken as the expected distribution and the validation samples as the actual distribution. Validation samples generally include out-of-sample (OOS) and out-of-time (OOT) samples.
When PSI is between 0 and 0.1, the group is relatively stable and is kept.
When PSI is between 0.1 and 0.25, the group is slightly unstable; if nearby groups have similar stability, they can be merged.
When PSI is greater than 0.25, the group should be merged with its nearest group.
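The PSI formula and the keep/merge thresholds above can be sketched as follows. The sketch assumes `expected` and `actual` are bin proportions over the same bins (training sample versus validation sample) and clamps empty bins to a small `eps` to avoid taking the logarithm of zero; the handling of values exactly at 0.1 and 0.25 is a choice, since the text does not fix it.

```python
from math import log


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """PSI = sum_i (A_i - E_i) * ln(A_i / E_i) over matching bins of proportions."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins (assumption)
        total += (a - e) * log(a / e)
    return total


def group_action(value: float) -> str:
    """Keep/merge decision following the PSI thresholds in the text."""
    if value < 0.1:
        return "keep"
    if value <= 0.25:
        return "merge if a nearby group has similar stability"
    return "merge with nearest group"
```

For example, a two-bin shift from (0.5, 0.5) to (0.6, 0.4) gives a PSI of about 0.04, so the group would be kept.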
Grouping has the following advantages: the model can be fitted more effectively within a group of smaller variance; fluctuation caused by differences in indicators such as regional economic development level is reduced; and customers with consistent risk but dissimilar consumption amounts can be identified as one group and assigned similar risk.
Optionally, after grouping, the sample groups are visualized so that the state, risk level, size and other properties of each group can be inspected intuitively, facilitating further mining and use of the data.
S400, calculating the risk levels of the different overdue stages of each sample group.
After grouping, all samples are divided into several sample groups, and samples within the same group are more similar to each other.
Step S400 calculates the risk levels of the different overdue stages for each sample group, following the procedure described in the summary of the application above. Specifically:
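first, the total risk amount represented by all samples in sample group $g$ is calculated:
$$T^{(g)} = \sum_{j=1}^{5} \prod_{i=0}^{j-1} P^{(g)}_{i(i+1)}$$
then the risk levels of the different overdue stages are calculated:
$$R^{(g)}_j = \frac{1}{T^{(g)}}\prod_{i=0}^{j-1} P^{(g)}_{i(i+1)}$$
where $P^{(g)}_{i(i+1)}$ are the migration rates computed from the samples of group $g$ only.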
S500, adding a risk level label to the samples in each sample group, wherein the risk level label is constructed from the overdue features of the sample in the observation period and the presentation period and from the risk levels of the sample group to which the sample belongs.
The specific process is as follows:
1. Determine the risk observation period and the risk presentation period; for example, the risk observation period is set to 120 days and the risk presentation period to 30 days.
2. Construct overdue migration features from the sample's overdue status within the risk observation period.
3. Determine the sample's risk level based on the determined risk presentation period.
To illustrate the labelling process more clearly, three samples from one sample group are used as examples. For brevity, only the overdue features of each sample are retained and the other features are omitted. The overdue feature DEF_CNT_X denotes the unpaid amount overdue by X days:
Sample 1: id = 10010, DEF_CNT_30 = 3520, DEF_CNT_60 = 2000, DEF_CNT_90 = 1000, DEF_CNT_120 = 1000, DEF_CNT_120+ = 1000.
In the presentation period, the overdue feature of this sample is DEF_CNT_120+, with a value of 1000, i.e., an unpaid amount of 1000 remains in the presentation period. Denote the unpaid amount in the presentation period by D and the total borrowed amount by M, and let λ be a preset risk neglect factor. If D/M ≥ λ, the risk level of the sample is labeled R4U. Otherwise, the presentation period is shifted 30 days earlier at a time, and D/M is compared with the risk neglect factor again.
Sample 2: id = 10035, DEF_CNT_30 = 5000, DEF_CNT_60 = 0, DEF_CNT_90 = 0, DEF_CNT_120 = 0, DEF_CNT_120+ = 0.
The overdue feature value of this sample in the presentation period is 0, so its risk level is labeled 0.
Sample 3: id = 10052, DEF_CNT_30 = 1500, DEF_CNT_60 = 1500, DEF_CNT_90 = None, DEF_CNT_120 = None, DEF_CNT_120+ = None.
This sample only has overdue data up to 60 days; the remaining overdue data are missing, so DEF_CNT_60 is taken as the presentation period. With D denoting the unpaid amount in the presentation period and M the total borrowed amount, when D/M ≥ λ the risk level of the sample is labeled R2.U.
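The labeling rule illustrated by the three samples can be sketched as below. This is a hedged reading of the examples, not a definitive implementation. The total borrowed amount M and the factor λ are inputs the text leaves unspecified. Returning `None` stands for risk level 0, including the case where no bucket reaches λ, which the text does not cover. The function returns the bucket name rather than a label such as R4U or R2.U, because the bucket-to-label mapping (and the contribution of the group-level risk grade) is not fully specified. `presentation_bucket` is a hypothetical helper:

```python
from typing import Dict, Optional

# Overdue buckets from earliest to latest, named as in the examples above
BUCKETS = ("DEF_CNT_30", "DEF_CNT_60", "DEF_CNT_90", "DEF_CNT_120", "DEF_CNT_120+")


def presentation_bucket(
    sample: Dict[str, Optional[float]], total_amount: float, lam: float
) -> Optional[str]:
    """Return the bucket whose D / M first reaches lam, or None for risk level 0."""
    # Sample 3: buckets without data (None) are skipped, so the latest
    # available bucket serves as the presentation period
    available = [b for b in BUCKETS if sample.get(b) is not None]
    if not available or sample[available[-1]] == 0:
        # Sample 2: nothing unpaid in the presentation period -> level 0
        return None
    # Sample 1: step back one 30-day bucket at a time until D / M >= lam
    for bucket in reversed(available):
        if sample[bucket] / total_amount >= lam:
            return bucket
    return None
```

With a hypothetical M of 10000 and λ = 0.05, Sample 1 resolves to `DEF_CNT_120+`; with λ = 0.15 it steps back to `DEF_CNT_60`.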
After step S500, every sample in each sample group carries a risk level label, and the sample data are ready for model training.
S600, training a model on each sample group separately to obtain a plurality of trained risk default early warning models in one-to-one correspondence with the sample groups.
Optionally, a LightGBM model is used as the risk default early warning model. LightGBM is an ensemble algorithm based on gradient boosted decision trees. Its advantages are that it handles a small number of missing values automatically and that categorical features can conveniently be handled in a one-hot or one-vs-rest manner.
In this way, a corresponding trained risk default early warning model is obtained for each sample group.
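The per-group training in S600 amounts to one independent fit per sample group. A minimal sketch, with `train_per_group` as a hypothetical helper; any estimator with a scikit-learn-style `fit(X, y)` method can be plugged in through the `make_model` factory:

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence


def train_per_group(
    features: Sequence[Sequence[float]],
    labels: Sequence[Any],
    group_ids: Sequence[Hashable],
    make_model: Callable[[], Any],
) -> Dict[Hashable, Any]:
    """Fit one fresh model per sample group and return {group_id: model}."""
    # Collect the row indices that belong to each sample group
    rows: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(group_ids):
        rows[g].append(i)
    models: Dict[Hashable, Any] = {}
    for g, idx in rows.items():
        model = make_model()  # a new, unfitted model for every group
        model.fit([list(features[i]) for i in idx], [labels[i] for i in idx])
        models[g] = model
    return models
```

For example, `train_per_group(X, y, group_ids, lambda: lightgbm.LGBMClassifier())` would train one LightGBM classifier per group. Passing a factory rather than a single model instance ensures that each group ends up with its own independently fitted model.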
S700, inputting the sample to be early-warned into the selected risk default early-warning model so as to obtain the risk grade corresponding to the sample to be early-warned.
First, the sample to be early-warned is preprocessed so that its data structure meets the input requirements of the model.
Then, an appropriate risk default early warning model is selected according to the data characteristics of the sample to be early-warned.
Finally, the sample to be early-warned is input into the selected risk default early warning model, which outputs the risk level of that sample for the presentation period.
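The text does not say how the "data characteristics" determine which model is chosen. One plausible reading, assumed here, is to route the sample to the sample group whose centroid in the grouping-feature space is nearest and then use that group's model. `select_group`, `predict_risk_level` and the centroid dictionary are hypothetical:

```python
import math
from typing import Any, Dict, Hashable, Sequence


def select_group(x: Sequence[float], centroids: Dict[Hashable, Sequence[float]]) -> Hashable:
    """Return the id of the sample group whose centroid is nearest to x (Euclidean)."""
    return min(centroids, key=lambda g: math.dist(x, centroids[g]))


def predict_risk_level(
    x_group: Sequence[float],
    x_model: Sequence[float],
    centroids: Dict[Hashable, Sequence[float]],
    models: Dict[Hashable, Any],
) -> Any:
    """Route the preprocessed sample to its group's model and return the predicted level."""
    group = select_group(x_group, centroids)
    # scikit-learn-style predict expects a 2-D input: one row per sample
    return models[group].predict([list(x_model)])[0]
```

`x_group` holds the grouping features and `x_model` the model input features, since the two sets need not coincide.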
The lending institution formulates a corresponding risk early warning scheme according to the risk level of the sample to be early-warned in the presentation period. If the risk level is low, the borrower to whom the sample belongs is reminded to repay by SMS. If the risk level is medium, the borrower is reminded to repay by telephone. If the risk level is high, the borrower is urged to repay through in-person home collection visits.
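The contact strategy above is a simple lookup from risk level to collection channel. A sketch, where `RiskLevel` and `collection_action` are hypothetical names; how the model's output labels map onto low, medium and high is not specified in the text:

```python
from enum import Enum


class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Collection channel for each risk level in the presentation period
COLLECTION_ACTION = {
    RiskLevel.LOW: "SMS reminder",
    RiskLevel.MEDIUM: "phone reminder",
    RiskLevel.HIGH: "home collection visit",
}


def collection_action(level: RiskLevel) -> str:
    return COLLECTION_ACTION[level]
```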
Device embodiment
As shown in fig. 5, the personal credit risk default early warning device provided in the embodiment of the present invention includes a sample collection module 10, a feature fusion module 20, a grouping module 30, a risk level calculation module 40, a label adding module 50, a model training module 60, and an early warning module 70.
Wherein:
the sample collection module 10 is used for collecting samples, wherein the samples include positive samples and negative samples, the positive samples are samples with overdue payments, and the negative samples are samples without overdue payments;
the feature fusion module 20 is configured to perform missing feature completion on the positive samples with missing features through feature fusion;
the grouping module 30 is configured to group the samples by using a classification algorithm to obtain a plurality of sample groups;
the risk grade calculation module 40 is used for calculating the risk grades of each sample group for different numbers of overdue periods;
the label adding module 50 is used for adding risk grade labels to the samples in each sample group, wherein the risk grade labels are constructed based on the overdue characteristics of the samples in the observation period and the presentation period and the risk grades of the sample groups to which the samples belong;
the model training module 60 is configured to perform model training on each sample group separately, to obtain a plurality of trained risk default early warning models in one-to-one correspondence with the plurality of sample groups;
the early warning module 70 is configured to input the sample to be early warned into the selected risk default early warning model, so as to obtain a risk level corresponding to the sample to be early warned.
Since each functional module of the personal credit risk default early warning device in this embodiment performs the same processing as the corresponding step of the personal credit risk default early warning method in the foregoing embodiment, the details are not repeated here; refer to the related description of the method in the foregoing embodiment.
Of course, each functional module may also include several sub-functional modules.
Electronic device embodiment
Fig. 6 is a schematic structural diagram of an electronic device 80 according to an embodiment of the present disclosure, and as shown in fig. 6, the electronic device 80 includes a processor 81 and a memory 83, and the processor 81 is connected to the memory 83, for example, through a bus 82.
The processor 81 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable device, transistor logic device, hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. Processor 81 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
The memory 83 is used for storing the application program code of the present application, whose execution is controlled by the processor 81. The processor 81 is configured to execute the application program code stored in the memory 83 to implement the personal credit risk default early warning method in the foregoing embodiments.
The invention has been described above with a certain degree of particularity. It will be understood by those of ordinary skill in the art that the description of the embodiments is merely exemplary and that all changes that come within the true spirit and scope of the invention are desired to be protected. The scope of the invention is defined by the appended claims rather than by the foregoing description of the embodiments.
Claims (10)
1. A personal credit risk default early warning method, characterized by comprising the following steps:
collecting samples, wherein the samples comprise positive samples and negative samples, the positive samples are samples with overdue payments, and the negative samples are samples without overdue payments;
performing missing feature completion, through feature fusion, on the positive samples with missing features;
grouping the samples by adopting a classification algorithm to obtain a plurality of sample groups;
calculating the risk grades of each sample group for different numbers of overdue periods;
adding a risk grade label to the samples in each sample group, wherein the risk grade label is constructed based on the overdue characteristics of the samples in the observation period and the presentation period and the risk grades of the sample groups to which the samples belong;
respectively carrying out model training on each sample group to obtain a plurality of trained risk default early warning models which correspond to the sample groups one by one;
and inputting the sample to be early-warned into the selected risk default early-warning model so as to obtain the risk grade corresponding to the sample to be early-warned.
2. The personal credit risk default early warning method of claim 1, wherein performing missing feature completion on the positive samples with missing features through feature fusion comprises:
completing the missing features of a positive sample with missing features by using external data corresponding to that positive sample;
and acquiring at least one similar sample of the positive sample with the missing features through correlation analysis, and extracting corresponding features of the at least one similar sample to complete the missing features of the positive sample.
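The two completion sources in claim 2 can be applied in order. A minimal sketch, assuming external data take precedence over similar samples and that values from several similar samples are averaged (both assumptions; the claim only says the features are "extracted"). `complete_features` is a hypothetical helper and `None` marks a missing feature:

```python
from statistics import mean
from typing import Dict, List, Optional

Features = Dict[str, Optional[float]]


def complete_features(sample: Features, external: Features, similar: List[Features]) -> Features:
    """Fill None-valued features: first from external data, then from similar samples."""
    filled = dict(sample)
    for name, value in sample.items():
        if value is not None:
            continue
        if external.get(name) is not None:
            # First source: external data corresponding to this sample
            filled[name] = external[name]
        else:
            # Second source: the same feature of the similar samples (averaged)
            values = [s[name] for s in similar if s.get(name) is not None]
            if values:
                filled[name] = mean(values)
    return filled
```

Features that neither source can supply stay `None`, leaving them to the model's own missing-value handling.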
3. The personal credit risk default early warning method of claim 2, wherein obtaining at least one similar sample of the positive sample with missing features through correlation analysis comprises:
classifying the features of the samples to obtain a plurality of feature categories;
performing principal component analysis on the features in each feature category to reduce the number of features;
and calculating, under the plurality of feature categories, a plurality of Pearson distances between the positive sample with missing features and another sample, wherein, when a predetermined number of the plurality of Pearson distances are smaller than a predetermined threshold, that sample is a similar sample.
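The similarity test of claim 3 can be sketched as below, assuming the common definition of Pearson distance as 1 - r (the claim does not define it) and assuming the principal component analysis has already reduced each feature category to a vector of at least two non-constant components. `pearson_distance` and `is_similar` are hypothetical helpers:

```python
import math
from typing import Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation: 0 for perfectly correlated, 2 for perfectly anti-correlated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Undefined for constant vectors (sx or sy == 0); callers must avoid that case
    return 1.0 - cov / (sx * sy)


def is_similar(
    target: Sequence[Sequence[float]],
    other: Sequence[Sequence[float]],
    threshold: float,
    min_count: int,
) -> bool:
    """target[k] and other[k] are the reduced feature vectors of feature category k."""
    close = sum(1 for t, o in zip(target, other) if pearson_distance(t, o) < threshold)
    return close >= min_count
```

Here `min_count` plays the role of the claim's "predetermined number" and `threshold` that of the "predetermined threshold".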
4. The personal credit risk default early warning method of claim 1, wherein grouping the samples by adopting a classification algorithm comprises:
selecting grouping characteristics;
based on the grouping features, grouping the samples by adopting a clustering algorithm to obtain a plurality of sample groups;
calculating a population stability index for each of the sample groups;
determining, based on the population stability index of each sample group, whether to retain the sample group or merge it with other sample groups.
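Claim 4 does not state how the population stability index is computed or which value triggers a merge. The sketch below uses the standard PSI definition, assuming each sample group's distribution over a fixed set of bins is compared between two samples (for example, a development window and a later validation window). A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the retain-or-merge threshold would be a design choice. `population_stability_index` is a hypothetical helper:

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6
) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected).

    `expected` and `actual` are bin proportions, each summing to 1."""
    psi = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins at eps so the logarithm stays finite
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi
```

For instance, a two-bin split that drifts from 50/50 to 40/60 gives a PSI of about 0.041.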
6. The personal credit risk default early warning method of claim 1,
said calculating the risk grades of each sample group for different numbers of overdue periods comprises:
calculating the total risk amount of the sample group according to the following formula:
wherein: P_ij represents the conversion from i overdue periods to j overdue periods;
calculating the risk grades for different numbers of overdue periods according to the following formula:
7. The personal credit risk default early warning method of claim 1, wherein the risk default early warning model is a LightGBM model.
8. A personal credit risk default early warning device, comprising:
the sample acquisition module is used for acquiring samples, wherein the samples comprise positive samples and negative samples, the positive samples are samples with overdue payments, and the negative samples are samples without overdue payments;
the feature fusion module is used for completing, through feature fusion, the missing features of the positive samples with missing features;
the grouping module is used for grouping the samples by adopting a classification algorithm to obtain a plurality of sample groups;
the risk grade calculation module is used for calculating the risk grades of each sample group for different numbers of overdue periods;
the label adding module is used for adding a risk grade label to the samples in each sample group, wherein the risk grade label is constructed based on the overdue characteristics of the samples in the observation period and the presentation period and the risk grades of the sample groups to which the samples belong;
the model training module is used for respectively carrying out model training on each sample group to obtain a plurality of trained risk default early warning models which correspond to the sample groups one by one;
and the early warning module is used for inputting the sample to be early warned into the selected risk default early warning model so as to obtain the risk grade corresponding to the sample to be early warned.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the personal credit risk default early warning method of any of claims 1-8.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the personal credit risk default early warning method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011141264.0A CN112241916A (en) | 2020-10-22 | 2020-10-22 | Personal credit risk default early warning method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011141264.0A CN112241916A (en) | 2020-10-22 | 2020-10-22 | Personal credit risk default early warning method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112241916A true CN112241916A (en) | 2021-01-19 |
Family
ID=74169890
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011141264.0A Pending CN112241916A (en) | 2020-10-22 | 2020-10-22 | Personal credit risk default early warning method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112241916A (en) |
- 2020-10-22 CN CN202011141264.0A patent/CN112241916A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112990989A (en) * | 2021-05-17 | 2021-06-18 | 太平金融科技服务(上海)有限公司深圳分公司 | Value prediction model input data generation method, device, equipment and medium |
CN112990989B (en) * | 2021-05-17 | 2021-07-30 | 太平金融科技服务(上海)有限公司深圳分公司 | Value prediction model input data generation method, device, equipment and medium |
CN113837863A (en) * | 2021-09-27 | 2021-12-24 | 上海冰鉴信息科技有限公司 | Business prediction model creation method and device and computer readable storage medium |
CN113837863B (en) * | 2021-09-27 | 2023-12-29 | 上海冰鉴信息科技有限公司 | Business prediction model creation method and device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Huang et al. | CEO overconfidence and corporate debt maturity | |
Thomas | Consumer credit models: Pricing, profit and portfolios | |
Maghyereh et al. | Bank distress prediction: Empirical evidence from the Gulf Cooperation Council countries | |
US20200265512A1 (en) | System, method and computer program for underwriting and processing of loans using machine learning | |
CN107633265A (en) | For optimizing the data processing method and device of credit evaluation model | |
CN109657894A (en) | Credit Risk Assessment of Enterprise method for early warning, device, equipment and storage medium | |
Chen | Classifying credit ratings for Asian banks using integrating feature selection and the CPDA-based rough sets approach | |
CN111340236B (en) | Bond breach prediction method based on bond estimation data and integrated machine learning | |
CN112598500A (en) | Credit processing method and system for non-limit client | |
CN112241916A (en) | Personal credit risk default early warning method, device, equipment and storage medium | |
CN107590735A (en) | Data digging method and device for credit evaluation | |
Mun | Advanced analytical models: over 800 models and 300 applications from the basel II accord to Wall Street and beyond | |
Zhu et al. | Explainable prediction of loan default based on machine learning models | |
Hauzenberger et al. | Stochastic model specification in Markov switching vector error correction models | |
CN113344692A (en) | Method for establishing network loan credit risk assessment model with multi-information-source fusion | |
Zhou et al. | A two-stage credit scoring model based on random forest: Evidence from Chinese small firms | |
Modina | Credit Rating and Bank-Firm Relationships: New Models to Better Evaluate SMEs | |
Abdou et al. | Modelling risk for construction cost estimating and forecasting: a review | |
Chong et al. | Threshold effect of scale and skill in active mutual fund management | |
Cucaro | Measuring the "health" of Italian SMEs with insolvency prediction models Z'-ScoreM and D-Score: Measuring default index of Italian SMEs | |
Yin et al. | Supply Chain Financial Default Risk Early Warning System Based on Particle Swarm Optimization Algorithm | |
CN112732866A (en) | Investor emotion index construction method, heterogeneous subject market simulation method, equipment and medium | |
Guerra et al. | Value creation and investment projects: An application of fuzzy sensitivity analysis to project financing transactions | |
Lee et al. | Dynamic prediction of hedge fund survival in crisis-prone financial markets | |
Terzi et al. | Comparison of financial distress prediction models: Evidence from turkey |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210119 |