
CN113610638A - SMAA-DS-based rating system and method for matching credit rating with default loss rate

Info

Publication number: CN113610638A
Application number: CN202110962078.1A
Authority: CN
Inventors: 李刚; 马洪栋; 刘荣月; 张可心
Original assignee: Northeastern University Qinhuangdao Branch
Current assignee: Northeastern University Qinhuangdao Branch
Priority date: 2021-08-20
Filing date: 2021-08-20
Publication date: 2021-11-05
Anticipated expiration: 2041-08-20
Also published as: CN113610638B

Abstract

The invention provides an SMAA-DS-based system and method for credit rating in which credit grades are matched with default loss rates, and relates to the technical field of credit evaluation. The system comprises a user login registration module, a user data management module, and a user credit rating module. In the method, the historical data of all borrowers accumulated by a financial institution in its credit loan business are collected, a binary Logistic regression model is constructed between each single qualitative index and the default state, and the retained indexes are divided into discrete indexes and continuous indexes according to their nature. A Lasso-Logistic model is used to test the discrete and continuous indexes for multicollinearity, a credit scoring index system with the best overall default discrimination capability is constructed for each of the two index types, and the borrower's credit score is calculated under each type. A credit grade division optimization model is then established, the two index types are combined in different proportions, and the borrower's credit grade information is determined to obtain the user's credit rating.

Description

SMAA-DS-based rating system and method for matching credit rating with default loss rate

Technical Field

The invention relates to the technical field of credit evaluation, in particular to a system and a method for rating a credit grade matched with default loss rate based on SMAA-DS.

Background

Credit scoring is the main tool for evaluating a borrower's creditworthiness. Financial institutions or lenders make credit decisions according to the credit score and credit grade and allocate credit resources accordingly, including whether to lend, the loan interest rate, and the loan amount. The usual approach is to analyze a large sample of historical defaulting and non-defaulting customers, mine from the known data the key characteristics that determine whether a borrower defaults, and build a mathematical model that measures the borrower's default risk.

In current credit scoring research, a borrower's credit score is generally calculated by using all index information together directly. In recent years, some scholars have classified indexes into hard information and soft information according to whether the information can be accurately quantified and reliably transmitted. Soft information can reduce the information asymmetry between borrower and lender to a certain extent, so that lenders and lending platforms can better assess the default risk of a loan and set the loan interest rate. Distinguishing soft information from hard information mitigates the adverse selection problem in financial markets and improves the operating efficiency of the loan market. In particular, it increases loan availability for borrowers and reduces investment risk for investors.

For individual borrowers, soft information includes gender, age, the text of the loan description, photos posted on social networks, online behavior, personality, and moral character. Hard information includes annual income, length of employment, debt-to-income ratio, FICO score, and revolving credit utilization rate. For small and micro enterprises, hard information includes indexes such as pre-tax profit/total assets (ROA), short-term debt/equity, and cash/total assets, while soft information includes intangible assets/fixed assets, R&D expenditure/sales (R&D/sales), and potential market indexes.

Text information reflects the borrower's writing ability, percentage of misspelled words, borrowing purpose, debt situation, repayment ability, and repayment willingness, and has therefore been widely studied. Rients Galema (2020) used P2P loan data from the Netherlands to confirm the role of investors who are familiar with the borrower in loan allocation in P2P lending: such investors invest earlier than other P2P investors, and the loans they fund have a lower probability of default. Weiguo Zhang et al. (2020) proposed a new approach for fully mining the text information in loan descriptions. Empirical results on loan data from Lending Club in the United States and Renrendai in China show that a combined model using both soft information from the loan description text and hard information outperforms a model using hard information only in terms of AUC and G-mean for loan default prediction. Statistical models that distinguish index types account for the different effects of soft and hard information on default probability and improve the accuracy of default discrimination.

However, existing research has neglected to explore, from the perspective of how and by how much an index changes, the different effects of discrete and continuous indexes on default probability. Mathematically, a discrete index is a variable that takes only a limited number of values, whereas a continuous index is a variable that can take infinitely many values within an interval. In credit scoring, a typical discrete index is the number of credit card accounts and a typical continuous index is annual income. The soft/hard classification cuts across the discrete/continuous classification: annual income, for example, is hard information under the former and a continuous index under the latter. From the perspective of index type, different index types have different mapping relationships with the default state, and their default discrimination capabilities also differ.

In many practical decision problems, it is difficult for a decision maker to obtain exact information, or much of the information required for the decision may be missing. Stochastic multicriteria acceptability analysis (SMAA) is a decision-making method that helps a decision maker make multi-attribute decisions under uncertainty. Through inverse weight space analysis, it can help the decision maker find the best alternative without knowing the exact attribute values or the decision maker's preference information. Inverse weight space analysis yields a large amount of descriptive information, from which the decision maker can judge which alternative is best.

Evidence theory was introduced in 1967 by Dempster of Harvard University in the United States and was later extended and refined by his student Shafer, so it is also called Dempster-Shafer theory (D-S theory for short). Evidence theory is an uncertainty reasoning method that treats information about a problem as evidence, treats each description of the problem as a proposition supported by the evidence, and represents every proposition as a subset of the set of possible answers, that is, as an element of its power set. For example, when judging whether a suspect committed a crime, the propositions can be described as the subsets {guilty}, {not guilty}, and {undetermined}. Because of the complexity of the problem or the uncertainty of subjective judgment, evidence theory expresses the support for each proposition as a degree of confidence. For example, m(A) = 0.5, m(B) = 0.2, and m(C) = 0.3 indicate that the evidence supports "guilty" with confidence 0.5, "not guilty" with confidence 0.2, and "undetermined" with confidence 0.3. In practical applications, however, pieces of evidence may conflict, and D-S theory sometimes cannot fuse them into a reasonable result. In particular, in 1984 Zadeh pointed out that D-S theory can produce paradoxical results, which greatly restricted its application. To solve the conflict problem, Yang Jianbo, Wang Yingming, and others proposed the evidential reasoning (ER) method, which introduces evidence weights and applies the ER combination rule on the basis of D-S theory, effectively resolving the paradox that arises during evidence combination.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a system and a method for grading a credit grade based on SMAA-DS and matching default loss rate.

In order to solve the technical problems, the invention adopts the following technical scheme:

In one aspect, an SMAA-DS-based credit rating system that matches credit ratings with default loss rates comprises a user login registration module, a user data management module, and a user credit rating module.

The user login registration module is used by a borrower to register personal information, including a mobile phone number and name, through a user page; after successful registration, the borrower enters the login page and logs in to the account;

the user data management module allows the borrower, after logging in, to click a borrowing button to enter the borrowing detail page, where the borrower's index data can be added and modified;

and the user credit rating module allows the borrower, after logging in, to click a credit rating button to enter the credit rating information page, where the credit rating is performed and displayed.

In another aspect, an SMAA-DS-based credit rating method that matches credit ratings with default loss rates is implemented on the above credit rating system and comprises the following steps:

Step 1: preprocess the historical data of all borrowers accumulated by a financial institution in its credit loan business;

Step 1.1: standardize the quantitative index values using min-max normalization;

Step 1.2: match the scoring results of the qualitative index values with the default state, following the principle that a larger score for a qualitative index value corresponds to a lower default probability. The correspondence between each qualitative index value and its default probability is tabulated with an Excel pivot table, where the default probability is the number of defaulting borrowers with a given qualitative index value divided by the total number of borrowers with that value;

Step 1.3: score the qualitative index values using the efficacy coefficient method. Let $x_{ij}$ denote the membership value (score) of the jth index of the ith borrower and $v_{ij}$ the value of the jth index of the ith borrower. The efficacy coefficient formula is shown in formula (1):

$$x_{ij} = c + \frac{v_{ij} - v_j^{n}}{v_j^{s} - v_j^{n}} \times d \tag{1}$$

where $v_j^{s}$ and $v_j^{n}$ are the satisfied value and the not-allowed value of the jth index, respectively, and c and d are constants;
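Steps 1.1 and 1.3 can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes the common efficacy-coefficient form score = c + d × (v − not-allowed value) / (satisfied value − not-allowed value), uses the default probability of a category as v (lowest probability as the satisfied value, highest as the not-allowed value), and takes c = d = 0.5 as in the embodiment. The function names and sample figures are hypothetical.

```python
def min_max_normalize(values):
    """Min-max normalization of one quantitative index to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant index: no spread to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def efficacy_score(value, satisfied, not_allowed, c=0.5, d=0.5):
    """Assumed efficacy-coefficient form: c shifts the result, d scales it."""
    return c + (value - not_allowed) / (satisfied - not_allowed) * d


# Hypothetical monthly incomes (quantitative index).
incomes = min_max_normalize([2000, 5000, 8000])

# Hypothetical default probabilities of three categories of one qualitative index.
default_prob = {"city_a": 0.05, "city_b": 0.10, "city_c": 0.20}
best, worst = min(default_prob.values()), max(default_prob.values())
scores = {k: efficacy_score(p, best, worst) for k, p in default_prob.items()}
```

With these inputs, the category with the lowest default probability receives the highest score (1.0) and the category with the highest default probability receives the lowest (0.5), which matches the principle that a larger score corresponds to a lower default probability.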

Step 2: construct a binary Logistic regression model between each single qualitative index and the default state, screen the single indexes with default discrimination capability using the Wald statistic, and then divide the retained indexes into discrete indexes and continuous indexes according to their nature;

In the binary Logistic regression model, assume a set of independent and identically distributed observations (X, Y), where Y is the dependent variable, $Y = (y_1, y_2, \ldots, y_n)$, $y_n$ is the default state of the nth borrower, and n is the total number of observations (borrowers). X is the J-dimensional vector of independent variables, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{iJ})$, where $x_{ij}$ is the value of the jth index of the ith borrower and J is the total number of dimensions (indexes). $y_i \in \{0, 1\}$, where $y_i = 1$ indicates that the borrower defaulted and $y_i = 0$ indicates that the borrower did not default. $P(y_i = 1 \mid x_i) = p_i$ is the probability that $y_i = 1$ given $x_i$, where $x_i$ denotes the ith borrower. $p_i$ is shown in formula (2), where β is the coefficient vector.

$$p_i = \frac{e^{x_i \beta}}{1 + e^{x_i \beta}} \tag{2}$$

Applying the logit transformation to formula (2) gives the Logistic model shown in formula (3).

$$\ln\frac{p_i}{1 - p_i} = x_i \beta \tag{3}$$

The Wald statistic is constructed to test the significance of the regression coefficient of each single index. Let $W_j$ be the Wald statistic of the jth index, $\hat{\beta}_j$ the estimated coefficient of the jth index, and $SE_{\hat{\beta}_j}$ the standard error of $\hat{\beta}_j$. The Wald statistic $W_j$ of the jth index is shown in formula (4).

$$W_j = \left(\frac{\hat{\beta}_j}{SE_{\hat{\beta}_j}}\right)^2 \tag{4}$$

Step 3: use a Lasso-Logistic model to test the discrete indexes and continuous indexes for multicollinearity and construct, for each of the two index types, the credit scoring index system with the best overall default discrimination capability. Taking the Lasso-Logistic regression coefficients as the basis of the index weights, set weight constraints, solve for the optimal weights under the two index types using stochastic multicriteria acceptability analysis (SMAA), and calculate the borrowers' credit scores under the two index types;

In the Lasso-Logistic model, assume a set of independent and identically distributed observations (X, Y), where Y is the dependent variable, $Y = (y_1, y_2, \ldots, y_n)$, and X is the p-dimensional vector of independent variables, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$. There are n groups of data, each containing p independent variables and 1 dependent variable. When the dependent variable is binary, here the borrower's repayment state (default or non-default), the Lasso-Logistic regression model is used for fitting;

The Lasso-Logistic estimate of the coefficient vector β of the independent variables X is shown in formula (5).

$$\hat{\beta} = \arg\min_{\beta}\left\{-\ln L(\beta) + \lambda\sum_{j=1}^{p}|\beta_j|\right\} \tag{5}$$

where λ is the tuning parameter that controls the degree of coefficient shrinkage, $-\ln L(\beta)$ represents the goodness of fit of the model, and $\lambda\sum_{j=1}^{p}|\beta_j|$ is the penalty on the variable coefficients in the model. Because the observations are independent and identically distributed, their joint distribution is the product of the marginal distributions, which gives the likelihood function $L(\beta)$ of the n observations (borrowers).
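The effect of the penalty parameter λ can be seen in a small, self-contained Python sketch. It is only an illustration under stated assumptions: the solver is a plain proximal gradient (soft-thresholding) loop chosen for brevity, the log-likelihood is averaged over the n observations, an unpenalized intercept is added, and the data and function names are hypothetical. The patent does not specify which solver it uses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly zero."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Proximal gradient descent on mean negative log-likelihood + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residual = predicted default probability minus observed default state.
        resid = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * grad0  # the intercept is not penalized (assumption)
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


# Hypothetical normalized data: the first index tracks default, the second is noise.
X = [[0.1, 0.5], [0.2, 0.4], [0.8, 0.5], [0.9, 0.6]]
y = [0, 0, 1, 1]
_, beta_small = fit_lasso_logistic(X, y, lam=0.01)  # weak penalty keeps index 1
_, beta_large = fit_lasso_logistic(X, y, lam=1.0)   # strong penalty removes all indexes
```

A small λ leaves the informative index with a positive coefficient, while a sufficiently large λ shrinks every coefficient to exactly zero, which is how the Lasso penalty removes redundant or collinear indexes.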

The borrower's credit score is calculated by linear addition of the index scores and the index weights in the feature combination. Let $w_j^k$ denote the weight of the jth index in the kth simulation (j = 1, 2, 3, …, J; k = 1, 2, 3, …, K) and $u_{ij}$ the normalized value of the ith borrower on index j. The credit score $S_i^k$ of the ith borrower in the kth simulation is shown in formula (6):

$$S_i^k = \sum_{j=1}^{J} w_j^k u_{ij} \tag{6}$$

The mean credit score $\bar{S}_i$ of the ith borrower over the K simulations is shown in formula (7):

$$\bar{S}_i = \frac{1}{K}\sum_{k=1}^{K} S_i^k \tag{7}$$
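The simulation described above can be sketched in Python as follows. This is a hedged illustration rather than the patent's procedure: it assumes that each simulated weight vector is drawn uniformly from the set of non-negative weights summing to 1 and is rejected if any weight leaves its constraint interval, and that the score is the weighted sum of normalized index values. The bounds, index values, and function names are hypothetical.

```python
import random


def sample_weights(bounds, rng, max_tries=10000):
    """Draw one weight vector uniformly from the simplex, rejecting draws outside bounds."""
    for _ in range(max_tries):
        raw = [rng.expovariate(1.0) for _ in bounds]
        total = sum(raw)
        weights = [r / total for r in raw]   # normalized exponentials are uniform on the simplex
        if all(lo <= w <= hi for w, (lo, hi) in zip(weights, bounds)):
            return weights
    raise RuntimeError("no feasible weight vector found; bounds may be too tight")


def simulate_scores(u_i, bounds, n_sim=1000, seed=0):
    """Credit scores of one borrower over n_sim simulated weight vectors."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sim):
        weights = sample_weights(bounds, rng)
        scores.append(sum(w * u for w, u in zip(weights, u_i)))  # linear additive score
    return scores


u = [0.9, 0.6, 0.3]                  # hypothetical normalized index values of one borrower
weight_bounds = [(0.1, 0.7)] * 3     # hypothetical weight constraint interval per index
scores = simulate_scores(u, weight_bounds)
mean_score = sum(scores) / len(scores)
```

Because the weights are non-negative and sum to 1, every simulated score lies between the borrower's smallest and largest normalized index value, and the mean over the simulations serves as the borrower's credit score.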

Step 4: establish a credit grade division optimization model consisting of 1 objective function and 2 constraints. The objective function maximizes the sum of squared differences in default loss rate between adjacent credit grades. Constraint 1 requires that the higher the credit grade, the lower the default loss rate. Constraint 2 requires that each difference in default loss rate between adjacent grades be a to b times the preceding difference. The model determines the borrower's credit grade under each of the two index types.

Step 4.1: establish the objective function f;

Divide the borrowers into L credit grades and let $LGD_l$ be the default loss rate of the lth credit grade. The objective function is shown in formula (8).

$$\text{obj}: \max f = (LGD_L - LGD_{L-1})^2 + (LGD_{L-1} - LGD_{L-2})^2 + \cdots + (LGD_2 - LGD_1)^2 \tag{8}$$

Step 4.2: establish constraint 1 and constraint 2

Constraint 1: the default loss rate increases as the credit grade decreases, as shown in formula (9):

$$\text{s.t.}: 0 < LGD_1 < LGD_2 < \cdots < LGD_L \le 1 \tag{9}$$

Constraint 2: the differences in default loss rate between adjacent grades are constrained so that each difference is a to b times the preceding difference.

Let $\Delta LGD_{l,l+1}$ be the difference in default loss rate between the lth credit grade, $LGD_l$, and the (l+1)th credit grade, $LGD_{l+1}$, that is, the difference between adjacent grades, as shown in formula (10).

$$\Delta LGD_{l,l+1} = LGD_{l+1} - LGD_l \tag{10}$$

The constraint on the differences between adjacent grades is shown in formula (11).

$$\Delta LGD_{l,l+1} \in [a, b] \times \Delta LGD_{l-1,l} \tag{11}$$

The default loss rate $LGD_l$ of the lth credit grade is shown in formula (12):

$$LGD_l = \frac{\sum_{i=1}^{g_l} L_{li}}{\sum_{i=1}^{g_l} R_{li}} \tag{12}$$

where $g_l$ is the number of borrower samples in the lth credit grade; $G_l$ is the total number of borrower samples in the first l−1 credit grades, $G_l = g_1 + g_2 + \cdots + g_{l-1}$; $L_{li}$ is the default loss amount (Loss) of the ith borrower in the lth credit grade; and $R_{li}$ is the receivable amount (Receivables) of the ith borrower in the lth credit grade.
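The objective function and the two constraints of step 4 can be checked for a given grade division with a short Python sketch. It is a simplified illustration, not the patent's optimizer: it assumes that the default loss rate of a grade is the total default loss divided by the total receivables of the borrowers in that grade, treats grade 1 as the best grade, and uses the interval [1, 1.2] from the embodiment as the default [a, b]. The sample amounts and function names are hypothetical.

```python
def lgd_by_grade(losses, receivables):
    """Default loss rate per grade: total loss / total receivables (assumed form)."""
    return [sum(loss) / sum(recv) for loss, recv in zip(losses, receivables)]


def objective(lgd):
    """Sum of squared default loss rate differences between adjacent grades."""
    return sum((nxt - cur) ** 2 for cur, nxt in zip(lgd, lgd[1:]))


def satisfies_constraints(lgd, a=1.0, b=1.2, tol=1e-12):
    """Constraint 1: strictly increasing rates in (0, 1]. Constraint 2: ratio of differences in [a, b]."""
    increasing = 0 < lgd[0] and lgd[-1] <= 1 and all(x < y for x, y in zip(lgd, lgd[1:]))
    diffs = [y - x for x, y in zip(lgd, lgd[1:])]
    ratio_ok = all(a * prev - tol <= cur <= b * prev + tol
                   for prev, cur in zip(diffs, diffs[1:]))
    return increasing and ratio_ok


# Hypothetical per-borrower default losses and receivables for three grades.
losses = [[1.0, 1.0], [4.0], [6.2]]
receivables = [[50.0, 50.0], [100.0], [100.0]]
lgd = lgd_by_grade(losses, receivables)   # approximately [0.02, 0.04, 0.062]
f = objective(lgd)
feasible = satisfies_constraints(lgd)
```

An optimizer would search over candidate threshold points, keep only divisions for which `satisfies_constraints` holds, and select the one with the largest `objective` value.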

Step 5: under each of the two index types, determine T credit scores for each borrower using the SMAA method, and compare them with the threshold points of the credit grades to determine the probability that each borrower belongs to each credit grade;

The average of the T simulated scores is taken as the borrower's credit score, referred to as the mean credit score, and the borrower's credit grade is determined from it. In addition, each of the T simulated credit scores is compared with the threshold points of the credit grades to determine the probability that the borrower belongs to each credit grade;
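The comparison of simulated scores with grade threshold points can be sketched in Python as follows. The sketch assumes seven grades A through G, that a higher score maps to a better grade, and that the threshold points are ascending cut points between adjacent grades. The cut points, scores, and function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

GRADES_LOW_TO_HIGH = ["G", "F", "E", "D", "C", "B", "A"]


def grade_of(score, cut_points):
    """Map a score to a grade; cut_points holds the 6 ascending thresholds between grades."""
    return GRADES_LOW_TO_HIGH[bisect_right(cut_points, score)]


def grade_probabilities(scores, cut_points):
    """Share of simulated scores that fall into each grade."""
    counts = Counter(grade_of(s, cut_points) for s in scores)
    return {g: counts.get(g, 0) / len(scores) for g in GRADES_LOW_TO_HIGH}


cut_points = [20, 30, 40, 50, 60, 70]        # hypothetical threshold points
simulated = [55.0] * 817 + [45.0] * 183      # hypothetical T = 1000 simulated scores
probs = grade_probabilities(simulated, cut_points)
mean_grade = grade_of(sum(simulated) / len(simulated), cut_points)
```

Here 817 of the 1000 simulated scores fall into grade C and 183 into grade D, so the borrower belongs to grade C with probability 0.817 and to grade D with probability 0.183, while the mean score places the borrower in grade C.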

Step 6: combine the two index types in different proportions, and integrate the borrower's credit grades under the two index types using the ER combination rule of evidence theory. The default loss discrimination f, that is, the objective function f in step 4.1, is calculated and compared for the different combination proportions. The proportion of discrete to continuous indexes with the largest f is taken as the final combination proportion of the two index types in the SMAA-DS model and is input into the ER combination rule to determine the borrower's credit grade information;

the evidence theory ER combination rule is as follows:

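Step D1: calculate the BPA value of each piece of evidence before combination, where each piece of evidence consists of the probabilities that the borrower belongs to each grade under the discrete index type or the continuous index type

Let Θ be the frame of discernment, with 2 pieces of evidence $m_1, m_2: 2^{\Theta} \to [0, 1]$.

$$m_o(H_l) = w_o\,\beta_o(H_l) \tag{13}$$

$$\bar{m}_o(H) = 1 - w_o \tag{14}$$

$$\tilde{m}_o(H) = w_o\left(1 - \sum_{l=1}^{L}\beta_o(H_l)\right) \tag{15}$$

$$m_o(H) = \bar{m}_o(H) + \tilde{m}_o(H) \tag{16}$$

where $w_o$ is the relative weight of evidence o, with $\sum_{o} w_o = 1$, and o indexes the pieces of evidence. $\beta_o(H_l)$ and $\beta_o(H)$ are the degrees of confidence in the focal elements $H_l$ and $H$, respectively, $m_o(H_l)$ is the BPA value of evidence o, and l denotes the lth grade out of a total of L grades.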

Step D2: combine the evidence using the ER combination rule of evidence theory and calculate the BPA values of the combined evidence.

$$K = \sum_{B \cap C = \varnothing} m_1(B)\,m_2(C) \tag{17}$$

$$m_{12}(H_l) = \frac{\sum_{B \cap C = H_l} m_1(B)\,m_2(C)}{1 - K} \tag{20}$$

where $m_{12}(H_l)$ is the BPA value of the borrower belonging to the lth grade after combination; K is the evidence conflict coefficient, and a larger K means greater conflict between the pieces of evidence; B and C denote the grade to which the borrower belongs under the discrete index type and the continuous index type, respectively; and $m_1(B)$ and $m_2(C)$ are the BPA values (mass functions) of B and C before combination. The quantities in formulas (18) and (19) are process parameters that represent the belief still to be assigned under the discrete index type, the continuous index type, and the combination of the two.

Step D3: convert the BPA value of each piece of combined evidence into the confidence Bel, that is, the probability that the borrower belongs to credit grade $H_l$ in the SMAA-DS model.

Step 7: the user logs in through the user login registration module and then fills in and modifies personal information in the user data management module. The data entered by the borrower in the user data management module are passed to the user credit rating module, and when the user clicks the credit rating button the credit grade is displayed.

The invention provides an SMAA-DS-based credit rating system and method that match credit ratings with default loss rates, with the following beneficial effects:

1. Starting from the magnitude and manner of index change, the credit rating model built by combining discrete indexes and continuous indexes nonlinearly satisfies the standard that the higher the credit grade, the lower the default loss rate, and has a stronger ability to distinguish borrowers with different default likelihoods.

2. The invention provides an important reference for the credit rating and credit decisions of financial institutions or lenders, and supports effective allocation of credit resources, including loans and credit ratings.

Drawings

FIG. 1 is a block diagram of the overall architecture of a credit rating system in an embodiment of the invention;

FIG. 2 is a general flow diagram of a credit rating method in an embodiment of the invention;

FIG. 3 is a comparison chart of credit rating classification results of different models according to an embodiment of the present invention.

Detailed Description

The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.

In one aspect, an SMAA-DS-based credit rating system that matches credit ratings with default loss rates, as shown in FIG. 1, comprises a user login registration module, a user data management module, and a user credit rating module.

In another aspect, an SMAA-DS-based credit rating method that matches credit ratings with default loss rates is implemented on the above credit rating system, as shown in FIG. 2, and comprises the following steps:

Step 1.1: standardize the quantitative index values using min-max normalization, where a quantitative index is an index whose values are all numeric;

Step 1.2: match the scoring results of the qualitative index values with the default state, following the principle that a larger score for a qualitative index value corresponds to a lower default probability. The correspondence between each qualitative index value and its default probability is tabulated with an Excel pivot table, where the default probability is the number of defaulting borrowers with a given qualitative index value divided by the total number of borrowers with that value. For example, if 10000 borrowers have a home address in a certain city and 1000 of them defaulted, then a borrower whose value of the qualitative index "home address" is that city has a corresponding default probability of 0.1;

The default state is a field carried by each record in the historical sample;

In the efficacy coefficient formula (1), each index j has a satisfied value and a not-allowed value, and c and d are constants: c translates the transformed value, and d enlarges or reduces it. In this embodiment, c = 0.5 and d = 0.5. The qualitative indexes are scored according to the default rate of each category of each qualitative index in the initial index system; the standardization of the 6 qualitative indexes is shown in Table 1.

TABLE 1 qualitative index standardization process

In this embodiment, 43471 loan records from a platform, covering three years beginning in 2012, are used for the empirical analysis; after data cleaning, 16574 loan records remain as the sample.

Step 2: construct a binary Logistic regression model between each single qualitative index and the default state, and screen the single indexes with default discrimination capability using the Wald statistic: indexes with a Wald statistic smaller than 3.841 are removed and indexes with a Wald statistic larger than 3.841 are retained. The retained indexes are then divided into discrete indexes and continuous indexes according to their nature;

The discrete indexes include age, education level, marital status, home ownership status, whether there is a loan record, whether there are outstanding bank loans, and the like. The continuous indexes include monthly income, the ratio of monthly repayment to monthly income, monthly living expenses, and the like.

Logistic regression is a commonly used classification model whose dependent variable is discrete, most often binary.

In the binary Logistic regression model, assume a set of independent and identically distributed observations (X, Y), where Y is the dependent variable, $Y = (y_1, y_2, \ldots, y_n)$, $y_n$ is the default state of the nth borrower, and n is the total number of observations (borrowers). X is the J-dimensional vector of independent variables, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{iJ})$, where $x_{ij}$ is the value of the jth index of the ith borrower and J is the total number of dimensions (indexes). $y_i \in \{0, 1\}$, where $y_i = 1$ indicates that the borrower defaulted and $y_i = 0$ indicates that the borrower did not default. $P(y_i = 1 \mid x_i) = p_i$ is the probability that $y_i = 1$ given $x_i$, where $x_i$ denotes the ith borrower. $p_i$ is shown in formula (2), where β is the coefficient vector.

Because the Wald test is applied to a single index, the statistic follows a $\chi^2$ distribution with 1 degree of freedom. From the $\chi^2$ distribution table, the critical value at the 0.05 significance level is $\chi^2_{0.05}(1) = 3.841$; that is, when the Wald statistic of index j is greater than 3.841, the index is considered to have significant default discrimination capability.
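The screening rule can be written as a short Python sketch. It assumes the usual single-coefficient Wald statistic, the squared ratio of the estimated coefficient to its standard error, and compares it with the 3.841 critical value given above. The coefficient estimates, standard errors, and function names are hypothetical.

```python
import math

CHI2_CRIT_DF1_05 = 3.841  # chi-square critical value, 1 degree of freedom, 0.05 level


def wald_statistic(beta_hat, std_err):
    """Wald statistic of a single coefficient: (estimate / standard error) squared."""
    return (beta_hat / std_err) ** 2


def wald_p_value(w):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))


def screen_indexes(estimates):
    """Keep indexes whose Wald statistic exceeds the critical value."""
    return [name for name, (beta_hat, std_err) in estimates.items()
            if wald_statistic(beta_hat, std_err) > CHI2_CRIT_DF1_05]


# Hypothetical single-index regression results: (coefficient estimate, standard error).
estimates = {"monthly_income": (-0.80, 0.20), "marital_status": (0.15, 0.10)}
kept = screen_indexes(estimates)
```

In this example the first index has a Wald statistic of about 16 and is retained, while the second has a Wald statistic of about 2.25 and is removed.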

Step 3: use a Lasso-Logistic model to test the discrete indexes and continuous indexes for multicollinearity and construct, for each of the two index types, the credit scoring index system with the best overall default discrimination capability. Taking the Lasso-Logistic regression coefficients as the basis of the index weights, set weight constraints and determine a reasonable interval for the index weights, solve for the optimal weights under the two index types using stochastic multicriteria acceptability analysis (SMAA), and calculate the borrowers' credit scores under the two index types;

The reasonable interval is introduced because the regression coefficients of the indexes are affected by changes in the data, which can make the weights of some indexes too large or too small and interfere with the accuracy of the credit score. A weight constraint interval that depends on the number of indexes n is therefore set to reduce the influence of changes in the objective data on the index weights. When an index weight is less than the lower bound of the interval, it is adjusted to the lower bound; when an index weight is greater than the upper bound of the interval, it is adjusted to the upper bound.

The credit score is calculated as follows.

The Lasso-Logistic model is a technique that, on the basis of the least squares method, adds a penalty on the coefficients to shrink them and thereby performs variable selection. In the Lasso-Logistic model, assume a set of independent and identically distributed observations (X, Y), where Y is the dependent variable, $Y = (y_1, y_2, \ldots, y_n)$, and X is the p-dimensional vector of independent variables, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$. There are n groups of data, each containing p independent variables and 1 dependent variable. When the dependent variable is binary, here the borrower's repayment state (default or non-default), the Lasso-Logistic regression model is used for fitting;

The parameter estimate is given by formula (5), where λ is the tuning parameter that controls the degree of coefficient shrinkage. The credit score of the ith borrower in the kth simulation is given by formula (6), and the mean credit score of the ith borrower over the K simulations is given by formula (7).

Step 4.1: establish the objective function f;

Divide the borrowers into L credit grades and let $LGD_l$ be the default loss rate (Loss Given Default) of the lth credit grade. The objective function is shown in formula (8).

$$\text{obj}: \max f = (LGD_L - LGD_{L-1})^2 + (LGD_{L-1} - LGD_{L-2})^2 + \cdots + (LGD_2 - LGD_1)^2 \tag{8}$$

By calculating the sum of squared differences in default loss rate between every pair of adjacent credit grades, the division scheme with the largest value is selected from all candidate credit grade division schemes. This ensures that the selected division has the greatest default loss discrimination between credit grades, so that borrowers with different credit conditions are well distinguished and the credit grades are divided reasonably and effectively. The threshold points between the grades are then determined.

Step 4.2: establish constraint 1 and constraint 2

$$\text{s.t.}: 0 < LGD_1 < LGD_2 < \cdots < LGD_L \le 1 \tag{9}$$

Constraint 2: the differences in default loss rate between adjacent grades are constrained so that each difference is a to b times the preceding difference. This avoids an overly sensitive change in the default loss rate. Because the objective function is maximized, blindly pursuing large differences in default loss rate between grades easily produces an unreasonable division; setting a reasonable interval preserves both the discrimination and the reasonableness of the division. In this embodiment, the interval for the ratio of adjacent-grade default loss rate differences is set to [1, 1.2].

$$\Delta LGD_{l,l+1} = LGD_{l+1} - LGD_l \tag{10}$$

$$\Delta LGD_{l,l+1} \in [a, b] \times \Delta LGD_{l-1,l} \tag{11}$$

In formula (12), $g_l$ is the number of borrower samples in the lth credit grade; $G_l$ is the total number of borrower samples in the first l−1 credit grades, $G_l = g_1 + g_2 + \cdots + g_{l-1}$; $L_{li}$ is the default loss amount (Loss) of the ith borrower in the lth credit grade; and $R_{li}$ is the receivable amount (Receivables) of the ith borrower in the lth credit grade.

The model outputs, for each grade, the credit score threshold point, the default loss amount, the receivable amount, the number of borrowers, and the default loss rate.

For example, with 1000 simulated credit scores, if the 1st borrower is classified into the first through seventh grades 0, 0, 817, 183, 0, 0, and 0 times, respectively, then dividing each count by 1000 gives probabilities of 0, 0, 0.817, 0.183, 0, 0, and 0 that the borrower belongs to the first through seventh grades. The probabilities of each borrower belonging to each grade under the discrete indexes and the continuous indexes are placed in columns 2 through 15 of Table 2, and column 1 of Table 2 is the serial number of the borrower.

TABLE 2 probability of borrower belonging to different credit classes under two index types

Taking w₁＝0.54, w₂＝0.46, which correspond to the maximum default loss discrimination f＝0.0086, as an example, the following illustrates the ER rule calculation of the credit rating of the 1st borrower in Table 2 under the SMAA-DS model, in which H_l takes the grades A, B, C, D, E, F and G respectively, and H is the set of grades A, B, C, D, E, F and G.

Step D1: construct the mass functions m₁ and m₂ to obtain the BPA values of the discrete index type (abbreviated as 1) and the continuous index type (abbreviated as 2)

Substituting the values in the first row of Table 2 together with w₁＝0.54, w₂＝0.46 into equation 13 yields:

m₁({A})＝w₁*β₁({A})＝0.54*0＝0

m₁({B})＝w₁*β₁({B})＝0.54*0＝0

m₁({C})＝w₁*β₁({C})＝0.54*0.8170＝0.4412

Similarly, m₁({D})＝0.0988, m₁({E})＝0, m₁({F})＝0, m₁({G})＝0.

m₂({A})＝w₂*β₂({A})＝0.46*0＝0

m₂({B})＝w₂*β₂({B})＝0.46*0＝0

m₂({C})＝w₂*β₂({C})＝0.46*0＝0

Similarly, m₂({D})＝0, m₂({E})＝0.0069, m₂({F})＝0.4531, m₂({G})＝0.
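The BPA values above can be reproduced with a short Python sketch. The function name is hypothetical. The continuous-type beliefs 0.015 and 0.985 are back-calculated from m₂({E})＝0.0069 and m₂({F})＝0.4531 (divided by w₂＝0.46) because Table 2 is not reproduced here, and leaving the unassigned mass on the whole set H is an assumption.

```python
GRADES = ["A", "B", "C", "D", "E", "F", "G"]


def basic_probability_assignment(weight, beliefs):
    """m_o({H_l}) = w_o * beta_o({H_l}); the rest of the unit mass stays on H."""
    masses = {g: weight * beliefs.get(g, 0.0) for g in GRADES}
    masses["H"] = 1.0 - sum(masses.values())  # assumption: remainder goes to H
    return masses


w1, w2 = 0.54, 0.46
beta1 = {"C": 0.817, "D": 0.183}  # discrete index type, from the example
beta2 = {"E": 0.015, "F": 0.985}  # continuous index type, back-calculated

m1 = basic_probability_assignment(w1, beta1)
m2 = basic_probability_assignment(w2, beta2)
```

Rounded to four decimals, `m1["C"]`, `m1["D"]`, `m2["E"]` and `m2["F"]` agree with 0.4412, 0.0988, 0.0069 and 0.4531.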

Substituting w₁＝0.54, w₂＝0.46 into equation 14 yields:

Substituting w₁＝0.54, w₂＝0.46 and the values in the first row of Table 2 into equation 15 yields:

Substituting the two results above into equation 16 yields:

Step D2: combine the evidence using the ER rule of evidence theory and calculate the BPA values of the combined evidence of the discrete index type (abbreviated as 1) and the continuous index type (abbreviated as 2):

Substituting into formula 17 gives the conflict coefficient K

(1) Calculating the BPA value after the credit grades A, B, C, D, E, F and G are combined

By substituting into equation 18, the result is obtained

By substituting into equation 19, it is possible to obtain

By substituting equation 20

(2) Calculating BPA value after each credit grade combination

The combined BPA value for credit rating A can be obtained by substituting equation 20

The combined BPA value for credit rating B can be obtained by substituting equation 20

Similarly, m₁₂({C})＝0.3170, m₁₂({D})＝0.0710, m₁₂({E})＝0.0042, m₁₂({F})＝0.2773, m₁₂({G})＝0.
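The combination in Step D2 can be checked numerically with the sketch below. It assumes the usual two-evidence combination over focal elements that are either a single grade or the whole set H, and it assumes the mass left on H by each evidence is 1 - w_o; equations 17 to 20 are not reproduced in this text, so this is an illustration rather than the exact formulas of the embodiment. The function name is hypothetical.

```python
GRADES = ["A", "B", "C", "D", "E", "F", "G"]


def combine(m1, m2):
    """Combine two BPAs whose focal elements are single grades or the set H."""
    # Conflict K: mass assigned jointly to two different single grades.
    k = sum(m1[b] * m2[c] for b in GRADES for c in GRADES if b != c)
    combined = {
        g: (m1[g] * m2[g] + m1[g] * m2["H"] + m1["H"] * m2[g]) / (1.0 - k)
        for g in GRADES
    }
    combined["H"] = m1["H"] * m2["H"] / (1.0 - k)
    return k, combined


# BPA values from Step D1; the mass on H is assumed to be 1 - w_o.
m1 = {"A": 0, "B": 0, "C": 0.4412, "D": 0.0988, "E": 0, "F": 0, "G": 0, "H": 0.46}
m2 = {"A": 0, "B": 0, "C": 0, "D": 0, "E": 0.0069, "F": 0.4531, "G": 0, "H": 0.54}

K, m12 = combine(m1, m2)
```

Under these assumptions the combined masses for grades C, D, E and F round to 0.3170, 0.0710, 0.0042 and 0.2773, matching the values stated above.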

Step D3: convert the BPA values of the combined evidence into confidence, namely the probability that the borrower belongs to each credit level in the SMAA-DS model

By substituting into equation 21, the result is obtained

The probability of the borrower belonging to the credit rating A in the SMAA-DS model can be obtained by substituting the formula 22

Probability of borrower belonging to credit class B in SMAA-DS model

Probability of borrower belonging to credit class C in SMAA-DS model

Probability of borrower belonging to credit level D in SMAA-DS model

Similarly, β₁₂({E})＝0.0063, β₁₂({F})＝0.4142, β₁₂({G})＝0.
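The conversion in Step D3 can be sketched as a normalization of the combined BPA values. The sketch assumes that all mass not assigned to a single grade is residual mass on H, which holds when each evidence's beliefs sum to 1; equations 21 and 22 are not reproduced here, and the function name is hypothetical.

```python
def to_belief(m12, residual_mass):
    """beta_12({H_l}) = m_12({H_l}) / (1 - residual mass left on H)."""
    return {g: m / (1.0 - residual_mass) for g, m in m12.items()}


# Combined BPA values of borrower 1 from Step D2 of the example.
m12 = {"A": 0.0, "B": 0.0, "C": 0.3170, "D": 0.0710,
       "E": 0.0042, "F": 0.2773, "G": 0.0}

# Assumption: everything not assigned to a single grade is residual mass on H.
residual = 1.0 - sum(m12.values())

beta12 = to_belief(m12, residual)
best_grade = max(beta12, key=beta12.get)
```

With these inputs `beta12["E"]` and `beta12["F"]` agree with the stated 0.0063 and 0.4142 to four decimals, and `best_grade` is the grade with the largest probability.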

The probabilities of the 1st borrower in the SMAA-DS model are placed in row 2 of Table 3, and the grade with the highest probability is selected as the borrower's credit grade in the SMAA-DS model. The probability that the 1st borrower belongs to grade C is the greatest, so the borrower's credit grade in the SMAA-DS model is C. Similarly, the probability distributions of borrowers 2 to 16574 over the credit grades in the SMAA-DS model can be calculated and placed in rows 3 to 16575 of Table 3. When the ratio w₁ in formula (13) is traversed from 0 to 1 (with w₂＝1-w₁), the default loss discrimination f reaches its maximum of 0.0086 at w₁＝0.54, w₂＝0.46, where borrowers of different credit grades are distinguished to the greatest extent. Thus w₁＝0.54, w₂＝0.46 is the optimal combination ratio of the two index types in the SMAA-DS model.
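The traversal of w₁ can be pictured as a simple grid search. In the sketch below the step size of 0.01 is an assumption (the text only states that w₁ runs from 0 to 1), and `toy_discrimination` is a hypothetical stand-in for the real pipeline of combining the evidence with the ER rule, dividing the grades and computing f.

```python
def best_combination_ratio(discrimination_for, step=0.01):
    """Try w1 = 0, step, ..., 1 and keep the ratio with the largest f."""
    n = round(1 / step)
    candidates = [i / n for i in range(n + 1)]
    w1 = max(candidates, key=discrimination_for)
    return w1, 1 - w1, discrimination_for(w1)


# Hypothetical stand-in: a curve that peaks at w1 = 0.54 with f = 0.0086.
def toy_discrimination(w1):
    return 0.0086 - (w1 - 0.54) ** 2


w1, w2, f = best_combination_ratio(toy_discrimination)
```

Replacing `toy_discrimination` with a function that evaluates the full model for a given w₁ yields the optimal combination ratio in the sense described above.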

TABLE 3 probability distribution of borrowers belonging to credit classes in SMAA-DS

Counting the data in the last column of Table 3 gives the number of borrowers in each grade of the SMAA-DS model; substituting the data in the last column of Table 3 into formula (12) gives the default loss rate of each grade, and the credit score threshold point of each grade is determined from the number of borrowers in each grade.

The comparison of credit rating division results in this embodiment is shown in Table 4. As can be seen from FIG. 3, the credit rating obtained with the ER rule of evidence theory at w₁＝0.54, w₂＝0.46 in the SMAA-DS model is optimal.

TABLE 4 comparison of Credit rating results

FIG. 1 compares the results of 4 model credit rankings

In addition, the SMAA-DS model is interpretable and can guide the lending practice of banks and other financial institutions. The SMAA-DS model has advantages in the following 6 aspects:

1) The default loss discrimination f of the SMAA-DS model is the largest, so borrowers with different default possibilities are distinguished to the greatest extent. The essence of credit rating division is to distinguish borrowers with different default possibilities as far as possible: the larger the default loss rate differences between adjacent levels, and hence f, the larger the difference in credit standing between borrowers of adjacent levels and the smaller the difference between borrowers of the same level. The default loss discrimination f of the SMAA-DS model is 0.0086, whereas the values of f for the Lending Club 43471-sample division, the Lending Club 16574-sample division, the model that does not distinguish index types, the discrete index model and the continuous index model are 0.0028, 0.0022, 0.0065, 0.0067 and 0.0032 respectively. The f of the SMAA-DS model is 132.31% of the f of the model that does not distinguish index types. The SMAA-DS model therefore distinguishes borrowers with different default possibilities best.

2) The SMAA-DS model has a very low default loss rate at the highest credit grade and can therefore retain high-quality, highly rated clients. In the United States, where the financial market is more developed, borrowers with higher credit ratings often have several loan sources; the lower the default loss rate of the high grade, the lower the corresponding loan interest rate, and the better such clients can be retained. The grade A default loss rate of the SMAA-DS model is 0.68%, whereas the grade A default loss rates for the Lending Club 43471-sample division, the Lending Club 16574-sample division, the model that does not distinguish index types, the discrete index model and the continuous index model are 3.41%, 2.91%, 1.25%, 0.75% and 0.62% respectively. The grade A default loss rate of the SMAA-DS model is 45.60% lower than that of the model that does not distinguish index types. Among the compared models only the continuous index model (0.62%) has a lower grade A default loss rate, so the SMAA-DS model is well placed to retain high-quality clients.

3) The SMAA-DS model has the highest default loss rate at the lowest grade and can reduce the platform's expected default loss to the greatest extent. For borrowers with lower credit ratings, the primary aim of the P2P platform is to reduce default losses, so the lowest-rated borrowers must be identified and charged a higher loan price, compensating the platform's default losses as far as possible. The grade G default loss rate of the SMAA-DS model is 21.69%, whereas the grade G default loss rates for the Lending Club 43471-sample division, the Lending Club 16574-sample division, the model that does not distinguish index types, the discrete index model and the continuous index model are 8.87%, 7.27%, 20.19%, 20.53% and 13.93% respectively. The grade G default loss rate of the SMAA-DS model is 7.43% higher than that of the model that does not distinguish index types. The default loss rate of the lowest grade is therefore highest in the SMAA-DS model, which reduces default losses the most.

4) The minimum interval of default loss rate between adjacent levels of the SMAA-DS model ranks second in descending order, behind only the discrete index model. The minimum interval of default loss rate between adjacent levels of the SMAA-DS model is 2.41%, whereas the minimum intervals for the Lending Club 43471-sample division, the Lending Club 16574-sample division, the model that does not distinguish index types, the discrete index model and the continuous index model are 0.81%, 0.27%, 1.97%, 2.44% and 1.50% respectively. Compared with the model that does not distinguish index types, the minimum interval of the SMAA-DS model is 22.34% larger. The SMAA-DS model thus keeps the credit difference between borrowers of the same level small and that between borrowers of different levels large.

5) The default loss rate difference between grade A and grade G is the largest in the SMAA-DS model, so the credit levels of all borrowers are differentiated to the greatest extent. The default loss rate difference between grade A and grade G of the SMAA-DS model is 21.01%, whereas the corresponding differences for the Lending Club 43471-sample division, the Lending Club 16574-sample division, the model that does not distinguish index types, the discrete index model and the continuous index model are 5.46%, 4.36%, 18.94%, 19.78% and 13.31% respectively. Compared with the grade A to grade G difference of the model that does not distinguish index types, that of the SMAA-DS model is 10.93% larger (21.01% against 18.94%). The gap between the default loss rates of grade A and grade G is therefore largest in the SMAA-DS model.

6) The SMAA-DS model complies with the credit rating division criterion that the higher the credit rating, the lower the default loss rate. As can be seen from Table 11, the divisions produced by the SMAA-DS model, the model that does not distinguish index types, the discrete index model and the continuous index model all comply with this criterion, whereas the Lending Club 43471-sample and 16574-sample divisions do not. The SMAA-DS model therefore gives the better credit rating result.
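The relative comparisons quoted in aspects 1) to 4) follow directly from the tabulated figures. The short sketch below recomputes them against the model that does not distinguish index types; the helper names are hypothetical.

```python
def ratio_percent(value, baseline):
    """value expressed as a percentage of baseline."""
    return 100.0 * value / baseline


def relative_change_percent(value, baseline):
    """Signed change of value relative to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# SMAA-DS versus the model that does not distinguish index types.
f_ratio = ratio_percent(0.0086, 0.0065)            # discrimination f
a_change = relative_change_percent(0.68, 1.25)     # grade A default loss rate (%)
g_change = relative_change_percent(21.69, 20.19)   # grade G default loss rate (%)
gap_change = relative_change_percent(2.41, 1.97)   # minimum adjacent interval (%)

print(f"{f_ratio:.2f} {a_change:.2f} {g_change:.2f} {gap_change:.2f}")
```

The negative sign of `a_change` reflects a reduction, corresponding to the 45.60% decrease in the grade A default loss rate.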

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims

1. An SMAA-DS-based rating system for matching credit rating with default loss rate, characterized by comprising: a user login registration module, a user data management module and a user credit rating module;

2. An SMAA-DS-based credit rating adjustment method for matching credit rating with default loss rate, implemented on the SMAA-DS-based rating system for matching credit rating with default loss rate, characterized by comprising the following steps:

Step 4: establishing a credit rating division optimization model consisting of 1 objective function and 2 constraint conditions, wherein maximizing the sum of squared default loss rate differences between adjacent credit grades is the objective function, the requirement that the higher the credit grade, the lower the default loss rate is constraint condition 1, and the requirement that each default loss rate difference between adjacent grades be a to b times the preceding difference is constraint condition 2, so as to determine the credit grade of the borrower under the two index types;

Step 6: combining the two index types in different proportions, integrating the credit grades of the borrower under the two index types by using the ER combination rule of evidence theory, calculating and comparing the default loss discrimination f, namely the objective function f, under the different combination proportions, taking the proportion with the maximum f as the optimal combination proportion of the discrete index and the continuous index and as the final combination proportion of the two index types in the SMAA-DS model, and inputting this proportion into the ER combination rule of evidence theory to determine the credit grade information of the borrower;

Step 7: the user logs in through the user login module and then fills in and modifies personal information in the user data management module; the data entered by the borrower in the user data management module is passed to the user credit rating module, and the user clicks the user credit rating module to display the credit rating.

3. The SMAA-DS based credit rating matching default loss rate adjustment method as claimed in claim 2, wherein the step 1 comprises:

wherein the two bounds are, respectively, the satisfactory value and the impermissible value of the j-th index, and c and d are constants.

4. The SMAA-DS-based credit rating adjustment method for matching credit rating with default loss rate as claimed in claim 2, wherein in the binary Logistic regression model in step 2, a set of independent and identically distributed observation data (X, Y) is assumed, where Y is the dependent variable, Y＝(y₁, y₂, …, y_n), y_n is the default state of the n-th borrower, and n is the total number of observations, namely borrowers; X is a J-dimensional independent variable, X＝(x_i1, x_i2, …, x_ij, …, x_iJ), x_ij is the value of the j-th index of the i-th borrower, and J is the total number of dimensions (indexes); y_i∈{0,1}, where y_i＝1 indicates that the borrower defaults and y_i＝0 indicates that the borrower does not default; P(y_i＝1|x_i)＝p_i is the probability that y_i＝1 given x_i, x_i denotes the i-th borrower, and p_i is as shown in formula (2), where β is the coefficient vector;

transforming formula (2) yields the Logistic model, as shown in formula (3):

wherein β̂_j is the estimate of the coefficient of the j-th index and SE_βj is the standard error of the coefficient β_j, so the Wald statistic W_j of the j-th index is as shown in formula (4):
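For orientation, the quantities named in this claim can be sketched in Python under the assumption that formulas (2) to (4) take their common textbook forms (logistic probability, logit transform, and squared ratio of estimate to standard error); the formulas themselves are not reproduced in this text. The function names, the intercept parameter and the sample values are hypothetical.

```python
import math


def default_probability(x, beta, intercept=0.0):
    """Logistic form: p = 1 / (1 + exp(-(intercept + x . beta)))."""
    z = intercept + sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit transform ln(p / (1 - p)), which is linear in the indexes."""
    return math.log(p / (1.0 - p))


def wald_statistic(beta_hat, standard_error):
    """Assumed form: (coefficient estimate / standard error) squared."""
    return (beta_hat / standard_error) ** 2


p = default_probability([1.0, 2.0], [0.5, -0.25])  # hypothetical values
w = wald_statistic(0.8, 0.2)                        # hypothetical values
```

A larger Wald statistic indicates that a single index is more significant in discriminating the default state.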

5. The SMAA-DS-based credit rating adjustment method for matching credit rating with default loss rate according to claim 2, wherein in the Lasso-Logistic model in step 3, a set of independent and identically distributed observation data (X, Y) is assumed, where Y is the dependent variable, Y＝(y₁, y₂, …, y_n), and X is a p-dimensional independent variable, X＝(x_i1, x_i2, …, x_ip); there are n groups of data, each comprising p independent variables and 1 dependent variable; when the dependent variable is a binary variable, namely the repayment state of the borrower (default or non-default), a Lasso-Logistic regression model is used for fitting;

the parameter estimation of the Lasso-Logistic regression for the coefficients β of the independent variables X is as shown in formula (5):

wherein λ is the parameter that adjusts the degree of coefficient shrinkage, one term of formula (5) represents the degree of fit of the model, and the other term is the penalty on the variable coefficients in the model; because the observations are independent and identically distributed, their joint distribution can be expressed as the product of the marginal distributions, that is, the likelihood function of the n observations (borrowers) is denoted L(θ);
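A minimal sketch of such a penalized objective is given below, assuming the usual form of negative log-likelihood plus λ times the sum of absolute coefficients. Formula (5) is not reproduced in this text, so the absence of an intercept and of any 1/n scaling are assumptions, and the function name is hypothetical.

```python
import math


def lasso_logistic_objective(beta, X, y, lam):
    """Negative log-likelihood of the logistic model plus lam * sum(|beta_j|)."""
    nll = 0.0
    for xi, yi in zip(X, y):
        z = sum(xj * bj for xj, bj in zip(xi, beta))
        # log(1 + exp(z)) - y*z is the per-observation negative log-likelihood
        nll += math.log1p(math.exp(z)) - yi * z
    penalty = lam * sum(abs(b) for b in beta)
    return nll + penalty


# Hypothetical data: two borrowers, two indexes.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1, 0]
value_at_zero = lasso_logistic_objective([0.0, 0.0], X, y, lam=0.1)
value_fitted = lasso_logistic_objective([1.0, -1.0], X, y, lam=0.1)
```

Minimizing this objective over β shrinks some coefficients exactly to zero as λ grows, which is how indexes that carry redundant information are dropped.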

calculating the credit score of the borrower by linear addition of the index scores and index weights in the feature combination; let w_jk denote the weight of the j-th index in the k-th simulation (j＝1, 2, 3, …, J; k＝1, 2, 3, …, K) and u_ij denote the normalized value of the i-th borrower under index j; the credit score of the i-th borrower in the k-th simulation is as shown in formula (6):

the mean credit score of the i-th borrower over the K simulations is as shown in formula (7):
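The linear addition and the averaging over simulations can be sketched as follows. The function names and sample values are hypothetical, and the sketch assumes formula (6) is the weighted sum of normalized index values and formula (7) its arithmetic mean over the K simulations.

```python
def credit_scores(u_i, weights):
    """Scores S_ik = sum_j w_jk * u_ij for one borrower, one per simulation k."""
    return [sum(w * u for w, u in zip(w_k, u_i)) for w_k in weights]


def mean_credit_score(u_i, weights):
    """Average of the borrower's scores over all K simulations."""
    scores = credit_scores(u_i, weights)
    return sum(scores) / len(scores)


# Hypothetical borrower with J = 3 normalized index values and K = 2 weight draws.
u_i = [0.8, 0.5, 0.2]
weights = [[0.5, 0.3, 0.2],
           [0.2, 0.2, 0.6]]

scores = credit_scores(u_i, weights)          # one score per simulation
mean_score = mean_credit_score(u_i, weights)  # average over the K simulations
```

Each row of `weights` is one simulated weight vector over the J indexes.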

6. The SMAA-DS-based credit rating adjustment method for matching credit rating with default loss rate as claimed in claim 2, wherein the step 4 comprises the following steps:

step 4.1: establishing an objective function f;

dividing the borrowers into L credit levels, with LGD_l being the default loss rate of the l-th credit level, the objective function is as shown in formula (8):

obj: max f＝(LGD_L-LGD_L-1)²+(LGD_L-1-LGD_L-2)²+...+(LGD₂-LGD₁)² (8)

step 4.2: establishing a constraint condition 1 and a constraint condition 2;

s.t.: 0<LGD₁<LGD₂<...<LGD_L≤1 (9)

constraint 2: the default loss rate difference between adjacent levels is constrained so that each difference is a to b times the preceding difference;

let ΔLGD_l,l+1 be the default loss rate difference between the l-th credit level LGD_l and the (l+1)-th credit level LGD_l+1, namely the difference between adjacent levels, as shown in formula (10):

ΔLGD_l,l+1＝LGD_l+1-LGD_l (10)

then, the default loss rate difference between adjacent levels is set as shown in equation (11):

ΔLGD_l,l+1＝[a,b]×ΔLGD_l-1,l (11)

7. The SMAA-DS-based credit rating adjustment method for matching credit rating with default loss rate as claimed in claim 2, wherein the ER combination rule of evidence theory in step 6 is as follows:

Let Θ be the frame of discernment, with 2 pieces of evidence m₁, m₂: 2^Θ→[0,1];

wherein w_o is the relative weight of evidence o; o is the index of the evidence; β_o(H_n) and β_o(H) are the degrees of confidence of the corresponding focal elements; m_o(·) is the BPA value of evidence o; l denotes the l-th level, with L levels in total;

Step D2: combining the evidence by using the ER combination rule of evidence theory and obtaining the BPA values of the combined evidence:

wherein m₁₂(H_l) is the combined BPA value for the borrower belonging to the l-th level; K is the evidence conflict coefficient, and a larger K indicates greater conflict between the evidences; B and C denote the grade to which the borrower belongs under the discrete index type and the continuous index type respectively; the remaining symbols are a process parameter and the belief still to be assigned under the discrete index type, the continuous index type, and their combination; m₁(B) and m₂(C) are the BPA values of B and C before combination;

Step D3: converting the BPA value of each combined evidence into the confidence Bel, namely the probability that the borrower belongs to credit level H_l in the SMAA-DS model: