CN113421154A - Credit risk assessment method and system based on control chart - Google Patents
Credit risk assessment method and system based on control chart
- Publication number
- CN113421154A (application CN202110584049.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- transaction flow
- risk assessment
- credit
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Technology Law (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Development Economics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention provides a credit risk assessment method and system based on control charts. The method comprises: collecting transaction flow data, credit audit data and overdue-days data, and preprocessing the data to obtain conventional features and default features; aggregating the transaction flow data to obtain initial transaction flow indexes; converting the initial transaction flow indexes into warning signals; processing the warning signals into signal features; integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain several types of risk-control evaluation samples; building a machine learning model for each type of risk-control evaluation sample and selecting the optimal risk-control model according to the model results for the different samples; and building an online machine learning model for the credit platform from the optimal risk-control model, performing real-time risk assessment on applicants, and outputting the risk assessment result. The method improves the accuracy of credit risk assessment, can be applied to credit risk assessment in different scenarios, and thereby broadens the range of application of credit risk assessment.
Description
Technical Field
The invention relates to the technical field of credit risk assessment, in particular to a credit risk assessment method and system based on a control chart.
Background
In the credit approval process, many current risk assessment methods focus on achieving intelligent risk rating but do little to mine data sources from different channels.
Chinese patent publication No. CN110415111A discloses a credit approval method that combines user data and expert features with logistic regression. The method comprises inputting data; cleaning, reducing and preprocessing the data; classifying the data; performing feature engineering and extracting features; introducing expert features; predicting from the features; and outputting an approval list. That patent combines the expert features of traditional financial models with classical machine learning, and predicts dynamically changing future default probabilities using market data updated in real time and feature engineering. By adopting a prediction model and an optimized logistic regression algorithm, it aims to satisfy complex credit constraints, obtain more accurate default probability predictions and risk premium results, free auditors from heavy credit risk assessment, audit and pricing work, enable rapid large-scale credit approval for small and micro enterprises, and make intelligent rating and risk avoidance possible. The method analyzes user data and expert features, integrates two common types of data source and enables fast approval, but the accuracy of its default probability prediction still leaves room for improvement.
Chinese patent publication No. CN107093101A discloses a method for mining potential loan users and scoring risk based on POS transaction flow data, comprising the following steps: acquiring POS transaction flow data; mining potential loan users from the acquired POS transaction flow data in terms of business expansion and capital turnover; and determining statistical indexes for POS transaction flow risk scoring, then scoring POS transaction flow risk with a set scoring model based on the determined statistical indexes and the acquired POS transaction flow data. According to that disclosure, combining POS transaction flow data with the perspectives of business expansion and capital turnover allows potential loan users to be mined quickly and accurately, because POS transaction flow data reflects merchants' demand for funds and loans well, and the conversion success rate is higher. It also proposes a new risk scoring method based on POS transaction flow data, which it describes as more effective and widely applicable in the field of data mining. That method assesses risk through statistical indexes of POS transaction flow. Although it does analyze POS transaction flow data, its data mining approach is too simple and lacks extensibility, and the disclosed method and numerical results are too specific to be widely applicable to credit audit processes in different scenarios.
With regard to the above prior art, the inventors consider that the accuracy of default probability prediction is poor and the range of application of the credit audit process is narrow, so the effect of credit risk assessment is poor.
Disclosure of Invention
In view of the shortcomings in the prior art, it is an object of the present invention to provide a credit risk assessment method and system based on control charts.
The credit risk assessment method based on the control chart comprises the following steps:
step 1: collecting transaction flow data, credit audit data and overdue-days data of customers whose loans have been issued, and preprocessing the data to obtain conventional features and default features;
step 2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indexes;
step 3: converting the standardized initial transaction flow indexes into warning signals;
step 4: processing the warning signals into signal features;
step 5: integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain several types of risk-control evaluation samples;
step 6: building a corresponding machine learning model for each type of risk-control evaluation sample, evaluating the model results for the different risk-control evaluation samples, and selecting the optimal risk-control model;
step 7: building an online machine learning model for the credit platform from the optimal risk-control model, performing real-time risk assessment on applicants, and outputting the risk assessment result.
Preferably, the preprocessing in step 1 comprises the following steps:
transaction flow data preprocessing step: rejecting, from the transaction flow, transactions within a predetermined transaction amount range;
credit audit data preprocessing step: rejecting all data records of customers within a predetermined date range according to the life cycle of the customer's credit;
overdue-days data preprocessing step: rejecting all data records of customers within a predetermined overdue-days range according to the customer's degree of delinquency, and forming default features from the number of overdue days.
Preferably, the step 2 comprises the following steps:
step 2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indexes, and grouping the initial transaction flow characteristic indexes according to several types of control chart;
step 2.2: standardizing the initial transaction flow characteristic indexes to obtain the standardized initial transaction flow indexes.
Preferably, the step 3 comprises the following steps:
step 3.1: for the standardized initial transaction flow indexes, calculating the mean, upper limit and lower limit of the control chart corresponding to the indexes in each group;
step 3.2: defining a warning signal for each predetermined time period within the transaction flow monitoring period according to the mean, upper limit and lower limit of the several types of control chart obtained in step 3.1.
Preferably, the step 4 comprises the following steps:
step 4.1: counting, for each control chart, the sum of each warning signal over the transaction flow monitoring period, and thereby converting the warning signals into signal features;
step 4.2: to make the abnormal state of the last predetermined time period easier to interpret, introducing a signal feature that identifies the overall abnormality of the transaction flow in the last predetermined time period.
The invention provides a credit risk assessment system based on a control chart, which comprises the following modules:
module M1: collecting transaction flow data, credit audit data and overdue-days data of customers whose loans have been issued, and preprocessing the data to obtain conventional features and default features;
module M2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indexes;
module M3: converting the standardized initial transaction flow indexes into warning signals;
module M4: processing the warning signals into signal features;
module M5: integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain several types of risk-control evaluation samples;
module M6: building a corresponding machine learning model for each type of risk-control evaluation sample, evaluating the model results for the different risk-control evaluation samples, and selecting the optimal risk-control model;
module M7: building an online machine learning model for the credit platform from the optimal risk-control model, performing real-time risk assessment on applicants, and outputting the risk assessment result.
Preferably, the preprocessing in module M1 includes the following modules:
transaction flow data preprocessing module: rejecting, from the transaction flow, transactions within a predetermined transaction amount range;
credit audit data preprocessing module: rejecting all data records of customers within a predetermined date range according to the life cycle of the customer's credit;
overdue-days data preprocessing module: rejecting all data records of customers within a predetermined overdue-days range according to the customer's degree of delinquency, and forming default features from the number of overdue days.
Preferably, the module M2 includes the following modules:
module M2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indexes, and grouping the initial transaction flow characteristic indexes according to several types of control chart;
module M2.2: standardizing the initial transaction flow characteristic indexes to obtain the standardized initial transaction flow indexes.
Preferably, the module M3 includes the following modules:
module M3.1: for the standardized initial transaction flow indexes, calculating the mean, upper limit and lower limit of the control chart corresponding to the indexes in each group;
module M3.2: defining a warning signal for each predetermined time period within the transaction flow monitoring period according to the mean, upper limit and lower limit of the several types of control chart obtained by module M3.1.
Preferably, the module M4 includes the following modules:
module M4.1: counting, for each control chart, the sum of each warning signal over the transaction flow monitoring period, and thereby converting the warning signals into signal features;
module M4.2: to make the abnormal state of the last predetermined time period easier to interpret, introducing a signal feature that identifies the overall abnormality of the transaction flow in the last predetermined time period.
Compared with the prior art, the invention has the following beneficial effects:
1. The customer's transaction flow data is analyzed using the technical principles of three types of control chart; abnormal transaction flow is captured and analyzed to form warning signals, which are then converted into credit indexes for measuring risk. This fills a gap in the industry in the use of transaction flow for risk assessment;
2. The signal features extracted from the transaction flow data source are added as input indexes of the credit risk-control model. The results show that this improves the accuracy of credit risk assessment and thereby its effect;
3. The method can be applied to credit risk assessment in different scenarios, for example in pre-loan review to support credit approval decisions and in credit management to support customer management. It is suitable both for B-end small and micro enterprises, whose credit is assessed using their operating transaction flow, and for C-end consumers, whose credit is assessed using their personal transaction flow. Its applicability is strong, which helps to widen the range of application of credit risk assessment;
4. Feature construction relies on the customer's own transaction flow information, which the financial institution can collect once it has obtained the customer's authorization. The data source is easy to obtain and the method is easy to implement;
5. The transaction flow data is evaluated dynamically with strong real-time performance, so the financial institution can grasp the customer's actual risk condition and quickly adopt corresponding management measures.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is the I control chart of the I-MR control chart;
FIG. 2 is the MR control chart of the I-MR control chart.
Detailed Description
The present invention is described in detail below with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.
The embodiment of the invention discloses a credit risk assessment method and system based on a control chart, comprising the following steps:
Step 1: transaction flow data of customers whose loans have been issued, together with other available credit audit data, is collected from the credit platform database. The time window of the transaction flow data is the 30 days before the loan application. For example, if a customer applies for a loan on April 1, 2021, the transaction flow data collected by the financial institution is the transaction flow generated by the customer from March 2, 2021 to March 31, 2021. At the same time, the overdue-days data of customers whose loans have been issued is collected, and all of the data is preprocessed.
The preprocessing in step 1 comprises the following steps. Transaction flow data preprocessing step: for each transaction flow, transactions within a predetermined transaction amount range are rejected, the predetermined range being amounts smaller than 0.1 yuan. Credit audit data preprocessing step: according to the life cycle of the customer's credit, all data records of customers within a predetermined date range are rejected, the predetermined range covering customers whose loans are not yet 30 days past the first repayment date. Overdue-days data preprocessing step: according to the customer's degree of delinquency, all data records of customers within a predetermined overdue-days range are rejected, the predetermined range being more than 0 and fewer than 30 overdue days. Default features are formed from the number of overdue days: a customer with 0 overdue days has a default feature value of 0, and a customer with more than 0 overdue days has a default feature value of 1.
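The amount filter and the default labelling described above can be illustrated with a minimal sketch. The record layout (a dict with an `amount` key, and a mapping from customer id to overdue days) and the function names are assumptions made for illustration; only the thresholds of 0.1 yuan and 30 days come from the text.

```python
MIN_AMOUNT_YUAN = 0.1        # transactions below this amount are rejected
GREY_ZONE_UPPER_DAYS = 30    # customers with 0 < overdue days < 30 are rejected


def filter_transactions(transactions):
    """Keep only transactions whose amount is at least MIN_AMOUNT_YUAN.

    `transactions` is a list of dicts with an "amount" key (hypothetical layout).
    """
    return [t for t in transactions if t["amount"] >= MIN_AMOUNT_YUAN]


def default_labels(overdue_days_by_customer):
    """Drop customers with 0 < overdue days < 30 and label the rest 0 or 1."""
    labels = {}
    for customer, days in overdue_days_by_customer.items():
        if 0 < days < GREY_ZONE_UPPER_DAYS:
            continue  # ambiguous delinquency: record removed from the sample
        labels[customer] = 0 if days == 0 else 1
    return labels
```

Under these assumptions, a customer who is 10 days overdue is removed from the sample, while customers with 0 or 45 overdue days receive the labels 0 and 1 respectively.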
Step 2: the preprocessed transaction flow data is aggregated and standardized to obtain the standardized initial transaction flow indexes.
Step 2 comprises the following steps. Step 2.1: the transaction flow data is analyzed in terms of transaction time, transaction amount, transaction type and transaction card type. Transaction types include consumption, pre-authorization and refund; transaction card types include credit cards, debit cards and quasi-credit cards. Using the day as the statistical dimension, the transaction flow data is aggregated into the initial transaction flow characteristic indexes shown in Table 1, such as transaction amount, number of transactions and number of terminals used. The initial transaction flow characteristic indexes are then grouped according to three types of control chart: the I-MR control chart, the $\bar{X}$-R control chart and the $\bar{X}$-s control chart. In the I-MR control chart, for example, the transaction amount is group 1, the number of transactions is group 2 and the number of terminals used is group 3. In the $\bar{X}$-R control chart, the three initial transaction flow characteristic indexes transaction amount, number of transactions and number of terminals used together form group 1. In the $\bar{X}$-s control chart, all of the initial transaction flow characteristic indexes form group 1. The control chart parameter values are listed according to the number of initial transaction flow characteristic indexes in each group of the three types of control chart. Table 1 details the initial transaction flow characteristic indexes and their descriptions, the group number of each index in the three types of control chart, and the corresponding parameter settings.
Table 1: Initial transaction flow characteristic indexes
Step 2.2: each single initial transaction flow characteristic index is standardized to mean 0 and variance 1 to obtain the standardized initial transaction flow index. For example, when the transaction flow monitoring window has length T, the i-th single initial transaction flow characteristic index $X_i$ comprises the daily values of that index and is written $X_i = \{x_{1i}, x_{2i}, \ldots, x_{ti}, \ldots, x_{Ti}\}$; that is, the value of the i-th index on day t is $x_{ti}$ and its value on the last day is $x_{Ti}$. The standardized initial transaction flow index is written $X_i' = \{x'_{1i}, x'_{2i}, \ldots, x'_{ti}, \ldots, x'_{Ti}\}$, and the standardized value of the i-th index on day t is calculated as

$$x'_{ti} = \frac{x_{ti} - \mu_i}{\sigma_i},$$

where $\mu_i$ denotes the mean of the i-th single initial transaction flow characteristic index $X_i$ over the monitoring time T, and $\sigma_i^2$ denotes the variance of $X_i$ over the monitoring time T. Here T = 30; from the business perspective, T also represents the current time, i.e., the point in time at which the customer submits the loan application.
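As a minimal sketch of this standardization, the function below rescales one index's daily values to mean 0 and variance 1. The use of the population variance (dividing by T) and the handling of a constant series are assumptions; the text does not specify either.

```python
import math


def standardize(series):
    """Scale one index's daily values to mean 0 and variance 1."""
    T = len(series)
    mean = sum(series) / T
    # Assumption: population variance (divide by T); the source does not say.
    variance = sum((x - mean) ** 2 for x in series) / T
    if variance == 0:
        # Assumption: a constant series carries no signal, so return zeros.
        return [0.0] * T
    std = math.sqrt(variance)
    return [(x - mean) / std for x in series]
```

In the setting of the text, `series` would hold the T = 30 daily values of one index for one customer.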
Step 3: the standardized initial transaction flow indexes are converted into warning signals using the principles of the three types of control chart.
Step 3 comprises the following steps. Step 3.1: according to the grouping of the initial transaction flow characteristic indexes under the three types of control chart in Table 1, the mean, upper limit and lower limit of each type of control chart are calculated for each group of standardized initial transaction flow indexes. Each type of control chart produces two control charts: the I-MR control chart comprises the I control chart and the MR control chart, the $\bar{X}$-R control chart comprises the $\bar{X}$ control chart and the R control chart, and the $\bar{X}$-s control chart comprises the $\bar{X}$ control chart and the s control chart. $CL_x$, $UCL_x$ and $LCL_x$ denote the mean, upper limit and lower limit of the first control chart of each pair, $CL_s$, $UCL_s$ and $LCL_s$ denote the mean, upper limit and lower limit of the second control chart of each pair, and T is the length of the transaction flow monitoring window (T = 30). The calculation of the mean, upper limit and lower limit for each type of control chart is described in detail below.
As shown in FIG. 1 and FIG. 2, for the I-MR control chart each group contains one standardized initial transaction flow index, as listed in Table 1. As defined in step 2.2, $x'_{ti}$ denotes the value on day t of the i-th standardized initial transaction flow index in the group. Because each group of the I-MR control chart contains only one standardized index, i is always equal to 1; for simplicity, $x'_t$ denotes the value on day t of the standardized index in each group (the subscript i is omitted). The mean $\bar{x}$, the moving range $MR_t$ between day t and the previous day, and the mean moving range $\overline{MR}$ are calculated as

$$\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x'_t, \qquad MR_t = |x'_t - x'_{t-1}|,\ t = 2, \ldots, T, \qquad \overline{MR} = \frac{1}{T-1}\sum_{t=2}^{T} MR_t.$$

From the mean $\bar{x}$, the mean moving range $\overline{MR}$ and the I-MR control chart parameters $d_2$, $D_3$ and $D_4$ shown in Table 1, the mean, upper limit and lower limit of the I-MR control chart are calculated:

$$CL_x = \bar{x}, \qquad UCL_x = \bar{x} + 3\,\frac{\overline{MR}}{d_2}, \qquad LCL_x = \bar{x} - 3\,\frac{\overline{MR}}{d_2},$$

$$CL_s = \overline{MR}, \qquad UCL_s = D_4\,\overline{MR}, \qquad LCL_s = D_3\,\overline{MR}.$$

With the transaction flow monitoring time t as the abscissa (t = 1, 2, …, T), the I control chart plots $x'_t$ together with the mean $CL_x$, upper limit $UCL_x$ and lower limit $LCL_x$, and the MR control chart plots $MR_t$ together with the mean $CL_s$, upper limit $UCL_s$ and lower limit $LCL_s$. FIG. 1 takes the transaction amount $X_1$ in Table 1 as an example: it depicts the standardized transaction amount over the 30 days before a certain customer applied for a loan, together with the mean, upper limit and lower limit calculated for the I control chart.
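The I-MR limits can be sketched as follows. Table 1 is not reproduced in the text, so the constants below are an assumption: they are the standard Shewhart values for a moving range over two consecutive points. The function name and the dict layout are likewise hypothetical.

```python
# Assumption: standard constants for a moving range of span 2
# (the patent's Table 1 values are not available in the text).
D2, D3, D4 = 1.128, 0.0, 3.267


def imr_limits(x):
    """Return (I chart limits, MR chart limits, moving ranges) for one series.

    `x` is the list of standardized daily values of a single index.
    """
    T = len(x)
    x_bar = sum(x) / T
    # Moving range: absolute difference between each day and the previous day.
    mr = [abs(x[t] - x[t - 1]) for t in range(1, T)]
    mr_bar = sum(mr) / len(mr)
    i_chart = {
        "CL": x_bar,
        "UCL": x_bar + 3 * mr_bar / D2,
        "LCL": x_bar - 3 * mr_bar / D2,
    }
    mr_chart = {"CL": mr_bar, "UCL": D4 * mr_bar, "LCL": D3 * mr_bar}
    return i_chart, mr_chart, mr
```

Because the input is standardized, the center line of the I chart is zero when the limits are computed over the same window that was used for standardization.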
As shown in FIG. 3 and FIG. 4, for the $\bar{X}$-R control chart each group contains 2 to 4 standardized initial transaction flow indexes, as listed in Table 1. As defined in step 2.2, $x'_{ti}$ denotes the value on day t of the i-th standardized initial transaction flow index in the group. First, the standardized initial transaction flow indexes in each group are summarized for each monitoring day t to obtain the within-group mean $\bar{x}_t$ and the within-group range $R_t$; the means of $\bar{x}_t$ and $R_t$ over the monitoring period, denoted $\bar{\bar{x}}$ and $\bar{R}$ respectively, are then calculated. The calculation logic is as follows:

$$\bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x'_{ti}, \quad t = 1, 2, \ldots, T;$$

$$R_t = \max\{x'_{t1}, x'_{t2}, \ldots, x'_{tN}\} - \min\{x'_{t1}, x'_{t2}, \ldots, x'_{tN}\}, \quad t = 1, 2, \ldots, T;$$

$$\bar{\bar{x}} = \frac{1}{T}\sum_{t=1}^{T} \bar{x}_t, \qquad \bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t;$$

where N is the number of standardized initial transaction flow indexes in the group and takes the value 2, 3 or 4; the detailed grouping and values are given in Table 1. From $\bar{\bar{x}}$, $\bar{R}$ and the $\bar{X}$-R control chart parameters $A_2$, $D_3$ and $D_4$ shown in Table 1, the mean, upper limit and lower limit of the $\bar{X}$-R control chart are calculated:

$$CL_x = \bar{\bar{x}}, \qquad UCL_x = \bar{\bar{x}} + A_2\,\bar{R}, \qquad LCL_x = \bar{\bar{x}} - A_2\,\bar{R},$$

$$CL_s = \bar{R}, \qquad UCL_s = D_4\,\bar{R}, \qquad LCL_s = D_3\,\bar{R}.$$

With the transaction flow monitoring time t as the abscissa (t = 1, 2, …, T), the $\bar{X}$ control chart plots $\bar{x}_t$ together with its mean, upper limit and lower limit, and the R control chart plots $R_t$ together with its mean, upper limit and lower limit.
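A minimal sketch of the $\bar{X}$-R calculation is given below. The constants are the standard Shewhart values of A2, D3 and D4 for subgroup sizes 2 to 4, used here as an assumption because the Table 1 settings are not reproduced in the text; the function name and data layout are hypothetical.

```python
# Assumption: standard Shewhart constants (A2, D3, D4) by subgroup size N.
XBAR_R_CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
}


def xbar_r_limits(days):
    """Return (X-bar chart limits, R chart limits) for one group.

    `days[t]` is the list of the N standardized index values on day t.
    """
    n = len(days[0])
    a2, d3, d4 = XBAR_R_CONSTANTS[n]
    means = [sum(day) / n for day in days]          # within-group mean per day
    ranges = [max(day) - min(day) for day in days]  # within-group range per day
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    xbar_chart = {
        "CL": grand_mean,
        "UCL": grand_mean + a2 * r_bar,
        "LCL": grand_mean - a2 * r_bar,
    }
    r_chart = {"CL": r_bar, "UCL": d4 * r_bar, "LCL": d3 * r_bar}
    return xbar_chart, r_chart
```

Looking the constants up by group size mirrors the statement that the parameter values depend on the number of indexes in each group.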
As shown in FIG. 5 and FIG. 6, for the $\bar{X}$-s control chart there is only one group, which contains the 23 standardized initial transaction flow indexes listed in Table 1. As defined in step 2.2, $x'_{ti}$ denotes the value on day t of the i-th standardized initial transaction flow index in the group. The within-group mean $\bar{x}_t$ and its mean $\bar{\bar{x}}$ are calculated in the same way as for the $\bar{X}$-R control chart. From $x'_{ti}$ and the mean $\bar{x}_t$, the within-group standard deviation $s_t$ on monitoring day t is calculated, and then its mean over the monitoring period, denoted $\bar{s}$. The calculation logic is as follows:

$$s_t = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x'_{ti} - \bar{x}_t\right)^2}, \quad t = 1, 2, \ldots, T; \qquad \bar{s} = \frac{1}{T}\sum_{t=1}^{T} s_t;$$

where N is the number of standardized initial transaction flow indexes in the group, which is 23. From $\bar{\bar{x}}$, $\bar{s}$ and the $\bar{X}$-s control chart parameters $A_3$, $B_3$ and $B_4$ shown in Table 1, the mean, upper limit and lower limit of the $\bar{X}$-s control chart are calculated:

$$CL_x = \bar{\bar{x}}, \qquad UCL_x = \bar{\bar{x}} + A_3\,\bar{s}, \qquad LCL_x = \bar{\bar{x}} - A_3\,\bar{s},$$

$$CL_s = \bar{s}, \qquad UCL_s = B_4\,\bar{s}, \qquad LCL_s = B_3\,\bar{s}.$$

With the transaction flow monitoring time t as the abscissa (t = 1, 2, …, T), the $\bar{X}$ control chart plots $\bar{x}_t$ together with its mean, upper limit and lower limit, and the s control chart plots $s_t$ together with its mean, upper limit and lower limit.
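The $\bar{X}$-s calculation can be sketched in the same way. Rather than hard-coding the Table 1 parameters, which are not reproduced in the text, this sketch derives A3, B3 and B4 from the standard bias-correction constant c4; that derivation, the use of the sample standard deviation, and all names are assumptions for illustration.

```python
import math


def s_chart_constants(n):
    """Standard Shewhart constants (A3, B3, B4) for subgroup size n."""
    # c4 = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)
    c4 = math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    )
    spread = 3.0 * math.sqrt(1.0 - c4 * c4) / c4
    a3 = 3.0 / (c4 * math.sqrt(n))
    return a3, max(0.0, 1.0 - spread), 1.0 + spread


def xbar_s_limits(days):
    """Return (X-bar chart limits, s chart limits) for one group.

    `days[t]` is the list of the N standardized index values on day t.
    """
    n = len(days[0])
    a3, b3, b4 = s_chart_constants(n)
    means = [sum(day) / n for day in days]
    # Sample standard deviation within each day (divide by n - 1).
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in day) / (n - 1))
        for day, m in zip(days, means)
    ]
    grand_mean = sum(means) / len(means)
    s_bar = sum(stds) / len(stds)
    xbar_chart = {
        "CL": grand_mean,
        "UCL": grand_mean + a3 * s_bar,
        "LCL": grand_mean - a3 * s_bar,
    }
    s_chart = {"CL": s_bar, "UCL": b4 * s_bar, "LCL": b3 * s_bar}
    return xbar_chart, s_chart
```

For the group of 23 indexes described in the text, `s_chart_constants(23)` gives values close to the commonly tabulated A3 ≈ 0.633, B3 ≈ 0.545 and B4 ≈ 1.455.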
Step 3.2: based on the mean, upper limit and lower limit calculated for each control chart in step 3.1, warning signals are defined for each predetermined time period within the transaction flow monitoring period. The predetermined time period is one monitoring day. Each warning signal is binary: the value 1 indicates that the transaction flow on the monitoring day has changed significantly, and the value 0 indicates that the transaction flow on the monitoring day shows no abnormality. In each control chart, every monitoring day has three warning signals, which record whether each of the following three situations occurs on that day: (1) the value on the monitoring day falls outside the upper or lower limit of the control chart; (2) over the most recent 8 days, the values of 8 consecutive monitoring days all lie on the same side of the mean; (3) over the most recent 6 days, the values of 6 consecutive monitoring days rise continuously or fall continuously. If a situation occurs, the corresponding signal takes the value 1.
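The three rules can be sketched as a function that returns one triple of 0/1 signals per monitoring day. How the rules treat the first days of the window, where fewer than 8 or 6 values are available, is not stated in the text; this sketch assumes that the run and trend signals stay 0 until enough days exist, and that "continuously rise or fall" means strictly monotonic across the 6 values.

```python
def warning_signals(values, cl, ucl, lcl):
    """Return per-day tuples (out_of_limits, run_of_8, trend_of_6) of 0/1 flags.

    `values` are the plotted points of one control chart; `cl`, `ucl`, `lcl`
    are that chart's mean, upper limit and lower limit.
    """
    signals = []
    for t in range(len(values)):
        # Rule 1: the point lies outside the control limits.
        out = int(values[t] > ucl or values[t] < lcl)

        # Rule 2: the last 8 points all lie on the same side of the mean.
        last8 = values[max(0, t - 7): t + 1]
        run = int(
            len(last8) == 8
            and (all(v > cl for v in last8) or all(v < cl for v in last8))
        )

        # Rule 3: the last 6 points rise strictly or fall strictly.
        last6 = values[max(0, t - 5): t + 1]
        diffs = [b - a for a, b in zip(last6, last6[1:])]
        trend = int(
            len(last6) == 6
            and (all(d > 0 for d in diffs) or all(d < 0 for d in diffs))
        )
        signals.append((out, run, trend))
    return signals
```

The same function would be applied to both charts of a pair, for example to the I chart values with $CL_x$, $UCL_x$, $LCL_x$ and to the MR chart values with $CL_s$, $UCL_s$, $LCL_s$.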
Step 4: the warning signals are processed into signal features.
Step 4 comprises the following steps. Step 4.1: for each type of control chart, the number of monitoring days on which each of the three abnormal situations occurs during the transaction flow monitoring period is counted; that is, the sum of each warning signal in each control chart over the monitoring period is calculated, and the warning signals are thereby converted into signal features.
Step 4.2: to make the abnormal state of the last monitoring day T easier to interpret, a signal feature is introduced to identify the overall abnormality of the transaction flow on the last monitoring day. Its value is the union of the six warning signals generated on that monitoring day: if any control chart shows any of the three situations of step 3.2 on that day, the value is 1, otherwise it is 0. Together with the six signal features of step 4.1, each group of each type of control chart therefore produces seven signal features in total. Taking the I-MR control chart as an example, its signal features are: (1) the number of days in the I control chart on which the upper or lower limit is exceeded; (2) the number of days in the I control chart on which 8 consecutive monitoring days lie on the same side of the mean; (3) the number of days in the I control chart on which 6 consecutive monitoring days rise or fall continuously; (4) the number of days in the MR control chart on which the upper or lower limit is exceeded; (5) the number of days in the MR control chart on which 8 consecutive monitoring days lie on the same side of the mean; (6) the number of days in the MR control chart on which 6 consecutive monitoring days rise or fall continuously; (7) whether the transaction flow on the last monitoring day is abnormal.
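The conversion from daily warning signals to the seven signal features can be sketched as follows. The input layout (one list of per-day triples for each of the two charts in a pair) and the function name are assumptions chosen for illustration.

```python
def signal_features(first_chart_signals, second_chart_signals):
    """Build the seven signal features for one group of one control chart type.

    Each argument is a list of per-day tuples (out_of_limits, run_of_8,
    trend_of_6) with 0/1 entries, one list per chart of the pair.
    """
    features = []
    # Features 1-6: per chart, the number of days on which each rule fired.
    for signals in (first_chart_signals, second_chart_signals):
        for rule in range(3):
            features.append(sum(day[rule] for day in signals))
    # Feature 7: union (logical OR) of the six signals on the last day.
    last_day_abnormal = int(
        any(first_chart_signals[-1]) or any(second_chart_signals[-1])
    )
    features.append(last_day_abnormal)
    return features
```

With 23 groups in the I-MR control chart, calling this once per group would give the 23 × 7 signal features mentioned in step 5.1.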
Step 5: the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data are integrated to obtain three types of risk-control evaluation samples.
Step 5 comprises the following steps. Step 5.1: the 23 groups of signal features (23 × 7 in total) generated by the I-MR control chart are integrated with the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain the I-MR risk-control evaluation sample. Step 5.2: the 8 groups of signal features (8 × 7 in total) generated by the $\bar{X}$-R control chart are integrated with the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain the $\bar{X}$-R risk-control evaluation sample. Step 5.3: the 1 group of signal features (1 × 7 in total) generated by the $\bar{X}$-s control chart is integrated with the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain the $\bar{X}$-s risk-control evaluation sample.
Step 6: for each type of wind control evaluation sample, respectively performing sample preprocessing and independent variable selection, establishing a machine learning model, evaluating the machine learning model result according to different wind control evaluation samples, and selecting the optimal wind control model, wherein the machine learning model is a logistic regression model.
The step 6 comprises the following steps: step 6.1: processing the missing values of the samples, checking the condition of the missing values in each feature, filling the missing values according to business meanings for the features which can be filled according to the business meanings (if the transaction running transaction amount of a certain day is empty, the filling can be 0), deleting the features for the features which cannot be filled according to the business meanings and have high missing proportion, judging the data type of the features with small missing proportion, if the features are type variables, grouping the missing values, and if the features are numerical type variables, taking the mean value of the features for filling.
Step 6.2: handle categorical variables by converting each categorical variable in the sample into 0-1 values using dummy variables.
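A minimal sketch of the dummy variable conversion in step 6.2 follows. Dropping the first category as a reference level is an assumption added here to avoid perfectly collinear columns in a regression model; the patent does not say whether a level is dropped. The function name `to_dummies` is hypothetical.

```python
def to_dummies(values, drop_first=True):
    """Convert one categorical column into 0-1 dummy columns.

    Returns a dict mapping each kept category to its 0-1 indicator column.
    With drop_first=True the first category (in sorted order) is the
    reference level and gets no column of its own.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return {c: [1 if v == c else 0 for v in values] for c in kept}
```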
Step 6.3: group the independent variables based on principal component analysis, as follows.
Perform principal component analysis on all independent variables and select the two most significant components, the first and second principal components. Divide the variables into two groups, A and B, according to their correlation coefficients with these two components: an independent variable whose correlation coefficient with the first principal component is greater than its correlation coefficient with the second principal component is placed in group A; otherwise it is placed in group B. Apply principal component analysis to each group again to split it into two groups, and repeat until one of the following conditions is met:
1) the group contains only one independent variable;
2) the coefficient-of-determination ratio (R-squared ratio) of an independent variable x falls by more than half compared with the previous iteration. The formula for the R-squared ratio of x is not preserved in this text.
When the current iteration finishes, all independent variables are divided into n groups. The formula uses two quantities: the coefficient of determination obtained by fitting x to all independent variables of the i-th group by linear regression, and the coefficient of determination obtained by fitting x to the independent variables of the m-th group other than x by linear regression.
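The formula for the ratio is missing above. One form that is consistent with the two coefficients of determination just described is the 1 − R² ratio reported by variable clustering procedures such as SAS PROC VARCLUS. Treating it as the patent's formula is an assumption, and the symbols $R_m^2(x)$ (fit of x on the other variables of its own group m) and $R_i^2(x)$ (fit of x on the variables of another group i) are introduced here only for illustration:

$$R^2\text{-}R(x) = \frac{1 - R_m^2(x)}{1 - \max_{i \ne m} R_i^2(x)}$$

Under this reading, a small ratio means that x is well explained by its own group and poorly explained by every other group.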
Step 6.4: select the independent variables based on information value, as follows.
Calculate the information value of each independent variable and delete the independent variables whose information value is greater than 0.5. Then screen the independent variables within each class obtained in step 6.3, keeping at least one independent variable in each class. The number of independent variables retained in a class is determined by the ratio of the information value of the class to the total information value of all classes: if class i contains n_i independent variables, the information value of the class is M_i, and the information value of all independent variables is M, then the n_i × M_i / M independent variables with the largest information values in the class are selected, rounding up. The information value is calculated as follows.
For a single independent variable, divide the samples into K groups according to the value of the variable and calculate the weight of evidence of group i from %default_i and %paid_i (the formula is not preserved in this text), where %default_i is the share of all default samples that fall in group i and %paid_i is the share of all normal repayment samples that fall in group i. The information value IV of the independent variable is then calculated from these quantities (the formula is likewise not preserved in this text).
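Because the two formulas are missing, the sketch below falls back on the definitions commonly used in credit scorecards: WOE_i = ln(%default_i / %paid_i) and IV = Σ (%default_i − %paid_i) × WOE_i. Whether the patent uses this sign convention for the weight of evidence is an assumption; the IV value is the same under either convention. The function names are hypothetical, and groups with zero counts are rejected rather than smoothed.

```python
import math


def information_value(default_counts, paid_counts):
    """Information value of one variable from per-group sample counts.

    default_counts[i] and paid_counts[i] are the numbers of default and
    normal repayment samples in group i (K groups in total).
    """
    total_default = sum(default_counts)
    total_paid = sum(paid_counts)
    iv = 0.0
    for d, p in zip(default_counts, paid_counts):
        if d == 0 or p == 0:
            raise ValueError("each group needs at least one sample of each kind")
        pct_default = d / total_default
        pct_paid = p / total_paid
        woe = math.log(pct_default / pct_paid)  # assumed sign convention
        iv += (pct_default - pct_paid) * woe
    return iv


def variables_to_keep(n_i, m_i, m_total):
    """Number of variables kept in class i: n_i * M_i / M, rounded up, at least 1."""
    return max(1, math.ceil(n_i * m_i / m_total))
```

For example, a class with 5 variables that carries 30% of the total information value would keep 2 variables under this rule.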
Step 6.5: split the samples into training samples and test samples by random sampling at a ratio of 6:4. At the same time, resample the training samples so that the ratio of non-default samples to default samples in the training samples is kept at 1:1.
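A minimal sketch of step 6.5 follows. It assumes that "repeated sampling" means oversampling the smaller class with replacement until both classes are the same size; the patent does not say which class is resampled. The function name and the fixed seed are illustrative.

```python
import random


def split_and_balance(labels, train_fraction=0.6, seed=0):
    """Random 6:4 split, then oversample the training minority class to 1:1.

    labels is a list of 0/1 default labels. Returns (train_indices,
    test_indices); train_indices may repeat indices of the minority class.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    cut = round(len(indices) * train_fraction)
    train, test = indices[:cut], indices[cut:]

    defaults = [i for i in train if labels[i] == 1]
    non_defaults = [i for i in train if labels[i] == 0]
    if not defaults or not non_defaults:
        raise ValueError("training split must contain both classes")

    minority, majority = sorted((defaults, non_defaults), key=len)
    # Draw with replacement from the minority class to close the gap.
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra, test
```

Only the training samples are rebalanced; the test samples keep their original class ratio so that evaluation reflects the real default rate.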
Step 6.6: for each type of sample after preprocessing and independent variable selection, build a machine learning model on the training samples. The model expresses the default probability p(X) = Pr(Y = 1 | X) of the dependent variable Y, i.e. the default characteristic, as a function of the independent variables X = (X_1, ..., X_n). In the standard logistic regression form this is

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}$$

where X_i is the i-th independent variable and β_i (i = 0 to n) are the regression coefficients. The regression coefficients are solved by maximum likelihood estimation, i.e. by maximizing

$$\ell(\beta_0, \ldots, \beta_n) = \prod_{j} p(x_j)^{y_j} \, \bigl(1 - p(x_j)\bigr)^{1 - y_j}$$

where x_j and y_j are the independent variables and the dependent variable of the j-th training sample.
Substitute the solved regression coefficients into the expression for the default probability p(X), test the significance of the equation and of each independent variable, and take the final combination of variables and their regression coefficients as the final machine learning model.
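The maximum likelihood fit in step 6.6 can be illustrated with a small pure-Python sketch. Plain gradient ascent on the log-likelihood is used here only because it is short; the patent does not name an optimizer, and the significance tests it mentions are not covered. The function names, learning rate, and epoch count are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(beta, x):
    """Default probability p(X) for one sample; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(X, y, learning_rate=0.1, epochs=5000):
    """Estimate regression coefficients by gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(beta)
        for x, target in zip(X, y):
            error = target - predict_proba(beta, x)  # y - p(x)
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        beta = [b + learning_rate * g / len(X) for b, g in zip(beta, gradient)]
    return beta
```

In practice a statistics package that also reports coefficient significance would be the more natural choice, since step 6.6 relies on those tests to settle the final variable combination.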
Step 6.7: compute the predictions of the final machine learning model on the test samples and compare them with the actual default characteristics to form a confusion matrix. Based on the confusion matrix, select an evaluation index according to the business objective, evaluate the three machine learning models built on the three types of risk control evaluation samples, and select the optimal risk control model.
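The comparison in step 6.7 can be sketched as follows. Recall and precision are shown as two possible evaluation indexes; which index fits the business objective is left open by the patent. The function names and the model labels used in the usage note are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Counts of true/false positives and negatives, with default coded as 1."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


def recall(cm):
    """Share of actual defaults that the model flags."""
    denominator = cm["tp"] + cm["fn"]
    return cm["tp"] / denominator if denominator else 0.0


def precision(cm):
    """Share of flagged customers that actually default."""
    denominator = cm["tp"] + cm["fp"]
    return cm["tp"] / denominator if denominator else 0.0


def select_best_model(actual, predictions_by_model, metric):
    """Score every model with the chosen metric and return the best one."""
    scores = {
        name: metric(confusion_matrix(actual, predicted))
        for name, predicted in predictions_by_model.items()
    }
    return max(scores, key=scores.get), scores
```

A call such as `select_best_model(actual, {"imr": preds_a, "chart_b": preds_b, "chart_c": preds_c}, recall)` would pick the model that misses the fewest defaulting customers.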
Step 7: build the credit platform's online machine learning model (for example, an online logistic regression model) from the optimal risk control model, perform real-time risk assessment on applicant customers, and output the risk assessment result. Periodically repeat steps 1 to 6, bringing in new customers for training, to update the credit platform's online machine learning model.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A method for credit risk assessment based on control charts, comprising the steps of:
step 1: collecting transaction flow data, credit auditing data and overdue days data of a loan-issued customer, and preprocessing various data to obtain conventional characteristics and default characteristics;
step 2: aggregating and standardizing the preprocessed transaction flow data to obtain a standardized initial transaction flow index;
step 3: converting the standardized initial transaction flow index into warning signals;
step 4: processing the warning signals into signal characteristics;
step 5: integrating the signal characteristics, the conventional characteristics extracted from the credit audit data and the default characteristics extracted from the overdue days data to obtain multiple types of risk control evaluation samples;
step 6: establishing a corresponding machine learning model for each type of risk control evaluation sample, evaluating the machine learning model results across the different risk control evaluation samples, and selecting the optimal risk control model;
step 7: establishing an online machine learning model of the credit platform according to the optimal risk control model, performing real-time risk assessment on applicant customers, and outputting a risk assessment result.
2. The control chart-based credit risk assessment method according to claim 1, wherein said preprocessing in step 1 comprises the steps of:
transaction flow data preprocessing step: rejecting transactions within a predetermined transaction amount range from the transaction flow;
credit audit data preprocessing step: rejecting all data records of customers within a predetermined date range according to the life cycle of the customer credit;
overdue days data preprocessing step: removing all data records of customers within a predetermined overdue days range according to the degree to which the customers are overdue, and forming default characteristics according to the overdue days.
3. The control chart-based credit risk assessment method according to claim 1, wherein said step 2 comprises the steps of:
step 2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indexes, and grouping the initial transaction flow characteristic indexes according to multiple types of control charts;
step 2.2: standardizing the initial transaction flow characteristic indexes to obtain the standardized initial transaction flow indexes.
4. The control chart-based credit risk assessment method according to claim 3, wherein said step 3 comprises the steps of:
step 3.1: for the standardized initial transaction flow indexes, calculating the mean value, the upper limit and the lower limit of the control chart corresponding to the indexes in each group;
step 3.2: formulating warning signals for a predetermined time period within the transaction flow monitoring period according to the mean values, upper limits and lower limits of the multiple types of control charts obtained in step 3.1.
5. The control chart-based credit risk assessment method according to claim 4, wherein said step 4 comprises the steps of:
step 4.1: respectively counting the sum of each warning signal in each control chart during transaction flow monitoring, and converting the warning signals into signal characteristics;
step 4.2: to facilitate interpretation of the abnormal state in the last predetermined time period, introducing a signal characteristic that identifies the overall abnormal condition of the transaction flow in the last predetermined time period.
6. A control chart-based credit risk assessment system, to which the control chart-based credit risk assessment method according to any one of claims 1 to 5 is applied, comprising the following modules:
module M1: collecting transaction flow data, credit auditing data and overdue days data of a loan-issued customer, and preprocessing various data to obtain conventional characteristics and default characteristics;
module M2: aggregating and standardizing the preprocessed transaction flow data to obtain a standardized initial transaction flow index;
module M3: converting the standardized initial transaction flow index into warning signals;
module M4: processing the warning signals into signal characteristics;
module M5: integrating the signal characteristics, the conventional characteristics extracted from the credit audit data and the default characteristics extracted from the overdue days data to obtain multiple types of risk control evaluation samples;
module M6: establishing a corresponding machine learning model for each type of risk control evaluation sample, evaluating the machine learning model results across the different risk control evaluation samples, and selecting the optimal risk control model;
module M7: establishing an online machine learning model of the credit platform according to the optimal risk control model, performing real-time risk assessment on applicant customers, and outputting a risk assessment result.
7. The control chart-based credit risk assessment system according to claim 6, wherein said preprocessing in module M1 includes the following modules:
transaction flow data preprocessing module: rejecting transactions within a predetermined transaction amount range from the transaction flow;
credit audit data preprocessing module: rejecting all data records of customers within a predetermined date range according to the life cycle of the customer credit;
overdue days data preprocessing module: removing all data records of customers within a predetermined overdue days range according to the degree to which the customers are overdue, and forming default characteristics according to the overdue days.
8. The control chart-based credit risk assessment system according to claim 6, wherein said module M2 comprises the following modules:
module M2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indexes, and grouping the initial transaction flow characteristic indexes according to multiple types of control charts;
module M2.2: standardizing the initial transaction flow characteristic indexes to obtain the standardized initial transaction flow indexes.
9. The control chart-based credit risk assessment system according to claim 8, wherein said module M3 comprises the following modules:
module M3.1: for the standardized initial transaction flow indexes, calculating the mean value, the upper limit and the lower limit of the control chart corresponding to the indexes in each group;
module M3.2: formulating warning signals for a predetermined time period within the transaction flow monitoring period according to the mean values, upper limits and lower limits of the multiple types of control charts obtained by module M3.1.
10. The control chart-based credit risk assessment system according to claim 9, wherein said module M4 comprises the following modules:
module M4.1: respectively counting the sum of each warning signal in each control chart during transaction flow monitoring, and converting the warning signals into signal characteristics;
module M4.2: to facilitate interpretation of the abnormal state in the last predetermined time period, introducing a signal characteristic that identifies the overall abnormal condition of the transaction flow in the last predetermined time period.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110584049.6A CN113421154B (en) | 2021-05-27 | 2021-05-27 | Credit risk assessment method and system based on control chart |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110584049.6A CN113421154B (en) | 2021-05-27 | 2021-05-27 | Credit risk assessment method and system based on control chart |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113421154A (en) | 2021-09-21 |
CN113421154B (en) | 2022-10-04 |
Family
ID=77713100
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110584049.6A Active CN113421154B (en) | 2021-05-27 | 2021-05-27 | Credit risk assessment method and system based on control chart |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113421154B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113793060A (en) * | 2021-09-27 | 2021-12-14 | 武汉众邦银行股份有限公司 | Customer rating method and device based on customer transaction data and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101421756A (en) * | 2006-02-10 | 2009-04-29 | 芝加哥气候交易公司 | Present valuation of emission credit and allowance futures |
CN104346749A (en) * | 2013-08-07 | 2015-02-11 | 辅富投资(上海)有限公司 | Pledge-based network borrowing process monitoring method |
US20200005310A1 (en) * | 2018-06-29 | 2020-01-02 | Paypal, Inc. | Machine learning engine for fraud detection during cross-location online transaction processing |
CN110738564A (en) * | 2019-10-16 | 2020-01-31 | 信雅达系统工程股份有限公司 | Post-loan risk assessment method and device and storage medium |
CN111507831A (en) * | 2020-05-29 | 2020-08-07 | 长安汽车金融有限公司 | Credit risk automatic assessment method and device |
Non-Patent Citations (1)
Title |
---|
栗秋佳: "基于质量管理工具的交行包头分行信贷风险管理研究", 《中国优秀博硕士学位论文全文数据库(硕士)经济与管理科学辑》 * |
Also Published As
Publication number | Publication date |
---|---|
CN113421154B (en) | 2022-10-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11599939B2 (en) | System, method and computer program for underwriting and processing of loans using machine learning | |
KR102009309B1 (en) | Management automation system for financial products and management automation method using the same | |
TWI257556B (en) | Rapid valuation of portfolios of assets such as financial instruments | |
EP1361526A1 (en) | Electronic data processing system and method of using an electronic processing system for automatically determining a risk indicator value | |
CN110895758B (en) | Screening method, device and system for credit card account with cheating transaction | |
CN112598500A (en) | Credit processing method and system for non-limit client | |
WO2007106786A2 (en) | Methods and systems for multi-credit reporting agency data modeling | |
KR20010102452A (en) | Methods and systems for finding value and reducing risk | |
CN114048436A (en) | Construction method and construction device for forecasting enterprise financial data model | |
CN107392217B (en) | Computer-implemented information processing method and device | |
CN112700324A (en) | User loan default prediction method based on combination of Catboost and restricted Boltzmann machine | |
US20140278774A1 (en) | In the market model systems and methods | |
US20150269668A1 (en) | Voting mechanism and multi-model feature selection to aid for loan risk prediction | |
CN112508689A (en) | Method for realizing decision evaluation based on multiple dimensions | |
CN107133862A (en) | Dynamic produces the method and system of the detailed transaction payment experience of enhancing credit evaluation | |
CN113421154B (en) | Credit risk assessment method and system based on control chart | |
CN117114812A (en) | Financial product recommendation method and device for enterprises | |
CN117252677A (en) | Credit line determination method and device, electronic equipment and storage medium | |
Niknya et al. | Financial distress prediction of Tehran Stock Exchange companies using support vector machine | |
JP7344609B2 (en) | Data quantification method based on confirmed and estimated values | |
CN113822751A (en) | Online loan risk prediction method | |
Moe et al. | A Hybrid Approach of Logistic Regression with Grid Search Optimization in Credit Scoring Modeling for Financial Institutions | |
CN118014719B (en) | Intelligent enterprise credit analysis method and system based on linear regression model | |
CN113610638B (en) | Rating system and method for matching credit rating with default loss rate based on SMAA-DS | |
CN117764692A (en) | Method for predicting credit risk default probability |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |