CN113421154A - Credit risk assessment method and system based on control chart - Google Patents


Info

Publication number
CN113421154A
Authority
CN
China
Prior art keywords
data
transaction flow
risk assessment
credit
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110584049.6A
Other languages
Chinese (zh)
Other versions
CN113421154B (en)
Inventor
陈宏�
叶恒青
张思宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110584049.6A
Publication of CN113421154A
Application granted
Publication of CN113421154B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention provides a credit risk assessment method and system based on control charts. The method comprises the following operations:
- collecting transaction flow data, credit audit data and overdue-days data, and preprocessing the data to obtain conventional characteristics and default characteristics;
- aggregating the transaction flow data into initial transaction flow indicators;
- converting the initial transaction flow indicators into warning signals, and processing the warning signals into signal characteristics;
- integrating the signal characteristics, the conventional characteristics extracted from the credit audit data and the default characteristics extracted from the overdue-days data to obtain several types of risk-control evaluation samples;
- building a machine learning model for each type of risk-control evaluation sample, and selecting the optimal risk-control model by comparing the model results across the samples;
- building an online machine learning model on the credit platform from the optimal risk-control model, assessing the risk of loan applicants in real time, and outputting the risk assessment result.

The method improves the accuracy of credit risk assessment. It is applicable to credit risk assessment in different scenarios, which broadens the range of application of credit risk assessment.

Description

Credit risk assessment method and system based on control chart
Technical Field
The invention relates to the technical field of credit risk assessment, in particular to a credit risk assessment method and system based on a control chart.
Background
In the credit approval process, many current risk assessment methods focus on intelligent risk rating but do not mine data sources from different channels.

Chinese patent publication No. CN110415111A discloses a logistic-regression credit approval method that merges user data with expert features. The input data are cleaned, reduced and preprocessed, then classified and passed through feature engineering. Expert features are introduced, features are predicted, and an approval list is output.

That credit approval method combines expert features from traditional financial models with classical machine learning. Using market data updated in real time together with feature engineering, it predicts the dynamically changing likelihood of future default. With a prediction model and an optimized logistic regression algorithm, it satisfies complex credit constraints and yields more accurate default-probability and risk-premium results. It relieves auditors of heavy credit risk review and pricing work, enables fast large-scale credit approval for small and micro enterprises, and makes intelligent rating and risk avoidance possible.

The method analyzes user data and expert features, integrating two common types of data source, and can achieve fast approval. However, the accuracy of its default probability prediction still leaves room for improvement.

Chinese patent publication No. CN107093101A discloses a method for mining potential loan users and scoring risk based on POS transaction flow data. It comprises the following steps:
- acquiring POS transaction flow data;
- mining potential loan users from the acquired data in terms of business expansion and capital turnover;
- determining statistical indicators for POS transaction flow risk scoring, and scoring risk with a set scoring model according to those indicators and the acquired POS transaction flow data.

According to that disclosure, mining potential loan users from POS transaction flow data in terms of business expansion and capital turnover is fast and accurate. POS transaction flow data reflect merchants' demand for funds and loans well, so the conversion success rate is higher. The disclosure also presents its POS transaction flow risk scoring method as new, more effective, and widely applicable to data mining.

That method provides risk assessment through statistical indicators of POS transaction flow. Although it analyzes POS transaction flow data, its data mining method is too simple and lacks extensibility. The disclosed method and its numerical results are also too specific to suit credit audit processes applied across different scenarios.

Regarding the prior art above, the inventors consider that default probability prediction is insufficiently accurate and that the credit audit process has a narrow range of application. As a result, credit risk assessment performs poorly.
Disclosure of Invention
In view of the shortcomings of the prior art, an object of the present invention is to provide a credit risk assessment method and system based on control charts.
The credit risk assessment method based on the control chart comprises the following steps:
step 1: collecting transaction flow data, credit audit data and overdue-days data of customers who have been granted loans, and preprocessing the data to obtain conventional characteristics and default characteristics;
step 2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indicators;
step 3: converting the standardized initial transaction flow indicators into warning signals;
step 4: processing the warning signals into signal characteristics;
step 5: integrating the signal characteristics, the conventional characteristics extracted from the credit audit data and the default characteristics extracted from the overdue-days data to obtain several types of risk-control evaluation samples;
step 6: building a machine learning model for each type of risk-control evaluation sample, evaluating the model results across the different samples, and selecting the optimal risk-control model;
step 7: building an online machine learning model on the credit platform from the optimal risk-control model, assessing the risk of loan applicants in real time, and outputting the risk assessment result.
Preferably, the preprocessing in step 1 comprises the following steps:
transaction flow data preprocessing step: removing from the transaction flow those transactions whose amount falls within a predetermined range;
credit audit data preprocessing step: according to the credit life cycle, removing all data records of customers who fall within a predetermined date range;
overdue-days data preprocessing step: according to each customer's degree of delinquency, removing all data records of customers whose overdue days fall within a predetermined range, and forming default characteristics from the overdue days.
Preferably, the step 2 comprises the following steps:
step 2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indicators, and grouping these indicators according to several types of control charts;
step 2.2: standardizing the initial transaction flow characteristic indicators to obtain standardized initial transaction flow indicators.
Preferably, the step 3 comprises the following steps:
step 3.1: for the standardized initial transaction flow indicators, calculating the mean, upper limit and lower limit of the control chart corresponding to the indicators in each group;
step 3.2: setting warning signals for predetermined time periods within the transaction flow monitoring window, according to the mean, upper limit and lower limit of each type of control chart obtained in step 3.1.
Preferably, the step 4 comprises the following steps:
step 4.1: for each control chart, counting the total of each type of warning signal over the transaction flow monitoring window, and converting these totals into signal characteristics;
step 4.2: introducing a signal characteristic that identifies whether the transaction flow in the last predetermined time period is abnormal in general, so that the abnormal state of that period is easier to interpret.
The invention provides a credit risk assessment system based on a control chart, which comprises the following modules:
module M1: collecting transaction flow data, credit audit data and overdue-days data of customers who have been granted loans, and preprocessing the data to obtain conventional characteristics and default characteristics;
module M2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indicators;
module M3: converting the standardized initial transaction flow indicators into warning signals;
module M4: processing the warning signals into signal characteristics;
module M5: integrating the signal characteristics, the conventional characteristics extracted from the credit audit data and the default characteristics extracted from the overdue-days data to obtain several types of risk-control evaluation samples;
module M6: building a machine learning model for each type of risk-control evaluation sample, evaluating the model results across the different samples, and selecting the optimal risk-control model;
module M7: building an online machine learning model on the credit platform from the optimal risk-control model, assessing the risk of loan applicants in real time, and outputting the risk assessment result.
Preferably, the preprocessing in module M1 is performed by the following modules:
transaction flow data preprocessing module: removing from the transaction flow those transactions whose amount falls within a predetermined range;
credit audit data preprocessing module: according to the credit life cycle, removing all data records of customers who fall within a predetermined date range;
overdue-days data preprocessing module: according to each customer's degree of delinquency, removing all data records of customers whose overdue days fall within a predetermined range, and forming default characteristics from the overdue days.
Preferably, the module M2 includes the following modules:
module M2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indicators, and grouping these indicators according to several types of control charts;
module M2.2: standardizing the initial transaction flow characteristic indicators to obtain standardized initial transaction flow indicators.
Preferably, the module M3 includes the following modules:
module M3.1: for the standardized initial transaction flow indicators, calculating the mean, upper limit and lower limit of the control chart corresponding to the indicators in each group;
module M3.2: setting warning signals for predetermined time periods within the transaction flow monitoring window, according to the mean, upper limit and lower limit of each type of control chart obtained by module M3.1.
Preferably, the module M4 includes the following modules:
module M4.1: for each control chart, counting the total of each type of warning signal over the transaction flow monitoring window, and converting these totals into signal characteristics;
module M4.2: introducing a signal characteristic that identifies whether the transaction flow in the last predetermined time period is abnormal in general, so that the abnormal state of that period is easier to interpret.
Compared with the prior art, the invention has the following beneficial effects:
1. The customer's transaction flow data are analyzed using the principles of three types of control charts. Abnormal transaction flow is captured and analyzed to form warning signals, which are then converted into credit indicators for measuring risk. This fills a gap in the industry for risk assessment based on transaction flow;
2. Signal characteristics extracted from the transaction flow data source are added as input indicators of the credit risk-control model. The results show that this improves the accuracy of credit risk assessment and therefore its effectiveness;
3. The method can be applied to credit risk assessment in different scenarios. Examples include pre-loan review, to support credit approval decisions, and credit management, to support customer management. It suits B-end small and micro enterprises, which can be assessed on their business transaction flow, and C-end consumers, who can be assessed on their personal transaction flow. Its strong applicability helps broaden the range of application of credit risk assessment;
4. When the characteristics are constructed, the transaction flow data come from the customer's own transaction flow information. They can be collected once the customer authorizes the financial institution, so the data source is easy to obtain and the method is easy to implement;
5. The transaction flow data are evaluated dynamically and in near real time. The financial institution can therefore grasp the customer's actual risk condition and quickly take corresponding management measures.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is the I control chart of the I-MR control charts;
FIG. 2 is the MR control chart of the I-MR control charts;
FIG. 3 is the X̄ control chart of the X̄-R control charts;
FIG. 4 is the R control chart of the X̄-R control charts;
FIG. 5 is the X̄ control chart of the X̄-S control charts;
FIG. 6 is the S control chart of the X̄-S control charts.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the invention, but do not limit the invention in any way. Those skilled in the art can make various changes and improvements without departing from the concept of the invention, and all of these fall within the scope of protection of the present invention.
The embodiment of the invention discloses a credit risk assessment method and a credit risk assessment system based on a control chart, which comprises the following steps:
Step 1: transaction flow data of customers who have been granted loans, together with other available credit audit data, are collected from the credit platform database. The transaction flow data cover a time window of 30 days before the loan application. For example, if a customer applies for a loan on 1 April 2021, the financial institution collects the transaction flow generated by that customer from 2 March 2021 to 31 March 2021. At the same time, the overdue-days data of customers who have been granted loans are collected, and all the data are preprocessed.
The preprocessing in step 1 comprises the following steps:
- Transaction flow data preprocessing step: transactions within a predetermined amount range are removed from each transaction flow. The predetermined range is amounts below 0.1 yuan.
- Credit audit data preprocessing step: according to the credit life cycle, all data records of granted-loan customers within a predetermined date range are removed. The predetermined range covers customers whose first repayment date has not yet been exceeded by more than 30 days.
- Overdue-days data preprocessing step: according to each customer's degree of delinquency, all data records of customers within a predetermined overdue-days range are removed. The range is more than 0 and fewer than 30 overdue days. Default characteristics are then formed from the overdue days: customers with 0 overdue days have a default characteristic value of 0, and customers with more than 0 overdue days have a value of 1.
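The three preprocessing rules can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The record layout (dictionaries with the hypothetical keys `amount`, `id`, `days_since_first_due` and `overdue_days`) is assumed. The thresholds (0.1 yuan, 30 days, 0 < overdue days < 30) come from the embodiment. Reading "not exceeding the first repayment date by 30 days" as `<= 30` is also an assumption.

```python
from typing import Dict, List, Tuple

MIN_AMOUNT = 0.1        # transactions below 0.1 yuan are dropped (embodiment value)
OBSERVATION_DAYS = 30   # customers not yet >30 days past first due date are dropped
GREY_ZONE = (0, 30)     # customers with 0 < overdue days < 30 are dropped


def clean_transactions(transactions: List[Dict]) -> List[Dict]:
    """Remove transactions whose amount is below 0.1 yuan ("amount" is a hypothetical key)."""
    return [tx for tx in transactions if tx["amount"] >= MIN_AMOUNT]


def label_customers(customers: List[Dict]) -> List[Tuple[str, int]]:
    """Apply the credit-audit and overdue-days rules; return (customer_id, default_flag)."""
    labelled = []
    for c in customers:
        # Credit life cycle rule: performance window too short to judge repayment.
        if c["days_since_first_due"] <= OBSERVATION_DAYS:
            continue
        # Overdue-days rule: drop the ambiguous grey zone 0 < overdue days < 30.
        if GREY_ZONE[0] < c["overdue_days"] < GREY_ZONE[1]:
            continue
        # Default characteristic: 0 if never overdue, otherwise 1 (30+ days after filtering).
        labelled.append((c["id"], 0 if c["overdue_days"] == 0 else 1))
    return labelled


customers = [
    {"id": "A", "days_since_first_due": 45, "overdue_days": 0},
    {"id": "B", "days_since_first_due": 45, "overdue_days": 12},
    {"id": "C", "days_since_first_due": 60, "overdue_days": 35},
    {"id": "D", "days_since_first_due": 10, "overdue_days": 0},
]
print(label_customers(customers))  # [('A', 0), ('C', 1)]
```

Each customer is filtered in a single pass, so after filtering the default flag of 1 can only come from customers with 30 or more overdue days.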
Step 2: the preprocessed transaction flow data are aggregated and standardized to obtain standardized initial transaction flow indicators.
Step 2 comprises the following steps.

Step 2.1: the transaction flow data are analyzed in terms of transaction time, transaction amount, transaction type and transaction card type. Transaction types include consumption, pre-authorization and refund; transaction card types include credit cards, debit cards and quasi-credit cards. Using the day as the statistical dimension, the transaction flow data are aggregated into the initial transaction flow characteristic indicators shown in Table 1, such as transaction amount, transaction count and number of terminals used.

These indicators are grouped according to three types of control charts: the I-MR control chart, the X̄-R control chart and the X̄-S control chart.
- In the I-MR control chart, transaction amount is group 1, transaction count is group 2 and number of terminals used is group 3.
- In the X̄-R control chart, the three indicators transaction amount, transaction count and number of terminals used together form group 1.
- In the X̄-S control chart, all the initial transaction flow characteristic indicators together form group 1.

The control chart parameter values are set according to the number of indicators in each group of the three chart types. Table 1 details the initial transaction flow characteristic indicators and their descriptions, the group number of each indicator in the three chart types, and the parameter settings.
Table 1 Initial transaction flow characteristic indicators (table image not reproduced)
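Daily aggregation in step 2.1 could look like the sketch below. Only three of the Table 1 indicators are illustrated: transaction amount, transaction count and number of terminals used. The field names `date`, `amount` and `terminal_id`, and the function `daily_indicators`, are hypothetical. Days without transactions are filled with 0, which is an assumption, so that every indicator has exactly T daily values.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List


def daily_indicators(transactions: List[Dict], start: date, window: int = 30) -> Dict[str, List[float]]:
    """Aggregate raw transactions into daily amount, count and terminal-count series."""
    amount = defaultdict(float)
    count = defaultdict(int)
    terminals = defaultdict(set)
    for tx in transactions:
        day = tx["date"]
        amount[day] += tx["amount"]
        count[day] += 1
        terminals[day].add(tx["terminal_id"])  # distinct terminals used that day
    days = [start + timedelta(days=k) for k in range(window)]
    # Days with no transactions read as 0 (assumed), so every series has `window` values.
    return {
        "transaction_amount": [amount[d] for d in days],
        "transaction_count": [float(count[d]) for d in days],
        "terminals_used": [float(len(terminals[d])) for d in days],
    }
```

For an application dated 1 April 2021, the window would start at `date(2021, 3, 2)` and end on 31 March 2021.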
Step 2.2: each initial transaction flow characteristic indicator is standardized to mean 0 and variance 1, giving the standardized initial transaction flow characteristic indicator.

Let the transaction flow monitoring window have length T. The i-th initial transaction flow characteristic indicator $X_i$ consists of its daily values:

$$X_i = \{x_{1i}, x_{2i}, \dots, x_{ti}, \dots, x_{Ti}\},$$

where $x_{ti}$ is its value on day t and $x_{Ti}$ its value on the last day. The standardized indicator is denoted

$$X_i' = \{x'_{1i}, x'_{2i}, \dots, x'_{ti}, \dots, x'_{Ti}\},$$

and its value on day t is calculated as

$$x'_{ti} = \frac{x_{ti} - \bar{X}_i}{\sigma_i}, \qquad t = 1, 2, \dots, T,$$

where $\bar{X}_i$ is the mean of $X_i$ over the monitoring window T, and $\sigma_i$ is the standard deviation (the square root of the variance) of $X_i$ over the monitoring window T.

Here T = 30. From the business perspective, T also denotes the current time, i.e. the point at which the customer submits the loan application.
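The standardization of step 2.2 amounts to a per-indicator z-score over the monitoring window. The sketch below assumes the population standard deviation (dividing by T), since the patent does not say which form it uses. Mapping a constant series to zeros is likewise an assumption.

```python
import math
from typing import List


def standardize(series: List[float]) -> List[float]:
    """Z-score one daily indicator series over the monitoring window (mean 0, variance 1)."""
    T = len(series)
    mean = sum(series) / T
    # Population standard deviation (divide by T); assumed, not stated in the patent.
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / T)
    if std == 0:
        # A constant series carries no variation; returning zeros is an assumption.
        return [0.0] * T
    return [(x - mean) / std for x in series]
```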
Step 3: the standardized initial transaction flow indicators are converted into warning signals using the principles of the three types of control charts.
Step 3 comprises the following steps.

Step 3.1: for each group of standardized initial transaction flow indicators, the mean (center line), upper limit and lower limit of each type of control chart are calculated. The groups follow the grouping of the initial transaction flow characteristic indicators for the three chart types in Table 1.

Each type of control chart produces a pair of charts:
- the I-MR control chart comprises the I control chart and the MR control chart;
- the X̄-R control chart comprises the X̄ control chart and the R control chart;
- the X̄-S control chart comprises the X̄ control chart and the S control chart.

$CL_x$, $UCL_x$ and $LCL_x$ denote the mean, upper limit and lower limit of the first chart of each pair. $CL_s$, $UCL_s$ and $LCL_s$ denote those of the second chart. T is the transaction flow monitoring window length (T = 30). The calculation of the mean, upper limit and lower limit for each type of control chart is described in detail below.
As shown in FIG. 1 and FIG. 2, for the I-MR control chart each group contains one standardized initial transaction flow indicator, as listed in Table 1. As in step 2.2, $x'_{ti}$ denotes the value on day t of the i-th standardized initial transaction flow indicator in the group. Because each I-MR group contains only one indicator, i is always 1. For simplicity the subscript i is therefore omitted, and $x'_t$ denotes the value on day t of the group's standardized indicator.

The mean $\bar{x}$, the moving range $MR_t$ between day t and the previous day, and the average moving range $\overline{MR}$ are calculated as

$$\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x'_t, \qquad MR_t = \left| x'_t - x'_{t-1} \right|,\ t = 2, \dots, T, \qquad \overline{MR} = \frac{1}{T-1}\sum_{t=2}^{T} MR_t.$$

The I-MR control chart parameters $d_2$, $D_3$ and $D_4$ are shown in Table 1. From these parameters, the mean $\bar{x}$ and the average moving range $\overline{MR}$, the mean, upper limit and lower limit of the I-MR control charts are

$$CL_x = \bar{x}, \qquad UCL_x = \bar{x} + \frac{3\,\overline{MR}}{d_2}, \qquad LCL_x = \bar{x} - \frac{3\,\overline{MR}}{d_2};$$

$$CL_s = \overline{MR}, \qquad UCL_s = D_4\,\overline{MR}, \qquad LCL_s = D_3\,\overline{MR}.$$
With the monitoring day t as the abscissa (t = 1, 2, …, T), the I control chart plots $x'_t$ together with $CL_x$, $UCL_x$ and $LCL_x$. The MR control chart plots $MR_t$ together with $CL_s$, $UCL_s$ and $LCL_s$. FIG. 1 takes $X_1$ in Table 1, the transaction amount, as an example. It plots the standardized transaction amount over the 30 days before a customer's loan application, together with the mean, upper limit and lower limit calculated for the I control chart.
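The I-MR limit calculation can be sketched as follows. Table 1 is not reproduced in this text. The sketch therefore assumes the standard SPC constants for moving ranges of size 2 ($d_2$ = 1.128, $D_3$ = 0, $D_4$ = 3.267); whether Table 1 uses exactly these values is an assumption. The function name `imr_limits` is hypothetical.

```python
from typing import Dict, List, Tuple

# Standard SPC constants for moving ranges of size 2 (assumed to match Table 1).
D2, D3, D4 = 1.128, 0.0, 3.267


def imr_limits(x: List[float]) -> Tuple[Dict[str, float], List[float]]:
    """Center line and limits for the I chart (*_x) and MR chart (*_s), plus the MR series."""
    T = len(x)
    x_bar = sum(x) / T
    mr = [abs(x[t] - x[t - 1]) for t in range(1, T)]  # MR_t for t = 2..T
    mr_bar = sum(mr) / (T - 1)
    limits = {
        "CL_x": x_bar,
        "UCL_x": x_bar + 3 * mr_bar / D2,
        "LCL_x": x_bar - 3 * mr_bar / D2,
        "CL_s": mr_bar,
        "UCL_s": D4 * mr_bar,
        "LCL_s": D3 * mr_bar,
    }
    return limits, mr
```

Because the input series is already standardized, `CL_x` is (up to rounding) 0; the general formula is kept so the same function works on raw series.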
As shown in FIG. 3 and FIG. 4, for the X̄-R control chart each group contains 2 to 4 standardized initial transaction flow indicators, as listed in Table 1. As in step 2.2, $x'_{ti}$ denotes the value on day t of the i-th standardized initial transaction flow indicator in the group.

First, the standardized indicators in each group are summarized with the monitoring day t as the statistical dimension. This gives the within-group mean $\bar{x}_t$ and the within-group range $R_t$. Their averages over the monitoring window are then computed and denoted $\bar{\bar{x}}$ and $\bar{R}$:

$$\bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x'_{ti}, \qquad t = 1, 2, \dots, T;$$

$$R_t = \max\{x'_{t1}, x'_{t2}, \dots, x'_{tN}\} - \min\{x'_{t1}, x'_{t2}, \dots, x'_{tN}\}, \qquad t = 1, 2, \dots, T;$$

$$\bar{\bar{x}} = \frac{1}{T}\sum_{t=1}^{T} \bar{x}_t, \qquad \bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t,$$

where N is the number of standardized initial transaction flow indicators in the group. N is 2, 3 or 4; see Table 1 for the detailed grouping and values.

The X̄-R control chart parameters $A_2$, $D_3$ and $D_4$ are shown in Table 1. From these parameters, $\bar{\bar{x}}$ and $\bar{R}$, the mean, upper limit and lower limit of the X̄-R control charts are

$$CL_x = \bar{\bar{x}}, \qquad UCL_x = \bar{\bar{x}} + A_2\,\bar{R}, \qquad LCL_x = \bar{\bar{x}} - A_2\,\bar{R};$$

$$CL_s = \bar{R}, \qquad UCL_s = D_4\,\bar{R}, \qquad LCL_s = D_3\,\bar{R}.$$

With the monitoring day t as the abscissa (t = 1, 2, …, T), the X̄ control chart plots $\bar{x}_t$ together with its mean, upper limit and lower limit. The R control chart plots $R_t$ together with its mean, upper limit and lower limit.
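The X̄-R calculation for one group can be sketched as follows. The constants are the standard SPC table values for subgroup sizes 2 to 4. Table 1 is not reproduced here, so matching it exactly is an assumption. The data layout `group[i][t]` and the name `xbar_r_limits` are hypothetical.

```python
from typing import Dict, List, Tuple

# Standard X-bar/R constants (A2, D3, D4) for subgroup sizes 2-4; assumed to match Table 1.
XR_CONSTANTS = {2: (1.880, 0.0, 3.267), 3: (1.023, 0.0, 2.574), 4: (0.729, 0.0, 2.282)}


def xbar_r_limits(group: List[List[float]]) -> Tuple[Dict[str, float], List[float], List[float]]:
    """group[i][t] is the standardized value of indicator i on day t (N = 2..4 indicators)."""
    n, T = len(group), len(group[0])
    A2, D3, D4 = XR_CONSTANTS[n]
    days = [[group[i][t] for i in range(n)] for t in range(T)]  # indicator values per day
    x_bar_t = [sum(v) / n for v in days]       # within-group mean per day
    r_t = [max(v) - min(v) for v in days]      # within-group range per day
    x_dbar = sum(x_bar_t) / T
    r_bar = sum(r_t) / T
    limits = {
        "CL_x": x_dbar, "UCL_x": x_dbar + A2 * r_bar, "LCL_x": x_dbar - A2 * r_bar,
        "CL_s": r_bar, "UCL_s": D4 * r_bar, "LCL_s": D3 * r_bar,
    }
    return limits, x_bar_t, r_t
```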
As shown in FIG. 5 and FIG. 6, for the X̄-S control chart there is only one group. It contains the 23 standardized initial transaction flow indicators listed in Table 1. As in step 2.2, $x'_{ti}$ denotes the value on day t of the i-th standardized initial transaction flow indicator in the group.

The within-group mean $\bar{x}_t$ of the X̄-S control chart, and the average $\bar{\bar{x}}$ of $\bar{x}_t$, are calculated in the same way as for the X̄-R control chart. From $x'_{ti}$ and $\bar{x}_t$, the standard deviation $s_t$ for monitoring day t is calculated, followed by the average $\bar{s}$ of $s_t$:

$$s_t = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} \left(x'_{ti} - \bar{x}_t\right)^2}, \qquad t = 1, 2, \dots, T; \qquad \bar{s} = \frac{1}{T}\sum_{t=1}^{T} s_t,$$

where N is the number of standardized initial transaction flow indicators in the group, here 23.

The X̄-S control chart parameters $A_3$, $B_3$ and $B_4$ are shown in Table 1. From these parameters, $\bar{\bar{x}}$ and $\bar{s}$, the mean, upper limit and lower limit of the X̄-S control charts are

$$CL_x = \bar{\bar{x}}, \qquad UCL_x = \bar{\bar{x}} + A_3\,\bar{s}, \qquad LCL_x = \bar{\bar{x}} - A_3\,\bar{s};$$

$$CL_s = \bar{s}, \qquad UCL_s = B_4\,\bar{s}, \qquad LCL_s = B_3\,\bar{s}.$$

With the monitoring day t as the abscissa (t = 1, 2, …, T), the X̄ control chart plots $\bar{x}_t$ together with its mean, upper limit and lower limit. The S control chart plots $s_t$ together with its mean, upper limit and lower limit.
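The X̄-S calculation can be sketched as follows. The Table 1 values of $A_3$, $B_3$ and $B_4$ are not reproduced here, so this sketch derives them from the standard bias-correction constant $c_4$, an assumption about how Table 1 was populated. For N = 23 the derived values come out at about 0.633, 0.545 and 1.455, which matches standard SPC tables. The names `xbar_s_constants` and `xbar_s_limits` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def xbar_s_constants(n: int) -> Tuple[float, float, float]:
    """A3, B3, B4 derived from c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)."""
    c4 = math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))
    spread = 3 * math.sqrt(1 - c4 ** 2) / c4
    return 3 / (c4 * math.sqrt(n)), max(0.0, 1 - spread), 1 + spread


def xbar_s_limits(group: List[List[float]]) -> Tuple[Dict[str, float], List[float], List[float]]:
    """group[i][t] is the standardized value of indicator i on day t (N = 23 in the embodiment)."""
    n, T = len(group), len(group[0])
    A3, B3, B4 = xbar_s_constants(n)
    x_bar_t, s_t = [], []
    for t in range(T):
        values = [group[i][t] for i in range(n)]
        mean = sum(values) / n
        x_bar_t.append(mean)
        # Sample standard deviation across the N indicators on day t.
        s_t.append(math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)))
    x_dbar = sum(x_bar_t) / T
    s_bar = sum(s_t) / T
    limits = {
        "CL_x": x_dbar, "UCL_x": x_dbar + A3 * s_bar, "LCL_x": x_dbar - A3 * s_bar,
        "CL_s": s_bar, "UCL_s": B4 * s_bar, "LCL_s": B3 * s_bar,
    }
    return limits, x_bar_t, s_t
```

Deriving the constants instead of hard-coding them keeps the sketch valid for any group size, at the cost of a small rounding difference from printed tables.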
Step 3.2: based on the mean, upper limit and lower limit of each control chart calculated in step 3.1, warning signals are set for each predetermined time period in the transaction flow monitoring window. Here the predetermined time period is one monitoring day.

Each warning signal is binary. A value of "1" means the transaction flow on that monitoring day changed significantly; a value of "0" means it showed no anomaly.

In each control chart, each monitoring day carries three warning signals, recording whether the following three conditions occur on that day:
(1) the value on the monitoring day lies beyond the upper or lower limit of the control chart;
(2) the values of the 8 most recent consecutive monitoring days all lie on the same side of the mean;
(3) the values of the 6 most recent consecutive monitoring days rise or fall continuously.

A warning signal takes the value 1 if its condition occurs.
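The three warning rules of step 3.2 can be sketched for one chart series as follows. Rule (1) compares each day's value against the limits. Rules (2) and (3) look back over the last 8 and the last 6 days. The following are assumptions the patent does not settle: values exactly on the center line count as neither side, and days with too little history never trigger rules (2) or (3). For an MR series, the list starts at day 2, so its indices are offset by one day. The name `warning_signals` is hypothetical.

```python
from typing import List


def warning_signals(values: List[float], cl: float, ucl: float, lcl: float) -> List[List[int]]:
    """Return, for each monitoring day, the three binary warning signals of step 3.2."""
    signals = []
    for t, v in enumerate(values):
        # Rule 1: the day's value lies outside the control limits.
        out_of_limits = int(v > ucl or v < lcl)
        # Rule 2: the last 8 consecutive values all lie on the same side of the center line.
        window8 = values[t - 7:t + 1] if t >= 7 else []
        same_side = int(bool(window8) and (all(x > cl for x in window8) or all(x < cl for x in window8)))
        # Rule 3: the last 6 consecutive values rise (or fall) strictly at every step.
        window6 = values[t - 5:t + 1] if t >= 5 else []
        steps = [b - a for a, b in zip(window6, window6[1:])]
        trend = int(bool(steps) and (all(d > 0 for d in steps) or all(d < 0 for d in steps)))
        signals.append([out_of_limits, same_side, trend])
    return signals
```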
Step 4: the warning signals are processed into signal characteristics.
Step 4 comprises the following steps.

Step 4.1: for each type of control chart, the number of monitoring days on which each of the three abnormal conditions occurs during the monitoring window is counted. In other words, the sum of each warning signal in each chart over the monitoring window is taken and converted into a signal characteristic.
Step 4.2: a signal characteristic is introduced that identifies whether the transaction flow on the last monitoring day T is abnormal in general, so that the abnormal state of that day is easier to interpret. Its value is the union (logical OR) of the six warning signals generated on that day: if either chart of the pair shows any of the three conditions of step 3.2 on that day, the value is "1", otherwise "0".

Together with the six signal characteristics from step 4.1, each group of each type of control chart therefore yields seven signal characteristics in total. Taking the I-MR control chart as an example, its signal characteristics are:
(1) the number of days in the I control chart beyond the upper or lower limit;
(2) the number of days in the I control chart on which 8 consecutive monitoring days lie on the same side of the mean;
(3) the number of days in the I control chart on which 6 consecutive monitoring days rise or fall continuously;
(4) the number of days in the MR control chart beyond the upper or lower limit;
(5) the number of days in the MR control chart on which 8 consecutive monitoring days lie on the same side of the mean;
(6) the number of days in the MR control chart on which 6 consecutive monitoring days rise or fall continuously;
(7) whether the transaction flow on the last monitoring day is abnormal.
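The conversion of daily warning signals into the seven signal characteristics of one group can be sketched as follows. The input is assumed to be one list of `[rule1, rule2, rule3]` flags per day for each chart of a pair, for example I and MR. The name `signal_features` is hypothetical.

```python
from typing import List


def signal_features(first_chart: List[List[int]], second_chart: List[List[int]]) -> List[int]:
    """Turn the daily warning signals of one chart pair (e.g. I and MR) into 7 features."""
    features = []
    for chart in (first_chart, second_chart):
        # Features 1-3 (first chart) and 4-6 (second chart): days on which each rule fired.
        features.extend(sum(day[k] for day in chart) for k in range(3))
    # Feature 7: logical OR of the six signals on the last monitoring day T.
    last_day = first_chart[-1] + second_chart[-1]
    features.append(int(any(last_day)))
    return features
```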
Step 5: Integrate the signal features, the conventional features extracted from the credit audit data, and the default features extracted from the overdue-days data to obtain three types of risk-control evaluation samples.
Step 5 comprises the following steps. Step 5.1: Integrate the 23 groups of signal features generated by the I-MR control chart (23 × 7 features in total) with the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain the I-MR risk-control evaluation sample. Step 5.2: Integrate the 8 groups of signal features (8 × 7 in total) generated by the [image BDA0003087428160000091] control chart with the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain the [image BDA0003087428160000092] risk-control evaluation sample. Step 5.3: Integrate the 1 group of signal features (1 × 7 in total) generated by the [image BDA0003087428160000093] control chart with the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain the [image BDA0003087428160000094] risk-control evaluation sample.
Step 6: For each type of risk-control evaluation sample, perform sample preprocessing and independent-variable selection separately and build a machine learning model; evaluate the model results across the different risk-control evaluation samples and select the optimal risk-control model. The machine learning model is a logistic regression model.
Step 6 comprises the following steps. Step 6.1: Handle missing values. Check the missing values in each feature. Features that can be filled according to their business meaning are filled accordingly (for example, if the transaction amount of the transaction flow on a given day is empty, it can be filled with 0). Features that cannot be filled according to business meaning and have a high missing proportion are deleted. For features with a low missing proportion, check the data type: for categorical variables, treat the missing values as a separate group; for numerical variables, fill the missing values with the mean of the feature.
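The missing-value rules of step 6.1 can be sketched as below. The patent does not say what counts as a "high" missing proportion, so the 0.5 cut-off (`drop_ratio`) is an assumption, as are the `MISSING` category label and the function name `handle_missing`. Samples are plain dicts with `None` marking a missing value, and every sample is assumed to have the same keys.

```python
from statistics import mean
from typing import Any, Dict, Iterable, List, Optional

MISSING = "MISSING"  # hypothetical label for the "missing" category


def handle_missing(rows: List[Dict[str, Any]],
                   business_fill: Optional[Dict[str, Any]] = None,
                   categorical: Iterable[str] = (),
                   drop_ratio: float = 0.5) -> List[Dict[str, Any]]:
    """Apply the step 6.1 rules to a list of samples (None = missing)."""
    business_fill = business_fill or {}
    categorical = set(categorical)
    out = [dict(r) for r in rows]          # do not mutate the input
    for feat in list(rows[0].keys()):
        observed = [r[feat] for r in rows if r[feat] is not None]
        missing_ratio = 1 - len(observed) / len(rows)
        if feat in business_fill:          # fill by business meaning
            fill = business_fill[feat]
        elif missing_ratio > drop_ratio or not observed:
            for r in out:                  # too sparse: drop the feature
                del r[feat]
            continue
        elif feat in categorical:          # missing becomes its own group
            fill = MISSING
        else:                              # numeric: mean imputation
            fill = mean(observed)
        for r in out:
            if r[feat] is None:
                r[feat] = fill
    return out
```

Business-meaning fills are checked first on purpose: a daily transaction amount that is mostly empty is still informative (no transactions), so it should be filled with 0 rather than dropped.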
Step 6.2: Process the categorical variables by converting each categorical variable in the sample into 0-1 values using dummy variables.
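A minimal dummy-variable encoder for step 6.2, assuming samples are dicts and a categorical column is replaced by one 0-1 column per level named `feature=level` (a hypothetical naming scheme). The optional `drop_first` flag is an addition not in the source: dropping one reference level avoids perfect collinearity with the intercept of the logistic regression in step 6.6.

```python
from typing import Any, Dict, List


def to_dummies(rows: List[Dict[str, Any]], feature: str,
               drop_first: bool = False) -> List[Dict[str, Any]]:
    """Replace a categorical column with 0-1 dummy columns named feature=level."""
    levels = sorted({str(r[feature]) for r in rows})
    if drop_first:          # optional: drop one level as the reference group
        levels = levels[1:]
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != feature}
        for level in levels:
            new[f"{feature}={level}"] = int(str(r[feature]) == level)
        out.append(new)
    return out
```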
Step 6.3: Group the independent variables based on principal component analysis, as follows.
Perform principal component analysis on all independent variables, take the two most significant components (the first and second principal components), and divide the variables into two groups A and B according to the magnitudes of their correlation coefficients with these two components: any independent variable whose correlation coefficient with the first principal component is greater than its correlation coefficient with the second is assigned to group A; otherwise it is assigned to group B. Principal component analysis is then applied again within each group to split it in two, and the splitting continues until one of the following conditions is met:
1) the group contains only one independent variable;
2) the R-squared ratio R²-r(x) of the independent variable x decreases by more than half compared with the result of the previous iteration, where R²-r(x) is calculated as
[Formula image BDA0003087428160000101]
When the iteration ends, all independent variables have been divided into n groups. In the formula, [image BDA0003087428160000102] denotes the coefficient of determination obtained by linear regression of x on all independent variables of the i-th group, and [image BDA0003087428160000103] denotes the coefficient of determination obtained by linear regression of x on the independent variables of the m-th group excluding x.
Step 6.4: Select the independent variables based on information value, as follows.
Calculate the information value of each independent variable and delete the independent variables whose information value is greater than 0.5. Then screen the independent variables within each class obtained in step 6.3, keeping at least one independent variable per class. The number retained in a class is determined by the ratio of that class's total information value to the total information value of all classes: if class i contains n_i independent variables with total information value M_i, and all independent variables have total information value M, then the n_i × M_i / M variables with the highest information value in the class are selected, rounding up. The information value is calculated as follows.
For a single independent variable, divide it into K groups according to its values and calculate the weight of evidence of group i by
[Formula image BDA0003087428160000104]
where %default_i is the proportion of all default samples that fall in group i, and %paid_i is the proportion of all normal-repayment samples that fall in group i. The information value IV of the independent variable is then calculated by
[Formula image BDA0003087428160000111]
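Because both formulas of step 6.4 survive only as images, the sketch below assumes the common convention WOE_i = ln(%default_i / %paid_i) and IV = Σ_i (%default_i − %paid_i) × WOE_i (IV is the same under the opposite sign convention). The additive `smoothing` term that guards against empty groups is an assumption, as are the function names. `select_by_iv` then applies the 0.5 cut-off and the ⌈n_i × M_i / M⌉ per-class quota.

```python
import math
from typing import Dict, List, Sequence, Tuple


def woe_iv(defaults: Sequence[int], paids: Sequence[int],
           smoothing: float = 0.5) -> Tuple[List[float], float]:
    """Weight of evidence per group and information value for one variable.

    defaults[i] / paids[i]: default and normal-repayment counts in group i.
    `smoothing` is added to every count so empty groups do not hit log(0).
    """
    d = [c + smoothing for c in defaults]
    p = [c + smoothing for c in paids]
    total_d, total_p = sum(d), sum(p)
    woes, iv = [], 0.0
    for di, pi in zip(d, p):
        pct_default, pct_paid = di / total_d, pi / total_p
        woe = math.log(pct_default / pct_paid)
        woes.append(woe)
        iv += (pct_default - pct_paid) * woe
    return woes, iv


def select_by_iv(iv: Dict[str, float], clusters: Sequence[Sequence[str]],
                 max_iv: float = 0.5) -> List[str]:
    """Drop variables with IV > max_iv, then keep ceil(n_i * M_i / M) per class."""
    kept = {v: x for v, x in iv.items() if x <= max_iv}
    total = sum(kept.values())                       # M
    selected = []
    for cluster in clusters:
        members = [v for v in cluster if v in kept]  # n_i = len(members)
        if not members:                              # whole class was dropped
            continue
        m_i = sum(kept[v] for v in members)          # M_i
        share = m_i / total if total > 0 else 0.0
        k = max(1, math.ceil(len(members) * share))  # at least one per class
        members.sort(key=lambda v: kept[v], reverse=True)
        selected.extend(members[:k])
    return selected
```

For instance, with IVs `{"a": 0.3, "b": 0.1, "c": 0.05, "d": 0.04, "e": 0.9}` and classes `[["a", "b"], ["c", "d", "e"]]`, variable `e` is dropped, the first class keeps both variables and the second keeps only `c`.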
Step 6.5: Split the samples into a training set and a test set by random sampling at a ratio of 6:4. At the same time, apply repeated sampling (sampling with replacement) to the training set so that the ratio of non-default to default samples in the training set is 1:1.
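A sketch of step 6.5 under two assumptions: the 6:4 split is a plain shuffled split (not stratified), and "repeated sampling" means oversampling the minority class of the training set with replacement until both classes are the same size. Labels use 1 for default and 0 for non-default; the function name and `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_and_balance(labels: Sequence[int], train_ratio: float = 0.6,
                      seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (training indices, test indices).

    The shuffle gives a 6:4 random split; the minority class of the training
    part is then resampled with replacement until both classes are equal.
    Raises IndexError if the training part contains only one class.
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = round(train_ratio * len(idx))
    train, test = idx[:cut], idx[cut:]

    pos = [i for i in train if labels[i] == 1]   # default samples
    neg = [i for i in train if labels[i] == 0]   # non-default samples
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced, test
```

Only the training set is rebalanced; the test set keeps the natural default rate so that the evaluation in step 6.7 reflects the real class mix.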
Step 6.6: For each type of sample after preprocessing and independent-variable selection, build a machine learning model on the training set that expresses the default probability p(X) = Pr(Y = 1 | X) as a function of the independent variables X = (X_1, ..., X_n), with the default feature as the dependent variable Y:
$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}$$
where X_i is the i-th independent variable, β_i (i = 0, ..., n) are the regression coefficients, and Y_j is the value of the dependent variable for the j-th training sample, whose independent variables are x_j. The regression coefficients are estimated by maximum likelihood, that is, by maximizing
$$\ell(\beta_0, \ldots, \beta_n) = \prod_{j:\, Y_j = 1} p(x_j) \prod_{j':\, Y_{j'} = 0} \bigl(1 - p(x_{j'})\bigr)$$
Substitute the resulting regression coefficients into the expression for p(X), test the significance of the equation and of each independent variable, and take the final combination of variables and their regression coefficients as the final machine learning model.
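The patent does not name an optimizer, so the following is only one standard way to maximize the likelihood of step 6.6: Newton-Raphson (equivalently, iteratively reweighted least squares) with NumPy. It also returns standard errors so that each coefficient's significance can be checked with a Wald statistic z = β / se, which is an assumed choice of test. The function name and its parameters are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns (beta, se): coefficients beta_0..beta_n (intercept first) and
    their standard errors, from which Wald z = beta / se can be tested.
    """
    Xd = np.column_stack([np.ones(len(X)), X])        # prepend intercept column
    y = np.asarray(y, dtype=float)
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))          # p(X) for every sample
        grad = Xd.T @ (y - p)                         # gradient of log-likelihood
        info = Xd.T @ (Xd * (p * (1 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))              # refresh at the final beta
    info = Xd.T @ (Xd * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se
```

Maximizing the log of the product ℓ gives the same coefficients and is numerically safer, which is why the sketch works with the log-likelihood gradient. Perfectly separable training data would make the coefficients diverge; that case is not handled here.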
Step 6.7: Compute the predictions of the final machine learning model on the test set and compare them with the actual default features to form a confusion matrix. Based on the confusion matrix, choose evaluation metrics according to the business objective, evaluate the three machine learning models built on the three types of risk-control evaluation samples, and select the optimal risk-control model.
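A small confusion-matrix helper for step 6.7, treating default (label 1) as the positive class. The 0.5 probability threshold and the particular metrics returned are assumptions; the patent leaves the choice of metric to the business objective. The function name is hypothetical.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Confusion matrix (default = positive class) and derived metrics."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0       # share of defaults caught
    precision = tp / (tp + fp) if tp + fp else 0.0    # share of flags that default
    accuracy = (tp + tn) / len(pred) if pred else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "recall": recall, "precision": precision, "accuracy": accuracy}
```

To pick the optimal model among the three, one would compute these metrics for each model's test-set predictions and keep the model that scores best on the metric the business cares about most, for example recall when missing a defaulter is costlier than a false alarm.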
Step 7: Build an online machine learning model for the credit platform (for example, an online logistic regression model for the credit platform) from the optimal risk-control model, perform real-time risk assessment on applicant customers, and output the risk assessment results. Steps 1 to 6 are repeated periodically, importing new customers for training and updating the credit platform's online machine learning model.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A method for credit risk assessment based on control charts, comprising the steps of:
step 1: collecting transaction flow data, credit audit data and overdue-days data of customers to whom loans have been issued, and preprocessing the data to obtain conventional features and default features;
step 2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indicators;
step 3: converting the standardized initial transaction flow indicators into warning signals;
step 4: processing the warning signals into signal features;
step 5: integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain multiple types of risk-control evaluation samples;
step 6: building a corresponding machine learning model for the risk-control evaluation samples, evaluating the machine learning model results across the different risk-control evaluation samples, and selecting the optimal risk-control model;
step 7: building an online machine learning model for the credit platform according to the optimal risk-control model, performing real-time risk assessment on applicant customers, and outputting the risk assessment results.
2. The control chart-based credit risk assessment method according to claim 1, wherein said preprocessing in step 1 comprises the steps of:
a transaction flow data preprocessing step: removing, from the transaction flow, transactions whose amounts fall within a predetermined transaction amount range;
a credit audit data preprocessing step: removing all data records of customers within a predetermined date range according to the life cycle of customer credit;
an overdue-days data preprocessing step: removing all data records of customers within a predetermined overdue-days range according to the degree to which customers are overdue, and forming the default features from the overdue days.
3. The control chart-based credit risk assessment method according to claim 1, wherein said step 2 comprises the steps of:
step 2.1: aggregating the preprocessed transaction flow data into initial transaction flow feature indicators, and grouping the initial transaction flow feature indicators according to multiple types of control charts;
step 2.2: standardizing the initial transaction flow feature indicators to obtain the standardized initial transaction flow indicators.
4. The control chart-based credit risk assessment method according to claim 3, wherein said step 3 comprises the steps of:
step 3.1: for the standardized initial transaction flow indicators, calculating the mean, upper limit and lower limit of the control chart corresponding to the indicators in each group;
step 3.2: setting warning signals for predetermined time periods within the transaction flow monitoring period according to the mean, upper limit and lower limit of the multiple types of control charts obtained in step 3.1.
5. The control chart-based credit risk assessment method according to claim 4, wherein said step 4 comprises the steps of:
step 4.1: counting, for each control chart, the sum of each warning signal during the transaction flow monitoring period, and converting the sums into signal features;
step 4.2: to facilitate interpretation of the abnormal state of the last predetermined time period, introducing a signal feature that identifies a general anomaly in the transaction flow during the last predetermined time period.
6. A control chart-based credit risk assessment system applying the control chart-based credit risk assessment method according to any one of claims 1 to 5, comprising the following modules:
module M1: collecting transaction flow data, credit audit data and overdue-days data of customers to whom loans have been issued, and preprocessing the data to obtain conventional features and default features;
module M2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indicators;
module M3: converting the standardized initial transaction flow indicators into warning signals;
module M4: processing the warning signals into signal features;
module M5: integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain multiple types of risk-control evaluation samples;
module M6: building a corresponding machine learning model for the risk-control evaluation samples, evaluating the machine learning model results across the different risk-control evaluation samples, and selecting the optimal risk-control model;
module M7: building an online machine learning model for the credit platform according to the optimal risk-control model, performing real-time risk assessment on applicant customers, and outputting the risk assessment results.
7. The control chart-based credit risk assessment system according to claim 6, wherein said preprocessing in module M1 includes the following modules:
a transaction flow data preprocessing module: removing, from the transaction flow, transactions whose amounts fall within a predetermined transaction amount range;
a credit audit data preprocessing module: removing all data records of customers within a predetermined date range according to the life cycle of customer credit;
an overdue-days data preprocessing module: removing all data records of customers within a predetermined overdue-days range according to the degree to which customers are overdue, and forming the default features from the overdue days.
8. The control chart-based credit risk assessment system according to claim 6, wherein said module M2 comprises the following modules:
module M2.1: aggregating the preprocessed transaction flow data into initial transaction flow feature indicators, and grouping the initial transaction flow feature indicators according to multiple types of control charts;
module M2.2: standardizing the initial transaction flow feature indicators to obtain the standardized initial transaction flow indicators.
9. The control chart-based credit risk assessment system according to claim 8, wherein said module M3 comprises the following modules:
module M3.1: for the standardized initial transaction flow indicators, calculating the mean, upper limit and lower limit of the control chart corresponding to the indicators in each group;
module M3.2: setting warning signals for predetermined time periods within the transaction flow monitoring period according to the mean, upper limit and lower limit of the multiple types of control charts obtained by module M3.1.
10. The control chart-based credit risk assessment system according to claim 9, wherein said module M4 comprises the following modules:
module M4.1: counting, for each control chart, the sum of each warning signal during the transaction flow monitoring period, and converting the sums into signal features;
module M4.2: to facilitate interpretation of the abnormal state of the last predetermined time period, introducing a signal feature that identifies a general anomaly in the transaction flow during the last predetermined time period.
CN202110584049.6A 2021-05-27 2021-05-27 Credit risk assessment method and system based on control chart Active CN113421154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584049.6A CN113421154B (en) 2021-05-27 2021-05-27 Credit risk assessment method and system based on control chart

Publications (2)

Publication Number Publication Date
CN113421154A true CN113421154A (en) 2021-09-21
CN113421154B CN113421154B (en) 2022-10-04

Family

ID=77713100

Country Status (1)

Country Link
CN (1) CN113421154B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793060A (en) * 2021-09-27 2021-12-14 武汉众邦银行股份有限公司 Customer rating method and device based on customer transaction data and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101421756A (en) * 2006-02-10 2009-04-29 芝加哥气候交易公司 Present valuation of emission credit and allowance futures
CN104346749A (en) * 2013-08-07 2015-02-11 辅富投资(上海)有限公司 Pledge-based network borrowing process monitoring method
US20200005310A1 (en) * 2018-06-29 2020-01-02 Paypal, Inc. Machine learning engine for fraud detection during cross-location online transaction processing
CN110738564A (en) * 2019-10-16 2020-01-31 信雅达系统工程股份有限公司 Post-loan risk assessment method and device and storage medium
CN111507831A (en) * 2020-05-29 2020-08-07 长安汽车金融有限公司 Credit risk automatic assessment method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
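栗秋佳 (Li Qiujia): "Research on credit risk management at the Baotou Branch of Bank of Communications based on quality management tools", China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Economics and Management Science series *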

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant