CN113421154B - Credit risk assessment method and system based on control chart - Google Patents


Info

Publication number
CN113421154B
Authority
CN
China
Prior art keywords
data
transaction flow
credit
module
risk assessment
Legal status
Active
Application number
CN202110584049.6A
Other languages
Chinese (zh)
Other versions
CN113421154A (en)
Inventor
陈宏�
叶恒青
张思宇
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Application filed by Shanghai Jiaotong University
Priority to CN202110584049.6A
Publication of CN113421154A
Application granted
Publication of CN113421154B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The invention provides a credit risk assessment method and system based on control charts. The method comprises: collecting transaction flow data, credit audit data and overdue-days data, and preprocessing the data to obtain conventional features and default features; aggregating the transaction flow data into initial transaction flow indicators; converting the initial transaction flow indicators into warning signals; processing the warning signals into signal features; integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain several types of risk control evaluation samples; building a machine learning model for each type of risk control evaluation sample and selecting the optimal risk control model by comparing the model results across sample types; and, based on the optimal risk control model, building an online machine learning model for the credit platform that assesses each applicant's risk in real time and outputs the risk assessment result. The method improves the accuracy of credit risk assessment and can be applied to credit risk assessment in different scenarios, which broadens its range of application.

Description

Credit risk assessment method and system based on control chart
Technical Field
The invention relates to the technical field of credit risk assessment, in particular to a credit risk assessment method and system based on a control chart.
Background
In credit approval, many current risk assessment methods focus on achieving intelligent risk rating but make little use of data from different channels.
Chinese patent publication No. CN110415111A discloses a logistic-regression credit approval method that combines user data with expert features. The method cleans the input data, performs data reduction and preprocessing, classifies the data, performs feature engineering to extract features, introduces expert features, makes predictions and outputs an approval list. It combines the expert features of traditional financial models with classical machine learning and, using market data updated in real time together with feature engineering, predicts the dynamically changing probability of future default. By adopting a prediction model and an optimized logistic regression algorithm, it satisfies complex credit constraints, obtains more accurate default probability predictions and risk premium results, relieves auditors of heavy credit risk review and pricing work, quickly enables large-scale credit approval for small and micro enterprises, and makes intelligent rating and risk avoidance possible. The method analyzes user data and expert features, integrating two common types of data sources, and enables fast approval, but there is still room to improve the accuracy of its default probability predictions.
Chinese patent publication No. CN107093101A discloses a method for mining potential loan users and risk scoring based on POS transaction flow data. The method acquires POS transaction flow data; mines potential loan users from the acquired data in terms of business expansion and capital turnover; determines statistical indicators for POS transaction flow risk scoring; and scores POS transaction flow risk with a preset scoring model according to the determined indicators and the acquired data. According to that disclosure, mining potential loan users from the perspectives of business expansion and capital turnover with POS transaction flow data finds such users quickly and accurately; because POS transaction flow data reflect merchants' demand for funds and loans well, the conversion success rate is higher; and the new POS transaction flow risk scoring method is more effective and can be widely applied in data mining. Although that method provides risk assessment through statistical indicators of POS transaction flow, its data mining approach is overly simple and lacks extensibility, and its disclosed technique and numerical results are too specific to suit credit review processes applied across different scenarios.
Regarding the prior art, the inventors consider that the accuracy of default probability prediction is poor and that the credit review process has a narrow range of application, so credit risk assessment performs poorly.
Disclosure of Invention
In view of the shortcomings in the prior art, it is an object of the present invention to provide a credit risk assessment method and system based on control charts.
The invention provides a credit risk assessment method based on a control chart, which comprises the following steps:
step 1: collecting transaction flow data, credit audit data and overdue-days data of customers who have been granted loans, and preprocessing the data to obtain conventional features and default features;
step 2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indicators;
step 3: converting the standardized initial transaction flow indicators into warning signals;
step 4: processing the warning signals into signal features;
step 5: integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain several types of risk control evaluation samples;
step 6: building a machine learning model for each type of risk control evaluation sample, evaluating the model results across the different samples, and selecting the optimal risk control model;
step 7: based on the optimal risk control model, building an online machine learning model for the credit platform, assessing the risk of loan applicants in real time, and outputting the risk assessment result.
Preferably, the preprocessing in step 1 comprises the following steps:
transaction flow data preprocessing step: removing, from the transaction flow, transactions whose amounts fall within a predetermined amount range;
credit audit data preprocessing step: according to the life cycle of the customer's credit, removing all data records of customers within a predetermined date range;
overdue-days data preprocessing step: according to each customer's degree of delinquency, removing all data records of customers within a predetermined overdue-days range, and forming default features from the number of overdue days.
Preferably, step 2 comprises the following steps:
step 2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indicators, and grouping these indicators for each of multiple types of control charts;
step 2.2: standardizing the initial transaction flow characteristic indicators to obtain the standardized initial transaction flow indicators.
Preferably, step 3 comprises the following steps:
step 3.1: for the standardized initial transaction flow indicators, calculating the mean, upper limit and lower limit of the control chart corresponding to the indicators in each group;
step 3.2: formulating warning signals for each predetermined time period within the transaction flow monitoring period according to the means, upper limits and lower limits of the multiple types of control charts obtained in step 3.1.
Preferably, step 4 comprises the following steps:
step 4.1: for each type of control chart, counting the total of each warning signal over the transaction flow monitoring period, and converting the warning signals into signal features;
step 4.2: to make the abnormal state of the last predetermined time period easy to interpret, introducing a signal feature that flags an overall abnormal condition of the transaction flow in the last predetermined time period.
The invention provides a credit risk assessment system based on a control chart, which comprises the following modules:
module M1: collecting transaction flow data, credit audit data and overdue-days data of customers who have been granted loans, and preprocessing the data to obtain conventional features and default features;
module M2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indicators;
module M3: converting the standardized initial transaction flow indicators into warning signals;
module M4: processing the warning signals into signal features;
module M5: integrating the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue-days data to obtain several types of risk control evaluation samples;
module M6: building a machine learning model for each type of risk control evaluation sample, evaluating the model results across the different samples, and selecting the optimal risk control model;
module M7: based on the optimal risk control model, building an online machine learning model for the credit platform, assessing the risk of loan applicants in real time, and outputting the risk assessment result.
Preferably, the preprocessing in module M1 is performed by the following modules:
transaction flow data preprocessing module: removing, from the transaction flow, transactions whose amounts fall within a predetermined amount range;
credit audit data preprocessing module: according to the life cycle of the customer's credit, removing all data records of customers within a predetermined date range;
overdue-days data preprocessing module: according to each customer's degree of delinquency, removing all data records of customers within a predetermined overdue-days range, and forming default features from the number of overdue days.
Preferably, module M2 includes the following modules:
module M2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indicators, and grouping these indicators for each of multiple types of control charts;
module M2.2: standardizing the initial transaction flow characteristic indicators to obtain the standardized initial transaction flow indicators.
Preferably, module M3 includes the following modules:
module M3.1: for the standardized initial transaction flow indicators, calculating the mean, upper limit and lower limit of the control chart corresponding to the indicators in each group;
module M3.2: formulating warning signals for each predetermined time period within the transaction flow monitoring period according to the means, upper limits and lower limits of the multiple types of control charts obtained by module M3.1.
Preferably, module M4 includes the following modules:
module M4.1: for each type of control chart, counting the total of each warning signal over the transaction flow monitoring period, and converting the warning signals into signal features;
module M4.2: to make the abnormal state of the last predetermined time period easy to interpret, introducing a signal feature that flags an overall abnormal condition of the transaction flow in the last predetermined time period.
Compared with the prior art, the invention has the following beneficial effects:
1. The customer's transaction flow data are analyzed using the principles of three types of control charts: abnormal transaction flow is captured and analyzed to form warning signals, which are then converted into credit indicators for measuring risk. This fills a gap in the industry's use of transaction flow for risk assessment;
2. Signal features extracted from the transaction flow data source are added as input indicators of the credit risk control model. The results show that this improves the accuracy of credit risk assessment and thus its effectiveness;
3. The method can be applied to credit risk assessment in different scenarios, for example to support credit approval decisions in pre-loan review and to support customer management in credit management. It suits B-side small and micro enterprises, whose credit can be assessed from their operating transaction flow, as well as C-side consumers, whose credit can be assessed from their personal transaction flow. This strong applicability helps broaden the range of application of credit risk assessment;
4. When constructing features, the financial institution can collect the customer's own transaction flow information once the customer grants authorization, so the data source is easy to obtain and the method is easy to implement;
5. The transaction flow data are evaluated dynamically and in real time, so the financial institution can grasp the customer's actual risk situation and quickly take corresponding management measures.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is the I chart of the I-MR control charts;
FIG. 2 is the MR chart of the I-MR control charts;
FIG. 3 is the X̄ chart of the X̄-R control charts;
FIG. 4 is the R chart of the X̄-R control charts;
FIG. 5 is the X̄ chart of the X̄-S control charts;
FIG. 6 is the S chart of the X̄-S control charts.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the present invention but do not limit it in any way. It should be noted that those skilled in the art can make variations and improvements without departing from the concept of the invention; all of these fall within the scope of protection of the present invention.
The embodiment of the invention discloses a credit risk assessment method and a credit risk assessment system based on a control chart, which comprises the following steps:
Step 1: collect the transaction flow data of customers who have been granted loans, together with other available credit audit data, from the credit platform database. The time window of the transaction flow data is the 30 days before the loan application. For example, if a customer applies for a loan on April 1, 2021, the financial institution collects the transaction flow generated by the customer from March 2, 2021 to March 31, 2021. At the same time, collect the overdue-days data of the customers who have been granted loans, and preprocess all of these data.
The preprocessing in step 1 comprises the following steps. Transaction flow data preprocessing: for each transaction flow, remove the transactions whose amounts fall within the predetermined amount range, namely less than 0.1 yuan. Credit audit data preprocessing: based on the life cycle of the customer's credit, remove all data records of customers with granted loans that fall within the predetermined date range, namely records for which fewer than 30 days have passed since the first repayment date. Overdue-days data preprocessing: according to each customer's degree of delinquency, remove all data records of customers within the predetermined overdue-days range, namely more than 0 and fewer than 30 overdue days. The default feature is then formed from the number of overdue days: a customer with 0 overdue days has a default feature value of 0, and a customer with more than 0 overdue days has a default feature value of 1.
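The amount filter and default labelling of step 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout (a dict with a hypothetical `amount` key in yuan) and the function names are invented for the example, and the credit audit date filter is omitted because its exact cut-off depends on how the credit life cycle is recorded.

```python
MIN_AMOUNT_YUAN = 0.1  # transactions below 0.1 yuan are removed (from the text)


def filter_transactions(transactions):
    """Keep only transactions of at least 0.1 yuan.

    Each transaction is assumed to be a dict with a hypothetical "amount" key.
    """
    return [tx for tx in transactions if tx["amount"] >= MIN_AMOUNT_YUAN]


def default_label(overdue_days):
    """Map a customer's overdue days to the default feature.

    Returns 0 for no delinquency, None when the record falls in the removed
    range (more than 0 and fewer than 30 days), and 1 otherwise, i.e. for
    30 or more overdue days once the ambiguous range has been discarded.
    """
    if overdue_days < 0:
        raise ValueError("overdue_days must be non-negative")
    if overdue_days == 0:
        return 0
    if overdue_days < 30:
        return None  # record is discarded during preprocessing
    return 1
```

Dropping the 1 to 29 day range leaves a clean separation between fully performing customers (label 0) and customers at least 30 days overdue (label 1).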
Step 2: aggregate and standardize the preprocessed transaction flow data to obtain the standardized initial transaction flow indicators.
Step 2 comprises the following steps. Step 2.1: analyze the transaction flow data in terms of transaction time, transaction amount, transaction type and card type, where the transaction types include purchase, pre-authorization and refund, and the card types include credit card, debit card and quasi-credit card. Using the day as the statistical unit, aggregate the transaction flow data into the initial transaction flow characteristic indicators shown in Table 1 (for example, transaction amount, number of transactions and number of terminals used), and group these indicators for each of three types of control charts: the I-MR control chart, the X̄-R control chart and the X̄-S control chart. For example, in the I-MR control chart, transaction amount forms group 1, number of transactions forms group 2 and number of terminals used forms group 3; in the X̄-R control chart, the three indicators transaction amount, number of transactions and number of terminals used together form group 1; in the X̄-S control chart, all initial transaction flow characteristic indicators together form group 1. The control chart parameters are set according to the number of indicators in each group of the three types of control charts. Table 1 details the initial transaction flow characteristic indicators and their descriptions, the group to which each indicator belongs in each of the three types of control charts, and the corresponding parameter settings.
Table 1: Initial transaction flow characteristic indicators (table image not reproduced)
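A sketch of the daily aggregation in step 2.1, assuming each transaction is a dict with hypothetical keys `date`, `amount` and `terminal_id`. Only three of the Table 1 indicators (transaction amount, number of transactions and number of terminals used) are shown; days without transactions are filled with 0, in line with the business-meaning fill for missing transaction amounts described in step 6.1. The window is the 30 days before the application date, so an application on April 1, 2021 covers March 2 to March 31, 2021.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_indicators(transactions, application_date, window=30):
    """Aggregate raw transactions into daily indicator series of length `window`.

    Only transactions dated in [application_date - window days, application_date)
    are used; the returned lists are ordered from the first to the last day.
    """
    start = application_date - timedelta(days=window)
    days = [start + timedelta(days=k) for k in range(window)]
    amount = defaultdict(float)
    count = defaultdict(int)
    terminals = defaultdict(set)
    for tx in transactions:
        d = tx["date"]
        if start <= d < application_date:
            amount[d] += tx["amount"]
            count[d] += 1
            terminals[d].add(tx["terminal_id"])
    # Days with no transactions default to 0 amount, 0 count, 0 terminals.
    return {
        "transaction_amount": [amount[d] for d in days],
        "transaction_count": [count[d] for d in days],
        "terminal_count": [len(terminals[d]) for d in days],
    }
```

Usage: `daily_indicators(txs, date(2021, 4, 1))` returns three 30-element series, one value per day from March 2 to March 31, 2021.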
Step 2.2: standardize each initial transaction flow characteristic indicator to zero mean and unit variance to obtain the standardized indicators. Let the transaction flow monitoring window have length T. The i-th initial transaction flow characteristic indicator consists of its daily values, $X_i = \{x_{1i}, x_{2i}, \ldots, x_{ti}, \ldots, x_{Ti}\}$, where $x_{ti}$ is its value on day t and $x_{Ti}$ its value on the last day. The standardized indicator is denoted $X_i' = \{x'_{1i}, x'_{2i}, \ldots, x'_{ti}, \ldots, x'_{Ti}\}$, and its value on day t is computed as

$$x'_{ti} = \frac{x_{ti} - \bar{x}_i}{\sigma_i}, \qquad t = 1, 2, \ldots, T,$$

where $\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{ti}$ is the mean of $X_i$ over the monitoring window and $\sigma_i$ is the standard deviation of $X_i$ over the monitoring window. Based on the business scenario, T = 30; T also denotes the current time, i.e., the point at which the customer submits the loan application.
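The standardization of step 2.2 is a per-indicator z-score over the monitoring window. The sketch below assumes the population standard deviation (denominator T) and, as a hypothetical choice not specified in the text, maps a constant series to all zeros to avoid dividing by zero.

```python
import statistics


def standardize(values):
    """Z-score one indicator's daily values over the monitoring window."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD, denominator T (assumption)
    if sd == 0:
        # Hypothetical handling: a constant indicator carries no variation.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

After this step every indicator has mean 0 and standard deviation 1 over the window, so indicators measured in yuan, counts and terminals can share one control chart group.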
Step 3: convert the standardized initial transaction flow indicators into warning signals using the principles of the three types of control charts.
Step 3 comprises the following steps. Step 3.1: for each group of standardized initial transaction flow indicators defined for the three types of control charts in Table 1, calculate the mean (center line), upper limit and lower limit of each control chart. Each type of control chart consists of two charts: the I-MR control chart comprises an I chart and an MR chart; the X̄-R control chart comprises an X̄ chart and an R chart; and the X̄-S control chart comprises an X̄ chart and an S chart. $CL_x$, $UCL_x$ and $LCL_x$ denote the mean, upper limit and lower limit of the first chart, and $CL_s$, $UCL_s$ and $LCL_s$ denote those of the second chart; T is the length of the transaction flow monitoring window (T = 30). The calculation of the mean, upper limit and lower limit for each type of control chart is described in detail below.
As shown in FIG. 1 and FIG. 2, for the I-MR control chart each group contains one standardized initial transaction flow indicator, as listed in Table 1. As defined in step 2.2, $x'_{ti}$ denotes the value of the i-th standardized indicator in the group on day t. Because each I-MR group contains only one indicator, i is always 1; for simplicity the subscript i is omitted, and $x'_t$ denotes the value of the group's standardized indicator on day t. First calculate the mean

$$\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x'_t,$$

then the moving range $MR_t$ between day t and the previous day, and the average moving range $\overline{MR}$:

$$MR_t = \left| x'_t - x'_{t-1} \right|, \quad t = 2, \ldots, T; \qquad \overline{MR} = \frac{1}{T-1}\sum_{t=2}^{T} MR_t.$$

From $\bar{x}$, $\overline{MR}$ and the I-MR control chart parameters $d_2$, $D_3$ and $D_4$ given in Table 1, the mean, upper limit and lower limit of the I-MR control chart are:

$$CL_x = \bar{x}, \quad UCL_x = \bar{x} + \frac{3\,\overline{MR}}{d_2}, \quad LCL_x = \bar{x} - \frac{3\,\overline{MR}}{d_2};$$

$$CL_s = \overline{MR}, \quad UCL_s = D_4\,\overline{MR}, \quad LCL_s = D_3\,\overline{MR}.$$
Taking the transaction flow monitoring time t as the abscissa (t = 1, 2, …, T), the I chart plots $x'_t$ together with its mean $CL_x$, upper limit $UCL_x$ and lower limit $LCL_x$, and the MR chart plots $MR_t$ together with its mean $CL_s$, upper limit $UCL_s$ and lower limit $LCL_s$. FIG. 1 takes $X_1$ in Table 1 (transaction amount) as an example: it plots the standardized transaction amount over the 30 days before a customer's loan application, together with the mean, upper limit and lower limit calculated for the I chart.
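A sketch of the I-MR limits for one group. Because Table 1 is not reproduced, the code assumes the standard SPC constants for a moving range of two points (d2 = 1.128, D3 = 0, D4 = 3.267); each chart is returned as a (center line, upper limit, lower limit) tuple.

```python
import statistics

# Standard SPC constants for subgroup size 2 (assumed to match Table 1).
D2_N2 = 1.128
D3_N2 = 0.0
D4_N2 = 3.267


def imr_limits(x, d2=D2_N2, d3=D3_N2, d4=D4_N2):
    """Return (I chart limits, MR chart limits, moving ranges) for series x.

    x holds the standardized daily values x'_1 .. x'_T of one indicator.
    """
    center = statistics.fmean(x)
    # MR_t = |x'_t - x'_(t-1)| for t = 2..T, so there are T - 1 ranges.
    mr = [abs(b - a) for a, b in zip(x, x[1:])]
    mr_bar = statistics.fmean(mr)
    half_width = 3 * mr_bar / d2
    i_chart = (center, center + half_width, center - half_width)
    mr_chart = (mr_bar, d4 * mr_bar, d3 * mr_bar)
    return i_chart, mr_chart, mr
```

Since the series is already standardized, the I chart center line is 0 up to rounding; the limits are therefore driven entirely by the day-to-day variability captured in the average moving range.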
As shown in FIG. 3 and FIG. 4, for the X̄-R control chart each group contains 2 to 4 standardized initial transaction flow indicators, as listed in Table 1. As defined in step 2.2, $x'_{ti}$ denotes the value of the i-th standardized indicator in the group on day t. First, summarize the standardized indicators in each group by monitoring day t to obtain the within-group mean $\bar{x}_t$ and the within-group range $R_t$; then calculate their averages over the monitoring window, denoted $\bar{\bar{x}}$ and $\bar{R}$ respectively:

$$\bar{x}_t = \frac{1}{N}\sum_{i=1}^{N} x'_{ti}, \quad t = 1, 2, \ldots, T;$$

$$R_t = \max\{x'_{t1}, x'_{t2}, \ldots, x'_{tN}\} - \min\{x'_{t1}, x'_{t2}, \ldots, x'_{tN}\}, \quad t = 1, 2, \ldots, T;$$

$$\bar{\bar{x}} = \frac{1}{T}\sum_{t=1}^{T} \bar{x}_t, \qquad \bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t,$$

where N is the number of standardized indicators in the group (2, 3 or 4; see Table 1 for the detailed grouping). From $\bar{\bar{x}}$, $\bar{R}$ and the X̄-R control chart parameters $A_2$, $D_3$ and $D_4$ given in Table 1, the mean, upper limit and lower limit of the X̄-R control chart are:

$$CL_x = \bar{\bar{x}}, \quad UCL_x = \bar{\bar{x}} + A_2\,\bar{R}, \quad LCL_x = \bar{\bar{x}} - A_2\,\bar{R};$$

$$CL_s = \bar{R}, \quad UCL_s = D_4\,\bar{R}, \quad LCL_s = D_3\,\bar{R}.$$

Taking the transaction flow monitoring time t as the abscissa (t = 1, 2, …, T), the X̄ chart plots $\bar{x}_t$ with its mean, upper limit and lower limit, and the R chart plots $R_t$ with its mean, upper limit and lower limit.
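A sketch of the X̄-R computation for one group of N = 2 to 4 standardized indicators. The A2, D3 and D4 values below are textbook constants for those subgroup sizes and are assumed to correspond to the Table 1 settings; `group` is a hypothetical list of N daily series of equal length T.

```python
import statistics

# Textbook X-bar/R constants (A2, D3, D4) by subgroup size N.
XBAR_R_CONSTANTS = {
    2: (1.880, 0.0, 3.267),
    3: (1.023, 0.0, 2.574),
    4: (0.729, 0.0, 2.282),
}


def xbar_r_limits(group):
    """Return (X-bar chart limits, R chart limits, daily means, daily ranges).

    group: list of N standardized indicator series, each of length T.
    """
    n = len(group)
    a2, d3, d4 = XBAR_R_CONSTANTS[n]
    days = list(zip(*group))  # one tuple of N values per monitoring day
    xbar_t = [statistics.fmean(day) for day in days]
    r_t = [max(day) - min(day) for day in days]
    xbarbar = statistics.fmean(xbar_t)
    r_bar = statistics.fmean(r_t)
    xbar_chart = (xbarbar, xbarbar + a2 * r_bar, xbarbar - a2 * r_bar)
    r_chart = (r_bar, d4 * r_bar, d3 * r_bar)
    return xbar_chart, r_chart, xbar_t, r_t
```

Transposing the group with `zip(*group)` turns N series into T daily subgroups, which is exactly the "summarize by monitoring day t" step in the text.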
As shown in FIG. 5 and FIG. 6, for the X̄-S control chart there is only one group, which contains all 23 standardized initial transaction flow indicators listed in Table 1. As defined in step 2.2, $x'_{ti}$ denotes the value of the i-th standardized indicator in the group on day t. The X̄ chart of the X̄-S control chart uses the within-group mean $\bar{x}_t$ and its average $\bar{\bar{x}}$, computed with the same logic as in the X̄-R control chart. From $x'_{ti}$ and $\bar{x}_t$, calculate the within-group standard deviation $s_t$ on monitoring day t, and then its average $\bar{s}$:

$$s_t = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x'_{ti} - \bar{x}_t\right)^2}, \quad t = 1, 2, \ldots, T; \qquad \bar{s} = \frac{1}{T}\sum_{t=1}^{T} s_t,$$

where N is the number of standardized indicators in the group, namely 23. From $\bar{\bar{x}}$, $\bar{s}$ and the X̄-S control chart parameters $A_3$, $B_3$ and $B_4$ given in Table 1, the mean, upper limit and lower limit of the X̄-S control chart are:

$$CL_x = \bar{\bar{x}}, \quad UCL_x = \bar{\bar{x}} + A_3\,\bar{s}, \quad LCL_x = \bar{\bar{x}} - A_3\,\bar{s};$$

$$CL_s = \bar{s}, \quad UCL_s = B_4\,\bar{s}, \quad LCL_s = B_3\,\bar{s}.$$

Taking the transaction flow monitoring time t as the abscissa (t = 1, 2, …, T), the X̄ chart plots $\bar{x}_t$ with its mean, upper limit and lower limit, and the S chart plots $s_t$ with its mean, upper limit and lower limit.
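For the single 23-indicator group, the X̄-S constants can be derived from the unbiasing constant c4 rather than looked up. The sketch assumes the standard definitions A3 = 3/(c4·√N), B3 = max(0, 1 − 3√(1 − c4²)/c4) and B4 = 1 + 3√(1 − c4²)/c4, and the sample standard deviation (denominator N − 1) for s_t; Table 1's actual values are not reproduced, so treat these as assumptions.

```python
import math
import statistics


def c4(n):
    """Unbiasing constant for the sample standard deviation of n points."""
    return math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def xbar_s_constants(n):
    """Return the textbook (A3, B3, B4) constants for subgroup size n."""
    c = c4(n)
    spread = 3 * math.sqrt(1 - c * c) / c
    return 3 / (c * math.sqrt(n)), max(0.0, 1 - spread), 1 + spread


def xbar_s_limits(group):
    """Return (X-bar chart limits, S chart limits, daily means, daily SDs)."""
    a3, b3, b4 = xbar_s_constants(len(group))
    days = list(zip(*group))  # one tuple of N values per monitoring day
    xbar_t = [statistics.fmean(day) for day in days]
    s_t = [statistics.stdev(day) for day in days]  # denominator N - 1
    xbarbar = statistics.fmean(xbar_t)
    s_bar = statistics.fmean(s_t)
    xbar_chart = (xbarbar, xbarbar + a3 * s_bar, xbarbar - a3 * s_bar)
    s_chart = (s_bar, b4 * s_bar, b3 * s_bar)
    return xbar_chart, s_chart, xbar_t, s_t
```

For N = 23 the computed constants come out at roughly A3 ≈ 0.633, B3 ≈ 0.545 and B4 ≈ 1.455, matching common published tables.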
Step 3.2: according to the means, upper limits and lower limits of the control charts calculated in step 3.1, set a warning signal for each predetermined time period within the transaction flow monitoring period. Here the predetermined time period is one monitoring day, and each warning signal is binary: "1" indicates that the transaction flow on that monitoring day changed significantly, and "0" indicates no anomaly. For each chart, every monitoring day carries three warning signals, which record whether the following three conditions occur on that day: (1) the value on the monitoring day falls outside the upper or lower limit of the chart; (2) over the most recent 8 days, the values of the 8 consecutive monitoring days all lie on the same side of the mean; (3) over the most recent 6 days, the values of the 6 consecutive monitoring days rise continuously or fall continuously. A signal takes the value 1 if its condition occurs, and 0 otherwise.
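The three warning rules of step 3.2 can be evaluated per chart as in the sketch below. The interpretation choices are assumptions: a value exactly on the center line breaks a same-side run, "rise or fall continuously" means each of the 6 values is strictly greater (or strictly smaller) than the previous one, and rules (2) and (3) fire only once a full 8-day or 6-day window is available.

```python
def warning_signals(values, cl, ucl, lcl):
    """Return one (rule1, rule2, rule3) tuple of 0/1 flags per monitoring day.

    values: the plotted statistic of one chart (e.g. x'_t, MR_t, R_t or s_t).
    cl, ucl, lcl: that chart's center line, upper limit and lower limit.
    """
    signals = []
    for t, v in enumerate(values):
        # Rule 1: today's value falls outside the control limits.
        r1 = int(v > ucl or v < lcl)
        # Rule 2: the last 8 values all lie strictly on one side of the mean.
        r2 = 0
        if t >= 7:
            window = values[t - 7 : t + 1]
            r2 = int(all(w > cl for w in window) or all(w < cl for w in window))
        # Rule 3: the last 6 values rise strictly or fall strictly.
        r3 = 0
        if t >= 5:
            window = values[t - 5 : t + 1]
            diffs = [b - a for a, b in zip(window, window[1:])]
            r3 = int(all(d > 0 for d in diffs) or all(d < 0 for d in diffs))
        signals.append((r1, r2, r3))
    return signals
```

Rules (2) and (3) resemble the run and trend tests commonly used alongside Shewhart limits; they catch gradual shifts in transaction flow that never cross a control limit.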
Step 4: process the warning signals into signal features.
Step 4 comprises the following steps. Step 4.1: for each type of control chart, count the number of transaction flow monitoring days on which each of the three abnormal conditions occurs during the monitoring period; that is, sum each warning signal of each chart over the transaction flow monitoring period, and convert the warning signals into signal features.
Step 4.2: to make the abnormal state of the last monitoring day T easy to interpret, introduce a signal feature that flags an overall abnormal condition of the transaction flow on the last monitoring day. Its value is the union of the six warning signals generated on that day: if any chart shows any of the three conditions of step 3.2 on that monitoring day, the value is "1"; otherwise it is "0". Combined with the six signal features from step 4.1, each group of each type of control chart therefore yields seven signal features in total. Taking the I-MR control chart as an example, its signal features are: (1) the number of days in the I chart that exceed the upper or lower limit; (2) the number of days in the I chart on which 8 consecutive monitoring days lie on the same side of the mean; (3) the number of days in the I chart on which 6 consecutive monitoring days rise or fall continuously; (4) the number of days in the MR chart that exceed the upper or lower limit; (5) the number of days in the MR chart on which 8 consecutive monitoring days lie on the same side of the mean; (6) the number of days in the MR chart on which 6 consecutive monitoring days rise or fall continuously; and (7) whether the transaction flow on the last monitoring day is abnormal.
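Step 4 reduces each group's two charts to seven numbers. In this sketch (the function name is hypothetical), each input is the list of daily (rule 1, rule 2, rule 3) tuples for one chart of the group; the first six features are the per-rule counts for the two charts and the seventh is the union of the six signals on the last monitoring day.

```python
def signal_features(first_chart_signals, second_chart_signals):
    """Return the 7 signal features of one control chart group.

    Each argument is a list of (rule1, rule2, rule3) tuples, one per day,
    for the first chart (I or X-bar) and second chart (MR, R or S).
    """
    features = []
    for chart in (first_chart_signals, second_chart_signals):
        for rule in range(3):
            # Number of monitoring days on which this rule fired.
            features.append(sum(day[rule] for day in chart))
    # Union of the six warning signals on the last monitoring day T.
    last_day = tuple(first_chart_signals[-1]) + tuple(second_chart_signals[-1])
    features.append(int(any(last_day)))
    return features
```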
Step 5: Integrate the signal features, the conventional features extracted from the credit audit data and the default features extracted from the overdue days data to obtain three types of risk-control evaluation samples.
Step 5 comprises the following steps.
Step 5.1: Integrate the 23 groups of signal features (23 × 7 in total) generated by the I-MR control chart with the conventional features extracted from the credit audit data and the default features extracted from the overdue days data to obtain the I-MR risk-control evaluation sample.
Step 5.2: Integrate the 8 groups of signal features (8 × 7 in total) generated by the [control chart type rendered as an image in the original] control chart with the conventional features extracted from the credit audit data and the default features extracted from the overdue days data to obtain the corresponding risk-control evaluation sample.
Step 5.3: Integrate the 1 group of signal features (1 × 7 in total) generated by the [control chart type rendered as an image in the original] control chart with the conventional features extracted from the credit audit data and the default features extracted from the overdue days data to obtain the corresponding risk-control evaluation sample.
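Step 5 amounts to joining, per customer, the signal features with the conventional and default features. A minimal sketch for the I-MR case with 23 monitored groups; the column names (`SIGNAL_NAMES`, the `group__signal` naming scheme and the `default` label column) and the function name `build_sample` are illustrative assumptions, not taken from the source.

```python
from typing import Dict, List

# Hypothetical names for the seven signal features of one I-MR group (step 4.2).
SIGNAL_NAMES = ["i_beyond", "i_run8", "i_trend6",
                "mr_beyond", "mr_run8", "mr_trend6", "last_day_abnormal"]

def build_sample(signals: Dict[str, List[int]],
                 conventional: Dict[str, float],
                 default_flag: int) -> Dict[str, float]:
    """Join one customer's signal, conventional and default features into one row."""
    row: Dict[str, float] = {}
    for group_name, feats in signals.items():
        if len(feats) != len(SIGNAL_NAMES):
            raise ValueError(f"{group_name}: expected 7 signal features, got {len(feats)}")
        for name, value in zip(SIGNAL_NAMES, feats):
            row[f"{group_name}__{name}"] = value
    row.update(conventional)      # features extracted from credit audit data
    row["default"] = default_flag  # default feature derived from overdue days
    return row
```

With 23 groups this yields 23 × 7 = 161 signal columns per customer, plus the conventional features and the label.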
Step 6: For each type of risk-control evaluation sample, perform sample preprocessing and independent variable selection separately, build a machine learning model, evaluate the model results across the different risk-control evaluation samples, and select the optimal risk-control model. The machine learning model is a logistic regression model.
Step 6 comprises the following steps. Step 6.1: Handle missing values in the samples. Check the missing values of each feature. Features that can be filled according to their business meaning are filled accordingly (for example, if the transaction amount of the transaction flow on a given day is empty, it can be filled with 0). Features that cannot be filled according to business meaning and have too high a missing proportion are deleted. For features with a small missing proportion, check the data type: if the feature is a categorical variable, the missing values are grouped into a separate category; if it is a numerical variable, the missing values are filled with the feature mean.
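The missing-value rules of step 6.1 can be sketched as below. The source gives no numeric cut-off for "too high a missing proportion", so `max_missing_ratio = 0.5` is an assumed placeholder. Rows are assumed to be dicts sharing the same keys, with `None` marking a missing value, and a feature is treated as categorical if any present value is a string; the function name `handle_missing` is hypothetical.

```python
from statistics import mean
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]

def handle_missing(rows: List[Row], business_fill: Dict[str, Any],
                   max_missing_ratio: float = 0.5) -> List[Row]:
    """Fill or drop missing values following the rules of step 6.1."""
    out = [dict(r) for r in rows]  # work on copies, leave the input untouched
    n = len(rows)
    for col in list(rows[0].keys()):
        values = [r[col] for r in rows]
        missing = sum(v is None for v in values)
        if missing == 0:
            continue
        if col in business_fill:
            fill = business_fill[col]           # filled by business meaning
        elif missing / n > max_missing_ratio:
            for r in out:                        # too sparse: delete the feature
                del r[col]
            continue
        else:
            present = [v for v in values if v is not None]
            if any(isinstance(v, str) for v in present):
                fill = "MISSING"                 # categorical: own category
            else:
                fill = mean(present)             # numerical: feature mean
        for r in out:
            if r[col] is None:
                r[col] = fill
    return out
```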
Step 6.2: Process the categorical variables by converting them into 0-1 values using dummy variables.
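A minimal dummy-variable encoding for step 6.2. As an assumed design choice not stated in the source, the first level in sorted order is dropped as the reference category, which avoids perfect collinearity with the intercept of the logistic regression in step 6.6; the function name `dummy_encode` is hypothetical.

```python
from typing import Any, Dict, List

def dummy_encode(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Replace a categorical column by 0-1 dummy columns, dropping one reference level."""
    levels = sorted({r[column] for r in rows})
    others = levels[1:]  # levels[0] serves as the reference category
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for level in others:
            new_row[f"{column}={level}"] = int(r[column] == level)
        encoded.append(new_row)
    return encoded
```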
Step 6.3: Group the independent variables based on principal component analysis, as follows.
Perform principal component analysis on all independent variables, select the two most significant components (the first and second principal components), and divide the variables into two groups, A and B, according to the relative size of each variable's correlation coefficients with these two components. If a variable's correlation coefficient with the first principal component is larger than that with the second principal component, it is assigned to group A; otherwise it is assigned to group B. Each group is then split into two again by principal component analysis, and the splitting continues until one of the following conditions is met:
1) the group contains only one independent variable;
2) the R-squared ratio R²-ratio(x) of an independent variable x has decreased by more than half compared with the result of the previous iteration, where R²-ratio(x) is calculated as
[Formula for R²-ratio(x); rendered as an image in the original]
Here, at the end of the current iteration all independent variables have been divided into n groups; [symbol rendered as an image in the original] denotes the coefficient of determination obtained by linearly regressing x on all independent variables of the i-th group, and [symbol rendered as an image in the original] denotes the coefficient of determination obtained by linearly regressing x on the m-th group of independent variables other than x.
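One split of step 6.3 can be sketched as follows. This is a simplified illustration under several assumptions: the principal components are taken from the correlation matrix and computed by power iteration with deflation (chosen only to keep the sketch dependency-free), "larger correlation coefficient" is read as larger in absolute value, each variable is assumed non-constant, and the stopping rules (in particular the R²-ratio rule, whose formula is not reproduced in the text) are not implemented, so only a single A/B split is shown. For standardized variables, the correlation between variable j and principal component k equals v_k[j] × √λ_k, which is what the code compares. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]

def correlation_matrix(columns: List[List[float]]) -> Matrix:
    """Pearson correlation matrix; `columns` holds one list of values per variable."""
    def standardize(col: List[float]) -> List[float]:
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / len(col))  # assumes sd > 0
        return [(x - mu) / sd for x in col]
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def top_eigenpair(m: Matrix, iters: int = 1000, seed: int = 0) -> Tuple[float, List[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]
    lam = 0.0
    for _ in range(iters):
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return lam, v

def split_by_two_components(columns: List[List[float]]) -> Tuple[List[int], List[int]]:
    """Assign each variable to group A (closer to PC1) or group B (closer to PC2)."""
    corr = correlation_matrix(columns)
    lam1, v1 = top_eigenpair(corr)
    # Deflate: remove PC1 so that power iteration finds PC2 next.
    deflated = [[corr[i][j] - lam1 * v1[i] * v1[j] for j in range(len(v1))]
                for i in range(len(v1))]
    lam2, v2 = top_eigenpair(deflated)
    group_a, group_b = [], []
    for j in range(len(columns)):
        # For standardized data, corr(variable j, PC k) = v_k[j] * sqrt(lambda_k).
        r1 = abs(v1[j]) * math.sqrt(lam1)
        r2 = abs(v2[j]) * math.sqrt(max(lam2, 0.0))
        (group_a if r1 > r2 else group_b).append(j)
    return group_a, group_b
```

In a full implementation, each returned group would be passed back to `split_by_two_components` until one of the two stopping conditions holds.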
Step 6.4: Select the independent variables based on information value, as follows.
Calculate the information value (IV) of each independent variable and delete the independent variables whose information value is greater than 0.5. Then screen the independent variables within each group obtained in step 6.3, keeping at least one independent variable per group. The number of variables retained in a group is determined by the ratio of the group's information value to the total information value of all groups: if group i contains n_i independent variables with total information value M_i, and the total information value of all independent variables is M, then the n_i × M_i / M variables with the highest information values in group i are selected, rounding up. The information value is calculated as follows.
For a single independent variable, divide it into K bins according to its values, and calculate the weight of evidence (WOE) of bin i as
[Formula for WOE_i; rendered as an image in the original]
where %default_i is the proportion of all default samples that fall in bin i, and %paid_i is the proportion of all normal repayment samples that fall in bin i. The information value IV of the independent variable is then calculated as
[Formula for IV; rendered as an image in the original]
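The information-value screening of step 6.4 can be sketched as below. The WOE and IV formulas appear only as images in the source, so the code uses the commonly used definitions WOE_i = ln(%default_i / %paid_i) and IV = Σ (%default_i - %paid_i) × WOE_i as an assumption. The `eps` floor for empty bins is also an assumption, and the per-group quota n_i × M_i / M is computed after the variables with IV > 0.5 have been removed. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def information_value(bins: Sequence[int], labels: Sequence[int], eps: float = 1e-6) -> float:
    """IV of a binned variable; labels use 1 = default, 0 = normal repayment."""
    total_default = sum(labels)
    total_paid = len(labels) - total_default
    iv = 0.0
    for b in set(bins):
        n_default = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        n_paid = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        pct_default = max(n_default / total_default, eps)  # %default_i
        pct_paid = max(n_paid / total_paid, eps)           # %paid_i
        woe = math.log(pct_default / pct_paid)             # assumed WOE orientation
        iv += (pct_default - pct_paid) * woe
    return iv

def select_by_iv(groups: List[List[str]], iv: Dict[str, float], max_iv: float = 0.5) -> List[str]:
    """Drop IV > max_iv, then keep the top ceil(n_i * M_i / M) variables of each group."""
    kept = {v: iv[v] for g in groups for v in g if iv[v] <= max_iv}
    total = sum(kept.values())  # M
    selected: List[str] = []
    for g in groups:
        ranked = sorted((v for v in g if v in kept), key=lambda v: kept[v], reverse=True)
        if not ranked:
            continue  # every variable of this group was removed by the IV rule
        n_i = len(ranked)
        m_i = sum(kept[v] for v in ranked)  # M_i
        share = n_i * m_i / total if total > 0 else 0.0
        # Round up (with a small tolerance against float noise), keeping at least one.
        quota = max(1, math.ceil(share - 1e-9))
        selected.extend(ranked[:quota])
    return selected
```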
Step 6.5: Split the samples into training samples and test samples by random sampling at a ratio of 6:4. At the same time, resample the training samples with replacement so that the ratio of non-default samples to default samples is 1:1.
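A sketch of step 6.5, interpreting the repeated sampling as random oversampling of the minority class with replacement, applied to the training set only. The function name `split_and_balance`, the `default` label key and the fixed seed are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, object]

def split_and_balance(samples: Sequence[Sample], label_key: str = "default",
                      train_ratio: float = 0.6, seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """6:4 random train/test split, then oversample the training set to a 1:1 class ratio."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]

    defaults = [s for s in train if s[label_key] == 1]
    normals = [s for s in train if s[label_key] == 0]
    minority, majority = sorted((defaults, normals), key=len)
    if minority:
        # Draw minority samples with replacement until both classes have equal counts.
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        train = majority + minority + extra
        rng.shuffle(train)
    return train, test
```

Balancing only the training set keeps the test set at the real-world class ratio, so the confusion matrix of step 6.7 reflects the actual default rate.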
Step 6.6: For each type of sample that has been preprocessed and has undergone independent variable selection, build a machine learning model on the training samples. The default probability p(X) = Pr(Y = 1 | X) is expressed as a function of the independent variables X = (X_1, ..., X_n), with the default feature as the dependent variable Y:
$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}$$
where X_i is the i-th independent variable, β_i (i = 0, ..., n) are the regression coefficients, and Y_i is the i-th observation of the dependent variable. The regression coefficients are estimated by maximum likelihood, i.e. by maximizing
[Likelihood function; rendered as an image in the original]
The resulting regression coefficients are substituted into the expression for the default probability p(X). The significance of the equation and of each independent variable is then tested, and the final variable combination and its regression coefficients are taken as the final machine learning model.
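The maximum-likelihood fit of step 6.6 can be illustrated with plain batch gradient ascent on the Bernoulli log-likelihood. This is a teaching sketch, not the solver the authors used; production code would typically rely on a statistics package, which would also supply the significance tests described above. The learning rate, epoch count and function names are assumptions.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Logistic function e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(beta: Sequence[float], x: Sequence[float]) -> float:
    """p(X) for one sample, with beta = [beta_0, beta_1, ..., beta_n]."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Maximize the Bernoulli log-likelihood by batch gradient ascent."""
    n_features = len(X[0])
    beta = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            residual = yi - predict_proba(beta, xi)  # d logL / d(linear predictor)
            grad[0] += residual
            for j, v in enumerate(xi):
                grad[j + 1] += residual * v
        # Average the gradient over samples and take an ascent step.
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta
```

At the maximum of the likelihood with an intercept, the residuals y - p(X) sum to zero, which gives a quick check that the fit has converged.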
Step 6.7: Compute the predictions of the final machine learning model on the test samples and compare them with the actual default features to form a confusion matrix. Based on the confusion matrix, select evaluation metrics according to the business objective, evaluate the three machine learning models built on the three types of risk-control evaluation samples, and select the optimal risk-control model.
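Step 6.7 compares predictions with the actual default labels through a confusion matrix. A minimal sketch, assuming a 0.5 probability cut-off and a handful of common metrics; which metric drives model selection (for example, recall on defaulters when missed defaults are costly) is a business decision the source leaves open. The function name `confusion_metrics` is hypothetical.

```python
from typing import Dict, Sequence

def confusion_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix counts and common metrics, treating default (1) as positive."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual defaults caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}
```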
Step 7: Build an online machine learning model for the credit platform (for example, an online logistic regression model) from the optimal risk-control model, perform real-time risk assessment on applicant customers, and output the risk assessment results. Steps 1-6 are repeated periodically, importing new customers for training, to update the credit platform's online machine learning model.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description has described specific embodiments of the present invention. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (7)

1. A method for credit risk assessment based on control charts, comprising the steps of:
step 1: collecting transaction flow data, credit audit data and overdue days data of customers who have been issued loans, and preprocessing each type of data to obtain conventional characteristics and default characteristics;
step 2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indexes;
step 3: converting the standardized initial transaction flow indexes into warning signals;
step 4: processing the warning signals into signal characteristics;
step 5: integrating the signal characteristics, the conventional characteristics extracted from the credit audit data and the default characteristics extracted from the overdue days data to obtain multiple types of risk-control evaluation samples;
step 6: for the risk-control evaluation samples, building corresponding machine learning models, evaluating the machine learning model results across the different risk-control evaluation samples, and selecting the optimal risk-control model;
step 7: building an online machine learning model of the credit platform according to the optimal risk-control model, performing real-time risk assessment on applicant customers, and outputting a risk assessment result;
the step 2 comprises the following steps:
step 2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indexes, and grouping the initial transaction flow characteristic indexes according to multiple types of control charts;
step 2.2: standardizing the initial transaction flow characteristic indexes to obtain standardized initial transaction flow indexes;
the step 3 comprises the following steps:
step 3.1: for the standardized initial transaction flow indexes, calculating the mean, upper limit and lower limit of the control chart corresponding to the indexes in each group;
step 3.2: formulating a warning signal for a predetermined time period within the transaction flow monitoring period according to the means, upper limits and lower limits of the multiple types of control charts obtained in step 3.1;
the step 4 comprises the following steps:
step 4.1: respectively counting the sum of each warning signal in each type of control chart during transaction flow monitoring, and converting the warning signals into signal characteristics;
step 4.2: in order to facilitate the interpretation of the abnormal state of the last predetermined time period, a signal characteristic is introduced to identify the transaction flow general abnormal condition of the last predetermined time period.
2. The control chart-based credit risk assessment method according to claim 1, wherein said preprocessing in step 1 comprises the steps of:
transaction flow data preprocessing step: for the transaction flow, removing transactions within a predetermined transaction amount range;
credit audit data preprocessing step: removing all data records of customers within a predetermined date range according to the life cycle of the customer's credit;
overdue days data preprocessing step: removing all data records of customers within a predetermined overdue days range according to the customers' degree of overdue, and forming the default characteristics from the overdue days.
3. A control chart-based credit risk assessment system, to which the control chart-based credit risk assessment method according to any one of claims 1-2 is applied, comprising the following modules:
a module M1: collecting transaction flow data, credit audit data and overdue days data of customers who have been issued loans, and preprocessing each type of data to obtain conventional characteristics and default characteristics;
a module M2: aggregating and standardizing the preprocessed transaction flow data to obtain standardized initial transaction flow indexes;
a module M3: converting the standardized initial transaction flow indexes into warning signals;
a module M4: processing the warning signals into signal characteristics;
a module M5: integrating the signal characteristics, the conventional characteristics extracted from the credit audit data and the default characteristics extracted from the overdue days data to obtain multiple types of risk-control evaluation samples;
a module M6: for the risk-control evaluation samples, building corresponding machine learning models, evaluating the machine learning model results across the different risk-control evaluation samples, and selecting the optimal risk-control model;
a module M7: building an online machine learning model of the credit platform according to the optimal risk-control model, performing real-time risk assessment on applicant customers, and outputting a risk assessment result.
4. The control chart-based credit risk assessment system according to claim 3, wherein said preprocessing in module M1 comprises the following modules:
transaction flow data preprocessing module: for the transaction flow, removing transactions within a predetermined transaction amount range;
credit audit data preprocessing module: removing all data records of customers within a predetermined date range according to the life cycle of the customer's credit;
overdue days data preprocessing module: removing all data records of customers within a predetermined overdue days range according to the customers' degree of overdue, and forming the default characteristics from the overdue days.
5. The control chart-based credit risk assessment system according to claim 3, wherein said module M2 comprises the following modules:
module M2.1: aggregating the preprocessed transaction flow data into initial transaction flow characteristic indexes, and grouping the initial transaction flow characteristic indexes according to multiple types of control charts;
module M2.2: standardizing the initial transaction flow characteristic indexes to obtain the standardized initial transaction flow indexes.
6. The control chart-based credit risk assessment system according to claim 5, wherein said module M3 comprises the following modules:
module M3.1: for the standardized initial transaction flow indexes, calculating the mean, upper limit and lower limit of the control chart corresponding to the indexes in each group;
module M3.2: and formulating a warning signal of a preset time period in the transaction flow monitoring period according to the mean value, the upper limit and the lower limit of the various control charts obtained by the module M3.1.
7. The control chart-based credit risk assessment system according to claim 6, wherein said module M4 comprises the following modules:
module M4.1: respectively counting the sum of each warning signal in each control chart during transaction flow monitoring, and converting the warning signals into signal characteristics;
module M4.2: in order to facilitate the interpretation of the abnormal state of the last predetermined time period, a signal characteristic is introduced to identify the transaction flow general abnormal condition of the last predetermined time period.
CN202110584049.6A 2021-05-27 2021-05-27 Credit risk assessment method and system based on control chart Active CN113421154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584049.6A CN113421154B (en) 2021-05-27 2021-05-27 Credit risk assessment method and system based on control chart

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110584049.6A CN113421154B (en) 2021-05-27 2021-05-27 Credit risk assessment method and system based on control chart

Publications (2)

Publication Number Publication Date
CN113421154A CN113421154A (en) 2021-09-21
CN113421154B true CN113421154B (en) 2022-10-04

Family

ID=77713100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110584049.6A Active CN113421154B (en) 2021-05-27 2021-05-27 Credit risk assessment method and system based on control chart

Country Status (1)

Country Link
CN (1) CN113421154B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793060A (en) * 2021-09-27 2021-12-14 武汉众邦银行股份有限公司 Customer rating method and device based on customer transaction data and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104346749A (en) * 2013-08-07 2015-02-11 辅富投资(上海)有限公司 Pledge-based network borrowing process monitoring method
CN110738564A (en) * 2019-10-16 2020-01-31 信雅达系统工程股份有限公司 Post-loan risk assessment method and device and storage medium
CN111507831A (en) * 2020-05-29 2020-08-07 长安汽车金融有限公司 Credit risk automatic assessment method and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN101421756A (en) * 2006-02-10 2009-04-29 芝加哥气候交易公司 Present valuation of emission credit and allowance futures
US10977654B2 (en) * 2018-06-29 2021-04-13 Paypal, Inc. Machine learning engine for fraud detection during cross-location online transaction processing

Non-Patent Citations (1)

Title
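Research on credit risk management of the Baotou Branch of Bank of Communications based on quality management tools; 栗秋佳 (Li Qiujia); China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Economics and Management Sciences; 2012-02-15; Chapters 2-4 of the thesis *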

Also Published As

Publication number Publication date
CN113421154A (en) 2021-09-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant