CN116720946A - Credit risk prediction method, device and storage medium based on recurrent neural network - Google Patents

Info

Publication number
CN116720946A
Authority
CN
China
Legal status
Pending
Application number
CN202310537029.2A
Other languages
Chinese (zh)
Inventor
黄开胜
张有容
袁宏
Current Assignee
Yangtze Delta Region Institute of Tsinghua University Zhejiang
Original Assignee
Yangtze Delta Region Institute of Tsinghua University Zhejiang
Application filed by Yangtze Delta Region Institute of Tsinghua University Zhejiang
Priority to CN202310537029.2A
Publication of CN116720946A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The credit risk prediction method, device and storage medium based on a recurrent neural network provided by the disclosure comprise the following steps: for all historical operating data available for the object under test, traversing the reference month T, the input month span M and the predicted month span N over their value ranges to obtain a sample set, wherein each sample comprises the selected input month span, the predicted month span, the input set of the object under test and the corresponding risk rating; constructing a credit risk prediction model based on a recurrent neural network and training it with the sample set to learn the mapping between the input set and the risk rating under any input month span and predicted month span, the model outputting a prediction result; and determining the credit risk of the object under test with the trained credit risk prediction model. The disclosed method and device predict risk in advance, assign risk labels scientifically and accurately, fully account for the temporal order of the input data, and allow the input and predicted month spans to be adjusted flexibly.

Description

Credit risk prediction method, device and storage medium based on recurrent neural network
Technical Field
The disclosure belongs to the technical field of data processing, and particularly relates to a credit risk prediction method, a credit risk prediction device and a storage medium based on a recurrent neural network.
Background
Monitoring the credit condition of the object under test is particularly important in risk control scenarios. Predicting the likelihood that the object under test will encounter risk in the future from its operating data over a period of time helps the monitoring party screen in advance, take targeted measures, avoid risk and improve returns.
There are mainly two kinds of existing monitoring schemes: those based on expert experience and those based on decision trees or linear classifiers.
The monitoring scheme based on expert experience has the following problems:
1. Low efficiency: compiling large numbers of data reports is a cumbersome and error-prone workflow, and credit risk must be studied and judged manually, which costs time and labor.
2. Lagging evaluation: credit risk is usually evaluated with coincident indicators, so by the time core indicators (financial data, repayment, inventory, credit reports, etc.) become abnormal, the risk has already materialized and cannot be anticipated.
3. Strong subjectivity: a credit risk evaluation system built on expert experience depends heavily on that experience, cannot mine the latent relationships behind the data, and lacks objectivity.
Monitoring schemes based on decision trees or linear classifiers have the following problems:
1. Poor handling of time series: for the time-series operating data of the object under test, they cannot uncover latent risk by relating changes in the data across preceding and following months, nor can they weight data differently according to how recent it is.
2. Poor flexibility: the input and predicted month spans cannot be changed flexibly; extending the original month span requires a complete redesign of the algorithm structure.
Disclosure of Invention
The present disclosure is directed to solving at least one of the technical problems existing in the prior art.
Accordingly, the credit risk prediction method based on a recurrent neural network provided by embodiments of the first aspect of the present disclosure can predict, from the historical operating data of the object under test over a period of time, its likely credit risk several months ahead and assign a specific risk level. The method predicts risk in advance, labels risk scientifically and accurately, fully accounts for the temporal order of the input data, and allows the input and predicted month spans to be adjusted flexibly.
An embodiment of a first aspect of the present disclosure provides a credit risk prediction method based on a recurrent neural network, including:
setting any month as a reference month T; preprocessing the historical operating data of the object under test for the reference month, the M-1 months before it and the N months after it; forming an input set x from the preprocessed data of the reference month and the preceding M-1 months; manually evaluating the preprocessed data of the N months after the reference month to obtain the risk rating of the object under test for a predicted month span of N; traversing the reference month T, the input month span M and the predicted month span N over their value ranges to obtain a sample set, wherein each sample comprises the selected input month span, the predicted month span, the input set of the object under test and the corresponding risk rating; and dividing the sample set into a training set and a test set according to a set proportion;
constructing a credit risk prediction model based on a recurrent neural network and training it with the training set to determine the mapping between the input set x and the risk rating under any input month span M and predicted month span N, the model outputting a prediction result y; and testing the trained recurrent neural network on the test set to obtain a trained credit risk prediction model;
setting an input month span and a predicted month span as required by the monitoring party; acquiring, as the data to be predicted, the operating data of the object under test for the current month and the preceding months so that the chosen input month span is covered; preprocessing these data to obtain the input set of the object under test; and inputting this input set into the trained credit risk prediction model to obtain the prediction result y of the credit risk over the chosen predicted month span, thereby determining the credit risk of the object under test.
In some embodiments, preprocessing the operating data comprises, in sequence, integration, cleaning, completion, standardization and PCA dimensionality reduction.
In some embodiments, the standardization is Z-score standardization.
In some embodiments, the risk rating of the object under test for a predicted month span of N is obtained as follows:
the preprocessed historical operating data of the object under test for the N months after the reference month form a rating set x' = {x_n}, n = 1, 2, …, N, where x_n is the historical operating data of the object under test in the n-th month after the reference month;
each month's historical operating data x_n in the rating set x' is rated manually, and the monthly credit risk level is divided according to the rating result into three grades, low risk, medium risk and high risk, as follows:
several core indices are selected from the indices in the rating set x', and a stepwise threshold is set for each core index. For each month's historical operating data x_n in x', whenever a core index of x_n exceeds its corresponding stepwise threshold, the corresponding score is added to the total risk value of the n-th month after the reference month. If that total risk value falls in a first interval, the monthly credit risk level is recorded as low risk; if it falls in a second interval, as medium risk; and if it falls in a third interval, as high risk. The overall risk rating for the N months is the maximum of the monthly credit risk levels over the N months after the reference month.
In some embodiments, when the reference month T, the input month span M and the predicted month span N are traversed over their value ranges, T, M and N must satisfy the following three conditions simultaneously:
(1) the data acquisition window does not extend beyond the start or end year-month of all existing operating data of the object under test, i.e. T - M + 1 ≥ M_start and T + N ≤ M_end, where M_start is the start year-month and M_end is the end year-month of all existing operating data of the object under test;
(2) M ∈ [5, 13] and N ∈ [4, 12];
(3) the input month span M is greater than the predicted month span N, i.e. M > N.
In some embodiments, the recurrent neural network has 2 hidden layers of 128 neurons each and uses the Softmax activation function.
In some embodiments, the recurrent neural network is trained with the mean squared error as the loss function, and its weights U, W and V are updated by minimizing this loss. U is the input weight from the input to the hidden layer and processes the input set x fed into the network; V is the recurrent weight, applied in turn to the hidden layer's intermediate result for each month, in order of each month's distance from the reference month, with the result passed to the next recurrence until all months in the input set x have been processed; W is the output weight from the hidden layer to the output and processes the final result of the last month to produce the prediction result y.
A credit risk prediction device based on a recurrent neural network provided by embodiments of the second aspect of the present disclosure comprises:
a preprocessing module, configured to preprocess the operating data of the object under test for the current month and the preceding months covered by the chosen input month span, to obtain the input set of the object under test;
a credit risk prediction module, configured with a trained credit risk prediction model that takes the input set of the object under test and outputs the prediction result y of its credit risk over the chosen predicted month span;
the trained credit risk prediction model is obtained through the following steps:
setting any month as a reference month T; preprocessing the historical operating data of the object under test for the reference month, the M-1 months before it and the N months after it; forming an input set x from the preprocessed data of the reference month and the preceding M-1 months; manually evaluating the preprocessed data of the N months after the reference month to obtain the risk rating of the object under test for a predicted month span of N; traversing the reference month T, the input month span M and the predicted month span N over their value ranges to obtain a sample set, wherein each sample comprises the selected input month span, the predicted month span, the input set of the object under test and the corresponding risk rating; and dividing the sample set into a training set and a test set according to a set proportion;
constructing a credit risk prediction model based on a recurrent neural network and training it with the training set to determine the mapping between the input set x and the risk rating under any input month span M and predicted month span N, the model outputting a prediction result y; and testing the trained recurrent neural network on the test set to obtain the trained credit risk prediction model.
Embodiments of the third aspect of the present disclosure provide a computer-readable storage medium storing computer instructions that cause a computer to perform the credit risk prediction method based on a recurrent neural network according to any embodiment of the first aspect of the present disclosure.
The present disclosure has the following features and beneficial effects:
1. Risk is predicted in advance. From the operating data of the object under test over a period of time, the algorithm predicts its likely credit risk several months ahead.
2. Scientific and accurate labeling. The algorithm classifies the risk of the object under test over the coming months and gives the probability of each risk level.
3. Temporal association of data. Because the recurrent neural network processes data sequentially, it relates the input operating data of the object under test in time order, can uncover abnormal operating data, weights data differently according to timing, and thus predicts credit risk accurately.
4. Flexible spans. Because the recurrent neural network takes identically structured data in sequence along the time axis, the algorithm can freely adjust the input and output month spans without changing the original network structure.
Drawings
Fig. 1 is an overall flowchart of a credit risk prediction method provided by an embodiment of a first aspect of the present disclosure.
Fig. 2 is a flow chart of generating a sample set using historical data and training and testing a credit risk prediction model provided by an embodiment of a first aspect of the present disclosure.
Fig. 3 is a schematic diagram of the structure of a credit risk prediction model provided by an embodiment of the first aspect of the disclosure.
Fig. 4 is a flowchart of integrating data to be predicted provided by an embodiment of the first aspect of the present disclosure, and utilizing a credit risk prediction model for credit risk prediction.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of a third aspect of the present disclosure.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Rather, the application is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the application as defined by the appended claims. Further, certain specific details are set forth in the following detailed description to provide a better understanding of the present application; those skilled in the art can fully understand the present application even without these details.
Referring to fig. 1, a credit risk prediction method based on a recurrent neural network provided in an embodiment of a first aspect of the present disclosure includes:
Step S1: sample set construction
Setting any month as a reference month T, the historical operating data of the object under test for the reference month, the M-1 months before it and the N months after it are preprocessed. The preprocessed data of the reference month and the preceding M-1 months form an input set x, and the preprocessed data of the N months after the reference month are evaluated manually to obtain the risk rating of the object under test for a predicted month span of N. The reference month T, the input month span M and the predicted month span N are traversed over their value ranges to obtain a sample set, in which each sample comprises the selected input month span, the predicted month span, the input set of the object under test and the corresponding risk rating; the sample set is then divided into a training set and a test set according to a set proportion.
Step S2: credit risk prediction model construction and training
A credit risk prediction model based on a recurrent neural network is constructed and trained with the training set to determine the mapping between the input set x and the risk rating under any input month span M and predicted month span N, the model outputting a prediction result y. The trained recurrent neural network is tested on the test set, and the network parameters with the best overall performance are saved to obtain the trained credit risk prediction model.
Step S3: credit risk prediction
An input month span and a predicted month span are set as required by the monitoring party. The operating data of the object under test for the current month and the preceding months, covering the chosen input month span, are acquired as the data to be predicted and preprocessed to obtain the input set of the object under test. This input set is fed into the trained credit risk prediction model to obtain the prediction result y of the credit risk over the chosen predicted month span, thereby determining the credit risk of the object under test.
In some embodiments, referring to fig. 2, step S1 specifically includes the steps of:
Step S101: acquiring and preprocessing the historical operating data of the object under test
Setting any month as the reference month T, the historical operating data of the object under test for the reference month, the M-1 months before it and the N months after it are acquired and preprocessed in sequence by integration, cleaning, completion, standardization and PCA dimensionality reduction. The preprocessed data of the reference month and the preceding M-1 months form the input set x = {x_{-m}}, m = 0, 1, …, M-1, where x_{-m} is the historical operating data of the object under test in the m-th month before the reference month; m = 0 denotes the reference month itself, and the larger m is, the farther the month is from the reference month. The preprocessed data of the N months after the reference month form the rating set x' = {x_n}, n = 1, 2, …, N, where x_n is the historical operating data of the object under test in the n-th month after the reference month; the larger n is, the farther the month is from the reference month.
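The windowing in step S101 can be sketched as plain array slicing. This is a minimal illustration, assuming the preprocessed monthly data sit in a NumPy array with one row per month (oldest first) and that months are addressed by row index; the function name `build_windows` is hypothetical.

```python
import numpy as np


def build_windows(monthly: np.ndarray, t: int, m: int, n: int):
    """Cut the input window and the rating window around reference row t.

    monthly -- array of shape (num_months, num_features), oldest month first
               (hypothetical layout: one row per month).
    Returns (x, x_rating):
      x        -- rows t-m+1 .. t, i.e. x_{-(M-1)}, ..., x_{-1}, x_0
                  (oldest first, reference month last)
      x_rating -- rows t+1 .. t+n, i.e. x_1, ..., x_N
    """
    if t - m + 1 < 0 or t + n > monthly.shape[0] - 1:
        raise ValueError("window extends beyond the available months")
    x = monthly[t - m + 1 : t + 1]
    x_rating = monthly[t + 1 : t + n + 1]
    return x, x_rating


# Example: 10 months of 2 features, reference row 5, M = 3, N = 2
data = np.arange(20).reshape(10, 2)
x, x_rating = build_windows(data, t=5, m=3, n=2)
print(x[:, 0], x_rating[:, 0])  # [ 6  8 10] [12 14]
```

Returning the input window oldest first matches the order in which a recurrent network would consume it, ending at the reference month.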
Specifically, the historical operating data of the object under test are preprocessed as follows:
Integration: reading and merging the historical operating data of the object under test, such as financial, repayment, inventory and credit report data;
Cleaning: detecting and cleaning (i.e. deleting) entries with incorrect formats or out-of-range values in the historical operating data of the object under test;
Completion: filling null values in the historical operating data of the object under test, according to each field's data format, by zero padding, by taking a monthly value, or by taking the mean of the two adjacent months;
Standardization: computing the column-wise mean and variance of the integrated, cleaned and completed historical operating data, saving these parameters, and applying Z-score standardization. Standardization brings every column to a common order of magnitude, which helps subsequent training converge faster and improves prediction accuracy;
PCA dimensionality reduction: recombining correlated indices in the standardized data into a new set of mutually independent composite indices, selecting the first K principal components such that the retained information (cumulative contribution rate) exceeds 85%, and saving the resulting PCA parameters, which yields the preprocessed historical operating data of the object under test for each month.
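A minimal NumPy sketch of the completion, Z-score and PCA steps follows. It assumes integration and cleaning have already produced a numeric array (rows = months, columns = indices), implements only two of the completion modes named above (zero padding and the mean of adjacent months), and reads the 85% information requirement as cumulative explained variance. `fill_missing`, `fit_preprocess` and `transform` are hypothetical names; the returned mean, standard deviation and components stand in for the saved standardization and PCA parameters.

```python
import numpy as np


def fill_missing(col: np.ndarray, mode: str = "adjacent") -> np.ndarray:
    """Fill NaNs in one monthly column: 'zero' pads with 0; 'adjacent' uses the
    mean of the neighbouring months that are non-NaN at that point (0 if none)."""
    out = col.astype(float).copy()
    for i in np.flatnonzero(np.isnan(out)):
        if mode == "zero":
            out[i] = 0.0
        elif mode == "adjacent":
            nbrs = [out[j] for j in (i - 1, i + 1)
                    if 0 <= j < len(out) and not np.isnan(out[j])]
            out[i] = float(np.mean(nbrs)) if nbrs else 0.0
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


def fit_preprocess(data: np.ndarray, info_ratio: float = 0.85) -> dict:
    """Fit Z-score and PCA parameters on completed data (rows = months)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    z = (data - mean) / std
    # z is already centred, so its SVD gives the principal axes directly
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # smallest K whose cumulative explained variance exceeds info_ratio
    k = int(np.searchsorted(np.cumsum(explained), info_ratio, side="right")) + 1
    k = min(k, len(s))
    return {"mean": mean, "std": std, "components": vt[:k]}


def transform(data: np.ndarray, params: dict) -> np.ndarray:
    """Apply the saved standardization and PCA projection."""
    z = (data - params["mean"]) / params["std"]
    return z @ params["components"].T


# Example: three perfectly correlated columns collapse to one component
rng = np.random.default_rng(0)
a = rng.normal(size=50)
params = fit_preprocess(np.column_stack([a, 2 * a, -a]))
print(params["components"].shape)  # (1, 3)
```

Fitting once and transforming separately mirrors the text's requirement that the standardization and PCA parameters be saved and reused later rather than refitted.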
Step S102: determining the credit risk rating of the object under test for the N months after the reference month
Based on the rating set x' obtained in step S101, each month's historical operating data x_n in x' is rated manually, and according to the rating result the monthly credit risk level is divided into three grades, low risk, medium risk and high risk, n = 1, 2, …, N. The manual rating proceeds as follows:
Several core indices (e.g. the monthly financial data, repayment records, inventory, credit report and inventory audit values of the object under test) are selected from the indices in the rating set x', and a stepwise threshold is set for each core index. For each month's historical operating data x_n in x', whenever a core index of x_n exceeds its corresponding stepwise threshold, the corresponding score is added to the total risk value of the n-th month after the reference month. If the total risk value of the n-th month after the reference month is 1 to 4, the monthly credit risk level is recorded as low risk; if it is 5, as medium risk; and if it is 6 to 9, as high risk.
After the monthly credit risk levels for the N months after the reference month are obtained, the overall risk rating is taken as the maximum of these N monthly levels.
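The scoring rule in step S102 can be sketched as follows. The index names, thresholds and scores below are invented for illustration (the source does not publish its core indices or thresholds), and the sketch assumes that larger index values mean higher risk.

```python
from typing import Dict, List, Tuple

LOW, MEDIUM, HIGH = 0, 1, 2

# Hypothetical stepwise thresholds: index name -> [(threshold, score), ...].
# Each threshold that a month's value exceeds adds its score to the total.
Thresholds = Dict[str, List[Tuple[float, int]]]


def month_risk_value(month: Dict[str, float], thresholds: Thresholds) -> int:
    total = 0
    for name, steps in thresholds.items():
        for limit, score in steps:
            if month[name] > limit:
                total += score
    return total


def risk_level(total: int) -> int:
    # Source: 1-4 low, 5 medium, 6-9 high. Treating 0 as low and values
    # above 9 as high is an assumption of this sketch.
    if total <= 4:
        return LOW
    if total == 5:
        return MEDIUM
    return HIGH


def overall_rating(months: List[Dict[str, float]], thresholds: Thresholds) -> int:
    """Overall N-month rating = maximum of the monthly levels."""
    return max(risk_level(month_risk_value(m, thresholds)) for m in months)


thresholds = {
    "overdue_days": [(0, 1), (30, 2), (90, 2)],  # hypothetical index
    "debt_ratio": [(0.6, 1), (0.8, 2)],          # hypothetical index
}
months = [
    {"overdue_days": 10, "debt_ratio": 0.5},   # total 1 -> low
    {"overdue_days": 100, "debt_ratio": 0.5},  # total 1 + 2 + 2 = 5 -> medium
]
print(overall_rating(months, thresholds))  # 1 (MEDIUM)
```

Taking the maximum means a single high-risk month anywhere in the predicted span makes the whole sample high risk.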
Step S103: constructing and dividing the sample set
Steps S101 and S102 describe how the input set x and the risk rating of the object under test are generated for selected values of the reference month T, the input month span M and the predicted month span N. T, M and N are traversed over their value ranges and steps S101 and S102 are repeated to generate the complete sample set; during the traversal, T, M and N must satisfy the following three conditions simultaneously:
(1) the data acquisition window does not extend beyond the start or end year-month of all existing operating data of the object under test, i.e. T - M + 1 ≥ M_start and T + N ≤ M_end, where M_start is the start year-month of all existing operating data of the object under test (the earliest month covered by those data) and M_end is their end year-month (the latest month covered);
(2) the month spans have fixed value ranges: M ∈ [5, 13] and N ∈ [4, 12];
(3) the input month span M is greater than the predicted month span N, i.e. M > N.
Under these conditions, samples of the object under test are obtained for different reference months, input month spans and predicted month spans; each sample in the sample set comprises the selected input month span M, the predicted month span N, the input set x of the object under test and the corresponding risk rating.
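The enumeration of admissible (T, M, N) triples in step S103 is sketched below, assuming year-months are encoded as consecutive integers; the `month_index` helper is a hypothetical encoding, not part of the source.

```python
from typing import List, Tuple


def month_index(year: int, month: int) -> int:
    """Map a calendar year-month to a consecutive integer (hypothetical encoding)."""
    return year * 12 + (month - 1)


def valid_triples(m_start: int, m_end: int,
                  m_range=range(5, 14), n_range=range(4, 13)) -> List[Tuple[int, int, int]]:
    """Enumerate (T, M, N) satisfying conditions (1)-(3); months are integer indices."""
    triples = []
    for m in m_range:              # condition (2): M in [5, 13]
        for n in n_range:          # condition (2): N in [4, 12]
            if m <= n:             # condition (3): M > N
                continue
            # condition (1): T - M + 1 >= m_start and T + N <= m_end
            for t in range(m_start + m - 1, m_end - n + 1):
                triples.append((t, m, n))
    return triples


start, end = month_index(2020, 1), month_index(2020, 9)  # 9 months of data
print(valid_triples(start, end))  # [(24244, 5, 4)]
```

With only nine months of history, the sole admissible combination is M = 5, N = 4 with the window spanning every available month; longer histories yield many more samples per object.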
The sample set is randomly split by number of samples: 80% of the samples form the training set used to train the credit risk prediction model, and 20% form the test set used to verify the training effect.
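A seeded random 80/20 split, as a minimal sketch; the fixed seed and the name `split_samples` are assumptions added for reproducibility and illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")


def split_samples(samples: Sequence[Sample], train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle and split samples into (train, test) by count."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = split_samples(range(10))
print(len(train), len(test))  # 8 2
```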
In some embodiments, referring to fig. 2, step S2 specifically includes:
Step S201: constructing the credit risk prediction model
The credit risk prediction model constructed in the embodiments of the present disclosure is based on a recurrent neural network (RNN), has 2 hidden layers of 128 neurons each, and uses the Softmax activation function; these settings (number of hidden layers, neurons per layer and activation function type) were determined by repeating steps S202 and S203. Fig. 3 is a schematic diagram of the credit risk prediction model, where x is the input set, y is the prediction result, h is the hidden layer, and M and N are the input month span and the predicted month span respectively. U, V and W are the model parameters to be determined: U is the input weight from the input to the hidden layer and processes the input set x fed into the network; V is the recurrent weight, applied in turn to the hidden layer's intermediate result for each month, in order of each month's distance from the reference month, with the result passed to the next recurrence until all months in the input set x have been processed; W is the output weight from the hidden layer to the output and processes the final result of the last month (the reference month) to produce the network output. Through this recurrent mechanism, input data from different months exert different influence on the final prediction according to their timing, which captures temporal correlation.
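The recurrence described above (U into the hidden layer, V carried from month to month, W from the final hidden state to the output) is sketched below as a plain NumPy forward pass. This is a simplified illustration, not the patented network: it uses a single hidden layer with tanh in the recurrence and applies Softmax only at the output, rather than the 2 × 128 configuration stated above, and appending N/12 to each month's input is an assumption, since the source does not say how N is fed to the model. Sizes K and H are small illustrative values.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def rnn_forward(x: np.ndarray, n_span: int, U, V, W, b_h, b_y) -> np.ndarray:
    """Single-hidden-layer Elman RNN using the source's naming:
    U = input weight, V = recurrent weight, W = hidden-to-output weight.

    x      -- (M, K) preprocessed months, oldest first, reference month last
    n_span -- predicted month span N; appending N/12 to every step is an
              assumption of this sketch
    Returns probabilities for (low, medium, high).
    """
    h = np.zeros(V.shape[0])
    for x_t in x:  # farthest month first, reference month last
        step_input = np.append(x_t, n_span / 12.0)
        h = np.tanh(U @ step_input + V @ h + b_h)
    return softmax(W @ h + b_y)  # output from the reference month's state


rng = np.random.default_rng(0)
K, H = 4, 8
U = rng.normal(scale=0.1, size=(H, K + 1))
V = rng.normal(scale=0.1, size=(H, H))
W = rng.normal(scale=0.1, size=(3, H))
b_h, b_y = np.zeros(H), np.zeros(3)

# The same weights handle different input spans M without structural change.
p6 = rnn_forward(rng.normal(size=(6, K)), 4, U, V, W, b_h, b_y)
p9 = rnn_forward(rng.normal(size=(9, K)), 5, U, V, W, b_h, b_y)
print(p6.shape, p9.shape)  # (3,) (3,)
```

Because the weights are shared across time steps, the parameter shapes depend only on K and H, not on M; this is the property behind the "flexible span" claim.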
Step S202, training a credit risk prediction model:
During training, each training sample has a fixed input month span M and predicted month span N, which are passed to the credit risk prediction model as parameters. After the model processes the input set x contained in the sample, it outputs the prediction result y and the probability of each risk level. The goal of training, which uses supervised learning, is to make the prediction result y of the credit risk prediction model as consistent as possible with the risk rating label of the sample. To this end, this embodiment uses the mean squared error as the loss function and updates the weights U, W and V of the credit risk prediction model by minimizing it.
The batch size of the training process is 128 and the maximum number of training epochs is 1000. After training is completed, the RNN network parameters U, W and V are saved.
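The loss and batching described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed training code: the risk rating is assumed to be one-hot encoded over three levels, the squared error is assumed to be averaged over both samples and levels, and the names `one_hot`, `mse_loss` and `minibatches` are hypothetical. Gradient computation (backpropagation through time) is omitted.

```python
import random
from typing import Iterator, List, Sequence, Tuple

RISK_LEVELS = ("low", "medium", "high")  # the three levels used in the disclosure
BATCH_SIZE = 128   # batch size stated in the text
MAX_EPOCHS = 1000  # maximum number of training epochs stated in the text


def one_hot(level: str) -> List[float]:
    """Encode a risk rating label as a one-hot target vector (assumed encoding)."""
    return [1.0 if level == name else 0.0 for name in RISK_LEVELS]


def mse_loss(probs: Sequence[List[float]], labels: Sequence[str]) -> float:
    """Mean squared error between predicted level probabilities and one-hot
    labels, averaged over all samples and all levels."""
    total = 0.0
    for p, label in zip(probs, labels):
        target = one_hot(label)
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, target))
    return total / (len(labels) * len(RISK_LEVELS))


def minibatches(samples: List[Tuple], batch_size: int = BATCH_SIZE,
                seed: int = 0) -> Iterator[List[Tuple]]:
    """Shuffle the training samples and yield them in batches of batch_size."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]
```

In a full training loop, each of up to `MAX_EPOCHS` epochs would iterate over `minibatches(...)`, compute `mse_loss` on the model outputs and update U, V and W with a gradient-based optimizer.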
Step S203, testing the credit risk prediction model:
The credit risk prediction model is initialized with the RNN parameters saved in step S202. Any sample in the test set is selected, its month span parameters M and N are determined, and the data x to be predicted are input; the prediction result y and the probability of each risk level given by the model are saved, and the difference between the prediction result y and the risk rating label is examined. The prediction performance of the RNN is evaluated with a series of indices such as accuracy and the medium-high risk detection rate. If the accuracy does not reach the standard, hyperparameters such as the RNN structure, the batch size and the number of training epochs are tuned until the prediction accuracy meets the requirement. From the three RNN parameter sets with the highest accuracy, the one with the highest medium-high risk detection rate is selected and saved as the optimal parameters, yielding the trained credit risk prediction model.
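The final selection rule can be sketched as a two-stage ranking. This assumes the selection reads as "shortlist the three parameter sets with the highest accuracy, then take the one with the highest medium-high risk detection rate"; the dictionary keys and the function name `select_best_params` are hypothetical.

```python
from typing import Dict, List


def select_best_params(candidates: List[Dict], top_k: int = 3) -> Dict:
    """Shortlist the top_k candidates by test accuracy, then return the one
    with the highest medium-high risk detection rate (hypothetical keys)."""
    ranked = sorted(candidates, key=lambda c: c["accuracy"], reverse=True)
    shortlist = ranked[:top_k]
    return max(shortlist, key=lambda c: c["mid_high_detection"])
```

Ranking by accuracy first keeps overall quality as the primary gate, while the second stage favors the candidate that misses fewer risky objects.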
In some embodiments, referring to Fig. 4, step S3 specifically includes:
S301, obtaining and preprocessing the data to be predicted
The monitoring party sets an input month span and a predicted month span as required, and the operation data of the object to be tested for the current month and the preceding months within the input month span are acquired. The two spans must satisfy the same constraints as M and N in step S103.
After the data are integrated, cleaned and completed following the same procedure as in step S101, they are normalized using the normalization parameters saved in step S101. PCA dimension reduction is then applied according to the PCA parameters output in step S101, yielding the input set to be predicted.
S302, completing the credit risk prediction using the credit risk prediction model
The neural network is initialized with the RNN parameters saved in step S202, and the input set to be predicted is fed into it. From this input set, which describes the operating condition of the object to be tested over the most recent months of the input month span, the credit risk prediction model predicts the credit risk level for the coming months of the predicted month span, displays the probability of each level, and outputs the results as an Excel table.
Using the risk prediction results, the objects predicted to have a medium or high credit risk level are screened out from the objects to be tested so that they can be tracked continuously.
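Applying the saved preprocessing parameters to new data can be sketched as below. It assumes Z-score normalization (as in claim 3) with a saved per-feature mean and standard deviation, and PCA parameters saved as a list of component vectors; the function names are hypothetical and the integration, cleaning and completion steps are not shown.

```python
from typing import List, Sequence


def zscore(row: Sequence[float], means: Sequence[float],
           stds: Sequence[float]) -> List[float]:
    """Z-score normalization with the mean/std saved in step S101.
    A zero std (constant feature) is mapped to 0 to avoid division by zero."""
    return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]


def pca_project(row: Sequence[float],
                components: Sequence[Sequence[float]]) -> List[float]:
    """Project a row onto the saved principal components (one vector per
    output dimension). Z-scored training data already has zero mean, so no
    extra centering is applied here."""
    return [sum(c_j * v_j for c_j, v_j in zip(comp, row)) for comp in components]


def preprocess_month(row: Sequence[float], means: Sequence[float],
                     stds: Sequence[float],
                     components: Sequence[Sequence[float]]) -> List[float]:
    """Apply the saved normalization and PCA parameters to one month of data."""
    return pca_project(zscore(row, means, stds), components)
```

Reusing the training-time parameters, rather than refitting on the data to be predicted, keeps new inputs on the same scale and in the same reduced space the model was trained on.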
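The output and screening steps can be sketched as follows. As assumptions, the predicted level is taken as the level with the highest probability, and a CSV table (which Excel can open) stands in for the Excel output; the names `to_level` and `screen_and_export` and the object IDs are hypothetical.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple

RISK_LEVELS = ("low", "medium", "high")


def to_level(probs: Sequence[float]) -> str:
    """Return the risk level with the highest predicted probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return RISK_LEVELS[best]


def screen_and_export(predictions: Dict[str, Sequence[float]]) -> Tuple[List[str], str]:
    """Build a CSV table of per-object level probabilities and return the IDs
    of objects predicted as medium or high risk, for continuous tracking."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["object_id", *RISK_LEVELS, "predicted_level"])
    flagged = []
    for obj_id, probs in predictions.items():
        level = to_level(probs)
        writer.writerow([obj_id, *(f"{p:.3f}" for p in probs), level])
        if level in ("medium", "high"):
            flagged.append(obj_id)
    return flagged, buf.getvalue()
```

For example, `screen_and_export({"dealer_A": [0.7, 0.2, 0.1], "group_B": [0.1, 0.3, 0.6]})` flags only `group_B`.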
Validity verification of embodiments of the present disclosure:
According to the embodiment of the present disclosure, the validity of the credit risk prediction model was verified on 2 objects to be tested, a dealer and a group, each under 2 different month-span settings; the specific parameters of the resulting 4 test groups are shown in Table 1. For each object to be tested,
Table 1. Specific parameters of each test group
Validity is assessed by the performance indices of the credit risk prediction model on the test set, as shown in Table 2.
Table 2. Test indices for each test group
In Table 2, indices 1-10 are direct indices obtained by direct counting from the prediction results y and the label values. Indices 3, 5, 7, 9 and 10 are based on manual determination. Index 9, medium-risk underestimated samples, is the number of samples manually determined to be medium risk but predicted as low risk; index 10, high-risk underestimated samples, is the number of samples manually determined to be high risk but predicted as low or medium risk. Indices 9 and 10 reflect the extent to which the trained credit risk prediction model overlooks identified risk objects: the smaller they are, the better the prediction. Indices 4, 6 and 8 are determined by the predictions of the trained credit risk prediction model. Indices 11-16 are indirect indices obtained by combining the direct indices, and they reflect the prediction performance of the credit risk prediction model intuitively. "Risk upgrade" in these indices means that the credit risk level of the object to be tested rises in the current reference month, so the object requires particular attention.
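Several of these indices can be computed from the label/prediction pairs roughly as follows. The exact formulas behind Table 2 are not given in this section, so the choice of denominators (the number of manually labelled samples of each level) is an assumption, and `risk_metrics` is a hypothetical name. The risk-upgrade detection rate is omitted because it also needs the previous month's level.

```python
import math
from typing import Dict, Sequence


def _rate(num: int, den: int) -> float:
    """Ratio of two counts; NaN when a risk level is absent from the test set."""
    return num / den if den else math.nan


def risk_metrics(labels: Sequence[str], preds: Sequence[str]) -> Dict[str, float]:
    """Compute example indices from manual labels and model predictions.

    Levels are "low", "medium", "high". Denominators are assumptions.
    """
    pairs = list(zip(labels, preds))
    true_mid = [pred for lab, pred in pairs if lab == "medium"]
    true_high = [pred for lab, pred in pairs if lab == "high"]
    return {
        "accuracy": _rate(sum(lab == pred for lab, pred in pairs), len(pairs)),
        "medium_detection": _rate(sum(p == "medium" for p in true_mid), len(true_mid)),
        "high_detection": _rate(sum(p == "high" for p in true_high), len(true_high)),
        # like index 9: manually rated medium but predicted low
        "medium_underestimation": _rate(sum(p == "low" for p in true_mid), len(true_mid)),
        # like index 10: manually rated high but predicted low or medium
        "high_underestimation": _rate(sum(p != "high" for p in true_high), len(true_high)),
    }
```

Under these definitions the high-risk detection rate and high-risk underestimation rate sum to 1, which agrees with the reported figures (above 92% and below 8%).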
The indices in the table show that, for the 2 month-span settings of each of the 2 objects to be tested, the embodiment of the present disclosure achieves an overall accuracy above 98%, a medium-risk detection rate above 98%, a high-risk detection rate above 92% and a risk-upgrade detection rate above 90%. Meanwhile, the medium-risk underestimation rate is below 2% and the high-risk underestimation rate is below 8%, indicating good accuracy and predictive performance overall. The validity verification further shows that the choice of month spans has little effect on model performance: under the different month spans, the overall accuracy, medium-risk detection rate, high-risk detection rate and risk-upgrade detection rate of every test group in Table 2 remain above 90%, and the medium-risk and high-risk underestimation rates stay within 8%. The input month span M and the predicted month span N can therefore be set according to actual needs within the defined constraints.
A credit risk prediction apparatus based on a recurrent neural network provided in an embodiment of a second aspect of the present disclosure includes:
a preprocessing module, configured to preprocess the operation data of the object to be tested for the current month and the preceding months within the input month span, to obtain the input set of the object to be tested over the input month span;
a credit risk prediction module, configured with a trained credit risk prediction model, which obtains from the input set of the object to be tested the prediction result y of its credit risk over the months of the predicted month span.
In some embodiments, the preprocessing module preprocesses the operation data of the object to be tested for the current month and the preceding months as follows: the operation data of each month are integrated, cleaned and completed in turn, normalized with the normalization parameters saved in step S101, and then reduced in dimension by PCA according to the PCA parameters output in step S101.
To implement the above embodiments, the embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the recurrent neural network-based credit risk prediction method provided by the embodiments of the first aspect of the present disclosure.
Referring now to Fig. 5, a schematic diagram of an electronic device suitable for implementing embodiments of the third aspect of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers) and PMPs (portable multimedia players), and fixed terminals such as digital TVs, desktop computers and servers. The electronic device shown in Fig. 5 is merely an example and should not be construed as limiting the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 101, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 102 or a program loaded from a storage means 108 into a Random Access Memory (RAM) 103. In the RAM 103, various programs and data required for the operation of the electronic device are also stored. The processing device 101, ROM 102, and RAM 103 are connected to each other by a bus 104. An input/output (I/O) interface 105 is also connected to bus 104.
In general, the following devices may be connected to the I/O interface 105: input devices 106 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, etc.; an output device 107 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 108 including, for example, magnetic tape, hard disk, etc.; and a communication device 109. The communication means 109 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, the present embodiment includes a computer program product comprising a computer program loaded on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 109, or from the storage means 108, or from the ROM 102. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 101.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
setting any month as the reference month T; preprocessing the historical operation data of the object to be tested for the reference month, the M-1 months before it and the N months after it; forming the input set x from the preprocessed historical operation data of the reference month and the preceding M-1 months; manually evaluating the preprocessed historical operation data for the N months after the reference month to obtain the risk rating of the object to be tested for a predicted month span of N; traversing the reference month T, the input month span M and the predicted month span N within their value ranges to obtain a sample set, in which each sample comprises the selected input month span, the selected predicted month span, the input set of the object to be tested and the corresponding risk rating; and dividing the sample set into a training set and a test set according to a set proportion;
constructing a credit risk prediction model based on a recurrent neural network, and training it with the training set to determine, for any input month span M and predicted month span N, the mapping between the input set x and the risk rating, so as to output a prediction result y; testing the trained recurrent neural network on the test set and saving the network parameters with the best overall performance to obtain the trained credit risk prediction model;
setting the input month span and the predicted month span as required by the monitoring party; acquiring, as the data to be predicted, the operation data of the object to be tested for the current month and the preceding months within the input month span; preprocessing these data to obtain the input set of the object to be tested over the input month span; and inputting this input set into the trained credit risk prediction model to obtain the prediction result y of the credit risk over the predicted month span, thereby determining the credit risk of the object to be tested.
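The traversal of (T, M, N) in the sample-construction step can be sketched as an enumeration of valid windows under the three conditions of claim 5. Months are assumed to be integer indices (for example year × 12 + month), and `valid_windows` and `window_months` are hypothetical names.

```python
from typing import Iterator, List, Tuple


def valid_windows(m_start: int, m_end: int) -> Iterator[Tuple[int, int, int]]:
    """Yield every (T, M, N) satisfying the three conditions of claim 5.

    m_start / m_end are the first and last months (integer indices) with
    operation data for the object to be tested.
    """
    for m in range(5, 14):          # condition (2): M in [5, 13]
        for n in range(4, 13):      # condition (2): N in [4, 12]
            if m <= n:              # condition (3): M > N
                continue
            # condition (1): T - M + 1 >= m_start and T + N <= m_end
            for t in range(m_start + m - 1, m_end - n + 1):
                yield t, m, n


def window_months(t: int, m: int, n: int) -> Tuple[List[int], List[int]]:
    """Months forming the input set x and the months used for the rating."""
    inputs = list(range(t - m + 1, t + 1))   # reference month and M-1 before it
    rating = list(range(t + 1, t + n + 1))   # N months after the reference month
    return inputs, rating
```

Each yielded triple, combined with the preprocessed data and the manual rating of its rating months, would form one sample before the split into training and test sets.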
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++ and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and their features, provided they do not contradict each other.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program may be electronically captured, for instance by optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that implementing all or part of the steps carried by the method of the above embodiments may be accomplished by a program to instruct related hardware and the developed program may be stored in a computer readable storage medium, which when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (9)

1. A method for credit risk prediction based on a recurrent neural network, comprising:
setting any month as the reference month T; preprocessing the historical operation data of the object to be tested for the reference month, the M-1 months before it and the N months after it; forming the input set x from the preprocessed historical operation data of the reference month and the preceding M-1 months; manually evaluating the preprocessed historical operation data for the N months after the reference month to obtain the risk rating of the object to be tested for a predicted month span of N; traversing the reference month T, the input month span M and the predicted month span N within their value ranges to obtain a sample set, in which each sample comprises the selected input month span, the selected predicted month span, the input set of the object to be tested and the corresponding risk rating; and dividing the sample set into a training set and a test set according to a set proportion;
constructing a credit risk prediction model based on a recurrent neural network, and training it with the training set to determine, for any input month span M and predicted month span N, the mapping between the input set x and the risk rating, so as to output a prediction result y; and testing the trained recurrent neural network on the test set to obtain a trained credit risk prediction model;
setting the input month span and the predicted month span as required by the monitoring party; acquiring, as the data to be predicted, the operation data of the object to be tested for the current month and the preceding months within the input month span; preprocessing the data to be predicted to obtain the input set of the object to be tested over the input month span; and inputting this input set into the trained credit risk prediction model to obtain the prediction result y of the credit risk over the predicted month span, thereby determining the credit risk of the object to be tested.
2. The credit risk prediction method according to claim 1, wherein the preprocessing of the operation data includes sequentially integrating, cleaning, complementing, normalizing and PCA dimension reduction processing of the operation data.
3. The credit risk prediction method according to claim 2, wherein the normalization process employs a Z-score normalization process.
4. The credit risk prediction method according to claim 1, wherein the risk rating of the object to be tested for a predicted month span of N is obtained by the following steps:
forming the preprocessed historical operation data of the object to be tested for the N months after the reference month into a rating set $x' = \{x_n\}$, $n = 1, 2, \ldots, N$, where $x_n$ represents the historical operation data of the object to be tested in the n-th month after the reference month;
manually rating each month's historical operation data $x_n$ in the rating set $x'$, and classifying the monthly credit risk level into low risk, medium risk and high risk according to the rating result, as follows:
selecting several core indices from the indices in the rating set $x'$ and setting stepwise thresholds for each core index; for the historical operation data $x_n$ in the rating set $x'$, whenever a core index of $x_n$ exceeds a corresponding stepwise threshold, adding the corresponding score to the total risk value of the n-th month after the reference month; when the total risk value of the n-th month after the reference month falls in a first interval, recording the monthly credit risk level as low risk; when it falls in a second interval, recording the monthly credit risk level as medium risk; when it falls in a third interval, recording the monthly credit risk level as high risk; and taking the overall risk rating over the N months as the maximum of the monthly credit risk levels for the N months after the reference month.
5. The credit risk prediction method according to claim 1, wherein, when the reference month T, the input month span M and the predicted month span N are traversed within their value ranges, the values of T, M and N must simultaneously satisfy the following 3 conditions:
(1) the data acquisition window does not extend beyond the start and end year-months of all existing operation data of the object to be tested, i.e., $T - M + 1 \ge M_{start}$ and $T + N \le M_{end}$, where $M_{start}$ is the starting year-month and $M_{end}$ is the ending year-month of all existing operation data of the object to be tested;
(2) $M \in [5, 13]$, $N \in [4, 12]$;
(3) the input month span M is larger than the predicted month span N, i.e., $M > N$.
6. The credit risk prediction method according to claim 1, wherein the recurrent neural network has 2 hidden layers, each layer having 128 neurons, activated using a Softmax function.
7. The credit risk prediction method according to claim 1, wherein, when training the recurrent neural network, the mean squared error is adopted as the loss function, and the weights U, W and V of the recurrent neural network are updated by minimizing the loss function; wherein U represents the input weight from the input to the hidden layer and is used for processing the input set x fed into the recurrent neural network; V represents the recurrent weight, which is applied in turn to the intermediate result that the hidden layer outputs for each month, in order of the distance between that month's operation data in the input set x and the reference month, and passes the result on to the next recurrence until all months in the input set x have been processed; and W represents the output weight from the hidden layer to the output and is used for processing the final result of the last month to obtain the prediction result y.
8. A credit risk prediction apparatus based on a recurrent neural network, comprising:
a preprocessing module, configured to preprocess the operation data of the object to be tested for the current month and the preceding months within the input month span, to obtain the input set of the object to be tested over the input month span;
a credit risk prediction module, configured with a trained credit risk prediction model, which obtains from the input set of the object to be tested the prediction result y of its credit risk over the months of the predicted month span;
the trained credit risk prediction model is obtained according to the following steps:
setting any month as the reference month T; preprocessing the historical operation data of the object to be tested for the reference month, the M-1 months before it and the N months after it; forming the input set x from the preprocessed historical operation data of the reference month and the preceding M-1 months; manually evaluating the preprocessed historical operation data for the N months after the reference month to obtain the risk rating of the object to be tested for a predicted month span of N; traversing the reference month T, the input month span M and the predicted month span N within their value ranges to obtain a sample set, in which each sample comprises the selected input month span, the selected predicted month span, the input set of the object to be tested and the corresponding risk rating; and dividing the sample set into a training set and a test set according to a set proportion;
constructing a credit risk prediction model based on a recurrent neural network, and training it with the training set to determine, for any input month span M and predicted month span N, the mapping between the input set x and the risk rating, so as to output a prediction result y; and testing the trained recurrent neural network on the test set to obtain a trained credit risk prediction model.
9. A computer-readable storage medium storing computer instructions for causing a computer to perform the recurrent neural network-based credit risk prediction method of any one of claims 1 to 7.
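The manual rating rule of claim 4 can be sketched as follows. The disclosure does not publish the core indices, thresholds, scores or interval bounds, so every name and number here is hypothetical, and awarding each index only the score of the highest threshold step it exceeds is an assumption about how the stepwise thresholds accumulate.

```python
from typing import Dict, Sequence, Tuple

LEVELS = ("low", "medium", "high")


def month_risk_score(month: Dict[str, float],
                     thresholds: Dict[str, Sequence[Tuple[float, float]]]) -> float:
    """Total risk value of one month from stepwise thresholds.

    thresholds maps a (hypothetical) core index name to ascending
    (threshold, score) steps; each index adds the score of the highest
    step it exceeds.
    """
    total = 0.0
    for name, steps in thresholds.items():
        earned = 0.0
        for limit, score in steps:  # steps sorted by ascending threshold
            if month[name] > limit:
                earned = score
        total += earned
    return total


def month_level(score: float, medium_from: float, high_from: float) -> str:
    """Map the total risk value to a level via three intervals:
    [0, medium_from) low, [medium_from, high_from) medium, [high_from, inf) high."""
    if score >= high_from:
        return "high"
    if score >= medium_from:
        return "medium"
    return "low"


def overall_rating(monthly_levels: Sequence[str]) -> str:
    """Overall rating over the N months: the most severe monthly level."""
    return max(monthly_levels, key=LEVELS.index)
```

Taking the maximum over the N months means a single high-risk month is enough to rate the whole prediction window as high risk.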
CN202310537029.2A 2023-05-13 2023-05-13 Credit risk prediction method, device and storage medium based on recurrent neural network Pending CN116720946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310537029.2A CN116720946A (en) 2023-05-13 2023-05-13 Credit risk prediction method, device and storage medium based on recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310537029.2A CN116720946A (en) 2023-05-13 2023-05-13 Credit risk prediction method, device and storage medium based on recurrent neural network

Publications (1)

Publication Number Publication Date
CN116720946A true CN116720946A (en) 2023-09-08

Family

ID=87868765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310537029.2A Pending CN116720946A (en) 2023-05-13 2023-05-13 Credit risk prediction method, device and storage medium based on recurrent neural network

Country Status (1)

Country Link
CN (1) CN116720946A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117891811A (en) * 2024-03-13 2024-04-16 南京数策信息科技有限公司 Customer data acquisition and analysis method and device and cloud server
CN117891811B (en) * 2024-03-13 2024-05-07 南京数策信息科技有限公司 Customer data acquisition and analysis method and device and cloud server

Similar Documents

Publication Publication Date Title
Pan et al. Test case selection and prioritization using machine learning: a systematic literature review
US11282000B2 (en) Systems and methods for predictive coding
US20150294246A1 (en) Selecting optimal training data set for service contract prediction
US8762180B2 (en) Claims analytics engine
CN111401940B (en) Feature prediction method, device, electronic equipment and storage medium
EP2625628A2 (en) Probabilistic data mining model comparison engine
CN109816021B (en) Intelligent contract processing method, device and system, storage medium and electronic equipment
US20140149174A1 (en) Financial Risk Analytics for Service Contracts
CN111612040B (en) Financial data anomaly detection method and related device based on isolated forest algorithm
US20170103150A1 (en) System and method of designing models in a feedback loop
CN110688536A (en) Label prediction method, device, equipment and storage medium
CN116720946A (en) Credit risk prediction method, device and storage medium based on recurrent neural network
CN117151329A (en) Carbon emission strength prediction method, device, equipment and storage medium
CN111191677A (en) User characteristic data generation method and device and electronic equipment
CN111178687A (en) Financial risk classification method and device and electronic equipment
CN111260142A (en) Commodity index data prediction method and device, storage medium and electronic equipment
US11593700B1 (en) Network-accessible service for exploration of machine learning models and results
US20220058669A1 (en) Method and system for forecasting demand with respect to an entity
CN117252688A (en) Financial risk assessment method, system, terminal equipment and storage medium
CN117093477A (en) Software quality assessment method and device, computer equipment and storage medium
CN117437019A (en) Credit card overdue risk prediction method, apparatus, device, medium and program product
CN111582647A (en) User data processing method and device and electronic equipment
CN116245630A (en) Anti-fraud detection method and device, electronic equipment and medium
CN115994093A (en) Test case recommendation method and device
CN115994684A (en) Enterprise risk assessment method, enterprise risk assessment device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination