CN116911983A

CN116911983A - Credit risk prediction model obtaining method, credit risk prediction method and device

Info

Publication number: CN116911983A
Application number: CN202310908444.4A
Authority: CN
Inventors: 葛洋洋
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2023-07-24
Filing date: 2023-07-24
Publication date: 2023-10-20

Abstract

The credit risk prediction model obtaining method, the credit risk prediction method and the credit risk prediction device of the present disclosure can be applied to the field of artificial intelligence or the field of finance. The present disclosure obtains credit behavior data of a sample bank user; obtains an advantage function of a trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user; and, based on the advantage function and the credit behavior data of the sample bank user, iteratively updates a risk prediction policy of the trust domain policy optimization algorithm model by using a stochastic gradient ascent algorithm to obtain a credit risk prediction model. Because the present disclosure adopts a trust domain policy optimization algorithm model, the risk prediction policy can be updated through rapid iteration using credit behavior data that are imbalanced and change quickly, so that credit risk is predicted accurately.

Description

Credit risk prediction model obtaining method, credit risk prediction method and device

Technical Field

The disclosure relates to the field of artificial intelligence, and in particular relates to a credit risk prediction model obtaining method, a credit risk prediction method and a credit risk prediction device.

Background

Credit business is an important asset business of commercial banks; it brings profit to banks but also exposes them to certain financial risks. Accurate prediction of credit risk is therefore an important basis on which a bank can avoid and resolve the related risks and adopt risk response measures in time.

However, since credit businesses produce less data and the related data of credit businesses vary widely over time, it is difficult to accurately predict credit risk using conventional machine learning algorithms.

Therefore, how to accurately predict credit risk becomes a technical problem that needs to be solved by those skilled in the art.

Disclosure of Invention

In view of the above problems, the present disclosure provides a credit risk prediction model obtaining method, a credit risk prediction method and a credit risk prediction device for overcoming or at least partially solving the above problems, and the technical solutions are as follows:

a credit risk prediction model acquisition method, comprising:

obtaining credit behavior data of a sample bank user;

obtaining an advantage function of a trust domain policy optimization algorithm model by using credit behavior data of the sample bank user;

and based on the advantage function and the credit behavior data of the sample bank user, iteratively updating the risk prediction policy of the trust domain policy optimization algorithm model by using a stochastic gradient ascent algorithm to obtain a credit risk prediction model.

Optionally, the obtaining the advantage function of the trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user includes:

obtaining a state-action value function and a state value function of the trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user;

and obtaining the advantage function of the trust domain policy optimization algorithm model by using the state-action value function and the state value function.

Optionally, the obtaining the state-action value function and the state value function of the trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user includes:

according to the formula:

$$Q_\pi(s_t, a_t) = E_\pi\left[\sum_{l \in \mathbb{N}} \gamma^{l}\, r(s_{t+l}) \,\middle|\, s_t, a_t\right]$$

calculating the state-action value function of the trust domain policy optimization algorithm model;

according to the formula:

$$V_\pi(s_t) = E_\pi\left[\sum_{l \in \mathbb{N}} \gamma^{l}\, r(s_{t+l}) \,\middle|\, s_t\right]$$

calculating the state value function of the trust domain policy optimization algorithm model;

wherein $\pi$ is the risk prediction policy; $t$ is the sampling number; $s_t$ represents the credit business state corresponding to the sample bank user under sampling number $t$; $a_t$ is the credit behavior action taken by the sample user under sampling number $t$ in the credit behavior data of the sample bank user; $Q_\pi(s_t, a_t)$ is the state-action value function corresponding to credit business state $s_t$ and credit behavior action $a_t$, representing the expected return value obtained after the sample user takes credit behavior action $a_t$ in credit business state $s_t$; $V_\pi(s_t)$ is the state value function corresponding to the credit business state, representing the expected return value obtained by the sample user in credit business state $s_t$; $E$ represents taking the expectation; $\gamma$ is the discount factor, with a value range of [0, 1]; $\mathbb{N}$ is the set of natural numbers; $r(s_{t+l})$ represents the return value obtained by the sample user in credit business state $s_{t+l}$.

Optionally, the obtaining the advantage function of the trust domain policy optimization algorithm model by using the state-action value function and the state value function includes:

according to the formula:

$$U_\pi(s, a) = Q_\pi(s, a) - V_\pi(s)$$

calculating the advantage function of the trust domain policy optimization algorithm model, wherein $s$ represents a credit business state; $a$ represents a credit behavior action in the credit behavior data of the sample bank user; $U_\pi(s, a)$ is the advantage function; $Q_\pi(s, a)$ is the state-action value function; $V_\pi(s)$ is the state value function.

Optionally, the iteratively updating the risk prediction policy of the trust domain policy optimization algorithm model by using a stochastic gradient ascent algorithm based on the advantage function and the credit behavior data of the sample bank user to obtain a credit risk prediction model includes:

obtaining a policy approximation value and a penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the current round of iteration by using the advantage function and the credit behavior data of the sample bank user;

obtaining the risk prediction policy of the next round of iteration of the trust domain policy optimization algorithm model based on the policy approximation value and the penalty factor;

and determining the trust domain policy optimization algorithm model under the risk prediction policy obtained in the last round of iteration as the credit risk prediction model.

Optionally, the obtaining, by using the advantage function and the credit behavior data of the sample bank user, a policy approximation value and a penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the current round of iteration includes:

according to the formula:

$$L_{\pi_i}(\pi) = \sigma(\pi_i) + \sum_{s} \rho_{\pi_i}(s) \sum_{a} \pi(a \mid s)\, U_{\pi_i}(s, a)$$

calculating the policy approximation value of the trust domain policy optimization algorithm model under the risk prediction policy of the current round of iteration;

according to the formulas:

$$C = \frac{4 \epsilon \gamma}{(1 - \gamma)^{2}}$$

$$\epsilon = \max_{s} \left| E_{a \sim \pi'(a \mid s)}\left[U_\pi(s, a)\right] \right|$$

calculating the penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the current round of iteration;

wherein $i$ represents the iteration round number; $\pi$ represents a risk prediction policy; $\pi_i$ represents the risk prediction policy obtained by the $i$-th round of iteration; $L_{\pi_i}(\pi)$ is the policy approximation value; $\sigma(\pi_i)$ is the average discounted return value of $\pi_i$; $s$ represents a credit business state; $\rho_{\pi_i}(s)$ represents the distribution of credit business states $s$ under risk prediction policy $\pi_i$; $a$ represents a credit behavior action in the credit behavior data of the sample bank user; $\pi(a \mid s)$ represents the risk prediction policy that the algorithm further selects after the sample user takes credit behavior action $a$ and enters credit business state $s$; $a \sim \pi'(a \mid s)$ represents selecting credit behavior action $a$ according to risk prediction policy $\pi'$; $U_{\pi_i}(s, a)$ is the advantage function under the risk prediction policy $\pi_i$ of the $i$-th round of iteration; $C$ is the penalty factor; $\gamma$ is the discount factor, with a value range of [0, 1]; $E$ represents taking the expectation; $U_\pi(s, a)$ represents the advantage function.

Optionally, the obtaining, based on the policy approximation value and the penalty factor, the risk prediction policy of the next round of iteration of the trust domain policy optimization algorithm model includes:

according to the formula:

$$\pi_{i+1} = \arg\max_{\pi} \left[ L_{\pi_i}(\pi) - C \cdot D_{KL}^{\max}(\pi_i, \pi) \right]$$

calculating the risk prediction policy of the next round of iteration of the trust domain policy optimization algorithm model, wherein $\pi_{i+1}$ represents the risk prediction policy obtained by the $(i+1)$-th round of iteration; $L_{\pi_i}(\pi)$ is the policy approximation value; $C$ is the penalty factor; KL represents the KL divergence; $D_{KL}^{\max}(\pi_i, \pi)$ represents the maximum value of the KL divergence between risk prediction policy $\pi_i$ and risk prediction policy $\pi$.

A credit risk prediction method, comprising:

obtaining credit behavior data of a target bank user;

inputting the credit behavior data of the target bank user into the credit risk prediction model according to any one of the above claims, and obtaining a credit risk prediction result which is output by the credit risk prediction model and corresponds to the target bank user.

A credit risk prediction model obtaining apparatus, comprising: a first credit behavior data obtaining unit, an advantage function obtaining unit and a credit risk prediction model obtaining unit,

the first credit behavior data obtaining unit is used for obtaining credit behavior data of a sample bank user;

the advantage function obtaining unit is used for obtaining the advantage function of the trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user;

the credit risk prediction model obtaining unit is used for iteratively updating the risk prediction policy of the trust domain policy optimization algorithm model by using a stochastic gradient ascent algorithm based on the advantage function and the credit behavior data of the sample bank user to obtain a credit risk prediction model.

A credit risk prediction apparatus comprising: a second credit behavior data obtaining unit and a credit risk prediction result obtaining unit,

The second credit behavior data obtaining unit is used for obtaining the credit behavior data of the target bank user;

the credit risk prediction result obtaining unit is used for inputting the credit behavior data of the target bank user into the credit risk prediction model to obtain a credit risk prediction result which is output by the credit risk prediction model and corresponds to the target bank user.

By means of the above technical solutions, the credit risk prediction model obtaining method, the credit risk prediction method and the credit risk prediction device provided by the present disclosure can be applied to the field of artificial intelligence or the field of finance. The present disclosure obtains credit behavior data of a sample bank user; obtains an advantage function of a trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user; and, based on the advantage function and the credit behavior data of the sample bank user, iteratively updates a risk prediction policy of the trust domain policy optimization algorithm model by using a stochastic gradient ascent algorithm to obtain a credit risk prediction model. Because the present disclosure adopts a trust domain policy optimization algorithm model, the risk prediction policy can be updated through rapid iteration using credit behavior data that are imbalanced and change quickly, so that credit risk is predicted accurately.

The foregoing is merely an overview of the technical solutions of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the present disclosure are more readily apparent, specific embodiments of the present disclosure are set forth below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the disclosure. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 illustrates a flow diagram of one implementation of a credit risk prediction model acquisition method provided by an embodiment of the present disclosure;

FIG. 2 illustrates a flow diagram of another implementation of a credit risk prediction model acquisition method provided by an embodiment of the present disclosure;

FIG. 3 illustrates a flow diagram of another implementation of a credit risk prediction model acquisition method provided by an embodiment of the present disclosure;

FIG. 4 illustrates a flow diagram of one implementation of a credit risk prediction method provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a credit risk prediction model obtaining device according to an embodiment of the present disclosure;

FIG. 6 illustrates a schematic structural diagram of a credit risk prediction apparatus provided by an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

In practice, the credit business of a bank can provide only a small amount of data, the data classes are imbalanced, and the data change greatly over time, all of which greatly increases the difficulty of accurately predicting credit risk. Because a traditional machine learning algorithm has difficulty distinguishing data with different attributes and different instances in a real scenario, the accuracy of credit risk prediction is further reduced.

To solve the above problems and improve the accuracy of credit risk prediction, the embodiments of the present disclosure provide a credit risk prediction model obtaining method. The method uses an advantage function to improve learning efficiency and uses a stochastic gradient ascent algorithm to update and train the model parameters, which makes the learning and training process more stable and avoids the overfitting problem. This helps the trained credit risk prediction model learn the credit risk prediction task quickly and improves the prediction accuracy of credit risk.

As shown in fig. 1, a flowchart of one implementation of a credit risk prediction model obtaining method provided by an embodiment of the disclosure may include:

S100, obtaining credit behavior data of a sample bank user.

The credit behavior data are the credit behavior actions exhibited by a bank user toward the credit business. The embodiment of the present disclosure can collect the credit behavior data of the bank user and denote the credit behavior data as $a = (a_1, a_2, \ldots, a_n)$, where $a_i \in A$, $i = 1, \ldots, n$; $a_i$ represents a behavior exhibited by the bank user toward the credit business, and $A$ represents the set of actions that the bank user can execute, which is used to describe the credit behavior actions of the bank user.

S110, obtaining an advantage function of the trust domain policy optimization algorithm model by using credit behavior data of the sample bank user.

The trust domain policy optimization algorithm (Trust Region Policy Optimization, TRPO) is a deep reinforcement learning (Deep Reinforcement Learning, DRL) algorithm that trains a policy function to maximize the cumulative reward. Deep reinforcement learning is a machine learning method that combines deep learning and reinforcement learning, and has both the perception capability of deep learning and the decision-making capability of reinforcement learning. It can perform control directly according to the input information, and is an artificial intelligence method that is closer to the human way of thinking.

Because the model constructed based on the trust domain policy optimization algorithm has strong self-learning capability and low requirement on priori knowledge of the environment, the risk prediction policy can be learned and optimized according to a small amount of sample credit behavior data of the bank user, the accuracy of bank credit risk prediction is improved, the management and control of the credit risk by the bank are facilitated, and better service is provided for the bank user.

The advantage function is used for representing the advantage of a bank client to execute a certain credit action relative to other credit action under a specific credit business state, namely the deviation of a random action variable relative to an action mean value. According to the embodiment of the disclosure, the advantage function is estimated by using the credit behavior data of the sample bank user, so that the learning efficiency of the trust domain policy optimization algorithm model can be improved, the learning is more stable, and the problems of over fitting and the like are avoided.

S120, based on the advantage function and the credit behavior data of the sample bank user, iteratively updating the risk prediction policy of the trust domain policy optimization algorithm model by using a stochastic gradient ascent algorithm to obtain a credit risk prediction model.

The risk prediction strategy is a prediction strategy adopted when the trust domain strategy optimization algorithm model executes a credit risk prediction task. The risk prediction strategy is generally determined by a plurality of strategy item parameters, and by adjusting the specific numerical values of the strategy item parameters, the emphasis degree of the risk prediction strategy on each item of data in the credit behavior data of the bank user can be determined when the credit risk prediction task is executed.

According to the embodiment of the present disclosure, by using the stochastic gradient ascent algorithm, the risk prediction policy can be iteratively updated using the credit behavior data of only one sample bank user at a time, which reduces the computational complexity and improves the convergence speed of the trust domain policy optimization algorithm model.

It can be understood that, before the trust domain policy optimization algorithm model is learned and trained for the first time, the embodiment of the present disclosure may initialize the model, so that the model iteratively updates the risk prediction policy through the stochastic gradient ascent algorithm starting from the initialized risk prediction policy and model parameters.
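The one-sample-at-a-time update described above can be sketched as follows. This is a plain stochastic gradient ascent step on a softmax policy over a fixed set of actions, weighted by the advantage of the sampled action. It deliberately omits the penalty term of the full method, and the softmax parameterization, the learning rate, the function names and the sample values are assumptions made only for illustration.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert policy parameters (logits) into action probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sga_step(logits: List[float], action: int, advantage: float, lr: float) -> List[float]:
    """One stochastic gradient ascent step using a single sample.

    The gradient of log pi(action) with respect to logit k is
    (1 if k == action else 0) - pi(k).
    """
    probs = softmax(logits)
    return [
        theta + lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for k, theta in enumerate(logits)
    ]


# Initialized policy: two actions with equal probability (hypothetical).
logits = [0.0, 0.0]
# Each sample is (index of the action taken, estimated advantage), one user at a time.
samples: List[Tuple[int, float]] = [(0, 1.0), (1, -0.5), (0, 0.8)]
for action, advantage in samples:
    logits = sga_step(logits, action, advantage, lr=0.1)

print(softmax(logits))
```

Because action 0 is paired with positive advantages and action 1 with a negative one, the probability of action 0 rises above its initial value of 0.5 after the three updates.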

The credit risk prediction model obtaining method provided by the present disclosure can be applied to the field of artificial intelligence or the field of finance. The present disclosure obtains credit behavior data of a sample bank user; obtains an advantage function of a trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user; and, based on the advantage function and the credit behavior data of the sample bank user, iteratively updates a risk prediction policy of the trust domain policy optimization algorithm model by using a stochastic gradient ascent algorithm to obtain a credit risk prediction model. Because the present disclosure adopts a trust domain policy optimization algorithm model, the risk prediction policy can be updated through rapid iteration using credit behavior data that are imbalanced and change quickly, so that credit risk is predicted accurately.

In the credit risk prediction task of a bank, the trust domain policy optimization algorithm model needs to use the advantage function when updating the relevant parameters, so as to improve its learning efficiency.

Optionally, based on the method shown in fig. 1, as shown in fig. 2, a flowchart of another implementation of the credit risk prediction model obtaining method provided by the embodiment of the disclosure, step S110 may include:

S111, obtaining a state-action value function and a state value function of the trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user.

The state value function represents, for any credit business state, the average of all state-action value functions weighted by the probabilities of the credit behavior actions. The state-action value function is the value function corresponding to a single credit behavior action.

Specifically, the embodiment of the present disclosure can substitute the credit behavior data of the sample bank user into the calculation formula of the state-action value function to calculate the state-action value function of the trust domain policy optimization algorithm model.

Specifically, the embodiment of the present disclosure can substitute the credit behavior data of the sample bank user into the calculation formula of the state value function to calculate the state value function of the trust domain policy optimization algorithm model.

S112, obtaining the advantage function of the trust domain policy optimization algorithm model by using the state-action value function and the state value function.

Specifically, the embodiment of the present disclosure can substitute the state-action value function and the state value function into the calculation formula of the advantage function to calculate the advantage function of the trust domain policy optimization algorithm model.

Through the state-action value function and the state value function, the embodiment of the present disclosure can evaluate the magnitude of the current state-action value function relative to the state value function, so that the advantage of the state-action value function over the current state value function is measured by the advantage function. If the advantage function is greater than 0, the currently selected credit behavior action is better than the average credit behavior action. If the advantage function is less than 0, the currently selected credit behavior action is worse than the average credit behavior action.

According to the embodiment of the present disclosure, an advantage function that reflects the advantage of any credit behavior action relative to other credit behavior actions can be obtained through the state-action value function and the state value function, so that the advantage function can subsequently be used to improve learning efficiency, make learning more stable, and avoid the overfitting problem.

Optionally, the embodiment of the present disclosure may, according to the formula:

$$Q_\pi(s_t, a_t) = E_\pi\left[\sum_{l \in \mathbb{N}} \gamma^{l}\, r(s_{t+l}) \,\middle|\, s_t, a_t\right]$$

calculate the state-action value function of the trust domain policy optimization algorithm model, and according to the formula:

$$V_\pi(s_t) = E_\pi\left[\sum_{l \in \mathbb{N}} \gamma^{l}\, r(s_{t+l}) \,\middle|\, s_t\right]$$

calculate the state value function of the trust domain policy optimization algorithm model.

Wherein $\pi$ is the risk prediction policy; $t$ is the sampling number; $s_t$ represents the credit business state corresponding to the sample bank user under sampling number $t$; $a_t$ is the credit behavior action taken by the sample user under sampling number $t$ in the credit behavior data of the sample bank user; $Q_\pi(s_t, a_t)$ is the state-action value function corresponding to credit business state $s_t$ and credit behavior action $a_t$, representing the expected return value obtained after the sample user takes credit behavior action $a_t$ in credit business state $s_t$; $V_\pi(s_t)$ is the state value function corresponding to the credit business state, representing the expected return value obtained by the sample user in credit business state $s_t$; $E$ represents taking the expectation; $\gamma$ is the discount factor, with a value range of [0, 1]; $\mathbb{N}$ is the set of natural numbers; $r(s_{t+l})$ represents the return value obtained by the sample user in credit business state $s_{t+l}$.

According to the embodiment of the present disclosure, the state-action value function and the state value function of the trust domain policy optimization algorithm model are calculated from the credit behavior data of the sample bank user, so that the advantage of each credit behavior action taken by the bank user relative to other credit behavior actions can subsequently be evaluated accurately, which provides a reliable basis for the iterative updating of the risk prediction policy.
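As an illustration of the discounted sum of return values that underlies both functions above, the following minimal Python sketch computes that sum for every sampling number of one observed sequence. Treating a single sequence as a one-sample estimate of the expectation is an assumption of this sketch, and the function name `discounted_returns` and the return values are hypothetical rather than taken from the disclosure.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return sum_l gamma**l * rewards[t + l] for every sampling number t."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each entry reuses the discounted sum that follows it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical return values observed at sampling numbers 0, 1 and 2.
print(discounted_returns([1.0, 0.0, -2.0], gamma=0.5))  # [0.5, -1.0, -2.0]
```

Averaging such sums over many sequences that start from the same state, or the same state and action, would approximate the two expectations.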

Optionally, the embodiment of the present disclosure may, according to the formula:

$$U_\pi(s, a) = Q_\pi(s, a) - V_\pi(s)$$

calculate the advantage function of the trust domain policy optimization algorithm model, wherein $s$ represents a credit business state; $a$ represents a credit behavior action in the credit behavior data of the sample bank user; $U_\pi(s, a)$ is the advantage function; $Q_\pi(s, a)$ is the state-action value function; $V_\pi(s)$ is the state value function.

According to the embodiment of the present disclosure, estimating the advantage function reduces the variance and prevents the overfitting problem caused by excessive variance, which improves the learning efficiency of the trust domain policy optimization algorithm model, provides a reliable basis for the iterative updating of the risk prediction policy, and helps the trust domain policy optimization algorithm model predict credit risk accurately.
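A small worked example may make the subtraction concrete. The Python sketch below, which is only an illustration and not the implementation of the disclosure, takes a table of state-action values for one credit business state, forms the state value as the probability-weighted average of those values, and subtracts it to obtain the advantage of each action. The action names, the numeric values and the function names are hypothetical.

```python
from typing import Dict


def state_value(q_row: Dict[str, float], policy_row: Dict[str, float]) -> float:
    """V(s): average of Q(s, a) weighted by the action probabilities pi(a|s)."""
    return sum(policy_row[action] * q for action, q in q_row.items())


def advantages(q_row: Dict[str, float], policy_row: Dict[str, float]) -> Dict[str, float]:
    """U(s, a) = Q(s, a) - V(s) for every action available in state s."""
    v = state_value(q_row, policy_row)
    return {action: q - v for action, q in q_row.items()}


# Hypothetical values for a single credit business state.
q_row = {"repay_on_time": 1.0, "repay_late": -1.0, "default": -4.0}
policy_row = {"repay_on_time": 0.5, "repay_late": 0.25, "default": 0.25}

for action, u in advantages(q_row, policy_row).items():
    verdict = "better than" if u > 0 else "worse than" if u < 0 else "equal to"
    print(f"{action}: U = {u:+.2f} ({verdict} the average action)")
```

Here the state value is -0.75, so the three advantages are 1.75, -0.25 and -3.25, and their probability-weighted average is zero, as expected when the state value is the weighted mean of the state-action values.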

Optionally, based on the method shown in fig. 1, as shown in fig. 3, a flowchart of another implementation of the credit risk prediction model obtaining method provided by the embodiment of the disclosure, step S120 may include:

S121, obtaining a policy approximation value and a penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the current round of iteration by using the advantage function and the credit behavior data of the sample bank user.

The policy approximation value is used for representing state distribution of credit business states corresponding to the bank clients under the current risk prediction policy.

Wherein, the penalty factor is a parameter of the optimization algorithm and is used for constraining the step length of each iteration.

Specifically, the embodiment of the disclosure can substitute the dominance function and the credit behavior data of the sample banking user into a calculation formula of the policy approximation value, and calculate the policy approximation value of the trust domain policy optimization algorithm model under the risk prediction policy of the round of iteration.

Specifically, the embodiment of the disclosure can substitute the advantage function and the credit behavior data of the sample banking user into a calculation formula of the penalty factor, and calculate the penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the round of iteration.

S122, obtaining a risk prediction strategy of the next iteration of the trust domain strategy optimization algorithm model based on the strategy approximation value and the penalty factor.

Specifically, in the embodiment of the disclosure, the policy approximation value and the penalty factor are substituted into a calculation formula of the risk prediction policy, so as to calculate the risk prediction policy of the next iteration of the trust domain policy optimization algorithm model.

S123, determining a trust domain strategy optimization algorithm model under the risk prediction strategy obtained in the last iteration as a credit risk prediction model.

According to the embodiment of the present disclosure, the risk prediction policy is iterated round by round using the stochastic gradient ascent algorithm, so that the risk prediction policy obtained in the last round of iteration can meet the prediction accuracy requirement of the credit risk prediction task in practice. When the credit risk prediction model applying this risk prediction policy is put into the credit risk prediction task for bank users in a real scenario, an accurate credit risk prediction result can be obtained.

Optionally, the embodiment of the present disclosure may, according to the formula:

$$L_{\pi_i}(\pi) = \sigma(\pi_i) + \sum_{s} \rho_{\pi_i}(s) \sum_{a} \pi(a \mid s)\, U_{\pi_i}(s, a)$$

calculate the policy approximation value of the trust domain policy optimization algorithm model under the risk prediction policy of the current round of iteration, and according to the formulas:

$$C = \frac{4 \epsilon \gamma}{(1 - \gamma)^{2}}$$

$$\epsilon = \max_{s} \left| E_{a \sim \pi'(a \mid s)}\left[U_\pi(s, a)\right] \right|$$

calculate the penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the current round of iteration.

Wherein $i$ represents the iteration round number; $\pi$ represents a risk prediction policy; $\pi_i$ represents the risk prediction policy obtained by the $i$-th round of iteration; $L_{\pi_i}(\pi)$ is the policy approximation value; $\sigma(\pi_i)$ is the average discounted return value of $\pi_i$; $s$ represents a credit business state; $\rho_{\pi_i}(s)$ represents the distribution of credit business states $s$ under risk prediction policy $\pi_i$; $a$ represents a credit behavior action in the credit behavior data of the sample bank user; $\pi(a \mid s)$ represents the risk prediction policy that the algorithm further selects after the sample user takes credit behavior action $a$ and enters credit business state $s$; $a \sim \pi'(a \mid s)$ represents selecting credit behavior action $a$ according to risk prediction policy $\pi'$; $U_{\pi_i}(s, a)$ is the advantage function under the risk prediction policy $\pi_i$ of the $i$-th round of iteration; $C$ is the penalty factor; $\gamma$ is the discount factor, with a value range of [0, 1]; $E$ represents taking the expectation; $U_\pi(s, a)$ represents the advantage function.

According to the embodiment of the disclosure, the policy approximation value and the penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the iteration of the present round can be accurately calculated through the advantage function and the credit behavior data of the sample bank user, so that a reliable basis is provided for the risk prediction policy with better credit risk prediction effect updated by the subsequent iteration.
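To make the penalty factor concrete, the sketch below evaluates C = 4εγ/(1-γ)² and ε for a small tabular case in Python. It also evaluates a policy approximation value under the assumption that it has the form commonly used for trust region policy optimization, namely the average discounted return plus the state-weighted expected advantage. That form, the dictionaries, the function names and all numbers are assumptions for illustration, not values from the disclosure. The sketch requires γ < 1, since the denominator vanishes at γ = 1.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]  # table[state][action]


def expected_advantage(policy: Table, adv: Table, state: str) -> float:
    """E_{a ~ policy(.|state)}[U(state, a)]."""
    return sum(policy[state][a] * adv[state][a] for a in adv[state])


def epsilon(new_policy: Table, adv: Table) -> float:
    """Largest absolute expected advantage over all states."""
    return max(abs(expected_advantage(new_policy, adv, s)) for s in adv)


def penalty_factor(eps: float, gamma: float) -> float:
    """C = 4 * eps * gamma / (1 - gamma)^2, defined only for gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("this sketch needs 0 <= gamma < 1")
    return 4.0 * eps * gamma / (1.0 - gamma) ** 2


def policy_approximation(sigma: float, rho: Dict[str, float],
                         policy: Table, adv: Table) -> float:
    """Assumed form: sigma + sum_s rho(s) * E_{a ~ policy}[U(s, a)]."""
    return sigma + sum(rho[s] * expected_advantage(policy, adv, s) for s in adv)


# Hypothetical advantages, candidate policy and state weights.
adv = {"s0": {"a0": 1.0, "a1": -1.0}, "s1": {"a0": 0.5, "a1": -0.5}}
new_policy = {"s0": {"a0": 0.75, "a1": 0.25}, "s1": {"a0": 0.5, "a1": 0.5}}
rho = {"s0": 0.5, "s1": 0.5}

eps = epsilon(new_policy, adv)
print(eps, penalty_factor(eps, gamma=0.5),
      policy_approximation(2.0, rho, new_policy, adv))  # 0.5 4.0 2.25
```

The penalty factor grows quickly as γ approaches 1, which is why it can force very small steps when used directly.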

According to the embodiment of the present disclosure, the KL divergence is used as a constraint condition, which avoids the problem of an excessively small iteration step caused by the constraint of the penalty factor, so that a risk prediction policy with a better credit risk prediction effect can be obtained by iterative updating using the policy approximation value and the penalty factor.
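How the penalty factor limits the step can be illustrated with a short Python sketch. It assumes that a candidate policy is scored by its policy approximation value minus the penalty factor multiplied by the largest per-state KL divergence from the current policy, which is the penalized form commonly associated with trust region policy optimization. The scoring rule as written here, the candidate policies, the function names and the numbers are assumptions for illustration only.

```python
import math
from typing import Dict, List

Policy = Dict[str, List[float]]  # policy[state] = action probabilities


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def max_kl(old: Policy, new: Policy) -> float:
    """Largest KL divergence between the two policies over all states."""
    return max(kl_divergence(old[s], new[s]) for s in old)


def penalized_score(approximation: float, c: float, old: Policy, new: Policy) -> float:
    """Policy approximation value minus the KL penalty."""
    return approximation - c * max_kl(old, new)


current = {"s0": [0.5, 0.5]}
candidate = {"s0": [0.75, 0.25]}

# Hypothetical approximation values: 2.0 for staying put, 2.25 for the candidate.
for c in (1.0, 4.0):
    stay = penalized_score(2.0, c, current, current)
    move = penalized_score(2.25, c, current, candidate)
    print(f"C = {c}: stay = {stay:.3f}, move = {move:.3f}, "
          f"accept candidate: {move > stay}")
```

With these numbers the candidate is accepted for C = 1 but rejected for C = 4, which shows how a large penalty factor shrinks the admissible step.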

As shown in fig. 4, a flow diagram of one implementation of a credit risk prediction method provided by an embodiment of the present disclosure may include:

S200, obtaining credit behavior data of a target bank user.

S210, the credit behavior data of the target bank user are input into a credit risk prediction model, and a credit risk prediction result corresponding to the target bank user, which is output by the credit risk prediction model, is obtained.

Wherein the credit risk prediction result may comprise a credit risk assessment report of the bank user on a specific credit transaction.

The credit risk prediction method provided by the embodiment of the disclosure can be used in the field of artificial intelligence or the field of finance. Because the method uses a credit risk prediction model whose risk prediction policy has a strong credit risk prediction effect, it can produce an accurate credit risk prediction result from the credit behavior data of the target bank user. This helps the bank manage and control credit risk and provide matching credit business services to the target bank user.

Although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

Corresponding to the credit risk prediction model obtaining method, an embodiment of the present disclosure further provides a credit risk prediction model obtaining device. As shown in fig. 5, the device may include: a first credit behavior data obtaining unit 10, an advantage function obtaining unit 20, and a credit risk prediction model obtaining unit 30.

The first credit behavior data obtaining unit 10 is configured to obtain credit behavior data of a sample bank user.

The advantage function obtaining unit 20 is configured to obtain an advantage function of the trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user.

The credit risk prediction model obtaining unit 30 is configured to iteratively update the risk prediction policy of the trust domain policy optimization algorithm model with a stochastic gradient ascent algorithm, based on the advantage function and the credit behavior data of the sample bank user, to obtain the credit risk prediction model.

Optionally, the advantage function obtaining unit 20 may include a first obtaining subunit and a second obtaining subunit.

The first obtaining subunit is configured to obtain a state action cost function and a state cost function of the trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user.

The second obtaining subunit is configured to obtain the advantage function of the trust domain policy optimization algorithm model by using the state action cost function and the state cost function.

Optionally, the first obtaining subunit may calculate the state action cost function and the state cost function of the trust domain policy optimization algorithm model according to the formulas:

Wherein π is the risk prediction policy; t is the sampling number; s_t denotes the credit business state of the sample bank user at sampling number t; a_t is the credit behavior action, in the credit behavior data of the sample bank user, taken by the sample user at sampling number t; Q_π(s_t, a_t) is the state action cost function corresponding to credit business state s_t and credit behavior action a_t, and represents the expected return obtained after the sample user takes credit behavior action a_t in credit business state s_t; V_π(s_t) is the state cost function corresponding to credit business state s_t, and represents the expected return obtained by the sample user in credit business state s_t; E denotes expectation; γ is the discount factor, with value range [0, 1]; the index l ranges over the set of natural numbers; r(s_{t+l}) denotes the return obtained by the sample user in credit business state s_{t+l}.
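The symbol definitions above (a discount factor γ, an index l over the natural numbers, and returns r(s_{t+l})) suggest that both cost functions are expectations of a discounted sum of returns. As a hedged illustration, and not the formula of the source, the sketch below estimates such an expectation by averaging over finite sampled trajectories; the function names and the sample data are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**l * rewards[l] over a finite trajectory."""
    total = 0.0
    # Accumulate from the last step backwards so each step is one multiply-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_state_value(trajectories, gamma):
    """Sample average of discounted returns over trajectories starting in one state."""
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


# Two hypothetical reward sequences observed from the same credit business state.
samples = [[1.0, 0.0, 2.0], [0.0, 0.0, 0.0]]
value_estimate = estimate_state_value(samples, gamma=0.5)
```

Restricting the averaged trajectories to those that begin with a given credit behavior action would give the analogous estimate for the state action cost function.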

Optionally, the second obtaining subunit may, according to the formula:

U_π(s, a) = Q_π(s, a) - V_π(s)

calculate the advantage function of the trust domain policy optimization algorithm model, wherein s denotes a credit business state; a denotes a credit behavior action in the credit behavior data of the sample bank user; U_π(s, a) is the advantage function; Q_π(s, a) is the state action cost function; and V_π(s) is the state cost function.
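The subtraction U_π(s, a) = Q_π(s, a) - V_π(s) can be shown on a small table. In this sketch the advantage function (called the dominance function elsewhere in this text) is computed from dictionaries; the state and action names and the numeric values are hypothetical.

```python
def advantage(q_values, state_values):
    """U(s, a) = Q(s, a) - V(s) for every (state, action) pair in q_values."""
    return {(s, a): q - state_values[s] for (s, a), q in q_values.items()}


# Hypothetical values for one credit business state and two credit behavior actions.
q_table = {("current", "repay_on_time"): 1.5, ("current", "defer_payment"): 0.5}
v_table = {"current": 1.0}
u_table = advantage(q_table, v_table)
```

A positive entry means the action does better than the state's average under the policy, and a negative entry means it does worse.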

Optionally, the credit risk prediction model obtaining unit 30 may include a third obtaining subunit, a fourth obtaining subunit, and a model determining subunit.

The third obtaining subunit is configured to obtain the policy approximation value and the penalty factor of the trust domain policy optimization algorithm model under the risk prediction policy of the current iteration by using the advantage function and the credit behavior data of the sample bank user.

The fourth obtaining subunit is configured to obtain the risk prediction policy of the next iteration of the trust domain policy optimization algorithm model based on the policy approximation value and the penalty factor.

The model determining subunit is configured to determine the trust domain policy optimization algorithm model under the risk prediction policy obtained in the last iteration as the credit risk prediction model.

Optionally, the third obtaining subunit may calculate the policy approximation value and the penalty factor according to the formulas:

C = 4εγ / (1 - γ)²

ε = max_s |E_{a~π′(a|s)}[U_π(s, a)]|
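The two formulas for the penalty factor can be evaluated directly on tabular data. The following is a minimal sketch, assuming nested dictionaries `advantages[s][a]` and `new_policy[s][a]` (a hypothetical layout). The text gives the range of γ as [0, 1], but the expression for C divides by (1 - γ)², so this sketch rejects γ = 1.

```python
def epsilon_bound(advantages, new_policy):
    """epsilon = max over states of |expected advantage under the new policy|."""
    return max(
        abs(sum(new_policy[s][a] * advantages[s][a] for a in advantages[s]))
        for s in advantages
    )


def penalty_factor(epsilon, gamma):
    """C = 4 * epsilon * gamma / (1 - gamma)**2, defined only for gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return 4.0 * epsilon * gamma / (1.0 - gamma) ** 2


# Hypothetical single-state example.
advantages = {"current": {"repay_on_time": 1.0, "defer_payment": -1.0}}
new_policy = {"current": {"repay_on_time": 0.75, "defer_payment": 0.25}}
eps = epsilon_bound(advantages, new_policy)
c_value = penalty_factor(eps, gamma=0.5)
```

Because C grows quickly as γ approaches 1, a penalty of this size can force very small policy updates, which is the step-size concern the description raises.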

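Optionally, the fourth obtaining subunit may calculate the risk prediction policy of the next iteration of the trust domain policy optimization algorithm model according to the formula: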

The credit risk prediction model obtaining device provided by the present disclosure can be applied to the field of artificial intelligence or the field of finance. The device obtains credit behavior data of a sample bank user; obtains an advantage function of a trust domain policy optimization algorithm model by using the credit behavior data of the sample bank user; and, based on the advantage function and the credit behavior data of the sample bank user, iteratively updates the risk prediction policy of the trust domain policy optimization algorithm model with a stochastic gradient ascent algorithm to obtain a credit risk prediction model. Because the device adopts a trust domain policy optimization algorithm model, it can use credit behavior data that is imbalanced and changes quickly to update the risk prediction policy through rapid iteration, so as to predict credit risk accurately.
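The update formula used by the fourth obtaining subunit is not reproduced in this text. One reading that is consistent with the quantities the text does name (a policy approximation value, a penalty factor C, and a maximum divergence between π_i and a candidate policy π) is to pick the candidate that maximizes the approximation minus C times the divergence. The sketch below illustrates that reading only; the function name, the candidate labels, and the numbers are hypothetical, and a finite candidate list stands in for the gradient-based search.

```python
def select_next_policy(candidates, approximation, max_divergence, penalty):
    """Return the candidate with the largest penalized objective.

    approximation(policy)  -> policy approximation value of the candidate
    max_divergence(policy) -> maximum divergence from the current policy
    penalty                -> the penalty factor C
    """
    def objective(policy):
        return approximation(policy) - penalty * max_divergence(policy)

    return max(candidates, key=objective)


# Hypothetical scores for two candidate policies.
approx_scores = {"cautious": 1.0, "aggressive": 1.5}
divergences = {"cautious": 0.0, "aggressive": 0.5}
chosen = select_next_policy(
    ["cautious", "aggressive"], approx_scores.get, divergences.get, penalty=2.0
)
```

With a large penalty the candidate closest to the current policy wins, while a smaller penalty lets the higher-scoring but more distant candidate through.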

Corresponding to the credit risk prediction method, an embodiment of the present disclosure further provides a credit risk prediction device. As shown in fig. 6, the device may include: a second credit behavior data obtaining unit 40 and a credit risk prediction result obtaining unit 50.

The second credit behavior data obtaining unit 40 is configured to obtain credit behavior data of the target bank user.

The credit risk prediction result obtaining unit 50 is configured to input the credit behavior data of the target bank user into the credit risk prediction model, and obtain the credit risk prediction result for the target bank user output by the credit risk prediction model.

The credit risk prediction device provided by the embodiment of the disclosure can be used in the field of artificial intelligence or the field of finance. Because the device runs a credit risk prediction model whose risk prediction policy has a strong credit risk prediction effect, it can produce an accurate credit risk prediction result from the credit behavior data of the target bank user. This helps the bank manage and control credit risk and provide matching credit business services to the target bank user.

The specific manner in which the individual units perform the operations in relation to the apparatus of the above embodiments has been described in detail in relation to the embodiments of the method and will not be described in detail here.

The credit risk prediction model obtaining device includes a processor and a memory. The first credit behavior data obtaining unit 10, the advantage function obtaining unit 20, the credit risk prediction model obtaining unit 30, and the like are stored in the memory as program units, and the processor executes these program units to realize the corresponding functions.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the trust domain policy optimization algorithm model is adopted and credit behavior data that is imbalanced and changes quickly is used to update the risk prediction policy through rapid iteration, so that credit risk can be predicted accurately.

Embodiments of the present disclosure provide a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements a credit risk prediction model acquisition method.

The embodiment of the disclosure provides a processor for running a program, wherein the program runs to execute a credit risk prediction model obtaining method.

The embodiment of the disclosure provides an electronic device, which comprises at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory complete communication with each other through a bus; the processor is configured to invoke program instructions in the memory to perform the credit risk prediction model acquisition method described above.

The present disclosure also provides a computer program product adapted, when executed on an electronic device, to run a program initialized with the steps of the credit risk prediction model obtaining method.

The credit risk prediction device includes a processor and a memory. The second credit behavior data obtaining unit 40, the credit risk prediction result obtaining unit 50, and the like are stored in the memory as program units, and the processor executes these program units to realize the corresponding functions.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, a credit risk prediction model whose risk prediction policy has a strong credit risk prediction effect is run, so that an accurate credit risk prediction result can be produced from the credit behavior data of the target bank user. This helps the bank manage and control credit risk and provide matching credit business services to the target bank user.

Embodiments of the present disclosure provide a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements a credit risk prediction method.

The embodiment of the disclosure provides a processor for running a program, wherein the program runs to execute a credit risk prediction method.

The embodiment of the disclosure provides an electronic device, which comprises at least one processor, and at least one memory and a bus connected with the processor; the processor and the memory complete communication with each other through a bus; the processor is configured to invoke program instructions in the memory to perform the credit risk prediction method described above.

The present disclosure also provides a computer program product adapted, when executed on an electronic device, to run a program initialized with the steps of the credit risk prediction method.

The electronic device herein may be a server, a PC, a PAD, a mobile phone, etc.

It should be noted that the credit risk prediction model obtaining method, the credit risk prediction method and the devices provided by the present disclosure may be used in the field of artificial intelligence or the field of finance. The foregoing is merely an example and does not limit the application fields of the credit risk prediction model obtaining method, the credit risk prediction method and the devices provided by the present disclosure.

It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the related data need to comply with the related laws and regulations and standards of the related countries and regions.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, the electronic device includes one or more processors (CPUs), memory, and a bus. The electronic device may also include input/output interfaces, network interfaces, and the like.

The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip. Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

In the description of the present disclosure, it should be understood that terms indicating directions or positional relationships, such as "upper", "lower", "front", "rear", "left" and "right", are based on the directions or positional relationships shown in the drawings. They are used merely for convenience and simplicity of description, and do not indicate or imply that the positions or elements referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present disclosure.

It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual relationship or order between such entities or actions. It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The foregoing is merely exemplary of the present disclosure and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present disclosure, are intended to be included within the scope of the claims of the present disclosure.

Claims

1. A credit risk prediction model obtaining method, comprising:

obtaining credit behavior data of a sample bank user;

2. The method of claim 1, wherein said obtaining a dominance function of a trust domain policy optimization algorithm model using credit behavior data of said sample banking user comprises:

3. The method of claim 2, wherein the obtaining a state action cost function and a state cost function of the trust domain policy optimization algorithm model using the sample banking user credit behavior data comprises:

according to the formula:

4. A method according to claim 3, wherein said obtaining a dominance function of said trust domain policy optimization algorithm model using said state action cost function and said state cost function comprises:

according to the formula:

U_π(s, a) = Q_π(s, a) - V_π(s)

calculating a dominance function of the trust domain policy optimization algorithm model, wherein s represents a credit business state; a represents a credit behavior action in the credit behavior data of the sample banking user; U_π(s, a) is the dominance function; Q_π(s, a) is the state action cost function; V_π(s) is the state cost function.

5. The method of claim 1, wherein iteratively updating the risk prediction policy of the trust domain policy optimization algorithm model using a stochastic gradient ascent algorithm based on the dominance function and the credit behavior data of the sample banking user to obtain a credit risk prediction model comprises:

6. The method of claim 5, wherein said obtaining policy approximations and penalty factors of the trust domain policy optimization algorithm model under the risk prediction policy of the present round of iterations using the dominance function and credit behavior data of the sample banking user comprises:

according to the formula:

C = 4εγ / (1 - γ)²

ε = max_s |E_{a~π′(a|s)}[U_π(s, a)]|

wherein i represents the iteration round; π represents a risk prediction policy; π_i represents the risk prediction policy obtained in the i-th iteration; the formula yields the policy approximation; σ(π_i) is the average discounted return of π_i; s represents a credit business state; the state distribution term is the distribution of credit business states s under the risk prediction policy π_i; a represents a credit behavior action in the credit behavior data of the sample banking user; π(s|a) represents the risk prediction policy that the algorithm goes on to select after the sample user takes credit behavior action a and enters credit business state s; a~π′(a|s) represents that credit behavior action a is selected according to the risk prediction policy π′; the dominance term is the dominance function under the risk prediction policy π_i of the i-th iteration; C is the penalty factor; γ is the discount factor, with value range [0, 1]; E represents expectation; U_π(s, a) represents the dominance function.

7. The method of claim 6, wherein the obtaining a risk prediction policy for a next iteration of the trust domain policy optimization algorithm model based on the policy approximation and the penalty factor comprises:

according to the formula:

calculating a risk prediction policy of the next iteration of the trust domain policy optimization algorithm model, wherein π_{i+1} represents the risk prediction policy obtained in the (i+1)-th iteration; KL represents the KL divergence; and the divergence term represents the maximum value of the total difference divergence between the risk prediction policy π_i and the risk prediction policy π.

8. A credit risk prediction method, comprising:

obtaining credit behavior data of a target bank user;

inputting the credit behavior data of the target bank user into the credit risk prediction model according to any one of claims 1 to 7, and obtaining a credit risk prediction result corresponding to the target bank user, which is output by the credit risk prediction model.

9. A credit risk prediction model obtaining apparatus, characterized by comprising: a first credit behavior data obtaining unit, a dominance function obtaining unit and a credit risk prediction model obtaining unit,

10. A credit risk prediction apparatus, comprising: a second credit behavior data obtaining unit and a credit risk prediction result obtaining unit,

the credit risk prediction result obtaining unit is configured to input the credit behavior data of the target bank user into the credit risk prediction model according to claim 9, and obtain a credit risk prediction result corresponding to the target bank user, which is output by the credit risk prediction model.