CN114971891A

CN114971891A - Risk prediction method and device, processor and electronic equipment

Info

Publication number: CN114971891A
Application number: CN202210826125.4A
Authority: CN
Inventors: 韩奇城; 梁婷; 孙少杰; 徐晓琳
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-07-14
Filing date: 2022-07-14
Publication date: 2022-08-30

Abstract

The application discloses a risk prediction method and device, a processor and electronic equipment, and relates to the technical field of artificial intelligence. The method comprises: acquiring target data information of a target object; and processing the target data information through a target risk control model to obtain a risk level corresponding to the target data information, wherein the feature information corresponding to the target data information corresponds one to one with the feature information used for constructing the target risk control model, the target risk control model is obtained by iteratively training an initial risk control model using an energy function of an annealing algorithm, and the initial risk control model is constructed from part of the feature information obtained by screening through a preset algorithm. The method and the device solve the problem in the related art that the risk prediction accuracy of a risk control model is low because the feature set used for constructing the model has to be obtained through manual screening.

Description

Risk prediction method and device, processor and electronic equipment

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a risk prediction method and device, a processor and electronic equipment.

Background

At present, intelligent risk control models are increasingly used in the big data risk control applications of financial institutions. In constructing a risk control model, the most complicated and time-consuming work is the derivation and screening of features: a large number of features are derived from the data asset tables of a financial institution, and the features of the risk control model are then obtained through a series of screening steps. In the prior art, feature screening is often performed manually, which has the following problems:

1. Manual feature screening is time-consuming, and the screened features do not guarantee a good discrimination evaluation index (ks index) for the risk control model.

2. After the features are screened manually, the risk control model cannot be guaranteed to reach a high ks on the training data set, and a marked drop in ks easily occurs.

No effective solution has yet been proposed for the problem in the related art that the risk prediction accuracy of a risk control model is low because the feature set used for constructing the model has to be obtained through manual screening.

Disclosure of Invention

The main purpose of the present application is to provide a risk prediction method and apparatus, a processor, and an electronic device, so as to solve the problem that the accuracy of risk prediction of a risk control model is low because a feature set for constructing the risk control model needs to be obtained by manual screening in the related art.

To achieve the above object, according to one aspect of the present application, there is provided a risk prediction method. The method comprises: acquiring target data information of a target object; and processing the target data information through a target risk control model to obtain a risk level corresponding to the target data information, wherein the feature information corresponding to the target data information corresponds one to one with the feature information used for constructing the target risk control model, the target risk control model is obtained by iteratively training an initial risk control model using an energy function of an annealing algorithm, and the initial risk control model is constructed from part of the feature information obtained by screening through a preset algorithm.

Further, after the target data information is processed through a target risk control model to obtain a risk level corresponding to the target data information, the method further includes: displaying a risk level corresponding to the target data information on a target interface; if a signal for adjusting the risk level corresponding to the target data information is detected, adjusting the risk level corresponding to the target data information; and optimizing the target risk control model according to the adjusted risk level corresponding to the target data information.

Further, the target risk control model is trained by the following method: acquiring a target data set, and dividing the target data set to obtain a training set, a verification set and a test set; obtaining a first target feature set through the preset algorithm according to the training set and the verification set; constructing a model according to a second target feature set to obtain an initial risk control model, wherein the second target feature set is a subset of the first target feature set; and obtaining an energy function of the annealing algorithm according to the first target feature set, the second target feature set and the test set, and performing iterative training on the initial risk control model by adopting the energy function of the annealing algorithm to obtain the target risk control model.

Further, obtaining a first target feature set according to the training set and the verification set by the preset algorithm includes: performing feature derivation on the training set and the verification set to obtain an initial feature set; and screening the initial feature set through the preset algorithm to obtain the first target feature set.

Further, the preset algorithm at least comprises: an information value algorithm, a logistic regression algorithm with a penalty term, a variance inflation factor, a population stability index and a bidirectional stepwise regression algorithm, and screening the initial feature set through the preset algorithm to obtain the first target feature set comprises: screening the initial feature set through the information value algorithm to obtain a first feature set; screening the first feature set through the logistic regression algorithm with a penalty term to obtain a second feature set; performing multicollinearity screening on the second feature set through the variance inflation factor to obtain a third feature set; screening the third feature set through the population stability index to obtain a fourth feature set; and screening the fourth feature set through the bidirectional stepwise regression algorithm to obtain the first target feature set.

Further, constructing a model according to the second target feature set, and obtaining an initial risk control model comprises: determining a feature quantity for constructing the initial risk control model; and obtaining the second target feature set from the first target feature set according to the feature quantity, and constructing a model according to the second target feature set to obtain the initial risk control model.

Further, obtaining an energy function of the annealing algorithm according to the first target feature set, the second target feature set, and the test set, and performing iterative training on the initial risk control model by using the energy function of the annealing algorithm to obtain the target risk control model includes: calculating the discrimination evaluation index of the training set according to the second target feature set to obtain a first index value; calculating the discrimination evaluation index of the verification set according to the second target feature set to obtain a second index value; calculating the discrimination evaluation index of the test set according to the second target feature set to obtain a third index value; calculating according to the first index value, the second index value and the third index value to obtain a first initial energy function corresponding to the initial risk control model; selecting a plurality of features in a preset proportion from the first target feature set to replace the features in the second target feature set to obtain a third target feature set, and constructing an updated initial risk control model based on the third target feature set; calculating, according to the third target feature set, a second initial energy function corresponding to the updated initial risk control model; calculating a difference value between the first initial energy function and the second initial energy function to obtain a target difference value; if the target difference value is smaller than a preset value, taking the updated initial risk control model as a latest risk control model; judging whether the current temperature corresponding to the latest risk control model is less than or equal to a preset temperature value, and if the current temperature is less than or equal to the preset temperature value, taking the latest risk control model as the target risk control model; and if the current temperature is higher than the preset temperature value, lowering the current temperature and continuing to select a plurality of features in the preset proportion from the first target feature set to replace the features in the third target feature set, until the current temperature is lower than or equal to the preset temperature value.

Further, if the target difference is smaller than the preset value, the method further includes: calculating the target difference value and the current temperature according to a Monte Carlo criterion to obtain a target probability value; and if the target probability value is greater than a preset probability value, taking the updated initial risk control model as the latest risk control model.

In order to achieve the above object, according to another aspect of the present application, there is provided a risk prediction apparatus. The device includes: a first acquisition unit configured to acquire target data information of a target object; the processing unit is used for processing the target data information through a target risk control model to obtain a risk grade corresponding to the target data information, wherein the characteristic information corresponding to the target data information corresponds to the characteristic information used for constructing the target risk control model in a one-to-one manner, the target risk control model is obtained by adopting an energy function of an annealing algorithm and performing iterative training on an initial risk control model, and the initial risk control model is constructed by part of characteristic information obtained by screening through a preset algorithm.

Further, the apparatus further comprises: the display module is used for displaying the risk level corresponding to the target data information on a target interface after the target data information is processed through a target risk control model to obtain the risk level corresponding to the target data information; the adjusting module is used for adjusting the risk level corresponding to the target data information if a signal for adjusting the risk level corresponding to the target data information is detected; and the optimization module is used for optimizing the target risk control model according to the adjusted risk level corresponding to the target data information.

Further, the target risk control model is trained by the following means: the second acquisition unit is used for acquiring a target data set and dividing the target data set to obtain a training set, a verification set and a test set; the screening unit is used for obtaining a first target feature set through the preset algorithm according to the training set and the verification set; the construction unit is used for constructing a model according to a second target feature set to obtain an initial risk control model, wherein the second target feature set is a subset of the first target feature set; and the training unit is used for obtaining an energy function of the annealing algorithm according to the first target feature set, the second target feature set and the test set, and performing iterative training on the initial risk control model by adopting the energy function of the annealing algorithm to obtain the target risk control model.

Further, the screening unit includes: the derivation module is used for carrying out feature derivation on the training set and the verification set to obtain an initial feature set; and the screening module is used for screening the initial feature set through the preset algorithm to obtain the first target feature set.

Further, the preset algorithm at least comprises: an information value algorithm, a logistic regression algorithm with a penalty term, a variance inflation factor, a population stability index and a bidirectional stepwise regression algorithm, wherein the screening module comprises: a first screening submodule, configured to screen the initial feature set through the information value algorithm to obtain a first feature set; a second screening submodule, configured to screen the first feature set through the logistic regression algorithm with a penalty term to obtain a second feature set; a third screening submodule, configured to perform multicollinearity screening on the second feature set through the variance inflation factor to obtain a third feature set; a fourth screening submodule, configured to screen the third feature set through the population stability index to obtain a fourth feature set; and a fifth screening submodule, configured to screen the fourth feature set through the bidirectional stepwise regression algorithm to obtain the first target feature set.

Further, the construction unit includes: a determining module, configured to determine a feature quantity for constructing the initial risk control model; the selection module is used for obtaining the second target feature set from the first target feature set according to the feature quantity; and the construction module is used for constructing a model according to the second target feature set to obtain the initial risk control model.

Further, the training unit comprises: a first calculation module, configured to calculate the discrimination evaluation index of the training set according to the second target feature set to obtain a first index value, calculate the discrimination evaluation index of the verification set according to the second target feature set to obtain a second index value, and calculate the discrimination evaluation index of the test set according to the second target feature set to obtain a third index value; a second calculation module, configured to perform calculation according to the first index value, the second index value, and the third index value, to obtain a first initial energy function corresponding to the initial risk control model; a replacing module, configured to select a plurality of features in a preset proportion from the first target feature set to replace features in the second target feature set, so as to obtain a third target feature set, and construct an updated initial risk control model based on the third target feature set; a third calculation module, configured to calculate, according to the third target feature set, a second initial energy function corresponding to the updated initial risk control model; a fourth calculation module, configured to calculate the difference value between the first initial energy function and the second initial energy function to obtain a target difference value; a determining module, configured to take the updated initial risk control model as a latest risk control model if the target difference value is smaller than a preset value; a judging module, configured to judge whether the current temperature corresponding to the latest risk control model is less than or equal to a preset temperature value, and, if the current temperature is less than or equal to the preset temperature value, take the latest risk control model as the target risk control model; and a processing module, configured to lower the current temperature if the current temperature is higher than the preset temperature value, and continue to select a plurality of features in the preset proportion from the first target feature set to replace the features in the third target feature set, until the current temperature is lower than or equal to the preset temperature value.

Further, if the target difference is smaller than the preset value, the apparatus further includes: the calculating unit is used for calculating the target difference value and the current temperature according to a Monte Carlo criterion to obtain a target probability value; and the processing unit is used for taking the updated initial risk control model as the latest risk control model if the target probability value is greater than a preset probability value.

To achieve the above object, according to another aspect of the present application, there is provided a processor for running a program, wherein the program, when run, performs the risk prediction method according to any one of the above.

To achieve the above object, according to another aspect of the present application, there is provided an electronic device including one or more processors and a memory, the memory storing a program that, when executed, causes the one or more processors to implement the risk prediction method of any one of the above.

Through the application, the following steps are adopted: acquiring target data information of a target object; and processing the target data information through a target risk control model to obtain a risk level corresponding to the target data information, wherein the feature information corresponding to the target data information corresponds one to one with the feature information used for constructing the target risk control model, the target risk control model is obtained by iteratively training an initial risk control model using an energy function of an annealing algorithm, and the initial risk control model is constructed from part of the feature information obtained by screening through a preset algorithm. This solves the problem in the related art that the risk prediction accuracy of a risk control model is low because the feature set used for constructing the model has to be obtained through manual screening. Feature information is obtained by screening through the preset algorithm, the initial risk control model is constructed based on the obtained feature information, and the initial risk control model is iteratively trained using the energy function of the annealing algorithm to obtain the target risk control model. The target risk control model can predict the risk level of the target data information more accurately, thereby improving the accuracy of risk prediction of the risk control model.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:

FIG. 1 is a flowchart of a risk prediction method provided according to an embodiment of the present application;

FIG. 2 is a flowchart of training of a target risk control model provided according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a risk prediction device provided according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an electronic device provided according to an embodiment of the present application.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the relevant information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or organization, before obtaining the relevant information, an obtaining request needs to be sent to the user or organization through the interface, and after receiving the consent information fed back by the user or organization, the relevant information is obtained.

The present invention is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of a risk prediction method provided according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:

Step S101, target data information of the target object is acquired.

Step S102, processing target data information through a target risk control model to obtain risk levels corresponding to the target data information, wherein the characteristic information corresponding to the target data information corresponds to the characteristic information used for building the target risk control model one to one, the target risk control model is obtained by carrying out iterative training on an initial risk control model by adopting an energy function of an annealing algorithm, and the initial risk control model is built by part of characteristic information obtained by screening through a preset algorithm.

Specifically, it is important for a financial institution to grasp risk levels. For example, when a client (i.e., a target object) handles a loan transaction, a risk assessment must be performed based on information related to the client (e.g., credit investigation, property condition, etc.), and the processing mode of the loan transaction is determined based on the client's risk assessment result.

Therefore, the target data information of the target object is acquired first, and the target risk control model is then used to process it so as to accurately predict the corresponding risk level. The feature information corresponding to the target data information corresponds to the feature information used for constructing the target risk control model. Taking the loan transaction as an example, the corresponding target risk control model is constructed from the feature information that affects the client's credit (for example, credit investigation, property condition and income condition), so the target data information is the data information corresponding to that feature information.

The target risk control model is obtained by iteratively training the initial risk control model through the energy function of the annealing algorithm, which can effectively improve the accuracy of the target risk control model.

In summary, feature information is obtained by screening through the preset algorithm, the initial risk control model is constructed based on the obtained feature information, and the target risk control model is obtained by iteratively training the initial risk control model with the energy function of the annealing algorithm. The risk level of the target data information can be predicted more accurately through the target risk control model, which improves the accuracy of risk prediction of the risk control model.

In order to improve the accuracy of the target risk control model, after the target data information is processed through the target risk control model and the corresponding risk level is obtained, the target risk control model is optimized using that risk level: displaying the risk level corresponding to the target data information on a target interface; if a signal for adjusting the risk level corresponding to the target data information is detected, adjusting the risk level corresponding to the target data information; and optimizing the target risk control model according to the adjusted risk level corresponding to the target data information.

For example, suppose the target risk control model predicts that the risk level corresponding to the target data information is medium risk, and this risk level is displayed in the target interface. If the actual risk level is low risk, the risk level is modified to low risk in the target interface, and the modified risk level is used to optimize the target risk control model, which further improves its accuracy.

How the target risk control model is trained is crucial, so in the risk prediction method provided in the embodiment of the present application, the target risk control model is trained by the following method: acquiring a target data set, and dividing the target data set to obtain a training set, a verification set and a test set; obtaining a first target feature set through a preset algorithm according to the training set and the verification set, which comprises performing feature derivation on the training set and the verification set to obtain an initial feature set, and screening the initial feature set through the preset algorithm to obtain the first target feature set; constructing a model according to a second target feature set to obtain an initial risk control model, wherein the second target feature set is a subset of the first target feature set; and obtaining an energy function of an annealing algorithm according to the first target feature set, the second target feature set and the test set, and performing iterative training on the initial risk control model by adopting the energy function of the annealing algorithm to obtain the target risk control model.

Specifically, a target data set is obtained first and divided into a training set, a verification set and a test set, generally in the following manner: the data are divided by time period, the data set of the first 80% of the time periods is denoted A, and the data set of the last 20% of the time periods is denoted B, which is the test set. A is then processed: 80% of the samples are drawn at random from A without replacement and denoted A1, the training set, and the remaining 20% are denoted A2, the verification set. The failure rates of A1 and A2 should generally be similar; if they differ too much, resampling is required.
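The division described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names `split_dataset` and `bad_rate`, the record layout (dictionaries with a time field and a 0/1 `label` field) and the fixed seed are all assumptions, and the 80% time cut is approximated by sorting the records by time and cutting by record count.

```python
import random


def split_dataset(records, time_key, seed=0):
    """Return (A1, A2, B): training, verification and test sets."""
    # Assumption: "first 80% of the time periods" is approximated by
    # sorting on the time field and cutting at 80% of the records.
    ordered = sorted(records, key=lambda r: r[time_key])
    cut = len(ordered) * 8 // 10
    a, b = ordered[:cut], ordered[cut:]  # A: earlier 80%, B: test set

    # Draw 80% of A at random without replacement as A1; the rest is A2.
    rng = random.Random(seed)
    indices = list(range(len(a)))
    rng.shuffle(indices)
    n_train = len(a) * 8 // 10
    a1 = [a[i] for i in indices[:n_train]]
    a2 = [a[i] for i in indices[n_train:]]
    return a1, a2, b


def bad_rate(records, label_key="label"):
    """Share of records whose label is 1 (hypothetical label convention)."""
    return sum(r[label_key] for r in records) / len(records)
```

A caller could compare `bad_rate(a1)` with `bad_rate(a2)` and call `split_dataset` again with a different seed when the two differ by more than a tolerance of its own choosing; the text does not state what difference counts as too large.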

Then, feature derivation is performed on the training set and the verification set to obtain an initial feature set containing a large number of features. Because the initial feature set contains much useless feature information, it is screened through the preset algorithm to obtain the first target feature set. The initial risk control model is constructed from part of the feature information of the first target feature set (namely, the second target feature set).

Finally, the energy function of the annealing algorithm is calculated from the first target feature set, the second target feature set and the test set, and the initial risk control model is iteratively trained using the energy function to obtain the target risk control model.

In conclusion, the features are screened through the preset algorithm and the initial risk control model is iteratively trained through the annealing algorithm, so that the construction process of the whole risk control model is more standardized and automated, human resources are saved, and the modeling efficiency and the accuracy of the risk control model are improved.
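The annealing-based iteration over feature subsets can be illustrated with the following sketch. It is a generic simulated annealing loop under stated assumptions, not the claimed procedure: the energy function is supplied by the caller (the application derives it from the ks values of the training, verification and test sets, but its formula is not given here), lower energy is assumed to be better, and a worse proposal is accepted with probability exp(-delta / temperature) compared against a random draw, which is the usual Metropolis rule rather than a comparison against a preset probability value. The names `anneal_features` and `propose`, and all default parameter values, are hypothetical.

```python
import math
import random


def propose(current, candidates, swap_ratio, rng):
    """Replace a share of the current features with unused candidates."""
    outside = [f for f in candidates if f not in current]
    n_swap = min(max(1, int(len(current) * swap_ratio)), len(outside))
    kept = rng.sample(current, len(current) - n_swap)
    return kept + rng.sample(outside, n_swap)


def anneal_features(candidates, initial, energy, swap_ratio=0.2,
                    start_temp=1.0, end_temp=0.01, cooling=0.9, seed=0):
    """Search for a low-energy feature subset of fixed size."""
    rng = random.Random(seed)
    current = list(initial)
    current_energy = energy(current)
    temp = start_temp
    while temp > end_temp:  # stop once the preset temperature is reached
        proposal = propose(current, candidates, swap_ratio, rng)
        proposal_energy = energy(proposal)
        delta = proposal_energy - current_energy
        # Accept improvements outright; accept a worse subset with
        # probability exp(-delta / temp) (assumed acceptance rule).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_energy = proposal, proposal_energy
        temp *= cooling  # cool down before the next replacement round
    return current, current_energy
```

Here `candidates` plays the role of the first target feature set and `initial` the second target feature set; in practice `energy` would rebuild the model on each proposed subset, which is the expensive part of every iteration.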

In an optional embodiment, the preset algorithm may comprise an information value algorithm, a logistic regression algorithm with a penalty term, a variance inflation factor, a population stability index and a bidirectional stepwise regression algorithm, and the initial feature set is screened through these algorithms to obtain the first target feature set: screening the initial feature set through the information value algorithm to obtain a first feature set; screening the first feature set through the logistic regression algorithm with a penalty term to obtain a second feature set; performing multicollinearity screening on the second feature set through the variance inflation factor to obtain a third feature set; screening the third feature set through the population stability index to obtain a fourth feature set; and screening the fourth feature set through the bidirectional stepwise regression algorithm to obtain the first target feature set.

Specifically, (1) screening by iv (information value algorithm): iv is the information value of each feature. For each feature, woe (weight of evidence) binning is first carried out, the iv value of each bin is then calculated, and the bin values are finally added to obtain the iv value of the feature. Features with an iv value greater than or equal to 0.1 are kept, and features with an iv value less than 0.1 are removed.

(2) Screening by a logistic regression algorithm with an L1 penalty term: the L1 penalty brings sparsity to the model. If a feature has little influence on the label value, its weight becomes 0 under the L1 constraint, and features with a weight of 0 are removed.

(3) Multicollinearity screening by vif (variance expansion coefficient, also known as the variance inflation factor): the larger the vif value, the higher the collinearity between the features. In this application, features with a vif greater than 5 are removed.

(4) Feature screening by psi (population stability index): the psi shows how stable a feature is over time. A lower psi indicates higher stability over time, which is the kind of feature the model needs. Features with a psi value less than 0.25 are retained, and features with a psi value greater than 0.25 are removed.
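As an illustration of the psi screening, the following sketch uses the commonly used definition of psi as the sum over bins of (actual share - expected share) * ln(actual share / expected share). The application does not spell out this formula, so the definition, the function names and the input format (per-bin shares that are all non-zero) are assumptions.

```python
import math


def psi(expected_shares, actual_shares):
    """Population stability index between two binned distributions.

    Each argument lists the share of samples per bin (summing to 1).
    Assumption: no share is zero, otherwise the logarithm is undefined.
    """
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected_shares, actual_shares))


def keep_by_psi(feature_shares, threshold=0.25):
    """Keep the features whose psi is below the threshold."""
    return [name for name, (expected, actual) in feature_shares.items()
            if psi(expected, actual) < threshold]


if __name__ == "__main__":
    shares = {
        "stable": ([0.5, 0.5], [0.4, 0.6]),
        "drifting": ([0.5, 0.5], [0.1, 0.9]),
    }
    print(keep_by_psi(shares))
```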

(5) Screening by bidirectional stepwise regression: each time a new significant feature is added, the significance of all features already selected is checked again and the non-significant features are removed, so that the optimal feature combination is obtained.

Through the steps, compared with manual screening of the characteristic information, the characteristic information beneficial to the risk control model can be obtained more accurately, the ks (Kolmogorov-Smirnov) index of the risk control model can be improved, and manpower and material resources are saved.
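For reference, the ks index mentioned above is commonly computed as the maximum gap between the cumulative score distributions of the two classes. The sketch below follows that common definition; the function name and the convention that label 1 marks a bad sample are assumptions, not taken from the application.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of both classes.

    `scores` are model outputs; `labels` are 0 (good) or 1 (bad).
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # consume all samples that share the same score before comparing
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


if __name__ == "__main__":
    print(ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```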

The initial risk control model is constructed as follows: determining the number of features for constructing the initial risk control model; obtaining a second target feature set from the first target feature set according to the number of features; and constructing a model according to the second target feature set to obtain the initial risk control model.

Specifically, when constructing the initial risk control model, not all features of the first target feature set are used directly. Instead, a part of the first target feature set is selected to construct the model, that is, the second target feature set is used to construct the initial risk control model. The number of features selected as the second target feature set can be set according to actual conditions, and the second target feature set may also be selected from the first target feature set by random sampling without replacement. After the second target feature set is obtained, it is used for constructing the initial risk control model.
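Selecting the second target feature set by random sampling without replacement can be sketched with the Python standard library. The function name, the `seed` parameter and the example feature names are hypothetical.

```python
import random


def pick_initial_features(first_target_set, feature_count, seed=None):
    """Draw `feature_count` distinct features from the first target feature set."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no feature is picked twice
    return rng.sample(list(first_target_set), feature_count)


if __name__ == "__main__":
    pool = [f"feature_{i}" for i in range(20)]
    print(pick_initial_features(pool, 5, seed=42))
```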

Obtaining the energy function of the annealing algorithm according to the first target feature set, the second target feature set and the test set, and iteratively training the initial risk control model with that energy function, is further defined as follows: calculating a discrimination evaluation index of the training set according to the second target feature set to obtain a first index value; calculating the discrimination evaluation index of the verification set according to the second target feature set to obtain a second index value; calculating the discrimination evaluation index of the test set according to the second target feature set to obtain a third index value; calculating a first initial energy function corresponding to the initial risk control model according to the first index value, the second index value and the third index value; selecting a preset proportion of features from the first target feature set to replace features in the second target feature set to obtain a third target feature set, and constructing an updated initial risk control model based on the third target feature set; calculating, according to the third target feature set, a second initial energy function corresponding to the updated initial risk control model; calculating the difference between the first initial energy function and the second initial energy function to obtain a target difference; if the target difference is smaller than a preset value, taking the updated initial risk control model as the latest risk control model; judging whether the current temperature corresponding to the latest risk control model is less than or equal to a preset temperature value, and if so, taking the latest risk control model as the target risk control model; if the current temperature is higher than the preset temperature value, cooling the current temperature and continuing to select a preset proportion of features from the first target feature set to replace features in the third target feature set, until the current temperature is less than or equal to the preset temperature value.

If the target difference is greater than the preset value, a target probability value is calculated from the target difference and the current temperature according to the Monte Carlo criterion; and if the target probability value is greater than a preset probability value, the updated initial risk control model is taken as the latest risk control model.

Specifically, as shown in fig. 2, the initial risk control model is iteratively trained using the energy function of the annealing algorithm. The training process mainly comprises the following steps. Step one: constructing an initial risk control model m0 using the second target feature set, which includes setting the initial temperature, the number of initial features, the initial features and the like.

Step two: calculating the energy function E(m0) of the initial risk control model. The ks of the training set (i.e. the first index value), the ks of the verification set (i.e. the second index value) and the ks of the test set (i.e. the third index value) are calculated, and the three values are combined to obtain the energy function E by the following formulas:

$$E_1 = 0.5\,ks_{train} + 0.5\,ks_{valid} + ks_{test} \tag{1}$$

$$E_2 = w_1\,|ks_{train} - ks_{valid}| + w_2\,|ks_{train} - ks_{test}| + w_3\,|ks_{valid} - ks_{test}| \tag{2}$$

$$E = -E_1 + E_2 \tag{3}$$

wherein ks_train is the ks on the training set, ks_valid is the ks on the verification set, and ks_test is the ks on the test set. E1 is the first part of the energy function; it is a linear weighted combination of the ks of the three data sets, and the larger E1 is, the better. The ultimate goal of the model is to achieve the best result on the test set, i.e., to generalize well, so the weight of the ks on the test set is set to 1, which is larger than the other two weights. E2 is the weighted L1 distance between the three ks values, with weights w1, w2 and w3; its purpose is to keep the three ks values as close to each other as possible, because a model whose ks values differ too much is unstable and cannot be used. In the annealing algorithm, the smaller the energy function E, the better.
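Formulas (1) to (3) translate directly into a short function. The sketch below assumes default values of 1.0 for the weights w1, w2 and w3, since the application does not state their values; the function name is hypothetical.

```python
def energy(ks_train, ks_valid, ks_test, w1=1.0, w2=1.0, w3=1.0):
    """Energy E = -E1 + E2 from formulas (1) to (3); lower is better.

    The default weights of 1.0 are an assumption for illustration.
    """
    # E1: weighted sum of the three ks values, test set weighted highest
    e1 = 0.5 * ks_train + 0.5 * ks_valid + ks_test
    # E2: weighted L1 distances between the three ks values
    e2 = (w1 * abs(ks_train - ks_valid)
          + w2 * abs(ks_train - ks_test)
          + w3 * abs(ks_valid - ks_test))
    return -e1 + e2


if __name__ == "__main__":
    # A stable model and an overfitted model with a higher training ks
    print(energy(0.40, 0.40, 0.40))
    print(energy(0.60, 0.40, 0.30))
```

With these example inputs the stable model receives the lower energy, because the gaps between the three ks values of the overfitted model are penalised by E2.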

Step three: selecting a preset proportion of features from the first target feature set to replace features in the second target feature set to obtain a third target feature set, for example by randomly replacing 20% of the features. A new model m1 (i.e., the updated initial risk control model) is built from the third target feature set, and its energy function E(m1) is calculated.
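The replacement in step three might look like the following sketch. It assumes that the incoming features are drawn from those members of the first target feature set that are not currently in use, and that at least one feature is swapped per step; neither detail is stated in the application, and the function name is hypothetical.

```python
import random


def replace_features(current, pool, ratio=0.2, rng=None):
    """Swap a share `ratio` of the current features for unused ones from `pool`."""
    rng = rng or random.Random()
    current = list(current)
    # Assumption: only features not already in use may be swapped in
    candidates = [f for f in pool if f not in current]
    n_swap = min(max(1, round(len(current) * ratio)), len(candidates))
    out_positions = rng.sample(range(len(current)), n_swap)
    incoming = rng.sample(candidates, n_swap)
    for position, feature in zip(out_positions, incoming):
        current[position] = feature
    return current


if __name__ == "__main__":
    pool = [f"f{i}" for i in range(20)]
    second_set = pool[:10]
    print(replace_features(second_set, pool, ratio=0.2, rng=random.Random(0)))
```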

Step four: calculating the difference between E(m1) and E(m0), ΔE = E(m1) - E(m0). If ΔE is less than 0, the new model m1 performs better than the old model m0, and m1 is taken as the latest risk control model. If ΔE is greater than 0, the Monte Carlo criterion (Metropolis criterion) is used to judge whether to accept the updated initial risk control model. Specifically, a target probability value is calculated from the target difference and the current temperature, as shown in formula (4), given here in the standard form of the Metropolis criterion:

$$P = \exp\left(-\frac{\Delta E}{kT}\right) \tag{4}$$

where ΔE is the change in energy, k is a constant, and T is the current temperature. The probability calculated by the Metropolis criterion is compared with a random probability (i.e., the preset probability value described above); if it is greater than the random probability, m1 is accepted, otherwise m1 is rejected.
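The acceptance rule of step four can be sketched as below, using the standard Metropolis probability exp(-ΔE / (kT)). The function names, the default k = 1.0 and the use of a uniform random number as the random probability are assumptions for illustration.

```python
import math
import random


def metropolis_probability(delta_e, temperature, k=1.0):
    """Standard Metropolis acceptance probability exp(-delta_e / (k * T))."""
    return math.exp(-delta_e / (k * temperature))


def accept(delta_e, temperature, k=1.0, rng=random):
    """Decide whether the updated model m1 replaces the current model."""
    if delta_e < 0:
        return True  # m1 has lower energy, always accepted
    # Otherwise accept only if the probability beats a random draw in [0, 1)
    return metropolis_probability(delta_e, temperature, k) > rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    print(accept(-0.05, 1.0, rng=rng))
    print(metropolis_probability(0.1, 1.0))
    print(metropolis_probability(0.1, 0.01))
```

The same positive ΔE is accepted far less often at a low temperature than at a high one, which is why worse models are tolerated early in the annealing process and rarely near its end.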

Step five: judging whether the current temperature is less than or equal to a preset temperature value or not, and if the current temperature is less than or equal to the preset temperature value, taking the latest risk control model as a target risk control model; if the current temperature is higher than the preset temperature value, cooling the current temperature, repeating the process until the temperature reaches the set minimum temperature (namely the preset temperature value), and ending iteration to obtain the target risk control model.
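Steps one to five can be combined into a single loop, sketched below. The geometric cooling factor, the parameter defaults and the `energy_of` callback (which stands in for training a model on a feature set and evaluating formulas (1) to (3)) are all assumptions, since the application does not specify a cooling schedule or a model type.

```python
import math
import random


def anneal(pool, initial, energy_of, t_start=1.0, t_min=0.01,
           cooling=0.9, ratio=0.2, k=1.0, seed=0):
    """Search for a low-energy feature subset by simulated annealing.

    `energy_of(features)` is a hypothetical callback returning the energy E
    of the model built on `features`.
    """
    rng = random.Random(seed)
    current = list(initial)
    e_current = energy_of(current)
    temperature = t_start
    while True:
        # Step three: swap a share of the features for unused ones
        candidate = list(current)
        unused = [f for f in pool if f not in candidate]
        n_swap = min(max(1, round(len(candidate) * ratio)), len(unused))
        positions = rng.sample(range(len(candidate)), n_swap)
        for position, feature in zip(positions, rng.sample(unused, n_swap)):
            candidate[position] = feature
        # Step four: accept if better, otherwise by the Metropolis criterion
        e_candidate = energy_of(candidate)
        delta_e = e_candidate - e_current
        if delta_e < 0 or math.exp(-delta_e / (k * temperature)) > rng.random():
            current, e_current = candidate, e_candidate
        # Step five: stop at the minimum temperature, otherwise cool down
        if temperature <= t_min:
            return current, e_current
        temperature *= cooling  # assumed geometric cooling schedule


if __name__ == "__main__":
    pool = [f"f{i}" for i in range(20)]

    def toy_energy(features):
        # Toy stand-in: features with a higher index count as more useful
        return -sum(int(name[1:]) for name in features) / 100.0

    best, e_best = anneal(pool, pool[:5], toy_energy)
    print(best, e_best)
```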

According to the risk prediction method provided by the embodiment of the application, target data information of a target object is acquired, and the target data information is processed through the target risk control model to obtain the risk level corresponding to the target data information. The feature information corresponding to the target data information corresponds one to one to the feature information used for constructing the target risk control model; the target risk control model is obtained by iteratively training an initial risk control model with an energy function of an annealing algorithm; and the initial risk control model is constructed from part of the feature information obtained by screening through a preset algorithm. This solves the problem in the related art that the risk prediction accuracy of the risk control model is low because the feature set used for constructing it has to be obtained by manual screening. Feature information is obtained by screening through the preset algorithm, an initial risk control model is constructed from the obtained feature information, and the initial risk control model is iteratively trained with the energy function of the annealing algorithm to obtain the target risk control model. The target risk control model predicts the risk level of the target data information more accurately, thereby improving the accuracy of risk prediction of the risk control model.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

The embodiment of the present application further provides a risk prediction apparatus, and it should be noted that the risk prediction apparatus of the embodiment of the present application may be used to execute the method for risk prediction provided in the embodiment of the present application. The risk prediction device provided by the embodiment of the present application is described below.

Fig. 3 is a schematic diagram of a risk prediction device according to an embodiment of the present application. As shown in fig. 3, the apparatus includes: a first acquisition unit 301 and a processing unit 302.

The first acquisition unit 301 is configured to acquire target data information of a target object.

The processing unit 302 is configured to process target data information through a target risk control model to obtain a risk level corresponding to the target data information, where feature information corresponding to the target data information corresponds to feature information used for constructing the target risk control model in a one-to-one correspondence manner, and the target risk control model is obtained by performing iterative training on an initial risk control model by using an energy function of an annealing algorithm, where the initial risk control model is constructed by using part of feature information obtained by screening through a preset algorithm.

In the risk prediction apparatus provided in the embodiment of the present application, the first acquisition unit 301 acquires target data information of a target object, and the processing unit 302 processes the target data information through the target risk control model to obtain the risk level corresponding to the target data information. The feature information corresponding to the target data information corresponds one to one to the feature information used for constructing the target risk control model; the target risk control model is obtained by iteratively training an initial risk control model with an energy function of an annealing algorithm; and the initial risk control model is constructed from part of the feature information obtained by screening through a preset algorithm. This solves the problem in the related art that the risk prediction accuracy of the risk control model is low because the feature set used for constructing it has to be obtained by manual screening. Feature information is obtained by screening through the preset algorithm, an initial risk control model is constructed from the obtained feature information, and the initial risk control model is iteratively trained with the energy function of the annealing algorithm to obtain the target risk control model. The target risk control model predicts the risk level of the target data information more accurately, thereby improving the accuracy of risk prediction of the risk control model.

Optionally, in the risk prediction apparatus provided in this embodiment of the present application, the apparatus further includes: the display module is used for displaying the risk level corresponding to the target data information on a target interface after the target data information is processed through the target risk control model to obtain the risk level corresponding to the target data information; the adjusting module is used for adjusting the risk level corresponding to the target data information if a signal for adjusting the risk level corresponding to the target data information is detected; and the optimization module is used for optimizing the target risk control model according to the risk level corresponding to the adjusted target data information.

Optionally, in the risk prediction apparatus provided in the embodiment of the present application, the target risk control model is obtained by training through the following apparatuses: the second acquisition unit is used for acquiring a target data set and dividing the target data set to obtain a training set, a verification set and a test set; the screening unit is used for obtaining a first target feature set through a preset algorithm according to the training set and the verification set; the construction unit is used for constructing a model according to a second target feature set to obtain an initial risk control model, wherein the second target feature set is a subset of the first target feature set; and the training unit is used for obtaining an energy function of an annealing algorithm according to the first target feature set, the second target feature set and the test set, and performing iterative training on the initial risk control model by adopting the energy function of the annealing algorithm to obtain the target risk control model.

Optionally, in the risk prediction apparatus provided in the embodiment of the present application, the screening unit includes: the derivation module is used for carrying out feature derivation on the training set and the verification set to obtain an initial feature set; and the screening module is used for screening the initial feature set through a preset algorithm to obtain a first target feature set.

Optionally, in the risk prediction apparatus provided in the embodiment of the present application, the preset algorithm at least includes an information value algorithm, a logistic regression algorithm with a penalty term, a variance expansion coefficient, a population stability index and a bidirectional stepwise regression algorithm, and the screening module includes: the first screening submodule, used for screening the initial feature set through the information value algorithm to obtain a first feature set; the second screening submodule, used for screening the first feature set through the logistic regression algorithm with a penalty term to obtain a second feature set; the third screening submodule, used for performing multicollinearity screening on the second feature set through the variance expansion coefficient to obtain a third feature set; the fourth screening submodule, used for screening the third feature set through the population stability index to obtain a fourth feature set; and the fifth screening submodule, used for screening the fourth feature set through the bidirectional stepwise regression algorithm to obtain the first target feature set.

Optionally, in the risk prediction apparatus provided in the embodiment of the present application, the construction unit includes: the determining module is used for determining the characteristic quantity for constructing the initial risk control model; the selection module is used for obtaining a second target feature set from the first target feature set according to the feature quantity; and the construction module is used for constructing a model according to the second target feature set to obtain an initial risk control model.

Optionally, in the risk prediction apparatus provided in this embodiment of the present application, the training unit includes: the first calculation module, used for calculating the discrimination evaluation index of the training set according to the second target feature set to obtain a first index value, calculating the discrimination evaluation index of the verification set according to the second target feature set to obtain a second index value, and calculating the discrimination evaluation index of the test set according to the second target feature set to obtain a third index value; the second calculation module, used for calculating a first initial energy function corresponding to the initial risk control model according to the first index value, the second index value and the third index value; the replacing module, used for selecting a preset proportion of features from the first target feature set to replace features in the second target feature set to obtain a third target feature set, and constructing an updated initial risk control model based on the third target feature set; the third calculation module, used for calculating, according to the third target feature set, a second initial energy function corresponding to the updated initial risk control model; the fourth calculation module, used for calculating the difference between the first initial energy function and the second initial energy function to obtain a target difference; the determining module, used for taking the updated initial risk control model as the latest risk control model if the target difference is smaller than a preset value; the judging module, used for judging whether the current temperature corresponding to the latest risk control model is less than or equal to a preset temperature value, and if so, taking the latest risk control model as the target risk control model; and the processing module, used for cooling the current temperature if the current temperature is greater than the preset temperature value, and continuing to select a preset proportion of features from the first target feature set to replace features in the third target feature set until the current temperature is less than or equal to the preset temperature value.

Optionally, in the risk prediction apparatus provided in this embodiment of the present application, if the target difference is greater than the preset value, the apparatus further includes: the calculating unit, used for calculating a target probability value from the target difference and the current temperature according to the Monte Carlo criterion; and the processing unit, used for taking the updated initial risk control model as the latest risk control model if the target probability value is greater than the preset probability value.

The risk prediction device includes a processor and a memory. The first acquisition unit 301, the processing unit 302 and the other units are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.

The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the prediction of the risk level is realized by adjusting the kernel parameters.

The memory may include forms of computer-readable media such as volatile memory, Random Access Memory (RAM) and/or non-volatile memory, for example Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

The embodiment of the invention provides a processor, which is used for running a program, wherein a risk prediction method is executed when the program runs.

As shown in fig. 4, an embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor executes the program to implement the following steps: acquiring target data information of a target object; the method comprises the steps of processing target data information through a target risk control model to obtain risk levels corresponding to the target data information, wherein characteristic information corresponding to the target data information corresponds to characteristic information used for building the target risk control model one to one, the target risk control model is obtained by adopting an energy function of an annealing algorithm and performing iterative training on an initial risk control model, and the initial risk control model is built by part of characteristic information obtained by screening through a preset algorithm.

Optionally, after the target data information is processed by the target risk control model to obtain a risk level corresponding to the target data information, the method further includes: displaying a risk level corresponding to the target data information on a target interface; if a signal for adjusting the risk level corresponding to the target data information is detected, adjusting the risk level corresponding to the target data information; and optimizing the target risk control model according to the risk grade corresponding to the adjusted target data information.

Optionally, the target risk control model is trained by: acquiring a target data set, and dividing the target data set to obtain a training set, a verification set and a test set; obtaining a first target feature set through a preset algorithm according to the training set and the verification set; constructing a model according to a second target feature set to obtain an initial risk control model, wherein the second target feature set is a subset of the first target feature set; and obtaining an energy function of an annealing algorithm according to the first target feature set, the second target feature set and the test set, and performing iterative training on the initial risk control model by adopting the energy function of the annealing algorithm to obtain the target risk control model.

Optionally, obtaining the first target feature set by a preset algorithm according to the training set and the verification set includes: carrying out feature derivation on the training set and the verification set to obtain an initial feature set; and screening the initial feature set through a preset algorithm to obtain a first target feature set.

Optionally, the preset algorithm at least includes an information value algorithm, a logistic regression algorithm with a penalty term, a variance expansion coefficient, a population stability index and a bidirectional stepwise regression algorithm, and screening the initial feature set through the preset algorithm to obtain the first target feature set includes: screening the initial feature set through the information value algorithm to obtain a first feature set; screening the first feature set through the logistic regression algorithm with a penalty term to obtain a second feature set; performing multicollinearity screening on the second feature set through the variance expansion coefficient to obtain a third feature set; screening the third feature set through the population stability index to obtain a fourth feature set; and screening the fourth feature set through the bidirectional stepwise regression algorithm to obtain the first target feature set.

Optionally, constructing a model according to the second target feature set, and obtaining the initial risk control model includes: determining the characteristic quantity for constructing an initial risk control model; and obtaining a second target feature set from the first target feature set according to the feature quantity, and constructing a model according to the second target feature set to obtain an initial risk control model.

Optionally, obtaining an energy function of an annealing algorithm according to the first target feature set, the second target feature set and the test set, and performing iterative training on the initial risk control model by using the energy function of the annealing algorithm to obtain the target risk control model includes: calculating a discrimination evaluation index of the training set according to the second target feature set to obtain a first index value; calculating the discrimination evaluation index of the verification set according to the second target feature set to obtain a second index value; calculating the discrimination evaluation index of the test set according to the second target feature set to obtain a third index value; calculating a first initial energy function corresponding to the initial risk control model according to the first index value, the second index value and the third index value; selecting a preset proportion of features from the first target feature set to replace features in the second target feature set to obtain a third target feature set, and constructing an updated initial risk control model based on the third target feature set; calculating, according to the third target feature set, a second initial energy function corresponding to the updated initial risk control model; calculating the difference between the first initial energy function and the second initial energy function to obtain a target difference; if the target difference is smaller than a preset value, taking the updated initial risk control model as the latest risk control model; judging whether the current temperature corresponding to the latest risk control model is less than or equal to a preset temperature value, and if so, taking the latest risk control model as the target risk control model; if the current temperature is higher than the preset temperature value, cooling the current temperature and continuing to select a preset proportion of features from the first target feature set to replace features in the third target feature set until the current temperature is less than or equal to the preset temperature value.

Optionally, if the target difference is greater than the preset value, the method further includes: calculating a target probability value from the target difference and the current temperature according to the Monte Carlo criterion; and if the target probability value is greater than the preset probability value, taking the updated initial risk control model as the latest risk control model.

The device herein may be a server, a PC, a PAD, a mobile phone, etc.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring target data information of a target object; and processing the target data information through a target risk control model to obtain the risk level corresponding to the target data information, wherein the feature information corresponding to the target data information corresponds one to one to the feature information used for constructing the target risk control model, the target risk control model is obtained by iteratively training an initial risk control model with an energy function of an annealing algorithm, and the initial risk control model is constructed from part of the feature information obtained by screening through a preset algorithm.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method of risk prediction, comprising:

acquiring target data information of a target object;

processing the target data information through a target risk control model to obtain a risk level corresponding to the target data information, wherein the characteristic information corresponding to the target data information is in one-to-one correspondence with the characteristic information used for constructing the target risk control model, the target risk control model is obtained by performing iterative training on an initial risk control model using an energy function of an annealing algorithm, and the initial risk control model is constructed from part of the characteristic information obtained by screening through a preset algorithm.

2. The method according to claim 1, wherein after the target data information is processed through the target risk control model to obtain the risk level corresponding to the target data information, the method further comprises:

displaying a risk level corresponding to the target data information on a target interface;

if a signal for adjusting the risk level corresponding to the target data information is detected, adjusting the risk level corresponding to the target data information;

and optimizing the target risk control model according to the adjusted risk level corresponding to the target data information.

3. The method of claim 1, wherein the target risk control model is trained by:

acquiring a target data set, and dividing the target data set to obtain a training set, a verification set and a test set;

obtaining a first target feature set through the preset algorithm according to the training set and the verification set;

constructing a model according to a second target feature set to obtain an initial risk control model, wherein the second target feature set is a subset of the first target feature set;

and obtaining an energy function of the annealing algorithm according to the first target feature set, the second target feature set and the test set, and performing iterative training on the initial risk control model by adopting the energy function of the annealing algorithm to obtain the target risk control model.
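By way of non-limiting illustration, the division of the target data set into a training set, a verification set and a test set may be sketched as follows. The function name `split_dataset`, the 60/20/20 proportions and the fixed random seed are assumptions made for this example only; the claims do not specify them.

```python
import random


def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle rows and split them into training, verification and test sets.

    The default proportions are illustrative assumptions, not values taken
    from the application. Whatever remains after the training and
    verification slices becomes the test set.
    """
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

In this sketch the three sets are disjoint and together cover every input row, so the test set stays unseen until the energy function is evaluated.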

4. The method according to claim 3, wherein obtaining a first target feature set through the preset algorithm according to the training set and the verification set comprises:

performing feature derivation on the training set and the verification set to obtain an initial feature set;

and screening the initial feature set through the preset algorithm to obtain the first target feature set.

5. The method according to claim 4, wherein the preset algorithm comprises at least: an information value algorithm, a logistic regression algorithm with penalty terms, a variance inflation factor, a population stability index and a bidirectional stepwise regression algorithm, and wherein screening the initial feature set through the preset algorithm to obtain the first target feature set comprises:

screening the initial feature set through the information value algorithm to obtain a first feature set;

screening the first feature set through the logistic regression algorithm with penalty terms to obtain a second feature set;

performing multicollinearity screening on the second feature set through the variance inflation factor to obtain a third feature set;

screening the third feature set through a population stability index to obtain a fourth feature set;

and screening the fourth feature set through a bidirectional stepwise regression algorithm to obtain the first target feature set.
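By way of non-limiting illustration, two of the five screening stages (information value and population stability index) may be sketched as follows. The sketch uses the conventional textbook formulas for both quantities. The function names, the `eps` guard for empty bins and the thresholds 0.02 and 0.25 are assumptions for this example and are not taken from the application. The penalized logistic regression, variance inflation factor and stepwise regression stages are omitted here.

```python
import math


def _divergence(p_counts, q_counts, eps=1e-6):
    """Sum of (p_i - q_i) * ln(p_i / q_i) over bins, with counts normalized.

    eps replaces zero proportions so the logarithm stays defined
    (an assumption of this sketch).
    """
    p_total = sum(p_counts)
    q_total = sum(q_counts)
    total = 0.0
    for p_count, q_count in zip(p_counts, q_counts):
        p = max(p_count / p_total, eps)
        q = max(q_count / q_total, eps)
        total += (p - q) * math.log(p / q)
    return total


def information_value(good_counts, bad_counts):
    """Information value of one binned feature (good vs. bad samples per bin)."""
    return _divergence(good_counts, bad_counts)


def population_stability_index(expected_counts, actual_counts):
    """PSI between a reference sample and a later sample of one binned feature."""
    return _divergence(actual_counts, expected_counts)


def screen(features, iv_min=0.02, psi_max=0.25):
    """Keep feature names whose (iv, psi) pair passes both illustrative thresholds."""
    return [name for name, (iv, psi) in features.items()
            if iv >= iv_min and psi <= psi_max]
```

In the claimed method the stages run in sequence, each on the output of the previous one; the combined `screen` helper is only a compact way of showing the two threshold tests.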

6. The method of claim 3, wherein constructing a model according to the second target feature set to obtain the initial risk control model comprises:

determining a feature quantity for constructing the initial risk control model;

obtaining the second target feature set from the first target feature set according to the feature quantity;

and constructing a model according to the second target feature set to obtain the initial risk control model.

7. The method of claim 3, wherein deriving an energy function of the annealing algorithm from the first target feature set, the second target feature set, and the test set, and iteratively training the initial risk control model using the energy function of the annealing algorithm to derive the target risk control model comprises:

calculating the discrimination evaluation index of the training set according to the second target feature set to obtain a first index value; calculating the discrimination evaluation index of the verification set according to the second target feature set to obtain a second index value; calculating the discrimination evaluation index of the test set according to the second target feature set to obtain a third index value;

calculating according to the first index value, the second index value and the third index value to obtain a first initial energy function corresponding to the initial risk control model;

selecting a plurality of features in a preset proportion from the first target feature set to replace the features in the second target feature set to obtain a third target feature set, and constructing an updated initial risk control model based on the third target feature set;

according to the third target feature set, calculating to obtain a second initial energy function corresponding to the updated initial risk control model;

calculating a difference value between the first initial energy function and the second initial energy function to obtain a target difference value;

if the target difference is smaller than a preset value, taking the updated initial risk control model as a latest risk control model;

judging whether the current temperature corresponding to the latest risk control model is less than or equal to a preset temperature value or not, and if the current temperature is less than or equal to the preset temperature value, setting the latest risk control model as the target risk control model;

if the current temperature is higher than the preset temperature value, lowering the current temperature and selecting a plurality of features in the preset proportion from the first target feature set to replace features in the third target feature set, until the current temperature is lower than or equal to the preset temperature value.
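By way of non-limiting illustration, the iterative feature replacement of this claim may be sketched as follows. Several elements are assumptions for this example only: the `energy` formula (the claim states only that the energy is calculated from the three index values), the `evaluate` callback standing in for model construction and calculation of the discrimination evaluation index, the geometric cooling schedule, the sign convention of the difference, and all default parameter values.

```python
import random


def energy(idx_train, idx_valid, idx_test):
    """Hypothetical energy: lower is better.

    Rewards a high average discrimination index and penalizes the spread
    between the three data sets. The actual formula is not given in the claim.
    """
    mean_idx = (idx_train + idx_valid + idx_test) / 3.0
    spread = max(idx_train, idx_valid, idx_test) - min(idx_train, idx_valid, idx_test)
    return spread - mean_idx


def anneal_features(candidates, k, evaluate, t_start=1.0, t_min=0.01,
                    cooling=0.9, swap_frac=0.2, threshold=0.0, seed=0):
    """Search for a k-feature subset of candidates with low energy.

    evaluate(features) is a caller-supplied (hypothetical) function returning
    the three index values for a model built on the given features.
    """
    rng = random.Random(seed)
    candidates = list(candidates)
    current = rng.sample(candidates, k)          # second target feature set
    e_current = energy(*evaluate(current))       # first energy value
    t = t_start
    while True:
        unused = [f for f in candidates if f not in current]
        n_swap = min(max(1, int(k * swap_frac)), len(unused))
        if n_swap > 0:
            # Replace a preset proportion of the current features.
            dropped = set(rng.sample(current, n_swap))
            proposal = [f for f in current if f not in dropped]
            proposal += rng.sample(unused, n_swap)
            e_new = energy(*evaluate(proposal))  # second energy value
            # Assumed sign convention: difference = new energy - old energy.
            if e_new - e_current < threshold:
                current, e_current = proposal, e_new
        if t <= t_min:                           # preset temperature reached
            return current, e_current
        t *= cooling                             # lower the temperature
```

With `threshold=0.0` the sketch accepts only improving replacements, so the returned energy is never higher than that of the initial subset.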

8. The method of claim 7, wherein if the target difference is less than the preset value, the method further comprises:

calculating the target difference value and the current temperature according to a Monte Carlo criterion to obtain a target probability value;

and if the target probability value is greater than a preset probability value, taking the updated initial risk control model as the latest risk control model.
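By way of non-limiting illustration, the probability-based acceptance of this claim may be sketched as follows. The claim does not give the formula. The sketch assumes the form exp(-ΔE/T) conventionally used for acceptance tests in annealing algorithms, where ΔE is the target difference value and T is the current temperature. The function names are likewise assumptions for this example.

```python
import math


def acceptance_probability(delta_e, temperature):
    """Target probability value computed from the energy difference and temperature.

    Assumes the conventional exp(-delta_e / T) form; a non-positive
    difference (no worsening) is always accepted.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)


def accept_update(delta_e, temperature, preset_probability):
    """Accept the updated model if the target probability exceeds the preset value."""
    return acceptance_probability(delta_e, temperature) > preset_probability
```

In this sketch a given worsening is accepted more readily at a high temperature than at a low one. If a freshly drawn random number (for example `random.random()`) is passed as `preset_probability`, the test becomes the usual stochastic acceptance test of annealing algorithms.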

9. A risk prediction device, comprising:

a first acquisition unit configured to acquire target data information of a target object;

a processing unit configured to process the target data information through a target risk control model to obtain a risk level corresponding to the target data information, wherein the characteristic information corresponding to the target data information is in one-to-one correspondence with the characteristic information used for constructing the target risk control model, the target risk control model is obtained by performing iterative training on an initial risk control model using an energy function of an annealing algorithm, and the initial risk control model is constructed from part of the characteristic information obtained by screening through a preset algorithm.

10. A processor configured to run a program, wherein the program when running performs the risk prediction method of any one of claims 1 to 8.

11. An electronic device comprising one or more processors and memory storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the risk prediction method of any one of claims 1-8.