CN114936919A - Training risk prediction model, and business risk determination method and device

Info

Publication number: CN114936919A
Application number: CN202210568067.XA
Authority: CN
Inventors: 殷旭东; 吴云崇; 章鹏
Original assignee: Ant Blockchain Technology Shanghai Co Ltd
Current assignee: Ant Blockchain Technology Shanghai Co Ltd
Priority date: 2022-05-24
Filing date: 2022-05-24
Publication date: 2022-08-23

Abstract

Embodiments of this specification provide methods and devices for training a risk prediction model and for determining business risk. The method for training the risk prediction model comprises the following steps: acquiring a first user group, where each user included in the first user group has corresponding historical user data; obtaining a plurality of sub-user groups by performing multiple rounds of sampling on the first user group; based on the historical user data of each user included in any sub-user group, determining group feature data of the sub-user group in a plurality of feature dimensions and a risk index label value of the sub-user group as a training sample; and inputting the group feature data of the sub-user group included in the training sample into a risk prediction model, outputting a risk index predicted value of the sub-user group, and adjusting the model parameters of the risk prediction model according to the risk index predicted value and the risk index label value of the sub-user group. The risk index value of a user group can thereby be estimated, with high accuracy, during the period in which the users in the user group have not yet shown any risk performance.

Description

Training risk prediction model, and business risk determination method and device

Technical Field

One or more embodiments of the present specification relate to the field of computers, and in particular, to a method and an apparatus for training a risk prediction model and determining business risk.

Background

In various risk control scenarios, the risk index value of a user group needs to be predicted after a service goes online, where the risk index value indicates the business risk level of the user group. Conventional methods for determining the business risk of a user group have to wait a long time after the users in the group perform a target business behavior, and then estimate the risk index value from the risk performance of those users during that time. Such methods also rely on manual experience, so their accuracy is low.

Disclosure of Invention

One or more embodiments of the present specification describe a method for training a risk prediction model, and a method and a device for determining business risk, which can estimate the risk index value of a user group, with high accuracy, during the period in which the users in the user group have not yet shown any risk performance.

In a first aspect, a method for training a risk prediction model is provided, the method including:

acquiring a first user group; each user included in the first user group has corresponding historical user data;

obtaining a plurality of sub-user groups by carrying out multi-round sampling on the first user group;

based on the historical user data of each user included in any sub-user group of the plurality of sub-user groups, determining group feature data of the sub-user group in a plurality of feature dimensions and a risk index label value of the sub-user group as a training sample;

and inputting the group characteristic data of the sub-user group included in the training sample into a risk prediction model, outputting a risk index predicted value of the sub-user group through the risk prediction model, and adjusting the model parameters of the risk prediction model according to the risk index predicted value of the sub-user group and the risk index label value of the sub-user group.

In a possible implementation manner, the historical user data is user data within a first time period after the user performs the target business behavior, the first time period exceeds a preset time period, and the risk index label value is calculated based on the historical user data within the preset time period after the user performs the target business behavior.

In a possible implementation manner, the risk index label value indicates the default risk that users in the sub-user group fail to perform the performance behavior corresponding to the target business behavior within a preset time period after performing the target business behavior.

Further, the target business behavior comprises a borrowing behavior, and the performance behavior comprises a repayment behavior.

In one possible embodiment, the historical user data includes user feature data of the user in the plurality of feature dimensions; the plurality of feature dimensions comprises a first feature dimension; the determining group feature data of a plurality of feature dimensions of the sub-user group based on historical user data of each user included in any sub-user group of the plurality of sub-user groups includes:

and aggregating the user feature data of the first feature dimension of each user included in any sub-user group to obtain the group feature data of the sub-user group in the first feature dimension.

Further, the first feature dimension is a numerical feature; the aggregating the user feature data of the first feature dimension of each user included in any sub-user group includes:

averaging the user feature data of the first feature dimension of each user included in any sub-user group to obtain the group feature data of the sub-user group in the first feature dimension.

Further, the first feature dimension is a categorical feature; the aggregating the user feature data of the first feature dimension of each user included in any sub-user group includes:

determining the user number ratio of each category according to the user feature data of the first feature dimension of each user included in any sub-user group, and taking the user number ratio as the group feature data of the sub-user group in the first feature dimension.

In one possible embodiment, the method further comprises:

verifying the trained risk prediction model by using each sample in a verification sample set; the verification sample set is generated based on the historical user data corresponding to each user included in a second user group; the first user group and the second user group are obtained by randomly splitting a third user group;

testing the risk prediction model by using each sample in a test sample set to determine the performance of the risk prediction model; the test sample set is generated based on the historical user data corresponding to each user included in a fourth user group; the fourth user group and the third user group are obtained by randomly splitting a fifth user group.

In a second aspect, a method for determining business risk of a user group is provided, where the method includes:

acquiring a risk prediction model, wherein the risk prediction model is obtained by training according to the method of the first aspect and has an input form of group characteristic data;

acquiring user characteristic data of each user in a target user group in a plurality of characteristic dimensions;

converting the user characteristic data of any user into the form of the group characteristic data, inputting the converted characteristic data into the risk prediction model, and outputting the predicted risk index value of the user through the risk prediction model;

and determining the predicted risk index value of the target user group according to the service amount of each user in the target user group and the predicted risk index value of the user, wherein the predicted risk index value is used for indicating the service risk degree of the target user group.

In one possible implementation, the plurality of feature dimensions includes numerical and categorical features; the converting the user feature data of any user into the form of the group feature data includes:

directly using the feature data of the numerical features of the user as data in the form of the group feature data;

converting the feature data of the categorical features of the user into the user number ratio of each category of the categorical feature, and using the ratio as data in the form of the group feature data.

In a third aspect, an apparatus for training a risk prediction model is provided, the apparatus comprising:

an acquisition unit configured to acquire a first user group; each user included in the first user group has corresponding historical user data;

the sampling unit is used for carrying out multi-round sampling on the first user group acquired by the acquisition unit to obtain a plurality of sub-user groups;

the determining unit is used for determining group feature data of a plurality of feature dimensions of the sub user group and a risk index label value of the sub user group based on historical user data of each user included in any sub user group in the plurality of sub user groups obtained by the sampling unit, and the group feature data and the risk index label value are used as a training sample;

and the training unit is used for inputting the group characteristic data of the sub-user group included in the training sample obtained by the determining unit into a risk prediction model, outputting a risk index prediction value of the sub-user group through the risk prediction model, and adjusting the model parameters of the risk prediction model according to the risk index prediction value of the sub-user group and the risk index label value of the sub-user group obtained by the determining unit.

In a fourth aspect, an apparatus for determining business risk of a user group is provided, the apparatus includes:

a first acquisition unit, configured to acquire a risk prediction model, which is obtained by training according to the apparatus of the third aspect and has an input form of group feature data;

the second acquisition unit is used for acquiring user characteristic data of each user in the target user group in a plurality of characteristic dimensions;

a prediction unit, configured to convert the user feature data of any user acquired by the second acquisition unit into the form of the group feature data, input the converted feature data into the risk prediction model acquired by the first acquisition unit, and output a predicted risk index value of the user through the risk prediction model;

and the determining unit is used for determining the predicted risk index value of the target user group according to the service amount of each user in the target user group and the predicted risk index value of the user obtained by the prediction unit, and the predicted risk index value is used for indicating the service risk degree of the target user group.

In a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.

In a sixth aspect, there is provided a computing device comprising a memory having stored therein executable code, and a processor that when executing the executable code, implements the method of the first or second aspect.

According to the method and the device for training the risk prediction model provided by the embodiments of this specification, a first user group is first obtained, where each user included in the first user group has corresponding historical user data; multiple rounds of sampling are then performed on the first user group to obtain a plurality of sub-user groups; next, based on the historical user data of each user included in any sub-user group of the plurality of sub-user groups, group feature data of the sub-user group in a plurality of feature dimensions and a risk index label value of the sub-user group are determined as a training sample; finally, the group feature data of the sub-user group included in the training sample is input into a risk prediction model, a risk index predicted value of the sub-user group is output through the risk prediction model, and the model parameters of the risk prediction model are adjusted according to the risk index predicted value and the risk index label value of the sub-user group. The embodiments of this specification thus adopt a machine learning modeling strategy and use random sampling of historical data to generate the samples for modeling, so that the trained risk prediction model can estimate the risk index value of a user group, with high accuracy, during the period in which the users in the user group have not yet shown any risk performance.

According to the method and the device for determining the business risk of a user group provided by the embodiments of this specification, a risk prediction model is first obtained, where the risk prediction model is obtained by training according to the method of the first aspect and has an input form of group feature data; user feature data of each user in a target user group in a plurality of feature dimensions is then acquired; next, the user feature data of any user is converted into the form of the group feature data, the converted feature data is input into the risk prediction model, and a predicted risk index value of the user is output through the risk prediction model; finally, the predicted risk index value of the target user group is determined according to the service amount of each user in the target user group and the predicted risk index value of the user, where the predicted risk index value indicates the business risk degree of the target user group. The embodiments of this specification thus apply the trained risk prediction model to each single user to obtain that user's predicted risk index value, and then determine the predicted risk index value of the target user group from the service amount and the predicted risk index value of each user, so that the risk index value of the user group can be estimated, with high accuracy, during the period in which the users in the user group have not yet shown any risk performance.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;

FIG. 2 illustrates a flow diagram of a method of training a risk prediction model, according to one embodiment;

FIG. 3 illustrates a flow diagram of a method for determining business risk for a user group according to one embodiment;

FIG. 4 shows a schematic block diagram of an apparatus for training a risk prediction model according to one embodiment;

FIG. 5 shows a schematic block diagram of an apparatus for determining business risk of a user group according to one embodiment.

Detailed Description

The scheme provided by the specification is described below with reference to the accompanying drawings.

Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. The implementation scenario relates to determining the business risk of a user group, and in particular to determining a predicted risk index value of a target user group, where the predicted risk index value indicates the business risk degree of the target user group. Referring to fig. 1, in many service scenarios a user who has performed a target business behavior must subsequently perform a corresponding performance behavior; if the user does not perform the performance behavior within a predetermined time, the service provider is adversely affected, that is, a business risk is considered to exist. For example, in a credit consumption service scenario, the target business behavior includes a borrowing behavior and the performance behavior includes a repayment behavior; in a shared-bicycle service scenario, the target business behavior includes a rental behavior, and the performance behavior includes a payment behavior or parking at a designated location; in a power bank rental service scenario, the target business behavior includes a rental behavior, and the performance behavior includes a return behavior. All of these service scenarios require risk control and may therefore also be referred to as risk control scenarios.

In the embodiments of this specification, the risk index value indicates the default risk that users in the user group fail to perform the performance behavior corresponding to the target business behavior within a preset time period after performing the target business behavior. The risk index value of the user group can be calculated from the user data within the preset time period after each user in the user group performs the target business behavior. Business needs, however, often require the risk index value to be predicted in advance, that is, within a short time after each user in the user group performs the target business behavior. Within that short time the users may not yet have shown any risk performance, so a solution is desired that can estimate the risk index value of a user group, with high accuracy, during the period in which its users have not yet shown any risk performance.

In a typical credit consumption risk control scenario, the risk index value may specifically represent the financial bad-debt rate. The financial bad-debt rate, also known as the annual loss rate, is a ratio whose denominator is the annual daily average balance within one year after credit borrowing and whose numerator is the bad amount one year after borrowing. The bad-debt estimation scheme generally used in the industry applies once the credit of a user group already has several installments of risk performance: for example, after the first installment falls due, the first-installment bad rate by overdue amount (FPD) is observed, and the annual loss rate of the assets is estimated from the FPD. This general scheme is inaccurate, and it cannot predict financial bad debt during the credit granting stage. Early in the credit granting stage, that is, in the period before any first-installment risk performance, it is more meaningful to estimate the annual loss rate of the corresponding assets over their whole life cycle. A scheme that predicts the whole-life-cycle annual loss rate of the assets early in the users' credit granting stage has higher practical value for business decisions and helps estimate and reduce risk earlier.
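The definition given above can be written as a single ratio. The wording of the terms follows the text; the layout as a fraction is only a restatement, not an additional formula from the original:

```latex
\text{annual loss rate} =
  \frac{\text{bad amount one year after borrowing}}
       {\text{annual daily average balance within one year after borrowing}}
```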

In the embodiment of the specification, a machine learning modeling idea is adopted, a historical data random sampling strategy is used for generating samples for modeling, and meanwhile, a prediction scheme from a user to a user group is designed.

Fig. 2 shows a flowchart of a method of training a risk prediction model according to an embodiment, which may be based on the implementation scenario shown in fig. 1. As shown in fig. 2, the method for training the risk prediction model in this embodiment includes the following steps: step 21, acquiring a first user group, where each user included in the first user group has corresponding historical user data; step 22, performing multiple rounds of sampling on the first user group to obtain a plurality of sub-user groups; step 23, based on the historical user data of each user included in any sub-user group of the plurality of sub-user groups, determining group feature data of the sub-user group in a plurality of feature dimensions and a risk index label value of the sub-user group as a training sample; and step 24, inputting the group feature data of the sub-user group included in the training sample into a risk prediction model, outputting a risk index predicted value of the sub-user group through the risk prediction model, and adjusting the model parameters of the risk prediction model according to the risk index predicted value and the risk index label value of the sub-user group. Specific execution modes of the above steps are described below.

First, in step 21, a first user group is obtained; each user included in the first user group has corresponding historical user data. It will be appreciated that a relatively large number of users, for example ten thousand users, are typically included in the first group of users.

In one example, the historical user data is user data within a first time length after the user performs the target business behavior, and the first time length exceeds a preset time length. The historical user data can subsequently be used to calculate a risk index label value, which is calculated based on the historical user data within the preset time length after the user performs the target business behavior.

For example, when the risk index value represents the annual loss rate in a credit consumption risk control scenario, the preset time length is one year and the target business behavior includes a borrowing behavior.

Then, in step 22, a plurality of sub-user groups are obtained by performing multiple sampling rounds on the first user group. It will be appreciated that each round of sampling results in a sub-group of users, the number of users in the sub-group being less than the number of users in the first group of users, e.g., the first group of users includes ten thousand users and the sub-group of users includes one thousand users.

In the above-mentioned multiple sampling rounds, users may be selected from the first user group in a random sampling manner to form a sub-user group.
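The multi-round random sampling described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name, the choice to sample without replacement within a round, and the independence of rounds (so a user may appear in several sub-user groups) are all assumptions, since the text only states that users are selected by random sampling.

```python
import random

def sample_sub_user_groups(user_ids, num_rounds, group_size, seed=None):
    """Draw `num_rounds` random sub-user groups from the first user group.

    Assumption: each round samples `group_size` distinct users without
    replacement, and rounds are independent of each other.
    """
    if group_size >= len(user_ids):
        raise ValueError("a sub-user group must be smaller than the first user group")
    rng = random.Random(seed)
    return [rng.sample(user_ids, group_size) for _ in range(num_rounds)]

# Hypothetical data: ten thousand user IDs, one thousand users per sub-user group.
first_user_group = list(range(10_000))
sub_user_groups = sample_sub_user_groups(
    first_user_group, num_rounds=5, group_size=1_000, seed=0
)
```

Each element of `sub_user_groups` later yields one training sample.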

In the embodiments of this specification, the risk index value is generally statistically meaningful only for a user group with a certain number of users, and an individual risk index value is generally difficult to calculate for a single user. Each training sample therefore needs to correspond to a user group. The sub-user groups obtained by the sampling described above make it possible to generate multiple training samples, with each sub-user group corresponding to one training sample.

Next, in step 23, based on the historical user data of each user included in any sub-user group of the multiple sub-user groups, group feature data of multiple feature dimensions of the sub-user group and a risk indicator label value of the sub-user group are determined as a training sample. It will be appreciated that historical user data typically reflects user characteristic data representing individual users, which needs to be converted into group characteristic data representing groups of sub-users.

In one example, the historical user data is user data within a first time length after the user performs the target business behavior, the first time length exceeds a preset time length, and the risk index label value is calculated based on the historical user data within the preset time length after the user performs the target business behavior.

In one example, the risk index label value indicates the default risk that users in the sub-user group fail to perform the performance behavior corresponding to the target business behavior within a preset time period after performing the target business behavior.

In one example, the historical user data includes user characteristic data of the user in the plurality of characteristic dimensions; the plurality of feature dimensions comprises a first feature dimension; the determining group feature data of a plurality of feature dimensions of the sub-user group based on historical user data of each user included in any sub-user group of the plurality of sub-user groups includes:

and aggregating the user characteristic data of the first characteristic dimension of each user included in any sub-user group to obtain the group characteristic data of the sub-user group in the first characteristic dimension.

For example, when the first feature dimension is a numerical feature and the sub-user group includes user A, user B, and user C, whose user feature data in the first feature dimension are 20, 30, and 40 respectively, averaging 20, 30, and 40 gives 30, so the group feature data of the sub-user group in the first feature dimension is 30.

As another example, when the first feature dimension is a categorical feature and the sub-user group includes user A, user B, and user C, whose user feature data in the first feature dimension are category one, category two, and category one respectively, the ratio of the number of users in category one to the number of users in category two is 2 to 1, so the group feature data of the sub-user group in the first feature dimension is 2 to 1.
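The two aggregation examples above can be reproduced with a short sketch. The function names are hypothetical, and representing the user number ratio as per-category proportions (so that 2 to 1 becomes 2/3 and 1/3) is an assumption about encoding that the text does not fix.

```python
from collections import Counter
from statistics import mean

def aggregate_numerical(values):
    """Group feature for a numerical dimension: the average over the users."""
    return mean(values)

def aggregate_categorical(values, categories):
    """Group feature for a categorical dimension: share of users per category.

    Assumption: the user number ratio is encoded as proportions that sum to 1.
    """
    counts = Counter(values)
    total = len(values)
    return {category: counts[category] / total for category in categories}

# Users A, B, and C from the examples in the text.
numeric_group_feature = aggregate_numerical([20, 30, 40])
categorical_group_feature = aggregate_categorical(
    ["category one", "category two", "category one"],
    categories=["category one", "category two"],
)
```

Here `numeric_group_feature` is 30, and `categorical_group_feature` holds 2/3 for category one and 1/3 for category two, which is the 2 to 1 ratio from the text.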

Finally, in step 24, the group feature data of the sub-user group included in the training sample is input into a risk prediction model, a risk index prediction value of the sub-user group is output through the risk prediction model, and model parameters of the risk prediction model are adjusted according to the risk index prediction value of the sub-user group and a risk index label value of the sub-user group. It will be appreciated that the model parameters may be adjusted with a training goal of minimizing prediction loss.
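To make the parameter adjustment in step 24 concrete, the sketch below fits a model by repeatedly predicting, comparing against the label, and updating the parameters to reduce the prediction loss. It is a stand-in only: the linear model, the mean squared error loss, the gradient descent update, the learning rate, and the toy samples are all assumptions chosen for brevity, and the text does not prescribe any of them.

```python
def train_linear_risk_model(samples, epochs=2000, learning_rate=0.1):
    """Fit weights and bias by gradient descent on mean squared error.

    `samples` is a list of (group_feature_vector, risk_index_label) pairs.
    The linear form and the loss are illustrative assumptions.
    """
    num_features = len(samples[0][0])
    weights = [0.0] * num_features
    bias = 0.0
    count = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * num_features
        grad_b = 0.0
        for features, label in samples:
            prediction = bias + sum(w * x for w, x in zip(weights, features))
            error = prediction - label          # predicted value minus label value
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Adjust the model parameters in the direction that lowers the loss.
        weights = [w - learning_rate * g / count for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return weights, bias

def predict(model, features):
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features))

# Toy training samples: two group features per sub-user group, made-up labels.
toy_samples = [
    ([0.0, 0.0], 0.02),
    ([1.0, 0.0], 0.07),
    ([0.0, 1.0], 0.03),
    ([1.0, 1.0], 0.08),
    ([0.5, 0.5], 0.05),
]
model = train_linear_risk_model(toy_samples)
```

Any regression model trained against the risk index label values could take the place of this linear stand-in.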

In one example, the method further comprises:

verifying the trained risk prediction model by using each sample in a verification sample set; the verification sample set is generated based on the historical user data corresponding to each user included in a second user group; the first user group and the second user group are obtained by randomly splitting a third user group.

For example, the historical user data of the full set of users is randomly split into training user data and testing user data, and a training sample set and a test sample set are generated from the training user data and the testing user data respectively. The training sample set can be further randomly split into a sample set used for training and a sample set used for verification. A machine learning regression model (such as LightGBM) is then trained, verified, and tested on the corresponding sample sets to obtain a risk prediction model with high accuracy.
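The two random splits in this example can be sketched as below. The helper name, the split fractions (0.8 and 0.9), and the placeholder samples are assumptions; the text states only that the splits are random.

```python
import random

def random_split(items, first_fraction, seed=None):
    """Shuffle a copy of `items` and cut it into two disjoint parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]

# Split the full set of users into training users and testing users.
all_users = list(range(1_000))                      # hypothetical user IDs
training_users, testing_users = random_split(all_users, 0.8, seed=0)

# After samples are generated from the training users, split that sample set
# again into a part used for training and a part used for verification.
training_samples = [f"sample_{i}" for i in range(100)]   # placeholder samples
train_set, validation_set = random_split(training_samples, 0.9, seed=1)
```

Splitting at the user level before generating samples keeps every test sample built only from users that the model never saw during training.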

According to the method for training the risk prediction model provided by the embodiments of this specification, a first user group is first obtained, where each user included in the first user group has corresponding historical user data; multiple rounds of sampling are then performed on the first user group to obtain a plurality of sub-user groups; next, based on the historical user data of each user included in any sub-user group of the plurality of sub-user groups, group feature data of the sub-user group in a plurality of feature dimensions and a risk index label value of the sub-user group are determined as a training sample; finally, the group feature data of the sub-user group included in the training sample is input into a risk prediction model, a risk index predicted value of the sub-user group is output through the risk prediction model, and the model parameters of the risk prediction model are adjusted according to the risk index predicted value and the risk index label value of the sub-user group. The embodiments of this specification thus adopt a machine learning modeling strategy and use random sampling of historical data to generate the samples for modeling, so that the trained risk prediction model can estimate the risk index value of a user group, with high accuracy, during the period in which the users in the user group have not yet shown any risk performance.

Fig. 3 shows a flowchart of a method for determining business risk of a user group according to an embodiment, which may be based on the implementation scenario shown in fig. 1. As shown in fig. 3, the method for determining the business risk of the user group in this embodiment includes the following steps: step 31, obtaining a risk prediction model, wherein the risk prediction model is obtained by training according to the method of fig. 2 and has an input form of group characteristic data; step 32, obtaining user feature data of each user in the target user group in a plurality of feature dimensions; step 33, converting the user characteristic data of any user into the form of the group characteristic data, inputting the converted characteristic data into the risk prediction model, and outputting the predicted risk index value of the user through the risk prediction model; and step 34, determining a predicted risk index value of the target user group according to the service amount of each user in the target user group and the predicted risk index value of the user, wherein the predicted risk index value is used for indicating the service risk degree of the target user group. Specific execution modes of the above steps are described below.

First, in step 31, a risk prediction model is obtained, which is trained according to the method of fig. 2 and has an input form of group feature data. It is to be understood that the training sample used in the training of the risk prediction model corresponds to the sub-user group, and the input of the training sample is the group feature data of the sub-user group, so that the trained risk prediction model also has the input form of the group feature data when used.

Then, in step 32, user feature data of each user in the target user group in a plurality of feature dimensions is obtained. It is understood that the feature dimensions may include attribute features of the user and/or behavior features of the user, and any feature dimension may be a numerical feature or a categorical feature.

In the embodiments of the present specification, the value of a numerical feature is expressed as a number; for example, the consumption amount is a numerical feature. The value of a categorical feature is expressed as a category identifier; for example, occupation is a categorical feature. A categorical feature can also be converted into a numerical feature: for example, if a categorical feature takes the values category A and category B, category A can be represented by 0 and category B by 1.

Next, in step 33, the user feature data of any user is converted into the form of the group feature data, the converted feature data is input to the risk prediction model, and the predicted risk index value of the user is output by the risk prediction model. It will be appreciated that the above-described transformation process is merely a transformation of the presentation of the data, and that in essence the input to the risk prediction model is still user characteristic data for a single user.

In one example, the plurality of feature dimensions includes numerical features and categorical features; the converting the user feature data of any user into the form of the group feature data includes:

directly using the feature data of the numerical features of the user as data in the form of the group feature data;

and converting the feature data of the categorical features of the user into the user number ratio of each category of the categorical feature, and using the ratio as data in the form of the group feature data.

For example, the numerical feature is the consumption amount; if the consumption amount of a single user is 100, the value after conversion into the form of the group feature data is still 100. The categorical feature is occupation, whose values include category A and category B; if the occupation of a single user is category A, then after conversion into the form of the group feature data the user number ratios of category A and category B are 1 and 0, respectively.
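The conversion in this example can be sketched as follows. The function name `to_group_form`, the dictionary-based user record, and the way the feature schema is passed in are all assumptions made for illustration; only the conversion rule itself comes from the description above.

```python
def to_group_form(user, numeric_dims, categorical_dims):
    """Express one user's features in the layout of group feature data.

    `categorical_dims` maps each categorical dimension to its ordered list of
    categories; this schema representation is an assumption of the sketch.
    """
    row = []
    for dim in numeric_dims:
        # A numerical feature is used directly: the mean over one user is the value itself.
        row.append(float(user[dim]))
    for dim, categories in categorical_dims.items():
        # A categorical feature becomes per-category user ratios: 1 for the
        # user's own category and 0 for every other category.
        row.extend(1.0 if user[dim] == category else 0.0 for category in categories)
    return row


user = {"consumption_amount": 100, "occupation": "A"}
row = to_group_form(user, ["consumption_amount"], {"occupation": ["A", "B"]})
```

With the values from the example, the consumption amount stays 100 and the occupation is expanded into the ratios 1 and 0 for categories A and B.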

Finally, in step 34, a predicted risk index value of the target user group is determined according to the service amount of each user in the target user group and the predicted risk index value of the user, where the predicted risk index value is used to indicate the service risk degree of the target user group. It can be understood that the service amounts of different users may differ, and that when the predicted risk index value of the target user group is determined, the predicted risk index value of a user with a higher service amount may be given a higher weight.

In the embodiments of the present specification, the service amount (that is, the service quota or limit granted to a user) may have different meanings in different service scenarios. For example, in a credit consumption service scenario, the service amount may be the credit line; in a shared bicycle service scenario, the service amount may be a limit on riding time or on the number of rides; in a charger rental service scenario, the service amount may be a limit on rental time or on the number of rentals.

Taking the credit consumption service scenario as an example, in the prediction stage the trained risk prediction model is applied to each individual credit consumption customer in the target customer group to predict that customer's financial reject ratio; the individual financial reject ratios are then averaged with each customer's credit line as the weight, which finally yields the predicted overall financial reject ratio of the target customer group.
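The weighted average described above can be sketched as follows. The function name `group_risk` and the sample figures are hypothetical; the sketch assumes that the group value is the service-amount-weighted mean of the per-user predictions, which is the form stated for the credit consumption scenario.

```python
def group_risk(service_amounts, user_risks):
    """Weighted average of per-user predicted risk, weighted by service amount."""
    total = sum(service_amounts)
    if total <= 0:
        raise ValueError("total service amount must be positive")
    weighted = sum(amount * risk for amount, risk in zip(service_amounts, user_risks))
    return weighted / total


# Hypothetical credit lines and per-customer predictions.
credit_lines = [1000.0, 3000.0]
predicted = [0.10, 0.02]
value = group_risk(credit_lines, predicted)
```

In this sketch the customer with the larger credit line pulls the group value toward that customer's own prediction, which matches the remark that a user with a higher service amount carries a higher weight.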

According to the method for determining the business risk of a user group provided by the embodiments of the present specification, first, a risk prediction model is obtained, which is trained according to the method of fig. 2 and has an input form of group feature data; then, user feature data of each user in a target user group in a plurality of feature dimensions is acquired; then, the user feature data of any user is converted into the form of the group feature data, the converted feature data is input into the risk prediction model, and a predicted risk index value of the user is output by the risk prediction model; finally, a predicted risk index value of the target user group is determined according to the service amount of each user in the target user group and the predicted risk index value of the user, where the predicted risk index value is used to indicate the service risk degree of the target user group. As can be seen from the above, in the embodiments of the present specification, the trained risk prediction model is applied to each single user to predict that user's risk index value, and the predicted risk index value of the target user group is then determined according to the service amount and the predicted risk index value of each user in the target user group. The risk index value of the user group can therefore be estimated during the period in which the users in the user group have not yet shown any risk performance, and with high accuracy.

According to an embodiment of another aspect, an apparatus for training a risk prediction model is also provided, and the apparatus is used for executing the method for training a risk prediction model provided by the embodiment of the present specification. FIG. 4 shows a schematic block diagram of an apparatus to train a risk prediction model according to one embodiment. As shown in fig. 4, the apparatus 400 includes:

an obtaining unit 41, configured to obtain a first user group; each user included in the first user group has corresponding historical user data;

a sampling unit 42, configured to perform multiple rounds of sampling on the first user group obtained by the obtaining unit 41 to obtain a plurality of sub-user groups;

a determining unit 43, configured to determine, based on the historical user data of each user included in any one of the multiple sub-user groups obtained by the sampling unit 42, group feature data of multiple feature dimensions of the sub-user group and a risk indicator label value of the sub-user group, as a training sample;

a training unit 44, configured to input the group feature data of the sub-user group included in the training sample obtained by the determining unit 43 into a risk prediction model, output a risk index prediction value of the sub-user group through the risk prediction model, and adjust a model parameter of the risk prediction model according to the risk index prediction value of the sub-user group and the risk index label value of the sub-user group obtained by the determining unit 43.
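The cooperation of the sampling, determining, and training units can be sketched end to end as follows. This is only an illustration under several assumptions that the embodiments do not fix: the model is a linear model fitted by stochastic gradient descent on squared error, the label is taken to be the default rate of the sub-user group, and the user records are synthetic. All function and field names are hypothetical.

```python
import random


def sample_sub_groups(users, rounds, group_size, rng):
    """Multi-round sampling: each round draws one random sub-user group."""
    return [rng.sample(users, group_size) for _ in range(rounds)]


def make_training_sample(sub_group):
    """Return (group feature data, risk index label value) for one sub-user group."""
    n = len(sub_group)
    mean_amount = sum(u["amount"] for u in sub_group) / n          # numerical feature: mean
    ratio_a = sum(u["occupation"] == "A" for u in sub_group) / n   # categorical feature: user ratio
    label = sum(u["defaulted"] for u in sub_group) / n             # assumed label: default rate
    return [mean_amount, ratio_a, 1.0 - ratio_a], label


def predict(params, features):
    weights, bias = params
    return bias + sum(w * x for w, x in zip(weights, features))


def train(samples, epochs=200, learning_rate=0.05):
    """Adjust the parameters from the gap between prediction and label."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = predict((weights, bias), features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def mean_squared_error(params, samples):
    return sum((predict(params, f) - y) ** 2 for f, y in samples) / len(samples)


# Synthetic first user group (illustrative data only).
rng = random.Random(0)
first_user_group = []
for _ in range(200):
    occupation = rng.choice(["A", "B"])
    first_user_group.append({
        "amount": rng.random(),
        "occupation": occupation,
        "defaulted": rng.random() < (0.3 if occupation == "A" else 0.1),
    })

samples = [make_training_sample(g) for g in sample_sub_groups(first_user_group, 100, 50, rng)]
params = train(samples)
baseline = mean_squared_error(([0.0, 0.0, 0.0], 0.0), samples)
trained = mean_squared_error(params, samples)
```

Because every training sample describes a sub-user group rather than a single user, a modest number of historical users can yield many training samples through repeated sampling.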

Optionally, as an embodiment, the historical user data is user data within a first time period after the user performs the target business behavior, the first time period exceeds a preset time period, and the risk index label value is calculated based on the historical user data within the preset time period after the user performs the target business behavior.

Optionally, as an embodiment, the risk index label value is used to indicate the default risk that the sub-user group fails to perform the performance behavior corresponding to the target business behavior within a preset time period after the users perform the target business behavior.
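The relationship between the first time period, the preset time period, and the label can be sketched as follows. The embodiments do not give a formula for the label, so this sketch assumes it is the share of users in the sub-user group who defaulted within the preset time period; the field names `observed_days` and `default_day` and the function name are hypothetical.

```python
def risk_index_label(sub_group, preset_days):
    """Share of users who defaulted within the preset period (assumed label definition)."""
    for user in sub_group:
        # The first time period (observation window) must exceed the preset period,
        # so that the outcome within the preset period is fully known.
        if user["observed_days"] <= preset_days:
            raise ValueError("observation window must exceed the preset time period")
    defaults = sum(
        1
        for user in sub_group
        if user["default_day"] is not None and user["default_day"] <= preset_days
    )
    return defaults / len(sub_group)


sub_group = [
    {"observed_days": 90, "default_day": 10},
    {"observed_days": 90, "default_day": 45},    # default falls outside the preset period
    {"observed_days": 60, "default_day": None},  # no default observed
    {"observed_days": 120, "default_day": 20},
]
label = risk_index_label(sub_group, preset_days=30)
```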

Optionally, as an embodiment, the historical user data includes user feature data of the user in the plurality of feature dimensions; the plurality of feature dimensions comprises a first feature dimension; the determining unit 43 is specifically configured to aggregate the user feature data of the first feature dimension of each user included in any sub-user group, so as to obtain the group feature data of the sub-user group in the first feature dimension.

Further, the first feature dimension is a numerical feature; the determining unit 43 is specifically configured to average the user feature data of the first feature dimension of each user included in any sub-user group, so as to obtain the group feature data of the sub-user group in the first feature dimension.

Further, the first feature dimension is a categorical feature; the determining unit 43 is specifically configured to determine a user number ratio of each category according to the user feature data of the first feature dimension of each user included in any sub-user group, and use the user number ratio as the group feature data of the sub-user group in the first feature dimension.
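The two aggregation rules above can be sketched as follows. The function names are hypothetical, and the explicit `categories` argument is an assumption made so that the output has a fixed order and includes categories that no user in the sub-user group happens to take.

```python
from collections import Counter


def aggregate_numeric(values):
    """Group feature for a numerical dimension: the mean over the sub-user group."""
    return sum(values) / len(values)


def aggregate_categorical(values, categories):
    """Group feature for a categorical dimension: user number ratio per category."""
    counts = Counter(values)
    # Counter returns 0 for categories that do not occur in the sub-user group.
    return [counts[category] / len(values) for category in categories]


mean_amount = aggregate_numeric([100, 200, 300])
occupation_ratios = aggregate_categorical(["A", "A", "B", "A"], ["A", "B"])
```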

Optionally, as an embodiment, the apparatus further includes:

a verification unit, configured to verify the trained risk prediction model obtained by the training unit by using each sample in a verification sample set; the verification sample set is generated based on historical user data corresponding to each user included in a second user group; the first user group and the second user group are obtained by randomly splitting a third user group;

a testing unit, configured to test the risk prediction model verified by the verification unit by using each sample in a test sample set, so as to determine the performance of the risk prediction model; the test sample set is generated based on historical user data corresponding to each user included in a fourth user group; the fourth user group and the third user group are obtained by randomly splitting a fifth user group.
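The two-stage random split behind the training, verification, and test sets can be sketched as follows. The split fractions (0.8 and 0.75), the fixed seed, and the use of integer user identifiers are assumptions chosen for illustration; the embodiments only state that the groups are obtained by random splitting.

```python
import random


def random_split(users, fraction, rng):
    """Randomly split `users` into two disjoint groups; the first gets `fraction` of them."""
    shuffled = list(users)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(42)
fifth_group = list(range(100))  # hypothetical user identifiers

# The fifth user group is split into the third group and the fourth (test) group.
third_group, fourth_group = random_split(fifth_group, 0.8, rng)
# The third user group is split into the first (training) and second (verification) groups.
first_group, second_group = random_split(third_group, 0.75, rng)
```

Splitting at the user level before any sub-user groups are sampled keeps the users behind the verification and test samples separate from those used for training.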

With the apparatus for training a risk prediction model provided in the embodiments of the present specification, first, the obtaining unit 41 obtains a first user group, each user included in which has corresponding historical user data; then, the sampling unit 42 performs multiple rounds of sampling on the first user group to obtain a plurality of sub-user groups; then, the determining unit 43 determines, based on the historical user data of each user included in any sub-user group of the plurality of sub-user groups, group feature data of a plurality of feature dimensions of the sub-user group and a risk index label value of the sub-user group as a training sample; finally, the training unit 44 inputs the group feature data of the sub-user group included in the training sample into a risk prediction model, outputs a risk index prediction value of the sub-user group through the risk prediction model, and adjusts the model parameters of the risk prediction model according to the risk index prediction value of the sub-user group and the risk index label value of the sub-user group. Therefore, in the embodiments of the present specification, a machine learning modeling strategy is adopted in which modeling samples are generated by randomly sampling historical data, and the trained risk prediction model can estimate the risk index value of a user group during the period in which the users in the user group have not yet shown any risk performance, with high accuracy.

According to another aspect of the embodiments, a device for determining business risk of a user group is further provided, where the device is configured to execute the method for determining business risk of a user group provided in the embodiments of the present specification. Fig. 5 shows a schematic block diagram of a determination apparatus of business risk of a user group according to one embodiment. As shown in fig. 5, the apparatus 500 includes:

a first obtaining unit 51, configured to obtain a risk prediction model, which is obtained by training according to the apparatus in fig. 4 and has an input form of group feature data;

a second obtaining unit 52, configured to obtain user feature data of each user in the target user group in multiple feature dimensions;

a prediction unit 53, configured to convert the user feature data of any user acquired by the second acquisition unit 52 into the form of the group feature data, input the converted feature data into the risk prediction model acquired by the first acquisition unit 51, and output a predicted risk index value of the user through the risk prediction model;

a determining unit 54, configured to determine a predicted risk index value of the target user group according to the service amount of each user in the target user group and the predicted risk index value of the user obtained by the predicting unit 53, where the predicted risk index value is used to indicate a service risk degree of the target user group.

Optionally, as an embodiment, the plurality of feature dimensions include numerical features and categorical features; the prediction unit 53 includes:

a first conversion subunit, configured to directly use feature data of a numerical feature of any user as a form of the group feature data;

and the second conversion subunit is used for converting the feature data of the category-type feature of any user into the user number ratio of each category of the category-type feature, and taking the ratio as the form of the group feature data.

With the device for determining the business risk of a user group provided in the embodiments of the present specification, first, the first obtaining unit 51 obtains a risk prediction model, which is obtained by training according to the apparatus of fig. 4 and has an input form of group feature data; then, the second obtaining unit 52 obtains user feature data of each user in the target user group in a plurality of feature dimensions; then, the prediction unit 53 converts the user feature data of any user into the form of the group feature data, inputs the converted feature data into the risk prediction model, and outputs a predicted risk index value of the user through the risk prediction model; finally, the determining unit 54 determines the predicted risk index value of the target user group according to the service amount of each user in the target user group and the predicted risk index value of the user, where the predicted risk index value is used to indicate the service risk degree of the target user group. As can be seen from the above, in the embodiments of the present specification, the trained risk prediction model is applied to each single user to predict that user's risk index value, and the predicted risk index value of the target user group is then determined according to the service amount and the predicted risk index value of each user in the target user group. The risk index value of the user group can therefore be estimated during the period in which the users in the user group have not yet shown any risk performance, and with high accuracy.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2 or fig. 3.

According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 2 or fig. 3.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The above embodiments further describe in detail the objects, technical solutions, and advantages of the present invention. It should be understood that the above are only exemplary embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method of training a risk prediction model, the method comprising:

obtaining a plurality of sub-user groups by performing multiple rounds of sampling on the first user group;

and inputting the group characteristic data of the sub-user group included in the training sample into a risk prediction model, outputting a risk index prediction value of the sub-user group through the risk prediction model, and adjusting the model parameters of the risk prediction model according to the risk index prediction value of the sub-user group and the risk index label value of the sub-user group.

2. The method of claim 1, wherein the historical user data is user data within a first time period after the user performs the target business behavior, the first time period exceeds a preset time period, and the risk index label value is calculated based on the historical user data within the preset time period after the user performs the target business behavior.

3. The method of claim 1, wherein the risk index label value is used to indicate the default risk that the sub-user group fails to perform the performance behavior corresponding to the target business behavior within a preset time period after the users perform the target business behavior.

4. The method of claim 3, wherein the target business behavior comprises a borrowing behavior and the performance behavior comprises a repayment behavior.

5. The method of claim 1, wherein the historical user data comprises user characteristic data of a user in the plurality of characteristic dimensions; the plurality of feature dimensions comprises a first feature dimension; the determining group feature data of a plurality of feature dimensions of the sub-user group based on historical user data of each user included in any sub-user group of the plurality of sub-user groups includes:

6. The method of claim 5, wherein the first feature dimension is a numerical feature; the aggregating the user feature data of the first feature dimension of each user included in any sub-user group includes:

7. The method of claim 5, wherein the first feature dimension is a categorical feature; the aggregating the user feature data of the first feature dimension of each user included in any sub-user group includes:

8. The method of claim 1, wherein the method further comprises:

testing the risk prediction model by using each sample in a test sample set to determine the performance of the risk prediction model; the test sample set is generated based on historical user data corresponding to each user included in a fourth user group; the fourth user group and the third user group are obtained by randomly splitting a fifth user group.

9. A method for determining business risk of a user group, the method comprising:

obtaining a risk prediction model trained according to the method of claim 1 and having an input form of group feature data;

and determining a predicted risk index value of the target user group according to the service amount of each user in the target user group and the predicted risk index value of the user, wherein the predicted risk index value is used for indicating the service risk degree of the target user group.

10. The method of claim 9, wherein the plurality of feature dimensions include numerical features and categorical features; the converting the user feature data of any user into the form of the group feature data includes:

11. An apparatus to train a risk prediction model, the apparatus comprising:

the determining unit is used for determining group feature data of a plurality of feature dimensions of the sub-user group and a risk index label value of the sub-user group as a training sample based on historical user data of each user included in any sub-user group in the plurality of sub-user groups obtained by the sampling unit;

and the training unit is used for inputting the group characteristic data of the sub-user group included in the training sample obtained by the determining unit into a risk prediction model, outputting a risk index predicted value of the sub-user group through the risk prediction model, and adjusting the model parameters of the risk prediction model according to the risk index predicted value of the sub-user group and the risk index label value of the sub-user group obtained by the determining unit.

12. The apparatus of claim 11, wherein the historical user data is user data within a first time period after the user performs the target business behavior, the first time period exceeds a preset time period, and the risk index label value is calculated based on the historical user data within the preset time period after the user performs the target business behavior.

13. The apparatus of claim 11, wherein the risk index label value is used to indicate the default risk that the sub-user group fails to perform the performance behavior corresponding to the target business behavior within a preset time period after the users perform the target business behavior.

14. The apparatus of claim 13, wherein the target business behavior comprises a borrowing behavior and the performance behavior comprises a repayment behavior.

15. The apparatus of claim 11, wherein the historical user data comprises user characteristic data of a user in the plurality of characteristic dimensions; the plurality of feature dimensions comprises a first feature dimension; the determining unit is specifically configured to aggregate the user feature data of the first feature dimension of each user included in any sub-user group, so as to obtain group feature data of the sub-user group in the first feature dimension.

16. The apparatus of claim 15, wherein the first feature dimension is a numerical feature; the determining unit is specifically configured to average the user feature data of the first feature dimension of each user included in any sub-user group, and obtain the group feature data of the sub-user group in the first feature dimension.

17. The apparatus of claim 15, wherein the first feature dimension is a categorical feature; the determining unit is specifically configured to determine a user number ratio of each category according to user feature data of a first feature dimension of each user included in any sub-user group, and use the user number ratio as group feature data of the sub-user group in the first feature dimension.

18. The apparatus of claim 11, wherein the apparatus further comprises:

a testing unit, configured to test the risk prediction model verified by the verification unit by using each sample in a test sample set, so as to determine the performance of the risk prediction model; the test sample set is generated based on historical user data corresponding to each user included in a fourth user group; the fourth user group and the third user group are obtained by randomly splitting a fifth user group.

19. An apparatus for determining business risk of a user group, the apparatus comprising:

a first acquisition unit configured to acquire a risk prediction model trained according to the apparatus of claim 11 and having an input form of group feature data;

20. The apparatus of claim 19, wherein the plurality of feature dimensions comprise numerical features and categorical features; the prediction unit includes:

and the second conversion subunit is used for converting the feature data of the category type feature of any user into the user number ratio of each category of the category type feature, and taking the ratio as the form of the group feature data.

21. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-10.

22. A computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of any of claims 1-10.