CN113362168A

CN113362168A - Risk prediction method and device, storage medium and electronic equipment

Info

Publication number: CN113362168A
Application number: CN202110830848.7A
Authority: CN
Inventors: 张耀强; 徐楠
Original assignee: Jilin Yillion Bank Co ltd
Current assignee: Jilin Yillion Bank Co ltd
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2021-09-07

Abstract

The application provides a risk prediction method and device, a storage medium and electronic equipment. A target device processes its local data using a first prediction submodel pre-deployed on the target device to obtain a first risk prediction result, and sends user information of a user to be predicted to each participant device of the target device. Each participant device processes its participant local data using a second prediction submodel pre-deployed on that participant device to obtain a second risk prediction result, and feeds the second risk prediction result back. The target device then obtains a target risk prediction result according to the first risk prediction result and each second risk prediction result. Because the participant local data is not transmitted directly to the target device, and only the second risk prediction result is transmitted, accurate risk prediction for the loan user is performed based on the target device local data and the local data of each participant while the privacy and security of the participants' data are ensured.

Description

Risk prediction method and device, storage medium and electronic equipment

Technical Field

The present application relates to the field of internet finance, and in particular, to a risk prediction method and apparatus, a storage medium, and an electronic device.

Background

In the internet loan scenario, an enterprise needs to predict the risk of a loan user based on various data about that user. However, part of the data needed for the risk prediction is held separately by several participants in the internet loan, where the participants may be loan-facilitation partner institutions, third-party data companies and peer financial institutions.

To predict the risk of a loan user accurately, the prediction has to draw on both the enterprise's own user data and the related user data held by each participant. If the related user data is obtained directly from each participant, however, there is a risk of disclosing user privacy and of non-compliant data use.

Therefore, how to provide a technical solution that can realize accurate risk prediction for loan users on the premise of ensuring privacy and security of data of participants becomes a problem that those skilled in the art need to solve.

Disclosure of Invention

The application provides a risk prediction method, which is used for realizing accurate risk prediction of loan users on the premise of ensuring the privacy and the safety of data of participants.

The application also provides a risk prediction device for ensuring the realization and the application of the method in practice.

A risk prediction method is applied to a target device, wherein the target device is pre-deployed with a first prediction submodel, and the method comprises the following steps:

sending user information of a user to be predicted to each participant device of the target device;

acquiring a second risk prediction result fed back by each participant device, wherein the second risk prediction result is obtained by processing participant local data by a second prediction submodel, the participant local data corresponds to the user information, and the second prediction submodel is pre-deployed in the participant device;

obtaining a first risk prediction result; the first risk prediction result is obtained by processing target equipment local data by the first prediction submodel, wherein the target equipment local data correspond to the user information;

and processing to obtain a target risk prediction result of the user to be predicted according to the first risk prediction result and each second risk prediction result.
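The four steps listed above can be sketched as a single function running on the target device. This is a minimal illustration, not the applicant's implementation: the names `predict_target_risk`, `make_submodel` and `mean_combiner` are hypothetical, each submodel is represented by a plain callable that returns a score from its own local store, and the way the results are combined is passed in as a parameter because this part of the text does not fix it.

```python
from typing import Callable, Dict, List

# All names below are hypothetical and used for illustration only.
UserInfo = Dict[str, str]
SubModel = Callable[[UserInfo], float]
Combiner = Callable[[float, List[float]], float]


def predict_target_risk(user_info: UserInfo,
                        first_submodel: SubModel,
                        participant_submodels: List[SubModel],
                        combine: Combiner) -> float:
    """Run the steps listed above on the target device."""
    # Send the user information to each participant and collect the second
    # risk prediction results; participant local data is never returned.
    second_results = [submodel(user_info) for submodel in participant_submodels]
    # Obtain the first risk prediction result from the target device's own data.
    first_result = first_submodel(user_info)
    # Combine the first result with every second result.
    return combine(first_result, second_results)


def make_submodel(local_scores: Dict[str, float]) -> SubModel:
    """Stand-in for a deployed submodel: looks up a score in a local store."""
    return lambda user_info: local_scores[user_info["user_id"]]


def mean_combiner(first: float, seconds: List[float]) -> float:
    """Assumed combination rule for this sketch: a plain average."""
    scores = [first, *seconds]
    return sum(scores) / len(scores)


target_score = predict_target_risk(
    {"user_id": "u001"},
    make_submodel({"u001": 0.2}),
    [make_submodel({"u001": 0.4}), make_submodel({"u001": 0.6})],
    mean_combiner,
)
```

Only the scores cross the device boundary in this sketch; each local store stays inside the closure that owns it.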

Optionally, in the above method, the deployment process of each prediction submodel includes:

acquiring a pre-constructed prediction model, wherein the prediction model is obtained by training a neural network model based on the user data of each participant device and the user data of the target device, both of which are pre-stored in the target device;

determining characteristic variables corresponding to the target equipment and each participant equipment;

splitting the prediction model according to the characteristic variables corresponding to the target equipment and each participant equipment to obtain a first prediction submodel corresponding to the target equipment and a second prediction submodel corresponding to each participant equipment;

and deploying the first prediction submodel to the target equipment, and sending the second prediction submodel corresponding to each participant to the corresponding participant equipment, so that each participant equipment deploys the received second prediction submodel.

Optionally, in the above method, the deployment process of each prediction submodel includes:

receiving encrypted sample data sent by each participant device, and acquiring the encrypted sample data of the target device; the encrypted sample data sent by each participant device and the encrypted sample data of the target device comprise encrypted user information;

carrying out sample alignment on each encrypted sample data according to the encrypted user information included in each encrypted sample data;

performing characteristic binning on each encrypted sample data after sample alignment to obtain a plurality of characteristic variable groups;

training a pre-constructed federated model according to each characteristic variable group to obtain a prediction model;

splitting the prediction model to obtain a first prediction submodel corresponding to the target equipment and a second prediction submodel corresponding to each participant equipment;

Optionally, the method, before sending the user information of the user to be predicted to each participant device of the target device, further includes:

encrypting the user information using an encryption key of each participant device.

Optionally, in the method, the processing to obtain the target risk prediction result of the user to be predicted according to the first risk prediction result and each second risk prediction result includes:

obtaining respective corresponding prediction weights of the target equipment and each participant equipment;

and performing a calculation on the first risk prediction result and each second risk prediction result according to the prediction weights corresponding to the target equipment and each participant equipment, to obtain a target risk prediction result of the user to be predicted.
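One way to read the weighted calculation is as a normalized weighted average. The text only states that the results are calculated according to the prediction weights, so the formula below is an assumption, and the function name `weighted_risk` and the example weights are hypothetical.

```python
from typing import Sequence


def weighted_risk(first_result: float,
                  second_results: Sequence[float],
                  target_weight: float,
                  participant_weights: Sequence[float]) -> float:
    """Assumed rule: weighted average of the first and second results."""
    if len(second_results) != len(participant_weights):
        raise ValueError("one weight is required per participant result")
    total_weight = target_weight + sum(participant_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = target_weight * first_result + sum(
        weight * result
        for weight, result in zip(participant_weights, second_results)
    )
    # Dividing by the total keeps the output on the same scale as the inputs
    # even when the weights do not sum to one.
    return weighted_sum / total_weight


example_score = weighted_risk(0.2, [0.4, 0.6], 0.5, [0.3, 0.2])
```

With these example values the target device contributes half of the final score and the two participants contribute the rest.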

A risk prediction device is applied to target equipment, and the target equipment is pre-deployed with a first prediction submodel, and the device comprises:

a sending unit, configured to send user information of a user to be predicted to each participant device of the target device;

the first obtaining unit is used for obtaining a second risk prediction result fed back by each participant device, the second risk prediction result is obtained by processing participant local data by a second prediction submodel, the participant local data corresponds to the user information, and the second prediction submodel is deployed in the participant device in advance;

a second obtaining unit configured to obtain a first risk prediction result; the first risk prediction result is obtained by processing target equipment local data by the first prediction submodel, wherein the target equipment local data correspond to the user information;

and the processing unit is used for processing to obtain a target risk prediction result of the user to be predicted according to the first risk prediction result and each second risk prediction result.

The above apparatus, optionally, further comprises:

a second obtaining unit configured to obtain a pre-constructed prediction model, wherein the prediction model is obtained by training a neural network model based on the user data of each participant device and the user data of the target device, both of which are pre-stored in the target device;

the determining unit is used for determining the characteristic variables corresponding to the target equipment and each participant equipment;

the first splitting unit is used for splitting the prediction model according to the characteristic variables corresponding to the target device and each participant device to obtain a first prediction submodel corresponding to the target device and a second prediction submodel corresponding to each participant device;

the first deployment unit is configured to deploy the first prediction submodel to the target device, and send the second prediction submodel corresponding to each participant to the corresponding participant device, so that each participant device deploys the received second prediction submodel.

The above apparatus, optionally, further comprises:

the receiving unit is used for receiving the encrypted sample data sent by each participant device and acquiring the encrypted sample data of the target device; the encrypted sample data sent by each participant device and the encrypted sample data of the target device comprise encrypted user information;

the sample alignment unit is used for carrying out sample alignment on each encrypted sample data according to the encrypted user information included in each encrypted sample data;

the characteristic binning unit is used for performing characteristic binning on each encrypted sample data after the samples are aligned to obtain a plurality of characteristic variable groups;

the training unit is used for training a pre-constructed federated model according to each characteristic variable group to obtain a prediction model;

the second splitting unit is used for splitting the prediction model to obtain a first prediction submodel corresponding to the target equipment and a second prediction submodel corresponding to each participant equipment;

the second deployment unit is configured to deploy the first prediction submodel to the target device, and send the second prediction submodel corresponding to each participant to the corresponding participant device, so that each participant device deploys the received second prediction submodel.

A storage medium storing a set of instructions, wherein the set of instructions, when executed by a processor, implement a risk prediction method as described above.

An electronic device, comprising:

a memory for storing at least one set of instructions;

a processor for executing a set of instructions stored in the memory, the method for risk prediction as described above being implemented by executing the set of instructions.

Compared with the prior art, the method has the following advantages:

The application provides a risk prediction method and device, a storage medium and an electronic device. In the method, the target device processes the target device local data using a first prediction submodel pre-deployed on the target device to obtain a first risk prediction result, and sends user information of a user to be predicted to each participant device of the target device; each participant device processes its participant local data using a second prediction submodel pre-deployed on that participant device to obtain a second risk prediction result, and feeds the second risk prediction result back; and the target device obtains a target risk prediction result of the user to be predicted according to the first risk prediction result and each second risk prediction result. In this scheme the participant local data is therefore not transmitted directly to the target device. Instead, the second prediction submodel processes the participant local data, and only the resulting second risk prediction result is transmitted to the target device, so that accurate risk prediction for the loan user is performed based on the target device local data and the participant local data while the privacy and security of the participants' data are ensured.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is an exemplary diagram of a risk prediction method provided herein;

FIG. 2 is a flow chart of a method of risk prediction provided herein;

FIG. 3 is another flow chart of the risk prediction method provided herein;

FIG. 4 is yet another flow chart of the risk prediction method provided herein;

FIG. 5 is a further flow chart of the risk prediction method provided herein;

FIG. 6 is a diagram of another example of a risk prediction method provided herein;

FIG. 7 is a schematic structural diagram of a risk prediction device provided herein;

FIG. 8 is a schematic structural diagram of an electronic device provided in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the disclosure of the present application are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in the disclosure herein are exemplary rather than limiting, and those skilled in the art will understand that "one or more" will be understood unless the context clearly dictates otherwise.

The application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, distributed computing environments with server computers, and the like.

The embodiment of the application provides a risk prediction method, which relates to a plurality of roles and comprises the following steps:

(1) The data owner: has ownership of and control over the shared data, must meet the requirements of the data supervisor when sharing data, and places extremely high demands on data privacy and security. For example, the two partners in an internet joint loan each own feature data about the borrowing customer, so both partners are data owners.

(2) The data acquirer: the party that acquires data from a data owner according to actual business needs, and uses and stores the data within the agreed scope of use for risk-control modeling. For example, the two partners in an internet joint loan each acquire the customer feature data shared by the data owner under a data sharing agreement, so both partners are data acquirers.

(3) The data supervisor: the financial regulatory authority, which monitors, according to financial regulatory requirements, whether the shared data fully complies with the authority's rules on data security and privacy.

(4) The data operator: provides technical operation services, including basic software and hardware services.

Referring to fig. 1, the specific service scenario of the risk prediction method provided in this embodiment is as follows:

Service scenario description:

Parties A and B both participate in the internet joint loan, and both are data owners and data acquirers.

Parties A and B each hold only part of the customer feature data. Because of the missing data, an algorithm model trained only on a party's own local data distinguishes poorly between good and bad customers and performs badly.

Therefore, parties A and B both need to share data, and during AI model inference the data of A and the data of B are each called to compute the joint prediction model. Optionally, the joint prediction model may be computed using horizontal federated learning or vertical federated learning.

Horizontal federated learning: the participants' data sets overlap heavily in feature dimensions but little in samples. The data sets are split horizontally, and the portions whose features are the same but whose samples are not entirely the same are extracted as training data. For example, in cross-border financial cooperation that applies federated learning to the mortgage business, financial institutions in different regions of two countries model jointly as participants. The feature dimensions of their mortgage customers overlap heavily, but the customer samples from different regions rarely overlap, so this is horizontal federated learning. Horizontal federated learning thus provides data sharing and model computation for the scenario in which the user features of the cooperating parties overlap.

Vertical federated learning: the participants' data sets overlap heavily in samples but little in feature dimensions. The data sets are split vertically, and the portions whose samples are the same but whose features are not entirely the same are extracted as training data. For example, consumer-credit users and mortgage users in the same region overlap heavily as samples, but their feature data overlaps little in feature dimensions, so joint modeling between a consumer-credit company and a mortgage lender is vertical federated learning. The goal of vertical federated learning is joint modeling by two or more participants. Assuming that only one party's data carries labels and that no participant wants to expose its data, the challenge is that participants without sample labels cannot model on their own. Vertical federated learning thus provides data sharing and model computation for the scenario in which the user samples of the cooperating parties overlap.
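The difference between the two settings comes down to which axis of the data overlaps. The sketch below measures sample overlap and feature overlap between two parties' data sets using the Jaccard ratio; the function names, the toy data and the choice of Jaccard as the overlap measure are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical layout: user id -> {feature name -> value}.
DataSet = Dict[str, Dict[str, float]]


def jaccard(left: set, right: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def overlap_profile(left: DataSet, right: DataSet) -> Tuple[float, float]:
    """Return (sample overlap, feature overlap) for two parties."""
    left_features = set().union(*(row.keys() for row in left.values()))
    right_features = set().union(*(row.keys() for row in right.values()))
    return jaccard(set(left), set(right)), jaccard(left_features, right_features)


# Horizontal setting: same features, different customers.
bank_region_1 = {"u1": {"age": 30, "income": 5.0}, "u2": {"age": 41, "income": 7.5}}
bank_region_2 = {"u3": {"age": 28, "income": 4.2}, "u4": {"age": 55, "income": 9.1}}

# Vertical setting: same customers, different features.
consumer_credit = {"u1": {"monthly_spend": 1.2}, "u2": {"monthly_spend": 0.8}}
mortgage_lender = {"u1": {"house_value": 90.0}, "u2": {"house_value": 120.0}}

horizontal_profile = overlap_profile(bank_region_1, bank_region_2)
vertical_profile = overlap_profile(consumer_credit, mortgage_lender)
```

In the horizontal example the feature overlap is complete and the sample overlap is zero; in the vertical example the reverse holds.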

Data never leaves each participant's local environment. Through parameter exchange under an encryption mechanism, that is, without violating data privacy regulations, federated learning builds a virtual common model. This virtual model performs like the optimal model that would be built by pooling all the data together, yet while it is being built the data itself does not move, so privacy is not disclosed and data compliance is not affected. The resulting model serves only each party's local objectives. Under the federated mechanism, all participants have equal identity and status, which helps them establish a common data-sharing strategy. Data sharing also uses homomorphic encryption, handles both overlapping users and overlapping user features across different data dimensions, and lets all participants model jointly without their data leaving the local environment, taking into account both data protection for the data owner and model protection for the model owner.

An embodiment of the present application provides a risk prediction method, which may be applied to a target device, where a method flowchart of the risk prediction method is shown in fig. 2, and specifically includes:

S201, sending user information of a user to be predicted to each participant device of the target device.

In this embodiment, when a risk prediction is required, user information of the user to be predicted is obtained. The user to be predicted is the user whose risk is to be predicted. The user information includes at least a user identifier and may also include other information such as, but not limited to, the user's name. Optionally, the user identifier may be an identity card number.

In this embodiment, user information of a user to be predicted is sent to each participant device of a target device, where each participant device stores local data for risk prediction corresponding to the user information.

It should be noted that a first prediction submodel is pre-deployed in the target device and a second prediction submodel is pre-deployed in each participant device. The second prediction submodels deployed by different participant devices are different, and the first prediction submodel and each second prediction submodel are obtained by splitting a prediction model.

Referring to fig. 3, fig. 3 shows an implementation of a deployment process of each prediction submodel, which specifically includes:

S301, obtaining a pre-constructed prediction model.

In this embodiment, the target device stores in advance user data purchased from each participant device.

In this embodiment, a pre-constructed prediction model is obtained, where the prediction model is obtained by training a neural network model based on the user data of each participant device and the user data of the target device, both of which are pre-stored in the target device.

S302, determining characteristic variables corresponding to the target equipment and each participant equipment.

In this embodiment, the characteristic variables for the target device and the characteristic variables for each participant device in the predictive model are determined.

S303, splitting the prediction model according to the characteristic variables corresponding to the target device and each participant device to obtain a first prediction submodel corresponding to the target device and a second prediction submodel corresponding to each participant device.

In this embodiment, the prediction model is split according to the characteristic variables corresponding to the target device and to each participant device, so as to obtain a first prediction submodel corresponding to the target device and a second prediction submodel corresponding to each participant device. Specifically, the prediction model is split according to the characteristic variables corresponding to the target device to obtain the first prediction submodel, and is split according to the characteristic variables corresponding to each participant device to obtain the second prediction submodel corresponding to that participant device. The characteristic variables include feature dimensions and data variables.

It should be noted that, in the prediction model splitting process, the prediction model needs to be split into a first prediction sub-model corresponding to the target device based on the feature dimension and the data variable owned by the target device. This part of the model splitting process completely depends only on the data of the target device. For each participant device, the prediction model splitting process needs to be split into a second prediction sub-model corresponding to the participant device based on the characteristic dimension and the data variable owned by the participant device. This part of the model splitting process relies solely on the data of the participant device.
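The splitting by characteristic variable can be illustrated with a deliberately simplified model. The embodiment trains a neural network, and how such a network is partitioned is not detailed here, so the sketch below assumes instead an additive linear scoring model in which each feature has one weight. Under that assumption a submodel is simply the subset of weights for the features a device owns. The names `split_by_owner` and `score` and all values are hypothetical.

```python
from typing import Dict, Iterable

# Assumed full model: one weight per characteristic variable.
full_model = {"age": 0.5, "income": -0.25, "late_payments": 1.0}

# Assumed ownership of characteristic variables by device.
ownership = {
    "target": ["age", "income"],
    "participant_a": ["late_payments"],
}


def split_by_owner(weights: Dict[str, float],
                   owned: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, float]]:
    """Give each device only the weights of the features it owns."""
    return {
        device: {feature: weights[feature] for feature in features}
        for device, features in owned.items()
    }


def score(submodel: Dict[str, float], local_row: Dict[str, float]) -> float:
    """Partial score computed from one device's local data only."""
    return sum(weight * local_row[feature] for feature, weight in submodel.items())


submodels = split_by_owner(full_model, ownership)
target_partial = score(submodels["target"], {"age": 2.0, "income": 4.0})
participant_partial = score(submodels["participant_a"], {"late_payments": 3.0})
full_score = score(full_model, {"age": 2.0, "income": 4.0, "late_payments": 3.0})
```

For an additive model the partial scores sum to the score of the unsplit model, which is why each device can evaluate its part without seeing the other devices' features.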

S304, deploying the first prediction submodel into the target device, and sending the second prediction submodel corresponding to each participant to the participant device corresponding to the second prediction submodel, so that each participant device deploys the received second prediction submodel.

In this embodiment, the first prediction sub-model is deployed to the target device, and for each participant device, the target device sends the second prediction sub-model corresponding to the participant device, so that the participant device deploys the received second prediction sub-model.

Optionally, when the target device sends the second prediction submodel corresponding to a participant device, the target device encrypts that second prediction submodel with the encryption key of the participant device and sends the encrypted second prediction submodel to the participant device.

Correspondingly, after receiving the encrypted second prediction submodel, the participant device decrypts it with its own decryption key to obtain the second prediction submodel, and deploys the second prediction submodel in the participant device.

The encryption key and the decryption key of the participant device may be a public-private key pair, the encryption key is a public key, the decryption key is a private key, and the public-private key pair is calculated based on an encryption algorithm.

In this embodiment, each participant device sends its own encryption key to the target device, so that the target device encrypts and sends data sent to the participant device, thereby avoiding leakage of the data.

It should be noted that, in federated learning theory, the target device and each participant device jointly train a model on a federated learning platform, and the model is then split and deployed to the target device and each participant device respectively. The approach of this embodiment is a special scenario that arises from practical needs encountered by enterprises such as banks: an existing prediction model on the target device is split according to the different characteristic variables of the target device and each participant device.

Referring to fig. 4, fig. 4 shows another implementation manner of the deployment process of each prediction submodel, which specifically includes:

S401, receiving the encrypted sample data sent by each participant device, and acquiring the encrypted sample data of the target device.

In this embodiment, a federated learning platform and its supporting software are deployed in advance on the target device and on each participant device. Optionally, the federated learning platform may be FATE (Federated AI Technology Enabler, an open-source federated learning framework).

In this embodiment, a new prediction model may be obtained through retraining. Specifically, the encrypted sample data sent by each participant device is received, where the encrypted sample data is obtained by the participant device encrypting, with an encryption key, the intermediate data it needs to update the federated model. The encrypted sample data sent by each participant device and the encrypted sample data of the target device both include encrypted user information, which is obtained by encrypting user information with a preset encryption algorithm. Optionally, the preset encryption algorithm may be MD5 (Message-Digest Algorithm 5). The user information includes at least a user identifier and may also include other information such as a user name; the user identifier may be the user's identity card number.

In this embodiment, the target device obtains its own encrypted sample data, which is obtained by encrypting the intermediate data required for updating the federated model with an encryption key of the target device.

S402, according to the encrypted user information included in each encrypted sample data, carrying out sample alignment on each encrypted sample data.

In this embodiment, the encrypted sample data is sample-aligned using the encrypted user information that each piece of encrypted sample data includes; that is, the data belonging to the same user information is combined across the encrypted sample data to obtain a plurality of combinations. Specifically, the MD5-encrypted user information from the different devices is compared and matched, so that the encrypted sample data of the target device and of each participant device is aligned by user, which ensures that model training is performed on data from the same batch of users.
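The matching of MD5-processed user information can be sketched with Python's standard `hashlib` module. Each party hashes its own user identifiers before sharing, and alignment keeps only the hashes present in every party's data. The helper names, the toy identifiers and the choice to keep rows in party order are assumptions for illustration; the embodiment's full encrypted sample format is not reproduced.

```python
import hashlib
from typing import Dict, List

Row = Dict[str, float]


def md5_id(user_id: str) -> str:
    """Hex MD5 digest of a user identifier."""
    return hashlib.md5(user_id.encode("utf-8")).hexdigest()


def hash_samples(samples: Dict[str, Row]) -> Dict[str, Row]:
    """Replace each plain user identifier with its MD5 digest."""
    return {md5_id(user_id): row for user_id, row in samples.items()}


def align(parties: List[Dict[str, Row]]) -> Dict[str, List[Row]]:
    """Keep only users present in every party, matched by hashed identifier."""
    if not parties:
        return {}
    common = set(parties[0])
    for party in parties[1:]:
        common &= set(party)
    # For each shared user, collect that user's row from every party in order.
    return {digest: [party[digest] for party in parties] for digest in sorted(common)}


target_samples = hash_samples({
    "id_1": {"age": 30.0}, "id_2": {"age": 41.0}, "id_3": {"age": 52.0},
})
participant_samples = hash_samples({
    "id_2": {"late_payments": 0.0}, "id_3": {"late_payments": 2.0},
    "id_4": {"late_payments": 1.0},
})

aligned = align([target_samples, participant_samples])
```

Here only the two users known to both parties survive alignment, and each aligned entry holds the target device's row followed by the participant's row.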

S403, performing characteristic binning on each encrypted sample data after sample alignment to obtain a plurality of characteristic variable groups.

In this embodiment, the feature classification may be performed on the encrypted sample data after the sample alignment according to a data type, specifically, each feature variable in the encrypted sample data after the sample alignment is subjected to the feature classification according to a data type of a feature variable included in the encrypted sample data, where the data type of the feature variable may be a continuous type or a discrete type, for example, the continuous type feature variable may be age and income, and the discrete type feature variable may be marital status, and the like.

In this embodiment, before classifying the encrypted sample data, the method further includes: and filling missing values of all characteristic variables in the encrypted sample data after sample alignment.

In this embodiment, before feature binning is performed on the sample-aligned encrypted sample data, feature items with predictive power, such as customer age and historical repayment changes, may be selected from the feature list in the encrypted sample data. Optionally, the characteristic variables may also be analyzed, for example to determine which characteristic variables discriminate relatively well between good and bad customers.

In this embodiment, feature binning is performed on each type of feature variable based on the result of feature classification, so as to obtain a plurality of feature variable groups.
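The missing-value filling and binning steps can be sketched as follows. The text does not say how values are filled or which binning scheme is used, so this sketch assumes median filling and equal-width bins for continuous variables and one bin per category for discrete variables, and it operates on plain values rather than on the encrypted sample data. All function names and data are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def fill_missing(values: Sequence[Optional[float]]) -> List[float]:
    """Assumed rule: replace missing entries with the median of the rest."""
    present = [value for value in values if value is not None]
    median = statistics.median(present)
    return [median if value is None else value for value in values]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each continuous value to one of n_bins equal-width bins."""
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)
    width = (high - low) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((value - low) / width), n_bins - 1) for value in values]


def category_bins(values: Sequence[str]) -> List[int]:
    """Assign each discrete value the index of its category in sorted order."""
    index = {category: i for i, category in enumerate(sorted(set(values)))}
    return [index[value] for value in values]


ages = fill_missing([22, None, 47, 58, 64])
age_groups = equal_width_bins(ages, n_bins=3)
marital_groups = category_bins(["single", "married", "single", "divorced", "married"])
```

Continuous variables such as age and discrete variables such as marital status go through different functions, which mirrors the classification by data type that precedes binning.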

S404, training the pre-constructed federated model according to each characteristic variable group to obtain a prediction model.

In this embodiment, a pre-constructed federated model is trained according to each feature variable group, so as to obtain a prediction model.

In this embodiment, the federated model is trained based on the encrypted data, and in the whole process, under the precondition that the effect of the model is guaranteed to be completely consistent with that of the traditional training method, no underlying data is revealed to other parties, thereby ensuring the data security in the data sharing process.

S405, splitting the prediction model to obtain a first prediction submodel corresponding to the target device and a second prediction submodel corresponding to each participant device.

In this embodiment, the prediction model is split according to the feature variables that the target device and each participant device contributed during federated model training, so as to obtain the first prediction submodel corresponding to the target device and the second prediction submodel corresponding to each participant device.
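Splitting by feature variable can be pictured with a linear model, where every coefficient belongs to exactly one feature and therefore to exactly one device. The sketch below assumes such a linear model (an intercept is omitted for brevity); the coefficient values, device names, and function names are hypothetical and only illustrate the idea.

```python
import math

def split_model(weights, owners):
    """Group trained coefficients by the device that owns each feature variable."""
    submodels = {}
    for feature, weight in weights.items():
        submodels.setdefault(owners[feature], {})[feature] = weight
    return submodels

def partial_score(submodel, features):
    """Score a submodel using only the feature variables that its device holds."""
    return sum(weight * features[name] for name, weight in submodel.items())

# Hypothetical trained coefficients and feature ownership.
weights = {"age": -0.02, "income": -0.0001, "overdue_count": 0.8, "credit_cards": 0.1}
owners = {
    "age": "target",
    "income": "target",
    "overdue_count": "participant_1",
    "credit_cards": "participant_1",
}
submodels = split_model(weights, owners)

user = {"age": 30, "income": 5000, "overdue_count": 2, "credit_cards": 3}
full_score = partial_score(weights, user)
summed = sum(partial_score(m, user) for m in submodels.values())
assert math.isclose(full_score, summed)  # the partial scores add up to the full score
```

Because the linear score is a sum over features, each device can evaluate its own part locally without seeing the other devices' feature values.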

S406, deploying the first prediction submodel to the target device, and sending the second prediction submodel corresponding to each participant to the participant device corresponding to the second prediction submodel, so that each participant device deploys the received second prediction submodel.

In this embodiment, the local data and the model parameters are not directly exchanged between the different participant devices and the target device, but intermediate values required for updating the parameters are exchanged. Meanwhile, in order to avoid recovering data information from the intermediate numerical values, the numerical values are encrypted and protected by adopting an encryption key, so that the privacy and the safety of data and models are ensured.

S202, processing the target device local data corresponding to the user information of the user to be predicted by using the first prediction submodel to obtain a first risk prediction result.

In this embodiment, the target device obtains the target device local data corresponding to the user information of the user to be predicted, inputs the target device local data into the first prediction submodel pre-deployed on the target device, and processes the target device local data through the first prediction submodel to obtain the first risk prediction result. The target device local data refers to the feature variables of the user to be predicted that are held by the target device.

S203, acquiring a second risk prediction result fed back by each participant device.

In this embodiment, after receiving the user information sent by the target device, each participant device obtains the participant local data corresponding to the user information, inputs the participant local data into the second prediction submodel pre-deployed on itself, processes the participant local data through the second prediction submodel to obtain a second risk prediction result, and sends the second risk prediction result to the target device. The participant local data refers to the feature variables of the user to be predicted that are held by the participant, and includes but is not limited to the number of overdue payments and the number of credit cards of the user to be predicted.

In this embodiment, the target device obtains the second risk prediction result fed back by each participant device. The second risk prediction result is obtained by the second prediction submodel processing the participant local data, the participant local data corresponds to the user information, and the second prediction submodel is pre-deployed on the participant device.

Optionally, before sending the second risk prediction result to the target device, each participant device may further encrypt the second risk prediction result with the encryption key of the target device. That is, after receiving the user information sent by the target device, each participant device obtains the participant local data corresponding to the user information, inputs the participant local data into the second prediction submodel pre-deployed on itself, processes the participant local data through the second prediction submodel to obtain the second risk prediction result, encrypts the second risk prediction result with the encryption key of the target device, and sends the encrypted second risk prediction result to the target device. Correspondingly, after receiving the encrypted second risk prediction result fed back by each participant device, the target device decrypts it with its own decryption key to obtain the second risk prediction result of each participant device.

S204, processing the first risk prediction result and each second risk prediction result to obtain a target risk prediction result of the user to be predicted.

In this embodiment, the target risk prediction result of the user to be predicted is obtained by performing a calculation on the first risk prediction result and each second risk prediction result.

Referring to fig. 5, the process of calculating on the first risk prediction result and each second risk prediction result includes:

S501, obtaining the prediction weights corresponding to the target device and each participant device.

In this embodiment, the prediction weights corresponding to the target device and each participant device are preset, and these preset prediction weights are obtained in this step.

S502, calculating on the first risk prediction result and each second risk prediction result according to the prediction weights corresponding to the target device and each participant device to obtain the target risk prediction result of the user to be predicted.

In this embodiment, the first risk prediction result and each second risk prediction result are combined using the prediction weights of the target device and each participant device. Specifically, a first result is obtained as the product of the prediction weight of the target device and the first risk prediction result; for each participant device, a second result is obtained as the product of the prediction weight of that participant device and its second risk prediction result; and the target risk prediction result of the user to be predicted is obtained by summing the first result and all second results.
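The weighted accumulation in S501 and S502 amounts to a weighted sum. The sketch below is one straightforward reading of that step; the function name, the weights, and the prediction values are hypothetical, and the application does not state whether the preset weights must sum to 1.

```python
def aggregate_risk(first_result, second_results, target_weight, participant_weights):
    """Weighted sum of the target device's result and every participant's result."""
    if len(second_results) != len(participant_weights):
        raise ValueError("each participant result needs exactly one prediction weight")
    total = target_weight * first_result              # the "first result"
    for weight, result in zip(participant_weights, second_results):
        total += weight * result                      # one "second result" per participant
    return total

# Hypothetical values: one target device and two participant devices.
target_risk = aggregate_risk(
    first_result=0.30,
    second_results=[0.50, 0.10],
    target_weight=0.5,
    participant_weights=[0.3, 0.2],
)
# 0.5 * 0.30 + 0.3 * 0.50 + 0.2 * 0.10 = 0.32
```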

According to the risk prediction method provided by this embodiment, the target device processes the target device local data by using the first prediction submodel pre-deployed on the target device to obtain a first risk prediction result, and sends the user information of the user to be predicted to each participant device of the target device; each participant device processes its participant local data by using the second prediction submodel pre-deployed on that participant device to obtain a second risk prediction result, and feeds back the second risk prediction result; and the target device performs a calculation on the first risk prediction result and each second risk prediction result to obtain the target risk prediction result of the user to be predicted. Because the participant local data is not transmitted directly to the target device, and only the second risk prediction result obtained after the second prediction submodel processes the participant local data is transmitted, accurate risk prediction is performed on the loan user based on both the target device local data and the local data of each participant, while the privacy and security of the participants' data are ensured.

In this embodiment, optionally, before sending the user information of the user to be predicted to each participant device of the target device, the method further includes: encrypting the user information using the encryption key of the participant device.

In this embodiment, before the user information of the user to be predicted is sent to each participant device of the target device, the user information is encrypted for each participant device with that participant device's encryption key, and the encrypted user information is sent to the corresponding participant device, which ensures the security of the user information during transmission.

Correspondingly, after receiving the encrypted user information, each participant device decrypts the encrypted user information by using its own decryption key, thereby obtaining the user information.

Referring to fig. 6, the above-mentioned risk prediction method is exemplified as follows:

Here, end A is a participant device and end B is the target device.

S601, end A generates a key pair and sends its public key to end B, and end B generates a key pair and sends its public key to end A. The key pairs are generated by an encryption algorithm, which may be a homomorphic encryption algorithm.
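To make "homomorphic encryption" concrete, the sketch below implements a toy version of the Paillier scheme, an additively homomorphic public-key scheme. The application does not say which homomorphic algorithm is used, so Paillier is an assumption chosen for illustration. The primes are deliberately tiny and the code is not secure; all function names are hypothetical, and the modular inverse via `pow(x, -1, n)` needs Python 3.8 or later.

```python
import math
import random

def generate_keypair(p, q):
    """Toy Paillier key pair from two distinct primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid because g = n + 1 is used
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_sq = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:        # r must be invertible modulo n
        r = random.randrange(1, n)
    return pow(n + 1, message, n_sq) * pow(r, n, n_sq) % n_sq

def decrypt(n, private_key, ciphertext):
    """Recover the integer message using the private key."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return (u - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

# End B publishes public_b; anyone can encrypt integers for B and add them blindly.
public_b, private_b = generate_keypair(1009, 1013)
score_1 = encrypt(public_b, 250)      # e.g. 0.250 scaled by 1000
score_2 = encrypt(public_b, 125)
total = decrypt(public_b, private_b, add_encrypted(public_b, score_1, score_2))
```

Values such as scores or gradients are real numbers in practice, so they would have to be encoded as integers (for example by fixed-point scaling) before encryption; production systems rely on vetted libraries and much larger keys.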

S602, end A and end B each deploy a federated learning platform in their own environments, ensure that the platforms deployed at both ends are consistent and usable, and each deploy the supporting federated learning software on their platforms. Optionally, the federated learning platform may be the open-source federated learning framework FATE.

S603, end A encrypts the intermediate data required for updating the federated model with end A's public key to obtain sample data and transmits the sample data to end B, and end B encrypts the intermediate data required for updating the federated model with end B's public key to obtain sample data. End B aligns the sample data based on the user identification ciphertext included in the sample data, performs federated feature binning and feature selection on the aligned sample data, and selects a model algorithm such as logistic regression to train the federated model. The trained federated model is then split along the feature dimension so that each side holds a half model: the prediction Model-A for end A and the prediction Model-B for end B.

During model training, what is transmitted is neither the data itself nor an encrypted form of the data, but an encrypted form of the intermediate results generated while computing on the data, and the models of both parties are updated under encryption. Throughout the process, on the premise that the model's effect is guaranteed to be fully consistent with that of a traditional training method, no underlying data is disclosed to the other party, which ensures data security during data sharing.
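The sample alignment in S603 matches records from both ends by a user identification ciphertext rather than by the plaintext user ID. The sketch below stands in for that ciphertext with a keyed hash (HMAC-SHA256) under a key both ends are assumed to share; this is an illustrative simplification, since the application does not describe how the ciphertext is produced, and federated learning frameworks typically use a dedicated private set intersection protocol instead. All names and sample records are hypothetical.

```python
import hashlib
import hmac

def blind_id(user_id, shared_key):
    """Keyed hash standing in for the user identification ciphertext."""
    return hmac.new(shared_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def align_samples(samples_a, samples_b):
    """Keep only the records whose blinded ID exists on both sides, in a common order."""
    common = sorted(samples_a.keys() & samples_b.keys())
    return [(samples_a[key], samples_b[key]) for key in common]

shared_key = b"demo-key-agreed-out-of-band"  # hypothetical shared secret

# End A and end B each index their own records by blinded user ID.
raw_a = {"u1": {"overdue_count": 2}, "u2": {"overdue_count": 0}, "u4": {"overdue_count": 5}}
raw_b = {"u1": {"age": 30}, "u2": {"age": 41}, "u3": {"age": 27}}
samples_a = {blind_id(uid, shared_key): row for uid, row in raw_a.items()}
samples_b = {blind_id(uid, shared_key): row for uid, row in raw_b.items()}

aligned = align_samples(samples_a, samples_b)  # only u1 and u2 are on both sides
```

Sorting the common blinded IDs gives both ends the same row order, which is what lets later training steps pair up the feature columns held by different parties.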

S604, sending the prediction Model-A to end A for deployment. Specifically, the prediction Model-A is first encrypted with end A's public key and sent to end A over a dedicated network line; end A decrypts it with its private key to obtain the prediction Model-A and deploys it. The prediction Model-B is deployed at end B.

S605, verifying the validity of the prediction Model-A and the prediction Model-B using samples drawn from the full data set.

S606, when real-time risk prediction for a service is actually performed in the production environment, end B encrypts the prediction sample ID with end A's public key and sends the encrypted prediction sample ID to end A. The prediction sample ID is the user identification of the user to be predicted.

S607, after receiving the encrypted prediction sample ID, end A decrypts it with its private key to obtain the prediction sample ID, obtains the local data corresponding to the prediction sample ID, processes the local data with the pre-deployed prediction Model-A to obtain the local predicted value ResultA, encrypts ResultA with end B's public key, and sends the encrypted ResultA to end B. End B obtains its own local data corresponding to the prediction sample ID and processes it with the pre-deployed prediction Model-B to obtain the local predicted value ResultB.

S608, end B calculates the final Result from ResultA and the local predicted value ResultB.

Throughout the process, the private data of ends A and B is never directly exposed to the other party, so the privacy of ends A and B is not attacked by the other party.

S609, if multiple risk predictions are required, steps S606 to S608 are repeated.
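The prediction round in S606 to S609 can be sketched as two cooperating objects. This is a simplified illustration under several assumptions: both half models are linear, the final Result is a weighted sum of ResultA and ResultB (S608 does not fix the combining rule), and the encryption of the sample ID and of ResultA is omitted so that the message flow stays visible. Class names, coefficients, weights, and data are hypothetical.

```python
class EndA:
    """Participant device holding Model-A and its own local feature values."""

    def __init__(self, model, local_data):
        self.model = model            # feature name -> coefficient
        self.local_data = local_data  # prediction sample ID -> feature values

    def predict(self, sample_id):
        row = self.local_data[sample_id]
        return sum(w * row[name] for name, w in self.model.items())  # ResultA


class EndB:
    """Target device holding Model-B; it requests ResultA and combines the results."""

    def __init__(self, model, local_data, end_a, weight_a, weight_b):
        self.model = model
        self.local_data = local_data
        self.end_a = end_a
        self.weight_a = weight_a
        self.weight_b = weight_b

    def predict(self, sample_id):
        result_a = self.end_a.predict(sample_id)                     # S606 and S607
        row = self.local_data[sample_id]
        result_b = sum(w * row[name] for name, w in self.model.items())
        return self.weight_a * result_a + self.weight_b * result_b   # S608


end_a = EndA({"overdue_count": 0.1}, {"u1": {"overdue_count": 2}, "u2": {"overdue_count": 0}})
end_b = EndB({"age": 0.01}, {"u1": {"age": 30}, "u2": {"age": 50}}, end_a, 0.4, 0.6)

# S609: repeat the round for every prediction sample ID.
results = {sample_id: end_b.predict(sample_id) for sample_id in ["u1", "u2"]}
```

Only the sample ID travels from end B to end A and only ResultA travels back, so neither side sees the other's feature values.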

The data owner shares data on the federated learning platform through the internet loan joint prediction mode; the data acquirer obtains data through the joint prediction mode, performs model calculation, and finally outputs the desired model result for risk control decisions. Encryption is used during data sharing and model prediction to protect privacy and guarantee the security of data sharing. Meanwhile, the encryption has almost no influence on calculation precision, so essentially lossless accuracy can be achieved. Through this standardized process, secure data sharing among the cooperating parties is achieved and regulatory requirements are met in the internet joint loan service scenario, while data silos and information asymmetry are eliminated.

It should be noted that while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous.

It should be understood that the various steps recited in the method embodiments disclosed herein may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the disclosure is not limited in this respect.

Corresponding to the method described in fig. 1, an embodiment of the present application further provides a risk prediction apparatus, which is used for specifically implementing the method in fig. 2, where the risk prediction apparatus is applied to a target device, the target device is pre-deployed with a first prediction sub-model, and a schematic structural diagram of the risk prediction apparatus is shown in fig. 7, and specifically includes:

a sending unit 701, configured to send user information of a user to be predicted to each participant device of the target device;

a first obtaining unit 702, configured to obtain a second risk prediction result fed back by each participant device, where the second risk prediction result is obtained by a second prediction submodel processing participant local data, the participant local data corresponds to the user information, and the second prediction submodel is pre-deployed on the participant device;

a second obtaining unit 703, configured to obtain a first risk prediction result, where the first risk prediction result is obtained by the first prediction submodel processing target device local data, and the target device local data corresponds to the user information;

a processing unit 704, configured to process the first risk prediction result and each second risk prediction result to obtain a target risk prediction result of the user to be predicted.

According to the risk prediction device provided by the embodiment of the application, the local data of the participants are not directly transmitted to the target equipment, but the second risk prediction result obtained after the second prediction sub-model processes the local data of the participants is transmitted to the target equipment, so that accurate risk prediction is performed on loan users on the basis of the local data of the target equipment and the local data of each participant on the premise of ensuring the privacy and the safety of the data of the participants.

In an embodiment of the present application, based on the foregoing scheme, the apparatus may further include:

an encryption unit configured to encrypt the user information using an encryption key of each participant device.

In an embodiment of the present application, based on the foregoing solution, the processing unit 704 is specifically configured to:

An embodiment of the present application further provides a storage medium storing an instruction set, where the risk prediction method disclosed in any of the above embodiments is performed when the instruction set is executed.

An embodiment of the present application further provides an electronic device, a schematic structural diagram of which is shown in fig. 8. The electronic device specifically includes a memory 801 for storing at least one instruction set, and a processor 802 for executing the instruction set stored in the memory, where executing the instruction set implements the risk prediction method disclosed in any of the above embodiments.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

While several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

The foregoing description is only exemplary of the preferred embodiments disclosed herein and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims

1. A risk prediction method is applied to a target device, wherein the target device is pre-deployed with a first prediction submodel, and the method comprises the following steps:

2. The method of claim 1, wherein the deployment of each predictor model comprises:

3. The method of claim 1, wherein the deployment of each predictor model comprises:

4. The method of claim 1, wherein before sending the user information of the user to be predicted to each participant device of the target device, further comprising:

5. The method according to claim 1 or 4, wherein the processing to obtain the target risk prediction result of the user to be predicted according to the first risk prediction result and each second risk prediction result comprises:

6. A risk prediction apparatus applied to a target device, the target device being pre-deployed with a first predictor model, the apparatus comprising:

7. The apparatus of claim 6, further comprising:

8. The apparatus of claim 6, further comprising:

9. A storage medium storing a set of instructions, wherein the set of instructions, when executed by a processor, implement the risk prediction method of any one of claims 1 to 5.

10. An electronic device, comprising:

a memory for storing at least one set of instructions;

a processor for executing a set of instructions stored in said memory, said set of instructions being executable to implement a risk prediction method as claimed in any one of claims 1 to 5.