CN113554476B - Training method and system of credit prediction model, electronic equipment and storage medium - Google Patents

Training method and system of credit prediction model, electronic equipment and storage medium

Info

Publication number
CN113554476B
CN113554476B (application CN202010325940.3A)
Authority
CN
China
Prior art keywords
sample
prediction
user
client
prediction information
Prior art date
Legal status
Active
Application number
CN202010325940.3A
Other languages
Chinese (zh)
Other versions
CN113554476A (en)
Inventor
顾松庠
孙孟哲
王佩琪
薄列峰
Current Assignee
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202010325940.3A priority Critical patent/CN113554476B/en
Publication of CN113554476A publication Critical patent/CN113554476A/en
Application granted granted Critical
Publication of CN113554476B publication Critical patent/CN113554476B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0609Buyer or seller confidence or verification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Technology Law (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present disclosure provides a training method, system, client, server, electronic device and storage medium for a credit prediction model, including: the client determines first sample data and second sample data in the account history data of each user corresponding to the sample user IDs, determines first prediction information corresponding to the first sample data and second prediction information corresponding to the second sample data, sends the first prediction information and the second prediction information to the server, receives the prediction result sent by the server, and generates a credit prediction model according to the prediction result, the first sample data and the second sample data. In the embodiments of the present application, model training uses all the sample user IDs participating in training and all the account history data of each sample user ID, which ensures the completeness of the data participating in model training and thereby achieves the technical effect of improving the accuracy of the credit prediction model.

Description

Training method and system of credit prediction model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of the internet, in particular to the field of model training, and specifically to a training method, system, client, server, electronic device and storage medium for a credit prediction model.
Background
With the development of internet technology and the increasingly close ties among enterprises, the resources that enterprises can share are gradually increasing. In order to share each enterprise's resources while guaranteeing each enterprise's data security, the enterprises jointly construct a common model.
In the prior art, this can be implemented based on federated learning, such as horizontal federated learning and vertical federated learning. In horizontal federated learning, the acquired data set is split horizontally (i.e., along the user dimension) during model training, and the portions of each enterprise's data that share the same features are extracted for training. In vertical federated learning, the data set is split vertically (i.e., along the feature dimension), and the data of the users common to all enterprises is extracted for training.
However, in implementing the present disclosure, the inventors found at least the following problem: because only part of the data participates in model training, the accuracy of the trained model tends to be low.
Disclosure of Invention
The present disclosure provides a training method, system, client, server, electronic device and storage medium for a credit prediction model, to solve the above problems in the prior art.
In one aspect, an embodiment of the present disclosure provides a method for training a credit prediction model, the method including:
the client receives training sample information sent by a server, where the training sample information includes: the sample user identification (ID) of each client participating in model training;
the client determines first sample data and second sample data in the account history data of each user corresponding to the sample user IDs, where the second sample data has the sample characteristic to be predicted, and the first sample data has characteristics other than the sample characteristic to be predicted;
The client determines first prediction information corresponding to the first sample data, determines second prediction information corresponding to the second sample data, and sends the first prediction information and the second prediction information to the server;
The client receives the prediction result sent by the server;
and the client generates a credit prediction model according to the prediction result, the first sample data and the second sample data.
In the embodiment of the present application, the client receives, from the server, the sample user IDs of all clients participating in model training; that is, the client receives every sample user ID participating in model training, so the data for model training is complete along the "user" dimension. The client also generates first prediction information corresponding to the first sample data and second prediction information corresponding to the second sample data, so the data for model training preserves the "feature" dimension. The data used for model training is therefore more complete, and the technical effect of improving the accuracy of the credit prediction model can be achieved.
In some embodiments, the training sample information further includes: a homomorphic encryption public key; the method further includes:
the client encrypts the first prediction information and the second prediction information respectively according to the public key;
and the client sending the first prediction information and the second prediction information to the server includes: the client sends the encrypted first prediction information and the encrypted second prediction information to the server.
In the embodiment of the application, the security and the reliability in the data transmission process can be improved by an encryption mode.
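The passage above does not name a particular cipher; additively homomorphic schemes such as Paillier are the usual choice for this purpose in federated learning, since they let the server combine encrypted prediction information without decrypting it. A deliberately toy-sized, insecure sketch of that property (tiny fixed primes; real keys use primes of roughly 1024 bits or more):

```python
# Toy Paillier cryptosystem, illustrative only — NOT secure.
import math
import random

p, q = 17, 19                 # toy primes for demonstration
n = p * q                     # public key
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)  # private key component
mu = pow(lam, -1, n)          # with g = n + 1, L(g^lam mod n^2) = lam mod n

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n   # the Paillier L function
    return L * mu % n

a, b = 12, 30
# Additive homomorphism: multiplying ciphertexts adds the plaintexts,
# so a server can aggregate encrypted values it cannot read.
print(decrypt(encrypt(a) * encrypt(b) % n2))  # 42
```

This is why the server in the scheme can produce a combined prediction result from encrypted first and second prediction information without ever seeing the underlying values.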
In some embodiments, the determining, by the client, first prediction information corresponding to the first sample data includes:
The client acquires a first characteristic value corresponding to a first user who does not have the sample characteristic to be predicted, and a first common characteristic value of the first user and other users; and acquires a first characteristic value weight corresponding to the first characteristic value and a first common characteristic weight corresponding to the first common characteristic value;
And the client generates the first prediction information according to the public key, the first characteristic value, the first common characteristic value, the first characteristic value weight and the first common characteristic weight.
In some embodiments, the first prediction information includes a predicted value and a loss value of the first user after homomorphic encryption.
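As a hedged sketch of how such a predicted value might be formed — assuming a linear model, which this passage does not state explicitly — the first user's local score could be the weighted sum of its own characteristic values and the common characteristic values, with a loss contribution derived from it before encryption. All names and numbers below are illustrative:

```python
# Hypothetical computation of "first prediction information" for a first
# user under an assumed linear model; the values would then be
# homomorphically encrypted with the public key before being sent.
def first_prediction(char_vals, char_weights, common_vals, common_weights):
    score = sum(v * w for v, w in zip(char_vals, char_weights))
    score += sum(v * w for v, w in zip(common_vals, common_weights))
    return score

u_a = first_prediction([2.0], [0.5], [4.0], [0.25])  # 1.0 + 1.0
loss = u_a ** 2   # partial loss contribution before labels join in
print(u_a, loss)  # 2.0 4.0
```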
In some embodiments, the determining, by the client, second prediction information corresponding to the second sample data includes:
the client receives a reference predicted value sent by the server;
The client acquires a second characteristic value corresponding to a second user comprising the sample characteristics to be predicted, a second common characteristic value of the second user and other users, and acquires a second characteristic value weight corresponding to the second characteristic value, a second common characteristic weight corresponding to the second common characteristic value, and acquires characteristic data corresponding to the sample characteristics to be predicted;
The client generates the second prediction information according to the reference prediction value, the public key, the second characteristic value, the second common characteristic value, the second characteristic value weight, the second common characteristic weight and the characteristic data.
In some embodiments, the second prediction information includes a predicted value and a loss value of the homomorphic encrypted second user, and a loss value common to the homomorphic encrypted first user and the second user.
In some embodiments, the client receives an iterative training indication sent by the server;
and the client determines, according to the first prediction information and the second prediction information, updated values of the first characteristic weight, the first common characteristic weight, the second characteristic weight and the second common characteristic weight for the iteration.
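One plausible instantiation of such a weight update — the passage does not fix the optimizer, so plain gradient descent with an illustrative learning rate is assumed here:

```python
# Hedged sketch: updating the four weight groups on an iteration signal
# via gradient descent. Learning rate and gradient values are invented.
def update_weights(weights, gradients, lr=0.5):
    return [w - lr * g for w, g in zip(weights, gradients)]

first_char_w   = update_weights([0.5, -0.25], [0.5, -0.5])
first_common_w = update_weights([0.125], [0.25])
print(first_char_w, first_common_w)  # [0.25, 0.0] [0.0]
```

The second characteristic weights and second common characteristic weights would be updated the same way, each client refreshing only the weights it holds locally.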
On the other hand, the embodiment of the disclosure also provides a training method of the credit prediction model, which comprises the following steps:
the server sends training sample information to each client participating in model training, where the training sample information includes: the sample user identification (ID) of each client participating in model training;
The server receives first prediction information and second prediction information sent by each client side participating in model training;
The server generates a prediction result based on the first prediction information and the second prediction information, and sends the prediction result to each client side participating in model training, wherein the prediction result is used for representing associated information of the first prediction information and the second prediction information.
In some embodiments, the training sample information further includes: a homomorphic encryption public key; and the first prediction information and the second prediction information are both information encrypted with the public key.
In some embodiments, the method further comprises:
The server receives the reference predicted value sent by each client side participating in model training and sends the reference predicted value to each client side participating in model training, wherein the reference predicted value is used for representing prediction information with other characteristics except for the characteristics of samples to be predicted.
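A hedged server-side sketch of the flow above — the message shapes and the aggregation rule are assumptions, since the passage only states that the prediction result characterizes the association of the two kinds of prediction information:

```python
# Hypothetical server-side aggregation. With additively homomorphic
# encryption, the server could sum the (encrypted) values it receives
# without decrypting them; plain integers stand in for ciphertexts here.
def aggregate(first_infos, second_infos):
    return {"combined_prediction": sum(first_infos) + sum(second_infos)}

# e.g. first prediction info from two clients, second from one:
result = aggregate(first_infos=[1, 3], second_infos=[2])
print(result)  # {'combined_prediction': 6}
```

The server would then broadcast this prediction result back to every participating client for the model-generation step.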
In another aspect, an embodiment of the present disclosure further provides a client, including:
The first communication module is used for receiving training sample information sent by the server, and the training sample information comprises: sample user Identification (ID) of each client participating in model training;
The first processing module is used for determining first sample data and second sample data in the account history data of each user corresponding to the sample user IDs, where the second sample data has the sample characteristic to be predicted, and the first sample data has characteristics other than the sample characteristic to be predicted;
The first processing module is further configured to determine first prediction information corresponding to the first sample data, and determine second prediction information corresponding to the second sample data;
The first communication module is further configured to send the first prediction information and the second prediction information to the server;
the first communication module is also used for receiving the prediction result sent by the server;
and the generating module is used for generating a credit prediction model according to the prediction result, the first sample data and the second sample data.
In some embodiments, the training sample information further includes: a homomorphic encryption public key; the client further includes:
the encryption module, configured to encrypt the first prediction information and the second prediction information respectively according to the public key;
the first communication module, specifically configured to send the encrypted first prediction information and the encrypted second prediction information to the server.
In some embodiments, the first processing module is specifically configured to acquire a first characteristic value corresponding to a first user who does not have the sample characteristic to be predicted, and a first common characteristic value of the first user and other users; acquire a first characteristic value weight corresponding to the first characteristic value and a first common characteristic weight corresponding to the first common characteristic value; and generate the first prediction information according to the public key, the first characteristic value, the first common characteristic value, the first characteristic value weight and the first common characteristic weight.
In some embodiments, the first prediction information includes a predicted value and a loss value of the first user after homomorphic encryption.
In some embodiments, the first processing module is further specifically configured to receive a reference prediction value sent by the server; acquiring a second characteristic value corresponding to a second user comprising the sample characteristics to be predicted, wherein the second user and other users have second common characteristic values, acquiring second characteristic value weights corresponding to the second characteristic values, acquiring second common characteristic weights corresponding to the second common characteristic values, and acquiring characteristic data corresponding to the sample characteristics to be predicted; and generating the second prediction information according to the reference predicted value, the public key, the second characteristic value, the second common characteristic value, the second characteristic value weight, the second common characteristic weight and the characteristic data.
In some embodiments, the second prediction information includes a predicted value and a loss value of the homomorphic encrypted second user, and a loss value common to the homomorphic encrypted first user and the second user.
In some embodiments, the client further comprises:
The first communication module is further used for receiving an iterative training instruction sent by the server;
And the updating module is configured to determine, according to the first prediction information and the second prediction information, updated values of the first characteristic weight, the first common characteristic weight, the second characteristic weight and the second common characteristic weight during iteration.
In another aspect, embodiments of the present disclosure further provide a server, including:
The second communication module is configured to send training sample information to each client that participates in model training, where the training sample information includes: sample user Identification (ID) of each client participating in model training;
The second communication module is further used for receiving first prediction information and second prediction information sent by each client side participating in model training;
The second processing module is used for generating a prediction result based on the first prediction information and the second prediction information;
The second communication module is further configured to send the prediction result to each client that participates in model training, where the prediction result is used to characterize association information of the first prediction information and the second prediction information.
In some embodiments, the training sample information further includes: a homomorphic encryption public key; and the first prediction information and the second prediction information are both information encrypted with the public key.
In some embodiments, the second communication module is further configured to receive a reference prediction value sent by each client that participates in model training, and send the reference prediction value to each client that participates in model training, where the reference prediction value is used to characterize prediction information that has other characteristics besides the sample characteristic to be predicted.
In another aspect, an embodiment of the present disclosure further provides a training system of a credit prediction model, the system including: the client as described in the above embodiments, and the server as described in the above embodiments.
In another aspect, an embodiment of the present disclosure further provides an electronic device, including: a memory, a processor;
The memory is used for storing the processor executable instructions;
Wherein the processor, when executing the instructions in the memory, is configured to implement the method as described in any of the embodiments above.
In another aspect, the disclosed embodiments also provide a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, are configured to implement the method of any of the above embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic diagram of an application scenario of a training method of a credit prediction model according to an embodiment of the present application;
FIG. 2 is a flowchart of a training method of a credit prediction model according to an embodiment of the present application;
FIG. 3 is a flowchart of a training method of a credit prediction model according to another embodiment of the present application;
FIG. 4 is an interaction schematic diagram of a training method of the credit prediction model of the present application;
FIG. 5 is a flowchart of a training method of a credit prediction model according to another embodiment of the present application;
FIG. 6 is a schematic diagram of a client according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a client according to another embodiment of the present application;
FIG. 8 is a schematic diagram of a server according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario of a training method of a credit prediction model according to an embodiment of the present application.
In the application scenario shown in fig. 1, a client of a plurality of enterprises involved in model training is included, and each client is connected to a server. A worker in an enterprise (hereinafter referred to as an operator) may operate clients of the enterprise to create or participate in model training.
For example, if the multiple enterprises are banks, the client of each bank may be connected to the server, and model training is performed jointly by the multiple clients and the server.
Wherein, any client may be a creator of model training (hereinafter referred to as a first client), the related operations of the first client will be briefly described below:
An operator of the first client (hereinafter referred to as a first operator) may input model training information, such as the name of the model and the password for joining the model training, through a display interface of the first client. The name of the model is used to distinguish the model created this time from other models; the password for joining the model training represents authorization and permission, to ensure the security and reliability of model training.
The first operator may input network information, such as an Internet Protocol (IP) address and a database name, and input the account and password for connecting to the database, through the display interface of the first client.
The first operator can input information of each user of the enterprise through a display interface of the first client.
The information of the user includes the user's user name (including real name and nickname), the user's identification (ID), features, feature data, and feature data types. The features characterize the enterprise's business that the user is involved in; the feature data characterizes the data generated by that business; the feature data types include single-precision floating point (float) and double-precision (double).
Based on the above example, if the enterprise is a bank, the feature may be deposit and withdrawal, etc., and the feature data may be time, amount, etc. of deposit and withdrawal.
The first operator may select the training feature through the display interface of the first client, where the training feature characterizes the objective of the model to be trained. For example, the training feature may be credit.
The first operator may select or set the parameters of model training, such as the number of training rounds, the error convergence threshold, and the precision parameters of model training, through the display interface of the first client.
Having briefly introduced the creator, the other clients participating in the model training (hereinafter referred to as second clients) are now briefly introduced as follows:
an operator of the second client (hereinafter referred to as a second operator) inputs the name of the model and the password of the joining model training through a display interface of the second client.
Similarly, the second operator inputs the network information and the user information through the display interface of the second client, which can be specifically referred to the above description and will not be repeated here.
In the related art, the training method of the credit prediction model may be implemented based on federated learning. In horizontal federated learning, used when the data sets of the enterprises overlap heavily in features but little in users, the data set is split horizontally (i.e., along the user dimension), and the portions of each enterprise's data with the same features are extracted for training. In vertical federated learning, used when the data sets overlap heavily in users but little in features, the data set is split vertically (i.e., along the feature dimension), and the data of the users common to all enterprises is extracted for training.
However, in practical application scenarios, the users and features of different enterprises are never in exact correspondence, so adopting the schemes of the related art causes part of the data available for model training to be discarded, which reduces the accuracy of the trained model.
The inventors of the present application, after creative work, arrived at the inventive concept of the embodiments of the present application: fully utilize the data sets of users and features across enterprises, and ensure the integrity of the data during model training.
The following describes the technical scheme of the present disclosure and how the technical scheme of the present disclosure solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
According to one aspect of the embodiment of the application, the embodiment of the application provides a training method of a credit prediction model.
Referring to fig. 2, fig. 2 is a flowchart illustrating a training method of a credit prediction model according to an embodiment of the application.
As shown in fig. 2, the method includes:
S101: the client receives training sample information sent by the server, wherein the training sample information comprises: sample user identification, ID, of each client participating in model training.
The client may be a wireless terminal or a wired terminal. A wireless terminal may be a device that provides voice and/or other data connectivity to a user, a handheld device with wireless connectivity, or another processing device connected to a wireless modem. A wireless terminal may communicate with one or more core network devices via a radio access network (Radio Access Network, RAN), and may be a mobile terminal such as a mobile phone (or "cellular" phone) or a computer with a mobile terminal, for example, a portable, pocket, hand-held, computer-built-in or vehicle-mounted mobile device that exchanges voice and/or data with the radio access network. A wireless terminal may also be a Personal Communication Service (PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA), or the like. A wireless terminal may also be referred to as a system, subscriber unit, subscriber station, mobile station, remote terminal, access terminal, user terminal, user agent, or user equipment, without limitation. Optionally, the terminal device may also be a device such as a smart watch or a tablet computer.
In this step, the client receives the sample user IDs of each client that participates in model training sent by the server, i.e., the client receives all the sample user IDs that participated in model training. That is, the data for model training is complete from the dimension of "user", and there is no case where the data is discarded as in the related art.
S102: the client determines first sample data and second sample data in each user account historical data corresponding to the sample user ID, the second sample data has sample characteristics to be predicted, and the first sample data has other characteristics except the sample characteristics to be predicted.
Based on the above examples, for any user, the account history data of that user may or may not contain sample data with the sample characteristic to be predicted. In this step, for each user, the client determines whether the sample characteristic to be predicted is present: if so, the data having the sample characteristic to be predicted is determined as the second sample data; if not, the data lacking the sample characteristic to be predicted is determined as the first sample data.
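The partition described above can be sketched as follows; the record layout and the feature name `credit` are hypothetical stand-ins for the sample characteristic to be predicted:

```python
# Step S102 sketch: split each user's account history records by whether
# they contain the characteristic to be predicted. Record layout assumed.
TARGET = "credit"   # assumed name of the sample characteristic to predict

def split_samples(history_records):
    first, second = [], []
    for rec in history_records:
        (second if TARGET in rec else first).append(rec)
    return first, second

records = [
    {"user": "u1", "deposit": 5000},                # lacks the target
    {"user": "u2", "deposit": 1200, "credit": 0.8}, # has the target
]
first, second = split_samples(records)
print(len(first), len(second))  # 1 1
```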
S103: the client determines first prediction information corresponding to the first sample data, determines second prediction information corresponding to the second sample data, and sends the first prediction information and the second prediction information to the server.
As can be seen from the above description, in the embodiment of the present application, the data for model training is complete along the "user" dimension. In this step, the client generates first prediction information corresponding to the first sample data and second prediction information corresponding to the second sample data, i.e., the data for model training also preserves the "feature" dimension. Therefore, in the embodiment of the present application, the data used for model training is more complete, so the accuracy of the subsequently obtained credit prediction model can be improved.
S104: and the client receives the prediction result sent by the server.
S105: and the client generates a credit prediction model according to the prediction result, the first sample data and the second sample data.
Referring to fig. 3, fig. 3 is a flowchart illustrating a training method of a credit prediction model according to another embodiment of the application.
As shown in fig. 3, the method includes:
S201: the client receives training sample information sent by the server, wherein the training sample information includes the sample user IDs of each client participating in model training, and also includes a homomorphic encryption public key.
For the description of the training sample information including the sample user IDs of the clients participating in model training, reference may be made to the above examples, which are not repeated here.
In order to ensure security and reliability during model training, the training sample information also includes a homomorphic encryption public key.
S202: the client determines first sample data and second sample data in each user account historical data corresponding to the sample user ID, the second sample data has sample characteristics to be predicted, and the first sample data has other characteristics except the sample characteristics to be predicted.
The description of S202 may refer to S102, and will not be repeated here.
S203: the client determines encrypted first prediction information corresponding to the first sample data and determines encrypted second prediction information corresponding to the second sample data.
In some embodiments, the client determining the encrypted first prediction information corresponding to the first sample data includes:
S2031: the client acquires a first characteristic value corresponding to a first user that does not include the sample feature to be predicted, and a first common characteristic value of the first user and other users; and acquires a first characteristic value weight corresponding to the first characteristic value, and a first common characteristic weight corresponding to the first common characteristic value.
Illustratively, model training is an iterative process: in the first iteration, the first characteristic value weight and the first common characteristic weight are set based on requirements, experience, experiments, and the like, while in subsequent iterations they may be set based on the result of the previous iteration.
The first characteristic value and the first common characteristic value are pre-stored; a specific method for generating them may refer to the prior art and is not described here.
S2032: and the client generates encrypted first prediction information according to the public key, the first characteristic value, the first common characteristic value, the first characteristic value weight and the first common characteristic weight.
The first prediction information comprises a prediction value and a loss value of the first user after homomorphic encryption, and the homomorphic encryption is realized based on a public key.
In some embodiments, the client determining the encrypted second prediction information corresponding to the second sample data includes:
s2033: the client receives the reference predicted value sent by the server.
Based on the above example, the account history data of each user either includes or does not include the sample feature to be predicted. If the account history data of a client's user does not include the sample feature to be predicted, that client determines the first prediction information based on S2031 and S2032 and sends it to the server; the server collects the prediction information sent by all clients participating in model training (i.e., the reference prediction value, which includes the first prediction information) and sends the reference prediction value to all clients participating in model training. If the account history data of a client's user includes the sample feature to be predicted, the client calculates the second prediction information on the basis of the reference prediction value.
S2034: the client acquires a second characteristic value corresponding to a second user comprising the sample characteristics to be predicted, a second common characteristic value of the second user and other users, a second characteristic value weight corresponding to the second characteristic value, a second common characteristic weight corresponding to the second common characteristic value, and characteristic data corresponding to the sample characteristics to be predicted.
The second feature value, the second common feature value, the second feature value weight, and the second common feature weight may refer to related parameters of the first user, which are not described herein.
Because the account history data of the second user includes the sample feature to be predicted, the feature data corresponding to the sample feature to be predicted is acquired from the account history data of the second user.
S2035: the client generates second prediction information according to the reference prediction value, the public key, the second characteristic value, the second common characteristic value, the second characteristic value weight, the second common characteristic weight and the characteristic data.
In this step, the second prediction information includes a predicted value and a loss value of the second user after homomorphic encryption, and a loss value common to the first user and the second user after homomorphic encryption, where the homomorphic encryption is implemented based on the public key.
S204: the client sends the encrypted first prediction information and the encrypted second prediction information to the server.
S205: and the client receives the prediction result sent by the server.
S206: and the client generates a credit prediction model according to the prediction result, the first sample data and the second sample data.
In order to give the reader a more thorough understanding of the training method of the credit prediction model according to the embodiment of the present application, the method will be described in detail with reference to fig. 4. Fig. 4 is an interaction schematic diagram of a training method of the credit prediction model of the present application.
As shown in fig. 4, the method includes:
s1: the server obtains all sample user IDs fed back by clients participating in model training.
Based on the above examples, both the creator of the model training and the joiner of the model training can send the respective corresponding user IDs to the server. Thus, the server may collect all the sample user IDs of the clients participating in the model training, and send all the collected sample user IDs to the clients participating in the model training, respectively.
Based on the above example, suppose the number of clients participating in model training is 2: a first client that creates the model training and a second client that joins it, where the first client includes m sample users and the second client includes n sample users. In this step, the server then obtains m+n sample user IDs.
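The server-side gathering of S1 can be sketched as follows under the two-client example; the client names and user IDs are hypothetical:

```python
def gather_sample_ids(reported_ids):
    # Server side of S1: concatenate the ID lists reported by each client.
    all_ids = []
    for client_name, ids in reported_ids.items():
        all_ids.extend(ids)
    return all_ids

reported_ids = {
    "first_client": ["u001", "u002", "u003"],   # m = 3 sample users
    "second_client": ["u101", "u102"],          # n = 2 sample users
}
all_sample_ids = gather_sample_ids(reported_ids)
# The server then sends all m + n IDs back to every participating client (S2).
```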
S2: the server sends m+n sample user IDs to the first client and the second client respectively, and sends homomorphic encrypted public keys to the first client and the second client respectively.
S3: for each sample user ID, the client (since the first client and the second client operate on the same principle, "the client" is used here to refer to either one) determines whether the sample feature to be predicted is included in the user account history data corresponding to that sample user ID; if not, S4 is executed; if so, S9 is executed.
In order to distinguish sample data including a sample feature to be predicted from sample data not including a sample feature to be predicted, sample data not including a sample feature to be predicted is referred to as first sample data, and sample data including a sample feature to be predicted is referred to as second sample data.
And in order to distinguish the user corresponding to sample data including the sample feature to be predicted from the user corresponding to sample data not including it, the user corresponding to sample data not including the sample feature to be predicted is referred to as a first user u_A, and the user corresponding to sample data including the sample feature to be predicted is referred to as a second user u_E.
For the convenience of readers to understand, in the embodiment of the present application, it is assumed that, for the sample feature to be predicted, the first user is the user corresponding to the first client, and the second user is the user corresponding to the second client.
S4: the first client obtains a first characteristic value X_A corresponding to the first user u_A, a first common characteristic value X_AB of the first user u_A and other users, a first characteristic value weight W_A corresponding to X_A, and a first common characteristic weight W_AB corresponding to X_AB.
X_A and X_AB are pre-stored; specific generation methods may refer to the prior art and are not described here.
Illustratively, model training is an iterative process, so in the first iteration cycle W_A and W_AB can be set based on experience, demand, experiment, and the like, while in later iteration cycles they may be set based on the previous iteration cycle; for example, in the second iteration cycle, W_A and W_AB may be set based on the iteration result of the first iteration cycle.
S5: the first client generates first prediction information according to formulas 1 and 2. The first prediction information includes a predicted value u_A and a loss value L_A of the first user.
Formula 1 may be: u_A = W_A*X_A + W_AB*X_AB/2.
Formula 2 may be: L_A = sum_users(u_A^2) + lambda*W_A^2/2 + lambda*W_AB^2/4, where lambda is a preset precision parameter that may be set based on experience, demand, experiment, and the like.
It should be noted that, in the embodiment of the present application, only one common characteristic value of the first user is used for exemplary description. If a common characteristic value X_ABC is shared by the first user and two other users, the corresponding averaging parameter 2 is replaced by 3; for example, formula 1 becomes u_A = W_A*X_A + W_ABC*X_ABC/3, and formula 2 is adjusted adaptively in the same way, which is not described here.
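Formulas 1 and 2 can be illustrated in plaintext (before the homomorphic encryption of S6) with hypothetical scalar feature values, weights, and lambda:

```python
# Hypothetical toy values; in the method these come from the first client's data.
lam = 0.01                       # preset precision parameter "lambda"
W_A, W_AB = 0.5, 0.3             # first characteristic / common characteristic weights
X_A  = [1.0, 2.0]                # first characteristic value, one entry per user
X_AB = [0.4, 0.6]                # first common characteristic value per user

# Formula 1: u_A = W_A*X_A + W_AB*X_AB/2, computed per user
u_A = [W_A * xa + W_AB * xab / 2 for xa, xab in zip(X_A, X_AB)]

# Formula 2: L_A = sum_users(u_A^2) + lam*W_A^2/2 + lam*W_AB^2/4
L_A = sum(u * u for u in u_A) + lam * W_A ** 2 / 2 + lam * W_AB ** 2 / 4
```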
S6: the first client homomorphically encrypts u_A and L_A with the public key to generate [[u_A]] and [[L_A]].
S7: the first client transmits [[u_A]] and [[L_A]] to the server.
S8: the server sends [[u_A]] to the second client.
S9: the second client obtains a second characteristic value X_E corresponding to the second user u_E, a second common characteristic value X_EF of the second user u_E and other users, a second characteristic value weight W_E corresponding to X_E, a second common characteristic weight W_EF corresponding to X_EF, and feature data label corresponding to the sample feature to be predicted.
The descriptions of X_E, X_EF, W_E and W_EF are referred to above and are not repeated here.
S10: the second client generates second prediction information according to formulas 3, 4 and 5. The second prediction information includes a predicted value u_E and a loss value L_E of the second user, and a loss value [[L_AB]] common to the first user and the second user after homomorphic encryption.
Formula 3 may be: u_E = W_E*X_E + W_EF*X_EF/2.
Formula 4 may be: L_E = sum_users((u_E - label)^2) + lambda*W_E^2/2 + lambda*W_EF^2/4.
Formula 5 may be: [[L_AB]] = 2*sum_users([[u_A]]*(u_E - label)).
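The second client's side of S10 can be sketched in the clear with hypothetical values; the cross term of formula 5 is shown on plaintexts here, whereas in the protocol it is evaluated on the ciphertexts [[u_A]]:

```python
lam = 0.01
W_E, W_EF = 0.4, 0.2
X_E   = [1.5, 0.8]               # second characteristic value per user
X_EF  = [0.5, 1.0]               # second common characteristic value per user
label = [1.0, 0.0]               # feature data of the sample feature to be predicted

# Formula 3: u_E = W_E*X_E + W_EF*X_EF/2
u_E = [W_E * xe + W_EF * xef / 2 for xe, xef in zip(X_E, X_EF)]
resid = [u - y for u, y in zip(u_E, label)]                  # u_E - label

# Formula 4: L_E = sum_users((u_E - label)^2) + lam*W_E^2/2 + lam*W_EF^2/4
L_E = sum(r * r for r in resid) + lam * W_E ** 2 / 2 + lam * W_EF ** 2 / 4

# Formula 5 in plaintext form: L_AB = 2*sum_users(u_A*(u_E - label));
# u_A reuses the first client's hypothetical predicted values from S5.
u_A = [0.56, 1.09]
L_AB = 2 * sum(ua * r for ua, r in zip(u_A, resid))
```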
S11: the second client homomorphically encrypts u_E - label and L_E with the public key to generate [[u_E - label]] and [[L_E]].
S12: the second client sends [[L_AB]], [[u_E - label]] and [[L_E]] to the server.
S13: the server adds [[u_E - label]] and [[u_A]] to obtain [[d]].
S14: the server transmits [[d]] to the first client and the second client.
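The ciphertext addition of S13 requires an additively homomorphic scheme such as Paillier. Below is a toy Paillier sketch with tiny fixed primes, for illustration only and never for real use; the message values are small integer stand-ins:

```python
import random
from math import gcd

# Toy Paillier: additively homomorphic, so the server can add the ciphertexts
# [[u_E - label]] and [[u_A]] without seeing the plaintexts.
def generate_keypair():
    p, q = 293, 433                                # toy primes -- insecure!
    n = p * q
    g = n + 1                                      # common simplification g = n + 1
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                           # modular inverse (Python 3.8+)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

pub, priv = generate_keypair()
u_a, diff = 17, 25            # stand-ins for u_A and u_E - label
# S13: homomorphic addition is ciphertext multiplication modulo n^2.
d_cipher = encrypt(pub, u_a) * encrypt(pub, diff) % (pub[0] ** 2)
```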
S15: the first client performs a derivative calculation on [[d]] according to formula 6 to generate [[dL]].
Formula 6 may be: [[dL]] = sum_users([[d]]*(X_A + X_AB)) + [[lambda*W_A]] + [[lambda*W_AB]]/2.
S16: the first client adds a homomorphically encrypted random term [[R]] to [[dL]], generating [[dL]]·[[R]].
S17: the first client transmits [[dL]]·[[R]] to the server.
S18: the server decrypts [[dL]]·[[R]] to generate dL+R.
S19: the server sends dl+r to the first client and the second client.
S20: the first client recovers dL from dL+R according to the preset random term R.
S21: the first client calculates the current weight W according to formula 7.
Formula 7 may be: W = sum_users(dL*(X_A + X_AB))/lambda.
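S15 through S20 can be sketched with hypothetical plaintext numbers; in the protocol, formula 6 runs on ciphertexts, and the random mask R hides dL from the server at decryption time:

```python
import random

lam = 0.01
W_A, W_AB = 0.5, 0.3
X_A  = [1.0, 2.0]
X_AB = [0.4, 0.6]
d    = [0.21, 1.51]          # stand-ins for the decrypted u_A + u_E - label

# Formula 6 (plaintext form): dL = sum_users(d*(X_A + X_AB)) + lam*W_A + lam*W_AB/2
dL = sum(di * (xa + xab) for di, xa, xab in zip(d, X_A, X_AB)) \
     + lam * W_A + lam * W_AB / 2

# S16-S20: the client blinds dL with a random term R, the server decrypts
# only dL + R, and only the client (which knows R) can recover dL.
R = random.uniform(-10.0, 10.0)
seen_by_server = dL + R
recovered_dL = seen_by_server - R
```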
When the iteration ends, the current weight of each sample feature constitutes the model obtained through training.
While the server executes S13, it may also execute S22: the server determines whether to perform the next iteration according to [[L_A]], [[L_E]] and [[L_AB]]; if yes, S23 is executed; if not, S25 is executed.
This step may specifically include: the server calculates the sum of [[L_A]], [[L_E]] and [[L_AB]] to obtain [[L]], and compares [[L]] with a preset iteration parameter; if [[L]] is smaller than the iteration parameter, S23 is executed; if [[L]] is greater than or equal to the iteration parameter, S25 is executed.
S23: the server sends an iterative training instruction to the first client and the second client.
S24: the first client updates W_A according to W, and executes S5.
S25: and the server sends an end training instruction to the first client and the second client, and the process is ended.
The second client performs the derivative calculation and the current-weight calculation on the same principle as the first client, and likewise performs the same operations after receiving the iterative training instruction.
In some embodiments, any client participating in model training may initiate a prediction request to the server, the prediction request being used to predict credit. For example, the first client initiates a prediction request carrying a user ID to be predicted; the server sends the user ID to be predicted to each client participating in model training; each such client predicts the credit of that user ID according to the credit prediction model obtained by the training method above, obtaining a corresponding prediction value, and sends its prediction value to the server; and the server generates a final prediction value from the individual prediction values and sends the final prediction value to the first client.
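The prediction flow above can be sketched as follows; the partial scores and the summation rule (mirroring how the server combines per-client prediction values) are assumptions for illustration:

```python
def serve_prediction(user_id, client_predictors):
    # Each participating client scores the user with its locally trained part
    # of the model; the server combines the partial values into the final one.
    partial_values = [predict(user_id) for predict in client_predictors]
    return sum(partial_values)

# Hypothetical local predictors for the two clients in the example.
first_client  = lambda uid: 0.56    # score from the first client's features
second_client = lambda uid: 0.42    # score from the second client's features
final_value = serve_prediction("u001", [first_client, second_client])
```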
According to another aspect of the disclosed embodiments, the disclosed embodiments further provide a training method of the credit prediction model.
Referring to fig. 5, fig. 5 is a flowchart illustrating a training method of a credit prediction model according to another embodiment of the application.
As shown in fig. 5, the method includes:
S301: the server sends training sample information to each client participating in model training, wherein the training sample information includes: the sample user identification, ID, of each client participating in model training.
The server receives the corresponding sample user IDs sent by the clients participating in the model training, and sends all the sample user IDs to the clients participating in the model training.
S302: the server receives the first prediction information and the second prediction information sent by each client participating in model training.
S303: the server generates a prediction result based on the first prediction information and the second prediction information, wherein the prediction result is used for representing associated information of the first prediction information and the second prediction information.
S304: and sending the prediction result to each client side participating in model training.
For example, if the first prediction information includes a first prediction value and the second prediction information includes a second prediction value, the prediction result is the sum of the first prediction value and the second prediction value.
In some embodiments, the training sample information further includes a homomorphic encryption public key, and the first prediction information and the second prediction information are both information encrypted with the public key.
In some embodiments, the server receives the reference prediction values sent by the clients participating in model training and sends the reference prediction values to those clients, where a reference prediction value is used to characterize prediction information having features other than the sample feature to be predicted, such as the first prediction information.
In some embodiments, the server determines whether to continue model training according to the first prediction information and the second prediction information, if so, generates an iterative training instruction, and sends the iterative training instruction to each client participating in model training.
According to another aspect of the embodiment of the present application, there is further provided a client, which may be used to perform the method shown in fig. 2 and 3 and may perform the interaction with a server as shown in fig. 4.
Referring to fig. 6, fig. 6 is a schematic diagram of a client according to an embodiment of the application.
As shown in fig. 6, the client includes:
The first communication module 11 is configured to receive training sample information sent by the server, where the training sample information includes: sample user Identification (ID) of each client participating in model training;
A first processing module 12, configured to determine, in each user account history data corresponding to the sample user ID, first sample data and second sample data, where the second sample data has a sample feature to be predicted, and the first sample data has other features except the sample feature to be predicted;
The first processing module 12 is further configured to determine first prediction information corresponding to the first sample data, and determine second prediction information corresponding to the second sample data;
the first communication module 11 is further configured to send the first prediction information and the second prediction information to the server;
the first communication module 11 is further configured to receive a prediction result sent by the server;
A generating module 13, configured to generate a credit prediction model according to the prediction result, the first sample data and the second sample data.
As can be seen in conjunction with fig. 7, the sample training information further includes: homomorphic encryption public keys; the client further comprises:
an encryption module 14, configured to encrypt the first prediction information and the second prediction information respectively according to the public key;
and the first communication module 11 is specifically configured to send the encrypted first prediction information and the encrypted second prediction information to the server.
In some embodiments, the first processing module is specifically configured to acquire a first characteristic value corresponding to a first user that does not include the sample feature to be predicted, and a first common characteristic value of the first user and other users; acquire a first characteristic value weight corresponding to the first characteristic value, and a first common characteristic weight corresponding to the first common characteristic value; and generate the first prediction information according to the public key, the first characteristic value, the first common characteristic value, the first characteristic value weight and the first common characteristic weight.
In some embodiments, the first prediction information includes a predicted value and a loss value of the first user after homomorphic encryption.
In some embodiments, the first processing module 12 is further specifically configured to receive a reference prediction value sent by the server; acquire a second characteristic value corresponding to a second user that includes the sample feature to be predicted, and a second common characteristic value of the second user and other users; acquire a second characteristic value weight corresponding to the second characteristic value, a second common characteristic weight corresponding to the second common characteristic value, and feature data corresponding to the sample feature to be predicted; and generate the second prediction information according to the reference prediction value, the public key, the second characteristic value, the second common characteristic value, the second characteristic value weight, the second common characteristic weight and the feature data.
In some embodiments, the second prediction information includes a predicted value and a loss value of the homomorphic encrypted second user, and a loss value common to the homomorphic encrypted first user and the second user.
As can be seen in conjunction with fig. 7, in some embodiments, the client further comprises:
the first communication module 11 is further configured to receive an iterative training instruction sent by the server;
And an updating module 15, configured to determine updated values of the first characteristic value weight, the first common characteristic weight, the second characteristic value weight, and the second common characteristic weight for the iteration according to the first prediction information and the second prediction information.
According to another aspect of the embodiment of the present application, there is further provided a server, which may be used to perform the method shown in fig. 5 and may perform the interaction with the client as shown in fig. 4.
Referring to fig. 8, fig. 8 is a schematic diagram of a server according to an embodiment of the application.
As shown in fig. 8, the server includes:
The second communication module 21 is configured to send training sample information to each client that participates in model training, where the training sample information includes: sample user Identification (ID) of each client participating in model training;
the second communication module 21 is further configured to receive first prediction information and second prediction information sent by each client that participates in model training;
a second processing module 22, configured to generate a prediction result based on the first prediction information and the second prediction information;
The second communication module 21 is further configured to send the prediction result to each client that participates in model training, where the prediction result is used to characterize association information of the first prediction information and the second prediction information.
In some embodiments, the sample training information further includes: and the first prediction information and the second prediction information are both information encrypted by the public key.
In some embodiments, the second communication module 21 is further configured to receive a reference prediction value sent by each client that participates in model training, and send the reference prediction value to each client that participates in model training, where the reference prediction value is used to characterize prediction information that has features other than the sample feature to be predicted.
According to another aspect of the embodiment of the present application, the embodiment of the present application further provides a training system for a credit prediction model, where the system includes the client according to the foregoing embodiment and the server according to the foregoing embodiment. That is, the system includes a client as shown in fig. 6 or 7, and also includes a server as shown in fig. 8.
According to another aspect of the embodiments of the present disclosure, there is also provided an electronic device including: a memory, a processor;
A memory for storing processor-executable instructions;
wherein the processor, when executing the instructions in the memory, is configured to implement the method as described in any of the embodiments above.
For example, the processor is configured to implement the methods shown in fig. 2 and 3, or the processor is configured to implement the method shown in fig. 5.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
As shown in fig. 9, the electronic device includes a memory and a processor, and may further include a communication interface and a bus, wherein the processor, the communication interface, and the memory are connected by the bus; the processor is configured to execute executable modules, such as computer programs, stored in the memory.
The memory may include a high-speed random access memory (Random Access Memory, RAM), and may further include a non-volatile memory, such as at least one disk memory. Communication connection between the system network element and at least one other network element is achieved through at least one communication interface, which may be wired or wireless; the internet, a wide area network, a local network, a metropolitan area network, etc. may be used.
The bus may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be divided into address buses, data buses, control buses, etc.
The memory is used for storing a program, and the processor executes the program after receiving an execution instruction, so that the method disclosed in any embodiment of the foregoing disclosure may be applied to the processor or implemented by the processor.
The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The steps of a method disclosed in connection with the embodiments of the present disclosure may be embodied directly in hardware, in a decoding processor, or in a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
According to another aspect of the disclosed embodiments, the disclosed embodiments also provide a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, are configured to implement a method as described in any of the above embodiments.
For example, the computer-executable instructions, when executed by a processor, are for implementing the methods shown in fig. 2 and 3, or the computer-executable instructions, when executed by a processor, are for implementing the methods shown in fig. 5.
The reader will appreciate that in the description of this specification, a description of terms "one embodiment," "some embodiments," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and units described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present disclosure.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure, essentially or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present disclosure. The aforementioned storage medium includes: a USB disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
It should also be understood that, in the embodiments of the present disclosure, the sequence number of each process described above does not mean that the execution sequence of each process should be determined by the function and the internal logic of each process, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
The foregoing is merely a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto, and any equivalent modifications or substitutions will be apparent to those skilled in the art within the scope of the present disclosure, and these modifications or substitutions should be covered in the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (19)

1. A method of training a credit prediction model, the method comprising:
each client receiving training sample information sent by a server, wherein the training sample information comprises: a sample user identification (ID) of each client participating in model training;
each client determining first sample data and second sample data in the historical data of each user account corresponding to the sample user ID, wherein the second sample data has a sample feature to be predicted, and the first sample data does not have the sample feature to be predicted;
each client determining first prediction information corresponding to the first sample data, determining second prediction information corresponding to the second sample data, and sending the first prediction information and the second prediction information to the server; the first prediction information comprises a homomorphically encrypted predicted value and loss value of a first user; the second prediction information comprises a homomorphically encrypted predicted value and loss value of a second user, and a homomorphically encrypted loss value common to the first user and the second user; the first user is a user that does not include the sample feature to be predicted; the second user is a user that includes the sample feature to be predicted;
each client receiving a prediction result sent by the server, wherein the prediction result characterizes the association information between the first prediction information and the second prediction information;
and each client generating a credit prediction model according to the prediction result, the first sample data and the second sample data.
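Claim 1 hinges on splitting each user's account history by whether the sample feature to be predicted is present, then treating the two partitions differently. The Python sketch below illustrates only that splitting step; the function name, the dictionary-based history layout, and the feature names are illustrative assumptions, not the patented implementation.

```python
def split_samples(history, feature_to_predict):
    """Partition per-user history records into first sample data
    (records lacking the feature to be predicted) and second sample
    data (records that contain it)."""
    first, second = {}, {}
    for user_id, record in history.items():
        target = second if feature_to_predict in record else first
        target[user_id] = record
    return first, second

# Example: only user "u2" carries the label-like feature "default"
history = {"u1": {"income": 30}, "u2": {"income": 45, "default": 0}}
first, second = split_samples(history, "default")
```

In the claim's terms, `second` would feed the second prediction information (it has the feature to be predicted) and `first` the first prediction information.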
2. The method according to claim 1, wherein the training sample information further comprises: a homomorphic encryption public key; and the method further comprises:
the client encrypting the first prediction information and the second prediction information respectively according to the public key;
wherein the client sending the first prediction information and the second prediction information to the server comprises: the client sending the encrypted first prediction information and the encrypted second prediction information to the server.
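Claim 2 relies on homomorphic encryption under a server-issued public key, which lets the server combine the clients' predicted values and loss values without ever seeing plaintexts. A toy additively homomorphic Paillier scheme is sketched below purely for illustration; the tiny fixed primes are an assumption for readability and would be insecure in any real deployment.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic).
P, Q = 17, 19
N = P * Q                      # public modulus n
N2 = N * N                     # n^2, the ciphertext modulus
G = N + 1                      # standard generator choice g = n + 1
LAM = math.lcm(P - 1, Q - 1)   # private exponent lambda = lcm(p-1, q-1)

def _L(x):
    # Paillier's L function: L(x) = (x - 1) / n
    return (x - 1) // N

# Private scaling factor mu = L(g^lambda mod n^2)^(-1) mod n
MU = pow(_L(pow(G, LAM, N2)), -1, N)

def encrypt(m):
    # c = g^m * r^n mod n^2 for a random r coprime to n
    r = random.randrange(2, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(2, N)
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    return (_L(pow(c, LAM, N2)) * MU) % N

def add_encrypted(c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % N2
```

The `add_encrypted` property is what would let the server in the claims aggregate encrypted losses from several clients without decrypting them.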
3. The method of claim 2, wherein the client determining the first prediction information corresponding to the first sample data comprises:
the client obtaining a first feature value corresponding to a first user that does not include the sample feature to be predicted, and a first common feature value shared by the first user and other users, and obtaining a first feature value weight corresponding to the first feature value and a first common feature weight corresponding to the first common feature value;
and the client generating the first prediction information according to the public key, the first feature value, the first common feature value, the first feature value weight and the first common feature weight.
4. The method of claim 3, wherein the client determining the second prediction information corresponding to the second sample data comprises:
the client receiving a reference predicted value sent by the server;
the client obtaining a second feature value corresponding to a second user that includes the sample feature to be predicted, a second common feature value shared by the second user and other users, a second feature value weight corresponding to the second feature value, a second common feature weight corresponding to the second common feature value, and feature data corresponding to the sample feature to be predicted;
and the client generating the second prediction information according to the reference predicted value, the public key, the second feature value, the second common feature value, the second feature value weight, the second common feature weight and the feature data.
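Claim 4 combines a server-supplied reference predicted value with the client's own feature values, common feature values, and their weights. Assuming the underlying model is linear (an assumption; the claims do not fix the model form, and the function name is illustrative), the second user's predicted value could be sketched as:

```python
def predict_second_user(reference, own_features, own_weights,
                        common_features, common_weights):
    """Reference prediction contributed by the other parties, plus this
    client's weighted contribution from its own and shared feature values."""
    score = reference
    score += sum(w * x for w, x in zip(own_weights, own_features))
    score += sum(w * x for w, x in zip(common_weights, common_features))
    return score

# 0.5 (reference) + 0.1*1 + 0.2*2 + 0.3*3
pred = predict_second_user(0.5, [1.0, 2.0], [0.1, 0.2], [3.0], [0.3])
```

Under the homomorphic scheme of claim 2, the additions above could equally be carried out on ciphertexts, which is what allows each party to contribute without revealing its features.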
5. The method according to claim 4, wherein the method further comprises:
the client receives an iterative training instruction sent by the server;
and the client determining, according to the first prediction information and the second prediction information, updated values of the first feature value weight, the first common feature weight, the second feature value weight and the second common feature weight when iterating.
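Claim 5's iterative training instruction implies that each round the client recomputes its predictions and losses and then refreshes the weight groups. One plausible per-round local update, assuming a squared-error loss and plain gradient descent (neither is specified by the claims; the function name and learning rate are illustrative), might look like:

```python
def local_round(weights, features, label, lr=0.1):
    """One local iteration: predict with the current weights, measure
    the squared error, and step every weight along the negative gradient."""
    pred = sum(w * x for w, x in zip(weights, features))
    error = pred - label
    loss = error * error
    updated = [w - lr * error * x for w, x in zip(weights, features)]
    return updated, loss

# Starting from zero weights, one round moves each weight toward the label.
weights, loss = local_round([0.0, 0.0], [1.0, 2.0], 1.0)
```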
6. A method of training a credit prediction model, the method comprising:
the server sending training sample information to each client participating in model training, wherein the training sample information comprises: a sample user identification ID of each client participating in model training; the training sample information is used by each client to determine first sample data and second sample data in the historical data of each user account corresponding to the sample user ID, to determine first prediction information corresponding to the first sample data and second prediction information corresponding to the second sample data, and to send the first prediction information and the second prediction information to the server; the second sample data has a sample feature to be predicted, and the first sample data does not have the sample feature to be predicted; the first prediction information comprises a homomorphically encrypted predicted value and loss value of a first user; the second prediction information comprises a homomorphically encrypted predicted value and loss value of a second user, and a homomorphically encrypted loss value common to the first user and the second user; the first user is a user that does not include the sample feature to be predicted; the second user is a user that includes the sample feature to be predicted;
the server receiving the first prediction information and the second prediction information sent by each client participating in model training;
and the server generating a prediction result based on the first prediction information and the second prediction information, and sending the prediction result to each client participating in model training, so that each client generates a credit prediction model according to the prediction result, the first sample data and the second sample data; the prediction result characterizes the association information between the first prediction information and the second prediction information.
7. The method according to claim 6, wherein the training sample information further comprises: a homomorphic encryption public key; and the first prediction information and the second prediction information are both encrypted with the public key.
8. The method according to claim 6 or 7, characterized in that the method further comprises:
the server receiving a reference predicted value sent by each client participating in model training and sending the reference predicted value to each client participating in model training, wherein the reference predicted value characterizes prediction information of features other than the sample feature to be predicted.
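In claim 8 the server relays reference predicted values between clients so that each party's contribution for a shared user can be combined. Under an additively homomorphic scheme the per-user sum can be taken directly over ciphertexts; the plain-float sketch below (the function name and dictionary layout are assumptions) shows the shape of that aggregation:

```python
def aggregate_references(per_client_values):
    """Sum each user's reference predictions across all clients.
    With Paillier-style encryption this addition would be performed
    on ciphertexts, so the server never sees individual plaintexts."""
    totals = {}
    for values in per_client_values:
        for user_id, value in values.items():
            totals[user_id] = totals.get(user_id, 0.0) + value
    return totals

# Two clients each contribute a partial prediction for user "u1".
totals = aggregate_references([{"u1": 0.2}, {"u1": 0.3, "u2": 0.5}])
```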
9. A client, the client comprising:
a first communication module, configured to receive training sample information sent by a server, wherein the training sample information comprises: a sample user identification (ID) of each client participating in model training;
a first processing module, configured to determine first sample data and second sample data in the historical data of each user account corresponding to the sample user ID, wherein the second sample data has a sample feature to be predicted, and the first sample data does not have the sample feature to be predicted; the first processing module is further configured to determine first prediction information corresponding to the first sample data, and determine second prediction information corresponding to the second sample data; the first prediction information comprises a homomorphically encrypted predicted value and loss value of a first user; the second prediction information comprises a homomorphically encrypted predicted value and loss value of a second user, and a homomorphically encrypted loss value common to the first user and the second user; the first user is a user that does not include the sample feature to be predicted; the second user is a user that includes the sample feature to be predicted;
the first communication module is further configured to send the first prediction information and the second prediction information to the server;
the first communication module is further configured to receive a prediction result sent by the server, wherein the prediction result characterizes the association information between the first prediction information and the second prediction information;
and a generating module, configured to generate a credit prediction model according to the prediction result, the first sample data and the second sample data.
10. The client of claim 9, wherein the training sample information further comprises: a homomorphic encryption public key; and the client further comprises:
an encryption module, configured to encrypt the first prediction information and the second prediction information respectively according to the public key;
wherein the first communication module is specifically configured to send the encrypted first prediction information and the encrypted second prediction information to the server.
11. The client of claim 10, wherein the first processing module is specifically configured to: obtain a first feature value corresponding to a first user that does not include the sample feature to be predicted, and a first common feature value shared by the first user and other users; obtain a first feature value weight corresponding to the first feature value and a first common feature weight corresponding to the first common feature value; and generate the first prediction information according to the public key, the first feature value, the first common feature value, the first feature value weight and the first common feature weight.
12. The client of claim 11, wherein the first processing module is further specifically configured to: receive a reference predicted value sent by the server; obtain a second feature value corresponding to a second user that includes the sample feature to be predicted, and a second common feature value shared by the second user and other users; obtain a second feature value weight corresponding to the second feature value and a second common feature weight corresponding to the second common feature value; obtain feature data corresponding to the sample feature to be predicted; and generate the second prediction information according to the reference predicted value, the public key, the second feature value, the second common feature value, the second feature value weight, the second common feature weight and the feature data.
13. The client of claim 12, wherein the client further comprises:
The first communication module is further used for receiving an iterative training instruction sent by the server;
and an updating module, configured to determine, according to the first prediction information and the second prediction information, updated values of the first feature value weight, the first common feature weight, the second feature value weight and the second common feature weight when iterating.
14. A server, the server comprising:
a second communication module, configured to send training sample information to each client participating in model training, wherein the training sample information comprises: a sample user identification ID of each client participating in model training; the training sample information is used by each client to determine first sample data and second sample data in the historical data of each user account corresponding to the sample user ID, to determine first prediction information corresponding to the first sample data and second prediction information corresponding to the second sample data, and to send the first prediction information and the second prediction information to the server; the second sample data has a sample feature to be predicted, and the first sample data does not have the sample feature to be predicted; the first prediction information comprises a homomorphically encrypted predicted value and loss value of a first user; the second prediction information comprises a homomorphically encrypted predicted value and loss value of a second user, and a homomorphically encrypted loss value common to the first user and the second user; the first user is a user that does not include the sample feature to be predicted; the second user is a user that includes the sample feature to be predicted;
the second communication module is further configured to receive the first prediction information and the second prediction information sent by each client participating in model training;
a second processing module, configured to generate a prediction result based on the first prediction information and the second prediction information;
and the second communication module is further configured to send the prediction result to each client participating in model training, so that each client generates a credit prediction model according to the prediction result, the first sample data and the second sample data; the prediction result characterizes the association information between the first prediction information and the second prediction information.
15. The server of claim 14, wherein the training sample information further comprises: a homomorphic encryption public key; and the first prediction information and the second prediction information are both encrypted with the public key.
16. The server according to claim 14 or 15, wherein the second communication module is further configured to receive a reference predicted value sent by each client participating in model training and send the reference predicted value to each client participating in model training, wherein the reference predicted value characterizes prediction information of features other than the sample feature to be predicted.
17. A training system for a credit prediction model, the system comprising: the client according to any of claims 9 to 13, and the server according to any of claims 14 to 16.
18. An electronic device, the electronic device comprising: a memory, a processor;
the memory is configured to store instructions executable by the processor;
wherein the processor, when executing the instructions in the memory, is configured to implement the method of any one of claims 1 to 5; or
the processor is configured to implement the method of any one of claims 6 to 8.
19. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1 to 5; or
the computer-executable instructions, when executed by a processor, implement the method of any one of claims 6 to 8.
CN202010325940.3A 2020-04-23 2020-04-23 Training method and system of credit prediction model, electronic equipment and storage medium Active CN113554476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010325940.3A CN113554476B (en) 2020-04-23 2020-04-23 Training method and system of credit prediction model, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113554476A CN113554476A (en) 2021-10-26
CN113554476B true CN113554476B (en) 2024-04-19

Family

ID=78129311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010325940.3A Active CN113554476B (en) 2020-04-23 2020-04-23 Training method and system of credit prediction model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113554476B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165725A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Neural network federation modeling method, equipment and storage medium based on transfer learning
CN109165683A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Sample predictions method, apparatus and storage medium based on federation's training
CN109255444A (en) * 2018-08-10 2019-01-22 深圳前海微众银行股份有限公司 Federated modeling method, device and readable storage medium based on transfer learning
CN109284313A (en) * 2018-08-10 2019-01-29 深圳前海微众银行股份有限公司 Federated modeling method, device and readable storage medium based on semi-supervised learning
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federated learning method, system and readable storage medium
WO2019137049A1 (en) * 2018-01-10 2019-07-18 阳光财产保险股份有限公司 Prediction method and apparatus based on information sharing, electronic device and computer storage medium
CN110245510A (en) * 2019-06-19 2019-09-17 北京百度网讯科技有限公司 Method and apparatus for predictive information
CN110378749A (en) * 2019-07-25 2019-10-25 深圳前海微众银行股份有限公司 Appraisal procedure, device, terminal device and the storage medium of user data similitude
CN110633805A (en) * 2019-09-26 2019-12-31 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium
CN110782042A (en) * 2019-10-29 2020-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation
CN110797124A (en) * 2019-10-30 2020-02-14 腾讯科技(深圳)有限公司 Model multi-terminal collaborative training method, medical risk prediction method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11251956B2 (en) * 2018-07-02 2022-02-15 Avaya Inc. Federated blockchain identity model and secure personally identifiable information data transmission model for RCS
US11539716B2 (en) * 2018-07-31 2022-12-27 DataVisor, Inc. Online user behavior analysis service backed by deep learning models trained on shared digital information


Also Published As

Publication number Publication date
CN113554476A (en) 2021-10-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

GR01 Patent grant