CN109902742B - Sample completion method, terminal, system and medium based on encrypted transfer learning - Google Patents


Info

Publication number: CN109902742B (granted); earlier publication CN109902742A
Application number: CN201910153223.4A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active
Prior art keywords: feature, completion, terminal, encryption, sample
Inventors: 刘洋 (Liu Yang), 康焱 (Kang Yan), 陈天健 (Chen Tianjian), 杨强 (Yang Qiang)
Assignee (original and current): WeBank Co Ltd
Prosecution history: application filed by WeBank Co Ltd with priority to CN201910153223.4A; published as CN109902742A; application granted and published as CN109902742B.

Landscapes

  • Storage Device Security (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a sample completion method, terminal, system and medium based on encrypted transfer learning. The method comprises the following steps: determining the feature intersection of the two parties' samples, training the first sample based on the intersection to obtain a first feature model, and encrypting the model and sending it to a second terminal; receiving a second encrypted feature model sent by the second terminal, and predicting first encrypted completion features for the features missing from the first sample according to that model; operating on the first encrypted completion features according to a first preset operation rule to obtain first completion features; receiving a second encrypted label model sent by the second terminal, and predicting a first encrypted completion label for the label missing from the first sample based on that model, the initial features of the first sample and the first completion features; and operating on the first encrypted completion label according to a third preset operation rule to obtain the first completion label. The invention completes the features and labels of each party's sample data while ensuring that no party's private data is disclosed.

Description

Sample completion method, terminal, system and medium based on encrypted transfer learning
Technical Field
The invention relates to the technical field of data processing, and in particular to a sample completion method, terminal, system and medium based on encrypted transfer learning.
Background
In the field of artificial intelligence, the traditional data processing mode is for one party to collect data, transfer it to another party for processing, cleaning and modeling, and finally sell the resulting model to a third party. However, as regulations become more complete and supervision more stringent, an operator may violate the law if the data leaves its collector or if users are unaware of the specific use of the model. Data thus exists in isolated islands. The direct way to bridge these islands would be to aggregate the data in one party for processing, but since the law no longer permits operators to aggregate data crudely, doing so is now likely to be illegal.
To resolve this dilemma, distributed machine learning algorithms have been proposed, but they often cannot be used because part of the data lacks features or labels. For example, a horizontal federated learning algorithm usually requires every participant to have the same feature dimensions, so features not owned by all participants generally have to be discarded; in distributed machine learning based on supervised learning, the samples of all participants must be labeled, so unlabeled data must likewise be discarded. These situations waste a great deal of data and make the sample data used for training unevenly distributed, which reduces the generalization ability of the trained model.
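The discarded-data problem described above can be made concrete with a small sketch (the column names are invented for illustration): a plain horizontal scheme can use only the intersection of the two parties' feature sets, and everything outside it is wasted.

```python
# Hypothetical feature sets held by two participants.
party_a = {"age", "income", "credit_score"}
party_b = {"age", "income", "purchase_history"}

# A naive horizontal federated scheme trains only on the shared columns...
shared = party_a & party_b
# ...and the non-shared columns would simply be discarded.
discarded = (party_a | party_b) - shared

print(sorted(shared))     # ['age', 'income']
print(sorted(discarded))  # ['credit_score', 'purchase_history']
```

Sample completion instead predicts the missing columns, so that no party has to throw data away.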
Therefore, when the participants in distributed machine learning come from different organizations, how to complete each party's features and labels while guaranteeing every party's data security and privacy is a problem that urgently needs to be solved.
Disclosure of Invention
The main object of the present invention is to provide a sample completion method, terminal, system and medium based on encrypted transfer learning, aiming to solve the technical problem that existing distributed machine learning algorithms waste data when samples have missing features or missing labels.
To achieve the above object, the present invention provides a sample completion method based on encrypted transfer learning, applied to a first terminal and comprising the following steps:
determining a feature intersection of a first sample of the first terminal and a second sample of a second terminal, training the initial features of the first sample based on the feature intersection to obtain a first feature model, and encrypting the first feature model and sending it to the second terminal, so that the second terminal predicts the features missing from the second sample to obtain second encrypted completion features and operates on the second encrypted completion features according to a second preset operation rule to obtain second completion features;
receiving a second encrypted feature model sent by the second terminal, and predicting the features missing from the first sample according to the second encrypted feature model to obtain first encrypted completion features, wherein the second encrypted feature model is obtained by the second terminal training the initial features of the second sample based on the feature intersection;
operating on the first encrypted completion features according to a first preset operation rule to obtain first completion features;
receiving a second encrypted label model sent by the second terminal, and predicting the label missing from the first sample based on the second encrypted label model, the initial features of the first sample and the first completion features to obtain a first encrypted completion label, wherein the second encrypted label model is obtained by the second terminal training on the initial features of the second sample, the second completion features and the initial label of the second sample;
and operating on the first encrypted completion label according to a third preset operation rule to obtain a first completion label.
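Setting the encryption aside for a moment, the feature-completion idea behind these steps — each party trains a model from the shared (intersection) features to its exclusive features, and the other party uses that model to predict its own missing columns — can be sketched in plain numpy. All data, shapes and the least-squares model class here are assumptions for illustration; the patent does not fix the model type:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared (intersection) features, which both parties hold for their own samples.
shared_a = rng.normal(size=(100, 3))   # party A's samples, shared columns
shared_b = rng.normal(size=(80, 3))    # party B's samples, shared columns

# Party B also holds an exclusive feature column that party A is missing.
true_w = np.array([1.0, -2.0, 0.5])
exclusive_b = shared_b @ true_w        # B's exclusive column (noise-free toy)

# B trains a "feature model" mapping shared features to its exclusive feature
# (least squares stands in for the unspecified model class).
w_hat, *_ = np.linalg.lstsq(shared_b, exclusive_b, rcond=None)

# A applies B's model to predict (complete) its own missing column.
completed_a = shared_a @ w_hat
print(completed_a.shape)               # (100,)
```

In the actual protocol the model is exchanged in encrypted form, so the prediction is produced as an encrypted completion feature rather than in the clear.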
Optionally, the step of operating on the first encrypted completion features according to a first preset operation rule to obtain first completion features comprises:
adding a first random mask to the first encrypted completion features to obtain first encrypted mask features, and sending the first encrypted mask features to the second terminal so that the second terminal decrypts them to obtain first mask features;
and on receiving the first mask features sent by the second terminal, subtracting the first random mask from the first mask features to obtain the first completion features.
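The first preset operation rule above is a standard mask-and-decrypt trick: it works when the completion features are encrypted under an additively homomorphic scheme whose key the second terminal holds, because the first terminal can add a random mask inside the ciphertext, and the decrypting party then sees only the masked value. A toy sketch with textbook Paillier (tiny fixed primes, not secure, purely illustrative):

```python
import math
import random

# Textbook Paillier with tiny fixed primes -- illustration only, NOT secure.
p, q = 100003, 100019
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                      # valid because we use g = n + 1

def enc(m):
    r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

def add(c1, c2):                          # Enc(a) * Enc(b) = Enc(a + b)
    return (c1 * c2) % n2

# Terminal 1 holds an encrypted completion feature it cannot decrypt;
# terminal 2 holds the key. Terminal 1 masks the value before asking
# terminal 2 to decrypt it.
feature = 42
c_feature = enc(feature)                  # what terminal 1 received
mask = random.randrange(1, 10**6)         # the first random mask
c_masked = add(c_feature, enc(mask))      # first encrypted mask feature

masked_plain = dec(c_masked)              # terminal 2 sees only 42 + mask
completion = (masked_plain - mask) % n    # terminal 1 removes the mask
assert completion == feature
```

The same pattern, with a third random mask, underlies the label operation rule described below.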
Optionally, the step of operating on the first encrypted completion label according to a third preset operation rule to obtain a first completion label comprises:
adding a third random mask to the first encrypted completion label to obtain a third encrypted mask label, and sending the third encrypted mask label to the second terminal so that the second terminal decrypts it to obtain a third mask label;
and on receiving the third mask label sent by the second terminal, subtracting the third random mask from the third mask label to obtain the first completion label.
Optionally, the step of determining the feature intersection, training and encrypting the first feature model, sending it to the second terminal so that the second terminal predicts the features missing from the second sample to obtain second encrypted completion features, and operating on the second encrypted completion features according to a second preset operation rule to obtain second completion features comprises:
determining a feature intersection of a first sample of the first terminal and a second sample of a second terminal, training the initial features of the first sample based on the feature intersection to obtain a first feature model, and encrypting the first feature model and sending it to the second terminal, so that the second terminal predicts the features missing from the second sample to obtain second encrypted completion features, adds a second random mask to the second encrypted completion features to obtain second encrypted mask features, and sends the second encrypted mask features to the first terminal for the first terminal to decrypt into second mask features; on receiving the second mask features returned by the first terminal, the second terminal subtracts the second random mask from them to obtain the second completion features.
The invention further provides a sample completion method based on encrypted transfer learning, applied to a second terminal and comprising the following steps:
the second terminal receives an encrypted first feature model sent by the first terminal, predicts the features missing from a second sample according to the encrypted first feature model to obtain second encrypted completion features, and operates on the second encrypted completion features according to a second preset operation rule to obtain second completion features, wherein the first feature model is obtained by the first terminal determining the feature intersection of the first sample of the first terminal and the second sample of the second terminal and training the initial features of the first sample based on the feature intersection;
training the initial features of the second sample based on the feature intersection to obtain a second encrypted feature model, and sending the second encrypted feature model to the first terminal, so that the first terminal predicts the features missing from the first sample according to the second encrypted feature model to obtain first encrypted completion features and operates on them according to a first preset operation rule to obtain first completion features;
and training on the initial features of the second sample, the second completion features and the initial label of the second sample to obtain a second encrypted label model, and sending the second encrypted label model to the first terminal, so that the first terminal predicts the label missing from the first sample based on the second encrypted label model, the initial features of the first sample and the first completion features to obtain a first encrypted completion label and operates on it according to a third preset operation rule to obtain the first completion label.
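The label-completion step can likewise be sketched in the clear. The logistic-regression model, the data and all shapes below are invented stand-ins — the patent leaves the label model class unspecified — but the flow matches the claim: the second terminal trains on its initial features, completed features and labels, and the first terminal applies that model to its own stacked features:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))

# Party B's design matrix: its initial features stacked with its completed features.
initial_b = rng.normal(size=(200, 2))
completed_b = rng.normal(size=(200, 1))
X_b = np.hstack([initial_b, completed_b])
y_b = (X_b @ np.array([1.0, -1.0, 2.0]) > 0).astype(float)   # B's initial labels (toy)

# B trains a "label model" -- logistic regression by gradient descent here,
# purely as a stand-in for the unspecified model class.
w = np.zeros(3)
for _ in range(2000):
    w -= 0.1 * X_b.T @ (sigmoid(X_b @ w) - y_b) / len(y_b)

# Party A stacks its initial and completed features the same way and uses
# B's model to predict (complete) its own missing labels.
X_a = np.hstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 1))])
y_a_completed = (sigmoid(X_a @ w) > 0.5).astype(float)
print(y_a_completed.shape)   # (50,)
```

In the protocol itself the label model travels encrypted, and the predictions come back as a first encrypted completion label that still has to be unmasked.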
Optionally, the step of operating on the second encrypted completion features according to a second preset operation rule to obtain second completion features comprises:
adding a second random mask to the second encrypted completion features to obtain second encrypted mask features, and sending the second encrypted mask features to the first terminal so that the first terminal decrypts them to obtain second mask features;
and on receiving the second mask features sent by the first terminal, subtracting the second random mask from them to obtain the second completion features.
Optionally, the step of training the second encrypted feature model, sending it to the first terminal so that the first terminal predicts the features missing from the first sample to obtain first encrypted completion features, and operating on the first encrypted completion features according to a first preset operation rule to obtain first completion features comprises:
training the initial features of the second sample based on the feature intersection to obtain a second encrypted feature model, and sending the second encrypted feature model to the first terminal, so that the first terminal predicts the features missing from the first sample according to the second encrypted feature model to obtain first encrypted completion features, adds a first random mask to them to obtain first encrypted mask features, and sends the first encrypted mask features to the second terminal for the second terminal to decrypt into first mask features; on receiving the first mask features returned by the second terminal, the first terminal subtracts the first random mask from them to obtain the first completion features.
Optionally, the step of training the second encrypted label model, sending it to the first terminal so that the first terminal predicts the label missing from the first sample to obtain a first encrypted completion label, and operating on the first encrypted completion label according to a third preset operation rule to obtain a first completion label comprises:
training on the initial features of the second sample, the second completion features and the initial label of the second sample to obtain a second encrypted label model, and sending the second encrypted label model to the first terminal, so that the first terminal predicts the label missing from the first sample based on the second encrypted label model, the initial features of the first sample and the first completion features to obtain a first encrypted completion label, adds a third random mask to it to obtain a third encrypted mask label, and sends the third encrypted mask label to the second terminal for the second terminal to decrypt into a third mask label; on receiving the third mask label returned by the second terminal, the first terminal subtracts the third random mask from it to obtain the first completion label.
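For a linear label model, the first terminal can even evaluate the second terminal's encrypted model without ever seeing its weights: under Paillier, raising a ciphertext to a plaintext power multiplies the underlying plaintext, so an encrypted dot product is possible. The integer weights and features below are illustrative assumptions (real systems would use a fixed-point encoding), and the tiny-prime Paillier is again not secure:

```python
import math
import random

# Textbook Paillier, tiny fixed primes -- illustration only, NOT secure.
p, q = 100003, 100019
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def enc(m):
    return (pow(n + 1, m % n, n2) * pow(random.randrange(1, n), n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Terminal 2's label model: integer weights, sent to terminal 1 encrypted.
weights = [3, -1, 2]
enc_w = [enc(w) for w in weights]

# Terminal 1 evaluates the linear model on its own sample in ciphertext:
# Enc(w)^x = Enc(w * x), and multiplying ciphertexts adds plaintexts.
features = [5, 4, 1]                       # terminal 1's (integer) sample
c_score = 1
for cw, x in zip(enc_w, features):
    c_score = (c_score * pow(cw, x, n2)) % n2

# Mask, then let terminal 2 decrypt -- it sees only the masked score.
mask = random.randrange(1, 10**6)
masked = dec((c_score * enc(mask)) % n2)
score = (masked - mask) % n
print(score)   # 3*5 + (-1)*4 + 2*1 = 13
```

The masked decryption at the end is exactly the third preset operation rule: the key holder decrypts, but only the first terminal can remove the mask.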
In addition, to achieve the above object, the present invention further provides a terminal, the terminal being a first terminal and comprising: a memory, a processor, and a sample completion program based on encrypted transfer learning that is stored in the memory and executable on the processor, wherein the sample completion program, when executed by the processor, implements the steps of the sample completion method based on encrypted transfer learning described above.
The present invention further provides a terminal, the terminal being a second terminal and comprising: a memory, a processor, and a sample completion program based on encrypted transfer learning that is stored in the memory and executable on the processor, wherein the sample completion program, when executed by the processor, implements the steps of the sample completion method based on encrypted transfer learning described above.
The invention also provides a sample completion system based on encrypted transfer learning, the system comprising at least one first terminal and at least one second terminal as described above.
In addition, to achieve the above object, the present invention further provides a computer storage medium storing a sample completion program based on encrypted transfer learning, which, when executed by a processor, implements the steps of the sample completion method based on encrypted transfer learning described above.
In the method, the first terminal determines a feature intersection of its first sample and a second sample of a second terminal, trains the initial features of the first sample based on the feature intersection to obtain a first feature model, and encrypts the first feature model and sends it to the second terminal, so that the second terminal predicts the features missing from the second sample to obtain second encrypted completion features and operates on them according to a second preset operation rule to obtain second completion features; the first terminal receives a second encrypted feature model sent by the second terminal and predicts the features missing from the first sample according to it to obtain first encrypted completion features, the second encrypted feature model being obtained by the second terminal training the initial features of the second sample based on the feature intersection; the first terminal operates on the first encrypted completion features according to a first preset operation rule to obtain first completion features; the first terminal receives a second encrypted label model sent by the second terminal and predicts the label missing from the first sample based on it, the initial features of the first sample and the first completion features to obtain a first encrypted completion label, the second encrypted label model being obtained by the second terminal training on the initial features of the second sample, the second completion features and the initial label of the second sample; and the first terminal operates on the first encrypted completion label according to a third preset operation rule to obtain a first completion label.
The invention thus completes the features and labels of each party's sample data while ensuring that no party's private data is disclosed.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the sample completion method based on encrypted transfer learning according to the present invention;
FIG. 3 is a schematic diagram of a second embodiment of the sample completion method based on encrypted transfer learning according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that, the terminal in the embodiment of the present invention may be a terminal device such as a smart phone, a personal computer, and a server, and is not limited herein.
As shown in FIG. 1, the terminal may include: a processor 1001 (e.g., a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002, wherein the communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and optionally may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory), and may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal structure shown in FIG. 1 does not constitute a limitation of the terminal, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a sample completion program based on encrypted transfer learning. The operating system is a program that manages and controls the hardware and software resources of the terminal and supports the running of the sample completion program and other software or programs.
In the terminal shown in FIG. 1, the user interface 1003 is mainly used for data communication with each terminal; the network interface 1004 is mainly used for connecting to and communicating with a background server; and the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following operations:
determining a feature intersection of a first sample of the first terminal and a second sample of a second terminal, training the initial features of the first sample based on the feature intersection to obtain a first feature model, and encrypting the first feature model and sending it to the second terminal, so that the second terminal predicts the features missing from the second sample to obtain second encrypted completion features and operates on the second encrypted completion features according to a second preset operation rule to obtain second completion features;
receiving a second encrypted feature model sent by the second terminal, and predicting the features missing from the first sample according to the second encrypted feature model to obtain first encrypted completion features, wherein the second encrypted feature model is obtained by the second terminal training the initial features of the second sample based on the feature intersection;
operating on the first encrypted completion features according to a first preset operation rule to obtain first completion features;
receiving a second encrypted label model sent by the second terminal, and predicting the label missing from the first sample based on the second encrypted label model, the initial features of the first sample and the first completion features to obtain a first encrypted completion label, wherein the second encrypted label model is obtained by the second terminal training on the initial features of the second sample, the second completion features and the initial label of the second sample;
and operating on the first encrypted completion label according to a third preset operation rule to obtain a first completion label.
Further, the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following steps:
adding a first random mask to the first encrypted completion features to obtain first encrypted mask features, and sending the first encrypted mask features to the second terminal so that the second terminal decrypts them to obtain first mask features;
and on receiving the first mask features sent by the second terminal, subtracting the first random mask from the first mask features to obtain the first completion features.
Further, the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following steps:
adding a third random mask to the first encrypted completion label to obtain a third encrypted mask label, and sending the third encrypted mask label to the second terminal so that the second terminal decrypts it to obtain a third mask label;
and on receiving the third mask label sent by the second terminal, subtracting the third random mask from the third mask label to obtain the first completion label.
Further, the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following steps:
determining a feature intersection of a first sample of the first terminal and a second sample of a second terminal, training the initial features of the first sample based on the feature intersection to obtain a first feature model, and encrypting the first feature model and sending it to the second terminal, so that the second terminal predicts the features missing from the second sample to obtain second encrypted completion features, adds a second random mask to the second encrypted completion features to obtain second encrypted mask features, and sends the second encrypted mask features to the first terminal for the first terminal to decrypt into second mask features; on receiving the second mask features returned by the first terminal, the second terminal subtracts the second random mask from them to obtain the second completion features.
Further, the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following steps:
the second terminal receives an encrypted first feature model sent by the first terminal, predicts the features missing from a second sample according to the encrypted first feature model to obtain second encrypted completion features, and operates on the second encrypted completion features according to a second preset operation rule to obtain second completion features, wherein the first feature model is obtained by the first terminal determining the feature intersection of the first sample of the first terminal and the second sample of the second terminal and training the initial features of the first sample based on the feature intersection;
training the initial features of the second sample based on the feature intersection to obtain a second encrypted feature model, and sending the second encrypted feature model to the first terminal, so that the first terminal predicts the features missing from the first sample according to the second encrypted feature model to obtain first encrypted completion features and operates on them according to a first preset operation rule to obtain first completion features;
and training on the initial features of the second sample, the second completion features and the initial label of the second sample to obtain a second encrypted label model, and sending the second encrypted label model to the first terminal, so that the first terminal predicts the label missing from the first sample based on the second encrypted label model, the initial features of the first sample and the first completion features to obtain a first encrypted completion label and operates on it according to a third preset operation rule to obtain the first completion label.
Further, the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following steps:
adding a second random mask to the second encrypted completion features to obtain second encrypted mask features, and sending the second encrypted mask features to the first terminal so that the first terminal decrypts them to obtain second mask features;
and on receiving the second mask features sent by the first terminal, subtracting the second random mask from them to obtain the second completion features.
Further, the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following steps:
training the initial features of the second sample based on the feature intersection to obtain a second encrypted feature model, and sending the second encrypted feature model to the first terminal, so that the first terminal predicts the features missing from the first sample according to the second encrypted feature model to obtain first encrypted completion features, adds a first random mask to them to obtain first encrypted mask features, and sends the first encrypted mask features to the second terminal for the second terminal to decrypt into first mask features; on receiving the first mask features returned by the second terminal, the first terminal subtracts the first random mask from them to obtain the first completion features.
Further, the processor 1001 may be configured to call the sample completion program based on encrypted transfer learning stored in the memory 1005 and perform the following steps:
training on the initial features of the second sample, the second completion features and the initial label of the second sample to obtain a second encrypted label model, and sending the second encrypted label model to the first terminal, so that the first terminal predicts the label missing from the first sample based on the second encrypted label model, the initial features of the first sample and the first completion features to obtain a first encrypted completion label, adds a third random mask to it to obtain a third encrypted mask label, and sends the third encrypted mask label to the second terminal for the second terminal to decrypt into a third mask label; on receiving the third mask label returned by the second terminal, the first terminal subtracts the third random mask from it to obtain the first completion label.
In the technical solution provided by the present invention, the terminal calls the sample completion program based on encryption transfer learning stored in the memory 1005 through the processor 1001 to implement the following steps: determining a feature intersection of a first sample of the first terminal and a second sample of the second terminal, training the initial features of the first sample based on the feature intersection to obtain a first feature model, and encrypting the first feature model and sending it to the second terminal, so that the second terminal predicts the missing features of the second sample to obtain a second encrypted completion feature and operates on the second encrypted completion feature according to a second preset operation rule to obtain a second completion feature; receiving a second encrypted feature model sent by the second terminal, and predicting the missing features of the first sample according to the second encrypted feature model to obtain a first encrypted completion feature, wherein the second encrypted feature model is obtained by the second terminal training the initial features of the second sample based on the feature intersection; operating on the first encrypted completion feature according to a first preset operation rule to obtain a first completion feature; receiving a second encrypted labeling model sent by the second terminal, and predicting the labels missing from the first sample based on the second encrypted labeling model, the initial features of the first sample and the first completion feature to obtain a first encrypted completion label, wherein the second encrypted labeling model is obtained by the second terminal training according to the initial features of the second sample, the second completion feature and the initial labels of the second sample; and operating on the first encrypted completion label according to a third preset operation rule to obtain a first completion label. The invention thus completes the features and labels of each party's sample data while ensuring that no party's data privacy is revealed.
In addition, an embodiment of the present invention further provides a terminal, where the terminal is a first terminal, and the first terminal includes: a memory, a processor, and a sample completion program based on encryption transfer learning that is stored in the memory and executable on the processor, where the sample completion program based on encryption transfer learning, when executed by the processor, implements the steps of the sample completion method based on encryption transfer learning described above.
For the method implemented when the sample completion program based on encryption transfer learning running on the processor is executed, reference may be made to the embodiments of the sample completion method based on encryption transfer learning of the present invention, and details are not repeated here.
In addition, an embodiment of the present invention further provides a terminal, where the terminal is a second terminal, and the second terminal includes: a memory, a processor, and a sample completion program based on encryption transfer learning that is stored in the memory and executable on the processor, where the sample completion program based on encryption transfer learning, when executed by the processor, implements the steps of the sample completion method based on encryption transfer learning described above.
For the method implemented when the sample completion program based on encryption transfer learning running on the processor is executed, reference may be made to the embodiments of the sample completion method based on encryption transfer learning of the present invention, and details are not repeated here.
In addition, the embodiment of the present invention further provides a sample completion system based on encryption migration learning, where the sample completion system based on encryption migration learning includes at least one first terminal and at least one second terminal.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a sample completion program based on encryption transfer learning is stored on the storage medium, and the sample completion program based on encryption transfer learning, when executed by a processor, implements the steps of the sample completion method based on encryption transfer learning described above.
For the method implemented when the sample completion program based on encryption transfer learning running on the processor is executed, reference may be made to the embodiments of the sample completion method based on encryption transfer learning of the present invention, and details are not repeated here.
Based on the above structure, various embodiments of the sample completion method based on the encryption transfer learning are proposed.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a sample completion method based on encryption transfer learning according to the present invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in a different order than that shown.
The first embodiment of the present invention is a sample completion method based on encryption transfer learning, which is applied to a first terminal, and the first terminal and a second terminal in the embodiment of the present invention may be terminal devices such as a smart phone, a personal computer, and a server, and are not limited specifically herein.
The sample completion method based on the encryption transfer learning in the embodiment comprises the following steps:
step S1, determining a feature intersection of a first sample of the first terminal and a second sample of a second terminal, training an initial feature of the first sample based on the feature intersection to obtain a first feature model, encrypting and sending the first feature model to the second terminal so that the second terminal can predict the missing feature of the second sample to obtain a second encryption completion feature, and calculating the second encryption completion feature according to a second preset calculation rule to obtain a second completion feature;
in the field of artificial intelligence, the traditional data processing mode is that one party collects data, transfers it to another party for processing, cleaning and modeling, and finally sells the model to a third party. However, as regulations become more complete and supervision becomes more stringent, an operator may violate the law if the data leaves its collector or if the user is unaware of the specific use of the model. Data thus exists in isolated islands, and the most direct way to dissolve these islands is to aggregate the data at one party for processing. However, doing so is now likely to be illegal, because the law does not allow operators to crudely aggregate data.
To resolve this dilemma, distributed machine learning algorithms have been proposed, but they often cannot be used because features or labels of the data are missing. For example, a horizontal federated learning algorithm usually requires that the feature dimensions of all participants be the same, so features not owned by every participant generally can only be discarded; in distributed machine learning based on supervised learning, the samples of all participants need to be labeled, so unlabeled data likewise usually has to be discarded. These situations waste a large amount of data and leave the sample data used for training unevenly distributed, thereby reducing the generalization ability of the trained model.
Therefore, when each participant in distributed machine learning comes from different organizations, how to complement the characteristics and labels of each party under the condition of ensuring the data security and privacy of each party is a problem to be solved urgently. In order to solve this problem, various embodiments of the sample completion method based on the encryption migration learning of the present invention are proposed.
The invention is based on transfer learning, which refers to the process of applying a model learned in an old domain to a new domain by exploiting the similarity among data, tasks or models. The core problem of transfer learning is to find the similarity between the new problem and the original problem, so that learned knowledge can be smoothly transferred and applied to the new problem, thereby realizing knowledge transfer.
In this embodiment, the sample dimensions of the first sample at the first terminal and the second sample at the second terminal are different, the characteristic dimensions are partially overlapped, and the label of the first sample is missing.
First, the first terminal determines the overlapping part of the feature dimensions of the first sample and the second sample. Based on this intersection, it trains a function mapping model from the overlapping part to the non-overlapping part of the first sample, namely the first feature model, encrypts the first feature model with a preset encryption algorithm, and sends it to the second terminal. After receiving the encrypted first feature model, the second terminal uses it to predict the missing features of the second sample to obtain a second encrypted completion feature, and then operates on the second encrypted completion feature according to a second preset operation rule to obtain the second completion feature.
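As a concrete but deliberately simplified illustration of this step, the sketch below trains one such feature-mapping model in the clear, using ridge regression from the overlapping features to a single missing feature dimension. All data, shapes and the linear relationship are invented for the example; in the real protocol the trained model would only ever be exchanged in encrypted form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Party A's local data (synthetic): features in the intersection, plus one
# A-specific feature that party B is missing.
X_shared = rng.normal(size=(200, 3))                 # overlapping features
w_true = np.array([1.5, -2.0, 0.5])                  # hidden relationship
x_specific = X_shared @ w_true + 0.01 * rng.normal(size=200)

# Train the feature-mapping model f: X_shared -> x_specific by ridge
# regression, i.e. least squares with an L2 penalty lambda, matching the
# regularized objective described in the text.
lam = 0.1
d = X_shared.shape[1]
w = np.linalg.solve(X_shared.T @ X_shared + lam * np.eye(d),
                    X_shared.T @ x_specific)

# Party B would apply the (encrypted) model to its own overlapping features
# to fill in the missing dimension; here it is done in the clear.
X_shared_B = rng.normal(size=(5, 3))
completed = X_shared_B @ w
print(w)
```

The per-feature models of the patent are exactly this, repeated once for each non-overlapping feature dimension.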
The preset encryption algorithm is a homomorphic encryption algorithm.
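Since homomorphic encryption is what allows one terminal to run a model on data it cannot read, the following self-contained sketch implements a textbook Paillier cryptosystem with toy parameters to show the additive homomorphism the protocol relies on. Real deployments use primes of roughly 1024 bits and a vetted library rather than hand-rolled code.

```python
import math
import random

# Textbook Paillier cryptosystem (toy parameters, illustration only).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1                        # standard choice that simplifies decryption
lam = math.lcm(p - 1, q - 1)     # Carmichael function of n
mu = pow(lam, -1, n)             # modular inverse; valid because g = n + 1

def encrypt(m):
    """Enc(m) = g^m * r^n mod n^2 for a random r in Z_n^*."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
a, b = 1234, 5678
c_sum = (encrypt(a) * encrypt(b)) % n2
assert decrypt(c_sum) == a + b
```

This additive property is also what makes the random-mask exchanges in the later steps possible: a mask can be added to a ciphertext without decrypting it.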
Step S2, receiving a second encrypted feature model sent by a second terminal, predicting the missing features of the first sample according to the second encrypted feature model to obtain a first encrypted completion feature, wherein the second encrypted feature model is obtained by training the initial features of the second sample based on the feature intersection by the second terminal;
meanwhile, the second terminal determines the overlapping part of the feature dimensions of the first sample and the second sample, and trains a function mapping model from the overlapping part to the non-overlapping part of the second sample, namely the second feature model. It encrypts the second feature model with the preset encryption algorithm to obtain the second encrypted feature model and sends it to the first terminal. After receiving the second encrypted feature model, the first terminal uses it to predict the missing features of the first sample to obtain the first encrypted completion feature.
Step S3, calculating the first encryption completion characteristic according to a first preset calculation rule to obtain a first completion characteristic;
after the first terminal carries out prediction completion on the missing features to obtain first encryption completion features, the first terminal carries out operation on the first encryption completion features according to a first preset operation rule to obtain unencrypted first completion features, and at this time, feature completion on the first terminal is completed.
Step S4, receiving a second encrypted annotation model sent by the second terminal, predicting the label missing from the first sample based on the second encrypted annotation model, the initial feature of the first sample and the first completion feature to obtain a first encrypted completion label, wherein the second encrypted annotation model is obtained by the second terminal according to the initial feature of the second sample, the second completion feature and the initial label training of the second sample;
after the second terminal has operated on the second encrypted completion feature according to the second preset operation rule to obtain the second completion feature, completing its own feature completion, and the first terminal has operated on the first encrypted completion feature according to the first preset operation rule to obtain the unencrypted first completion feature, completing its own feature completion, the second terminal trains, according to the initial features of the second sample, the second completion feature and the initial labels of the second sample, a function mapping model from the features of the second sample to its labels, namely the second labeling model. It encrypts the second labeling model with the preset encryption algorithm to obtain the second encrypted labeling model and sends it to the first terminal, which predicts the labels missing from the first sample based on the second encrypted labeling model, the initial features of the first sample and the first completion feature to obtain the first encrypted completion label.
Step S5, operating on the first encrypted completion label according to a third preset operation rule to obtain a first completion label.
After the first terminal carries out prediction completion on the missing marks to obtain first encryption completion marks, the first terminal carries out operation on the first encryption completion marks according to a third preset operation rule to obtain unencrypted first completion marks, and thus feature completion and mark completion of the first terminal and feature completion of the second terminal are completed.
In the embodiment, the characteristic completion and the marking completion of the sample data of each party are realized on the premise of ensuring that the data privacy of each party is not disclosed.
Further, in the second embodiment of the sample completion method based on the encryption migration learning of the present invention, the step S3 includes:
step S31, adding a first random mask to the first encrypted padding feature to obtain a first encrypted mask feature, and sending the first encrypted mask feature to the second terminal, so that the second terminal decrypts the first encrypted mask feature to obtain a first mask feature;
step S32, receiving the first mask feature sent by the second terminal, and subtracting the first random mask from the first mask feature to obtain a first completion feature.
After the first terminal has predicted its missing features to obtain the first encrypted completion feature, it adds a first random mask to the first encrypted completion feature to obtain the first encrypted mask feature and sends it to the second terminal. After receiving the first encrypted mask feature, the second terminal decrypts it to obtain the first mask feature and sends the first mask feature to the first terminal. After receiving the first mask feature sent by the second terminal, the first terminal subtracts the first random mask from it to obtain the unencrypted first completion feature, and the feature completion of the first terminal is complete.
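The round trip described above can be sketched end to end with a stand-in cipher. Below, a secret additive pad known only to the second terminal plays the role of the homomorphic encryption, since, like Paillier, adding a constant to the "ciphertext" adds it to the plaintext. Every name and number here is illustrative, and the pad is of course not a secure cipher.

```python
import random

# Stand-in for an additively homomorphic cipher: the "ciphertext" is the
# plaintext shifted by a key held only by the second terminal. Adding a
# constant to the ciphertext adds it to the plaintext, which is the only
# property the masking protocol needs. NOT secure encryption.
MOD = 2**61 - 1
key = random.randrange(MOD)                     # second terminal's secret

def hom_encrypt(x):
    return (x + key) % MOD

def hom_add(c, delta):                          # homomorphic addition
    return (c + delta) % MOD

def hom_decrypt(c):                             # second terminal only
    return (c - key) % MOD

# First terminal: holds an encrypted completed feature it cannot decrypt.
completed_feature = 7341                        # the true (hidden) value
c = hom_encrypt(completed_feature)              # produced under B's key

mask = random.randrange(MOD)                    # first random mask
c_masked = hom_add(c, mask)                     # first encrypted mask feature
# -> c_masked is sent to the second terminal

# Second terminal: decrypts, sees only a masked value, sends it back.
masked_plain = hom_decrypt(c_masked)            # first mask feature

# First terminal: removes its mask to recover the completed feature.
recovered = (masked_plain - mask) % MOD
assert recovered == completed_feature
```

Neither side learns anything new: the second terminal only ever sees the feature shifted by a random mask, and the first terminal never sees the decryption key.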
Further, the step S5 includes:
step S51, adding a third random mask to the first encrypted completion label to obtain a third encrypted mask label, and sending the third encrypted mask label to the second terminal, so that the second terminal decrypts the third encrypted mask label to obtain a third mask label;
step S52, receiving the third mask label sent by the second terminal, and subtracting the third random mask from the third mask label to obtain a first completion label.
After the first terminal has predicted its missing labels to obtain the first encrypted completion label, it adds a third random mask to the first encrypted completion label to obtain the third encrypted mask label and sends it to the second terminal. After receiving the third encrypted mask label, the second terminal decrypts it to obtain the third mask label and sends the third mask label to the first terminal. After receiving the third mask label sent by the second terminal, the first terminal subtracts the third random mask from it to obtain the unencrypted first completion label, and the label completion of the first terminal is complete.
Further, the step S1 includes:
step S11, determining a feature intersection of the first sample of the first terminal and the second sample of the second terminal, training the initial features of the first sample based on the feature intersection to obtain a first feature model, and sending the first feature model to the second terminal in encrypted form, so that the second terminal predicts the missing features of the second sample to obtain a second encrypted completion feature, adds a second random mask to the second encrypted completion feature to obtain a second encrypted mask feature, and sends the second encrypted mask feature to the first terminal; the first terminal decrypts the second encrypted mask feature to obtain a second mask feature and returns it, and the second terminal, upon receiving the second mask feature sent by the first terminal, subtracts the second random mask from the second mask feature to obtain a second completion feature.
After the second terminal receives the encrypted first feature model sent by the first terminal, it uses the encrypted first feature model to predict the missing features of the second sample to obtain the second encrypted completion feature. The second terminal then adds a second random mask to the second encrypted completion feature to obtain the second encrypted mask feature and sends it to the first terminal. After receiving the second encrypted mask feature, the first terminal decrypts it to obtain the second mask feature and sends the second mask feature to the second terminal. After receiving the second mask feature sent by the first terminal, the second terminal subtracts the second random mask from it to obtain the unencrypted second completion feature, and the feature completion of the second terminal is complete.
To aid understanding, an example is now given. As shown in FIG. 3, the sample dimensions of party A and party B are different and their feature dimensions partially overlap. Party B holds known data $(X_B, Y_B)$, where $X_B$ consists of two parts: a shared part $X_B^c$ (the feature dimensions also held by A) and a B-specific part $X_B^u$. Party A holds only the feature data $X_A$, where $X_A$ likewise consists of a shared part $X_A^c$ and an A-specific part $X_A^u$. The system consists of party A and party B. First, A and B determine their feature intersection, i.e. the shared feature dimensions covered by $X_A^c$ and $X_B^c$.

On the A side, let $X_A^u = \{x_{A,1}, \ldots, x_{A,n}\}$ be a set of $n$ features. For each feature $x_{A,i} \in X_A^u$, A trains a function mapping model $f_i^A$ from $X_A^c$ to $x_{A,i}$, obtained by minimizing the following objective function:

$$\min_{\theta_i^A} \; L\left(f_i^A(X_A^c; \theta_i^A),\, x_{A,i}\right) + \lambda \left\lVert \theta_i^A \right\rVert_F^2$$

Similarly, on the B side, let $X_B^u = \{x_{B,1}, \ldots, x_{B,m}\}$ be a set of $m$ features. For each feature $x_{B,j} \in X_B^u$, B trains a function mapping model $f_j^B$ from $X_B^c$ to $x_{B,j}$, obtained by minimizing the following objective function:

$$\min_{\theta_j^B} \; L\left(f_j^B(X_B^c; \theta_j^B),\, x_{B,j}\right) + \lambda \left\lVert \theta_j^B \right\rVert_F^2$$

A and B then perform feature completion. A sends its encrypted models $[[f^A]]$ to B. B uses $[[f^A]]$ to predict its missing features (the A-specific dimensions of B's samples), obtaining $[[\hat{X}_B]]$, adds a random mask matrix $M_B$ to obtain $[[\hat{X}_B + M_B]]$, and sends it to A. After receiving $[[\hat{X}_B + M_B]]$, A decrypts it to obtain $\hat{X}_B + M_B$ and sends it back to B, and B subtracts $M_B$ to obtain $\hat{X}_B$. Similarly, B sends its encrypted models $[[f^B]]$ to A. A uses $[[f^B]]$ to predict its missing features, obtaining $[[\hat{X}_A]]$, adds a random mask matrix $M_A$ to obtain $[[\hat{X}_A + M_A]]$, and sends it to B. After receiving $[[\hat{X}_A + M_A]]$, B decrypts it to obtain $\hat{X}_A + M_A$ and sends it back to A, and A subtracts $M_A$ to obtain $\hat{X}_A$.

Subsequently, with the feature data of both parties completed, B trains on its data $(X_B, Y_B)$ a function mapping model $g_B : X_B \to Y_B$ from features to labels, obtained by minimizing the following objective function:

$$\min_{\theta} \; L\left(g_B(X_B; \theta),\, Y_B\right) + \lambda \left\lVert \theta \right\rVert_F^2$$

B sends the encrypted model $[[g_B]]$ to A. A uses $[[g_B]]$ to predict its missing labels $Y_A$, obtaining $[[\hat{Y}_A]]$, adds a random mask matrix $M_A'$ to obtain $[[\hat{Y}_A + M_A']]$, and sends it to B. After receiving $[[\hat{Y}_A + M_A']]$, B decrypts it to obtain $\hat{Y}_A + M_A'$ and sends it back to A, and A subtracts $M_A'$ to obtain $\hat{Y}_A$, completing the label completion of party A.

Here $L$ denotes a loss function, $\theta$ the model parameters, $\lambda$ the regularization coefficient, and $\lVert \cdot \rVert_F^2$ the Frobenius norm (the sum of squares of the parameters).
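The label-completion step can likewise be sketched in the clear, ignoring the encryption and masking and using purely synthetic data: party B trains an L2-regularized logistic-regression label model on its (completed) features and labels, and party A applies it to predict its missing labels. The data, dimensions and hyperparameters below are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Party B: completed features X_B and known labels Y_B (synthetic).
X_B = rng.normal(size=(400, 4))
w_hidden = np.array([2.0, -1.0, 0.5, 1.5])          # hidden labeling rule
Y_B = (X_B @ w_hidden > 0).astype(float)

# Train the label model g_B: X_B -> Y_B with L2-regularized logistic
# regression by plain gradient descent (lam is the regularization
# coefficient of the objective in the text).
lam, lr = 0.01, 0.5
w = np.zeros(4)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X_B @ w)))            # sigmoid predictions
    grad = X_B.T @ (p - Y_B) / len(Y_B) + lam * w   # regularized gradient
    w -= lr * grad

# Party A: applies the model (in the real protocol, in encrypted form)
# to its own completed features to predict its missing labels.
X_A = rng.normal(size=(100, 4))
Y_A_pred = (1.0 / (1.0 + np.exp(-(X_A @ w))) > 0.5).astype(float)
Y_A_true = (X_A @ w_hidden > 0).astype(float)
accuracy = (Y_A_pred == Y_A_true).mean()
print(accuracy)
```

In the full protocol this prediction happens under homomorphic encryption, and the masked round trip of the preceding steps is what turns the encrypted predictions into usable labels for party A.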
In this embodiment, masks are added while the first terminal and the second terminal perform feature completion and label completion. Feature completion is thus performed on each party's sample data without revealing either party's data privacy, label completion is performed for the first terminal whose labels are missing after the feature completion, and the privacy of the data interaction is further improved.
Further, a third embodiment of the sample completion method based on encryption transfer learning according to the present invention is provided. In this embodiment, the sample completion method based on encryption transfer learning is applied to a second terminal and includes the following steps:
step C1, the second terminal receives an encrypted first feature model sent by the first terminal, predicts the missing features of the second sample according to the encrypted first feature model to obtain a second encrypted completion feature, and operates on the second encrypted completion feature according to a second preset operation rule to obtain a second completion feature, wherein the first feature model is obtained by the first terminal determining a feature intersection of a first sample of the first terminal and the second sample of the second terminal and training the initial features of the first sample based on the feature intersection;
in this embodiment, the sample dimensions of the first sample at the first terminal and the second sample at the second terminal are different, the characteristic dimensions are partially overlapped, and the label of the first sample is missing.
First, the first terminal determines the overlapping part of the feature dimensions of the first sample and the second sample. Based on this intersection, it trains a function mapping model from the overlapping part to the non-overlapping part of the first sample, namely the first feature model, encrypts the first feature model with a preset encryption algorithm, and sends it to the second terminal. After receiving the encrypted first feature model, the second terminal uses it to predict the missing features of the second sample to obtain a second encrypted completion feature, and then operates on the second encrypted completion feature according to a second preset operation rule to obtain the second completion feature.
The preset encryption algorithm is a homomorphic encryption algorithm.
Step C2, training the initial features of the second sample based on the feature intersection to obtain a second encrypted feature model, sending the second encrypted feature model to the first terminal, so that the first terminal can predict the missing features of the first sample according to the second encrypted feature model to obtain a first encrypted completion feature, and calculating the first encrypted completion feature according to a first preset calculation rule to obtain a first completion feature;
meanwhile, the second terminal determines the overlapping part of the feature dimensions of the first sample and the second sample, and trains a function mapping model from the overlapping part to the non-overlapping part of the second sample, namely the second feature model. It encrypts the second feature model with the preset encryption algorithm to obtain the second encrypted feature model and sends it to the first terminal. After receiving the second encrypted feature model, the first terminal uses it to predict the missing features of the first sample to obtain the first encrypted completion feature. The first terminal then operates on the first encrypted completion feature according to the first preset operation rule to obtain the unencrypted first completion feature, and the feature completion of the first terminal is complete.
And step C3, training according to the initial features of the second sample, the second completion features and the initial labels of the second sample to obtain a second encrypted label model, sending the second encrypted label model to the first terminal, predicting the labels missing from the first sample by the first terminal based on the second encrypted label model, the initial features of the first sample and the first completion features to obtain first encrypted completion labels, and calculating the first encrypted completion labels according to a third preset operation rule to obtain first completion labels.
After the second terminal has operated on the second encrypted completion feature according to the second preset operation rule to obtain the second completion feature, completing its own feature completion, and the first terminal has operated on the first encrypted completion feature according to the first preset operation rule to obtain the unencrypted first completion feature, completing its own feature completion, the second terminal trains, according to the initial features of the second sample, the second completion feature and the initial labels of the second sample, a function mapping model from the features of the second sample to its labels, namely the second labeling model. It encrypts the second labeling model with the preset encryption algorithm to obtain the second encrypted labeling model and sends it to the first terminal, which predicts the labels missing from the first sample based on the second encrypted labeling model, the initial features of the first sample and the first completion feature to obtain the first encrypted completion label.
After the first terminal carries out prediction completion on the missing marks to obtain first encryption completion marks, the first terminal carries out operation on the first encryption completion marks according to a third preset operation rule to obtain unencrypted first completion marks, and thus feature completion and mark completion of the first terminal and feature completion of the second terminal are completed.
Further, the step of calculating the second encryption completion characteristic according to a second preset calculation rule to obtain a second completion characteristic includes:
step C11, adding a second random mask to the second encryption completion feature to obtain a second encryption mask feature, and sending the second encryption mask feature to the first terminal, so that the first terminal decrypts the second encryption mask feature to obtain a second mask feature;
step C12, when receiving the second mask feature sent by the first terminal, subtracting the second random mask from the second mask feature to obtain a second completion feature.
Specifically, after the second terminal receives the encrypted first feature model sent by the first terminal, it uses the encrypted first feature model to predict the missing features of the second sample to obtain the second encrypted completion feature. The second terminal then adds a second random mask to the second encrypted completion feature to obtain the second encrypted mask feature and sends it to the first terminal. After receiving the second encrypted mask feature, the first terminal decrypts it to obtain the second mask feature and sends the second mask feature to the second terminal. After receiving the second mask feature sent by the first terminal, the second terminal subtracts the second random mask from it to obtain the unencrypted second completion feature, and the feature completion of the second terminal is complete.
Further, the step C2 includes:
step C21, training the initial features of the second sample based on the feature intersection to obtain a second encrypted feature model, and sending the second encrypted feature model to the first terminal, so that the first terminal predicts the missing features of the first sample according to the second encrypted feature model to obtain a first encrypted completion feature, adds a first random mask to the first encrypted completion feature to obtain a first encrypted mask feature, and sends the first encrypted mask feature to the second terminal; the second terminal decrypts the first encrypted mask feature to obtain a first mask feature and returns it, and the first terminal, upon receiving the first mask feature sent by the second terminal, subtracts the first random mask from the first mask feature to obtain a first completion feature.
Specifically, after the first terminal predicts the missing features and obtains a first encryption completion feature, it adds a first random mask to the first encryption completion feature to obtain a first encryption mask feature and sends the first encryption mask feature to the second terminal. Upon receiving it, the second terminal decrypts the first encryption mask feature to obtain a first mask feature and returns it to the first terminal. The first terminal then subtracts the first random mask from the first mask feature to obtain the unencrypted first completion feature, which completes the feature completion of the first terminal.
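Abstracting away the encryption, the feature-model step itself amounts to fitting a predictor from the shared (intersecting) features to a party's own features, then applying that predictor to the other party's shared features. The one-variable least-squares fit below is a hedged toy illustration: the data, the function names, and the choice of a linear model are assumptions for exposition, not the model form specified by the patent.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for a single predictor: ys ~ slope*xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Second terminal's samples: a feature in the intersection plus its own
# extra feature (toy data where own = 2 * shared + 1).
shared = [1.0, 2.0, 3.0, 4.0]
own = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear(shared, own)

# The first terminal only holds the shared feature; the exchanged model
# predicts the feature it is missing.
first_terminal_shared = 5.0
predicted_missing = slope * first_terminal_shared + intercept
assert abs(predicted_missing - 11.0) < 1e-9
```

In the patent's protocol this training and prediction would happen on encrypted model parameters, with the masked-decryption exchange used to recover the plaintext completion value afterwards.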
Further, the step C3 includes:
step C31, training a second encryption labeling model according to the initial features of the second sample, the second completion feature and the initial labels of the second sample, and sending the second encryption labeling model to the first terminal, so that the first terminal predicts the labels missing from the first sample based on the second encryption labeling model, the initial features of the first sample and the first completion feature to obtain a first encryption completion label, adds a third random mask to the first encryption completion label to obtain a third encryption mask label, and sends the third encryption mask label to the second terminal for the second terminal to decrypt the third encryption mask label to obtain a third mask label; upon receiving the third mask label sent by the second terminal, the first terminal subtracts the third random mask from the third mask label to obtain a first completion label.
Specifically, after the first terminal predicts the missing labels and obtains a first encryption completion label, it adds a third random mask to the first encryption completion label to obtain a third encryption mask label and sends the third encryption mask label to the second terminal. Upon receiving it, the second terminal decrypts the third encryption mask label to obtain a third mask label and returns it to the first terminal. The first terminal then subtracts the third random mask from the third mask label to obtain the unencrypted first completion label, which completes the label completion of the first terminal.
In this embodiment, feature completion and label completion are thus performed on each party's sample data without disclosing any party's private data.
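Putting the embodiment together, the message pattern is identical in all three rounds (second completion feature, first completion feature, first completion label): add a fresh random mask, let the key holder decrypt, then subtract the mask. The sketch below uses identity functions for encryption and decryption purely to trace the arithmetic of the three rounds; all names and sample values are illustrative assumptions.

```python
import random

def mask_exchange(encrypted_value, decrypt):
    """One masked-decryption round: add a fresh random mask, have the key
    holder decrypt, then strip the mask locally (toy, integer arithmetic)."""
    mask = random.randrange(10**6, 10**9)
    masked_plain = decrypt(encrypted_value + mask)  # key holder sees only value + mask
    return masked_plain - mask

# Identity stand-ins for enc/dec so the three rounds can be followed
# end to end; a real deployment would use homomorphic encryption.
enc = lambda x: x
decrypt = lambda x: x

# Round 1 (second random mask): second terminal recovers its completion feature.
second_completion_feature = mask_exchange(enc(7), decrypt)
# Round 2 (first random mask): first terminal recovers its completion feature.
first_completion_feature = mask_exchange(enc(12), decrypt)
# Round 3 (third random mask): first terminal recovers its completion label.
first_completion_label = mask_exchange(enc(1), decrypt)

assert (second_completion_feature, first_completion_feature,
        first_completion_label) == (7, 12, 1)
```

Because each round draws a fresh mask, the decrypting party learns nothing about the underlying feature or label values in any of the three exchanges.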
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (12)

1. A sample completion method based on encryption migration learning is characterized in that the sample completion method based on encryption migration learning is applied to a first terminal, and comprises the following steps:
determining a feature intersection of a first sample of the first terminal and a second sample of a second terminal, training an initial feature of the first sample based on the feature intersection to obtain a first feature model, and encrypting the first feature model and sending the encrypted first feature model to the second terminal, so that the second terminal predicts the features missing from the second sample according to the encrypted first feature model to obtain a second encryption completion feature and performs an operation on the second encryption completion feature according to a second preset operation rule to obtain a second completion feature;
receiving a second encryption feature model sent by the second terminal, predicting the missing features of the first sample according to the second encryption feature model to obtain first encryption completion features, wherein the second encryption feature model is obtained by training initial features of the second sample based on the feature intersection by the second terminal;
performing an operation on the first encryption completion feature according to a first preset operation rule to obtain a first completion feature;
receiving a second encryption labeling model sent by the second terminal, and predicting the labels missing from the first sample based on the second encryption labeling model, the initial features of the first sample and the first completion feature to obtain a first encryption completion label, wherein the second encryption labeling model is obtained by the second terminal through training according to the initial features of the second sample, the second completion feature and the initial labels of the second sample; and
performing an operation on the first encryption completion label according to a third preset operation rule to obtain a first completion label.
2. The sample completion method based on encryption migration learning according to claim 1, wherein the step of performing an operation on the first encryption completion feature according to the first preset operation rule to obtain the first completion feature comprises:
adding a first random mask to the first encryption completion feature to obtain a first encryption mask feature, and sending the first encryption mask feature to the second terminal so that the second terminal decrypts the first encryption mask feature to obtain a first mask feature;
and receiving a first mask feature sent by the second terminal, and subtracting the first random mask from the first mask feature to obtain a first completion feature.
3. The sample completion method based on encryption migration learning according to claim 1, wherein the step of performing an operation on the first encryption completion label according to the third preset operation rule to obtain the first completion label comprises:
adding a third random mask to the first encryption completion label to obtain a third encryption mask label, and sending the third encryption mask label to the second terminal, so that the second terminal decrypts the third encryption mask label to obtain a third mask label;
and receiving a third mask label sent by the second terminal, and subtracting the third random mask from the third mask label to obtain a first completion label.
4. The sample completion method based on encryption migration learning according to claim 1, wherein the step of determining a feature intersection of a first sample of the first terminal and a second sample of the second terminal, training an initial feature of the first sample based on the feature intersection to obtain a first feature model, and encrypting and sending the first feature model to the second terminal, so that the second terminal predicts the missing features of the second sample to obtain a second encryption completion feature and performs an operation on the second encryption completion feature according to a second preset operation rule to obtain a second completion feature comprises:
determining a feature intersection of a first sample of the first terminal and a second sample of a second terminal, training an initial feature of the first sample based on the feature intersection to obtain a first feature model, encrypting and sending the first feature model to the second terminal so that the second terminal predicts a missing feature of the second sample to obtain a second encrypted completion feature, adding a second random mask to the second encrypted completion feature to obtain a second encrypted mask feature, sending the second encrypted mask feature to the first terminal so that the first terminal decrypts the second encrypted mask feature to obtain a second mask feature, and subtracting the second random mask from the second mask feature to obtain a second completion feature when the second mask feature sent by the first terminal is received.
5. A sample completion method based on encryption migration learning is characterized in that the sample completion method based on encryption migration learning is applied to a second terminal, and comprises the following steps:
the second terminal receives an encrypted first feature model sent by the first terminal, predicts the missing features of a second sample according to the encrypted first feature model to obtain a second encryption completion feature, and performs an operation on the second encryption completion feature according to a second preset operation rule to obtain a second completion feature, wherein the first feature model is obtained by the first terminal determining a feature intersection of a first sample of the first terminal and the second sample of the second terminal and training the initial features of the first sample based on the feature intersection;
training the initial features of the second sample based on the feature intersection to obtain a second encryption feature model, and sending the second encryption feature model to the first terminal, so that the first terminal predicts the missing features of the first sample according to the second encryption feature model to obtain a first encryption completion feature and performs an operation on the first encryption completion feature according to a first preset operation rule to obtain a first completion feature; and
training a second encryption labeling model according to the initial features of the second sample, the second completion feature and the initial labels of the second sample, and sending the second encryption labeling model to the first terminal, so that the first terminal predicts the labels missing from the first sample based on the second encryption labeling model, the initial features of the first sample and the first completion feature to obtain a first encryption completion label and performs an operation on the first encryption completion label according to a third preset operation rule to obtain a first completion label.
6. The sample completion method based on encryption migration learning according to claim 5, wherein the step of performing an operation on the second encryption completion feature according to the second preset operation rule to obtain the second completion feature comprises:
adding a second random mask to the second encryption completion characteristic to obtain a second encryption mask characteristic, and sending the second encryption mask characteristic to the first terminal so that the first terminal decrypts the second encryption mask characteristic to obtain a second mask characteristic;
and when a second mask feature sent by the first terminal is received, subtracting the second random mask from the second mask feature to obtain a second completion feature.
7. The sample completion method based on encryption migration learning according to claim 5, wherein the step of training the initial features of the second sample based on the feature intersection to obtain a second encryption feature model, sending the second encryption feature model to the first terminal, so that the first terminal predicts the missing features of the first sample according to the second encryption feature model to obtain a first encryption completion feature, and performs an operation on the first encryption completion feature according to a first preset operation rule to obtain a first completion feature comprises:
training the initial features of the second sample based on the feature intersection to obtain a second encryption feature model, and sending the second encryption feature model to the first terminal, so that the first terminal predicts the missing features of the first sample according to the second encryption feature model to obtain a first encryption completion feature, adds a first random mask to the first encryption completion feature to obtain a first encryption mask feature, and sends the first encryption mask feature to the second terminal for the second terminal to decrypt the first encryption mask feature to obtain a first mask feature; and, upon receiving the first mask feature sent by the second terminal, the first terminal subtracts the first random mask from the first mask feature to obtain a first completion feature.
8. The sample completion method based on encryption migration learning according to claim 5, wherein the step of training a second encryption labeling model according to the initial features of the second sample, the second completion feature and the initial labels of the second sample, sending the second encryption labeling model to the first terminal, so that the first terminal predicts the labels missing from the first sample based on the second encryption labeling model, the initial features of the first sample and the first completion feature to obtain a first encryption completion label, and performs an operation on the first encryption completion label according to a third preset operation rule to obtain a first completion label comprises:
training a second encryption labeling model according to the initial features of the second sample, the second completion feature and the initial labels of the second sample, and sending the second encryption labeling model to the first terminal, so that the first terminal predicts the labels missing from the first sample based on the second encryption labeling model, the initial features of the first sample and the first completion feature to obtain a first encryption completion label, adds a third random mask to the first encryption completion label to obtain a third encryption mask label, and sends the third encryption mask label to the second terminal for the second terminal to decrypt the third encryption mask label to obtain a third mask label; and, upon receiving the third mask label sent by the second terminal, the first terminal subtracts the third random mask from the third mask label to obtain a first completion label.
9. A terminal, characterized in that the terminal comprises: a memory, a processor, and a sample completion program based on encryption migration learning that is stored on the memory and executable on the processor, wherein the sample completion program, when executed by the processor, implements the steps of the sample completion method based on encryption migration learning according to any one of claims 1 to 4.
10. A terminal, characterized in that the terminal comprises: a memory, a processor, and a sample completion program based on encryption migration learning that is stored on the memory and executable on the processor, wherein the sample completion program, when executed by the processor, implements the steps of the sample completion method based on encryption migration learning according to any one of claims 5 to 8.
11. A sample completion system based on encryption migration learning, characterized in that the system comprises: at least one first terminal and at least one second terminal, wherein the first terminal is the terminal of claim 9 and the second terminal is the terminal of claim 10.
12. A computer storage medium, characterized in that a sample completion program based on encryption migration learning is stored on the storage medium, and the sample completion program, when executed by a processor, implements the steps of the sample completion method based on encryption migration learning according to any one of claims 1 to 8.
CN201910153223.4A 2019-02-28 2019-02-28 Sample completion method, terminal, system and medium based on encryption migration learning Active CN109902742B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910153223.4A CN109902742B (en) 2019-02-28 2019-02-28 Sample completion method, terminal, system and medium based on encryption migration learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910153223.4A CN109902742B (en) 2019-02-28 2019-02-28 Sample completion method, terminal, system and medium based on encryption migration learning

Publications (2)

Publication Number Publication Date
CN109902742A CN109902742A (en) 2019-06-18
CN109902742B true CN109902742B (en) 2021-07-16

Family

ID=66945893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910153223.4A Active CN109902742B (en) 2019-02-28 2019-02-28 Sample completion method, terminal, system and medium based on encryption migration learning

Country Status (1)

Country Link
CN (1) CN109902742B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334815A (en) * 2019-07-10 2019-10-15 深圳前海微众银行股份有限公司 Label complementing method, terminal, device and storage medium based on cross validation
CN113781082B (en) * 2020-11-18 2023-04-07 京东城市(北京)数字科技有限公司 Method and device for correcting regional portrait, electronic equipment and readable storage medium
CN117156070B (en) * 2023-11-01 2024-01-02 江苏惟妙纺织科技有限公司 Intelligent parameter regulation and control method and system for embroidery machine

Citations (7)

Publication number Priority date Publication date Assignee Title
CN102881019A (en) * 2012-10-08 2013-01-16 江南大学 Fuzzy clustering image segmenting method with transfer learning function
CN107241182A (en) * 2017-06-29 2017-10-10 电子科技大学 A kind of secret protection hierarchy clustering method based on vectorial homomorphic cryptography
CN109101806A (en) * 2018-08-17 2018-12-28 浙江捷尚视觉科技股份有限公司 A kind of privacy portrait data mask method based on Style Transfer
CN109143199A (en) * 2018-11-09 2019-01-04 大连东软信息学院 Sea clutter small target detecting method based on transfer learning
CN109165725A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Neural network federation modeling method, equipment and storage medium based on transfer learning
CN109165515A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Model parameter acquisition methods, system and readable storage medium storing program for executing based on federation's study
CN109255444A (en) * 2018-08-10 2019-01-22 深圳前海微众银行股份有限公司 Federal modeling method, equipment and readable storage medium storing program for executing based on transfer learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
IL250948B (en) * 2017-03-05 2021-04-29 Verint Systems Ltd System and method for applying transfer learning to identification of user actions

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN102881019A (en) * 2012-10-08 2013-01-16 江南大学 Fuzzy clustering image segmenting method with transfer learning function
CN107241182A (en) * 2017-06-29 2017-10-10 电子科技大学 A kind of secret protection hierarchy clustering method based on vectorial homomorphic cryptography
CN109165725A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Neural network federation modeling method, equipment and storage medium based on transfer learning
CN109165515A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Model parameter acquisition methods, system and readable storage medium storing program for executing based on federation's study
CN109255444A (en) * 2018-08-10 2019-01-22 深圳前海微众银行股份有限公司 Federal modeling method, equipment and readable storage medium storing program for executing based on transfer learning
CN109101806A (en) * 2018-08-17 2018-12-28 浙江捷尚视觉科技股份有限公司 A kind of privacy portrait data mask method based on Style Transfer
CN109143199A (en) * 2018-11-09 2019-01-04 大连东软信息学院 Sea clutter small target detecting method based on transfer learning

Non-Patent Citations (2)

Title
"Secure Federated Transfer Learning";Yang Liu 等;《Secure Federated Transfer Learning》;20181208;全文 *
"多源迁移学习算法研究";严海锐;《中国优秀硕士学位论文全文数据库-信息科技辑》;20170215;第2017年卷(第2期);I140-277 *

Also Published As

Publication number Publication date
CN109902742A (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN109492420B (en) Model parameter training method, terminal, system and medium based on federal learning
CN109902742B (en) Sample completion method, terminal, system and medium based on encryption migration learning
CN109255444B (en) Federal modeling method and device based on transfer learning and readable storage medium
CN110633805A (en) Longitudinal federated learning system optimization method, device, equipment and readable storage medium
CN110110229B (en) Information recommendation method and device
WO2020248537A1 (en) Model parameter determination method and apparatus based on federated learning
CN109241770B (en) Information value calculation method and device based on homomorphic encryption and readable storage medium
CN113627085B (en) Transverse federal learning modeling optimization method, equipment and medium
CN109639643B (en) Block chain-based client manager information sharing method, electronic device and readable storage medium
CN112287372B (en) Method and apparatus for protecting clipboard privacy
CN106487747A (en) User identification method, system, device and processing method, device
CN109391611B (en) User personal information encryption authorization method, device, equipment and readable storage medium
CN114338016B (en) Hazardous waste block chain supervision system and method based on group key negotiation
CN113254947B (en) Vehicle data protection method, system, equipment and storage medium
CN111274611A (en) Data desensitization method, device and computer readable storage medium
CN111368196A (en) Model parameter updating method, device, equipment and readable storage medium
CN112785002A (en) Model construction optimization method, device, medium, and computer program product
CN108388806B (en) Thing networking safety is consolidated and data rights and interests protection device based on block chain
CN110969261B (en) Encryption algorithm-based model construction method and related equipment
CN111464655A (en) Block chain-based Internet of things data management method and system
CN111739190B (en) Vehicle diagnostic file encryption method, device, equipment and storage medium
CN111310047B (en) Information recommendation method, device and equipment based on FM model and storage medium
CN112329057A (en) Document management method, device, equipment and computer readable storage medium
CN111447206A (en) JS resource encryption transmission method and device, server and storage medium
CN116644472A (en) Data encryption and data decryption methods and devices, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant