CN112822005B - Secure transfer learning system based on homomorphic encryption - Google Patents
- Publication number: CN112822005B (application CN202110134461.8A)
- Authority
- CN
- China
- Prior art keywords
- encrypted
- sample
- algorithm
- encryption
- training
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/0838—Key agreement, i.e. key establishment technique in which a shared key is derived by parties as a function of information contributed by, or associated with, each of these
- H04L9/0841—Key agreement, i.e. key establishment technique in which a shared key is derived by parties as a function of information contributed by, or associated with, each of these involving Diffie-Hellman or related key agreement protocols
Abstract
The invention relates to a secure transfer learning system based on homomorphic encryption. The system designs encrypted TrAdaboost training and prediction algorithms based on a dual-cloud-server model (a storage cloud server and a computing cloud server), addressing the privacy disclosure problem of transfer machine learning in cloud outsourcing scenarios. On one hand, the source-domain and target-domain data owners of the system respectively upload encrypted training data to the cloud, and the cloud servers train a TrAdaboost model in a privacy-preserving manner; on the other hand, a requesting user of the system sends an encrypted data sample to the cloud server to request a secure prediction service, and the cloud server returns an encrypted classification result. The system leaks none of the users' (including data owners and prediction requesters) training and prediction-request data, training models, prediction results, or intermediate calculation results to the cloud or unauthorized users.
Description
Technical Field
The invention relates to a secure transfer learning system based on homomorphic encryption.
Background
Cloud computing delivers computing services (including data storage, computation, software, and data analysis) to individuals or companies, providing great convenience to users and reducing software construction and operating costs. Owing to its strong computing power and storage capacity, high reliability, and on-demand service, cloud computing is widely applied in practical fields such as big-data analysis, data backup, software development and testing, and management. Meanwhile, the rapid development of machine learning (including transfer learning) benefits from the support of cloud computing. Researchers or users outsource complex intelligent computing tasks to a cloud server, which performs the computation efficiently and returns the results, so that users can complete machine learning computations efficiently and stably on resource-limited personal computers or mobile devices.
In recent years, machine learning has been applied to many fields in real life, such as face and voice recognition, medical diagnosis, financial analysis, and smart homes. Transfer learning incorporates the idea of "learning by analogy" into machine learning and has gained more and more attention from researchers. It aims to transfer learned knowledge (i.e., source tasks) to new problems (i.e., target tasks) to assist the learning of the target tasks. In general, transfer learning is effective when the source domain (or source task) and the target domain (or target task) are different but related. For example, a transfer learning algorithm can reuse the shallow network parameters of a high-performance pre-trained model (e.g., an AlexNet or ResNet model) and adapt the deep layers to a user-customized classification task through training. Consider another real-world scenario: a medical research institution collects a large amount of labeled medical data, while a clinic has only a small amount of labeled data and some unlabeled data from its patients, and the medical data of the two institutions are related. In this case, if a classification model is trained only on the clinic's medical data, its accuracy is likely to be low. When transfer learning is used to train on the combined medical data of the two institutions (i.e., the research institution's data assists the clinic's model training), the quality of the classifier can be improved to a large extent. TrAdaboost, proposed in 2007, is a classical instance-based transfer learning algorithm that can be used to solve such problems.
The TrAdaboost algorithm borrows the core design idea of AdaBoost (an ensemble learning algorithm): it combines large-scale labeled data from the source domain with a small amount of labeled data from the target domain to train a well-performing classifier for the target task. Specifically, the algorithm optimizes the sub-classifiers round by round by adjusting the weight values of the training samples, and the final prediction result is determined by a weighted combination of the sub-classifiers.
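To make the training loop concrete, here is a plaintext (non-encrypted) sketch of instance-based TrAdaboost as described above, using a simple one-feature decision stump as the weak learner. The stump learner, the clamping of degenerate error rates, and the use of the later half of the rounds in the final vote are standard choices from the original algorithm, not details disclosed in this patent's ciphertext protocol.

```python
import math

def stump_train(X, y, p):
    """Weighted decision stump on 1-D features: predicts 1 iff sign*(x - theta) >= 0."""
    best = None
    for theta in sorted(set(X)):
        for sign in (1, -1):
            pred = [1 if sign * (x - theta) >= 0 else 0 for x in X]
            err = sum(pi * abs(hi - yi) for pi, hi, yi in zip(p, pred, y))
            if best is None or err < best[0]:
                best = (err, theta, sign)
    _, theta, sign = best
    return lambda x: 1 if sign * (x - theta) >= 0 else 0

def tradaboost(Xs, ys, Xt, yt, R=10):
    """Plaintext TrAdaboost: n source samples assist m target samples."""
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0 / n] * n + [1.0 / m] * m        # initial weights, as in step S1
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / R))  # constant source rate
    H = []                                    # (h_t, beta_t) from the later rounds
    for t in range(R):
        s = sum(w)
        p = [wi / s for wi in w]              # weight normalization (step S2)
        h = stump_train(X, y, p)
        # weighted error on the *target* samples only (step S3)
        wt_sum = sum(w[n:])
        eps = sum(w[n + i] * abs(h(Xt[i]) - yt[i]) for i in range(m)) / wt_sum
        eps = min(max(eps, 1e-6), 0.499)      # clamp the special cases of step S4
        beta_t = eps / (1.0 - eps)
        for i in range(n):                    # misclassified source weights shrink
            w[i] *= beta ** abs(h(Xs[i]) - ys[i])
        for i in range(m):                    # misclassified target weights grow
            w[n + i] *= beta_t ** (-abs(h(Xt[i]) - yt[i]))
        if t >= R // 2:
            H.append((h, beta_t))
    def h_f(x):                               # weighted vote of later sub-classifiers
        vote = sum(math.log(1.0 / bt) * h(x) for h, bt in H)
        thresh = 0.5 * sum(math.log(1.0 / bt) for _, bt in H)
        return 1 if vote >= thresh else 0
    return h_f
```

The secure system of this invention evaluates exactly this loop, but with each arithmetic step replaced by a ciphertext protocol between the two cloud servers.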
However, delegating the transfer learning task to a cloud server for computation introduces a serious challenge: the risk of privacy disclosure, since the data or results involved in the computation may contain sensitive user information such as personal images, financial information, or health conditions. In addition, transfer learning models are often viewed as important property of researchers, and their exposure may cause significant damage. Meanwhile, the cloud server is generally not trusted, and a system attacker may eavesdrop on the data transmission channel or attack the cloud server. It is therefore necessary to apply effective privacy-protection measures in outsourced computation tasks. To solve such problems, several mainstream security technologies have been used to ensure the security and privacy of cloud computing, including Secret Sharing (SS), Garbled Circuits (GCs), Differential Privacy (DP), hardware security, and Homomorphic Encryption (HE). Homomorphic encryption is characterized by the computability of ciphertexts (i.e., a computation on ciphertexts, after decryption, is equivalent to the corresponding computation on plaintexts) and provides an excellent solution for privacy-preserving cloud computing. Compared with other security technologies, homomorphic encryption can achieve a higher security level. However, homomorphic encryption has high overhead, natively supports only non-negative integer arithmetic, and cannot naturally support nonlinear operations (such as logarithms). How to design effective homomorphic encryption protocols that simultaneously meet the privacy, correctness, and efficiency requirements of transfer learning in cloud computing remains an open research problem.
With the help of different security technologies, privacy-protection schemes for transfer learning already exist. Some researchers have implemented federated transfer learning (FTL) computation while protecting the privacy of sensitive information. Liu et al. realized the secure training, prediction and cross-validation processes of FTL via Additive Homomorphic Encryption (AHE). In this scheme, data samples are kept separately by the two parties (i.e., the source-domain data owner and the target-domain data owner), and parameters are encrypted before data exchange. Sharma et al. implemented a similar secure FTL training process via SS technology. Both schemes can resist two threat models: semi-honest and malicious. In addition, Gao et al. proposed privacy-protection schemes for federated neural-network transfer learning (including training and prediction processes) based on AHE and SS technologies, respectively; using the same techniques, Gao et al. also designed a heterogeneous FTL system based on a secure logistic regression model. However, the above schemes involve multiple rounds of interaction between users and consume a large amount of local computing resources. The first FTL framework for wearable medical scenarios (using AHE technology) was proposed by Chen et al., but no detailed security design is given in that scheme. Since FTL supports only unidirectional transfer, the source-domain participants cannot benefit from the computation. To address this problem, Ma et al. devised a secure collaborative transfer learning scheme based on the SPDZ framework, in which the correctness of the system's results can be verified by a Message Authentication Code (MAC).
Liu et al. proposed a privacy-preserving multi-task learning scheme that uses AHE to encrypt users' model parameters and performs ciphertext computations through cloud services. However, in this scheme all users encrypt their respective data with the same public key; thus, as soon as one user is compromised, the privacy of all users is endangered. Unlike the encryption technology adopted by Liu et al., Xie et al. and Zhang et al. designed privacy-preserving multi-task learning systems using differential privacy. Differential privacy has also been used in other secure transfer learning schemes, such as the privacy-preserving domain adaptation scheme proposed by Wang et al. and the privacy-preserving hypothesis transfer learning scheme proposed by Yao et al. Wang et al. also designed a secure domain adaptation scheme based on adversarial learning, which adds Gaussian noise to the gradients to protect the confidentiality of private data. While differential-privacy-based schemes have significantly lower overhead than encryption-based schemes, the accuracy of their results is impaired to varying extents. To implement multi-party knowledge transfer, Ma et al. introduced a privacy-preserving cloud outsourcing framework for decision tree learning and prediction, whose core idea is to implement a similarity measure between decision trees over the encrypted domain.
Disclosure of Invention
The invention aims to provide a homomorphic-encryption-based secure transfer learning system that addresses problems of existing secure transfer learning schemes, such as residual security risks and inefficient ciphertext computation protocols, thereby realizing privacy-preserving TrAdaboost training and prediction while reducing the users' local overhead as much as possible.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a secure transfer learning system based on homomorphic encryption, comprising: a key generation center KGC, a cloud platform CP, a cloud service provider CSP, a source data owner SDO, a target data owner TDO and requesting users RUs;
a key generation center KGC responsible for initializing cryptographic system parameters and distributing public/private key pairs of system entities;
a cloud platform CP, which is responsible for receiving and storing training data from SDO and TDO and prediction request data from RUs, and performing partial computation of the system;
the CSP interacts with the CP and provides computing service for a transfer learning algorithm for protecting privacy; in addition, the CP and CSP jointly perform decryption and re-encryption operations;
a source data owner SDO, the SDO owning the tagged sample instance from the source domain, sending the encrypted data set to the CP as a source training data set of the system;
the target data owner TDO, which owns labeled sample instances and unlabeled sample instances from the target domain; the source and target domains involved in the system follow different distributions but are correlated; the TDO sends its encrypted data set to the CP as the system's target training data set, and the CP merges the encrypted source and target training data sets into a joint training data set;
and the requesting users RUs: after the CP and CSP finish constructing the TrAdaboost classifier for the target space, an RU sends an encrypted unlabeled sample from the target domain to the CP to request the related prediction computation; the encrypted prediction result returned from the CP can only be decrypted by the corresponding requesting user.
In an embodiment of the present invention, the key generation center KGC initializes the cryptosystem parameters and distributes the public/private key pairs of the system entities as follows:
(1) KGC generates the system parameters N and g of the Paillier-based homomorphic re-encryption system HRES and generates a public/private key pair for each entity in the system; specifically, KGC distributes the key pairs (pk_sdo, sk_sdo), (pk_tdo, sk_tdo), (pk_cp, sk_cp) and (pk_csp, sk_csp) to SDO, TDO, CP and CSP, respectively. In addition, KGC sets up a key repository to store the requesting users' public/private key pairs {(pk_i, sk_i)}, 1 ≤ i ≤ n_user, where n_user denotes the total number of system users, and distributes an unused key pair to each registered user; the key repository is updated by KGC during idle time or on demand;
(2) CP and CSP exchange their respective public keys and negotiate a Diffie-Hellman key PK = g^(sk_cp·sk_csp) mod N². Subsequently, PK is disclosed to SDO, TDO and the subsequent requesting users RUs as the system's global public key. By the properties of HRES, a message encrypted under PK can only be decrypted jointly by CP and CSP to recover the plaintext.
In an embodiment of the present invention, the Paillier algorithm based homomorphic re-encryption system HRES includes the following algorithms:
key generation KeyGen algorithm: given a security parameter k, two large primes p and q are chosen such that |p| = |q| = k, where |·| denotes bit length; then N = pq is computed and a group generator g is selected with order ord(g) = (p-1)(q-1)/2. User i's public/private key pair is (pk_i = g^(s_i) mod N², sk_i = s_i), where s_i ∈_R [1, λ(N²)] and λ(·) denotes the Euler function. Furthermore, assume two entities A and B possess the public/private key pairs (pk_A = g^a mod N², sk_A = a) and (pk_B = g^b mod N², sk_B = b); the public key obtained after Diffie-Hellman negotiation between the two is PK = pk_A^b = pk_B^a = g^(ab) mod N², and the corresponding joint decryption private keys are a and b, respectively. PK is taken as the global public key of the system, and the parameters g and N are published;
the encryption Enc algorithm: takes a message m ∈ Z_N and a public key pk_i as input, randomly selects r ∈ [1, N/4], and computes the ciphertext [[m]]_{pk_i} = (T, T') = (pk_i^r·(1+m·N) mod N², g^r mod N²), where T and T' are the first and second components of the ciphertext, respectively;
decryption Dec algorithm: uses the private key sk_i = s_i to decrypt the ciphertext [[m]]_{pk_i}: m = L(T/(T')^(s_i) mod N²), where L(u) = (u-1)/N;
the double-key encryption EncTK algorithm: the system global public key PK is used to encrypt the message; similarly to the Enc algorithm, given a plaintext message m ∈ Z_N, the ciphertext is [[m]]_PK = (T, T') = (PK^r·(1+m·N), g^r) (mod N²); to simplify the notation, [[m]]_PK is uniformly abbreviated as [[m]];
partial decryption PDec1 algorithm using sk_A: on input [[m]] = (T, T') and sk_A = a, the first stage of partial decryption computes T^(1) = T and T'^(1) = (T')^a mod N²;
partial decryption PDec2 algorithm using sk_B: on input the partially decrypted ciphertext (T^(1), T'^(1)) and sk_B = b, the second stage of partial decryption computes T'^(2) = (T'^(1))^b mod N² and recovers the plaintext message m: m = L(T^(1)/T'^(2) mod N²);
re-encryption first-stage FPRE algorithm: given a ciphertext [[m]], the private key sk_A = a and a user public key pk_j, the first-stage re-encryption computation is executed, outputting the partially re-encrypted ciphertext [[m]]^+;
re-encryption second-stage SPRE algorithm: given the partially re-encrypted ciphertext [[m]]^+, the private key sk_B = b and the user public key pk_j, the second-stage re-encryption computation is performed to obtain the ciphertext [[m]]_{pk_j} of the plaintext m under the public key pk_j; the ciphertext [[m]]_{pk_j} can only be decrypted by user j using the private key sk_j via the Dec algorithm.
In an embodiment of the present invention, the Paillier-based homomorphic re-encryption system HRES has the following additive homomorphic properties: [[m_1]]·[[m_2]] = [[m_1 + m_2 mod N]] and ([[m]])^k = [[k·m mod N]]; in particular, ([[m]])^(N-1) = [[-m mod N]] realizes negation over the encrypted domain, so that, for example, [[1]]·[[ex]]^(N-1) = [[1-ex]].
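As an illustration of the two-stage decryption, the following toy Python sketch instantiates the scheme with very small primes (completely insecure, for exposition only). The choice g = -a^(2N) mod N², the parameter sizes, and the variable names are assumptions for this sketch, not values fixed by the invention; the algebra of EncTK/PDec1/PDec2 and the additive homomorphism follow the algorithms above.

```python
import random

# Toy parameters -- far too small for real security (illustration only).
p, q = 467, 479                 # small safe primes; real |p|, |q| would be ~1024 bits
N = p * q
N2 = N * N
half_ord = (p - 1) * (q - 1) // 2

# Generator chosen as g = -a^{2N} mod N^2 (an assumption for this sketch).
a = 2
g = (-pow(a, 2 * N, N2)) % N2

def L(u):
    """The Paillier L function: L(u) = (u - 1) / N."""
    return (u - 1) // N

# CP and CSP key pairs and the negotiated global key PK = g^{ab} mod N^2.
sk_cp = random.randrange(1, half_ord)
sk_csp = random.randrange(1, half_ord)
PK = pow(g, sk_cp * sk_csp, N2)

def enc_tk(m, r=None):
    """EncTK: encrypt m under the global public key PK."""
    r = r or random.randrange(1, N // 4)
    return (pow(PK, r, N2) * (1 + m * N) % N2, pow(g, r, N2))

def pdec1(ct):
    """PDec1 (CP side): raise the second component to sk_cp."""
    T, T1 = ct
    return (T, pow(T1, sk_cp, N2))

def pdec2(ct):
    """PDec2 (CSP side): finish decryption with sk_csp and apply L."""
    T, T1 = ct
    return L(T * pow(pow(T1, sk_csp, N2), -1, N2) % N2)

def add(ct1, ct2):
    """Additive homomorphism: component-wise product encrypts m1 + m2 (mod N)."""
    return (ct1[0] * ct2[0] % N2, ct1[1] * ct2[1] % N2)
```

Note that decryption succeeds only after both exponentiations with sk_cp and sk_csp have been applied, matching the claim that PK-encrypted messages can be decrypted only by CP and CSP jointly.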
In an embodiment of the present invention, a training process of the TrAdaboost classifier is as follows:
The training of TrAdaboost is an R-round iterative process. First, preparation and preprocessing are carried out; then each round of TrAdaboost iterative training consists of four sub-steps: sample weight vector normalization, prediction error calculation, weight adjustment rate calculation, and sample weight update, as follows:
s1, algorithm preparation and preprocessing
First, SDO and TDO submit their respective encrypted data sets to the CP. Assume that SDO owns the source training data set D_S = {(x_1, y_1), ..., (x_n, y_n)} and TDO owns the target training data set D_T = {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})}. At SDO and TDO, every value in each feature vector and the corresponding label value in the data sets are first multiplied by a scaling factor L, that is, x_ij ← L·x_ij and y_i ← L·y_i, where 1 ≤ i ≤ n+m and d denotes the dimension of the feature vector (1 ≤ j ≤ d). After encryption with the system global public key PK, SDO and TDO send their respective encrypted data sets to the CP, i.e., [[D_S]]_PK = {([[x_1]]_PK, [[y_1]]_PK), ..., ([[x_n]]_PK, [[y_n]]_PK)} and [[D_T]]_PK = {([[x_{n+1}]]_PK, [[y_{n+1}]]_PK), ..., ([[x_{n+m}]]_PK, [[y_{n+m}]]_PK)}, where [[x_i]]_PK = ([[x_{i1}]]_PK, [[x_{i2}]]_PK, ..., [[x_{id}]]_PK) for 1 ≤ i ≤ n+m; to simplify the representation, [[·]]_PK is written as [[·]]. In addition, the sizes of the source and target training data sets, namely n and m, are also sent to the CP for storage. Upon receiving [[D_S]] and [[D_T]], the CP merges them into the joint training data set [[D]] = {([[x_1, y_1]]), ..., ([[x_{n+m}, y_{n+m}]])};
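Since HRES, like Paillier, operates on non-negative integers modulo N, real-valued features and labels must be fixed-point encoded before encryption; the scaling factor L in step S1 serves this purpose. A minimal sketch of such an encoding (the concrete scale and modulus below are assumptions for illustration, not the patent's parameters):

```python
SCALE = 10**6          # the scaling factor L from step S1 (value is an assumption)
N = 2**64              # stand-in modulus; a real Paillier N is a ~2048-bit product pq

def encode(x):
    """Fixed-point encode a real into Z_N: scale, round, map negatives to N - |v|."""
    v = round(x * SCALE)
    return v % N

def decode(v):
    """Inverse map: residues above N/2 represent negative values."""
    if v > N // 2:
        v -= N
    return v / SCALE
```

After training, the same factor must be divided out of any decrypted result (and products of two encoded values carry a factor of SCALE², which the secure division protocol has to account for).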
Subsequently, the CP initializes the sample weights of the joint training data set. Since n and m are respectively the sizes of the source and target training data sets, the CP sets the initial weight of each source sample to 1/n and the initial weight of each target sample to 1/m, and then forms the encrypted sample weight vector [[w^1]] = ([[w_1^1]], ..., [[w_{n+m}^1]]);
S2, sample weight vector normalization
In the t-th training iteration, the encrypted normalized sample weight vector [[p^t]] is obtained. First, the CP computes the sum of all weight values over the ciphertext domain: [[W^t]] = Π_{i=1}^{n+m} [[w_i^t]] = [[Σ_{i=1}^{n+m} w_i^t]].
Then, [[p^t]] is obtained by calling the secure division protocol SDiv n+m times: [[p_i^t]] = SDiv([[w_i^t]], [[W^t]]), 1 ≤ i ≤ n+m.
S3, calculating prediction error
Suppose the weak classifier of the t-th round, [[h_t]], has been trained; its output on an encrypted sample [[x]] is denoted [[h_t(x)]] and satisfies h_t(x) ∈ {0, 1}. The goal of this step is to compute the weighted prediction error of [[h_t]] on D_T. Since h_t(x_i), y_i ∈ {0, 1}, it is known that |h_t(x_i) - y_i| equals the XOR of h_t(x_i) and y_i; therefore, |h_t(x_i) - y_i| is first computed with the secure XOR protocol SXor. Because the sample-weight-update phase needs the error values of [[h_t]] on both the source and target samples, the encrypted prediction errors [[|h_t(x_i) - y_i|]] are computed for all samples, 1 ≤ i ≤ n+m, and the results are stored at the CP.
Next, through the secure multiplication protocol SMul and homomorphic additions, the encrypted prediction error rate [[∈_t]] is calculated: [[∈_t]] = Π_{i=n+1}^{n+m} SMul([[p_i^t]], [[|h_t(x_i) - y_i|]]) = [[Σ_{i=n+1}^{n+m} p_i^t·|h_t(x_i) - y_i|]].
S4, calculating weight adjustment rate
The weight adjustment rate controls the degree of updating of the sample weights; the adjustment rate beta of the weights of the source training samples is constant in each iteration; therefore, the value of β only needs to be calculated once during the whole training process of traadaboost and since the operands involved in the calculation are all public, it can be calculated in the plaintext domain:
Before computing the sample weight adjustment rate for D_T, i.e., β_t = ∈_t/(1-∈_t) = -1 + 1/(1-∈_t), the algorithm distinguishes three special cases according to the value of ∈_t. First, whether the conditions ∈_t ≥ 1/2, ∈_t = 0 or ∈_t = 1 hold is determined by the following computations, where ex_1 = 1 indicates that ∈_t ≥ 1/2 holds (otherwise ∈_t < 1/2); ex_2 = 1 indicates that ∈_t = 0 holds; and ex_3 = 1 indicates that ∈_t = 1 holds:
[[ex_1]] = SGE([[∈_t]], [[1/2]]);
[[ex_2]] = SETest([[∈_t]]);
[[ex_3]] = SETest([[1]]·[[∈_t]]^(N-1)) = SETest([[1-∈_t]]);
wherein SGE is the secure greater-than-or-equal comparison protocol and SETest is the secure ciphertext-plaintext equality test protocol;
Next, the algorithm computes the value of [[β_t]] over the ciphertext domain; the reciprocal 1/(1-∈_t) involved in β_t = -1 + 1/(1-∈_t) is evaluated with a secure reciprocal protocol.
The different values of 1-∈_t are discussed as follows: if ∈_t = 1, then ex_3 = 1 and 1-∈_t = 0, so the reciprocal 1/(1-∈_t) is undefined and β_t cannot be computed directly; if ∈_t ≠ 1, then ex_3 = 0 and 1/(1-∈_t) is well defined. For the special cases, when ∈_t ≥ 1/2, ∈_t = 1 or ∈_t = 0, β_t is set directly to the constant c_1, c_2 or c_3, respectively. Finally, [[β_t]] can be computed by combining the normal value with these constants, using the encrypted selector bits as follows:
[[S′_3]] = SMul([[1]]·[[ex_1]]^(N-1), [[1]]·[[ex_2]]^(N-1));
[[S″_3]] = SMul([[S′_3]], [[1]]·[[ex_3]]^(N-1)); i.e., S″_3 = (1-ex_1)(1-ex_2)(1-ex_3), which equals 1 exactly in the normal case.
S5, sample weight update
It is known that the weight-update strategies for the source data samples and the target data samples differ; moreover, a sample weight w_i^t needs to be updated only when the sample x_i is misclassified. Since the encrypted prediction errors [[|h_t(x_i) - y_i|]], 1 ≤ i ≤ n+m, have already been computed, the algorithm only needs to test over the encrypted domain, by invoking the SETest protocol, whether [[|h_t(x_i) - y_i|]] equals 0:
[[s]] = SETest([[|h_t(x_i) - y_i|]])
The sample weight vector is then updated by the following strategy, evaluated over the ciphertext domain with the help of [[s]]: for source samples (1 ≤ i ≤ n), w_i^{t+1} = w_i^t·β^(|h_t(x_i)-y_i|), so that misclassified source samples receive smaller weights; for target samples (n+1 ≤ i ≤ n+m), w_i^{t+1} = w_i^t·β_t^(-|h_t(x_i)-y_i|), so that misclassified target samples receive larger weights.
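The gating by the SETest bit can be mimicked in plaintext as follows; this is a mock-up of the selection trick (weight unchanged when the sample is classified correctly, multiplicative factor applied otherwise), not the actual ciphertext protocol:

```python
def gated_update(w, err_bit, factor):
    """Weight update gated by the SETest output: s = 1 when the sample was
    classified correctly (|h(x) - y| = 0), in which case the weight stays
    unchanged; otherwise the multiplicative factor is applied."""
    s = 1 if err_bit == 0 else 0
    return w * (s * 1 + (1 - s) * factor)
```

For a misclassified source sample the factor would be β < 1 (the weight shrinks); for a misclassified target sample the factor would be 1/β_t > 1 (the weight grows).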
in an embodiment of the present invention, a process of implementing encryption prediction by the tradoboost classifier is as follows:
When a requesting user RU_i wants to obtain the classification label of an unlabeled sample x from the target sample space, it first completes registration with the system and obtains a unique public/private key pair (pk_i, sk_i) as well as the system's global public key PK. To prevent leakage of its private sample data, the requesting user RU_i encrypts the request sample with PK as [[x]] and transmits it to the CP. After receiving the request data from RU_i, the CP and CSP apply the encrypted weak classifiers {[[h_t]]} and their influence factors {[[β_t]]}, ⌈R/2⌉ ≤ t ≤ R, to [[x]] to perform privacy-preserving TrAdaboost prediction. Specifically, the CP and CSP first alternately perform the encrypted weak-classifier computations on [[x]], obtain {[[h_t(x)]]}, and then compute the weighted prediction results:
[[l_t]] = SMul([[h_t(x)]], SNLog([[β_t]]))^(N-1)
wherein SMul is the secure multiplication protocol and SNLog is the secure natural logarithm protocol, so that l_t = h_t(x)·ln(1/β_t);
Subsequently, the cloud servers compute two decision parameters, [[left]] and [[right]], where [[left]] = [[½·Σ_{t=⌈R/2⌉}^{R} ln(1/β_t)]] and [[right]] = Π_{t=⌈R/2⌉}^{R} [[l_t]] = [[Σ_{t=⌈R/2⌉}^{R} l_t]].
Next, the cloud servers run the secure greater-than-or-equal comparison protocol SGE to compare [[left]] and [[right]]: if right ≥ left, the final classification result [[h_f(x)]] is set to [[1]]; otherwise, [[h_f(x)]] is set to [[0]]. Before returning the prediction result to the requesting user RU_i, the CP and CSP re-encrypt the classification result [[h_f(x)]] under RU_i's public key pk_i. After receiving the re-encrypted result [[h_f(x)]]_{pk_i}, the requesting user RU_i recovers the plaintext result with its own private key sk_i.
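The decision rule can be checked against the following plaintext equivalent of the encrypted vote (the encrypted version replaces the sums, products and comparison with SMul, SNLog, homomorphic additions and SGE):

```python
import math

def predict(classifiers, x):
    """Plaintext equivalent of the encrypted TrAdaboost vote (a sketch only;
    the real system evaluates this over HRES ciphertexts between CP and CSP).
    `classifiers` is a list of (h_t, beta_t) pairs from the later training rounds."""
    right = sum(h(x) * math.log(1.0 / bt) for h, bt in classifiers)  # sum of l_t
    left = 0.5 * sum(math.log(1.0 / bt) for _, bt in classifiers)
    return 1 if right >= left else 0
```

Each term ln(1/β_t) is positive when β_t < 1, so sub-classifiers with smaller error rates carry larger voting weight.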
In an embodiment of the present invention, the secure XOR protocol SXor is implemented as follows: for two encrypted bits [[m_1]] and [[m_2]], m_1, m_2 ∈ {0, 1}, the ciphertext XOR operation yields the encrypted bitwise XOR result [[u]]: if m_1 = m_2, then u = 0; otherwise, u = 1.
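The patent does not disclose SXor's internal construction, but for bits the XOR is exactly the arithmetic identity m_1 ⊕ m_2 = m_1 + m_2 - 2·m_1·m_2, which an additively homomorphic scheme can evaluate with a single secure multiplication plus homomorphic additions; a plaintext sketch of that identity:

```python
def sxor(m1, m2):
    """XOR of two bits using only addition and multiplication -- the form
    computable under additive HE: one SMul for the product m1*m2, then
    homomorphic additions and a negation."""
    return m1 + m2 - 2 * m1 * m2
```

Over ciphertexts this would read [[u]] = [[m_1]]·[[m_2]]·SMul([[m_1]], [[m_2]])^(N-2), using the negation property noted for HRES above.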
In an embodiment of the present invention, the secure ciphertext-plaintext equality test protocol SETest is implemented as follows: it tests the equality relationship between a ciphertext [[m]] and a plaintext k, where m is a real number in [0, 1] and k = 0; if u = 1, then m = 0; if u = 0, then m ≠ 0.
In an embodiment of the present invention, the secure natural logarithm protocol SNLog is implemented as follows: it takes a ciphertext [[x]] as input and outputs the encrypted natural-logarithm result [[ln(x)]]; since the logarithm operations involved in this system take input values in (0, 1), the input [[x]] of this protocol only needs to satisfy x ∈ (0, 1).
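One plausible way to realize such a protocol, shown here only as a plaintext sketch, is to evaluate a truncated Mercator series ln(x) = -Σ_{k≥1} (1-x)^k / k, which converges on (0, 1) and uses only the additions and multiplications available over the ciphertext domain; the series form and term count are assumptions, not the patent's disclosed construction:

```python
def snlog_approx(x, terms=50):
    """Polynomial approximation of ln(x) on (0, 1) via the Mercator series
    ln(x) = -sum_{k>=1} (1-x)^k / k. A secure protocol could evaluate such a
    low-degree polynomial with repeated SMul calls and homomorphic additions."""
    z = 1.0 - x
    return -sum(z ** k / k for k in range(1, terms + 1))
```

Convergence is fast for x near 1 and slow for x near 0, so a real protocol would likely fix the degree according to the required fixed-point precision.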
Compared with the prior art, the invention has the following beneficial effects:
1. Privacy-preserving TrAdaboost training. The system adopts a Paillier-based homomorphic re-encryption scheme (HRES) as its basic encryption system and allows two non-colluding cloud servers to perform privacy-preserving TrAdaboost model training on the source-domain and target-domain data sets (both in the encrypted domain). During model training, the two servers obtain no information about the private data (i.e., the training data sets, the final model results, and the intermediate calculation results). The encrypted training model parameters are stored at the cloud server and used to subsequently process sample prediction requests from requesting users.
2. Privacy-preserving TrAdaboost prediction. In the system, the requesting user uploads an encrypted data sample to the cloud server, and the cloud server computes the prediction result with the pre-trained model and finally returns it to the user. Owing to the encrypted-computation property of AHE, the two cloud servers perform the outsourced prediction computation over the encrypted domain through interaction and obtain an encrypted prediction result. Only the corresponding requesting user can decrypt the real prediction result with its own key.
3. Efficient ciphertext computation protocols. To further reduce the system's computational overhead, the system designs and implements three ciphertext computation protocols: a secure XOR protocol, a secure ciphertext-plaintext equality test protocol, and a secure natural logarithm protocol. These protocols perform secure operations on encrypted input values and output encrypted results, and they are more efficient than existing related protocols.
4. The users' local overhead is reduced as much as possible. On one hand, a data user of the system only needs to encrypt the data uploaded to the cloud or decrypt the encrypted results returned from the cloud, while the cloud servers with strong computing power execute the complex TrAdaboost training and prediction computations. On the other hand, the system minimizes the interaction cost between data users and the cloud servers: the owners of the source-domain and target-domain data only need to send the encrypted training data to the cloud, and a prediction-requesting user only needs to transmit the encrypted data sample to the cloud and wait for the cloud server to return the prediction result.
The application is as follows: the invention provides a secure transfer learning system based on an additively homomorphic re-encryption scheme. The system designs encrypted TrAdaboost training and prediction algorithms based on a dual-cloud-server model (a storage cloud server and a computing cloud server), addressing the privacy disclosure problem of transfer machine learning in cloud outsourcing scenarios. On one hand, the source-domain and target-domain data owners of the system respectively upload encrypted training data to the cloud, and the cloud servers train a TrAdaboost model in a privacy-preserving manner; on the other hand, a requesting user of the system sends an encrypted data sample to the cloud server to request a secure prediction service, and the cloud server returns an encrypted classification result. The system leaks none of the users' (including data owners and prediction requesters) training and prediction-request data, training models, prediction results, or intermediate calculation results to the cloud or unauthorized users.
Drawings
FIG. 1 is a system model of the present invention.
FIG. 2 is a flow chart of the system of the present invention.
Fig. 3 illustrates the secure TrAdaboost training phase of the present invention.
Fig. 4 shows the secure TrAdaboost prediction stage of the present invention.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention provides a secure transfer learning system based on homomorphic encryption, which comprises: a key generation center KGC, a cloud platform CP, a cloud service provider CSP, a source data owner SDO, a target data owner TDO and requesting users RUs;
a key generation center KGC responsible for initializing cryptographic system parameters and distributing public/private key pairs of system entities;
a cloud platform CP, which is responsible for receiving and storing training data from SDO and TDO and prediction request data from RUs, and performing partial computation of the system;
the CSP interacts with the CP and provides computing service for a migration learning algorithm for protecting privacy; in addition, the CP and CSP jointly perform decryption and re-encryption operations;
a source data owner SDO, the SDO owning a tagged sample instance from the source domain, sending the encrypted data set to the CP as a source training data set of the system;
target data owner TDO, TDO having tagged sample instances and untagged sample instances from target domains, the source and target domains involved in the system are distributed differently but have correlation; the TDO sends the encrypted data set to a CP to serve as a target training data set of the system, and the CP combines the encrypted source training data set and the target training data set to serve as a joint training data set;
and the request user RUs sends the encrypted unmarked sample from the target domain to the CP after the CP and the CSP finish the construction of the TrAdaboost classifier for the target space, requests related prediction calculation, and the encrypted prediction result returned from the CP can only be decrypted by the corresponding request user.
The following is a specific implementation of the present invention.
Fig. 1 is the system architecture of the present invention, which includes six entities, namely, a Key Generation Center (KGC), a Cloud Platform (CP), a Cloud Service Provider (CSP), a Source Data Owner (SDO), a Target Data Owner (TDO), and Requesting Users (RUs).
1. Key Generation Center (KGC): the KGC is a trusted authority responsible for initializing cryptographic system parameters and distributing the public/private key pairs of system entities.
2. Cloud Platform (CP): the CP has powerful storage and computation capabilities, and its task is to receive and store training data from SDO and TDO and prediction request data from RU, and perform part of the computation of the system.
3. Cloud Service Provider (CSP): the CSP interacts with the CP to provide computing service for the migration learning algorithm for protecting privacy. In addition, the CP and CSP jointly perform decryption and re-encryption operations.
4. Source Data Owner (SDO): the SDO has enough instances of tagged samples (from the source domain) to send encrypted data samples to the CP as source training data for the system.
5. Target Data Owner (TDO): TDO has a small number of labeled sample instances and unlabeled sample instances (from the target domain), requiring that the source and target domains involved in the system are distributed differently but have correlation. After the TDO sends the encrypted data set to the CP, the CP merges the encrypted source and target data sets as a joint training data set.
6. Requesting Users (RUs): after the cloud servers complete the construction of the TrAdaboost classifier (for the target space), a user sends an encrypted unlabeled sample (from the target domain) to the CP and requests the relevant prediction calculation. The encrypted prediction result returned from the CP can only be decrypted by the corresponding requesting user.
Table 1 lists the important symbols used in the present invention.
Table 1: symbols in the invention
1. First, some of the algorithms employed by the present invention are introduced:
1.1 TrAdaboost algorithm
The main algorithm adopted by the invention is the TrAdaboost algorithm, which is specifically explained as follows:
TrAdaboost is a classical instance-based transfer learning algorithm. Assume a large labeled source data set D_S = {(x_1, y_1), ..., (x_n, y_n)} and a small labeled target data set D_T = {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})} (labels y_i ∈ {0,1}); the two data sets have different distributions but some similarity. The TrAdaboost algorithm extends the idea of the Adaboost algorithm; its purpose is to use the joint data set D = D_S ∪ D_T to construct a good classifier for the target domain space. The algorithm sets a weight for each source and target training sample and adjusts the values of the weight vector in turn during the iterative training process. Specifically, in each round of TrAdaboost training, the weights of instances that are favorable to the target classification task are increased, and the weights of the opposite instances are decreased. After training finishes, the algorithm has obtained R weak classifiers (where R is the number of iterations of the algorithm). The final classification hypothesis is determined by the weighted classification results of the weak classifiers from the second half of the training iterations. The overall procedure of the TrAdaboost algorithm is shown below (vector data is written in bold to distinguish it from scalar data, e.g., w = (w_1, ..., w_{n+m})):
(1) Initialize the weight vector w of the training examples: specify the initial value of the weight vector w of the joint data samples. For example, the initial weights of the source samples and the target samples may be set according to the sizes of the respective data sets, that is, 1/n for each source sample and 1/m for each target sample.
the following is a repetition of the iterative process of the training algorithm (i.e., stage 2 to stage 6) R times (with t representing the current iteration round):
(3) Train a weak classifier: using the joint training examples (taking into account their weight distribution p_t), train an output hypothesis h_t: X → Y, where X is the instance space of D and Y ∈ {0,1} represents the set of classification labels.
(4) Calculate the prediction error of h_t on D_T: given an instance x on the D_T domain, the classifier prediction is h_t(x) and the true label is c(x). The total error ∈_t of classifier h_t is calculated as the weighted misclassification rate over the target samples:
(5) Calculate the weight adjustment rates of the source and target training samples: the weight adjustment rate of D_S (i.e., β) depends only on n and R and thus remains constant throughout the algorithm. The weight adjustment rate of D_T (i.e., β_t) depends on ∈_t and is updated in each round.
It is worth noting that when ∈_t takes a special value, i.e., (1) ∈_t ≥ 1/2, (2) ∈_t = 1 or (3) ∈_t = 0, the algorithm computes the value of β_t separately. For example, β_t may be set to a suitable constant.
(6) Update the weight vector. If a source example x_i satisfies h_t(x_i) ≠ c(x_i) (i ∈ {1, ..., n}), the algorithm decreases the weight corresponding to x_i. Conversely, if a target example x_i satisfies h_t(x_i) ≠ c(x_i) (i ∈ {n+1, ..., n+m}), the weight corresponding to x_i is increased. Otherwise, the weight of the sample is unchanged.
(7) Output the final hypothesis. When the iterative training process finishes, the weak classifiers trained in the second half of the iterations (i.e., {h_t}, t = ⌈R/2⌉, ..., R) are combined to obtain the final classifier (for the target domain). The contribution of a weak classifier h_t depends on the value of β_t (so β_t is also called the influence factor of h_t).
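As a reference for the encrypted version described later, steps (1)-(7) above can be sketched in plaintext Python. The one-feature decision stump used as the weak learner and the clipping of ∈_t to avoid the special cases are illustrative assumptions, not part of the patent:

```python
import math

def train_stump(X, y, w):
    # illustrative weak learner (step (3)): a one-feature threshold stump that
    # minimizes the weighted error under distribution w; the patent instead
    # assumes any secure base classifier (e.g., SVM or LR)
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (0, 1):
                pred = [pol if x[f] >= thr else 1 - pol for x in X]
                err = sum(wi for wi, pr, yi in zip(w, pred, y) if pr != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    _, f, thr, pol = best
    return lambda x: pol if x[f] >= thr else 1 - pol

def tradaboost(Xs, ys, Xt, yt, R):
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0 / n] * n + [1.0 / m] * m                      # step (1)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / R))  # source rate (step (5))
    clfs = []
    for _ in range(R):
        s = sum(w)
        p = [wi / s for wi in w]                           # step (2): normalize
        h = train_stump(X, y, p)                           # step (3)
        errs = [abs(h(xi) - yi) for xi, yi in zip(X, y)]
        wt = sum(p[n:])                                    # step (4): target error
        eps = sum(pi * e for pi, e in zip(p[n:], errs[n:])) / wt
        eps = min(max(eps, 1e-9), 0.499)                   # clip the special cases
        beta_t = eps / (1.0 - eps)                         # target rate (step (5))
        for i in range(n):                                 # step (6): shrink source
            w[i] *= beta ** errs[i]
        for i in range(n, n + m):                          # grow misclassified target
            w[i] *= beta_t ** (-errs[i])
        clfs.append((h, beta_t))
    half = clfs[len(clfs) // 2:]                           # step (7): second half
    def predict(x):
        vote = sum(math.log(1.0 / bt) * hc(x) for hc, bt in half)
        thresh = 0.5 * sum(math.log(1.0 / bt) for _, bt in half)
        return 1 if vote >= thresh else 0
    return predict
```

On a toy task the learned classifier recovers the separating threshold, e.g. `tradaboost([[0],[1],[2],[3]], [0,0,1,1], [[0],[3]], [0,1], 4)` yields a predictor mapping [0] to 0 and [3] to 1.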
1.2 homomorphic re-encryption system based on Paillier algorithm
The invention utilizes a homomorphic re-encryption system (HRES) based on the Paillier algorithm as the underlying cryptographic algorithm. The cryptosystem comprises the following algorithms: key generation (KeyGen), encryption (Enc), decryption (Dec), double-key encryption (EncTK), partial decryption using sk_A (PDec1), partial decryption using sk_B (PDec2), re-encryption first stage (FPRE) and re-encryption second stage (SPRE).
(1) Key generation (KeyGen): given a security parameter k, two large primes p and q are chosen such that |p| = |q| = k. Then N = pq is calculated and a generator g of maximal order is selected. The public/private key pair of user i is (pk_i = g^{s_i} mod N^2, sk_i = s_i), where s_i ∈_R [1, λ(N^2)] (λ(·) denotes the Euler function). Furthermore, assume that two entities A and B possess public/private key pairs (pk_A = g^a mod N^2, sk_A = a) and (pk_B = g^b mod N^2, sk_B = b), respectively; the public key obtained after Diffie-Hellman negotiation between the two is PK = g^{ab} mod N^2, and the corresponding joint decryption private keys are a and b, respectively. In the system, PK is used as the system global public key. The parameters g and N are published.
(2) Encryption (Enc): the algorithm takes a message m ∈ Z_N and the public key pk_i as input, randomly selects r ∈ Z*_N, and computes the ciphertext [m]_{pk_i} = (T, T') = {pk_i^r · (1 + m·N), g^r} (mod N^2).
(3) Decryption (Dec): using the private key sk_i, the ciphertext [m]_{pk_i} is decrypted as m = L(T/(T')^{s_i} mod N^2), where L(u) = (u − 1)/N.
(4) Double-key encryption (EncTK): to avoid processing operations between ciphertexts based on different public keys, the system global public key PK is chosen instead of the user's own public key pk_i to encrypt the message. Similar to the Enc algorithm, for a given plaintext message m ∈ Z_N the ciphertext is [[m]]_PK = (T, T') = {PK^r · (1 + m·N), g^r} (mod N^2). For simplicity of expression, [[m]]_PK is uniformly abbreviated herein as [[m]].
(5) Partial decryption using sk_A (PDec1): given [[m]] = (T, T') and sk_A = a as input, the first stage of partial decryption computes the partially decrypted ciphertext (T^{(1)}, T'^{(1)}) = (T, (T')^a mod N^2).
(6) Partial decryption using sk_B (PDec2): given the partially decrypted ciphertext (T^{(1)}, T'^{(1)}) and sk_B = b as input, the second stage of partial decryption computes T'^{(2)} = (T'^{(1)})^b mod N^2, thereby obtaining the plaintext message m:

m = L(T^{(1)}/T'^{(2)} mod N^2)
(7) Re-encryption first stage (FPRE): given a ciphertext [[m]], the private key sk_A and a user public key pk_j, the first-stage re-encryption calculation is executed to obtain the partially re-encrypted ciphertext [m]^+.
(8) Re-encryption second stage (SPRE): given the partially re-encrypted ciphertext [m]^+, the private key sk_B and the user public key pk_j, the second-stage re-encryption calculation is performed to obtain the ciphertext [m]_{pk_j} of the plaintext m under the public key pk_j.

The ciphertext [m]_{pk_j} can only be decrypted by user j using the private key sk_j and executing the Dec algorithm.
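The EncTK/PDec1/PDec2 pipeline above fits in a short runnable sketch. The tiny primes, the choice g = 2 and the random key shares below are illustrative assumptions for demonstration only; they do not constitute a secure parameter choice:

```python
import random

# toy parameters for illustration only; a real deployment uses large primes
p, q = 1019, 1031
N = p * q
N2 = N * N
g = 2                                  # illustrative generator choice

def L_func(u):                         # the Paillier L-function: L(u) = (u - 1)/N
    return (u - 1) // N

a = random.randrange(2, N)             # sk_A (e.g., CP's share)
b = random.randrange(2, N)             # sk_B (e.g., CSP's share)
PK = pow(g, a * b, N2)                 # Diffie-Hellman joint public key

def enc_tk(m):                         # EncTK: [[m]] = (PK^r (1+mN), g^r) mod N^2
    r = random.randrange(1, N)
    return (pow(PK, r, N2) * (1 + m * N) % N2, pow(g, r, N2))

def pdec1(ct):                         # PDec1: raise T' to the power sk_A
    T, Tp = ct
    return (T, pow(Tp, a, N2))

def pdec2(ct):                         # PDec2: finish with sk_B and recover m
    T1, Tp1 = ct
    Tp2 = pow(Tp1, b, N2)              # (g^r)^(ab) = PK^r
    return L_func(T1 * pow(Tp2, -1, N2) % N2)

def add(c1, c2):                       # additive homomorphism of the scheme
    return (c1[0] * c2[0] % N2, c1[1] * c2[1] % N2)
```

Multiplying two ciphertexts component-wise adds the underlying plaintexts, which is the additive-homomorphic property the protocols in the next section rely on.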
Further, HRES satisfies the following characteristics:
1.3 privacy protection protocol
The present invention utilizes the following protocols as basic privacy-preserving algorithms. [[·]]_PK denotes a ciphertext encrypted with the global public key PK; for ease of presentation, [[·]]_PK is uniformly written as [[·]] in the following description. It is worth mentioning that, since HRES only supports computation over the non-negative integer domain, the system performs the following pre-processing to be compatible with fractional or negative operations. First, the system uniformly multiplies plaintext data by a fixed scaling factor and rounds the result to obtain the operation value; for example, with the scaling factor L = 10^5, the original fraction 0.003343 is converted to 334. Second, since the plaintext input domain of HRES is Z_N, the scheme uses data in the range (0, N/2] to represent positive numbers and data in the range (N/2, N) to represent negative numbers. Given [[X]], the Secure Scaling-down Protocol (SSDown) outputs [[X/L]], and the Secure Reciprocal Protocol (SRec) outputs [[1/X]]. Given [[X]] and [[Y]], the Secure Multiplication Protocol (SMul) outputs [[X·Y]]; the Secure Division Protocol (SDiv) outputs [[X/Y]]; and the Greater-than-or-Equal Secure Comparison Protocol (SGE) outputs [[u]] ← SGE([[X]], [[Y]]), where u = 1 when X ≥ Y and u = 0 when X < Y.
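A minimal sketch of this fixed-point pre-processing (the toy modulus N below is an illustrative assumption; any HRES modulus works the same way):

```python
N = 1019 * 1031        # toy modulus for illustration; a real N is much larger
L_FACTOR = 10 ** 5     # the scaling factor L from the text

def encode(x):
    # scale by L, truncate to an integer, and map negatives into (N/2, N)
    return int(x * L_FACTOR) % N

def decode(v):
    # values above N/2 represent negative numbers
    if v > N // 2:
        v -= N
    return v / L_FACTOR
```

With L = 10^5, the text's example 0.003343 encodes to 334, and a negative value round-trips through the upper half of Z_N.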
1.4 improved ciphertext computing protocol
The invention provides three ciphertext computing protocols: a Secure Exclusive-OR Protocol (SXor), a Secure Ciphertext-Plaintext Equality Test Protocol (SETest), and a Secure Natural Logarithm Protocol (SNLog).
1.4.1 secure XOR protocol
The secure XOR protocol SXor operates on two encrypted bits [[m_1]] and [[m_2]] (m_1, m_2 ∈ {0,1}) to obtain the encrypted bit-XOR result [[u]]. If m_1 = m_2, then u = 0; otherwise, u = 1. The protocol description is shown in Algorithm 1.
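Algorithm 1 itself is not reproduced here, but protocols of this kind commonly rely on the arithmetic identity m_1 ⊕ m_2 = m_1 + m_2 − 2·m_1·m_2, which additive HE supports with a single secure multiplication. A plaintext check of that identity (an assumption about the protocol's internals, not the patent's construction):

```python
def xor_arith(m1, m2):
    # for bits, m1 XOR m2 = m1 + m2 - 2*m1*m2; under additive HE the sum is a
    # ciphertext product, and only the m1*m2 term costs one SMul interaction
    return m1 + m2 - 2 * m1 * m2

# the identity holds on all four bit combinations
assert all(xor_arith(x, y) == (x ^ y) for x in (0, 1) for y in (0, 1))
```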
1.4.2 secure ciphertext-plaintext equality test protocol
The secure ciphertext-plaintext equality test protocol SETest tests the equality relation between a ciphertext [[m]] and the plaintext constant k = 0 (where m is a real number in [0,1]), outputting an encrypted bit [[u]]. If u = 1, then m = 0; if u = 0, then m ≠ 0. The protocol description is shown in Algorithm 2.
1.4.3 secure Natural logarithm protocol
The secure natural logarithm protocol SNLog takes the ciphertext [[x]] as input and outputs the encrypted natural-logarithm result [[ln(x)]]. In particular, since the logarithm inputs involved in the system lie in the range (0, 1], the input value [[x]] of the protocol need only satisfy x ∈ (0, 1].
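The internals of SNLog are given in Algorithm 3 (not reproduced here). For inputs restricted to (0, 1], a natural candidate building block is the truncated series ln(x) = −Σ_{k≥1} (1−x)^k / k, since powers and scaled sums map directly onto SMul calls and ciphertext additions. A plaintext sketch under that assumption:

```python
import math

def ln_series(x, terms=40):
    # truncated Mercator series for ln(x), valid on (0, 1]; convergence slows
    # as x approaches 0, so the term count would be tuned to the input range
    z = 1.0 - x
    acc, power = 0.0, 1.0
    for k in range(1, terms + 1):
        power *= z
        acc += power / k
    return -acc
```

For x not too close to 0 the truncation error is negligible, e.g. `ln_series(0.5)` agrees with `math.log(0.5)` to well below the system's fixed-point precision.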
2. System flow
2.1 System overview
The system of the invention consists of the following three stages (Fig. 2): privacy-preserving weak classifier training, privacy-preserving TrAdaboost training, and privacy-preserving TrAdaboost prediction. The private data involved in the algorithm (e.g., the training data sets, classifier models, request data, prediction results, or intermediate calculation results) cannot be obtained by other entities, whether in the training or the prediction phase.
Privacy-preserving weak classifier training. In each round of iterative TrAdaboost training, the algorithm trains a weak classifier according to the current source and target data sets (and their sample weight distributions). To protect privacy, the training process should be performed in the encrypted domain. Many secure base-classifier training schemes based on homomorphic encryption algorithms (especially additive homomorphism) already exist, such as Support Vector Machine (SVM) training or Logistic Regression (LR) training; therefore, the invention does not present an additional security algorithm for weak classifier training. After the weak classifier training is completed, the algorithm returns the encrypted model parameters for the subsequent calculation of secure TrAdaboost training.
Privacy-preserving TrAdaboost training. In each round of secure TrAdaboost training, the algorithm first trains the base classifier using the joint training samples (including the source and target data sets) and their weight distributions, and calculates its weighted prediction error. The algorithm then updates the misclassified sample weights for the next iteration (in a privacy-preserving manner). After each training round, the trained encrypted weak classifier model and its influence factor are saved; the base classifiers from the second half of the iterative training will be used for the final TrAdaboost prediction.
Privacy-preserving TrAdaboost prediction. After receiving the encrypted sample from a requesting user, the CP and CSP perform the privacy-preserving TrAdaboost prediction calculation in a cooperative, interactive manner. The final data classification result depends on a weighted combination of the weak classifier predictions. Finally, the cloud server returns the re-encrypted prediction value to the requesting user; the plaintext value of the prediction result can only be decrypted by the requesting user with the corresponding private key.
2.2 System initialization
The initialization task of the system is performed by the KGC, including generating parameters for the cryptographic system and distributing the public/private key pair to all entities in the system. In addition, the system global public key is generated by the negotiation between the CP and the CSP. The detailed description is as follows:
(1) KGC executes the KeyGen algorithm to generate the cryptographic system parameters of HRES, e.g., N and g. At the same time, KGC generates respective public/private key pairs for all entities in the system. Specifically, KGC distributes the key pairs (pk_sdo, sk_sdo), (pk_tdo, sk_tdo), (pk_cp, sk_cp) and (pk_csp, sk_csp) for SDO, TDO, CP and CSP, respectively. In addition, the KGC maintains a key store for the public/private key pairs of the requesting users (i.e., the KGC stores the key pairs {(pk_i, sk_i)}, i = 1, ..., n_user, where n_user may be optionally specified) and assigns an unused key pair to each registered user. The key store is updated by the KGC at idle time or when needed.
(2) CP and CSP exchange their respective public keys and negotiate the Diffie-Hellman key PK = g^{ab} mod N^2. The PK is then published as the system's global public key to the SDO, TDO, and subsequent requesting users RUs. From the characteristics of HRES, messages encrypted under PK can only be decrypted jointly by CP and CSP to recover the plaintext.
2.3 secure TrAdaboost training phase
As shown in fig. 3, the training phase of the secure TrAdaboost is an R-round iterative process. The algorithm first performs preparation and preprocessing work. Subsequently, each round of TrAdaboost iterative training consists of four sub-blocks, including sample weight vector normalization, prediction error calculation, weight adjustment rate calculation and sample weight update. Algorithm 4 gives the algorithm flow of the secure TrAdaboost training scheme.
S1, algorithm preparation and preprocessing
First, the SDO and TDO submit their respective encrypted training sample sets to the CP. Suppose that SDO owns the source data set D_S = {(x_1, y_1), ..., (x_n, y_n)} and TDO owns the target data set D_T = {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})}. On the SDO and TDO side, each value contained in the feature vectors of the data sets and the corresponding label values are first multiplied by a scaling factor L (L is specified by the system), i.e., x_ij ← L·x_ij and y_i ← L·y_i, where 1 ≤ i ≤ n+m and d denotes the size of the feature vector (1 ≤ j ≤ d). After encryption with the system global public key PK, SDO and TDO send the respective encrypted training sets to CP, i.e., [[D_S]]_PK = {([[x_1]]_PK, [[y_1]]_PK), ..., ([[x_n]]_PK, [[y_n]]_PK)} and [[D_T]]_PK = {([[x_{n+1}]]_PK, [[y_{n+1}]]_PK), ..., ([[x_{n+m}]]_PK, [[y_{n+m}]]_PK)}, where [[x_i]]_PK = ([[x_{i1}]]_PK, [[x_{i2}]]_PK, ..., [[x_{id}]]_PK) and 1 ≤ i ≤ n+m. For simplicity of presentation, PK is omitted in the following description, i.e., [[·]]_PK is written as [[·]]. In addition, the sizes of the source and target data sets (i.e., n and m) are also sent to the CP for storage, respectively. Upon receiving [[D_S]] and [[D_T]], the CP merges them into the joint training data set [[D]] = {([[x_1, y_1]]), ..., ([[x_{n+m}, y_{n+m}]])}.
Subsequently, the CP initializes the sample weights of the joint training data set (the initialization strategy may be specified by the TDO). Assume the initial weight values are determined by the sizes of the source and target data sets, respectively: the CP sets the initial weight of each source sample to 1/n and the initial weight of each target sample to 1/m, and then computes the encrypted sample weight vector [[w^1]] = ([[w_1^1]], ..., [[w_{n+m}^1]]).
S2 sample weight vector normalization
In the t-th iteration of training, the algorithm obtains the encrypted normalized sample weight vector [[p^t]]. First, the CP computes the sum of all weight values over the ciphertext domain by multiplying the encrypted weights, i.e., [[S_w]] = Π_{i=1}^{n+m} [[w_i^t]] = [[Σ_{i=1}^{n+m} w_i^t]].

Then, [[p^t]] is obtained by calling the SDiv algorithm n+m times: [[p_i^t]] = SDiv([[w_i^t]], [[S_w]]), 1 ≤ i ≤ n+m.
S3, calculating prediction error
Suppose the weak classifier [[h_t]] of the t-th round has been trained; its application to an encrypted sample [[x]] is denoted [[h_t(x)]], with h_t(x) ∈ {0,1}. The goal of the algorithm is to calculate the weighted prediction error of [[h_t]] on D_T. It is known that |h_t(x_i) − y_i| equals the XOR of h_t(x_i) and y_i. Therefore, |h_t(x_i) − y_i| is first calculated using the SXor algorithm (since the sample weight update phase needs the error values of [[h_t]] on both the source and target samples, in this calculation the encrypted prediction error is computed for both the source and target samples and the result is stored at the CP side): [[|h_t(x_i) − y_i|]] = SXor([[h_t(x_i)]], [[y_i]]), 1 ≤ i ≤ n+m.
Next, the encrypted prediction error rate [[∈_t]] is calculated over the target samples by ciphertext multiplication and addition operations.
S4, calculating weight adjustment rate
The weight adjustment rate controls the degree of the sample weight update. The adjustment rate β of the source training sample weights is constant in each iteration; therefore, the value of β only needs to be calculated once during the entire TrAdaboost training process (and since the operands involved in the calculation are public, it is calculated in the plaintext domain).
Before calculating the sample weight adjustment rate of D_T (i.e., β_t = ∈_t/(1 − ∈_t) = −1 + 1/(1 − ∈_t)), the algorithm separately considers the calculation under three special conditions (based on different values of ∈_t). First, it is determined by the following calculations whether the condition ∈_t ≥ 1/2, ∈_t = 0 or ∈_t = 1 holds (where ∈_t is encrypted). If ex_1 = 1, the condition ∈_t ≥ 1/2 holds; otherwise ∈_t < 1/2. If ex_2 = 1, the condition ∈_t = 0 holds; if ex_3 = 1, the condition ∈_t = 1 holds.
[[ex_1]] = SGE([[∈_t]], [[1/2]]);

[[ex_2]] = SETest([[∈_t]]);

[[ex_3]] = SETest([[1]]·[[∈_t]]^{N−1}) = SETest([[1 − ∈_t]])
Next, the algorithm calculates the value of [[β_t]] (while avoiding the case where the denominator of the calculation may be 0).
If ∈_t = 1, the denominator 1 − ∈_t of β_t is 0, so the normal formula cannot be applied; the flags ex_1, ex_2 and ex_3 computed above are used to exclude the special cases. Suppose that when ∈_t ≥ 1/2 (or ∈_t = 1, ∈_t = 0), β_t is set directly to a constant c_1 (or c_2, c_3), e.g., c_1 = 0.5 (or c_2 = 0.4, c_3 = 0.99). Finally, [[β_t]] can be calculated as follows:
[[S'_3]] = SMul([[1]]·[[ex_1]]^{N−1}, [[1]]·[[ex_2]]^{N−1});

[[S''_3]] = SMul([[S'_3]], [[1]]·[[ex_3]]^{N−1});
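In plaintext, the guarded selection of β_t can be mirrored as follows (the branch order and the constants c_1, c_2, c_3 follow the example values given in the text):

```python
def beta_t(eps, c1=0.5, c2=0.4, c3=0.99):
    # mirrors the encrypted flags: ex1 = [eps >= 1/2], ex2 = [eps == 0],
    # ex3 = [eps == 1]; the normal formula applies only when all flags are 0
    if eps == 1:           # ex3: the denominator 1 - eps would be 0
        return c2
    if eps == 0:           # ex2
        return c3
    if eps >= 0.5:         # ex1
        return c1
    return eps / (1.0 - eps)
```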
S5, sample weight update
It is known that the strategies for updating the weight values of the source data samples and the target data samples are different. In addition, only when a sample x_i is misclassified does its corresponding sample weight w_i need to be updated. Since the encrypted prediction errors [[|h_t(x_i) − y_i|]], 1 ≤ i ≤ n+m, have already been calculated, the algorithm only needs to test over the encrypted domain whether [[|h_t(x_i) − y_i|]] is equal to 0 by calling the SETest protocol:
[[s]] = SETest([[|h_t(x_i) − y_i|]])
the sample weight vector is then updated by the following strategy:
2.4 secure TrAdaboost prediction phase
Fig. 4 depicts the algorithm flow of the privacy-preserving TrAdaboost prediction phase. When a user (denoted RU_i) requests a class label for his/her unlabeled sample x (x from the target sample space), he/she first completes registration on the system and obtains a unique public/private key pair (pk_i, sk_i) and the global public key PK of the system. To prevent leakage of the private sample data, the user RU_i encrypts the request sample with PK as [[x]] and transmits it to the CP. After receiving the request data from RU_i, the CP, jointly with the CSP, performs privacy-preserving TrAdaboost prediction on [[x]] using the encrypted weak classifiers {[[h_t]]} and their influence factors {[[β_t]]}, where t = ⌈R/2⌉, ..., R. Specifically, CP and CSP first alternately perform encrypted weak classifier computation on [[x]] and obtain {[[h_t(x)]]}. Subsequently, the cloud servers calculate two encrypted decision parameters (i.e., [[left]] and [[right]]), wherein:
Next, the cloud servers run the SGE protocol to compare [[left]] and [[right]]. If right ≥ left, the final classification result [[h_f(x)]] of the model is set to [[1]]; otherwise, [[h_f(x)]] is set to [[0]]. Before returning the prediction result to user RU_i, CP and CSP re-encrypt the classification result [[h_f(x)]] under RU_i's public key pk_i. After receiving the re-encrypted result, the requesting user RU_i recovers the plaintext of the result using his/her own private key sk_i. Since entities other than RU_i cannot obtain the private key sk_i, the prediction result is kept confidential. The pseudo code of the secure TrAdaboost prediction algorithm is given in Algorithm 5.
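The exact definitions of [[left]] and [[right]] appear only in Fig. 4. Assuming the standard TrAdaboost final hypothesis (a weighted vote of the second-half weak classifiers against half the total vote weight), the plaintext form of the comparison the cloud servers perform would look like:

```python
import math

def final_predict(h_values, betas):
    # h_values[t] = h_t(x) in {0, 1} and betas[t] = beta_t, both over the
    # second-half rounds; output 1 iff the weighted vote reaches half the
    # total weight (an assumed mapping of the patent's right/left parameters)
    right = sum(h * math.log(1.0 / b) for h, b in zip(h_values, betas))
    left = 0.5 * sum(math.log(1.0 / b) for b in betas)
    return 1 if right >= left else 0
```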
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention that produce equivalent functional effects without exceeding the scope of the technical scheme belong to the protection scope of the present invention.
Claims (6)
1. A secure migration learning system based on homomorphic encryption, comprising: the system comprises a key generation center KGC, a cloud platform CP, a cloud service provider CSP, a source data owner SDO, a target data owner TDO and a request user RUs;
a key generation center KGC responsible for initializing cryptographic system parameters and distributing public/private key pairs of system entities;
a cloud platform CP responsible for receiving and storing training data from SDO and TDO and prediction request data from RUs, and performing partial computation of the system;
the CSP interacts with the CP and provides computing service for a migration learning algorithm for protecting privacy; in addition, the CP and CSP jointly perform decryption and re-encryption operations;
a source data owner SDO, the SDO owning the tagged sample instance from the source domain, sending the encrypted data set to the CP as a source training data set of the system;
target data owner TDO, TDO having tagged sample instances and untagged sample instances from target domains, the source and target domains involved in the system are distributed differently but have correlation; the TDO sends the encrypted data set to a CP to serve as a target training data set of the system, and the CP combines the encrypted source training data set and the target training data set to serve as a joint training data set;
and the request user RUs sends the encrypted unmarked sample from the target domain to the CP after the CP and the CSP finish the construction of the TrAdaboost classifier for the target space, requests related prediction calculation, and the encrypted prediction result returned from the CP can only be decrypted by the corresponding request user.
2. The secure migration learning system based on homomorphic encryption according to claim 1, wherein the key generation center KGC initializes the cryptographic system parameters and distributes the public/private key pairs of the system entities as follows:
(1) KGC generates the system parameters N and g of the Paillier-algorithm-based homomorphic re-encryption system HRES, and KGC generates respective public/private key pairs for all entities in the system, specifically: KGC distributes the key pairs (pk_sdo, sk_sdo), (pk_tdo, sk_tdo), (pk_cp, sk_cp) and (pk_csp, sk_csp) for SDO, TDO, CP and CSP, respectively; in addition, the KGC sets up a key repository to store the requesting users' public/private key pairs {(pk_i, sk_i)}, i = 1, ..., n_user, where n_user is the total number of system users, and distributes an unused key pair to each registered user; the key repository is updated by KGC at idle time or when needed;
(2) CP and CSP exchange their respective public keys and negotiate the Diffie-Hellman key PK = g^{ab} mod N^2; subsequently, PK is published as the global public key of the system to SDO, TDO and the subsequent requesting users RUs; from the characteristics of HRES, messages encrypted under PK can only be decrypted jointly by CP and CSP to recover the plaintext.
3. The secure migration learning system based on homomorphic encryption according to claim 2, wherein the HRES comprises the following algorithms:
key generation KeyGen algorithm: a safety parameter k and two large prime numbers p and q are given to satisfy(symbol)A bit length of expression; then, N ═ pq is calculated and a group generator g is selected and the order of g is ord (g) ═ p-1 (q-1)/2; user i's public/private key pair ofWherein s is i ∈ R [1,λ(N 2 )]λ (·) denotes the euler function; furthermore, assume that two entities a and B possess a public/private key pair (pk) respectively A =g a mod N 2 ,sk A A) and (pk) B =g b mod N 2 ,sk B B), the public key obtained after the two carry out Diffie-Hellman negotiation isThe corresponding joint decryption private keys are a and b respectively; taking PK as a global public key of the system in the system; the parameters g and N are published;
the encryption Enc algorithm: joining message m to Z N And the public key pk i As input, randomly selectZ N Is an integer set {0,1, …, N-1},is an integer set {1, …, N-1}, and a ciphertext is obtained by calculationWherein T and T' are respectively a first element and a second element of the ciphertext;
decryption Dec algorithm: using the private key sk i For ciphertextAnd (3) decryption:wherein l (u) ═ 1/N;
the EncTK algorithm with double keys is as follows: the system global public key PK is selected to encrypt the message, similar to the Enc algorithm, given a plaintext message m ∈ Z N Obtaining a ciphertextFor the sake of simplifying the expression, willIs uniformly and schematically represented as
partial decryption PDec1 algorithm using sk_A: takes [m] = (T, T') and sk_A = a as input and performs the first stage of partial decryption: T^(1) = (T')^a mod N^2;
partial decryption PDec2 algorithm using sk_B: takes the partially decrypted ciphertext (T, T^(1)) and sk_B = b as input and performs the second stage of partial decryption, thereby obtaining the plaintext message m:
m = L(T / (T^(1))^b mod N^2)
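Continuing the toy sketch, the two-stage decryption under the joint key PK = g^(ab) can be checked end to end; the exact form of PDec2 is reconstructed to be consistent with the Dec equation above, so it is an assumption:

```python
p, q = 23, 47                      # toy safe primes
N, N2 = p * q, (p * q) ** 2
g = (-pow(5, 2 * N, N2)) % N2

a, b = 17, 29                      # sk_A (held by CP) and sk_B (held by CSP)
PK = pow(g, a * b, N2)             # Diffie-Hellman joint public key g^{ab}

m, r = 99, 31
T = pow(PK, r, N2) * (1 + m * N) % N2   # EncTK first ciphertext element
Tp = pow(g, r, N2)                      # EncTK second ciphertext element

T1 = pow(Tp, a, N2)                # PDec1: T^(1) = (T')^a = g^{ra}
u = T * pow(T1, -b, N2) % N2       # PDec2: strips g^{abr}, leaves 1 + m*N
assert (u - 1) // N == m           # L(u) recovers the plaintext
```

Neither server alone can decrypt: stripping the exponent needs both a and b, which matches the joint-decryption property stated in claim 2.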
re-encryption first stage FPRE algorithm: given a ciphertext [m] = (T, T'), the private key sk_A, and a user public key pk_j, the first-stage re-encryption computation is performed:
re-encryption second stage SPRE algorithm: given the partially re-encrypted ciphertext, the private key sk_B, and the user public key pk_j, the second-stage re-encryption computation is performed to obtain the ciphertext [m]_(pk_j) of the plaintext m under the public key pk_j;
4. The secure transfer learning system based on homomorphic encryption according to claim 3, wherein the homomorphic properties of the Paillier-based homomorphic re-encryption system HRES are as follows:
5. The homomorphic encryption-based secure transfer learning system according to claim 2, wherein the training process of the TrAdaboost classifier is as follows:
the training of TrAdaboost is an R-round iterative process; preparation and preprocessing are performed first, and then each round of TrAdaboost iterative training consists of four sub-blocks, including sample weight vector normalization, prediction error calculation, weight adjustment rate calculation, and sample weight update, specifically as follows:
s1, algorithm preparation and preprocessing
First, SDO and TDO submit their respective encrypted data sets to CP; assume that SDO owns the source training data set D_S = {(x_1, y_1), …, (x_n, y_n)} and TDO owns the target training data set D_T = {(x_(n+1), y_(n+1)), …, (x_(n+m), y_(n+m))}; in SDO and TDO, each value contained in a feature vector in the data set, and the corresponding label value, is first multiplied by a scaling factor L, specifically x_i ← L·x_i and y_i ← L·y_i, where 1 ≤ i ≤ n+m and d denotes the dimension of the feature vector; after encryption with the system global public key PK, SDO and TDO send their respective encrypted data sets [D_S] and [D_T] to CP, where 1 ≤ i ≤ n+m; for simplicity of presentation, the encrypted sample pair ([x_i], [y_i]) is written in abbreviated form; in addition, the sizes of the source and target training data sets, namely n and m, are also respectively sent to CP for storage; upon receiving [D_S] and [D_T], CP merges them into a joint training data set:
subsequently, CP initializes the sample weights of the joint training data set; given the sizes of the source and target training data sets, CP sets the initial weight of each source sample to 1/n and the initial weight of each target sample to 1/m; an encrypted sample weight vector is then computed
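The preparation step can be sketched in the plaintext domain; encryption is omitted, and the scaling factor, set sizes, and sample values below are illustrative, not taken from the claim:

```python
# Scale features and labels into integers, merge the two data sets, and
# initialise weights: 1/n per source sample, 1/m per target sample.
L_scale = 10 ** 4                  # scaling factor L (illustrative value)
n, m = 4, 2                        # toy source / target set sizes

src = [([0.10, 0.20, 0.30], 1) for _ in range(n)]   # D_S: (x, y) pairs
tgt = [([0.40, 0.50, 0.60], 0) for _ in range(m)]   # D_T

def scale(dataset):
    # multiply every feature value and the label by L before encryption
    return [([round(v * L_scale) for v in x], y * L_scale)
            for x, y in dataset]

joint = scale(src) + scale(tgt)        # CP's joint training data set
w = [1.0 / n] * n + [1.0 / m] * m      # initial sample weight vector
assert len(joint) == n + m
```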
S2, sample weight vector normalization
In the t-th round of iterative training, an encrypted normalized sample weight vector p^t is obtained; first, CP computes the sum of all weight values over the ciphertext domain:
S3, calculating prediction error
Suppose that the weak classifier h_t of the t-th round has been trained; its output on an encrypted sample [x_i] is denoted [h_t(x_i)] and satisfies h_t(x) ∈ {0, 1}; the algorithm aims at computing the weighted prediction error of h_t on D_T; it is known that |h_t(x_i) - y_i| equals the XOR between h_t(x_i) and y_i; therefore, |h_t(x_i) - y_i| is first computed using the secure XOR protocol Sxor; because the error values on both the source and target samples are needed in the sample weight update phase, the encrypted prediction error is computed for both the source and target samples in this step and the results are stored at CP; the secure XOR protocol Sxor is implemented as follows: for two encrypted bits [m_1] and [m_2], with m_1, m_2 ∈ {0, 1}, the ciphertext XOR operation is realized to obtain the encrypted bit-XOR result [u]; if m_1 = m_2, then u = 0; otherwise, u = 1;
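The bit relation that Sxor evaluates is the standard arithmetic identity m1 ⊕ m2 = m1 + m2 - 2·m1·m2, which costs one secure multiplication (SMul) plus homomorphic additions over ciphertexts; it is easy to check in plaintext:

```python
def xor_arith(m1, m2):
    # one multiplication (SMul over ciphertexts) plus additions/subtractions
    return m1 + m2 - 2 * m1 * m2

# the identity matches bitwise XOR on all four bit combinations
assert all(xor_arith(b1, b2) == (b1 ^ b2) for b1 in (0, 1) for b2 in (0, 1))
```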
next, the encrypted prediction error rate [ε_t] is calculated through the secure multiplication protocol SMul and homomorphic addition operations:
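In the plaintext domain, the weighted prediction error on D_T reduces to the classical TrAdaboost formula ε_t = Σ w_i·|h_t(x_i) - y_i| / Σ w_i over the target samples; the claim's exact formula is not reproduced in this text, so the sketch below assumes that classical definition (toy values):

```python
n, m = 3, 3                                  # source / target set sizes
h = [0, 1, 1, 0, 1, 0]                       # weak-classifier outputs in {0,1}
y = [0, 1, 0, 1, 1, 0]                       # true labels
w = [0.2, 0.1, 0.2, 0.2, 0.2, 0.1]           # current sample weights

err = [hi ^ yi for hi, yi in zip(h, y)]      # |h - y| via XOR (what Sxor does)
eps = (sum(wi * ei for wi, ei in zip(w[n:], err[n:]))
       / sum(w[n:]))                         # weighted error on D_T only
```

With these toy values the misclassified target sample carries weight 0.2 out of a total target weight of 0.5, so eps evaluates to 0.4.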
S4, calculating weight adjustment rate
The weight adjustment rate controls the degree to which the sample weights are updated; the adjustment rate β for the weights of the source training samples is constant in every iteration; therefore, the value of β only needs to be computed once during the whole TrAdaboost training process, and since the operands involved in the computation are all public, it can be computed in the plaintext domain:
Before computing the sample weight adjustment rate for D_T, namely β_t = ε_t/(1 - ε_t) = -1 + 1/(1 - ε_t), the algorithm handles three special cases according to the value of ε_t; first, whether the conditions ε_t ≥ 1/2, ε_t = 1, or ε_t = 0 hold is determined by the following computation, where: if ex_1 = 1, the condition ε_t ≥ 1/2 holds, otherwise ε_t < 1/2; if ex_2 = 1, the condition ε_t = 1 holds; if ex_3 = 1, the condition ε_t = 0 holds;
SGE is the secure greater-than-or-equal comparison protocol and SETest is the secure ciphertext-plaintext equality test protocol; the secure ciphertext-plaintext equality test protocol SETest is implemented as follows: it tests the equality relation between a ciphertext [m] and a plaintext value k, where m is a real number in [0, 1] and k = 0; if u = 1, this indicates that m = 0; if u = 0, this indicates that m ≠ 0;
SRec is the secure reciprocal protocol;
The different value cases of ε_t are discussed as follows: if ε_t = 1, then ex_2 = 1 and [β_t] is set accordingly; if ε_t ≠ 1, then ex_2 = 0; when ε_t ≥ 1/2, ε_t = 1, or ε_t = 0, β_t is set directly to the constant c_1, c_2, or c_3, respectively; finally, [β_t] can be calculated as follows:
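As a plaintext sketch, β below uses the classical TrAdaboost constant 1/(1 + sqrt(2·ln n / R)) (the claim's own formula is not shown in this text, so that choice is an assumption), and β_t follows the three special cases with illustrative clamping constants c1, c2, c3:

```python
import math

n, R = 100, 10                     # source size and number of rounds (toy)
beta = 1 / (1 + math.sqrt(2 * math.log(n) / R))   # fixed source rate

def beta_t(eps, c1=1.0, c2=1.0, c3=1e-6):
    # clamp the degenerate cases before evaluating eps / (1 - eps);
    # c1, c2, c3 are hypothetical stand-ins for the claim's constants
    if eps == 1:                   # ex_2 = 1
        return c2
    if eps == 0:                   # ex_3 = 1
        return c3
    if eps >= 0.5:                 # ex_1 = 1
        return c1
    return eps / (1 - eps)
```

The clamping keeps β_t finite and positive, so the later logarithm ln(1/β_t) in the prediction stage stays well defined.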
S5, sample weight update
It is known that the strategies for updating the weight values of the source data samples and the target data samples are different; moreover, a sample weight needs to be updated only when the corresponding sample is misclassified; since the ciphertext of the prediction error [|h_t(x_i) - y_i|] has already been computed, the algorithm only needs to test over the encrypted domain, by calling the SETest protocol, whether it equals 0:
the sample weight vector is then updated by the following strategy:
for i=1,...,n,
for i=n+1,...,n+m,
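In plaintext, the two update strategies correspond to the classical TrAdaboost rule (assumed here, since the claim's update formulas are not reproduced in this text): a misclassified source weight shrinks by a factor β, while a misclassified target weight grows by a factor β_t^(-1); correctly classified weights are unchanged:

```python
n = 2                              # first n entries are source samples
w = [0.25, 0.25, 0.25, 0.25]       # two source + two target weights (toy)
err = [1, 0, 1, 0]                 # |h_t(x_i) - y_i| per sample
beta, bt = 0.5, 0.25               # source rate and current target rate

new_w = [wi * (beta ** e if i < n else bt ** (-e))
         for i, (wi, e) in enumerate(zip(w, err))]
# misclassified source weight halves; misclassified target weight quadruples
assert new_w == [0.125, 0.25, 1.0, 0.25]
```

Shrinking misclassified source weights and boosting misclassified target weights is what gradually shifts the ensemble's attention from the source domain to the target domain.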
6. The homomorphic encryption-based secure transfer learning system according to claim 5, wherein the TrAdaboost classifier implements encrypted prediction as follows:
When a requesting user RU_i requests the classification label of an unlabeled sample x from the target sample space, it first completes registration with the system and obtains a unique public/private key pair (pk_i, sk_i) as well as the global public key PK of the system; to prevent leakage of its private sample data, the requesting user RU_i encrypts the request sample into [x] using PK and transmits it to CP; after receiving the request data from RU_i, CP and CSP jointly perform privacy-preserving TrAdaboost prediction with the encrypted weak classifiers [h_t] and their influence factors; CP and CSP first alternately perform the encrypted weak-classifier computation to obtain [h_t(x)], and then perform the weighted prediction result calculation:
SMul is the secure multiplication protocol and SNLog is the secure natural logarithm protocol; the secure natural logarithm protocol SNLog is implemented as follows: it takes a ciphertext [x] as input and outputs the encrypted result [ln x] of the natural logarithm operation; since the input range of the logarithm operations involved in this system is (0, 1], the protocol input x need only satisfy x ∈ (0, 1];
next, the cloud servers run the secure greater-than-or-equal comparison protocol SGE to compare the left and right sides of the weighted vote; if right ≥ left, the final classification result is set to the corresponding class label; otherwise, it is set to the other class label; before the prediction result is returned to the requesting user RU_i, CP and CSP re-encrypt the classification result based on RU_i's public key pk_i; after receiving the re-encrypted result, the requesting user RU_i uses its own private key sk_i to recover the plaintext of the result.
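The weighted vote can be sketched in plaintext; the comparison mirrors the classical TrAdaboost final hypothesis, Σ ln(1/β_t)·h_t(x) versus ½·Σ ln(1/β_t) (an assumption, since the claim's formula is not shown; SMul, SNLog, and SGE would carry out the same computation over ciphertexts):

```python
import math

betas = [0.2, 0.4, 0.3]    # influence factors beta_t of the voting rounds (toy)
votes = [1, 0, 1]          # weak-classifier outputs h_t(x) for the sample x

left = sum(math.log(1 / b) * h for b, h in zip(betas, votes))
right = 0.5 * sum(math.log(1 / b) for b in betas)
label = 1 if left >= right else 0   # SGE decides this comparison on ciphertexts
```

Since every β_t lies in (0, 1], each ln(1/β_t) is non-negative, which is why SNLog only needs to support inputs in (0, 1].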
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110134461.8A CN112822005B (en) | 2021-02-01 | 2021-02-01 | Secure transfer learning system based on homomorphic encryption |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112822005A CN112822005A (en) | 2021-05-18 |
CN112822005B true CN112822005B (en) | 2022-08-12 |
Family
ID=75860845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110134461.8A Active CN112822005B (en) | 2021-02-01 | 2021-02-01 | Secure transfer learning system based on homomorphic encryption |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112822005B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113032838B (en) * | 2021-05-24 | 2021-10-29 | 易商征信有限公司 | Label prediction model generation method, prediction method, model generation device, system and medium based on privacy calculation |
CN113421122A (en) * | 2021-06-25 | 2021-09-21 | 创络(上海)数据科技有限公司 | First-purchase user refined loss prediction method under improved transfer learning framework |
CN113472805B (en) * | 2021-07-14 | 2022-11-18 | 中国银行股份有限公司 | Model training method and device, storage medium and electronic equipment |
CN113938266B (en) * | 2021-09-18 | 2024-03-26 | 桂林电子科技大学 | Junk mail filter training method and system based on integer vector homomorphic encryption |
CN113783898B (en) * | 2021-11-12 | 2022-06-10 | 湖南大学 | Renewable hybrid encryption method |
CN114219306B (en) * | 2021-12-16 | 2022-11-15 | 蕴硕物联技术(上海)有限公司 | Method, apparatus, medium for establishing welding quality detection model |
CN114915399A (en) * | 2022-05-11 | 2022-08-16 | 国网福建省电力有限公司 | Energy big data security system based on homomorphic encryption |
CN115051816B (en) * | 2022-08-17 | 2022-11-08 | 北京锘崴信息科技有限公司 | Privacy protection-based cloud computing method and device and financial data cloud computing method and device |
CN116402505B (en) * | 2023-05-11 | 2023-09-01 | 蓝象智联(杭州)科技有限公司 | Homomorphic encryption-based graph diffusion method, homomorphic encryption-based graph diffusion device and storage medium |
CN117290659B (en) * | 2023-11-24 | 2024-04-02 | 华信咨询设计研究院有限公司 | Data tracing method based on regression analysis |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108259158A (en) * | 2018-01-11 | 2018-07-06 | 西安电子科技大学 | Efficient and secret protection individual layer perceptron learning method under a kind of cloud computing environment |
CN109255444A (en) * | 2018-08-10 | 2019-01-22 | 深圳前海微众银行股份有限公司 | Federal modeling method, equipment and readable storage medium storing program for executing based on transfer learning |
CN110008717A (en) * | 2019-02-26 | 2019-07-12 | 东北大学 | Support the decision tree classification service system and method for secret protection |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103036884B (en) * | 2012-12-14 | 2015-09-16 | 中国科学院上海微系统与信息技术研究所 | A kind of data guard method based on homomorphic cryptography and system |
CN105488422B (en) * | 2015-11-19 | 2019-01-11 | 上海交通大学 | Editing distance computing system based on homomorphic cryptography private data guard |
- 2021-02-01 CN CN202110134461.8A patent/CN112822005B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||