CN112822005B - Secure transfer learning system based on homomorphic encryption - Google Patents
- Publication number: CN112822005B (application CN202110134461.8A)
- Authority
- CN
- China
- Prior art keywords
- encrypted
- sample
- algorithm
- encryption
- training
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/0838—Key agreement, i.e. key establishment technique in which a shared key is derived by parties as a function of information contributed by, or associated with, each of these
- H04L9/0841—Key agreement, i.e. key establishment technique in which a shared key is derived by parties as a function of information contributed by, or associated with, each of these involving Diffie-Hellman or related key agreement protocols
Abstract
The invention relates to a secure transfer learning system based on homomorphic encryption. The system designs encrypted TrAdaboost training and prediction algorithms based on a dual-cloud-server model (a storage cloud server and a computing cloud server), addressing the privacy disclosure problem of transfer machine learning in cloud outsourcing scenarios. On one hand, the source-domain and target-domain data owners of the system respectively upload encrypted training data to the cloud, and the cloud servers train a TrAdaboost model in a privacy-preserving manner; on the other hand, a requesting user of the system sends an encrypted data sample to the cloud server to request a secure prediction service, and the cloud server returns an encrypted classification result. The system leaks none of the users' (including data owners and prediction requesters) training and prediction-request data, training models, prediction results, or intermediate calculation results to the cloud or unauthorized users.
Description
Technical Field
The invention relates to a secure transfer learning system based on homomorphic encryption.
Background
Cloud computing delivers computing services (including data storage, computation, software, and data analysis) to individuals or companies, providing great convenience to users and reducing software construction and operating costs. Owing to its strong computing power and storage capacity, high reliability, and on-demand service, cloud computing is widely applied in practical fields such as big-data analysis, data backup, software development and testing, and management. Meanwhile, the rapid development of machine learning (including transfer learning) benefits from the support of cloud computing. Researchers or users outsource complex intelligent computing tasks to a cloud server, which performs the computation efficiently and returns the results, so that users can complete machine learning computations efficiently and stably on resource-limited personal computers or mobile devices.
In recent years, machine learning has been applied to many fields in real life, such as face and voice recognition, medical diagnosis, financial analysis, and smart homes. Transfer learning incorporates the idea of "learning by analogy" into machine learning and has gained more and more attention from researchers. It aims to transfer learned knowledge (i.e., source tasks) to new problems (i.e., target tasks) to assist the learning of the target tasks. In general, transfer learning is effective when the source domain (or source task) and the target domain (or target task) are different but related. For example, a transfer learning algorithm can reuse the shallow network parameters of a high-performance pre-trained model (e.g., an AlexNet or ResNet model) and adapt the deep layers to a user-customized classification task through training. Consider another real-world scenario: a medical research institution collects a large amount of labeled medical data, while a clinic has only a small amount of labeled data and some unlabeled data from its patients, and the medical data of the two institutions are related. In this case, if a classification model is trained only on the clinic's medical data, its accuracy is likely to be low. When transfer learning is used to train on the combined medical data of the two institutions (i.e., the research institution's data assists the clinic's model training), the quality of the classifier can be improved to a large extent. TrAdaboost, proposed in 2007, is a classical instance-based transfer learning algorithm that can be used to solve such problems.
The TrAdaboost algorithm borrows the core design idea of AdaBoost (an ensemble learning algorithm): it combines large-scale labeled data from the source domain with a small amount of labeled data from the target domain to train a well-performing classifier for the target task. Specifically, the algorithm optimizes the sub-classifiers round by round by adjusting the weight values of the training samples, and the final prediction result is determined by a weighted combination of the sub-classifiers.
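To make the training loop concrete, here is a plaintext (non-encrypted) sketch of instance-based TrAdaboost as described above, using a simple one-feature decision stump as the weak learner. The stump learner, the clamping of degenerate error rates, and the use of the later half of the rounds in the final vote are standard choices from the original algorithm, not details disclosed in this patent's ciphertext protocol.

```python
import math

def stump_train(X, y, p):
    """Weighted decision stump on 1-D features: predicts 1 iff sign*(x - theta) >= 0."""
    best = None
    for theta in sorted(set(X)):
        for sign in (1, -1):
            pred = [1 if sign * (x - theta) >= 0 else 0 for x in X]
            err = sum(pi * abs(hi - yi) for pi, hi, yi in zip(p, pred, y))
            if best is None or err < best[0]:
                best = (err, theta, sign)
    _, theta, sign = best
    return lambda x: 1 if sign * (x - theta) >= 0 else 0

def tradaboost(Xs, ys, Xt, yt, R=10):
    """Plaintext TrAdaboost: n source samples assist m target samples."""
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0 / n] * n + [1.0 / m] * m        # initial weights, as in step S1
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / R))  # constant source rate
    H = []                                    # (h_t, beta_t) from the later rounds
    for t in range(R):
        s = sum(w)
        p = [wi / s for wi in w]              # weight normalization (step S2)
        h = stump_train(X, y, p)
        # weighted error on the *target* samples only (step S3)
        wt_sum = sum(w[n:])
        eps = sum(w[n + i] * abs(h(Xt[i]) - yt[i]) for i in range(m)) / wt_sum
        eps = min(max(eps, 1e-6), 0.499)      # clamp the special cases of step S4
        beta_t = eps / (1.0 - eps)
        for i in range(n):                    # misclassified source weights shrink
            w[i] *= beta ** abs(h(Xs[i]) - ys[i])
        for i in range(m):                    # misclassified target weights grow
            w[n + i] *= beta_t ** (-abs(h(Xt[i]) - yt[i]))
        if t >= R // 2:
            H.append((h, beta_t))
    def h_f(x):                               # weighted vote of later sub-classifiers
        vote = sum(math.log(1.0 / bt) * h(x) for h, bt in H)
        thresh = 0.5 * sum(math.log(1.0 / bt) for _, bt in H)
        return 1 if vote >= thresh else 0
    return h_f
```

The secure system of this invention evaluates exactly this loop, but with each arithmetic step replaced by a ciphertext protocol between the two cloud servers.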
However, delegating the transfer learning task to a cloud server for computation introduces a serious challenge: the risk of privacy disclosure, since the data or results involved in the computation may contain sensitive user information such as personal images, financial information, or health conditions. In addition, transfer learning models are often viewed as important property of researchers, and their exposure may cause significant damage. Meanwhile, the cloud server is generally not trusted, and a system attacker may eavesdrop on the data transmission channel or attack the cloud server. It is therefore necessary to apply effective privacy-protection measures in outsourced computation tasks. To solve such problems, several mainstream security technologies have been used to ensure the security and privacy of cloud computing, including Secret Sharing (SS), Garbled Circuits (GCs), Differential Privacy (DP), hardware security, and Homomorphic Encryption (HE). Homomorphic encryption is characterized by the computability of ciphertexts (i.e., a computation on ciphertexts, after decryption, is equivalent to the corresponding computation on plaintexts) and provides an excellent solution for privacy-preserving cloud computing. Compared with other security technologies, homomorphic encryption can achieve a higher security level. However, homomorphic encryption has high overhead, natively supports only non-negative integer arithmetic, and cannot naturally support nonlinear operations (such as logarithms). How to design effective homomorphic encryption protocols that simultaneously meet the privacy, correctness, and efficiency requirements of transfer learning in cloud computing remains an open research problem.
With the help of different security technologies, privacy-protection schemes for transfer learning already exist. Some researchers have implemented federated transfer learning (FTL) computation while protecting the privacy of sensitive information. Liu et al. realized the secure training, prediction and cross-validation processes of FTL via Additive Homomorphic Encryption (AHE). In this scheme, data samples are kept separately by the two parties (i.e., the source-domain data owner and the target-domain data owner), and parameters are encrypted before data exchange. Sharma et al. implemented a similar secure FTL training process via SS technology. Both schemes can resist two threat models: semi-honest and malicious. In addition, Gao et al. proposed privacy-protection schemes for federated neural-network transfer learning (including training and prediction processes) based on AHE and SS technologies, respectively; using the same techniques, Gao et al. also designed a heterogeneous FTL system based on a secure logistic regression model. However, the above schemes involve multiple rounds of interaction between users and consume a large amount of local computing resources. The first FTL framework for wearable medical scenarios (using AHE technology) was proposed by Chen et al., but no detailed security design is given in that scheme. Since FTL supports only unidirectional transfer, the source-domain participants cannot benefit from the computation. To address this problem, Ma et al. devised a secure collaborative transfer learning scheme based on the SPDZ framework, in which the correctness of the system's results can be verified by a Message Authentication Code (MAC).
Liu et al. proposed a privacy-preserving multi-task learning scheme that uses AHE to encrypt users' model parameters and performs ciphertext computations through cloud services. However, in this scheme all users encrypt their respective data with the same public key; thus, as soon as one user is compromised, the privacy of all users is endangered. Unlike the encryption technology adopted by Liu et al., Xie et al. and Zhang et al. designed privacy-preserving multi-task learning systems using differential privacy. Differential privacy has also been used in other secure transfer learning schemes, such as the privacy-preserving domain adaptation scheme proposed by Wang et al. and the privacy-preserving hypothesis transfer learning scheme proposed by Yao et al. Wang et al. also designed a secure domain adaptation scheme based on adversarial learning, which adds Gaussian noise to the gradients to protect the confidentiality of private data. While differential-privacy-based schemes have significantly lower overhead than encryption-based schemes, the accuracy of their results is impaired to varying extents. To implement multi-party knowledge transfer, Ma et al. introduced a privacy-preserving cloud outsourcing framework for decision tree learning and prediction, whose core idea is to implement a similarity measure between decision trees over the encrypted domain.
Disclosure of Invention
The invention aims to provide a homomorphic-encryption-based secure transfer learning system that addresses problems of existing secure transfer learning schemes, such as residual security risks and inefficient ciphertext computation protocols, thereby realizing privacy-preserving TrAdaboost training and prediction while reducing the users' local overhead as much as possible.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a secure transfer learning system based on homomorphic encryption, comprising: a key generation center KGC, a cloud platform CP, a cloud service provider CSP, a source data owner SDO, a target data owner TDO and requesting users RUs;
a key generation center KGC responsible for initializing cryptographic system parameters and distributing public/private key pairs of system entities;
a cloud platform CP, which is responsible for receiving and storing training data from SDO and TDO and prediction request data from RUs, and performing partial computation of the system;
the CSP interacts with the CP and provides computing service for a transfer learning algorithm for protecting privacy; in addition, the CP and CSP jointly perform decryption and re-encryption operations;
a source data owner SDO, the SDO owning the tagged sample instance from the source domain, sending the encrypted data set to the CP as a source training data set of the system;
the target data owner TDO, which owns labeled sample instances and unlabeled sample instances from the target domain; the source and target domains involved in the system follow different distributions but are correlated; the TDO sends its encrypted data set to the CP as the system's target training data set, and the CP merges the encrypted source and target training data sets into a joint training data set;
and the requesting users RUs: after the CP and CSP finish constructing the TrAdaboost classifier for the target space, an RU sends an encrypted unlabeled sample from the target domain to the CP to request the related prediction computation; the encrypted prediction result returned from the CP can only be decrypted by the corresponding requesting user.
In an embodiment of the present invention, the key generation center KGC initializes the cryptosystem parameters and distributes the public/private key pairs of the system entities as follows:
(1) KGC generates the system parameters N and g of the Paillier-based homomorphic re-encryption system HRES and generates a public/private key pair for each entity in the system; specifically, KGC distributes the key pairs (pk_sdo, sk_sdo), (pk_tdo, sk_tdo), (pk_cp, sk_cp) and (pk_csp, sk_csp) to SDO, TDO, CP and CSP, respectively. In addition, KGC sets up a key repository to store the requesting users' public/private key pairs {(pk_i, sk_i)}, 1 ≤ i ≤ n_user, where n_user denotes the total number of system users, and distributes an unused key pair to each registered user; the key repository is updated by KGC during idle time or on demand;
(2) CP and CSP exchange their respective public keys and negotiate a Diffie-Hellman key PK = g^(sk_cp·sk_csp) mod N². Subsequently, PK is disclosed to SDO, TDO and the subsequent requesting users RUs as the system's global public key. By the properties of HRES, a message encrypted under PK can only be decrypted jointly by CP and CSP to recover the plaintext.
In an embodiment of the present invention, the Paillier algorithm based homomorphic re-encryption system HRES includes the following algorithms:
key generation KeyGen algorithm: given a security parameter k, two large primes p and q are chosen such that |p| = |q| = k, where |·| denotes bit length; then N = pq is computed and a group generator g is selected with order ord(g) = (p-1)(q-1)/2. User i's public/private key pair is (pk_i = g^(s_i) mod N², sk_i = s_i), where s_i ∈_R [1, λ(N²)] and λ(·) denotes the Euler function. Furthermore, assume two entities A and B possess the public/private key pairs (pk_A = g^a mod N², sk_A = a) and (pk_B = g^b mod N², sk_B = b); the public key obtained after Diffie-Hellman negotiation between the two is PK = pk_A^b = pk_B^a = g^(ab) mod N², and the corresponding joint decryption private keys are a and b, respectively. PK is taken as the global public key of the system, and the parameters g and N are published;
the encryption Enc algorithm: takes a message m ∈ Z_N and a public key pk_i as input, randomly selects r ∈ [1, N/4], and computes the ciphertext [[m]]_{pk_i} = (T, T') = (pk_i^r·(1+m·N) mod N², g^r mod N²), where T and T' are the first and second components of the ciphertext, respectively;
decryption Dec algorithm: uses the private key sk_i = s_i to decrypt the ciphertext [[m]]_{pk_i}: m = L(T/(T')^(s_i) mod N²), where L(u) = (u-1)/N;
the double-key encryption EncTK algorithm: the system global public key PK is used to encrypt the message; similarly to the Enc algorithm, given a plaintext message m ∈ Z_N, the ciphertext is [[m]]_PK = (T, T') = (PK^r·(1+m·N), g^r) (mod N²); to simplify the notation, [[m]]_PK is uniformly abbreviated as [[m]];
partial decryption PDec1 algorithm using sk_A: on input [[m]] = (T, T') and sk_A = a, the first stage of partial decryption computes T^(1) = T and T'^(1) = (T')^a mod N²;
partial decryption PDec2 algorithm using sk_B: on input the partially decrypted ciphertext (T^(1), T'^(1)) and sk_B = b, the second stage of partial decryption computes T'^(2) = (T'^(1))^b mod N² and recovers the plaintext message m: m = L(T^(1)/T'^(2) mod N²);
re-encryption first-stage FPRE algorithm: given a ciphertext [[m]], the private key sk_A = a and a user public key pk_j, the first-stage re-encryption computation is executed, outputting the partially re-encrypted ciphertext [[m]]^+;
re-encryption second-stage SPRE algorithm: given the partially re-encrypted ciphertext [[m]]^+, the private key sk_B = b and the user public key pk_j, the second-stage re-encryption computation is performed to obtain the ciphertext [[m]]_{pk_j} of the plaintext m under the public key pk_j; the ciphertext [[m]]_{pk_j} can only be decrypted by user j using the private key sk_j via the Dec algorithm.
In an embodiment of the present invention, the Paillier-based homomorphic re-encryption system HRES has the following additive homomorphic properties: [[m_1]]·[[m_2]] = [[m_1 + m_2 mod N]] and ([[m]])^k = [[k·m mod N]]; in particular, ([[m]])^(N-1) = [[-m mod N]] realizes negation over the encrypted domain, so that, for example, [[1]]·[[ex]]^(N-1) = [[1-ex]].
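As an illustration of the two-stage decryption, the following toy Python sketch instantiates the scheme with very small primes (completely insecure, for exposition only). The choice g = -a^(2N) mod N², the parameter sizes, and the variable names are assumptions for this sketch, not values fixed by the invention; the algebra of EncTK/PDec1/PDec2 and the additive homomorphism follow the algorithms above.

```python
import random

# Toy parameters -- far too small for real security (illustration only).
p, q = 467, 479                 # small safe primes; real |p|, |q| would be ~1024 bits
N = p * q
N2 = N * N
half_ord = (p - 1) * (q - 1) // 2

# Generator chosen as g = -a^{2N} mod N^2 (an assumption for this sketch).
a = 2
g = (-pow(a, 2 * N, N2)) % N2

def L(u):
    """The Paillier L function: L(u) = (u - 1) / N."""
    return (u - 1) // N

# CP and CSP key pairs and the negotiated global key PK = g^{ab} mod N^2.
sk_cp = random.randrange(1, half_ord)
sk_csp = random.randrange(1, half_ord)
PK = pow(g, sk_cp * sk_csp, N2)

def enc_tk(m, r=None):
    """EncTK: encrypt m under the global public key PK."""
    r = r or random.randrange(1, N // 4)
    return (pow(PK, r, N2) * (1 + m * N) % N2, pow(g, r, N2))

def pdec1(ct):
    """PDec1 (CP side): raise the second component to sk_cp."""
    T, T1 = ct
    return (T, pow(T1, sk_cp, N2))

def pdec2(ct):
    """PDec2 (CSP side): finish decryption with sk_csp and apply L."""
    T, T1 = ct
    return L(T * pow(pow(T1, sk_csp, N2), -1, N2) % N2)

def add(ct1, ct2):
    """Additive homomorphism: component-wise product encrypts m1 + m2 (mod N)."""
    return (ct1[0] * ct2[0] % N2, ct1[1] * ct2[1] % N2)
```

Note that decryption succeeds only after both exponentiations with sk_cp and sk_csp have been applied, matching the claim that PK-encrypted messages can be decrypted only by CP and CSP jointly.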
In an embodiment of the present invention, a training process of the TrAdaboost classifier is as follows:
The training of TrAdaboost is an R-round iterative process. First, preparation and preprocessing are carried out; then each round of TrAdaboost iterative training consists of four sub-steps: sample weight vector normalization, prediction error calculation, weight adjustment rate calculation, and sample weight update, as follows:
s1, algorithm preparation and preprocessing
First, SDO and TDO submit their respective encrypted data sets to the CP. Assume that SDO owns the source training data set D_S = {(x_1, y_1), ..., (x_n, y_n)} and TDO owns the target training data set D_T = {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})}. At SDO and TDO, every value in each feature vector and the corresponding label value in the data sets are first multiplied by a scaling factor L, that is, x_ij ← L·x_ij and y_i ← L·y_i, where 1 ≤ i ≤ n+m and d denotes the dimension of the feature vector (1 ≤ j ≤ d). After encryption with the system global public key PK, SDO and TDO send their respective encrypted data sets to the CP, i.e., [[D_S]]_PK = {([[x_1]]_PK, [[y_1]]_PK), ..., ([[x_n]]_PK, [[y_n]]_PK)} and [[D_T]]_PK = {([[x_{n+1}]]_PK, [[y_{n+1}]]_PK), ..., ([[x_{n+m}]]_PK, [[y_{n+m}]]_PK)}, where [[x_i]]_PK = ([[x_{i1}]]_PK, [[x_{i2}]]_PK, ..., [[x_{id}]]_PK) for 1 ≤ i ≤ n+m; to simplify the representation, [[·]]_PK is written as [[·]]. In addition, the sizes of the source and target training data sets, namely n and m, are also sent to the CP for storage. Upon receiving [[D_S]] and [[D_T]], the CP merges them into the joint training data set [[D]] = {([[x_1, y_1]]), ..., ([[x_{n+m}, y_{n+m}]])};
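Since HRES, like Paillier, operates on non-negative integers modulo N, real-valued features and labels must be fixed-point encoded before encryption; the scaling factor L in step S1 serves this purpose. A minimal sketch of such an encoding (the concrete scale and modulus below are assumptions for illustration, not the patent's parameters):

```python
SCALE = 10**6          # the scaling factor L from step S1 (value is an assumption)
N = 2**64              # stand-in modulus; a real Paillier N is a ~2048-bit product pq

def encode(x):
    """Fixed-point encode a real into Z_N: scale, round, map negatives to N - |v|."""
    v = round(x * SCALE)
    return v % N

def decode(v):
    """Inverse map: residues above N/2 represent negative values."""
    if v > N // 2:
        v -= N
    return v / SCALE
```

After training, the same factor must be divided out of any decrypted result (and products of two encoded values carry a factor of SCALE², which the secure division protocol has to account for).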
Subsequently, the CP initializes the sample weights of the joint training data set. Since n and m are respectively the sizes of the source and target training data sets, the CP sets the initial weight of each source sample to 1/n and the initial weight of each target sample to 1/m, and then forms the encrypted sample weight vector [[w^1]] = ([[w_1^1]], ..., [[w_{n+m}^1]]);
S2, sample weight vector normalization
In the t-th training iteration, the encrypted normalized sample weight vector [[p^t]] is obtained. First, the CP computes the sum of all weight values over the ciphertext domain: [[W^t]] = Π_{i=1}^{n+m} [[w_i^t]] = [[Σ_{i=1}^{n+m} w_i^t]].
Then, [[p^t]] is obtained by calling the secure division protocol SDiv n+m times: [[p_i^t]] = SDiv([[w_i^t]], [[W^t]]), 1 ≤ i ≤ n+m.
S3, calculating prediction error
Suppose the weak classifier of the t-th round, [[h_t]], has been trained; its output on an encrypted sample [[x]] is denoted [[h_t(x)]] and satisfies h_t(x) ∈ {0, 1}. The goal of this step is to compute the weighted prediction error of [[h_t]] on D_T. Since h_t(x_i), y_i ∈ {0, 1}, it is known that |h_t(x_i) - y_i| equals the XOR of h_t(x_i) and y_i; therefore, |h_t(x_i) - y_i| is first computed with the secure XOR protocol SXor. Because the sample-weight-update phase needs the error values of [[h_t]] on both the source and target samples, the encrypted prediction errors [[|h_t(x_i) - y_i|]] are computed for all samples, 1 ≤ i ≤ n+m, and the results are stored at the CP.
Next, through the secure multiplication protocol SMul and homomorphic additions, the encrypted prediction error rate [[∈_t]] is calculated: [[∈_t]] = Π_{i=n+1}^{n+m} SMul([[p_i^t]], [[|h_t(x_i) - y_i|]]) = [[Σ_{i=n+1}^{n+m} p_i^t·|h_t(x_i) - y_i|]].
S4, calculating weight adjustment rate
The weight adjustment rate controls the degree of updating of the sample weights; the adjustment rate beta of the weights of the source training samples is constant in each iteration; therefore, the value of β only needs to be calculated once during the whole training process of traadaboost and since the operands involved in the calculation are all public, it can be calculated in the plaintext domain:
Before computing the sample weight adjustment rate for D_T, i.e., β_t = ∈_t/(1-∈_t) = -1 + 1/(1-∈_t), the algorithm distinguishes three special cases according to the value of ∈_t. First, whether the conditions ∈_t ≥ 1/2, ∈_t = 0 or ∈_t = 1 hold is determined by the following computations, where ex_1 = 1 indicates that ∈_t ≥ 1/2 holds (otherwise ∈_t < 1/2); ex_2 = 1 indicates that ∈_t = 0 holds; and ex_3 = 1 indicates that ∈_t = 1 holds:
[[ex_1]] = SGE([[∈_t]], [[1/2]]);
[[ex_2]] = SETest([[∈_t]]);
[[ex_3]] = SETest([[1]]·[[∈_t]]^(N-1)) = SETest([[1-∈_t]]);
wherein SGE is the secure greater-than-or-equal comparison protocol and SETest is the secure ciphertext-plaintext equality test protocol;
Next, the algorithm computes the value of [[β_t]] over the ciphertext domain; the reciprocal 1/(1-∈_t) involved in β_t = -1 + 1/(1-∈_t) is evaluated with a secure reciprocal protocol.
The different values of 1-∈_t are discussed as follows: if ∈_t = 1, then ex_3 = 1 and 1-∈_t = 0, so the reciprocal 1/(1-∈_t) is undefined and β_t cannot be computed directly; if ∈_t ≠ 1, then ex_3 = 0 and 1/(1-∈_t) is well defined. For the special cases, when ∈_t ≥ 1/2, ∈_t = 1 or ∈_t = 0, β_t is set directly to the constant c_1, c_2 or c_3, respectively. Finally, [[β_t]] can be computed by combining the normal value with these constants, using the encrypted selector bits as follows:
[[S′_3]] = SMul([[1]]·[[ex_1]]^(N-1), [[1]]·[[ex_2]]^(N-1));
[[S″_3]] = SMul([[S′_3]], [[1]]·[[ex_3]]^(N-1)); i.e., S″_3 = (1-ex_1)(1-ex_2)(1-ex_3), which equals 1 exactly in the normal case.
S5, sample weight update
It is known that the weight-update strategies for the source data samples and the target data samples differ; moreover, a sample weight w_i^t needs to be updated only when the sample x_i is misclassified. Since the encrypted prediction errors [[|h_t(x_i) - y_i|]], 1 ≤ i ≤ n+m, have already been computed, the algorithm only needs to test over the encrypted domain, by invoking the SETest protocol, whether [[|h_t(x_i) - y_i|]] equals 0:
[[s]] = SETest([[|h_t(x_i) - y_i|]])
The sample weight vector is then updated by the following strategy, evaluated over the ciphertext domain with the help of [[s]]: for source samples (1 ≤ i ≤ n), w_i^{t+1} = w_i^t·β^(|h_t(x_i)-y_i|), so that misclassified source samples receive smaller weights; for target samples (n+1 ≤ i ≤ n+m), w_i^{t+1} = w_i^t·β_t^(-|h_t(x_i)-y_i|), so that misclassified target samples receive larger weights.
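The gating by the SETest bit can be mimicked in plaintext as follows; this is a mock-up of the selection trick (weight unchanged when the sample is classified correctly, multiplicative factor applied otherwise), not the actual ciphertext protocol:

```python
def gated_update(w, err_bit, factor):
    """Weight update gated by the SETest output: s = 1 when the sample was
    classified correctly (|h(x) - y| = 0), in which case the weight stays
    unchanged; otherwise the multiplicative factor is applied."""
    s = 1 if err_bit == 0 else 0
    return w * (s * 1 + (1 - s) * factor)
```

For a misclassified source sample the factor would be β < 1 (the weight shrinks); for a misclassified target sample the factor would be 1/β_t > 1 (the weight grows).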
in an embodiment of the present invention, a process of implementing encryption prediction by the tradoboost classifier is as follows:
When a requesting user RU_i wants to obtain the classification label of an unlabeled sample x from the target sample space, it first completes registration with the system and obtains a unique public/private key pair (pk_i, sk_i) as well as the system's global public key PK. To prevent leakage of its private sample data, the requesting user RU_i encrypts the request sample with PK as [[x]] and transmits it to the CP. After receiving the request data from RU_i, the CP and CSP apply the encrypted weak classifiers {[[h_t]]} and their influence factors {[[β_t]]}, ⌈R/2⌉ ≤ t ≤ R, to [[x]] to perform privacy-preserving TrAdaboost prediction. Specifically, the CP and CSP first alternately perform the encrypted weak-classifier computations on [[x]], obtain {[[h_t(x)]]}, and then compute the weighted prediction results:
[[l_t]] = SMul([[h_t(x)]], SNLog([[β_t]]))^(N-1)
wherein SMul is the secure multiplication protocol and SNLog is the secure natural logarithm protocol, so that l_t = h_t(x)·ln(1/β_t);
Subsequently, the cloud servers compute two decision parameters, [[left]] and [[right]], where [[left]] = [[½·Σ_{t=⌈R/2⌉}^{R} ln(1/β_t)]] and [[right]] = Π_{t=⌈R/2⌉}^{R} [[l_t]] = [[Σ_{t=⌈R/2⌉}^{R} l_t]].
Next, the cloud servers run the secure greater-than-or-equal comparison protocol SGE to compare [[left]] and [[right]]: if right ≥ left, the final classification result [[h_f(x)]] is set to [[1]]; otherwise, [[h_f(x)]] is set to [[0]]. Before returning the prediction result to the requesting user RU_i, the CP and CSP re-encrypt the classification result [[h_f(x)]] under RU_i's public key pk_i. After receiving the re-encrypted result [[h_f(x)]]_{pk_i}, the requesting user RU_i recovers the plaintext result with its own private key sk_i.
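The decision rule can be checked against the following plaintext equivalent of the encrypted vote (the encrypted version replaces the sums, products and comparison with SMul, SNLog, homomorphic additions and SGE):

```python
import math

def predict(classifiers, x):
    """Plaintext equivalent of the encrypted TrAdaboost vote (a sketch only;
    the real system evaluates this over HRES ciphertexts between CP and CSP).
    `classifiers` is a list of (h_t, beta_t) pairs from the later training rounds."""
    right = sum(h(x) * math.log(1.0 / bt) for h, bt in classifiers)  # sum of l_t
    left = 0.5 * sum(math.log(1.0 / bt) for _, bt in classifiers)
    return 1 if right >= left else 0
```

Each term ln(1/β_t) is positive when β_t < 1, so sub-classifiers with smaller error rates carry larger voting weight.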
In an embodiment of the present invention, the secure XOR protocol SXor is implemented as follows: for two encrypted bits [[m_1]] and [[m_2]], m_1, m_2 ∈ {0, 1}, the ciphertext XOR operation yields the encrypted bitwise XOR result [[u]]: if m_1 = m_2, then u = 0; otherwise, u = 1.
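The patent does not disclose SXor's internal construction, but for bits the XOR is exactly the arithmetic identity m_1 ⊕ m_2 = m_1 + m_2 - 2·m_1·m_2, which an additively homomorphic scheme can evaluate with a single secure multiplication plus homomorphic additions; a plaintext sketch of that identity:

```python
def sxor(m1, m2):
    """XOR of two bits using only addition and multiplication -- the form
    computable under additive HE: one SMul for the product m1*m2, then
    homomorphic additions and a negation."""
    return m1 + m2 - 2 * m1 * m2
```

Over ciphertexts this would read [[u]] = [[m_1]]·[[m_2]]·SMul([[m_1]], [[m_2]])^(N-2), using the negation property noted for HRES above.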
In an embodiment of the present invention, the secure ciphertext-plaintext equality test protocol SETest is implemented as follows: it tests the equality relationship between a ciphertext [[m]] and a plaintext k, where m is a real number in [0, 1] and k = 0; if u = 1, then m = 0; if u = 0, then m ≠ 0.
In an embodiment of the present invention, the secure natural logarithm protocol SNLog is implemented as follows: it takes a ciphertext [[x]] as input and outputs the encrypted natural-logarithm result [[ln(x)]]; since the logarithm operations involved in this system take input values in (0, 1), the input [[x]] of this protocol only needs to satisfy x ∈ (0, 1).
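One plausible way to realize such a protocol, shown here only as a plaintext sketch, is to evaluate a truncated Mercator series ln(x) = -Σ_{k≥1} (1-x)^k / k, which converges on (0, 1) and uses only the additions and multiplications available over the ciphertext domain; the series form and term count are assumptions, not the patent's disclosed construction:

```python
def snlog_approx(x, terms=50):
    """Polynomial approximation of ln(x) on (0, 1) via the Mercator series
    ln(x) = -sum_{k>=1} (1-x)^k / k. A secure protocol could evaluate such a
    low-degree polynomial with repeated SMul calls and homomorphic additions."""
    z = 1.0 - x
    return -sum(z ** k / k for k in range(1, terms + 1))
```

Convergence is fast for x near 1 and slow for x near 0, so a real protocol would likely fix the degree according to the required fixed-point precision.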
Compared with the prior art, the invention has the following beneficial effects:
1. Privacy-preserving TrAdaboost training. The system adopts a Paillier-based homomorphic re-encryption scheme (HRES) as its basic encryption system and allows two non-colluding cloud servers to perform privacy-preserving TrAdaboost model training on the source-domain and target-domain data sets (both in the encrypted domain). During model training, the two servers obtain no information about the private data (i.e., the training data sets, the final model results, and the intermediate calculation results). The encrypted training model parameters are stored at the cloud server and used to subsequently process sample prediction requests from requesting users.
2. Privacy-preserving TrAdaboost prediction. In the system, the requesting user uploads an encrypted data sample to the cloud server, and the cloud server computes the prediction result with the pre-trained model and finally returns it to the user. Owing to the encrypted-computation property of AHE, the two cloud servers perform the outsourced prediction computation over the encrypted domain through interaction and obtain an encrypted prediction result. Only the corresponding requesting user can decrypt the real prediction result with its own key.
3. Efficient ciphertext computation protocols. To further reduce the system's computational overhead, the system designs and implements three ciphertext computation protocols: a secure XOR protocol, a secure ciphertext-plaintext equality test protocol, and a secure natural logarithm protocol. These protocols perform secure operations on encrypted input values and output encrypted results, and they are more efficient than existing related protocols.
4. The users' local overhead is reduced as much as possible. On one hand, a data user of the system only needs to encrypt the data uploaded to the cloud or decrypt the encrypted results returned from the cloud, while the cloud servers with strong computing power execute the complex TrAdaboost training and prediction computations. On the other hand, the system minimizes the interaction cost between data users and the cloud servers: the owners of the source-domain and target-domain data only need to send the encrypted training data to the cloud, and a prediction-requesting user only needs to transmit the encrypted data sample to the cloud and wait for the cloud server to return the prediction result.
The application is as follows: the invention provides a secure transfer learning system based on an additively homomorphic re-encryption scheme. The system designs encrypted TrAdaboost training and prediction algorithms based on a dual-cloud-server model (a storage cloud server and a computing cloud server), addressing the privacy disclosure problem of transfer machine learning in cloud outsourcing scenarios. On one hand, the source-domain and target-domain data owners of the system respectively upload encrypted training data to the cloud, and the cloud servers train a TrAdaboost model in a privacy-preserving manner; on the other hand, a requesting user of the system sends an encrypted data sample to the cloud server to request a secure prediction service, and the cloud server returns an encrypted classification result. The system leaks none of the users' (including data owners and prediction requesters) training and prediction-request data, training models, prediction results, or intermediate calculation results to the cloud or unauthorized users.
Drawings
FIG. 1 is a system model of the present invention.
FIG. 2 is a flow chart of the system of the present invention.
Fig. 3 illustrates the secure TrAdaboost training phase of the present invention.
Fig. 4 shows the secure TrAdaboost prediction stage of the present invention.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
The invention provides a secure transfer learning system based on homomorphic encryption, which comprises: a key generation center KGC, a cloud platform CP, a cloud service provider CSP, a source data owner SDO, a target data owner TDO and requesting users RUs;
a key generation center KGC responsible for initializing cryptographic system parameters and distributing public/private key pairs of system entities;
a cloud platform CP, which is responsible for receiving and storing training data from SDO and TDO and prediction request data from RUs, and performing partial computation of the system;
the CSP interacts with the CP and provides computing service for a migration learning algorithm for protecting privacy; in addition, the CP and CSP jointly perform decryption and re-encryption operations;
a source data owner SDO, the SDO owning a tagged sample instance from the source domain, sending the encrypted data set to the CP as a source training data set of the system;
target data owner TDO, TDO having tagged sample instances and untagged sample instances from target domains, the source and target domains involved in the system are distributed differently but have correlation; the TDO sends the encrypted data set to a CP to serve as a target training data set of the system, and the CP combines the encrypted source training data set and the target training data set to serve as a joint training data set;
and the request user RUs sends the encrypted unmarked sample from the target domain to the CP after the CP and the CSP finish the construction of the TrAdaboost classifier for the target space, requests related prediction calculation, and the encrypted prediction result returned from the CP can only be decrypted by the corresponding request user.
The following is a specific implementation of the present invention.
Fig. 1 is the system architecture of the present invention, which includes six entities, namely, a Key Generation Center (KGC), a Cloud Platform (CP), a Cloud Service Provider (CSP), a Source Data Owner (SDO), a Target Data Owner (TDO), and Requesting Users (RUs).
1. Key Generation Center (KGC): the KGC is a trusted authority responsible for initializing cryptographic system parameters and distributing the public/private key pairs of system entities.
2. Cloud Platform (CP): the CP has powerful storage and computation capabilities, and its task is to receive and store training data from SDO and TDO and prediction request data from RU, and perform part of the computation of the system.
3. Cloud Service Provider (CSP): the CSP interacts with the CP to provide computing service for the migration learning algorithm for protecting privacy. In addition, the CP and CSP jointly perform decryption and re-encryption operations.
4. Source Data Owner (SDO): the SDO has enough instances of tagged samples (from the source domain) to send encrypted data samples to the CP as source training data for the system.
5. Target Data Owner (TDO): TDO has a small number of labeled sample instances and unlabeled sample instances (from the target domain), requiring that the source and target domains involved in the system are distributed differently but have correlation. After the TDO sends the encrypted data set to the CP, the CP merges the encrypted source and target data sets as a joint training data set.
6. Requesting Users (RUs): after the cloud servers complete the construction of the TrAdaboost classifier (for the target space), a user sends an encrypted unlabeled sample (from the target domain) to the CP and requests the relevant prediction calculation. The encrypted prediction result returned from the CP can only be decrypted by the corresponding requesting user.
Table 1 lists the important symbols used in the present invention.
Table 1: symbols in the invention
1. First, some of the algorithms employed by the present invention are introduced:
1.1 TrAdaboost algorithm
The main algorithm adopted by the invention is the TrAdaboost algorithm, which is specifically explained as follows:
TrAdaboost is a classical instance-based transfer learning algorithm. Assume a large labeled source data set D_S = {(x_1, y_1), ..., (x_n, y_n)} and a small labeled target data set D_T = {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})} (labels y_i ∈ {0,1}); the two data sets have different distributions but some similarity. The TrAdaboost algorithm extends the idea of the Adaboost algorithm; its purpose is to use the joint data set D = D_S ∪ D_T to construct a good classifier for the target domain space. The algorithm sets a weight for each source and target training sample and adjusts the values of the weight vector in turn during the iterative training process. Specifically, in each round of TrAdaboost training, the weights of instances that are favorable to the target classification task are increased, and the weights of the opposite instances are decreased. After training finishes, the algorithm has obtained R weak classifiers (where R is the number of iterations of the algorithm). The final classification hypothesis is determined by the weighted classification results of the weak classifiers from the second half of the training iterations. The overall procedure of the TrAdaboost algorithm is shown below (vector data is written in bold to distinguish it from scalar data, e.g., w = (w_1, ..., w_{n+m})):
(1) Initialize the weight vector w of the training examples: specify the initial value of the weight vector w of the joint data samples. For example, the initial weights of the source samples and the target samples may be set according to the sizes of the respective data sets, that is, 1/n for each source sample and 1/m for each target sample.
the following is a repetition of the iterative process of the training algorithm (i.e., stage 2 to stage 6) R times (with t representing the current iteration round):
(3) Train a weak classifier: using the joint training examples (taking into account their weight distribution p_t), train an output hypothesis h_t: X → Y, where X is the instance space of D and Y ∈ {0,1} represents the set of classification labels.
(4) Calculate the prediction error of h_t on D_T: given an instance x on the D_T domain, the classifier prediction is h_t(x) and the true label is c(x). The total error ∈_t of classifier h_t is calculated as the weighted misclassification rate over the target samples:
(5) Calculate the weight adjustment rates of the source and target training samples: the weight adjustment rate of D_S (i.e., β) depends only on n and R and thus remains constant throughout the algorithm. The weight adjustment rate of D_T (i.e., β_t) depends on ∈_t and is updated in each round.
It is worth noting that when ∈_t takes a special value, i.e., (1) ∈_t ≥ 1/2, (2) ∈_t = 1 or (3) ∈_t = 0, the algorithm computes the value of β_t separately. For example, β_t may be set to a suitable constant.
(6) Update the weight vector. If a source example x_i satisfies h_t(x_i) ≠ c(x_i) (i ∈ {1, ..., n}), the algorithm decreases the weight corresponding to x_i. Conversely, if a target example x_i satisfies h_t(x_i) ≠ c(x_i) (i ∈ {n+1, ..., n+m}), the weight corresponding to x_i is increased. Otherwise, the weight of the sample is unchanged.
(7) Output the final hypothesis. When the iterative training process finishes, the weak classifiers trained in the second half of the iterations (i.e., {h_t}, t = ⌈R/2⌉, ..., R) are combined to obtain the final classifier (for the target domain). The contribution of a weak classifier h_t depends on the value of β_t (so β_t is also called the influence factor of h_t).
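As a reference for the encrypted version described later, steps (1)-(7) above can be sketched in plaintext Python. The one-feature decision stump used as the weak learner and the clipping of ∈_t to avoid the special cases are illustrative assumptions, not part of the patent:

```python
import math

def train_stump(X, y, w):
    # illustrative weak learner (step (3)): a one-feature threshold stump that
    # minimizes the weighted error under distribution w; the patent instead
    # assumes any secure base classifier (e.g., SVM or LR)
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (0, 1):
                pred = [pol if x[f] >= thr else 1 - pol for x in X]
                err = sum(wi for wi, pr, yi in zip(w, pred, y) if pr != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    _, f, thr, pol = best
    return lambda x: pol if x[f] >= thr else 1 - pol

def tradaboost(Xs, ys, Xt, yt, R):
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0 / n] * n + [1.0 / m] * m                      # step (1)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / R))  # source rate (step (5))
    clfs = []
    for _ in range(R):
        s = sum(w)
        p = [wi / s for wi in w]                           # step (2): normalize
        h = train_stump(X, y, p)                           # step (3)
        errs = [abs(h(xi) - yi) for xi, yi in zip(X, y)]
        wt = sum(p[n:])                                    # step (4): target error
        eps = sum(pi * e for pi, e in zip(p[n:], errs[n:])) / wt
        eps = min(max(eps, 1e-9), 0.499)                   # clip the special cases
        beta_t = eps / (1.0 - eps)                         # target rate (step (5))
        for i in range(n):                                 # step (6): shrink source
            w[i] *= beta ** errs[i]
        for i in range(n, n + m):                          # grow misclassified target
            w[i] *= beta_t ** (-errs[i])
        clfs.append((h, beta_t))
    half = clfs[len(clfs) // 2:]                           # step (7): second half
    def predict(x):
        vote = sum(math.log(1.0 / bt) * hc(x) for hc, bt in half)
        thresh = 0.5 * sum(math.log(1.0 / bt) for _, bt in half)
        return 1 if vote >= thresh else 0
    return predict
```

On a toy task the learned classifier recovers the separating threshold, e.g. `tradaboost([[0],[1],[2],[3]], [0,0,1,1], [[0],[3]], [0,1], 4)` yields a predictor mapping [0] to 0 and [3] to 1.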
1.2 homomorphic re-encryption system based on Paillier algorithm
The invention utilizes a homomorphic re-encryption system (HRES) based on the Paillier algorithm as the underlying cryptographic algorithm. The cryptosystem comprises the following algorithms: key generation (KeyGen), encryption (Enc), decryption (Dec), double-key encryption (EncTK), partial decryption using sk_A (PDec1), partial decryption using sk_B (PDec2), re-encryption first stage (FPRE) and re-encryption second stage (SPRE).
(1) Key generation (KeyGen): given a security parameter k, two large primes p and q are chosen such that |p| = |q| = k. Then N = pq is calculated and a generator g of maximal order is selected. The public/private key pair of user i is (pk_i = g^{s_i} mod N^2, sk_i = s_i), where s_i ∈_R [1, λ(N^2)] (λ(·) denotes the Euler function). Furthermore, assume that two entities A and B possess public/private key pairs (pk_A = g^a mod N^2, sk_A = a) and (pk_B = g^b mod N^2, sk_B = b), respectively; the public key obtained after Diffie-Hellman negotiation between the two is PK = g^{ab} mod N^2, and the corresponding joint decryption private keys are a and b, respectively. In the system, PK is used as the system global public key. The parameters g and N are published.
(2) Encryption (Enc): the algorithm takes a message m ∈ Z_N and the public key pk_i as input, randomly selects r ∈ Z*_N, and computes the ciphertext [m]_{pk_i} = (T, T') = {pk_i^r · (1 + m·N), g^r} (mod N^2).
(3) Decryption (Dec): using the private key sk_i, the ciphertext [m]_{pk_i} is decrypted as m = L(T/(T')^{s_i} mod N^2), where L(u) = (u − 1)/N.
(4) Double-key encryption (EncTK): to avoid processing operations between ciphertexts based on different public keys, the system global public key PK is chosen instead of the user's own public key pk_i to encrypt the message. Similar to the Enc algorithm, for a given plaintext message m ∈ Z_N the ciphertext is [[m]]_PK = (T, T') = {PK^r · (1 + m·N), g^r} (mod N^2). For simplicity of expression, [[m]]_PK is uniformly abbreviated herein as [[m]].
(5) Partial decryption using sk_A (PDec1): given [[m]] = (T, T') and sk_A = a as input, the first stage of partial decryption computes the partially decrypted ciphertext (T^{(1)}, T'^{(1)}) = (T, (T')^a mod N^2).
(6) Partial decryption using sk_B (PDec2): given the partially decrypted ciphertext (T^{(1)}, T'^{(1)}) and sk_B = b as input, the second stage of partial decryption computes T'^{(2)} = (T'^{(1)})^b mod N^2, thereby obtaining the plaintext message m:

m = L(T^{(1)}/T'^{(2)} mod N^2)
(7) Re-encryption first stage (FPRE): given a ciphertext [[m]], the private key sk_A and a user public key pk_j, the first-stage re-encryption calculation is executed to obtain the partially re-encrypted ciphertext [m]^+.
(8) Re-encryption second stage (SPRE): given the partially re-encrypted ciphertext [m]^+, the private key sk_B and the user public key pk_j, the second-stage re-encryption calculation is performed to obtain the ciphertext [m]_{pk_j} of the plaintext m under the public key pk_j.

The ciphertext [m]_{pk_j} can only be decrypted by user j using the private key sk_j and executing the Dec algorithm.
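The EncTK/PDec1/PDec2 pipeline above fits in a short runnable sketch. The tiny primes, the choice g = 2 and the random key shares below are illustrative assumptions for demonstration only; they do not constitute a secure parameter choice:

```python
import random

# toy parameters for illustration only; a real deployment uses large primes
p, q = 1019, 1031
N = p * q
N2 = N * N
g = 2                                  # illustrative generator choice

def L_func(u):                         # the Paillier L-function: L(u) = (u - 1)/N
    return (u - 1) // N

a = random.randrange(2, N)             # sk_A (e.g., CP's share)
b = random.randrange(2, N)             # sk_B (e.g., CSP's share)
PK = pow(g, a * b, N2)                 # Diffie-Hellman joint public key

def enc_tk(m):                         # EncTK: [[m]] = (PK^r (1+mN), g^r) mod N^2
    r = random.randrange(1, N)
    return (pow(PK, r, N2) * (1 + m * N) % N2, pow(g, r, N2))

def pdec1(ct):                         # PDec1: raise T' to the power sk_A
    T, Tp = ct
    return (T, pow(Tp, a, N2))

def pdec2(ct):                         # PDec2: finish with sk_B and recover m
    T1, Tp1 = ct
    Tp2 = pow(Tp1, b, N2)              # (g^r)^(ab) = PK^r
    return L_func(T1 * pow(Tp2, -1, N2) % N2)

def add(c1, c2):                       # additive homomorphism of the scheme
    return (c1[0] * c2[0] % N2, c1[1] * c2[1] % N2)
```

Multiplying two ciphertexts component-wise adds the underlying plaintexts, which is the additive-homomorphic property the protocols in the next section rely on.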
Further, HRES satisfies the following characteristics:
1.3 privacy protection protocol
The present invention utilizes the following protocols as basic privacy-preserving algorithms. [[·]]_PK denotes a ciphertext encrypted with the global public key PK; for ease of presentation, [[·]]_PK is uniformly written as [[·]] in the following description. It is worth mentioning that, since HRES only supports computation over the non-negative integer domain, the system performs the following pre-processing to be compatible with fractional or negative operations. First, the system uniformly multiplies plaintext data by a fixed scaling factor and rounds the result to obtain the operation value; for example, with the scaling factor L = 10^5, the original fraction 0.003343 is converted to 334. Second, since the plaintext input domain of HRES is Z_N, the scheme uses data in the range (0, N/2] to represent positive numbers and data in the range (N/2, N) to represent negative numbers. Given [[X]], the Secure Scaling-down Protocol (SSDown) outputs [[X/L]], and the Secure Reciprocal Protocol (SRec) outputs [[1/X]]. Given [[X]] and [[Y]], the Secure Multiplication Protocol (SMul) outputs [[X·Y]]; the Secure Division Protocol (SDiv) outputs [[X/Y]]; and the Greater-than-or-Equal Secure Comparison Protocol (SGE) outputs [[u]] ← SGE([[X]], [[Y]]), where u = 1 when X ≥ Y and u = 0 when X < Y.
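A minimal sketch of this fixed-point pre-processing (the toy modulus N below is an illustrative assumption; any HRES modulus works the same way):

```python
N = 1019 * 1031        # toy modulus for illustration; a real N is much larger
L_FACTOR = 10 ** 5     # the scaling factor L from the text

def encode(x):
    # scale by L, truncate to an integer, and map negatives into (N/2, N)
    return int(x * L_FACTOR) % N

def decode(v):
    # values above N/2 represent negative numbers
    if v > N // 2:
        v -= N
    return v / L_FACTOR
```

With L = 10^5, the text's example 0.003343 encodes to 334, and a negative value round-trips through the upper half of Z_N.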
1.4 improved ciphertext computing protocol
The invention provides three ciphertext computing protocols: a Secure Exclusive-OR Protocol (SXor), a Secure Ciphertext-Plaintext Equality Test Protocol (SETest), and a Secure Natural Logarithm Protocol (SNLog).
1.4.1 secure XOR protocol
The secure XOR protocol SXor operates on two encrypted bits [[m_1]] and [[m_2]] (m_1, m_2 ∈ {0,1}) to obtain the encrypted bit-XOR result [[u]]. If m_1 = m_2, then u = 0; otherwise, u = 1. The protocol description is shown in Algorithm 1.
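Algorithm 1 itself is not reproduced here, but protocols of this kind commonly rely on the arithmetic identity m_1 ⊕ m_2 = m_1 + m_2 − 2·m_1·m_2, which additive HE supports with a single secure multiplication. A plaintext check of that identity (an assumption about the protocol's internals, not the patent's construction):

```python
def xor_arith(m1, m2):
    # for bits, m1 XOR m2 = m1 + m2 - 2*m1*m2; under additive HE the sum is a
    # ciphertext product, and only the m1*m2 term costs one SMul interaction
    return m1 + m2 - 2 * m1 * m2

# the identity holds on all four bit combinations
assert all(xor_arith(x, y) == (x ^ y) for x in (0, 1) for y in (0, 1))
```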
1.4.2 secure ciphertext-plaintext equality test protocol
The secure ciphertext-plaintext equality test protocol SETest tests the equality relation between a ciphertext [[m]] and the plaintext constant k = 0 (where m is a real number in [0,1]), outputting an encrypted bit [[u]]. If u = 1, then m = 0; if u = 0, then m ≠ 0. The protocol description is shown in Algorithm 2.
1.4.3 secure Natural logarithm protocol
The secure natural logarithm protocol SNLog takes the ciphertext [[x]] as input and outputs the encrypted natural-logarithm result [[ln(x)]]. In particular, since the logarithm inputs involved in the system lie in the range (0, 1], the input value [[x]] of the protocol need only satisfy x ∈ (0, 1].
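The internals of SNLog are given in Algorithm 3 (not reproduced here). For inputs restricted to (0, 1], a natural candidate building block is the truncated series ln(x) = −Σ_{k≥1} (1−x)^k / k, since powers and scaled sums map directly onto SMul calls and ciphertext additions. A plaintext sketch under that assumption:

```python
import math

def ln_series(x, terms=40):
    # truncated Mercator series for ln(x), valid on (0, 1]; convergence slows
    # as x approaches 0, so the term count would be tuned to the input range
    z = 1.0 - x
    acc, power = 0.0, 1.0
    for k in range(1, terms + 1):
        power *= z
        acc += power / k
    return -acc
```

For x not too close to 0 the truncation error is negligible, e.g. `ln_series(0.5)` agrees with `math.log(0.5)` to well below the system's fixed-point precision.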
2. System flow
2.1 System overview
The system of the invention consists of the following three stages (Fig. 2): privacy-preserving weak classifier training, privacy-preserving TrAdaboost training, and privacy-preserving TrAdaboost prediction. The private data involved in the algorithm (e.g., the training data sets, classifier models, request data, prediction results, or intermediate calculation results) cannot be obtained by other entities, whether in the training or the prediction phase.
Privacy-preserving weak classifier training. In each round of iterative TrAdaboost training, the algorithm trains a weak classifier according to the current source and target data sets (and their sample weight distributions). To protect privacy, the training process should be performed in the encrypted domain. Many secure base-classifier training schemes based on homomorphic encryption algorithms (especially additive homomorphism) already exist, such as Support Vector Machine (SVM) training or Logistic Regression (LR) training; therefore, the invention does not present an additional security algorithm for weak classifier training. After the weak classifier training is completed, the algorithm returns the encrypted model parameters for the subsequent calculation of secure TrAdaboost training.
Privacy-preserving TrAdaboost training. In each round of secure TrAdaboost training, the algorithm first trains the base classifier using the joint training samples (including the source and target data sets) and their weight distributions, and calculates its weighted prediction error. The algorithm then updates the misclassified sample weights for the next iteration (in a privacy-preserving manner). After each training round, the trained encrypted weak classifier model and its influence factor are saved; the base classifiers from the second half of the iterative training will be used for the final TrAdaboost prediction.
Privacy-preserving TrAdaboost prediction. After receiving the encrypted sample from a requesting user, the CP and CSP perform the privacy-preserving TrAdaboost prediction calculation in a cooperative, interactive manner. The final data classification result depends on a weighted combination of the weak classifier predictions. Finally, the cloud server returns the re-encrypted prediction value to the requesting user; the plaintext value of the prediction result can only be decrypted by the requesting user with the corresponding private key.
2.2 System initialization
The initialization task of the system is performed by the KGC, including generating parameters for the cryptographic system and distributing the public/private key pair to all entities in the system. In addition, the system global public key is generated by the negotiation between the CP and the CSP. The detailed description is as follows:
(1) KGC executes the KeyGen algorithm to generate the cryptographic system parameters of HRES, e.g., N and g. At the same time, KGC generates respective public/private key pairs for all entities in the system. Specifically, KGC distributes the key pairs (pk_sdo, sk_sdo), (pk_tdo, sk_tdo), (pk_cp, sk_cp) and (pk_csp, sk_csp) for SDO, TDO, CP and CSP, respectively. In addition, the KGC maintains a key store for the public/private key pairs of the requesting users (i.e., the KGC stores the key pairs {(pk_i, sk_i)}, i = 1, ..., n_user, where n_user may be optionally specified) and assigns an unused key pair to each registered user. The key store is updated by the KGC at idle time or when needed.
(2) CP and CSP exchange their respective public keys and negotiate the Diffie-Hellman key PK = g^{ab} mod N^2. The PK is then published as the system's global public key to the SDO, TDO, and subsequent requesting users RUs. From the characteristics of HRES, messages encrypted under PK can only be decrypted jointly by CP and CSP to recover the plaintext.
2.3 secure TrAdaboost training phase
As shown in fig. 3, the training phase of the secure TrAdaboost is an R-round iterative process. The algorithm first performs preparation and preprocessing work. Subsequently, each round of TrAdaboost iterative training consists of four sub-blocks, including sample weight vector normalization, prediction error calculation, weight adjustment rate calculation and sample weight update. Algorithm 4 gives the algorithm flow of the secure TrAdaboost training scheme.
S1, algorithm preparation and preprocessing
First, the SDO and TDO submit their respective encrypted training sample sets to the CP. Suppose that SDO owns the source data set D_S = {(x_1, y_1), ..., (x_n, y_n)} and TDO owns the target data set D_T = {(x_{n+1}, y_{n+1}), ..., (x_{n+m}, y_{n+m})}. On the SDO and TDO side, each value contained in the feature vectors of the data sets and the corresponding label values are first multiplied by a scaling factor L (L is specified by the system), i.e., x_ij ← L·x_ij and y_i ← L·y_i, where 1 ≤ i ≤ n+m and d denotes the size of the feature vector (1 ≤ j ≤ d). After encryption with the system global public key PK, SDO and TDO send the respective encrypted training sets to CP, i.e., [[D_S]]_PK = {([[x_1]]_PK, [[y_1]]_PK), ..., ([[x_n]]_PK, [[y_n]]_PK)} and [[D_T]]_PK = {([[x_{n+1}]]_PK, [[y_{n+1}]]_PK), ..., ([[x_{n+m}]]_PK, [[y_{n+m}]]_PK)}, where [[x_i]]_PK = ([[x_{i1}]]_PK, [[x_{i2}]]_PK, ..., [[x_{id}]]_PK) and 1 ≤ i ≤ n+m. For simplicity of presentation, PK is omitted in the following description, i.e., [[·]]_PK is written as [[·]]. In addition, the sizes of the source and target data sets (i.e., n and m) are also sent to the CP for storage, respectively. Upon receiving [[D_S]] and [[D_T]], the CP merges them into the joint training data set [[D]] = {([[x_1, y_1]]), ..., ([[x_{n+m}, y_{n+m}]])}.
Subsequently, the CP initializes the sample weights of the joint training data set (the initialization strategy may be specified by the TDO). Assume the initial weight values are determined by the sizes of the source and target data sets, respectively: the CP sets the initial weight of each source sample to 1/n and the initial weight of each target sample to 1/m, and then computes the encrypted sample weight vector [[w^1]] = ([[w_1^1]], ..., [[w_{n+m}^1]]).
S2 sample weight vector normalization
In the t-th iteration of training, the algorithm obtains the encrypted normalized sample weight vector [[p^t]]. First, the CP computes the sum of all weight values over the ciphertext domain by multiplying the encrypted weights, i.e., [[S_w]] = Π_{i=1}^{n+m} [[w_i^t]] = [[Σ_{i=1}^{n+m} w_i^t]].

Then, [[p^t]] is obtained by calling the SDiv algorithm n+m times: [[p_i^t]] = SDiv([[w_i^t]], [[S_w]]), 1 ≤ i ≤ n+m.
S3, calculating prediction error
Suppose the weak classifier [[h_t]] of the t-th round has been trained; its application to an encrypted sample [[x]] is denoted [[h_t(x)]], with h_t(x) ∈ {0,1}. The goal of the algorithm is to calculate the weighted prediction error of [[h_t]] on D_T. It is known that |h_t(x_i) − y_i| equals the XOR of h_t(x_i) and y_i. Therefore, |h_t(x_i) − y_i| is first calculated using the SXor algorithm (since the sample weight update phase needs the error values of [[h_t]] on both the source and target samples, in this calculation the encrypted prediction error is computed for both the source and target samples and the result is stored at the CP side): [[|h_t(x_i) − y_i|]] = SXor([[h_t(x_i)]], [[y_i]]), 1 ≤ i ≤ n+m.
Next, the encrypted prediction error rate [[∈_t]] is calculated over the target samples by ciphertext multiplication and addition operations.
S4, calculating weight adjustment rate
The weight adjustment rate controls the degree of the sample weight update. The adjustment rate β of the source training sample weights is constant in each iteration; therefore, the value of β only needs to be calculated once during the entire TrAdaboost training process (and since the operands involved in the calculation are public, it is calculated in the plaintext domain).
Before calculating the sample weight adjustment rate of D_T (i.e., β_t = ∈_t/(1 − ∈_t) = −1 + 1/(1 − ∈_t)), the algorithm separately considers the calculation under three special conditions (based on different values of ∈_t). First, it is determined by the following calculations whether the condition ∈_t ≥ 1/2, ∈_t = 0 or ∈_t = 1 holds (where ∈_t is encrypted). If ex_1 = 1, the condition ∈_t ≥ 1/2 holds; otherwise ∈_t < 1/2. If ex_2 = 1, the condition ∈_t = 0 holds; if ex_3 = 1, the condition ∈_t = 1 holds.
[[ex_1]] = SGE([[∈_t]], [[1/2]]);

[[ex_2]] = SETest([[∈_t]]);

[[ex_3]] = SETest([[1]]·[[∈_t]]^{N−1}) = SETest([[1 − ∈_t]])
Next, the algorithm calculates the value of [[β_t]] (while avoiding the case where the denominator of the calculation may be 0).
If ∈_t = 1, the denominator 1 − ∈_t of β_t is 0, so the normal formula cannot be applied; the flags ex_1, ex_2 and ex_3 computed above are used to exclude the special cases. Suppose that when ∈_t ≥ 1/2 (or ∈_t = 1, ∈_t = 0), β_t is set directly to a constant c_1 (or c_2, c_3), e.g., c_1 = 0.5 (or c_2 = 0.4, c_3 = 0.99). Finally, [[β_t]] can be calculated as follows:
[[S'_3]] = SMul([[1]]·[[ex_1]]^{N−1}, [[1]]·[[ex_2]]^{N−1});

[[S''_3]] = SMul([[S'_3]], [[1]]·[[ex_3]]^{N−1});
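In plaintext, the guarded selection of β_t can be mirrored as follows (the branch order and the constants c_1, c_2, c_3 follow the example values given in the text):

```python
def beta_t(eps, c1=0.5, c2=0.4, c3=0.99):
    # mirrors the encrypted flags: ex1 = [eps >= 1/2], ex2 = [eps == 0],
    # ex3 = [eps == 1]; the normal formula applies only when all flags are 0
    if eps == 1:           # ex3: the denominator 1 - eps would be 0
        return c2
    if eps == 0:           # ex2
        return c3
    if eps >= 0.5:         # ex1
        return c1
    return eps / (1.0 - eps)
```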
S5, sample weight update
It is known that the strategies for updating the weight values of the source data samples and the target data samples are different. In addition, only when a sample x_i is misclassified does its corresponding sample weight w_i need to be updated. Since the encrypted prediction errors [[|h_t(x_i) − y_i|]], 1 ≤ i ≤ n+m, have already been calculated, the algorithm only needs to test over the encrypted domain whether [[|h_t(x_i) − y_i|]] is equal to 0 by calling the SETest protocol:
[[s]] = SETest([[|h_t(x_i) − y_i|]])
the sample weight vector is then updated by the following strategy:
2.4 secure TrAdaboost prediction phase
Fig. 4 depicts the algorithm flow of the privacy-preserving TrAdaboost prediction phase. When a user (denoted RU_i) requests a class label for his/her unlabeled sample x (x from the target sample space), he/she first completes registration on the system and obtains a unique public/private key pair (pk_i, sk_i) and the global public key PK of the system. To prevent leakage of the private sample data, the user RU_i encrypts the request sample with PK as [[x]] and transmits it to the CP. After receiving the request data from RU_i, the CP, jointly with the CSP, performs privacy-preserving TrAdaboost prediction on [[x]] using the encrypted weak classifiers {[[h_t]]} and their influence factors {[[β_t]]}, where t = ⌈R/2⌉, ..., R. Specifically, CP and CSP first alternately perform encrypted weak classifier computation on [[x]] and obtain {[[h_t(x)]]}. Subsequently, the cloud servers calculate two encrypted decision parameters (i.e., [[left]] and [[right]]), wherein:
Next, the cloud servers run the SGE protocol to compare [[left]] and [[right]]. If right ≥ left, the final classification result [[h_f(x)]] of the model is set to [[1]]; otherwise, [[h_f(x)]] is set to [[0]]. Before returning the prediction result to user RU_i, CP and CSP re-encrypt the classification result [[h_f(x)]] under RU_i's public key pk_i. After receiving the re-encrypted result, the requesting user RU_i recovers the plaintext of the result using his/her own private key sk_i. Since entities other than RU_i cannot obtain the private key sk_i, the prediction result is kept confidential. The pseudo code of the secure TrAdaboost prediction algorithm is given in Algorithm 5.
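The exact definitions of [[left]] and [[right]] appear only in Fig. 4. Assuming the standard TrAdaboost final hypothesis (a weighted vote of the second-half weak classifiers against half the total vote weight), the plaintext form of the comparison the cloud servers perform would look like:

```python
import math

def final_predict(h_values, betas):
    # h_values[t] = h_t(x) in {0, 1} and betas[t] = beta_t, both over the
    # second-half rounds; output 1 iff the weighted vote reaches half the
    # total weight (an assumed mapping of the patent's right/left parameters)
    right = sum(h * math.log(1.0 / b) for h, b in zip(h_values, betas))
    left = 0.5 * sum(math.log(1.0 / b) for b in betas)
    return 1 if right >= left else 0
```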
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention that produce equivalent functional effects without exceeding the scope of the technical scheme belong to the protection scope of the present invention.
Claims (6)
1. A secure migration learning system based on homomorphic encryption, comprising: the system comprises a key generation center KGC, a cloud platform CP, a cloud service provider CSP, a source data owner SDO, a target data owner TDO and a request user RUs;
a key generation center KGC responsible for initializing cryptographic system parameters and distributing public/private key pairs of system entities;
a cloud platform CP responsible for receiving and storing training data from SDO and TDO and prediction request data from RUs, and performing partial computation of the system;
the CSP interacts with the CP and provides computing service for a migration learning algorithm for protecting privacy; in addition, the CP and CSP jointly perform decryption and re-encryption operations;
a source data owner SDO, the SDO owning the tagged sample instance from the source domain, sending the encrypted data set to the CP as a source training data set of the system;
target data owner TDO, TDO having tagged sample instances and untagged sample instances from target domains, the source and target domains involved in the system are distributed differently but have correlation; the TDO sends the encrypted data set to a CP to serve as a target training data set of the system, and the CP combines the encrypted source training data set and the target training data set to serve as a joint training data set;
and the request user RUs sends the encrypted unmarked sample from the target domain to the CP after the CP and the CSP finish the construction of the TrAdaboost classifier for the target space, requests related prediction calculation, and the encrypted prediction result returned from the CP can only be decrypted by the corresponding request user.
2. The secure migration learning system based on homomorphic encryption according to claim 1, wherein the key generation center KGC initializes the cryptographic system parameters and distributes the public/private key pairs of the system entities as follows:
(1) KGC generates the system parameters N and g of the Paillier-algorithm-based homomorphic re-encryption system HRES, and KGC generates respective public/private key pairs for all entities in the system, specifically: KGC distributes the key pairs (pk_sdo, sk_sdo), (pk_tdo, sk_tdo), (pk_cp, sk_cp) and (pk_csp, sk_csp) for SDO, TDO, CP and CSP, respectively; in addition, the KGC sets up a key repository to store the requesting users' public/private key pairs {(pk_i, sk_i)}, i = 1, ..., n_user, where n_user is the total number of system users, and distributes an unused key pair to each registered user; the key repository is updated by KGC at idle time or when needed;
(2) CP and CSP exchange their respective public keys and negotiate the Diffie-Hellman key PK = g^{ab} mod N^2; subsequently, PK is published as the global public key of the system to SDO, TDO and the subsequent requesting users RUs; from the characteristics of HRES, messages encrypted under PK can only be decrypted jointly by CP and CSP to recover the plaintext.
3. The secure migration learning system based on homomorphic encryption according to claim 2, wherein the HRES comprises the following algorithms:
key generation KeyGen algorithm: a safety parameter k and two large prime numbers p and q are given to satisfy(symbol)A bit length of expression; then, N ═ pq is calculated and a group generator g is selected and the order of g is ord (g) ═ p-1 (q-1)/2; user i's public/private key pair ofWherein s is i ∈ R [1,λ(N 2 )]λ (·) denotes the euler function; furthermore, assume that two entities a and B possess a public/private key pair (pk) respectively A =g a mod N 2 ,sk A A) and (pk) B =g b mod N 2 ,sk B B), the public key obtained after the two carry out Diffie-Hellman negotiation isThe corresponding joint decryption private keys are a and b respectively; taking PK as a global public key of the system in the system; the parameters g and N are published;
the encryption Enc algorithm: joining message m to Z N And the public key pk i As input, randomly selectZ N Is an integer set {0,1, …, N-1},is an integer set {1, …, N-1}, and a ciphertext is obtained by calculationWherein T and T' are respectively a first element and a second element of the ciphertext;
decryption Dec algorithm: using the private key sk i For ciphertextAnd (3) decryption:wherein l (u) ═ 1/N;
the EncTK algorithm with double keys is as follows: the system global public key PK is selected to encrypt the message, similar to the Enc algorithm, given a plaintext message m ∈ Z N Obtaining a ciphertextFor the sake of simplifying the expression, willIs uniformly and schematically represented as
partial decryption PDec1 algorithm using sk_A: takes [m] = (T, T') and sk_A = a as input and performs the first stage of partial decryption: T^(1) = (T')^a mod N^2;
partial decryption PDec2 algorithm using sk_B: takes the partially decrypted ciphertext (T, T^(1)) and sk_B = b as input and performs the second stage of partial decryption, thereby obtaining the plaintext message m:
m = L(T / (T^(1))^b mod N^2)
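Continuing the toy sketch, the two-stage decryption under the joint key PK = g^(ab) can be checked end to end; the exact form of PDec2 is reconstructed to be consistent with the Dec equation above, so it is an assumption:

```python
p, q = 23, 47                      # toy safe primes
N, N2 = p * q, (p * q) ** 2
g = (-pow(5, 2 * N, N2)) % N2

a, b = 17, 29                      # sk_A (held by CP) and sk_B (held by CSP)
PK = pow(g, a * b, N2)             # Diffie-Hellman joint public key g^{ab}

m, r = 99, 31
T = pow(PK, r, N2) * (1 + m * N) % N2   # EncTK first ciphertext element
Tp = pow(g, r, N2)                      # EncTK second ciphertext element

T1 = pow(Tp, a, N2)                # PDec1: T^(1) = (T')^a = g^{ra}
u = T * pow(T1, -b, N2) % N2       # PDec2: strips g^{abr}, leaves 1 + m*N
assert (u - 1) // N == m           # L(u) recovers the plaintext
```

Neither server alone can decrypt: stripping the exponent needs both a and b, which matches the joint-decryption property stated in claim 2.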
re-encryption first stage FPRE algorithm: given a ciphertext [m] = (T, T'), the private key sk_A, and a user public key pk_j, the first-stage re-encryption computation is performed:
re-encryption second stage SPRE algorithm: given the partially re-encrypted ciphertext, the private key sk_B, and the user public key pk_j, the second-stage re-encryption computation is performed to obtain the ciphertext [m]_(pk_j) of the plaintext m under the public key pk_j;
4. The secure transfer learning system based on homomorphic encryption according to claim 3, wherein the homomorphic properties of the Paillier-based homomorphic re-encryption system HRES are as follows:
5. The homomorphic encryption-based secure transfer learning system according to claim 2, wherein the training process of the TrAdaboost classifier is as follows:
the training of TrAdaboost is an R-round iterative process; preparation and preprocessing are performed first, and then each round of TrAdaboost iterative training consists of four sub-blocks, including sample weight vector normalization, prediction error calculation, weight adjustment rate calculation, and sample weight update, specifically as follows:
s1, algorithm preparation and preprocessing
First, SDO and TDO submit their respective encrypted data sets to CP; assume that SDO owns the source training data set D_S = {(x_1, y_1), …, (x_n, y_n)} and TDO owns the target training data set D_T = {(x_(n+1), y_(n+1)), …, (x_(n+m), y_(n+m))}; in SDO and TDO, each value contained in a feature vector in the data set, and the corresponding label value, is first multiplied by a scaling factor L, specifically x_i ← L·x_i and y_i ← L·y_i, where 1 ≤ i ≤ n+m and d denotes the dimension of the feature vector; after encryption with the system global public key PK, SDO and TDO send their respective encrypted data sets [D_S] and [D_T] to CP, where 1 ≤ i ≤ n+m; for simplicity of presentation, the encrypted sample pair ([x_i], [y_i]) is written in abbreviated form; in addition, the sizes of the source and target training data sets, namely n and m, are also respectively sent to CP for storage; upon receiving [D_S] and [D_T], CP merges them into a joint training data set:
subsequently, CP initializes the sample weights of the joint training data set; given the sizes of the source and target training data sets, CP sets the initial weight of each source sample to 1/n and the initial weight of each target sample to 1/m; an encrypted sample weight vector is then computed
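The preparation step can be sketched in the plaintext domain; encryption is omitted, and the scaling factor, set sizes, and sample values below are illustrative, not taken from the claim:

```python
# Scale features and labels into integers, merge the two data sets, and
# initialise weights: 1/n per source sample, 1/m per target sample.
L_scale = 10 ** 4                  # scaling factor L (illustrative value)
n, m = 4, 2                        # toy source / target set sizes

src = [([0.10, 0.20, 0.30], 1) for _ in range(n)]   # D_S: (x, y) pairs
tgt = [([0.40, 0.50, 0.60], 0) for _ in range(m)]   # D_T

def scale(dataset):
    # multiply every feature value and the label by L before encryption
    return [([round(v * L_scale) for v in x], y * L_scale)
            for x, y in dataset]

joint = scale(src) + scale(tgt)        # CP's joint training data set
w = [1.0 / n] * n + [1.0 / m] * m      # initial sample weight vector
assert len(joint) == n + m
```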
S2, sample weight vector normalization
In the t-th round of iterative training, an encrypted normalized sample weight vector p^t is obtained; first, CP computes the sum of all weight values over the ciphertext domain:
S3, calculating prediction error
Suppose that the weak classifier h_t of the t-th round has been trained; its output on an encrypted sample [x_i] is denoted [h_t(x_i)] and satisfies h_t(x) ∈ {0, 1}; the algorithm aims at computing the weighted prediction error of h_t on D_T; it is known that |h_t(x_i) - y_i| equals the XOR between h_t(x_i) and y_i; therefore, |h_t(x_i) - y_i| is first computed using the secure XOR protocol Sxor; because the error values on both the source and target samples are needed in the sample weight update phase, the encrypted prediction error is computed for both the source and target samples in this step and the results are stored at CP; the secure XOR protocol Sxor is implemented as follows: for two encrypted bits [m_1] and [m_2], with m_1, m_2 ∈ {0, 1}, the ciphertext XOR operation is realized to obtain the encrypted bit-XOR result [u]; if m_1 = m_2, then u = 0; otherwise, u = 1;
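The bit relation that Sxor evaluates is the standard arithmetic identity m1 ⊕ m2 = m1 + m2 - 2·m1·m2, which costs one secure multiplication (SMul) plus homomorphic additions over ciphertexts; it is easy to check in plaintext:

```python
def xor_arith(m1, m2):
    # one multiplication (SMul over ciphertexts) plus additions/subtractions
    return m1 + m2 - 2 * m1 * m2

# the identity matches bitwise XOR on all four bit combinations
assert all(xor_arith(b1, b2) == (b1 ^ b2) for b1 in (0, 1) for b2 in (0, 1))
```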
next, the encrypted prediction error rate [ε_t] is calculated through the secure multiplication protocol SMul and homomorphic addition operations:
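In the plaintext domain, the weighted prediction error on D_T reduces to the classical TrAdaboost formula ε_t = Σ w_i·|h_t(x_i) - y_i| / Σ w_i over the target samples; the claim's exact formula is not reproduced in this text, so the sketch below assumes that classical definition (toy values):

```python
n, m = 3, 3                                  # source / target set sizes
h = [0, 1, 1, 0, 1, 0]                       # weak-classifier outputs in {0,1}
y = [0, 1, 0, 1, 1, 0]                       # true labels
w = [0.2, 0.1, 0.2, 0.2, 0.2, 0.1]           # current sample weights

err = [hi ^ yi for hi, yi in zip(h, y)]      # |h - y| via XOR (what Sxor does)
eps = (sum(wi * ei for wi, ei in zip(w[n:], err[n:]))
       / sum(w[n:]))                         # weighted error on D_T only
```

With these toy values the misclassified target sample carries weight 0.2 out of a total target weight of 0.5, so eps evaluates to 0.4.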
S4, calculating weight adjustment rate
The weight adjustment rate controls the degree to which the sample weights are updated; the adjustment rate β for the weights of the source training samples is constant in every iteration; therefore, the value of β only needs to be computed once during the whole TrAdaboost training process, and since the operands involved in the computation are all public, it can be computed in the plaintext domain:
Before computing the sample weight adjustment rate for D_T, namely β_t = ε_t/(1 - ε_t) = -1 + 1/(1 - ε_t), the algorithm handles three special cases according to the value of ε_t; first, whether the conditions ε_t ≥ 1/2, ε_t = 1, or ε_t = 0 hold is determined by the following computation, where: if ex_1 = 1, the condition ε_t ≥ 1/2 holds, otherwise ε_t < 1/2; if ex_2 = 1, the condition ε_t = 1 holds; if ex_3 = 1, the condition ε_t = 0 holds;
SGE is the secure greater-than-or-equal comparison protocol and SETest is the secure ciphertext-plaintext equality test protocol; the secure ciphertext-plaintext equality test protocol SETest is implemented as follows: it tests the equality relation between a ciphertext [m] and a plaintext value k, where m is a real number in [0, 1] and k = 0; if u = 1, this indicates that m = 0; if u = 0, this indicates that m ≠ 0;
SRec is the secure reciprocal protocol;
The different value cases of ε_t are discussed as follows: if ε_t = 1, then ex_2 = 1 and [β_t] is set accordingly; if ε_t ≠ 1, then ex_2 = 0; when ε_t ≥ 1/2, ε_t = 1, or ε_t = 0, β_t is set directly to the constant c_1, c_2, or c_3, respectively; finally, [β_t] can be calculated as follows:
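As a plaintext sketch, β below uses the classical TrAdaboost constant 1/(1 + sqrt(2·ln n / R)) (the claim's own formula is not shown in this text, so that choice is an assumption), and β_t follows the three special cases with illustrative clamping constants c1, c2, c3:

```python
import math

n, R = 100, 10                     # source size and number of rounds (toy)
beta = 1 / (1 + math.sqrt(2 * math.log(n) / R))   # fixed source rate

def beta_t(eps, c1=1.0, c2=1.0, c3=1e-6):
    # clamp the degenerate cases before evaluating eps / (1 - eps);
    # c1, c2, c3 are hypothetical stand-ins for the claim's constants
    if eps == 1:                   # ex_2 = 1
        return c2
    if eps == 0:                   # ex_3 = 1
        return c3
    if eps >= 0.5:                 # ex_1 = 1
        return c1
    return eps / (1 - eps)
```

The clamping keeps β_t finite and positive, so the later logarithm ln(1/β_t) in the prediction stage stays well defined.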
S5, sample weight update
It is known that the strategies for updating the weight values of the source data samples and the target data samples are different; moreover, a sample weight needs to be updated only when the corresponding sample is misclassified; since the ciphertext of the prediction error [|h_t(x_i) - y_i|] has already been computed, the algorithm only needs to test over the encrypted domain, by calling the SETest protocol, whether it equals 0:
the sample weight vector is then updated by the following strategy:
for i=1,...,n,
for i=n+1,...,n+m,
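In plaintext, the two update strategies correspond to the classical TrAdaboost rule (assumed here, since the claim's update formulas are not reproduced in this text): a misclassified source weight shrinks by a factor β, while a misclassified target weight grows by a factor β_t^(-1); correctly classified weights are unchanged:

```python
n = 2                              # first n entries are source samples
w = [0.25, 0.25, 0.25, 0.25]       # two source + two target weights (toy)
err = [1, 0, 1, 0]                 # |h_t(x_i) - y_i| per sample
beta, bt = 0.5, 0.25               # source rate and current target rate

new_w = [wi * (beta ** e if i < n else bt ** (-e))
         for i, (wi, e) in enumerate(zip(w, err))]
# misclassified source weight halves; misclassified target weight quadruples
assert new_w == [0.125, 0.25, 1.0, 0.25]
```

Shrinking misclassified source weights and boosting misclassified target weights is what gradually shifts the ensemble's attention from the source domain to the target domain.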
6. The homomorphic encryption-based secure transfer learning system according to claim 5, wherein the TrAdaboost classifier implements encrypted prediction as follows:
When a requesting user RU_i requests the classification label of an unlabeled sample x from the target sample space, it first completes registration with the system and obtains a unique public/private key pair (pk_i, sk_i) as well as the global public key PK of the system; to prevent leakage of its private sample data, the requesting user RU_i encrypts the request sample into [x] using PK and transmits it to CP; after receiving the request data from RU_i, CP and CSP jointly perform privacy-preserving TrAdaboost prediction with the encrypted weak classifiers [h_t] and their influence factors; CP and CSP first alternately perform the encrypted weak-classifier computation to obtain [h_t(x)], and then perform the weighted prediction result calculation:
SMul is the secure multiplication protocol and SNLog is the secure natural logarithm protocol; the secure natural logarithm protocol SNLog is implemented as follows: it takes a ciphertext [x] as input and outputs the encrypted result [ln x] of the natural logarithm operation; since the input range of the logarithm operations involved in this system is (0, 1], the protocol input x need only satisfy x ∈ (0, 1];
next, the cloud servers run the secure greater-than-or-equal comparison protocol SGE to compare the left and right sides of the weighted vote; if right ≥ left, the final classification result is set to the corresponding class label; otherwise, it is set to the other class label; before the prediction result is returned to the requesting user RU_i, CP and CSP re-encrypt the classification result based on RU_i's public key pk_i; after receiving the re-encrypted result, the requesting user RU_i uses its own private key sk_i to recover the plaintext of the result.
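The weighted vote can be sketched in plaintext; the comparison mirrors the classical TrAdaboost final hypothesis, Σ ln(1/β_t)·h_t(x) versus ½·Σ ln(1/β_t) (an assumption, since the claim's formula is not shown; SMul, SNLog, and SGE would carry out the same computation over ciphertexts):

```python
import math

betas = [0.2, 0.4, 0.3]    # influence factors beta_t of the voting rounds (toy)
votes = [1, 0, 1]          # weak-classifier outputs h_t(x) for the sample x

left = sum(math.log(1 / b) * h for b, h in zip(betas, votes))
right = 0.5 * sum(math.log(1 / b) for b in betas)
label = 1 if left >= right else 0   # SGE decides this comparison on ciphertexts
```

Since every β_t lies in (0, 1], each ln(1/β_t) is non-negative, which is why SNLog only needs to support inputs in (0, 1].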
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110134461.8A CN112822005B (en) | 2021-02-01 | 2021-02-01 | Secure transfer learning system based on homomorphic encryption |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112822005A CN112822005A (en) | 2021-05-18 |
CN112822005B true CN112822005B (en) | 2022-08-12 |
Family
ID=75860845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110134461.8A Active CN112822005B (en) | 2021-02-01 | 2021-02-01 | Secure transfer learning system based on homomorphic encryption |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112822005B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113032838B (en) * | 2021-05-24 | 2021-10-29 | 易商征信有限公司 | Label prediction model generation method, prediction method, model generation device, system and medium based on privacy calculation |
CN113421122A (en) * | 2021-06-25 | 2021-09-21 | 创络(上海)数据科技有限公司 | First-purchase user refined loss prediction method under improved transfer learning framework |
CN113472805B (en) * | 2021-07-14 | 2022-11-18 | 中国银行股份有限公司 | Model training method and device, storage medium and electronic equipment |
CN113938266B (en) * | 2021-09-18 | 2024-03-26 | 桂林电子科技大学 | Junk mail filter training method and system based on integer vector homomorphic encryption |
CN113783898B (en) * | 2021-11-12 | 2022-06-10 | 湖南大学 | Renewable hybrid encryption method |
CN114219306B (en) * | 2021-12-16 | 2022-11-15 | 蕴硕物联技术(上海)有限公司 | Method, apparatus, medium for establishing welding quality detection model |
CN114915399A (en) * | 2022-05-11 | 2022-08-16 | 国网福建省电力有限公司 | Energy big data security system based on homomorphic encryption |
CN115051816B (en) * | 2022-08-17 | 2022-11-08 | 北京锘崴信息科技有限公司 | Privacy protection-based cloud computing method and device and financial data cloud computing method and device |
CN116402505B (en) * | 2023-05-11 | 2023-09-01 | 蓝象智联(杭州)科技有限公司 | Homomorphic encryption-based graph diffusion method, homomorphic encryption-based graph diffusion device and storage medium |
CN117290659B (en) * | 2023-11-24 | 2024-04-02 | 华信咨询设计研究院有限公司 | Data tracing method based on regression analysis |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108259158A (en) * | 2018-01-11 | 2018-07-06 | 西安电子科技大学 | Efficient and secret protection individual layer perceptron learning method under a kind of cloud computing environment |
CN109255444A (en) * | 2018-08-10 | 2019-01-22 | 深圳前海微众银行股份有限公司 | Federal modeling method, equipment and readable storage medium storing program for executing based on transfer learning |
CN110008717A (en) * | 2019-02-26 | 2019-07-12 | 东北大学 | Support the decision tree classification service system and method for secret protection |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103036884B (en) * | 2012-12-14 | 2015-09-16 | 中国科学院上海微系统与信息技术研究所 | A kind of data guard method based on homomorphic cryptography and system |
CN105488422B (en) * | 2015-11-19 | 2019-01-11 | 上海交通大学 | Editing distance computing system based on homomorphic cryptography private data guard |
- 2021-02-01 CN CN202110134461.8A patent/CN112822005B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||