CN116667996A - Verifiable federated learning method based on hybrid homomorphic encryption - Google Patents
- Publication number: CN116667996A
- Application number: CN202310620541.3A
- Authority: CN (China)
- Prior art keywords: model, aggregation, encryption, pasta, ciphertext
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0435—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/0838—Key agreement, i.e. key establishment technique in which a shared key is derived by parties as a function of information contributed by, or associated with, each of these
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention discloses a verifiable federated learning method based on hybrid homomorphic encryption. On the client-server infrastructure of centralized federated learning, data are encrypted with a hybrid homomorphic encryption technique supporting SIMD operations, which, while ensuring data privacy and correct aggregation, leverages the advantages of symmetric encryption (simple computation and no ciphertext expansion) and overcomes the drawbacks of public key homomorphic encryption schemes (complex computation and severe ciphertext expansion). Verification codes are constructed with a Lagrange interpolation method, enabling clients to verify the aggregation result. The method comprises the following steps: system initialization, model training and data encryption, aggregation, aggregation result verification, and model updating. Compared with the prior art, the method ensures the confidentiality of user gradients and the correctness and integrity of the aggregation result in the federated learning aggregation process, while reducing the number of public key encryptions, lowering the computation and communication burden, and greatly improving efficiency.
Description
Technical Field
The invention relates to the field of homomorphic encryption in cryptography and the field of privacy protection in machine learning, and in particular to a verifiable federated learning method based on hybrid homomorphic encryption.
Background
With the advent of the fourth industrial revolution, new-generation technologies such as artificial intelligence and big data bring opportunities for the intelligent transformation of traditional industries. Machine learning, one of the main methods of big data analysis in recent years, covers intelligent algorithms such as deep neural networks and regression, and has been successfully applied in fields such as healthcare, autonomous driving, finance, and industrial manufacturing. High-quality machine learning models typically require a huge amount of reliable data as training support, so the performance of AI-based services is greatly affected by the quality of the training data, and large-scale data collection is of paramount importance. However, the limited quantity and poor quality of data in most industries often lead to overfitting and unreliability of trained machine learning models, which is insufficient to support the deployment of artificial intelligence technology. An intuitive solution is to share and fuse data among different industries and among enterprises within the same industry. In traditional centralized training, the server requires all participants to upload their local data to the cloud. The server then initializes a deep neural network on the cloud and trains it on the collected samples until optimal parameters are obtained. Finally, the cloud server returns a published prediction-service interface or the optimal parameters to each participant. This centralized training approach introduces a series of data privacy and security issues: users' local data may contain sensitive private information; for example, in a healthcare system, patients may be reluctant to share their medical data with a third-party service provider (e.g., a cloud server).
Furthermore, there are often hard-to-break barriers between different data sources; in most industries, data exists in the form of islands. Integrating data scattered across industries and regions is difficult, or comes at great cost and with privacy risks.
To solve the "data island" problem, the concept of federated learning was proposed: during machine learning, each participant performs joint modeling with the help of other parties' data, without any party sharing its data resources; that is, joint training is carried out without the data leaving its local premises, and a shared machine learning model is established. According to the architecture of the communication system, federated learning can be categorized into centralized and decentralized federated learning. Centralized federated learning is based on a client-server architecture, while decentralized federated learning is based on a peer-to-peer network architecture. Compared with the decentralized form, centralized federated learning is generally simpler and more effective, and its scalability and stability have made its application wider. In a centralized federated learning scheme, each participant trains a local model on private data. In each round of model updating, the user uploads the update parameters (gradients) of the local model to the aggregation server. The aggregation server aggregates the update parameters of all users and returns the aggregated value to each user for model updating. However, studies have shown that adversaries can still indirectly obtain sensitive information from the shared gradients. Thus, privacy-preserving federated learning has developed as a research branch.
To protect participants' privacy in federated learning, many strategies have been proposed; the more widely used ones are secure multi-party computation, differential privacy, and homomorphic encryption. Secure multi-party computation allows multiple parties to cooperatively compute an agreed-upon function over their private data while learning nothing about each other beyond the inputs and outputs. Differential privacy ensures the privacy of each individual sample in a dataset by injecting noise to obscure sensitive information, so that a third party cannot distinguish individuals. Homomorphic encryption allows certain computations to be performed directly on ciphertexts without decrypting them first. Compared with secure multi-party computation and differential privacy, homomorphic encryption offers stronger privacy guarantees and has little effect on the accuracy of trained models. However, since common homomorphic encryption techniques belong to the public key cryptosystem, they have long keys and require complex mathematical computation, causing expensive computational overhead and ciphertext expansion. The development of federated learning systems based on homomorphic encryption is therefore limited by the bottleneck of high computation and communication overhead. In cryptographic research, scholars have recently proposed the concept of hybrid homomorphic encryption, fusing symmetric encryption with public key homomorphic encryption, together with a feasible instance supporting SIMD (Single Instruction Multiple Data) operations: the combined use of the Pasta symmetric cipher (a family of symmetric stream ciphers) and the BFV homomorphic cipher (a fully homomorphic encryption scheme based on the hardness of RLWE, Ring Learning With Errors), hereinafter referred to as the "Pasta+BFV" scheme. The computational efficiency of symmetric ciphers complements homomorphic encryption, providing a new approach for privacy-preserving federated learning.
In addition, data integrity issues exist in federated learning. The aggregation server exists as a third party and is prone to single-point-of-failure problems. Without integrity assurance, once the server is compromised, an adversary controlling it may manipulate the global model. A malicious server can forge aggregation results to reverse-engineer participants' private data, or corrupt users' local models, leading to misclassification.
In summary, protecting user privacy and data integrity are two fundamental issues in federated learning training. Moreover, in homomorphic-encryption-based federated learning schemes, achieving strong privacy guarantees while reducing the number of public key encryptions, lowering computation and communication burdens, and striking a balance between security and efficiency remains a difficult problem. It is therefore urgent and meaningful to design a secure and efficient verifiable federated learning method.
Disclosure of Invention
In order to solve the problems of privacy security, data integrity, and efficiency in federated learning described in the Background, the invention provides a verifiable federated learning method based on hybrid homomorphic encryption. The method suits a centralized federated learning scenario under a client-server architecture, enabling all parties to train jointly via the aggregation server without the source data leaving its local premises, thereby solving the "data island" problem. For privacy and efficiency, the invention uses the SIMD-capable "Pasta+BFV" hybrid homomorphic encryption technique, letting inexpensive symmetric encryption stand in for homomorphic encryption on the client so that complex computation is transferred from the client to the server, reducing the user's burden and relieving the communication pressure caused by ciphertext expansion; correct aggregation is performed in the encrypted domain based on BFV homomorphic encryption to protect the privacy of user source data; and SIMD packing encrypts multiple data items at once, reducing the number of encryptions. For data integrity, a Lagrange interpolation method is used to construct verification codes, enabling clients to verify the aggregation result.
The specific technical scheme for realizing the aim of the invention is as follows:
A verifiable federated learning method based on hybrid homomorphic encryption, comprising the following entities: a key generation authority PKG (Public Key Generator), n clients, and an aggregation server. The method is used in a federated learning scenario under a client-server architecture and is characterized by the following specific steps:
step A: initialization of
The n clients negotiate according to the service requirement, and the agreement is achieved by the training model; the PKG generates an initialized global model, a secret key and public parameters and distributes the initialized global model, the secret key and the public parameters to the client and the aggregation server according to requirements;
Step B: Model training and data encryption
In each round of model updating, each client trains its local model on local data and computes the model update parameters for that iteration; the update parameters are then preprocessed, verification codes are constructed, and the plaintext is formed; finally, the plaintext is encrypted with the Pasta symmetric cipher and the Pasta symmetric ciphertext is transmitted to the aggregation server;
step C: polymerization
After receiving the Pasta symmetric ciphertexts from all clients, the aggregation server first performs ciphertext conversion to obtain BFV homomorphic ciphertexts; it then aggregates all BFV homomorphic ciphertexts and sends the aggregation result to all clients;
step D: aggregation result verification and model update
After receiving the aggregation result from the aggregation server, each client performs homomorphic decryption; it then checks the aggregation result by means of the Lagrange interpolation method: if verification passes, the model is updated with the aggregation result, otherwise the result is discarded. The next iteration then begins, until the model converges or the maximum number of federated rounds is reached.
The step A specifically comprises the following steps:
a1: model initialization
The clients first reach agreement on the training model according to the training objective; the PKG then generates, according to this agreement, the initialized global model, the learning rate, the gradient quantization precision, the mapping finite field, and the maximum number of federated rounds, and distributes them to all clients, the global model serving as each client's initial local model;
a2: key initialization
The PKG instantiates the Pasta encryption scheme and the BFV encryption scheme under a security parameter λ; it generates a Pasta key for each client and distributes it; it then generates a common set of BFV keys comprising a public key, a private key, and a calculation key, where the public key and calculation key are disclosed to all participants (the n clients and the aggregation server), and the private key is shared among all clients but kept secret from the aggregation server; finally, the Pasta keys of all clients are encrypted in turn with the BFV public key to form a user list, which is sent to the aggregation server;
a3: public parameter initialization
The PKG generates a sequence of parameters for Lagrange interpolation and sends it to all clients.
The step B specifically comprises the following steps:
b1: model training
Each client trains its local model on its local private dataset and computes the training loss and the gradient used for the update;
b2: data preprocessing
To encrypt the gradient, it is preprocessed into a form suitable for "Pasta+BFV" hybrid homomorphic encryption: first, the floating-point gradients are quantized into integers; the quantized gradients are then mapped onto a finite field to match the encryption algorithm; finally, all data to be encrypted are grouped according to the threshold parameter of the SIMD operation supported by Pasta;
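The preprocessing pipeline of step B2 might be sketched as follows; the quantization precision, plaintext modulus q, and SIMD batch size t below are illustrative assumptions, not parameters fixed by the patent:

```python
Q_BITS = 16    # assumed quantization precision (fixed-point scale 2^16)
q = 65537      # assumed prime plaintext modulus of the mapping finite field
t = 4          # assumed SIMD batch size (real Pasta instances use larger t)

def preprocess(gradients):
    """Quantize float gradients, map them into F_q, and group into SIMD batches."""
    # 1. quantization conversion: floats -> fixed-point integers
    ints = [round(g * (1 << Q_BITS)) for g in gradients]
    # 2. map signed integers into F_q (negatives wrap around, two's-complement style);
    #    magnitudes and client count must keep aggregated sums inside (-q/2, q/2]
    field = [v % q for v in ints]
    # 3. group into batches of t, zero-padding the final group
    field += [0] * ((-len(field)) % t)
    return [field[i:i + t] for i in range(0, len(field), t)]

def postprocess(field_vals):
    """Inverse mapping used after aggregation: F_q values back to floats."""
    signed = [v - q if v > q // 2 else v for v in field_vals]
    return [v / (1 << Q_BITS) for v in signed]
```

The inverse mapping in `postprocess` is exactly the "inverse operation" that step D2 applies to the aggregated gradient before the model update.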
b3: construction verification code
A verification code is constructed for each preprocessed gradient group using the Lagrange interpolation method;
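One plausible instantiation of such a Lagrange check code — assuming the PKG's public parameter sequence fixes distinct interpolation nodes x_i and an evaluation point α, none of which are specified here — takes the verification code of a gradient group (g_1, …, g_t) to be the value at α of the polynomial interpolating the points (x_i, g_i). Since Lagrange evaluation is linear in the g_i, the check code of a sum of groups equals the sum of the groups' check codes, which is what makes the aggregation verifiable:

```python
q = 65537  # assumed prime plaintext modulus

def lagrange_coeffs(nodes, alpha, q):
    """Lagrange basis values l_i(alpha) = prod_{j != i} (alpha - x_j)/(x_i - x_j) mod q."""
    coeffs = []
    for i, xi in enumerate(nodes):
        num, den = 1, 1
        for j, xj in enumerate(nodes):
            if j != i:
                num = num * (alpha - xj) % q
                den = den * (xi - xj) % q
        coeffs.append(num * pow(den, -1, q) % q)  # modular inverse of denominator
    return coeffs

def check_code(group, coeffs, q):
    """Verification code = value at alpha of the interpolating polynomial,
    i.e. a fixed linear combination of the gradient group."""
    return sum(g * c for g, c in zip(group, coeffs)) % q
```

Because `check_code` is linear, the client in step D1 can recompute it over the aggregated gradient group and compare with the aggregated verification code.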
b4: encryption
A plaintext vector to be encrypted in one pass is constructed from a preprocessed gradient group and its verification code; the plaintext vector is encrypted with the Pasta cipher, and the Pasta symmetric ciphertext is sent to the aggregation server.
The step C specifically comprises the following steps:
c1: ciphertext conversion
After receiving a Pasta symmetric ciphertext from a client, the aggregation server first retrieves that user's BFV-encrypted Pasta key from the user list received from the PKG during initialization; then, exploiting the full homomorphism of the BFV cipher, it homomorphically executes the Pasta decryption, converting the symmetric ciphertext into a BFV homomorphic ciphertext;
c2: polymerization
After all received Pasta symmetric ciphertexts have been converted into BFV homomorphic ciphertexts, the aggregation server aggregates them and sends the aggregation result to each client.
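As a minimal sketch of this slot-wise aggregation — modeling ciphertexts as additively masked vectors over F_q, since neither Pasta nor BFV is available in a standard library — the server's job reduces to component-wise addition of ciphertext vectors, which under SIMD packing corresponds to slot-wise homomorphic addition:

```python
import random

q = 65537  # assumed plaintext modulus
t = 4      # assumed SIMD batch size

def encrypt(vec, mask):
    """Stand-in for Pasta/BFV encryption: additive masking over F_q."""
    return [(v + m) % q for v, m in zip(vec, mask)]

def aggregate(cts):
    """Server side: slot-wise 'homomorphic' addition of all ciphertext vectors."""
    return [sum(col) % q for col in zip(*cts)]

def decrypt(agg, masks):
    """Client side: remove the combined masks (in the real scheme, BFV decryption)."""
    total_mask = [sum(col) % q for col in zip(*masks)]
    return [(a - m) % q for a, m in zip(agg, total_mask)]
```

The point of the sketch is only the correctness property: the decrypted aggregate equals the slot-wise sum of the clients' gradient groups, which is what BFV's SIMD addition guarantees in the actual scheme.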
The step D specifically comprises the following steps:
d1: verification
After receiving the aggregation result from the aggregation server, the client decrypts it and splits it into the aggregated gradient groups and the aggregated verification code; it then constructs a check code over the aggregated gradient groups using the Lagrange interpolation method and tests whether it equals the aggregated verification code; if the two are equal, verification succeeds; otherwise, verification fails;
d2: updating a model
If verification succeeds, the user post-processes the aggregated gradient, i.e., applies the inverse of the data preprocessing performed in the model-training-and-data-encryption step, and updates the model with the restored aggregated gradient; if verification fails, the aggregated value is discarded;
d3: Iteration: the next training round begins, i.e., Steps B through D are executed repeatedly until the model converges or the maximum number of federated rounds is reached.
Compared with existing privacy-preserving federated learning methods, the beneficial effects of the invention are:
(1) The invention applies the concept of hybrid homomorphic encryption to the federated learning scenario, using the simple computation and absence of ciphertext expansion of symmetric encryption to overcome the complex computation and severe ciphertext expansion of public key homomorphic encryption schemes. In existing homomorphic-encryption-based federated learning methods, the client generally encrypts the gradient directly with a public key homomorphic encryption scheme and transmits the homomorphic ciphertext to the aggregation server, which computes the aggregation result. However, the ciphertext produced by a public key encryption scheme is typically much longer than the plaintext, with an expansion factor depending on the scheme's security parameter, which must be large enough to guarantee security. Studies have shown that the amount of data transmitted in this type of federated learning method increases by more than 150 times compared with the unencrypted case. Furthermore, public key homomorphic encryption requires complex cryptographic operations (e.g., modular exponentiation), which place significant computational stress on computationally limited clients. In the invention, the client encrypts the gradient with a symmetric encryption algorithm and transmits the symmetric ciphertext to the aggregation server, which computes the aggregation result. Note that symmetric encryption is computationally simple and its ciphertext and plaintext are of equal length, i.e., the ciphertext expansion factor is 1. The method thus transfers the computational burden from computationally limited clients to the aggregation server, reducing client computation overhead while relieving the communication pressure caused by the ciphertext expansion of public key encryption.
(2) The hybrid homomorphic encryption scheme used by the invention is the "Pasta+BFV" scheme, which supports the SIMD idea. The BFV homomorphic encryption scheme supports polynomial packing: a plaintext vector is encoded into a polynomial, and encryption of the vector is converted into encryption of the polynomial, so that homomorphic operations on ciphertext are equivalent to element-wise operations on vectors. To extend this packing advantage of BFV to the hybrid scheme, the invention selects the Pasta symmetric encryption scheme, which supports the SIMD concept: a Pasta instance with SIMD operation threshold t can, after encrypting t plaintexts, be converted into a BFV homomorphic ciphertext in one step. Compared with encrypting one plaintext at a time and converting ciphertexts one by one, the number of client encryptions and decryptions and the number of server ciphertext conversions are reduced to 1/t of their original counts, improving computational efficiency.
(3) The invention uses the Lagrange interpolation method to verify the aggregation result. This not only ensures data integrity, but is also simpler and more efficient to compute than common methods based on homomorphic hashing or bilinear pairings.
Drawings
FIG. 1 is a schematic diagram of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a schematic diagram of a gradient grouping method in the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and drawings. Except for the specifics below, the procedures, conditions, and experimental methods for carrying out the invention are common knowledge in the art, and the invention is not limited thereto. Note that like reference numerals and letters refer to like items in the drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings.
Term interpretation:
(1) Symmetric cipher (Symmetric Key Encryption, SKE): Depending on whether the keys used for encryption and decryption are identical, cryptosystems are divided into symmetric cryptosystems and asymmetric cryptosystems, the latter also called public key cryptosystems. A symmetric cipher encrypts and decrypts with the same key: the encryption algorithm with the key turns plaintext into ciphertext, and the same key with the decryption algorithm recovers the plaintext from the ciphertext.
A symmetric encryption scheme SKE consists of three probabilistic polynomial-time algorithms:
SKE = (SKE.KeyGen, SKE.Enc, SKE.Dec).
SKE.KeyGen is the key generation algorithm, SKE.Enc the encryption algorithm, and SKE.Dec the decryption algorithm.
(2) Homomorphic encryption (Homomorphic Encryption, HE): Homomorphic encryption allows certain computations to be performed directly on ciphertext without decrypting it first. Common homomorphic encryption schemes belong to the public key cryptosystem. Given plaintexts m1, m2 with ciphertexts c1 = Enc(m1) and c2 = Enc(m2), the homomorphism appears as:
- Addition homomorphism: Dec(c1 ⊕ c2) = m1 + m2
- Multiplication homomorphism: Dec(c1 ⊗ c2) = m1 · m2
according to the operation type and number supported by homomorphic encryption, the method can be divided into partial homomorphic encryption, xu Tongtai encryption and homomorphic encryption. Partial homomorphic encryption only supports the homomorphism of addition or multiplication operations. Some Xu Tongtai encryption supports limited number of additions and multiplications; homomorphic encryption supports homomorphism of arbitrary computation on ciphertext, and does not limit the number of computations. The BFV encryption scheme adopted by the invention is an isomorphic encryption scheme.
A public key homomorphic encryption scheme HE consists of four probabilistic polynomial time algorithms:
HE = (HE.KeyGen, HE.Enc, HE.Dec, HE.Eval).
HE.KeyGen is the key generation algorithm; it outputs the key set of a public key homomorphic scheme, comprising a public key, a private key, and a calculation key. HE.Enc and HE.Dec are the encryption and decryption algorithms, respectively. HE.Eval is the homomorphic evaluation algorithm: given the calculation key, a target function, and ciphertexts, it outputs a ciphertext that, after decryption, equals the value obtained by applying the target function directly to the original plaintexts.
(3) Hybrid homomorphic encryption (Hybrid Homomorphic Encryption, HHE): Common homomorphic encryption techniques belong to the public key cryptosystem, have long keys, and require complex mathematical computation, causing expensive computational overhead and ciphertext expansion; researchers therefore proposed the concept of hybrid homomorphic encryption. The main idea is as follows: data are encrypted with a symmetric cipher scheme whose ciphertext expansion factor is 1, in place of homomorphic encryption with its larger expansion factor; the key of the symmetric scheme is encrypted with the homomorphic cipher; and both the encrypted key and the symmetric ciphertext are sent to the cloud service provider. The cloud service provider first homomorphically executes the symmetric decryption circuit to convert the symmetric ciphertext into homomorphic ciphertext, and then carries out the required computation. The definition of the hybrid homomorphic cipher is derived from a public key homomorphic encryption scheme and a symmetric cipher scheme:
based on a public key homomorphic encryption scheme HE and a symmetric encryption scheme SKE, a hybrid homomorphic encryption scheme HHE can be constructed, which consists of five probabilistic polynomial time algorithms:
HHE = (HHE.KeyGen, HHE.Enc, HHE.Decomp, HHE.Eval, HHE.Dec).
The key generation algorithm HHE.KeyGen calls HE.KeyGen and SKE.KeyGen; the encryption algorithm HHE.Enc calls SKE.Enc to encrypt the plaintext and HE.Enc to encrypt the symmetric key; the ciphertext conversion algorithm HHE.Decomp calls HE.Eval to homomorphically execute the decryption circuit of the symmetric cipher, converting the symmetric ciphertext into a homomorphic ciphertext; the homomorphic evaluation algorithm HHE.Eval and the decryption algorithm HHE.Dec directly call HE.Eval and HE.Dec, respectively.
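To make the five algorithms concrete, the toy sketch below uses assumed stand-ins rather than the patent's primitives: Paillier (additive) replaces BFV, and the symmetric cipher is a one-time additive pad over Z_n with ciphertext expansion factor 1. The pad's decryption circuit is a single subtraction, so HHE.Decomp collapses to one homomorphic subtraction; with a real HE-friendly cipher such as Pasta, the full decryption circuit would be evaluated under HE, which is exactly why Pasta's design matters.

```python
import secrets
from math import gcd, lcm

# Toy walkthrough of the five HHE algorithms (illustrative only).
P, Q = 999_983, 1_000_003
N = P * Q
N2 = N * N
LAM = lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def he_enc(m):                        # HE.Enc (Paillier, g = 1 + n)
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2

def he_dec(c):                        # HE.Dec
    return (pow(c, LAM, N2) - 1) // N * MU % N

def hhe_keygen():                     # HHE.KeyGen: symmetric key + its HE encryption
    k = secrets.randbelow(N)
    return k, he_enc(k)

def ske_enc(k, m):                    # symmetric part of HHE.Enc: additive pad
    # a real stream cipher must use a fresh keystream block per message;
    # the pad is reused below only for brevity
    return (m + k) % N

def hhe_decomp(c_sym, ck):            # HHE.Decomp: homomorphic symmetric decryption
    # HE ciphertext of (c_sym - k) = m: plaintext constant c_sym minus Enc(k)
    return (pow(1 + N, c_sym, N2) * pow(ck, -1, N2)) % N2

def hhe_eval_add(c1, c2):             # HHE.Eval: addition of underlying plaintexts
    return (c1 * c2) % N2
```

The server, holding only `ck = he_enc(k)` and the compact symmetric ciphertexts, obtains homomorphic ciphertexts it can keep computing on, without ever learning k or the plaintexts.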
The finite field, also called the Galois field, is a field containing only a finite number of elements; F_q denotes a finite field with q elements. Homomorphic encryption schemes applied to federated learning scenarios generally take the finite field F_q as the plaintext space, and many popular homomorphic encryption algorithms (e.g., BFV) support SIMD operations: a plaintext vector is encoded into a polynomial and the encryption of the vector becomes the encryption of that polynomial, so a homomorphic operation on the ciphertext is equivalent to an element-wise operation on the vector. To retain this feature in a hybrid homomorphic encryption scheme, many scholars have studied symmetric encryption schemes meeting the corresponding requirements and have achieved some results, such as Pasta. Therefore, in order to obtain a further efficiency gain from SIMD, the invention chooses the Pasta symmetric cipher and the BFV homomorphic encryption scheme to construct the hybrid homomorphic encryption scheme. Pasta is a family of symmetric stream ciphers over F_q^t, where F_q^t is the t-dimensional vector space over the finite field F_q; q is generally chosen to satisfy 2^16 < q < 2^60, and t is the threshold for SIMD operations, i.e., batch conversion. From the given q and the security-level requirement, a suitable t can be calculated. In the "Pasta+BFV" encryption scheme thus constructed, t plaintexts over the finite field F_q can be converted into one BFV homomorphic ciphertext at a time.
It should be noted that the above description of the algorithm composition of the hybrid homomorphic encryption scheme only illustrates its construction and execution process. Under the definition above, every plaintext encryption in the hybrid scheme is accompanied by homomorphically encrypting the symmetric key; in practical applications these operations can be simplified and split. Therefore, the following description of the embodiments of the present invention does not directly use the formal definition of the hybrid homomorphic encryption scheme but, based on the idea of hybrid homomorphic encryption, refers to the base homomorphic encryption and the symmetric encryption separately.
(4) Lagrange interpolation: the Lagrange interpolation method is a polynomial interpolation method. Given n+1 points with distinct x-coordinates, Lagrange interpolation yields a polynomial function of degree n that passes exactly through these n+1 points. The idea of the calculation is to first compute a basis function at each given node and then take the linear combination of the basis functions, with the node function values as combination coefficients, obtaining the interpolation polynomial.
The specific calculation process is described as follows:
Given n+1 distinct interpolation points x_i (i = 0, 1, …, n) and the corresponding values f(x_i), first compute the interpolation basis functions:

l_i(x) = ∏_{j=0, j≠i}^{n} (x − x_j) / (x_i − x_j)

Obviously, each l_i(x) is also a polynomial of degree n and satisfies l_i(x_k) = 1 when k = i and l_i(x_k) = 0 when k ≠ i. Then, the basis functions are linearly combined:

L_n(x) = ∑_{i=0}^{n} f(x_i) · l_i(x)

An n-th degree polynomial L_n(x) is thus obtained which obviously satisfies L_n(x_i) = f(x_i), i.e., it is the Lagrange interpolation polynomial passing through the n+1 given interpolation points.
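The computation above can be sketched directly in code; exact rational arithmetic keeps the polynomial identities free of rounding error (a generic illustration, not part of the patented scheme):

```python
from fractions import Fraction

def lagrange_poly(points):
    """Return the function L_n(x) interpolating the given (x_i, f(x_i))
    points: the linear combination of the basis polynomials l_i(x)."""
    def L(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            li = Fraction(1)                   # basis polynomial l_i evaluated at x
            for j, (xj, _) in enumerate(points):
                if j != i:
                    li *= Fraction(x - xj, xi - xj)
            total += yi * li
        return total
    return L
```

For the three points of f(x) = x² − 3x + 2 at x = 0, 1, 2, the degree-2 interpolant reproduces f everywhere, e.g. L(5) = 12.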
Examples
Referring to fig. 1, the present invention employs a centralized federated learning architecture based on a client-server structure, comprising three types of entities: a key generation authority PKG, n clients, and an aggregation server. The key generation authority is responsible for parameter initialization and key distribution and does not participate in the subsequent process after the initialization task is completed. Each client P_i (i ∈ N, N = {1, 2, …, n}) holds a private data set D_i = {<x_j, y_j> | j = 1, 2, …, T}, where x_j is an input, y_j is a label, and T = |D_i| denotes the size of the data set. Using this data set, the client locally trains a local model f(x, M), where x is the input and M are the model parameters; the aim is to obtain model parameters that minimize the loss function, i.e., to reach model convergence. Thus, in each round of model updating, P_i selects a subset of D_i, trains the model locally, and then computes the gradient W_i of the loss function. To accelerate model convergence and remedy the problem of insufficient local data, each client does not update directly with the local gradient W_i but with the aggregate of its local gradient and the other clients' gradients, i.e., the global gradient W_global. Then, each client encrypts the local gradient, attaches a verification code to obtain the ciphertext c_i, uploads it, and requests the aggregation server to aggregate the local gradients of all clients. After receiving the ciphertexts from all clients, the aggregation server aggregates them into C and sends C back to each client. After receiving the aggregate ciphertext, each client unpacks it to obtain the aggregate verification code and the global gradient W_global. According to the verification result, the client decides whether to update the model with W_global, and then enters the next iteration until the model converges or the agreed maximum number of federated rounds is reached.
Referring to fig. 2, the invention provides a verifiable federated learning method based on hybrid homomorphic encryption, which comprises the following steps:
step A: initialization of
The n clients negotiate according to the service requirements and reach an agreement on the training model; the PKG generates the initialized global model, keys, and public parameters, and distributes them to the clients and the aggregation server as required;
Step B: model training and data encryption
In each round of model updating, each client trains a local model by using local data, and calculates model updating parameters of the round of iteration; then, preprocessing the updated parameters, constructing verification codes and forming a plaintext; finally, performing Pasta symmetric encryption on the plaintext, and transmitting the Pasta symmetric ciphertext to the aggregation server;
step C: polymerization
After receiving the Pasta symmetric ciphertext from all clients, the aggregation server firstly carries out ciphertext conversion to obtain BFV homomorphic ciphertext; then, all BFV homomorphic ciphertexts are aggregated, and an aggregation result is sent to all clients;
Step D: aggregation result verification and model update
After receiving the aggregation result from the aggregation server, each client performs homomorphic decryption; the aggregation result is then checked by means of the Lagrange interpolation method: if the verification passes, the model is updated with the aggregation result, otherwise the aggregation result is discarded; the next iteration then begins, i.e., steps B to D are repeated until the model converges or the maximum number of federated rounds is reached.
The step A specifically comprises the following steps:
Step A1: model initialization. All clients reach an agreement on the training model according to the training target, and the PKG generates, according to this agreement, the initialized global model f(x, M), the learning rate η, the gradient quantization precision l_w, the mapping finite field F_q, and the maximum number of federated rounds r_max, and distributes them to all clients, the global model serving as the initial model, i.e., the local model used in the first round of model updating.
Step A2: key initialization. The PKG calculates the SIMD operation threshold t from the finite field F_q and the security-level requirement, and generates the Pasta encryption scheme over F_q^t, the BFV encryption scheme, and the security parameters. Then, for each client P_i, it generates a Pasta key k_i that is kept secret from the peer clients P_j (j ∈ N, j ≠ i) and from the aggregation server. Next, it generates a set of public BFV keys (pk, sk, evk), where the public key pk and the computation key evk are disclosed to all participants (the n clients and the aggregation server), while the private key sk is shared with all clients and kept secret from the aggregation server. Next, the Pasta key of each client is encrypted in turn with the BFV public key pk, i.e., for client P_i (i ∈ N), c_{k_i} = BFV.Enc(pk, k_i) is computed; the user list {(P_i, c_{k_i}) | i ∈ N} is constructed and sent to the aggregation server.
Step A3: public parameter initialization. The PKG generates a parameter sequence {a_1, a_2, …, a_t} for Lagrange interpolation and sends it to all clients, where t is the SIMD operation threshold of the Pasta encryption scheme.
The step B specifically comprises the following steps:
Step B1: model training. In each round of model updating, the client P_i randomly selects a subset D_i' of its local private data set, trains the local model, and computes the loss function ℓ of the current training round and the update gradient W_i = ∇ℓ, where ∇ is the gradient operator. Denote W_i = (w_1, w_2, …, w_{n_g}), where n_g = |W_i| is the length of W_i.
Step B2: data preprocessing. To encrypt the gradient, it is preprocessed into a form suitable for the "Pasta+BFV" hybrid homomorphic encryption. Specifically, the floating-point gradient vector W_i is first quantized into an integer vector, and the quantized gradient is then mapped onto the finite field F_q, yielding W̃_i. The specific calculation is as follows: for each component w of the gradient vector, compute w̃ = ψ(round(w · 2^{l_w})), where round(·) and ψ(·) are the quantization function and the finite-field mapping function, respectively; round(x) = ⌊x⌋ denotes the largest integer less than or equal to x, and ψ embeds signed integers into F_q (for example, a negative value x is represented as q + x).
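The exact round(·) and ψ(·) formulas are not fully legible in this text, so the sketch below assumes a standard choice: scale by 2^{l_w}, take the floor, and embed signed integers into F_q with negative values represented as q − |x|. The modulus q and precision l_w are illustrative, not the patent's parameters.

```python
import math

# Sketch of the quantize-then-map preprocessing of step B2 (assumed formulas).
Q = 2_147_483_647        # illustrative prime q, within 2^16 < q < 2^60
L_W = 16                 # illustrative quantization precision l_w

def quantize(w):
    return math.floor(w * 2 ** L_W)     # the round(.) of step B2

def psi(z):
    return z % Q                        # signed integer -> F_q

def psi_inv(z):
    return z - Q if z > Q // 2 else z   # F_q -> signed integer

def restore(z):
    return psi_inv(z) / 2 ** L_W        # inverse preprocessing (used in step D2)
```

Because ψ is additive modulo q, gradients can be summed in the field and recovered afterwards (as long as the aggregate magnitude stays below q/2), which is what the inverse operation in step D2 exploits.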
Finally, referring to fig. 3, all data to be encrypted are grouped according to the threshold parameter t of the SIMD operations supported by Pasta: every t − 1 consecutive components form a group, and if the last group has fewer than t − 1 components it is padded with zeros, so the total number of groups is N_group = ⌈n_g / (t − 1)⌉, where ⌈x⌉ denotes the smallest integer greater than or equal to x. The j-th group is denoted W̃_i^{(j)} = (w̃_1^{(j)}, …, w̃_{t−1}^{(j)}), j = 1, …, N_group.
Step B3: verification code construction. For each preprocessed gradient group W̃_i^{(j)}, a verification code is constructed by the Lagrange interpolation method as follows: from the t − 1 interpolation points (a_k, w̃_k^{(j)}), k = 1, …, t − 1, construct the Lagrange polynomial L_{t−2}(x) of degree t − 2, and then substitute a_t to obtain the verification code v_i^{(j)} = L_{t−2}(a_t).
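A sketch of this step, with a small illustrative prime q and interpolation abscissas a_1, …, a_t (t = 5) standing in for the PKG-issued parameters, not the patent's actual values:

```python
# Verification-code construction over F_q: interpolate the degree t-2
# polynomial through (a_k, w_k) for k = 1..t-1, then evaluate it at a_t.
Q = 65537
A = [1, 2, 3, 4, 5]                      # a_1..a_t, so t = 5

def verification_code(group, q=Q, a=A):
    x = a[-1]                            # evaluation point a_t
    code = 0
    for i, y in enumerate(group):        # group holds the t-1 field elements
        li = 1                           # Lagrange basis l_i(a_t) mod q
        for j in range(len(group)):
            if j != i:
                li = li * (x - a[j]) * pow(a[i] - a[j], -1, q) % q
        code = (code + y * li) % q
    return code
```

Because interpolation is linear in the node values, the code of a (modular) sum of groups equals the sum of the individual codes; this is exactly the property that lets clients check the server's aggregation in step D1.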
Step B4: encryption. For each group W̃_i^{(j)} and its corresponding verification code v_i^{(j)}, a plaintext vector m_i^{(j)} = W̃_i^{(j)} || v_i^{(j)} is constructed and encrypted with the Pasta cipher to obtain the symmetric ciphertext (c_i^{(j)})_{SKE} = Pasta.Enc(k_i, m_i^{(j)}); all Pasta symmetric ciphertexts are sent to the aggregation server.
The step C specifically comprises the following steps:
Step C1: ciphertext conversion. After receiving the Pasta symmetric ciphertexts (c_i^{(j)})_{SKE} from the clients, the aggregation server, for each client P_i, retrieves from the user list received from the PKG in step A2 that client's Pasta key c_{k_i} encrypted under BFV; then, relying on the full homomorphism of the BFV cipher, it homomorphically decrypts the symmetric ciphertext, converting it into the BFV homomorphic ciphertext (c_i^{(j)})_{HE}.
Step C2: and (3) polymerization. After all the received Pasta symmetric ciphertext is converted into BFV ciphertext, the aggregation server aggregates the BFV ciphertextWhere h is the aggregation function. Then the polymerization result (C (j) ) HE ,j∈N group And sending the data to each client.
The step D specifically comprises the following steps:
Step D1: verification. After receiving the aggregation result from the aggregation server, the client decrypts it, m^{(j)} = BFV.Dec(sk, (C^{(j)})_{HE}), j = 1, …, N_group, and from m^{(j)} = G^{(j)} || v^{(j)} derives the aggregated gradient group G^{(j)} and the aggregate verification code v^{(j)}. A check code is then constructed for the aggregated gradient group using Lagrange interpolation: for each j, from the t − 1 interpolation points (a_k, G_k^{(j)}), k = 1, …, t − 1, construct the Lagrange polynomial (L_{t−2}(x))^{(j)} and substitute a_t to obtain the check code (L_{t−2}(a_t))^{(j)}. Then compare whether the check code (L_{t−2}(a_t))^{(j)} equals the aggregate verification code v^{(j)}. If they are equal, the verification succeeds; otherwise, the verification fails.
Step D2: updating the model. If the verification is successful, an aggregation gradient can be obtained according to all aggregation groups Depending on the addition homomorphism of BFV, it can be demonstrated that +.> I.e. the correctness of the polymerization process. The client then resumes the aggregated gradient, i.e. for each component +.>Performing the inverse operation in the data preprocessing step B2 +.>Obtaining a restored gradient->Each client then updates the model with the restored aggregate gradientIf the verification fails, the aggregate value is discarded.
Step D3: iterating to enter the next training round, namely repeatedly executing the steps B1 to D3 until the model converges or the maximum federal round number r is reached max 。
In the embodiment, the client P_i (i ∈ N, N = {1, 2, …, n}) is taken as an example to specifically describe the client-side operations in the flow of the present invention; it should be noted that P_i does not refer to one particular client but equally represents the operations performed by all clients.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (5)
1. A verifiable federated learning method based on hybrid homomorphic encryption, comprising the following entities: a key generation authority PKG, n clients, and an aggregation server; applied to a federated learning scenario under a client-server architecture, the method comprising the following specific steps:
step A: initialization of
The clients negotiate according to the service requirements and reach an agreement on the training model; the PKG generates the initialized global model, keys, and public parameters, and distributes them to the clients and the aggregation server as required;
Step B: model training and data encryption
In each round of model updating, each client trains a local model by using local data, and calculates model updating parameters of the round of iteration; then, preprocessing the updated parameters, constructing verification codes and forming a plaintext; finally, encrypting the plaintext by using a symmetrical encryption scheme Pasta to obtain a Pasta symmetrical ciphertext, and transmitting the ciphertext to the aggregation server;
step C: polymerization
After receiving the Pasta symmetric ciphertext from all clients, the aggregation server firstly uses the full homomorphic encryption scheme BFV to carry out ciphertext conversion to obtain BFV homomorphic ciphertext; then, all BFV homomorphic ciphertexts are aggregated, and an aggregation result is sent to all clients;
Step D: aggregation result verification and model update
After receiving the aggregation result from the aggregation server, each client performs homomorphic decryption; the aggregation result is then checked by means of the Lagrange interpolation method: if the verification passes, the model is updated with the aggregation result, otherwise the aggregation result is discarded; the next iteration is then entered until the model converges or the maximum number of federated rounds is reached.
2. The verifiable federal learning method based on hybrid homomorphic encryption according to claim 1, wherein said step a specifically comprises:
a1: model initialization
The clients first reach an agreement on the training model according to the training target; the PKG generates the initialized global model, the learning rate, the gradient quantization precision, the mapping finite field, and the maximum number of federated rounds according to the agreement, and distributes them to all clients, the global model serving as the initial local model;
a2: key initialization
The PKG generates the Pasta encryption scheme, the BFV encryption scheme, and the security parameters; it then generates a Pasta key for every client and distributes it to that client; it then generates a set of public BFV keys comprising a public key, a private key, and a computation key, where the public key and the computation key are disclosed to all participants, namely the clients and the aggregation server, while the private key is shared with all clients and kept secret from the aggregation server; the Pasta keys of all clients are then encrypted in turn with the BFV public key to form a user list, which is sent to the aggregation server;
a3: public parameter initialization
The PKG generates a sequence of parameters for lagrangian interpolation and sends it to all clients.
3. The verifiable federal learning method based on hybrid homomorphic encryption according to claim 1, wherein said step B specifically comprises:
b1: model training
In each round of model updating, each client trains the local model with its local private data set, and computes the loss function of the current training round and the gradient used for updating;
b2: data preprocessing
To encrypt the gradient, it is preprocessed into a form suitable for the "Pasta+BFV" hybrid homomorphic encryption: the floating-point gradient is first quantized into integers; the quantized gradient is then mapped onto the finite field to suit the encryption algorithm; finally, all data to be encrypted are grouped according to the threshold parameter of the batch conversion operation supported by Pasta;
b3: construction verification code
For each preprocessed gradient group, constructing a verification code by adopting a Lagrange interpolation method;
b4: encryption
A plaintext vector to be encrypted at one time is constructed from a preprocessed gradient group and its verification code; the plaintext vector is encrypted with the Pasta cipher and the Pasta symmetric ciphertext is sent to the aggregation server.
4. The verifiable federal learning method based on hybrid homomorphic encryption according to claim 1, wherein said step C specifically comprises:
c1: ciphertext conversion
After receiving the Pasta symmetric ciphertexts from the clients, the aggregation server first retrieves each client's BFV-encrypted Pasta key from the user list received from the PKG during initialization; then, relying on the full homomorphism of the BFV cipher, it homomorphically decrypts the symmetric ciphertexts, converting them into BFV homomorphic ciphertexts;
c2: polymerization
After all the received Pasta symmetric ciphertexts are converted into BFV homomorphic ciphertexts, the aggregation server aggregates the BFV homomorphic ciphertexts, and then the aggregation result is sent to each client.
5. The verifiable federal learning method based on hybrid homomorphic encryption according to claim 1, wherein said step D comprises:
d1: verification
After receiving the aggregation result from the aggregation server, the client decrypts it and splits it into an aggregated gradient group and an aggregate verification code; a check code is then constructed for the aggregated gradient group by the Lagrange interpolation method and compared with the aggregate verification code; if they are equal, the verification succeeds; otherwise, the verification fails;
d2: updating a model
If the verification succeeds, the client restores the aggregated gradient by performing the inverse of the data-preprocessing operations of the model training and data encryption step, and updates the model using the restored aggregate gradient; if the verification fails, the aggregate value is discarded;
d3: and (3) iteratively entering the next training round, namely repeatedly executing the steps B to D until the model converges or the maximum federal round number is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310620541.3A CN116667996A (en) | 2023-05-30 | 2023-05-30 | Verifiable federal learning method based on mixed homomorphic encryption |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116667996A true CN116667996A (en) | 2023-08-29 |
Family
ID=87709182
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116667996A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117196017A (en) * | 2023-09-28 | 2023-12-08 | 数力聚(北京)科技有限公司 | Federal learning method, system, equipment and medium for lightweight privacy protection and integrity verification |
CN117560229A (en) * | 2024-01-11 | 2024-02-13 | 吉林大学 | Federal non-intrusive load monitoring user verification method |
CN117811722A (en) * | 2024-03-01 | 2024-04-02 | 山东云海国创云计算装备产业创新中心有限公司 | Global parameter model construction method, secret key generation method, device and server |
TWI846601B (en) * | 2023-09-19 | 2024-06-21 | 英業達股份有限公司 | Operating system and method for a fully homomorphic encryption neural network model |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |