CN115378707A - Adaptive sampling federated learning privacy protection method based on threshold homomorphism
- Publication number: CN115378707A
- Application number: CN202211010855.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0442—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/0819—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s)
- H04L9/083—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s) involving central third party, e.g. key distribution center [KDC] or trusted third party [TTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/085—Secret sharing or secret splitting, e.g. threshold schemes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
Abstract
The invention discloses a threshold-homomorphism-based adaptive sampling federated learning privacy protection method, implemented in the following steps: a federated learning system is constructed; the federated server S initializes parameters; the key distribution center KDC generates client keys; each client A_n obtains an encrypted sampled gradient vector and a position bit string; the federated server S aggregates the encrypted sampled gradient vectors and sends the aggregation result; each client A_w decrypts its share of the encrypted aggregated gradient vector; and the federated server S recovers the gradient plaintext and obtains the privacy-preserving federated learning result. Through adaptive sampling, threshold homomorphic decryption task distribution, and gradient recovery, the method reduces client communication load during federated learning, improves the communication efficiency of federated learning privacy protection, and widens its applicable range; it therefore has the advantages of high communication efficiency and broad applicability.
Description
Technical Field
The invention belongs to the field of computer technology and relates to a federated learning privacy protection method, in particular to a threshold-homomorphism-based adaptive sampling federated learning privacy protection method that can be used to reduce client communication load during federated learning, improve the communication efficiency of federated learning privacy protection, and widen the applicable range of federated learning privacy protection.
Background
Federated learning is a distributed machine learning framework that allows multiple clients to collaboratively train a machine learning model without sharing their private data. Broadly, federated learning can be divided into three steps. First, the server initializes the global model and distributes it to the clients. Second, each client trains the global model issued by the server on its local dataset to obtain a gradient, and uploads that gradient to the server. Third, the server aggregates all uploaded gradients into an aggregated gradient and uses it to update the global model. Continuously iterating these three steps yields the trained model.
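The three-step loop just described can be sketched in miniature (an illustrative outline with plain-Python stand-ins for the model, local training, and aggregation; none of the names below come from the patent):

```python
# Minimal federated-learning-style loop: the server initializes a global
# model, clients compute local gradients, and the server aggregates them.
# Models and gradients are plain lists of floats for illustration.

def local_gradient(model, dataset):
    # Stand-in for "train on the local dataset and return a gradient".
    return [x - w for w, x in zip(model, dataset)]

def aggregate(gradients):
    # Element-wise mean of the clients' gradients.
    n = len(gradients)
    return [sum(col) / n for col in zip(*gradients)]

def federated_round(model, client_datasets, lr=0.1):
    grads = [local_gradient(model, d) for d in client_datasets]  # step 2
    agg = aggregate(grads)                                       # step 3
    return [w + lr * g for w, g in zip(model, agg)]              # update

model = [0.0, 0.0]                   # step 1: server initializes
clients = [[1.0, 2.0], [3.0, 4.0]]   # two clients' local data
for _ in range(100):                 # iterate the three steps
    model = federated_round(model, clients)
print([round(w, 2) for w in model])  # → [2.0, 3.0]
```

The toy objective pulls each weight toward the mean of the clients' data, so the global model converges to that mean after enough rounds.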
However, research shows that a privacy-leakage risk hides in the process of the federated learning client uploading its gradient to the server: an attacker can reconstruct data in the client's local dataset from the plaintext gradient. To address this problem, many scholars have proposed applying a homomorphic encryption algorithm to encrypt the gradient plaintext and thereby protect the privacy of the federated learning client's local dataset. Homomorphic-encryption-based federated learning privacy protection is largely consistent with the traditional federated learning training steps; the only additions are that, after completing aggregation, the server sends the aggregated gradient ciphertext to the clients for decryption, and the clients upload the decrypted plaintext gradient back to the server for the parameter update.
The homomorphic-encryption-based federated learning privacy protection method has the following problems. In practical applications, clients are usually deployed in complex network environments with poor communication conditions and severely uneven communication resources, so client communication bandwidth is low and the bandwidth difference between clients is extremely large. In addition, compared with the plaintext gradient, the encrypted gradient at least doubles the communication bandwidth required for uploading, and the decrypted aggregated gradient plaintext must still be uploaded to the server again, which further increases client communication load and seriously reduces the communication efficiency of federated learning.
Inner Mongolia University discloses a homomorphic-encryption-based federated learning privacy protection method in the patent document "Federated learning privacy protection method based on homomorphic encryption" (application number 202110608465.5, publication number CN113434873A). In that method, after computing its gradient locally in each round of federated learning, the client adds noise using differentially private stochastic gradient descent, encrypts the perturbed gradient with a homomorphic encryption mechanism, and sends it to the server. The server generates new ciphertext parameters from all received ciphertext gradients and sends them to the clients, and finally each client decrypts the received ciphertext parameters and updates its model. The method's disadvantage is that it requires the client to transmit all encrypted gradients to the server, imposing an excessive communication load on the client and yielding low communication efficiency: for example, using ResNet-9 as the training network in practice, the total traffic for a single client's upload of encrypted gradients in one round exceeds 50 MB. In addition, with this method, clients that cannot finish transmitting their encrypted gradients within the window time cannot participate in federated learning privacy protection at all, compressing the applicable range of federated learning.
Disclosure of Invention
The invention aims to overcome the above shortcomings of the prior art by providing a threshold-homomorphism-based adaptive sampling federated learning privacy protection method that solves the technical problem of excessive client communication load in the prior art.
To achieve this aim, the technical scheme adopted by the invention comprises the following steps:
(1) Constructing a federated learning system:
A federated learning system is constructed comprising a federated server S, a key distribution center KDC, and N clients A = {A_1, A_2, ..., A_n, ..., A_N}. Each client A_n holds an image dataset D_n = {X_n, Y_n}, where N ≥ 2, A_n denotes the n-th client with identity information ID_n, X_n denotes a set of Z images, and Y_n denotes the set of true category labels of the objects contained in X_n.
(2) The federated server S initializes parameters:
(2a) The federated server S initializes each client A_n's convolutional neural network model F_n, the minimum compression ratio CR_min, and the window time, and sends CR_min, the window time, and F_n's weight parameters w_n = <w_n1, w_n2, ..., w_nm, ..., w_nM> to A_n, where M denotes the number of weights of model F_n, M ≥ 50000, and w_nm denotes the m-th weight of F_n;
(2b) The federated server S initializes the decryption threshold T of the (T, N)-threshold Paillier homomorphic encryption algorithm, the training round number r, the convolutional neural network model F_n^r of the r-th training round, and the maximum number of training rounds R; it sends T and the number of clients N to the KDC and sets r = 0, where 0 < T ≤ N, T < CR_min·N, and R ≥ 100;
(3) The key distribution center KDC generates client keys:
From T and N, the key distribution center KDC generates each client A_n's private key sk_n, sends sk_n to A_n, and generates the public key pk;
(4) Client A_n obtains an encrypted sampled gradient vector and a position bit string:
(4a) Client A_n takes its image dataset D_n as the input of the convolutional neural network model F_n and performs forward propagation to obtain the prediction label set P_n of all images; it then uses a cross-entropy loss function to compute F_n's loss value L_n from P_n and Y_n, and computes each gradient vector element g_nm by taking the partial derivative of L_n with respect to the weight parameter w_nm, obtaining the gradient vector G_n = <g_n1, g_n2, ..., g_nm, ..., g_nM>;
(4b) Client A_n measures the communication bandwidth B_r with which it can send the position bit string and the encrypted sampled gradient vector to the federated server S in round r, and from B_r, the window time, and the computer storage size y occupied by one gradient vector element computes the number K_n of gradient elements to sample in the current round of training. It then adaptively samples the gradient vector G_n to obtain the sampled gradient vector G_n' containing K_n sampled gradients, and encrypts each sampled gradient vector element g_nk' with the public key set pk to obtain the encrypted sampled gradient vector E(G_n'), where E(·) denotes the encryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, g_nk' denotes the k-th element of the sampled gradient vector, and E(g_nk') denotes the k-th element of the encrypted sampled gradient vector;
(4c) Client A_n initializes a binary bit string I_n' = <I_n1, I_n2, ..., I_nm, ..., I_nM> corresponding to the gradient vector, with every bit set to 0, and determines for each m whether g_nm was selected during adaptive sampling: if so, it sets the corresponding bit I_nm = 1, otherwise I_nm = 0, obtaining the binary position bit string I_n = <I_n1, I_n2, ..., I_nm, ..., I_nM> recording the positions of the sampled gradient vector elements within the gradient vector G_n;
(4d) Client A_n sends the position bit string I_n and the encrypted sampled gradient vector E(G_n') to the federated server S;
(5) The federated server S aggregates the encrypted sampled gradient vectors and sends the aggregation result:
(5a) The federated server S uses the position bit string I_n to expand client A_n's encrypted sampled gradient vector E(G_n') back to full length, obtaining E(G_n'') = <E(g_n1''), E(g_n2''), ..., E(g_nm''), ..., E(g_nM'')>, where E(g_nm'') denotes A_n's m-th expanded sampled gradient vector element;
(5b) The federated server S aggregates the m-th expanded sampled gradient vector elements of the N clients to obtain the encrypted aggregated gradient vector E(G_a) = <E(g_a1), E(g_a2), ..., E(g_am), ..., E(g_aM)>, where E(g_am) denotes the m-th encrypted aggregated gradient vector element;
(5c) The federated server S constructs a distribution dictionary D whose key-value pairs are (ID_n, K_n), where K_n is the number of gradient elements sampled by A_n, with the pairs arranged in descending order of K_n; at the same time it constructs a distribution list L = <E_1(G_a), E_2(G_a), ..., E_t(G_a), ..., E_T(G_a)> consisting of T copies of the encrypted aggregated gradient vector E(G_a), where E_t(G_a) denotes the t-th E(G_a) constituting the distribution list L;
(5d) The federated server S sends the elements of the distribution list L to clients A_w in the ID_n order recorded in the distribution dictionary D; the set of clients receiving distribution list elements is A' = {A_1, A_2, ..., A_w, ..., A_W}, where A_w denotes the w-th client receiving an element of the distribution list L and W ≤ N;
(6) Client A_w decrypts the encrypted aggregated gradient vector:
Client A_w uses its private key sk_w to decrypt the received distribution list element, obtaining a partially decrypted aggregated gradient vector D_w(E(G_a)) = <D_w(E(g_a1)), ..., D_w(E(g_am)), ..., D_w(E(g_aM))>, and sends it to the federated server S, where sk_w denotes client A_w's private key, D(·) denotes the decryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, and D_w(E(g_am)) denotes the m-th element of the partially decrypted aggregated gradient vector;
(7) The federated server S recovers the gradient plaintext and obtains the federated learning privacy protection result:
(7a) The federated server S fills the partial decryption results sent by the clients A_w into a one-dimensional recovery list L' of length len = M·T, in the ID_n order given by D, and then divides L' evenly into T segments to obtain T recovery vectors C_1, C_2, ..., C_t, ..., C_T, where T is the decryption threshold of the (T, N)-threshold Paillier homomorphic encryption algorithm, C_t = <c_t1, ..., c_tm, ..., c_tM> denotes the t-th recovery vector, and c_tm denotes the m-th element of recovery vector C_t;
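The list-filling and segmentation in step (7a) can be sketched with toy values (integers stand in for partial-decryption shares; the chunk sizes and ordering are illustrative assumptions):

```python
# Sketch of step (7a): partial decryptions arrive from clients in the
# dictionary-D order; the server concatenates them into a recovery list
# L' of length M*T and slices it into T recovery vectors C_1..C_T.

M, T = 4, 3  # model size and decryption threshold (tiny demo values)

# Each client returns a chunk of partial decryptions; the chunks are
# concatenated in the order given by the distribution dictionary D.
chunks = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
L_prime = [x for chunk in chunks for x in chunk]
assert len(L_prime) == M * T  # the chunks must cover the full list

# Divide L' evenly into T recovery vectors of length M each.
C = [L_prime[t * M:(t + 1) * M] for t in range(T)]
print(C)  # → [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

Each column across the T recovery vectors then holds the T partial decryptions of one aggregated gradient element, which is what the combining function of step (7b) consumes.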
(7b) The federated server S uses the public key set pk and the elements c_tm at corresponding positions of the T recovery vectors C_t to recover the plaintext g_am of each aggregated gradient vector element, obtaining the aggregated gradient vector g_a = <g_a1, ..., g_am, ..., g_aM>; it then applies the gradient descent method, updating F_n through g_a to obtain the convolutional neural network model F_n^r of the current training round, where R(·) denotes the combining function of the (T, N)-threshold Paillier homomorphic encryption algorithm;
(7c) The federated server S determines whether r ≥ R; if so, the trained convolutional neural network model is obtained; otherwise it sets F_n = F_n^r and r = r + 1, and returns to step (4).
Compared with the prior art, the invention has the following advantages:
First, by computing the number of gradient elements to sample in each training round and adaptively sampling the gradient vector of each round, a client needs to upload only the sampled encrypted gradient data rather than all encrypted gradient data during federated learning privacy protection. This solves the prior art's problem of excessive client communication load, effectively reduces client communication load during federated learning privacy protection, and widens the applicable range of federated learning privacy protection.
Second, the invention's threshold homomorphic decryption task distribution and gradient recovery method lets the server distribute decryption tasks matched to each client's communication bandwidth according to the number of gradient elements the client sampled, overcoming the prior art's heavy client communication load when uploading the decrypted aggregated gradient to the server and further reducing client communication load during federated learning.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the threshold homomorphic decryption task distribution of the present invention;
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
Referring to fig. 1, the present invention comprises the steps of:
(1) Constructing a federated learning system:
A federated learning system is constructed comprising a federated server S, a key distribution center KDC, and N clients A = {A_1, A_2, ..., A_n, ..., A_N}. Each client A_n holds an image dataset D_n = {X_n, Y_n}, where N ≥ 2, A_n denotes the n-th client with identity information ID_n, X_n denotes a set of Z images, and Y_n denotes the set of true category labels of the objects contained in X_n.
In this example, the number of federated learning system clients is N = 30 and Z = 200; each A_n trains using the MNIST dataset as its image dataset. The MNIST dataset is a grayscale handwritten-digit image dataset widely used in the field of federated learning; it contains 60000 training images, each a 28 × 28 grayscale handwritten digit, and its labels are the real digits 0 to 9 corresponding to the handwriting. In this example, each client A_n holds a subset D_n = {X_n, Y_n} of the MNIST dataset, i.e. each client A_n holds 200 distinct grayscale handwritten-digit images X_n with true labels Y_n.
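A minimal sketch of carving such disjoint per-client subsets out of a 60000-image training set (index bookkeeping only; the MNIST images themselves are not loaded, and all helper names are assumptions):

```python
import random

N, Z, TOTAL = 30, 200, 60000  # clients, images per client, MNIST train size

rng = random.Random(0)
indices = list(range(TOTAL))
rng.shuffle(indices)

# Each client A_n receives a disjoint slice of Z image indices D_n.
partitions = [indices[n * Z:(n + 1) * Z] for n in range(N)]

assert len(partitions) == N
assert all(len(p) == Z for p in partitions)
# Disjointness: no image index is assigned to two clients.
assert len({i for p in partitions for i in p}) == N * Z
```

Only 6000 of the 60000 indices are used here (30 × 200), mirroring the example's small per-client holdings.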
(2) The federated server S initializes parameters:
(2a) The federated server S initializes each client A_n's convolutional neural network model F_n, the minimum compression ratio CR_min, and the window time, and sends CR_min, the window time, and F_n's weight parameters w_n = <w_n1, w_n2, ..., w_nm, ..., w_nM> to A_n, where M denotes the number of weights of model F_n, M ≥ 50000, and w_nm denotes the m-th weight of F_n;
In this example, the minimum compression ratio is CR_min = 0.2 and the window time is 1 s. The convolutional neural network model F_n initialized by the federated server comprises, connected in sequence: a first convolutional layer, an activation function layer, a second convolutional layer, an activation function layer, a third convolutional layer, an activation function layer, and a fully connected layer. The first convolutional layer has 3 input channels, 10 output channels, and 5 × 5 convolution kernels; the second convolutional layer has 10 input channels, 20 output channels, and 5 × 5 kernels; the third convolutional layer has 20 input channels, 10 output channels, and 5 × 5 kernels; the activation function layers use the Sigmoid function; the fully connected layer has input dimension 4000 and output dimension 10. The weight count M of the convolutional neural network model F_n is computed as M = 3·M_cov + 3·M_sig + M_fc, where the weight count of a convolutional layer is M_cov = size²·C_i·C_0 + C, with size the convolution kernel size, C_i the number of input channels, C_0 the number of output channels, and C the number of bias weights; the weight count of an activation function layer is M_sig = 2·C_i; and the weight count of the fully connected layer is M_fc = E_i·E_0 + E, where E_i is the input vector length, E_0 the output vector length, and E the number of bias weights. In this example the global convolutional neural network model F_n has M = 35900 weights; in practical applications, the weight count of a convolutional neural network model with a complex structure can reach millions or even tens of millions.
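The per-layer weight-count formulas quoted above can be evaluated directly; the sketch below applies them to the example layers (the bias counts C and E are taken to equal the number of output units, which is an assumption, not stated in the example):

```python
def conv_weights(size, c_in, c_out):
    # M_cov = size^2 * C_i * C_0 + C, taking the bias count C = C_0.
    return size * size * c_in * c_out + c_out

def sigmoid_weights(c_in):
    # M_sig = 2 * C_i, as stated in the example.
    return 2 * c_in

def fc_weights(e_in, e_out):
    # M_fc = E_i * E_0 + E, taking the bias count E = E_0.
    return e_in * e_out + e_out

layers = [conv_weights(5, 3, 10),    # first convolutional layer
          sigmoid_weights(10),
          conv_weights(5, 10, 20),   # second convolutional layer
          sigmoid_weights(20),
          conv_weights(5, 20, 10),   # third convolutional layer
          sigmoid_weights(10),
          fc_weights(4000, 10)]      # fully connected layer
print(layers)  # → [760, 20, 5020, 40, 5010, 20, 40010]
```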
(2b) The federated server S initializes the decryption threshold T of the (T, N)-threshold Paillier homomorphic encryption algorithm, the training round number r, the convolutional neural network model F_n^r of the r-th training round, and the maximum number of training rounds R; it sends T and the number of clients N to the key distribution center KDC and sets r = 0, where 0 < T ≤ N, T < CR_min·N, and R ≥ 100;
In this example, the federated server S initializes the decryption threshold T = 5 and the maximum number of training rounds R = 1000 of the (T, N)-threshold Paillier homomorphic encryption algorithm, where 0 < T ≤ N and T < CR_min·N. The requirement T < CR_min·N ensures that in step (5d) all elements of the distribution list L, composed of T encrypted aggregated gradient vectors E(G_a), can be distributed and decrypted; that is, each encrypted gradient element of E(G_a) must be decrypted T times by different clients to satisfy the decryption condition of the (T, N)-threshold Paillier homomorphic encryption algorithm. Since each client is guaranteed to decrypt at least CR_min·M elements, this requires T·M < CR_min·N·M, i.e. T < CR_min·N;
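With the example values, the threshold condition and the workload argument behind it can be checked numerically (a trivial sketch using M = 35900 from this example):

```python
T, N, CR_min, M = 5, 30, 0.2, 35900  # example values from the text

# Each element of E(G_a) must be decrypted T times, so the total
# decryption workload is T*M elements; the guaranteed aggregate client
# capacity is at least CR_min*N*M elements.
workload = T * M
capacity = CR_min * N * M

assert T < CR_min * N          # 5 < 6
assert workload < capacity     # 179500 < 215400
```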
(3) The key distribution center KDC generates client keys:
From T and N, the key distribution center KDC generates each client A_n's private key sk_n, sends sk_n to A_n, and generates the public key pk;
In this example, the key distribution center generates the private keys sk_n and the public key pk as follows:
First, the key distribution center KDC randomly selects two distinct strong primes p' and q' in the positive integer domain and computes p = 2p' + 1 and q = 2q' + 1, where p and q are required to be distinct primes, u = p·q, and v = p'·q'. It then randomly selects a positive integer s and sets the master key sk = s·v. Next, the KDC sets a_0 = sk, randomly selects T − 1 random integers a_1, a_2, ..., a_t, ..., a_{T−1}, and constructs the polynomial:
f(λ) = a_0 + a_1·λ + a_2·λ² + ... + a_t·λ^t + ... + a_{T−1}·λ^{T−1};
Then the key distribution center KDC randomly selects N random integers λ_1, λ_2, ..., λ_n, ..., λ_N, substitutes them into f(λ) in turn, and computes f(λ_1), f(λ_2), ..., f(λ_n), ..., f(λ_N). The KDC then computes sk_n = f(λ_n) mod uv to obtain the private key sk_n of the n-th client. Finally, it randomly selects two distinct positive integers a and b and computes the public key set pk = {g, u, θ}, where g = (1 + u)^a · b^u mod u², θ = α·s·v mod u, ord(b) = α, and ord(·) denotes the order of an integer;
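The share-generation step is standard polynomial secret sharing, which can be illustrated with toy numbers (plain Shamir sharing over a small prime field; the real scheme evaluates f(λ_n) mod uv with strong primes, which this sketch does not attempt):

```python
import random

P = 2**31 - 1          # toy prime modulus (stand-in for the real modulus)
T, N = 5, 30           # decryption threshold and number of clients

rng = random.Random(42)
secret = 123456789     # stand-in for the master key sk

# f(x) = a_0 + a_1*x + ... + a_{T-1}*x^{T-1}, with a_0 = secret.
coeffs = [secret] + [rng.randrange(P) for _ in range(T - 1)]

def f(x):
    return sum(a * pow(x, i, P) for i, a in enumerate(coeffs)) % P

# One share (λ_n, sk_n) per client.
points = [(x, f(x)) for x in range(1, N + 1)]

def recover(shares):
    # Lagrange interpolation at x = 0 recovers f(0) = secret.
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                num = (num * -xk) % P
                den = (den * (xj - xk)) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

assert recover(points[:T]) == secret        # any T shares suffice
assert recover(points[7:7 + T]) == secret   # a different T-subset works too
```

Any T of the N shares reconstruct f(0), while fewer than T reveal nothing about it; this is the property the (T, N)-threshold decryption relies on.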
(4) Client A_n obtains an encrypted sampled gradient vector and a position bit string:
(4a) Client A_n takes its image dataset D_n as the input of the convolutional neural network model F_n and performs forward propagation to obtain the prediction label set P_n of all images; it then uses a cross-entropy loss function to compute F_n's loss value L_n from P_n and Y_n, and computes each gradient vector element g_nm by taking the partial derivative of L_n with respect to the weight parameter w_nm, obtaining the gradient vector G_n = <g_n1, g_n2, ..., g_nm, ..., g_nM>;
(4b) Client A_n measures the communication bandwidth B_r with which it can send the position bit string and the encrypted sampled gradient vector to the federated server S in round r, and from B_r, the window time, and the computer storage size y occupied by one gradient vector element computes the number K_n of gradient elements to sample in the current round of training. It then adaptively samples the gradient vector G_n to obtain the sampled gradient vector G_n' containing K_n sampled gradients, and encrypts each sampled gradient vector element g_nk' with the public key set pk to obtain the encrypted sampled gradient vector E(G_n'), where E(·) denotes the encryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, g_nk' denotes the k-th element of the sampled gradient vector, and E(g_nk') denotes the k-th element of the encrypted sampled gradient vector;
the invention uses adaptive sampling, and can reduce the number of encryption gradients to be uploaded by the clientTherefore, the communication load of the client in the federal learning process is effectively reduced, and the communication efficiency of the federal learning privacy protection is improved. At the same time, the number of gradient elements is sampledBy the r round client A n Communication bandwidth B for transmitting position bit string and encrypted sampling gradient vector to federal server S r The client with poor communication condition can be adjusted through self-adaptionThe method ensures that the user can participate in the federal learning privacy protection, and widens the application range of the federal learning privacy protection.
Subsequently, client A_n records the positions of the K_n largest elements of abs(G_n), the element-wise absolute value of the gradient, obtaining a position vector whose k-th entry gives the position of the k-th largest absolute-gradient element in abs(G_n);
The K_n gradient elements with the largest absolute values are selected because the larger a gradient element's absolute value, the greater its contribution to the convergence of the convolutional neural network model; conversely, gradient elements with absolute values close to 0 contribute almost nothing to convergence;
Then client A_n checks, for each gradient element position m, whether it appears in the position vector; if not, it sets g_nm to 0, obtaining the marked sampled gradient vector G_ns;
Finally, client A_n deletes all elements of the marked sampled gradient vector G_ns that are 0, obtaining the sampled gradient vector G_n' containing K_n sampled gradients, and then encrypts each sampled gradient vector element g_nk' with the public key set pk, obtaining the encrypted sampled gradient vector E(G_n'), where r_n denotes a random number chosen by the n-th client, E(·) denotes the encryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, g_nk' denotes the k-th element of the sampled gradient vector, and E(g_nk') denotes the k-th element of the encrypted sampled gradient vector;
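The sampling procedure can be sketched as follows. The bandwidth-to-count formula is an assumed reading (as many y-byte elements as the measured bandwidth carries in the window time), and encryption is left out entirely, so this shows only the top-K selection:

```python
# Adaptive top-K gradient sampling (illustrative; the bandwidth-to-count
# formula is an assumption, and no encryption is performed here).

def sample_count(bandwidth_bps, window_s, elem_bytes, M):
    # Assumed: as many elements as fit through the link in the window.
    return min(M, int(bandwidth_bps * window_s / (8 * elem_bytes)))

def top_k_sample(grad, k):
    # Keep the k positions with the largest |g|; return (positions, values).
    order = sorted(range(len(grad)), key=lambda m: abs(grad[m]), reverse=True)
    keep = sorted(order[:k])
    return keep, [grad[m] for m in keep]

grad = [0.5, -0.01, 2.0, 0.0, -3.0, 0.02]          # toy gradient vector G_n
k = sample_count(bandwidth_bps=256, window_s=1.0, elem_bytes=8, M=len(grad))
positions, values = top_k_sample(grad, min(k, 3))  # cap at 3 for the demo
print(positions, values)  # → [0, 2, 4] [0.5, 2.0, -3.0]
```

The small-magnitude elements (positions 1, 3, 5) are dropped, matching the rationale that near-zero gradients contribute little to convergence.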
(4c) Since the server S must align each client A_n's gradients by position when computing the aggregated gradient, client A_n also produces a binary bit string recording the positions of the sampled gradient vector elements and sends it to the server S. Client A_n initializes a binary bit string I_n' = <I_n1, I_n2, ..., I_nm, ..., I_nM> corresponding to the gradient vector, with every bit set to 0, and determines for each m whether g_nm was selected during adaptive sampling; if so, it sets the corresponding bit I_nm = 1, otherwise I_nm = 0, obtaining the binary position bit string I_n = <I_n1, I_n2, ..., I_nm, ..., I_nM> recording the positions of the sampled gradient vector elements in the gradient vector G_n;
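A sketch of building the position bit string I_n from the sampled positions (represented as a plain 0/1 list; the packed on-the-wire encoding is not modeled):

```python
def position_bits(M, sampled_positions):
    # I_n has one bit per gradient element; bit m is 1 iff g_nm was sampled.
    bits = [0] * M
    for m in sampled_positions:
        bits[m] = 1
    return bits

M = 8
sampled = [1, 4, 6]               # positions kept by adaptive sampling
I_n = position_bits(M, sampled)
print(I_n)                        # → [0, 1, 0, 0, 1, 0, 1, 0]

# The server can recover the sampled positions from the bit string alone,
# which is what lets it re-expand E(G_n') to full length for aggregation.
assert [m for m, b in enumerate(I_n) if b] == sampled
```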
(4d) Client A_n sends the position bit string I_n and the encrypted sampling gradient vector E(G_n') to the federal server S;
(5) The federal server S aggregates the encrypted sampling gradient vectors and sends the aggregation result:
(5a) The federal server S expands client A_n's encrypted sampling gradient vector E(G_n') according to the position bit string I_n, obtaining the expanded vector whose m-th element is A_n's m-th expanded sampled gradient vector element:
(5b) The federal server S aggregates the m-th expanded sampled gradient vector elements across the N clients, obtaining the encrypted aggregate gradient vector E(G_a), whose m-th element is the m-th encrypted aggregate gradient vector element:
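Steps (5a)-(5b) rely on the additive homomorphism of Paillier: multiplying ciphertexts modulo n² adds the underlying plaintexts. The sketch below uses a toy, non-threshold textbook Paillier with tiny primes purely to illustrate expansion by bit string and ciphertext aggregation; it is not the (T, N)-threshold variant of the patent, and all names and parameters are illustrative.

```python
import math, random

def keygen(p, q):                       # textbook (non-threshold) Paillier
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    # mu = L(g^lam mod n^2)^(-1) mod n, with L(x) = (x - 1) / n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def enc(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:          # r must be invertible mod n
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def dec(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def expand(ciphers, bits):
    """Place each ciphertext at the position its bit string marks (5a)."""
    it = iter(ciphers)
    return [next(it) if b == '1' else None for b in bits]

def aggregate(expanded, n):
    """Multiply ciphertexts column-wise mod n^2 = add plaintexts (5b)."""
    out = []
    for col in zip(*expanded):
        present = [c for c in col if c is not None]
        acc = 1
        for c in present:
            acc = acc * c % (n * n)
        out.append(acc if present else None)
    return out
```

With `pk, sk = keygen(5, 7)`, two clients contributing plaintexts 1 and 2 at the same position decrypt to 3 after aggregation; real gradients would first need a fixed-point integer encoding.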
The federal server S's distribution of the threshold homomorphic decryption tasks is further described with reference to FIG. 2;
(5c) The federal server S constructs a distribution dictionary D of key-value pairs, in which the key-value pairs are arranged in descending order of their values; at the same time, it constructs a distribution list L = <E_1(G_a), E_2(G_a), ..., E_t(G_a), ..., E_T(G_a)> composed of T copies of the encrypted aggregate gradient vector E(G_a), where E_t(G_a) denotes the t-th E(G_a) in the distribution list L;
In this example, the distribution dictionary D records the order in which the threshold homomorphic decryption tasks are distributed, in descending order of the values associated with the IDs ID_n; the value corresponding to ID_n is the amount of threshold homomorphic decryption work to be sent to the client whose identity information is ID_n. In one round of federated learning, the number of encrypted sampling gradients that client A_n sends to the federal server S is calculated from the communication bandwidth B_r and therefore carries that bandwidth, so during the threshold homomorphic decryption task distribution the federal server S can compress the amount of threshold homomorphic decryption work each client A_w is to undertake and distribute to A_w an appropriate number of encrypted aggregate gradient elements as its threshold homomorphic decryption task, making client A_w's communication load when uploading the partially decrypted aggregate gradient match B_r.
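The bandwidth-adaptive apportioning described above can be sketched as follows. This is a simplified illustration only: the proportional rule and all names are assumptions, whereas the patent derives each client's task amount from its reported bandwidth B_r.

```python
def plan_distribution(list_len, capacities):
    """Return clients in descending capacity order with their task amounts.

    capacities: {client_id: bandwidth-derived capacity for this round}
    list_len:   total number of encrypted aggregate gradient elements to
                hand out as threshold decryption tasks.
    """
    order = sorted(capacities, key=capacities.get, reverse=True)
    total = sum(capacities.values())
    shares, assigned = {}, 0
    for i, cid in enumerate(order):
        if i == len(order) - 1:
            k = list_len - assigned          # last client absorbs rounding
        else:
            k = round(list_len * capacities[cid] / total)
        shares[cid] = k
        assigned += k
    return order, shares
```

For example, `plan_distribution(10, {'A1': 3, 'A2': 1, 'A3': 1})` assigns 6, 2 and 2 elements respectively, in descending capacity order.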
(5d) The federal server S sends the elements of the distribution list L to the clients A_w in the order of the IDs ID_n in the distribution dictionary D; the set of clients receiving distribution list L elements is A' = {A_1, A_2, ..., A_w, ..., A_W}, where A_w represents the w-th client receiving an element of the distribution list L, and W ≤ N;
(6) Client A_w decrypts the encrypted aggregate gradient vector:
Client A_w decrypts the received distribution list L elements using its private key sk_w, obtaining a partially decrypted aggregate gradient vector, and sends it to the federal server S, where N! denotes the factorial of N, sk_w represents client A_w's private key, D(·) represents the decryption function of the Paillier homomorphic encryption algorithm with a (T, N) threshold, and the elements of the result are the elements of the partially decrypted aggregate gradient vector;
(7) The federal server S recovers the gradient plaintext and obtains the federated learning privacy protection result:
(7a) The federal server S fills the partially decrypted aggregate gradient vectors transmitted by the clients A_w, in the order of the IDs ID_n in D, into a one-dimensional recovery list L' of length len = M·T, and then evenly divides L' into T segments, obtaining T recovery vectors C_1, C_2, ..., C_t, ..., C_T, where T is the decryption threshold of the Paillier homomorphic encryption algorithm with a (T, N) threshold, C_t represents the t-th recovery vector, and the m-th element of C_t is its m-th partial decryption;
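The filling and segmentation of the recovery list in step (7a) is a simple reshape (a sketch; the function name is illustrative):

```python
def split_recovery(flat, M, T):
    """Split the length M*T recovery list into T recovery vectors C_1..C_T."""
    assert len(flat) == M * T, "recovery list must hold M*T partial decryptions"
    return [flat[t * M:(t + 1) * M] for t in range(T)]
```

For example, `split_recovery(list(range(6)), 3, 2)` yields `[[0, 1, 2], [3, 4, 5]]`.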
(7b) The federal server S uses the public key set pk and the elements at corresponding positions of the T recovery vectors C_t to recover the plaintext of the aggregate gradient vector elements, where x belongs to the set S_x = {x < n² | x ≡ 1 mod n}; this yields the aggregate gradient vector g_a, and F_n is updated with g_a by the gradient descent method:
ω_n = ω_n − η·g_a
obtaining the convolutional neural network model trained in the current round, where η is the learning rate, taken as 0.001 in this example, and R(·) represents the combination function of the Paillier homomorphic encryption algorithm with a (T, N) threshold.
Claims (7)
1. A self-adaptive sampling federal learning privacy protection method based on threshold homomorphism is characterized by comprising the following steps:
(1) Constructing a federal learning system:
Constructing a federated learning system comprising a federal server S, a key distribution center KDC and N clients A = {A_1, A_2, ..., A_n, ..., A_N}, each client A_n holding an image data set D_n = {X_n, Y_n}, where N ≥ 2, A_n denotes the n-th client whose identity information is ID_n, X_n represents a set comprising Z images, and Y_n represents the set of true category labels of the objects contained in X_n;
(2) The federal server S initializes parameters:
(2a) The federal server S initializes each client A_n's convolutional neural network model F_n, the minimum compression ratio CR_min and the window time, where F_n's weight parameters comprise M weights; it sends CR_min, the window time and F_n to A_n, where M represents the number of weights of model F_n and M ≥ 50000;
(2b) The federal server S initializes the decryption threshold T of the Paillier homomorphic encryption algorithm with a (T, N) threshold, the training round number r, the convolutional neural network model of the r-th training round, and the maximum number of training rounds R; it sends T and the number of clients N to the KDC and sets r = 0, where 0 < T ≤ N, T > CR_min·N, and R ≥ 100;
(3) The key distribution center KDC generates the client keys:
The key distribution center KDC generates each client A_n's private key sk_n according to T and N, sends sk_n to A_n, and generates a public key pk;
(4) Client A_n obtains an encrypted sampling gradient vector and a position bit string:
(4a) Client A_n takes the image data set D_n as the input of the convolutional neural network model F_n and propagates it forward to obtain the predicted label set P_n of all images; it then uses a cross-entropy loss function to calculate F_n's loss value L_n from P_n and Y_n, and calculates each gradient vector element as the partial derivative of L_n with respect to the corresponding weight parameter, obtaining the gradient vector G_n;
(4b) Client A_n measures the communication bandwidth B_r available in round r for sending the position bit string and the encrypted sampling gradient vector to the federal server S, and calculates from it the number of sampling gradient elements for this training round; it then adaptively samples the gradient vector G_n according to that number, obtaining a sampling gradient vector G_n' containing only the sampled gradients, and encrypts each sampling gradient vector element with the public key set pk to obtain the encrypted sampling gradient vector E(G_n'), where y is the computer storage space occupied by one gradient vector element, E(·) represents the encryption function of the Paillier homomorphic encryption algorithm with a (T, N) threshold, and the elements of E(G_n') are the encryptions of the corresponding elements of the sampling gradient vector;
(4c) Client A_n initializes a binary bit string I_n' = <I_n1, I_n2, ..., I_nm, ..., I_nM> corresponding to the gradient vector, with every bit set to 0, and judges for each gradient element whether it was selected in the adaptive sampling; if so, the corresponding bit is set to I_nm = 1, otherwise I_nm = 0, obtaining the position bit string I_n = <I_n1, I_n2, ..., I_nm, ..., I_nM> that records the binary positions of the sampled gradient vector elements in the gradient vector G_n;
(4d) Client A_n sends the position bit string I_n and the encrypted sampling gradient vector E(G_n') to the federated server S;
(5) The federal server S aggregates the encrypted sampling gradient vectors and sends an aggregation result:
(5a) The federal server S expands client A_n's encrypted sampling gradient vector E(G_n') according to the position bit string I_n, obtaining the expanded vector whose m-th element is A_n's m-th expanded sampled gradient vector element:
(5b) The federal server S aggregates the m-th expanded sampled gradient vector elements across the N clients, obtaining the encrypted aggregate gradient vector E(G_a), whose m-th element is the m-th encrypted aggregate gradient vector element:
(5c) The federal server S constructs a distribution dictionary D of key-value pairs, in which the key-value pairs are arranged in descending order of their values; at the same time, it constructs a distribution list L = <E_1(G_a), E_2(G_a), ..., E_t(G_a), ..., E_T(G_a)> composed of T copies of the encrypted aggregate gradient vector E(G_a), where E_t(G_a) denotes the t-th E(G_a) in the distribution list L;
(5d) The federal server S sends the elements of the distribution list L to the clients A_w in the order of the IDs ID_n in the distribution dictionary D; the set of clients receiving distribution list L elements is A' = {A_1, A_2, ..., A_w, ..., A_W}, where A_w represents the w-th client receiving an element of the distribution list L, and W ≤ N;
(6) Client A_w decrypts the encrypted aggregate gradient vector:
Client A_w decrypts the received distribution list L elements using its private key sk_w, obtaining a partially decrypted aggregate gradient vector, and sends it to the federal server S, where sk_w represents client A_w's private key and D(·) represents the decryption function of the Paillier homomorphic encryption algorithm with a (T, N) threshold;
(7) The federal server S recovers the gradient plaintext and obtains the privacy-protected federated learning result:
(7a) The federal server S fills the partially decrypted aggregate gradient vectors transmitted by the clients A_w, in the order of the IDs ID_n in D, into a one-dimensional recovery list L' of length len = M·T, and then evenly divides L' into T segments, obtaining T recovery vectors C_1, C_2, ..., C_t, ..., C_T, where T is the decryption threshold of the Paillier homomorphic encryption algorithm with a (T, N) threshold, C_t represents the t-th recovery vector, and the m-th element of C_t is its m-th partial decryption;
(7b) The federal server S uses the public key set pk and the elements at corresponding positions of the T recovery vectors C_t to recover the plaintext aggregate gradient vector elements, obtaining the aggregate gradient vector g_a, and updates F_n with g_a by the gradient descent method, obtaining the convolutional neural network model trained in the current round, where R(·) represents the combination function of the Paillier homomorphic encryption algorithm with a (T, N) threshold;
2. The adaptive sampling federated learning privacy protection method based on threshold homomorphism as claimed in claim 1, characterized in that each client A_n's private key sk_n and the public key pk stated in step (3) are generated by the formulas:
sk_n = f(λ_n) mod uv
pk = {g, u, θ}
u = p·q
p = 2p' + 1
q = 2q' + 1
v = p'·q'
g = (1 + u)^a · b^u mod u²
θ = α·s·v mod u
where f(λ_n) represents the value of the function f(λ) = a_0 + a_1·λ + a_2·λ² + ... + a_t·λ^t + ... + a_{T−1}·λ^{T−1} at the argument λ_n; λ_1, λ_2, ..., λ_n, ..., λ_N are N random integers randomly selected by the key distribution center KDC; mod denotes the modulus operator, so sk_n = f(λ_n) mod uv means sk_n is the remainder of f(λ_n) divided by uv; p and q represent two different prime numbers in the positive integer domain; p' and q' represent two different strong prime numbers in the positive integer domain; s, a and b represent positive integers; and α = ord(b), where ord(·) denotes the order of an integer.
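Claim 2's private keys are Shamir-style shares: each sk_n is the degree-(T−1) polynomial f evaluated at a KDC-chosen point λ_n and reduced mod uv. A minimal sketch of the share computation follows; the function name and the example coefficients are illustrative, not from the patent.

```python
def make_shares(coeffs, lambdas, uv):
    """coeffs = [a_0, ..., a_{T-1}] defines f(x) = sum_t a_t * x^t;
    each client n receives the share sk_n = f(lambda_n) mod uv."""
    def f(x):
        return sum(a * pow(x, t) for t, a in enumerate(coeffs))
    return {lam: f(lam) % uv for lam in lambdas}
```

For example, with f(x) = 7 + 2x (T = 2) and uv = 15, the evaluation points 1, 2, 3 yield the shares 9, 11 and 13.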
3. The threshold homomorphism-based adaptive sampling federated learning privacy protection method of claim 1, characterized in that the adaptive sampling of the gradient vector G_n described in step (4b) comprises the following steps:
5. The threshold homomorphism-based adaptive sampling federated learning privacy protection method of claim 1, wherein the federal server S in step (5d) sends the elements in the distribution list L to the clients A_w in the order of the IDs ID_n in the distribution dictionary D as follows: the federal server S divides the elements in the distribution list L into W distribution sets Ω_1, Ω_2, ..., Ω_w, ..., Ω_W according to the values and order recorded in dictionary D, where W ≤ N and Ω_w represents the w-th distribution set; it then distributes Ω_1, Ω_2, ..., Ω_w, ..., Ω_W to the clients A_1, A_2, ..., A_w, ..., A_W in the order of ID_n in dictionary D.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211010855.3A CN115378707B (en) | 2022-08-23 | 2022-08-23 | Self-adaptive sampling federal learning privacy protection method based on threshold homomorphism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115378707A true CN115378707A (en) | 2022-11-22 |
CN115378707B CN115378707B (en) | 2024-03-29 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116055027A * | 2023-01-10 | 2023-05-02 | 西安交通大学 | Inter-cloud federal learning model aggregation method and system for homomorphic encryption
CN117892357A * | 2024-03-15 | 2024-04-16 | 大连优冠网络科技有限责任公司 | Energy big data sharing and distribution risk control method based on differential privacy protection
CN117892357B * | 2024-03-15 | 2024-05-31 | 国网河南省电力公司经济技术研究院 | Energy big data sharing and distribution risk control method based on differential privacy protection
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200204341A1 (en) * | 2017-01-20 | 2020-06-25 | Enveil, Inc. | Secure Machine Learning Analytics Using Homomorphic Encryption |
CN112149160A (en) * | 2020-08-28 | 2020-12-29 | 山东大学 | Homomorphic pseudo-random number-based federated learning privacy protection method and system |
CN113434873A (en) * | 2021-06-01 | 2021-09-24 | 内蒙古大学 | Federal learning privacy protection method based on homomorphic encryption |
US20210312334A1 (en) * | 2019-03-01 | 2021-10-07 | Webank Co., Ltd | Model parameter training method, apparatus, and device based on federation learning, and medium |
CN114745092A (en) * | 2022-04-11 | 2022-07-12 | 浙江工商大学 | Financial data sharing privacy protection method based on federal learning |
Non-Patent Citations (5)
Title |
---|
PAN YANG: "Data Security and Privacy Protection for Cloud Storage: A Survey", 《IEEE ACCESS》, 16 July 2020 (2020-07-16) * |
YILONG YANG: "Reveal Your Images: Gradient Leakage Attack Against Unbiased Sampling-Based Secure Aggregation", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》, 12 December 2023 (2023-12-12) * |
YANG Yilong et al.: "Adaptive Federated Learning Secure Aggregation Scheme Based on Threshold Homomorphic Encryption", Journal on Communications, 24 July 2023 (2023-07-24) *
DONG Ye; HOU Wei; CHEN Xiaojun; ZENG Shuai: "Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection", Journal of Computer Research and Development, no. 10, 9 October 2020 (2020-10-09) *
WEI Lifei; CHEN Congcong; ZHANG Lei; LI Mengsi; CHEN Yujiao; WANG Qin: "Security Issues and Privacy Protection in Machine Learning", Journal of Computer Research and Development, no. 10, 9 October 2020 (2020-10-09) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||