CN115378707A - Adaptive sampling federated learning privacy protection method based on threshold homomorphism
- Publication number: CN115378707A
- Application number: CN202211010855.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0442—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/0819—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s)
- H04L9/083—Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s) involving central third party, e.g. key distribution center [KDC] or trusted third party [TTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/085—Secret sharing or secret splitting, e.g. threshold schemes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
Abstract
The invention discloses a threshold-homomorphism-based adaptive sampling federated learning privacy protection method, implemented in the following steps: a federated learning system is constructed; the federated server S initializes parameters; the key distribution center KDC generates client keys; each client A_n obtains an encrypted sampled gradient vector and a position bit string; the federated server S aggregates the encrypted sampled gradient vectors and sends the aggregation result; each client A_w decrypts its share of the encrypted aggregated gradient vector; and the federated server S recovers the gradient plaintext and obtains the privacy-preserving federated learning result. Through adaptive sampling, threshold homomorphic decryption task distribution, and gradient recovery, the method reduces client communication load during federated learning, improves the communication efficiency of federated learning privacy protection, and widens its applicable range; it therefore has the advantages of high communication efficiency and broad applicability.
Description
Technical Field
The invention belongs to the field of computer technology and relates to a federated learning privacy protection method, in particular to a threshold-homomorphism-based adaptive sampling federated learning privacy protection method that can be used to reduce client communication load during federated learning, improve the communication efficiency of federated learning privacy protection, and widen the applicable range of federated learning privacy protection.
Background
Federated learning is a distributed machine learning framework that allows multiple clients to collaboratively train a machine learning model without sharing their private data. Broadly, federated learning can be divided into three steps. First, the server initializes the global model and distributes it to the clients. Second, each client trains the global model issued by the server on its local dataset to obtain a gradient, and uploads that gradient to the server. Third, the server aggregates all uploaded gradients into an aggregated gradient and uses it to update the global model. Continuously iterating these three steps yields the trained model.
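The three-step loop just described can be sketched in miniature (an illustrative outline with plain-Python stand-ins for the model, local training, and aggregation; none of the names below come from the patent):

```python
# Minimal federated-learning-style loop: the server initializes a global
# model, clients compute local gradients, and the server aggregates them.
# Models and gradients are plain lists of floats for illustration.

def local_gradient(model, dataset):
    # Stand-in for "train on the local dataset and return a gradient".
    return [x - w for w, x in zip(model, dataset)]

def aggregate(gradients):
    # Element-wise mean of the clients' gradients.
    n = len(gradients)
    return [sum(col) / n for col in zip(*gradients)]

def federated_round(model, client_datasets, lr=0.1):
    grads = [local_gradient(model, d) for d in client_datasets]  # step 2
    agg = aggregate(grads)                                       # step 3
    return [w + lr * g for w, g in zip(model, agg)]              # update

model = [0.0, 0.0]                   # step 1: server initializes
clients = [[1.0, 2.0], [3.0, 4.0]]   # two clients' local data
for _ in range(100):                 # iterate the three steps
    model = federated_round(model, clients)
print([round(w, 2) for w in model])  # → [2.0, 3.0]
```

The toy objective pulls each weight toward the mean of the clients' data, so the global model converges to that mean after enough rounds.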
However, research shows that a privacy-leakage risk hides in the process of the federated learning client uploading its gradient to the server: an attacker can reconstruct data in the client's local dataset from the plaintext gradient. To address this problem, many scholars have proposed applying a homomorphic encryption algorithm to encrypt the gradient plaintext and thereby protect the privacy of the federated learning client's local dataset. Homomorphic-encryption-based federated learning privacy protection is largely consistent with the traditional federated learning training steps; the only additions are that, after completing aggregation, the server sends the aggregated gradient ciphertext to the clients for decryption, and the clients upload the decrypted plaintext gradient back to the server for the parameter update.
The homomorphic-encryption-based federated learning privacy protection method has the following problems. In practical applications, clients are usually deployed in complex network environments with poor communication conditions and severely uneven communication resources, so client communication bandwidth is low and the bandwidth difference between clients is extremely large. In addition, compared with the plaintext gradient, the encrypted gradient at least doubles the communication bandwidth required for uploading, and the decrypted aggregated gradient plaintext must still be uploaded to the server again, which further increases client communication load and seriously reduces the communication efficiency of federated learning.
Inner Mongolia University discloses a homomorphic-encryption-based federated learning privacy protection method in the patent document "Federated learning privacy protection method based on homomorphic encryption" (application number 202110608465.5, publication number CN113434873A). In that method, after computing its gradient locally in each round of federated learning, the client adds noise using differentially private stochastic gradient descent, encrypts the perturbed gradient with a homomorphic encryption mechanism, and sends it to the server. The server generates new ciphertext parameters from all received ciphertext gradients and sends them to the clients, and finally each client decrypts the received ciphertext parameters and updates its model. The method's disadvantage is that it requires the client to transmit all encrypted gradients to the server, imposing an excessive communication load on the client and yielding low communication efficiency: for example, using ResNet-9 as the training network in practice, the total traffic for a single client's upload of encrypted gradients in one round exceeds 50 MB. In addition, with this method, clients that cannot finish transmitting their encrypted gradients within the window time cannot participate in federated learning privacy protection at all, compressing the applicable range of federated learning.
Disclosure of Invention
The invention aims to overcome the above shortcomings of the prior art by providing a threshold-homomorphism-based adaptive sampling federated learning privacy protection method that solves the technical problem of excessive client communication load in the prior art.
To achieve this aim, the technical scheme adopted by the invention comprises the following steps:
(1) Constructing a federated learning system:
A federated learning system is constructed comprising a federated server S, a key distribution center KDC, and N clients A = {A_1, A_2, ..., A_n, ..., A_N}. Each client A_n holds an image dataset D_n = {X_n, Y_n}, where N ≥ 2, A_n denotes the n-th client with identity information ID_n, X_n denotes a set of Z images, and Y_n denotes the set of true category labels of the objects contained in X_n.
(2) The federated server S initializes parameters:
(2a) The federated server S initializes each client A_n's convolutional neural network model F_n, the minimum compression ratio CR_min, and the window time, and sends CR_min, the window time, and F_n's weight parameters w_n = <w_n1, w_n2, ..., w_nm, ..., w_nM> to A_n, where M denotes the number of weights of model F_n, M ≥ 50000, and w_nm denotes the m-th weight of F_n;
(2b) The federated server S initializes the decryption threshold T of the (T, N)-threshold Paillier homomorphic encryption algorithm, the training round number r, the convolutional neural network model F_n^r of the r-th training round, and the maximum number of training rounds R; it sends T and the number of clients N to the KDC and sets r = 0, where 0 < T ≤ N, T < CR_min·N, and R ≥ 100;
(3) The key distribution center KDC generates client keys:
From T and N, the key distribution center KDC generates each client A_n's private key sk_n, sends sk_n to A_n, and generates the public key pk;
(4) Client A_n obtains an encrypted sampled gradient vector and a position bit string:
(4a) Client A_n takes its image dataset D_n as the input of the convolutional neural network model F_n and performs forward propagation to obtain the prediction label set P_n of all images; it then uses a cross-entropy loss function to compute F_n's loss value L_n from P_n and Y_n, and computes each gradient vector element g_nm by taking the partial derivative of L_n with respect to the weight parameter w_nm, obtaining the gradient vector G_n = <g_n1, g_n2, ..., g_nm, ..., g_nM>;
(4b) Client A_n measures the communication bandwidth B_r with which it can send the position bit string and the encrypted sampled gradient vector to the federated server S in round r, and from B_r, the window time, and the computer storage size y occupied by one gradient vector element computes the number K_n of gradient elements to sample in the current round of training. It then adaptively samples the gradient vector G_n to obtain the sampled gradient vector G_n' containing K_n sampled gradients, and encrypts each sampled gradient vector element g_nk' with the public key set pk to obtain the encrypted sampled gradient vector E(G_n'), where E(·) denotes the encryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, g_nk' denotes the k-th element of the sampled gradient vector, and E(g_nk') denotes the k-th element of the encrypted sampled gradient vector;
(4c) Client A_n initializes a binary bit string I_n' = <I_n1, I_n2, ..., I_nm, ..., I_nM> corresponding to the gradient vector, with every bit set to 0, and determines for each m whether g_nm was selected during adaptive sampling: if so, it sets the corresponding bit I_nm = 1, otherwise I_nm = 0, obtaining the binary position bit string I_n = <I_n1, I_n2, ..., I_nm, ..., I_nM> recording the positions of the sampled gradient vector elements within the gradient vector G_n;
(4d) Client A_n sends the position bit string I_n and the encrypted sampled gradient vector E(G_n') to the federated server S;
(5) The federated server S aggregates the encrypted sampled gradient vectors and sends the aggregation result:
(5a) The federated server S uses the position bit string I_n to expand client A_n's encrypted sampled gradient vector E(G_n') back to full length, obtaining E(G_n'') = <E(g_n1''), E(g_n2''), ..., E(g_nm''), ..., E(g_nM'')>, where E(g_nm'') denotes A_n's m-th expanded sampled gradient vector element;
(5b) The federated server S aggregates the m-th expanded sampled gradient vector elements of the N clients to obtain the encrypted aggregated gradient vector E(G_a) = <E(g_a1), E(g_a2), ..., E(g_am), ..., E(g_aM)>, where E(g_am) denotes the m-th encrypted aggregated gradient vector element;
(5c) The federated server S constructs a distribution dictionary D whose key-value pairs are (ID_n, K_n), where K_n is the number of gradient elements sampled by A_n, with the pairs arranged in descending order of K_n; at the same time it constructs a distribution list L = <E_1(G_a), E_2(G_a), ..., E_t(G_a), ..., E_T(G_a)> consisting of T copies of the encrypted aggregated gradient vector E(G_a), where E_t(G_a) denotes the t-th E(G_a) constituting the distribution list L;
(5d) The federated server S sends the elements of the distribution list L to clients A_w in the ID_n order recorded in the distribution dictionary D; the set of clients receiving distribution list elements is A' = {A_1, A_2, ..., A_w, ..., A_W}, where A_w denotes the w-th client receiving an element of the distribution list L and W ≤ N;
(6) Client A_w decrypts the encrypted aggregated gradient vector:
Client A_w uses its private key sk_w to decrypt the received distribution list element, obtaining a partially decrypted aggregated gradient vector D_w(E(G_a)) = <D_w(E(g_a1)), ..., D_w(E(g_am)), ..., D_w(E(g_aM))>, and sends it to the federated server S, where sk_w denotes client A_w's private key, D(·) denotes the decryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, and D_w(E(g_am)) denotes the m-th element of the partially decrypted aggregated gradient vector;
(7) The federated server S recovers the gradient plaintext and obtains the federated learning privacy protection result:
(7a) The federated server S fills the partial decryption results sent by the clients A_w into a one-dimensional recovery list L' of length len = M·T, in the ID_n order given by D, and then divides L' evenly into T segments to obtain T recovery vectors C_1, C_2, ..., C_t, ..., C_T, where T is the decryption threshold of the (T, N)-threshold Paillier homomorphic encryption algorithm, C_t = <c_t1, ..., c_tm, ..., c_tM> denotes the t-th recovery vector, and c_tm denotes the m-th element of recovery vector C_t;
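The list-filling and segmentation in step (7a) can be sketched with toy values (integers stand in for partial-decryption shares; the chunk sizes and ordering are illustrative assumptions):

```python
# Sketch of step (7a): partial decryptions arrive from clients in the
# dictionary-D order; the server concatenates them into a recovery list
# L' of length M*T and slices it into T recovery vectors C_1..C_T.

M, T = 4, 3  # model size and decryption threshold (tiny demo values)

# Each client returns a chunk of partial decryptions; the chunks are
# concatenated in the order given by the distribution dictionary D.
chunks = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]]
L_prime = [x for chunk in chunks for x in chunk]
assert len(L_prime) == M * T  # the chunks must cover the full list

# Divide L' evenly into T recovery vectors of length M each.
C = [L_prime[t * M:(t + 1) * M] for t in range(T)]
print(C)  # → [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```

Each column across the T recovery vectors then holds the T partial decryptions of one aggregated gradient element, which is what the combining function of step (7b) consumes.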
(7b) The federated server S uses the public key set pk and the elements c_tm at corresponding positions of the T recovery vectors C_t to recover the plaintext g_am of each aggregated gradient vector element, obtaining the aggregated gradient vector g_a = <g_a1, ..., g_am, ..., g_aM>; it then applies the gradient descent method, updating F_n through g_a to obtain the convolutional neural network model F_n^r of the current training round, where R(·) denotes the combining function of the (T, N)-threshold Paillier homomorphic encryption algorithm;
(7c) The federated server S determines whether r ≥ R; if so, the trained convolutional neural network model is obtained; otherwise it sets F_n = F_n^r and r = r + 1, and returns to step (4).
Compared with the prior art, the invention has the following advantages:
First, by computing the number of gradient elements to sample in each training round and adaptively sampling the gradient vector of each round, a client needs to upload only the sampled encrypted gradient data rather than all encrypted gradient data during federated learning privacy protection. This solves the prior art's problem of excessive client communication load, effectively reduces client communication load during federated learning privacy protection, and widens the applicable range of federated learning privacy protection.
Second, the invention's threshold homomorphic decryption task distribution and gradient recovery method lets the server distribute decryption tasks matched to each client's communication bandwidth according to the number of gradient elements the client sampled, overcoming the prior art's heavy client communication load when uploading the decrypted aggregated gradient to the server and further reducing client communication load during federated learning.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of the threshold homomorphic decryption task distribution of the present invention;
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
Referring to fig. 1, the present invention comprises the steps of:
(1) Constructing a federated learning system:
A federated learning system is constructed comprising a federated server S, a key distribution center KDC, and N clients A = {A_1, A_2, ..., A_n, ..., A_N}. Each client A_n holds an image dataset D_n = {X_n, Y_n}, where N ≥ 2, A_n denotes the n-th client with identity information ID_n, X_n denotes a set of Z images, and Y_n denotes the set of true category labels of the objects contained in X_n.
In this example, the number of federated learning system clients is N = 30 and Z = 200; each A_n trains using the MNIST dataset as its image dataset. The MNIST dataset is a grayscale handwritten-digit image dataset widely used in the field of federated learning; it contains 60000 training images, each a 28 × 28 grayscale handwritten digit, and its labels are the real digits 0 to 9 corresponding to the handwriting. In this example, each client A_n holds a subset D_n = {X_n, Y_n} of the MNIST dataset, i.e. each client A_n holds 200 distinct grayscale handwritten-digit images X_n with true labels Y_n.
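A minimal sketch of carving such disjoint per-client subsets out of a 60000-image training set (index bookkeeping only; the MNIST images themselves are not loaded, and all helper names are assumptions):

```python
import random

N, Z, TOTAL = 30, 200, 60000  # clients, images per client, MNIST train size

rng = random.Random(0)
indices = list(range(TOTAL))
rng.shuffle(indices)

# Each client A_n receives a disjoint slice of Z image indices D_n.
partitions = [indices[n * Z:(n + 1) * Z] for n in range(N)]

assert len(partitions) == N
assert all(len(p) == Z for p in partitions)
# Disjointness: no image index is assigned to two clients.
assert len({i for p in partitions for i in p}) == N * Z
```

Only 6000 of the 60000 indices are used here (30 × 200), mirroring the example's small per-client holdings.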
(2) The federated server S initializes parameters:
(2a) The federated server S initializes each client A_n's convolutional neural network model F_n, the minimum compression ratio CR_min, and the window time, and sends CR_min, the window time, and F_n's weight parameters w_n = <w_n1, w_n2, ..., w_nm, ..., w_nM> to A_n, where M denotes the number of weights of model F_n, M ≥ 50000, and w_nm denotes the m-th weight of F_n;
In this example, the minimum compression ratio is CR_min = 0.2 and the window time is 1 s. The convolutional neural network model F_n initialized by the federated server comprises, connected in sequence: a first convolutional layer, an activation function layer, a second convolutional layer, an activation function layer, a third convolutional layer, an activation function layer, and a fully connected layer. The first convolutional layer has 3 input channels, 10 output channels, and 5 × 5 convolution kernels; the second convolutional layer has 10 input channels, 20 output channels, and 5 × 5 kernels; the third convolutional layer has 20 input channels, 10 output channels, and 5 × 5 kernels; the activation function layers use the Sigmoid function; the fully connected layer has input dimension 4000 and output dimension 10. The weight count M of the convolutional neural network model F_n is computed as M = 3·M_cov + 3·M_sig + M_fc, where the weight count of a convolutional layer is M_cov = size²·C_i·C_0 + C, with size the convolution kernel size, C_i the number of input channels, C_0 the number of output channels, and C the number of bias weights; the weight count of an activation function layer is M_sig = 2·C_i; and the weight count of the fully connected layer is M_fc = E_i·E_0 + E, where E_i is the input vector length, E_0 the output vector length, and E the number of bias weights. In this example the global convolutional neural network model F_n has M = 35900 weights; in practical applications, the weight count of a convolutional neural network model with a complex structure can reach millions or even tens of millions.
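The per-layer weight-count formulas quoted above can be evaluated directly; the sketch below applies them to the example layers (the bias counts C and E are taken to equal the number of output units, which is an assumption, not stated in the example):

```python
def conv_weights(size, c_in, c_out):
    # M_cov = size^2 * C_i * C_0 + C, taking the bias count C = C_0.
    return size * size * c_in * c_out + c_out

def sigmoid_weights(c_in):
    # M_sig = 2 * C_i, as stated in the example.
    return 2 * c_in

def fc_weights(e_in, e_out):
    # M_fc = E_i * E_0 + E, taking the bias count E = E_0.
    return e_in * e_out + e_out

layers = [conv_weights(5, 3, 10),    # first convolutional layer
          sigmoid_weights(10),
          conv_weights(5, 10, 20),   # second convolutional layer
          sigmoid_weights(20),
          conv_weights(5, 20, 10),   # third convolutional layer
          sigmoid_weights(10),
          fc_weights(4000, 10)]      # fully connected layer
print(layers)  # → [760, 20, 5020, 40, 5010, 20, 40010]
```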
(2b) The federated server S initializes the decryption threshold T of the (T, N)-threshold Paillier homomorphic encryption algorithm, the training round number r, the convolutional neural network model F_n^r of the r-th training round, and the maximum number of training rounds R; it sends T and the number of clients N to the key distribution center KDC and sets r = 0, where 0 < T ≤ N, T < CR_min·N, and R ≥ 100;
In this example, the federated server S initializes the decryption threshold T = 5 and the maximum number of training rounds R = 1000 of the (T, N)-threshold Paillier homomorphic encryption algorithm, where 0 < T ≤ N and T < CR_min·N. The requirement T < CR_min·N ensures that in step (5d) all elements of the distribution list L, composed of T encrypted aggregated gradient vectors E(G_a), can be distributed and decrypted; that is, each encrypted gradient element of E(G_a) must be decrypted T times by different clients to satisfy the decryption condition of the (T, N)-threshold Paillier homomorphic encryption algorithm. Since each client is guaranteed to decrypt at least CR_min·M elements, this requires T·M < CR_min·N·M, i.e. T < CR_min·N;
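With the example values, the threshold condition and the workload argument behind it can be checked numerically (a trivial sketch using M = 35900 from this example):

```python
T, N, CR_min, M = 5, 30, 0.2, 35900  # example values from the text

# Each element of E(G_a) must be decrypted T times, so the total
# decryption workload is T*M elements; the guaranteed aggregate client
# capacity is at least CR_min*N*M elements.
workload = T * M
capacity = CR_min * N * M

assert T < CR_min * N          # 5 < 6
assert workload < capacity     # 179500 < 215400
```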
(3) The key distribution center KDC generates client keys:
From T and N, the key distribution center KDC generates each client A_n's private key sk_n, sends sk_n to A_n, and generates the public key pk;
In this example, the key distribution center generates the private keys sk_n and the public key pk as follows:
First, the key distribution center KDC randomly selects two distinct strong primes p' and q' in the positive integer domain and computes p = 2p' + 1 and q = 2q' + 1, where p and q are required to be distinct primes, u = p·q, and v = p'·q'. It then randomly selects a positive integer s and sets the master key sk = s·v. Next, the KDC sets a_0 = sk, randomly selects T − 1 random integers a_1, a_2, ..., a_t, ..., a_{T−1}, and constructs the polynomial:
f(λ) = a_0 + a_1·λ + a_2·λ² + ... + a_t·λ^t + ... + a_{T−1}·λ^{T−1};
Then the key distribution center KDC randomly selects N random integers λ_1, λ_2, ..., λ_n, ..., λ_N, substitutes them into f(λ) in turn, and computes f(λ_1), f(λ_2), ..., f(λ_n), ..., f(λ_N). The KDC then computes sk_n = f(λ_n) mod uv to obtain the private key sk_n of the n-th client. Finally, it randomly selects two distinct positive integers a and b and computes the public key set pk = {g, u, θ}, where g = (1 + u)^a · b^u mod u², θ = α·s·v mod u, ord(b) = α, and ord(·) denotes the order of an integer;
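The share-generation step is standard polynomial secret sharing, which can be illustrated with toy numbers (plain Shamir sharing over a small prime field; the real scheme evaluates f(λ_n) mod uv with strong primes, which this sketch does not attempt):

```python
import random

P = 2**31 - 1          # toy prime modulus (stand-in for the real modulus)
T, N = 5, 30           # decryption threshold and number of clients

rng = random.Random(42)
secret = 123456789     # stand-in for the master key sk

# f(x) = a_0 + a_1*x + ... + a_{T-1}*x^{T-1}, with a_0 = secret.
coeffs = [secret] + [rng.randrange(P) for _ in range(T - 1)]

def f(x):
    return sum(a * pow(x, i, P) for i, a in enumerate(coeffs)) % P

# One share (λ_n, sk_n) per client.
points = [(x, f(x)) for x in range(1, N + 1)]

def recover(shares):
    # Lagrange interpolation at x = 0 recovers f(0) = secret.
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for k, (xk, _) in enumerate(shares):
            if k != j:
                num = (num * -xk) % P
                den = (den * (xj - xk)) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

assert recover(points[:T]) == secret        # any T shares suffice
assert recover(points[7:7 + T]) == secret   # a different T-subset works too
```

Any T of the N shares reconstruct f(0), while fewer than T reveal nothing about it; this is the property the (T, N)-threshold decryption relies on.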
(4) Client A_n obtains an encrypted sampled gradient vector and a position bit string:
(4a) Client A_n takes its image dataset D_n as the input of the convolutional neural network model F_n and performs forward propagation to obtain the prediction label set P_n of all images; it then uses a cross-entropy loss function to compute F_n's loss value L_n from P_n and Y_n, and computes each gradient vector element g_nm by taking the partial derivative of L_n with respect to the weight parameter w_nm, obtaining the gradient vector G_n = <g_n1, g_n2, ..., g_nm, ..., g_nM>;
(4b) Client A_n measures the communication bandwidth B_r with which it can send the position bit string and the encrypted sampled gradient vector to the federated server S in round r, and from B_r, the window time, and the computer storage size y occupied by one gradient vector element computes the number K_n of gradient elements to sample in the current round of training. It then adaptively samples the gradient vector G_n to obtain the sampled gradient vector G_n' containing K_n sampled gradients, and encrypts each sampled gradient vector element g_nk' with the public key set pk to obtain the encrypted sampled gradient vector E(G_n'), where E(·) denotes the encryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, g_nk' denotes the k-th element of the sampled gradient vector, and E(g_nk') denotes the k-th element of the encrypted sampled gradient vector;
the invention uses adaptive sampling, and can reduce the number of encryption gradients to be uploaded by the clientTherefore, the communication load of the client in the federal learning process is effectively reduced, and the communication efficiency of the federal learning privacy protection is improved. At the same time, the number of gradient elements is sampledBy the r round client A n Communication bandwidth B for transmitting position bit string and encrypted sampling gradient vector to federal server S r The client with poor communication condition can be adjusted through self-adaptionThe method ensures that the user can participate in the federal learning privacy protection, and widens the application range of the federal learning privacy protection.
Subsequently, client A_n records the positions of the K_n largest elements of abs(G_n), the element-wise absolute value of the gradient, obtaining a position vector whose k-th entry gives the position of the k-th largest absolute-gradient element in abs(G_n);
The K_n gradient elements with the largest absolute values are selected because the larger a gradient element's absolute value, the greater its contribution to the convergence of the convolutional neural network model; conversely, gradient elements with absolute values close to 0 contribute almost nothing to convergence;
Then client A_n checks, for each gradient element position m, whether it appears in the position vector; if not, it sets g_nm to 0, obtaining the marked sampled gradient vector G_ns;
Finally, client A_n deletes all elements of the marked sampled gradient vector G_ns that are 0, obtaining the sampled gradient vector G_n' containing K_n sampled gradients, and then encrypts each sampled gradient vector element g_nk' with the public key set pk, obtaining the encrypted sampled gradient vector E(G_n'), where r_n denotes a random number chosen by the n-th client, E(·) denotes the encryption function of the (T, N)-threshold Paillier homomorphic encryption algorithm, g_nk' denotes the k-th element of the sampled gradient vector, and E(g_nk') denotes the k-th element of the encrypted sampled gradient vector;
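The sampling procedure can be sketched as follows. The bandwidth-to-count formula is an assumed reading (as many y-byte elements as the measured bandwidth carries in the window time), and encryption is left out entirely, so this shows only the top-K selection:

```python
# Adaptive top-K gradient sampling (illustrative; the bandwidth-to-count
# formula is an assumption, and no encryption is performed here).

def sample_count(bandwidth_bps, window_s, elem_bytes, M):
    # Assumed: as many elements as fit through the link in the window.
    return min(M, int(bandwidth_bps * window_s / (8 * elem_bytes)))

def top_k_sample(grad, k):
    # Keep the k positions with the largest |g|; return (positions, values).
    order = sorted(range(len(grad)), key=lambda m: abs(grad[m]), reverse=True)
    keep = sorted(order[:k])
    return keep, [grad[m] for m in keep]

grad = [0.5, -0.01, 2.0, 0.0, -3.0, 0.02]          # toy gradient vector G_n
k = sample_count(bandwidth_bps=256, window_s=1.0, elem_bytes=8, M=len(grad))
positions, values = top_k_sample(grad, min(k, 3))  # cap at 3 for the demo
print(positions, values)  # → [0, 2, 4] [0.5, 2.0, -3.0]
```

The small-magnitude elements (positions 1, 3, 5) are dropped, matching the rationale that near-zero gradients contribute little to convergence.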
(4c) Since the server S must align each client A_n's gradients by position when computing the aggregated gradient, client A_n also produces a binary bit string recording the positions of the sampled gradient vector elements and sends it to the server S. Client A_n initializes a binary bit string I_n' = <I_n1, I_n2, ..., I_nm, ..., I_nM> corresponding to the gradient vector, with every bit set to 0, and determines for each m whether g_nm was selected during adaptive sampling; if so, it sets the corresponding bit I_nm = 1, otherwise I_nm = 0, obtaining the binary position bit string I_n = <I_n1, I_n2, ..., I_nm, ..., I_nM> recording the positions of the sampled gradient vector elements in the gradient vector G_n;
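A sketch of building the position bit string I_n from the sampled positions (represented as a plain 0/1 list; the packed on-the-wire encoding is not modeled):

```python
def position_bits(M, sampled_positions):
    # I_n has one bit per gradient element; bit m is 1 iff g_nm was sampled.
    bits = [0] * M
    for m in sampled_positions:
        bits[m] = 1
    return bits

M = 8
sampled = [1, 4, 6]               # positions kept by adaptive sampling
I_n = position_bits(M, sampled)
print(I_n)                        # → [0, 1, 0, 0, 1, 0, 1, 0]

# The server can recover the sampled positions from the bit string alone,
# which is what lets it re-expand E(G_n') to full length for aggregation.
assert [m for m, b in enumerate(I_n) if b] == sampled
```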
(4d) Client A_n sends the position bit string I_n and the encrypted sampling gradient vector E(G_n') to the federal server S;
(5) The federal server S aggregates the encrypted sampling gradient vectors and sends the aggregation result:
(5a) The federal server S expands client A_n's encrypted sampling gradient vector E(G_n') according to the position bit string I_n, obtaining the expanded vector whose m-th element is A_n's m-th expanded sampled gradient vector element:
(5b) The federal server S aggregates the m-th expanded sampled gradient vector elements across the N clients, obtaining the encrypted aggregate gradient vector E(G_a), whose m-th element is the m-th encrypted aggregate gradient vector element:
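Steps (5a)-(5b) rely on the additive homomorphism of Paillier: multiplying ciphertexts modulo n² adds the underlying plaintexts. The sketch below uses a toy, non-threshold textbook Paillier with tiny primes purely to illustrate expansion by bit string and ciphertext aggregation; it is not the (T, N)-threshold variant of the patent, and all names and parameters are illustrative.

```python
import math, random

def keygen(p, q):                       # textbook (non-threshold) Paillier
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    # mu = L(g^lam mod n^2)^(-1) mod n, with L(x) = (x - 1) / n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def enc(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:          # r must be invertible mod n
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def dec(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def expand(ciphers, bits):
    """Place each ciphertext at the position its bit string marks (5a)."""
    it = iter(ciphers)
    return [next(it) if b == '1' else None for b in bits]

def aggregate(expanded, n):
    """Multiply ciphertexts column-wise mod n^2 = add plaintexts (5b)."""
    out = []
    for col in zip(*expanded):
        present = [c for c in col if c is not None]
        acc = 1
        for c in present:
            acc = acc * c % (n * n)
        out.append(acc if present else None)
    return out
```

With `pk, sk = keygen(5, 7)`, two clients contributing plaintexts 1 and 2 at the same position decrypt to 3 after aggregation; real gradients would first need a fixed-point integer encoding.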
The federal server S's distribution of the threshold homomorphic decryption tasks is further described with reference to FIG. 2;
(5c) The federal server S constructs a distribution dictionary D of key-value pairs, in which the key-value pairs are arranged in descending order of their values; at the same time, it constructs a distribution list L = <E_1(G_a), E_2(G_a), ..., E_t(G_a), ..., E_T(G_a)> composed of T copies of the encrypted aggregate gradient vector E(G_a), where E_t(G_a) denotes the t-th E(G_a) in the distribution list L;
In this example, the distribution dictionary D records the order in which the threshold homomorphic decryption tasks are distributed, in descending order of the values associated with the IDs ID_n; the value corresponding to ID_n is the amount of threshold homomorphic decryption work to be sent to the client whose identity information is ID_n. In one round of federated learning, the number of encrypted sampling gradients that client A_n sends to the federal server S is calculated from the communication bandwidth B_r and therefore carries that bandwidth, so during the threshold homomorphic decryption task distribution the federal server S can compress the amount of threshold homomorphic decryption work each client A_w is to undertake and distribute to A_w an appropriate number of encrypted aggregate gradient elements as its threshold homomorphic decryption task, making client A_w's communication load when uploading the partially decrypted aggregate gradient match B_r.
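The bandwidth-adaptive apportioning described above can be sketched as follows. This is a simplified illustration only: the proportional rule and all names are assumptions, whereas the patent derives each client's task amount from its reported bandwidth B_r.

```python
def plan_distribution(list_len, capacities):
    """Return clients in descending capacity order with their task amounts.

    capacities: {client_id: bandwidth-derived capacity for this round}
    list_len:   total number of encrypted aggregate gradient elements to
                hand out as threshold decryption tasks.
    """
    order = sorted(capacities, key=capacities.get, reverse=True)
    total = sum(capacities.values())
    shares, assigned = {}, 0
    for i, cid in enumerate(order):
        if i == len(order) - 1:
            k = list_len - assigned          # last client absorbs rounding
        else:
            k = round(list_len * capacities[cid] / total)
        shares[cid] = k
        assigned += k
    return order, shares
```

For example, `plan_distribution(10, {'A1': 3, 'A2': 1, 'A3': 1})` assigns 6, 2 and 2 elements respectively, in descending capacity order.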
(5d) The federal server S sends the elements of the distribution list L to the clients A_w in the order of the IDs ID_n in the distribution dictionary D; the set of clients receiving distribution list L elements is A' = {A_1, A_2, ..., A_w, ..., A_W}, where A_w represents the w-th client receiving an element of the distribution list L, and W ≤ N;
(6) Client A_w decrypts the encrypted aggregate gradient vector:
Client A_w decrypts the received distribution list L elements using its private key sk_w, obtaining a partially decrypted aggregate gradient vector, and sends it to the federal server S, where N! denotes the factorial of N, sk_w represents client A_w's private key, D(·) represents the decryption function of the Paillier homomorphic encryption algorithm with a (T, N) threshold, and the elements of the result are the elements of the partially decrypted aggregate gradient vector;
(7) The federal server S recovers the gradient plaintext and obtains the federated learning privacy protection result:
(7a) The federal server S fills the partially decrypted aggregate gradient vectors transmitted by the clients A_w, in the order of the IDs ID_n in D, into a one-dimensional recovery list L' of length len = M·T, and then evenly divides L' into T segments, obtaining T recovery vectors C_1, C_2, ..., C_t, ..., C_T, where T is the decryption threshold of the Paillier homomorphic encryption algorithm with a (T, N) threshold, C_t represents the t-th recovery vector, and the m-th element of C_t is its m-th partial decryption;
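The filling and segmentation of the recovery list in step (7a) is a simple reshape (a sketch; the function name is illustrative):

```python
def split_recovery(flat, M, T):
    """Split the length M*T recovery list into T recovery vectors C_1..C_T."""
    assert len(flat) == M * T, "recovery list must hold M*T partial decryptions"
    return [flat[t * M:(t + 1) * M] for t in range(T)]
```

For example, `split_recovery(list(range(6)), 3, 2)` yields `[[0, 1, 2], [3, 4, 5]]`.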
(7b) The federal server S uses the public key set pk and the elements at corresponding positions of the T recovery vectors C_t to recover the plaintext of the aggregate gradient vector elements, where x belongs to the set S_x = {x < n² | x ≡ 1 mod n}; this yields the aggregate gradient vector g_a, and F_n is updated with g_a by the gradient descent method:
ω_n = ω_n − η·g_a
obtaining the convolutional neural network model trained in the current round, where η is the learning rate, taken as 0.001 in this example, and R(·) represents the combination function of the Paillier homomorphic encryption algorithm with a (T, N) threshold.
Claims (7)
1. A self-adaptive sampling federal learning privacy protection method based on threshold homomorphism is characterized by comprising the following steps:
(1) Constructing a federal learning system:
Constructing a federated learning system comprising a federal server S, a key distribution center KDC and N clients A = {A_1, A_2, ..., A_n, ..., A_N}, each client A_n holding an image data set D_n = {X_n, Y_n}, where N ≥ 2, A_n denotes the n-th client whose identity information is ID_n, X_n represents a set comprising Z images, and Y_n represents the set of true category labels of the objects contained in X_n;
(2) The federal server S initializes parameters:
(2a) The federal server S initializes each client A_n's convolutional neural network model F_n, the minimum compression ratio CR_min and the window time, where F_n's weight parameters comprise M weights; it sends CR_min, the window time and F_n to A_n, where M represents the number of weights of model F_n and M ≥ 50000;
(2b) The federal server S initializes the decryption threshold T of the Paillier homomorphic encryption algorithm with a (T, N) threshold, the training round number r, the convolutional neural network model of the r-th training round, and the maximum number of training rounds R; it sends T and the number of clients N to the KDC and sets r = 0, where 0 < T ≤ N, T > CR_min·N, and R ≥ 100;
(3) The key distribution center KDC generates the client keys:
The key distribution center KDC generates each client A_n's private key sk_n according to T and N, sends sk_n to A_n, and generates a public key pk;
(4) Client A_n obtains an encrypted sampling gradient vector and a position bit string:
(4a) Client A_n takes the image data set D_n as the input of the convolutional neural network model F_n and propagates it forward to obtain the predicted label set P_n of all images; it then uses a cross-entropy loss function to calculate F_n's loss value L_n from P_n and Y_n, and calculates each gradient vector element as the partial derivative of L_n with respect to the corresponding weight parameter, obtaining the gradient vector G_n;
(4b) Client A_n measures the communication bandwidth B_r available in round r for sending the position bit string and the encrypted sampling gradient vector to the federal server S, and calculates from it the number of sampling gradient elements for this training round; it then adaptively samples the gradient vector G_n according to that number, obtaining a sampling gradient vector G_n' containing only the sampled gradients, and encrypts each sampling gradient vector element with the public key set pk to obtain the encrypted sampling gradient vector E(G_n'), where y is the computer storage space occupied by one gradient vector element, E(·) represents the encryption function of the Paillier homomorphic encryption algorithm with a (T, N) threshold, and the elements of E(G_n') are the encryptions of the corresponding elements of the sampling gradient vector;
(4c) Client A_n initializes a binary bit string I_n' = <I_n1, I_n2, ..., I_nm, ..., I_nM> corresponding to the gradient vector, with every bit set to 0, and judges for each gradient element whether it was selected in the adaptive sampling; if so, the corresponding bit is set to I_nm = 1, otherwise I_nm = 0, obtaining the position bit string I_n = <I_n1, I_n2, ..., I_nm, ..., I_nM> that records the binary positions of the sampled gradient vector elements in the gradient vector G_n;
(4d) Client A_n sends the position bit string I_n and the encrypted sampling gradient vector E(G_n') to the federated server S;
(5) The federal server S aggregates the encrypted sampling gradient vectors and sends an aggregation result:
(5a) The federal server S expands client A_n's encrypted sampling gradient vector E(G_n') according to the position bit string I_n, obtaining the expanded vector whose m-th element is A_n's m-th expanded sampled gradient vector element:
(5b) The federal server S aggregates the m-th expanded sampled gradient vector elements across the N clients, obtaining the encrypted aggregate gradient vector E(G_a), whose m-th element is the m-th encrypted aggregate gradient vector element:
(5c) The federal server S constructs a distribution dictionary D of key-value pairs, in which the key-value pairs are arranged in descending order of their values; at the same time, it constructs a distribution list L = <E_1(G_a), E_2(G_a), ..., E_t(G_a), ..., E_T(G_a)> composed of T copies of the encrypted aggregate gradient vector E(G_a), where E_t(G_a) denotes the t-th E(G_a) in the distribution list L;
(5d) The federal server S sends the elements of the distribution list L to the clients A_w in the order of the IDs ID_n in the distribution dictionary D; the set of clients receiving distribution list L elements is A' = {A_1, A_2, ..., A_w, ..., A_W}, where A_w represents the w-th client receiving an element of the distribution list L, and W ≤ N;
(6) Client A_w decrypts the encrypted aggregate gradient vector:
Client A_w decrypts the received distribution list L elements using its private key sk_w, obtaining a partially decrypted aggregate gradient vector, and sends it to the federal server S, where sk_w represents client A_w's private key and D(·) represents the decryption function of the Paillier homomorphic encryption algorithm with a (T, N) threshold;
(7) The federal server S recovers the gradient plaintext and obtains the privacy-protected federated learning result:
(7a) The federal server S fills the partially decrypted aggregate gradient vectors transmitted by the clients A_w, in the order of the IDs ID_n in D, into a one-dimensional recovery list L' of length len = M·T, and then evenly divides L' into T segments, obtaining T recovery vectors C_1, C_2, ..., C_t, ..., C_T, where T is the decryption threshold of the Paillier homomorphic encryption algorithm with a (T, N) threshold, C_t represents the t-th recovery vector, and the m-th element of C_t is its m-th partial decryption;
(7b) The federal server S uses the public key set pk and the elements at corresponding positions of the T recovery vectors C_t to recover the plaintext aggregate gradient vector elements, obtaining the aggregate gradient vector g_a, and updates F_n with g_a by the gradient descent method, obtaining the convolutional neural network model trained in the current round, where R(·) represents the combination function of the Paillier homomorphic encryption algorithm with a (T, N) threshold;
2. The adaptive sampling federated learning privacy protection method based on threshold homomorphism as claimed in claim 1, characterized in that each client A_n's private key sk_n and the public key pk stated in step (3) are generated by the formulas:
sk_n = f(λ_n) mod uv
pk = {g, u, θ}
u = p·q
p = 2p' + 1
q = 2q' + 1
v = p'·q'
g = (1 + u)^a · b^u mod u²
θ = α·s·v mod u
where f(λ_n) represents the value of the function f(λ) = a_0 + a_1·λ + a_2·λ² + ... + a_t·λ^t + ... + a_{T−1}·λ^{T−1} at the argument λ_n; λ_1, λ_2, ..., λ_n, ..., λ_N are N random integers randomly selected by the key distribution center KDC; mod denotes the modulus operator, so sk_n = f(λ_n) mod uv means sk_n is the remainder of f(λ_n) divided by uv; p and q represent two different prime numbers in the positive integer domain; p' and q' represent two different strong prime numbers in the positive integer domain; s, a and b represent positive integers; and α = ord(b), where ord(·) denotes the order of an integer.
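Claim 2's private keys are Shamir-style shares: each sk_n is the degree-(T−1) polynomial f evaluated at a KDC-chosen point λ_n and reduced mod uv. A minimal sketch of the share computation follows; the function name and the example coefficients are illustrative, not from the patent.

```python
def make_shares(coeffs, lambdas, uv):
    """coeffs = [a_0, ..., a_{T-1}] defines f(x) = sum_t a_t * x^t;
    each client n receives the share sk_n = f(lambda_n) mod uv."""
    def f(x):
        return sum(a * pow(x, t) for t, a in enumerate(coeffs))
    return {lam: f(lam) % uv for lam in lambdas}
```

For example, with f(x) = 7 + 2x (T = 2) and uv = 15, the evaluation points 1, 2, 3 yield the shares 9, 11 and 13.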
3. The threshold homomorphism-based adaptive sampling federated learning privacy protection method of claim 1, characterized in that the adaptive sampling of the gradient vector G_n described in step (4b) comprises the following steps:
5. The threshold homomorphism-based adaptive sampling federated learning privacy protection method of claim 1, wherein the federal server S in step (5d) sends the elements in the distribution list L to the clients A_w in the order of the IDs ID_n in the distribution dictionary D as follows: the federal server S divides the elements in the distribution list L into W distribution sets Ω_1, Ω_2, ..., Ω_w, ..., Ω_W according to the values and order recorded in dictionary D, where W ≤ N and Ω_w represents the w-th distribution set; it then distributes Ω_1, Ω_2, ..., Ω_w, ..., Ω_W to the clients A_1, A_2, ..., A_w, ..., A_W in the order of ID_n in dictionary D.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211010855.3A CN115378707B (en) | 2022-08-23 | 2022-08-23 | Self-adaptive sampling federal learning privacy protection method based on threshold homomorphism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115378707A true CN115378707A (en) | 2022-11-22 |
CN115378707B CN115378707B (en) | 2024-03-29 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116055027A * | 2023-01-10 | 2023-05-02 | 西安交通大学 | Inter-cloud federal learning model aggregation method and system for homomorphic encryption
CN117892357A * | 2024-03-15 | 2024-04-16 | 大连优冠网络科技有限责任公司 | Energy big data sharing and distribution risk control method based on differential privacy protection
CN117892357B * | 2024-03-15 | 2024-05-31 | 国网河南省电力公司经济技术研究院 | Energy big data sharing and distribution risk control method based on differential privacy protection
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200204341A1 (en) * | 2017-01-20 | 2020-06-25 | Enveil, Inc. | Secure Machine Learning Analytics Using Homomorphic Encryption |
CN112149160A (en) * | 2020-08-28 | 2020-12-29 | 山东大学 | Homomorphic pseudo-random number-based federated learning privacy protection method and system |
CN113434873A (en) * | 2021-06-01 | 2021-09-24 | 内蒙古大学 | Federal learning privacy protection method based on homomorphic encryption |
US20210312334A1 (en) * | 2019-03-01 | 2021-10-07 | Webank Co., Ltd | Model parameter training method, apparatus, and device based on federation learning, and medium |
CN114745092A (en) * | 2022-04-11 | 2022-07-12 | 浙江工商大学 | Financial data sharing privacy protection method based on federal learning |
Non-Patent Citations (5)
Title |
---|
PAN YANG: "Data Security and Privacy Protection for Cloud Storage: A Survey", 《IEEE ACCESS》, 16 July 2020 (2020-07-16) * |
YILONG YANG: "Reveal Your Images: Gradient Leakage Attack Against Unbiased Sampling-Based Secure Aggregation", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》, 12 December 2023 (2023-12-12) * |
YANG Yilong et al.: "Adaptive Federated Learning Secure Aggregation Scheme Based on Threshold Homomorphic Encryption", Journal on Communications, 24 July 2023 (2023-07-24) *
DONG Ye; HOU Wei; CHEN Xiaojun; ZENG Shuai: "Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection", Journal of Computer Research and Development, no. 10, 9 October 2020 (2020-10-09) *
WEI Lifei; CHEN Congcong; ZHANG Lei; LI Mengsi; CHEN Yujiao; WANG Qin: "Security Issues and Privacy Protection in Machine Learning", Journal of Computer Research and Development, no. 10, 9 October 2020 (2020-10-09) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||