CN116933899A - Data security aggregation method and system based on multiple homomorphism attributes - Google Patents

Data security aggregation method and system based on multiple homomorphism attributes

Info

Publication number
CN116933899A
CN116933899A
Authority
CN
China
Prior art keywords
user
aggregation
server
privacy
seed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310842760.6A
Other languages
Chinese (zh)
Inventor
孙奕
陈性元
高琦
曹利峰
杨帆
张东巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202310842760.6A priority Critical patent/CN116933899A/en
Publication of CN116933899A publication Critical patent/CN116933899A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/58 Random or pseudo-random number generators
    • G06F 7/582 Pseudo-random number generators

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to a data security aggregation method and system based on multiple homomorphism attributes. Each user uses a seed as a privacy parameter to generate a random mask, covers its privacy input with the random mask, encrypts the privacy parameter using a temporary key and a global public key, and sends the ciphertext and the masked privacy input to an aggregation server; the aggregation server aggregates the received ciphertexts of all users and feeds the aggregation result back to the users; each user performs a conversion calculation on the aggregation result using the temporary key, its own client private key and the server public key, and sends the conversion result to the aggregation server; the aggregation server decrypts the conversion result with the server private key to obtain the seed sum, uses the seed sum to obtain the random mask sum, obtains the privacy input sum by aggregating the masked privacy inputs, and obtains the global parameters in federal learning by subtracting the random mask sum from the privacy input sum. The method and the system can improve aggregation efficiency and are suitable for federal learning on large-scale mobile terminals and Internet of things equipment.

Description

Data security aggregation method and system based on multiple homomorphism attributes
Technical Field
The invention relates to the technical field of federal learning, in particular to a data security aggregation method and system based on multiple homomorphic attributes.
Background
Federal learning allows multiple decentralized devices to train a local model on their local data and then exchange only the local models, which a central server aggregates to generate a global model. Because the data never leaves the device, users are not exposed to the risk of private data disclosure that comes with traditional centralized model training. Federal learning therefore has the advantage that high-quality models can be trained using rich data from different sources. With the development of the mobile Internet of things, more and more devices such as mobile terminals and Internet of things devices are connected to the network; their scale is huge and they contain a large amount of valuable data. Enterprises use this data for federal learning to train high-quality models, provide a better user experience and increase their competitiveness. Compared with the traditional centralized training model, federal learning greatly reduces the risk of privacy disclosure by sharing only models, but research has shown that privacy attacks such as membership inference attacks and attribute inference attacks can still steal users' private data from the model updates. Therefore, protection of the local model is necessary in federal learning. For this reason, privacy protection mechanisms based on secure multi-party computation, homomorphic encryption and differential privacy have been proposed to protect local models in federal learning.
With differential privacy, a user adds carefully constructed noise to the model parameters before uploading the local model to the aggregation server; owing to the properties of differential privacy, the server can aggregate the perturbed local models and obtain an approximate result while the local models remain protected. However, this approach trades model utility for privacy: the more noise is added, the better the model is protected but the worse its quality. In aggregation schemes based on homomorphic encryption, a user encrypts the local model with a homomorphic encryption algorithm and sends the ciphertext to the server; the server aggregates the ciphertexts and returns the aggregation result to the participants for decryption and a new round of training. Homomorphic encryption protects the model from leakage without affecting model quality, but homomorphically encrypting high-dimensional model parameters incurs expensive computation and communication overhead; in cross-device federal learning the participants are typically mobile terminals and Internet of things devices, which are numerous and have limited computation and communication resources, so device performance may not meet the training requirements. Homomorphic encryption may also require additional security assumptions, such as all participants sharing a key or no collusion between the server and users. For example, the practical secure aggregation scheme SecAgg realizes privacy-preserving model aggregation through lightweight one-time-key encryption: users u and v share a seed s_{u,v} and protect the model with a pairwise mask mechanism, and after aggregation the masks cancel each other out, yielding the global model. However, SecAgg requires a large number of key negotiations and secret sharings among users to generate the pairwise masks, making the secure aggregation computation overhead quadratic in the number of users. In addition, because pairwise masks are used, eliminating the influence of dropped users significantly increases the aggregation time as the number of dropped users grows: with d dropped users the server must perform (n-d)d additional key recovery operations to ensure correct aggregation, which is unsuitable for federal learning scenarios with large-scale user participation.
Disclosure of Invention
Therefore, the invention provides a data security aggregation method and system based on multiple homomorphism attributes, which solve the problems of privacy disclosure, resource limitation and frequent user withdrawal that exist in federal learning on large-scale mobile terminals and Internet of things equipment.
According to the design scheme provided by the invention, a data security aggregation method based on multiple homomorphic attributes is provided, and is used for federal learning privacy protection, and comprises the following steps:
each user generates a client public-private key pair locally, an aggregation server generates a server public-private key pair, each client transmits a respective client public key to the aggregation server, the aggregation server generates a global public key by fusing the client public keys, and the global public key and the server public key are transmitted to each user;
each user takes a seed as its privacy parameter, generates a random mask with a seed pseudo-random number generator, masks its privacy input with the random mask, encrypts the privacy parameter using a locally and randomly generated temporary key and the global public key, and sends the encrypted ciphertext and the masked privacy input to the aggregation server; the aggregation server aggregates the received ciphertexts of all users according to the additive homomorphism of the encryption algorithm, and feeds the aggregated ciphertext back to the users;
The user performs a conversion calculation on the aggregated ciphertext using the temporary key, its own client private key and the server public key, and sends the conversion result to the aggregation server; the aggregation server decrypts the conversion result using the server private key to obtain the user seed sum, generates the random mask sum from the seed sum, obtains the privacy input sum by aggregating the masked privacy inputs, and obtains the global parameters of federal learning by subtracting the random mask sum from the privacy input sum.
As the data security aggregation method based on the multi-homomorphism attribute, before each user encrypts the respective privacy input using the locally and randomly generated temporary key and the global public key, the method further comprises: generating a random mask using a seed pseudorandom number generator that employs a pseudorandom function with homomorphic properties, and masking the privacy input with the random mask.
As the data security aggregation method based on the multi-homomorphism attribute, the invention further utilizes a seed pseudo-random number generator to generate a random mask, comprising the following steps:
firstly, constructing a shared input sequence by concatenating the training round number and the model parameter index;
Then, a random mask is generated using a seed pseudorandom number generator based on the shared input sequence.
As the data security aggregation method based on multiple homomorphism attributes of the invention, further, a seed pseudo-random number generator is utilized to generate the random mask, and the generation process is expressed as: r_i = SHPRG(s_i), where SHPRG(s_i) = [F(s_i, b||τ)] for 0 < b < L, b is the index of each model parameter, L is the dimension of the user's local model m_i, τ is the current training round number, s_i is the seed randomly selected locally by user u_i, F(s_i, b||τ) = s_i·H(b||τ), and H(·) is a hash function.
As the data security aggregation method based on the multi-homomorphism attribute, the invention further utilizes a seed pseudo-random number generator to generate a random mask, and the generation process further comprises the following steps: the seed selected randomly by the user locally is represented by a plurality of congruence numbers, a random mask is generated based on a seed pseudo-random number generator, and then CRT encoding is carried out on the seed so as to convert the random mask aggregation into seed aggregation in data aggregation.
The data security aggregation method based on the multi-homomorphism attribute further comprises the following steps: before federal learning training, each user divides its client private key into t shares with a preset threshold t and secret-shares each share with the other users, so that users quitting training can be tolerated during secure data aggregation.
As the data security aggregation method based on the multi-homomorphism attribute, tolerating users quitting training during secure data aggregation comprises: if a user quits training during federal learning training, the other users send the shares they received to the server, and after receiving more than t shares the server reconstructs the sum of the client private keys of the users that quit training, so that the server obtains the global parameters by aggregating and decrypting the received second ciphertext.
Further, the invention also provides a data security aggregation system based on multiple homomorphism attributes, which is used for federal learning privacy protection and comprises the following steps: the system comprises a local user composed of a plurality of clients and an aggregation server connected with each local user, wherein each user generates a client public-private key pair locally, the aggregation server generates a server public-private key pair, each client transmits a respective client public key to the aggregation server, and the aggregation server generates a global public key by fusing the client public keys and transmits the global public key and the server public key to each user;
in the data security aggregation, firstly, each user uses a seed as a privacy parameter, a seed pseudo-random number is used for generating a random mask, privacy input is masked through the random mask, the privacy parameter is encrypted by using a local random generated temporary key and a global public key, and encrypted ciphertext and masked privacy input are sent to an aggregation server; the aggregation server aggregates the received ciphertext of each user according to the addition homomorphism of the encryption algorithm, and feeds the aggregated ciphertext back to the user; then, the user performs conversion calculation on the aggregated ciphertext by using the temporary key, the private key of the user's own client and the public key of the server, and sends a conversion calculation result to the aggregation server, the aggregation server decrypts the conversion result by using the private key of the server to obtain a user seed sum, generates a random mask sum by using the seed sum, and obtains a privacy input sum by aggregating the masked privacy input to obtain global parameters of federal learning by subtracting the random mask sum from the privacy input sum.
The invention has the beneficial effects that:
the user encrypts the seed privacy input with the public key, so that the server can decrypt only the sum of the privacy inputs using its own private key without learning any single privacy input, and worst-case user collusion, i.e. collusion of n-2 users, can be tolerated. The homomorphic pseudo-random number technique converts the aggregation of random vectors into an aggregation of seeds, avoiding the computation overhead of users encrypting high-dimensional vectors and the communication overhead of transmitting the ciphertext. Further, by introducing one round of secret sharing, users can quit training at any time while participating in aggregation, without substantially increasing the client's computation time during training. The scheme significantly improves aggregation efficiency without weakening the privacy guarantee or the tolerance to user exit; with 500 users and a 30% dropout rate the aggregation time is reduced by about 24 times, so the scheme has good application prospects for federal learning on large-scale mobile terminals and Internet of things equipment.
Description of the drawings:
FIG. 1 is a federal learning aggregation illustration in an embodiment;
FIG. 2 is a schematic diagram of a secure aggregate interaction flow in an embodiment.
The specific embodiment is as follows:
the present invention will be described in further detail with reference to the drawings and the technical scheme, in order to make the objects, technical schemes and advantages of the present invention more apparent.
Aiming at the problems of privacy leakage, resource limitation, frequent user withdrawal and the like in federal learning on a large-scale mobile terminal and Internet of things equipment in the background technology, the embodiment of the invention provides a data security aggregation method based on multi-homomorphic attribute, which is used for federal learning privacy protection and comprises the following steps: firstly, each user generates a client public-private key pair locally, an aggregation server generates a server public-private key pair, each client transmits a respective client public key to the aggregation server, and the aggregation server generates a global public key by fusing the client public keys and transmits the global public key and the server public key to each user; then, each user uses the seed as a privacy parameter, a random mask is generated by using the seed pseudo-random number, privacy input is masked by the random mask, the privacy parameter is encrypted by using a local randomly generated temporary key and a global public key, and the encrypted ciphertext and the masked privacy input are sent to an aggregation server; the aggregation server aggregates the received ciphertext of each user according to the addition homomorphism of the encryption algorithm, and feeds the aggregated ciphertext back to the user; then, the user performs conversion calculation on the aggregated ciphertext by using the temporary key, the private key of the user's own client and the public key of the server, and sends a conversion calculation result to the aggregation server, the aggregation server decrypts the conversion result by using the private key of the server to obtain a user seed sum, generates a random mask sum by using the seed sum, and obtains a privacy input sum by aggregating the masked privacy input to obtain global parameters of federal learning by subtracting the random mask sum from the privacy input sum.
Similar to the SecAgg protocol, in the present example the participants are divided into two categories: users and an aggregation server. The basic structure and aggregation flow are shown in FIG. 1: the aggregation server S serves as the center and aggregates the inputs of the n users in the user set.
Table 1 symbol description
Each user represents a client holding a local private data set and obtains a local model m_i by training on that local data set; the other symbols are described in Table 1. The goal of the scheme is that the server obtains the aggregated global model without acquiring any single user u_i's model m_i, while guaranteeing robustness against users quitting at any time, so as to meet the requirements of cross-device federal learning applications.
The user encrypts the seed serving as the privacy parameter through the public key, so that the server can decrypt the seed sum by using the private key of the server without knowing the single seed, and can tolerate worst-case user collusion, namely n-2 user collusion exists under the condition that user exit is not considered.
The threat model considers a semi-honest federal learning environment, i.e. the aggregation server and the users may be semi-honest: they follow the protocol rules during training but are curious about honest users' local models and intend to infer a client's local private data by observing its local model. The scheme aims at maintaining the confidentiality of each user's individual model update; all participants are only allowed to learn the sum of all users' privacy inputs.
In addition to the semi-honest setting, collusion between the aggregation server and users needs to be considered: the server may collude with some of the users, sharing internal states and all received/sent messages with them, in order to increase the success rate of stealing other honest users' private data. For the collusion case, the scheme aims to guarantee the confidentiality of honest clients' models, i.e. as long as fewer than T users collude with the server, they cannot learn the local model information of any honest user. The case of n-1 users colluding with the server is not considered, because in that case the colluding users and the server can compute the honest user's model information from the aggregated result by taking the difference. Regarding the value of T: in a cross-silo federal learning environment, where user disconnection is not considered, the scheme can tolerate at most n-2 colluding users; in a cross-device federal learning environment, i.e. considering that users may exit training at any time, the maximum number of colluding users the scheme can tolerate depends on the secret sharing threshold and is at most t-1. Reducing the tolerance in the cross-device federal learning environment is reasonable because the users are numerous and geographically dispersed, so the probability of large-scale user collusion is greatly reduced.
To protect the security of users' local models, various studies have proposed approaches such as differential privacy and homomorphic encryption, but too much noise in differential privacy reduces model utility while too little cannot guarantee model security, and homomorphic encryption protects the model well but requires expensive computation and communication costs for high-dimensional models. The most intuitive and simplest way to protect the model is the one-time pad (OTP): a generated random mask is used as a one-time key to cover the model, so that the masked model parameters look random. Based on the security of the one-time key, confidentiality is fully guaranteed as long as the key is not revealed, which effectively prevents privacy attacks on the local model; moreover, protecting the model involves only one addition operation, so compared with other protection methods the efficiency is very high. SecAgg uses pairwise masks to protect the model and ensures that the masks of all users cancel each other after aggregation, but this requires a large amount of communication to negotiate the pairwise masks, and the aggregation time increases significantly as dropped users increase.
In the embodiment of the present application, an independent mask can be used to protect the model, avoiding the communication overhead of negotiating masks with other users. The basic idea is that each user locally generates a random mask to protect its local model parameters.
Suppose user u_i generates an independent random mask r_i from a locally and randomly selected seed s_i using a pseudo-random generator, and masks the local model parameters as m_i + r_i.
Thereafter, every user sends m_i + r_i to the server to participate in aggregation.
After receiving the masked user models, and assuming the server already knows the sum R of the users' local random masks, the aggregation server can compute the global model m_G = Σ_i (m_i + r_i) - R.
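For illustration, the following is a minimal Python sketch of this independent-mask idea; the modulus q, the model dimension L and the example values are assumptions made only for this sketch and are not parameters of the claimed scheme.

    import random

    # Toy illustration of independent random masking: each user hides its model
    # behind a locally generated mask; the aggregate is recoverable only from
    # the sum of the masks R.
    q = 2**31 - 1                       # working modulus (assumed for illustration)
    L = 4                               # model dimension (assumed)

    models = [[5, 1, 9, 2], [3, 7, 0, 4], [6, 2, 8, 1]]
    masks = [[random.randrange(q) for _ in range(L)] for _ in models]

    # each user uploads m_i + r_i (mod q)
    masked = [[(m + r) % q for m, r in zip(mod, msk)] for mod, msk in zip(models, masks)]

    # the server aggregates the masked models and, once it knows R, unmasks the sum
    R = [sum(col) % q for col in zip(*masks)]
    aggregate = [sum(col) % q for col in zip(*masked)]
    global_model = [(a - r) % q for a, r in zip(aggregate, R)]
    assert global_model == [sum(col) for col in zip(*models)]

How the server can learn R without seeing any individual r_i is exactly the problem addressed next.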
In order to obtain the sum R of the random masks, the most straightforward method is for each user u_i to send its local random mask r_i to the server over a secure channel. However, the server is semi-honest in the threat model considered here: it could recover the user's original local model m_i by subtracting r_i from the masked model, which presents a privacy exposure risk. To solve this problem, it must be ensured that the server can calculate R without learning any individual user's r_i.
EC-ElGamal is an implementation of the ElGamal encryption algorithm on elliptic curves (EC); it has additive homomorphism, and its security is based on the discrete logarithm problem on elliptic curves (ECDLP). Under the Decisional Diffie-Hellman (DDH) assumption it is IND-CPA secure. The additively homomorphic encryption scheme is described as follows:
Initialization:
1. A prime number p is selected, the finite field GF(p) is determined, and p is disclosed.
2. Elements a, b ∈ GF(p) are selected, an elliptic curve over GF(p) is determined together with its additive commutative group E, and a and b are disclosed.
3. A large prime number n is selected, a base point G(x, y) of order n is determined, and n and G(x, y) are disclosed.
4. An integer d (0 < d < n) is randomly selected as the private key sk; the public key is then:
pk = sk·G
Encryption: (C_1, C_2) ← ECE.Enc(P_m, k_i, pk):
Assume the message P_m is to be sent. The user randomly selects a parameter k_i ∈ Z_n and calculates (C_1, C_2) as follows:
C_1 = k_i·G, C_2 = P_m·G + k_i·pk
Decryption: P_m ← ECE.Dec(C_2, C_1, sk):
To decrypt (C_1, C_2), the user calculates C_2 - sk·C_1:
C_2 - sk·C_1 = P_m·G + k_i·pk - sk·k_i·G
= P_m·G + k_i·(sk·G) - k_i·sk·G = P_m·G
Additive homomorphism: EC-ElGamal encryption is additively homomorphic. Suppose two plaintexts P_m1 and P_m2 are encrypted into two ciphertexts; adding the two ciphertexts component-wise gives an encryption of P_m1 + P_m2.
Decrypting with sk yields (P_m1 + P_m2)·G; obtaining the plaintext sum P_m1 + P_m2 then requires solving the discrete logarithm problem on the elliptic curve. Using the Chinese remainder theorem (CRT), an ECDLP instance that is difficult to solve (i.e. over a large integer) can be converted into several smaller, easier ECDLP instances, improving computational efficiency. In this embodiment, multiparty privacy-preserving aggregation with random masks is realized based on the additive homomorphism of EC-ElGamal and on homomorphic pseudorandom numbers.
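The sketch below illustrates the additively homomorphic structure described above. As an assumption made purely for brevity, it uses exponential ElGamal in a prime-order subgroup of Z_p* instead of an elliptic curve, so modular exponentiation plays the role of EC scalar multiplication; the tiny parameters are illustrative only.

    import random

    # Toy additively homomorphic ElGamal; g**m stands in for the EC point P_m*G.
    p, q, g = 2039, 1019, 4             # p = 2q + 1, g generates the order-q subgroup

    def keygen():
        sk = random.randrange(1, q)
        return sk, pow(g, sk, p)        # pk = g^sk  (EC analogue: pk = sk*G)

    def encrypt(m, pk):
        k = random.randrange(1, q)      # per-message temporary key
        return pow(g, k, p), pow(g, m, p) * pow(pk, k, p) % p

    def add(ct_a, ct_b):                # additive homomorphism: component-wise product
        return ct_a[0] * ct_b[0] % p, ct_a[1] * ct_b[1] % p

    def decrypt(ct, sk, bound=10000):
        c1, c2 = ct
        gm = c2 * pow(c1, q - sk, p) % p          # c2 / c1^sk = g^m
        for m in range(bound):                    # small discrete log by enumeration
            if pow(g, m, p) == gm:
                return m
        raise ValueError("plaintext outside brute-force range")

    sk, pk = keygen()
    ct = add(encrypt(3, pk), encrypt(5, pk))      # ciphertext of 3 + 5 under pk
    assert decrypt(ct, sk) == 8

The final enumeration step is the small discrete-logarithm search mentioned above; it is this step that the later CRT decomposition keeps cheap.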
To calculate R correctly without exposing any r_i to the server, multi-key privacy aggregation can be implemented based on the homomorphism of EC-ElGamal and on ciphertext conversion. This algorithm allows a user to encrypt its privacy input with the public key and a random temporary key k_i; after computing on the ciphertexts, the final server obtains the sum of the privacy inputs by decrypting with its own private key. The algorithm does not require any key to be shared among users, avoiding the security risk that key sharing may bring. The basic idea is that, through a single interaction between the users and the server, the encryption key of the privacy-input ciphertext is converted, in the ciphertext state, to the public key of the server S, so that the server S can decrypt R with its private key; the algorithm guarantees the confidentiality of the privacy inputs of honest, non-colluding users even when semi-honest colluding users participate.
Let v_i be user u_i's privacy input. As shown in Algorithm 1, each user first locally generates a public-private key pair (sk_i, pk_i), and the server generates a public-private key pair (sk_s, pk_s). Each user then sends its public key to the server, which computes the global public key PK by fusing the received client public keys.
The server sends PK and pk_s to each user.
User u_i randomly generates a large integer k_i as its temporary key, and then encrypts the privacy input v_i with the global public key PK and the temporary key k_i, obtaining a ciphertext (C_1,i, C_2,i).
Thereafter, the user sends the ciphertext (C_1,i, C_2,i) to the aggregation server.
The server S aggregates the received ciphertexts according to the additive homomorphism of EC-ElGamal, obtaining (C_1,s, C_2,s).
The server S returns the result (C_1,s, C_2,s) to the users, and each user u_i locally performs the conversion calculation using its temporary key k_i, its private key sk_i and the server public key pk_s.
Each user then sends its conversion result to the server; after receiving the results, the server computes (C'_1,s, C'_2,s).
The result is a ciphertext of V = Σ v_i under the server's key: the aggregation server S can use its private key sk_s to decrypt it and obtain V·G, and recovering V requires solving the discrete logarithm problem on the elliptic curve. When the element values in v_i are small and v_i is low-dimensional this is computationally feasible for the server, but large element values and high dimensionality of v_i lead to inefficient aggregation.
Algorithm 1. Multi-key privacy aggregation algorithm.
Input: each user u_i's public-private key pair (sk_i, pk_i) and privacy input v_i; the aggregation server S's public-private key pair (sk_s, pk_s)
Output: the sum V of the privacy inputs
Initialization:
1. Each user u_i ∈ U sends its public key pk_i to the aggregation server S;
2. After receiving the pk_i, the server calculates the global public key PK and sends PK and pk_s to each user;
First stage:
3. User u_i generates a random temporary key k_i, encrypts the privacy input v_i with PK, and sends the resulting ciphertext to the server;
4. The server S receives the ciphertexts submitted by the users u_i and calculates (C_1,s, C_2,s);
5. The calculation result (C_1,s, C_2,s) is sent to the users;
Second stage:
6. After receiving the server's calculation result (C_1,s, C_2,s), user u_i performs the conversion calculation;
7. The conversion result is sent to the aggregation server;
8. The aggregation server S receives the conversion results submitted by the users u_i and calculates (C'_1,s, C'_2,s);
9. The aggregation server S decrypts (C'_1,s, C'_2,s) with its private key sk_s: V = ECE.Dec(C'_1,s, C'_2,s, sk_s)
Through Algorithm 1, the server computes the sum of the privacy inputs without any user exposing its own privacy input, and as long as fewer than N-1 users collude with the server, the privacy input of no party is revealed.
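A compact walk-through of the two-phase flow of Algorithm 1, again in the exponential-ElGamal toy group, is sketched below. The fusion of the client public keys into PK and the key-conversion step in the second stage are reconstructions chosen so that the algebra behaves as the text describes (the server ends up with a ciphertext it can open with sk_s); they are assumptions for illustration rather than the patent's exact formulas, and all parameters are toy values.

    import random

    p, q, g = 2039, 1019, 4
    rand = lambda: random.randrange(1, q)

    n = 3
    inputs = [7, 11, 13]                           # privacy inputs v_i (later: seeds s_i)

    # initialization: user key pairs, server key pair, fused global public key PK
    user_sk = [rand() for _ in range(n)]
    user_pk = [pow(g, sk, p) for sk in user_sk]
    server_sk = rand()
    server_pk = pow(g, server_sk, p)
    PK = 1
    for pk in user_pk:
        PK = PK * pk % p                           # PK = g^(sum of user sk)

    # first stage: each user encrypts v_i with a temporary key k_i under PK
    temp_k = [rand() for _ in range(n)]
    C1 = [pow(g, k, p) for k in temp_k]
    C2 = [pow(g, v, p) * pow(PK, k, p) % p for v, k in zip(inputs, temp_k)]
    C1_s, C2_s = 1, 1
    for a, b in zip(C1, C2):                       # server aggregates homomorphically
        C1_s, C2_s = C1_s * a % p, C2_s * b % p

    # second stage: each user converts the aggregate toward the server's key
    conv = [pow(server_pk, k, p) * pow(C1_s, q - sk, p) % p   # pk_s^k_i / C1_s^sk_i
            for k, sk in zip(temp_k, user_sk)]
    C2_conv = C2_s
    for d in conv:
        C2_conv = C2_conv * d % p                  # (C1_s, C2_conv) now decryptable by sk_s

    # server decrypts and solves a small discrete log to recover V = sum of v_i
    gV = C2_conv * pow(C1_s, q - server_sk, p) % p
    V = next(v for v in range(10000) if pow(g, v, p) == gV)
    assert V == sum(inputs)

No key is shared between users at any point; each user only ever touches its own sk_i and k_i, which matches the design intention stated above.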
Based on Algorithm 1, the user may take the random mask r_i as the privacy input v_i, allowing the server to calculate R, and hence correctly calculate m_G, without any user exposing its r_i. However, in practical applications this faces two problems: 1) During training the user needs to encrypt the mask r_i, whose dimension is the same as that of the model; for a high-dimensional, complex deep learning model this requires expensive computation and communication overhead, while in a cross-device federal learning scenario most clients are mobile devices whose hardware may not meet the training requirements. 2) To obtain the aggregated privacy input V, the server must solve the discrete logarithm problem on the elliptic curve; when v_i is too high-dimensional and its element values too large, aggregation efficiency is reduced.
Further, before each user encrypts the respective privacy input using the locally and randomly generated temporary key and the global public key, the method further comprises: generating a random mask using a seed pseudorandom number generator that employs a pseudorandom function with homomorphic properties, and masking the privacy input with the random mask.
A pseudorandom function (PRF) is a keyed function F: K × X → Y, where K is the key space, X the input space and Y the output space; given only black-box access, F(k, ·) is indistinguishable from a uniformly random function. A homomorphic pseudorandom function (HPRF) is a pseudorandom function with homomorphic properties: for any keys k_1, k_2 ∈ K and any input x ∈ X, F(k_1, x) + F(k_2, x) = F(k_1 + k_2, x). A pseudorandom generator (PRG) is a computationally efficient function G: S → R such that, for uniformly distributed s ∈ S and r ∈ R, the distribution of G(s) is computationally indistinguishable from the distribution of r. Moreover, if for any G(s_1, x) and G(s_2, x) it holds that G(s_1, x) + G(s_2, x) = G(s_1 + s_2, x), then the PRG G: S × X → R is called a seed-homomorphic pseudorandom generator (SHPRG).
A simple and secure key-homomorphic PRF can be constructed in the random oracle model. Let G be a finite cyclic group of prime order q and let H: X → G be a hash function modeled as a random oracle; then the function F: Z_q × X → G can be constructed as:
F(k, x) = k·H(x)
F(k, x) has additive homomorphism:
F(k_1, x) + F(k_2, x) = F(k_1 + k_2, x)
In the random oracle model, F is a secure PRF assuming DDH holds in G.
A seed-homomorphic pseudorandom generator SHPRG can be constructed based on the HPRF above and expressed as:
SHPRG(s, x) = s·H(x)
which satisfies:
SHPRG(s_1 + s_2, x) = SHPRG(s_1, x) + SHPRG(s_2, x)
To achieve efficient secure aggregation for federal learning, the pseudorandom nature of r_i can be exploited: by changing how the random mask is generated and using a seed-homomorphic pseudorandom generator, the aggregation of the masks r_i is converted into an aggregation of the seeds s_i. This avoids aggregating high-dimensional random vectors, and therefore avoids the computation and communication overhead caused by users encrypting high-dimensional random vectors, while reducing the number of ECDLP instances the server needs to solve to 1. Without affecting security, the CRT can then be introduced to convert a hard ECDLP instance into several easy ones, further improving aggregation efficiency.
Specifically, the generation of the random mask using the seed pseudorandom number generator may be designed to include the following:
firstly, constructing a shared input sequence by concatenating the training round number and the model parameter index;
then, a random mask is generated using a seed pseudorandom number generator based on the shared input sequence.
When all users share the same x, the additive homomorphism of the SHPRG holds for any pair of seeds. However, if only a single mask value were generated, the same value would mask all model parameters; with the range of the model parameters known, an adversary could then easily reconstruct the original parameters by an enumeration attack. Therefore, with the seed fixed, the shared input sequence x is varied so that different masks cover different model parameters. To avoid the extra communication overhead of negotiating x among users, the shared x sequence is constructed by concatenating the training round number and the model parameter index, and the random mask r_i is then generated from this sequence using the SHPRG.
User u_i randomly selects a seed s_i locally and generates a mask r_i using the SHPRG:
r_i = SHPRG(s_i)
where SHPRG(s_i) = [F(s_i, b||τ)] for 0 < b < L, b is the index of each model parameter, L is the dimension of the user's local model m_i, τ is the current training round number, and F is:
F(s_i, b||τ) = s_i·H(b||τ)
with H a hash function. Based on the properties of the SHPRG, the sum of the masks satisfies Σ_i r_i = SHPRG(Σ_i s_i).
Algorithm 1 is then applied to the seeds s_i, so that the final server computes the seed sum Σ_i s_i and from it the sum R of the random masks, avoiding the computation and communication overhead caused by encrypting high-dimensional random masks.
Specifically, the random mask is generated by using the seed pseudo-random number generator, and the generation process further comprises: the seed selected randomly by the user locally is represented by a plurality of congruence numbers, a random mask is generated based on a seed pseudo-random number generator, and then CRT encoding is carried out on the seed so as to convert the random mask aggregation into seed aggregation in data aggregation.
To reduce the server's computation time, the CRT can be introduced to represent a large integer s_i by several of its residues a_i:
s_i ≡ a_1 mod p_1
s_i ≡ a_2 mod p_2
...
s_i ≡ a_n mod p_n
where p_1, p_2, ..., p_n are pairwise coprime and public, and a_i << s_i. Knowing a_1, a_2, ..., a_n, the unique solution s_i of this system can be computed.
Furthermore, the CRT representation is additively homomorphic: given two seeds s_i and s_j with residue lists (a_1, a_2, ...) and (b_1, b_2, ...) respectively, the residue list of s_i + s_j is obtained by adding the residues position-wise modulo the corresponding p.
Therefore, after user u_i generates r_i from s_i, the seed s_i is CRT-encoded into its residues, each residue position is aggregated separately with Algorithm 1, and finally the server decodes the aggregated residues to obtain the seed sum.
On the basis of Algorithm 1, the SHPRG transforms the aggregation of the high-dimensional masks r_i into an aggregation of the seeds s_i, with 1 = |s_i| << |r_i| = L, avoiding the computation and communication cost caused by users performing encryption on high-dimensional data; the CRT is then introduced to further reduce the server's computation time, improving overall aggregation efficiency and realizing efficient random-mask secure aggregation. In practical applications, to further reduce the computation cost during training, the user can compute the mask r_i and the CRT encoding of the seed offline.
The scheme above realizes efficient random-mask secure aggregation and hence efficient privacy-preserving federal learning, but Algorithm 1 cannot tolerate user disconnection. Correct aggregation of the seeds s_i requires each user to perform the conversion calculation after receiving the ciphertext aggregation result; if a user drops out and cannot perform this calculation, the aggregation server S cannot correctly compute (C'_1,s, C'_2,s), eventually leading to an incorrect aggregation.
However, in a cross-device federal learning application scenario, the clients participating in training are usually resource-limited devices such as mobile terminals and Internet of things devices; they may face unstable network conditions, energy constraints and similar problems, and may quit training at any time during the training process, which is common in practical application scenarios. To meet such scenarios, a secure aggregation scheme must be able to tolerate user dropout, i.e. still guarantee the correctness of the aggregation result when a certain number of users go offline. For this reason, in the embodiment of the present invention, resilience to users quitting training is achieved by adding an additional mechanism.
Specifically, before federal learning training, each user divides its client private key into t shares with a preset threshold t and secret-shares each share with the other users, so that users quitting training can be tolerated during secure data aggregation. Tolerating users quitting training during secure data aggregation includes: if a user quits training during federal learning training, the other users send the shares they received to the server, and after receiving more than t shares the server reconstructs the sum of the client private keys of the users that quit training, so that the server obtains the global parameters by aggregating and decrypting the received second ciphertext.
In (t, n) secret sharing (SS), a secret s is divided into n parts, each of which is a sub-secret held by one user; the sub-secrets held by t or more users can reconstruct the secret s, while fewer than t sub-secrets cannot reconstruct the secret and reveal no information about s. Performing (t, n) secret sharing on a secret s works as follows:
Initialization: the secret holder u_s randomly selects t-1 positive integers a_1, ..., a_{t-1} from Z_P and sets a_0 = s. Based on these values a polynomial of degree at most t-1 is chosen: f(x)_s = a_0 + a_1·x + a_2·x^2 + a_3·x^3 + ... + a_{t-1}·x^{t-1} mod P.
Secret sharing: the secret holder randomly selects n points x_1, x_2, ..., x_n on the polynomial f(x)_s, computes the corresponding function values f(x_1), f(x_2), ..., f(x_n), and finally shares (x_1, f(x_1)), (x_2, f(x_2)), ..., (x_n, f(x_n)) with the other users as sub-secrets, denoted (x_i, f(x_i)_s)_{i∈n} ← ASS.share(s, t, n).
Secret reconstruction: given any t or more sub-secrets (x_i, f(x_i)_s), the polynomial f(x)_s can be determined by Lagrange interpolation; evaluating it at x = 0 yields the constant term a_0, i.e. the secret s, formally expressed as s ← ASS.recon((x_i, f(x_i)_s)_{i∈n}, t).
Additive homomorphism: Shamir secret sharing is additively homomorphic. For given secrets s_1 and s_2 that are (t, n)-shared separately, with the shares distributed to the other users, the holder of any pair of sub-secrets can locally compute (x_i, f(x_i)_{s_1} + f(x_i)_{s_2}); since the sum of two polynomials of degree t-1 still has degree t-1, given any t or more of these summed sub-secrets the polynomial f(x)_{s_1} + f(x)_{s_2} can be reconstructed, and evaluating it at x = 0 yields s_1 + s_2.
In the embodiment of this scheme, Shamir threshold secret sharing is introduced in the initialization stage before training, so that users can exit training at any time, the robustness of the secure aggregation is enhanced and the cross-device federal learning application scenario is satisfied; at the same time, the additive homomorphism of Shamir secret sharing is used to shorten the server's computation time.
Before the federal learning training starts, each user u_i performs secret sharing of its private key sk_i with threshold t; (j, sk_{i,j}) denotes the secret share of sk_i that user u_i shares with user u_j.
Assume that users may exit at any time at each stage of the aggregation. Let U_1 denote the list of users surviving the first stage, i.e. those that successfully sent their masked model and seed ciphertext, and let U_2 denote the list of users surviving the second stage, i.e. those that successfully sent their conversion result to the server.
The server then computes (C_1,s, C_2,s) and (C'_1,s, C'_2,s) over U_1 and U_2 respectively. For the users in U_1 but not in U_2, the server cannot obtain their conversion results, so the key conversion cannot be completed and the aggregate cannot be decrypted correctly; the calculation formula therefore has to be revised, and the server computes (C'_1,s, C'_2,s) with the help of the sum of the private keys of the users that dropped out.
Finally, the server can compute the ciphertext of the seed sum and use its private key sk_s to decrypt (C'_1,s, C'_2,s), obtaining the sum of the surviving users' seeds.
Based on the above procedure, correct aggregation requires the server to know the sum of the private keys of the dropped users. Therefore, if a user exits training at any stage of the aggregation process, the other users u_i send the shares they hold for that user to the server S, and after receiving more than t sub-secret shares the server can perform the reconstruction. Based on the additive homomorphism of Shamir secret sharing, the server only needs a single reconstruction to recover the sum of the offline users' private keys, so the computation time does not increase noticeably as the number of offline users grows.
Furthermore, even if the server learns an individual user's sk_i, security is not affected, because user u_i's seed s_i is encrypted under PK and cannot be recovered unless the server knows all the other users' sk_i. After adding tolerance to users exiting training, the number of colluding users that can be tolerated is at most t-1. In addition, an extra practical assumption is needed, namely that at least one honest user does not drop out across the multiple rounds of aggregation.
In summary, in the most preferred embodiment, privacy-preserving model aggregation is implemented with the efficient HMASA-based secure aggregation scheme, whose interaction process can be described as follows. As shown in FIG. 2, the scheme operates between an aggregation server S and n users; user sets at different stages are distinguished by numerical subscripts, e.g. U_1, U_2. For ease of expression each client is given a unique identifier, e.g. 1, 2, ..., n; user parameters are denoted with the user subscript, server parameters with the subscript s, and [ ] denotes a set of EC-ElGamal ciphertexts, e.g. the ciphertext set obtained by encrypting a vector element by element. Each user has a privacy vector m_i of length L as input, with each element ranging over Z_q. The aggregation server is responsible for message forwarding between clients and for the necessary computing services; as in SecAgg, communication between clients and the server is assumed to take place over a secure channel. Users may exit training at any time while participating in aggregation; the scheme guarantees that as long as t users survive, i.e. the server receives messages from more than t users within the waiting time, the result can be aggregated correctly, and if not enough messages are received the aggregation is terminated immediately. Based on the (t, n) threshold secret sharing property, collusion between t-1 users and the server can be tolerated. In addition, at least one honest, non-colluding user must remain online throughout one complete computation task: if every honest user dropped out once, the other users could contribute the sub-secret shares they hold for the dropped users, the honest users' private keys sk_i could be recovered one by one, and the server together with the colluding users could then obtain the honest users' seeds s_i by decryption and infer their privacy inputs, resulting in privacy disclosure. It may additionally be specified that a user who goes offline can submit its data later, but that data does not participate in the current round of aggregation and may instead participate in the next round; the server is assumed to remain online at all times.
Further, based on the above method, the embodiment of the present invention further provides a data security aggregation system based on multiple homomorphic attributes, which is used for federal learning privacy protection, and includes: the system comprises a local user composed of a plurality of clients and an aggregation server connected with each local user, wherein each user generates a client public-private key pair locally, the aggregation server generates a server public-private key pair, each client transmits a respective client public key to the aggregation server, and the aggregation server generates a global public key by fusing the client public keys and transmits the global public key and the server public key to each user;
in the data security aggregation, firstly, each user uses a seed as a privacy parameter, a seed pseudo-random number is used for generating a random mask, privacy input is masked through the random mask, the privacy parameter is encrypted by using a local random generated temporary key and a global public key, and encrypted ciphertext and masked privacy input are sent to an aggregation server; the aggregation server aggregates the received ciphertext of each user according to the addition homomorphism of the encryption algorithm, and feeds the aggregated ciphertext back to the user; then, the user performs conversion calculation on the aggregated ciphertext by using the temporary key, the private key of the user's own client and the public key of the server, and sends a conversion calculation result to the aggregation server, the aggregation server decrypts the conversion result by using the private key of the server to obtain a user seed sum, generates a random mask sum by using the seed sum, and obtains a privacy input sum by aggregating the masked privacy input to obtain global parameters of federal learning by subtracting the random mask sum from the privacy input sum.
The relative steps, numerical expressions and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
In the present specification, each embodiment is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to one another. For the system disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively brief; for relevant details, refer to the description of the method.
The elements and method steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or a combination thereof, and the elements and steps of the examples have been generally described in terms of functionality in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those of ordinary skill in the art may implement the described functionality using different methods for each particular application, but such implementation is not considered to be beyond the scope of the present invention.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the above methods may be performed by a program that instructs associated hardware, and that the program may be stored on a computer readable storage medium, such as: read-only memory, magnetic or optical disk, etc. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits, and accordingly, each module/unit in the above embodiments may be implemented in hardware or may be implemented in a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
Finally, it should be noted that: the above examples are only specific embodiments of the present invention, and are not intended to limit the scope of the present invention, but it should be understood by those skilled in the art that the present invention is not limited thereto, and that the present invention is described in detail with reference to the foregoing examples: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data security aggregation method based on multiple homomorphism attributes is used for federal learning privacy protection, and is characterized by comprising the following steps:
each user generates a client public-private key pair locally, an aggregation server generates a server public-private key pair, each client transmits a respective client public key to the aggregation server, the aggregation server generates a global public key by fusing the client public keys, and the global public key and the server public key are transmitted to each user;
each user takes the seeds as privacy parameters, generates a random mask by using the seed pseudo-random numbers, masks privacy input by using the random mask, encrypts the privacy parameters by using a local randomly generated temporary key and a global public key, and sends encrypted ciphertext and masked privacy input to an aggregation server; the aggregation server aggregates the received ciphertext of each user according to the addition homomorphism of the encryption algorithm, and feeds the aggregated ciphertext back to the user;
the user performs conversion calculation on the aggregated ciphertext using the temporary key, the client private key and the server public key, and sends the conversion calculation result to the aggregation server; the aggregation server decrypts the conversion result using the server private key to obtain a user seed sum, generates a random mask sum from the seed sum, obtains a privacy input sum by aggregating the masked privacy inputs, and obtains the global parameters of federal learning by subtracting the random mask sum from the privacy input sum.
2. The method of claim 1, wherein the seed pseudorandom number generator uses a pseudorandom number function having homomorphic properties.
3. The method of claim 1 or 2, wherein generating a random mask using a seed pseudorandom number generator comprises:
firstly, constructing a shared input sequence by concatenating the training round number and the model parameter index;
then, a random mask is generated using a seed pseudorandom number generator based on the shared input sequence.
4. A method of secure aggregation of data based on multiple homomorphism attributes according to claim 3, wherein the generation of the random mask using a seed pseudo-random number generator is represented by: r_i = SHPRG(s_i), wherein SHPRG(s_i) = [F(s_i, b||τ)] for 0 < b < L, b is the index of each model parameter, L is the dimension of the user's local model m_i, τ is the current training round number, s_i is the seed randomly selected locally by user u_i, F(s_i, b||τ) = s_i·H(b||τ), and H(·) is a hash function.
5. The method of claim 2, wherein the generating a random mask using a seed pseudorandom number generator further comprises: the seed selected randomly by the user locally is represented by a plurality of congruence numbers, a random mask is generated based on a seed pseudo-random number generator, and then CRT coding is carried out on the seed so as to convert the random mask aggregation into seed aggregation in aggregation.
6. The method for secure aggregation of data based on multiple homomorphism attributes of claim 1, further comprising: before federal learning training, each user divides its client private key into t shares with a preset threshold t and secret-shares each share with the other users, so that users quitting training can be tolerated during secure data aggregation.
7. The method for data security aggregation based on multiple homomorphism attributes of claim 6, wherein tolerating users quitting training during secure data aggregation comprises: if a user quits training during federal learning training, the other users send the shares they received to the server, and after receiving more than t shares the server reconstructs the sum of the client private keys of the users that quit training, so that the server obtains the global parameters by aggregating and decrypting the received second ciphertext.
8. A data security aggregation system based on multiple homomorphism attributes for federated learning privacy protection, comprising: local users composed of a plurality of clients, and an aggregation server connected to each local user, wherein each user locally generates a client public-private key pair, the aggregation server generates a server public-private key pair, each client sends its client public key to the aggregation server, and the aggregation server generates a global public key by fusing the client public keys and sends the global public key and the server public key to each user;
in the data security aggregation, firstly, each user uses a seed as a privacy parameter, generates a random mask with the seed pseudo-random number generator, masks the privacy input with the random mask, encrypts the privacy parameter with a locally and randomly generated temporary key and the global public key, and sends the encrypted ciphertext together with the masked privacy input to the aggregation server; the aggregation server aggregates the received ciphertexts of all users according to the additive homomorphism of the encryption algorithm and feeds the aggregated ciphertext back to the users; then, each user performs a conversion calculation on the aggregated ciphertext using the temporary key, its own client private key and the server public key, and sends the conversion result to the aggregation server; the aggregation server decrypts the conversion result with the server private key to obtain the sum of user seeds, generates the sum of random masks from the seed sum, obtains the sum of privacy inputs by aggregating the masked privacy inputs, and subtracts the random mask sum from the privacy input sum to obtain the global parameters of federated learning.
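A toy sketch of the server-side ciphertext aggregation under additive homomorphism: a small Paillier-style cryptosystem stands in for the encryption algorithm, which the claim does not name; the primes are insecure demo values chosen only for illustration.

```python
# Toy Paillier-style additively homomorphic aggregation (illustrative stand-in only).
import secrets
from math import gcd

p, q = 1000003, 1000033          # small demo primes (assumption)
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1)          # simplified lambda; works because lcm(p-1, q-1) divides it

def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    while gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    L = lambda u: (u - 1) // n
    mu = pow(L(pow(g, lam, n2)), -1, n)
    return (L(pow(c, lam, n2)) * mu) % n

# Server-side aggregation: multiplying ciphertexts modulo n^2 adds the underlying seeds.
seeds = [11, 22, 33]
aggregated = 1
for c in (encrypt(s) for s in seeds):
    aggregated = (aggregated * c) % n2
assert decrypt(aggregated) == sum(seeds)
```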
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-7.
10. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1-7 when executing the computer program.
CN202310842760.6A 2023-07-10 2023-07-10 Data security aggregation method and system based on multiple homomorphism attributes Pending CN116933899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310842760.6A CN116933899A (en) 2023-07-10 2023-07-10 Data security aggregation method and system based on multiple homomorphism attributes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310842760.6A CN116933899A (en) 2023-07-10 2023-07-10 Data security aggregation method and system based on multiple homomorphism attributes

Publications (1)

Publication Number Publication Date
CN116933899A true CN116933899A (en) 2023-10-24

Family

ID=88376714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310842760.6A Pending CN116933899A (en) 2023-07-10 2023-07-10 Data security aggregation method and system based on multiple homomorphism attributes

Country Status (1)

Country Link
CN (1) CN116933899A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117118592A (en) * 2023-10-25 2023-11-24 北京航空航天大学 Method and system for selecting Internet of vehicles client based on homomorphic encryption algorithm
CN117118592B (en) * 2023-10-25 2024-01-09 北京航空航天大学 Method and system for selecting Internet of vehicles client based on homomorphic encryption algorithm
CN117318940A (en) * 2023-11-27 2023-12-29 山东师范大学 Multiparty collaborative signature method and system based on authentication secret sharing
CN117318940B (en) * 2023-11-27 2024-02-23 山东师范大学 Multiparty collaborative signature method and system based on authentication secret sharing
CN117579272A (en) * 2023-12-29 2024-02-20 暨南大学 Cross-institution financial privacy data sharing method and device and storage medium
CN117811722A (en) * 2024-03-01 2024-04-02 山东云海国创云计算装备产业创新中心有限公司 Global parameter model construction method, secret key generation method, device and server
CN117811722B (en) * 2024-03-01 2024-05-24 山东云海国创云计算装备产业创新中心有限公司 Global parameter model construction method, secret key generation method, device and server

Similar Documents

Publication Publication Date Title
CN109951443B (en) Set intersection calculation method and system for privacy protection in cloud environment
CN116933899A (en) Data security aggregation method and system based on multiple homomorphism attributes
US8290146B2 (en) Ciphertext generating apparatus, cryptographic communication system, and group parameter generating apparatus
CN108667625B (en) Digital signature method of cooperative SM2
CN114219483B (en) Method, equipment and storage medium for sharing block chain data based on LWE-CPBE
CN104754570B (en) Key distribution and reconstruction method and device based on mobile internet
CN111049647B (en) Asymmetric group key negotiation method based on attribute threshold
CN110011803A (en) A kind of method that two side of lightweight SM2 cooperates with generation digital signature
Lee et al. Secure key transfer protocol based on secret sharing for group communications
CN113972981A (en) Efficient threshold signature method based on SM2 cryptographic algorithm
Karati et al. Provably secure and authenticated data sharing protocol for IoT‐based crowdsensing network
Saračević et al. Source and channel models for secret-key agreement based on Catalan numbers and the lattice path combinatorial approach
CN110890961B (en) Novel safe and efficient multi-authorization attribute-based key negotiation protocol
Peng et al. Efficient distributed decryption scheme for IoT gateway-based applications
Sun et al. A Novel and Concise Multi-receiver Protocol Based on Chaotic Maps with Privacy Protection.
CN116915414A (en) Method for realizing threshold signature, computer equipment and storage medium
CN111669275A (en) Master-slave cooperative signature method capable of selecting slave nodes in wireless network environment
CN108964906B (en) Digital signature method for cooperation with ECC
Zhang et al. Verifiable rational secret sharing scheme in mobile networks
CN114785508B (en) Heterogeneous authentication key negotiation method and system
Zhang et al. GeoEnc: Geometric area based keys and policies in functional encryption systems
CN113346993A (en) Layered dynamic group key negotiation method based on privacy protection
Xiong et al. Continual leakage-resilient dynamic secret sharing in the split-state model
Dwivedi A model of key agreement protocol using polynomials over non-cummutative division semirings
CN117272389B (en) Non-interactive verifiable joint safety modeling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination