CN115277015A - Asynchronous federated learning privacy protection method, system, medium, device, and terminal - Google Patents

Asynchronous federated learning privacy protection method, system, medium, device, and terminal

Info

Publication number
CN115277015A
CN115277015A (application CN202210835092.XA)
Authority
CN
China
Prior art keywords
user
server
key
users
secret
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210835092.XA
Other languages
Chinese (zh)
Inventor
张应辉
曹大禹
刘伟
韩刚
郑东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Posts and Telecommunications
Priority to CN202210835092.XA
Publication of CN115277015A
Legal status: Pending


Classifications

    • H: ELECTRICITY › H04: ELECTRIC COMMUNICATION TECHNIQUE › H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION › H04L 9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/3247: including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials, involving digital signatures
    • H04L 9/008: involving homomorphic encryption
    • H04L 9/0643: Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
    • H04L 9/085: Secret sharing or secret splitting, e.g. threshold schemes
    • H04L 9/0861: Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L 9/0869: Generation of secret information involving random numbers or seeds

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Storage Device Security (AREA)

Abstract

The invention belongs to the technical field of machine learning security and discloses an asynchronous federated learning privacy protection method, system, medium, device, and terminal. The system is initialized and random parameters are selected; a series of public parameters and a signature public key are published. Each user generates a signature and a public/private key pair and sends them to the server; the server packs the users' identity information and public keys and broadcasts them to the other users. After receiving the data, each user generates sub-secrets corresponding to its random parameter and a one-time session key. Shared keys are generated between users and used to encrypt user information; the information each user encrypts is its weighted local update. Each user sends its masked data to the server, which aggregates the data to obtain an aggregation result; the server divides the aggregation result by the total number of samples held by the users that ultimately participated in training to obtain the global model. The asynchronous federated learning privacy protection method protects user privacy in federated learning, reduces training time, and saves resources.

Description

Asynchronous federated learning privacy protection method, system, medium, device, and terminal
Technical Field
The invention belongs to the technical field of machine learning security, and particularly relates to an asynchronous federated learning privacy protection method, system, medium, device, and terminal.
Background
Federated machine learning, also known as federated learning, is a machine learning framework that can effectively help multiple organizations use data and build machine-learning models jointly while meeting requirements on user privacy, data security, and government regulation. In an environment of explosive data growth, federated learning, as a distributed machine-learning paradigm, can effectively solve the data-silo problem: participants model jointly without sharing their data, technically breaking down data silos.
At present, data security in federated learning is mainly achieved through secure aggregation protocols. Although these protocols protect the data, conventional federated learning has low concurrency, which leads to low learning efficiency. Moreover, because the users participating in federated learning update their local models at different speeds, and each aggregation round in traditional training must wait for all participating users to finish, such training suffers from the straggler problem, which further reduces learning efficiency. Asynchronous federated learning has therefore been proposed; however, its characteristics make it incompatible with existing secure aggregation protocols, so the privacy and confidentiality of data still cannot be guaranteed in asynchronous federated learning.
Through the above analysis, the problems and defects of the prior art are as follows: synchronous federated learning suffers from low concurrency and the straggler problem, so the concurrency and training efficiency of federated learning need to be improved through asynchronous learning. Existing schemes attempt to improve privacy protection in asynchronous federated learning but fail to balance privacy protection and training efficiency well. How to achieve privacy protection under efficient asynchronous federated training without compromising the validity of the training result is an open problem in federated learning at the present stage.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an asynchronous federated learning privacy protection method, system, medium, device, and terminal, and in particular an asynchronous federated learning privacy protection method, system, medium, device, and terminal based on secure multi-party computation.

In asynchronous federated training, the invention introduces a weighting scheme for each user participating in training to improve training efficiency, and provides a verification scheme based on a homomorphic hash function to guarantee the security of the weighting scheme against an actively attacking adversary.

The invention is realized as follows. An asynchronous federated learning privacy protection method is provided that introduces a weighting scheme into the asynchronous federated learning framework to improve learning efficiency, together with a homomorphic-hash-based verification scheme that guarantees the security of the weighting scheme. The method comprises the following steps: the system is initialized and random parameters are selected, and a series of public parameters and a signature public key are published; the users participating in training receive signing private keys and generate signatures and public/private key pairs; each user packs its public keys and signature and sends them to the server, and the server, after verification, packs the user identity information and public keys and broadcasts them to the other users; after receiving the data, each user generates sub-secrets corresponding to its random parameter and one-time session keys, and sends them to the server for distribution;

shared keys are generated between users and used to encrypt user information; the information each user encrypts is its weighted local update, where the weight is the number of samples the user holds multiplied by the staleness decay coefficient at the time the user participates in training; each user sends its masked data to the server, and the server aggregates the data to obtain an aggregation result; the server divides the aggregation result by the total number of samples held by the users that ultimately participated in training to obtain the global model.
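To make the weight concrete, here is a tiny numeric sketch of the staleness decay coefficient α^(t−τ): a user's contribution is scaled by its sample count n_a and shrinks the longer its update has been pending. The α and n_a values below are illustrative, not taken from the patent.

```python
alpha, n_a = 0.6, 25                 # illustrative decay base and sample count
for t_minus_tau in range(4):         # rounds elapsed since the user fetched the model
    weight = n_a * alpha ** t_minus_tau
    print(t_minus_tau, round(weight, 3))
# 0 -> 25.0, 1 -> 15.0, 2 -> 9.0, 3 -> 5.4: staler updates weigh less
```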
Further, the asynchronous federated learning privacy protection method further comprises the following steps:
A trusted third party initializes the system, selects random parameters over a finite field, and publishes a series of parameters. The users participating in training use these parameters to generate two public/private key pairs and a signature each. Communication between the users and the server passes through a securely authenticated channel: each user packs its public keys and signature and sends them to the server, and the server, after verification, packs the user identity information, public keys, and signatures and broadcasts them to the other users. After receiving the data, each user verifies the signatures and selects a random parameter over the finite field; it generates the corresponding sub-secrets according to a secret-sharing algorithm, then generates sub-secrets of its private key and one-time session keys for the other users, encrypts the sub-secrets and identity information under the session keys, and sends them to the server. The server forwards each ciphertext to the corresponding user.

Shared keys are generated between users, and each user's information is encrypted with masks derived, through a pseudo-random generator, from the previously selected random number and the shared keys. The information each user encrypts is its local update multiplied by the number of samples it holds and by the staleness decay coefficient at the time it participates in training. Each user generates a verification vector and sends its masked data to the server, and the server broadcasts the list of users participating in training. After receiving the complete user list, each user computes a signature over it and sends the signature to the server.

The server forwards the signatures to the corresponding users. After verifying the signatures, each user decrypts the ciphertexts sent by the server to obtain the sub-secrets, and sends the server the sub-secrets of the dropped users' shared keys and the sub-secrets of the surviving users' random numbers. On receiving these, the server reconstructs the corresponding original secrets, sums all the masked data, and subtracts the masks regenerated from the secrets through the pseudo-random generator, finally obtaining the aggregation result. From the aggregation result, the server verifies that the users' sample counts are not falsified.
Further, the asynchronous federated learning privacy protection method comprises the following steps:
Step one: a trusted third party initializes the system, generating all the system parameters of the scheme. The trusted third party selects random parameters over a finite field and publishes a series of parameters together with the users' signature verification keys. Each user generates its keys: the users participating in training receive signing private keys from the trusted third party to generate signatures, generate public/private key pairs from the public parameters, and pack and send them to the server.

Step two: each user generates ciphertexts, which distribute, in secret, the keys needed to construct the masks between users to all the users. Each user selects a random parameter over the finite field, generates sub-secrets of the random parameter and of its private key according to a secret-sharing algorithm, encrypts the sub-secrets and identity information under one-time session keys, and sends them to the server as ciphertexts.

Step three: each user generates weighted local data with masks, encrypting its weighted local update so as to guarantee the security of its private data. Each user derives masks from the previously selected random number and the shared keys generated with the other users, adds the masks to its weighted local data for encryption, generates a verification vector, and sends the masked data to the server.

Step four: the server decodes and aggregates the data, performing the secure server-side aggregation that yields the scheme's final output. Each user decrypts the ciphertexts sent by the server to obtain the sub-secrets and, depending on whether each other user has dropped out, sends the corresponding sub-secrets to the server; the server reconstructs the original secrets from the sub-secrets and recovers the masks, obtaining the aggregation result. The server then generates a verification vector and uses the aggregation result to verify that the users' sample counts are not falsified. A minimal sketch of this four-phase round structure follows.
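As referenced above, the following is a minimal sketch of the four-phase round skeleton. The quorum rule |U_i| ≥ t at every phase is what lets a round proceed without waiting for stragglers; all class and variable names here are illustrative, not taken from the patent.

```python
T = 3  # secret-recovery threshold t (illustrative)

class Round:
    """Toy asynchronous-round driver: one inbox per protocol phase."""
    def __init__(self):
        self.phase = 0
        self.inbox = {}                     # user_id -> message for this phase

    def submit(self, user_id, msg):
        self.inbox[user_id] = msg

    def try_advance(self):
        """Advance as soon as at least t users responded; stragglers are
        dropped from this round but may join the next one."""
        if len(self.inbox) < T:
            return None
        survivors, self.inbox = set(self.inbox), {}
        self.phase += 1
        return survivors                    # plays the role of U_1, U_2, ...

r = Round()
for uid in ("alice", "bob", "carol"):
    r.submit(uid, "public keys + signature")
print(r.try_advance())                      # quorum reached: phase 1 begins
```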
Further, the asynchronous federated learning privacy protection method further comprises the following steps:
(1) Trusted third party initialization: given a security parameter k, the trusted third party outputs (q, G, g, H), where q is a prime, G is a group of order q with generator g, and H: {0,1}* → {0,1}^k is a hash function. It randomly chooses x ∈ Z_q as the signing key d_SK and computes d_PK = (g^x mod q, g, q) as the signature verification key, where Z_q denotes the finite field. It also fixes the secret-recovery threshold t and the number n of users participating in training, and publishes the public parameters GPK = (G, q, g, H, t, n, d_PK, (δ, ρ)), where (δ, ρ) ∈ Z_q is the key of a homomorphic hash function. Each user with identity information a, 1 ≤ a ≤ n, receives the signing key d_SK^a issued by the trusted third party and the signature verification keys d_PK^b published by the other users b.

User key generation: user a randomly selects two distinct values x_1, x_2 ∈ Z_q and generates two public/private key pairs, (pk_a^1, sk_a^1) with pk_a^1 = g^{x_1} and, likewise, (pk_a^2, sk_a^2) with pk_a^2 = g^{x_2}; the former pair is used for authenticated encryption, the latter for mask generation. With the signing key d_SK^a obtained from the trusted third party, user a generates a signature σ_a over the message (pk_a^1 ∥ pk_a^2), where k ∈ Z_q is a random nonce selected by user a; it then packs (pk_a^1, pk_a^2, σ_a) and sends the package to the server. The server only checks whether the number of users that sent data exceeds the threshold t, recording the current user set as U_1, and proceeds to the next step if and only if |U_1| ≥ t; it then packs (a, pk_a^1, pk_a^2, σ_a) and broadcasts it to the other users b. A key and signature sketch follows;
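The key and signature generation of step (1) can be sketched as follows: a toy Schnorr-style signature over a small prime-order subgroup. The group parameters, hash construction, and helper names are illustrative assumptions, not the patent's actual choices, and the parameters are far too small to be secure.

```python
import hashlib
import secrets

P, Q, G = 23, 11, 4   # toy parameters: g = 4 generates the order-11 subgroup of Z_23*

def h(*parts) -> int:
    """Hash arbitrary parts into Z_Q (stands in for H: {0,1}* -> {0,1}^k)."""
    data = b"|".join(str(p).encode() for p in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1       # private key in Z_q
    return x, pow(G, x, P)                 # (sk, pk = g^x mod p)

def sign(sk, msg):
    k = secrets.randbelow(Q - 1) + 1       # per-signature random nonce k
    r = pow(G, k, P)
    e = h(r, msg)
    return e, (k + sk * e) % Q             # signature (e, s)

def verify(pk, msg, sig):
    e, s = sig
    r = (pow(G, s, P) * pow(pk, Q - e, P)) % P   # g^s * pk^(-e) = g^k
    return h(r, msg) == e

# Each user would hold two such key pairs (one for authenticated encryption,
# one for mask agreement) plus the signing key issued by the third party.
sk, pk = keygen()
assert verify(pk, "pk_a1|pk_a2", sign(sk, "pk_a1|pk_a2"))
```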
(2) User ciphertext generation: user a receives the message of each other user b broadcast by the server and uses b's verification key d_PK^b to verify the signature σ_b. If verification succeeds, user a selects a random number β_a over the finite field and, based on the threshold t, generates for every other user b ∈ U_1 the secret shares β_{a,b} of β_a and sk^2_{a,b} of its mask private key sk_a^2 (a secret-sharing sketch follows this step). Using its own private key sk_a^1 and the public key pk_b^1 published by each other user b, user a generates a one-time session key key_{a,b}, with key_{a,b} = key_{b,a}. User a encrypts the identity information and the two secret shares under key_{a,b}, producing the ciphertext c_{a,b} = Enc(key_{a,b}, a ∥ b ∥ β_{a,b} ∥ sk^2_{a,b}) destined for user b. The server only checks whether the number of users that sent data exceeds the threshold t, recording the current user set as U_2, and proceeds if and only if |U_2| ≥ t; it then forwards each ciphertext to the corresponding user.

The user then generates weighted local data with masks: user a receives the corresponding ciphertexts from the server, stores them locally, and computes the shared key s_{a,b} with each other user b for mask generation. The previously selected random number β_a and the shared keys s_{a,b} are passed through a pseudo-random generator to obtain a private mask and public masks. User a multiplies its local data x_a by the number of samples n_a it holds and by the server-specified staleness decay coefficient α^{t−τ}, adds the masks to obtain the output y_a, generates a verification vector V_a from the trusted third party's parameters, and sends {y_a, V_a} to the server. The server only checks whether the number of users that sent data exceeds the threshold t, recording the current user set as U_3; if and only if |U_3| ≥ t, it broadcasts the user list U_3 to all users.

(3) Server decoding and aggregation: user a receives the list of users participating in training and generates a signature σ'_a over the list with its signing key, sending it to the server. The server receives the signatures, records the current user set as U_4, and packs the identity information and signatures and sends them to all users. After receiving this message, user a verifies the other users' signatures σ'_b, b ∈ U_4, confirming that the user list has not been tampered with. Using the one-time session keys key_{a,b}, user a decrypts the ciphertexts c_{b,a} previously forwarded by the server to obtain the identity information and the two secret shares, verifies the identities to confirm the shares are not forged, and packs and sends the appropriate shares to the server. The server records the current user set as U_5 and, from the secret shares sent by the users, recovers the private masks of the surviving users and the public masks of the dropped users. The server aggregates the users' outputs, subtracts the private masks and the dropped users' public masks to obtain the weighted aggregation result z, and generates verification vectors K_a from the aggregation result to verify that the users' sample counts are not falsified.
Further, in step (2), user a finally generates its output y_a and verification vector V_a as follows:

1) User a passes the previously selected random number β_a and the shared keys s_{a,b} with the other users b through a pseudo-random generator PRG, obtaining the private mask PRG(β_a) and the public masks Σ PRG(s_{a,b});

2) User a multiplies its local data x_a by the number of samples n_a it holds and by the server-specified staleness decay coefficient α^{t−τ}, where t−τ is the time difference with which user a joins the aggregation and α ∈ (0, 1);

3) User a outputs y_a = α^{t−τ} · n_a · x_a + PRG(β_a) + Σ_{b ∈ U_2, a<b} PRG(s_{a,b}) − Σ_{b ∈ U_2, a>b} PRG(s_{a,b});

4) User a generates the verification vector V_a by applying the homomorphic hash HF_{δ,ρ}, keyed with the trusted third party's public parameters (δ, ρ), to its weighted update, where HF_{δ,ρ} is a homomorphic hash function, η is the learning rate of federated learning, and Σ w_a is the sum of the gradients of user a's local data under the current global model w_G. A masking sketch follows.
Further, the server aggregation and verification in step (3) proceed as follows:

1) After confirming that the user list is genuine, user a uses the one-time session keys key_{a,b} to decrypt the ciphertexts c_{b,a} previously forwarded by the server, obtaining (a', b', β_{b,a}, sk^2_{b,a}); it verifies whether a = a' ∧ b = b', and if this holds, the ciphertext was indeed sent by user b;

2) User a packs {sk^2_{b,a} | b ∈ U_2\U_3} and {β_{b,a} | b ∈ U_3} and sends them to the server, where U_2\U_3 denotes the set of dropped users and U_3 the set of surviving users;

3) The server has already confirmed |U_4| ≥ t, so the shares from the remaining users suffice to recover the random number β_a of every surviving user a ∈ U_3, and hence the private mask PRG(β_a); likewise, the remaining users' shares suffice to recover the private key sk_a^2 of every dropped user a ∈ U_2\U_3, from which, together with the public key pk_b^2 of the corresponding surviving user b ∈ U_3, the shared key s_{a,b} is recovered by key agreement, yielding the public masks Σ PRG(s_{a,b});

4) The server sums the outputs y_a of the surviving users, subtracts their private masks PRG(β_a), and cancels the residual public masks PRG(s_{a,b}) left by the dropped users, obtaining the final output z = Σ_{a ∈ U_3} α^{t−τ_a} · n_a · x_a;

5) The server generates verification vectors K_a, a ∈ U_3, from the trusted third party's public parameters (δ, ρ);

6) The server checks the aggregation result z against the verification vectors and, if the check passes, outputs z.
For a dropped user, the server cannot recover the random number that user selected and therefore cannot obtain its local data; for a surviving user, the server cancels the public masks only through summation and likewise cannot obtain its local data. A server-side unmasking sketch follows.
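A matching sketch of the server-side unmasking in items 3) and 4) is given below, continuing the conventions of the client sketch (same PRG and modulus). Secret reconstruction is assumed to have happened already, e.g. via the Shamir sketch earlier, and the final division is a fixed-point simplification that assumes the aggregate does not wrap around the modulus.

```python
import hashlib

R = 2**32

def prg(seed: int, n: int):
    out, ctr = [], 0
    while len(out) < n:
        blk = hashlib.sha256(f"{seed}:{ctr}".encode()).digest()
        out += [int.from_bytes(blk[i:i + 4], "big") % R for i in range(0, 32, 4)]
        ctr += 1
    return out[:n]

def unmask(masked_sum, betas, dangling, dim, total_samples):
    """masked_sum: coordinate-wise sum of y_a over survivors U_3 (mod R).
    betas: {survivor_a: beta_a} reconstructed from t secret shares.
    dangling: {(dropped_a, survivor_b): s_ab} keys recovered for dropped users."""
    agg = list(masked_sum)
    for beta in betas.values():                  # strip private masks
        for i, m in enumerate(prg(beta, dim)):
            agg[i] = (agg[i] - m) % R
    for (a, b), key in dangling.items():         # strip dangling pairwise masks
        sign = 1 if b < a else -1                # matches the sign survivor b used
        for i, m in enumerate(prg(key, dim)):
            agg[i] = (agg[i] - sign * m) % R
    # Global model: divide the weighted aggregate by the survivors' total samples.
    return [v / total_samples for v in agg]
```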
Another object of the present invention is to provide an asynchronous federated learning privacy protection system applying the asynchronous federated learning privacy protection method, the system comprising:

a key generation module, by which the trusted third party initializes the system, selects random parameters over a finite field, and publishes a series of parameters and the users' signature verification keys; each user generates its keys: the users participating in training receive signing private keys from the trusted third party to generate signatures, generate public/private key pairs from the public parameters, and pack and send them to the server;

a ciphertext generation module, by which each user generates ciphertexts: each user selects a random parameter over the finite field, generates sub-secrets of the random parameter and of its private key according to a secret-sharing algorithm, encrypts the sub-secrets and identity information under one-time session keys, and sends them to the server as ciphertexts;

a local data encryption module, by which each user generates weighted local data with masks: each user derives masks from the previously selected random number and the shared keys generated with the other users, adds the masks to its weighted local data for encryption, generates a verification vector, and sends the masked data to the server;

an aggregation verification module, by which the server decodes and aggregates: each user decrypts the ciphertexts sent by the server to obtain the sub-secrets and sends the corresponding sub-secrets depending on whether each other user has dropped out; the server reconstructs the original secrets from the sub-secrets and recovers the masks, obtaining the aggregation result; the server generates a verification vector and uses the aggregation result to verify that the users' sample counts are not falsified.
It is a further object of the present invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the asynchronous federated learning privacy protection method.

It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the asynchronous federated learning privacy protection method.

Another object of the present invention is to provide an information data processing terminal configured to implement the asynchronous federated learning privacy protection system.
In combination with the above technical solutions and the technical problems to be solved, the advantages and positive effects of the claimed technical solutions are analyzed from the following aspects:

First, regarding the technical problems in the prior art and the difficulty of solving them: the technical problems solved by the present invention are closely tied to the results and data obtained during research and development, and solving them brings creative technical effects. The details are as follows:
The invention provides an asynchronous federated learning privacy protection method based on secure multi-party computation. A trusted third party initializes the system, selects random parameters over a finite field, and publishes a series of parameters and a signature public key. The users participating in training receive signing private keys to generate signatures, use the parameters to generate two public/private key pairs each, and publish the number of samples they hold. Communication between the users and the server passes through a securely authenticated channel: each user packs its public keys and signature and sends them to the server, and the server, after verification, packs the user identity information and public keys and broadcasts them to the other users. After receiving the data, each user selects a random parameter over the finite field, generates the corresponding sub-secrets according to a secret-sharing algorithm, then generates sub-secrets of its private key and one-time session keys for the other users, and encrypts the sub-secrets and identity information under the session keys before sending them to the server. The server forwards each ciphertext to the corresponding user. Shared keys are generated between users, and the previously selected random number and the shared keys are passed through a pseudo-random generator to derive the masks that encrypt each user's information. The information each user encrypts is its weighted local update, where the weight is the number of samples the user holds multiplied by the staleness decay coefficient at training time. Each user sends its masked data to the server, which broadcasts the list of users participating in training. After receiving the complete user list, each user decrypts the ciphertexts sent by the server to obtain the sub-secrets, then sends the server the sub-secrets of the dropped users' shared keys and the sub-secrets of the surviving users' random numbers. On receiving these, the server reconstructs the corresponding original secrets, sums all the masked data, and subtracts the masks regenerated from the secrets through the pseudo-random generator, finally obtaining the aggregation result. The server divides the aggregation result by the total number of samples held by the users that ultimately participated in training to obtain the global model.
The invention provides an asynchronous federated learning privacy protection method based on secure multi-party computation to realize efficient and secure asynchronous federated learning. Users and the server operate asynchronously, so the model being trained moves through the system more efficiently and training time is reduced. Meanwhile, the introduction of the weighting scheme greatly improves the accuracy and efficiency of federated training. The server and the users execute their operations through an asynchronous coordinator: when a user's local update is slow due to insufficient computing power, the other users do not wait for it, and users that have finished updating proceed directly to the secure aggregation, which greatly improves the flexibility and concurrency of federated learning. The secure aggregation is realized with secure multi-party computation: secret keys shared between users serve as masks attached to the weighted local private data, concealing the plaintext, and the server cannot identify the masked information. For a dropped user, the server can recover its public masks but not its private mask, and therefore cannot obtain its private information. Likewise, for a surviving user, the server can obtain its private mask but not its public masks, which cancel only under the aggregation operation; that is, the server learns only the aggregated result. This yields a practical asynchronous federated learning privacy protection scheme with strong security that protects the privacy and confidentiality of user data. The invention also improves federated training efficiency, solves the straggler problem of traditional federated learning, reduces training time, and saves resources. The scheme is simple and highly practical, and lends itself to wide adoption.
Second, considering the technical solution as a whole and from the product perspective, the technical effects and advantages of the claimed solution are as follows:

The asynchronous federated learning privacy protection method based on secure multi-party computation protects user privacy in federated learning, reduces training time, and saves resources.
Third, as supplementary evidence of the inventiveness of the claims, several important aspects are presented:

(1) The technical solution fills a technical gap in the industry at home and abroad:

Secure asynchronous federated learning with a weighting scheme is proposed for the first time. It suits asynchronous federated learning in multi-user scenarios, and especially scenarios with many edge-computing device users; it carries out federated training effectively while keeping users' local data private during training. Combined with a homomorphic hash function, a verification scheme for the weighting scheme under an actively attacking adversary is also provided, guaranteeing the security of the weighting scheme.

(2) The technical solution solves a technical problem that has long been sought but never achieved:

Current domestic work on federated learning focuses heavily on reducing communication cost and gives little thought to improving concurrency and learning efficiency. In practical scenarios, the devices of users participating in federated learning differ, and it cannot be guaranteed that all users complete their local model updates at a roughly uniform speed, which ultimately causes the straggler problem. Although asynchronous federated learning has been proposed to improve training efficiency, no existing scheme offers an efficient asynchronous federated learning privacy protection method; the present method realizes efficient and secure asynchronous federated learning and better matches practical application scenarios.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below are only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of the asynchronous federated learning privacy protection method provided by an embodiment of the present invention;

FIG. 2 is a communication diagram of the asynchronous secure aggregation scheme provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the secure asynchronous federated learning framework provided by an embodiment of the present invention;

FIG. 4 compares asynchronous federated learning with and without weights, as provided by an embodiment of the present invention;

FIG. 5 shows the training times of stochastic gradient descent (sgd) as used in conventional federated learning, synchronous federated learning (syn), and asynchronous federated learning (asyn) under different numbers of training rounds, according to an embodiment of the present invention;

FIG. 6 shows the results after adding the privacy protection scheme of the present invention (asyn+secagg, asynchronous federated learning + secure aggregation).
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.

To solve the problems in the prior art, the present invention provides an asynchronous federated learning privacy protection method, system, medium, device, and terminal, described in detail below with reference to the accompanying drawings.
1. Explanation of the embodiments. This section expands on the claims with explanatory embodiments so that those skilled in the art can fully understand how the present invention is realized.

As shown in FIG. 1, the asynchronous federated learning privacy protection method provided by the embodiment of the present invention includes the following steps:
S101: a trusted third party initializes the system, selects random parameters over a finite field, and publishes a series of parameters and the users' signature verification keys; each user generates its keys: the users participating in training receive signing private keys from the trusted third party to generate signatures, generate public/private key pairs from the public parameters, and pack and send them to the server;

S102: each user generates ciphertexts: it selects a random parameter over the finite field, generates sub-secrets of the random parameter and of its private key according to a secret-sharing algorithm, encrypts the sub-secrets and identity information under one-time session keys, and sends them to the server as ciphertexts;

S103: each user generates weighted local data with masks: it derives masks from the previously selected random number and the shared keys generated with the other users, adds the masks to its weighted local data for encryption, generates a verification vector, and sends the masked data to the server;

S104: the server decodes and aggregates: each user decrypts the ciphertexts sent by the server to obtain the sub-secrets and, depending on whether each other user has dropped out, sends the corresponding sub-secrets to the server; the server reconstructs the original secrets and recovers the masks, obtaining the aggregation result; the server generates a verification vector and uses the aggregation result to verify that the users' sample counts are not falsified.
As a preferred embodiment, as shown in FIGS. 2-3, the asynchronous federated learning privacy protection method provided by the embodiment of the present invention specifically includes the following steps:
(1) System initialization:

The trusted third party initializes the system: given a security parameter k, it outputs (q, G, g, H), where q is a prime, G is a group of order q with generator g, and H: {0,1}* → {0,1}^k is a hash function. It randomly chooses x ∈ Z_q as the signing key d_SK and computes d_PK = (g^x mod q, g, q) as the signature verification key, where Z_q denotes the finite field. It also fixes the secret-recovery threshold t and the number n of users participating in training, and publishes the public parameters GPK = (G, q, g, H, t, n, d_PK, (δ, ρ)), where (δ, ρ) ∈ Z_q is the key of a homomorphic hash function. Each user with identity information a, 1 ≤ a ≤ n, receives the signing key d_SK^a issued by the trusted third party and the signature verification keys d_PK^b published by the other users b.

(2) User key generation:

User a randomly selects two distinct values x_1, x_2 ∈ Z_q and generates two public/private key pairs, (pk_a^1, sk_a^1) with pk_a^1 = g^{x_1} and, likewise, (pk_a^2, sk_a^2) with pk_a^2 = g^{x_2}. The former pair is used for authenticated encryption, the latter for mask generation. With the signing key d_SK^a obtained from the trusted third party, user a generates a signature σ_a over the message (pk_a^1 ∥ pk_a^2), where k ∈ Z_q is a random nonce selected by user a; it then packs (pk_a^1, pk_a^2, σ_a) and sends the package to the server. The server only checks whether the number of users that sent data exceeds the threshold t, recording the current user set as U_1, and proceeds if and only if |U_1| ≥ t; it then packs (a, pk_a^1, pk_a^2, σ_a) and broadcasts it to the other users;
(3) User ciphertext generation:

User a receives the messages broadcast by the server and uses each other user b's verification key d_PK^b to verify the signature σ_b. If verification succeeds, user a selects a random number β_a over the finite field and, based on the threshold t, generates for every other user b ∈ U_1 the secret shares β_{a,b} of β_a and sk^2_{a,b} of sk_a^2. Using its own private key sk_a^1 and the public key pk_b^1 published by each other user b, user a generates a one-time session key key_{a,b}, with key_{a,b} = key_{b,a}. User a encrypts the identity information and the two secret shares under key_{a,b}, producing the ciphertext c_{a,b} = Enc(key_{a,b}, a ∥ b ∥ β_{a,b} ∥ sk^2_{a,b}) destined for user b; this is symmetric encryption. The server only checks whether the number of users that sent data exceeds the threshold t, recording the current user set as U_2, and proceeds if and only if |U_2| ≥ t; it then forwards each ciphertext to the corresponding user. A sketch of the session-key agreement and share encryption follows;
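The one-time session key and the symmetric encryption of the shares can be sketched as follows: a Diffie-Hellman agreement hashed into a symmetric key, followed by stream-XOR encryption. The prime, the generator, and the use of plain stream-XOR in place of the authenticated encryption the scheme actually requires are illustrative simplifications.

```python
import hashlib
import secrets

P = 0xFFFFFFFFFFFFFFC5    # illustrative prime (2**64 - 59); far too small in practice
G = 5                     # illustrative generator

def dh_keypair():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def session_key(my_sk, their_pk) -> bytes:
    shared = pow(their_pk, my_sk, P)       # g^(x1*x2): same value on both sides
    return hashlib.sha256(str(shared).encode()).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Encrypt/decrypt by XOR with a SHA-256 keystream (toy cipher)."""
    stream = b""
    ctr = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(d ^ s for d, s in zip(data, stream))

sk_a, pk_a = dh_keypair()
sk_b, pk_b = dh_keypair()
key_ab = session_key(sk_a, pk_b)
assert key_ab == session_key(sk_b, pk_a)   # key_{a,b} == key_{b,a}
ct = xor_stream(key_ab, b"a|b|beta_share|sk_share")
assert xor_stream(key_ab, ct) == b"a|b|beta_share|sk_share"
```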
(4) Users generate weighted local data with masks:

User a receives the corresponding ciphertexts from the server and stores them locally, then computes the shared key s_{a,b} with each other user b for mask generation. The previously selected random number β_a and the shared keys s_{a,b} are passed through a pseudo-random generator to obtain a private mask and public masks. User a multiplies its local data x_a by the number of samples n_a it holds and by the server-specified staleness decay coefficient α^{t−τ}, adds the masks to obtain the output y_a, then generates a verification vector V_a and sends {y_a, V_a} to the server. The server only checks whether the number of users that sent data exceeds the threshold t, recording the current user set as U_3; if and only if |U_3| ≥ t, it broadcasts the user list U_3 to all users;
(5) Consistency check:

User a receives the list of users participating in training and generates a signature σ'_a over the list with its signing key, sending it to the server. The server receives the signatures, records the current user set as U_4, and packs the identity information and signatures and sends them to all users;
(6) Server decoding and aggregation:

After receiving the message, user a verifies the other users' signatures σ'_b, b ∈ U_4, in the same way as in step (2), and concludes that the user list has not been tampered with. It then uses the one-time session keys key_{a,b} to decrypt the ciphertexts c_{b,a} previously forwarded by the server, obtains the identity information and the two secret shares, and confirms after identity verification that they are not forged; it packs and sends the appropriate secret shares to the server. The server records the current user set as U_5 and, from the secret shares sent by the users, recovers the private masks of the surviving users and the public masks of the dropped users. The server aggregates the users' outputs and subtracts the private masks and the dropped users' public masks to obtain the weighted aggregation result z, then generates verification vectors K_a from the aggregation result to verify that the users' sample counts are not falsified.
The output y_a and the verification vector finally generated by user a in step (4) of the embodiment are produced as follows:

(a) User a passes the previously selected random number β_a and the shared keys s_{a,b} with the other users b through a pseudo-random generator PRG, obtaining the private mask PRG(β_a) and the public masks Σ PRG(s_{a,b});

(b) User a multiplies its local data x_a by the number of samples n_a it holds and by the server-specified staleness decay coefficient α^{t−τ}, where t−τ is the time difference with which user a joins the aggregation and α ∈ (0, 1);

(c) User a outputs y_a = α^{t−τ} · n_a · x_a + PRG(β_a) + Σ_{b ∈ U_2, a<b} PRG(s_{a,b}) − Σ_{b ∈ U_2, a>b} PRG(s_{a,b});

(d) User a generates the verification vector V_a by applying the homomorphic hash HF_{δ,ρ}, keyed with the trusted third party's public parameters (δ, ρ), to its weighted update, where HF_{δ,ρ} is a homomorphic hash function, η is the learning rate of federated learning, and Σ w_a is the sum of the gradients of user a's local data under the current global model w_G.
The aggregation and verification in step (6) of the embodiment proceed as follows:

(a) After confirming that the user list is genuine, user a uses the one-time session keys key_{a,b} to decrypt the ciphertexts c_{b,a} previously forwarded by the server, obtaining (a', b', β_{b,a}, sk^2_{b,a}); it then verifies whether a = a' ∧ b = b', and if this holds, the ciphertext was indeed sent by user b;

(b) User a packs {sk^2_{b,a} | b ∈ U_2\U_3} and {β_{b,a} | b ∈ U_3} and sends them to the server, where U_2\U_3 denotes the set of dropped users and U_3 the set of surviving users;

(c) The server has already confirmed |U_4| ≥ t, so the shares from the remaining users suffice to recover the random number β_a of every surviving user a ∈ U_3, and hence the private mask PRG(β_a); likewise, the remaining users' shares suffice to recover the private key sk_a^2 of every dropped user a ∈ U_2\U_3, from which, together with the public key pk_b^2 of the corresponding surviving user b ∈ U_3, the shared key s_{a,b} is recovered by key agreement, yielding the public masks Σ PRG(s_{a,b});

(d) The server sums the outputs y_a of the surviving users, subtracts their private masks PRG(β_a), and cancels the residual public masks PRG(s_{a,b}) left by the dropped users, obtaining the final output z = Σ_{a ∈ U_3} α^{t−τ_a} · n_a · x_a;

(e) The server generates verification vectors from the trusted third party's public parameters (δ, ρ), where a ∈ U_3;

(f) The server checks z against the verification vectors and, if the check passes, outputs z.

For a dropped user, the server cannot recover the random number that user selected and therefore cannot obtain its local data; for a surviving user, the server cancels the public masks only through summation and likewise cannot obtain its local data; a numeric sketch of this cancellation follows.
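The following numeric sketch, referenced above, checks the pairwise-mask cancellation argument (the private masks PRG(β_a) are omitted for brevity). The hash-based PRG and all values are illustrative.

```python
import hashlib

R = 2**32

def prg(seed: int, n: int):
    out, ctr = [], 0
    while len(out) < n:
        blk = hashlib.sha256(f"{seed}:{ctr}".encode()).digest()
        out += [int.from_bytes(blk[i:i + 4], "big") % R for i in range(0, 32, 4)]
        ctr += 1
    return out[:n]

x = {1: [100, 200], 2: [50, 50], 3: [10, 20]}    # toy local vectors x_a
s = {(1, 2): 111, (1, 3): 222, (2, 3): 333}      # toy shared keys s_ab (a < b)

def mask(uid):
    y = list(x[uid])
    for (a, b), seed in s.items():
        if uid in (a, b):
            sign = 1 if uid == a else -1         # + for the smaller id
            for i, m in enumerate(prg(seed, len(y))):
                y[i] = (y[i] + sign * m) % R
    return y

agg = [sum(col) % R for col in zip(*(mask(u) for u in x))]
assert agg == [sum(col) for col in zip(*x.values())]   # masks vanished in the sum
print(agg)    # [160, 270]: the server sees only the aggregate
```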
The asynchronous federated learning privacy protection system provided by the embodiment of the present invention comprises:

a key generation module, by which the trusted third party initializes the system, selects random parameters over a finite field, and publishes a series of parameters and the users' signature verification keys; each user generates its keys: the users participating in training receive signing private keys from the trusted third party to generate signatures, generate public/private key pairs from the public parameters, and pack and send them to the server;

a ciphertext generation module, by which each user generates ciphertexts: each user selects a random parameter over the finite field, generates sub-secrets of the random parameter and of its private key according to a secret-sharing algorithm, encrypts the sub-secrets and identity information under one-time session keys, and sends them to the server as ciphertexts;

a local data encryption module, by which each user generates weighted local data with masks: each user derives masks from the previously selected random number and the shared keys generated with the other users, adds the masks to its weighted local data for encryption, generates a verification vector, and sends the masked data to the server;

an aggregation verification module, by which the server decodes and aggregates: each user decrypts the ciphertexts sent by the server to obtain the sub-secrets and sends the corresponding sub-secrets depending on whether each other user has dropped out; the server reconstructs the original secrets from the sub-secrets and recovers the masks, obtaining the aggregation result; the server generates a verification vector and uses the aggregation result to verify that the users' sample counts are not falsified.
2. Application examples. To demonstrate the creativity and technical value of the claimed technical solution, this section presents application examples of the solution on specific products or related technologies.

According to practical application scenarios, secure aggregation of weighted asynchronous federated learning data is realized with secure multi-party computation, and a weight verification scheme based on a homomorphic hash function guarantees the correctness of the weights in asynchronous federated learning.
In an intelligent health-monitoring system, users may choose to upload data collected on their wearable devices or mobile phones to a server for federated learning, from which the server obtains a better health model for the user population. In this process, the devices participating in federated learning are heterogeneous. For example, a smart watch and a mobile phone differ greatly in computing power and battery life, so their local model updates fall out of step when the server collects data for federated learning, causing the straggler problem. With the present method, users participating in training need not wait for users whose local model updates are slow during asynchronous federated learning, which improves the training efficiency and concurrency of federated learning.

Moreover, in this scenario the data collected locally may contain private information, such as the user's blood pressure and heart rate, which the user does not want disclosed to other users or to the server; this imposes a privacy-protection requirement on asynchronous federated learning. With the present method, the privacy of user data during federated training is guaranteed: the server obtains only the aggregated result and cannot obtain any user's local information.

In addition, because users upload their sample counts publicly in this scenario, an actively attacking user could upload a malicious sample count to disrupt normal asynchronous federated training. The invention uses the unforgeability and confidentiality of the homomorphic hash function to provide a verification scheme for users' sample counts, ensuring that users do not upload false weights during training. Concretely, the server verifies after obtaining the aggregated result; if an actively attacking user uploaded a false weight, the model produced by that round is unusable, and after verification the server simply discards the result and proceeds to the next round of training.

In conclusion, the invention introduces a weighting scheme into asynchronous federated learning for the first time and, by combining secure multi-party computation with a homomorphic hash function, provides an efficient and secure asynchronous federated learning scheme that meets real social needs and has practical value.
3. Evidence of the effects of the embodiments. The embodiments of the present invention achieved positive effects during research, development, and use, and offer significant advantages over the prior art; the following description combines data and figures from the testing process.

FIG. 3 shows the secure asynchronous federated learning framework. The asynchronous federated learning secure aggregation adopted in this scheme is consistent with the prior art: the operations of the user side and the server side are driven by the asynchronous coordinator, and secure aggregation is performed once enough local updates have been collected; meanwhile, users that have not yet uploaded data can continue their local updates, achieving the asynchronous effect.

FIG. 4 compares asynchronous federated learning with and without weights: the learning effect with weights (asyn+w, asynchronous federated learning + weighted), i.e., the test accuracy, is clearly superior. FIG. 5 shows the training times of stochastic gradient descent (sgd) as used in conventional federated learning, synchronous federated learning (syn), and asynchronous federated learning (asyn) under different numbers of training rounds; the training time of asynchronous federated learning at high round counts is clearly lower than that of the two conventional schemes, which confirms the importance and necessity of asynchronous federated learning.

FIG. 6 shows that after adding the privacy protection scheme of the present invention (asyn+secagg, asynchronous federated learning + secure aggregation), the effect of asynchronous federated learning is affected very little compared with the scheme without it. The invention thus improves the security of asynchronous federated learning without compromising its training effect.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD- or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of hardware circuits and software, e.g., firmware.
The above description is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention and within the scope of the appended claims falls within the protection scope of the invention.

Claims (10)

1. An asynchronous federated learning privacy protection method is characterized by comprising the following steps:
initializing the system, selecting random parameters, and publishing a series of parameters and a signature public key; the users participating in training receive signature private keys, generate signatures, and generate public-private key pairs; each user packs its public key and signature and sends them to the server, and the server, after verification, packs the user identity information and public keys and broadcasts them to the other users; after receiving the data, each user generates sub-secrets corresponding to the random parameter together with one-time session keys and sends them to the server for broadcasting;
generating a shared key between users and encrypting the user information, the information to be encrypted being the user's weighted local information, where the weight combines the number of samples the user holds with the user's staleness attenuation coefficient during training; each user sends its masked data to the server, and the server aggregates the data to obtain an aggregation result; the server divides the aggregation result by the total number of samples held by the users that finally participate in training to obtain the global model.
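The double-masking idea underlying this claim can be illustrated with a toy sketch; the 32-bit modulus, the SHA-256-based PRG, and every name below are assumptions chosen for brevity, and the server's recovery of the private-mask seeds (done via secret shares in the actual scheme) is modelled by simply handing it the seeds:

import hashlib

R = 2**32  # toy modulus for masked arithmetic (assumption)

def prg(seed):
    # Toy pseudo-random generator: one value per seed.
    return int.from_bytes(hashlib.sha256(seed).digest()[:4], "big")

def masked_update(uid, value, peers, shared_seeds, self_seed):
    # Add the private mask PRG(beta) plus pairwise masks PRG(s_ab),
    # with opposite signs for uid < b and uid > b so that the
    # pairwise masks cancel when the server sums all updates.
    y = (value + prg(self_seed)) % R
    for b in peers:
        m = prg(shared_seeds[frozenset((uid, b))])
        y = (y + m) % R if uid < b else (y - m) % R
    return y

users = [1, 2, 3]
values = {1: 10, 2: 20, 3: 30}
self_seeds = {u: str(u).encode() for u in users}
shared_seeds = {frozenset((a, b)): ("%d-%d" % (a, b)).encode()
                for a in users for b in users if a < b}
masked = {u: masked_update(u, values[u], [b for b in users if b != u],
                           shared_seeds, self_seeds[u]) for u in users}
agg = (sum(masked.values()) - sum(prg(s) for s in self_seeds.values())) % R
assert agg == sum(values.values()) % R  # masks removed, sum recovered

The pairwise masks cancel between any two surviving users, so the server learns only the sum, never an individual value.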
2. The asynchronous federated learning privacy protection method of claim 1, wherein the asynchronous federated learning privacy protection method further comprises:
a trusted third party initializes the system, selects random parameters in a finite field, and publishes a series of parameters; the users participating in training each generate two public-private key pairs and signatures using these parameters; communication between the users and the server passes through a secure authenticated channel; each user packs its public keys and signature and sends them to the server, and the server, after verification, packs the user identity information, public keys, and signatures and broadcasts them to the other users; after receiving the data, each user verifies the signatures and selects random parameters from the finite field, generates the corresponding sub-secrets according to a secret sharing algorithm together with the sub-secrets of its private key, generates the one-time session key corresponding to each user, and encrypts the sub-secrets and identity information with the session keys before sending them to the server; the server forwards the ciphertexts to the corresponding users;
generating a shared key among the users and encrypting each user's information with masks obtained by passing the selected random number and the shared keys through a pseudo-random number generator; the information to be encrypted is the user's local information multiplied by the number of samples held and by the staleness attenuation coefficient at the time the user participates in training; each user generates a verification vector and sends its masked data to the server, and the server broadcasts the list of users participating in training; after receiving the full user list, each user computes a signature over it and sends the signature to the server;
the server broadcasts the signatures to the corresponding users; after receiving and verifying the signatures, each user decrypts the ciphertexts forwarded by the server to obtain the sub-secrets and sends to the server the sub-secrets of the dropped users' shared keys and the sub-secrets of the random numbers of the users that did not drop out; after receiving the sub-secrets, the server recovers the corresponding original secrets, sums all the masked data, and adds or subtracts the secrets passed through the pseudo-random number generator, finally obtaining the aggregation result; from the aggregation result the server verifies that the users' sample counts are not falsified.
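The sub-secrets in claims 1 and 2 follow a t-of-n threshold pattern; below is a minimal Shamir secret sharing sketch over a small prime field, where the prime P and the function names are illustrative assumptions:

import random

P = 2**31 - 1  # toy prime field (assumption)

def share(secret, t, n):
    # Split secret into n shares along a random degree-(t-1) polynomial;
    # any t of them recover it, fewer reveal nothing.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    # Lagrange interpolation of the polynomial at x = 0.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456, t=3, n=5)
assert recover(shares[:3]) == 123456  # any 3 of 5 shares suffice

In the scheme, such shares let the server reconstruct a dropped user's key material, or a surviving user's mask seed, once at least t users respond.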
3. The asynchronous federated learning privacy protection method of claim 1, wherein the asynchronous federated learning privacy protection method comprises the steps of:
step one, a trusted third party initializes the system, selects random parameters in a finite field, and publishes a series of parameters and the users' signature public keys; the users generate keys: a user participating in training receives a signature private key from the trusted third party to generate a signature, generates a public-private key pair using the public parameters, and packs and sends them to the server;
step two, the user generates ciphertexts: it selects a random parameter from the finite field, generates the sub-secrets corresponding to the random parameter and the sub-secrets corresponding to its private key according to a secret sharing algorithm, encrypts the sub-secrets and identity information with the one-time session keys, and sends the result to the server as ciphertexts;
step three, the user generates coefficient-weighted, masked local data: it derives masks from the previously selected random number and the shared keys generated between users, encrypts its coefficient-weighted local data by adding the masks, generates a verification vector, and sends the masked data to the server;
step four, the server decodes and aggregates: each user decrypts the ciphertexts forwarded by the server to obtain the sub-secrets and, according to whether other users have dropped out, sends the corresponding sub-secrets to the server; the server recovers the original secrets corresponding to the sub-secrets and then reconstructs the masks, thereby obtaining the aggregation result; the server generates a verification vector and verifies from the aggregation result that the users' sample counts are not falsified.
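The one-time session key of step two can be derived pairwise by key agreement; in the sketch below the Diffie-Hellman group, the SHA-256 key derivation, and the XOR keystream cipher standing in for authenticated encryption are all illustrative assumptions:

import hashlib
import secrets

# Toy Diffie-Hellman group (assumption: a Mersenne prime and a small
# generator chosen for readability, not for production security).
p = 2**127 - 1
g = 3

def keygen():
    sk = secrets.randbelow(p - 2) + 1
    return sk, pow(g, sk, p)

def session_key(my_sk, their_pk):
    # key_ab == key_ba by the symmetry of Diffie-Hellman agreement.
    shared = pow(their_pk, my_sk, p)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def xor_encrypt(key, plaintext):
    # Toy keystream cipher standing in for the authenticated encryption
    # the scheme would really use (assumption); decryption is identical.
    stream = b""
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + len(stream).to_bytes(4, "big")).digest()
    return bytes(x ^ s for x, s in zip(plaintext, stream))

sk_a, pk_a = keygen()
sk_b, pk_b = keygen()
k_ab = session_key(sk_a, pk_b)
assert k_ab == session_key(sk_b, pk_a)
ct = xor_encrypt(k_ab, b"a||b||share_beta||share_sk")  # the ciphertext c_ab
assert xor_encrypt(session_key(sk_b, pk_a), ct) == b"a||b||share_beta||share_sk"

Because key_ab equals key_ba, the recipient can decrypt the sender's share ciphertext without any further interaction.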
4. The asynchronous federated learning privacy protection method of claim 1, wherein the asynchronous federated learning privacy protection method further comprises:
(1) The trusted third party initializes: given a security parameter k, the setup yields (q, G, H), where q is a prime number, G is a group of order q with generator g, and H: {0,1}* → {0,1}^k is a hash function; in Z_q the trusted third party randomly chooses x as the signing key d_SK and computes d_PK = (g^x mod q, g, q) as the signature verification key, where Z_q denotes a finite field; it simultaneously fixes a threshold t for secret recovery and the number n of users participating in training, and issues the public parameters GPK = (G, q, g, H, t, n, d_PK, (δ, ρ)), where (δ, ρ) ∈ Z_q are the keys of the homomorphic hash function; each user with identity a, 1 ≤ a ≤ n, receives its signing key d_SK^a issued by the trusted third party together with the signature verification keys d_PK^b published by the other users b.
The user generates keys: user a randomly selects two distinct values x_1 and x_2 in Z_q and generates two public-private key pairs, one with private key x_1 and public key g^{x_1} mod q, the other with private key x_2 and public key g^{x_2} mod q; the former pair is used for authenticated encryption and the latter for generating masks. With the signing key d_SK^a obtained from the trusted third party and a random number k ∈ Z_q of its own choosing, user a generates a signature σ_a over its two public keys, then packs its identity, the two public keys, and σ_a and sends them to the server. The server only checks whether the number of users that sent data reaches the threshold t and records the current user set as U_1, executing the next step if and only if |U_1| ≥ t holds; the server packs each user b's identity information, public keys, and signature σ_b and broadcasts them to the other users;
(2) The user generates ciphertexts: user a receives the information of the other users b broadcast by the server, uses b's verification key d_PK^b to verify that the signature σ_b is valid, and selects a random number β_a in the finite field; according to the threshold t, it generates for each other user b secret shares β_{a,b} of β_a together with secret shares of its mask-generation private key, where b ∈ U_1; user a combines its own authenticated-encryption private key with the public key published by the other user b to generate the one-time session key key_{a,b}, with key_{a,b} = key_{b,a}; user a uses the one-time session key key_{a,b} to encrypt its identity information and the two secret shares, producing the ciphertext destined for the other user b. The server only checks whether the number of users that sent data reaches the threshold t and records the current user set as U_2, executing the next step if and only if |U_2| ≥ t holds; the server forwards each ciphertext to the corresponding user. The user generates coefficient-weighted, masked local data: user a receives the corresponding ciphertexts from the server and stores them locally, and computes the shared key s_{a,b} with each other user b for mask generation; the previously selected random number β_a and the shared keys s_{a,b} with the other users b are passed through a pseudo-random number generator to obtain a private mask and a public mask; user a multiplies its local data x_a by the number of samples it holds, n_a, and the server-specified staleness attenuation coefficient, then adds the masks to obtain its output y_a; it generates a verification vector V_a from the parameters of the trusted third party and sends {y_a, V_a} to the server. The server only checks whether the number of users that sent data reaches the threshold t and records the current user set as U_3; if and only if |U_3| ≥ t holds it executes the next step and broadcasts the user list U_3 to all users;
(3) The server decodes and aggregates: user a receives the list of users participating in training and signs it with its signing key, producing σ'_a, which it sends to the server; the server receives the signatures, records the current user set as U_4, and packs the identity information and signatures and sends them to all users. After receiving the message, user a verifies the signatures σ'_b, b ∈ U_4, thereby judging that the user list has not been tampered with; it uses the one-time session key key_{a,b} to decrypt the ciphertext c_{b,a} previously forwarded by the server, obtains the identity information and the two groups of secret shares, and, after verifying the identity, confirms that nothing has been forged; it then packs the secret shares and sends them to the server. The server records the current user set as U_5 and recovers the private masks and the public masks of the dropped users from the secret shares sent by the users; the server aggregates the users' outputs and then subtracts the private masks and the public masks of the dropped users to obtain the weighted aggregation result z, and from the aggregation result it generates verification values K_a and verifies z, confirming that the users have not cheated.
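The signatures σ over the packed public keys and user lists in this claim can be realized with a Schnorr-style scheme; the published text does not reproduce the exact signature formula, so the group, hash, and equations below are assumptions chosen only to match the d_SK/d_PK structure and the per-signature random k described above:

import hashlib
import secrets

# Toy group: the multiplicative group mod a Mersenne prime (assumption,
# not a production-grade choice of parameters).
p = 2**127 - 1
g = 3

def h(*parts):
    dig = hashlib.sha256(b"||".join(parts)).digest()
    return int.from_bytes(dig, "big") % (p - 1)

def keygen():
    x = secrets.randbelow(p - 2) + 1       # signing key, the role of d_SK
    return x, pow(g, x, p)                 # verification key, the role of d_PK

def sign(x, msg):
    k = secrets.randbelow(p - 2) + 1       # per-signature random k in Z_q
    R = pow(g, k, p)
    e = h(R.to_bytes(16, "big"), msg)
    s = (k + e * x) % (p - 1)
    return R, s

def verify(P, msg, sig):
    # Accept iff g^s == R * P^e, which holds since s = k + e*x.
    R, s = sig
    e = h(R.to_bytes(16, "big"), msg)
    return pow(g, s, p) == (R * pow(P, e, p)) % p

x, P = keygen()
sig = sign(x, b"pk_a1||pk_a2")  # e.g. signing the packed public keys
assert verify(P, b"pk_a1||pk_a2", sig)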
5. The asynchronous federated learning privacy protection method of claim 4, wherein user a finally generating the output y_a and the verification vector V_a in step (2) comprises:
1) user a passes the previously selected random number β_a and the shared keys s_{a,b} with the other users b through a pseudo-random number generator PRG, obtaining a private mask PRG(β_a) and a public mask Σ PRG(s_{a,b});
2) user a multiplies its local data x_a by the number of samples it holds, n_a, and the server-specified staleness attenuation factor, a decay function of t − τ, where t − τ denotes the delay with which user a participates in the aggregation and α ∈ (0, 1);
3) user a outputs y_a, obtained by adding to the weighted local data the private mask PRG(β_a) and the pairwise public masks PRG(s_{a,b}), the pairwise masks being added or subtracted according to the order of a and b so that they cancel in the aggregate sum;
4) user a generates the verification vector V_a from the public parameters (δ, ρ) of the trusted third party by applying the homomorphic hash function HF_{δ,ρ}, where η is the learning rate of federated learning and Σ w_a is the sum of the gradients of user a's local data under the current global model w_G.
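The published text does not reproduce the formula of the homomorphic hash HF_{δ,ρ}; the Pedersen-style construction below, HF(x, r) = δ^x · ρ^r mod p, is therefore an assumption that only illustrates the homomorphic property the server exploits when checking the aggregate against the verification vectors:

p = 2**127 - 1      # toy prime field, as in the earlier sketches (assumption)
delta, rho = 5, 7   # stand-ins for the public homomorphic-hash keys (delta, rho)

def hf(x, r):
    # Pedersen-style homomorphic hash:
    # hf(x1 + x2, r1 + r2) == hf(x1, r1) * hf(x2, r2) mod p.
    return (pow(delta, x, p) * pow(rho, r, p)) % p

# Each user hashes its weighted contribution; the server can then check the
# hash of the aggregate against the product of the per-user hashes.
xs, rs = [3, 4, 5], [11, 12, 13]
prod = 1
for x, r in zip(xs, rs):
    prod = (prod * hf(x, r)) % p
assert prod == hf(sum(xs), sum(rs))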
6. The asynchronous federated learning privacy protection method of claim 4, wherein the server aggregation and verification in step (3) comprises:
1) after confirming that the user list is not forged, the user uses the one-time session key key_{a,b} to decrypt the ciphertext c_{b,a} previously forwarded by the server, obtaining the identity pair (a', b') together with the two secret shares, and verifies whether a = a' ∧ b = b' holds; if it holds, the ciphertext was indeed sent by the other user b;
2) the user packs the secret shares of the dropped users' private keys together with {β_{b,a} | b ∈ U_3} and sends them to the server, where U_2\U_3 denotes the set of dropped users and U_3 the set of users that did not drop out;
3) since the server has previously confirmed |U_4| ≥ t, the shares held by the remaining users suffice to recover the random number β_a of every user that did not drop out, and hence the private mask PRG(β_a), where a ∈ U_3; likewise, the remaining users' shares allow the private key of each dropped user to be recovered and, combined by key agreement with the public keys of the other users b, yield the shared keys s_{a,b} and hence the public mask Σ PRG(s_{a,b}), where a ∈ U_2\U_3 and b ∈ U_3;
4) the server obtains the final output z by summing the outputs y_a of the users that did not drop out, subtracting the recovered private masks, and cancelling the public masks left behind by the dropped users;
5) the server generates, from the public parameters (δ, ρ) of the trusted third party and the users' verification vectors V_a, a ∈ U_3, the verification values for the aggregate;
6) the server verifies that the homomorphic-hash verification equation holds and, if it does, outputs z;
for a dropped user, the server cannot recover the random number selected by that user and thus cannot obtain its local data; for the users that did not drop out, the public masks cancel in the summation, so the server likewise cannot obtain their local data.
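Putting the dropout handling of claim 6 together, the sketch below unmasks an aggregate after one user drops out; the recovery of β_a and s_{a,b} from secret shares is modelled by handing the seeds directly to the server, and the modulus, PRG, and names are the same toy assumptions as in the earlier sketches:

import hashlib

R = 2**32  # toy modulus (assumption), matching the earlier masking sketch

def prg(seed):
    return int.from_bytes(hashlib.sha256(seed).digest()[:4], "big")

def mask(uid, value, others, s_seed, b_seed):
    y = (value + prg(b_seed)) % R
    for b in others:
        m = prg(s_seed[frozenset((uid, b))])
        y = (y + m) % R if uid < b else (y - m) % R
    return y

def unmask(masked, survivors, dropped, beta_seed, s_seed):
    # Server-side recovery per claim 6: subtract the private mask of every
    # surviving user, then cancel the pairwise masks left dangling by
    # dropped users; masks between two survivors cancel on their own.
    z = sum(masked[a] for a in survivors) % R
    for a in survivors:
        z = (z - prg(beta_seed[a])) % R
    for a in dropped:
        for b in survivors:
            m = prg(s_seed[frozenset((a, b))])
            z = (z + m) % R if a < b else (z - m) % R
    return z

users, values = [1, 2, 3], {1: 10, 2: 20, 3: 30}
beta_seed = {u: b"beta%d" % u for u in users}
s_seed = {frozenset((a, b)): b"s%d-%d" % (a, b)
          for a in users for b in users if a < b}
masked = {u: mask(u, values[u], [v for v in users if v != u],
                  s_seed, beta_seed[u]) for u in users}
# User 3 drops out after key agreement; only users 1 and 2 upload.
z = unmask(masked, survivors=[1, 2], dropped=[3],
           beta_seed=beta_seed, s_seed=s_seed)
assert z == (values[1] + values[2]) % R

Note that the server needed the dropped user's pairwise seeds but never its private seed β_3, matching the security argument above.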
7. An asynchronous federated learning privacy protection system that applies the asynchronous federated learning privacy protection method of any one of claims 1-6, wherein the asynchronous federated learning privacy protection system comprises:
the key generation module, used for system initialization by the trusted third party, which selects random parameters in a finite field and publishes a series of parameters and the users' signature public keys; the users generate keys: a user participating in training receives a signature private key from the trusted third party to generate a signature, generates a public-private key pair using the public parameters, and packs and sends them to the server;
the ciphertext generation module, used for ciphertext generation by the user, which selects random parameters from the finite field, generates the sub-secrets corresponding to the random parameters and the sub-secrets corresponding to its private key according to a secret sharing algorithm, encrypts the sub-secrets and identity information with the one-time session keys, and sends the result to the server as ciphertexts;
the local data encryption module, used for generating coefficient-weighted, masked local data by the user, which derives masks from the previously selected random number and the shared keys generated between users, encrypts its coefficient-weighted local data by adding the masks, generates a verification vector, and sends the masked data to the server;
the aggregation verification module, used for decoding and aggregation by the server: each user decrypts the ciphertexts forwarded by the server to obtain the sub-secrets and sends the corresponding sub-secrets according to whether other users have dropped out; the server recovers the original secrets corresponding to the sub-secrets and then reconstructs the masks, thereby obtaining the aggregation result; the server generates a verification vector and verifies from the aggregation result that the users' sample counts are not falsified.
8. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the asynchronous federated learning privacy protection method as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the asynchronous federated learning privacy protection method as claimed in any one of claims 1 to 6.
10. An information data processing terminal, wherein the information data processing terminal is configured to implement the asynchronous federated learning privacy protection system as claimed in claim 7.
CN202210835092.XA 2022-07-16 2022-07-16 Asynchronous federal learning privacy protection method, system, medium, equipment and terminal Pending CN115277015A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210835092.XA CN115277015A (en) 2022-07-16 2022-07-16 Asynchronous federal learning privacy protection method, system, medium, equipment and terminal

Publications (1)

Publication Number Publication Date
CN115277015A true CN115277015A (en) 2022-11-01

Family

ID=83765706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210835092.XA Pending CN115277015A (en) 2022-07-16 2022-07-16 Asynchronous federal learning privacy protection method, system, medium, equipment and terminal

Country Status (1)

Country Link
CN (1) CN115277015A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116094993B (en) * 2022-12-22 2024-05-31 电子科技大学 Federal learning security aggregation method suitable for edge computing scene
CN116094993A (en) * 2022-12-22 2023-05-09 电子科技大学 Federal learning security aggregation method suitable for edge computing scene
CN115935438A (en) * 2023-02-03 2023-04-07 杭州金智塔科技有限公司 Data privacy intersection system and method
CN115935438B (en) * 2023-02-03 2023-05-23 杭州金智塔科技有限公司 Data privacy exchange system and method
CN116049897B (en) * 2023-03-30 2023-12-01 北京华隐熵策数据科技有限公司 Verifiable privacy protection federal learning method based on linear homomorphic hash and signcryption
CN116049897A (en) * 2023-03-30 2023-05-02 北京华隐熵策数据科技有限公司 Verifiable privacy protection federal learning method based on linear homomorphic hash and signcryption
CN116186784B (en) * 2023-04-27 2023-07-21 浙江大学 Electrocardiogram arrhythmia classification method and device based on federal learning privacy protection
CN116186784A (en) * 2023-04-27 2023-05-30 浙江大学 Electrocardiogram arrhythmia classification method and device based on federal learning privacy protection
CN116668024A (en) * 2023-07-28 2023-08-29 杭州趣链科技有限公司 Distributed key generation method and device, electronic equipment and storage medium
CN116668024B (en) * 2023-07-28 2023-10-31 武汉趣链数字科技有限公司 Distributed key generation method and device, electronic equipment and storage medium
CN117077816A (en) * 2023-10-13 2023-11-17 杭州金智塔科技有限公司 Training method and system of federal model
CN117077816B (en) * 2023-10-13 2024-03-29 杭州金智塔科技有限公司 Training method and system of federal model
CN117786768A (en) * 2024-02-23 2024-03-29 数据堂(北京)科技股份有限公司 Safety parameter exchange method for federal data learning
CN117786768B (en) * 2024-02-23 2024-05-14 数据堂(北京)科技股份有限公司 Safety parameter exchange method for federal data learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination