CN116796832A - Federated learning method, system and device with high availability under a personalized differential privacy scenario - Google Patents

Federated learning method, system and device with high availability under a personalized differential privacy scenario

Info

Publication number
CN116796832A
CN116796832A (application CN202310767207.0A)
Authority
CN
China
Prior art keywords
model
local
availability
aggregation
participant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310767207.0A
Other languages
Chinese (zh)
Inventor
郭晶晶
刘昌�
刘志全
马勇
徐贵泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202310767207.0A priority Critical patent/CN116796832A/en
Publication of CN116796832A publication Critical patent/CN116796832A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/46 Secure multiparty computation, e.g. millionaire problem

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The method protects the privacy of each participant's local model through local differential privacy and, at the same time, uses a federated aggregation algorithm based on a model availability theorem to extract more useful information from local models with higher availability, thereby improving the availability of the global model. The system and device implement privacy protection of participants' local models based on this method and improve the availability of the aggregated model while keeping the model availability parameters private; the invention offers good personalized privacy protection and high accuracy.

Description

Federated learning method, system and device with high availability under a personalized differential privacy scenario
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a federated learning method, system and device with high availability under a personalized differential privacy scenario.
Background
Federated learning is an emerging distributed machine learning framework in which models are trained by exchanging intermediate model parameters while the data never leaves its owner's local environment, thereby protecting the privacy of data owners. Federated learning is now widely applied in fields such as intelligent finance, smart healthcare, autonomous driving, wireless communication, and object detection.
Although federated learning protects participants' data privacy by having participants and the server exchange only model parameters, researchers have found that the exchanged model parameters can still leak the original training data. A common defense against such information leakage is differential privacy; existing work on differentially private federated learning includes local differential privacy and differential privacy based on stochastic gradient descent, among others. Under local differential privacy, each participant adds perturbation of a magnitude determined by its own local privacy budget to the local model parameters and sends the perturbed parameters to the server, thereby protecting its private information. Existing work on differentially private federated learning assumes that all participants share a uniform privacy budget. In practice, however, privacy budgets are personalized: different data subjects may have different privacy requirements owing to different privacy policies or privacy preferences, which leads to differences in privacy budgets across subjects. Moreover, the existing federated averaging aggregation algorithm weights the local models by client dataset size, which is unsuitable for personalized differential privacy scenarios: clients with smaller privacy budgets add larger perturbations, so their models have lower availability, and federated averaging then yields an aggregated model with low availability. How to protect the privacy of the model availability parameters while aggregating the global model based on them, so as to improve model availability, is a very challenging problem for which no good solution currently exists.
A personalized federated learning scheme is proposed in R. Hu, Y. Guo, H. Li, Q. Pei and Y. Gong, "Personalized Federated Learning With Differential Privacy," IEEE Internet of Things Journal, vol. 7, no. 10, pp. 9530-9539, Oct. 2020, doi:10.1109/JIOT.2020.2991416, where multi-task learning is accomplished by learning user characteristics; however, the security of the federated learning process is not considered there.
A personalized differential privacy federated learning scheme is proposed in G. Yang, S. Wang and H. Wang, "Federated Learning with Personalized Local Differential Privacy," 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS), Chengdu, China, 2021, pp. 484-489, doi:10.1109/ICCCS52626.2021.9449232, in which each user perturbs data according to its own privacy preference; however, that scheme considers neither the effect of user-personalized additive noise on the global data nor a quantification of the degree of privacy protection.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a federated learning method, system and device with high availability under a personalized differential privacy scenario. Privacy protection of each participant's local model is achieved through local differential privacy; at the same time, based on a model availability theorem, a federated aggregation algorithm extracts more useful information from local models with higher availability, improving the availability of the global model; within the algorithm, the availability parameters of the local models are protected by homomorphic encryption and secure multiparty computation.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In the highly available federated learning method under a personalized differential privacy scenario, each participant trains a local model on its local dataset and protects the local model through local differential privacy; at the same time, each participant computes the availability of its local model according to the model availability theorem and protects the plaintext availability parameter both with homomorphic encryption and with a secure multiparty computation (masking) technique. The participants then upload the differentially private local models to an aggregation server and upload the availability parameters, protected by homomorphic encryption and secure multiparty computation, to a utility server. The utility server computes the local model aggregation weights under ciphertext from the received encrypted availability parameters and sends them to the aggregation server. The aggregation server decrypts the ciphertext aggregation weights, performs model aggregation according to these weights, and finally distributes the aggregated model to all participants.
The federal learning method which is high in availability under the personalized differential privacy scene is established on the following system assumption:
1) The system contains n participants, whose set is denoted $U = \{u_1, u_2, \ldots, u_n\}$; the local datasets of the participants are $\{D_1, D_2, \ldots, D_n\}$, and the dataset of all participants is denoted $D$;
2) The local neural network models of all participants have the same type and structure; the privacy budgets of the participants may differ, and the local datasets may be independent and identically distributed (IID) or non-IID;
3) The aggregation server and the utility server are honest and do not collude.
A highly available federated learning method under a personalized differential privacy scenario specifically comprises the following steps:
step 1: system initialization: generate the public-private key pair for homomorphic encryption, and let pairs of participants negotiate keys to generate masks; the server initializes the global model and then issues it to the local participants;
step 2: each participant performs local model training on the global model issued by the server, achieves differential privacy protection of the local model information based on its local privacy budget, and uploads the differentially private local model parameters to the aggregation server;
step 3: each participant computes the availability of the differentially private local model from step 2 according to the model availability theorem, protects the local model availability parameter through homomorphic encryption and masking, and uploads the result to the utility server;
step 4: the utility server computes the aggregation weights under ciphertext from the information obtained in step 3 and sends them to the aggregation server;
step 5: the aggregation server decrypts the received aggregation weights with its private key, performs model aggregation according to the decrypted weights and the local model parameters, and finally issues the aggregated model to each participant;
step 6: steps 2 to 5 are repeated for the next round of model training until the number of training rounds reaches the predefined limit or the model converges; one round of this loop is sketched below.
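For orientation, one round of the loop (steps 2 to 5) can be rendered as the following minimal skeleton; every helper name here (train_dpsgd, model_availability, mask, encrypt, cipher_weights, and so on) is a hypothetical placeholder for the operation described in the corresponding step, not part of the patent.

```python
# Minimal sketch of one federated round (steps 2-5); all helper methods are
# hypothetical placeholders for the operations described in the steps above.
def federated_round(global_model, participants, utility_server, agg_server):
    masked_utils, enc_utils, local_models = [], [], []
    for p in participants:
        w_i = p.train_dpsgd(global_model)      # step 2: DPSGD local training
        u_i = p.model_availability(w_i)        # step 3: availability U_i^t
        masked_utils.append(p.mask(u_i))       # U_i^t plus pairwise masks
        enc_utils.append(p.encrypt(u_i))       # Enc_pk(U_i^t)
        local_models.append(w_i)
    # step 4: aggregation weights computed under ciphertext by the utility server
    enc_weights = utility_server.cipher_weights(masked_utils, enc_utils)
    # step 5: aggregation server decrypts the weights and aggregates the models
    weights = agg_server.decrypt(enc_weights)
    return agg_server.aggregate(local_models, weights)
```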
The specific method of system initialization in step 1 is as follows:
A key generation function KeyGen($1^\lambda$) $\rightarrow$ (pk, sk), where $\lambda$ is a security parameter, generates the public-private key pair (pk, sk); the private key sk is held by the aggregation server, and the public key pk is held by all participants and the utility server. Each pair of participants $u_i$ and $u_j$, $i \neq j$, runs a key agreement and mask generation algorithm to generate a series of masks $s_{i,j}$. The server initializes a global model and sends the model information to each participant; each participant selects its own local privacy budget.
Based on the local privacy budget, the specific method for achieving differential privacy protection of the local model information in step 2 is as follows:
From the DPSGD algorithm: given a data sampling rate $q$ and a number of training rounds $T$, there exist constants $c_1$ and $c_2$ such that, for any privacy budget $\epsilon < c_1 q^2 T$, if the noise standard deviation is chosen as $\sigma \geq c_2 \frac{q \sqrt{T \log(1/\delta)}}{\epsilon}$, then the algorithm (Algorithm 1) satisfies $(\epsilon, \delta)$-differential privacy.
The specific method of step 3 is as follows:
Define the local model availability $U^t$ in terms of the local loss $L(w_t, D_t)$, where $t$ is the training round of the current model, $D_t$ denotes the data each participant randomly samples from its local dataset $D$ in the $t$-th round of training, the size of the sampled dataset is $j_t = |D_t|$, and $w^*$ is the optimal model parameter that minimizes the local loss (the closed-form expressions for the availability and the local loss appear as display formulas in the original);
from the model availability theorem, it is possible to obtain: model loss L (w) for mu smooth and lambda strong convex t ,D t ) Let the learning rated is the dimension of the model parameters, C is the regularized boundary of the gradient, σ 2 For the variance of the added gaussian noise, the availability of the local model after training t rounds using the DPSGD algorithm satisfies:
from the above theorem, it can be seen that the availability of the local model of the participant is proportional to
The weight of each participant's local model parameters in the aggregation is determined from the local model availability of each participant; that is, the aggregation weight of participant $u_i$ in round $t$ is
$$p_i^t = \frac{U_i^t}{\sum_{j=1}^{n} U_j^t},$$
where $U_i^t$ is the model availability of participant $u_i$ in round $t$ and $\sum_{j=1}^{n} U_j^t$ is the total model availability of all participants in round $t$.
After computing the availability $U_i^t$ of its local model, participant $u_i$ adds its local masks to obtain the masked availability parameter $\tilde{U}_i^t = U_i^t + \sum_{j \neq i} s_{i,j}$, and at the same time obtains the availability parameter in ciphertext form $[\![U_i^t]\!] = \mathrm{Enc}_{pk}(U_i^t)$ through homomorphic encryption. The masked availability parameter $\tilde{U}_i^t$ and the homomorphically encrypted availability parameter $[\![U_i^t]\!]$ are then uploaded to the utility server.
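The participant side of step 3 can be sketched as follows, reusing the public CKKS context (public_ctx) and the pairwise masks from the initialization sketch; this is an illustrative rendering under those assumptions, not the patent's reference code.

```python
import tenseal as ts

def protect_availability(u_it, my_mask_sum, public_ctx):
    """Protect the availability parameter U_i^t two ways, as in step 3:
    a masked copy for the secure-multiparty sum, and a CKKS ciphertext.
    `my_mask_sum` is this participant's signed pairwise-mask sum for round t."""
    masked = u_it + my_mask_sum                      # \tilde{U}_i^t
    encrypted = ts.ckks_vector(public_ctx, [u_it])   # [[U_i^t]] under pk
    return masked, encrypted
```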
The specific method of step 4 is as follows:
First, the utility server adds all the masked model availability parameters it receives; because the pairwise masks satisfy $s_{i,j} = -s_{j,i}$, they cancel in the sum, eliminating the effect of the masks:
$$\sum_{i=1}^{n} \tilde{U}_i^t = \sum_{i=1}^{n} U_i^t.$$
The utility server then uses the encryption function with the public key pk to encrypt the plaintext $1 / \sum_{j=1}^{n} U_j^t$, obtaining $[\![1 / \sum_{j=1}^{n} U_j^t]\!]$; afterwards, by the ciphertext multiplication function, it obtains the aggregation weight under ciphertext
$$[\![p_i^t]\!] = [\![U_i^t]\!] \otimes [\![1 / \textstyle\sum_{j=1}^{n} U_j^t]\!]$$
and sends it to the aggregation server.
The specific method of step 5 is as follows:
First, the aggregation server decrypts the $[\![p_i^t]\!]$ obtained from the utility server to recover the aggregation weight of each participant's local model: the decryption function $\mathrm{Dec}_{sk}$ applied with the key sk to the ciphertext $[\![p_i^t]\!]$ yields the aggregation weight $p_i^t$. The aggregation server then performs model aggregation from the local model parameters $w_i^t$ received from the participants and the aggregation weights $p_i^t$:
$$w^{t+1} = \sum_{i=1}^{n} p_i^t w_i^t.$$
Finally, the global model parameters are issued to each participant.
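A matching aggregation-server sketch of step 5; decrypting with an explicitly passed secret key and representing models as state_dict-style parameter maps are assumptions of this sketch.

```python
def aggregate_models(enc_weights, local_models, secret_ctx):
    """Step 5 at the aggregation server: decrypt [[p_i^t]] with sk, then form
    the weighted sum of the local model parameters (dicts of name -> tensor)."""
    sk = secret_ctx.secret_key()
    weights = [w.decrypt(sk)[0] for w in enc_weights]  # plaintext p_i^t
    global_model = {}
    for name in local_models[0]:
        global_model[name] = sum(p * m[name] for p, m in zip(weights, local_models))
    return global_model
```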
A system based on the above federated learning method with high availability under a personalized differential privacy scenario comprises:
a system initialization module, for the system initialization of step 1, including generating the public-private key pair for homomorphic encryption and negotiating keys between pairs of participants to generate masks, after which the server initializes the global model and issues it to the local participants;
a local model training module, for each participant to perform local model training on the initialized global model as in step 2, achieve differential privacy protection of the local model information based on the local privacy budget, and upload the differentially private local model parameters to the aggregation server;
an availability calculation and protection module, for computing the availability of the local model as in step 3 and protecting the availability parameters with masks and homomorphic encryption;
an aggregation weight calculation module, for computing the aggregation weights by the method of step 4 without revealing the availability parameters;
and a model aggregation module, for performing the model aggregation of step 5.
A device based on the above federated learning method with high availability under a personalized differential privacy scenario comprises:
a memory, for storing a computer program;
and a processor, for executing the computer program; when executed by the processor, the computer program implements the highly available federated learning method of steps 1 to 6 under the personalized differential privacy scenario.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the federated learning method of steps 1 to 6, achieving privacy protection of each participant's local model and improving the availability of the aggregated model while keeping the model availability parameters private.
Compared with the prior art, the invention has the following advantages:
1. Aiming at the insufficient availability of the aggregated model in personalized differential privacy scenarios, a novel federated aggregation method is designed based on the model availability theorem; in the process, privacy protection of the local model availability parameters is achieved through homomorphic encryption and secure multiparty computation.
2. By introducing model availability parameters into the model aggregation process according to the model availability theorem, the method determines the local model weight of each participant, so that the server can extract more useful information from models with higher availability and obtain an aggregated model with higher availability.
3. The invention protects the privacy of the local model availability parameters with homomorphic encryption and secure multiparty computation, and can improve the availability of the aggregated model while guaranteeing the privacy of the model availability parameters.
In conclusion, the invention offers good personalized privacy protection and high accuracy.
Drawings
Fig. 1 is the federated learning system framework for the personalized differential privacy scenario.
Fig. 2 is the aggregated model accuracy obtained with the federated averaging aggregation algorithm.
Fig. 3 is the aggregated model accuracy when the privacy budgets obey mixed distribution 1, where figs. 3(a) to 3(e) show the accuracy for $\alpha$ = 0.1, 0.3, 0.5, 0.7, and 0.9, respectively.
Fig. 4 is the aggregated model accuracy when the privacy budgets obey mixed distribution 2, where figs. 4(a) to 4(e) show the accuracy for $\alpha$ = 0.1, 0.3, 0.5, 0.7, and 0.9, respectively.
Fig. 5 is the aggregated model accuracy when the privacy budgets obey mixed distribution 3, where figs. 5(a) to 5(d) show the accuracy for ($\alpha$, $\beta$, $\gamma$) = (20%, 20%, 60%), (20%, 60%, 20%), (60%, 20%, 20%), and (30%, 30%, 40%), respectively.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The theoretical and technical foundations of highly available federated learning under a personalized differential privacy scenario, as involved in the invention, are as follows:
a highly available federal learning method under personalized differential privacy scene specifically comprises the following steps:
referring to fig. 1, step 1: initialization of
Using KeyGen (1) λ ) A (pk, sk) key generating function, wherein lambda is a security parameter, the function outputs a public key pk and a private key sk shared by all the participants, a public-private key pair (pk, sk) is generated, the private key sk is held by an aggregation server, and the public key pk is held by all the participants and a utility server; participant u i and uj I.noteq.j, by KeyGen (1 λ ) The agree and mask generation algorithm generates a series of masksThe server initializes a global model and transmits model information to each participant; each participant selects a respective local privacy budget.
Step 2: each participant performs local model training based on its local data, the latest global model, and the DPSGD algorithm; differential privacy protection of the local model information is achieved based on the local privacy budget, and the locally differentially private model parameters are uploaded to the aggregation server.
Based on the local privacy budget, the specific method for achieving differential privacy protection of the local model information is:
From the DPSGD algorithm: given a data sampling rate $q$ and a number of training rounds $T$, there exist constants $c_1$ and $c_2$ such that, for any privacy budget $\epsilon < c_1 q^2 T$, if the noise standard deviation is chosen as $\sigma \geq c_2 \frac{q \sqrt{T \log(1/\delta)}}{\epsilon}$, then the algorithm (Algorithm 1) satisfies $(\epsilon, \delta)$-differential privacy.
Step 3: define the local model availability $U^t$ in terms of the local loss $L(w_t, D_t)$, where $t$ is the training round of the current model, $D_t$ denotes the data each participant randomly samples from its local dataset $D$ in the $t$-th round of training, the size of the sampled dataset is $j_t = |D_t|$, and $w^*$ is the optimal model parameter that minimizes the local loss.
From the model availability theorem: for a model loss $L(w_t, D_t)$ that is $\mu$-smooth and $\lambda$-strongly convex, with a suitably chosen learning rate, where $d$ is the dimension of the model parameters, $C$ is the regularization bound on the gradient, and $\sigma^2$ is the variance of the added Gaussian noise, the availability of the local model after training $t$ rounds with the DPSGD algorithm satisfies a bound in terms of these quantities; in particular, availability falls as the noise variance $\sigma^2$ grows.
The weight of each participant's local model parameters in the aggregation is determined from the local model availability; that is, the aggregation weight of participant $u_i$ in round $t$ is
$$p_i^t = \frac{U_i^t}{\sum_{j=1}^{n} U_j^t},$$
where $U_i^t$ is the model availability of participant $u_i$ in round $t$ and $\sum_{j=1}^{n} U_j^t$ is the total model availability of all participants in round $t$.
After computing the availability $U_i^t$ of its local model, participant $u_i$ adds its local masks to obtain the masked availability parameter $\tilde{U}_i^t = U_i^t + \sum_{j \neq i} s_{i,j}$, and at the same time obtains the ciphertext availability parameter $[\![U_i^t]\!] = \mathrm{Enc}_{pk}(U_i^t)$ through homomorphic encryption; both are then uploaded to the utility server.
Step 4: first, the utility server adds all the masked model availability parameters it receives, eliminating the effect of the masks:
$$\sum_{i=1}^{n} \tilde{U}_i^t = \sum_{i=1}^{n} U_i^t.$$
Then, the utility server encrypts $1 / \sum_{j=1}^{n} U_j^t$ under the public key pk to obtain $[\![1 / \sum_{j=1}^{n} U_j^t]\!]$, computes $[\![p_i^t]\!] = [\![U_i^t]\!] \otimes [\![1 / \sum_{j=1}^{n} U_j^t]\!]$ by ciphertext multiplication, and sends $[\![p_i^t]\!]$ to the aggregation server.
Step 5: the aggregation server performs model aggregation. First, it decrypts the $[\![p_i^t]\!]$ obtained from the utility server to recover the aggregation weight $p_i^t$ of each participant's local model. Then, from the received local model parameters $w_i^t$ and the aggregation weights $p_i^t$, it performs model aggregation:
$$w^{t+1} = \sum_{i=1}^{n} p_i^t w_i^t.$$
Finally, the global model parameters are issued to each participant.
Step 6: steps 2 to 5 are repeated for the next round of model training until the number of training rounds reaches the predefined limit or the model converges.
A system based on the above federated learning method with high availability under a personalized differential privacy scenario comprises:
a system initialization module, for initializing the federated learning structure and the participants' keys and masks as in step 1;
a local model training module, for each participant to perform local model training on the initialized global model as in step 2, achieve differential privacy protection of the local model information based on the local privacy budget, and upload the differentially private local model parameters to the aggregation server;
an availability calculation and protection module, for computing the availability of the local model as in step 3 and protecting the availability parameters with masks and homomorphic encryption;
an aggregation weight calculation module, for computing the aggregation weights by the method of step 4 without revealing the availability parameters;
and a model aggregation module, for performing the model aggregation of step 5.
A device based on the above federated learning method with high availability under a personalized differential privacy scenario comprises:
a memory, for storing a computer program;
and a processor, for executing the computer program; when executed by the processor, the computer program implements the highly available federated learning method of steps 1 to 6 under the personalized differential privacy scenario.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the federated learning method of steps 1 to 6, achieving privacy protection of each participant's local model and improving the availability of the aggregated model while keeping the model availability parameters private.
The effectiveness of the scheme is verified through a series of experiments; compared with FedAvg, which does not account for personalized differential privacy preferences, the proposed algorithm achieves higher aggregated model accuracy.
Experimental environment: the processor is an Intel(R) Core(TM) i5-10400 CPU @ 2.90 GHz with 16 GB of RAM, and the operating system is Windows 11 Home. The programming environment is Python 3.9.13, PyTorch 1.7.7, TenSEAL 0.3.12, and tensorflow-privacy 0.8.7.
All experiments use the MNIST dataset; the number of clients is 10, the dataset held by each participant has 200 samples, and the model trained by the system is a convolutional neural network comprising two convolutional layers and two linear layers (sketched below).
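A PyTorch sketch of such a network might look as follows; the kernel, channel, and hidden-layer sizes are assumptions, since the patent specifies only the layer counts.

```python
import torch.nn as nn

class MnistCnn(nn.Module):
    """Two convolutional layers + two linear layers, per the experimental
    setup; all layer sizes are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128), nn.ReLU(),  # 28x28 -> 7x7 after two pools
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```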
Privacy preferences: to simulate the possible privacy preferences of participants, we consider participant privacy preferences obeying a Gaussian distribution and multimodal distributions (mixtures of two or more different Gaussian distributions), where obeying a multimodal distribution means that the participants hold multiple privacy budgets (denoted $N_1$, $N_2$ and $N_3$, respectively). Table 1 gives the participant privacy preference distributions considered in the experiments; a simulation sketch follows the table.
Table 1: client privacy preference distribution
The experimental results are as follows:
Fig. 2 shows the model accuracy obtained with the federated averaging algorithm when the participants' privacy preferences follow Gaussian distributions 1, 2, and 3. As the privacy budget increases, the aggregated model accuracy improves: with a privacy budget of 0.5, the accuracy is about 10%, indicating that the small budget forces so much added noise that the model becomes unusable; with a privacy budget of 10, the accuracy reaches 96%, since the larger budget allows smaller noise and hence higher model accuracy.
Fig. 3 shows the model accuracy obtained with the availability-weighted aggregation algorithm and the federated averaging aggregation algorithm when the participant privacy preferences follow mixed distribution 1, for $\alpha$ values of 0.1, 0.3, 0.5, 0.7, and 0.9. As can be seen from fig. 3, the accuracy of the aggregated model obtained by the availability-weighted aggregation algorithm is higher than that obtained by federated averaging; as $\alpha$ increases, the effect of federated averaging approaches that of the availability-weighted algorithm, and as $\alpha$ decreases, the availability-weighted algorithm is clearly better.
Fig. 4 shows the model accuracy obtained with the availability-weighted aggregation algorithm and the federated averaging aggregation algorithm when the participant privacy preferences follow mixed distribution 2, for $\alpha$ values of 0.1, 0.3, 0.5, 0.7, and 0.9. As can be seen from fig. 4, the accuracy of the aggregated model obtained by the availability-weighted aggregation algorithm is higher than that obtained by federated averaging. When fewer than 50% of the clients have a privacy budget of 10, the accuracy of the aggregated model obtained by federated averaging is about 10%, indicating that participants with a privacy budget of 0.5 drag down the accuracy of the aggregated model; the availability-weighted aggregation algorithm markedly reduces the influence of the models uploaded by these participants and thereby improves the aggregated model accuracy.
Fig. 5 shows the model accuracy obtained with the availability-weighted aggregation algorithm and the federated averaging aggregation algorithm when the participant privacy preferences follow mixed distribution 3, for ($\alpha$, $\beta$, $\gamma$) values of (20%, 20%, 60%), (20%, 60%, 20%), (60%, 20%, 20%), and (30%, 30%, 40%). As can be seen in fig. 5, when the participants in federated learning have three different privacy preferences, the availability-weighted aggregation algorithm yields a higher aggregated model accuracy than the federated averaging aggregation algorithm.

Claims (10)

1. A highly available federated learning method under a personalized differential privacy scenario, characterized in that each participant trains a local model on its local dataset and protects the local model through local differential privacy; at the same time, each participant computes the availability of its local model according to the model availability theorem and protects the plaintext availability parameter both with homomorphic encryption and with secure multiparty computation; the participants then upload the differentially private local models to an aggregation server and upload the availability parameters, protected by homomorphic encryption and secure multiparty computation, to a utility server; the utility server computes the local model aggregation weights under ciphertext from the received encrypted availability parameters and sends them to the aggregation server; and the aggregation server decrypts the ciphertext aggregation weights, performs model aggregation according to these weights, and finally distributes the aggregated model to all participants.
2. The method of claim 1, wherein the highly available federated learning algorithm under the personalized differential privacy scenario is based on the following system assumptions:
1) The system contains n participants, whose set is denoted $U = \{u_1, u_2, \ldots, u_n\}$; the local datasets of the participants are $\{D_1, D_2, \ldots, D_n\}$, and the dataset of all participants is denoted $D$;
2) The local neural network models of all participants have the same type and structure; the privacy budgets of the participants may differ, and the local datasets may be independent and identically distributed (IID) or non-IID;
3) The aggregation server and the utility server are honest and do not collude.
3. A highly available federated learning method under a personalized differential privacy scenario according to claim 1 or 2, specifically comprising the following steps:
step 1: system initialization: generating the public-private key pair for homomorphic encryption, and negotiating keys between pairs of participants to generate masks; the server initializes the global model and then issues it to the local participants;
step 2: each participant performs local model training on the global model issued by the server, achieves differential privacy protection of the local model information based on its local privacy budget, and uploads the differentially private local model parameters to the aggregation server;
step 3: each participant computes the availability of the differentially private local model from step 2 according to the model availability theorem, protects the local model availability parameter through homomorphic encryption and masking, and uploads the result to the utility server;
step 4: the utility server computes the aggregation weights under ciphertext from the information obtained in step 3 and sends them to the aggregation server;
step 5: the aggregation server decrypts the received aggregation weights with its private key, performs model aggregation according to the decrypted weights and the local model parameters, and finally issues the aggregated model to each participant;
step 6: steps 2 to 5 are repeated for the next round of model training until the number of training rounds reaches the predefined limit or the model converges.
4. The highly available federated learning method under a personalized differential privacy scenario according to claim 3, wherein the specific method of system initialization in step 1 is:
a key generation function KeyGen($1^\lambda$) $\rightarrow$ (pk, sk), where $\lambda$ is a security parameter, generates the public-private key pair (pk, sk); the private key sk is held by the aggregation server, and the public key pk is held by all participants and the utility server; each pair of participants $u_i$ and $u_j$, $i \neq j$, runs a key agreement and mask generation algorithm to generate a series of masks $s_{i,j}$; the server initializes a global model and sends the model information to each participant; each participant selects its own local privacy budget.
5. The highly available federated learning method under a personalized differential privacy scenario according to claim 3, wherein the specific method for achieving differential privacy protection of the local model information based on the local privacy budget in step 2 is:
from the DPSGD algorithm: given a data sampling rate $q$ and a number of training rounds $T$, there exist constants $c_1$ and $c_2$ such that, for any privacy budget $\epsilon < c_1 q^2 T$, if the noise standard deviation is chosen as $\sigma \geq c_2 \frac{q \sqrt{T \log(1/\delta)}}{\epsilon}$, then the algorithm (Algorithm 1) satisfies $(\epsilon, \delta)$-differential privacy.
6. The highly available federated learning method under a personalized differential privacy scenario according to claim 3, wherein the specific method of step 3 is:
define the local model availability $U^t$ in terms of the local loss $L(w_t, D_t)$, where $t$ is the training round of the current model, $D_t$ denotes the data each participant randomly samples from its local dataset $D$ in the $t$-th round of training, the size of the sampled dataset is $j_t = |D_t|$, and $w^*$ is the optimal model parameter that minimizes the local loss;
from the model availability theorem: for a model loss $L(w_t, D_t)$ that is $\mu$-smooth and $\lambda$-strongly convex, with a suitably chosen learning rate, where $d$ is the dimension of the model parameters, $C$ is the regularization bound on the gradient, and $\sigma^2$ is the variance of the added Gaussian noise, the availability of the local model after training $t$ rounds with the DPSGD algorithm satisfies a bound in terms of these quantities, to which the availability of a participant's local model is proportional;
the weight of each participant's local model parameters in the aggregation is determined from the local model availability; that is, the aggregation weight of participant $u_i$ in round $t$ is
$$p_i^t = \frac{U_i^t}{\sum_{j=1}^{n} U_j^t},$$
where $U_i^t$ is the model availability of participant $u_i$ in round $t$ and $\sum_{j=1}^{n} U_j^t$ is the total model availability of all participants in round $t$;
after computing the availability $U_i^t$ of its local model, participant $u_i$ adds its local masks to obtain the masked availability parameter $\tilde{U}_i^t = U_i^t + \sum_{j \neq i} s_{i,j}$, and at the same time obtains the ciphertext availability parameter $[\![U_i^t]\!] = \mathrm{Enc}_{pk}(U_i^t)$ through homomorphic encryption; the masked availability parameter and the homomorphically encrypted availability parameter are then uploaded to the utility server.
7. The highly available federated learning method under a personalized differential privacy scenario according to claim 3, wherein the specific method of step 4 is:
first, the utility server adds all the masked model availability parameters it receives, eliminating the effect of the masks:
$$\sum_{i=1}^{n} \tilde{U}_i^t = \sum_{i=1}^{n} U_i^t;$$
then, the utility server uses the encryption function with the public key pk to encrypt the plaintext $1 / \sum_{j=1}^{n} U_j^t$, obtaining $[\![1 / \sum_{j=1}^{n} U_j^t]\!]$; afterwards, by the ciphertext multiplication function it obtains the aggregation weight under ciphertext $[\![p_i^t]\!] = [\![U_i^t]\!] \otimes [\![1 / \sum_{j=1}^{n} U_j^t]\!]$ and sends it to the aggregation server.
8. The highly available federated learning method under a personalized differential privacy scenario according to claim 3, wherein the specific method of step 5 is:
first, the aggregation server decrypts the $[\![p_i^t]\!]$ obtained from the utility server to recover the aggregation weight of each participant's local model: the decryption function $\mathrm{Dec}_{sk}$ applied with the key sk to the ciphertext $[\![p_i^t]\!]$ yields the aggregation weight $p_i^t$; the aggregation server then performs model aggregation from the local model parameters $w_i^t$ received from the participants and the aggregation weights $p_i^t$:
$$w^{t+1} = \sum_{i=1}^{n} p_i^t w_i^t;$$
finally, the global model parameters are issued to each participant.
9. A system based on the federated learning method with high availability under a personalized differential privacy scenario according to any one of claims 1 to 8, comprising:
a system initialization module, for the system initialization of step 1, including generating the public-private key pair for homomorphic encryption and negotiating keys between pairs of participants to generate masks, after which the server initializes the global model and issues it to the local participants;
a local model training module, for each participant to perform local model training on the initialized global model as in step 2, achieve differential privacy protection of the local model information based on the local privacy budget, and upload the differentially private local model parameters to the aggregation server;
an availability calculation and protection module, for computing the availability of the local model as in step 3 and protecting the availability parameters with masks and homomorphic encryption;
an aggregation weight calculation module, for computing the aggregation weights by the method of step 4 without revealing the availability parameters;
and a model aggregation module, for performing the model aggregation of step 5.
10. A device based on the federated learning method with high availability under a personalized differential privacy scenario according to any one of claims 1 to 8, comprising:
a memory, for storing a computer program;
and a processor, for executing the computer program; when executed by the processor, the computer program implements the highly available federated learning method of steps 1 to 6 under the personalized differential privacy scenario.
CN202310767207.0A 2023-06-27 2023-06-27 Federated learning method, system and device with high availability under a personalized differential privacy scenario Pending CN116796832A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310767207.0A CN116796832A (en) Federated learning method, system and device with high availability under a personalized differential privacy scenario

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310767207.0A CN116796832A (en) Federated learning method, system and device with high availability under a personalized differential privacy scenario

Publications (1)

Publication Number Publication Date
CN116796832A true CN116796832A (en) 2023-09-22

Family

ID=88043383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310767207.0A CN116796832A (en) 2023-06-27 2023-06-27 Federated learning method, system and device with high availability under a personalized differential privacy scenario

Country Status (1)

Country Link
CN (1) CN116796832A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117171814A (en) * 2023-09-28 2023-12-05 数力聚(北京)科技有限公司 Federal learning model integrity verification method, system, equipment and medium based on differential privacy
CN117196017A (en) * 2023-09-28 2023-12-08 数力聚(北京)科技有限公司 Federal learning method, system, equipment and medium for lightweight privacy protection and integrity verification
CN117171814B (en) * 2023-09-28 2024-06-04 数力聚(北京)科技有限公司 Federal learning model integrity verification method, system, equipment and medium based on differential privacy
CN117592584A (en) * 2023-12-11 2024-02-23 滇西应用技术大学 Random multi-model privacy protection method based on federal learning

Similar Documents

Publication Publication Date Title
CN116796832A (en) Federated learning method, system and device with high availability under a personalized differential privacy scenario
CN109684855B (en) Joint deep learning training method based on privacy protection technology
CN108712260B (en) Multi-party deep learning computing agent method for protecting privacy in cloud environment
CN109657489B (en) Privacy protection set intersection two-party secure calculation method and system
CN113221105B (en) Robustness federated learning algorithm based on partial parameter aggregation
CN110572253A (en) Method and system for enhancing privacy of federated learning training data
CN110166258B (en) Group key negotiation method based on privacy protection and attribute authentication
CN111563265A (en) Distributed deep learning method based on privacy protection
CN112597542B (en) Aggregation method and device of target asset data, storage medium and electronic device
CN112818369B (en) Combined modeling method and device
CN111049647B (en) Asymmetric group key negotiation method based on attribute threshold
WO2021106077A1 (en) Update method for neural network, terminal device, calculation device, and program
CN111581648B (en) Method of federal learning to preserve privacy in irregular users
CN107248980A (en) Mobile solution commending system and method with privacy protection function under cloud service
CN113240129A (en) Multi-type task image analysis-oriented federal learning system
CN115841133A (en) Method, device and equipment for federated learning and storage medium
Kumar Technique for security of multimedia using neural network
CN108259185A (en) A kind of group key agreement system and method for group communication moderate resistance leakage
CN117171814B (en) Federal learning model integrity verification method, system, equipment and medium based on differential privacy
CN103346999B (en) A kind of NOT of support operator also has the CP-ABE method of CCA safety
CN116865938A (en) Multi-server federation learning method based on secret sharing and homomorphic encryption
CN117421762A (en) Federal learning privacy protection method based on differential privacy and homomorphic encryption
CN117395067A (en) User data privacy protection system and method for Bayesian robust federal learning
CN110336775B (en) Quantum group authentication method based on Grover algorithm
CN111159727A (en) Multi-party collaborative Bayes classifier safety generation system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination