CN112668044B - Privacy protection method and device for federal learning - Google Patents

Publication number: CN112668044B (application CN202011523140.9A; published as CN112668044A)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 牛犇, 李凤华, 张立坤, 陈亚虹
Applicant and current assignee: Institute of Information Engineering of CAS
Legal status: Active (granted)

Abstract

The embodiment of the invention provides a privacy protection method and device for federal learning, comprising the following steps: parameter setting, data dividing, first training, second training, first calculating, second calculating and countermeasure sample generating. The embodiment adopts the idea of countermeasure (adversarial) samples: a small amount of noise is added to the parameter update to disturb its distribution characteristics, so that after the update passes through a privacy attribute inference model, the privacy inference result is output randomly according to the probability distribution expected by the user. This resists privacy attribute inference attacks and thereby alleviates the privacy attribute disclosure problem of federal learning.

Description

Privacy protection method and device for federal learning
Technical Field
The invention relates to the technical field of computers, in particular to a privacy protection method and device for federal learning.
Background
Federal learning (FL) is a distributed deep learning method that balances efficiency, accuracy, and privacy to a certain extent, and has therefore attracted much attention. The main process of federal learning is as follows: the server randomly initializes the global model parameters and distributes the model to each user; the users train the model locally with their own data and send the updated model parameters back to the server; the server updates the global model according to these updates and redistributes it to the users, after which a new round of iterative updating begins. In this process, the server only collects the parameters of the user models rather than the original data, which facilitates data privacy protection. In addition, because different users participate in training together, model generalization and training efficiency are enhanced while personalized model training on the user side is still achieved.
However, although users participating in federal learning do not have to submit their training data directly to the server, the parameters they send can still indirectly cause privacy leaks. This privacy disclosure problem is particularly prominent when user data is not identically distributed. In that case, the data of different users often carry different privacy attributes; for example, for users of different genders, races, or income levels, shopping data often differ in distribution, and this difference in turn affects the parameter distribution of the target model during each user's local training. Exploiting the difference in parameter distribution among users, an attacker (including an external malicious user or an untrusted server) can infer information such as a user's gender, race, or income level, posing a great threat to user privacy. The specific inference method is to take some known parameter vectors as training data, use their corresponding privacy attributes as training labels, and train a privacy attribute classifier, so that the privacy attributes of user data can be inferred from the parameters a user sends. Therefore, in a federal learning scenario, how to sufficiently ensure the accuracy of the global model while preventing the privacy leakage caused by model parameter exchange is an urgent problem to be solved.
To address this privacy problem, various protection technologies have been proposed, such as homomorphic encryption, secure multi-party computation, and differential privacy, all of which protect user privacy to some extent. Among these schemes, the homomorphic-encryption-based method provides reliable security and accuracy guarantees. However, homomorphic encryption algorithms often have high computational complexity, low efficiency, high communication overhead for the parameters, and a complex key management mechanism, making them difficult to popularize and apply. Secure multi-party computation and differential privacy are widely applied because of their strong theoretical support and ease of implementation. Abadi et al. proposed applying differential privacy to the gradient descent algorithm of deep learning to ensure that the true training data cannot be recovered from the model parameters; however, this algorithm requires each user to add noise to the gradient independently, which has a large impact on the accuracy of the aggregated model. Bonawitz et al. designed a federal learning architecture using a secure multi-party computation protocol, ensuring that all users participating in learning share the same global model while the server can only obtain the updated global model and never the real parameters submitted by any user; this scheme, however, cannot resist collusion attacks among participants. To make up for these defects, Truex et al. combined differential privacy with multi-party computation, reducing the amount of noise, preserving model accuracy and privacy, and resisting collusion threats among users. In conclusion, existing federal learning privacy protection schemes struggle to balance accuracy, privacy, and universality. Protection schemes based on differential privacy must add a large amount of noise to the gradient during the user's model training to guarantee privacy, sacrificing the accuracy of the aggregated global model, and differential privacy cannot protect the privacy of the training data in a targeted manner. When differential privacy is combined with secure multi-party computation, even though accuracy improves, the correct computation of each round of the secure multi-party computation protocol requires a certain number of users to perform model training and aggregation online simultaneously, so it is not suitable for the update mode of asynchronous federal learning; in addition, the homomorphic encryption introduced into the multi-party computation greatly reduces operational efficiency.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a privacy protection method and device for federal learning.
In a first aspect, an embodiment of the present invention provides a privacy protection method for federal learning, including:
setting parameters, namely setting parameters according to actual requirements and outputting the parameters; the parameters include a set of privacy attributes, an expected output probability vector for noise, and an availability budget; the privacy attribute set refers to a set of privacy attribute values to be protected, the expected output probability vector of the noise refers to an expected output privacy inference result, and the availability budget refers to availability budget constraint of noise amount;
a data dividing step, namely dividing a data set based on the privacy attribute set, and determining m training data subsets corresponding to the privacy attributes;
a first training step, training m target models according to the m training data subsets corresponding to the privacy attributes, and determining a model iteration parameter data set;
a second training step, namely, taking the model iteration parameter data set as training data, taking the privacy attribute value corresponding to each set of parameters in the model iteration parameter data set as a data tag to train a privacy attribute inference model, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier;
a first calculation step of determining a minimized noise by a fast gradient method based on the privacy attribute inference result;
a second calculation step of determining a noise actual output probability distribution based on the minimized noise, the expected output probability vector of the noise, and the availability budget;
and generating a countermeasure sample, generating a countermeasure sample based on the noise actual output probability distribution, and sending the countermeasure sample to a parameter server so that the server performs parameter updating based on the countermeasure sample.
Further, the first calculating step, based on the privacy attribute inference result, determines the minimized noise by using a fast gradient method, specifically includes:
finding, by the fast gradient method, a set of noises r = {r_1, r_2, ..., r_m} such that, after noise r_i is added to the parameter x to be transmitted, the resulting countermeasure sample x_i^* is assigned privacy attribute value i by the privacy attribute inference model f;
the fast gradient method is calculated as:
x_i^* = clip(x - ε · sign(∇_x l(f(x), i)))
the noise is calculated as:
r_i = x_i^* - x
where r represents the set of noises, r_1, r_2, ..., r_m and r_i all represent noises, x represents the parameter to be transmitted, x_i^* represents the generated countermeasure sample, f represents the privacy attribute inference model, i represents the target privacy attribute value, ε is the availability budget, and l represents a loss function used to calculate the distance between the privacy attribute f(x) output by the privacy attribute inference model and the target privacy attribute i.
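As a concrete illustration of this fast gradient step, the sketch below (assuming a differentiable privacy attribute inference model f implemented in PyTorch; the function and argument names are not taken from the patent) generates one targeted noise r_i per privacy attribute value:

```python
# Illustrative sketch, not code from the patent: a targeted fast-gradient step that
# pushes the parameter x toward privacy attribute class i, then returns r_i = x_i^* - x.
import torch
import torch.nn.functional as F

def targeted_fgm_noise(f, x, target_attr, epsilon, clip_min, clip_max):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(f(x.unsqueeze(0)), torch.tensor([target_attr]))  # l(f(x), i)
    loss.backward()
    # step against the gradient toward the target class, then clip to the valid domain
    x_adv = torch.clamp(x - epsilon * x.grad.sign(), clip_min, clip_max)
    return (x_adv - x).detach()                                             # r_i = x_i^* - x

# One noise per privacy attribute value:
# noises = [targeted_fgm_noise(f, x, i, eps, lo, hi) for i in range(m)]
```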
Further, the second calculating step determines an actual output probability distribution of the noise based on the minimized noise, the expected output probability vector of the noise, and the availability budget, and specifically includes:
calculating the noise actual output probability distribution q based on the minimized noise, the expected output probability vector p of the noise and the availability budget by adopting a first relation model; the first relation model is as follows:
min KL(q, p)  s.t.  Σ_i q_i · ||r_i|| ≤ ε,  i ∈ {1, 2, ..., m}
where min KL(q, p) is the optimization objective, q is the vector of actual output probabilities of the noises, p represents the expected output probability vector of the noises, KL(q, p) is the KL divergence between the two distributions q and p, the constraint Σ_i q_i · ||r_i|| ≤ ε requires the actual output probabilities q to satisfy the user's availability budget, ||r_i|| is the L2 norm of the noise vector r_i, and q_i represents the actual output probability of the i-th noise.
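This constrained KL minimization can be solved with any off-the-shelf optimizer; the sketch below uses SciPy's SLSQP solver as one possible (assumed) choice, since the patent only states the optimization problem and not a solver:

```python
# Sketch of the probability assignment: minimize KL(q, p) subject to the availability
# budget and to q being a probability vector. Argument names are illustrative.
import numpy as np
from scipy.optimize import minimize

def actual_output_probabilities(p, noise_norms, epsilon):
    p = np.asarray(p, dtype=float)
    noise_norms = np.asarray(noise_norms, dtype=float)
    m = len(p)
    kl = lambda q: float(np.sum(q * np.log(q / p)))                    # KL(q, p)
    constraints = [
        {"type": "eq",   "fun": lambda q: np.sum(q) - 1.0},            # q is a distribution
        {"type": "ineq", "fun": lambda q: epsilon - q @ noise_norms},  # sum_i q_i*||r_i|| <= eps
    ]
    bounds = [(1e-6, 1.0)] * m                                         # q_i in (0, 1]
    res = minimize(kl, x0=np.full(m, 1.0 / m), bounds=bounds, constraints=constraints)
    return res.x
```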
Further, the generating of the countermeasure sample step generates the countermeasure sample based on the actual output probability distribution of the noise, and sends the countermeasure sample to a parameter server, so that the server performs parameter update based on the countermeasure sample, specifically including:
when the user needs to send the parameter x to the server, randomly selecting a noise r_i from r = {r_1, r_2, ..., r_m} according to the noise actual output probability distribution q, and generating the countermeasure sample x_i^* = x + r_i;
and sending the countermeasure sample x_i^* = x + r_i to the parameter server, so that the server performs the parameter update based on the countermeasure sample x_i^* = x + r_i.
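For completeness, a minimal sketch of this sampling-and-upload step follows; send_to_server is only a placeholder for whatever upload call the federated learning client actually uses:

```python
# Sketch of countermeasure-sample generation and upload; names are illustrative.
import numpy as np

def send_perturbed_update(x, noises, q, send_to_server):
    q = np.asarray(q, dtype=float)
    q = q / q.sum()                          # guard against numerical drift in q
    i = np.random.choice(len(noises), p=q)   # pick noise r_i with probability q_i
    x_adv = x + noises[i]                    # countermeasure sample x_i^* = x + r_i
    send_to_server(x_adv)                    # the server treats x_adv as a normal update
    return x_adv
```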
In a second aspect, an embodiment of the present invention provides a privacy protection apparatus for federal learning, including:
the parameter setting module is used for setting parameters according to actual requirements and outputting the parameters; the parameters include a set of privacy attributes, an expected output probability vector for noise, and an availability budget; the privacy attribute set refers to a set of privacy attribute values to be protected, the expected output probability vector of the noise refers to an expected output privacy inference result, and the availability budget refers to availability budget constraint of noise amount;
the data dividing module is used for dividing a data set based on the privacy attribute set and determining m training data subsets corresponding to the privacy attributes;
the first training module trains m target models according to the m training data subsets corresponding to the privacy attributes and determines a model iteration parameter data set;
the second training module is used for training a privacy attribute inference model by taking the model iteration parameter data set as training data and the privacy attribute value corresponding to each set of parameter in the model iteration parameter data set as a data tag, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier;
a first calculation module for determining a minimized noise by a fast gradient method based on the privacy attribute inference result;
a second calculation module that determines a noise actual output probability distribution based on the minimized noise, a desired output probability vector of the noise, and an availability budget;
and the countermeasure sample generation module generates countermeasure samples based on the noise actual output probability distribution and sends the countermeasure samples to a parameter server so that the server performs parameter updating based on the countermeasure samples.
Further, the first calculating module is specifically configured to:
finding, by the fast gradient method, a set of noises r = {r_1, r_2, ..., r_m} such that, after noise r_i is added to the parameter x to be transmitted, the resulting countermeasure sample x_i^* is assigned privacy attribute value i by the privacy attribute inference model f;
the fast gradient method is calculated as:
x_i^* = clip(x - ε · sign(∇_x l(f(x), i)))
the noise is calculated as:
r_i = x_i^* - x
where r represents the set of noises, r_1, r_2, ..., r_m and r_i all represent noises, x represents the parameter to be transmitted, x_i^* represents the generated countermeasure sample, f represents the privacy attribute inference model, i represents the target privacy attribute value, ε is the availability budget, and l represents a loss function used to calculate the distance between the privacy attribute f(x) output by the privacy attribute inference model and the target privacy attribute i.
Further, the second calculation module is specifically configured to:
calculating the noise actual output probability distribution q based on the minimized noise, the expected output probability vector p of the noise and the availability budget by adopting a first relation model; the first relation model is as follows:
min KL(q, p)  s.t.  Σ_i q_i · ||r_i|| ≤ ε,  i ∈ {1, 2, ..., m}
where min KL(q, p) is the optimization objective, q is the vector of actual output probabilities of the noises, p represents the expected output probability vector of the noises, KL(q, p) is the KL divergence between the two distributions q and p, the constraint Σ_i q_i · ||r_i|| ≤ ε requires the actual output probabilities q to satisfy the user's availability budget, ||r_i|| is the L2 norm of the noise vector r_i, and q_i represents the actual output probability of the i-th noise.
Further, the countermeasure sample generation module is specifically configured to:
when the user needs to send the parameter x to the server, randomly select a noise r_i from r = {r_1, r_2, ..., r_m} according to the noise actual output probability distribution q, and generate the countermeasure sample x_i^* = x + r_i;
and send the countermeasure sample x_i^* = x + r_i to the parameter server, so that the server performs the parameter update based on the countermeasure sample x_i^* = x + r_i.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the privacy protection method for federated learning according to the first aspect when executing the program.
In a fourth aspect, embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the federal learning oriented privacy protection method as described in the first aspect.
According to the technical scheme, the privacy protection method and device for federal learning provided by the embodiment of the invention have the advantages that through the parameter setting step, parameters are set according to actual requirements, and the parameters are output; the parameters include a set of privacy attributes, an expected output probability vector for noise, and an availability budget; the privacy attribute set refers to a set of privacy attribute values to be protected, the expected output probability vector of the noise refers to an expected output privacy inference result, and the availability budget refers to availability budget constraint of noise amount; a data dividing step, namely dividing a data set based on the privacy attribute set, and determining m training data subsets corresponding to the privacy attributes; a first training step, training m target models according to the m training data subsets corresponding to the privacy attributes, and determining a model iteration parameter data set; a second training step, namely, taking the model iteration parameter data set as training data, taking the privacy attribute value corresponding to each set of parameters in the model iteration parameter data set as a data tag to train a privacy attribute inference model, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier; a first calculation step of determining a minimized noise by a fast gradient method based on the privacy attribute inference result; a second calculation step of determining a noise actual output probability distribution based on the minimized noise, the expected output probability vector of the noise, and the availability budget; and generating a countermeasure sample, generating the countermeasure sample based on the actual noise output probability distribution, and sending the countermeasure sample to a parameter server, so that the server updates parameters based on the countermeasure sample, thereby facing a federal learning scene, effectively realizing the balance of privacy and accuracy, and being suitable for two updating mechanisms, namely a synchronous updating mechanism and an asynchronous updating mechanism.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a privacy protection method for federated learning according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating setting parameters according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a privacy protecting apparatus for federal learning according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a privacy protecting apparatus for federal learning according to another embodiment of the present invention;
fig. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The privacy protection method for federal learning provided by the invention will be explained and illustrated in detail by specific embodiments.
Fig. 1 is a schematic flowchart of a privacy protection method for federated learning according to an embodiment of the present invention; as shown in fig. 1, the method includes:
step 101: setting parameters, namely setting parameters according to actual requirements and outputting the parameters; the parameters include a set of privacy attributes, an expected output probability vector for noise, and an availability budget; the privacy attribute set refers to a set of privacy attribute values to be protected, the expected output probability vector of the noise refers to a privacy inference result expected to be output, and the availability budget refers to an availability budget constraint of noise amount.
In this step, it can be understood that the first step sets parameters, which include three parts, a privacy attribute set, a desired output probability vector of noise, and an availability budget. A privacy attribute set refers to a set of all privacy attribute values that a user specifies need to protect. The expected output probability vector refers to the probability with which the user desires to select one of the generated noises, for example, the user desires to output any privacy inference result with an equal probability uniform distribution, so as to prevent an attacker from correctly inferring the privacy attributes of the user. The availability budget constrains the amount of noise added by the user, guaranteeing the availability of countermeasure samples.
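A minimal sketch of how these three user-set parameters might be held on the client side is given below; the class name, field names, and example values (including the budget) are illustrative assumptions, with the skin-colour attribute set borrowed from the example later in this description:

```python
# Illustrative sketch of the user-side protection parameters, not code from the patent.
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class ProtectionConfig:
    privacy_attributes: List[str]            # privacy attribute values to protect (m values)
    expected_output_probs: Sequence[float]   # p: length m, sums to 1 (e.g. uniform 1/m)
    availability_budget: float               # epsilon: bounds the expected noise magnitude

config = ProtectionConfig(
    privacy_attributes=["black", "yellow", "white"],   # example set from this description
    expected_output_probs=[1/3, 1/3, 1/3],             # uniform expected inference output
    availability_budget=0.5,                           # assumed value for illustration
)
```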
Step 102: and a data dividing step, namely dividing the data set based on the privacy attribute set and determining m training data subsets corresponding to the privacy attributes.
In this step, it can be understood that, in the second step, the user data set is divided, and it is assumed that the user-defined privacy attribute set includes m privacy attribute values, each piece of data corresponds to only one privacy attribute value, and the privacy attribute set does not coincide with the target attribute set. The training data set is partitioned into m subsets according to these privacy attributes. Wherein m is a positive integer greater than zero.
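The data dividing step can be realized as a simple grouping pass; in the sketch below, the record layout (a dict with a "privacy_attr" key) is an assumption made purely for illustration:

```python
# Sketch of the data dividing step: group every record under its single privacy
# attribute value, yielding the m training subsets X_1 ... X_m.
from collections import defaultdict

def split_by_privacy_attribute(dataset, privacy_attributes):
    subsets = defaultdict(list)
    for record in dataset:
        attr = record["privacy_attr"]                 # each record has exactly one value
        subsets[attr].append(record)
    return [subsets[a] for a in privacy_attributes]   # m training subsets
```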
Step 103: and a first training step, namely training m target models according to the m training data subsets corresponding to the privacy attributes, and determining a model iteration parameter data set.
In this step, it can be understood that, in the third step, the target model is trained, and the user trains m target models respectively by using the m training subsets.
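One possible (assumed) way to realize this training step and collect the model iteration parameter data set is sketched below; make_model and train_one_epoch are placeholders for the user's own model constructor and local training loop:

```python
# Sketch of the first training step: train one target model per privacy subset and
# record the flattened parameters after every local iteration.
import torch

def build_iteration_parameter_dataset(subsets, make_model, train_one_epoch, rounds):
    param_vectors, attr_labels = [], []
    for attr_idx, subset in enumerate(subsets):          # one target model T_i per subset X_i
        model = make_model()
        for _ in range(rounds[attr_idx]):                # tau_i local iterations
            train_one_epoch(model, subset)
            flat = torch.cat([p.detach().flatten() for p in model.parameters()])
            param_vectors.append(flat)
            attr_labels.append(attr_idx)                 # label = privacy attribute value i
    return param_vectors, attr_labels
```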
Step 104: a second training step, namely, taking the model iteration parameter data set as training data, taking the privacy attribute value corresponding to each set of parameters in the model iteration parameter data set as a data tag to train a privacy attribute inference model, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier.
In this step, it can be understood that, in the fourth step, the privacy attribute inference model is trained, parameters in each round of update iteration of the target model are used as training data of the privacy attribute inference model, and corresponding privacy attribute values are used as data labels, so that an m-class privacy attribute classifier is trained.
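The privacy attribute inference model is simply an m-class classifier over flattened parameter vectors; the sketch below trains a small fully connected network in PyTorch, whose architecture and hyper-parameters are illustrative assumptions rather than anything specified by the embodiment:

```python
# Sketch of the second training step: an m-class privacy attribute classifier.
import torch
import torch.nn as nn

def train_inference_model(param_vectors, attr_labels, m, epochs=50, lr=1e-3):
    X = torch.stack([v.float() for v in param_vectors])
    y = torch.as_tensor(attr_labels, dtype=torch.long)       # labels in {0, ..., m-1}
    f = nn.Sequential(nn.Linear(X.shape[1], 128), nn.ReLU(), nn.Linear(128, m))
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(f(X), y)
        loss.backward()
        opt.step()
    return f    # differentiable, so gradient-based noise generation can query it
```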
Step 105: a first calculation step of determining a minimized noise by a fast gradient method based on the privacy attribute inference result.
In this step, it can be understood that, in the fifth step, the noise is calculated: after the privacy attribute inference model is obtained, a set of minimized noises is found through the fast gradient method, so that the parameter to be sent becomes a set of countermeasure samples, each corresponding to one of the m privacy attributes.
Step 106: a second calculation step of determining a noise actual output probability distribution based on the minimized noise, the desired output probability vector of the noise and the availability budget.
In this step, it can be understood that the sixth step calculates the actual output probability of the noise. The optimization goal is to minimize the distance between the output distribution produced by the privacy attribute inference model and the distribution expected by the user, thereby quantifying the privacy protection effect of the algorithm; meanwhile, the actual output probability must satisfy the user's availability budget constraint.
Step 107: and generating a countermeasure sample, generating a countermeasure sample based on the noise actual output probability distribution, and sending the countermeasure sample to a parameter server so that the server performs parameter updating based on the countermeasure sample.
In this step, it can be understood that the seventh step outputs a challenge sample. And when the user participates in the federal learning, randomly selecting one sample from a group of confrontation samples corresponding to the user according to the output probability calculated in the sixth step, and sending the selected sample to the parameter server, so that the server updates the parameters based on the confrontation samples.
In this embodiment, it should be noted that the main purpose of this embodiment is to add countersample noise in the parameter update of the model, so that the countersample noise passes through the privacy attribute inference model and then outputs the privacy inference result with a specific probability distribution (e.g., uniform distribution), thereby alleviating the privacy attribute disclosure problem of the user in federal learning. The privacy protection method for federal learning provided by this embodiment is implemented by users participating in federal learning, and the server still performs parameter aggregation according to a general manner (e.g., weighted average, etc.).
According to the technical scheme, the privacy protection method for federal learning provided by the embodiment of the invention is oriented to the federal learning scenario, can effectively balance privacy and accuracy, and is suitable for both synchronous and asynchronous updating mechanisms. Because existing perturbation-based privacy protection methods suffer from low accuracy and cannot measure the privacy protection effect, this embodiment improves the way the current parameters are perturbed, reducing the influence of the privacy mechanism on parameter usability while ensuring data privacy, and improving the accuracy of the aggregated model. The model parameters actually submitted by the user are perturbed so that the privacy attribute inference result given by the inference model deviates from the user's real privacy attribute, while the perturbation does not excessively affect the accuracy of the finally aggregated global model. In addition, the usability loss caused by the privacy mechanism and the privacy protection effect need to be measured accurately. Therefore, the idea of countermeasure samples is adopted: a small amount of noise is added to the model parameter update to disturb its distribution characteristics, so that after the update passes through the privacy attribute inference model, the privacy inference result is output randomly according to the probability distribution expected by the user (such as a uniform distribution), thereby resisting privacy attribute inference attacks and alleviating the privacy attribute disclosure problem of federal learning. It should be noted that the probability distribution desired by the user refers to the probability with which the user desires to select one of the generated noises; for example, the user may desire to output any privacy inference result with equal probability (a uniform distribution), so as to prevent an attacker from correctly inferring the user's privacy attributes.
On the basis of the foregoing embodiment, in this embodiment, the determining, by the first calculating step, the minimized noise by using a fast gradient method based on the privacy attribute inference result specifically includes:
finding, by the fast gradient method, a set of noises r = {r_1, r_2, ..., r_m} such that, after noise r_i is added to the parameter x to be transmitted, the resulting countermeasure sample x_i^* is assigned privacy attribute value i by the privacy attribute inference model f;
the fast gradient method is calculated as:
x_i^* = clip(x - ε · sign(∇_x l(f(x), i)))
the noise is calculated as:
r_i = x_i^* - x
where r represents the set of noises, r_1, r_2, ..., r_m and r_i all represent noises, x represents the parameter to be transmitted, x_i^* represents the generated countermeasure sample, f represents the privacy attribute inference model, i represents the target privacy attribute value, ε is the availability budget, and l represents a loss function used to calculate the distance between the privacy attribute f(x) output by the privacy attribute inference model and the target privacy attribute i.
In this embodiment, for example, when a user needs to submit a parameter vector x to the server, a set of noises r = {r_1, r_2, ..., r_m} is found by the fast gradient method such that, after noise r_i is added to the parameter x to be transmitted, the countermeasure sample x_i^* is assigned privacy attribute value i by the privacy attribute inference model f, i.e. f(x + r_i) = f(x_i^*) = i, i ∈ {1, 2, ..., m}.
The fast gradient method and the noise calculation are as follows:
x_i^* = clip(x - ε · sign(∇_x l(f(x), i)))
r_i = x_i^* - x
where ε is the availability budget specified in the parameter setting step, and l represents the loss function used to calculate the distance between the privacy attribute f(x) output by the privacy attribute inference model and the target privacy attribute i; common loss functions include, but are not limited to, cross entropy and mean square error. The purpose of the clip operation is to confine the generated countermeasure sample values to the domain of the data itself, clipping values that exceed the domain, so as to ensure that the finally generated x^* = {x_1^*, x_2^*, ..., x_m^*} meets practical application conditions.
According to the technical scheme, in the privacy protection method for federal learning provided by the embodiment of the invention, by adding the least noise, the model parameters yield a privacy inference result that is output randomly according to the probability distribution expected by the user (such as a uniform distribution) after passing through the privacy attribute inference model, thereby resisting privacy attribute inference attacks and alleviating the privacy attribute disclosure problem of federal learning.
On the basis of the foregoing embodiment, in this embodiment, the second calculating step determines an actual output probability distribution of noise based on the minimized noise, the expected output probability vector of noise, and the availability budget, and specifically includes:
calculating the noise actual output probability distribution q based on the minimized noise, the expected output probability vector p of the noise and the availability budget by adopting a first relation model; the first relation model is as follows:
min KL(q, p)  s.t.  Σ_i q_i · ||r_i|| ≤ ε,  i ∈ {1, 2, ..., m}
where min KL(q, p) is the optimization objective, q is the vector of actual output probabilities of the noises, p represents the expected output probability vector of the noises, KL(q, p) is the KL divergence between the two distributions q and p, the constraint Σ_i q_i · ||r_i|| ≤ ε requires the actual output probabilities q to satisfy the user's availability budget, ||r_i|| is the L2 norm of the noise vector r_i, and q_i represents the actual output probability of the i-th noise.
In the present embodiment, for example, the noise actual output probability q is calculated. The calculation process can be formalized as the following optimization problem:
min KL(q, p)  s.t.  Σ_i q_i · ||r_i|| ≤ ε,  i ∈ {1, 2, ..., m}
where min KL(q, p) is the optimization objective, aimed at minimizing the KL distance between the distribution q with which the countermeasure samples x^* to be sent are output after passing through the privacy attribute inference model and the user's desired distribution p. Here q is the vector of actual output probabilities of the noises, q = (q_1, q_2, ..., q_m) with q_i ∈ (0, 1], so that the value of KL(q, p) quantifies the privacy protection effect of the algorithm. The constraint Σ_i q_i · ||r_i|| ≤ ε requires the actual output probabilities q to satisfy the user's availability budget, where ||r_i|| is the L2 norm of the noise vector r_i.
The KL distance is calculated as:
KL(q, p) = Σ_i q_i · log(q_i / p_i)
according to the technical scheme, the privacy protection method facing the federal learning provided by the embodiment of the invention has the advantages that the privacy inference result is randomly output according to the probability distribution (such as uniform distribution) expected by the user after the model parameters pass through the privacy attribute inference model by adding the least noise, so that the privacy attribute inference attack is resisted, and the privacy attribute disclosure problem of the federal learning is relieved.
On the basis of the foregoing embodiment, in this embodiment, the generating a challenge sample step generates a challenge sample based on the actual noise output probability distribution, and sends the challenge sample to a parameter server, so that the server performs parameter update based on the challenge sample, specifically including:
when the user needs to send the parameter x to the server, randomly selecting a noise r_i from r = {r_1, r_2, ..., r_m} according to the noise actual output probability distribution q, and generating the countermeasure sample x_i^* = x + r_i;
and sending the countermeasure sample x_i^* = x + r_i to the parameter server, so that the server performs the parameter update based on the countermeasure sample x_i^* = x + r_i.
In the present embodiment, for example, the countermeasure sample is output. When the user needs to send the parameter x to the server, a noise r_i is randomly selected from r = {r_1, r_2, ..., r_m} according to the noise actual output probability distribution q, and the countermeasure sample x_i^* = x + r_i is sent to the server.
In order to better understand the present invention, the following examples are further provided to illustrate the present invention, but the present invention is not limited to the following examples.
The method can be applied to the user equipment participating in learning in the federal learning. After the user equipment completes model training, a small amount of noise is added to the parameters to make the parameters become countermeasure samples, then the countermeasure samples are sent to the server for aggregation, and the server still conducts parameter aggregation according to a general mode (such as weighted average and the like) to obtain a global model. The countermeasure sample generated by the user can prevent an attacker from correctly judging the privacy attribute of the user training data, so that the real privacy information of the user is protected. The method comprises the following specific steps:
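Because the server side is unchanged, its aggregation can remain a generic weighted average; the following sketch shows such a FedAvg-style step, which is standard practice rather than code defined by this embodiment, and weighting by local data size is an assumption here:

```python
# Sketch of the unchanged server side: weighted averaging of the received,
# possibly perturbed, client updates.
import numpy as np

def aggregate(updates, weights):
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * np.asarray(u) for w, u in zip(weights, updates))
```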
step one, setting parameters. The parameters include three parts: a set of privacy attributes, a desired output probability vector p of noise and an availability budget epsilon. The specific steps are shown in fig. 2, and include the following three steps.
Step 111, defining a user privacy attribute set, wherein the set contains m privacy attribute values numbered {1, 2, ..., m}, each piece of data in the user data set corresponds to exactly one privacy attribute value, and the privacy attribute set does not coincide with the target attribute set.
For example, for a gender classification model, it is assumed that the privacy attribute is skin color, where the set of privacy attributes is skin color attribute set { black race, yellow race, white race }, and the set of target attributes is gender attribute set { male, female }.
Step 112, defining the desired output probability vector p, where p is a one-dimensional vector of length m, p = (p_1, p_2, ..., p_m), with p_i ∈ [0, 1] and Σ_i p_i = 1; that is, the user desires to randomly select noise r_i from the m generated noises r = {r_1, r_2, ..., r_m} with probability p_i.
For example, with m = 3 the user desires to select noise r_1 with probability p_1, noise r_2 with probability p_2, and noise r_3 with probability p_3; a uniform choice p_1 = p_2 = p_3 = 1/3 would output any privacy inference result with equal probability.
Step 113, defining an availability budget ε, which constrains the amount of noise added by the user and ensures the availability of the countermeasure samples.
Step two, dividing the user data set. The user training data set X is divided into m subsets X = (X_1, X_2, ..., X_m) according to the privacy attribute set defined in step 111, where the privacy attribute corresponding to each piece of data in subset X_i is i, i ∈ {1, 2, ..., m}.
Step three, training the target models. The user uses training set X_i to train target model T_i for τ_i iteration rounds, obtaining the target model set T = {T_1, T_2, ..., T_m}.
Step four, training the privacy attribute inference model. The parameters of every update iteration in the training stage of the target model set T = {T_1, T_2, ..., T_m} are collected to obtain D = (d_1, d_2, ...), which is used as the training data of the privacy attribute inference model; the privacy attribute value of the training subset corresponding to each set of parameters is used as the data label, and an m-class privacy attribute classifier C is trained as the privacy attribute inference model. The training data D of the privacy attribute inference model contains Σ_{i=1}^{m} τ_i pieces of data in total.
Step five, calculating the noise. When the user needs to submit a parameter vector x to the server, a set of noises r = {r_1, r_2, ..., r_m} is found through the fast gradient method such that, after noise r_i is added to the parameter x to be transmitted, the countermeasure sample x_i^* is assigned privacy attribute value i by the privacy attribute inference model C, i.e. C(x + r_i) = C(x_i^*) = i, i ∈ {1, 2, ..., m}.
The fast gradient method and the noise calculation are as follows:
x_i^* = clip(x - ε · sign(∇_x l(C(x), i)))
r_i = x_i^* - x
where ε is the availability budget determined in step 113. The purpose of the clip operation is to confine the generated countermeasure sample values to the domain of the data itself, clipping values that exceed the domain, so as to ensure that the finally generated x^* = {x_1^*, x_2^*, ..., x_m^*} meets practical application conditions.
And step six, calculating the actual noise output probability q. The calculation process can be formalized as the following optimization problem:
min KL(q, p)  s.t.  Σ_i q_i · ||r_i|| ≤ ε,  i ∈ {1, 2, ..., m}
where min KL(q, p) is the optimization objective, aimed at minimizing the KL distance between the distribution q with which the countermeasure samples x^* to be sent are output after passing through the privacy attribute inference model C and the user's desired distribution p. Here q is the vector of actual output probabilities of the noises, q = (q_1, q_2, ..., q_m) with q_i ∈ (0, 1], so that the value of KL(q, p) quantifies the privacy protection effect of the algorithm. The constraint Σ_i q_i · ||r_i|| ≤ ε requires the actual output probabilities q to satisfy the user's availability budget, where ||r_i|| is the L2 norm of the noise vector r_i.
The KL distance is calculated as:
KL(q, p) = Σ_i q_i · log(q_i / p_i)
and seventhly, outputting the confrontation sample. When the user needs to send the parameter x to the server, the probability q obtained by the calculation in the step six is usediFrom r ═ { r1,r2,...rmRandomly selecting a noise riSending countermeasure sample x to the serveri *=x+ri
The method provided by the embodiment of the invention has the following advantages:
1. when the privacy is ensured by the existing protection method based on differential privacy, a large amount of noise needs to be added on the gradient in the stage of training a model by a user, and the accuracy of the model is sacrificed. The embodiment of the invention uses the generation mode of the countermeasure sample for reference, reduces the noise added to the parameters, improves the usability of the parameters and improves the precision of the global model.
2. The embodiment of the invention is designed against privacy attribute inference attacks, effectively ensuring that the privacy attributes of the user training data cannot be obtained by an attacker.
3. The existing scheme based on the safe multi-party computing protocol needs to meet the requirement that a certain number of users simultaneously perform model training and aggregation on line so as to correctly obtain the aggregation result of the global model, and therefore, the scheme is not suitable for updating of asynchronous federal learning. In the embodiment of the invention, the confrontation samples can be independently generated among the users, and the constraint parameters of usability and privacy are set in a personalized manner, so that two updating mechanisms of synchronization and asynchronization in federal learning can be covered.
4. The privacy attribute inference model is trained locally by the user, and the success rate of attacking the privacy attributes of the user data is improved. The attribute inference model trained by using the local data of the user has stronger attack capability, and the designed privacy protection scheme is more reliable by defending the stronger privacy attribute inference model.
5. The embodiment of the invention introduces the countermeasure sample idea: by adding minimal noise, the model parameters yield a privacy inference result that is output randomly according to the probability distribution expected by the user (such as a uniform distribution) after passing through the privacy attribute inference model, so as to resist privacy attribute inference attacks and alleviate the privacy attribute leakage problem of federal learning. In addition, the user only needs to perturb the sent model parameters once, which guarantees the usability of the parameters.
Fig. 3 is a schematic structural diagram of a privacy protecting apparatus for federal learning according to an embodiment of the present invention, and as shown in fig. 3, the apparatus includes: a parameter setting module 201, a data dividing module 202, a first training module 203, a second training module 204, a first calculating module 205, a second calculating module 206 and a confrontation sample generating module 207, wherein:
the parameter setting module 201 sets parameters according to actual requirements and outputs the parameters; the parameters include a set of privacy attributes, an expected output probability vector for noise, and an availability budget; the privacy attribute set refers to a set of privacy attribute values to be protected, the expected output probability vector of the noise refers to an expected output privacy inference result, and the availability budget refers to availability budget constraint of noise amount;
a data dividing module 202, configured to divide a data set based on the privacy attribute set, and determine m training data subsets corresponding to the privacy attributes;
the first training module 203 trains m target models according to the m training data subsets corresponding to the privacy attributes to determine a model iteration parameter data set;
the second training module 204 is configured to train a privacy attribute inference model by using the model iteration parameter data set as training data and using a privacy attribute value corresponding to each set of parameter in the model iteration parameter data set as a data tag, and determine a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier;
a first calculation module 205, which determines a minimized noise by using a fast gradient method based on the privacy attribute inference result;
a second calculation module 206 for determining a noise actual output probability distribution based on the minimized noise, the expected output probability vector of the noise and the availability budget;
and a countermeasure sample generation module 207 for generating a countermeasure sample based on the noise actual output probability distribution and sending the countermeasure sample to a parameter server so that the server performs parameter update based on the countermeasure sample.
In order to better understand the present invention, the following examples are further provided to illustrate the present invention, but the present invention is not limited to the following examples.
Referring to fig. 4, the system is composed of a parameter setting module, a data dividing module, a training module (i.e., the first training module), a privacy attribute inference module (i.e., the second training module), a noise calculation module (i.e., the first calculation module), a probability calculation module (i.e., the second calculation module), and a countermeasure sample generation module. The working process is as follows:
a user sets parameters in a parameter setting module according to actual requirements, and the set parameters are output after the setting is finished and serve as the input of a data dividing module, a noise calculating module and a probability calculating module;
in the data dividing module, according to the privacy attribute set, aiming at each privacy attribute, dividing a corresponding training data subset, wherein each subset corresponds to one privacy attribute and is used as the input of the training module;
respectively training corresponding target models by using each training data subset at a training module, and taking parameters obtained by each training update and corresponding privacy attributes as the input of a privacy attribute inference module and a noise calculation module;
taking the input parameters and the corresponding privacy attributes as training data of a privacy attribute inference model, training an m-class privacy attribute classifier as the privacy attribute inference model, and taking a probability vector generated after a parameter x to be sent by a user passes through the privacy attribute inference model as the input of a noise calculation module;
in the noise calculation module, for the parameter x to be sent by the user, a set of noises r = {r_1, r_2, ..., r_m} is found by the fast gradient method such that, after noise r_i is added to the parameter, the resulting sample is assigned privacy attribute value i, i ∈ {1, 2, ..., m}, by the privacy attribute inference model; the noise vector set r = {r_1, r_2, ..., r_m} is taken as the input of the probability calculation module;
the probability calculation module is responsible for calculating the actual output probability distribution q of the noise and is used as the input of the confrontation sample generation module;
the countermeasure sample generation module randomly selects a noise r_i from r = {r_1, r_2, ..., r_m} according to the input probability q, and outputs the countermeasure sample x_i^* = x + r_i.
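Tying the module flow above together, the following sketch (reusing the illustrative helpers from the earlier sections; every name here remains an assumption rather than part of the patent) shows one complete protected upload on the user side:

```python
# End-to-end sketch of one protected upload: noise calculation, probability
# calculation, and countermeasure sample generation, in that order.
import numpy as np

def protect_and_upload(x, config, inference_model, send_to_server,
                       clip_min=-1.0, clip_max=1.0):
    m = len(config.privacy_attributes)
    # noise calculation module: one targeted noise per privacy attribute value
    noises = [targeted_fgm_noise(inference_model, x, i,
                                 config.availability_budget, clip_min, clip_max)
              for i in range(m)]
    # probability calculation module: actual output distribution q
    norms = [float(r.norm(p=2)) for r in noises]
    q = actual_output_probabilities(config.expected_output_probs, norms,
                                    config.availability_budget)
    # countermeasure sample generation module: sample r_i ~ q and upload x + r_i
    return send_perturbed_update(x, noises, q, send_to_server)
```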
The privacy protection device for federated learning provided in the embodiments of the present invention may be specifically used to execute the privacy protection method for federated learning in the embodiments described above, and the technical principle and the beneficial effects thereof are similar, which may be specifically referred to the embodiments described above, and are not described herein again.
Based on the same inventive concept, an embodiment of the present invention provides an electronic device, and referring to fig. 5, the electronic device specifically includes the following contents: a processor 301, a communication interface 303, a memory 302, and a communication bus 304;
the processor 301, the communication interface 303 and the memory 302 complete mutual communication through the communication bus 304; the communication interface 303 is used for realizing information transmission between related devices such as modeling software, an intelligent manufacturing equipment module library and the like; the processor 301 is used for calling the computer program in the memory 302, and the processor executes the computer program to implement the method provided by the above method embodiments, for example, the processor executes the computer program to implement the following steps: setting parameters, namely setting parameters according to actual requirements and outputting the parameters; the parameters include a set of privacy attributes, an expected output probability vector for noise, and an availability budget; the privacy attribute set refers to a set of privacy attribute values to be protected, the expected output probability vector of the noise refers to an expected output privacy inference result, and the availability budget refers to availability budget constraint of noise amount; a data dividing step, namely dividing a data set based on the privacy attribute set, and determining m training data subsets corresponding to the privacy attributes; a first training step, training m target models according to the m training data subsets corresponding to the privacy attributes, and determining a model iteration parameter data set; a second training step, namely, taking the model iteration parameter data set as training data, taking the privacy attribute value corresponding to each set of parameters in the model iteration parameter data set as a data tag to train a privacy attribute inference model, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier; a first calculation step of determining a minimized noise by a fast gradient method based on the privacy attribute inference result; a second calculation step of determining a noise actual output probability distribution based on the minimized noise, the expected output probability vector of the noise, and the availability budget; and generating a countermeasure sample, generating a countermeasure sample based on the noise actual output probability distribution, and sending the countermeasure sample to a parameter server so that the server performs parameter updating based on the countermeasure sample.
Based on the same inventive concept, another embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented to perform the methods provided by the above method embodiments when executed by a processor, for example, the steps of setting parameters, setting parameters according to actual requirements, and outputting the parameters; the parameters include a set of privacy attributes, an expected output probability vector for noise, and an availability budget; the privacy attribute set refers to a set of privacy attribute values to be protected, the expected output probability vector of the noise refers to an expected output privacy inference result, and the availability budget refers to availability budget constraint of noise amount; a data dividing step, namely dividing a data set based on the privacy attribute set, and determining m training data subsets corresponding to the privacy attributes; a first training step, training m target models according to the m training data subsets corresponding to the privacy attributes, and determining a model iteration parameter data set; a second training step, namely, taking the model iteration parameter data set as training data, taking the privacy attribute value corresponding to each set of parameters in the model iteration parameter data set as a data tag to train a privacy attribute inference model, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier; a first calculation step of determining a minimized noise by a fast gradient method based on the privacy attribute inference result; a second calculation step of determining a noise actual output probability distribution based on the minimized noise, the expected output probability vector of the noise, and the availability budget; and generating a countermeasure sample, generating a countermeasure sample based on the noise actual output probability distribution, and sending the countermeasure sample to a parameter server so that the server performs parameter updating based on the countermeasure sample.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A privacy protection method for federal learning, characterized by comprising the following steps:
a parameter setting step, namely setting parameters according to actual requirements and outputting the parameters; the parameters include a privacy attribute set, an expected output probability vector of noise, and an availability budget; the privacy attribute set refers to the set of privacy attribute values to be protected, the expected output probability vector of the noise refers to the expected output privacy inference result, and the availability budget refers to the availability budget constraint on the noise amount;
a data dividing step, namely dividing a data set based on the privacy attribute set, and determining m training data subsets corresponding to the privacy attributes;
a first training step, training m target models according to the m training data subsets corresponding to the privacy attributes, and determining a model iteration parameter data set;
a second training step, namely, taking the model iteration parameter data set as training data, taking the privacy attribute value corresponding to each set of parameters in the model iteration parameter data set as a data tag to train a privacy attribute inference model, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier;
a first calculation step of determining a minimized noise by a fast gradient method based on the privacy attribute inference result;
a second calculation step of determining a noise actual output probability distribution based on the minimized noise, the expected output probability vector of the noise, and the availability budget;
and a countermeasure sample generation step, namely generating a countermeasure sample based on the noise actual output probability distribution, and sending the countermeasure sample to a parameter server so that the server performs parameter updating based on the countermeasure sample;
the first calculating step, which determines the minimized noise by using a fast gradient method based on the privacy attribute inference result, specifically includes:
finding a set of noises r = {r_1, r_2, …, r_m} by the fast gradient method, such that the countermeasure sample x_i*, obtained by adding the noise r_i to the parameter x to be transmitted, is assigned the privacy attribute value i after passing through the privacy attribute inference model f;
the calculation formula of the fast gradient method is as follows:
x_i* = x - ε · sign(∇_x l(f(x), i))
the noise calculation method is as follows:
r_i = x_i* - x
wherein r represents the set of noises; r_1, r_2, …, r_m and r_i all represent noises; x represents the parameter to be transmitted; x_i* represents the generated countermeasure sample; f represents the privacy attribute inference model; i represents a privacy attribute value; ε is the availability budget; and l represents a loss function for calculating the distance between the privacy attribute f(x) output by the privacy attribute inference model and the target privacy attribute i;
the second calculating step, determining the actual output probability distribution of the noise based on the minimized noise, the expected output probability vector of the noise and the availability budget, specifically includes:
calculating a noise actual output probability distribution q using a first relational model based on the minimized noise, the expected output probability vector p of the noise and the availability budget; the first relational model is as follows:
min_q KL(q, p)   subject to   Σ_i q_i · ‖r_i‖_2 ≤ ε
wherein min KL(q, p) is the optimization objective; q is the actual output probability of each noise; p represents the expected output probability of the noise; KL(q, p) calculates the KL divergence between the two distributions q and p; the constraint Σ_i q_i · ‖r_i‖_2 ≤ ε ensures that the actual output probability q satisfies the user's availability budget constraint; ‖r_i‖_2 is the L2 norm of the noise vector r_i; and q_i represents the actual output probability of the i-th noise.
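As an illustration of the first calculation step in claim 1, the sketch below applies one targeted fast-gradient step to a linear softmax classifier f(x) = softmax(Wx + b), which stands in here for the privacy attribute inference model; the weight matrix W, the bias b and the use of cross-entropy as the loss l are assumptions made for the example only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def targeted_fast_gradient_noise(W, b, x, target_attr, eps):
    """One targeted fast-gradient step: push x toward privacy attribute target_attr.

    For f(x) = softmax(Wx + b), the gradient of the cross-entropy loss
    l(f(x), i) with respect to x is W^T (softmax(Wx + b) - e_i).
    """
    probs = softmax(W @ x + b)
    onehot = np.zeros_like(probs)
    onehot[target_attr] = 1.0
    grad = W.T @ (probs - onehot)        # d l(f(x), i) / d x
    x_star = x - eps * np.sign(grad)     # countermeasure sample x_i*
    r_i = x_star - x                     # noise r_i = x_i* - x
    return x_star, r_i
```

A deep inference model would require automatic differentiation instead of the closed-form gradient, but the structure of the step, descending the targeted loss by eps times the sign of the input gradient, stays the same.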
2. The privacy protection method for federal learning according to claim 1, wherein the countermeasure sample generation step, namely generating a countermeasure sample based on the noise actual output probability distribution and sending the countermeasure sample to the parameter server so that the parameter server performs parameter updating based on the countermeasure sample, specifically includes:
when the user needs to send a parameter x to the server, randomly selecting a noise r_i from r = {r_1, r_2, …, r_m} according to the noise actual output probability distribution q, and generating a countermeasure sample x_i* = x + r_i;
and sending the countermeasure sample x_i* = x + r_i to the parameter server, so that the server performs parameter updating based on the countermeasure sample x_i* = x + r_i.
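The countermeasure sample generation recited in claim 2 amounts to drawing one of the m precomputed noises according to the actual output probability distribution q and adding it to the parameter before upload. A minimal sketch, with the transport to the parameter server left out:

```python
import numpy as np

def generate_countermeasure_sample(x, noises, q, rng=None):
    """Pick noise r_i with probability q_i and form the sample x_i* = x + r_i."""
    rng = rng if rng is not None else np.random.default_rng()
    q = np.asarray(q, dtype=float)
    q = q / q.sum()                    # guard against numerical drift in q
    i = rng.choice(len(noises), p=q)   # index i drawn with probability q_i
    return x + noises[i], i            # countermeasure sample and the chosen index
```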
3. A privacy preserving apparatus for federal learning, comprising:
the parameter setting module is used for setting parameters according to actual requirements and outputting the parameters; the parameters include a privacy attribute set, an expected output probability vector of noise, and an availability budget; the privacy attribute set refers to the set of privacy attribute values to be protected, the expected output probability vector of the noise refers to the expected output privacy inference result, and the availability budget refers to the availability budget constraint on the noise amount;
the data dividing module is used for dividing a data set based on the privacy attribute set and determining m training data subsets corresponding to the privacy attributes;
the first training module trains m target models according to the m training data subsets corresponding to the privacy attributes and determines a model iteration parameter data set;
the second training module is used for training a privacy attribute inference model by taking the model iteration parameter data set as training data and the privacy attribute value corresponding to each set of parameter in the model iteration parameter data set as a data tag, and determining a privacy attribute inference result; the privacy attribute inference model is an m-class privacy attribute classifier;
a first calculation module for determining a minimized noise by a fast gradient method based on the privacy attribute inference result;
a second calculation module that determines a noise actual output probability distribution based on the minimized noise, a desired output probability vector of the noise, and an availability budget;
a countermeasure sample generation module which generates a countermeasure sample based on the noise actual output probability distribution and sends the countermeasure sample to a parameter server so that the server performs parameter updating based on the countermeasure sample;
the first calculation module is specifically configured to:
finding a set of noises r = {r_1, r_2, …, r_m} by the fast gradient method, such that the countermeasure sample x_i*, obtained by adding the noise r_i to the parameter x to be transmitted, is assigned the privacy attribute value i after passing through the privacy attribute inference model f;
the calculation formula of the fast gradient method is as follows:
x_i* = x - ε · sign(∇_x l(f(x), i))
the noise calculation method is as follows:
r_i = x_i* - x
wherein r represents the set of noises; r_1, r_2, …, r_m and r_i all represent noises; x represents the parameter to be transmitted; x_i* represents the generated countermeasure sample; f represents the privacy attribute inference model; i represents a privacy attribute value; ε is the availability budget; and l represents a loss function for calculating the distance between the privacy attribute f(x) output by the privacy attribute inference model and the target privacy attribute i;
the second calculation module is specifically configured to:
calculating a noise actual output probability distribution q using a first relational model based on the minimized noise, the expected output probability vector p of the noise and the availability budget; the first relational model is as follows:
min_q KL(q, p)   subject to   Σ_i q_i · ‖r_i‖_2 ≤ ε
wherein min KL(q, p) is the optimization objective; q is the actual output probability of each noise; p represents the expected output probability of the noise; KL(q, p) calculates the KL divergence between the two distributions q and p; the constraint Σ_i q_i · ‖r_i‖_2 ≤ ε ensures that the actual output probability q satisfies the user's availability budget constraint; ‖r_i‖_2 is the L2 norm of the noise vector r_i; and q_i represents the actual output probability of the i-th noise.
4. The privacy preserving apparatus for federal learning as claimed in claim 3, wherein the countermeasure sample generation module is specifically configured to:
when the user needs to send a parameter x to the server, randomly select a noise r_i from r = {r_1, r_2, …, r_m} according to the noise actual output probability distribution q, and generate a countermeasure sample x_i* = x + r_i;
and send the countermeasure sample x_i* = x + r_i to the parameter server, so that the server performs parameter updating based on the countermeasure sample x_i* = x + r_i.
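Putting the sketches together, one upload round on the user side could look roughly as follows; every function name comes from the sketches above, and send_to_parameter_server is a hypothetical placeholder for the actual communication layer.

```python
import numpy as np

# W, b, x, eps and p_expected are assumed to have been prepared in earlier steps.
noises, norms = [], []
for i in range(len(p_expected)):                    # one candidate noise per privacy attribute value
    _, r_i = targeted_fast_gradient_noise(W, b, x, i, eps)
    noises.append(r_i)
    norms.append(np.linalg.norm(r_i))

q = noise_output_distribution(p_expected, norms, eps)
x_star, chosen = generate_countermeasure_sample(x, noises, q)
send_to_parameter_server(x_star)                    # hypothetical transport call
```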
5. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the privacy protection method for federal learning according to any one of claims 1 to 2 when executing the program.
6. A non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the privacy protection method for federal learning according to any one of claims 1 to 2.
CN202011523140.9A 2020-12-21 2020-12-21 Privacy protection method and device for federal learning Active CN112668044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011523140.9A CN112668044B (en) 2020-12-21 2020-12-21 Privacy protection method and device for federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011523140.9A CN112668044B (en) 2020-12-21 2020-12-21 Privacy protection method and device for federal learning

Publications (2)

Publication Number Publication Date
CN112668044A CN112668044A (en) 2021-04-16
CN112668044B true CN112668044B (en) 2022-04-12

Family

ID=75407419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011523140.9A Active CN112668044B (en) 2020-12-21 2020-12-21 Privacy protection method and device for federal learning

Country Status (1)

Country Link
CN (1) CN112668044B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032838B (en) * 2021-05-24 2021-10-29 易商征信有限公司 Label prediction model generation method, prediction method, model generation device, system and medium based on privacy calculation
CN113360945B (en) * 2021-06-29 2023-04-07 招商局金融科技有限公司 Noise adding method, device, equipment and medium based on differential privacy
CN113626866B (en) * 2021-08-12 2023-10-13 积至(海南)信息技术有限公司 Federal learning-oriented localization differential privacy protection method, system, computer equipment and storage medium
CN114118407B (en) * 2021-10-29 2023-10-24 华北电力大学 Differential privacy availability measurement method for deep learning
CN114169007B (en) * 2021-12-10 2024-05-14 西安电子科技大学 Medical privacy data identification method based on dynamic neural network
CN115640517A (en) * 2022-09-05 2023-01-24 北京火山引擎科技有限公司 Multi-party collaborative model training method, device, equipment and medium
CN115587381B (en) * 2022-12-12 2023-04-07 四川大学华西医院 Medical diagnosis model combined training method and system based on differential privacy
CN117313135B (en) * 2023-08-02 2024-04-16 东莞理工学院 Efficient reconfiguration personal privacy protection method based on attribute division

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766742A (en) * 2017-11-02 2018-03-06 广西师范大学 Dependent is the same as more correlation difference privacy matrix disassembling methods under distributional environment
CN110443063A (en) * 2019-06-26 2019-11-12 电子科技大学 The method of the federal deep learning of self adaptive protection privacy
CN111091199A (en) * 2019-12-20 2020-05-01 哈尔滨工业大学(深圳) Federal learning method and device based on differential privacy and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368752B (en) * 2017-07-25 2019-06-28 北京工商大学 A kind of depth difference method for secret protection based on production confrontation network
US20200202243A1 (en) * 2019-03-05 2020-06-25 Allegro Artificial Intelligence Ltd Balanced federated learning
CN110572253B (en) * 2019-09-16 2023-03-24 济南大学 Method and system for enhancing privacy of federated learning training data
CN111190487A (en) * 2019-12-30 2020-05-22 中国科学院计算技术研究所 Method for establishing data analysis model
CN111625820A (en) * 2020-05-29 2020-09-04 华东师范大学 Federal defense method based on AIoT-oriented security
CN111860832A (en) * 2020-07-01 2020-10-30 广州大学 Method for enhancing neural network defense capacity based on federal learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766742A (en) * 2017-11-02 2018-03-06 广西师范大学 Dependent is the same as more correlation difference privacy matrix disassembling methods under distributional environment
CN110443063A (en) * 2019-06-26 2019-11-12 电子科技大学 The method of the federal deep learning of self adaptive protection privacy
CN111091199A (en) * 2019-12-20 2020-05-01 哈尔滨工业大学(深圳) Federal learning method and device based on differential privacy and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Federated Learning with Bayesian Differential Privacy; Aleksei Triastcyn et al.; 2019 IEEE International Conference on Big Data (Big Data); 20200224; full text *
Efficient and Secure Federated Learning Based on Secret Sharing and Gradient Selection; Dong Ye et al.; Journal of Computer Research and Development; 20201009; Vol. 57 (No. 10); full text *

Also Published As

Publication number Publication date
CN112668044A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN112668044B (en) Privacy protection method and device for federal learning
Hao et al. Towards efficient and privacy-preserving federated deep learning
Song et al. FDA3: Federated defense against adversarial attacks for cloud-based IIoT applications
CN112749392B (en) Method and system for detecting abnormal nodes in federated learning
Ma et al. Differentially private byzantine-robust federated learning
Hao et al. Efficient, private and robust federated learning
CN107612878B (en) Dynamic window selection method based on game theory and wireless network trust management system
Wang et al. Privacy protection federated learning system based on blockchain and edge computing in mobile crowdsourcing
Zhang et al. G-VCFL: Grouped verifiable chained privacy-preserving federated learning
CN112380495B (en) Secure multiparty multiplication method and system
CN112560059B (en) Vertical federal model stealing defense method based on neural pathway feature extraction
Zhang et al. A survey on security and privacy threats to federated learning
CN115719085B (en) Deep neural network model inversion attack defense method and device
CN116187482A (en) Lightweight trusted federation learning method under edge scene
Li et al. Ubiquitous intelligent federated learning privacy-preserving scheme under edge computing
CN114239860A (en) Model training method and device based on privacy protection
CN116127519A (en) Dynamic differential privacy federal learning system based on blockchain
Cui et al. Boosting accuracy of differentially private federated learning in industrial IoT with sparse responses
Xu et al. CGIR: Conditional generative instance reconstruction attacks against federated learning
Pei et al. Privacy-enhanced graph neural network for decentralized local graphs
Kang et al. Communicational and computational efficient federated domain adaptation
Liu et al. Dynamic user clustering for efficient and privacy-preserving federated learning
Lu et al. Robust and verifiable privacy federated learning
Wang et al. Lds-fl: Loss differential strategy based federated learning for privacy preserving
Ghavamipour et al. Federated synthetic data generation with stronger security guarantees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant