CN114785559A - Differential privacy federated learning method for defending against membership inference attacks - Google Patents

Differential privacy federated learning method for defending against membership inference attacks

Info

Publication number
CN114785559A
CN114785559A (application CN202210314533.1A)
Authority
CN
China
Prior art keywords
network model
data
global network
client
model parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210314533.1A
Other languages
Chinese (zh)
Inventor
陈隆
马川
韦康
李骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202210314533.1A priority Critical patent/CN114785559A/en
Publication of CN114785559A publication Critical patent/CN114785559A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network


Abstract

The invention discloses a differential privacy federated learning method for defending against membership inference attacks, which comprises the following steps: each client trains a generative adversarial network (GAN) model on its local data and uses it to generate fake data; in each round of federated learning communication, the server randomly selects the clients participating in that round and issues the global network model parameters together with the loss function and optimizer used for training; each selected client trains the global network model on the fake data and sends the trained global network model parameters back to the server; the server updates the global network model parameters by federated averaging; the server then judges whether to continue with the next round of communication, and if so issues the global network model parameters again, otherwise ends communication and stores the global network model parameters. The invention further protects client data privacy under data-silo conditions and helps defend against membership inference attacks.

Description

Differential privacy federated learning method for defending against membership inference attacks
Technical Field
The invention relates to the technical field of machine learning, and in particular to a differential privacy federated learning method for defending against membership inference attacks.
Background
Federated learning is a distributed machine learning framework with built-in privacy protection. It aims to let scattered clients jointly train machine learning models without leaking their data: under the federated learning architecture, a shared model is designed so that different data owners can cooperate without exchanging data. Because the data never leaves the clients, user privacy is effectively protected and data regulations are respected.
However, federated learning tends to perform poorly under resource-constrained conditions. When a client's training data set is small, training is hampered; moreover, the local models trained by clients with insufficient data can degrade the overall global network model once aggregated at the server.
To defend against membership inference attacks, existing federated learning methods generally add noise to the global network model parameters while the clients train the model. This noise-adding approach buys resistance to membership inference attacks at the cost of a substantial loss in model performance, a significant drawback that calls for improvement.
Disclosure of Invention
To counter membership inference attacks in the federated learning architecture, the invention provides a differential privacy federated learning method for defending against such attacks, thereby reducing the privacy-leakage risk of the system model and strengthening the protection of privacy.
The technical solution realizing the purpose of the invention is as follows: a differential privacy federated learning method for defending against membership inference attacks, comprising the following steps:
step 1, each client trains a generative adversarial network (GAN) model on its local data;
step 2, each client generates fake data with the GAN model;
step 3, for each round of federated learning communication, the server randomly selects the clients participating in that round and issues the global network model parameters together with the loss function and optimizer used for training;
step 4, each selected client trains the global network model on the fake data and sends the trained global network model parameters back to the server;
step 5, the server updates the global network model parameters by federated averaging;
and step 6, the server judges whether to continue with the next round of communication; if so, return to step 3, otherwise end communication and store the global network model parameters.
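The aggregation in step 5 can be sketched as follows. This is an illustrative NumPy sketch, not the patent's implementation; the function name `federated_average` and the optional data-volume weighting (as in the standard FedAvg algorithm) are assumptions for the example.

```python
import numpy as np

def federated_average(client_params, client_sizes=None):
    """Step 5: aggregate the parameters returned by the selected clients.

    With no sizes given this is a plain element-wise mean; passing each
    client's local data volume gives the weighted mean used by FedAvg.
    """
    stacked = np.stack(client_params)
    if client_sizes is None:
        return stacked.mean(axis=0)
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()                      # normalize weights to sum to 1
    return np.tensordot(w, stacked, axes=1)

params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(federated_average(params))                       # [2. 3.]
print(federated_average(params, client_sizes=[1, 3]))  # [2.5 3.5]
```

The unweighted case matches the plain federated averaging described above; the weighted variant merely shows how client data volume could be taken into account.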
Further, the server and the clients communicate by transmitting the global network model parameters together with the loss function and optimizer required for training; training data is never transmitted directly.
Further, the GAN model is a conditional GAN, in which conditional constraints restrict the classes and attributes of the generated fake data.
Further, when generating fake data with the GAN model, the parameters may be set so that the fake data resembles the original samples, or the fake data may be generated at random.
Further, when the server selects the clients participating in the current round of communication, only clients whose GAN models have finished training are selected to participate in that round.
Further, each client trains the global network model on fake data, wherein the client selects the data set participating in global network model training: either a real data set doped with a set proportion of fake data, or the fake data set alone.
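Assembling a training set doped with a set proportion of fake data can be sketched as below; the function name and the fixed total size are assumptions for the example, with `fake_ratio` running from 0 (all real) to 1 (entirely fake data).

```python
import random

random.seed(0)

def build_training_set(real_data, fake_data, fake_ratio):
    """Mix real samples with GAN-generated fake samples at a set
    proportion (fake_ratio = fake samples / total samples)."""
    total = len(real_data)
    n_fake = int(round(fake_ratio * total))
    mixed = (random.sample(real_data, total - n_fake)
             + random.sample(fake_data, n_fake))
    random.shuffle(mixed)             # interleave real and fake samples
    return mixed

real = [("real", i) for i in range(100)]
fake = [("fake", i) for i in range(100)]
ds = build_training_set(real, fake, fake_ratio=0.3)
n_fake = sum(1 for tag, _ in ds if tag == "fake")
print(len(ds), n_fake)  # 100 30
```

Calling `build_training_set(real, fake, fake_ratio=1.0)` would return a purely fake training set, the extreme case discussed in the embodiment.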
Further, the server aggregates the global network model parameters with the federated averaging algorithm or an SMPC (secure multi-party computation) algorithm.
Compared with the prior art, the invention has the following notable advantages: (1) the adopted federated learning architecture generates fake data with a GAN, expanding the user data set while increasing resistance to membership inference attacks, which effectively protects client data privacy; (2) when client resources are limited, the strategy of training on GAN-generated fake data mitigates the under-training of the global network model caused by insufficient data, enriching the data set while improving the performance of the global network model; (3) with GAN-generated fake data participating in global network model training, resistance to membership inference attacks grows as the proportion of fake data increases while the accuracy of the global network model drops only slightly, which is superior to the traditional noise-adding defense against membership inference attacks.
Drawings
FIG. 1 is a flow chart of the differential privacy federated learning method of the present invention for defending against membership inference attacks.
FIG. 2 is a schematic diagram of the system during model training.
FIG. 3 shows the performance of classifiers trained on the MNIST data set with different proportions of fake data.
FIG. 4 shows the effect of defending against membership inference attacks on the MNIST data set.
Detailed Description
With reference to FIGS. 1 and 2, the invention provides a differential privacy federated learning method for defending against membership inference attacks, which comprises the following steps:
step 1, each client trains a generative adversarial network (GAN) model on its local data;
step 2, each client generates fake data with the GAN model;
step 3, for each round of federated learning communication, the server randomly selects the clients participating in that round and issues the global network model parameters together with the loss function and optimizer used for training;
step 4, each selected client trains the global network model on the fake data and sends the trained global network model parameters back to the server;
step 5, the server updates the global network model parameters by federated averaging;
and step 6, the server judges whether to continue with the next round of communication; if so, return to step 3, otherwise end communication and store the global network model parameters.
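The six steps above can be condensed into one simulation loop. This sketch is illustrative only: a toy update rule stands in for a client's actual training of the global network model on its fake data, and all names are assumptions, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fedavg_round(global_params, clients, frac=0.5):
    """Steps 3-5: sample a fraction of clients, let each train locally,
    then aggregate the returned parameters by federated averaging."""
    k = max(1, int(frac * len(clients)))
    selected = rng.choice(len(clients), size=k, replace=False)
    updates = [clients[i](global_params.copy()) for i in selected]
    return np.mean(updates, axis=0)   # element-wise federated average

def make_client(target):
    # Toy stand-in for local training on fake data: nudge the received
    # parameters halfway toward this client's local optimum.
    return lambda w: w + 0.5 * (target - w)

clients = [make_client(t) for t in (0.0, 1.0, 2.0, 3.0)]
w = np.zeros(4)                       # initial global model parameters
for _ in range(20):                   # step 6: run 20 communication rounds
    w = fedavg_round(w, clients)
print(round(float(w.mean()), 2))      # settles near the clients' average optimum
```

Because each round averages only the randomly selected clients, the global parameters wander around the average of the client optima rather than converging exactly, mirroring the stochastic client selection of step 3.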
Furthermore, the server and the clients communicate by transmitting the global network model parameters together with the loss function and optimizer required for training; training data is never transmitted directly.
Furthermore, the GAN model is a conditional GAN, in which conditional constraints restrict the classes and attributes of the generated fake data.
Further, when generating fake data with the GAN model, the parameters may be set so that the fake data resembles the original samples, or the fake data may be generated at random.
Further, when the server selects the clients participating in the current round of communication, only clients whose GAN models have finished training are selected to participate in that round.
Further, each client trains the global network model on fake data, wherein the client selects the data set participating in global network model training: either a real data set doped with a set proportion of fake data, or the fake data set alone.
Further, the server aggregates the global network model parameters with the federated averaging algorithm or an SMPC (secure multi-party computation) algorithm.
The invention is described in further detail below with reference to the figures and the embodiments.
Examples
The embodiment takes centralized federated learning as the basic framework and trains a classifier network (the global network model) on the MNIST data set, illustrating the concrete implementation of the method:
On the client side, the local data (MNIST data) is used to train the GAN, and random noise is then passed through the GAN to generate fake data (fake MNIST data). When training the federated MNIST classifier network, a client participating in the current round of communication can use a certain amount of fake data, or fake data exclusively, to train the classifier network issued by the server, replacing the strategy of the unmodified federated learning method in which clients train on their real data. The GAN acts as a protective layer: the information of the real data is hidden and protected inside it, while the fake data participates in training and resists an attacker's membership inference attack.
On the server side, a certain number of clients are randomly selected in each round of communication to participate in training the MNIST classifier network; the classifier network parameters are issued along with the loss function and the corresponding optimizer. When the clients finish training and return the trained classifier network parameters, the server aggregates them by federated averaging.
As shown in FIG. 3, the abscissa is the number of communication rounds of the federated learning server and the ordinate is the accuracy of the MNIST classifier network; the curves correspond to different proportions of fake data in the training set (fake samples participating in training / total samples participating in training) when clients train the classifier issued by the server. For the same number of communication rounds, classifier accuracy decreases as the proportion of fake data increases; however, even when the classifier is trained entirely on fake data (fake-data proportion = 1), its accuracy still reaches 95% after 60 rounds, so global network model performance decreases only slightly.
Regarding the defense against membership inference attacks, the trained model was attacked with a membership inference attack network; the experimental results are shown in FIG. 4, where the abscissa is the proportion of fake data in the attacked network's training set, the ordinate is the accuracy of the membership inference attack network, and the baseline is 0.5. The model's resistance to membership inference attacks improves as the proportion of fake data rises.
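The kind of evaluation shown in FIG. 4 can be mimicked with a simple confidence-threshold membership inference attack on synthetic model confidences. The numbers below are invented solely to show why attack accuracy falls toward the 0.5 baseline when the member/non-member confidence gap closes; they are not the patent's measurements.

```python
import numpy as np

rng = np.random.default_rng(2)

def attack_accuracy(member_conf, nonmember_conf, threshold=0.9):
    """Guess 'member' whenever the target model's confidence exceeds the
    threshold; accuracy near 0.5 means the attacker learns nothing."""
    guesses = np.concatenate([member_conf > threshold,
                              nonmember_conf > threshold])
    truth = np.concatenate([np.ones(len(member_conf), bool),
                            np.zeros(len(nonmember_conf), bool)])
    return float((guesses == truth).mean())

# Hypothetical confidences: a model trained on real data is more confident
# on its own training members; training on fake data closes that gap.
acc_real = attack_accuracy(rng.uniform(0.85, 1.00, 1000),
                           rng.uniform(0.40, 0.95, 1000))
acc_fake = attack_accuracy(rng.uniform(0.40, 0.95, 1000),
                           rng.uniform(0.40, 0.95, 1000))
print(acc_real > acc_fake)  # True: the fake-data defense lowers attack accuracy
```

When members and non-members draw confidences from the same distribution, as in the `acc_fake` case, the attack accuracy sits at the 0.5 baseline, which is the qualitative effect reported for high fake-data proportions.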
In the traditional federated learning architecture, clients usually train on their real data sets. That approach lacks resistance to membership inference attacks and cannot easily prevent the resulting leakage of client privacy. With the federated learning architecture of this method, fake data generated by the GAN expands the user data set while increasing resistance to membership inference attacks, effectively protecting the privacy of client data.
When client resources are limited, the strategy of training on GAN-generated fake data mitigates the under-training of the global network model caused by insufficient data, enriching the data set while improving the performance of the global network model.
As for defending against membership inference attacks, unlike the traditional approach of adding noise to the model parameters, this method has GAN-generated fake data participate in global network model training: resistance to membership inference attacks grows with the proportion of fake data while the accuracy of the global network model drops only slightly, which is superior to the traditional noise-adding defense.

Claims (7)

1. A differential privacy federated learning method for defending against membership inference attacks, characterized by comprising the following steps:
step 1, each client trains a generative adversarial network (GAN) model on its local data;
step 2, each client generates fake data with the GAN model;
step 3, for each round of federated learning communication, the server randomly selects the clients participating in that round and issues the global network model parameters together with the loss function and optimizer used for training;
step 4, each selected client trains the global network model on the fake data and sends the trained global network model parameters back to the server;
step 5, the server updates the global network model parameters by federated averaging;
and step 6, the server judges whether to continue with the next round of communication; if so, return to step 3, otherwise end communication and store the global network model parameters.
2. The differential privacy federated learning method for defending against membership inference attacks according to claim 1, wherein the server and the clients communicate by transmitting the global network model parameters together with the loss function and optimizer required for training, without directly transmitting training data.
3. The differential privacy federated learning method for defending against membership inference attacks according to claim 1, wherein the GAN model is a conditional GAN, in which conditional constraints restrict the classes and attributes of the generated fake data.
4. The differential privacy federated learning method for defending against membership inference attacks according to claim 1, wherein, when fake data is generated with the GAN model, the parameters may be set so that the fake data resembles the original samples, or the fake data may be generated at random.
5. The differential privacy federated learning method for defending against membership inference attacks according to claim 1, wherein, when the server selects the clients participating in the current round of communication, only clients whose GAN models have finished training are selected to participate in that round.
6. The differential privacy federated learning method for defending against membership inference attacks according to claim 1, wherein each client trains the global network model on fake data, and the client selects the data set participating in global network model training: either a real data set doped with a set proportion of fake data, or the fake data set alone.
7. The differential privacy federated learning method for defending against membership inference attacks according to claim 1, wherein the server aggregates the global network model parameters with the federated averaging algorithm or an SMPC algorithm.
CN202210314533.1A 2022-03-29 2022-03-29 Differential privacy federation learning method for resisting member reasoning attack Pending CN114785559A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210314533.1A CN114785559A (en) 2022-03-29 2022-03-29 Differential privacy federation learning method for resisting member reasoning attack


Publications (1)

Publication Number Publication Date
CN114785559A true CN114785559A (en) 2022-07-22

Family

ID=82424502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210314533.1A Pending CN114785559A (en) 2022-03-29 2022-03-29 Differential privacy federation learning method for resisting member reasoning attack

Country Status (1)

Country Link
CN (1) CN114785559A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738405A (en) * 2020-05-11 2020-10-02 南京航空航天大学 User-level member reasoning method based on generation countermeasure network
CN113516199A (en) * 2021-07-30 2021-10-19 山西清众科技股份有限公司 Image data generation method based on differential privacy
CN113553624A (en) * 2021-07-30 2021-10-26 天津大学 WGAN-GP privacy protection system and method based on improved PATE
CN114021738A (en) * 2021-11-23 2022-02-08 湖南三湘银行股份有限公司 Distributed generation countermeasure model-based federal learning method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DOREEN: "A Survey of Data Analysis: Data Security and Privacy Protection in Federated Learning", pages 1 - 11, Retrieved from the Internet <URL:https://baijiahao.baidu.com/s?id=1725731633077026033&wfr=spider&for=pc> *
X LUO: "Defending Against Attacks in Federated Learning with GAN-Based Feature Inference", pages 2 - 3 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329388A (en) * 2022-10-17 2022-11-11 南京信息工程大学 Privacy enhancement method for federally generated countermeasure network
CN116405333A (en) * 2023-06-09 2023-07-07 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Safe and efficient power system abnormal state detection terminal
CN116405333B (en) * 2023-06-09 2023-08-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Safe and efficient power system abnormal state detection terminal


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination