CN113094758B

CN113094758B - Gradient disturbance-based federated learning data privacy protection method and system

Info

Publication number: CN113094758B
Application number: CN202110635849.6A
Authority: CN
Inventors: 王琛; 刘高扬; 伍新奎; 彭凯
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2021-06-08
Filing date: 2021-06-08
Publication date: 2021-08-13
Anticipated expiration: 2041-06-08
Also published as: CN113094758A

Abstract

The invention discloses a gradient disturbance-based federated learning data privacy protection method and system, belonging to the field of data privacy protection, and the method comprises the following steps: carrying out category prediction on samples in data participants by using a local model after federal learning training to obtain an original prediction probability vector; disturbing the original prediction probability vector to obtain a disturbed prediction probability vector with the same prediction label as that of the original prediction probability vector and the maximum angular deviation of the gradient of the prediction loss function relative to that of the original prediction probability vector; retraining each local model by taking the minimum difference between the original prediction probability vector and the disturbance prediction probability vector of each local model as a target; and aggregating the retrained local models to obtain a global model. The protected federated learning global model can effectively reduce the risk that the model prediction output and the model gradient reveal the privacy of user participants on the premise of maintaining the usability of the model.

Description

Gradient disturbance-based federated learning data privacy protection method and system

Technical Field

The invention belongs to the field of data privacy protection, and particularly relates to a gradient disturbance-based federated learning data privacy protection method and system.

Background

Deep learning, which is a method of implementing artificial intelligence, has been applied to various fields such as computer vision, data mining, and medical diagnosis. Deep learning requires massive training data as a support, which makes it challenging with data security and privacy protection related issues. In addition, if data information of each data participant is rejected to be shared due to the concern of data privacy, the central server cannot train to obtain an efficient and reliable learning model, and a data island is formed. The development and application ground of deep learning are severely limited by increasingly strict data privacy protection requirements and data island problems. To address these issues, federal learning has come. However, for federal learning, certain features of the data set of each data participant may be embedded in the model parameters or parameter updates uploaded at each training, resulting in a serious risk of privacy disclosure of user data in the global model deployed by the data aggregator to all data participants during the deployment phase. For example, a malicious service user can carry out member speculation attack based on the model access authority to deduce whether specific data is training data of other data participants; for example, a malicious service user can carry out model stealing attack based on the model access authority, and construct a substitution model by inquiring a corresponding result from the black box model so as to obtain a similar original model function; for example, a malicious service user can carry out data reconstruction attack, and the simulation data is optimized to enable the gradient of back propagation on the public model to approximate to a real gradient, so that the simulation data is close to the training data. Therefore, how to protect the final global model of federal learning, reduce the risk of data privacy disclosure and effectively resist the privacy attacks is a key issue of research in recent years.

At present, aiming at attacks of user privacy security in federal learning, mainly focusing on using data prediction output characteristics of a model and predicting gradient characteristics of loss on model parameters based on model data. Attacks based on data prediction output characteristics of the model mainly include member presumption attacks and model stealing attacks set by black box access. The defense schemes of the attacks are mainly based on fuzzy operation, such as truncation confusion and noise confusion, on the premise that a certain accuracy range is guaranteed for a prediction probability vector output by a target model, so that the difference between a fuzzy output result and an original output result is increased, and the data privacy safety is further protected. However, in a federal learning scenario, since an attacker can even access the target model in a white box, the actual structure and parameters of the target model can be obtained, and for given data, the attacker can directly calculate the output prediction vector of the data on the target model, so that the defense scheme based on the target model output prediction vector to perform fuzzy operation cannot resist the attacks, and the privacy of the user is leaked.

Attacks on gradient characteristics of model parameters based on model data prediction loss mainly include member speculation attacks and data reconstruction attacks set by white box access. The defense schemes of the attacks mainly add noise items to a loss function, and improve the privacy protection of data prediction loss on the gradient of model parameters while sacrificing the classification accuracy of the model. However, at higher privacy protection performance requirements, these defense schemes introduce excessive noise perturbations that can greatly reduce the usability of the model. Therefore, how to effectively defend the attack of the gradient characteristics of the model parameters on the basis of the data prediction output characteristics and the model data prediction loss of the final global model for the federated learning and maintain the usability of the model is still a very important issue.

Disclosure of Invention

Aiming at the risk of user privacy disclosure and the requirement of data participant privacy protection in the existing global model in federated learning, the invention provides a method and a system for protecting federated learning data privacy based on gradient disturbance, aiming at protecting the global model of federated learning and the protected federated learning global model, and effectively reducing the risk of model prediction output and model gradient disclosure of user participant privacy on the premise of maintaining the usability of the model.

In order to achieve the above object, the present invention provides a gradient perturbation based federal learning data privacy protection method, which includes: s1, performing category prediction on samples in data participants by using a local model after federal learning training to obtain an original prediction probability vector, wherein the local model corresponds to the data participants one to one; s2, disturbing the original prediction probability vector to obtain a disturbance prediction probability vector with the same prediction label as the original prediction probability vector and the maximum angular deviation of the gradient of the prediction loss function relative to the gradient of the prediction loss function of the original prediction probability vector; s3, retraining the local models of the data participants by taking the minimum difference between the original prediction probability vector and the disturbance prediction probability vector of each local model as a target; and S4, aggregating the retrained local models to obtain a global model, and deploying the global model to the data participants to protect the data privacy of the data participants.

Further, S3 is preceded by: calculating and obtaining a network layer with a privacy leakage risk lower than a risk threshold value in a global model after federal learning, and freezing network layer parameters corresponding to the network layer in the local model; retraining each of the local models in S3 includes: and retraining the network layer parameters which are not frozen in each local model.

Further, S3 is preceded by: calculating and obtaining a preset number of network layers with lowest privacy leakage risk in a global model after federal learning, and freezing network layer parameters corresponding to the network layers in the local model; retraining each of the local models in S3 includes: and retraining the network layer parameters which are not frozen in each local model.

Further, the S2 includes: selecting a first one-hot coding with the maximum angular deviation of the gradient of the prediction loss function relative to the gradient of the prediction loss function of the original prediction probability vector from a preset one-hot coding set, wherein the number of the one-hot codes in the one-hot coding set

Equal to the number of classes the local model can predict; calculating an interpolation between the first one-hot code and the original prediction probability vector to obtain the perturbation prediction probability vector.

Further, the perturbation prediction probability vector is:

wherein the content of the first and second substances,

a probability vector is predicted for the perturbation,

for the original prediction probability vector(s) in question,

for the purpose of said first one-hot encoding,

is the best interpolation point.

Further, the S2 further includes: and calculating the optimal interpolation point by using the constraint conditions that the prediction label of the disturbance prediction probability vector is the same as the prediction label of the original prediction probability vector, the gradient of the prediction loss function of the disturbance prediction probability vector is maximum, and the distance between the disturbance prediction probability vector and the original prediction probability vector is not greater than the preset disturbance intensity.

Further, in S2, the optimal interpolation point is calculated by using a binary search method.

Further, the first one-hot encoding is:

wherein the content of the first and second substances,

encoding the first one-hot code;

is the first in the set of one-hot codes

The single-hot code is coded by the single-hot code,

；

is the one-hot encoding set;

is a sample;

parameters of the local model;

the original prediction probability vector is obtained;

a prediction loss function for the original prediction probability vector;

is the first in the set of one-hot codes

A prediction loss function of one-hot coding;

is composed of

A gradient of (a);

is composed of

Of the gradient of (c).

Further, the difference in S3 is:

wherein the content of the first and second substances,

for the difference between the original prediction probability vector and the perturbation prediction probability vector of any local model,

the number of samples in the data participant corresponding to any local model,

is the first in any of the local modelsiThe original prediction probability vector for each sample,

is the first in any of the local modelsiAnd (4) disturbance prediction probability vectors corresponding to the samples.

According to another aspect of the invention, a gradient perturbation based federated learning data privacy protection system is provided, which comprises: the prediction module is used for carrying out class prediction on samples in data participants by using a local model after federal learning training to obtain an original prediction probability vector, wherein the local model corresponds to the data participants one by one; the disturbance module is used for disturbing the original prediction probability vector to obtain a disturbance prediction probability vector with the same prediction label as that of the original prediction probability vector and the maximum angular deviation of the gradient of the prediction loss function relative to that of the original prediction probability vector; the training module is used for retraining the local models of the data participants by taking the minimum difference between the original prediction probability vector and the disturbance prediction probability vector of each local model as a target; and the aggregation and protection module is used for aggregating the retrained local models to obtain a global model and deploying the global model to the data participants to protect the data privacy of the data participants.

Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:

(1) after the training of the Federal learning model is finished, protecting the model on the basis of the setting of original training, specifically, redefining a loss function for minimizing the difference of the prediction probability vectors before and after disturbance by using the output prediction probability vector of a data sample of a disturbance local model under the condition of keeping a model prediction label unchanged, retraining each local model, and then aggregating to obtain a final global model of Federal learning with data prediction output protection and gradient protection, thereby solving the problem of user privacy protection of the global model in Federal learning, effectively resisting the privacy attack of the data prediction output characteristic of the final global model based on the Federal learning and the model data prediction loss on the gradient characteristic of the model parameter, and resisting the technical problem of the attack of the data prediction output characteristic of the final global model based on the Federal learning and the model data prediction loss on the gradient characteristic of the model parameter, therefore, data in each data participant is prevented from being leaked, and the reliability of user data privacy protection is improved; in addition, the disturbed output prediction probability vector is restrained by the invariance of the model prediction label of the data and the condition of disturbance intensity, so that the final global model of federal learning can resist the malicious attacks, the influence of the reduction of the model prediction performance caused by disturbance gradient operation is reduced as much as possible, the stability of the availability of the global model is maintained, and the classification accuracy of the model is ensured; in the process of model gradient protection, the existing federal learning framework and training process do not need to be modified, so that the cost of the gradient disturbance-based federal learning data privacy protection method in actual deployment is greatly reduced;

(2) according to the gradient and the output of each layer in the deep learning model, the difference between the training data and the testing data is selected, the layer with the large difference is marked as the layer with the higher privacy risk, when each local model is retrained, only the network layer with the higher privacy risk is trained, and the parameters of other network layers are frozen, so that the gradient disturbance can be reduced, the influence of the gradient disturbance on the predictive performance of the model is further reduced, and the reliability of user data privacy protection is further improved.

Drawings

Fig. 1 is a flowchart of a gradient perturbation-based federated learning data privacy protection method according to an embodiment of the present invention.

Fig. 2 is a schematic application diagram of the gradient perturbation-based federated learning data privacy protection method according to the embodiment of the present invention.

Fig. 3 is a block diagram of a gradient perturbation-based federated learning data privacy protection system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

In the present application, the terms "first," "second," and the like (if any) in the description and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Fig. 1 is a flowchart of a gradient perturbation-based federated learning data privacy protection method according to an embodiment of the present invention. Referring to fig. 1 and fig. 2, the gradient perturbation based federated learning data privacy protection method in the present embodiment is described in detail, and the method includes operations S1-S4.

And operation S1, performing class prediction on the samples in the data participants by using the local model after the federal learning training to obtain an original prediction probability vector, wherein the local model corresponds to the data participants one by one.

An application scenario of the gradient disturbance-based federated learning data privacy protection method in this embodiment is, for example, a federated learning scenario composed of a plurality of data participants and one data aggregator; each data participant is provided with a local model and comprises a corresponding data setDData setDContains a large number of samples as shown in fig. 2.

The data participants are, for example, terminal equipment of the internet of things, a monitoring camera, a mobile terminal and the like. The data aggregator is, for example, a machine learning service provider, such as a federal learning service provider including a micro-member. The sample in the data participant is the collected data thereof, including the device running state information, the collected image information, the historical click record of the mobile terminal user and the like. The class of the sample is, for example, a class of objects contained in an image acquired by a certain data participant. The original prediction probability vector refers to the prediction output of an unprotected (after federal learning training) global model of federal learning to a specific input data (for example, a sample), wherein each value represents the probability of the model deducing that the data belongs to a specific category, and the category corresponding to the maximum probability is the category of the input data.

Taking the number of data participants as 4 and the number of data aggregators as 1 as an example, in this embodiment, for example, 80000 pieces of training data and 80000 pieces of test data are randomly and repeatedly sampled from 197000 data records of the purchasase 20 data set, and the selected training data and test data are evenly distributed to the 4 data participants as local data sets of the data participants, which are respectively used for training the local models. The local model is, for example, a 5-layer fully-connected network, and after all global iteration rounds of federal learning, each trained local model is obtained, for example, the parameters of the local model trained by the first data participant are

。

Taking the first data participant as an example, the local model finished by the federal learning training sequentially processes the data setDEach sample of

Performing species prediction, and calculating to obtain a sample

Original prediction probability vector of

. Taking the number of classes that the local model can predict as 20 as an example,

is, for example, [0.2,0.1,0.05, … …,0.3]Vector of motion

The total sum of the 20 probability values is 1, and the 20 probability values respectively represent prediction samples of the local model

Probabilities belonging to classes, e.g. 0.2 representing local model prediction samples

The probability of belonging to the first class is 0.2. The original prediction probability vector calculation process in other data participants is the same, and is not described herein again. Further, a local model is computed at the sample

Loss function for class prediction

Gradient of (2)

。

In operation S2, the original prediction probability vector is perturbed to obtain a perturbed prediction probability vector having the same prediction label as that of the original prediction probability vector and the largest angular deviation of the gradient of the prediction loss function with respect to that of the original prediction probability vector.

Operation S2 includes sub-operation S21-sub-operation S22, according to an embodiment of the invention.

In sub-operation S21, the gradient of the prediction loss function with respect to the original prediction is selected from a preset one-hot encoding setThe first one-hot coding with the maximum angular deviation of the gradient of the prediction loss function of the probability vector, and the number of one-hot codes in the one-hot coding set

Equal to the number of classes that the local model can predict.

In this embodiment, the number of types that can be predicted according to the local model

Carrying out one-hot coding to obtain a preset one-hot coding set

. One-hot coded representation of

Prediction probability vector with class label prediction probability of 1

，

。

Number of classes predictable with local model

For example, the one-hot codes of class 1 to class 4 are [1,0,0,0, respectively]、[0,1,0,0]、[0,0,1,0]And [0,0,0,1 ]]. For Purchase20 data sets belonging to 20 classification tags, a preset one-hot encoding set

From [1,0,0, …,0 ]]、[0,1,0,…,0]、[0,0,1,…,0]、……、[0,0,0,…,1]And 20 unique hot codes.

Further, traversing the set of one-hot codes

Selecting the gradient of the prediction loss function relative to the originalFirst one-hot encoding for which angular deviation of gradient of prediction loss function of initial prediction probability vector is largest

：

Wherein the content of the first and second substances,

is the first in the set of one-hot codes

The single-hot code is coded by the single-hot code,

；

is a one-hot coding set;

is a sample;

parameters of the local model;

the original prediction probability vector is obtained;

a prediction loss function that is an original prediction probability vector;

is the first in the set of one-hot codes

A prediction loss function of one-hot coding;

is composed of

A gradient of (a);

is composed of

Of the gradient of (c).

In a sub-operation S22, an interpolation between the first one-hot encoding and the original prediction probability vector is calculated to obtain a perturbed prediction probability vector

：

Wherein the content of the first and second substances,

in order to be the original prediction probability vector,

is the first one-hot code, and the second one-hot code,

is the best interpolation point.

In sub-operation S22, a binary search method is used to perform recursive iterative computation to obtain the optimal interpolation point

. In order to reduce the influence of the disturbance gradient on the prediction performance of the local model, the following three constraints should be applied in the process of calculating the optimal interpolation point: the prediction label of the perturbed prediction probability vector is the same as the prediction label of the original prediction probability vector, i.e.

(ii) a The gradient of the prediction loss function of the perturbation prediction probability vector is maximum, i.e.

Maximum; the distance between the perturbation prediction probability vector and the original prediction probability vector is not more than the preset perturbation intensity, i.e. the distance between the perturbation prediction probability vector and the original prediction probability vector is not more than the preset perturbation intensity

。

Therefore, the original prediction probability vector and the disturbance prediction probability vector corresponding to each sample in each data participant are obtained.

In operation S3, the local models of the data participants are retrained with the goal that the difference between the original prediction probability vector and the perturbation prediction probability vector of each local model is minimal.

According to an embodiment of the present invention, before performing operation S3, the method further includes: and calculating and obtaining a network layer with a leakage privacy risk lower than a risk threshold value in the global model after federal learning, and freezing network layer parameters corresponding to the network layer in the local model. Only the network layer parameters in the local model that are not frozen are trained in operation S3. The global model after the federal learning is obtained by aggregating each local model after the federal learning through an encryption algorithm.

According to another embodiment of the present invention, before performing operation S3, the method further includes: and calculating and obtaining a preset number of network layers with the lowest privacy leakage risk in the global model after federal learning, and freezing network layer parameters corresponding to the network layers in the local model. In operation S3, only the parameters of the network layer that are not frozen in each local model are retrained, and the frozen parameters are not modified during the retraining process.

The network layer near the output layer can extract more complex and abstract features than the network layer near the input layer in the training model, contains more information about the training set of the model, and is therefore usually selected for protection. Therefore, gradient disturbance can be reduced, and the influence of the gradient disturbance on the predictive performance of the model is further reduced.

Specifically, a loss function that minimizes the difference between the prediction probability vectors before and after the perturbation under the condition of maintaining the model prediction tag unchanged is defined in operation S3

：

Loss function

The difference between the original prediction probability vector and the perturbation prediction probability vector representing the local model. For any of the local models, the local model,

the number of samples in the data participant corresponding to the local model,

is the first in the local modeliThe original prediction probability vector for each sample,

is the first in the local modeliAnd (4) disturbance prediction probability vectors corresponding to the samples.

In this embodiment, the invariance of the prediction label (i.e., the same model prediction label before and after disturbance) indicates that the class result of the prediction output of the model for the same sample is unchanged before and after disturbance. For example, the prediction output class of the model before disturbance is the first class, and the prediction output class after disturbance is the first class, even if the probabilities of the prediction classes are different, the maximum probability is the same class.

And operation S4, aggregating the retrained local models to obtain a global model, and deploying the global model to the data participants to protect data privacy of the data participants.

According to the local models obtained through retraining in operation S3, the model parameters or parameter update amounts thereof are uploaded to the data aggregator, so that a global model capable of effectively protecting data privacy security of each data participant is obtained through aggregation. The protection method of the global model is based on perturbation to the model prediction output and gradient.

In this embodiment, a Python language is adopted to realize a prototype code of the gradient disturbance-based federated learning data privacy protection method, and experimental verification of feasibility and effectiveness is performed on a Purchase20 data set with 197000 data records. Experiments the final model of federal learning was attack tested using member-guessed attacks based on white-box access settings. Through simulation tests, the privacy protection performance of the final model for resisting member conjecture attack and the experimental results of the prediction performance of the final model before and after the application of the method are obtained, and the results are shown in table 1.

Experimental results show that the gradient disturbance-based federated learning data privacy protection method in the embodiment of the invention can effectively reduce the attack success rate of member conjecture attack and provide gradient protection for the final model of federated learning; meanwhile, the influence of the gradient of the disturbance model on the predictive performance of the model can be greatly reduced. In addition, only the mode of disturbing the last layer gradient of the final model can still well resist member conjecture attack, and higher model prediction performance can be reserved.

Fig. 3 is a block diagram of a gradient perturbation-based federated learning data privacy protection system according to an embodiment of the present invention. Referring to fig. 3, the gradient perturbation based federated learning data privacy protection system 300 includes a prediction module 310, a perturbation module 320, a training module 330, and an aggregation and protection module 340.

The prediction module 310, for example, performs operation S1, to perform class prediction on the samples in the data participants by using the local model after federal learning training, so as to obtain an original prediction probability vector, where the local model and the data participants are in one-to-one correspondence.

The perturbation module 320 performs operation S2, for example, to perturb the original prediction probability vector to obtain a perturbed prediction probability vector having the same prediction label as that of the original prediction probability vector and the largest angular deviation of the gradient of the prediction loss function relative to that of the original prediction probability vector.

The training module 330 performs, for example, operation S3 for retraining the local models of the data participants with the goal of minimizing the difference between the original prediction probability vectors and the perturbation prediction probability vectors of the local models.

The aggregation and protection module 340 performs operation S4, for example, to aggregate the retrained local models to obtain a global model, and deploy the global model to the data participants to protect data privacy of the data participants.

The gradient perturbation based federated learning data privacy protection system 300 is used for executing the gradient perturbation based federated learning data privacy protection method in the embodiment shown in fig. 1-2. For details, please refer to the federal learned data privacy protection method based on gradient perturbation in the embodiments shown in fig. 1-2, which is not described herein again.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A gradient perturbation based federated learning data privacy protection method is characterized by comprising the following steps:

s1, performing category prediction on samples in data participants by using a local model after federal learning training to obtain an original prediction probability vector, wherein the local model corresponds to the data participants one to one;

s2, disturbing the original prediction probability vector to obtain a disturbance prediction probability vector with the same prediction label as that of the original prediction probability vector and the maximum angular deviation of the gradient of the prediction loss function relative to that of the original prediction probability vector;

s3, retraining the local models of the data participants by taking the minimum difference between the original prediction probability vector and the disturbance prediction probability vector of each local model as a target;

and S4, aggregating the retrained local models to obtain a global model, and deploying the global model to the data participants to protect the data privacy of the data participants.

2. The gradient perturbation based federated learning data privacy protection method of claim 1, wherein the S3 is preceded by further comprising: calculating and obtaining a network layer with a privacy leakage risk lower than a risk threshold value in a global model after federal learning, and freezing network layer parameters corresponding to the network layer in the local model;

retraining each of the local models in S3 includes: and retraining the network layer parameters which are not frozen in each local model.

3. The gradient perturbation based federated learning data privacy protection method of claim 1, wherein the S3 is preceded by further comprising: calculating and obtaining a preset number of network layers with lowest privacy leakage risk in a global model after federal learning, and freezing network layer parameters corresponding to the network layers in the local model;

4. The gradient perturbation based federated learning data privacy protection method of claim 1, wherein the S2 includes:

selecting gradient of prediction loss function relative to preset one-hot coding setThe first one-hot coding with the maximum angle deviation of the gradient of the prediction loss function of the original prediction probability vector, and the number of the one-hot codes in the one-hot coding set

Equal to the number of classes the local model can predict;

calculating a linear interpolation between the first one-hot code and the original prediction probability vector to obtain the perturbation prediction probability vector.

5. The gradient perturbation based federated learning data privacy protection method of claim 4, wherein the perturbation prediction probability vector is:

wherein the content of the first and second substances,

a probability vector is predicted for the perturbation,

for the original prediction probability vector(s) in question,

for the purpose of said first one-hot encoding,

is the best interpolation point.

6. The gradient perturbation based federated learning data privacy protection method of claim 5, wherein the S2 further comprises:

and calculating the optimal interpolation point by using the constraint conditions that the prediction label of the disturbance prediction probability vector is the same as the prediction label of the original prediction probability vector, the gradient of the prediction loss function of the disturbance prediction probability vector is maximum, and the distance between the disturbance prediction probability vector and the original prediction probability vector is not greater than the preset disturbance intensity.

7. The gradient perturbation based federated learning data privacy protection method of claim 6, wherein the optimal interpolation point is calculated in S2 using a binary search method.

8. The gradient perturbation based federated learning data privacy protection method of claim 4, wherein the first one-hot code is:

wherein the content of the first and second substances,

encoding the first one-hot code;

is the first in the set of one-hot codes

The single-hot code is coded by the single-hot code,

；

is the one-hot encoding set;

is a sample;

parameters of the local model;

the original prediction probability vector is obtained;

a prediction loss function for the original prediction probability vector;

is the first in the set of one-hot codes

A prediction loss function of one-hot coding;

is composed of

A gradient of (a);

is composed of

Of the gradient of (c).

9. The gradient perturbation based federated learning data privacy protection method of any one of claims 1-8, wherein the difference in S3 is:

wherein the content of the first and second substances,

the number of samples in the data participant corresponding to any local model,

10. A gradient perturbation based federated learning data privacy protection system, comprising:

the prediction module is used for carrying out class prediction on samples in data participants by using a local model after federal learning training to obtain an original prediction probability vector, wherein the local model corresponds to the data participants one by one;

the disturbance module is used for disturbing the original prediction probability vector to obtain a disturbance prediction probability vector with the same prediction label as that of the original prediction probability vector and the maximum angular deviation of the gradient of the prediction loss function relative to that of the original prediction probability vector;

the training module is used for retraining the local models of the data participants by taking the minimum difference between the original prediction probability vector and the disturbance prediction probability vector of each local model as a target;

and the aggregation and protection module is used for aggregating the retrained local models to obtain a global model and deploying the global model to the data participants to protect the data privacy of the data participants.