LU504296B1 - Privacy protection data sharing method and system based on distributed gan - Google Patents


Info

Publication number
LU504296B1
Authority
LU
Luxembourg
Prior art keywords
data
privacy protection
training
data owner
center server
Prior art date
Application number
LU504296A
Other languages
German (de)
Inventor
Aiyan Wu
Chao Wang
Ke Xiao
Shuo Wang
Yunhua He
Xiaoqing Xue
Original Assignee
Univ North China Technology
Priority date
Filing date
Publication date
Application filed by Univ North China Technology filed Critical Univ North China Technology
Application granted granted Critical
Publication of LU504296B1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L 63/105 Multiple levels of security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L 63/0407 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L 63/104 Grouping of entities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The present application discloses a privacy protection data sharing method and system based on a distributed GAN, and relates to the technical field of data sharing and privacy protection. The method comprises: providing a plurality of personalized contracts by a center server; each data owner among a plurality of data owners selecting a personalized contract according to the data owner's own privacy protection requirements; each data owner using the data owner's own private data set to pre-train a local GAN model; designing a privacy protection level selection strategy by the center server; and the data owners assisting in training and optimizing a center generator model of the center server to complete privacy protection data sharing. The present application can realize data sharing by using the local data sets of the data owners to cooperatively train the center generator model without transmitting the original data, realize model training under the guarantee of differential privacy, and design different privacy protection contracts for data owners with different privacy preferences.

Description

PRIVACY PROTECTION DATA SHARING METHOD AND SYSTEM BASED ON
DISTRIBUTED GAN
TECHNICAL FIELD
[0001] The present application relates to the technical field of data sharing and privacy protection, in particular to a privacy protection data sharing method and system based on a distributed GAN.
BACKGROUND
[0002] Nowadays, the number of sensor devices shows an explosive growth trend, accompanied by "massive" data generated by Internet of Things terminals. These high-quality data have given machine learning great influence in many fields such as image recognition, autonomous driving and product recommendation. Highly available data has become the main driving force for the development of machine learning. However, there is still not enough training data for machine learning tasks, which is mainly due to the public's concern about data leakage and an enhanced awareness of privacy protection. Specifically, shared data may contain users' private information, and data owners are reluctant to share data with others because of privacy leakage. In addition, there are cases where confidential data cannot be transmitted and can only be saved locally by the owner. Therefore, protecting the privacy of data owners while encouraging them to share data is becoming one of the key bottlenecks for the further development of machine learning.
[0003] In view of the privacy problem in data sharing, researchers from all walks of life have put forward a series of solutions. Some researchers use technologies based on ABE (Attribute-Based Encryption), SMC (Secure Multi-Party Computation) and blockchain to realize privacy protection by hiding user identities in data sharing or designing fine-grained access control mechanisms, for example, [Pu Y, Hu C, Deng S, et al. R²PEDS: a recoverable and revocable privacy-preserving edge data sharing solution[J]. IEEE Internet of Things Journal, 2020, 7(9): 8077-8089.], [Zheng X, Cai Z. Privacy-preserved data sharing towards multiple parties in industrial IoTs[J]. IEEE Journal on Selected Areas in Communications, 2020, 38(5): 968-979.], [Xu X, Liu Q, Zhang X, et al. A blockchain-powered crowdsourcing method with privacy preservation in mobile environment[J]. IEEE Transactions on Computational Social Systems, 2019, 6(6): 1407-1419.]. However, this kind of solution focuses on the implementation of authentication and access control mechanisms, which requires not only the transmission of raw data but also a lot of extra computation. The rise of federated learning provides a new solution, which can realize model training without transmitting original data. However, when the training task changes or the machine learning model is updated, the private data sets need to be accessed repeatedly, which increases the risk of privacy leakage.
[0004] The existing artificial-intelligence-based solutions to privacy protection in Internet of Things data sharing can be roughly divided into two categories: one is data sharing based on federated learning, and the other is data sharing based on generative adversarial networks. Neither category needs to upload users' original data, which protects users' privacy to a certain extent, but there are still some limitations. Their shortcomings are introduced and summarized respectively below.
[0005] The rise of federated learning has broken the limitation that artificial intelligence technology needs centralized data collection and processing. Therefore, federated learning can be used in a wide range of IoT (Internet of Things) services, providing a new solution for data sharing with privacy protection. For example, in the IoV (Internet of Vehicles), data sharing between vehicles can improve service quality. In order to reduce the transmission load and solve the privacy problem in data sharing, the authors of [Lu Y, Huang X, Zhang K, et al. Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles[J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4298-4311.] propose a new architecture based on federated learning. They developed a hybrid blockchain architecture consisting of a blockchain and a local DAG (Directed Acyclic Graph) to improve the security and reliability of model parameters. The paper [Yin L, Feng J, Xun H, et al. A privacy-preserving federated learning for multiparty data sharing in social IoTs[J]. IEEE Transactions on Network Science and Engineering, 2021, 8(3): 2706-2718.] also uses federated learning to share data, but the authors propose a new hybrid privacy protection method to overcome the disclosure of data and content in federated learning. They use an advanced functional encryption algorithm and local Bayesian differential privacy to protect the characteristics of the uploaded data and the weight of each participant in the weighted summation process.
[0006] Since a GAN (Generative Adversarial Network) is suitable for all kinds of data, many researchers jointly train a GAN instead of directly transmitting data to realize data sharing with privacy protection. In CPSSs (Cyber-Physical-Social Systems), human interaction from cyberspace to the physical world is realized through the sharing of spatio-temporal data. In order to balance privacy protection and data utility, the authors use a modified GAN model to run two games (between the generator, the discriminator and a differentially private identifier) at the same time [Qu Y, Yu S, Zhou W, et al. Gan-driven personalized spatial-temporal private data sharing in cyber-physical social systems[J]. IEEE Transactions on Network Science and Engineering, 2020, 7(4): 2576-2586.]. In the paper [Chang Q, Qu H, Zhang Y, et al. Synthetic learning: Learn from distributed asynchronized discriminator gan without sharing medical image data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 13856-13866.], the authors propose a distributed GAN framework with high privacy protection and high communication efficiency, called the distributed asynchronized discriminator GAN (AsynDGAN). It learns from the distributed discriminators, trains a center generator, and then trains the segmentation model only on the generated synthetic images.
[0007] These two kinds of methods still have some limitations, which are as follows: 1) solutions using federated learning can train task models without uploading data; however, they still carry a great risk of privacy leakage, because when the task changes or the machine learning architecture is updated, the private data sets need to be revisited many times; 2) the existing solutions based on a GAN cannot balance the relationship between privacy protection and data availability, and cannot meet the personalized privacy protection needs of data owners.
SUMMARY
[0008] The present application aims at the problem of how to protect the privacy of data owners and encourage them to share data.
[0009] In order to solve the above technical problems, the present application provides the following technical solution:
[0010] In one aspect, the present application provides a privacy protection data sharing method based on a distributed GAN, characterized in that the method is implemented by a privacy protection data sharing system based on a distributed GAN, and the system includes a center server and a plurality of data owners;
[0011] the method includes the following steps:
[0012] S1, providing a plurality of personalized contracts by the center server;
[0013] S2, each data owner of the plurality of data owners selecting a personalized contract from the plurality of personalized contracts;
[0014] S3, each data owner using a local private data set of the data owner to pre-train a local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model;
[0015] S4, designing a privacy protection level selection strategy by the center server;
[0016] S5, the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[0017] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[0018] Optionally, the step of each data owner using a local private data set of the data owner to pre-train a local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model in S3 includes:
[0019] S31, each data owner obtaining an original GAN model from the center server;
[0020] S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[0021] Optionally, the local GAN model includes a local generator and a local discriminator;
[0022] after obtaining the pre-trained local GAN model in S32, the method further includes:
[0023] each data owner hiding the pre-trained local generator.
[0024] Optionally, the step of the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing in step S5 includes:
[0025] S51, determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy;
[0026] S52, obtaining, by the center server, from among the plurality of data owners, the data owners whose privacy protection level is k, according to the privacy protection level k and the personalized contract selected by each data owner;
[0027] S53, randomly selecting, by the center server, one data owner from the data owners whose privacy protection level is k as the data owner assisting in training;
[0028] S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
[0029] Optionally, the step of determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy in S51 includes:
[0030] S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model;
[0031] S512, determining, by the center server, the noise scale according to the attenuation function;
[0032] S513, determining, by the center server, the privacy protection level k of the data owner who assists this round of training according to the noise scale.
[0033] Optionally, the step of the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training in S54 includes:
[0034] S541, the data owner assisting in training obtaining data generated by the center generator model from the center server;
[0035] S542, the data owner assisting in training updating the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training;
[0036] S543, the data owner assisting in training calculating a gradient according to the updated local discriminator;
[0037] S544, the data owner assisting in training perturbing the gradient based on a personalized differential privacy theory to obtain a perturbed gradient;
[0038] S545, optimizing, by the center server, the center generator model of the center server according to the perturbed gradient.
[0039] Optionally, the step of perturbing the gradient based on a personalized differential privacy theory in S544 includes:
[0040] perturbing the gradient based on a Gaussian mechanism and a perturbance degree, wherein the perturbance degree is determined by the privacy protection level of the personalized contract.
[0041] In another aspect, the present application provides a privacy protection data sharing system based on a distributed GAN, which is applied to implement a privacy protection data sharing method based on a distributed GAN. The system includes a center server and a plurality of data owners, wherein:
[0042] the center server is configured for providing a plurality of personalized contracts, and designing a privacy protection level selection strategy;
[0043] the plurality of data owners are configured for selecting a personalized contract from the plurality of personalized contracts; using a local private data set of the data owner to pre-train a local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model; and optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[0044] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[0045] Optionally, the plurality of data owners are further configured for the following:
[0046] S31, each data owner obtaining an original GAN model from the center server;
[0047] S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[0048] Optionally, the plurality of data owners are further configured for the following:
[0049] S51, determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy;
[0050] S52, obtaining, by the center server, from among the plurality of data owners, the data owners whose privacy protection level is k, according to the privacy protection level k and the personalized contract selected by each data owner;
[0051] S53, randomly selecting, by the center server, one data owner from the data owners whose privacy protection level is k as the data owner assisting in training;
[0052] S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
[0053] The technical solution provided by the embodiments of the present application has at least the following beneficial effects.
[0054] In the above solution, a privacy protection data sharing solution based on an asynchronous distributed GAN is proposed to solve the privacy problem in data sharing of the Internet of Things. Combining differential privacy theory with a distributed GAN, a center generator model is trained using the local data sets of the data owners in a personalized privacy protection way. The proposed distributed GAN training framework can realize data sharing by using the local data sets of the data owners to cooperatively train the center generator model without transmitting the original data, and then use the center generator model to rebuild a data set for downstream tasks. Based on differential privacy theory, a gradient desensitization strategy is proposed, which preserves the availability of the gradient to the greatest extent under the premise of protecting user privacy and optimizes the model under the guarantee of differential privacy. Multi-level privacy protection contracts are designed for data owners with different privacy preferences, and a differential privacy level selection strategy is proposed, which can balance data availability and user privacy protection needs and complete the model training with minimum privacy consumption.
BRIEF DESCRIPTION OF DRAWINGS
[0055] In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
[0056] FIG. 1 is a flowchart of a privacy protection data sharing method based on a distributed GAN provided by an embodiment of the present application;
[0057] FIG. 2 is a block diagram of a privacy protection data sharing system based on a distributed GAN provided by an embodiment of the present application.
DESCRIPTION OF EMBODIMENTS
[0058] In order to make the technical problems to be solved, the technical solutions and the advantages of the present application clearer, the present application will be described in detail below with reference to the attached drawings and specific embodiments.
[0059] As shown in FIG. 1, an embodiment of the present application provides a privacy protection data sharing method based on a distributed GAN, which can be realized by a privacy protection data sharing system based on a distributed GAN. As shown in the flowchart of the privacy protection data sharing method based on a distributed GAN in FIG. 1, the processing flow of the method may include the following steps:
[0060] In S1, a center server provides a plurality of personalized contracts.
[0061] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[0062] In a feasible implementation, at the beginning of data sharing, the center server designs a series of personalized contracts with different privacy protection levels and rewards to meet the privacy protection needs of data owners with different privacy preferences. Among them, the higher the privacy protection level, the smaller the reward, and each data owner can choose the corresponding contract to maximize his own profit. Then, the server publishes the data requirements and the contracts to the data owners registered in the system (i.e., U).
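The contract menu described above can be sketched as follows. The level numbering, the reward schedule, and the owner's selection rule are illustrative assumptions; the text only specifies that higher privacy levels pay smaller rewards and that each owner chooses the contract maximizing his own profit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Contract:
    level: int      # privacy protection level k (higher = stronger privacy)
    reward: float   # payment offered by the center server

def build_contract_menu(num_levels: int, base_reward: float) -> list:
    """Higher privacy level -> smaller reward, as stated in the text.
    The reward schedule base_reward / k is an assumption for illustration."""
    return [Contract(level=k, reward=base_reward / k)
            for k in range(1, num_levels + 1)]

def choose_contract(menu: list, min_level: int) -> Contract:
    """An owner with privacy requirement `min_level` takes the
    highest-paying contract that still satisfies that requirement."""
    feasible = [c for c in menu if c.level >= min_level]
    return max(feasible, key=lambda c: c.reward)

menu = build_contract_menu(num_levels=4, base_reward=8.0)
picked = choose_contract(menu, min_level=2)  # owner requires at least level 2
```

With this schedule the owner takes the level-2 contract, since it is the cheapest privacy level that still meets his requirement and therefore pays the most.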
[0063] In this embodiment, the center server has powerful computing power and communication bandwidth. Its purpose is to recruit enough data owners to train a center generator cooperatively until it has strong data generation ability. The embodiment of the present application assumes that the center server will not violate the defined protocol, but may try to infer the user's privacy.
[0064] In S2, each data owner among a plurality of data owners selects a personalized contract from a plurality of personalized contracts.
[0065] In a feasible implementation, the data owner set U consists of |U| data owners, and each data owner u ∈ U owns a private data set D_u including N_u data samples. These data owners have certain computing and communication capabilities and want to use their private data sets to participate in training tasks in exchange for some rewards, but they want to protect their privacy from inference attacks by the center server. In addition, different users have different privacy preferences (that is, different sensitivities to privacy exposure), so personalized privacy protection is needed.
[0066] In S3, each data owner uses the local private data set of the data owner to pre-train the local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model.
[0067] Optionally, the step of each data owner in S3 using the local private data set of the data owner to pre-train the local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model includes the following steps.
[0068] In S31, each data owner obtains an original GAN model from the center server.
[0069] In a feasible implementation, each data owner u ∈ U who meets the requirements signs a specific contract with the server according to his privacy protection needs and downloads the original GAN model.
[0070] In S32, each data owner uses the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[0071] Optionally, the local GAN model includes a local generator and a local discriminator.
[0072] After obtaining the pre-trained local GAN model in S32, the method further includes:
[0073] each data owner hiding the pre-trained local generator.
[0074] In a feasible implementation, the embodiment of the present application proposes an asynchronous distributed GAN training framework with privacy protection, which uses the local data sets of the data owners to cooperatively train the center generator model.
[0075] Further, all the data owners involved in the training use their private data sets to pre-train the GAN model locally. After the pre-training, the generator that can generate simulated data is hidden, and the local discriminator is used to assist the server in training the center generator.
[0076] Further, the pre-training process includes: firstly, preprocessing the private data set according to the data requirements, and then training the local GAN model. The pre-training process is described in detail in the following algorithm 1:
Algorithm 1 Pre-training
Super parameters: number of iterations T, learning rates η_g and η_d, number of penalty iterations T_c, batch size B
1: Preprocess the local data set D_u according to the data requirements;
2: Download the original model as a local generator G_u and a discriminator C_u;
3: for step in {1, ..., T} do
4:   for t in {1, ..., T_c} do
5:     select a sample batch {x_i}_{i=1}^B from D_u;
6:     select a sample batch {z_i}_{i=1}^B, where z_i ~ p_z;
7:     θ_d ← θ_d − η_d · (1/B) Σ_{i=1}^B ∇_{θ_d} L_D(x_i, G_u(z_i));
8:   end for
9:   θ_g ← θ_g − η_g · (1/B) Σ_{i=1}^B ∇_{θ_g} L_G(G_u(z_i));
10: end for
11: The generator G_u is hidden and the discriminator C_u is used to assist in training;
[0077] After the pre-training, each data owner has a trained generator and discriminator locally. The generator, which has learned the local data distribution, is hidden, and the discriminator is stored locally to assist in training the center generator. The purpose of the assisted training is to train the center generator using the local discriminator and the private data set of the data owner u.
[0078] In S4, the center server designs a privacy protection level selection strategy.
[0079] In a feasible embodiment, in order to optimize the center generator with the minimum privacy cost, the embodiment of the present application designs a privacy protection level selection strategy to select the corresponding data owner to assist in training in each round.
[0080] In S5, a plurality of data owners optimize the center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[0081] Optionally, the step of a plurality of data owners optimizing the center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing in S5 includes:
[0082] S51: determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy.
[0083] Optionally, the step of determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy in S51 includes:
[0084] S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model;
[0085] S512, determining, by the center server, the noise scale according to the attenuation function;
[0086] S513, determining, by the center server, the privacy protection level k of the data owner who assists this round of training according to the noise scale.
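Steps S511-S513 can be sketched as below. The exponential form of the attenuation function and the fixed mapping from privacy level to noise scale are assumptions made for illustration; the text only requires that the noise scale be a decaying function of the iteration count and that it determine the level.

```python
import math

# Assumed mapping from privacy protection level k to the noise scale sigma
# of its Gaussian mechanism (larger k = stronger privacy = more noise).
LEVEL_SIGMA = {1: 0.5, 2: 1.0, 3: 2.0, 4: 4.0}

def noise_scale(t: int, sigma0: float = 4.0, decay: float = 0.01) -> float:
    """S511/S512: attenuation function of the noise scale over iterations t.
    Starts noisy (cheap privacy budget early in training) and decays as the
    generator converges and gradients must become more accurate."""
    return sigma0 * math.exp(-decay * t)

def select_level(t: int) -> int:
    """S513: pick the privacy level whose noise scale is closest to the
    target scale for this round of training."""
    target = noise_scale(t)
    return min(LEVEL_SIGMA, key=lambda k: abs(LEVEL_SIGMA[k] - target))
```

Early rounds thus draw owners from the highest privacy level, and later rounds from lower levels, which matches the stated goal of completing training with minimum privacy consumption.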
[0087] In a feasible implementation, the center server designs a privacy protection level selection strategy to determine the privacy protection level k of the data owner who assists in this round of training, and then randomly selects one data owner from those who signed the contract with privacy protection level k and uses his local discriminator for this round of training.
[0088] In S54, the data owner assisting in the training optimizes the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in the training, and after the optimization, proceeds to S51 for iterative training until the number of iterations reaches a preset threshold, and the center generator model training is completed.
[0089] Optionally, the step of the data owner assisting in the training optimizes the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in the training includes the following steps.
[0090] In S541, the data owner assisting in training obtains the data generated by the center generator model from the center server;
[0091] in a feasible implementation, the selected data owner u receives the data generated by the center generator;
[0092] in S542, the data owner assisting in training updates the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training;
[0093] in S543, the data owner assisting in training calculates a gradient according to the updated local discriminator;
[0094] in S544, the data owner assisting in training perturbs the gradient based on a personalized differential privacy theory to obtain a perturbed gradient;
[0095] in a feasible implementation, the data owner u perturbs the calculated gradient based on the differential privacy theory, wherein the degree of perturbation is determined by the privacy protection level specified in the signed contract. Then, it sends the perturbed gradient to the center server for optimization of the generator;
[0096] optionally, the step of perturbing the gradient based on personalized differential privacy theory in S544 includes:
[0097] perturbing the gradient based on the Gaussian mechanism and a perturbance degree.
[0098] The perturbance degree is determined by the privacy protection level of the personalized contract.
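A minimal sketch of such a perturbance step is given below, assuming an illustrative mapping from contract levels to Gaussian noise multipliers and an assumed clipping bound; none of these constants come from the present application.

```python
import numpy as np

# Assumed mapping from contract privacy levels to Gaussian noise multipliers;
# the concrete numbers are illustrative, not taken from the present application.
LEVEL_TO_SIGMA = {"high": 2.0, "medium": 1.0, "low": 0.5}

def gradient_perturbance(grad, level, clip_norm=1.0, rng=None):
    """Clip the gradient to an L2 ball and add Gaussian noise whose scale
    is fixed by the privacy level in the signed contract."""
    rng = rng or np.random.default_rng(0)
    g = np.asarray(grad, dtype=float)
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # clip: ||g||_2 <= clip_norm
    sigma = LEVEL_TO_SIGMA[level]
    return g + rng.normal(0.0, sigma * clip_norm, size=g.shape)

g = np.array([3.0, 4.0])          # ||g||_2 = 5, so it is clipped to norm 1
noisy = gradient_perturbance(g, "low")
```

A stricter contract ("high") means a larger noise multiplier and therefore stronger protection of the owner's private data at the cost of gradient utility.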
[0099] In S545, the center server optimizes the center generator model of the center server according to the perturbed gradient.
[00100] In a feasible implementation, the center server updates the center generator model according to the perturbed gradient of the selected data owner. Then the center server re-selects the privacy protection level and data owner for the next round of auxiliary training until the training of the center generator is completed.
[00101] The embodiment of the present application proposes a personalized privacy protection strategy, which realizes differential privacy protection by perturbing the gradient calculated locally by data owners, and the privacy protection level is specified by the contract signed by each data owner.
[00102] Further, there is no discriminator on the center server, and its optimization depends entirely on the discriminator on the data owner's side. And in order to maximize the model performance with the minimum privacy cost, the embodiment of the present application proposes a privacy protection level selection strategy to select different privacy protection levels in different training stages and complete the training with the minimum privacy loss. In each iteration, the server selects a data owner according to the policy and uses its local discriminator to optimize the center generator. The optimization process of the center generator is described in the following algorithm 2:
Algorithm 2 Center Generator Training
Super parameters: number of iterations T, learning rates η_G and η_D, number of penalty iterations I, batch size m
1: initialize the center generator θ_G;
2: for step in {1, ..., T}:
3: select the data owner u according to the privacy protection level selection strategy;
4: for i in {1, ..., I}:
5: select a sample batch {z_j}_{j=1}^m, where z_j ∼ p_z(z);
6: send the generated data G(z_j; θ_G) to the data owner u;
7: g̃ ← assist in training(G(z_j; θ_G), u);
8: end for
9: θ_G ← θ_G − η_G · (1/m) Σ_{j=1}^m g̃_j · ∂G(z_j; θ_G)/∂θ_G;
10: end for
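The loop of algorithm 2 can be illustrated with a deliberately small numeric sketch: a one-parameter linear generator, a stubbed owner-side step, and an assumed exponential noise attenuation. All constants and the toy loss are illustrative, not part of the patented scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def assist_in_training(generated, noise_sigma):
    """Stand-in for a data owner's step (algorithm 3): gradient of a toy
    loss 0.5*(G - x)^2 toward the owner's private data mean (assumed 3.0),
    perturbed with the noise scale fixed for this round."""
    grad = np.mean(generated - 3.0)
    return grad + rng.normal(0.0, noise_sigma)

# Center generator: G(z) = w * z; the server never sees the private data.
w, eta, T = 0.0, 0.5, 200
for step in range(T):
    sigma = 2.0 * np.exp(-0.05 * step)    # assumed attenuation schedule
    z = rng.normal(1.0, 0.1, size=32)     # latent batch
    g_tilde = assist_in_training(w * z, sigma)
    w -= eta * g_tilde * np.mean(z)       # apply the Jacobian dG/dw = z server-side
```

Despite the early large-noise rounds, the generator parameter drifts toward 3.0, the (hidden) mean of the owners' data, as the noise decays.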
[00103] The assisted training process of each data owner (line 7) is shown in the following algorithm 3. In the assisted training phase, the data owner uses its local discriminator and private data set to optimize the center generator. As detailed below, the selected data owner will first receive the generated data from the center server and update the discriminator with the generated data and the local data set. Then, the gradient is calculated by the local discriminator, and the gradient is perturbed by personalized differential privacy before it is transmitted back. The degree of perturbance is determined by the privacy protection level in the signed contract.
Algorithm 3 Assisted Training
Super parameters: learning rate η_D, batch size m, privacy protection level ε_u
1: acquire the private data set D_u and the local discriminator θ_D;
2: receive the generated data G(z; θ_G) from the center server;
3: select a sample batch {x_j}_{j=1}^m, where x_j ∼ D_u;
4: θ_D ← θ_D − η_D · ∇_{θ_D} L_D(θ_D), where L_D(θ_D) = (1/m) Σ_j [D(G(z_j); θ_D) − D(x_j; θ_D)];
5: calculate the gradient g = ∇_{G(z)} L_G(θ_G), where L_G(θ_G) = −(1/m) Σ_j D(G(z_j); θ_D);
6: g̃ ← gradient perturbance(g, ε_u);
7: send g̃ to the server;
8: accumulate the privacy cost δ_u;
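The owner-side steps of algorithm 3 can likewise be sketched with a toy logistic discriminator; the discriminator form, learning rate, and noise scale are illustrative assumptions, not the concrete networks of the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assisted_training(generated, private, theta_d, eta_d=0.1, sigma=0.5):
    """One owner-side round: update the local discriminator D(x) = sigmoid(a*x + b)
    on real-vs-generated samples, then return the perturbed gradient of the
    generator loss with respect to the generated samples."""
    a, b = theta_d
    # Discriminator step: ascend log-likelihood of labels (real=1, fake=0).
    for x, y in [(private, 1.0), (generated, 0.0)]:
        p = sigmoid(a * x + b)
        a += eta_d * np.mean((y - p) * x)
        b += eta_d * np.mean(y - p)
    # Generator gradient wrt the generated samples: d/dG of -mean(log D(G)).
    p = sigmoid(a * generated + b)
    grad = -(1.0 - p) * a / len(generated)
    noisy = grad + rng.normal(0.0, sigma, size=grad.shape)  # personalized DP noise
    return noisy, (a, b)

fake = rng.normal(0.0, 1.0, size=16)
real = rng.normal(3.0, 1.0, size=16)
g_tilde, theta_d = assisted_training(fake, real, theta_d=(0.0, 0.0))
```

Only the perturbed gradient leaves the owner's machine; the discriminator parameters and the private samples never do.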
[00104] The method of personalized privacy protection can be further explained as follows:
Generally speaking, the privacy problem in machine learning is that model training needs a lot of user data, and the model absorbs multiple data features after several rounds of training iterations. Attackers can infer relevant information about the input data by using model parameters, gradients, etc. Similarly, the training of a GAN model also needs a lot of user data: the generator is trained to generate simulation data that mimics the distribution of the real data, and the discriminator needs a lot of real data as input to distinguish real data from simulation data during training. Therefore, in order to protect the privacy of each data owner, its local generator needs to be hidden, and the gradient calculated by the local discriminator needs to be perturbed in a personalized differential privacy way.
[00105] According to the differential privacy combination theorem, if each SGD (Stochastic Gradient Descent) process conforms to differential privacy [Lee J, Kifer D. Concentrated differentially private gradient descent with adaptive per-iteration privacy budget[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 1656-1665], the final model also conforms to differential privacy, and the gradient descent process of the center generator is as follows (1):
[00106] θ_{t+1} = θ_t − η · (g_t + 𝒩(0, σ²I)) (1)
[00107] where θ is the machine learning model, B is the batch data, g_t = ∇_θ L(θ_t; B) is the calculated random gradient, 𝒩(0, σ²I) is the Gaussian noise, and g̃_t = g_t + 𝒩(0, σ²I) is the perturbed random gradient.
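A one-step numeric sketch of the DP-SGD update in formula (1), using a toy quadratic loss and an assumed clipping bound; the noiseless call at the end is only a sanity check of the clipping path.

```python
import numpy as np

def dpsgd_step(theta, grad_fn, batch, eta=0.1, sigma=1.0, clip=1.0, rng=None):
    """One DP-SGD update: descend along the clipped gradient
    plus Gaussian noise N(0, (sigma*clip)^2), per formula (1)."""
    rng = rng or np.random.default_rng(0)
    g = grad_fn(theta, batch)
    g = g / max(1.0, np.linalg.norm(g) / clip)   # bound the gradient's sensitivity
    g_tilde = g + rng.normal(0.0, sigma * clip, size=g.shape)
    return theta - eta * g_tilde

# Toy quadratic loss 0.5*||theta - mean(batch)||^2 and its gradient.
grad_fn = lambda theta, batch: theta - batch.mean(axis=0)
theta = np.zeros(2)
batch = np.ones((8, 2))
theta = dpsgd_step(theta, grad_fn, batch, sigma=0.0)  # noiseless sanity step
```

With sigma=0 the raw gradient [-1, -1] has norm √2 > 1, so it is clipped to unit norm before the descent step.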
[00108] Compared with this, the perturbation mechanism of the embodiment of the present application can reduce the perturbed gradient range, thereby reducing the destruction of useful information. According to the chain rule, the range of the perturbance mechanism can be reduced:
[00109] ∇_{θ_G} L_G(θ_G) = (∂L_G(θ_G)/∂G(z; θ_G)) · (∂G(z; θ_G)/∂θ_G) (2)
[00110] As shown in Formula (2) above, the back propagation of the gradient information can be divided into two parts. The first part ∂L_G(θ_G)/∂G(z; θ_G) is calculated by the local discriminator of each data owner based on the received analog data (in which L_G(θ_G) = −D(G(z; θ_G); θ_D)). The other part ∂G(z; θ_G)/∂θ_G is the Jacobian matrix calculated by the center generator, which is independent of the training data. Therefore, the perturbance range can be reduced to the first part, and the perturbance process based on the Gaussian mechanism can be further described as the following formulas (3) and (4):
[00111] ḡ_u = g_u / max(1, ‖g_u‖₂ / C) (3)
[00112] g̃_u = ḡ_u + 𝒩(0, σ²C²I) (4)
[00113] where g_u is the gradient calculated by the data owner u using the local discriminator, and 𝒩(0, σ²C²I) is the Gaussian noise. The clip operation is performed using an ℓ₂ norm, wherein ‖ḡ_u‖₂ ≤ C is ensured by replacing the gradient g_u with g_u / max(1, ‖g_u‖₂ / C). It is worth noting that the variance of the noise directly affects the scale of the noise: when the variance σ² of the Gaussian noise is larger, the noise scale is greater and the level of privacy protection is higher. The noise variance σ² is determined by the privacy protection level in the contract signed by each user, thus realizing personalized differential privacy protection.
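The range reduction of formulas (2)-(4) can be checked numerically on a toy linear generator: the owner-side factor ∂L/∂G(z) is clipped and perturbed, while the data-independent Jacobian stays exact. The finite-difference comparison below confirms the chain-rule decomposition; all dimensions and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear generator G(z) = W @ z with loss L = 0.5 * ||G(z) - x||^2.
W = rng.normal(size=(3, 2))
z = rng.normal(size=2)
x = rng.normal(size=3)
loss = lambda Wm: 0.5 * np.sum((Wm @ z - x) ** 2)

# Owner-side factor of formula (2): dL/dG(z), the only data-dependent part.
dL_dG = W @ z - x
# Server-side factor: contracting with the Jacobian dG/dW gives outer(dL_dG, z).
grad_chain = np.outer(dL_dG, z)

# Finite-difference check that the decomposition reproduces dL/dW.
grad_fd = np.zeros_like(W)
eps = 1e-6
for i in range(3):
    for j in range(2):
        E = np.zeros_like(W)
        E[i, j] = eps
        grad_fd[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

# Formulas (3)-(4): clip and perturb only the 3-dimensional owner-side factor,
# instead of the 6-dimensional full gradient, shrinking the perturbed range.
clipped = dL_dG / max(1.0, np.linalg.norm(dL_dG) / 1.0)
g_tilde = clipped + rng.normal(0.0, 0.5, size=clipped.shape)
```

Because the Jacobian never leaves the server, noise is injected into fewer coordinates, destroying less useful gradient information for the same privacy guarantee.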
[00114] Furthermore, there are still two key problems in DP-SGD (differentially private stochastic gradient descent). On the one hand, the training of a GAN model often requires a large number of iterations, which leads to a large privacy loss. On the other hand, each data owner needs a different level of privacy protection, which means that the noise scale in each DP-SGD step is different, which also directly affects the privacy loss and the performance of the final model. Therefore, the embodiment of the present application designs the privacy protection level selection strategy, and selects the data owner with a specific privacy protection level in each round of training, thus reducing the privacy cost while completing the model training. Specifically, the noise selection strategy of the embodiment of the present application follows the idea that as the generating capacity of the center generator is enhanced, a smaller perturbance noise scale is expected in the gradient, thus further optimizing the model. The strategy for selecting the noise scale in the embodiment of the present application is to monitor the performance of the center generator and gradually select the data owner with a smaller noise scale. However, during training, the local discriminator of each data owner cannot be directly accessed, and only the perturbed gradient can be obtained, so it is difficult to use the local discriminator of each data owner to evaluate the performance of the center generator. Instead, the embodiment of the present application proposes a strategy of selecting an appropriate noise scale based on training iteration rounds.
Specifically, the noise scale should be determined according to the attenuation function of the noise scale, and the corresponding data owners should be further selected to assist in the training. The attenuation function takes the training iteration t as a parameter, and the noise scale is negatively correlated with t. The attenuation function is shown in the following formula (5):
[00115] σ(t) = σ₀ · e^(−k·t) (5)
[00116] where σ₀ is the initial noise parameter, t is the number of iterations and k is the attenuation rate. After the center server determines the noise scale through the attenuation function, it chooses the contract whose privacy protection level is most similar to the noise scale, and finally chooses one of the data owners who signed that contract to assist this round of training.
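A numeric sketch of this schedule, with assumed values for the initial noise parameter σ₀ and the attenuation rate k (the contract levels are likewise illustrative):

```python
import math

def noise_scale(t, sigma0=2.0, k=0.05):
    """Attenuation function of formula (5): sigma(t) = sigma0 * exp(-k * t).
    sigma0 and k are illustrative values, not fixed by the present application."""
    return sigma0 * math.exp(-k * t)

def nearest_contract_level(t, levels, sigma0=2.0, k=0.05):
    """Choose the contracted privacy level closest to the desired noise scale."""
    return min(levels, key=lambda s: abs(s - noise_scale(t, sigma0, k)))

levels = [2.0, 1.0, 0.5]
picked = [nearest_contract_level(t, levels) for t in (0, 10, 100)]
```

As t grows the desired scale decays, so the server moves from loose contracts in early rounds toward tight ones late in training.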
[00117] Further, after completing the training of the center generator, the server redeems rewards for each data owner according to the valuation specified in the signed contract.
[00118] In the embodiment of the present application, aiming at the privacy problem in data sharing of the Internet of Things, a privacy protection data sharing solution based on asynchronous distributed GAN is proposed. Combining differential privacy theory with distributed GAN, a central generation model is trained by using the local data sets of each data owner in a personalized privacy protection way. The proposed distributed GAN training framework can realize data sharing by using the local data set of each data owner to cooperatively train the center generation model without transmitting the original data, and then using the center generation model to rebuild a data set for the downstream tasks. Based on the differential privacy theory, a gradient "desensitization" strategy is proposed, which preserves the availability of the gradient to the greatest extent under the premise of protecting user privacy and optimizes the model under the guarantee of differential privacy. Multi-level privacy protection contracts are designed for data owners with different privacy preferences, and a differential privacy level selection strategy is proposed, which can balance data availability and user privacy protection needs and complete the model training with minimum privacy consumption.
[00119] As shown in FIG. 2, an embodiment of the present application provides a privacy protection data sharing system based on a distributed GAN, which is applied to implement a privacy protection data sharing method based on a distributed GAN. The system includes a center server and a plurality of data owners, wherein:
[00120] the center server is configured for providing a plurality of personalized contracts, and designing a privacy protection level selection strategy;
[00121] the plurality of data owners are configured for selecting a personalized contract from the plurality of personalized contracts; using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model; optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[00122] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[00123] Optionally, the plurality of data owners are further configured for the following:
[00124] S31, each data owner obtaining an original GAN model from the center server;
[00125] S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[00126] Optionally, the local GAN model includes a local generator and a local discriminator.
[00127] The plurality of data owners are further configured for:
[00128] each data owner hiding the pre-trained local generator.
[00129] The plurality of data owners are further configured for the following:
[00130] S51, determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy;
[00131] S52, obtaining, by the center server, a plurality of data owners whose privacy protection level is ε among the plurality of data owners according to the privacy protection level ε and the personalized contract selected by each data owner;
[00132] S53, randomly selecting, by the center server, one data owner from the plurality of data owners whose privacy protection level is ε as the data owner assisting in training;
[00133] S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
[00134] Optionally, the center server is further configured for:
[00135] S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model;
[00136] S512, determining, by the center server, the noise scale according to the attenuation function;
[00137] S513, determining, by the center server, the privacy protection level ε of the data owner who assists this round of training according to the noise scale.
[00138] Optionally, the plurality of data owners are further configured for the following:
[00139] S541, the data owner assisting in training obtaining data generated by the center generator model from the center server;
[00140] S542, the data owner assisting in training updating the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training;
[00141] S543, the data owner assisting in training calculating a gradient according to the updated local discriminator;
[00142] S544, the data owner assisting in training perturbing the gradient based on a personalized differential privacy theory to obtain a perturbed gradient;
[00143] S545, optimizing, by the center server, the center generator model of the center server according to the perturbed gradient.
[00144] Optionally, the plurality of data owners are further configured for the following:
[00145] perturbing the gradient based on the Gaussian mechanism and a perturbance degree; wherein, the degree of perturbance is determined by the privacy protection level of a personalized contract.
[00146] In the embodiment of the present application, aiming at the privacy problem in data sharing of the Internet of Things, a privacy protection data sharing solution based on asynchronous distributed GAN is proposed. Combining differential privacy theory with distributed GAN, a central generation model is trained by using the local data sets of each data owner in a personalized privacy protection way. The proposed distributed GAN training framework can realize data sharing by using the local data set of each data owner to cooperatively train the center generation model without transmitting the original data, and then using the center generation model to rebuild a data set for the downstream tasks. Based on the differential privacy theory, a gradient "desensitization" strategy is proposed, which preserves the availability of the gradient to the greatest extent under the premise of protecting user privacy and optimizes the model under the guarantee of differential privacy. Multi-level privacy protection contracts are designed for data owners with different privacy preferences, and a differential privacy level selection strategy is proposed, which can balance data availability and user privacy protection needs and complete the model training with minimum privacy consumption.
[00147] Those skilled in the art can understand that all or part of the steps to realize the above-mentioned embodiment can be completed by hardware, or related hardware can be instructed to complete by a program, which can be stored in a computer-readable storage medium, and the above-mentioned storage medium can be read-only memory, magnetic disk or optical disk, etc.
[00148] The above is only the preferred embodiment of the present application, and it is not used to limit the present application. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

CLAIMS
What is claimed is:
1. A privacy protection data sharing method based on a distributed GAN, characterized in that the method is implemented by a privacy protection data sharing system based on a distributed GAN, and the system comprises a center server and a plurality of data owners; the method comprises the following steps: S1, providing a plurality of personalized contracts by the center server; S2, each data owner of the plurality of data owners selecting a personalized contract from the plurality of personalized contracts; S3, each data owner using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model; S4, designing a privacy protection level selection strategy by the center server; S5, the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
2. The method according to claim 1, wherein the plurality of personalized contracts in S1 comprise a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
3. The method according to claim 1, wherein the step of each data owner using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model in S3 comprises: S31, each data owner obtaining an original GAN model from the center server; S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
4. The method according to claim 3, wherein the local GAN model comprises a local generator and a local discriminator; after obtaining the pre-trained local GAN model in S32, the method further comprises: each data owner hiding the pre-trained local generator.
5. The method according to claim 1, wherein the step of the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing in step S5 comprises: S51, determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy; S52, obtaining, by the center server, a plurality of data owners whose privacy protection level is ε among the plurality of data owners according to the privacy protection level ε and the personalized contract selected by each data owner; S53, randomly selecting, by the center server, one data owner from the plurality of data owners whose privacy protection level is ε as the data owner assisting in training; S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
6. The method according to claim 5, wherein the step of determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy in S51 comprises: S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model; S512, determining, by the center server, the noise scale according to the attenuation function; S513, determining, by the center server, the privacy protection level ε of the data owner who assists this round of training according to the noise scale.
7. The method according to claim 5, wherein the step of the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training in S54 comprises: S541, the data owner assisting in training obtaining data generated by the center generator model from the center server; S542, the data owner assisting in training updating the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training; S543, the data owner assisting in training calculating a gradient according to the updated local discriminator; S544, the data owner assisting in training perturbing the gradient based on a personalized differential privacy theory to obtain a perturbed gradient; S545, optimizing, by the center server, the center generator model of the center server according to the perturbed gradient.
8. The method according to claim 7, wherein the step of perturbing the gradient based on a personalized differential privacy theory in S544 comprises: perturbing the gradient based on the Gaussian mechanism and a perturbance degree; wherein the perturbance degree is determined by the privacy protection level of the personalized contract.
9. A privacy protection data sharing system based on a distributed GAN, characterized in that the system is used to implement a privacy protection data sharing method based on a distributed GAN, and the system comprises a center server and a plurality of data owners, wherein, the center server is configured for providing a plurality of personalized contracts, and designing a privacy protection level selection strategy; the plurality of data owners are configured for selecting a personalized contract from the plurality of personalized contracts; using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model; optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
10. The system according to claim 9, wherein the plurality of data owners are further configured for the following: S51, determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy; S52, obtaining, by the center server, a plurality of data owners whose privacy protection level is ε among the plurality of data owners according to the privacy protection level ε and the personalized contract selected by each data owner; S53, randomly selecting, by the center server, one data owner from the plurality of data owners whose privacy protection level is ε as the data owner assisting in training; S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
LU504296A 2022-08-28 2023-03-24 Privacy protection data sharing method and system based on distributed gan LU504296B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211036310.XA CN115442099B (en) 2022-08-28 2022-08-28 Distributed GAN-based privacy protection data sharing method and system

Publications (1)

Publication Number Publication Date
LU504296B1 true LU504296B1 (en) 2024-03-08

Family

ID=84244624

Family Applications (1)

Application Number Title Priority Date Filing Date
LU504296A LU504296B1 (en) 2022-08-28 2023-03-24 Privacy protection data sharing method and system based on distributed gan

Country Status (3)

Country Link
CN (1) CN115442099B (en)
LU (1) LU504296B1 (en)
WO (1) WO2024045581A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115442099B (en) * 2022-08-28 2023-06-06 北方工业大学 Distributed GAN-based privacy protection data sharing method and system
CN117278305B (en) * 2023-10-13 2024-06-11 深圳市互联时空科技有限公司 Data sharing-oriented distributed GAN attack and defense method and system
CN117852627B (en) * 2024-03-05 2024-06-25 湘江实验室 Pre-training model fine tuning method and system
CN118194357B (en) * 2024-05-16 2024-08-09 暨南大学 Private data publishing method based on diffusion denoising model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10395180B2 (en) * 2015-03-24 2019-08-27 International Business Machines Corporation Privacy and modeling preserved data sharing
CN107704930B (en) * 2017-09-25 2021-02-26 创新先进技术有限公司 Modeling method, device and system based on shared data and electronic equipment
CN110348241B (en) * 2019-07-12 2021-08-03 之江实验室 Multi-center collaborative prognosis prediction system under data sharing strategy
CN113255004B (en) * 2021-06-16 2024-06-14 大连理工大学 Safe and efficient federal learning content caching method
CN114841364B (en) * 2022-04-14 2024-06-14 北京理工大学 Federal learning method for meeting personalized local differential privacy requirements
CN115442099B (en) * 2022-08-28 2023-06-06 北方工业大学 Distributed GAN-based privacy protection data sharing method and system

Also Published As

Publication number Publication date
CN115442099B (en) 2023-06-06
CN115442099A (en) 2022-12-06
WO2024045581A1 (en) 2024-03-07

Similar Documents

Publication Publication Date Title
LU504296B1 (en) Privacy protection data sharing method and system based on distributed gan
Valdez et al. Modular neural networks architecture optimization with a new nature inspired method using a fuzzy combination of particle swarm optimization and genetic algorithms
Castillo et al. Simulation of common pool resource field experiments: a behavioral model of collective action
CN113128701A (en) Sample sparsity-oriented federal learning method and system
Li et al. Contract-theoretic pricing for security deposits in sharded blockchain with Internet of Things (IoT)
Chen et al. Prediction of cloud resources demand based on hierarchical pythagorean fuzzy deep neural network
CN112668877B (en) Method and system for distributing object resource information by combining federal learning and reinforcement learning
CN116471072A (en) Federal service quality prediction method based on neighbor collaboration
CN111681154A (en) Color image steganography distortion function design method based on generation countermeasure network
Singh Chaotic slime mould algorithm for economic load dispatch problems
CN107612878A (en) Dynamic window system of selection and wireless network trust management system based on game theory
CN114997420B (en) Federal learning system and method based on segmentation learning and differential privacy fusion
CN115033780A (en) Privacy protection cross-domain recommendation system based on federal learning
Yang et al. Federated continual learning via knowledge fusion: A survey
CN114282692A (en) Model training method and system for longitudinal federal learning
CN116187469A (en) Client member reasoning attack method based on federal distillation learning framework
CN116227628A (en) Federal learning method and system based on unchanged risk minimization game mechanism
CN114330464A (en) Multi-terminal collaborative training algorithm and system fusing meta learning
CN112101555A (en) Method and device for multi-party combined training model
CN117113274A (en) Heterogeneous network data-free fusion method and system based on federal distillation
CN117494183A (en) Knowledge distillation-based privacy data generation method and system for generating countermeasure network model
CN114723012B (en) Calculation method and device based on distributed training system
Atlam et al. ANFIS for risk estimation in risk-based access control model for smart homes
CN116011540A (en) Federal learning method and device based on social grouping
CN116415064A (en) Training method and device for double-target-domain recommendation model

Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20240308