LU504296B1 - Privacy protection data sharing method and system based on distributed gan - Google Patents


Info

Publication number
LU504296B1
Authority
LU
Luxembourg
Prior art keywords
data
privacy protection
training
data owner
center server
Prior art date
Application number
LU504296A
Other languages
German (de)
Inventor
Aiyan Wu
Chao Wang
Ke Xiao
Shuo Wang
Yunhua He
Xiaoqing Xue
Original Assignee
Univ North China Technology
Priority date
Filing date
Publication date
Application filed by Univ North China Technology filed Critical Univ North China Technology
Application granted granted Critical
Publication of LU504296B1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L 63/105 Multiple levels of security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L 63/0407 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L 63/104 Grouping of entities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The present application discloses a privacy protection data sharing method and system based on a distributed GAN, and relates to the technical field of data sharing and privacy protection. The method comprises: providing a plurality of personalized contracts by a center server; each data owner among a plurality of data owners selecting a personalized contract according to the data owner's own privacy protection requirements; each data owner using the data owner's own private data set to pre-train a local GAN model; designing a privacy protection level selection strategy by the center server; and the data owners assisting in training and optimizing a center generator model of the center server to complete privacy protection data sharing. The present application can realize data sharing by using the local data sets of the data owners to cooperatively train the center generator model without transmitting the original data, realize model training under the guarantee of differential privacy, and design different privacy protection contracts for data owners with different privacy preferences.

Description

PRIVACY PROTECTION DATA SHARING METHOD AND SYSTEM BASED ON
DISTRIBUTED GAN
TECHNICAL FIELD
[0001] The present application relates to the technical field of data sharing and privacy protection, in particular to a privacy protection data sharing method and system based on a distributed GAN.
BACKGROUND
[0002] Nowadays, the number of sensor devices shows an explosive growth trend, accompanied by "massive" data generated by Internet of Things terminals. These high-quality data have given machine learning great influence in many fields such as image recognition, autonomous driving and product recommendation. Highly available data has become the main driving force for the development of machine learning. However, there is still not enough training data for machine learning tasks, which is mainly due to the public's concern about data leakage and an enhanced awareness of privacy protection. Specifically, shared data may contain users' private information, and data owners are reluctant to share data with others because of privacy leakage. In addition, there are cases where confidential data cannot be transmitted and can only be saved locally by the owner. Therefore, protecting the privacy of data owners while encouraging them to share data is becoming one of the key bottlenecks for the further development of machine learning.
[0003] In view of the privacy problem in data sharing, researchers from all walks of life have put forward a series of solutions. Some researchers use technologies based on ABE (Attribute-Based Encryption), SMC (Secure Multi-Party Computation) and blockchain to realize privacy protection by hiding user identities in data sharing or designing fine-grained access control mechanisms, for example, [Pu Y, Hu C, Deng S, et al. R²PEDS: a recoverable and revocable privacy-preserving edge data sharing solution[J]. IEEE Internet of Things Journal, 2020, 7(9): 8077-8089.], [Zheng X, Cai Z. Privacy-preserved data sharing towards multiple parties in industrial IoTs[J]. IEEE Journal on Selected Areas in Communications, 2020, 38(5): 968-979.], [Xu X, Liu Q, Zhang X, et al. A blockchain-powered crowdsourcing method with privacy preservation in mobile environment[J]. IEEE Transactions on Computational Social Systems, 2019, 6(6): 1407-1419.]. However, this kind of solution focuses on the implementation of authentication and access control mechanisms, which requires not only the transmission of raw data but also a lot of extra computation. The rise of federated learning provides a new solution, which can realize model training without transmitting original data. However, when the training task changes or the machine learning model is updated, the private data sets need to be accessed repeatedly, which increases the risk of privacy leakage.
[0004] The existing artificial-intelligence-based solutions to privacy protection in Internet of Things data sharing can be roughly divided into two categories: one is data sharing based on federated learning, and the other is data sharing based on generative adversarial networks. Neither category needs to upload users' original data, which protects users' privacy to a certain extent, but there are still some limitations. Their shortcomings are introduced and summarized respectively below.
[0005] The rise of federated learning has broken the limitation that artificial intelligence technology needs centralized data collection and processing. Therefore, federated learning can be used in a wide range of IoT (Internet of Things) services, providing a new solution for data sharing with privacy protection. For example, in the IoV (Internet of Vehicles), data sharing between vehicles can improve service quality. In order to reduce the transmission load and solve the privacy problem in data sharing, the authors of [Lu Y, Huang X, Zhang K, et al. Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles[J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4298-4311.] propose a new architecture based on federated learning. They developed a hybrid blockchain architecture consisting of a blockchain and a local DAG (Directed Acyclic Graph) to improve the security and reliability of model parameters. The paper [Yin L, Feng J, Xun H, et al. A privacy-preserving federated learning for multiparty data sharing in social IoTs[J]. IEEE Transactions on Network Science and Engineering, 2021, 8(3): 2706-2718.] also uses federated learning to share data, but the authors propose a new hybrid privacy protection method to overcome the disclosure of data and content in federated learning. They use an advanced functional encryption algorithm and local Bayesian differential privacy to protect the characteristics of the uploaded data and the weight of each participant in the weighted summation process.
[0006] Since a GAN (Generative Adversarial Network) is suitable for all kinds of data, many researchers jointly train a GAN instead of directly transmitting data to realize data sharing with privacy protection. In CPSSs (Cyber-Physical-Social Systems), human interaction from cyberspace to the physical world is realized through the sharing of spatio-temporal data. In order to balance privacy protection and data utility, the authors use a modified GAN model to run two games (between the generator, the discriminator and a differentially private identifier) at the same time [Qu Y, Yu S, Zhou W, et al. Gan-driven personalized spatial-temporal private data sharing in cyber-physical social systems[J]. IEEE Transactions on Network Science and Engineering, 2020, 7(4): 2576-2586.]. In the paper [Chang Q, Qu H, Zhang Y, et al. Synthetic learning: Learn from distributed asynchronized discriminator gan without sharing medical image data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 13856-13866.], the authors propose a distributed GAN framework with high privacy protection and high communication efficiency, called the distributed asynchronized discriminator GAN (AsynDGAN). It learns from the distributed discriminators, trains a center generator, and then trains the segmentation model only on the generated synthetic images.
[0007] These two kinds of methods still have some limitations, which are as follows: 1) solutions using federated learning can train task models without uploading data; however, they still carry a great risk of privacy leakage, because when the task changes or the machine learning architecture is updated, the private data sets need to be revisited many times; 2) the existing solutions based on a GAN cannot balance the relationship between privacy protection and data availability, and cannot meet the personalized privacy protection needs of data owners.
SUMMARY
[0008] The present application aims at the problem of how to protect the privacy of data owners and encourage them to share data.
[0009] In order to solve the above technical problems, the present application provides the following technical solution:
[0010] In one aspect, the present application provides a privacy protection data sharing method based on a distributed GAN, characterized in that the method is implemented by a privacy protection data sharing system based on a distributed GAN, and the system includes a center server and a plurality of data owners;
[0011] the method includes the following steps:
[0012] S1, providing a plurality of personalized contracts by the center server;
[0013] S2, each data owner of the plurality of data owners selecting a personalized contract from the plurality of personalized contracts;
[0014] S3, each data owner using a local private data set of the data owner to pre-train a local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model;
[0015] S4, designing a privacy protection level selection strategy by the center server;
[0016] S5, the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[0017] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[0018] Optionally, the step of each data owner using a local private data set of the data owner to pre-train a local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model in S3 includes:
[0019] S31, each data owner obtaining an original GAN model from the center server;
[0020] S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[0021] Optionally, the local GAN model includes a local generator and a local discriminator;
[0022] after obtaining the pre-trained local GAN model in S32, the method further includes:
[0023] each data owner hiding the pre-trained local generator.
[0024] Optionally, the step of the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing in step S5 includes:
[0025] S51, determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy;
[0026] S52, obtaining, by the center server, from among the plurality of data owners, the data owners whose privacy protection level is k, according to the privacy protection level k and the personalized contract selected by each data owner;
[0027] S53, randomly selecting, by the center server, one data owner from the data owners whose privacy protection level is k as the data owner assisting in training;
[0028] S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
[0029] Optionally, the step of determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy in S51 includes:
[0030] S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model;
[0031] S512, determining, by the center server, the noise scale according to the attenuation function;
[0032] S513, determining, by the center server, the privacy protection level k of the data owner who assists this round of training according to the noise scale.
[0033] Optionally, the step of the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training in S54 includes:
[0034] S541, the data owner assisting in training obtaining data generated by the center generator model from the center server;
[0035] S542, the data owner assisting in training updating the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training;
[0036] S543, the data owner assisting in training calculating a gradient according to the updated local discriminator;
[0037] S544, the data owner assisting in training perturbing the gradient based on a personalized differential privacy theory to obtain a perturbed gradient;
[0038] S545, optimizing, by the center server, the center generator model of the center server according to the perturbed gradient.
[0039] Optionally, the step of perturbing the gradient based on a personalized differential privacy theory in S544 includes:
[0040] perturbing the gradient based on a Gaussian mechanism and a perturbance degree, wherein the perturbance degree is determined by the privacy protection level of the personalized contract.
[0041] In another aspect, the present application provides a privacy protection data sharing system based on a distributed GAN, which is applied to implement a privacy protection data sharing method based on a distributed GAN. The system includes a center server and a plurality of data owners, wherein:
[0042] the center server is configured for providing a plurality of personalized contracts, and designing a privacy protection level selection strategy;
[0043] the plurality of data owners are configured for selecting a personalized contract from the plurality of personalized contracts; using a local private data set of the data owner to pre-train a local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model; and optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[0044] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[0045] Optionally, the plurality of data owners are further configured for the following:
[0046] S31, each data owner obtaining an original GAN model from the center server;
[0047] S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[0048] Optionally, the plurality of data owners are further configured for the following:
[0049] S51, determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy;
[0050] S52, obtaining, by the center server, from among the plurality of data owners, the data owners whose privacy protection level is k, according to the privacy protection level k and the personalized contract selected by each data owner;
[0051] S53, randomly selecting, by the center server, one data owner from the data owners whose privacy protection level is k as the data owner assisting in training;
[0052] S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
[0053] The technical solution provided by the embodiments of the present application has at least the following beneficial effects.
[0054] In the above solution, a privacy protection data sharing solution based on an asynchronous distributed GAN is proposed to solve the privacy problem in data sharing of the Internet of Things. Combining differential privacy theory with a distributed GAN, a center generator model is trained using the local data sets of the data owners in a personalized privacy protection way. The proposed distributed GAN training framework can realize data sharing by using the local data sets of the data owners to cooperatively train the center generator model without transmitting the original data, and then use the center generator model to rebuild a data set for downstream tasks. Based on differential privacy theory, a gradient desensitization strategy is proposed, which preserves the availability of the gradient to the greatest extent under the premise of protecting user privacy and optimizes the model under the guarantee of differential privacy. Multi-level privacy protection contracts are designed for data owners with different privacy preferences, and a differential privacy level selection strategy is proposed, which can balance data availability and user privacy protection needs and complete the model training with minimum privacy consumption.
BRIEF DESCRIPTION OF DRAWINGS
[0055] In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
[0056] FIG. 1 is a flowchart of a privacy protection data sharing method based on a distributed GAN provided by an embodiment of the present application;
[0057] FIG. 2 is a block diagram of a privacy protection data sharing system based on a distributed GAN provided by an embodiment of the present application.
DESCRIPTION OF EMBODIMENTS
[0058] In order to make the technical problems to be solved, the technical solutions and the advantages of the present application clearer, the present application will be described in detail below with reference to the attached drawings and specific embodiments.
[0059] As shown in FIG. 1, an embodiment of the present application provides a privacy protection data sharing method based on a distributed GAN, which can be realized by a privacy protection data sharing system based on a distributed GAN. As shown in the flowchart of the privacy protection data sharing method based on a distributed GAN in FIG. 1, the processing flow of the method may include the following steps:
[0060] In S1, a center server provides a plurality of personalized contracts.
[0061] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[0062] In a feasible implementation, at the beginning of data sharing, the center server designs a series of personalized contracts with different privacy protection levels and rewards to meet the privacy protection needs of data owners with different privacy preferences. Among them, the higher the privacy protection level, the smaller the reward, and each data owner can choose the corresponding contract to maximize his own profit. Then, the server publishes the data requirements and the contracts to the data owners registered in the system (i.e., U).
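The contract menu described above can be sketched as follows. The level numbering, the reward schedule, and the owner's selection rule are illustrative assumptions; the text only specifies that higher privacy levels pay smaller rewards and that each owner chooses the contract maximizing his own profit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Contract:
    level: int      # privacy protection level k (higher = stronger privacy)
    reward: float   # payment offered by the center server

def build_contract_menu(num_levels: int, base_reward: float) -> list:
    """Higher privacy level -> smaller reward, as stated in the text.
    The reward schedule base_reward / k is an assumption for illustration."""
    return [Contract(level=k, reward=base_reward / k)
            for k in range(1, num_levels + 1)]

def choose_contract(menu: list, min_level: int) -> Contract:
    """An owner with privacy requirement `min_level` takes the
    highest-paying contract that still satisfies that requirement."""
    feasible = [c for c in menu if c.level >= min_level]
    return max(feasible, key=lambda c: c.reward)

menu = build_contract_menu(num_levels=4, base_reward=8.0)
picked = choose_contract(menu, min_level=2)  # owner requires at least level 2
```

With this schedule the owner takes the level-2 contract, since it is the cheapest privacy level that still meets his requirement and therefore pays the most.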
[0063] In this embodiment, the center server has powerful computing power and communication bandwidth. Its purpose is to recruit enough data owners to train a center generator cooperatively until it has strong data generation ability. The embodiment of the present application assumes that the center server will not violate the defined protocol, but may try to infer the user's privacy.
[0064] In S2, each data owner among a plurality of data owners selects a personalized contract from a plurality of personalized contracts.
[0065] In a feasible implementation, the data owner set U consists of |U| data owners, and each data owner u ∈ U owns a private data set D_u including N_u data samples. These data owners have certain computing and communication capabilities and want to use their private data sets to participate in training tasks in exchange for some rewards, but they want to protect their privacy from inference attacks by the center server. In addition, different users have different privacy preferences (that is, different sensitivities to privacy exposure), so personalized privacy protection is needed.
[0066] In S3, each data owner uses the local private data set of the data owner to pre-train the local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model.
[0067] Optionally, the step of each data owner in S3 using the local private data set of the data owner to pre-train the local generative adversarial network (GAN) model of the data owner to obtain a pre-trained local GAN model includes the following steps.
[0068] In S31, each data owner obtains an original GAN model from the center server.
[0069] In a feasible implementation, each data owner u ∈ U who meets the requirements signs a specific contract with the server according to his privacy protection needs and downloads the original GAN model.
[0070] In S32, each data owner uses the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[0071] Optionally, the local GAN model includes a local generator and a local discriminator.
[0072] After obtaining the pre-trained local GAN model in S32, the method further includes:
[0073] each data owner hiding the pre-trained local generator.
[0074] In a feasible implementation, the embodiment of the present application proposes an asynchronous distributed GAN training framework with privacy protection, which uses the local data sets of the data owners to cooperatively train the center generator model.
[0075] Further, all the data owners involved in the training use their private data sets to pre-train the GAN model locally. After the pre-training, the generator that can generate simulated data is hidden, and the local discriminator is used to assist the server in training the center generator.
[0076] Further, the pre-training process includes: firstly, preprocessing the private data set according to the data requirements, and then training the local GAN model. The pre-training process is described in detail in the following algorithm 1:
Algorithm 1 Pre-training
Super parameters: number of iterations T, learning rates η_g and η_d, number of penalty iterations T_c, batch size B
1: Preprocess the local data set D_u according to the data requirements;
2: Download the original model as a local generator G_u and a discriminator C_u;
3: for step in {1, ..., T} do
4:   for t in {1, ..., T_c} do
5:     select a sample batch {x_i}_{i=1}^B from D_u;
6:     select a sample batch {z_i}_{i=1}^B, where z_i ~ p_z;
7:     θ_d ← θ_d − η_d · (1/B) Σ_{i=1}^B ∇_{θ_d} L_D(x_i, G_u(z_i));
8:   end for
9:   θ_g ← θ_g − η_g · (1/B) Σ_{i=1}^B ∇_{θ_g} L_G(G_u(z_i));
10: end for
11: The generator G_u is hidden and the discriminator C_u is used to assist in training;
[0077] After the pre-training, each data owner has a trained generator and discriminator locally. The generator, which has learned the local data distribution, is hidden, and the discriminator is stored locally to assist in training the center generator. The purpose of the assisted training is to train the center generator using the local discriminator and the private data set of the data owner u.
[0078] In S4, the center server designs a privacy protection level selection strategy.
[0079] In a feasible embodiment, in order to optimize the center generator with the minimum privacy cost, the embodiment of the present application designs a privacy protection level selection strategy to select the corresponding data owner to assist in training in each round.
[0080] In S5, a plurality of data owners optimize the center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[0081] Optionally, the step of a plurality of data owners optimizing the center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing in S5 includes:
[0082] S51: determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy.
[0083] Optionally, the step of determining, by the center server, a privacy protection level k of the data owner who assists this round of training according to the privacy protection level selection strategy in S51 includes:
[0084] S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model;
[0085] S512, determining, by the center server, the noise scale according to the attenuation function;
[0086] S513, determining, by the center server, the privacy protection level k of the data owner who assists this round of training according to the noise scale.
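Steps S511-S513 can be sketched as below. The exponential form of the attenuation function and the fixed mapping from privacy level to noise scale are assumptions made for illustration; the text only requires that the noise scale be a decaying function of the iteration count and that it determine the level.

```python
import math

# Assumed mapping from privacy protection level k to the noise scale sigma
# of its Gaussian mechanism (larger k = stronger privacy = more noise).
LEVEL_SIGMA = {1: 0.5, 2: 1.0, 3: 2.0, 4: 4.0}

def noise_scale(t: int, sigma0: float = 4.0, decay: float = 0.01) -> float:
    """S511/S512: attenuation function of the noise scale over iterations t.
    Starts noisy (cheap privacy budget early in training) and decays as the
    generator converges and gradients must become more accurate."""
    return sigma0 * math.exp(-decay * t)

def select_level(t: int) -> int:
    """S513: pick the privacy level whose noise scale is closest to the
    target scale for this round of training."""
    target = noise_scale(t)
    return min(LEVEL_SIGMA, key=lambda k: abs(LEVEL_SIGMA[k] - target))
```

Early rounds thus draw owners from the highest privacy level, and later rounds from lower levels, which matches the stated goal of completing training with minimum privacy consumption.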
[0087] In a feasible implementation, the center server designs a privacy protection level selection strategy to determine the privacy protection level k of the data owner who assists in this round of training, and then randomly selects one data owner from those who signed the contract with privacy protection level k and uses his local discriminator for this round of training.
[0088] In S54, the data owner assisting in the training optimizes the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in the training, and after the optimization, proceeds to S51 for iterative training until the number of iterations reaches a preset threshold, and the center generator model training is completed.
[0089] Optionally, the step of the data owner assisting in the training optimizes the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in the training includes the following steps.
[0090] In S541, the data owner assisting in training obtains the data generated by the center generator model from the center server;
[0091] in a feasible implementation, the selected data owner u receives the data generated by the center generator;
[0092] in S542, the data owner assisting in training updates the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training;
[0093] in S543, the data owner assisting in training calculates a gradient according to the updated local discriminator;
[0094] in S544, the data owner assisting in training perturbs the gradient based on a personalized differential privacy theory to obtain a perturbed gradient;
[0095] in a feasible implementation, the data owner u perturbs the calculated gradient based on the differential privacy theory, wherein the degree of perturbation is determined by the privacy protection level specified in the signed contract. Then, it sends the perturbed gradient to the center server for optimization of the generator;
[0096] optionally, the step of perturbing the gradient based on personalized differential privacy theory in S544 includes:
[0097] perturbing the gradient based on the Gaussian mechanism and a perturbance degree.
[0098] The perturbance degree is determined by the privacy protection level of the personalized contract.
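A minimal sketch of such a perturbance step is given below, assuming an illustrative mapping from contract levels to Gaussian noise multipliers and an assumed clipping bound; none of these constants come from the present application.

```python
import numpy as np

# Assumed mapping from contract privacy levels to Gaussian noise multipliers;
# the concrete numbers are illustrative, not taken from the present application.
LEVEL_TO_SIGMA = {"high": 2.0, "medium": 1.0, "low": 0.5}

def gradient_perturbance(grad, level, clip_norm=1.0, rng=None):
    """Clip the gradient to an L2 ball and add Gaussian noise whose scale
    is fixed by the privacy level in the signed contract."""
    rng = rng or np.random.default_rng(0)
    g = np.asarray(grad, dtype=float)
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # clip: ||g||_2 <= clip_norm
    sigma = LEVEL_TO_SIGMA[level]
    return g + rng.normal(0.0, sigma * clip_norm, size=g.shape)

g = np.array([3.0, 4.0])          # ||g||_2 = 5, so it is clipped to norm 1
noisy = gradient_perturbance(g, "low")
```

A stricter contract ("high") means a larger noise multiplier and therefore stronger protection of the owner's private data at the cost of gradient utility.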
[0099] In S545, the center server optimizes the center generator model of the center server according to the perturbed gradient.
[00100] In a feasible implementation, the center server updates the center generator model according to the perturbed gradient of the selected data owner. Then the center server re-selects the privacy protection level and data owner for the next round of auxiliary training until the training of the center generator is completed.
[00101] The embodiment of the present application proposes a personalized privacy protection strategy, which realizes differential privacy protection by perturbing the gradient calculated locally by data owners, and the privacy protection level is specified by the contract signed by each data owner.
[00102] Further, there is no discriminator on the center server, and its optimization depends entirely on the discriminator on the data owner's side. And in order to maximize the model performance with the minimum privacy cost, the embodiment of the present application proposes a privacy protection level selection strategy to select different privacy protection levels in different training stages and complete the training with the minimum privacy loss. In each iteration, the server selects a data owner according to the policy and uses its local discriminator to optimize the center generator. The optimization process of the center generator is described in the following algorithm 2:
Algorithm 2 Center Generator Training
Super parameters: number of iterations T, learning rates η_G and η_D, number of penalty iterations I, batch size m
1: initialize the center generator θ_G;
2: for step in {1, ..., T}:
3: select the data owner u according to the privacy protection level selection strategy;
4: for i in {1, ..., I}:
5: select a sample batch {z_j}_{j=1}^m, where z_j ∼ p_z(z);
6: send the generated data G(z_j; θ_G) to the data owner u;
7: g̃ ← assist in training(G(z_j; θ_G), u);
8: end for
9: θ_G ← θ_G − η_G · (1/m) Σ_{j=1}^m g̃_j · ∂G(z_j; θ_G)/∂θ_G;
10: end for
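The loop of algorithm 2 can be illustrated with a deliberately small numeric sketch: a one-parameter linear generator, a stubbed owner-side step, and an assumed exponential noise attenuation. All constants and the toy loss are illustrative, not part of the patented scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def assist_in_training(generated, noise_sigma):
    """Stand-in for a data owner's step (algorithm 3): gradient of a toy
    loss 0.5*(G - x)^2 toward the owner's private data mean (assumed 3.0),
    perturbed with the noise scale fixed for this round."""
    grad = np.mean(generated - 3.0)
    return grad + rng.normal(0.0, noise_sigma)

# Center generator: G(z) = w * z; the server never sees the private data.
w, eta, T = 0.0, 0.5, 200
for step in range(T):
    sigma = 2.0 * np.exp(-0.05 * step)    # assumed attenuation schedule
    z = rng.normal(1.0, 0.1, size=32)     # latent batch
    g_tilde = assist_in_training(w * z, sigma)
    w -= eta * g_tilde * np.mean(z)       # apply the Jacobian dG/dw = z server-side
```

Despite the early large-noise rounds, the generator parameter drifts toward 3.0, the (hidden) mean of the owners' data, as the noise decays.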
[00103] The assisted training process of each data owner (line 7) is shown in the following algorithm 3. In the assisted training phase, the data owner uses its local discriminator and private data set to optimize the center generator. As detailed below, the selected data owner will first receive the generated data from the center server and update the discriminator with the generated data and the local data set. Then, the gradient is calculated by the local discriminator, and the gradient is perturbed by personalized differential privacy before it is transmitted back. The degree of perturbance is determined by the privacy protection level in the signed contract.
Algorithm 3 Assisted Training
Super parameters: learning rate η_D, batch size m, privacy protection level ε_u
1: acquire the private data set D_u and the local discriminator θ_D;
2: receive the generated data G(z; θ_G) from the center server;
3: select a sample batch {x_j}_{j=1}^m, where x_j ∼ D_u;
4: θ_D ← θ_D − η_D · ∇_{θ_D} L_D(θ_D), where L_D(θ_D) = (1/m) Σ_j [D(G(z_j); θ_D) − D(x_j; θ_D)];
5: calculate the gradient g = ∇_{G(z)} L_G(θ_G), where L_G(θ_G) = −(1/m) Σ_j D(G(z_j); θ_D);
6: g̃ ← gradient perturbance(g, ε_u);
7: send g̃ to the server;
8: accumulate the privacy cost δ_u;
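The owner-side steps of algorithm 3 can likewise be sketched with a toy logistic discriminator; the discriminator form, learning rate, and noise scale are illustrative assumptions, not the concrete networks of the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assisted_training(generated, private, theta_d, eta_d=0.1, sigma=0.5):
    """One owner-side round: update the local discriminator D(x) = sigmoid(a*x + b)
    on real-vs-generated samples, then return the perturbed gradient of the
    generator loss with respect to the generated samples."""
    a, b = theta_d
    # Discriminator step: ascend log-likelihood of labels (real=1, fake=0).
    for x, y in [(private, 1.0), (generated, 0.0)]:
        p = sigmoid(a * x + b)
        a += eta_d * np.mean((y - p) * x)
        b += eta_d * np.mean(y - p)
    # Generator gradient wrt the generated samples: d/dG of -mean(log D(G)).
    p = sigmoid(a * generated + b)
    grad = -(1.0 - p) * a / len(generated)
    noisy = grad + rng.normal(0.0, sigma, size=grad.shape)  # personalized DP noise
    return noisy, (a, b)

fake = rng.normal(0.0, 1.0, size=16)
real = rng.normal(3.0, 1.0, size=16)
g_tilde, theta_d = assisted_training(fake, real, theta_d=(0.0, 0.0))
```

Only the perturbed gradient leaves the owner's machine; the discriminator parameters and the private samples never do.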
[00104] The method of personalized privacy protection can be further explained as follows:
Generally speaking, the privacy problem in machine learning is that model training needs a lot of user data, and the model absorbs multiple data features after several rounds of training iterations. Attackers can infer relevant information about the input data by using model parameters, gradients, etc. Similarly, the training of a GAN model also needs a lot of user data: the generator is trained to generate simulation data that mimics the distribution of the real data, and the discriminator needs a lot of real data as input to distinguish real data from simulation data during training. Therefore, in order to protect the privacy of each data owner, its local generator needs to be hidden, and the gradient calculated by the local discriminator needs to be perturbed in a personalized differential privacy way.
[00105] According to the differential privacy combination theorem, if each SGD (Stochastic Gradient Descent) process conforms to differential privacy [Lee J, Kifer D. Concentrated differentially private gradient descent with adaptive per-iteration privacy budget[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018: 1656-1665], the final model also conforms to differential privacy, and the gradient descent process of the center generator is as follows (1):
[00106] θ_{t+1} = θ_t − η · (g_t + 𝒩(0, σ²I)) (1)
[00107] where θ is the machine learning model, B is the batch data, g_t = ∇_θ L(θ_t; B) is the calculated random gradient, 𝒩(0, σ²I) is the Gaussian noise, and g̃_t = g_t + 𝒩(0, σ²I) is the perturbed random gradient.
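A one-step numeric sketch of the DP-SGD update in formula (1), using a toy quadratic loss and an assumed clipping bound; the noiseless call at the end is only a sanity check of the clipping path.

```python
import numpy as np

def dpsgd_step(theta, grad_fn, batch, eta=0.1, sigma=1.0, clip=1.0, rng=None):
    """One DP-SGD update: descend along the clipped gradient
    plus Gaussian noise N(0, (sigma*clip)^2), per formula (1)."""
    rng = rng or np.random.default_rng(0)
    g = grad_fn(theta, batch)
    g = g / max(1.0, np.linalg.norm(g) / clip)   # bound the gradient's sensitivity
    g_tilde = g + rng.normal(0.0, sigma * clip, size=g.shape)
    return theta - eta * g_tilde

# Toy quadratic loss 0.5*||theta - mean(batch)||^2 and its gradient.
grad_fn = lambda theta, batch: theta - batch.mean(axis=0)
theta = np.zeros(2)
batch = np.ones((8, 2))
theta = dpsgd_step(theta, grad_fn, batch, sigma=0.0)  # noiseless sanity step
```

With sigma=0 the raw gradient [-1, -1] has norm √2 > 1, so it is clipped to unit norm before the descent step.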
[00108] Compared with this, the perturbation mechanism of the embodiment of the present application can reduce the perturbed gradient range, thereby reducing the destruction of useful information. According to the chain rule, the range of the perturbance mechanism can be reduced:
[00109] ∇_{θ_G} L_G(θ_G) = (∂L_G(θ_G)/∂G(z; θ_G)) · (∂G(z; θ_G)/∂θ_G) (2)
[00110] As shown in Formula (2) above, the back propagation of the gradient information can be divided into two parts. The first part ∂L_G(θ_G)/∂G(z; θ_G) is calculated by the local discriminator of each data owner based on the received analog data (in which L_G(θ_G) = −D(G(z; θ_G); θ_D)). The other part ∂G(z; θ_G)/∂θ_G is the Jacobian matrix calculated by the center generator, which is independent of the training data. Therefore, the perturbance range can be reduced to the first part, and the perturbance process based on the Gaussian mechanism can be further described as the following formulas (3) and (4):
[00111] ḡ_u = g_u / max(1, ‖g_u‖₂ / C) (3)
[00112] g̃_u = ḡ_u + 𝒩(0, σ²C²I) (4)
[00113] where g_u is the gradient calculated by the data owner u using the local discriminator, and 𝒩(0, σ²C²I) is the Gaussian noise. The clip operation is performed using an ℓ₂ norm, wherein ‖ḡ_u‖₂ ≤ C is ensured by replacing the gradient g_u with g_u / max(1, ‖g_u‖₂ / C). It is worth noting that the variance of the noise directly affects the scale of the noise: when the variance σ² of the Gaussian noise is larger, the noise scale is greater and the level of privacy protection is higher. The noise variance σ² is determined by the privacy protection level in the contract signed by each user, thus realizing personalized differential privacy protection.
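The range reduction of formulas (2)-(4) can be checked numerically on a toy linear generator: the owner-side factor ∂L/∂G(z) is clipped and perturbed, while the data-independent Jacobian stays exact. The finite-difference comparison below confirms the chain-rule decomposition; all dimensions and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear generator G(z) = W @ z with loss L = 0.5 * ||G(z) - x||^2.
W = rng.normal(size=(3, 2))
z = rng.normal(size=2)
x = rng.normal(size=3)
loss = lambda Wm: 0.5 * np.sum((Wm @ z - x) ** 2)

# Owner-side factor of formula (2): dL/dG(z), the only data-dependent part.
dL_dG = W @ z - x
# Server-side factor: contracting with the Jacobian dG/dW gives outer(dL_dG, z).
grad_chain = np.outer(dL_dG, z)

# Finite-difference check that the decomposition reproduces dL/dW.
grad_fd = np.zeros_like(W)
eps = 1e-6
for i in range(3):
    for j in range(2):
        E = np.zeros_like(W)
        E[i, j] = eps
        grad_fd[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

# Formulas (3)-(4): clip and perturb only the 3-dimensional owner-side factor,
# instead of the 6-dimensional full gradient, shrinking the perturbed range.
clipped = dL_dG / max(1.0, np.linalg.norm(dL_dG) / 1.0)
g_tilde = clipped + rng.normal(0.0, 0.5, size=clipped.shape)
```

Because the Jacobian never leaves the server, noise is injected into fewer coordinates, destroying less useful gradient information for the same privacy guarantee.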
[00114] Furthermore, there are still two key problems in DP-SGD (differentially private stochastic gradient descent). On the one hand, the training of a GAN model often requires a large number of iterations, which leads to a large privacy loss. On the other hand, each data owner needs a different level of privacy protection, which means that the noise scale in each DP-SGD step is different, which also directly affects the privacy loss and the performance of the final model. Therefore, the embodiment of the present application designs the privacy protection level selection strategy, and selects the data owner with a specific privacy protection level in each round of training, thus reducing the privacy cost while completing the model training. Specifically, the noise selection strategy of the embodiment of the present application follows the idea that as the generating capacity of the center generator is enhanced, a smaller perturbance noise scale is expected in the gradient, thus further optimizing the model. The strategy for selecting the noise scale in the embodiment of the present application is to monitor the performance of the center generator and gradually select the data owner with a smaller noise scale. However, during training, the local discriminator of each data owner cannot be directly accessed, and only the perturbed gradient can be obtained, so it is difficult to use the local discriminator of each data owner to evaluate the performance of the center generator. Instead, the embodiment of the present application proposes a strategy of selecting an appropriate noise scale based on training iteration rounds.
Specifically, the noise scale should be determined according to the attenuation function of the noise scale, and the corresponding data owners should be further selected to assist in the training. The attenuation function takes the training iteration t as a parameter, and the noise scale is negatively correlated with t. The attenuation function is shown in the following formula (5):
[00115] σ(t) = σ₀ · e^(−k·t) (5)
[00116] where σ₀ is the initial noise parameter, t is the number of iterations and k is the attenuation rate. After the center server determines the noise scale through the attenuation function, it chooses the contract whose privacy protection level is most similar to the noise scale, and finally chooses one of the data owners who signed that contract to assist this round of training.
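A numeric sketch of this schedule, with assumed values for the initial noise parameter σ₀ and the attenuation rate k (the contract levels are likewise illustrative):

```python
import math

def noise_scale(t, sigma0=2.0, k=0.05):
    """Attenuation function of formula (5): sigma(t) = sigma0 * exp(-k * t).
    sigma0 and k are illustrative values, not fixed by the present application."""
    return sigma0 * math.exp(-k * t)

def nearest_contract_level(t, levels, sigma0=2.0, k=0.05):
    """Choose the contracted privacy level closest to the desired noise scale."""
    return min(levels, key=lambda s: abs(s - noise_scale(t, sigma0, k)))

levels = [2.0, 1.0, 0.5]
picked = [nearest_contract_level(t, levels) for t in (0, 10, 100)]
```

As t grows the desired scale decays, so the server moves from loose contracts in early rounds toward tight ones late in training.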
[00117] Further, after completing the training of the center generator, the server redeems rewards for each data owner according to the valuation specified in the signed contract.
[00118] In the embodiment of the present application, aiming at the privacy problem in data sharing of the Internet of Things, a privacy protection data sharing solution based on asynchronous distributed GAN is proposed. Combining differential privacy theory with distributed GAN, a central generation model is trained by using the local data sets of each data owner in a personalized privacy protection way. The proposed distributed GAN training framework can realize data sharing by using the local data set of each data owner to cooperatively train the center generation model without transmitting the original data, and then using the center generation model to rebuild a data set for the downstream tasks. Based on the differential privacy theory, a gradient "desensitization" strategy is proposed, which preserves the availability of the gradient to the greatest extent under the premise of protecting user privacy and optimizes the model under the guarantee of differential privacy. Multi-level privacy protection contracts are designed for data owners with different privacy preferences, and a differential privacy level selection strategy is proposed, which can balance data availability and user privacy protection needs and complete the model training with minimum privacy consumption.
[00119] As shown in FIG. 2, an embodiment of the present application provides a privacy protection data sharing system based on a distributed GAN, which is applied to implement a privacy protection data sharing method based on a distributed GAN. The system includes a center server and a plurality of data owners, wherein:
[00120] the center server is configured for providing a plurality of personalized contracts, and designing a privacy protection level selection strategy;
[00121] the plurality of data owners are configured for selecting a personalized contract from the plurality of personalized contracts; using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model; optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
[00122] Optionally, the plurality of personalized contracts in S1 include a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
[00123] Optionally, the plurality of data owners are further configured for the following:
[00124] S31, each data owner obtaining an original GAN model from the center server;
[00125] S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
[00126] Optionally, the local GAN model includes a local generator and a local discriminator.
[00127] The plurality of data owners are further configured for:
[00128] each data owner hiding the pre-trained local generator.
[00129] The plurality of data owners are further configured for the following:
[00130] S51, determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy;
[00131] S52, obtaining, by the center server, a plurality of data owners whose privacy protection level is ε among the plurality of data owners according to the privacy protection level ε and the personalized contract selected by each data owner;
[00132] S53, randomly selecting, by the center server, one data owner from the plurality of data owners whose privacy protection level is ε as the data owner assisting in training;
[00133] S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
[00134] Optionally, the center server is further configured for:
[00135] S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model;
[00136] S512, determining, by the center server, the noise scale according to the attenuation function;
[00137] S513, determining, by the center server, the privacy protection level ε of the data owner who assists this round of training according to the noise scale.
[00138] Optionally, the plurality of data owners are further configured for the following:
[00139] S541, the data owner assisting in training obtaining data generated by the center generator model from the center server;
[00140] S542, the data owner assisting in training updating the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training;
[00141] S543, the data owner assisting in training calculating a gradient according to the updated local discriminator;
[00142] S544, the data owner assisting in training perturbing the gradient based on a personalized differential privacy theory to obtain a perturbed gradient;
[00143] S545, optimizing, by the center server, the center generator model of the center server according to the perturbed gradient.
[00144] Optionally, the plurality of data owners are further configured for the following:
[00145] perturbing the gradient based on the Gaussian mechanism and a perturbance degree; wherein, the degree of perturbance is determined by the privacy protection level of a personalized contract.
[00146] In the embodiment of the present application, aiming at the privacy problem in data sharing of the Internet of Things, a privacy protection data sharing solution based on asynchronous distributed GAN is proposed. Combining differential privacy theory with distributed GAN, a central generation model is trained by using the local data sets of each data owner in a personalized privacy protection way. The proposed distributed GAN training framework can realize data sharing by using the local data set of each data owner to cooperatively train the center generation model without transmitting the original data, and then using the center generation model to rebuild a data set for the downstream tasks. Based on the differential privacy theory, a gradient "desensitization" strategy is proposed, which preserves the availability of the gradient to the greatest extent under the premise of protecting user privacy and optimizes the model under the guarantee of differential privacy. Multi-level privacy protection contracts are designed for data owners with different privacy preferences, and a differential privacy level selection strategy is proposed, which can balance data availability and user privacy protection needs and complete the model training with minimum privacy consumption.
[00147] Those skilled in the art can understand that all or part of the steps to realize the above-mentioned embodiment can be completed by hardware, or related hardware can be instructed to complete by a program, which can be stored in a computer-readable storage medium, and the above-mentioned storage medium can be read-only memory, magnetic disk or optical disk, etc.
[00148] The above is only the preferred embodiment of the present application, and it is not used to limit the present application. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

CLAIMS
What is claimed is:
1. A privacy protection data sharing method based on a distributed GAN, characterized in that the method is implemented by a privacy protection data sharing system based on a distributed GAN, and the system comprises a center server and a plurality of data owners; the method comprises the following steps: S1, providing a plurality of personalized contracts by the center server; S2, each data owner of the plurality of data owners selecting a personalized contract from the plurality of personalized contracts; S3, each data owner using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model; S4, designing a privacy protection level selection strategy by the center server; S5, the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
2. The method according to claim 1, wherein the plurality of personalized contracts in S1 comprise a plurality of privacy protection levels and rewards corresponding to the plurality of privacy protection levels.
3. The method according to claim 1, wherein the step of each data owner using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model in S3 comprises: S31, each data owner obtaining an original GAN model from the center server; S32, each data owner using the local private data set of the data owner to pre-train the original GAN model to obtain the pre-trained local GAN model.
4. The method according to claim 3, wherein the local GAN model comprises a local generator and a local discriminator; after obtaining the pre-trained local GAN model in S32, the method further comprises: each data owner hiding the pre-trained local generator.
5. The method according to claim 1, wherein the step of the plurality of data owners optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing in step S5 comprises: S51, determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy; S52, obtaining, by the center server, a plurality of data owners whose privacy protection level is ε among the plurality of data owners according to the privacy protection level ε and the personalized contract selected by each data owner; S53, randomly selecting, by the center server, one data owner from the plurality of data owners whose privacy protection level is ε as the data owner assisting in training; S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
6. The method according to claim 5, wherein the step of determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy in S51 comprises: S511, determining, by the center server, an attenuation function of a noise scale according to the number of iterations in the training process of the center generator model; S512, determining, by the center server, the noise scale according to the attenuation function; S513, determining, by the center server, the privacy protection level ε of the data owner who assists this round of training according to the noise scale.
7. The method according to claim 5, wherein the step of the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training in S54 comprises: S541, the data owner assisting in training obtaining data generated by the center generator model from the center server; S542, the data owner assisting in training updating the local discriminator in the pre-trained local GAN model according to the data generated by the center generator model and the private data set of the data owner assisting in training; S543, the data owner assisting in training calculating a gradient according to the updated local discriminator; S544, the data owner assisting in training perturbing the gradient based on a personalized differential privacy theory to obtain a perturbed gradient; S545, optimizing, by the center server, the center generator model of the center server according to the perturbed gradient.
8. The method according to claim 7, wherein the step of perturbing the gradient based on a personalized differential privacy theory in S544 comprises: perturbing the gradient based on the Gaussian mechanism and a perturbance degree; wherein the perturbance degree is determined by the privacy protection level of the personalized contract.
9. A privacy protection data sharing system based on a distributed GAN, characterized in that the system is used to implement a privacy protection data sharing method based on a distributed GAN, and the system comprises a center server and a plurality of data owners, wherein, the center server is configured for providing a plurality of personalized contracts, and designing a privacy protection level selection strategy; the plurality of data owners are configured for selecting a personalized contract from the plurality of personalized contracts; using a local private data set of the data owner to pre-train a local generation countermeasure network GAN model of the data owner to obtain a pre-trained local GAN model; optimizing a center generator model of the center server according to the privacy protection level selection strategy, the personalized contract selected by each data owner and the pre-trained local GAN model, so as to complete privacy protection data sharing.
10. The system according to claim 9, wherein the plurality of data owners are further configured for the following: S51, determining, by the center server, a privacy protection level ε of the data owner who assists this round of training according to the privacy protection level selection strategy; S52, obtaining, by the center server, a plurality of data owners whose privacy protection level is ε among the plurality of data owners according to the privacy protection level ε and the personalized contract selected by each data owner; S53, randomly selecting, by the center server, one data owner from the plurality of data owners whose privacy protection level is ε as the data owner assisting in training; S54, the data owner assisting in training optimizing the center generator model of the center server according to the pre-trained local GAN model of the data owner assisting in training, and after optimization, proceeding to S51 for iterative training until a number of iterations reaches a preset threshold, thereby completing the training of the center generator model.
LU504296A 2022-08-28 2023-03-24 Privacy protection data sharing method and system based on distributed gan LU504296B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211036310.XA CN115442099B (en) 2022-08-28 2022-08-28 Distributed GAN-based privacy protection data sharing method and system

Publications (1)

Publication Number Publication Date
LU504296B1 true LU504296B1 (en) 2024-03-08

Family

ID=84244624

Family Applications (1)

Application Number Title Priority Date Filing Date
LU504296A LU504296B1 (en) 2022-08-28 2023-03-24 Privacy protection data sharing method and system based on distributed gan

Country Status (3)

Country Link
CN (1) CN115442099B (en)
LU (1) LU504296B1 (en)
WO (1) WO2024045581A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115442099B (en) * 2022-08-28 2023-06-06 北方工业大学 Distributed GAN-based privacy protection data sharing method and system
CN117278305B (en) * 2023-10-13 2024-06-11 深圳市互联时空科技有限公司 Data sharing-oriented distributed GAN attack and defense method and system
CN117852627B (en) * 2024-03-05 2024-06-25 湘江实验室 Pre-training model fine tuning method and system
CN118194357B (en) * 2024-05-16 2024-08-09 暨南大学 Private data publishing method based on diffusion denoising model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10395180B2 (en) * 2015-03-24 2019-08-27 International Business Machines Corporation Privacy and modeling preserved data sharing
CN107704930B (en) * 2017-09-25 2021-02-26 创新先进技术有限公司 Modeling method, device and system based on shared data and electronic equipment
CN110348241B (en) * 2019-07-12 2021-08-03 之江实验室 Multi-center collaborative prognosis prediction system under data sharing strategy
CN113255004B (en) * 2021-06-16 2024-06-14 大连理工大学 Safe and efficient federal learning content caching method
CN114841364B (en) * 2022-04-14 2024-06-14 北京理工大学 Federal learning method for meeting personalized local differential privacy requirements
CN115442099B (en) * 2022-08-28 2023-06-06 北方工业大学 Distributed GAN-based privacy protection data sharing method and system

Also Published As

Publication number Publication date
CN115442099B (en) 2023-06-06
CN115442099A (en) 2022-12-06
WO2024045581A1 (en) 2024-03-07

Similar Documents

Publication Publication Date Title
LU504296B1 (en) Privacy protection data sharing method and system based on distributed gan
Valdez et al. Modular neural networks architecture optimization with a new nature inspired method using a fuzzy combination of particle swarm optimization and genetic algorithms
Castillo et al. Simulation of common pool resource field experiments: a behavioral model of collective action
CN113128701A (en) Sample sparsity-oriented federal learning method and system
Li et al. Contract-theoretic pricing for security deposits in sharded blockchain with Internet of Things (IoT)
Chen et al. Prediction of cloud resources demand based on hierarchical pythagorean fuzzy deep neural network
CN112668877B (en) Method and system for distributing object resource information by combining federal learning and reinforcement learning
CN116471072A (en) Federal service quality prediction method based on neighbor collaboration
CN111681154A (en) Color image steganography distortion function design method based on generation countermeasure network
Singh Chaotic slime mould algorithm for economic load dispatch problems
CN107612878A (en) Dynamic window system of selection and wireless network trust management system based on game theory
CN114997420B (en) Federal learning system and method based on segmentation learning and differential privacy fusion
CN115033780A (en) Privacy protection cross-domain recommendation system based on federal learning
Yang et al. Federated continual learning via knowledge fusion: A survey
CN114282692A (en) Model training method and system for longitudinal federal learning
CN116187469A (en) Client member reasoning attack method based on federal distillation learning framework
CN116227628A (en) Federal learning method and system based on unchanged risk minimization game mechanism
CN114330464A (en) Multi-terminal collaborative training algorithm and system fusing meta learning
CN112101555A (en) Method and device for multi-party combined training model
CN117113274A (en) Heterogeneous network data-free fusion method and system based on federal distillation
CN117494183A (en) Knowledge distillation-based privacy data generation method and system for generating countermeasure network model
CN114723012B (en) Calculation method and device based on distributed training system
Atlam et al. ANFIS for risk estimation in risk-based access control model for smart homes
CN116011540A (en) Federal learning method and device based on social grouping
CN116415064A (en) Training method and device for double-target-domain recommendation model

Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20240308