CN111241582A - Data privacy protection method and device and computer readable storage medium - Google Patents

Data privacy protection method and device and computer readable storage medium

Info

Publication number
CN111241582A
CN111241582A (application CN202010029622.2A)
Authority
CN
China
Prior art keywords
weight
model
participant
cloud server
importance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010029622.2A
Other languages
Chinese (zh)
Other versions
CN111241582B (en)
Inventor
李洪伟
丁勇
刘小源
徐国文
刘森
龚丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202010029622.2A priority Critical patent/CN111241582B/en
Publication of CN111241582A publication Critical patent/CN111241582A/en
Application granted granted Critical
Publication of CN111241582B publication Critical patent/CN111241582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data privacy protection method, a data privacy protection device and a computer readable storage medium, wherein the data privacy protection method comprises the following steps: a participant obtains a first weight sent by a cloud server; the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and determines a second weight of each neuron in the local model after the iteration of the local model is completed; the participant determines the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm; and the participant determines the disturbance weight after the second weight is interfered based on the second weight, the weight importance and a disturbance mechanism, and sends the disturbance weight to the cloud server. The method and the device improve the privacy protection level of the data shared by the participants and the accuracy of the model jointly trained by the participants and the cloud server.

Description

Data privacy protection method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of Internet of things, in particular to a data privacy protection method and device and a computer readable storage medium.
Background
With the development of communication networks, a large number of internet of things devices continuously access the network and generate a large amount of data. As a mainstream method in the field of big data analysis, deep learning is being closely combined with the application of the internet of things, and the method is widely applied to multiple fields such as smart cities, smart homes, unmanned driving and the like.
Traditional centralized deep learning requires users to submit their data to a data center, where the cloud server trains on the data uniformly. However, such data can easily be abused by the model trainer, who may infer further private information about the users. Distributed deep learning allows multiple participants to jointly learn a common model without disclosing their data sets. However, in a distributed deep learning environment, sensitive information may still be leaked while data is shared between the cloud server and the participants, because the shared data is poorly protected.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The main object of the present invention is to provide a data privacy protection method, a data privacy protection device and a computer readable storage medium, aiming to solve the technical problem of poor data privacy protection.
In order to achieve the above object, the present invention provides a data privacy protection method, including the following steps:
the method comprises the steps that a participant obtains a first weight sent by a cloud server;
the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and determines a second weight of each neuron in the local model after the iteration of the local model is completed;
the participant determines the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm;
and the participant determines the disturbance weight after the second weight is interfered based on the second weight, the weight importance and a disturbance mechanism, and sends the disturbance weight to a cloud server.
Optionally, after the step of determining, by the participant, a disturbance weight after the disturbance of the second weight based on the second weight, the weight importance, and a disturbance mechanism, and sending the disturbance weight to a cloud server, the method further includes:
the cloud server receives the disturbance weight sent by the participant;
the cloud server inputs the disturbance weight to a first target model in a model ring of the cloud server, and obtains a third weight of a second target model of the model ring, wherein the second target model is a previous model of the first target model in the model ring;
and the cloud server sends the third weight to the participant so that the participant receives the third weight, the third weight is used as the first weight, the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and a step of determining a second weight of each neuron in the local model after the iteration of the local model is completed is executed.
Optionally, after the step of the cloud server inputting the disturbance weight to the first target model in the model ring of the cloud server and obtaining the third weight of the second target model of the model ring, the method further includes:
the cloud server acquires the first target model;
the cloud server takes the first target model as the second target model, and executes the step of obtaining the third weight of the second target model of the model ring.
Optionally, before the step of inputting, by the cloud server, the disturbance weight to a first target model in a model ring of the cloud server and obtaining a third weight of a second target model of the model ring, the method further includes:
the cloud server acquires non-private data;
the cloud server initializes target models in the model ring based on the non-private data, the target models including the first target model and the second target model.
Optionally, the step of determining, by the participant, a disturbance weight after the second weight is interfered based on the second weight, the weight importance, and a disturbance mechanism, and sending the disturbance weight to a cloud server includes:
the participant normalizes the weight importance based on the weight importance and a disturbance mechanism, and determines a weight normalization result;
the participant acquires a total privacy budget, and determines a privacy budget corresponding to the weight importance based on the weight normalization result and the total privacy budget;
the participant determines a perturbation weight after the second weight is interfered based on the second weight and the privacy budget.
Optionally, the step of determining, by the participant, a perturbation weight after the interference of the second weight based on the second weight and the privacy budget includes:
and the participant perturbs the second weight based on the privacy budget and a differential privacy mechanism, and determines a perturbation weight after the second weight is interfered.
Optionally, the step of determining, by the participant, the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm includes:
the participant determines neuron significance of the local model based on the second weight and a weight significance algorithm;
the participant determines a weight importance corresponding to the second weight based on the neuron importance.
Optionally, the step of determining a second weight of each neuron in the local model after iterating the local model is completed includes:
an iteration step of obtaining the local model by the participant;
and if the participant detects that the iteration step reaches the preset step, determining a second weight of each neuron in the local model after the local model is iterated.
In addition, to achieve the above object, the present invention provides a data privacy protecting apparatus, including: the system comprises a memory, a processor and a data privacy protection program stored on the memory and capable of running on the processor, wherein the data privacy protection program realizes the steps of the data privacy protection method when being executed by the processor.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, on which a data privacy protecting program is stored, and the data privacy protecting program, when executed by a processor, implements the steps of the data privacy protecting method as described above.
According to the method, a participant acquires a first weight sent by a cloud server; the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and determines a second weight of each neuron in the local model after the iteration of the local model is completed; the participant determines the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm; and the participant determines the disturbance weight after the second weight is interfered based on the second weight, the weight importance and the disturbance mechanism, and sends the disturbance weight to the cloud server. By combining a weight importance algorithm with a disturbance mechanism, the data shared between the participant and the cloud server, namely the weights, are perturbed differentially: less disturbance noise is allocated to weights with higher importance, and more disturbance noise is injected into weights with low importance. In this way the privacy protection level of the data shared by the participants and the cloud server is improved, and at the same time the accuracy of the model jointly trained by the participants and the cloud server is improved.
Drawings
Fig. 1 is a schematic structural diagram of a data privacy protection apparatus in a hardware operating environment according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a first embodiment of a data privacy protection method according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a data privacy protection apparatus in a hardware operating environment according to an embodiment of the present invention.
The data privacy protection device in the embodiment of the invention can be a PC, and can also be a mobile terminal device with a display function, such as a smart phone, a tablet computer, a portable computer and the like.
As shown in fig. 1, the data privacy protecting apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the data privacy protecting apparatus may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like.
Those skilled in the art will appreciate that the data privacy device architecture shown in fig. 1 does not constitute a limitation of the data privacy device and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a data privacy protecting program.
In the data privacy protection apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to invoke a data privacy preserving program stored in the memory 1005.
In this embodiment, the data privacy protecting apparatus includes: the system comprises a memory 1005, a processor 1001 and a data privacy protection program stored on the memory 1005 and operable on the processor 1001, wherein when the processor 1001 calls the data privacy protection program stored in the memory 1005, the following operations are performed:
the method comprises the steps that a participant obtains a first weight sent by a cloud server;
the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and determines a second weight of each neuron in the local model after the iteration of the local model is completed;
the participant determines the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm;
and the participant determines the disturbance weight after the second weight is interfered based on the second weight, the weight importance and a disturbance mechanism, and sends the disturbance weight to a cloud server.
Further, the processor 1001 may call the data privacy protection program stored in the memory 1005, and further perform the following operations:
the cloud server receives the disturbance weight sent by the participant;
the cloud server inputs the disturbance weight to a first target model in a model ring of the cloud server, and obtains a third weight of a second target model of the model ring, wherein the second target model is a previous model of the first target model in the model ring;
and the cloud server sends the third weight to the participant so that the participant receives the third weight, the third weight is used as the first weight, the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and a step of determining a second weight of each neuron in the local model after the iteration of the local model is completed is executed.
Further, the processor 1001 may call the data privacy protection program stored in the memory 1005, and further perform the following operations:
the cloud server acquires the first target model;
the cloud server takes the first target model as the second target model, and executes the step of obtaining the third weight of the second target model of the model ring.
Further, the processor 1001 may call the data privacy protection program stored in the memory 1005, and further perform the following operations:
the cloud server acquires non-private data;
the cloud server initializes target models in the model ring based on the non-private data, the target models including the first target model and the second target model.
Further, the processor 1001 may call the data privacy protection program stored in the memory 1005, and further perform the following operations:
the participant normalizes the weight importance based on the weight importance and a disturbance mechanism, and determines a weight normalization result;
the participant acquires a total privacy budget, and determines a privacy budget corresponding to the weight importance based on the weight normalization result and the total privacy budget;
the participant determines a perturbation weight after the second weight is interfered based on the second weight and the privacy budget.
Further, the processor 1001 may call the data privacy protection program stored in the memory 1005, and further perform the following operations:
and the participant perturbs the second weight based on the privacy budget and a differential privacy mechanism, and determines a perturbation weight after the second weight is interfered.
Further, the processor 1001 may call the data privacy protection program stored in the memory 1005, and further perform the following operations:
the participant determines neuron significance of the local model based on the second weight and a weight significance algorithm;
the participant determines a weight importance corresponding to the second weight based on the neuron importance.
Further, the processor 1001 may call the data privacy protection program stored in the memory 1005, and further perform the following operations:
an iteration step of obtaining the local model by the participant;
and if the participant detects that the iteration step reaches the preset step, determining a second weight of each neuron in the local model after the local model is iterated.
The invention further provides a data privacy protection method, and referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the data privacy protection method of the invention.
In this embodiment, the data privacy protection method includes the following steps:
the system architecture applicable to the embodiment of the invention comprises a cloud server and a plurality of participants. In the technical scheme of the embodiment, each participant trains a local model with public non-private data in advance in a pre-training stage so as to initialize the local model; then, operating a weight importance algorithm to quantify the importance of the weight in the deep learning model to the model prediction; in the privacy protection stage, a differential privacy mechanism is combined to disturb the weight value of the local model differently; the perturbed local model weight values are then uploaded to the server and a new training model is requested. Deploying a plurality of deep learning models with the same type and different parameter values in the cloud server, and receiving a local model sent by a participant and replacing one of the deep learning models; and when the server processes the model request of the client, the model uploaded by other participants is sent to the participant requesting the model.
The cloud server may be a computer or other network device. The cloud server may be an independent device, or may be a server cluster formed by a plurality of servers. Preferably, the cloud server may perform information processing by using cloud computing technology. The participants are deployed on terminals; a terminal may be an electronic device with a wireless communication function, such as a mobile phone, a tablet computer or a dedicated handheld device, or a device connected to the Internet in a wired manner, such as a personal computer (PC), a notebook computer or a server. A terminal may be an independent device, or a terminal cluster formed by a plurality of terminals. Preferably, a terminal may perform information processing by using cloud computing technology. The participants may communicate with the cloud server through the Internet, or through the Global System for Mobile Communications (GSM), the Long Term Evolution (LTE) system, or other mobile communication systems.
Step S10, the participant obtains a first weight sent by the cloud server;
In one embodiment, a plurality of deep learning models are deployed in the cloud server and form a model ring. The cloud server receives the model parameters (weights) uploaded by a participant and inputs them into one deep learning model in the model ring; when the model parameters uploaded by the next participant are received, they are input into the next deep learning model, and so on. In this way the cloud server receives the weight parameters of the local models sent by the participants in turn and inputs them into the deep learning models of the model ring in turn, so that the deep learning models in the cloud server are continuously updated. Specifically, if M deep learning models are deployed in the cloud server and the cloud server has initialized all the deep learning models on the model ring, then as soon as the model parameters sent by a participant are received, they are input into the m-th deep learning model on the model ring (m ∈ [0, M-1]), and the (m-1)-th model is then sent to that participant for the next round of training. Notably, the (m-1)-th model is the deep learning model that received the model parameters uploaded by the participant who last interacted with the cloud server. After interacting with each participant, m ← m + 1 is executed.
The cloud server acquires the weight parameters uploaded by another participant that interacts with it and inputs those weight parameters into a target model of its model ring. When a participant sends the weight parameters of its local model to the cloud server and at the same time requests new local model parameters, i.e. the first weight, the cloud server obtains the first weight from the target model and sends it to the participant, and the participant thus obtains the first weight sent by the cloud server.
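The round-robin exchange described above can be captured in a few lines. This is only a minimal sketch rather than the patented implementation: the class and method names (ModelRing, exchange) are assumptions, and each model is represented simply by its weight vector.

```python
# Minimal sketch of the cloud-server model ring; the class/method names and
# the reduction of a "model" to its weight vector are assumptions.
class ModelRing:
    def __init__(self, initial_weights):
        # M deep learning models of the negotiated architecture, pre-initialized
        # by the cloud server (e.g. trained on non-private data).
        self.models = list(initial_weights)   # length M
        self.m = 0                            # index of the next slot to fill

    def exchange(self, perturbed_weights):
        """Store a participant's perturbed weights in the m-th model and return
        the weights of the (m-1)-th model for the participant's next round."""
        M = len(self.models)
        self.models[self.m] = perturbed_weights        # input to the m-th model
        previous = self.models[(self.m - 1) % M]       # third weight from the (m-1)-th model
        self.m = (self.m + 1) % M                      # m <- m + 1 after each interaction
        return previous
```

Because the returned weights come from the slot filled by the previous upload, every participant continues training on parameters contributed by another participant, which is how the ring mixes updates across participants round by round.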
Step S20, the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and determines a second weight of each neuron in the local model after the iteration of the local model is completed;
In one embodiment, after receiving the first weight sent by the cloud server, the participant starts training the local model: it loads the first weight into the local model and trains the local model according to a given model training rule. Specifically, the first weight is input into the local model and the user data is fed in as training samples. A forward propagation pass is first performed on the local model to determine all activation values in the local model, including the outputs of the hidden layers of the neural network. A back propagation pass is then performed: for every node of every layer in the local model, a new weight and threshold are determined, reflecting how much that node influences the weight and threshold associated with the final output of the output layer. This sequence of forward and backward passes is repeated on the local model until the iteration of the local model, i.e. the training, is completed, and the second weight of the local model is finally determined and output. The training process of the local model thus consists of a forward propagation process and a backward propagation process: in forward propagation, the input pattern is processed layer by layer from the input layer through the hidden layers to the output layer, and the state of each layer of neurons only affects the state of the next layer; in backward propagation, the error signal is returned along the original connection path, and the weight and threshold of each neuron are modified so as to minimize the error.
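As an illustration of this forward/backward cycle, the sketch below trains a one-hidden-layer network with plain gradient descent until a preset number of steps is reached; the architecture, loss, learning rate and function names are assumptions made for the example and are not the training rule prescribed by the method.

```python
import numpy as np

def local_training_round(first_weights, X, y, max_steps=100, lr=0.1):
    """One local round: load the received weights, iterate forward/backward
    passes until the preset step, and return the resulting second weight."""
    W1, W2 = first_weights                     # first weight received from the cloud server
    for _ in range(max_steps):                 # iterate until the preset step s
        h = np.tanh(X @ W1)                    # forward pass: hidden-layer activations
        out = h @ W2                           # forward pass: output-layer values
        err = out - y                          # error signal at the output layer
        grad_W2 = h.T @ err / len(X)           # backward pass: output-layer gradient
        grad_W1 = X.T @ ((err @ W2.T) * (1 - h ** 2)) / len(X)  # hidden-layer gradient
        W2 -= lr * grad_W2                     # modify weights to reduce the error
        W1 -= lr * grad_W1
    return W1, W2                              # the "second weight" after iteration

# Example usage with random data, for illustration only.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=(32, 1))
first = (rng.normal(size=(4, 8)), rng.normal(size=(8, 1)))
second = local_training_round(first, X, y)
```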
Step S30, the participant determines a weight importance corresponding to the second weight based on the second weight and a weight importance algorithm;
In one embodiment, after the participant has iterated the local model, a weight importance algorithm is run to determine the weight importance corresponding to the second weight. Specifically, the weight importance algorithm proceeds as follows (a code sketch follows these steps):
1) Initialize the weight importance matrix: the participant sets up a γ × γ weight importance matrix Ω and initializes each of its elements Ω_{p,q} to zero, where Ω_{p,q} denotes the importance of the weight w_{p,q} between neuron a_p and neuron a_q, and γ is the total number of neurons in the model.
2) Compute the output-layer neuron importance: first, the importance I(a_p) of each output-layer neuron a_p to the model prediction is calculated. The importance of an output-layer neuron to the model prediction value is simply the output value of that neuron, i.e. the output value of the model: I(a_p) = a_p(x_i; ω), where x_i is drawn from the training samples D and ω is the second weight.
3) Compute the weight importance: the participant recurses layer by layer from back to front and computes the importance to the model prediction of each weight w_{p,q} between adjacent layers. Suppose a_p is a neuron in layer h-1 and a_q is a neuron in layer h; the importance Ω_{p,q} of the weight between a_p and a_q is computed from the importance I(a_q) of the layer-h neuron together with the output value a_p and the weight w_{p,q}, where a_p here refers to the output value of a neuron in a layer other than the output layer.
4) Compute the importance of the neurons in every layer except the output layer (i.e. hidden-layer and input-layer neuron importance): the importance I(a_p) of each neuron a_p in layer h-1 is then computed, so that the recursion can continue to the preceding layer.
5) Repeat steps 3) and 4) until the importance of every weight in the local model has been calculated, and finally determine the weight importance corresponding to the second weight. It should be understood that the neuron importance is calculated as an intermediate parameter in the computation of the importance matrix.
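The back-to-front recursion in steps 1) to 5) can be sketched as follows. Because the exact recursion formulas are not reproduced above, the proportional rule used here, splitting the importance of each layer-h neuron over its incoming weights according to the magnitude of the contribution |a_p · w_{p,q}| and summing a neuron's outgoing weight importance to obtain its own importance, is an assumption made for illustration.

```python
import numpy as np

def weight_importance(weights, activations):
    """weights[h] has shape (n_h, n_{h+1}); activations[h] are the outputs of
    layer h, with activations[0] the input and activations[-1] the model output.
    Returns one importance matrix (Omega) per weight layer."""
    # step 2: output-layer neuron importance = the neuron's output value
    neuron_imp = np.abs(activations[-1])
    importance = []
    # steps 3-4: recurse layer by layer from back to front
    for h in range(len(weights) - 1, -1, -1):
        a_prev = activations[h]                               # outputs of the previous layer
        contrib = np.abs(a_prev[:, None] * weights[h])        # |a_p * w_{p,q}| (assumed rule)
        share = contrib / (contrib.sum(axis=0, keepdims=True) + 1e-12)
        omega = share * neuron_imp                            # importance of each weight w_{p,q}
        importance.append(omega)
        neuron_imp = omega.sum(axis=1)                        # importance of previous-layer neurons
    return importance[::-1]

# Example usage on a tiny network (input 4, hidden 8, output 2), for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
acts = [x, np.tanh(x @ W[0])]
acts.append(acts[1] @ W[1])
omegas = weight_importance(W, acts)
```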
Step S40, the participant determines a disturbance weight after the second weight is interfered based on the second weight, the weight importance and a disturbance mechanism, and sends the disturbance weight to a cloud server.
In an embodiment, after running the weight importance algorithm to determine the weight importance corresponding to the second weight, the participant runs its customized disturbance mechanism, perturbs the second weight according to the different weight importance values to obtain the disturbance weight, and sends the perturbed disturbance weight to the cloud server so that the cloud server can jointly train the models of the different participants. The weight importance determines the degree to which the second weight is perturbed: the higher the importance, the lower the degree of perturbation, so as to preserve the accuracy of the data. Specifically, the disturbance mechanism perturbs the second weight according to the weight importance as follows (a code sketch follows these steps):
1) Weight importance normalization: each element Ω_{p,q} of the weight importance matrix is normalized using the maximum value and the minimum value of the weight importance, so that all importance values are limited to a preset range, e.g. the interval [0.5, 1]; the weight importance is expressed in matrix form.
2) Privacy budget adjustment: a privacy budget ε_{p,q} is set for each weight importance Ω_{p,q} in the (normalized) weight importance matrix, drawn from the total privacy budget ε_T, so that less disturbance noise is allocated to weights with higher importance, with the aim of improving the accuracy of the model, and more disturbance noise is injected into weights with low importance, with the aim of improving the privacy protection level of the model parameters.
3) Protection of the local model parameters: at this point the model iteration step has reached the maximum step s, and the weight obtained at the maximum step is the second weight. To protect the participant's training data from being leaked or inferred, a differential privacy mechanism is used to perturb the second weight differentially, i.e. adjusted Laplace noise is added to weights of different importance, finally yielding the disturbance weight
w'_{p,q} = w_{p,q} + Lap(Δf/ε_{p,q}),
where Lap(Δf/ε_{p,q}) denotes a sample from a Laplace distribution with mean 0 whose scale is determined by the parameter Δf/ε_{p,q}, and ε_{p,q} is the adjusted privacy budget. Generally speaking, a larger privacy budget means a smaller noise value, which yields higher system accuracy but also a weaker privacy protection level. Δf is the sensitivity of the model weights; in general, given two neighboring databases D_1 and D_2 that differ in at most one record, the sensitivity of a randomized algorithm Γ is calculated as
Δf = max_{D_1, D_2} ‖Γ(D_1) − Γ(D_2)‖_1.
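The normalization, budget allocation and Laplace perturbation of steps 1) to 3) can be sketched as below. The mapping into [0.5, 1] follows the example range given in the text, and the proportional split of the total budget ε_T is an assumption consistent with "higher importance, larger budget, less noise"; function and parameter names are illustrative.

```python
import numpy as np

def perturb_weights(W, importance, eps_total, sensitivity, rng=None):
    """Return the disturbance weights w'_{p,q} = w_{p,q} + Lap(Delta_f / eps_{p,q})."""
    rng = rng or np.random.default_rng()
    # 1) min-max normalize the importance into a preset range, e.g. [0.5, 1]
    lo, hi = importance.min(), importance.max()
    norm = 0.5 + 0.5 * (importance - lo) / (hi - lo + 1e-12)
    # 2) allocate the total privacy budget eps_T in proportion to importance:
    #    higher importance -> larger eps_{p,q} -> smaller Laplace noise
    eps = eps_total * norm / norm.sum()
    # 3) add Laplace noise with mean 0 and scale Delta_f / eps_{p,q}
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return W + noise

# Example: perturb a 4x8 weight matrix with total budget 1.0 and sensitivity 0.1.
rng = np.random.default_rng(0)
W, imp = rng.normal(size=(4, 8)), rng.random(size=(4, 8))
W_perturbed = perturb_weights(W, imp, eps_total=1.0, sensitivity=0.1, rng=rng)
```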
according to the data privacy protection method provided by the embodiment, a participant acquires a first weight sent by a cloud server; the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and determines a second weight of each neuron in the local model after the local model is iterated; the participant determines the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm; and the participants determine the disturbance weight after the disturbance of the second weight based on the second weight, the weight importance and the disturbance mechanism, send the disturbance weight to the cloud server, disturbs the data, namely the weight, shared between the participants and the cloud server differently by combining a weight importance algorithm and the disturbance mechanism, distributes less disturbance noise to the weight with higher importance, and injects more disturbance noise to the weight with low importance, so that the privacy protection level of the data shared by the participants and the cloud server is improved, and meanwhile, the accuracy of a joint training model of the participants and the cloud server is improved.
Based on the first embodiment, a second embodiment of the data privacy protecting method according to the present invention is proposed, in this embodiment, after step S40, the method further includes:
step a, the cloud server receives the disturbance weight sent by the participant;
b, the cloud server inputs the disturbance weight to a first target model in a model ring of the cloud server, and obtains a third weight of a second target model of the model ring, wherein the second target model is a previous model of the first target model in the model ring;
and c, the cloud server sends the third weight to the participant so that the participant receives the third weight, the third weight is used as the first weight, the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and the second weight of each neuron in the local model after the iteration of the local model is finished is determined.
In one embodiment, a plurality of deep learning models are deployed in the cloud server and form a model ring. When a participant has finished iterating the local model and has perturbed the parameters of the local model to obtain the disturbance weight, the participant sends the disturbance weight to the cloud server and requests new weight parameters for the next round of joint training. The cloud server receives the disturbance weight uploaded by the participant and inputs it into the first target model in the model ring. It then obtains the weight parameters (i.e. the third weight) of the model preceding the first target model in the model ring, i.e. the second target model. The cloud server sends the weight parameters of the second target model (the third weight) to the participant for the next round of joint training; that is, the participant receives the third weight, takes the third weight as the first weight, iterates the local model corresponding to the participant based on the first weight and the model training rule, and determines the second weight of each neuron in the local model after the iteration of the local model is completed.
Further, in an embodiment, before the step of the cloud server inputting the disturbance weight to the first target model in the model ring of the cloud server and obtaining the third weight of the second target model of the model ring, the method further includes:
the cloud server acquires non-private data;
the cloud server initializes target models in the model ring based on the non-private data, the target models including the first target model and the second target model.
Further, since the participants and the cloud server jointly train the model, a plurality of participants need to be combined in order to obtain a more accurate learning model that does not overfit locally, and a deep learning network structure, such as a convolutional neural network (CNN) or a recurrent neural network (RNN), is negotiated between the participants and the cloud server in advance. After the network structure to be trained has been negotiated between the cloud server and the participants, the cloud server initializes the deep learning models in the model ring: it trains the deep learning models with public data or historical data (non-private data) to obtain a preset number of deep learning models.
Further, after a participant has finished iterating the local model and has perturbed the parameters of the local model to obtain the disturbance weight, the participant sends the disturbance weight to the cloud server and requests new weight parameters for the next round of joint training. The cloud server receives the disturbance weight uploaded by the participant, inputs it into the first target model in the model ring, and obtains the second target model of the model ring. If the second target model is still a deep learning model as initialized by the cloud server, the cloud server sends a continue-training instruction to the participant, so that the participant takes the disturbance weight as the first weight, iterates the local model corresponding to the participant based on the first weight and the model training rule, and determines the second weight of each neuron in the local model after the iteration of the local model is completed. In other words, if the second target model is a deep learning model initialized by the cloud server, the cloud server notifies the participant to continue using the current model for the next round of training. A sketch of this initialization step is given below.
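This is a short sketch of the pre-training step, under the assumption that the server has a pretrain routine that fits the negotiated architecture on the non-private data; the function name and the initialized flags are illustrative, not part of the original description.

```python
def initialize_model_ring(M, public_data, pretrain):
    """The cloud server builds M models of the negotiated architecture from
    non-private (public or historical) data before any participant interaction.
    pretrain(data) -> model weights."""
    ring = [pretrain(public_data) for _ in range(M)]
    initialized = [True] * M   # slots that still hold a server-initialized model
    return ring, initialized
```

When the slot that would be returned to a participant is still flagged as initialized, the server can instead instruct the participant to keep training with its current model, matching the continue-training instruction described above.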
Further, in an embodiment, after the step of the cloud server inputting the disturbance weight to the first target model in the model ring of the cloud server and obtaining the third weight of the second target model of the model ring, the method further includes:
step d, the cloud server acquires the first target model;
and e, the cloud server takes the first target model as the second target model, and executes the step of obtaining the third weight of the second target model of the model ring.
In an embodiment, the cloud server inputs the disturbance weight to a first target model in a model ring of the cloud server, the first target model is used as a second target model for a subsequent cloud server to obtain a third weight of the second target model of the model ring, the cloud server sends the third weight to a participant, so that the participant receives the third weight, the third weight is used as the first weight, a local model corresponding to the participant is iterated based on the first weight and a model training rule, and the second weight of each neuron in the local model after the iteration of the local model is completed is determined.
That is to say, when the cloud server receives the model parameters uploaded by the next participant, it inputs the model parameters of that participant into the next deep learning model, and so on: the cloud server receives the weight parameters of the local models sent by the participants in turn and inputs them into the deep learning models of the model ring in turn, so that the deep learning models in the cloud server are continuously updated. Specifically, if M deep learning models are deployed in the cloud server and the cloud server has initialized all the deep learning models on the model ring, then as soon as the model parameters sent by a participant are received, they are input into the m-th deep learning model on the model ring (m ∈ [0, M-1]), and the (m-1)-th model is then sent to that participant for the next round of training. Notably, the (m-1)-th model is the deep learning model that received the model parameters uploaded by the participant who last interacted with the cloud server. After interacting with each participant, m ← m + 1 is executed.
Further, in an embodiment, the step of determining, by the participant, a disturbance weight after the disturbance of the second weight based on the second weight, the weight importance, and a disturbance mechanism, and sending the disturbance weight to the cloud server includes:
f, the participants perform normalization operation on the weight importance based on the weight importance and a disturbance mechanism, and determine a weight normalization result;
In an embodiment, after the weight importance algorithm has determined the weight importance corresponding to the second weight, the participant runs its customized disturbance mechanism and normalizes each element Ω_{p,q} of the weight importance to obtain the weight normalization result. The normalization uses the maximum value and the minimum value of the weight importance and limits all the weight importance values to a preset range, e.g. the interval [0.5, 1]; the weight importance is expressed in matrix form.
Step g, the participant acquires a total privacy budget, and determines a privacy budget corresponding to the weight importance based on the weight normalization result and the total privacy budget;
In one embodiment, the participant obtains the total privacy budget ε_T and, based on the weight normalization result, sets a privacy budget ε_{p,q} for each weight importance Ω_{p,q} in the weight importance matrix, so that less disturbance noise is allocated to weights with higher importance, with the aim of improving the accuracy of the model, and more disturbance noise is injected into weights with low importance, with the aim of improving the privacy protection level of the model parameters.
And h, the participant determines the disturbance weight after the second weight is interfered based on the second weight and the privacy budget.
In one embodiment, when the model iteration step reaches the maximum step s, the weight obtained by the maximum step is the second weight, and in order to protect the training data of the participant from being leaked or speculated, the second weight is perturbed differently, that is: adding the adjusted Laplace noise to the weights with different importance to finally obtain the disturbance weight
Figure BDA0002362318580000151
The method comprises the following specific steps:
Figure BDA0002362318580000152
wherein the content of the first and second substances,
Figure BDA0002362318580000153
refers to sampling from a laplacian distribution, the distribution satisfying a mean of 0, the size of the sample value being given by a parameter
Figure BDA0002362318580000154
To decide. Wherein epsilonp,qIs an adjusted privacy budget, generally speaking, a larger privacy budget means a smaller noise value, which results in higher system accuracy and also means a weaker privacy protection level; Δ f is the sensitivity of the model weights, in general, given two neighbor databases that differ by at most one piece of data: d1And D2The sensitivity process of the stochastic algorithm Γ is calculated as follows:
Figure BDA0002362318580000155
further, in an embodiment, the step of determining, by the participant based on the second weight and the privacy budget, a perturbation weight after the interference with the second weight includes:
and i, disturbing the second weight based on the privacy budget and a differential privacy mechanism, and determining a disturbance weight after the second weight is disturbed.
In an embodiment, when the model iteration step reaches the maximum step s, the weight obtained by the maximum step is the second weight, and in order to protect the training data of the participant from being leaked or speculated, the second weight is perturbed differentially by using a differential privacy mechanism, that is: adding the adjusted Laplace noise to the weights with different importance to finally obtain the disturbance weight
Figure BDA0002362318580000156
The method comprises the following specific steps:
Figure BDA0002362318580000157
wherein the content of the first and second substances,
Figure BDA0002362318580000158
refers to sampling from a laplacian distribution, the distribution satisfying a mean of 0, the size of the sample value being given by a parameter
Figure BDA0002362318580000159
To decide. Wherein epsilonp,qIs an adjusted privacy budget, generally speaking, a larger privacy budget means a smaller noise value, which results in higher system accuracy and also means a weaker privacy protection level; Δ f is the sensitivity of the model weights, in general, given two neighbor databases that differ by at most one piece of data: d1And D2The sensitivity process of the stochastic algorithm Γ is calculated as follows:
Figure BDA00023623185800001510
in the data privacy protection method provided by this embodiment, the cloud server receives the disturbance weight sent by the participant; the cloud server inputs the disturbance weight to a first target model in a model ring of the cloud server, and obtains a third weight of a second target model of the model ring, wherein the second target model is a previous model of the first target model in the model ring; and the cloud server sends the third weight to the participant so that the participant receives the third weight, the third weight is used as the first weight, the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and the second weight of each neuron in the local model after the iteration of the local model is completed is determined.
Based on the first embodiment, a third embodiment of the data privacy protecting method of the present invention is proposed, in this embodiment, step S30 includes:
step j, the participant determines the neuron importance of the local model based on the second weight and a weight importance algorithm;
and k, determining the weight importance corresponding to the second weight by the participant based on the neuron importance.
In one embodiment, after the participant iterates the local model, a weight importance algorithm is run, first, the importance of the neuron is determined, and then, based on the importance of the neuron, the importance of the weight corresponding to the second weight is determined. It is understood that neuron significance is calculated as an intermediate parameter for calculating weight significance.
Specifically, the weight importance algorithm for determining the weight importance corresponding to the second weight proceeds as follows:
1) Initialize the weight importance matrix: the participant sets up a γ × γ weight importance matrix Ω and initializes each of its elements Ω_{p,q} to zero, where Ω_{p,q} denotes the importance of the weight w_{p,q} between neuron a_p and neuron a_q, and γ is the total number of neurons in the model.
2) Compute the output-layer neuron importance: first, the importance I(a_p) of each output-layer neuron a_p to the model prediction is calculated. The importance of an output-layer neuron to the model prediction value is simply the output value of that neuron, i.e. the output value of the model: I(a_p) = a_p(x_i; ω), where x_i is drawn from the training samples D and ω is the second weight.
3) Compute the weight importance: the participant recurses layer by layer from back to front and computes the importance to the model prediction of each weight w_{p,q} between adjacent layers. Suppose a_p is a neuron in layer h-1 and a_q is a neuron in layer h; the importance Ω_{p,q} of the weight between a_p and a_q is computed from the importance I(a_q) of the layer-h neuron together with the output value a_p and the weight w_{p,q}, where a_p here refers to the output value of a neuron in a layer other than the output layer.
4) Compute the importance of the neurons in every layer except the output layer (i.e. hidden-layer and input-layer neuron importance): the importance I(a_p) of each neuron a_p in layer h-1 is then computed, so that the recursion can continue to the preceding layer.
5) Repeat steps 3) and 4) until the importance of every weight in the local model has been calculated, and finally determine the weight importance corresponding to the second weight.
Further, in an embodiment, the step of determining the second weight of each neuron in the local model after iterating the local model includes:
step m, the participant obtains an iteration step of iterating the local model;
and n, if the participant detects that the iteration step reaches the preset step, determining second weights of all neurons in the local model after the local model is iterated.
In one embodiment, the participant's terminal detects whether the iteration of the local model has reached the preset step. When the model iteration step reaches the maximum step s, the maximum step s being the preset step, iteration of the local model is stopped, and the second weight of each neuron in the local model after the iteration of the local model is completed is determined.
In the data privacy protection method provided by this embodiment, the participant determines the neuron importance of the local model based on the second weight and a weight importance algorithm, and then determines the weight importance corresponding to the second weight based on the neuron importance. By determining the importance of the different weights through the weight importance algorithm, the data shared between the participant and the cloud server, namely the weights, can be perturbed differentially: less disturbance noise is allocated to weights with higher importance, and more disturbance noise is injected into weights with low importance. This improves the privacy protection level of the mutually shared data and at the same time improves the accuracy of the model jointly trained by the participant and the cloud server.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a data privacy protection program is stored, and when executed by a processor, the data privacy protection program implements the steps of the data privacy protection method according to any one of the above.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the data privacy protecting method described above, and details are not repeated herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A data privacy protection method is characterized by comprising the following steps:
the method comprises the steps that a participant obtains a first weight sent by a cloud server;
the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and determines a second weight of each neuron in the local model after the iteration of the local model is completed;
the participant determines the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm;
and the participant determines the disturbance weight after the second weight is interfered based on the second weight, the weight importance and a disturbance mechanism, and sends the disturbance weight to a cloud server.
2. The data privacy protection method of claim 1, wherein after the step of determining, by the participant, a perturbation weight after the second weight is interfered based on the second weight, the weight importance, and a perturbation mechanism, and sending the perturbation weight to a cloud server, the method further comprises:
the cloud server receives the disturbance weight sent by the participant;
the cloud server inputs the disturbance weight to a first target model in a model ring of the cloud server, and obtains a third weight of a second target model of the model ring, wherein the second target model is a previous model of the first target model in the model ring;
and the cloud server sends the third weight to the participant so that the participant receives the third weight, the third weight is used as the first weight, the participant iterates a local model corresponding to the participant based on the first weight and a model training rule, and a step of determining a second weight of each neuron in the local model after the iteration of the local model is completed is executed.
3. The data privacy protection method of claim 2, wherein after the step of the cloud server inputting the perturbation weight to the first target model in the model ring of the cloud server and obtaining the third weight of the second target model of the model ring, the method further comprises:
the cloud server acquires the first target model;
the cloud server takes the first target model as the second target model, and executes the step of obtaining the third weight of the second target model of the model ring.
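Claims 2 and 3 describe how the cloud server answers an upload using its model ring. The sketch below is one plausible reading only: the ModelRing class, the moving-average update and the mix parameter are hypothetical, since the claims fix only that the perturbation weight is fed to the first target model, that the returned third weight comes from the previous model in the ring, and that the first target model then plays the role of the second target model for the next exchange.

```python
import numpy as np

class ModelRing:
    """Illustrative cloud-server model ring (one reading of claims 2-3):
    several global models sit on a ring; a participant's perturbed upload is
    fed to the current ("first") target model, and the weights of the
    previous ("second") target model are returned as the third weight."""

    def __init__(self, initial_weights):
        self.models = [np.array(w, dtype=float) for w in initial_weights]
        self.idx = 0   # index of the current first target model

    def exchange(self, perturbation_weight, mix=0.5):
        first = self.idx
        # Input the perturbation weight to the first target model; a simple
        # moving average stands in for the unspecified update rule.
        self.models[first] = ((1.0 - mix) * self.models[first]
                              + mix * np.asarray(perturbation_weight, dtype=float))

        # The second target model is the previous model in the ring; its
        # weights are the third weight sent back to the participant.
        second = (first - 1) % len(self.models)
        third_weight = self.models[second].copy()

        # Claim 3 reading: for the next exchange the old first target model
        # plays the role of the second target model, so the pointer advances.
        self.idx = (first + 1) % len(self.models)
        return third_weight

# Example: a ring of three 2-dimensional models.
ring = ModelRing([np.zeros(2), np.ones(2), np.full(2, 2.0)])
print(ring.exchange(np.array([0.4, -0.2])))   # returns the previous model's weights
```

Because each participant only ever sees the weights of a model other than the one its own update went into, no single returned model directly reflects that participant's contribution.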
4. The data privacy protection method of claim 2, wherein before the step of the cloud server inputting the perturbation weight to the first target model in the model ring of the cloud server and obtaining the third weight of the second target model of the model ring, the method further comprises:
the cloud server acquires non-private data;
the cloud server initializes target models in the model ring based on the non-private data, the target models including the first target model and the second target model.
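Claim 4 requires only that the target models in the ring are initialized from non-private data before any participant weights arrive. A hedged sketch of one way to do that follows, reusing the same toy linear model as above; the random restarts and the gradient-descent pre-training are assumptions, not part of the claim.

```python
import numpy as np

def init_model_ring(num_models, X_public, y_public, lr=0.01, steps=200, seed=0):
    """Illustrative initialisation of the model ring from non-private data
    (claim 4): every target model is pre-trained on the same public dataset
    from a different random start, so no participant data is ever touched."""
    rng = np.random.default_rng(seed)
    ring_weights = []
    for _ in range(num_models):
        w = rng.normal(scale=0.1, size=X_public.shape[1])   # random start
        for _ in range(steps):                              # plain gradient descent
            grad = X_public.T @ (X_public @ w - y_public) / len(y_public)
            w -= lr * grad
        ring_weights.append(w)
    return ring_weights

# Example: initialise a ring of three models from a public toy dataset.
X_pub = np.random.default_rng(1).normal(size=(128, 2))
y_pub = X_pub @ np.array([0.5, -0.3])
print([w.round(3) for w in init_model_ring(3, X_pub, y_pub)])
```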
5. The data privacy protection method of claim 1, wherein the step of the participant determining, based on the second weight, the weight importance and the perturbation mechanism, the perturbation weight obtained by perturbing the second weight and sending the perturbation weight to the cloud server comprises:
the participant normalizes the weight importance based on the weight importance and the perturbation mechanism, and determines a weight normalization result;
the participant acquires a total privacy budget, and determines a privacy budget corresponding to the weight importance based on the weight normalization result and the total privacy budget;
the participant determines, based on the second weight and the privacy budget, the perturbation weight obtained by perturbing the second weight.
6. The data privacy protection method of claim 5, wherein the step of the participant determining, based on the second weight and the privacy budget, the perturbation weight obtained by perturbing the second weight comprises:
the participant perturbs the second weight based on the privacy budget and a differential privacy mechanism, and determines the perturbation weight obtained by perturbing the second weight.
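Claims 5 and 6 together describe an importance-aware differential-privacy step: normalize the weight importance, derive a per-weight privacy budget from the total budget, and perturb each weight under its own budget. The sketch below assumes a proportional allocation (more important weights receive a larger budget share and therefore less Laplace noise) and a fixed sensitivity; neither choice is fixed by the claims.

```python
import numpy as np

def allocate_and_perturb(second_weight, importance, total_epsilon,
                         sensitivity=1.0, rng=None):
    """Normalise importance, split the privacy budget, and perturb each weight
    with Laplace noise calibrated to its own budget (claims 5-6, as read here)."""
    if rng is None:
        rng = np.random.default_rng(0)
    second_weight = np.asarray(second_weight, dtype=float)
    importance = np.asarray(importance, dtype=float)

    # Claim 5: the weight normalisation result.
    norm_importance = importance / importance.sum()

    # Claim 5: a per-weight privacy budget derived from the total budget
    # (the proportional rule itself is an assumption).
    eps = np.maximum(total_epsilon * norm_importance, 1e-6)  # guard zero budget

    # Claim 6: differential-privacy perturbation; Laplace noise with scale
    # sensitivity / epsilon is the classic epsilon-DP mechanism.
    return second_weight + rng.laplace(0.0, sensitivity / eps)

# Example: three weights, the middle one judged twice as important.
print(allocate_and_perturb([0.2, -0.5, 0.1], [1.0, 2.0, 1.0], total_epsilon=1.0))
```

With the Laplace mechanism, noise of scale sensitivity/ε satisfies ε-differential privacy for a quantity of that sensitivity, which is why the per-weight budget share directly controls how strongly each weight is distorted.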
7. The data privacy protection method of claim 1, wherein the step of determining, by the participant, the weight importance corresponding to the second weight based on the second weight and a weight importance algorithm comprises:
the participant determines a neuron importance of the local model based on the second weight and the weight importance algorithm;
the participant determines a weight importance corresponding to the second weight based on the neuron importance.
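Claim 7 fixes only the order of operations: a neuron-level importance is computed first and then mapped back to a weight-level importance. The sketch below uses the mean absolute incoming weight as the neuron measure purely for illustration; the actual importance algorithm is not specified in the claim.

```python
import numpy as np

def neuron_and_weight_importance(weight_matrix):
    """Illustrative reading of claim 7 for one fully connected layer:
    derive a per-neuron importance first, then map it to every weight."""
    W = np.asarray(weight_matrix, dtype=float)   # shape: (inputs, neurons)

    # Neuron importance: average magnitude of the weights feeding each neuron
    # (an assumed measure, not the patent's algorithm).
    neuron_importance = np.abs(W).mean(axis=0)   # shape: (neurons,)

    # Weight importance: each weight inherits the importance of the neuron
    # it connects to (broadcast across the input dimension).
    weight_importance = np.broadcast_to(neuron_importance, W.shape).copy()
    return neuron_importance, weight_importance

# Example: a 3-input, 2-neuron layer.
W = np.array([[0.2, -1.0],
              [0.1,  0.5],
              [0.3, -0.4]])
ni, wi = neuron_and_weight_importance(W)
print(ni)   # per-neuron importance
print(wi)   # per-weight importance, same shape as W
```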
8. The data privacy protection method of any one of claims 1 to 7, wherein the step of determining the second weight of each neuron in the local model after the iteration of the local model is completed comprises:
the participant obtains the number of iteration steps of the local model;
if the participant detects that the number of iteration steps reaches a preset number of steps, the participant determines the second weight of each neuron in the local model after the iteration of the local model is completed.
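Claim 8 reduces to a stopping condition on the local training loop: the second weight is read out only once a preset number of iteration steps has been reached. A tiny sketch, with a hypothetical single-step update rule, follows.

```python
def train_until_preset_steps(weights, step_fn, preset_steps):
    """Illustrative reading of claim 8: count local iteration steps and only
    extract the second weight once the preset step count is reached.
    `step_fn` is any single-iteration update rule."""
    iteration_step = 0
    while iteration_step < preset_steps:
        weights = step_fn(weights)
        iteration_step += 1
    # Iteration complete: `weights` now plays the role of the second weight.
    return weights

# Example: a trivial one-parameter update rule halving the weight each step.
print(train_until_preset_steps(1.0, lambda w: w * 0.5, preset_steps=5))
```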
9. A data privacy protection apparatus, characterized in that the data privacy protection apparatus comprises: a memory, a processor, and a data privacy protection program stored in the memory and executable on the processor, wherein the data privacy protection program, when executed by the processor, implements the steps of the data privacy protection method according to any one of claims 1 to 8.
10. A computer-readable storage medium, on which a data privacy protection program is stored, wherein the data privacy protection program, when executed by a processor, implements the steps of the data privacy protection method according to any one of claims 1 to 8.
CN202010029622.2A 2020-01-10 2020-01-10 Data privacy protection method and device and computer readable storage medium Active CN111241582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010029622.2A CN111241582B (en) 2020-01-10 2020-01-10 Data privacy protection method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010029622.2A CN111241582B (en) 2020-01-10 2020-01-10 Data privacy protection method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111241582A true CN111241582A (en) 2020-06-05
CN111241582B CN111241582B (en) 2022-06-10

Family

ID=70880828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010029622.2A Active CN111241582B (en) 2020-01-10 2020-01-10 Data privacy protection method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111241582B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492428A (en) * 2018-10-29 2019-03-19 南京邮电大学 A kind of difference method for secret protection towards principal component analysis
CN109684855A (en) * 2018-12-17 2019-04-26 电子科技大学 A kind of combined depth learning training method based on secret protection technology
CN109871702A (en) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 Federal model training method, system, equipment and computer readable storage medium
CN109902506A (en) * 2019-01-08 2019-06-18 中国科学院软件研究所 A kind of local difference private data sharing method and system of more privacy budgets
CN110084380A (en) * 2019-05-10 2019-08-02 深圳市网心科技有限公司 A kind of repetitive exercise method, equipment, system and medium
CN110443063A (en) * 2019-06-26 2019-11-12 电子科技大学 The method of the federal deep learning of self adaptive protection privacy

Also Published As

Publication number Publication date
CN111241582B (en) 2022-06-10

Similar Documents

Publication Publication Date Title
WO2021047593A1 (en) Method for training recommendation model, and method and apparatus for predicting selection probability
US20230281448A1 (en) Method and apparatus for information recommendation, electronic device, computer readable storage medium and computer program product
JP7157154B2 (en) Neural Architecture Search Using Performance Prediction Neural Networks
US10474950B2 (en) Training and operation of computational models
JP7439151B2 (en) neural architecture search
CN110520871A (en) Training machine learning model
WO2019018375A1 (en) Neural architecture search for convolutional neural networks
CN109690576A (en) The training machine learning model in multiple machine learning tasks
CN111602148A (en) Regularized neural network architecture search
US11922281B2 (en) Training machine learning models using teacher annealing
CN109918684A (en) Model training method, interpretation method, relevant apparatus, equipment and storage medium
WO2021151336A1 (en) Road image target detection method based on attentional mechanism and related device
CN111667308A (en) Advertisement recommendation prediction system and method
EP4187440A1 (en) Classification model training method, hyper-parameter searching method, and device
WO2021174877A1 (en) Processing method for smart decision-based target detection model, and related device
CN106803092B (en) Method and device for determining standard problem data
CN114417174B (en) Content recommendation method, device, equipment and computer storage medium
CN110580171B (en) APP classification method, related device and product
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN116258657A (en) Model training method, image processing device, medium and electronic equipment
CN111178082A (en) Sentence vector generation method and device and electronic equipment
CN112446462A (en) Generation method and device of target neural network model
US20190324606A1 (en) Online training of segmentation model via interactions with interactive computing environment
CN114281976A (en) Model training method and device, electronic equipment and storage medium
CN111241582B (en) Data privacy protection method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant