CN114780999B - Deep learning data privacy protection method, system, equipment and medium - Google Patents
- Publication number
- CN114780999B (application CN202210700710.XA)
- Authority
- CN
- China
- Prior art keywords
- noise
- privacy
- data set
- training data
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/40—Filling a planar surface by adding surface attributes, e.g. colour or texture
-
- G06T3/04—
Abstract
The present disclosure relates to a deep learning data privacy protection method, system, device and medium. The method comprises the following steps: loading an original training data set and a deep learning model; assigning privacy protection weights to the privacy information in the original training data set and constructing a privacy importance matrix; configuring a global noise strength and generator parameters for the training data to construct a noise generator; training the noise generator according to a loss function built from the privacy importance matrix; adding noise to all original training data in the original training data set through the noise generator to generate a noisy data set; and training the deep learning model with the noisy data set to form a deep learning model with privacy-preserving characteristics. The method constructs an objective function and a parameter training method for the noise generator that maximize the noise strength added to the training data while minimizing the difference in model performance, automatically balancing model usability against privacy protection strength.
Description
Technical Field
The disclosure relates to the field of deep learning models, and in particular relates to a deep learning data privacy protection method, system, device and medium.
Background
In recent years, driven by theoretical innovation in deep learning and in-depth research across many fields, numerous commercial applications based on deep learning technology have spread through various industries and created immeasurable value. To accelerate deep learning research and application, many enterprises and research institutions publish the deep learning models they have built. Building a deep learning model relies on a large amount of training data, which may involve personal privacy and business secrets. Whether a disclosed deep learning model leaks its training data has therefore drawn attention. Recent research has demonstrated that a disclosed deep learning model risks revealing training data in some scenarios. For example, in a model inversion attack, representative training data for a designated label can be reconstructed by introducing a generative network and regularization terms into the loss function; such data can reveal the distribution of the training data to a certain extent, and on face recognition models even private personal photos related to the training data have been recovered directly. As another example, in a model update attack, a disclosed deep learning model may need continual updates as the training data distribution shifts or expands, and such parameter updates can be reversed to recover information about the new data used to update the model. Beyond exposing personal privacy and business secrets, the data leakage problem of deep learning models may also raise legal issues. An effective defense mechanism is therefore needed to alleviate the data leakage problem described above.
To address data leakage in deep learning models, several mainstream defenses exist, such as differentially private machine learning and training data noising. In differentially private machine learning, the mainstream approach is to limit the variability of model output by adding noise to the parameters, thereby preventing differential attacks. Against model inversion and model update attacks, however, these measures face a difficult trade-off between model usability and the strength of data privacy protection: protecting the data as much as possible may degrade model performance (e.g., accuracy) by an unacceptable amount. In addition, differential privacy suffers from an understandability problem: once it is introduced, it is hard to show visually which training data information is protected, for example whether the face or the background of a picture is protected more, even though different parts of image data (or other data) carry different privacy concerns; providing such understandability would help data providers control the risk of privacy disclosure. As for training data noising, for example Chinese patent CN113889232A, 2022-01-04 ("privacy protection method based on medical image [P]", Guorihong, Saifengun, Songho, Tengten, and WangCarichi) extracts information from the personal privacy areas of an original training image, transforms the pixel values of the extracted areas using the complex sequence generated by iterating a Logistic chaotic system, embeds the transformation results of all text areas back into the original image, and scrambles the encrypted image, thereby protecting the personal privacy areas in encrypted form.
The problem with this method is that the noise-transformed training data can affect the construction of the deep learning model in uncontrollable ways; for example, masking too much useful information reduces model usability.
The present disclosure addresses the above problems and the drawbacks of the corresponding solutions. First, because the training data is noised in a personalized manner, the proposed defense scheme satisfies the understandability requirement: the converted data can be visualized directly, which facilitates controlling and inspecting the privacy protection condition. Second, differentiated privacy protection strength can be achieved by introducing a privacy importance matrix that designates key protection areas in the training data. The noise generator of the proposed scheme is built with deep learning technology, and its loss function takes model performance into account, so that the noise strength added to the training data is maximized while the difference in model performance is minimized.
Disclosure of Invention
The present disclosure provides a deep learning data privacy protection method, system, device, and medium, which can solve the problems mentioned in the background art. In order to solve the technical problem, the present disclosure provides the following technical solutions:
as an aspect of the embodiments of the present disclosure, a method for protecting deep learning data privacy is provided, which includes the following steps:
loading an original training data set and a deep learning model;
giving privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
configuring global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
all original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
the deep learning model is trained using a noisy data set to form a deep learning model with privacy preserving characteristics.
Optionally, the specific step of training the noise generator according to the loss function constructed by the privacy importance matrix is as follows:
selecting a group of original training data characteristics in the original training data set;
calculating a loss value of the noise generator according to the loss function;
calculating a derivative of the loss value to a parameter of a noise generator;
updating parameters of a noise generator according to the derivative;
and repeating the steps until the specified iteration times are reached.
Optionally, the loss function is as follows:
L(θ) = ‖F(x + ε·(M ⊙ G_θ(x))) − F(x)‖² − ‖M ⊙ G_θ(x)‖²
wherein F is the deep learning model trained on the original training data set, x is a selected training data feature, F(x) denotes the model output for input x after the softmax function, G_θ is the noise generator with parameters θ, G_θ(x) represents the data noise generated from the input x, ε is the global noise strength, M is the privacy importance matrix, and ⊙ denotes element-wise multiplication.
Optionally, before training the deep learning model by using the noisy data set, the method further includes the following steps: the deep learning model is trained using the raw training data.
Optionally, after all the original training data in the original training data set is subjected to noise addition by the noise generator to generate a noise-added data set, the method further includes the following steps:
and visualizing the noise data in the noise data set, and adjusting the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noise data.
Optionally, the specific steps of constructing the privacy importance matrix are as follows:
constructing the privacy importance matrix by automatically assigning higher weights to features or attributes of the original training data that are designated for global key privacy protection;
or manually endowing privacy protection weights to partial key areas in the original training data to construct a corresponding privacy importance matrix.
As another aspect of the disclosed embodiments, there is provided a deep learning data privacy protection system, including:
the resource loading module loads an original training data set and a deep learning model;
the privacy importance configuration module is used for endowing privacy protection weight to the privacy information in the original training data set and constructing a privacy importance matrix;
a noise generator construction module that configures global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
the data conversion module is used for carrying out noise addition on all original training data in the original training data set through the noise generator to generate a noise addition data set;
and the model building module is used for training the deep learning model by using the noise-added data set so as to form the deep learning model with the privacy protection characteristic.
Optionally, the system further includes a data visualization module, where the data visualization module is configured to visualize the noisy data in the noisy data set, and adjust the privacy importance matrix and/or the parameter of the noise generator according to the privacy protection condition of the noisy data.
As another aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the deep learning data privacy protection method when executing the computer program.
As another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the deep learning data privacy protection method described above.
According to the method, the privacy importance matrix and the noise strength can be adjusted to give different information in the training data set a personalized privacy protection strength, realizing personalized noising of the original data set. Because the noise added to the original data is personalized, the noisy data visualizes well, and the user can inspect the privacy protection condition and make adjustments according to the visualization result. The present disclosure constructs an objective function and a parameter training method for the noise generator in which the model performance difference is minimized while the noise strength added to the training data is maximized, automatically balancing model usability against privacy protection strength by exploiting the strong expressive power of deep learning. The specific technical effects are as follows:
1) balancing model usability and privacy protection strengths
Existing defense mechanisms mostly adjust the noise strength manually, which makes it difficult to also take model performance into account. The noise generator construction method of the present disclosure achieves a balance between model usability and privacy protection strength.
2) Satisfying differentiated privacy protection policies for data
A privacy importance matrix is provided: the user can assign higher privacy weights to the key privacy protection features or attributes of the data, and these weights are reflected in the matrix, influencing the noise produced by the noise generator so as to give the specified features or attributes a differentiated privacy protection strength.
3) The requirement of monitoring the actual privacy protection condition of the data is met
The noise generator generates a corresponding noise matrix for each input datum, and the noise is added to the input to form the noisy datum. The noisy data can therefore be compared directly with the input data, allowing the user to check how the input data was changed and to judge whether the requirements are met so as to make new adjustments.
Drawings
Fig. 1 is a flowchart of a deep learning data privacy protection method according to embodiment 1 of the present disclosure;
fig. 2 is a flowchart of a specific implementation of step S30 according to embodiment 1 of the present disclosure;
fig. 3 is a schematic block diagram of a deep learning data privacy protection system.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from their principles and logic; owing to space limitations, the details are not repeated in this disclosure.
In addition, the present disclosure also provides a deep learning data privacy protection system, an electronic device, a computer-readable storage medium, and a program, which can be used to implement any one of the deep learning data privacy protection methods provided by the present disclosure, and the corresponding technical solutions and descriptions and corresponding descriptions in the methods section are not repeated.
The execution subject of the deep learning data privacy protection method may be a computer or other apparatuses capable of implementing deep learning data privacy protection, for example, the method may be executed by a terminal device or a server or other processing device, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the deep-learning data privacy preserving method may be implemented by a processor calling computer readable instructions stored in a memory.
Example 1
As an aspect of the embodiments of the present disclosure, there is provided a deep learning data privacy protection method, as shown in fig. 1, including the following steps:
s10, loading an original training data set and a deep learning model;
s20, giving privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
s30, configuring global noise intensity and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
s40, all the original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
s60, training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
Based on this configuration, the embodiment of the disclosure gives different information in the training data set a personalized privacy protection strength by adjusting the privacy importance matrix and the noise strength, realizes personalized noising of the original data set, and can balance model usability against privacy protection strength. A privacy importance matrix is provided: the user can assign higher privacy weights to the key privacy protection features or attributes of the data, and these weights are reflected in the matrix, influencing the noise produced by the noise generator so as to give the specified features or attributes a differentiated privacy protection strength.
The steps of the disclosed embodiments are described in detail below.
S10, loading an original training data set and a deep learning model;
the loading is the loading of system resources, namely, the required resources for constructing the deep learning model are loaded into the system, and the required resources comprise an original training data set, a structure of the deep learning model and a structure of the noise generator. And the preprocessing of the original training data set and the initialization of the parameters of the model and the generator are completed.
S20, giving privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
wherein, all elements of the formed privacy importance matrix are greater than or equal to 0, and the sum is 1. Two configurations can be included: the user can give higher privacy weight to key privacy protection characteristics or attributes of the data and can reflect the key privacy protection characteristics or attributes in the privacy importance matrix, so that the noise generated by the noise generator is influenced to give differentiated privacy protection strength at the specified characteristics or attributes.
In some embodiments, an auto-configuration may be employed: by specifying the features or attributes of the training data for global key privacy protection, the function is responsible for automatically labeling the relevant features or attributes with higher weights and constructing the corresponding privacy importance matrix.
In some embodiments, a manual configuration may be used: and the user manually gives higher privacy protection weight to part of key areas in the data and constructs a corresponding privacy importance matrix.
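The manual configuration above can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation: the array shape, the 10x key weight, and the NumPy representation are assumptions; it only shows a matrix satisfying the stated constraints (non-negative entries summing to 1, with a user-designated key region weighted more heavily):

```python
import numpy as np

def build_privacy_matrix(shape, key_region, key_weight=10.0):
    """Build a privacy importance matrix: every element >= 0, all elements sum to 1.

    key_region: index/slice tuple marking the area to protect more strongly
    (e.g. a face region in an image); it receives key_weight times the base
    weight before normalization.
    """
    m = np.ones(shape)
    m[key_region] *= key_weight   # emphasize the key privacy region
    return m / m.sum()            # normalize so all entries sum to 1

# Example: an 8x8 "image" whose top-left 4x4 block is the key privacy area
M = build_privacy_matrix((8, 8), (slice(0, 4), slice(0, 4)))
```

The key-region entries end up 10 times larger than the others, so the generator's noise is weighted toward that region.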
S30, configuring global noise intensity and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
in this embodiment, constructing a resource configuration step to be performed on the noise generator by the noise generator, for example, a configuration option responsible for providing a hyper-parameter required by training the generator is provided, where the configuration may include: global noise strengthThe iteration number T, the parameter update rate lambda and the like.
In this embodiment, the method further includes a training step of the noise generator:
the goal of the generator training is to maximize the noise intensity attached to the training data as much as possible while minimizing the model performance variance. Where the model performance difference refers to the difference between the performance (e.g., accuracy) of the model trained with the original training data and the performance of the model trained with the noisy training data. The objective function (also called loss function) of the constructed generator can be shown in formula (1).
In formula (1), F is a deep learning model trained by an original training data set, x is the characteristics of a selected training data set, F (x) refers to the result of inputting x to obtain model output and passing through a softmax function, and G is a parameterThe noise generator is composed of a noise generator and a noise filter,representing the data noise generated from the input x,in order to be the global noise level,is a privacy importance matrix. First itemBy reducing noisy dataThe difference in model output from the original data x to minimize the difference in model performance, the second termWith the aim of maximising the noise strength。
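A minimal NumPy sketch of this loss, using toy stand-ins: the linear "model" F, the tanh "generator" G, and the uniform importance vector M below are illustrative assumptions, not the patent's concrete networks:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def generator_loss(F, G, x, eps, M):
    """|| softmax(F(x + eps*(M*G(x)))) - softmax(F(x)) ||^2 - || M*G(x) ||^2

    First term: keep the model output on the noisy sample close to the output
    on the original sample.  Second term: reward stronger importance-weighted
    noise (subtracted, so minimizing the loss maximizes the noise strength).
    """
    noise = M * G(x)                   # element-wise importance weighting
    d = softmax(F(x + eps * noise)) - softmax(F(x))
    return float(np.sum(d ** 2) - np.sum(noise ** 2))

# Toy stand-ins
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
F = lambda v: A @ v                    # fixed pre-trained "model"
G = lambda v: np.tanh(v)               # placeholder noise generator
x = rng.normal(size=4)
M = np.full(4, 0.25)                   # uniform privacy importance, sums to 1
loss_value = generator_loss(F, G, x, 0.5, M)
```

With a generator that outputs no noise, both terms vanish and the loss is exactly zero, which is a useful sanity check on an implementation.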
In some embodiments, as shown in fig. 2, the specific steps of training the noise generator according to the loss function constructed by the privacy importance matrix in step S30 are as follows:
s301: randomly selecting a group of original training data features x;
s305: and repeating the steps S301, S302, S303 and S304 until the specified iteration number T is met, thereby completing the construction of the noise generator.
S40, all original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
wherein every original training datum x is processed by the noise generator to generate a corresponding noise matrix, which is added to x to form the corresponding noisy datum.
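A sketch of this data conversion step, assuming a trained generator is available (here `np.tanh` stands in for it, and the dataset, ε and M values are illustrative):

```python
import numpy as np

def make_noisy_dataset(X, generator, eps, M):
    """Run every original sample through the noise generator and add the
    importance-weighted, eps-scaled noise to it (step S40)."""
    return np.stack([x + eps * (M * generator(x)) for x in X])

# Hypothetical trained-generator stand-in and a small dataset
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))
M = np.full(4, 0.25)
X_noisy = make_noisy_dataset(X, np.tanh, eps=0.5, M=M)
```

The noisy set has the same shape as the original, so it can be fed to the same deep learning model unchanged.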
S60, training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
Wherein the deep learning model is trained using the noisy data in the noisy data set in S40 until the training is completed.
In some embodiments, before training the deep learning model using the noisy data set, the method further comprises the steps of: the deep learning model is trained using the raw training data.
In some embodiments, after all the original training data in the original training data set is subjected to noise addition by the noise generator to generate the noise added data set, the method further comprises the following steps:
s50, visualizing the noise data in the noise data set, and adjusting the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noise data. Because the original data is subjected to personalized noise addition, good visualization can be met, and a user can check the privacy protection condition and make adjustment according to a visualization result. The noise generator is mainly responsible for generating corresponding noise matrixes according to different input data, and the noise matrixes are added with the input data to form noise data, so that the noise data can be directly compared with the input data, a user can conveniently check the condition that the input data is changed, and whether the requirement is met or not is judged to make a new adjustment.
Example 2
As another aspect of the disclosed embodiments, there is provided a deep learning data privacy protection system 100, as shown in fig. 3, including:
the resource loading module 1 loads an original training data set and a deep learning model;
the privacy importance configuration module 2 is used for endowing privacy protection weights to the privacy information in the original training data set and constructing a privacy importance matrix;
a noise generator construction module 3, which configures global noise intensity and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
the data conversion module 4 is used for carrying out noise addition on all original training data in the original training data set through a noise generator to generate a noise addition data set;
and the model building module 6 is used for training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
In some embodiments, the system 100 further comprises a data visualization module 5, configured to visualize the noisy data in the noisy data set and to adjust the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noisy data. Because the noise added to the original data is personalized, the noisy data visualizes well, and the user can inspect the privacy protection condition and make adjustments according to the visualization result. The noise generator generates a corresponding noise matrix for each input datum, and the noise is added to the input to form the noisy datum; the noisy data can therefore be compared directly with the input data, allowing the user to check how the input data was changed and to judge whether the requirements are met so as to make a new adjustment.
Each module of the disclosed embodiment is described in detail below.
The resource loading module 1 loads an original training data set and a deep learning model;
the loading refers to loading system resources, i.e., loading into the system the resources required to construct the deep learning model: the original training data set, the structure of the deep learning model, and the structure of the noise generator. The module also completes preprocessing of the original training data set and initializes the parameters of the model and of the generator.
The privacy importance configuration module 2 is used for assigning privacy protection weights to the privacy information in the original training data set and constructing a privacy importance matrix;
wherein all elements of the resulting privacy importance matrix are greater than or equal to 0 and sum to 1. Two configurations can be included: the user assigns higher privacy weights to the key privacy-protection features or attributes of the data, and these weights are embodied in the privacy importance matrix, thereby influencing the noise produced by the noise generator so that differentiated privacy protection strength is applied at the specified features or attributes.
An automatic configuration approach can be used: the user specifies the features or attributes of the training data that require global key privacy protection, and the function automatically labels the relevant features or attributes with higher weights and constructs the corresponding privacy importance matrix.
In some embodiments, a manual configuration may be used: the user manually assigns higher privacy protection weights to key areas of the data and constructs the corresponding privacy importance matrix.
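Either configuration ultimately yields a matrix whose elements are non-negative and sum to 1. The NumPy sketch below illustrates this under the assumption that the key regions arrive as rectangular boxes (e.g., from a face detector); `build_importance_matrix` and the weight values are illustrative, not taken from the patent.

```python
import numpy as np

def build_importance_matrix(shape, key_regions, key_weight=10.0):
    """Build a privacy importance matrix A_x for one image.

    All elements are non-negative and sum to 1, as the method requires.
    `key_regions` is a list of (row0, row1, col0, col1) boxes (e.g. from
    a face detector) that receive `key_weight` times the base weight.
    """
    A = np.ones(shape)                      # base weight everywhere
    for r0, r1, c0, c1 in key_regions:
        A[r0:r1, c0:c1] = key_weight        # emphasize key privacy regions
    return A / A.sum()                      # normalize: elements sum to 1

# A 4x4 image whose top-left 2x2 block is a detected face region.
A_x = build_importance_matrix((4, 4), [(0, 2, 0, 2)])
```

The normalization keeps the matrix a valid weighting regardless of how many regions are marked; only the relative emphasis between regions changes.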
A noise generator construction module 3, which configures global noise intensity and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
in this embodiment, the noise generator construction module performs the resource configuration steps required for the noise generator; for example, it provides the configuration options for the hyperparameters required to train the generator, which may include: the global noise intensity a_d, the iteration count T, the parameter update rate λ, and the like.
In this embodiment, the goal of the generator training is to maximize the noise intensity added to the training data while minimizing the model performance difference. Here, the model performance difference refers to the difference between the performance (e.g., accuracy) of the model trained with the original training data and the performance of the model trained with the noisy training data. The constructed generator objective function (also called the loss function) can be written as formula (1):

L(θ) = ‖F(x + a_d·A_x⊙G(x, θ)) − F(x)‖ − ‖a_d·A_x⊙G(x, θ)‖    (1)

In formula (1), F is the deep learning model trained on the original training data set; x is a selected training data feature; F(x) refers to the result of inputting x to obtain the model output and passing it through a softmax function; G is the noise generator composed of the parameter θ; G(x, θ) represents the data noise generated from the input x; a_d is the global noise intensity; and A_x is the privacy importance matrix. The first term, ‖F(x + a_d·A_x⊙G(x, θ)) − F(x)‖, minimizes the model performance difference by reducing the difference between the model output on the noisy data x + a_d·A_x⊙G(x, θ) and the output on the original data x; the second term, −‖a_d·A_x⊙G(x, θ)‖, aims to maximize the noise strength a_d·A_x⊙G(x, θ).
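The objective of formula (1) can be illustrated numerically. The sketch below assumes the noise applied to x is the importance-weighted quantity a_d·A_x⊙G(x, θ); the linear softmax "model" and tanh "generator" are toy stand-ins for illustration only, not the networks of the patent.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def generator_loss(F, G, theta, x, a_d, A_x):
    """Loss of formula (1): keep the model output close to that on the
    original data (term 1) while pushing the importance-weighted noise
    strength up (term 2, subtracted).

    F(x)        -- softmax output of the model trained on original data
    G(x, theta) -- noise produced by the generator for input x
    a_d         -- global noise intensity; A_x -- privacy importance matrix
    """
    noise = a_d * A_x * G(x, theta)                  # personalized noise
    fidelity = np.linalg.norm(F(x + noise) - F(x))   # term 1: output change
    strength = np.linalg.norm(noise)                 # term 2: noise strength
    return fidelity - strength                       # minimize 1, maximize 2

# Toy stand-ins: a linear softmax "model" and a bounded tanh "generator".
W = np.array([[1.0, -1.0], [0.5, 0.5]])
F = lambda x: softmax(W @ x)
G = lambda x, theta: np.tanh(theta * x)
loss = generator_loss(F, G, theta=0.3, x=np.array([1.0, 2.0]),
                      a_d=0.1, A_x=np.array([0.7, 0.3]))
```

With small noise the fidelity term is tiny, so the loss is dominated by the (negative) strength term; training drives the generator toward the strongest noise the model output can tolerate.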
In some embodiments, the specific steps of training the noise generator according to the loss function constructed by the privacy importance matrix in the noise generator construction module 3 are:
s301: randomly selecting a group of original training data features x;
s302: calculating the loss value of the noise generator according to the loss function;
s303: calculating the derivative of the loss value with respect to the parameters of the noise generator;
s304: updating the parameters of the noise generator according to the derivative;
s305: repeating steps S301, S302, S303, and S304 until the specified number of iterations T is reached, thereby completing the construction of the noise generator.
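Steps S301 through S305 amount to a gradient-descent loop over the generator parameter. The sketch below uses a central-difference derivative so it stays dependency-free (a real implementation would backpropagate through the generator); the toy quadratic loss and all numbers are illustrative assumptions.

```python
import numpy as np

def train_generator(loss_fn, theta0, data, T=50, lam=0.05, eps=1e-5):
    """Steps S301-S305: repeatedly sample data, compute the loss, take
    the derivative w.r.t. the generator parameter, and update it,
    for T iterations (lam is the parameter update rate)."""
    rng = np.random.default_rng(0)
    theta = theta0
    for _ in range(T):                               # S305: T iterations
        x = data[rng.integers(len(data))]            # S301: sample features
        # S302 (loss value) happens inside loss_fn; S303: derivative by
        # central difference; S304: gradient-descent parameter update.
        grad = (loss_fn(theta + eps, x) - loss_fn(theta - eps, x)) / (2 * eps)
        theta = theta - lam * grad
    return theta

# Toy loss whose minimum over both samples sits at theta = 2.0.
data = [np.array([1.0, 3.0]), np.array([2.0, 2.0])]
loss = lambda th, x: float(((th - x) ** 2).sum())
theta = train_generator(loss, theta0=0.0, data=data)
```

Each update contracts the error toward the minimizer, so after the T iterations the parameter has essentially converged; in the patented method the loss would be formula (1) evaluated on the sampled features.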
The data conversion module 4 adds noise to all original training data in the original training data set through a noise generator to generate a noise-added data set;
wherein all the original training data x are processed by the noise generator to generate the corresponding noise matrix G(x, θ) and to form the corresponding noisy data x + G(x, θ).
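A sketch of this conversion over a whole data set, again under the assumption that the noise added to each sample is the importance-weighted a_d·A_x⊙G(x, θ); the stand-in generator and all values are hypothetical.

```python
import numpy as np

def make_noisy_dataset(dataset, G, theta, a_d, A_x):
    """Pass every original sample x through the trained generator and
    form the noisy sample x + a_d * A_x * G(x, theta)."""
    return [x + a_d * A_x * G(x, theta) for x in dataset]

G = lambda x, theta: np.tanh(theta * x)      # stand-in generator
A_x = np.array([0.8, 0.2])                   # stronger noise on feature 0
dataset = [np.array([1.0, 1.0]), np.array([-2.0, 0.5])]
noisy = make_noisy_dataset(dataset, G, theta=1.0, a_d=0.5, A_x=A_x)
```

Because the importance matrix scales the noise per feature, the high-weight feature is perturbed more than the low-weight one, which is the "personalized noise" the description refers to.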
And the model building module 6 is used for training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
The deep learning model is trained using the noisy data in the noisy data set produced by the data conversion module 4 until training is complete.
In the following, as an illustration, the disclosed embodiment applies the above method to the construction of a deep learning animal classification model. In this exemplary embodiment, the goal is to construct a deep learning animal classification model with a privacy protection effect: the model is expected to be released to the public, but the training data set is not expected to leak, because the data set used for training contains some private information, such as the faces of animal breeders or audience members in the animal images. The method provided by the invention is therefore applied to construct the animal classification model. The specific application process is as follows:
1) a developer loads, through the resource loading module 1, the animal image data set used to train the classification model, the structure of the deep learning animal classification model, and the structure of the noise generator; the data set is preprocessed and the parameters of the model and of the generator are initialized;
2) training the classification model by using an original animal image data set to obtain a deep learning model trained by using the original data set;
3) a developer endows privacy information such as human faces in the images with higher privacy protection weight through the privacy importance configuration module 2, automatically positions all human face information in a data set by using a human face target identification model, and constructs a corresponding privacy importance matrix;
4) operating a noise generator building module 3 to complete the construction of the noise generator;
5) generating a corresponding noise-added data set by an original animal image data set through a data conversion module 4, wherein private information such as human faces of the noise-added data set has noise with higher intensity;
6) whether privacy information such as human faces is effectively masked in the noisy images is checked through the data visualization module 5; if the effect is not good, the noise intensity of the noise generator is raised appropriately and a new noise generator is constructed;
7) the model building function in the model building module 6 is run on the noisy data set: the deep learning animal classification model is trained with the noisy data set, finally completing the construction of an animal classification model with the privacy protection characteristic.
Example 3
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the deep learning data privacy protection method of embodiment 1 when executing the computer program.
The electronic device may be embodied in the form of a general purpose computing device, which may be, for example, a server device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including the memory and the processor).
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include read-only memory (ROM).
The memory may also include program means having a set of (at least one) program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor executes various functional applications and data processing by executing computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Example 4
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the deep-learning data privacy protection method of embodiment 1.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present disclosure may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps of implementing the deep learning data privacy protection method described in embodiment 1 when the program product is run on the terminal device.
The program code for carrying out the disclosure may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
Although embodiments of the present disclosure have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. A deep learning data privacy protection method is characterized by comprising the following steps:
loading an original training data set and a deep learning model;
giving privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
configuring global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
all original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
training the deep learning model by using a noise-added data set to form a deep learning model with privacy protection characteristics; the specific steps of training the noise generator according to the loss function constructed by the privacy importance matrix are as follows:
selecting a group of original training data characteristics in the original training data set;
calculating a loss value of the noise generator according to the loss function;
calculating a derivative of the loss value to a parameter of a noise generator;
updating parameters of a noise generator according to the derivative;
and repeating the steps until the specified iteration times are reached.
2. The deep-learning data privacy preserving method of claim 1, wherein the loss function is as follows:

L(θ) = ‖F(x + a_d·A_x⊙G(x, θ)) − F(x)‖ − ‖a_d·A_x⊙G(x, θ)‖

wherein F is a deep learning model trained with the original training data set, x is a feature of the selected training data set, F(x) refers to the result of inputting x to obtain the model output and passing it through a softmax function, G is a noise generator composed of the parameter θ, G(x, θ) represents the data noise generated according to the input x, a_d is the global noise intensity, and A_x is the privacy importance matrix.
3. The deep-learning data privacy preserving method of claim 1, further comprising, prior to training the deep-learning model using noisy data sets, the steps of: the deep learning model is trained using the raw training data.
4. The deep-learning data privacy protection method of claim 1 or 3, wherein after all original training data in the original training data set are subjected to noise addition by the noise generator to generate a noise added data set, the method further comprises the following steps:
and visualizing the noise data in the noise data set, and adjusting the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noise data.
5. The deep learning data privacy protection method as claimed in claim 1 or 3, wherein the specific steps of constructing the privacy importance matrix are as follows:
constructing a privacy importance matrix by marking weights in the features or attributes of the original training data with global key privacy protection;
or manually endowing privacy protection weights to partial key areas in the original training data to construct a corresponding privacy importance matrix.
6. A deep learning data privacy protection system, comprising:
the resource loading module loads an original training data set and a deep learning model;
the privacy importance configuration module is used for endowing privacy protection weight to the privacy information in the original training data set and constructing a privacy importance matrix;
a noise generator construction module that configures global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to the loss function constructed by the privacy importance matrix specifically includes: selecting a group of original training data characteristics in the original training data set; calculating a loss value of the noise generator according to the loss function; calculating a derivative of the loss value to a parameter of a noise generator; updating parameters of a noise generator according to the derivative until a specified number of iterations is reached;
the data conversion module is used for carrying out noise addition on all original training data in the original training data set through the noise generator to generate a noise addition data set;
and the model building module is used for training the deep learning model by using the noise-added data set so as to form the deep learning model with the privacy protection characteristic.
7. The deep-learning data privacy preserving system of claim 6, further comprising a data visualization module configured to visualize noisy data in the noisy data set, and to adjust the privacy importance matrix and/or the parameters of the noise generator according to the privacy preserving condition of the noisy data.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of deep learning data privacy protection of any one of claims 1 to 5 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for deep-learning data privacy protection of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210700710.XA CN114780999B (en) | 2022-06-21 | 2022-06-21 | Deep learning data privacy protection method, system, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114780999A CN114780999A (en) | 2022-07-22 |
CN114780999B true CN114780999B (en) | 2022-09-27 |
Family
ID=82420315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210700710.XA Active CN114780999B (en) | 2022-06-21 | 2022-06-21 | Deep learning data privacy protection method, system, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114780999B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115238827B (en) * | 2022-09-16 | 2022-11-25 | 支付宝(杭州)信息技术有限公司 | Privacy-protecting sample detection system training method and device |
CN116761164B (en) * | 2023-08-11 | 2023-11-14 | 北京科技大学 | Privacy data transmission method and system based on matrix completion |
CN117056979B (en) * | 2023-10-11 | 2024-03-29 | 杭州金智塔科技有限公司 | Service processing model updating method and device based on user privacy data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021204272A1 (en) * | 2020-04-10 | 2021-10-14 | 支付宝(杭州)信息技术有限公司 | Privacy protection-based target service model determination |
CN113642715A (en) * | 2021-08-31 | 2021-11-12 | 西安理工大学 | Differential privacy protection deep learning algorithm for self-adaptive distribution of dynamic privacy budget |
CN114548373A (en) * | 2022-02-17 | 2022-05-27 | 河北师范大学 | Differential privacy deep learning method based on feature region segmentation |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11755743B2 (en) * | 2019-09-03 | 2023-09-12 | Microsoft Technology Licensing, Llc | Protecting machine learning models from privacy attacks |
US11568061B2 (en) * | 2020-04-29 | 2023-01-31 | Robert Bosch Gmbh | Private model utility by minimizing expected loss under noise |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||