CN114780999A - Deep learning data privacy protection method, system, equipment and medium - Google Patents

Deep learning data privacy protection method, system, equipment and medium

Info

Publication number
CN114780999A
CN114780999A (application number CN202210700710.XA)
Authority
CN
China
Prior art keywords
noise
privacy
data
data set
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210700710.XA
Other languages
Chinese (zh)
Other versions
CN114780999B (en)
Inventor
郑飞州
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhongping Intelligent Technology Co ltd
Original Assignee
Guangzhou Zhongping Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhongping Intelligent Technology Co ltd filed Critical Guangzhou Zhongping Intelligent Technology Co ltd
Priority to CN202210700710.XA priority Critical patent/CN114780999B/en
Publication of CN114780999A publication Critical patent/CN114780999A/en
Application granted granted Critical
Publication of CN114780999B publication Critical patent/CN114780999B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/40Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G06T3/04

Abstract

The present disclosure relates to a method, system, device and medium for deep learning data privacy protection, the method comprising the steps of: loading an original training data set and a deep learning model; giving privacy protection weights to the privacy information in the original training data set and constructing a privacy importance matrix; configuring the global noise strength and generator parameters for the training data to construct a noise generator; training the noise generator according to a loss function constructed from the privacy importance matrix; adding noise to all original training data in the original training data set through the noise generator to generate a noise-added data set; and training the deep learning model with the noise-added data set to form a deep learning model with privacy protection characteristics. The method constructs an objective function and a parameter training method for the noise generator, minimizes the model performance difference while maximizing the noise intensity added to the training data, and automatically balances model usability against privacy protection strength.

Description

Deep learning data privacy protection method, system, equipment and medium
Technical Field
The present disclosure relates to the field of deep learning models, and in particular, to a method, system, device, and medium for protecting deep learning data privacy.
Background
In recent years, driven by theoretical advances in deep learning and intensive research across many fields, commercial applications based on deep learning technology have reached numerous industries and created immense value. To accelerate deep learning research and application, many enterprises and research institutions publish the deep learning models they have built. Constructing a deep learning model relies on large amounts of training data, which may involve personal privacy and business secrets. Whether a published deep learning model can leak its training data has therefore attracted attention. Recent research has demonstrated that published deep learning models risk revealing training data in some scenarios. For example, a model inversion attack can use a generative network and regularization terms in the loss function to construct representative training data for a specified label, which can reveal the distribution of the training data to a certain extent, and on a face recognition model can even directly recover private photos related to the training data. As another example, in a model update attack, a published deep learning model may need to be updated continually as the distribution of the training data set shifts or expands, and such parameter updates can be inverted to recover information about the new data set used to update the model. Data leakage from deep learning models not only exposes personal privacy and business secrets but may also raise legal problems. An effective defense mechanism is therefore needed to mitigate the data leakage problems described above.
To address data leakage in deep learning models, several mainstream defensive measures currently exist, such as differential-privacy machine learning and adding noise to training data. In differential-privacy machine learning, the mainstream approach limits the variability of model outputs by adding noise to the parameters, thereby preventing differential attacks. Against model inversion attacks and model update attacks, however, these protective measures face a difficult trade-off between model availability and the strength of data privacy protection: protecting data as strongly as possible may degrade model performance (e.g., accuracy) to an unacceptable degree. In addition, they lack intelligibility: once differential privacy is introduced, it is difficult to show visually which training data information is protected, for example whether the face or the background of a picture receives more privacy protection, even though privacy concerns differ across different parts of image data or other data; providing such intelligibility would help data providers control the immediate risk of privacy disclosure. As for adding noise to training data, Chinese patent CN113889232A, 2022-01-04 ("Privacy protection method based on medical image [P]", Guorihong, Saifengun, Songho, Tengten, and WangCarichi), for example, extracts information from the personal privacy region of the original training image, transforms the pixel values of the extracted region using the complex sequence generated by iterating a Logistic chaotic system, embeds the transformation results of all text regions back into the original image, and scrambles the encrypted image as a whole, so that the personal privacy region is protected in encrypted form. The problem with this approach is that the noise-transformed training data can have an uncontrollable influence on the construction of the deep learning model, for example reducing model usability by masking too much useful information.
The present disclosure addresses the above problems and the drawbacks of the corresponding solutions. The defense scheme provided by the disclosure adds personalized noise to the training data, so the intelligibility requirement can be met: the converted data can be visualized directly, making it convenient to control and examine the privacy protection condition. Furthermore, differentiated privacy protection strength can be realized by introducing a privacy importance matrix that specifies key protection areas in the training data. The noise generator in the defense scheme is built with deep learning technology, and its loss function takes model performance into account, so that the noise intensity added to the training data can be maximized while the model performance difference is minimized.
Disclosure of Invention
The present disclosure provides a deep learning data privacy protection method, system, device, and medium, which can solve the problems mentioned in the background art. In order to solve the technical problem, the present disclosure provides the following technical solutions:
as an aspect of the embodiments of the present disclosure, a method for protecting deep learning data privacy is provided, which includes the following steps:
loading an original training data set and a deep learning model;
giving privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
configuring global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
all original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
training the deep learning model using a noisy data set to form a deep learning model with privacy preserving features.
Optionally, the specific step of training the noise generator according to the loss function constructed by the privacy importance matrix is as follows:
selecting a group of original training data characteristics in the original training data set;
calculating a loss value of the noise generator according to the loss function;
calculating a derivative of the loss value to a parameter of a noise generator;
updating parameters of a noise generator according to the derivative;
and repeating the steps until the specified iteration times are reached.
Optionally, the loss function is as follows:
L(θ) = ||F(x + ε·M⊙G_θ(x)) − F(x)||² − ||ε·M⊙G_θ(x)||²
wherein F is the deep learning model trained with the original training data set, x is a selected training data feature, F(x) is the model output for input x after the softmax function, G is the noise generator with parameters θ, G_θ(x) represents the data noise generated from the input x, ε is the global noise strength, and M is the privacy importance matrix.
Optionally, before training the deep learning model by using the noisy data set, the method further includes the following steps: the deep learning model is trained using the raw training data.
Optionally, after all the original training data in the original training data set is subjected to noise addition by the noise generator to generate a noise-added data set, the method further includes the following steps:
and visualizing the noise data in the noise data set, and adjusting the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noise data.
Optionally, the specific steps of constructing the privacy importance matrix are as follows:
constructing a privacy importance matrix by marking weights in the features or attributes of the original training data with global key privacy protection;
or, giving privacy protection weights to part of key areas in the original training data manually to construct a corresponding privacy importance matrix.
As another aspect of the disclosed embodiments, there is provided a deep learning data privacy protection system, including:
the resource loading module loads an original training data set and a deep learning model;
the privacy importance configuration module is used for endowing privacy protection weight to the privacy information in the original training data set and constructing a privacy importance matrix;
a noise generator construction module that configures global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
the data conversion module is used for carrying out noise addition on all original training data in the original training data set through the noise generator to generate a noise addition data set;
and the model building module is used for training the deep learning model by using the noise-added data set so as to form the deep learning model with the privacy protection characteristic.
Optionally, the system further includes a data visualization module, where the data visualization module is configured to visualize the noisy data in the noisy data set, and adjust the privacy importance matrix and/or the parameter of the noise generator according to the privacy protection condition of the noisy data.
As another aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the above-mentioned deep learning data privacy protection method.
As another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the deep learning data privacy protection method described above.
According to the method, by adjusting the privacy importance matrix and the noise intensity, personalized privacy protection strength is given to different information in the training data set, realizing a personalized noise-adding function on the original data set. At the same time, because personalized noise is added to the original data, good visualization can be achieved, and the user can check the privacy protection condition and make adjustments according to the visualization result. The present disclosure constructs an objective function and a parameter training method for the noise generator in which the model performance difference is minimized while the noise intensity added to the training data is maximized, and model usability and privacy protection strength are automatically balanced by exploiting the strong expressive power of deep learning. The specific technical effects are as follows:
1) balancing model usability and privacy protection strengths
At present, some defense mechanisms mainly adjust the noise intensity manually, which makes it difficult to take model performance properly into account. The construction method of the noise generator in this disclosure can achieve a balance between model usability and privacy protection strength.
2) Satisfying differentiated privacy protection policies for data
A privacy importance matrix is provided, with which the user can give higher privacy weight to the key privacy protection features or attributes of the data; this is reflected in the privacy importance matrix and influences the noise produced by the noise generator, providing differentiated privacy protection strength at the specified features or attributes.
3) Monitoring and managing actual privacy protection condition of data
The noise generator is mainly responsible for generating a corresponding noise matrix for each input datum, which is added to the input data to form the noise-added data. The noise-added data can therefore be compared directly with the input data, making it convenient for the user to check how the input data has been changed and to judge whether the requirement is met so as to make new adjustments.
Drawings
Fig. 1 is a flowchart of a deep learning data privacy protection method according to embodiment 1 of the present disclosure;
fig. 2 is a flowchart of a specific implementation of step S30 according to embodiment 1 of the present disclosure;
fig. 3 is a schematic block diagram of a deep learning data privacy protection system.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the underlying principles and logic; for reasons of space, the details are not repeated in this disclosure.
In addition, the present disclosure also provides a deep learning data privacy protection system, an electronic device, a computer-readable storage medium, and a program, which can be used to implement any one of the deep learning data privacy protection methods provided by the present disclosure, and the corresponding technical solutions and descriptions and corresponding descriptions in the methods section are not repeated.
The execution subject of the deep-learning data privacy protection method may be a computer or other apparatus capable of implementing deep-learning data privacy protection, for example, the method may be executed by a terminal device or a server or other processing device, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the deep-learning data privacy protection method may be implemented by a processor invoking computer readable instructions stored in a memory.
Example 1
As an aspect of the embodiments of the present disclosure, there is provided a deep learning data privacy protection method, as shown in fig. 1, including the following steps:
s10, loading an original training data set and a deep learning model;
s20, endowing privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
s30, configuring global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
s40, all original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
s60, training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
Based on this configuration, the embodiment of the disclosure gives personalized privacy protection strength to different information in the training data set by adjusting the privacy importance matrix and the noise strength, realizes a personalized noise-adding function on the original data set, and can achieve a balance between model availability and privacy protection strength. A privacy importance matrix is provided, with which the user can give higher privacy weight to the key privacy protection features or attributes of the data; this influences the noise produced by the noise generator, providing differentiated privacy protection strength at the specified features or attributes.
The steps of the disclosed embodiments are described in detail below.
S10, loading an original training data set and a deep learning model;
This step loads the system resources, i.e., it loads into the system the resources required for constructing the deep learning model, including the original training data set, the structure of the deep learning model and the structure of the noise generator. It also completes the preprocessing of the original training data set and the initialization of the parameters of the model and the generator.
S20, giving privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
All elements of the resulting privacy importance matrix are greater than or equal to 0 and sum to 1. Two configuration modes can be used: in both, the user gives higher privacy weight to the key privacy protection features or attributes of the data and reflects this in the privacy importance matrix, thereby influencing the noise produced by the noise generator to provide differentiated privacy protection strength at the specified features or attributes.
In some embodiments, automatic configuration may be employed: the user specifies the features or attributes of the training data that require global key privacy protection, and the system automatically labels the relevant features or attributes with higher weights and constructs the corresponding privacy importance matrix.
In some embodiments, manual configuration may be used: the user manually gives higher privacy protection weights to selected key areas of the data and constructs the corresponding privacy importance matrix.
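By way of a non-limiting sketch, a privacy importance matrix for image data could be assembled as follows (Python/NumPy). The box-based marking of key regions, the particular weight values and the function name are illustrative assumptions; only the constraint that all elements are non-negative and sum to 1 is taken from this disclosure.

```python
import numpy as np

def build_privacy_importance_matrix(height, width, key_regions, key_weight=9.0, base_weight=1.0):
    """Assemble a privacy importance matrix M for an image of size (height, width).

    key_regions: list of (top, left, bottom, right) boxes needing stronger privacy
    protection (e.g. detected face areas). All elements of M are >= 0 and sum to 1,
    matching the constraint stated in this embodiment.
    """
    M = np.full((height, width), base_weight, dtype=np.float64)
    for top, left, bottom, right in key_regions:
        M[top:bottom, left:right] = key_weight   # higher weight on key privacy areas
    return M / M.sum()                           # normalize so all elements sum to 1

# Example: give an (assumed) face box in a 224x224 image a higher privacy weight.
M = build_privacy_importance_matrix(224, 224, key_regions=[(40, 60, 120, 160)])
```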
S30, configuring global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
In this embodiment, constructing the noise generator includes a resource configuration step that provides the configuration options, i.e., the hyper-parameters required for training the generator, which may include: the global noise strength ε, the number of iterations T, the parameter update rate λ, and the like.
In this embodiment, the method further includes a training step of the noise generator:
The goal of generator training is to maximize the noise intensity added to the training data while minimizing the model performance difference. Here, the model performance difference refers to the difference between the performance (e.g., accuracy) of the model trained with the original training data and that of the model trained with the noise-added training data. The constructed generator objective function (also called loss function) is shown in formula (1).
L(θ) = ||F(x + ε·M⊙G_θ(x)) − F(x)||² − ||ε·M⊙G_θ(x)||²   (1)
In formula (1), F is the deep learning model trained with the original training data set, x is a selected training data feature, F(x) is the model output for input x after the softmax function, G is the noise generator with parameters θ, G_θ(x) represents the data noise generated from the input x, ε is the global noise strength, and M is the privacy importance matrix. The first term ||F(x + ε·M⊙G_θ(x)) − F(x)||² minimizes the model performance difference by reducing the difference between the model output on the noise-added data x + ε·M⊙G_θ(x) and on the original data x; the second term −||ε·M⊙G_θ(x)||² aims to maximize the noise strength ε·M⊙G_θ(x).
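A minimal PyTorch sketch of a loss of this form is given below, following the reconstructed formula (1). The squared-error measure of the output difference, the squared norm of the noise term, and the assumption that the generator maps x to noise of the same shape (with M a torch tensor broadcastable to x) are illustrative choices rather than details fixed by this disclosure.

```python
import torch
import torch.nn.functional as F_nn

def generator_loss(model, generator, x, eps, M):
    """Loss in the spirit of formula (1): keep the model output on noise-added data
    close to its output on x (first term) while maximizing the weighted noise
    strength (second term, subtracted)."""
    noise = eps * M * generator(x)               # eps * M (element-wise) * G_theta(x)
    y_clean = F_nn.softmax(model(x), dim=-1)     # F(x) after softmax
    y_noisy = F_nn.softmax(model(x + noise), dim=-1)
    output_gap = ((y_noisy - y_clean) ** 2).sum(dim=-1).mean()  # model performance difference
    noise_strength = (noise ** 2).mean()                        # strength of the added noise
    return output_gap - noise_strength
```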
In some embodiments, as shown in fig. 2, the specific steps of training the noise generator according to the loss function constructed by the privacy importance matrix in step S30 are as follows:
S301: randomly selecting a group of original training data features x;
S302: calculating the loss value L(θ) of the noise generator according to formula (1);
S303: calculating the derivative ∂L(θ)/∂θ of the loss value with respect to the noise generator parameters;
S304: updating the parameters of the noise generator: θ ← θ − λ·∂L(θ)/∂θ;
S305: repeating steps S301, S302, S303 and S304 until the specified number of iterations T is reached, thereby completing the construction of the noise generator.
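Steps S301-S305 amount to plain gradient descent on the generator parameters. A sketch under the same assumptions as the previous block (PyTorch, batches drawn from a standard data loader, and the generator_loss function defined above) might look as follows.

```python
import torch

def train_noise_generator(model, generator, data_loader, eps, M, T, lam):
    """Steps S301-S305: repeat loss evaluation and one gradient step on the generator
    parameters theta for T iterations, with parameter update rate lambda = lam."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)                  # F is fixed; only G_theta is updated
    optimizer = torch.optim.SGD(generator.parameters(), lr=lam)
    data_iter = iter(data_loader)
    for _ in range(T):
        try:
            x, _ = next(data_iter)               # S301: select a group of training features x
        except StopIteration:
            data_iter = iter(data_loader)
            x, _ = next(data_iter)
        loss = generator_loss(model, generator, x, eps, M)  # S302: loss value L(theta)
        optimizer.zero_grad()
        loss.backward()                          # S303: derivative of L(theta) w.r.t. theta
        optimizer.step()                         # S304: theta <- theta - lambda * dL/dtheta
    return generator
```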
S40, all original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
Each original training datum x is processed by the noise generator to generate the corresponding noise matrix ε·M⊙G_θ(x), forming the corresponding noise-added datum x + ε·M⊙G_θ(x).
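Under the same assumptions, the data conversion of step S40 can be sketched as follows; the data-loader format and tensor shapes are illustrative.

```python
import torch

@torch.no_grad()
def build_noisy_dataset(generator, data_loader, eps, M):
    """Step S40: pass every original training datum x through the noise generator and
    form the noise-added datum x + eps * M * G_theta(x)."""
    generator.eval()
    noisy_batches, label_batches = [], []
    for x, y in data_loader:
        noise = eps * M * generator(x)           # corresponding noise matrix
        noisy_batches.append(x + noise)          # noise-added data
        label_batches.append(y)
    return torch.cat(noisy_batches), torch.cat(label_batches)
```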
S60, training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
Wherein the deep learning model is trained using the noisy data in the noisy data set in S40 until the training is completed.
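The final training step itself is ordinary supervised training. One possible sketch, taking the tensors produced by the data-conversion sketch above and assuming a classification task with cross-entropy loss (an illustrative assumption, since the disclosure does not fix the task or the training loss), is:

```python
import torch
import torch.nn.functional as F_nn
from torch.utils.data import DataLoader, TensorDataset

def train_on_noisy_data(model, noisy_x, labels, epochs=10, lr=1e-3, batch_size=64):
    """Step S60: train the deep learning model on the noise-added data set."""
    loader = DataLoader(TensorDataset(noisy_x, labels), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            loss = F_nn.cross_entropy(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```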
In some embodiments, before training the deep learning model using the noisy dataset, the method further comprises the steps of: the deep learning model is trained using the raw training data.
In some embodiments, after all the original training data in the original training data set are subjected to noise addition by the noise generator to generate the noise added data set, the method further comprises the following steps:
S50, visualizing the noise-added data in the noise-added data set, and adjusting the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noise-added data. Because personalized noise is added to the original data, good visualization can be achieved, and the user can check the privacy protection condition and make adjustments according to the visualization result. The noise generator generates a corresponding noise matrix for each input datum, which is added to the input to form the noise-added datum, so the noise-added data can be compared directly with the input data; this makes it convenient for the user to check how the input data has been changed and to judge whether the requirement is met so as to make new adjustments.
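As one possible realization of step S50 for image data, the following matplotlib sketch places original and noise-added samples side by side; it assumes CHW image tensors with values roughly in [0, 1], which is an illustrative choice rather than a requirement of this disclosure.

```python
import matplotlib.pyplot as plt

def compare_samples(original, noisy, n=4):
    """Step S50 (illustrative): show original and noise-added images side by side so
    the privacy protection effect can be inspected visually."""
    fig, axes = plt.subplots(2, n, figsize=(3 * n, 6))
    for i in range(n):
        for row, batch, title in ((0, original, "original"), (1, noisy, "noise-added")):
            img = batch[i].detach().cpu().permute(1, 2, 0).clamp(0, 1).numpy()
            axes[row, i].imshow(img)
            axes[row, i].set_title(title)
            axes[row, i].axis("off")
    plt.tight_layout()
    plt.show()
```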
Example 2
As another aspect of the embodiments of the present disclosure, there is provided a deep-learning data privacy protection system 100, as shown in fig. 3, including:
the resource loading module 1 loads an original training data set and a deep learning model;
the privacy importance configuration module 2 is used for endowing privacy protection weights to the privacy information in the original training data set and constructing a privacy importance matrix;
a noise generator construction module 3, which configures global noise intensity and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
the data conversion module 4 is used for carrying out noise addition on all original training data in the original training data set through a noise generator to generate a noise addition data set;
and the model building module 6 is used for training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
In some embodiments, the system 100 further comprises a data visualization module 5 configured to visualize the noise-added data in the noise-added data set and adjust the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noise-added data. Because personalized noise is added to the original data, good visualization can be achieved, and the user can check the privacy protection condition and make adjustments according to the visualization result. The noise generator generates a corresponding noise matrix for each input datum, which is added to the input to form the noise-added datum, so the noise-added data can be compared directly with the input data; this makes it convenient for the user to check how the input data has been changed and to judge whether the requirement is met so as to make new adjustments.
Each module of the embodiments of the present disclosure is described in detail below.
The resource loading module 1 loads an original training data set and a deep learning model;
The loading here is the loading of system resources, i.e., the module is responsible for loading into the system the resources required for constructing the deep learning model, including the original training data set, the structure of the deep learning model and the structure of the noise generator. It also completes the preprocessing of the original training data set and initializes the parameters of the model and the generator.
The privacy importance configuration module 2 is used for endowing privacy protection weights to the privacy information in the original training data set and constructing a privacy importance matrix;
All elements of the resulting privacy importance matrix are greater than or equal to 0 and sum to 1. Two configuration modes can be used: in both, the user gives higher privacy weight to the key privacy protection features or attributes of the data and reflects this in the privacy importance matrix, thereby influencing the noise produced by the noise generator to provide differentiated privacy protection strength at the specified features or attributes.
An automatic configuration mode can be adopted: the user specifies the features or attributes of the training data that require global key privacy protection, and the module automatically labels the relevant features or attributes with higher weights and constructs the corresponding privacy importance matrix.
In some embodiments a manual configuration mode may be employed: the user manually gives higher privacy protection weights to selected key areas of the data and constructs the corresponding privacy importance matrix.
A noise generator construction module 3, which configures global noise intensity and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
In this embodiment, constructing the noise generator includes a resource configuration step that provides the configuration options, i.e., the hyper-parameters required for training the generator, which may include: the global noise strength ε, the number of iterations T, the parameter update rate λ, and the like.
In this embodiment, the goal of generator training is to maximize the noise intensity added to the training data while minimizing the model performance difference. Here, the model performance difference refers to the difference between the performance (e.g., accuracy) of the model trained with the original training data and that of the model trained with the noise-added training data. The constructed generator objective function (also called loss function) is shown in formula (1).
L(θ) = ||F(x + ε·M⊙G_θ(x)) − F(x)||² − ||ε·M⊙G_θ(x)||²   (1)
In formula (1), F is the deep learning model trained with the original training data set, x is a selected training data feature, F(x) is the model output for input x after the softmax function, G is the noise generator with parameters θ, G_θ(x) represents the data noise generated from the input x, ε is the global noise strength, and M is the privacy importance matrix. The first term ||F(x + ε·M⊙G_θ(x)) − F(x)||² minimizes the model performance difference by reducing the difference between the model output on the noise-added data x + ε·M⊙G_θ(x) and on the original data x; the second term −||ε·M⊙G_θ(x)||² aims to maximize the noise strength ε·M⊙G_θ(x).
In some embodiments, the specific steps of training the noise generator according to the loss function constructed by the privacy importance matrix in the noise generator construction module 3 are:
S301: randomly selecting a group of original training data features x;
S302: calculating the loss value L(θ) of the noise generator according to formula (1);
S303: calculating the derivative ∂L(θ)/∂θ of the loss value with respect to the noise generator parameters;
S304: updating the parameters of the noise generator: θ ← θ − λ·∂L(θ)/∂θ;
S305: repeating steps S301, S302, S303 and S304 until the specified number of iterations T is reached, thereby completing the construction of the noise generator.
The data conversion module 4 adds noise to all original training data in the original training data set through a noise generator to generate a noise-added data set;
Each original training datum x is processed by the noise generator to generate the corresponding noise matrix ε·M⊙G_θ(x), forming the corresponding noise-added datum x + ε·M⊙G_θ(x).
And the model building module 6 is used for training the deep learning model by using the noise-added data set to form the deep learning model with the privacy protection characteristic.
The deep learning model is trained using the noise-added data produced by the data conversion module 4 until training is complete.
In the following, as an illustration, the disclosed embodiment applies the above method to the construction of a deep learning animal classification model. In this exemplary embodiment, the purpose is to construct a deep learning animal classification model with a privacy protection effect: the model is intended to be released to the public, but its training data set is not intended to be leaked, because the data set used to train the classification model contains private information, such as the faces of animal breeders or spectators appearing in the animal images. Therefore, the method provided by the present disclosure is applied to construct the animal classification model. The specific application process is as follows:
1) a developer loads, through the resource loading module 1, the animal image data set used for training the classification model, the structure of the deep learning animal classification model and the structure of the noise generator, preprocesses the data set, and initializes the parameters of the model and the generator;
2) the classification model is trained with the original animal image data set to obtain a deep learning model trained on the original data set;
3) the developer gives higher privacy protection weight to privacy information such as human faces in the images through the privacy importance configuration module 2; all face information in the data set is located automatically using a face recognition model, and the corresponding privacy importance matrix is constructed;
4) the noise generator construction module 3 is run to complete the construction of the noise generator;
5) the original animal image data set is converted into a corresponding noise-added data set through the data conversion module 4, in which private information such as human faces carries higher-intensity noise;
6) the data visualization module 5 is used to check whether privacy information such as human faces is effectively masked in the noise-added images; if the effect is not good enough, the noise intensity of the noise generator is increased appropriately and a new noise generator is constructed;
7) the model building function of the model building module 6 is run: the deep learning animal classification model is trained with the noise-added data set, finally completing the construction of the animal classification model with the privacy protection characteristic.
Example 3
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the deep learning data privacy protection method of embodiment 1 when executing the computer program.
Embodiment 3 of the present disclosure is merely an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present disclosure.
The electronic device may be embodied in the form of a general purpose computing device, which may be, for example, a server device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including the memory and the processor).
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include read-only memory (ROM).
The memory may also include program means having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which or some combination thereof may comprise an implementation of a network environment.
The processor executes various functional applications and data processing by executing computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, according to embodiments of the application. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Example 4
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the deep-learning data privacy protection method of embodiment 1.
More specific examples that may be employed by the readable storage medium include, but are not limited to: a portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present disclosure may also be implemented in the form of a program product including program code for causing a terminal device to perform the steps of implementing the deep learning data privacy protection method described in embodiment 1 when the program product is run on the terminal device.
Where program code for carrying out the disclosure is written in any combination of one or more programming languages, the program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
Although embodiments of the present disclosure have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A deep learning data privacy protection method is characterized by comprising the following steps:
loading an original training data set and a deep learning model;
giving privacy protection weight to the privacy information in the original training data set, and constructing a privacy importance matrix;
configuring global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
all original training data in the original training data set are subjected to noise adding through a noise generator to generate a noise adding data set;
training the deep learning model using a noisy data set to form a deep learning model with privacy preserving features.
2. The method for protecting privacy of deep-learning data according to claim 1, wherein the specific steps of training the noise generator according to the loss function constructed by the privacy importance matrix are as follows:
selecting a group of original training data characteristics in the original training data set;
calculating a loss value of the noise generator according to the loss function;
calculating a derivative of the loss value to a parameter of a noise generator;
updating parameters of a noise generator according to the derivative;
and repeating the steps until the specified iteration times are reached.
3. The deep-learning data privacy preserving method of claim 1 or 2, wherein the loss function is as follows:
L(θ) = ||F(x + ε·M⊙G_θ(x)) − F(x)||² − ||ε·M⊙G_θ(x)||²
wherein F is a deep learning model trained with the original training data set, x is a selected training data feature, F(x) is the model output for input x after the softmax function, G is the noise generator with parameters θ, G_θ(x) represents the data noise generated from the input x, ε is the global noise strength, and M is the privacy importance matrix.
4. The method for protecting privacy of deep-learning data according to claim 1, wherein before training the deep-learning model using a noisy data set, further comprising the steps of: the deep learning model is trained using the raw training data.
5. The deep learning data privacy protection method of any one of claims 1-2 and 4, wherein after all original training data in an original training data set are subjected to noise addition by a noise generator to generate a noise added data set, the method further comprises the following steps:
and visualizing the noise data in the noise data set, and adjusting the privacy importance matrix and/or the parameters of the noise generator according to the privacy protection condition of the noise data.
6. The deep-learning data privacy protection method as claimed in any one of claims 1-2 and 4, wherein the specific steps of constructing the privacy importance matrix are as follows:
constructing a privacy importance matrix by marking weights in the features or attributes of the original training data with global key privacy protection;
or manually endowing privacy protection weights to partial key areas in the original training data to construct a corresponding privacy importance matrix.
7. A deep learning data privacy protection system, comprising:
the resource loading module loads an original training data set and a deep learning model;
the privacy importance configuration module is used for endowing privacy protection weight to the privacy information in the original training data set and constructing a privacy importance matrix;
a noise generator construction module that configures global noise strength and generator parameters in the training data to construct a noise generator; training the noise generator according to a loss function constructed by the privacy importance matrix;
the data conversion module is used for carrying out noise addition on all original training data in the original training data set through the noise generator to generate a noise addition data set;
and the model building module is used for training the deep learning model by using the noise-added data set so as to form the deep learning model with the privacy protection characteristic.
8. The deep-learning data privacy protection system of claim 7, further comprising a data visualization module configured to visualize the noisy data in the noisy data set and adjust the privacy importance matrix and/or parameters of the noise generator based on the privacy preserving condition of the noisy data.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for deep-learning data privacy protection of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for deep-learning data privacy protection of any one of claims 1 to 6.
CN202210700710.XA 2022-06-21 2022-06-21 Deep learning data privacy protection method, system, equipment and medium Active CN114780999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210700710.XA CN114780999B (en) 2022-06-21 2022-06-21 Deep learning data privacy protection method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210700710.XA CN114780999B (en) 2022-06-21 2022-06-21 Deep learning data privacy protection method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN114780999A true CN114780999A (en) 2022-07-22
CN114780999B CN114780999B (en) 2022-09-27

Family

ID=82420315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210700710.XA Active CN114780999B (en) 2022-06-21 2022-06-21 Deep learning data privacy protection method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN114780999B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210064760A1 (en) * 2019-09-03 2021-03-04 Microsoft Technology Licensing, Llc Protecting machine learning models from privacy attacks
WO2021204272A1 (en) * 2020-04-10 2021-10-14 支付宝(杭州)信息技术有限公司 Privacy protection-based target service model determination
US20210342453A1 (en) * 2020-04-29 2021-11-04 Robert Bosch Gmbh Private model utility by minimizing expected loss under noise
CN113642715A (en) * 2021-08-31 2021-11-12 西安理工大学 Differential privacy protection deep learning algorithm for self-adaptive distribution of dynamic privacy budget
CN114548373A (en) * 2022-02-17 2022-05-27 河北师范大学 Differential privacy deep learning method based on feature region segmentation

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238827A (en) * 2022-09-16 2022-10-25 支付宝(杭州)信息技术有限公司 Privacy-protecting sample detection system training method and device
CN115238827B (en) * 2022-09-16 2022-11-25 支付宝(杭州)信息技术有限公司 Privacy-protecting sample detection system training method and device
CN116761164A (en) * 2023-08-11 2023-09-15 北京科技大学 Privacy data transmission method and system based on matrix completion
CN116761164B (en) * 2023-08-11 2023-11-14 北京科技大学 Privacy data transmission method and system based on matrix completion
CN117056979A (en) * 2023-10-11 2023-11-14 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data
CN117056979B (en) * 2023-10-11 2024-03-29 杭州金智塔科技有限公司 Service processing model updating method and device based on user privacy data

Also Published As

Publication number Publication date
CN114780999B (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN114780999B (en) Deep learning data privacy protection method, system, equipment and medium
CN110058922B (en) Method and device for extracting metadata of machine learning task
CN110751291B (en) Method and device for realizing multi-party combined training neural network of security defense
US20210273978A1 (en) Cyber digital twin simulator for security controls requirements
US8122429B2 (en) Method, system and program product for developing a data model in a data mining system
CN110520871A (en) Training machine learning model
CN112381209B (en) Model compression method, system, terminal and storage medium
KR101089898B1 (en) Modeling directed scale-free object relationships
Zhou et al. Algorithms by design with illustrations to solid and structural mechanics/dynamics
CN112035834A (en) Countermeasure training method and device, and application method and device of neural network model
US10394987B2 (en) Adaptive bug-search depth for simple and deep counterexamples
US20210011757A1 (en) System for operationalizing high-level machine learning training enhancements from low-level primitives
WO2022128557A1 (en) Neural network confidentiality
CN107193667A (en) The update method and device of webpage authority
CN112927143A (en) Image splicing method and device, electronic equipment and storage medium
Colby et al. An evolutionary game theoretic analysis of difference evaluation functions
US20200351296A1 (en) System and method for evaluating and enhancing the security level of a network system
CN116049691A (en) Model conversion method, device, electronic equipment and storage medium
CN111950015B (en) Data open output method and device and computing equipment
JP2019185134A (en) Information processing device, learning method, and program
Shrivastava et al. Securator: A fast and secure neural processing unit
Voronin et al. ICP algorithm based on stochastic approach
Rompicharla Continuous compliance model for hybrid multi-cloud through self-service orchestrator
US20170177767A1 (en) Configuration of large scale advection diffusion models with predetermined rules
US20110115794A1 (en) Rule-based graph layout design

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant