CN113420289B - Hidden poisoning attack defense method and device for deep learning model


Info

Publication number
CN113420289B
Authority
CN
China
Prior art keywords
deep learning
learning model
poisoning
image
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110675083.4A
Other languages
Chinese (zh)
Other versions
CN113420289A (en)
Inventor
陈晋音
邹健飞
熊晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110675083.4A priority Critical patent/CN113420289B/en
Publication of CN113420289A publication Critical patent/CN113420289A/en
Application granted granted Critical
Publication of CN113420289B publication Critical patent/CN113420289B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 Detecting local intrusion or implementing counter-measures
    • G06F 21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a hidden poisoning attack defense method for a deep learning model, which comprises the steps of: (1) obtaining an image data set and a deep learning model; (2) screening with the deep learning model to obtain a clean image data set; (3) generating poisoning samples, concealing the generation process within the image preprocessing procedure; (4) inputting the generated poisoning samples into the deep learning model to poison it, so that the model misjudges trigger samples in the testing stage; (5) inputting the generated poisoning samples, labeled with their correct class labels, into the deep learning model for reinforcement training so as to repair it. The invention also discloses a hidden poisoning attack defense device for the deep learning model, used to implement the method. The method realizes a hidden poisoning attack on the model by generating poisoning samples, and then uses the generated samples to repair the model, thereby improving the model's security and robustness.

Description

Hidden poisoning attack defense method and device for deep learning model
Technical Field
The invention relates to the technical field of poisoning detection, in particular to a hidden poisoning attack defense method and device for a deep learning model.
Background
Deep learning has gradually become a research hotspot and the mainstream development direction in the field of artificial intelligence. Deep learning is a machine learning technique that uses a computational model composed of multiple processing layers to learn data representations with multiple levels of abstraction. It represents the main development direction of machine learning and artificial intelligence research and has brought revolutionary progress to fields such as machine learning and computer vision. Breakthroughs of artificial intelligence technology in computer vision, natural language processing and other fields have led artificial intelligence into a new round of explosive development, and deep learning is the key to these breakthroughs. Image classification based on deep convolutional networks has already exceeded human-eye accuracy, speech recognition based on deep neural networks has reached 95% accuracy, and machine translation based on deep neural networks approaches the average level of human translators. With this rapid improvement in accuracy, computer vision and natural language processing have entered the industrialization stage and have driven the rise of emerging industries.
Artificial intelligence models based on neural networks are widely used in applications such as face recognition, object detection and autonomous driving, and have proved superior to traditional computing methods. More and more people believe that applying artificial intelligence models to all aspects of life plays a crucial role. As complexity and functionality increase, training such models requires significant effort in collecting training data and optimizing performance. Pre-trained models have therefore become valuable goods that suppliers (e.g., Google) and developers distribute, share, reuse, and even sell for profit. For example, thousands of pre-trained models are released and shared on the Caffe Model Zoo, the ONNX Model Zoo, and the BigML model market, just as traditional software is shared on GitHub. These models may be trained by well-reputed suppliers, institutions, or even individuals.
However, a pre-trained intelligent system model may contain a backdoor injected through training or through modification of internal neuron weights. Such a trojaned model works normally on regular inputs, but inputs stamped with a special trigger pattern are misclassified into a specific output label. The concealment of the poisoning samples in current poisoning attack methods is poor, and their effect in practical applications is limited. This patent therefore provides a highly concealed poisoning attack method, which hides the generation of poisoning samples within the image preprocessing process, together with a defense method against this hidden poisoning attack, thereby contributing to the improvement of model security and robustness.
Disclosure of Invention
The invention aims to provide a hidden poisoning attack defense method and device for a deep learning model, which conceal the process of generating poisoning samples within the image preprocessing process by means of an algorithm so that the poisoning process is more covert, and which simultaneously provide a defense method against this hidden poisoning attack, thereby improving the security and robustness of the deep learning model.
A hidden poisoning attack defense method facing a deep learning model comprises the following steps:
(1) acquiring an image data set and a deep learning model;
(2) identifying the image data set by using a deep learning model, screening to obtain images which can be identified correctly, and forming a clean image data set;
(3) generating a poisoning sample, and concealing the process of generating the poisoning sample in the image preprocessing process;
(4) inputting the generated poisoning sample into the deep learning model to poison the model, so that the model misjudges the trigger sample in the testing stage;
(5) inputting the generated poisoning sample labeled with the correct class label into the deep learning model for reinforcement training, so as to repair the deep learning model and improve the accuracy of its recognition results.
According to the scheme, the hidden poisoning attack on the model is realized by generating the poisoning sample, and then the generated poisoning sample is used for repairing the model, so that the safety and the robustness of the model are improved.
Preferably, the image dataset is a MNIST dataset, a CIFAR10 dataset, or a Driving dataset; the deep learning model is a LeNet deep learning model, a VGG19 deep learning model or a ResNet50 deep learning model.
Preferably, the step (2) is specifically:
inputting the image data set of step (1) into the deep learning model, which outputs a predicted class label for each input image; if the predicted class label is consistent with the true class label of the image, the deep learning model identifies the image correctly and the image is put into the clean image data set.
Preferably, in step (3), the image preprocessing specifically uses an interpolation method, and the poisoning sample is generated by using a resize process in the image preprocessing.
Further preferably, on the basis of interpolation linearization, inverse interpolation is adopted to solve the following problem:

W_row ρ W_col = 0

wherein W_row and W_col are linearly independent, the core image I_c is set by the attacker, and the perturbation ρ is a solution of the above formula; when the output size of the interpolation is smaller than the input size, the formula becomes an underdetermined equation and the solution space is non-empty; without changing the output, the perturbation is manipulated on the basis of the solution space, the equation being divided into W_row ρ = 0 and ρ W_col = 0, the basis of the solution space being the union of the bases of the two subspaces; the perturbation matrix is computed by ρ = B_row X_row + X_col B_col, where B_row and B_col are the bases of the null spaces of the weight matrices and the matrices X_row and X_col are coordinates.
Further preferably, the image I_c is modified by inverse interpolation so that it is visually similar to another image I_b, the loss function being calculated as:

L = || I_c + B_row X_row + X_col B_col − I_b ||²

The final generated image input into the model for training is I_c + (B_row X_row + X_col B_col).
Preferably, step (4) is specifically represented by:
f(I)[c]=b
wherein f(I)[c] = b denotes that, in the testing stage, the input image I with true class label c is predicted by the deep learning model as class label b.
A covert poisoning attack defense device facing a deep learning model, comprising:
the acquisition module is used for acquiring an image data set and a deep learning model;
the acquisition clean image data set module is used for screening by utilizing a deep learning model to acquire a clean image data set;
the generating module is used for concealing the process of generating the poisoning sample in the image preprocessing process;
the poisoning module is used for inputting the generated poisoning sample into the deep learning model to poison the model, so that the model misjudges the trigger sample in the testing stage;
and the repairing module is used for inputting the generated poisoning sample labeled with the correct class label into the deep learning model for reinforcement training so as to repair the deep learning model.
The invention has the beneficial effects that:
according to the hidden poisoning attack defense method for the deep learning model, the process of generating the poisoning sample is hidden in the image preprocessing process according to the algorithm, so that the poisoning process is more hidden, and the method for the hidden poisoning attack has strong concealment. And performing reinforcement training on the original deep learning model by using the obtained toxic image to repair the deep learning model so as to improve the safety and robustness of the deep learning model.
Drawings
FIG. 1 is a flow chart of the method for defending against the hidden poisoning attack facing the deep learning model provided by the present invention.
FIG. 2 is a structural block diagram of the device for defending against the hidden poisoning attack facing the deep learning model provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For automatic driving models and face recognition models, the safety requirements are high, but such models are easily affected by poisoning attacks. To improve their safety, the hidden poisoning attack defense method for the deep learning model generates poisoning samples to realize a hidden poisoning attack on the model, and then uses the generated poisoning samples to repair the model, thereby improving the model's security and robustness.
A hidden poisoning attack defense method facing a deep learning model comprises the following steps:
(1) obtaining an image dataset and a deep learning model
The image data set is an MNIST data set, a CIFAR10 data set or a Driving data set.
The deep learning model is a LeNet deep learning model, a VGG19 deep learning model or a ResNet50 deep learning model.
(2) Recognizing the image data set by using a deep learning model, screening to obtain images which can be correctly recognized, and forming a clean image data set
The data set from step (1) is input into the deep learning model, which outputs a predicted class label for each input image; if the predicted class label is consistent with the true class label of the image, the deep learning model identifies the image correctly and the image is put into the clean image data set.
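For illustration only, a minimal Python sketch of this screening step follows; the names model and loader are assumptions (e.g., a ResNet50 and a CIFAR10 DataLoader), not part of the patent.

    import torch

    @torch.no_grad()
    def build_clean_dataset(model, loader, device="cpu"):
        """Keep only the images whose predicted class label matches the true one."""
        model.eval().to(device)
        clean_images, clean_labels = [], []
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            keep = preds == labels              # predicted label == true class label
            clean_images.append(images[keep])
            clean_labels.append(labels[keep])
        return torch.cat(clean_images), torch.cat(clean_labels)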
(3) Generating a poisoning sample, and concealing the process of generating the poisoning sample in an image preprocessing process
Referring to fig. 1, generating a poisoning sample is accomplished using the resize process in image preprocessing. Interpolation is a key image preprocessing technique: it resizes the image I obtained in step (2) to a target size. This can be represented as I_H,W = f_i(I_h,w), where h and H are the heights before and after interpolation, and w and W are the corresponding widths. As a linear operation, the interpolation computation is equivalent to left- and right-multiplying by the corresponding weight matrices; it can be expressed as f_i(I) = W_row I W_col, where W_row and W_col are the weight matrices for row sampling and column sampling respectively.
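As a hedged sketch of this linearization, the following Python code builds the 0/1 weight matrices of a nearest-neighbour resize (chosen here only because its weights are easy to write down; the patent does not fix a particular interpolation kernel) and checks that left- and right-multiplication reproduces the direct resize.

    import numpy as np

    def nearest_weights(n_out, n_in):
        """0/1 sampling matrix of a nearest-neighbour resize from n_in to n_out points."""
        W = np.zeros((n_out, n_in))
        for i in range(n_out):
            W[i, int(i * n_in / n_out)] = 1.0
        return W

    h, w, H, W_ = 8, 8, 4, 4
    W_row = nearest_weights(H, h)        # (H, h): samples rows
    W_col = nearest_weights(W_, w).T     # (w, W_): samples columns

    I = np.random.rand(h, w)
    assert np.allclose(W_row @ I @ W_col, I[::2, ::2])   # matrix form == direct resize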
On the basis of interpolation linearization, either sampled or non-sampled pixels can be modified, and the interpolation result is manipulated through the weight matrices. When this idea is applied to a poisoning attack, the perturbation is added to the non-sampled area of the input image while the sampled area is left unchanged, eventually covering the original content of the non-sampled area. Once the proportion of modified pixels reaches a certain ratio, the image actually fed to the interpolation cannot be distinguished from the disturbance image by eye, while the output of the interpolation is unchanged. This process is called inverse interpolation. The following problem is solved:
W_row ρ W_col = 0

wherein W_row and W_col are linearly independent. The core image I_c is set by the attacker, and the perturbation ρ is a solution of the above equation. When the output size of the interpolation is smaller than the input size, the above formula becomes an underdetermined equation and the solution space is non-empty. The perturbation can therefore be manipulated through the basis of the solution space without changing the output; the key is to find the optimal coordinates. The equation can be divided into W_row ρ = 0 and ρ W_col = 0, and the basis of the solution space is the union of the bases of the two subspaces. The perturbation matrix is computed by ρ = B_row X_row + X_col B_col, where B_row and B_col are the bases of the null spaces of the weight matrices. The matrices X_row and X_col are coordinates, and the above equation holds regardless of their values, because the weight matrices are orthogonal to their corresponding bases, so the perturbation after interpolation is zero. This is of great significance for extending the attack.
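A hedged numerical sketch of this construction, continuing the nearest-neighbour example above (the basis computation via scipy.linalg.null_space is an implementation choice, not mandated by the patent):

    import numpy as np
    from scipy.linalg import null_space

    def nearest_weights(n_out, n_in):
        W = np.zeros((n_out, n_in))
        for i in range(n_out):
            W[i, int(i * n_in / n_out)] = 1.0
        return W

    h, w, H, W_ = 8, 8, 4, 4
    W_row = nearest_weights(H, h)
    W_col = nearest_weights(W_, w).T

    B_row = null_space(W_row)            # columns span {v : W_row v = 0}
    B_col = null_space(W_col.T).T        # rows span    {u : u W_col = 0}

    X_row = np.random.randn(B_row.shape[1], w)   # coordinates: any values work
    X_col = np.random.randn(h, B_col.shape[0])
    rho = B_row @ X_row + X_col @ B_col          # rho = B_row X_row + X_col B_col

    assert np.allclose(W_row @ rho @ W_col, 0)   # perturbation vanishes after resize

Whatever values the coordinate matrices take, W_row ρ W_col = 0, which is exactly the orthogonality property the description relies on.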
The purpose of inverse interpolation is to modify the image I_c so that it is visually similar to another image I_b. The loss function is calculated as:

L = || I_c + B_row X_row + X_col B_col − I_b ||²

Stochastic gradient descent (SGD) is selected to compute the optimal coordinates. The overall optimized inverse interpolation (I_c, I_b) corresponds to hiding the core image I_c inside the cover image I_b: the result appears to be image I_b, but in the forward propagation of the neural network the cover image's pixels are filtered out by the interpolation sampling. The generated image effectively becomes the core image I_c, which is equivalent to assigning the class label of image I_b to the input image I_c. The final generated image input into the model for training is I_c + (B_row X_row + X_col B_col).
(4) Inputting the generated poisoning sample into the deep learning model to poison the model, causing the model to misjudge the trigger sample in the testing stage, where the misjudgment is expressed as:

f(I)[c]=b

wherein f(I)[c] = b denotes that, in the testing stage, the input image I with true class label c is predicted by the deep learning model as class label b.
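A short evaluation sketch of this misjudgment f(I)[c] = b (names are illustrative assumptions: trigger_images are the attack images generated above, and cover_label_b is the class label of the cover image I_b):

    import torch

    @torch.no_grad()
    def attack_success_rate(model, trigger_images, cover_label_b):
        """Fraction of trigger samples of true class c predicted as the cover class b."""
        preds = model(trigger_images).argmax(dim=1)
        return (preds == cover_label_b).float().mean().item()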
(5) Inputting the generated poisoning sample labeled with the correct class label into the deep learning model for reinforcement training, so as to repair the deep learning model and improve the accuracy of its recognition results.
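A hedged sketch of this repair step (hyperparameters and names are illustrative; poison_images are the generated poisoning samples and correct_labels their true class labels):

    import torch
    import torch.nn.functional as F

    def repair_model(model, poison_images, correct_labels, epochs=5, lr=1e-4):
        """Reinforcement training: fine-tune on correctly relabeled poisoning samples."""
        model.train()
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        for _ in range(epochs):
            opt.zero_grad()
            loss = F.cross_entropy(model(poison_images), correct_labels)
            loss.backward()
            opt.step()
        return model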
Referring to fig. 2, a hidden poisoning attack defense device facing a deep learning model comprises:
the acquisition module is used for acquiring an image data set and a deep learning model;
the acquisition clean image data set module is used for screening by utilizing a deep learning model to acquire a clean image data set;
the generating module is used for concealing the process of generating the poisoning sample in the image preprocessing process;
the poisoning module is used for inputting the generated poisoning sample into the deep learning model to poison the model, so that the model misjudges the trigger sample in the testing stage;
and the repairing module is used for inputting the generated poisoning sample labeled with the correct class label into the deep learning model for reinforcement training so as to repair the deep learning model.
It should be noted that, when the device for defending against the deep learning model-oriented covert poisoning attack according to the foregoing embodiment performs defense against the deep learning model-oriented covert poisoning attack, the division of the functional modules is taken as an example, and the function distribution may be completed by different functional modules as needed, that is, the internal structure of the terminal or the server is divided into different functional modules to complete all or part of the functions described above. In addition, the device for defending against hidden poisoning attacks facing the deep learning model and the method for defending against hidden poisoning attacks facing the deep learning model provided in the embodiments belong to the same concept, and specific implementation processes thereof are detailed in the embodiment of the method for defending against hidden poisoning attacks facing the deep learning model, and are not described herein again.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes and modifications can be made to the embodiments, and that some features can be replaced by equivalents, without departing from the spirit and scope of the invention.

Claims (6)

1. A hidden poisoning attack defense method facing a deep learning model is characterized by comprising the following steps:
(1) acquiring an image data set and a deep learning model;
(2) identifying the image data set by using a deep learning model, screening to obtain correctly identified images, and forming a clean image data set;
(3) generating a poisoning sample, and concealing the process of generating the poisoning sample in the image preprocessing process;
(4) inputting the generated poisoning sample into the deep learning model to poison the model, so that the model misjudges the trigger sample in the testing stage;
(5) inputting the generated poisoning sample labeled with the correct class label into the deep learning model for reinforcement training so as to repair the deep learning model;
in the step (3), the image preprocessing specifically adopts an interpolation method, and a resize process in the image preprocessing is utilized to generate a poisoning sample;
on the basis of interpolation linearization, inverse interpolation is adopted to solve the following problem:

W_row ρ W_col = 0

wherein W_row and W_col are linearly independent, the core image I_c is set by the attacker, and the perturbation ρ is a solution of the above formula; when the output size of the interpolation is smaller than the input size, the formula becomes an underdetermined equation and the solution space is non-empty; without changing the output, the perturbation is manipulated on the basis of the solution space, the equation being divided into W_row ρ = 0 and ρ W_col = 0, the basis of the solution space being the union of the bases of the two subspaces; the perturbation matrix is computed by ρ = B_row X_row + X_col B_col, where B_row and B_col are the bases of the null spaces of the weight matrices and the matrices X_row and X_col are coordinates.
2. The deep learning model-oriented covert poisoning attack defense method according to claim 1, wherein the image data set is an MNIST data set, a CIFAR10 data set or a Driving data set; the deep learning model is a LeNet deep learning model, a VGG19 deep learning model or a ResNet50 deep learning model.
3. The method for defending against hidden poisoning attacks of the deep learning model according to claim 1 or 2, wherein the step (2) is specifically:
inputting the image data set of step (1) into the deep learning model, which outputs a predicted class label for each input image; if the predicted class label is consistent with the true class label of the image, the deep learning model identifies the image correctly and the image is put into the clean image data set.
4. The deep learning model-oriented hidden poisoning attack defense method according to claim 1, characterized in that the image I_c is modified by inverse interpolation so that it is visually similar to another image I_b, the loss function being calculated as:

L = || I_c + B_row X_row + X_col B_col − I_b ||²

the final generated image input into the model for training being I_c + (B_row X_row + X_col B_col).
5. The method for defending against hidden poisoning attacks of deep learning models according to claim 1, wherein the step (4) is specifically expressed as:
f(I)[c]=b
wherein f(I)[c] = b denotes that, in the testing stage, the input image I with true class label c is predicted by the deep learning model as class label b.
6. A hidden poisoning attack defense device facing a deep learning model is characterized by comprising:
the acquisition module is used for acquiring an image data set and a deep learning model;
the acquisition clean image data set module is used for screening by utilizing a deep learning model to acquire a clean image data set;
the generating module is used for concealing the process of generating the poisoning sample in the image preprocessing process;
specifically, the image preprocessing adopts an interpolation method, and a resize process in the image preprocessing is utilized to generate a poisoning sample;
on the basis of interpolation linearization, inverse interpolation is adopted to solve the following problem:

W_row ρ W_col = 0

wherein W_row and W_col are linearly independent, the core image I_c is set by the attacker, and the perturbation ρ is a solution of the above formula; when the output size of the interpolation is smaller than the input size, the formula becomes an underdetermined equation and the solution space is non-empty; without changing the output, the perturbation is manipulated on the basis of the solution space, the equation being divided into W_row ρ = 0 and ρ W_col = 0, the basis of the solution space being the union of the bases of the two subspaces; the perturbation matrix is computed by ρ = B_row X_row + X_col B_col, where B_row and B_col are the bases of the null spaces of the weight matrices, and the matrices X_row and X_col are coordinates;
the poisoning module is used for inputting the generated poisoning sample into the deep learning model to poison the model, so that the model misjudges the trigger sample in the testing stage;
and the repairing module is used for inputting the generated poisoning sample labeled with the correct class label into the deep learning model for reinforcement training so as to repair the deep learning model.
CN202110675083.4A 2021-06-17 2021-06-17 Hidden poisoning attack defense method and device for deep learning model Active CN113420289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110675083.4A CN113420289B (en) 2021-06-17 2021-06-17 Hidden poisoning attack defense method and device for deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110675083.4A CN113420289B (en) 2021-06-17 2021-06-17 Hidden poisoning attack defense method and device for deep learning model

Publications (2)

Publication Number Publication Date
CN113420289A CN113420289A (en) 2021-09-21
CN113420289B true CN113420289B (en) 2022-08-26

Family

ID=77789044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110675083.4A Active CN113420289B (en) 2021-06-17 2021-06-17 Hidden poisoning attack defense method and device for deep learning model

Country Status (1)

Country Link
CN (1) CN113420289B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI814213B (en) * 2022-01-17 2023-09-01 國立清華大學 Data poisoning method and data poisoning apparatus
CN114462031B (en) * 2022-04-12 2022-07-29 北京瑞莱智慧科技有限公司 Back door attack method, related device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598400A (en) * 2019-08-29 2019-12-20 浙江工业大学 Defense method for high hidden poisoning attack based on generation countermeasure network and application
CN111753986A (en) * 2020-06-28 2020-10-09 浙江工业大学 Dynamic testing method and device for deep learning model
CN112905997A (en) * 2021-01-29 2021-06-04 浙江工业大学 Method, device and system for detecting poisoning attack facing deep learning model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11514297B2 (en) * 2019-05-29 2022-11-29 Anomalee Inc. Post-training detection and identification of human-imperceptible backdoor-poisoning attacks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598400A (en) * 2019-08-29 2019-12-20 浙江工业大学 Defense method for high hidden poisoning attack based on generation countermeasure network and application
CN111753986A (en) * 2020-06-28 2020-10-09 浙江工业大学 Dynamic testing method and device for deep learning model
CN112905997A (en) * 2021-01-29 2021-06-04 浙江工业大学 Method, device and system for detecting poisoning attack facing deep learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Poisoning Attacks and Defenses on Deep Learning Models; 陈晋音; Journal of Cyber Security (《信息安全学报》); 2020-07-31; pp. 14-25 *

Also Published As

Publication number Publication date
CN113420289A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
US11908244B2 (en) Human posture detection utilizing posture reference maps
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
Wang et al. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN107239733A (en) Continuous hand-written character recognizing method and system
CN113420289B (en) Hidden poisoning attack defense method and device for deep learning model
CN111681178B (en) Knowledge distillation-based image defogging method
CN113344806A (en) Image defogging method and system based on global feature fusion attention network
CN106874879A (en) Handwritten Digit Recognition method based on multiple features fusion and deep learning network extraction
CN110245754A (en) A kind of knowledge distillating method based on position sensing figure
CN110826056A (en) Recommendation system attack detection method based on attention convolution self-encoder
CN109712108A (en) It is a kind of that vision positioning method is directed to based on various distinctive candidate frame generation network
CN116052218B (en) Pedestrian re-identification method
CN110969089A (en) Lightweight face recognition system and recognition method under noise environment
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
CN108062559A (en) A kind of image classification method based on multiple receptive field, system and device
CN112906520A (en) Gesture coding-based action recognition method and device
CN112364747A (en) Target detection method under limited sample
CN112084895A (en) Pedestrian re-identification method based on deep learning
CN115861306B (en) Industrial product abnormality detection method based on self-supervision jigsaw module
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
CN109583584A (en) The CNN with full articulamentum can be made to receive the method and system of indefinite shape input
CN114495163A (en) Pedestrian re-identification generation learning method based on category activation mapping
CN114638408A (en) Pedestrian trajectory prediction method based on spatiotemporal information
CN113128425A (en) Semantic self-adaptive graph network method for human action recognition based on skeleton sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant