CN115632843A - Target detection-based generation method of backdoor attack defense model - Google Patents
- Publication number
- CN115632843A CN115632843A CN202211245119.6A CN202211245119A CN115632843A CN 115632843 A CN115632843 A CN 115632843A CN 202211245119 A CN202211245119 A CN 202211245119A CN 115632843 A CN115632843 A CN 115632843A
- Authority
- CN
- China
- Prior art keywords
- model
- data
- target detection
- training
- defense
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Abstract
The invention discloses a method for generating a backdoor attack defense model based on target detection. The method first selects reference data: a clear image X is chosen from the single label class y to be detected. Multi-scale learning is then performed on the reference data with a SinGAN model whose parameters are randomly initialized, and the layer-0 data with large variation is selected as the enhanced clean data set D_c. Given N training backdoor triggers of size (w, h) forming the set D_t, each element of D_t is pasted into an image of the clean data set D_c at an arbitrary position, and the synthesized sample set is recorded as the poisoning training set D_p. Finally, the trained target detection tool YOLOv5 is applied to the poisoning training set D_p to generate a backdoor defense model M_(x,y,w,h). The final model thereby achieves a strong defense effect against backdoor attacks on deep neural networks and improves the robustness and security of deep neural networks.
Description
Technical Field
The invention belongs to the technical fields of backdoor attack defense, unsupervised learning, and target detection, and particularly relates to a method for generating a backdoor attack defense model based on target detection.
Background
In recent years, with the continuous development of the internet and the expanding range of its applications, a large amount of software has been designed and developed, but such software may contain different kinds of attacks such as viruses, trojans, adware, and worms. These attacks create serious security hazards and can lead to severe consequences such as virus infection, leakage of private information, and violation of user interests and privacy. On one hand, network security engineers continuously improve and optimize attack detection technology; on the other hand, attackers constantly change and hide their attack methods to evade security detection. With the continuous development of deep learning, adversarial attacks have become ubiquitous, and various black-box, white-box, and gray-box adversarial attack algorithms have been proposed, seriously threatening machine learning and the security of its applications. On the basis of traditional adversarial attack algorithms, the backdoor attack has been introduced as a common attack means. A backdoor attack on a deep neural network (DNN) achieves a high attack success rate without degrading the original performance of the model (i.e., without causing the accuracy of the original DNN model to drop sharply). Traditional defenses that judge whether a model is compromised by detecting a drop in its original performance are therefore ineffective against backdoor attacks, whose concealment is correspondingly stronger.
Backdoor attack detection technology in network security has evolved many times, but as attacker techniques keep evolving, efficient and safe detection and the improvement of deep neural network robustness remain key tasks in the field of cyberspace security. In the early stage of backdoor attacks on deep neural networks for image classification, backdoor defense was mainly performed by detecting triggers at fixed positions and in fixed forms in images, for example judging whether a white square, a circular patch, or the like exists in the lower-right corner. As related research has progressed, existing defense methods fall mainly into three scenarios: 1) before/during training: detecting backdoor triggers in the data set before or during training and filtering out backdoor samples; 2) after training: once the model is trained, using the compromised model and a clean data set to judge whether the model or the training set has been poisoned; 3) after deployment: judging a project once it is commercially online; at present, defense measures at this stage are fewer and more difficult.
Although backdoor attack defense technology based on deep learning has achieved remarkable results, related work shows that existing defense mechanisms mostly concentrate on multi-label, multi-data-set settings; backdoor attack defense in the application scenario of a single label and a small data set has not yet been deeply analyzed and understood, and conventional defense methods are of little use in this setting. Therefore, a robust and safe defense model against backdoor attacks urgently needs to be designed and realized from the perspective of a single-label, small data set.
Disclosure of Invention
The embodiment of the invention aims to provide a method for generating a backdoor attack defense model based on target detection, which offers efficient defense capability in the scenario of a single label and a small data set, and can also alleviate the resource and computational pressure of recomputing deployed services.
In order to solve the above technical problems, the technical scheme adopted by the invention is a method for generating a backdoor attack defense model based on target detection, comprising the following steps:

Step one: select reference data: choose a clear image X from the single label class y to be detected as the reference data.

Step two: perform multi-scale learning on the reference data with a SinGAN model, randomly initializing the SinGAN model parameters, and select the layer-0 data with large variation as the enhanced clean data set D_c.

Step three: given N training backdoor triggers of size (w, h) forming the set D_t, paste each element of D_t into an image of the clean data set D_c at an arbitrary position; the synthesized sample set is recorded as the poisoning training set D_p.

Step four: apply the trained target detection tool YOLOv5 to the poisoning training set D_p to generate the backdoor defense model M_(x,y,w,h).
The beneficial effects of the invention are:

1. The invention realizes a plug-and-play backdoor defense model, which alleviates the resource and computational pressure of recomputing deployed services.

2. Through SinGAN, the method extends the defense setting from fine-grained large samples to small samples, and by using the target detection model YOLOv5 it extends the defense target from a single label to multiple labels.

3. Facing the security vulnerability of current deep neural networks to backdoor attacks, defense experiments are conducted from multiple dimensions. The effectiveness and efficiency of the method are verified under the single-sample setting from several angles, including single label, cross label, single size, cross size, and different reference images, which in turn supports research on the robustness mechanisms of deep learning models; the method thus offers efficient defense capability in the scenario of a single label and a small data set.

4. The invention integrates post-hoc defense and target detection technology, and has important application and research value for designing and realizing a robust and safe deep learning model for backdoor attack defense.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a generated architecture diagram of a back door defense model implemented in the present invention;
FIG. 3 is a diagram of the internal structure of a single generator node G_n of the SinGAN model; the corresponding discriminator D_n has the same structure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The embodiment of the invention provides a method for generating a backdoor attack defense model based on target detection. Its principle is that the normal distribution of a same-label data set has internal correlation: the SinGAN model performs multi-scale learning on a reference image to simulate different distributions within the same label and achieve a data enhancement effect; the backdoor type to be detected is then synthesized with the enhanced data to generate a poisoning training set; finally, a defense model against backdoor attacks is generated with an optimized and improved target detection YOLOv5 model, improving the robustness of the deep neural network model.
The specific steps are as follows, as shown in figure 1:
Step one: select reference data: choose a clear image X from the single label class y to be detected as the reference data.

Step two: perform multi-scale learning on the reference data with a SinGAN model, randomly initializing the SinGAN model parameters, and select the layer-0 data with large variation as the enhanced clean data set D_c.

Step three: given N training backdoor triggers of size (w, h) forming the set D_t, paste each element of D_t into an image of the clean data set D_c at an arbitrary position; the synthesized sample set is recorded as the poisoning training set D_p.

Step four: apply the trained target detection tool YOLOv5 to the poisoning training set D_p to generate the backdoor defense model M_(x,y,w,h). The final model thereby achieves a strong defense effect against backdoor attacks on deep neural networks and improves the robustness and security of deep neural networks.
In step one, the single image X selected from the single label y to be detected must be clearly visible; experimental results show that the clarity of the reference image is directly proportional to the defense effect. Here y ∈ [1, C], where C represents the total number of output label classes of the deep neural network.
In step two, the SinGAN model captures different data distributions from multiple scales of a single image, generating a set of internally correlated data. The model consists of a generator G_n for generating the data distribution and a discriminator D_n for discriminating the data distribution, where both G_n and D_n are internally convolutional neural networks of five convolution blocks; the input is random noise plus the image upsampled from the previous scale, and the output is formed by fusing the result of the five-block convolutional network with the upsampled image. Following the definition of a GAN (Generative Adversarial Network), the structure of generator G_n is shown in FIG. 3 and is described as:

x̃_n = x̃_{n+1}↑ + ψ_n(z_n + x̃_{n+1}↑)

where z_n is the noise, ψ_n denotes the convolutional neural network of five convolution blocks, x̃_{n+1}↑ denotes the upsampled version of the image from the previous scale, n denotes the current scale, and x̃_n is the corresponding output of G_n. The convolutional network consists of 5 convolution blocks of the form Conv(3×3)-BatchNorm-LeakyReLU.
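This residual formulation can be illustrated with a toy numerical sketch (an assumption-laden stand-in: `psi` here is a small bounded function, not the actual five-block convolutional network, and nearest-neighbour upsampling replaces a proper resize):

```python
import numpy as np

def upsample(img, scale):
    # Nearest-neighbour upsampling; a stand-in for SinGAN's image resize.
    return np.kron(img, np.ones((scale, scale)))

def generator_step(z_n, x_up, psi):
    # x_n = up(x_{n+1}) + psi(z_n + up(x_{n+1})): the conv stack psi only
    # has to generate the residual detail missing from the upsampled image.
    return x_up + psi(z_n + x_up)

rng = np.random.default_rng(0)
x_coarse = rng.random((8, 8))        # output of the previous, coarser scale
x_up = upsample(x_coarse, 2)         # upsampled to (16, 16)
z_n = rng.normal(size=x_up.shape)    # per-scale noise map
psi = lambda t: 0.1 * np.tanh(t)     # bounded stand-in for the residual net
x_fine = generator_step(z_n, x_up, psi)
```

Because `psi` is bounded, the output stays close to the upsampled image, mirroring how each G_n only adds fine detail on top of the coarser scale.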
The training loss of the SinGAN generator G_n is divided into an adversarial loss and a reconstruction loss. The loss function of the SinGAN model is defined as:

min_{G_n} max_{D_n} L_adv(G_n, D_n) + α·L_rec(G_n)

where L_adv(G_n, D_n) denotes the adversarial loss function, which penalizes the distance between the patch distribution of x_n and the patch distribution of the generated samples, and L_rec(G_n) denotes the reconstruction loss function, which ensures that for each G_n a specific set of noise maps exists. This is a nested optimization objective: the inner discriminator aims to maximize the loss value, while the outer generator aims to minimize it, ensuring that the perturbation of the original image is searched within a certain "degree".
Data enhancement is achieved and recorded as the clean data set D_c as follows: perform multi-scale learning on the reference data with the SinGAN model, randomly initialize the SinGAN model parameters, and select the layer-0 data with large variation as the enhanced clean data set D_c.
Further, the principle of the SinGAN model in the second step will be briefly described.
For a single image X, the SinGAN model provides a pyramid of generators G = {G_0, …, G_N} trained on X over N scales. Let {x_0, …, x_N} denote the corresponding outputs, where x_n is a downsampled version of X by a factor r^n, for some r > 1; r is the base of the downsampling factor.
The image samples are refined progressively from coarse to fine granularity, with noise injected at every scale. All generators and discriminators have the same receptive field, so the size of the structures captured during generation decreases as the scale becomes finer. At the coarsest scale, the output is generated directly from the noise, i.e., the SinGAN generator G_N maps the noise z_N to the coarsest sample:

x̃_N = G_N(z_N)
At finer scales, each generator G_n adds the details not generated at the coarser scales. Thus, in addition to the random noise z_n, each generator G_n receives the upsampled version of the coarser-scale image, namely:

x̃_n = G_n(z_n, x̃_{n+1}↑), n < N
All the generators have a similar architecture. Specifically, the noise z_n is added to the upsampled image x̃_{n+1}↑ before being passed to the convolution layers. This ensures that the generator does not ignore the noise, as often happens in conditional schemes involving randomness. The convolution layers generate the details missing from x̃_{n+1}↑, i.e., G_n performs the operation:

x̃_n = x̃_{n+1}↑ + ψ_n(z_n + x̃_{n+1}↑)
the convolution network is composed of 5 convolution blocks, and the form of the convolution block is as follows: conv (3X 3) -BatchNorm-LeakyReLU.
Each generator G_n is paired with a discriminator D_n, which classifies each of the overlapping patches of its input as real or fake. The architecture of D_n is kept consistent with the network ψ_n in G_n. SinGAN defines the training loss of G_n as the sum of two parts: the adversarial loss and the reconstruction loss.
The adversarial loss L_adv penalizes the distance between the patch distribution of x_n and the patch distribution of the generated samples. Because the WGAN-GP (Wasserstein GAN with Gradient Penalty) loss enhances training stability, the invention selects it as the adversarial loss function, defined as:

L_adv = E[D_n(G_n(z_n, x̃_{n+1}↑))] − E[D_n(x_n)] + λ·E[(‖∇_{x̂} D_n(x̂)‖₂ − 1)²]

where G_n(z_n, x̃_{n+1}↑) denotes the output of the WGAN generator G_n on a sample with random noise added, the last term is the "gradient penalty" added in the WGAN-GP model, x̂ is a random interpolation between real and generated samples, and D_n(·) denotes the output of the discriminator.
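The gradient-penalty term can be illustrated with a small numerical sketch. To avoid automatic differentiation, a toy critic D(x) = 0.5·‖x‖² is assumed, whose gradient is simply x in closed form; the helper name and the λ = 10 default are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def gradient_penalty(real, fake, critic_grad, lam=10.0, rng=None):
    """WGAN-GP term: lam * E[(||grad D(x_hat)||_2 - 1)^2], where x_hat is a
    random per-sample interpolation between real and generated samples."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.random((real.shape[0],) + (1,) * (real.ndim - 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grads = critic_grad(x_hat)                     # per-sample gradient of D
    norms = np.sqrt((grads.reshape(len(grads), -1) ** 2).sum(axis=1))
    return float(lam * ((norms - 1.0) ** 2).mean())

# Toy critic D(x) = 0.5 * ||x||^2, so grad D(x) = x in closed form.
real = np.ones((4, 8))
fake = np.zeros((4, 8))
gp = gradient_penalty(real, fake, critic_grad=lambda x: x)
```

The penalty vanishes exactly when the critic's gradient norm equals 1 on the interpolates, which is the constraint the WGAN-GP formulation enforces softly.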
To ensure that for each G_n there exists a specific set of noise maps, SinGAN introduces a reconstruction loss function L_rec, expressed as:

L_rec = ‖G_n(z*_n, x̃^rec_{n+1}↑) − x_n‖²

where x̃^rec_{n+1}↑ denotes the upsampled image produced using the fixed noise at the corresponding scale n+1, and the noise maps {z*_N, 0, …, 0} are fixed at the very beginning and kept unchanged during training.

The resulting overall loss function of SinGAN is:

min_{G_n} max_{D_n} L_adv(G_n, D_n) + α·L_rec(G_n)
in step three, the number of back door types T needs to be limited to 10 to 50 so as to achieve efficient detection.
Step four specifically comprises:

S41: using the poisoning training set D_p synthesized in step three, generate the xml- and txt-format files describing the corresponding training information;

S42: combine the poisoning training set D_p and the file information generated in S41 (including the txt and xml files) into the target detection data set D_y of the YOLOv5 model, used as input for training the YOLOv5 model;

S43: set the initial batch size (batch_size) and the number of deep neural network training epochs (epoch) according to the environment; randomly initialize the YOLOv5 model parameters and train the YOLOv5 model on the target detection data set D_y until the model loss function no longer changes;

S44: finally, the backdoor defense model M_(x,y,w,h) is obtained.
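The txt files of S41 follow the YOLOv5 label convention: one line per box, giving the class index followed by the box center and size, all normalized by the image dimensions. A minimal sketch of that conversion, with the helper name `yolo_label_line` as an assumption:

```python
def yolo_label_line(cls_id, x, y, w, h, img_w, img_h):
    """Convert a pixel-space box (top-left x, y, width, height) into a
    YOLOv5 txt label line: 'cls x_center y_center width height',
    with all four box values normalised to [0, 1] by the image size."""
    xc = (x + w / 2.0) / img_w
    yc = (y + h / 2.0) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# A 4x4 trigger pasted at (10, 20) in a 32x32 image, labelled class 0.
line = yolo_label_line(0, 10, 20, 4, 4, 32, 32)
```

One such file per image, together with the images themselves, forms the target detection data set D_y used to train the detector.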
Preferably, the method for generating the backdoor defense model based on target detection is a backdoor attack detection method that integrates target detection technology into backdoor attack defense.
In order to better illustrate the technical effect of the invention, the invention is experimentally verified on a specific example and compared with existing algorithms. The experiment uses the GTSRB (German Traffic Sign Recognition Benchmark) data set, which is close to an engineering scenario: the task is to identify 43 different traffic signs, simulating the application scenario of an autonomous vehicle. The GTSRB data set contains 39.2K color training images and 12.6K test images, with the picture size set to 32 × 32. For the model architecture, ResNet-18 is selected as the classifier.

Following the first random strategy proposed in dynamic attack theory, random data drawn from a normal distribution is applied to the images (as in FIG. 2), and the trigger size is set to 4 × 4 by default. A target class is randomly selected, and a portion of the data is modified into adversarial inputs of the target class to modify the training data set. For a given data set, a certain proportion of samples (between [0.1, 0.2]) is selected for the backdoor operation; after the trigger is set, an attack success rate of more than 92% can be achieved while maintaining high classification accuracy, i.e., the classification performance of the model on "clean" pictures without inserted backdoor triggers is guaranteed to be unaffected.

Following the method architecture described herein, we verify the detection performance of the backdoor defense model from the multiple dimensions in which a single-target attack may occur in a real scenario. The experiments verify the high availability, effectiveness, and scalability of the method in several scenarios, including target label, attack label, different trigger patterns, cross-size, same-size, and different reference images; the experimental test results are shown in Table 1.
TABLE 1 Multi-dimensional defense performance results
In the experiments, except for the attack-target scenario, where the reference image is obtained from the attack label, the backdoor defense models in the other scenarios are obtained from the target label to be detected. From Table 1 it can be found that:

(1) The defense model reduces the backdoor attack success rate from 99.58% to 2.36% while keeping the classification performance essentially unchanged, and in some scenarios the attack success rate can be reduced to 0%. The experimental results therefore demonstrate the effectiveness and efficiency of the proposed backdoor attack defense method.

(2) The defense models in the experiments were built on the backdoor size assumed by the defender. The cross-size results show that when the size of the real backdoor trigger is inconsistent with the size assumed when training the defense model, the backdoor defense performance drops by an order of magnitude. When the cross-size setting is re-tested with the real trigger size kept consistent with the size assumed during training, the test results show that an excellent backdoor attack detection effect is obtained.

(3) From the results with different reference images, it can be found that when an unclear image is used as the reference image, the method leaves a backdoor attack success rate of 3.47% on the speed-limit label but 58.89% on the no-passing label, a performance gap of about 16.9×. The sharpness of the reference image therefore affects the detection performance of the backdoor defense model.
In conclusion, the backdoor attack defense model generated by the invention can effectively detect backdoor triggers. Compared with traditional detection methods, it offers better performance, fewer assumptions, and finer-grained detection, and it clearly identifies the presence of a backdoor trigger without significantly degrading the classification performance of the model. Meanwhile, the backdoor sample enhancement approach supports research on the robustness mechanisms of deep neural network models and helps train better and safer backdoor attack defense models.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (5)
1. A method for generating a backdoor attack defense model based on target detection, characterized by comprising the following steps:

step one: selecting reference data: choosing a clear image X from the single label class y to be detected as the reference data;

step two: performing multi-scale learning on the reference data with a SinGAN model, randomly initializing the SinGAN model parameters, and selecting the layer-0 data with large variation as the enhanced clean data set D_c;

step three: given N training backdoor triggers of size (w, h) forming the set D_t, pasting each element of D_t into an image of the clean data set D_c at an arbitrary position; the synthesized sample set is recorded as the poisoning training set D_p;

step four: applying the trained target detection tool YOLOv5 to the poisoning training set D_p to generate the backdoor defense model M_(x,y,w,h).
2. The method for generating a backdoor attack defense model based on target detection according to claim 1, characterized in that in step one, the single label y to be detected satisfies y ∈ [1, C], where C represents the total number of output label classes of the deep neural network.
3. The method for generating a backdoor attack defense model based on target detection according to claim 1, characterized in that in step two, the SinGAN model captures different data distributions from multiple scales of a single image to generate a set of internally correlated data, and the model comprises: a generator G_n for generating the data distribution and a discriminator D_n for discriminating the data distribution, where both G_n and D_n are internally convolutional neural networks of five convolution blocks; the input layer is random noise plus the image upsampled from the previous scale, and the output layer is formed by fusing the result of the five-block convolutional network with the upsampled image; the generator G_n is described as:

x̃_n = x̃_{n+1}↑ + ψ_n(z_n + x̃_{n+1}↑)

where z_n is the noise, ψ_n denotes the convolutional neural network of five convolution blocks, x̃_{n+1}↑ denotes the upsampled version of the image, n denotes the current scale, and x̃_n is the corresponding output of G_n; the convolutional network consists of 5 convolution blocks of the form Conv(3×3)-BatchNorm-LeakyReLU;

the architecture of the discriminator D_n is kept consistent with ψ_n in G_n;

the training loss of the SinGAN generator G_n is divided into an adversarial loss and a reconstruction loss, and the loss function of the SinGAN model is defined as:

min_{G_n} max_{D_n} L_adv(G_n, D_n) + α·L_rec(G_n)

where L_adv(G_n, D_n) denotes the adversarial loss function, which penalizes the distance between the patch distribution of x_n and the patch distribution of the generated samples, and L_rec(G_n) denotes the reconstruction loss function, which ensures that for each G_n a specific set of noise maps exists; this is a nested optimization objective, in which the inner discriminator aims to maximize the loss value while the outer generator aims to minimize it.
4. The method for generating a backdoor attack defense model based on target detection according to claim 1, characterized in that in step three, the number of backdoor trigger types T is 10 to 50.
5. The method for generating a backdoor attack defense model based on target detection as claimed in claim 1, wherein the fourth step specifically comprises:
s41: poisoning training set D synthesized by using steps three p Generating xml and txt format files corresponding to the description training information;
s42: training set D of poisoning p And the file information generated in step S41 are combined into a target detection data set D of the Yolov5 model v And as input, used to train the Yolov5 model;
s43: setting an initial input sample quantity batch _ size and deep neural network training cycle times epoch according to the environment; carrying out random initialization on the Yolov5 model parameters based on target detection, and utilizing the Yolov5 model to detect a target data set D y Training is carried out until the model loss function is not changed any more;
s44: finally, a back door defense model M can be obtained (x,y,w,h) 。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211245119.6A CN115632843A (en) | 2022-10-12 | 2022-10-12 | Target detection-based generation method of backdoor attack defense model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115632843A (en) | 2023-01-20 |
Family
ID=84904657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211245119.6A Pending CN115632843A (en) | 2022-10-12 | 2022-10-12 | Target detection-based generation method of backdoor attack defense model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115632843A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115935378A (en) * | 2023-03-10 | 2023-04-07 | 中国人民解放军国防科技大学 | Image fusion model security detection method based on condition generating network |
CN115935378B (en) * | 2023-03-10 | 2023-10-10 | 中国人民解放军国防科技大学 | Image fusion model security detection method based on conditional generation type network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||