CN111881935B - Countermeasure sample generation method based on content-aware GAN - Google Patents

Countermeasure sample generation method based on content-aware GAN

Info

Publication number
CN111881935B
CN111881935B (application CN202010567205.3A)
Authority
CN
China
Prior art keywords
sample
generator
content
samples
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010567205.3A
Other languages
Chinese (zh)
Other versions
CN111881935A (en
Inventor
刘建毅
张茹
田宇
李娟
李婧雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
China Information Technology Security Evaluation Center
Original Assignee
Beijing University of Posts and Telecommunications
China Information Technology Security Evaluation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, China Information Technology Security Evaluation Center filed Critical Beijing University of Posts and Telecommunications
Priority to CN202010567205.3A priority Critical patent/CN111881935B/en
Publication of CN111881935A publication Critical patent/CN111881935A/en
Application granted granted Critical
Publication of CN111881935B publication Critical patent/CN111881935B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a method for generating adversarial samples based on a content-aware GAN. The method modifies the training process of WGAN_GP so that targeted adversarial samples are generated directly from input random noise, and adds a content feature extraction part that constrains the quality of the generated samples without weakening the attack, so that the adversarial samples preserve their content features as far as possible. The system comprises a generator G, a discriminator D, a target model f, a perturbation evaluation part and a feature extraction network: the generator produces samples from random noise and is trained against the loss functions of the discriminator D, the target model f, the perturbation evaluation part and the feature extraction network, so that it learns to generate unrestricted adversarial samples directly from noise. Built on generative adversarial networks and attentive to the semantic information of samples, the method generates adversarial samples directly rather than by superimposing perturbations; unsupervised GAN training yields adversarial samples for a specified target class, which speeds up sample generation, improves the quality of the generated samples, and reduces changes in the content feature regions of the adversarial samples while keeping a high attack success rate.

Description

Countermeasure sample generation method based on content-aware GAN
Technical Field
The invention belongs to the field of deep learning, and in particular relates to an adversarial sample generation method based on a content-aware GAN.
Background
Artificial intelligence has in recent years become a popular approach to problems in many fields, and deep learning, a branch of machine learning, has gradually become a research focus in computer vision. With the continuous development of deep neural network models, more and more training frameworks and open-source tools have appeared, and as the performance of training hardware such as GPUs keeps improving, the software and hardware needed to train complex models have become readily available. This has greatly promoted the application of deep learning in real life, and computer vision solutions have gradually entered fields with critical safety requirements. Researchers have found, however, that deep neural network models are easily affected by well-designed adversarial samples: adding only a small perturbation to a picture can make a model output a wrong classification with high confidence. This is the adversarial attack problem in deep learning.
Deep learning is widely regarded as a black-box technique: although it works well, its behavior cannot be fully explained. The essence of the adversarial attack problem is to generate, by adding a tiny perturbation to an input sample of a deep learning model, an adversarial sample that confuses the model's judgment while remaining imperceptible to humans. Generating adversarial samples must satisfy an important criterion: the perturbed sample should look like the original sample to the human eye, yet cause the model to output an incorrect result, or even a specified incorrect result.
Because of this phenomenon, adversarial attack and defense for deep learning have attracted great attention. In recent years more and more researchers have turned to the security of deep learning, proposing adversarial attack methods against existing models to demonstrate their security problems, and proposing ways to enhance model robustness against those problems. By studying such problems through adversarial samples, one can uncover directional and deep issues in deep learning models that previously went unnoticed. Research on the robustness of deep learning models helps build more robust models, so that deep learning solutions offer higher security while solving practical problems.
Effective adversarial attacks on deep learning models are an important means of analyzing model security and improving model robustness. Traditional attack methods focus on computing a perturbation of the original image: the original sample is perturbed through single-step or iterative computation to produce a new adversarial sample. However, these traditional methods generate samples slowly, require heavy computation, and are mostly white-box attacks that need information about the target model, so their applicability is narrow. The latest direction in adversarial attacks is therefore to generate adversarial samples with neural networks, in particular with generative adversarial networks. These works realize adversarial attacks in different ways, but all have problems to some degree, such as low quality of the generated adversarial samples, easily recognized perturbations, low attack success rate, or poor attack transferability.
Disclosure of Invention
The invention provides an adversarial sample generation method based on a content-aware GAN. It generates high-quality adversarial samples under content feature constraints: a content feature extraction network constrains the semantic information of the generated samples, so that the GAN generates adversarial samples close to the original sample distribution, improving the quality of the adversarial samples without weakening the attack and reducing their perceptibility to humans.
The method for generating adversarial samples based on a content-aware GAN provided by the invention comprises the following steps:
1) Adversarial samples are generated with a WGAN_GP-based generative adversarial network. Model training is split into two phases so that the generator finally learns the distribution of adversarial samples and generates targeted adversarial samples directly from input random noise z.
2) The normal training part uses noise z as generator input, with the generated samples G(z) and real samples x as discriminator input. The generator G and the discriminator D are initialized, the original WGAN_GP loss function L_GAN serves as the objective function, and the parameters of G and D are updated after each round of training, yielding a generator and a discriminator that have learned the normal sample distribution.
3) The adversarial training part lets the generator learn the distribution of adversarial samples from the noise z, starting from the generator and discriminator obtained in step 2). While continuing to optimize the WGAN_GP loss L_GAN, a target model f, a perturbation evaluation part and a feature extraction network N_feature are added to form the adversarial training structure of the model. The content features are kept as unchanged as possible while adversarial samples are generated, which improves the invisibility of the generated samples to humans: the adversarial samples can deceive the target model while retaining the original semantic features.
Further, the normal training process includes:
a) The generator is responsible for generating samples G(z) from random noise z;
b) The discriminator D is responsible for judging whether the sample picture G(z) produced by the generator or the original sample x is real, which pushes the generator to produce more realistic pictures;
c) The discriminator D updates its parameters according to the known labels of its inputs, improving its discrimination ability.
d) The loss function L_GAN of the normal training part is:
L_GAN = E_{z~p(z)}[D(G(z))] − E_{x~P_data}[D(x)] + λ E_{x̂~P_x̂}[(||∇_x̂ D(x̂)||_2 − 1)^2], where x̂ = εx + (1 − ε)G(z) with ε ~ U[0,1].
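The WGAN_GP loss can be illustrated with a minimal pure-Python sketch. A toy linear critic D(x) = w·x is assumed here purely so that the input gradient (and hence the gradient penalty) is available in closed form; a real implementation would obtain the gradient by automatic differentiation, and all numeric values below are illustrative.

```python
import math
import random

# Toy linear critic D(x) = w . x ; its gradient w.r.t. the input is just w,
# so the gradient penalty can be computed in closed form for illustration.
def critic(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def wgan_gp_critic_loss(w, real, fake, lam=10.0, eps=None):
    """Critic loss for one sample pair:
    D(fake) - D(real) + lam * (||grad_x D(x_hat)||_2 - 1)^2."""
    if eps is None:
        eps = random.random()                      # epsilon ~ U[0, 1]
    # Linear interpolation x_hat = eps * real + (1 - eps) * fake
    x_hat = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
    # For a linear critic, grad_x D(x_hat) = w regardless of x_hat.
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    penalty = lam * (grad_norm - 1.0) ** 2
    return critic(w, fake) - critic(w, real) + penalty

w = [0.6, 0.8]                 # ||w||_2 = 1.0, so the gradient penalty is zero
loss = wgan_gp_critic_loss(w, real=[1.0, 2.0], fake=[0.5, 1.5], eps=0.5)
```

With the unit-norm critic weights the penalty vanishes and the loss reduces to D(fake) − D(real).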
Further, the adversarial training process includes:
a) The generator G_adv and the discriminator D start from the models trained in the previous step;
b) The adversarial sample G_adv(z) produced by the generator G_adv is used as the input of the target model f, and the output of the adversarial part is defined as the adversarial loss, where y_target is the specified target attack class; the loss represents the distance between the target attack class and the predicted class of the generated adversarial sample G_adv(z), so as to confuse the target model. The adversarial loss is:

L_adv = log f(G_adv(z), y_target)
c) To limit the perturbation range of the adversarial sample G_adv(z), the perturbation loss measured by the perturbation evaluation part is defined as: L_perturb = ||G(z) − G_adv(z)||_2.
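The perturbation loss is a plain L2 distance between the normal and adversarial samples. A minimal sketch over flattened toy "images" (the sample values are illustrative, not from the patent):

```python
import math

def l2_perturbation_loss(g_z, g_adv_z):
    """L_perturb = || G(z) - G_adv(z) ||_2: bounds how far the adversarial
    sample drifts from the corresponding normal sample."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g_z, g_adv_z)))

normal = [0.2, 0.4, 0.6, 0.8]        # flattened toy G(z)
adversarial = [0.2, 0.5, 0.6, 1.0]   # flattened toy G_adv(z)
loss = l2_perturbation_loss(normal, adversarial)
```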
d) To constrain content features, a pre-trained VGG-16 model is used as the feature extraction network N_feature. The VGG-16 model contains 16 weight layers, divided into 5 convolution structures, with 13 convolution layers and 3 fully connected layers; several 3 × 3 convolution kernels replace larger kernels, which reduces network parameters while retaining image features. The activation of the third convolution layer of the third convolution structure of VGG-16 is selected for computing content features, G(z) and G_adv(z) are used as inputs to the feature extraction network, and the content feature loss function is built with the MSE loss:

L_content = (1/(C·H·W)) Σ_{c,h,w} (φ(G(z))_{c,h,w} − φ(G_adv(z))_{c,h,w})^2, where φ denotes the selected feature map of size C × H × W.
e) The total loss function of the adversarial training part is: L_total = L_GAN + λ1·L_adv + λ2·L_perturb + λ3·L_content, where λ1, λ2, λ3 are hyper-parameters controlling the proportions of the adversarial, perturbation and content losses during training.
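The weighted combination can be sketched as follows; the lambda values and loss inputs shown are illustrative placeholders, not values specified by the patent:

```python
def total_loss(l_gan, l_adv, l_perturb, l_content, lam1=1.0, lam2=10.0, lam3=1.0):
    """L_total = L_GAN + lam1*L_adv + lam2*L_perturb + lam3*L_content.
    The lambda weights trade off attack strength, perturbation size and
    content preservation."""
    return l_gan + lam1 * l_adv + lam2 * l_perturb + lam3 * l_content

# Toy loss values for one batch:
loss = total_loss(l_gan=-0.7, l_adv=2.0, l_perturb=0.05, l_content=0.1)
```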
The method of the invention makes the adversarial samples produced by the target generator approximate the distribution of real samples, and generates unconstrained adversarial samples directly from noise. Compared with the prior art, it has the following advantages:
1. The method trains a generative adversarial network, so no per-sample perturbation computation is needed: one training run supports repeated generation, which greatly increases the speed of sample generation.
2. The invention modifies the training process of WGAN_GP into two phases of model training, so that the generator finally learns the distribution of adversarial samples and generates targeted adversarial samples directly from input random noise z, improving the attack success rate in attacks targeting image classifiers.
3. The method generates high-quality adversarial samples under content feature constraints: the content feature extraction network constrains the semantic information of the generated samples, improving the quality of the adversarial samples without weakening the attack and reducing their perceptibility to humans.
Drawings
Fig. 1 is a flowchart of the method for generating adversarial samples based on a content-aware GAN.
Fig. 2 is a diagram of the training structure used in the normal training phase.
Fig. 3 is a diagram of the training structure used in the adversarial training phase.
Fig. 4 shows the structure of the VGG-16 feature extraction network used for content feature extraction.
Detailed Description
In order to make the aforementioned and other features and advantages of the present invention more comprehensible, embodiments are described in further detail below with reference to the accompanying figures.
The adversarial sample generation method is built on WGAN_GP as the base model. Two unsupervised training phases with different objectives are used, and the loss function of the adversarial training phase is designed so that the GAN model learns the distribution of adversarial samples from random noise, generates unrestricted adversarial samples in batches, and mounts adversarial attacks on the target model. The training process is shown in Fig. 1; the main steps are:
step 100, training the WGAN _ GP to learn the data distribution of the normal sample, and the structure diagram of the normal training phase is shown in fig. 2.
Further, step 100 specifically includes:
Step 101: input random noise z, real samples x, the batch size m, the Adam optimizer hyper-parameters α, β1, β2, the gradient penalty coefficient λ, and the number of discriminator iterations per generator update n.
Step 102: initialize the generator parameters θ0 and the discriminator parameters ω0.
Step 103: sample real data x ~ P_data, generate random noise z ~ p(z), and draw a random value ε ~ U[0,1].
Step 104: the generator produces a sample x̃ ← G_θ(z) from the random noise z.
Step 105: form the linear interpolation x̂ ← εx + (1 − ε)x̃ between the real sample x and the generated sample x̃.
Step 106: compute the gradient penalty term λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2.
Step 107: compute the discriminator loss L_D ← D_ω(x̃) − D_ω(x) + λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2.
Step 108: repeat steps 103 to 107, cycling m times over the batch.
Step 109: update the discriminator parameters ω ← Adam(∇_ω (1/m) Σ_i L_D^(i), ω, α, β1, β2).
Step 110: repeat steps 108 to 109, cycling n times.
Step 111: select m random noise vectors {z^(i)}, i = 1, …, m, z^(i) ~ p(z).
Step 112: compute the generator loss L_G = −D_ω(G_θ(z)).
Step 113: update the generator parameters θ ← Adam(∇_θ (1/m) Σ_i −D_ω(G_θ(z^(i))), θ, α, β1, β2).
Step 114: repeat steps 110 to 113 and stop training when θ converges.
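The control flow of steps 101 to 114 (n discriminator cycles per generator update) can be sketched as a loop skeleton. The callbacks, batch source and iteration counts below are placeholders, and the gradient computations themselves are deliberately elided:

```python
import random

def train_wgan_gp(critic_step, generator_step, n_critic=5, batch_size=64, max_iters=100):
    """Skeleton of the normal-training loop: for every generator update
    (step 113), the critic is updated n_critic times (steps 103-109), each
    over a batch of batch_size samples. The actual gradient math lives in
    the two callbacks; here only the loop structure is reproduced."""
    for _ in range(max_iters):
        for _ in range(n_critic):                                # step 110
            batch = [random.gauss(0, 1) for _ in range(batch_size)]  # step 103
            critic_step(batch)                                   # steps 104-109
        generator_step()                                         # steps 111-113
    # step 114: in practice, stop once the generator parameters converge

counts = {"critic": 0, "gen": 0}
train_wgan_gp(lambda b: counts.__setitem__("critic", counts["critic"] + 1),
              lambda: counts.__setitem__("gen", counts["gen"] + 1),
              n_critic=5, max_iters=10)
```

After 10 outer iterations with n_critic = 5, the critic callback has run 50 times and the generator callback 10 times, matching the n-to-1 update ratio of the algorithm.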
The invention takes the content feature x_content of the image into account by introducing the content feature extraction network N_feature. Since content features represent the semantics of an image, keeping them unchanged while generating adversarial samples gives the generated samples high invisibility to humans: the adversarial samples can deceive the target model while retaining the original semantic features.
Step 200: train the WGAN_GP to learn the data distribution of the adversarial samples; the structure of the adversarial training phase is shown in Fig. 3.
Further, step 200 specifically includes:
Step 201: input random noise z, real samples x, the batch size m, the Adam optimizer hyper-parameters α, β1, β2, the gradient penalty coefficient λ, the iteration period n, and the attack target class y_target.
Step 202: initialize the generator with the parameters θ1 obtained in step 100, and initialize the discriminator parameters ω0.
Step 203: sample real data x ~ P_data, generate random noise z ~ p(z), and draw a random value ε ~ U[0,1].
Step 204: the generator produces an adversarial sample x′ ← G_adv(z) from the random noise z.
Step 205: form the linear interpolation x̂ ← εx + (1 − ε)x′ between the real sample x and the adversarial sample x′.
Step 206: compute the gradient penalty term λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2.
Step 207: compute the discriminator loss L_D ← D_ω(x′) − D_ω(x) + λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2.
Step 208: repeat steps 203 to 207, cycling m times over the batch.
Step 209: update the discriminator parameters ω ← Adam(∇_ω (1/m) Σ_i L_D^(i), ω, α, β1, β2).
Step 210: repeat steps 208 to 209, cycling n times.
Step 211: select m random noise vectors {z^(i)}, i = 1, …, m, z^(i) ~ p(z).
Step 212: the generator produces an adversarial sample x′ ← G_adv(z) from the random noise z.
Step 213: the original generator produces a normal sample x̃ ← G(z) from the random noise z.
Step 214: compute the generator loss L_G = −D_ω(G_adv(z)).
Step 215: compute the L2 norm distance L_perturb = ||G(z) − G_adv(z)||_2.
Step 216: feed the adversarial sample x′ and the normal sample x̃ into the feature extraction network N_feature, which uses the VGG-16 network to compute the content feature loss L_content.
The structure of the VGG-16 feature extraction network is shown in FIG. 4.
Further, in step 216, the VGG-16 model contains 16 weight layers divided into 5 convolution structures, with 13 convolution layers and 3 fully connected layers; several 3 × 3 convolution kernels replace larger kernels. The feature map activations of the convolutions represent the content features of the picture:

x_content = φ_{i,j}(x)

where i and j denote the j-th feature map of the i-th convolution structure.
Further, in step 216, for two pictures x1, x2 a feature map of size C × H × W is computed, and the content feature loss function of the two pictures is:

L_content(x1, x2) = (1/(C·H·W)) Σ_{c,h,w} (φ(x1)_{c,h,w} − φ(x2)_{c,h,w})^2
Further, in step 216, the activation of the third convolution layer of the third convolution structure of VGG-16 is selected for computing the content features.
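The content feature loss of step 216 reduces to a mean squared error over a C × H × W feature map. A minimal sketch with nested lists standing in for the VGG-16 activations (the feature values are toy data, not real network outputs):

```python
def content_feature_loss(feat1, feat2):
    """MSE over a C x H x W feature map (nested lists), matching
    L_content = 1/(C*H*W) * sum((phi(x1) - phi(x2))^2). In the patent the
    feature maps come from VGG-16's third convolution structure; here they
    are placeholder values."""
    total, count = 0.0, 0
    for c1, c2 in zip(feat1, feat2):          # channels
        for r1, r2 in zip(c1, c2):            # rows
            for v1, v2 in zip(r1, r2):        # columns
                total += (v1 - v2) ** 2
                count += 1
    return total / count

f1 = [[[1.0, 2.0], [3.0, 4.0]]]   # C=1, H=2, W=2 toy feature map
f2 = [[[1.0, 2.0], [3.0, 6.0]]]
loss = content_feature_loss(f1, f2)
```

Only the last element differs (by 2), so the loss is 2^2 averaged over 4 entries.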
Step 217: input the adversarial sample x′ into the target model f and compute the classification prediction as the adversarial loss L_adv = log f(x′, y_target).
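The adversarial loss of step 217 can be illustrated with a stand-in target model that outputs class logits; a softmax followed by the log-probability of the target class follows the L_adv = log f(x′, y_target) form. The logits are toy values, and the stand-in model is an assumption for illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def adversarial_loss(logits, target_class):
    """L_adv = log f(x', y_target): log-probability the (stand-in) target
    model assigns to the attack class. Driving this value up pushes the
    sample toward being classified as y_target."""
    probs = softmax(logits)
    return math.log(probs[target_class])

logits = [2.0, 0.5, -1.0]     # toy target-model output for one sample
loss = adversarial_loss(logits, target_class=1)
```

Because the probability of any class is below 1, the loss is always negative; it approaches 0 as the model becomes confident in the target class.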
Step 218: update the generator parameters θ ← Adam(∇_θ (1/m) Σ_i L_total^(i), θ, α, β1, β2), where L_total = L_GAN + λ1·L_adv + λ2·L_perturb + λ3·L_content.
Step 219: repeat steps 210 to 218 and stop training when θ converges.
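One adversarial-phase generator update (steps 211 to 218) can be sketched end to end with every network replaced by a one-dimensional callable. The models, the lambda weights and the omitted Adam step are all placeholders for illustration:

```python
def adversarial_generator_step(g_adv, g, critic, target_model, z_batch,
                               lam1=1.0, lam2=10.0, lam3=1.0):
    """One generator update of the adversarial phase, with all models passed
    in as plain callables. Returns the average total loss over the batch;
    the Adam parameter update itself (step 218) is elided."""
    total = 0.0
    for z in z_batch:
        x_adv = g_adv(z)                        # step 212: adversarial sample
        x_norm = g(z)                           # step 213: normal sample
        l_gan = -critic(x_adv)                  # step 214: generator GAN loss
        l_perturb = abs(x_norm - x_adv)         # step 215: L2 distance (1-D toy)
        l_content = (x_norm - x_adv) ** 2       # step 216: toy content loss
        l_adv = target_model(x_adv)             # step 217: log f(x', y_target)
        total += l_gan + lam1 * l_adv + lam2 * l_perturb + lam3 * l_content
    return total / len(z_batch)

avg = adversarial_generator_step(
    g_adv=lambda z: 1.1 * z, g=lambda z: z, critic=lambda x: 0.5 * x,
    target_model=lambda x: -0.2, z_batch=[1.0, 2.0])
```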
The invention provides a method for generating adversarial samples based on a content-aware GAN. Through unsupervised training it better locates the distribution of adversarial sample data, and the added content feature extraction network lets it generate target-class adversarial samples without changing the semantic information of the samples, so the samples have better quality and better match human judgment.
Because the adversarial samples are generated directly, the method avoids the limitations of superimposing perturbations on original samples and can generate unrestricted adversarial samples in batches more quickly.
While the invention has been described with reference to specific preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. A method for generating adversarial samples based on a content-aware GAN, comprising:
A. generating adversarial samples with a WGAN_GP-based generative adversarial network, using two unsupervised training phases with different objectives: the normal training phase learns the normal sample distribution and the adversarial training part learns the distribution of adversarial samples, so that the GAN model learns the distribution of adversarial samples from random noise, generates unrestricted adversarial samples in batches, and mounts adversarial attacks on the target model;
B. a normal training part: using noise z as generator input, with generated samples G(z) and real samples x as discriminator input; initializing the generator G and the discriminator D, using the original WGAN_GP loss function L_GAN as the objective function, and updating the parameters of G and D after each round of training, to obtain a generator and a discriminator that have learned the normal sample distribution;
C. an adversarial training part: starting from the generator and discriminator obtained in the normal training part, letting the generator learn the distribution of adversarial samples from the noise z; while continuing to optimize the WGAN_GP loss L_GAN, adding a target model f, a perturbation evaluation part and a feature extraction network N_feature to form the adversarial training structure of the model, keeping content features as unchanged as possible while generating adversarial samples;
D. generating high-quality adversarial samples through content feature constraints: defining an image x and its content feature x_content, using the content feature extraction network N_feature, based on the feature extraction capability of CNNs, to constrain the semantic information of the generated samples, and introducing a new sample quality constraint loss function L_content that improves the adversarial training process of the basic attack model, improving the quality of adversarial samples without weakening the attack and reducing their perceptibility to humans.
2. The method for generating adversarial samples based on a content-aware GAN according to claim 1, wherein step B further comprises the steps of:
B1, inputting random noise z and real samples x, the generator G producing a sample x̃ ← G_θ(z) from the random noise z;
B2, feeding the generated sample x̃ obtained in B1 and the real sample x into the discriminator D, which distinguishes the generated sample x̃ from the real sample x, giving the loss function L_D of the discriminator D, specifically:

L_D = D_ω(x̃) − D_ω(x) + λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2

wherein λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2 is the gradient penalty term and λ is the gradient penalty coefficient;
B3, updating the discriminator parameters ω ← Adam(∇_ω (1/m) Σ_i L_D^(i), ω, α, β1, β2), where m is the batch size and α, β1, β2 are the Adam optimizer hyper-parameters;
B4, selecting a batch of m random noise vectors {z^(i)}, i = 1, …, m, z^(i) ~ p(z), and computing the generator loss L_G, the loss function of the generator G being specifically:

L_G = −D_ω(G_θ(z));

B5, updating the generator parameters θ ← Adam(∇_θ (1/m) Σ_i −D_ω(G_θ(z^(i))), θ, α, β1, β2) until the generator parameters θ converge, obtaining a trained generator.
3. The method for generating adversarial samples based on a content-aware GAN according to claim 1, wherein step C further comprises the steps of:
C1, inputting random noise z and real samples x, replacing initialization with the generator G and discriminator D trained in step B, the generator producing an adversarial sample x′ ← G_adv(z) from the random noise z;
C2, feeding the adversarial sample x′ obtained in C1 and the real sample x into the discriminator D, which distinguishes the adversarial sample x′ from the real sample x, giving the loss function L_D of the discriminator D, specifically:

L_D = D_ω(x′) − D_ω(x) + λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2

wherein λ(||∇_x̂ D_ω(x̂)||_2 − 1)^2 is the gradient penalty term and λ is the gradient penalty coefficient;
C3, updating the discriminator parameters ω ← Adam(∇_ω (1/m) Σ_i L_D^(i), ω, α, β1, β2), where m is the batch size and α, β1, β2 are the Adam optimizer hyper-parameters;
C4, selecting a batch of m random noise vectors {z^(i)}, i = 1, …, m, z^(i) ~ p(z), the generator G_adv producing adversarial samples x′ ← G_adv(z) and the original generator G producing normal samples x̃ ← G(z);
C5, constructing the total loss function L_total of the adversarial training part from the normal training loss L_GAN, the loss function L_adv of the target model f, the loss function L_perturb of the perturbation evaluation part, and the content feature loss function L_content;
the loss function of the target model f being specifically:

L_adv = log f(x′, y_target)

wherein y_target is the specified target attack class;
using the L2 norm distance as the measure of the perturbation evaluation part loss L_perturb, specifically:

L_perturb = ||G(z) − G_adv(z)||_2;

the content feature loss function L_content being specifically:

L_content = (1/(C·H·W)) Σ_{c,h,w} (φ(G(z))_{c,h,w} − φ(G_adv(z))_{c,h,w})^2;

the total loss function L_total of the adversarial training part being specifically:

L_total = L_GAN + λ1·L_adv + λ2·L_perturb + λ3·L_content

wherein λ1, λ2, λ3 are hyper-parameters controlling the proportions of the adversarial and perturbation losses during training;
C6, updating the generator parameters θ ← Adam(∇_θ (1/m) Σ_i L_total^(i), θ, α, β1, β2) until the generator parameters θ converge, whereupon the adversarial samples produced by the generator approach the distribution of real samples and unconstrained adversarial samples are generated directly from noise.
4. The method for generating adversarial samples based on a content-aware GAN according to claim 1, wherein step D further comprises the steps of:
D1, using a pre-trained VGG-16 model as the feature extraction network N_feature, comprising 16 weight layers divided into 5 convolution structures, with 13 convolution layers and 3 fully connected layers, several 3 × 3 convolution kernels replacing larger kernels;
D2, computing with input images of 224 × 224 resolution, taking the output of each convolution structure after the ReLU activation function and visualizing it, outputting feature maps with good content representation capability for the image;
D3, representing the content features of a picture by the convolution feature map activations, specifically:

x_content = φ_{i,j}(x)

wherein i and j denote the j-th feature map of the i-th convolution structure, the activation of the third convolution layer of the third convolution structure of VGG-16 being selected for computing the content features;
D4, for two pictures x1, x2, computing a feature map of size C × H × W, the content feature loss function of the two pictures being:

L_content(x1, x2) = (1/(C·H·W)) Σ_{c,h,w} (φ(x1)_{c,h,w} − φ(x2)_{c,h,w})^2
CN202010567205.3A 2020-06-19 2020-06-19 Countermeasure sample generation method based on content-aware GAN Active CN111881935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010567205.3A CN111881935B (en) 2020-06-19 2020-06-19 Countermeasure sample generation method based on content-aware GAN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010567205.3A CN111881935B (en) 2020-06-19 2020-06-19 Countermeasure sample generation method based on content-aware GAN

Publications (2)

Publication Number Publication Date
CN111881935A CN111881935A (en) 2020-11-03
CN111881935B true CN111881935B (en) 2023-04-18

Family

ID=73157811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010567205.3A Active CN111881935B (en) 2020-06-19 2020-06-19 Countermeasure sample generation method based on content-aware GAN

Country Status (1)

Country Link
CN (1) CN111881935B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328750A (en) * 2020-11-26 2021-02-05 上海天旦网络科技发展有限公司 Method and system for training text discrimination model
CN112751828B (en) * 2020-12-14 2022-10-14 北京中电飞华通信有限公司 Loss evaluation method and device for network attack event and electronic equipment
CN112541557B (en) * 2020-12-25 2024-04-05 北京百度网讯科技有限公司 Training method and device for generating countermeasure network and electronic equipment
CN112837670B (en) * 2021-01-19 2024-05-10 北京捷通华声科技股份有限公司 Speech synthesis method and device and electronic equipment
CN112949822B (en) * 2021-02-02 2023-08-04 中国人民解放军陆军工程大学 Low-perceptibility countermeasure sample composition method based on dual-attention mechanism
CN112989361B (en) * 2021-04-14 2023-10-20 华南理工大学 Model security detection method based on generation countermeasure network
CN113158390B (en) * 2021-04-29 2023-03-24 北京邮电大学 Network attack traffic generation method for generating countermeasure network based on auxiliary classification
CN113344814A (en) * 2021-06-03 2021-09-03 安徽理工大学 High-resolution countermeasure sample synthesis method based on generation mechanism
CN113395280B (en) * 2021-06-11 2022-07-26 成都为辰信息科技有限公司 Anti-confusion network intrusion detection method based on generation countermeasure network
CN113221388B (en) * 2021-06-17 2022-06-28 北京理工大学 Method for generating confrontation sample of black box depth model constrained by visual perception disturbance
CN113537467B (en) * 2021-07-15 2023-08-18 南京邮电大学 Anti-disturbance image generation method based on WGAN-GP
CN114241569B (en) * 2021-12-21 2024-01-02 中国电信股份有限公司 Face recognition attack sample generation method, model training method and related equipment
CN114301667B (en) * 2021-12-27 2024-01-30 杭州电子科技大学 Network security unbalance data set analysis method based on WGAN dynamic punishment
CN114419379A (en) * 2022-03-30 2022-04-29 浙江大学 System and method for improving fairness of deep learning model based on antagonistic disturbance

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11113599B2 (en) * 2017-06-22 2021-09-07 Adobe Inc. Image captioning utilizing semantic text modeling and adversarial learning
CN107464210B (en) * 2017-07-06 2020-02-21 浙江工业大学 Image style migration method based on generating type countermeasure network
CN109598279B (en) * 2018-09-27 2023-04-25 天津大学 Zero sample learning method based on self-coding countermeasure generation network
CN110598400B (en) * 2019-08-29 2021-03-05 浙江工业大学 Defense method for high hidden poisoning attack based on generation countermeasure network and application
CN111242166A (en) * 2019-12-30 2020-06-05 南京航空航天大学 Universal countermeasure disturbance generation method

Also Published As

Publication number Publication date
CN111881935A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN111881935B (en) Countermeasure sample generation method based on content-aware GAN
Chen et al. Adversarial attack and defense in reinforcement learning-from AI security view
CN109639710B (en) Network attack defense method based on countermeasure training
CN110941794A (en) Anti-attack defense method based on universal inverse disturbance defense matrix
CN112883874B (en) Active defense method aiming at deep face tampering
CN112580728B (en) Dynamic link prediction model robustness enhancement method based on reinforcement learning
Ying et al. Human ear recognition based on deep convolutional neural network
CN114861838B (en) Intelligent classification method for pulsatile neural brains based on neuron complex dynamics
Liu et al. Adversaries or allies? Privacy and deep learning in big data era
CN114758198A (en) Black box attack method and system for resisting disturbance based on meta-learning
Hashemi et al. CNN adversarial attack mitigation using perturbed samples training
Wang et al. Generating semantic adversarial examples via feature manipulation
CN114626042A (en) Face verification attack method and device
CN114240951A (en) Black box attack method of medical image segmentation neural network based on query
CN113435264A (en) Face recognition attack resisting method and device based on black box substitution model searching
CN117011508A (en) Countermeasure training method based on visual transformation and feature robustness
CN115510986A (en) Countermeasure sample generation method based on AdvGAN
Raja et al. Kapur’ s Entropy and Cuckoo Search Algorithm Assisted Segmentation and Analysis of RGB Images
Roh Impact of adversarial training on the robustness of deep neural networks
CN115238271A (en) AI security detection method based on generative learning
Diao et al. Understanding the vulnerability of skeleton-based Human Activity Recognition via black-box attack
Kang et al. Comparison of weight initialization techniques for deep neural networks
CN113344814A (en) High-resolution countermeasure sample synthesis method based on generation mechanism
CN113283520B (en) Feature enhancement-based depth model privacy protection method and device for membership inference attack
Kakar et al. Image Segmentation using hybrid PSO-FCM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant