CN110334806A - A method of adversarial sample generation based on generative adversarial network - Google Patents

A method of adversarial sample generation based on generative adversarial network Download PDF

Info

Publication number
CN110334806A
CN110334806A (application CN201910459852.XA)
Authority
CN
China
Prior art keywords
sample
loss function
generator
adversarial
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910459852.XA
Other languages
Chinese (zh)
Inventor
贾西平
陈桂君
方刚
陈道鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN201910459852.XA priority Critical patent/CN110334806A/en
Publication of CN110334806A publication Critical patent/CN110334806A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for generating adversarial examples based on a generative adversarial network, involving a generator G, a discriminator D, a spatial transformation module ST and a target classification network F. The generator G produces a perturbation, which is superimposed on the original sample to yield an adversarial example; the generator G is then trained according to the loss functions of the discriminator D and the target classification network F, finally yielding a trained generator G that produces adaptive adversarial examples for different input samples. By using a generative adversarial network, embedding an enhancement module based on spatial transformation, and performing adversarial training in an unsupervised manner, the invention improves the generalization ability and robustness of the attack model, and in turn enhances the transferability and robustness of the adversarial examples.

Description

A Method of Adversarial Example Generation Based on a Generative Adversarial Network

Technical Field

The present invention relates to the field of machine learning, and more specifically, to a method for generating adversarial examples based on a generative adversarial network.

Background Art

Adversarial attacks are a hot research topic in machine learning. An adversarial attack deceives a deep neural network into making wrong decisions by means of adversarial examples, i.e., new samples obtained by adding carefully trained perturbations, imperceptible to the human eye, to the original data samples.

Most current attack algorithms against deep neural networks (such as gradient-based and optimization-based methods) target the test process or the test dataset and require continuous white-box access to the model's architecture and parameters (for example, obtaining the gradient with respect to the input requires knowing the weights of the target network). However, current deep learning systems usually do not allow white-box access to the model for security reasons; they only allow query access, treating the model as a black box. Attacks under these conditions are called black-box attacks, but the success rate of most current black-box attacks is low, because most black-box attack methods rely on the transferability of adversarial examples. Transferability is a common property of adversarial examples: adversarial examples generated from a limited set of samples also attack other variable domains effectively.

Transferability is crucial in black-box attacks, where the target network structure and training dataset are unavailable. How to efficiently generate adversarial examples with strong transferability and stable attack performance is therefore a meaningful and challenging problem.

In summary, existing studies have demonstrated that current attack methods transfer to some extent between neural networks of the same structure trained on different data, and between neural networks of different structures trained on the same task; see [1] Goodfellow I J, Shlens J, Szegedy C, et al. Explaining and Harnessing Adversarial Examples [J]. International Conference on Learning Representations, 2015; [2] Kurakin A, Goodfellow I J, Bengio S, et al. Adversarial examples in the physical world [J]. arXiv: Computer Vision and Pattern Recognition, 2017; [3] Moosavi-Dezfooli S, Fawzi A, Frossard P, et al. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks [J]. Computer Vision and Pattern Recognition, 2016: 2574-2582; and [4] Xiao C, Li B, Zhu J Y, et al. Generating Adversarial Examples with Adversarial Networks [J]. 2018. Nevertheless, adversarial examples still depend too heavily on the target model, which leads to poor transferability, low attack success rates and low attack efficiency.

Summary of the Invention

The present invention provides a method for generating adversarial examples based on a generative adversarial network, which adaptively generates transferable and robust adversarial examples for different input samples.

To solve the above technical problem, the technical solution of the present invention is as follows:

A method for generating adversarial examples based on a generative adversarial network, comprising the following steps:

S1: The original sample x is input into a generator G, which outputs a perturbation G(x); the loss function of the generator G is L_G. The perturbation G(x) is superimposed on the original sample x to obtain the adversarial example x′ = x + G(x). Unlike a general GAN, the goal of the generator is to produce a perturbation rather than the final image; that is, the output image equals the input image plus the output of the generator G, so the details and texture of the generated adversarial example are copied from the input image, largely preserving the details of the original image (a code sketch of the forward pass of steps S1 to S4 is given after step S6);

S2: The adversarial example x′ obtained in S1 is input into a discriminator D, which distinguishes the adversarial example x′ from the original sample x, yielding the loss function L_D of the discriminator D;

S3: The adversarial example x′ obtained in S1 is input into the enhancement module ST, which is based on spatial transformation and applies a spatial transformation operation to the adversarial example x′; the enhancement module ST outputs the affine-transformed adversarial example x′_st = T_θ(x + G(x)), where T_θ is the transformation function;

S4: The affine-transformed adversarial example x′_st is input into the target classification model F, yielding the loss function L_F of the target classification model F;

S5: An objective function L_GAN is constructed from the loss function L_G of the generator G, the loss function L_D of the discriminator D and the loss function L_F of the target classification model F, and is used to train the attack model GAN, yielding the trained generator G;

S6: The trained generator G is used to generate adaptive adversarial examples for different input samples.
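
For concreteness, the following is a minimal sketch of the forward pass of steps S1 to S4, written in PyTorch. The network architectures, the clamping of the adversarial example to the valid pixel range, and the random sampling of the affine parameters of T_θ are illustrative assumptions; the patent does not fix these details.

```python
# Sketch of the forward pass of steps S1-S4 (assumed PyTorch implementation).
import torch
import torch.nn.functional as F

def forward_pass(G, D, target_model, x):
    # S1: the generator outputs a perturbation, not the final image
    perturbation = G(x)
    x_adv = torch.clamp(x + perturbation, 0.0, 1.0)   # x' = x + G(x), kept in pixel range

    # S2: discriminator scores for the original and the adversarial sample
    d_real, d_fake = D(x), D(x_adv)

    # S3: enhancement module ST, here a small random affine transform T_theta
    n = x.size(0)
    theta = torch.eye(2, 3, device=x.device).repeat(n, 1, 1)
    theta = theta + 0.05 * torch.randn_like(theta)    # jitter scale is an assumption
    grid = F.affine_grid(theta, x_adv.size(), align_corners=False)
    x_adv_st = F.grid_sample(x_adv, grid, align_corners=False)

    # S4: the transformed adversarial sample is fed to the target classifier F
    logits = target_model(x_adv_st)
    return x_adv, d_real, d_fake, logits
```

Because F.affine_grid and F.grid_sample are differentiable, gradients from the classifier loss flow back through the spatial transformation into the generator, which is what allows the ST module to act as an augmentation during training.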

Preferably, the loss function of the generator G uses the L2 norm as a distance metric loss, expressed as follows:

L_G = max(0, ||G(x)||_2 - c)

where c is a user-defined constant that allows the user to specify the amount of perturbation added. Varying c produces a wide variety of adversarial examples, which helps in understanding the feature space of adversarial examples; this loss also stabilizes GAN training.
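
As an illustration, this hinge-style bound on the perturbation norm can be computed as follows (a sketch; the per-sample flattening and batch averaging are assumptions):

```python
import torch

def generator_loss(perturbation: torch.Tensor, c: float) -> torch.Tensor:
    # L_G = max(0, ||G(x)||_2 - c): penalize perturbation norms exceeding the budget c
    norms = perturbation.flatten(1).norm(p=2, dim=1)  # per-sample L2 norm
    return torch.clamp(norms - c, min=0.0).mean()     # hinge, averaged over the batch
```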

Preferably, the discriminator D is a binary neural network classifier.

Preferably, the loss function of the discriminator D is specifically:

L_D = log D(x) + log(1 - D(x + G(x))).
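
The following sketch computes this quantity; since the discriminator maximizes L_D, the value is returned negated so that a standard minimizing optimizer can be used. The numerical epsilon is an implementation detail not in the patent.

```python
import torch

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # L_D = log D(x) + log(1 - D(x + G(x))); the discriminator maximizes this,
    # so the negation is returned for use with a minimizing optimizer.
    eps = 1e-8  # numerical safety only
    return -(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps)).mean()
```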

Preferably, in a targeted attack, the loss function of the target classification model F is:

L_F = L(F(T_θ(x + G(x))), y′)

which represents the distance between the predicted class and the target class y′, where L is the cross-entropy loss function;

In an untargeted attack, the loss function of the target classification model F is:

L_F = -L(F(T_θ(x + G(x))), y)

which represents the negative distance between the predicted class and the original label class y, where L is the cross-entropy loss function.
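
Both cases reduce to a signed cross-entropy term, as in this sketch (the targeted/untargeted switch and the argument names are assumptions):

```python
import torch
import torch.nn.functional as F

def target_model_loss(logits: torch.Tensor, labels: torch.Tensor,
                      targeted: bool) -> torch.Tensor:
    if targeted:
        # L_F = L(F(T_theta(x + G(x))), y'): pull the prediction toward target class y'
        return F.cross_entropy(logits, labels)
    # L_F = -L(F(T_theta(x + G(x))), y): push the prediction away from true class y
    return -F.cross_entropy(logits, labels)
```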

Preferably, the objective function L_GAN is expressed as:

L_GAN = L_F + α·L_D + β·L_G

where α and β are constants controlling the relative importance of each term: L_G generates small perturbations, L_D encourages the generated adversarial examples to look similar to the original samples, and L_F optimizes the adversarial examples to raise the attack success rate. The generator G and the discriminator D are obtained by solving the min-max problem arg min_G max_D L_GAN, minimizing the generator loss function while maximizing the discriminator loss function.
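
A single alternating update of this min-max problem might look as follows, reusing the helper functions sketched above; the optimizer choice and the values of α, β and c are assumptions, as the patent leaves them as hyperparameters.

```python
def training_step(G, D, target_model, x, y, opt_G, opt_D,
                  alpha, beta, c, targeted=False, y_target=None):
    # Discriminator step: maximize L_D (minimize its negation)
    x_adv, d_real, d_fake, _ = forward_pass(G, D, target_model, x)
    loss_D = discriminator_loss(d_real, d_fake)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: minimize L_GAN = L_F + alpha * L_D + beta * L_G,
    # where the generator takes the opposite sign on the discriminator term
    x_adv, d_real, d_fake, logits = forward_pass(G, D, target_model, x)
    loss_F = target_model_loss(logits, y_target if targeted else y, targeted)
    loss_D_for_G = -discriminator_loss(d_real, d_fake)
    loss_G_term = generator_loss(x_adv - x, c)  # effective perturbation after clamping
    loss_GAN = loss_F + alpha * loss_D_for_G + beta * loss_G_term
    opt_G.zero_grad()
    loss_GAN.backward()
    opt_G.step()
    return loss_D.item(), loss_GAN.item()
```

Note that the generator's grad buffers are zeroed before its own backward pass, so any gradients that leaked into G during the discriminator step do not affect the generator update.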

Preferably, step S5 further comprises testing the trained generator G, specifically including the following steps (a code sketch follows step S5.2):

S5.1: The trained generator G is used to generate perturbations, thereby producing test adversarial examples, which are input into target classification networks of different structures to cause misclassification;

S5.2: A spatial transformation is applied to the test adversarial examples of S5.1 to produce new test adversarial examples, which are input into the target classification network to cause misclassification.
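
A testing loop along the lines of S5.1 and S5.2 could be sketched as follows, where `victim` stands in for a target classification network (possibly of a different structure than the one used in training) and `apply_st=True` re-applies a random affine transform; the helper interface is an assumption.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def attack_success_rate(G, victim, loader, apply_st=False):
    fooled, total = 0, 0
    for x, y in loader:
        x_adv = torch.clamp(x + G(x), 0.0, 1.0)       # S5.1: adversarial test samples
        if apply_st:                                   # S5.2: spatially transformed copies
            theta = torch.eye(2, 3).repeat(x.size(0), 1, 1)
            theta = theta + 0.05 * torch.randn_like(theta)
            grid = F.affine_grid(theta, x_adv.size(), align_corners=False)
            x_adv = F.grid_sample(x_adv, grid, align_corners=False)
        preds = victim(x_adv).argmax(dim=1)
        fooled += (preds != y).sum().item()
        total += y.numel()
    return fooled / total  # fraction of samples the victim network misclassifies
```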

Compared with the prior art, the beneficial effects of the technical solution of the present invention are:

Compared with existing attack algorithms, the method proposed by the present invention can efficiently generate attack samples for different input samples without accessing the original target classification model; query and generation are efficient, the attack model generalizes well and is robust, and the transferability and robustness of the adversarial examples are effectively improved, which in turn raises the black-box attack success rate. The method is widely applicable and highly general, achieving high attack success rates on datasets of different types and on models of different structures.

Description of Drawings

Fig. 1 is a flow chart of the method for generating adversarial examples based on a generative adversarial network.

Fig. 2 is a schematic diagram of the model of the method for generating adversarial examples based on a generative adversarial network; the dotted lines represent the training process and the solid lines represent the testing process.

Fig. 3 is a flow chart of the black-box attack robustness test.

Detailed Description of Embodiments

The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent;

To better illustrate this embodiment, some parts in the drawings are omitted, enlarged or reduced; they do not represent the dimensions of the actual product;

Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.

The technical solution of the present invention is further described below with reference to the drawings and embodiments.

Example 1

A method for generating adversarial examples based on a generative adversarial network, as shown in Figs. 1 and 2, comprising the following steps:

S1: The original sample x is input into a generator G, which outputs a perturbation G(x); the loss function of the generator G is L_G. The perturbation G(x) is superimposed on the original sample x to obtain the adversarial example x′ = x + G(x). Unlike a general GAN, the goal of the generator is to produce a perturbation rather than the final image; that is, the output image equals the input image plus the output of the generator G, so the details and texture of the generated adversarial example are copied from the input image, largely preserving the details of the original image. The loss function of the generator G uses the L2 norm as a distance metric loss, expressed as follows:

L_G = max(0, ||G(x)||_2 - c)

where c is a user-defined constant;

S2: The adversarial example x′ obtained in S1 is input into a discriminator D, which distinguishes the adversarial example x′ from the original sample x; the discriminator D is a binary neural network classifier, and its loss function is L_D = log D(x) + log(1 - D(x + G(x)));

S3: The adversarial example x′ obtained in S1 is input into the enhancement module ST, which is based on spatial transformation and applies a spatial transformation operation to the adversarial example x′; the enhancement module ST outputs the affine-transformed adversarial example x′_st = T_θ(x + G(x)), where T_θ is the transformation function;

S4: The affine-transformed adversarial example x′_st is input into the target classification model F, yielding the loss function L_F of the target classification model F. In a targeted attack, the loss function of the target classification model F is:

L_F = L(F(T_θ(x + G(x))), y′)

which represents the distance between the predicted class and the target class y′, where L is the cross-entropy loss function;

In an untargeted attack, the loss function of the target classification model F is:

L_F = -L(F(T_θ(x + G(x))), y)

which represents the negative distance between the predicted class and the original label class y, where L is the cross-entropy loss function;

S5: An objective function L_GAN is constructed from the loss function L_G of the generator G, the loss function L_D of the discriminator D and the loss function L_F of the target classification model F, and is used to train the attack model GAN, yielding the trained generator G and discriminator D; the objective function L_GAN is expressed as:

L_GAN = L_F + α·L_D + β·L_G

where α and β are constants controlling the relative importance of each term: L_G generates small perturbations, L_D encourages the generated adversarial examples to look similar to the original samples, and L_F optimizes the adversarial examples to raise the attack success rate;

The method further includes testing the trained generator G, specifically comprising the following steps:

S5.1: The trained generator G is used to generate perturbations, thereby producing test adversarial examples, which are input into target classification networks of different structures to cause misclassification;

S5.2: A spatial transformation is applied to the test adversarial examples of S5.1 to produce new test adversarial examples, which are input into the target classification network to cause misclassification.

S6: The trained generator G is used to generate adaptive adversarial examples for different input samples.

In a specific implementation, a black-box attack is taken as an example for the robustness test; the specific flow is shown in Fig. 3.

1) Select the attack target F. ResNet-18, ResNet-34 and VGG-16 are trained on the CIFAR-10 dataset, and VGG-16 and a Multi-Scale CNN are trained on the GTSRB dataset, yielding two groups of five target models F = {F1, F2, F3, F4, F5}. Among them, ResNet-34 and VGG-16 serve as the gray-box and black-box models, respectively, for testing the adversarial examples.

2) Data preprocessing. To exclude the influence of classification errors caused by the performance of the network itself, the samples that the target classification network can classify correctly are selected as the original samples for generating adversarial examples.

3) Generate adversarial examples. The attack model is built according to the training process of Fig. 2 and is used to generate adversarial examples.

4) Test the effectiveness of the adversarial examples. If the generated adversarial examples can successfully deceive the target classification network F into misclassifying, the attack method of this embodiment is effective.

5) Test the transferability of the adversarial examples. If the generated adversarial examples can simultaneously deceive target classification networks F1 and F2 of different structures into misclassifying, the adversarial examples transfer well; otherwise, they transfer poorly. A higher attack success rate than the adversarial examples generated by the FGSM, BIM, DeepFool and advGAN methods indicates that the method of this embodiment effectively improves the transferability of adversarial examples.

6) Test the robustness of the adversarial examples. A spatial transformation is applied to the adversarial examples generated in step 3) to produce new adversarial examples; if these still successfully deceive the target classification network F2 of step 1), the generated adversarial examples are robust. A higher attack success rate than the adversarial examples generated by the FGSM, BIM, DeepFool and advGAN methods indicates that this implementation effectively improves the robustness of adversarial examples.

The experimental results on the CIFAR-10 dataset are shown in Table 1:

Table 1

The experimental results on the GTSRB dataset are shown in Table 2:

Table 2

The same or similar reference numerals correspond to the same or similar components;

Terms describing positional relationships in the drawings are for illustrative purposes only and shall not be construed as limiting this patent;

Obviously, the above embodiments of the present invention are merely examples for clearly illustrating the present invention, and do not limit the implementations of the present invention. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to exhaust all implementations here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (7)

1. A method for generating adversarial examples based on a generative adversarial network, characterized by comprising the following steps:
S1: inputting an original sample x into a generator G, the generator G outputting a perturbation G(x), the loss function of the generator G being L_G, and superimposing the perturbation G(x) on the original sample x to obtain an adversarial example x′ = x + G(x);
S2: inputting the adversarial example x′ obtained in S1 into a discriminator D, the discriminator D distinguishing the adversarial example x′ from the original sample x, to obtain the loss function L_D of the discriminator D;
S3: inputting the adversarial example x′ obtained in S1 into an enhancement module ST, the enhancement module ST applying a spatial transformation operation to the adversarial example x′, and outputting the affine-transformed adversarial example x′_st = T_θ(x + G(x)), where T_θ is the transformation function;
S4: inputting the affine-transformed adversarial example x′_st into a target classification model F, to obtain the loss function L_F of the target classification model F;
S5: constructing an objective function L_GAN from the loss function L_G of the generator G, the loss function L_D of the discriminator D and the loss function L_F of the target classification model F, for training the attack model GAN, to obtain a trained generator G;
S6: using the trained generator G to generate adaptive adversarial examples for different input samples.
2. The method for generating adversarial examples based on a generative adversarial network according to claim 1, characterized in that the loss function of the generator G uses the L2 norm as a distance metric loss, expressed as follows:
L_G = max(0, ||G(x)||_2 - c)
where c is a user-defined constant.
3. The method for generating adversarial examples based on a generative adversarial network according to claim 2, characterized in that the discriminator D is a binary neural network classifier.
4. The method for generating adversarial examples based on a generative adversarial network according to claim 3, characterized in that the loss function of the discriminator D is specifically:
L_D = log D(x) + log(1 - D(x + G(x))).
5. The method for generating adversarial examples based on a generative adversarial network according to claim 4, characterized in that, in a targeted attack, the loss function of the target classification model F is:
L_F = L(F(T_θ(x + G(x))), y′)
representing the distance between the predicted class and the target class y′, where L is the cross-entropy loss function;
and in an untargeted attack, the loss function of the target classification model F is:
L_F = -L(F(T_θ(x + G(x))), y)
representing the negative distance between the predicted class and the original label class y, where L is the cross-entropy loss function.
6. The method for generating adversarial examples based on a generative adversarial network according to claim 5, characterized in that the objective function L_GAN is expressed as:
L_GAN = L_F + α·L_D + β·L_G
where α and β are constants, and the generator G and the discriminator D are obtained by solving the min-max problem arg min_G max_D L_GAN, minimizing the generator loss function while maximizing the discriminator loss function.
7. The method for generating adversarial examples based on a generative adversarial network according to claim 6, characterized in that step S5 further comprises testing the trained generator G, specifically comprising the following steps:
S5.1: using the trained generator G to generate perturbations, thereby producing test adversarial examples, and inputting the test adversarial examples into target classification networks of different structures to cause misclassification;
S5.2: applying a spatial transformation to the test adversarial examples of S5.1 to produce new test adversarial examples, and inputting the new test adversarial examples into the target classification network to cause misclassification.
CN201910459852.XA 2019-05-29 2019-05-29 A method of adversarial sample generation based on generative adversarial network Pending CN110334806A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910459852.XA CN110334806A (en) 2019-05-29 2019-05-29 A method of adversarial sample generation based on generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910459852.XA CN110334806A (en) 2019-05-29 2019-05-29 A method of adversarial sample generation based on generative adversarial network

Publications (1)

Publication Number Publication Date
CN110334806A true CN110334806A (en) 2019-10-15

Family

ID=68140522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910459852.XA Pending CN110334806A (en) 2019-05-29 2019-05-29 A method of adversarial sample generation based on generative adversarial network

Country Status (1)

Country Link
CN (1) CN110334806A (en)

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110768971A (en) * 2019-10-16 2020-02-07 伍军 Confrontation sample rapid early warning method and system suitable for artificial intelligence system
CN112884143B (en) * 2019-11-29 2024-05-14 北京四维图新科技股份有限公司 Method for training robust deep neural network model
CN112884143A (en) * 2019-11-29 2021-06-01 北京四维图新科技股份有限公司 Method for training robust deep neural network model
CN114548300A (en) * 2019-12-20 2022-05-27 支付宝(杭州)信息技术有限公司 Method and device for explaining service processing result of service processing model
CN114548300B (en) * 2019-12-20 2024-05-28 支付宝(杭州)信息技术有限公司 Method and device for explaining service processing result of service processing model
CN111160217A (en) * 2019-12-25 2020-05-15 中山大学 Method and system for generating confrontation sample of pedestrian re-identification system
CN111160217B (en) * 2019-12-25 2023-06-23 中山大学 A method and system for generating adversarial samples for a pedestrian re-identification system
CN111210002A (en) * 2019-12-30 2020-05-29 北京航空航天大学 A method and system for multi-layer academic network community discovery based on generative adversarial network model
CN111163472A (en) * 2019-12-30 2020-05-15 浙江工业大学 Signal identification attack defense method based on generative countermeasure network
CN111163472B (en) * 2019-12-30 2022-10-04 浙江工业大学 Signal identification attack defense method based on generative countermeasure network
CN111210002B (en) * 2019-12-30 2022-01-28 北京航空航天大学 Multi-layer academic network community discovery method and system based on generation of confrontation network model
CN111241287A (en) * 2020-01-16 2020-06-05 支付宝(杭州)信息技术有限公司 Training method and device for generating generation model of confrontation text
CN111310802A (en) * 2020-01-20 2020-06-19 星汉智能科技股份有限公司 An Adversarial Attack Defense Training Method Based on Generative Adversarial Networks
CN111275115A (en) * 2020-01-20 2020-06-12 星汉智能科技股份有限公司 A Generative Adversarial Network-Based Adversarial Attack Sample Generation Method
CN111340066B (en) * 2020-02-10 2022-05-31 电子科技大学 An Adversarial Sample Generation Method Based on Geometric Vectors
CN111340066A (en) * 2020-02-10 2020-06-26 电子科技大学 An Adversarial Sample Generation Method Based on Geometric Vectors
US20220198790A1 (en) * 2020-02-21 2022-06-23 Tencent Technology (Shenzhen) Company Limited Training method and apparatus of adversarial attack model, generating method and apparatus of adversarial image, electronic device, and storage medium
CN111539184A (en) * 2020-04-29 2020-08-14 上海眼控科技股份有限公司 Text data manufacturing method and device based on deep learning, terminal and storage medium
CN111898645A (en) * 2020-07-03 2020-11-06 贵州大学 A transferable adversarial example attack method based on attention mechanism
CN111967592B (en) * 2020-07-09 2023-12-05 中国电子科技集团公司第三十六研究所 A method for generating adversarial image machine recognition based on separating positive and negative perturbations
CN111967592A (en) * 2020-07-09 2020-11-20 中国电子科技集团公司第三十六研究所 Method for generating counterimage machine recognition based on positive and negative disturbance separation
CN111967584A (en) * 2020-08-19 2020-11-20 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer storage medium for generating countermeasure sample
CN111738374A (en) * 2020-08-28 2020-10-02 北京智源人工智能研究院 Multi-sample adversarial perturbation generation method, device, storage medium and computing device
CN111738374B (en) * 2020-08-28 2020-11-24 北京智源人工智能研究院 Multi-sample adversarial perturbation generation method, device, storage medium and computing device
CN111818101A (en) * 2020-09-09 2020-10-23 平安国际智慧城市科技股份有限公司 Network security detection method and device, computer equipment and storage medium
CN111818101B (en) * 2020-09-09 2020-12-11 平安国际智慧城市科技股份有限公司 Network security detection method and device, computer equipment and storage medium
CN112162515B (en) * 2020-10-10 2021-08-03 浙江大学 An Adversarial Attack Method for Process Monitoring System
CN112162515A (en) * 2020-10-10 2021-01-01 浙江大学 An Adversarial Attack Method for Process Monitoring System
US20220180203A1 (en) * 2020-12-03 2022-06-09 International Business Machines Corporation Generating data based on pre-trained models using generative adversarial models
US12307377B2 (en) * 2020-12-03 2025-05-20 International Business Machines Corporation Generating data based on pre-trained models using generative adversarial models
WO2022116743A1 (en) * 2020-12-03 2022-06-09 International Business Machines Corporation Generating data based on pre-trained models using generative adversarial models
GB2617722A (en) * 2020-12-03 2023-10-18 Ibm Generating data based on pre-trained models using generative adversarial models
CN112818407B (en) * 2021-04-16 2021-06-22 中国工程物理研究院计算机应用研究所 Video privacy protection method based on generation countermeasure network
CN112818407A (en) * 2021-04-16 2021-05-18 中国工程物理研究院计算机应用研究所 Video privacy protection method based on generation countermeasure network
CN113158190A (en) * 2021-04-30 2021-07-23 河北师范大学 Malicious code countermeasure sample automatic generation method based on generation type countermeasure network
CN113177599B (en) * 2021-05-10 2023-11-21 南京信息工程大学 An enhanced sample generation method based on GAN
CN113177599A (en) * 2021-05-10 2021-07-27 南京信息工程大学 Enhanced sample generation method based on GAN
CN113361594A (en) * 2021-06-03 2021-09-07 安徽理工大学 Countermeasure sample generation method based on generation model
CN113361594B (en) * 2021-06-03 2023-10-20 安徽理工大学 Countermeasure sample generation method based on generation model
CN113222480A (en) * 2021-06-11 2021-08-06 支付宝(杭州)信息技术有限公司 Training method and device for confrontation sample generation model
CN113222480B (en) * 2021-06-11 2023-05-12 支付宝(杭州)信息技术有限公司 Training method and device for challenge sample generation model
CN113395280B (en) * 2021-06-11 2022-07-26 成都为辰信息科技有限公司 Anti-confusion network intrusion detection method based on generation countermeasure network
CN113395280A (en) * 2021-06-11 2021-09-14 成都为辰信息科技有限公司 Anti-confusion network intrusion detection method based on generation of countermeasure network
CN113505886A (en) * 2021-07-08 2021-10-15 深圳市网联安瑞网络科技有限公司 Countermeasure sample generation method, system, terminal and medium based on fuzzy test
CN113642772A (en) * 2021-07-13 2021-11-12 重庆科技学院 Reservoir identification and prediction method based on machine learning
CN113537381B (en) * 2021-07-29 2024-05-10 大连海事大学 Human rehabilitation exercise data enhancement method based on adversarial samples
CN113537381A (en) * 2021-07-29 2021-10-22 大连海事大学 Human Rehabilitation Motion Data Enhancement Method Based on Adversarial Samples
CN114332623A (en) * 2021-12-30 2022-04-12 广东工业大学 Method and system for generating countermeasure sample by utilizing spatial transformation
CN114663946A (en) * 2022-03-21 2022-06-24 中国电信股份有限公司 Countermeasure sample generation method, apparatus, device and medium
CN114708136A (en) * 2022-03-24 2022-07-05 南京信息工程大学 Black box reversible countermeasure sample generation method for model authorization access control
CN115115899A (en) * 2022-03-29 2022-09-27 浙大城市学院 Method, device and system for enhancing robustness of deep neural network and electronic equipment

Similar Documents

Publication Publication Date Title
CN110334806A (en) A method of adversarial sample generation based on generative adversarial network
Zhang et al. Defense against adversarial attacks using feature scattering-based adversarial training
Rozsa et al. Are accuracy and robustness correlated
CN110348475B (en) Confrontation sample enhancement method and model based on spatial transformation
Rozsa et al. LOTS about attacking deep features
He et al. Transferable sparse adversarial attack
CN114387449A (en) An image processing method and system for dealing with neural network adversarial attacks
CN113240080A (en) Prior class enhancement based confrontation training method
Wang et al. Adversarial detection by latent style transformations
CN113988293A (en) A method of adversarial generative network combining different levels of functions
Xu et al. ASQ-FastBM3D: An adaptive denoising framework for defending adversarial attacks in machine learning enabled systems
Wang et al. Understanding universal adversarial attack and defense on graph
CN113935396A (en) Adversarial sample attack method and related device based on manifold theory
CN117171762A (en) Single-step countermeasure training method and system based on data enhancement and step adjustment
Wang et al. Generating semantic adversarial examples via feature manipulation
Chen et al. Diffilter: Defending against adversarial perturbations with diffusion filter
Dong et al. Generalizable and discriminative representations for adversarially robust few-shot learning
Zhuang et al. PCAD: Towards ASR-robust spoken language understanding via prototype calibration and asymmetric decoupling
Yin et al. Adversarial attack, defense, and applications with deep learning frameworks
CN114970809A (en) Picture countermeasure sample generation method based on generation type countermeasure network
Zhang et al. Boosting Deepfake Detection Generalizability via Expansive Learning and Confidence Judgement
CN117011508A (en) Countermeasure training method based on visual transformation and feature robustness
Noroozi et al. Virtual adversarial training for semi-supervised verification tasks
Mukeri et al. Towards query efficient and derivative free black box adversarial machine learning attack
CN113487506A (en) Countermeasure sample defense method, device and system based on attention denoising

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20191015