CN111898645A - A transferable adversarial example attack method based on attention mechanism - Google Patents

A transferable adversarial example attack method based on attention mechanism

Info

Publication number
CN111898645A
CN111898645A (application CN202010630136.6A)
Authority
CN
China
Prior art keywords
attack
picture
layer
sample
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010630136.6A
Other languages
Chinese (zh)
Inventor
宋井宽
黄梓杰
高联丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Guizhou University
Original Assignee
University of Electronic Science and Technology of China
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China, Guizhou University filed Critical University of Electronic Science and Technology of China
Priority to CN202010630136.6A priority Critical patent/CN111898645A/en
Publication of CN111898645A publication Critical patent/CN111898645A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a transferable adversarial example attack method based on an attention mechanism. The method comprises: selecting a local substitute network model and constructing a feature library to map the original image into the feature space; using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region; and inputting the adversarial examples obtained by the attack into a black-box classification model, misleading the model into outputting the target category. By using a triplet loss function to corrupt the information-rich regions of the attacked model's feature space that the model mainly attends to, the invention solves the problems of the low success rate of white-box targeted attacks and the low transferability of black-box targeted attacks that existing attack methods exhibit on classification tasks over complex datasets, and effectively misleads classification models in both white-box and black-box scenarios.

Description

A transferable adversarial example attack method based on an attention mechanism

Technical Field

The invention belongs to the technical field of adversarial attacks, and in particular relates to a transferable adversarial example attack method based on an attention mechanism.

Background Art

The rapid development of deep learning has enabled researchers to solve many computer vision tasks such as image classification and segmentation. However, the emergence of adversarial examples has drawn wide attention to the weaknesses of convolutional neural networks. An adversarial example is created by adding subtle perturbations, imperceptible to the human eye, to an original input image so that a convolutional neural network can no longer predict the image correctly. Current methods for generating adversarial examples can be divided, according to the attacker's goal, into non-targeted attacks and targeted attacks: in the former, the attacker only needs the classification model to give a wrong prediction, whereas in the latter, the attacker wants the prediction to change to a pre-specified target label. They can further be divided, according to the attacker's knowledge of the model, into white-box attacks and black-box attacks: in the white-box case, the attacker has all information about the attacked model, including its parameters and structure; in the black-box case, the attacker cannot obtain this information and can only observe the model's predictions for given inputs. The transferability of adversarial examples is therefore the key to black-box attacks; transferability means that adversarial examples generated by attacking one type of model may cause other models to predict incorrectly as well.

Generally speaking, adversarial attacks generate adversarial examples by corrupting the Softmax output space of the classification model. Because the transferability of such methods is limited, a growing body of later work has proposed adversarial attacks based on corrupting the model's feature space. However, on classification tasks over complex datasets, these methods suffer either from a low success rate for white-box targeted attacks or from low transferability for black-box targeted attacks.

Summary of the Invention

In view of the above deficiencies of the prior art, the invention provides a transferable adversarial example attack method based on an attention mechanism, which uses a triplet loss function (Triplet Loss) to corrupt the information-rich regions of the attacked model's feature space that the model mainly attends to, thereby solving the problems of the low success rate of white-box targeted attacks and the low transferability of black-box targeted attacks that existing attack methods exhibit on classification tasks over complex datasets, and effectively misleading classification models in both white-box and black-box scenarios.

To achieve the above object of the invention, the technical solution adopted by the invention is:

A transferable adversarial example attack method based on an attention mechanism, comprising the following steps:

S1. Selecting a local substitute network model, and constructing a feature library to map the original image into the feature space;

S2. Using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region;

S3. Inputting the adversarial examples obtained by the attack in step S2 into the black-box classification model, and misleading the model into outputting the target category.

Further, selecting a local substitute network model in step S1 specifically comprises:

selecting a local substitute network model for image classification, selecting an intermediate layer of the classification network as the shallow layer, and selecting the layer immediately before the Softmax of the classification network as the deep layer.

Further, constructing a feature library in step S1 to map the original image into the feature space specifically comprises:

for each category in the validation set of the local substitute network model, computing, in the selected shallow and deep layers of the classification network, the centroids of all images successfully classified by the local substitute network model, and constructing feature libraries for different layers.

Further, the formula for computing the centroids of all images successfully classified by the local substitute network model is:

c_j^l = (1/n) · Σ_{i=1}^{n} F_l(x_i^j)

where n is the number of images in category j that are correctly classified by the local substitute network model, F_l is the l-th intermediate layer of the local substitute network model, x_i^j is the i-th correctly classified image in category j (i.e. the substitute model's predicted label for it equals the ground-truth label), and y_j is the ground-truth class label of category j.

Further, step S2 specifically comprises the following sub-steps:

S21. For each original image, selecting from the l-th-layer feature library a centroid of the same category as the original image as the negative sample, randomly selecting a centroid of a category different from that of the original image as the positive sample, and forming, together with the l-th-layer feature of the original image, the triplet used by the triplet loss function;

S22. Constructing the total loss function of the local substitute network model from the triplet loss functions;

S23. Generating perturbations on the features of the original image by using the iterative fast gradient sign attack method based on momentum accumulation.

Further, the total loss function of the local substitute network model in step S22 is specifically:

L_total = L_l + L_k

L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+

L_k = [D(f_k^a, f_k^p) - D(f_k^a, f_k^n) + θ_k]_+

where L_total is the total loss function; L_l and L_k are the triplet losses on the l-th layer and the k-th layer, respectively; D is the Euclidean distance function; f_l^a and f_k^a are the l-th-layer and k-th-layer features of the original image, respectively; f_l^n and f_k^n are the negative samples in the l-th-layer and k-th-layer feature libraries, respectively; f_l^p and f_k^p are the positive samples in the l-th-layer and k-th-layer feature libraries, respectively; θ_l and θ_k are the minimum margins between the distance from the original image's l-th-layer or k-th-layer feature to the positive sample and the distance from that feature to the negative sample; and [·]_+ means that the bracketed value is taken as the loss when it is greater than zero and the loss is zero when it is less than zero.

Further, step S23 specifically comprises the following sub-steps:

S231. Computing the gradient of the total loss function with respect to the original image;

S232. Computing the accumulated momentum from the gradient of the total loss function;

S233. Computing the perturbation from the obtained momentum and adding it to the adversarial example image of the t-th iteration to generate the adversarial example image of the (t+1)-th iteration;

S234. After performing T attack iterations on the original image, outputting the adversarial example image of the T-th iteration as the final adversarial example.

Further, the adversarial example image of the (t+1)-th iteration generated in step S233 is expressed as:

x'_{t+1} = x'_t - α·sign(g_{t+1})

where x'_{t+1} is the adversarial example image of the (t+1)-th iteration, x'_t is the adversarial example image of the t-th iteration, α is the perturbation step size of a single iteration, and sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when it is less than 0, and 0 when it equals 0.

Further, step S23 also comprises clipping every pixel of the adversarial example image of the (t+1)-th iteration to the range 0 to 1, computed as:

x''_{t+1} = Clip(x'_{t+1}, 0, 1)

where x''_{t+1} is the clipped adversarial example image and Clip() is the clipping function, which clips pixel values greater than 1 in the adversarial example image to 1.

The invention has the following beneficial effects:

(1) The invention replaces the cross-entropy function in the existing MI-FGSM method with a triplet loss function in order to corrupt the information-rich regions of the image in the feature space that the model mainly attends to, striking a good balance between the targeted-attack success rates in white-box and black-box scenarios;

(2) The invention combines features of a shallow layer and a deep layer of the network and attacks them simultaneously, effectively corrupting both the global coarse information and the local detail information of the image so as to generate more aggressive adversarial examples;

(3) Even when addressing classification tasks on complex datasets, the invention can still use the triplet loss function to move the features of the original image as far as possible from the region of the original true category while at the same time bringing them as close as possible to the region of the target category, improving the final white-box targeted-attack success rate and even the black-box attack success rate.

Brief Description of the Drawings

Fig. 1 is a flow chart of the transferable adversarial example attack method based on an attention mechanism according to the invention;

Fig. 2 is a flow framework diagram of the adversarial example attack method in an embodiment of the invention.

Detailed Description of Embodiments

Specific embodiments of the invention are described below so that those skilled in the art can understand the invention, but it should be clear that the invention is not limited to the scope of these specific embodiments. For those of ordinary skill in the art, as long as various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, such changes are obvious, and all inventions and creations that make use of the inventive concept are within the scope of protection.

An embodiment of the invention provides a transferable adversarial example attack method based on an attention mechanism, which uses an iterative fast gradient sign attack method based on accumulated momentum in the model's feature space to corrupt the information-rich regions that the model mainly attends to, so as to generate adversarial examples with strong transferability and a high white-box targeted-attack success rate.

The subject of the invention is black-box targeted attack, and the technical scenario specifically addressed is:

1) The attacker cannot obtain all information about the attacked classification model and can only obtain the model's predicted output for a given input;

2) The attacker's goal is to mislead the model into predicting a pre-specified category that differs from the true category of the original image. In this scenario, the conventional attack approach therefore relies on the transferability of adversarial examples: the attacker selects a local substitute network model to attack and achieves the attack goal by transferring the resulting adversarial examples to the attacked model.

As shown in Fig. 1 and Fig. 2, the adversarial example attack method of the invention specifically comprises the following steps S1 to S3:

S1. Selecting a local substitute network model, and constructing a feature library to map the original image into the feature space;

In this embodiment, the invention first establishes a local substitute network model for image classification; a neural network model such as DenseNet [6] or ResNet [7] can optionally be used.

Taking the neural network model DenseNet-121 as an example, the output of the second dense block (DenseBlock) of the network and the layer immediately before the Softmax in the classification layer are selected as the attacked layers.
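As an illustrative sketch only (PyTorch and torchvision >= 0.13 are assumed; the module name features.denseblock2 and the pooling used to obtain the pre-Softmax feature vector follow the standard torchvision DenseNet-121 layout rather than anything stated in the patent text), the two attacked layers can be exposed with forward hooks:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Local substitute model (assumption: torchvision's DenseNet-121 pretrained on ImageNet).
model = models.densenet121(weights="IMAGENET1K_V1").eval()

feats = {}

def save_to(name):
    def hook(module, inputs, output):
        feats[name] = output            # cache this layer's activation during the forward pass
    return hook

# Shallow layer: output of the second dense block.
model.features.denseblock2.register_forward_hook(save_to("shallow"))
# Deep layer: final feature map; pooled below into the vector right before the classifier/Softmax.
model.features.register_forward_hook(save_to("deep_map"))

def extract_features(x):
    """Forward pass returning the shallow feature map, the pre-Softmax feature vector, and the logits."""
    logits = model(x)
    deep = F.adaptive_avg_pool2d(F.relu(feats["deep_map"]), 1).flatten(1)
    return feats["shallow"], deep, logits

x = torch.rand(1, 3, 224, 224)          # dummy image tensor scaled to [0, 1]
shallow, deep, _ = extract_features(x)
print(shallow.shape, deep.shape)        # roughly (1, 512, 28, 28) and (1, 1024) for DenseNet-121
```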

For each category in the validation set of the local substitute network model, the invention then computes, in the selected shallow and deep layers of the classification network, the centroids of the images successfully classified by the local substitute network model, and constructs feature libraries for the different layers.

To address classification tasks on complex datasets, the invention is illustrated with the ImageNet dataset. For each category in the ImageNet validation set, the centroid c_j of all images successfully classified by the substitute model is computed in the selected shallow and deep layers of the classification network as:

c_j^l = (1/n) · Σ_{i=1}^{n} F_l(x_i^j)

where n is the number of images in category j that are correctly classified by the local substitute network model, F_l is the l-th intermediate layer of the local substitute network model, x_i^j is the i-th correctly classified image in category j (i.e. the substitute model's predicted label for it equals the ground-truth label), and y_j is the ground-truth class label of category j.

Feature libraries for the different layers are constructed separately by the above method.
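A minimal sketch of this feature-library construction is shown below; extract_features is the helper from the previous sketch, val_loader is an assumed loader yielding (image, label) batches from the ImageNet validation set, and centroids are averaged only over correctly classified images, as in the formula above:

```python
from collections import defaultdict
import torch

@torch.no_grad()
def build_feature_libraries(extract_features, val_loader, device="cpu"):
    """Per-class centroids c_j at the shallow and deep layers, over correctly classified images."""
    sums = {"shallow": defaultdict(float), "deep": defaultdict(float)}
    counts = defaultdict(int)
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        shallow, deep, logits = extract_features(images)
        correct = logits.argmax(dim=1) == labels       # keep only images the substitute model gets right
        for i in torch.nonzero(correct).flatten().tolist():
            j = labels[i].item()
            sums["shallow"][j] = sums["shallow"][j] + shallow[i]
            sums["deep"][j] = sums["deep"][j] + deep[i]
            counts[j] += 1
    # Centroid c_j = mean feature of the correctly classified images of class j.
    return {layer: {j: s / counts[j] for j, s in layer_sums.items()}
            for layer, layer_sums in sums.items()}

# Usage (assumed loader): libs = build_feature_libraries(extract_features, val_loader)
```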

The invention combines features of a shallow layer and a deep layer of the network and attacks them simultaneously, effectively corrupting both the global coarse information and the local detail information of the image so as to generate more aggressive adversarial examples; this approach can also be extended to other attack methods based on corrupting the feature space.

S2. Using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region;

In this embodiment, this step specifically comprises the following sub-steps:

S21. For each attacked original image, a centroid of the same category as the original image is selected from the l-th-layer feature library as the negative sample f_l^n, a centroid of a category different from that of the original image is randomly selected as the positive sample f_l^p, and together with the l-th-layer feature f_l^a of the original image they form the triplet <f_l^a, f_l^p, f_l^n> used by the triplet loss function;

S22. The total loss function of the local substitute network model is constructed from the triplet loss functions;

The invention uses a triplet loss function on a shallow layer and a deep layer of the network, respectively, to construct the total loss function of the local substitute network model, specifically:

L_total = L_l + L_k

L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+

L_k = [D(f_k^a, f_k^p) - D(f_k^a, f_k^n) + θ_k]_+

where L_total is the total loss function; L_l and L_k are the triplet losses on the l-th layer and the k-th layer, respectively; D is the Euclidean distance function; f_l^a and f_k^a are the l-th-layer and k-th-layer features of the original image, respectively; f_l^n and f_k^n are the negative samples in the l-th-layer and k-th-layer feature libraries, respectively; f_l^p and f_k^p are the positive samples in the l-th-layer and k-th-layer feature libraries, respectively; θ_l and θ_k are the minimum margins between the distance from the original image's l-th-layer or k-th-layer feature to the positive sample and the distance from that feature to the negative sample; and [·]_+ means that the bracketed value is taken as the loss when it is greater than zero and the loss is zero when it is less than zero.

S23. An iterative fast gradient sign attack method based on momentum accumulation is used to generate perturbations on the features of the original image, which specifically comprises the following sub-steps:

S231. The partial derivative of the total loss function L_total with respect to the attacked original image x is computed to obtain the gradient ∇_x L_total;

S232. The accumulated momentum g_{t+1} is computed from the gradient of the total loss function as:

g_{t+1} = μ·g_t + ∇_x L_total / ||∇_x L_total||_1

where g_t is the momentum accumulated up to the t-th iteration and μ is the momentum decay factor;

S233. The perturbation is computed from the obtained momentum and added to the adversarial example image x'_t of the t-th iteration to generate the adversarial example image x'_{t+1} of the (t+1)-th iteration, expressed as:

x'_{t+1} = x'_t - α·sign(g_{t+1})

where x'_{t+1} is the adversarial example image of the (t+1)-th iteration, x'_t is the adversarial example image of the t-th iteration, and α is the perturbation step size of a single iteration, computed as the total perturbation budget ε divided by the number of iterations, i.e. α = ε/T; sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when it is less than 0, and 0 when it equals 0;

To keep the distribution of the adversarial example image consistent with that of the original input image, the invention clips every pixel of the adversarial example image of the (t+1)-th iteration to the range 0 to 1, computed as:

x''_{t+1} = Clip(x'_{t+1}, 0, 1)

where x''_{t+1} is the clipped adversarial example image and Clip() is the clipping function, which clips pixel values greater than 1 in the adversarial example image to 1.

S234. The above steps are regarded as one attack iteration, and T iterations are performed in total, where the adversarial example x'_0 of the 0-th iteration is initialized to the original input image x; finally, the adversarial example image of the T-th iteration is output as the final adversarial example.
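Putting steps S231 to S234 together, one possible sketch of the attack loop is given below; extract_features, total_loss, and the feature libraries libs come from the earlier sketches, the momentum decay factor mu and the values of eps and T are illustrative assumptions only, and the input image is assumed to be scaled to [0, 1]:

```python
import torch

def attack(x, orig_label, target_label, extract_features, total_loss, libs,
           eps=16 / 255, T=20, mu=1.0):
    """Momentum-accumulating iterative FGSM in feature space (sketch of steps S231-S234)."""
    alpha = eps / T                                    # single-step perturbation size alpha = eps / T
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                            # accumulated momentum g_0
    # Negative samples: centroids of the original class; positive samples: centroids of the target class.
    f_l_n = libs["shallow"][orig_label].unsqueeze(0)
    f_k_n = libs["deep"][orig_label].unsqueeze(0)
    f_l_p = libs["shallow"][target_label].unsqueeze(0)
    f_k_p = libs["deep"][target_label].unsqueeze(0)
    for _ in range(T):
        x_adv.requires_grad_(True)
        f_l_a, f_k_a, _ = extract_features(x_adv)      # S231: features of the current adversarial image
        loss = total_loss(f_l_a, f_l_p, f_l_n, f_k_a, f_k_p, f_k_n)
        grad = torch.autograd.grad(loss, x_adv)[0]     # gradient of L_total w.r.t. the image
        # S232: accumulate momentum with the L1-normalized gradient.
        g = mu * g + grad / grad.abs().sum().clamp_min(1e-12)
        # S233: step along the negative sign of the momentum, then clip pixels back to [0, 1].
        x_adv = (x_adv.detach() - alpha * g.sign()).clamp(0.0, 1.0)
    return x_adv                                       # S234: adversarial example after T iterations
```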

The invention replaces the cross-entropy function in the existing MI-FGSM method with a triplet loss function, and applies one triplet loss function in each of two intermediate layers of the model to achieve a coarse-to-fine attack process that corrupts the information-rich regions of the image in the feature space that the model mainly attends to, thereby striking a good balance between the targeted-attack success rates in white-box and black-box scenarios.

S3. The adversarial examples obtained by the attack in step S2 are input into the black-box classification model, misleading the model into outputting the target category.

Even when addressing classification tasks on complex datasets, the invention can still use the triplet loss function to move the features of the original image as far as possible from the region of the original true category while at the same time bringing them as close as possible to the region of the target category, improving the final white-box targeted-attack success rate and even the black-box attack success rate. Because the method is simple and has a moderate number of parameters, the invention is quick and convenient to use.

Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to help readers understand the principles of the invention, and it should be understood that the scope of protection of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations that do not depart from the essence of the invention based on the technical teachings disclosed herein, and such modifications and combinations remain within the scope of protection of the invention.

Claims (9)

1. A transferable adversarial example attack method based on an attention mechanism, characterized by comprising the following steps:
S1, selecting a local substitute network model, and constructing a feature library to map the original image into the feature space;
S2, using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region;
S3, inputting the adversarial examples obtained by the attack in step S2 into the black-box classification model, and misleading the model into outputting the target category.
2. The transferable adversarial example attack method based on an attention mechanism according to claim 1, wherein selecting a local substitute network model in step S1 specifically comprises:
selecting a local substitute network model for image classification, selecting an intermediate layer of the classification network as the shallow layer, and selecting the layer immediately before the Softmax of the classification network as the deep layer.
3. The transferable adversarial example attack method based on an attention mechanism according to claim 2, wherein constructing a feature library to map the original image into the feature space in step S1 specifically comprises:
for each category in the validation set of the local substitute network model, computing, in the selected shallow and deep layers of the classification network, the centroids of all images successfully classified by the local substitute network model, and constructing feature libraries for different layers.
4. The transferable adversarial example attack method based on an attention mechanism according to claim 3, wherein the formula for computing the centroids of all images successfully classified by the local substitute network model is:
c_j^l = (1/n) · Σ_{i=1}^{n} F_l(x_i^j)
wherein n is the number of images in category j correctly classified by the local substitute network model, F_l is the l-th intermediate layer of the local substitute network model, x_i^j is the i-th correctly classified image in category j, and y_j is the ground-truth class label of category j.
5. The transferable adversarial example attack method based on an attention mechanism according to claim 1, wherein step S2 comprises the following sub-steps:
S21, for each original image, selecting from the l-th-layer feature library a centroid of the same category as the original image as the negative sample, randomly selecting a centroid of a category different from that of the original image as the positive sample, and forming, together with the l-th-layer feature of the original image, the triplet used by the triplet loss function;
S22, constructing the total loss function of the local substitute network model from the triplet loss functions;
S23, generating perturbations on the features of the original image by using the iterative fast gradient sign attack method based on momentum accumulation.
6. The transferable adversarial example attack method based on an attention mechanism according to claim 5, wherein the total loss function of the local substitute network model in step S22 is specifically:
L_total = L_l + L_k
L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+
L_k = [D(f_k^a, f_k^p) - D(f_k^a, f_k^n) + θ_k]_+
wherein L_total is the total loss function, L_l and L_k are the triplet loss functions on the l-th layer and the k-th layer respectively, D is the Euclidean distance function, f_l^a and f_k^a are the l-th-layer and k-th-layer features of the original image respectively, f_l^n and f_k^n are the negative samples in the l-th-layer and k-th-layer feature libraries respectively, f_l^p and f_k^p are the positive samples in the l-th-layer and k-th-layer feature libraries respectively, θ_l and θ_k are the minimum margins between the distance from the original image's l-th-layer or k-th-layer feature to the positive sample and the distance from that feature to the negative sample, and [·]_+ means that the bracketed value is taken as the loss when it is greater than zero and the loss is zero when it is less than zero.
7. The transferable adversarial example attack method based on an attention mechanism according to claim 5, wherein step S23 comprises the following sub-steps:
S231, computing the gradient of the total loss function with respect to the original image;
S232, computing the accumulated momentum from the gradient of the total loss function;
S233, computing the perturbation from the obtained momentum and adding it to the adversarial example image of the t-th iteration to generate the adversarial example image of the (t+1)-th iteration;
S234, after performing T attack iterations on the original image, outputting the adversarial example image of the T-th iteration as the final adversarial example.
8. The transferable adversarial example attack method based on an attention mechanism according to claim 7, wherein the adversarial example image of the (t+1)-th iteration generated in step S233 is expressed as:
x'_{t+1} = x'_t - α·sign(g_{t+1})
wherein x'_{t+1} is the adversarial example image of the (t+1)-th iteration, x'_t is the adversarial example image of the t-th iteration, α is the perturbation step size of a single iteration, and sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when it is less than 0, and 0 when it equals 0.
9. The transferable adversarial example attack method based on an attention mechanism according to claim 8, wherein step S23 further comprises clipping each pixel of the adversarial example image of the (t+1)-th iteration to the range 0 to 1, computed as:
x''_{t+1} = Clip(x'_{t+1}, 0, 1)
wherein x''_{t+1} is the clipped adversarial example image and Clip() is the clipping function, which clips pixel values greater than 1 in the adversarial example image to 1.
CN202010630136.6A 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism Pending CN111898645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010630136.6A CN111898645A (en) 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010630136.6A CN111898645A (en) 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN111898645A true CN111898645A (en) 2020-11-06

Family

ID=73192926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010630136.6A Pending CN111898645A (en) 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN111898645A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151505A1 (en) * 2018-11-12 2020-05-14 Sap Se Platform for preventing adversarial attacks on image-based machine learning models
CN109726696A (en) * 2019-01-03 2019-05-07 电子科技大学 Image description generation system and method based on deliberation attention mechanism
CN109948663A (en) * 2019-02-27 2019-06-28 天津大学 An Adversarial Attack Method Based on Model Extraction and Step Size Adaptive
CN110175251A (en) * 2019-05-25 2019-08-27 西安电子科技大学 The zero sample Sketch Searching method based on semantic confrontation network
CN110334806A (en) * 2019-05-29 2019-10-15 广东技术师范大学 A method of adversarial sample generation based on generative adversarial network
CN110610708A (en) * 2019-08-31 2019-12-24 浙江工业大学 A voiceprint recognition attack defense method based on cuckoo search algorithm
CN111047658A (en) * 2019-11-29 2020-04-21 武汉大学 Compression-resistant antagonistic image generation method for deep neural network
CN111199233A (en) * 2019-12-30 2020-05-26 四川大学 An improved deep learning method for pornographic image recognition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANLI GAO et al.: "Push & Pull: Transferable Adversarial Examples With Attentive Attack", IEEE *
孙曦音: "Research on GAN-based adversarial example generation and its security applications", China Excellent Master's Theses Electronic Journal Database, Information Science and Technology Series *
黄梓杰: "Adversarial attacks based on feature activation", China Excellent Master's Theses Electronic Journal Database, Information Science and Technology Series *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329929A (en) * 2021-01-04 2021-02-05 北京智源人工智能研究院 Countermeasure sample generation method and device based on proxy model
CN113255816A (en) * 2021-06-10 2021-08-13 北京邮电大学 Directional attack countermeasure patch generation method and device
CN113469330A (en) * 2021-06-25 2021-10-01 中国人民解放军陆军工程大学 Method for enhancing sample mobility resistance by bipolar network corrosion
CN113674140B (en) * 2021-08-20 2023-09-26 燕山大学 A physical adversarial sample generation method and system
CN113674140A (en) * 2021-08-20 2021-11-19 燕山大学 Physical countermeasure sample generation method and system
CN113869062A (en) * 2021-09-30 2021-12-31 北京工业大学 A privacy protection method for social text personality based on black-box adversarial samples
CN114077871A (en) * 2021-11-26 2022-02-22 西安电子科技大学 Black box neural network type detection method using small amount of data and resisting attack
CN114077871B (en) * 2021-11-26 2024-11-08 西安电子科技大学 A black-box neural network detection method using small amounts of data and adversarial attacks
CN114549933A (en) * 2022-02-21 2022-05-27 南京大学 Adversarial sample generation method based on feature vector transfer of target detection model
CN114627373A (en) * 2022-02-25 2022-06-14 北京理工大学 Countermeasure sample generation method for remote sensing image target detection model
CN114724014B (en) * 2022-06-06 2023-06-30 杭州海康威视数字技术股份有限公司 Deep learning-based method and device for detecting attack of countered sample and electronic equipment
CN114724014A (en) * 2022-06-06 2022-07-08 杭州海康威视数字技术股份有限公司 Anti-sample attack detection method and device based on deep learning and electronic equipment
CN115544499A (en) * 2022-11-30 2022-12-30 武汉大学 Migratable black box anti-attack sample generation method and system and electronic equipment
CN116523032A (en) * 2023-03-13 2023-08-01 之江实验室 A method, device and medium for image-text double-port migration attack
CN116523032B (en) * 2023-03-13 2023-09-29 之江实验室 An image and text double-end migration attack method, device and medium

Similar Documents

Publication Publication Date Title
CN111898645A (en) A transferable adversarial example attack method based on attention mechanism
Song et al. AttaNet: Attention-augmented network for fast and accurate scene parsing
CN113723295B (en) A face forgery detection method based on image domain frequency domain dual-stream network
CN110910391B (en) A Double-Module Neural Network Structure Video Object Segmentation Method
CN110110689B (en) A Pedestrian Re-identification Method
CN111062951A (en) A Knowledge Distillation Method Based on Intra-Class Feature Difference for Semantic Segmentation
CN114399630B (en) Antagonistic sample generation method based on belief attack and significant area disturbance limitation
CN113674140A (en) Physical countermeasure sample generation method and system
CN112966647A (en) Pedestrian re-identification method based on layer-by-layer clustering and enhanced discrimination
CN111274981A (en) Target detection network construction method and device, target detection method
Li et al. Towards efficient scene understanding via squeeze reasoning
Ding et al. Beyond universal person re-identification attack
CN114240951B (en) Black box attack method of medical image segmentation neural network based on query
WO2023206944A1 (en) Semantic segmentation method and apparatus, computer device, and storage medium
CN110503666A (en) A video-based dense crowd counting method and system
CN116452862A (en) Image classification method based on domain generalization learning
CN106127222A (en) The similarity of character string computational methods of a kind of view-based access control model and similarity determination methods
CN109584267B (en) Scale adaptive correlation filtering tracking method combined with background information
CN115050045A (en) Vision MLP-based pedestrian re-identification method
Yu et al. Deep metric learning with dynamic margin hard sampling loss for face verification
CN114298160A (en) Twin knowledge distillation and self-supervised learning based small sample classification method
CN114972904B (en) A zero-shot knowledge distillation method and system based on adversarial triplet loss
CN114708479B (en) Self-adaptive defense method based on graph structure and characteristics
CN115481716A (en) Physical world counter attack method based on deep network foreground activation feature transfer
Zheng et al. Template‐aware transformer for person reidentification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201106

RJ01 Rejection of invention patent application after publication