CN111898645A - A transferable adversarial example attack method based on attention mechanism - Google Patents
A transferable adversarial example attack method based on attention mechanism
- Publication number
- CN111898645A (application CN202010630136.6A)
- Authority
- CN
- China
- Prior art keywords
- attack
- picture
- layer
- sample
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a transferable adversarial example attack method based on an attention mechanism. The method comprises: selecting a local substitute network model and constructing feature libraries that map original images into feature space; using an iterative fast gradient sign attack method based on momentum accumulation to push the features of the original image away from the region of its original class while pulling them toward the region of the target class; and inputting the adversarial examples obtained by the attack into a black-box classification model so as to mislead the model into outputting the target class. By using a triplet loss function to corrupt the information-rich regions of the attacked model's feature space to which the model mainly attends, the invention addresses the low white-box targeted attack success rate and the low black-box targeted transferability of existing attack methods on classification tasks over complex datasets, and effectively misleads classification models in both white-box and black-box scenarios.
Description
Technical Field

The invention belongs to the technical field of adversarial attacks, and in particular relates to a transferable adversarial example attack method based on an attention mechanism.
Background Art

With the rapid development of deep learning, researchers are able to solve many computer vision tasks such as image classification and segmentation. However, the emergence of adversarial examples has drawn much wider attention to the weaknesses of convolutional neural networks. An adversarial example is an image to which subtle perturbations, imperceptible to the human eye, have been added so that a convolutional neural network can no longer predict it correctly. Existing methods for generating adversarial examples can be divided, by the attacker's goal, into non-targeted attacks and targeted attacks: in the former the attacker merely wants the classification model to make a wrong prediction, while in the latter the attacker wants the prediction to be changed to some pre-specified target label. They can further be divided, by the attacker's knowledge of the model, into white-box attacks and black-box attacks: in the former the attacker has full information about the attacked model, including its parameters and architecture; in the latter the attacker cannot obtain any internal information about the model and can only observe the model's predictions for given inputs. The transferability of adversarial examples is therefore the key to black-box attacks; transferability refers to the fact that adversarial examples generated by attacking one model may also cause other models to make wrong predictions.

Generally speaking, adversarial attacks usually generate adversarial examples by corrupting the Softmax output space of the classification model. Because the transferability of such methods is limited, a growing number of later studies have proposed adversarial attacks that corrupt the model's feature space instead. However, on classification tasks over complex datasets, these methods suffer either from a low white-box targeted attack success rate or from low black-box targeted transferability.
SUMMARY OF THE INVENTION

In view of the above deficiencies in the prior art, the present invention provides a transferable adversarial example attack method based on an attention mechanism. By using a triplet loss function (Triplet Loss) to corrupt the information-rich regions of the attacked model's feature space to which the model mainly attends, the method addresses the low white-box targeted attack success rate and the low black-box targeted transferability of existing attack methods on classification tasks over complex datasets, and effectively misleads classification models in both white-box and black-box scenarios.

In order to achieve the above object of the invention, the technical solution adopted by the present invention is as follows:

A transferable adversarial example attack method based on an attention mechanism, comprising the following steps:

S1. Select a local substitute network model, and construct feature libraries that map original images into feature space;

S2. Use an iterative fast gradient sign attack method based on momentum accumulation to push the features of the original image away from the region of its original class while pulling them toward the region of the target class;

S3. Input the adversarial examples obtained by the attack in step S2 into the black-box classification model, misleading the model into outputting the target class.
Further, selecting a local substitute network model in step S1 is specifically:

Select a local substitute network model used for image classification, choose an intermediate layer of the classification network as the shallow layer, and choose the layer immediately before the Softmax of the classification network as the deep layer.

Further, constructing feature libraries in step S1 to map original images into feature space is specifically:

For each class in the validation set of the local substitute network model, compute, at both the selected shallow layer and the selected deep layer of the classification network, the centroid of all images correctly classified by the local substitute network model, thereby building a feature library for each layer.

Further, the centroid of all images correctly classified by the local substitute network model is computed as

c_j = (1/n) · Σ_{i=1}^{n} F_l(x_i^j),

where n is the number of images in class j that are correctly classified by the local substitute network model, F_l denotes the l-th intermediate layer of the local substitute network model, x_i^j is the i-th image in class j, and y_j is the ground-truth classification label of class j.
Further, step S2 specifically comprises the following sub-steps:

S21. For each original image, select from the layer-l feature library the centroid of the same class as the original image as the negative sample, randomly select the centroid of a class different from the original image as the positive sample, and combine them with the layer-l features of the original image to form the triplet used by the triplet loss function;

S22. Construct the total loss function of the local substitute network model from the triplet loss functions;

S23. Use an iterative fast gradient sign attack method based on momentum accumulation to generate a perturbation of the features of the original image.

Further, the total loss function of the local substitute network model in step S22 is specifically:

L_total = L_l + L_k

L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+

with L_k defined in the same way on the layer-k features, where L_total is the total loss function; L_l and L_k are the triplet loss functions on layer l and layer k respectively; D is the Euclidean distance function; f_l^a and f_k^a are the layer-l and layer-k features of the original image; f_l^n and f_k^n are the negative samples in the layer-l and layer-k feature libraries; f_l^p and f_k^p are the positive samples in the layer-l and layer-k feature libraries; θ_l and θ_k are the minimum margins between the distance from the original image's layer-l (resp. layer-k) feature to the positive sample and its distance to the negative sample; and the subscript + means that the loss takes the value inside the brackets when that value is greater than zero and is zero otherwise.
Further, step S23 specifically comprises the following sub-steps:

S231. Compute the gradient of the total loss function with respect to the original image;

S232. Compute the accumulated momentum from the gradient of the total loss function;

S233. Compute the perturbation from the obtained momentum and add it to the adversarial image of the t-th iteration to generate the adversarial image of the (t+1)-th iteration;

S234. After T attack iterations on the original image, output the adversarial image of the T-th iteration as the final adversarial example.

Further, the adversarial image of the (t+1)-th iteration generated in step S233 is expressed as:

x'_{t+1} = x'_t - α·sign(g_{t+1})

where x'_{t+1} is the adversarial image of the (t+1)-th iteration, x'_t is the adversarial image of the t-th iteration, α is the perturbation step size of a single iteration, and sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when its argument is less than 0, and 0 when its argument equals 0.

Further, step S23 also comprises clipping every pixel of the adversarial image of the (t+1)-th iteration into the range 0 to 1, computed as:

x''_{t+1} = Clip(x'_{t+1}, 0, 1)

where x''_{t+1} is the clipped adversarial image and Clip() is the clipping function, which clips pixels of the adversarial image that are greater than 1 to 1.
The present invention has the following beneficial effects:

(1) The invention replaces the cross-entropy function in the existing MI-FGSM method with a triplet loss function, so as to corrupt the information-rich regions of the image in feature space to which the model mainly attends, striking a good balance between the targeted attack success rates in the white-box and black-box scenarios;

(2) The invention combines features from a shallow layer and a deep layer of the network and attacks them simultaneously, effectively corrupting both the global coarse information and the local detail information of the image to generate more aggressive adversarial examples;

(3) Even on classification tasks over complex datasets, the invention can still use the triplet loss function to push the features of the original image as far as possible from the region of its original true class while pulling them as close as possible to the region of the target class, improving the final white-box targeted attack success rate and even the black-box attack success rate.
Description of the Drawings

Fig. 1 is a flowchart of the transferable adversarial example attack method based on an attention mechanism according to the present invention;

Fig. 2 is a flow framework diagram of the adversarial example attack method in an embodiment of the present invention.
Detailed Description of the Embodiments

The specific embodiments of the present invention are described below so that those skilled in the art can understand the present invention, but it should be clear that the present invention is not limited to the scope of these specific embodiments. For those of ordinary skill in the art, as long as various changes fall within the spirit and scope of the present invention as defined and determined by the appended claims, such changes are obvious, and all inventions and creations that make use of the inventive concept are within the scope of protection.

An embodiment of the present invention provides a transferable adversarial example attack method based on an attention mechanism. An iterative fast gradient sign attack method based on accumulated momentum is applied in the model's feature space to corrupt the information-rich regions to which the model mainly attends, so as to generate adversarial examples that transfer well and achieve a high white-box targeted attack success rate.

The present invention is directed at black-box targeted attacks, and the technical scenario it specifically addresses is as follows:

1) The attacker cannot obtain any internal information about the attacked classification model, and can only obtain the model's predicted output for a given input;

2) The attacker's goal is to mislead the model into predicting a pre-specified class that is different from the true class of the original image. For this scenario, the conventional attack approach therefore relies on the transferability of adversarial examples: the attacker selects a local substitute network model to attack, and achieves the attack goal by transferring the resulting adversarial examples to the attacked model.
As shown in Fig. 1 and Fig. 2, the adversarial example attack method of the present invention specifically comprises the following steps S1 to S3:

S1. Select a local substitute network model, and construct feature libraries that map original images into feature space;

In this embodiment, the invention first builds a local substitute network model for image classification; a neural network model such as DenseNet [6] or ResNet [7] may optionally be used.

Taking the neural network model DenseNet-121 as an example, the output of the second dense block (DenseBlock) of the network and the layer immediately before the Softmax in the classification head are selected as the attacked layers (a code sketch of how these two layers can be exposed is given below).
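For illustration, the following is a minimal PyTorch sketch of one possible way to expose the two attacked layers described above; it is not code from the patent, and the torchvision module names (features.denseblock2, the pooled pre-classifier feature) and the pretrained-weights flag are assumptions.

```python
# Hypothetical sketch: expose the "shallow" (second dense block) and "deep"
# (pre-Softmax) features of a torchvision DenseNet-121 via a forward hook.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.densenet121(weights="IMAGENET1K_V1").eval()  # assumed weights flag

feats = {}

def save_to(name):
    def hook(module, inputs, output):
        feats[name] = output            # cache this layer's output at forward time
    return hook

# Shallow layer: output of the second dense block.
model.features.denseblock2.register_forward_hook(save_to("shallow"))

def forward_with_features(x):
    """Return logits plus the shallow and deep (pre-Softmax) features of x."""
    logits = model(x)                                          # fills feats["shallow"] via the hook
    fmap = model.features(x)                                   # backbone feature map
    deep = F.adaptive_avg_pool2d(F.relu(fmap), 1).flatten(1)   # vector fed to the final classifier
    return logits, feats["shallow"], deep

x = torch.rand(1, 3, 224, 224)                # dummy image with pixels in [0, 1]
logits, shallow, deep = forward_with_features(x)
print(shallow.shape, deep.shape)              # (1, 512, 28, 28), (1, 1024)
```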
For each class in the validation set of the local substitute network model, the invention then computes, at the selected shallow and deep layers of the classification network, the centroid of the images correctly classified by the local substitute network model, and builds a feature library for each layer.

To address classification tasks on complex datasets, the invention uses the ImageNet dataset as an illustration. For each class in the ImageNet validation set, the centroid c_j of all images correctly classified by the substitute model is computed at both the selected shallow layer and the selected deep layer of the classification network as

c_j = (1/n) · Σ_{i=1}^{n} F_l(x_i^j),

where n is the number of images in class j that are correctly classified by the local substitute network model, F_l denotes the l-th intermediate layer of the local substitute network model, x_i^j is the i-th image in class j, and y_j is the ground-truth classification label of class j.

Feature libraries for the different layers are built in this way (a code sketch of this construction follows).
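A minimal sketch of how such per-class feature libraries could be assembled is shown below; it reuses the forward_with_features helper from the previous sketch and assumes a standard ImageNet-validation DataLoader, neither of which comes from the patent itself.

```python
# Hypothetical sketch: per-class centroids c_j of the shallow and deep features
# of all correctly classified validation images, one library per layer.
import torch
from collections import defaultdict

@torch.no_grad()
def build_feature_library(loader, device="cpu"):
    sums = {"shallow": defaultdict(float), "deep": defaultdict(float)}
    counts = defaultdict(int)
    for images, labels in loader:                        # assumed ImageNet-val DataLoader
        images, labels = images.to(device), labels.to(device)
        logits, shallow, deep = forward_with_features(images)
        correct = logits.argmax(dim=1) == labels          # keep only correctly classified images
        for idx in correct.nonzero(as_tuple=True)[0]:
            j = labels[idx].item()
            sums["shallow"][j] = sums["shallow"][j] + shallow[idx]
            sums["deep"][j] = sums["deep"][j] + deep[idx]
            counts[j] += 1
    # Centroid: c_j = (1/n) * sum of the n correctly classified images' features of class j.
    return {layer: {j: sums[layer][j] / counts[j] for j in counts}
            for layer in sums}
```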
The invention combines features from a shallow layer and a deep layer of the network and attacks them simultaneously, effectively corrupting both the global coarse information and the local detail information of the image to generate more aggressive adversarial examples; this strategy can also be extended to other attack methods based on corrupting the feature space.

S2. Use an iterative fast gradient sign attack method based on momentum accumulation to push the features of the original image away from the region of its original class while pulling them toward the region of the target class;

In this embodiment, this step specifically comprises the following sub-steps:

S21. For each attacked original image, select from the layer-l feature library the centroid of the same class as the original image as the negative sample f_l^n, randomly select the centroid of a class different from the original image as the positive sample f_l^p, and combine them with the layer-l feature f_l^a of the original image into the triplet <f_l^a, f_l^p, f_l^n> used by the triplet loss function;

S22. Construct the total loss function of the local substitute network model from the triplet loss functions;

The invention uses a triplet loss function on both the shallow layer and the deep layer of the network to construct the total loss function of the local substitute network model (a code sketch of this loss follows the definitions below), specifically:

L_total = L_l + L_k,

L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+ ,

with L_k defined in the same way on the layer-k features, where L_total is the total loss function; L_l and L_k are the triplet loss functions on layer l and layer k respectively; D is the Euclidean distance function; f_l^a and f_k^a are the layer-l and layer-k features of the original image; f_l^n and f_k^n are the negative samples in the layer-l and layer-k feature libraries; f_l^p and f_k^p are the positive samples in the layer-l and layer-k feature libraries; θ_l and θ_k are the minimum margins between the distance from the original image's layer-l (resp. layer-k) feature to the positive sample and its distance to the negative sample; and the subscript + means that the loss takes the value inside the brackets when that value is greater than zero and is zero otherwise.
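As a concrete illustration of the loss just defined, the following sketch computes the two-layer triplet loss from the feature libraries built above; it is an assumption about one possible implementation, and the margin values are illustrative only.

```python
# Hypothetical sketch of the total loss L_total = L_l + L_k described above.
import torch

def triplet_loss(f_a, f_p, f_n, margin):
    """[ D(f_a, f_p) - D(f_a, f_n) + margin ]_+ with Euclidean distance D."""
    d_pos = torch.norm((f_a - f_p).flatten(1), p=2, dim=1)
    d_neg = torch.norm((f_a - f_n).flatten(1), p=2, dim=1)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).sum()

def total_loss(feats_l, feats_k, library, src_class, tgt_class,
               theta_l=10.0, theta_k=10.0):              # margin values are illustrative assumptions
    # Negative sample: centroid of the image's own class (push away from it);
    # positive sample: centroid of the target class (pull toward it).
    loss_l = triplet_loss(feats_l, library["shallow"][tgt_class],
                          library["shallow"][src_class], theta_l)
    loss_k = triplet_loss(feats_k, library["deep"][tgt_class],
                          library["deep"][src_class], theta_k)
    return loss_l + loss_k
```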
S23. Use an iterative fast gradient sign attack method based on momentum accumulation to generate a perturbation of the features of the original image, which specifically comprises the following sub-steps (a runnable sketch of the whole loop is given after sub-step S234):

S231. Compute the partial derivative of the total loss function L_total with respect to the attacked original image x to obtain the gradient ∇_x L_total;

S232. Compute the accumulated momentum g_{t+1} from the gradient of the total loss function, where g_t is the momentum accumulated during the t-th iteration;

S233. Compute the perturbation from the obtained momentum and add it to the adversarial image x'_t of the t-th iteration to generate the adversarial image x'_{t+1} of the (t+1)-th iteration, expressed as:

x'_{t+1} = x'_t - α·sign(g_{t+1})

where x'_{t+1} is the adversarial image of the (t+1)-th iteration, x'_t is the adversarial image of the t-th iteration, α is the perturbation step size of a single iteration, computed as the total perturbation budget ε divided by the number of iterations, i.e. α = ε/T, and sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when its argument is less than 0, and 0 when its argument equals 0;

To keep the distribution of the adversarial image consistent with that of the original input image, the invention clips every pixel of the adversarial image of the (t+1)-th iteration into the range 0 to 1, computed as:

x''_{t+1} = Clip(x'_{t+1}, 0, 1)

where x''_{t+1} is the clipped adversarial image and Clip() is the clipping function, which clips pixels of the adversarial image that are greater than 1 to 1.

S234. The above steps are regarded as one attack iteration, and T iterations are performed in total, where the adversarial example x'_0 of the 0-th iteration is initialized as the original input image x; finally, the adversarial image of the T-th iteration is output as the final adversarial example.
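The following end-to-end sketch ties sub-steps S231 to S234 together. The loop structure, the step size α = ε/T, the sign update and the clipping follow the text; the exact momentum accumulation formula is not spelled out above, so the MI-FGSM-style update with decay factor mu used below is an assumption, as are the numeric defaults.

```python
# Hypothetical sketch of the iterative momentum-based attack (S231-S234),
# reusing forward_with_features() and total_loss() from the earlier sketches.
import torch

def attack(x, src_class, tgt_class, library, eps=16 / 255, T=20, mu=1.0):
    alpha = eps / T                          # single-iteration step size: alpha = eps / T
    x_adv = x.clone()                        # x'_0 is initialized as the original image x
    g = torch.zeros_like(x)                  # accumulated momentum
    for _ in range(T):
        x_adv = x_adv.detach().requires_grad_(True)
        _, feats_l, feats_k = forward_with_features(x_adv)
        loss = total_loss(feats_l, feats_k, library, src_class, tgt_class)
        grad, = torch.autograd.grad(loss, x_adv)              # S231: gradient of L_total w.r.t. the image
        g = mu * g + grad / (grad.abs().sum() + 1e-12)        # S232: momentum accumulation (assumed MI-FGSM form)
        x_adv = x_adv.detach() - alpha * torch.sign(g)        # S233: x'_{t+1} = x'_t - alpha * sign(g_{t+1})
        x_adv = torch.clamp(x_adv, 0.0, 1.0)                  # clip every pixel into [0, 1]
    return x_adv                                              # S234: adversarial example after T iterations
```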
The invention replaces the cross-entropy function in the existing MI-FGSM method with a triplet loss function, and applies one triplet loss at each of two intermediate layers of the model to achieve a coarse-to-fine attack process, corrupting the information-rich regions of the image in feature space to which the model mainly attends and striking a good balance between the targeted attack success rates in the white-box and black-box scenarios.

S3. Input the adversarial examples obtained by the attack in step S2 into the black-box classification model, misleading the model into outputting the target class.

Even on classification tasks over complex datasets, the invention can still use the triplet loss function to push the features of the original image as far as possible from the region of its original true class while pulling them as close as possible to the region of the target class, improving the final white-box targeted attack success rate and even the black-box attack success rate. Because the method is simple and the number of parameters is moderate, it is quick and convenient to use.

Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to help the reader understand the principles of the present invention, and it should be understood that the scope of protection of the present invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations that do not depart from the essence of the present invention in accordance with the technical teachings disclosed herein, and such modifications and combinations still fall within the scope of protection of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010630136.6A CN111898645A (en) | 2020-07-03 | 2020-07-03 | A transferable adversarial example attack method based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010630136.6A CN111898645A (en) | 2020-07-03 | 2020-07-03 | A transferable adversarial example attack method based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111898645A true CN111898645A (en) | 2020-11-06 |
Family
ID=73192926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010630136.6A Pending CN111898645A (en) | 2020-07-03 | 2020-07-03 | A transferable adversarial example attack method based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111898645A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200151505A1 (en) * | 2018-11-12 | 2020-05-14 | Sap Se | Platform for preventing adversarial attacks on image-based machine learning models |
CN109726696A (en) * | 2019-01-03 | 2019-05-07 | 电子科技大学 | Image description generation system and method based on deliberation attention mechanism |
CN109948663A (en) * | 2019-02-27 | 2019-06-28 | 天津大学 | An Adversarial Attack Method Based on Model Extraction and Step Size Adaptive |
CN110175251A (en) * | 2019-05-25 | 2019-08-27 | 西安电子科技大学 | The zero sample Sketch Searching method based on semantic confrontation network |
CN110334806A (en) * | 2019-05-29 | 2019-10-15 | 广东技术师范大学 | A method of adversarial sample generation based on generative adversarial network |
CN110610708A (en) * | 2019-08-31 | 2019-12-24 | 浙江工业大学 | A voiceprint recognition attack defense method based on cuckoo search algorithm |
CN111047658A (en) * | 2019-11-29 | 2020-04-21 | 武汉大学 | Compression-resistant antagonistic image generation method for deep neural network |
CN111199233A (en) * | 2019-12-30 | 2020-05-26 | 四川大学 | An improved deep learning method for pornographic image recognition |
Non-Patent Citations (3)
Title |
---|
LIANLI GAO et al.: "Push & Pull: Transferable Adversarial Examples With Attentive Attack", IEEE *
SUN XIYIN: "Research on GAN-based Adversarial Example Generation and Security Applications", China Excellent Master's Theses Electronic Journal, Information Science & Technology *
HUANG ZIJIE: "Adversarial Attacks Based on Feature Activation", China Excellent Master's Theses Electronic Journal, Information Science & Technology *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329929A (en) * | 2021-01-04 | 2021-02-05 | 北京智源人工智能研究院 | Countermeasure sample generation method and device based on proxy model |
CN113255816A (en) * | 2021-06-10 | 2021-08-13 | 北京邮电大学 | Directional attack countermeasure patch generation method and device |
CN113469330A (en) * | 2021-06-25 | 2021-10-01 | 中国人民解放军陆军工程大学 | Method for enhancing sample mobility resistance by bipolar network corrosion |
CN113674140B (en) * | 2021-08-20 | 2023-09-26 | 燕山大学 | A physical adversarial sample generation method and system |
CN113674140A (en) * | 2021-08-20 | 2021-11-19 | 燕山大学 | Physical countermeasure sample generation method and system |
CN113869062A (en) * | 2021-09-30 | 2021-12-31 | 北京工业大学 | A privacy protection method for social text personality based on black-box adversarial samples |
CN114077871A (en) * | 2021-11-26 | 2022-02-22 | 西安电子科技大学 | Black box neural network type detection method using small amount of data and resisting attack |
CN114077871B (en) * | 2021-11-26 | 2024-11-08 | 西安电子科技大学 | A black-box neural network detection method using small amounts of data and adversarial attacks |
CN114549933A (en) * | 2022-02-21 | 2022-05-27 | 南京大学 | Adversarial sample generation method based on feature vector transfer of target detection model |
CN114627373A (en) * | 2022-02-25 | 2022-06-14 | 北京理工大学 | Countermeasure sample generation method for remote sensing image target detection model |
CN114724014B (en) * | 2022-06-06 | 2023-06-30 | 杭州海康威视数字技术股份有限公司 | Deep learning-based method and device for detecting attack of countered sample and electronic equipment |
CN114724014A (en) * | 2022-06-06 | 2022-07-08 | 杭州海康威视数字技术股份有限公司 | Anti-sample attack detection method and device based on deep learning and electronic equipment |
CN115544499A (en) * | 2022-11-30 | 2022-12-30 | 武汉大学 | Migratable black box anti-attack sample generation method and system and electronic equipment |
CN116523032A (en) * | 2023-03-13 | 2023-08-01 | 之江实验室 | A method, device and medium for image-text double-port migration attack |
CN116523032B (en) * | 2023-03-13 | 2023-09-29 | 之江实验室 | An image and text double-end migration attack method, device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111898645A (en) | A transferable adversarial example attack method based on attention mechanism | |
Song et al. | AttaNet: Attention-augmented network for fast and accurate scene parsing | |
CN113723295B (en) | A face forgery detection method based on image domain frequency domain dual-stream network | |
CN110910391B (en) | A Double-Module Neural Network Structure Video Object Segmentation Method | |
CN110110689B (en) | A Pedestrian Re-identification Method | |
CN111062951A (en) | A Knowledge Distillation Method Based on Intra-Class Feature Difference for Semantic Segmentation | |
CN114399630B (en) | Antagonistic sample generation method based on belief attack and significant area disturbance limitation | |
CN113674140A (en) | Physical countermeasure sample generation method and system | |
CN112966647A (en) | Pedestrian re-identification method based on layer-by-layer clustering and enhanced discrimination | |
CN111274981A (en) | Target detection network construction method and device, target detection method | |
Li et al. | Towards efficient scene understanding via squeeze reasoning | |
Ding et al. | Beyond universal person re-identification attack | |
CN114240951B (en) | Black box attack method of medical image segmentation neural network based on query | |
WO2023206944A1 (en) | Semantic segmentation method and apparatus, computer device, and storage medium | |
CN110503666A (en) | A video-based dense crowd counting method and system | |
CN116452862A (en) | Image classification method based on domain generalization learning | |
CN106127222A (en) | The similarity of character string computational methods of a kind of view-based access control model and similarity determination methods | |
CN109584267B (en) | Scale adaptive correlation filtering tracking method combined with background information | |
CN115050045A (en) | Vision MLP-based pedestrian re-identification method | |
Yu et al. | Deep metric learning with dynamic margin hard sampling loss for face verification | |
CN114298160A (en) | Twin knowledge distillation and self-supervised learning based small sample classification method | |
CN114972904B (en) | A zero-shot knowledge distillation method and system based on adversarial triplet loss | |
CN114708479B (en) | Self-adaptive defense method based on graph structure and characteristics | |
CN115481716A (en) | Physical world counter attack method based on deep network foreground activation feature transfer | |
Zheng et al. | Template‐aware transformer for person reidentification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20201106 |
|
RJ01 | Rejection of invention patent application after publication |