CN111898645A - A transferable adversarial example attack method based on attention mechanism - Google Patents

A transferable adversarial example attack method based on attention mechanism

Info

Publication number
CN111898645A
CN111898645A (application CN202010630136.6A)
Authority
CN
China
Prior art keywords
attack
picture
layer
sample
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010630136.6A
Other languages
Chinese (zh)
Inventor
宋井宽
黄梓杰
高联丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Guizhou University
Original Assignee
University of Electronic Science and Technology of China
Guizhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China, Guizhou University filed Critical University of Electronic Science and Technology of China
Priority to CN202010630136.6A priority Critical patent/CN111898645A/en
Publication of CN111898645A publication Critical patent/CN111898645A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a transferable adversarial example attack method based on an attention mechanism. The method comprises: selecting a local substitute network model and constructing a feature library to map the original image into the feature space; using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region; and inputting the adversarial examples obtained by the attack into a black-box classification model, misleading the model into outputting the target category. By using a triplet loss function to corrupt the information-rich regions of the attacked model's feature space that the model mainly attends to, the invention solves the problems of the low success rate of white-box targeted attacks and the low transferability of black-box targeted attacks that existing attack methods exhibit on classification tasks over complex datasets, and effectively misleads classification models in both white-box and black-box scenarios.

Description

A transferable adversarial example attack method based on an attention mechanism

Technical Field

The invention belongs to the technical field of adversarial attacks, and in particular relates to a transferable adversarial example attack method based on an attention mechanism.

Background Art

The rapid development of deep learning has enabled researchers to solve many computer vision tasks such as image classification and segmentation. However, the emergence of adversarial examples has drawn wide attention to the weaknesses of convolutional neural networks. An adversarial example is created by adding subtle perturbations, imperceptible to the human eye, to an original input image so that a convolutional neural network can no longer predict the image correctly. Current methods for generating adversarial examples can be divided, according to the attacker's goal, into non-targeted attacks and targeted attacks: in the former, the attacker only needs the classification model to give a wrong prediction, whereas in the latter, the attacker wants the prediction to change to a pre-specified target label. They can further be divided, according to the attacker's knowledge of the model, into white-box attacks and black-box attacks: in the white-box case, the attacker has all information about the attacked model, including its parameters and structure; in the black-box case, the attacker cannot obtain this information and can only observe the model's predictions for given inputs. The transferability of adversarial examples is therefore the key to black-box attacks; transferability means that adversarial examples generated by attacking one type of model may cause other models to predict incorrectly as well.

Generally speaking, adversarial attacks generate adversarial examples by corrupting the Softmax output space of the classification model. Because the transferability of such methods is limited, a growing body of later work has proposed adversarial attacks based on corrupting the model's feature space. However, on classification tasks over complex datasets, these methods suffer either from a low success rate for white-box targeted attacks or from low transferability for black-box targeted attacks.

Summary of the Invention

In view of the above deficiencies of the prior art, the invention provides a transferable adversarial example attack method based on an attention mechanism, which uses a triplet loss function (Triplet Loss) to corrupt the information-rich regions of the attacked model's feature space that the model mainly attends to, thereby solving the problems of the low success rate of white-box targeted attacks and the low transferability of black-box targeted attacks that existing attack methods exhibit on classification tasks over complex datasets, and effectively misleading classification models in both white-box and black-box scenarios.

To achieve the above object of the invention, the technical solution adopted by the invention is:

A transferable adversarial example attack method based on an attention mechanism, comprising the following steps:

S1. Selecting a local substitute network model, and constructing a feature library to map the original image into the feature space;

S2. Using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region;

S3. Inputting the adversarial examples obtained by the attack in step S2 into the black-box classification model, and misleading the model into outputting the target category.

Further, selecting a local substitute network model in step S1 specifically comprises:

selecting a local substitute network model for image classification, selecting an intermediate layer of the classification network as the shallow layer, and selecting the layer immediately before the Softmax of the classification network as the deep layer.

Further, constructing a feature library in step S1 to map the original image into the feature space specifically comprises:

for each category in the validation set of the local substitute network model, computing, in the selected shallow and deep layers of the classification network, the centroids of all images successfully classified by the local substitute network model, and constructing feature libraries for different layers.

Further, the formula for computing the centroids of all images successfully classified by the local substitute network model is:

c_j^l = (1/n) · Σ_{i=1}^{n} F_l(x_i^j)

where n is the number of images in category j that are correctly classified by the local substitute network model, F_l is the l-th intermediate layer of the local substitute network model, x_i^j is the i-th correctly classified image in category j (i.e. the substitute model's predicted label for it equals the ground-truth label), and y_j is the ground-truth class label of category j.

Further, step S2 specifically comprises the following sub-steps:

S21. For each original image, selecting from the l-th-layer feature library a centroid of the same category as the original image as the negative sample, randomly selecting a centroid of a category different from that of the original image as the positive sample, and forming, together with the l-th-layer feature of the original image, the triplet used by the triplet loss function;

S22. Constructing the total loss function of the local substitute network model from the triplet loss functions;

S23. Generating perturbations on the features of the original image by using the iterative fast gradient sign attack method based on momentum accumulation.

Further, the total loss function of the local substitute network model in step S22 is specifically:

L_total = L_l + L_k

L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+

L_k = [D(f_k^a, f_k^p) - D(f_k^a, f_k^n) + θ_k]_+

where L_total is the total loss function; L_l and L_k are the triplet losses on the l-th layer and the k-th layer, respectively; D is the Euclidean distance function; f_l^a and f_k^a are the l-th-layer and k-th-layer features of the original image, respectively; f_l^n and f_k^n are the negative samples in the l-th-layer and k-th-layer feature libraries, respectively; f_l^p and f_k^p are the positive samples in the l-th-layer and k-th-layer feature libraries, respectively; θ_l and θ_k are the minimum margins between the distance from the original image's l-th-layer or k-th-layer feature to the positive sample and the distance from that feature to the negative sample; and [·]_+ means that the bracketed value is taken as the loss when it is greater than zero and the loss is zero when it is less than zero.

Further, step S23 specifically comprises the following sub-steps:

S231. Computing the gradient of the total loss function with respect to the original image;

S232. Computing the accumulated momentum from the gradient of the total loss function;

S233. Computing the perturbation from the obtained momentum and adding it to the adversarial example image of the t-th iteration to generate the adversarial example image of the (t+1)-th iteration;

S234. After performing T attack iterations on the original image, outputting the adversarial example image of the T-th iteration as the final adversarial example.

Further, the adversarial example image of the (t+1)-th iteration generated in step S233 is expressed as:

x'_{t+1} = x'_t - α·sign(g_{t+1})

where x'_{t+1} is the adversarial example image of the (t+1)-th iteration, x'_t is the adversarial example image of the t-th iteration, α is the perturbation step size of a single iteration, and sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when it is less than 0, and 0 when it equals 0.

Further, step S23 also comprises clipping every pixel of the adversarial example image of the (t+1)-th iteration to the range 0 to 1, computed as:

x''_{t+1} = Clip(x'_{t+1}, 0, 1)

where x''_{t+1} is the clipped adversarial example image and Clip() is the clipping function, which clips pixel values greater than 1 in the adversarial example image to 1.

The invention has the following beneficial effects:

(1) The invention replaces the cross-entropy function in the existing MI-FGSM method with a triplet loss function in order to corrupt the information-rich regions of the image in the feature space that the model mainly attends to, striking a good balance between the targeted-attack success rates in white-box and black-box scenarios;

(2) The invention combines features of a shallow layer and a deep layer of the network and attacks them simultaneously, effectively corrupting both the global coarse information and the local detail information of the image so as to generate more aggressive adversarial examples;

(3) Even when addressing classification tasks on complex datasets, the invention can still use the triplet loss function to move the features of the original image as far as possible from the region of the original true category while at the same time bringing them as close as possible to the region of the target category, improving the final white-box targeted-attack success rate and even the black-box attack success rate.

Brief Description of the Drawings

Fig. 1 is a flow chart of the transferable adversarial example attack method based on an attention mechanism according to the invention;

Fig. 2 is a flow framework diagram of the adversarial example attack method in an embodiment of the invention.

Detailed Description of Embodiments

Specific embodiments of the invention are described below so that those skilled in the art can understand the invention, but it should be clear that the invention is not limited to the scope of these specific embodiments. For those of ordinary skill in the art, as long as various changes fall within the spirit and scope of the invention as defined and determined by the appended claims, such changes are obvious, and all inventions and creations that make use of the inventive concept are within the scope of protection.

An embodiment of the invention provides a transferable adversarial example attack method based on an attention mechanism, which uses an iterative fast gradient sign attack method based on accumulated momentum in the model's feature space to corrupt the information-rich regions that the model mainly attends to, so as to generate adversarial examples with strong transferability and a high white-box targeted-attack success rate.

The subject of the invention is black-box targeted attack, and the technical scenario specifically addressed is:

1) The attacker cannot obtain all information about the attacked classification model and can only obtain the model's predicted output for a given input;

2) The attacker's goal is to mislead the model into predicting a pre-specified category that differs from the true category of the original image. In this scenario, the conventional attack approach therefore relies on the transferability of adversarial examples: the attacker selects a local substitute network model to attack and achieves the attack goal by transferring the resulting adversarial examples to the attacked model.

As shown in Fig. 1 and Fig. 2, the adversarial example attack method of the invention specifically comprises the following steps S1 to S3:

S1. Selecting a local substitute network model, and constructing a feature library to map the original image into the feature space;

In this embodiment, the invention first establishes a local substitute network model for image classification; a neural network model such as DenseNet [6] or ResNet [7] can optionally be used.

Taking the neural network model DenseNet-121 as an example, the output of the second dense block (DenseBlock) of the network and the layer immediately before the Softmax in the classification layer are selected as the attacked layers.
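As an illustrative sketch only (PyTorch and torchvision >= 0.13 are assumed; the module name features.denseblock2 and the pooling used to obtain the pre-Softmax feature vector follow the standard torchvision DenseNet-121 layout rather than anything stated in the patent text), the two attacked layers can be exposed with forward hooks:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Local substitute model (assumption: torchvision's DenseNet-121 pretrained on ImageNet).
model = models.densenet121(weights="IMAGENET1K_V1").eval()

feats = {}

def save_to(name):
    def hook(module, inputs, output):
        feats[name] = output            # cache this layer's activation during the forward pass
    return hook

# Shallow layer: output of the second dense block.
model.features.denseblock2.register_forward_hook(save_to("shallow"))
# Deep layer: final feature map; pooled below into the vector right before the classifier/Softmax.
model.features.register_forward_hook(save_to("deep_map"))

def extract_features(x):
    """Forward pass returning the shallow feature map, the pre-Softmax feature vector, and the logits."""
    logits = model(x)
    deep = F.adaptive_avg_pool2d(F.relu(feats["deep_map"]), 1).flatten(1)
    return feats["shallow"], deep, logits

x = torch.rand(1, 3, 224, 224)          # dummy image tensor scaled to [0, 1]
shallow, deep, _ = extract_features(x)
print(shallow.shape, deep.shape)        # roughly (1, 512, 28, 28) and (1, 1024) for DenseNet-121
```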

For each category in the validation set of the local substitute network model, the invention then computes, in the selected shallow and deep layers of the classification network, the centroids of the images successfully classified by the local substitute network model, and constructs feature libraries for the different layers.

To address classification tasks on complex datasets, the invention is illustrated with the ImageNet dataset. For each category in the ImageNet validation set, the centroid c_j of all images successfully classified by the substitute model is computed in the selected shallow and deep layers of the classification network as:

c_j^l = (1/n) · Σ_{i=1}^{n} F_l(x_i^j)

where n is the number of images in category j that are correctly classified by the local substitute network model, F_l is the l-th intermediate layer of the local substitute network model, x_i^j is the i-th correctly classified image in category j (i.e. the substitute model's predicted label for it equals the ground-truth label), and y_j is the ground-truth class label of category j.

Feature libraries for the different layers are constructed separately by the above method.
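A minimal sketch of this feature-library construction is shown below; extract_features is the helper from the previous sketch, val_loader is an assumed loader yielding (image, label) batches from the ImageNet validation set, and centroids are averaged only over correctly classified images, as in the formula above:

```python
from collections import defaultdict
import torch

@torch.no_grad()
def build_feature_libraries(extract_features, val_loader, device="cpu"):
    """Per-class centroids c_j at the shallow and deep layers, over correctly classified images."""
    sums = {"shallow": defaultdict(float), "deep": defaultdict(float)}
    counts = defaultdict(int)
    for images, labels in val_loader:
        images, labels = images.to(device), labels.to(device)
        shallow, deep, logits = extract_features(images)
        correct = logits.argmax(dim=1) == labels       # keep only images the substitute model gets right
        for i in torch.nonzero(correct).flatten().tolist():
            j = labels[i].item()
            sums["shallow"][j] = sums["shallow"][j] + shallow[i]
            sums["deep"][j] = sums["deep"][j] + deep[i]
            counts[j] += 1
    # Centroid c_j = mean feature of the correctly classified images of class j.
    return {layer: {j: s / counts[j] for j, s in layer_sums.items()}
            for layer, layer_sums in sums.items()}

# Usage (assumed loader): libs = build_feature_libraries(extract_features, val_loader)
```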

The invention combines features of a shallow layer and a deep layer of the network and attacks them simultaneously, effectively corrupting both the global coarse information and the local detail information of the image so as to generate more aggressive adversarial examples; this approach can also be extended to other attack methods based on corrupting the feature space.

S2. Using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region;

In this embodiment, this step specifically comprises the following sub-steps:

S21. For each attacked original image, a centroid of the same category as the original image is selected from the l-th-layer feature library as the negative sample f_l^n, a centroid of a category different from that of the original image is randomly selected as the positive sample f_l^p, and together with the l-th-layer feature f_l^a of the original image they form the triplet <f_l^a, f_l^p, f_l^n> used by the triplet loss function;

S22. The total loss function of the local substitute network model is constructed from the triplet loss functions;

The invention uses a triplet loss function on a shallow layer and a deep layer of the network, respectively, to construct the total loss function of the local substitute network model, specifically:

L_total = L_l + L_k

L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+

L_k = [D(f_k^a, f_k^p) - D(f_k^a, f_k^n) + θ_k]_+

where L_total is the total loss function; L_l and L_k are the triplet losses on the l-th layer and the k-th layer, respectively; D is the Euclidean distance function; f_l^a and f_k^a are the l-th-layer and k-th-layer features of the original image, respectively; f_l^n and f_k^n are the negative samples in the l-th-layer and k-th-layer feature libraries, respectively; f_l^p and f_k^p are the positive samples in the l-th-layer and k-th-layer feature libraries, respectively; θ_l and θ_k are the minimum margins between the distance from the original image's l-th-layer or k-th-layer feature to the positive sample and the distance from that feature to the negative sample; and [·]_+ means that the bracketed value is taken as the loss when it is greater than zero and the loss is zero when it is less than zero.

S23. An iterative fast gradient sign attack method based on momentum accumulation is used to generate perturbations on the features of the original image, which specifically comprises the following sub-steps:

S231. The partial derivative of the total loss function L_total with respect to the attacked original image x is computed to obtain the gradient ∇_x L_total;

S232. The accumulated momentum g_{t+1} is computed from the gradient of the total loss function as:

g_{t+1} = μ·g_t + ∇_x L_total / ||∇_x L_total||_1

where g_t is the momentum accumulated up to the t-th iteration and μ is the momentum decay factor;

S233. The perturbation is computed from the obtained momentum and added to the adversarial example image x'_t of the t-th iteration to generate the adversarial example image x'_{t+1} of the (t+1)-th iteration, expressed as:

x'_{t+1} = x'_t - α·sign(g_{t+1})

where x'_{t+1} is the adversarial example image of the (t+1)-th iteration, x'_t is the adversarial example image of the t-th iteration, and α is the perturbation step size of a single iteration, computed as the total perturbation budget ε divided by the number of iterations, i.e. α = ε/T; sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when it is less than 0, and 0 when it equals 0;

To keep the distribution of the adversarial example image consistent with that of the original input image, the invention clips every pixel of the adversarial example image of the (t+1)-th iteration to the range 0 to 1, computed as:

x''_{t+1} = Clip(x'_{t+1}, 0, 1)

where x''_{t+1} is the clipped adversarial example image and Clip() is the clipping function, which clips pixel values greater than 1 in the adversarial example image to 1.

S234. The above steps are regarded as one attack iteration, and T iterations are performed in total, where the adversarial example x'_0 of the 0-th iteration is initialized to the original input image x; finally, the adversarial example image of the T-th iteration is output as the final adversarial example.
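Putting steps S231 to S234 together, one possible sketch of the attack loop is given below; extract_features, total_loss, and the feature libraries libs come from the earlier sketches, the momentum decay factor mu and the values of eps and T are illustrative assumptions only, and the input image is assumed to be scaled to [0, 1]:

```python
import torch

def attack(x, orig_label, target_label, extract_features, total_loss, libs,
           eps=16 / 255, T=20, mu=1.0):
    """Momentum-accumulating iterative FGSM in feature space (sketch of steps S231-S234)."""
    alpha = eps / T                                    # single-step perturbation size alpha = eps / T
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                            # accumulated momentum g_0
    # Negative samples: centroids of the original class; positive samples: centroids of the target class.
    f_l_n = libs["shallow"][orig_label].unsqueeze(0)
    f_k_n = libs["deep"][orig_label].unsqueeze(0)
    f_l_p = libs["shallow"][target_label].unsqueeze(0)
    f_k_p = libs["deep"][target_label].unsqueeze(0)
    for _ in range(T):
        x_adv.requires_grad_(True)
        f_l_a, f_k_a, _ = extract_features(x_adv)      # S231: features of the current adversarial image
        loss = total_loss(f_l_a, f_l_p, f_l_n, f_k_a, f_k_p, f_k_n)
        grad = torch.autograd.grad(loss, x_adv)[0]     # gradient of L_total w.r.t. the image
        # S232: accumulate momentum with the L1-normalized gradient.
        g = mu * g + grad / grad.abs().sum().clamp_min(1e-12)
        # S233: step along the negative sign of the momentum, then clip pixels back to [0, 1].
        x_adv = (x_adv.detach() - alpha * g.sign()).clamp(0.0, 1.0)
    return x_adv                                       # S234: adversarial example after T iterations
```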

The invention replaces the cross-entropy function in the existing MI-FGSM method with a triplet loss function, and applies one triplet loss function in each of two intermediate layers of the model to achieve a coarse-to-fine attack process that corrupts the information-rich regions of the image in the feature space that the model mainly attends to, thereby striking a good balance between the targeted-attack success rates in white-box and black-box scenarios.

S3. The adversarial examples obtained by the attack in step S2 are input into the black-box classification model, misleading the model into outputting the target category.

Even when addressing classification tasks on complex datasets, the invention can still use the triplet loss function to move the features of the original image as far as possible from the region of the original true category while at the same time bringing them as close as possible to the region of the target category, improving the final white-box targeted-attack success rate and even the black-box attack success rate. Because the method is simple and has a moderate number of parameters, the invention is quick and convenient to use.

Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to help readers understand the principles of the invention, and it should be understood that the scope of protection of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations that do not depart from the essence of the invention based on the technical teachings disclosed herein, and such modifications and combinations remain within the scope of protection of the invention.

Claims (9)

1. A transferable adversarial example attack method based on an attention mechanism, characterized by comprising the following steps:
S1, selecting a local substitute network model, and constructing a feature library to map the original image into the feature space;
S2, using an iterative fast gradient sign attack method based on momentum accumulation to move the features of the original image away from the original-category region while bringing them close to the target-category region;
S3, inputting the adversarial examples obtained by the attack in step S2 into the black-box classification model, and misleading the model into outputting the target category.
2. The transferable adversarial example attack method based on an attention mechanism according to claim 1, wherein selecting a local substitute network model in step S1 specifically comprises:
selecting a local substitute network model for image classification, selecting an intermediate layer of the classification network as the shallow layer, and selecting the layer immediately before the Softmax of the classification network as the deep layer.
3. The transferable adversarial example attack method based on an attention mechanism according to claim 2, wherein constructing a feature library to map the original image into the feature space in step S1 specifically comprises:
for each category in the validation set of the local substitute network model, computing, in the selected shallow and deep layers of the classification network, the centroids of all images successfully classified by the local substitute network model, and constructing feature libraries for different layers.
4. The transferable adversarial example attack method based on an attention mechanism according to claim 3, wherein the formula for computing the centroids of all images successfully classified by the local substitute network model is:
c_j^l = (1/n) · Σ_{i=1}^{n} F_l(x_i^j)
wherein n is the number of images in category j correctly classified by the local substitute network model, F_l is the l-th intermediate layer of the local substitute network model, x_i^j is the i-th correctly classified image in category j, and y_j is the ground-truth class label of category j.
5. The transferable adversarial example attack method based on an attention mechanism according to claim 1, wherein step S2 comprises the following sub-steps:
S21, for each original image, selecting from the l-th-layer feature library a centroid of the same category as the original image as the negative sample, randomly selecting a centroid of a category different from that of the original image as the positive sample, and forming, together with the l-th-layer feature of the original image, the triplet used by the triplet loss function;
S22, constructing the total loss function of the local substitute network model from the triplet loss functions;
S23, generating perturbations on the features of the original image by using the iterative fast gradient sign attack method based on momentum accumulation.
6. The transferable adversarial example attack method based on an attention mechanism according to claim 5, wherein the total loss function of the local substitute network model in step S22 is specifically:
L_total = L_l + L_k
L_l = [D(f_l^a, f_l^p) - D(f_l^a, f_l^n) + θ_l]_+
L_k = [D(f_k^a, f_k^p) - D(f_k^a, f_k^n) + θ_k]_+
wherein L_total is the total loss function, L_l and L_k are the triplet loss functions on the l-th layer and the k-th layer respectively, D is the Euclidean distance function, f_l^a and f_k^a are the l-th-layer and k-th-layer features of the original image respectively, f_l^n and f_k^n are the negative samples in the l-th-layer and k-th-layer feature libraries respectively, f_l^p and f_k^p are the positive samples in the l-th-layer and k-th-layer feature libraries respectively, θ_l and θ_k are the minimum margins between the distance from the original image's l-th-layer or k-th-layer feature to the positive sample and the distance from that feature to the negative sample, and [·]_+ means that the bracketed value is taken as the loss when it is greater than zero and the loss is zero when it is less than zero.
7. The transferable adversarial example attack method based on an attention mechanism according to claim 5, wherein step S23 comprises the following sub-steps:
S231, computing the gradient of the total loss function with respect to the original image;
S232, computing the accumulated momentum from the gradient of the total loss function;
S233, computing the perturbation from the obtained momentum and adding it to the adversarial example image of the t-th iteration to generate the adversarial example image of the (t+1)-th iteration;
S234, after performing T attack iterations on the original image, outputting the adversarial example image of the T-th iteration as the final adversarial example.
8. The transferable adversarial example attack method based on an attention mechanism according to claim 7, wherein the adversarial example image of the (t+1)-th iteration generated in step S233 is expressed as:
x'_{t+1} = x'_t - α·sign(g_{t+1})
wherein x'_{t+1} is the adversarial example image of the (t+1)-th iteration, x'_t is the adversarial example image of the t-th iteration, α is the perturbation step size of a single iteration, and sign() is the sign function, which outputs 1 when its argument is greater than 0, -1 when it is less than 0, and 0 when it equals 0.
9. The transferable adversarial example attack method based on an attention mechanism according to claim 8, wherein step S23 further comprises clipping each pixel of the adversarial example image of the (t+1)-th iteration to the range 0 to 1, computed as:
x''_{t+1} = Clip(x'_{t+1}, 0, 1)
wherein x''_{t+1} is the clipped adversarial example image and Clip() is the clipping function, which clips pixel values greater than 1 in the adversarial example image to 1.
CN202010630136.6A 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism Pending CN111898645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010630136.6A CN111898645A (en) 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010630136.6A CN111898645A (en) 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN111898645A true CN111898645A (en) 2020-11-06

Family

ID=73192926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010630136.6A Pending CN111898645A (en) 2020-07-03 2020-07-03 A transferable adversarial example attack method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN111898645A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200151505A1 (en) * 2018-11-12 2020-05-14 Sap Se Platform for preventing adversarial attacks on image-based machine learning models
CN109726696A (en) * 2019-01-03 2019-05-07 电子科技大学 Image description generation system and method based on deliberation attention mechanism
CN109948663A (en) * 2019-02-27 2019-06-28 天津大学 An Adversarial Attack Method Based on Model Extraction and Step Size Adaptive
CN110175251A (en) * 2019-05-25 2019-08-27 西安电子科技大学 The zero sample Sketch Searching method based on semantic confrontation network
CN110334806A (en) * 2019-05-29 2019-10-15 广东技术师范大学 A method of adversarial sample generation based on generative adversarial network
CN110610708A (en) * 2019-08-31 2019-12-24 浙江工业大学 A voiceprint recognition attack defense method based on cuckoo search algorithm
CN111047658A (en) * 2019-11-29 2020-04-21 武汉大学 Compression-resistant antagonistic image generation method for deep neural network
CN111199233A (en) * 2019-12-30 2020-05-26 四川大学 An improved deep learning method for pornographic image recognition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANLI GAO et al.: "Push & Pull: Transferable Adversarial Examples With Attentive Attack", IEEE *
孙曦音: "Research on GAN-based adversarial example generation and its security applications", China Excellent Master's Theses Electronic Journal Database, Information Science and Technology Series *
黄梓杰: "Adversarial attacks based on feature activation", China Excellent Master's Theses Electronic Journal Database, Information Science and Technology Series *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329929A (en) * 2021-01-04 2021-02-05 北京智源人工智能研究院 Countermeasure sample generation method and device based on proxy model
CN113255816A (en) * 2021-06-10 2021-08-13 北京邮电大学 Directional attack countermeasure patch generation method and device
CN113469330A (en) * 2021-06-25 2021-10-01 中国人民解放军陆军工程大学 Method for enhancing sample mobility resistance by bipolar network corrosion
CN113674140B (en) * 2021-08-20 2023-09-26 燕山大学 A physical adversarial sample generation method and system
CN113674140A (en) * 2021-08-20 2021-11-19 燕山大学 Physical countermeasure sample generation method and system
CN113869062A (en) * 2021-09-30 2021-12-31 北京工业大学 A privacy protection method for social text personality based on black-box adversarial samples
CN114077871A (en) * 2021-11-26 2022-02-22 西安电子科技大学 Black box neural network type detection method using small amount of data and resisting attack
CN114077871B (en) * 2021-11-26 2024-11-08 西安电子科技大学 A black-box neural network detection method using small amounts of data and adversarial attacks
CN114549933A (en) * 2022-02-21 2022-05-27 南京大学 Adversarial sample generation method based on feature vector transfer of target detection model
CN114627373A (en) * 2022-02-25 2022-06-14 北京理工大学 Countermeasure sample generation method for remote sensing image target detection model
CN114724014B (en) * 2022-06-06 2023-06-30 杭州海康威视数字技术股份有限公司 Deep learning-based method and device for detecting attack of countered sample and electronic equipment
CN114724014A (en) * 2022-06-06 2022-07-08 杭州海康威视数字技术股份有限公司 Anti-sample attack detection method and device based on deep learning and electronic equipment
CN115544499A (en) * 2022-11-30 2022-12-30 武汉大学 Migratable black box anti-attack sample generation method and system and electronic equipment
CN116523032A (en) * 2023-03-13 2023-08-01 之江实验室 A method, device and medium for image-text double-port migration attack
CN116523032B (en) * 2023-03-13 2023-09-29 之江实验室 An image and text double-end migration attack method, device and medium

Similar Documents

Publication Publication Date Title
CN111898645A (en) A transferable adversarial example attack method based on attention mechanism
Song et al. AttaNet: Attention-augmented network for fast and accurate scene parsing
CN113723295B (en) A face forgery detection method based on image domain frequency domain dual-stream network
CN110910391B (en) A Double-Module Neural Network Structure Video Object Segmentation Method
CN110110689B (en) A Pedestrian Re-identification Method
CN111062951A (en) A Knowledge Distillation Method Based on Intra-Class Feature Difference for Semantic Segmentation
CN114399630B (en) Antagonistic sample generation method based on belief attack and significant area disturbance limitation
CN113674140A (en) Physical countermeasure sample generation method and system
CN112966647A (en) Pedestrian re-identification method based on layer-by-layer clustering and enhanced discrimination
CN111274981A (en) Target detection network construction method and device, target detection method
Li et al. Towards efficient scene understanding via squeeze reasoning
Ding et al. Beyond universal person re-identification attack
CN114240951B (en) Black box attack method of medical image segmentation neural network based on query
WO2023206944A1 (en) Semantic segmentation method and apparatus, computer device, and storage medium
CN110503666A (en) A video-based dense crowd counting method and system
CN116452862A (en) Image classification method based on domain generalization learning
CN106127222A (en) The similarity of character string computational methods of a kind of view-based access control model and similarity determination methods
CN109584267B (en) Scale adaptive correlation filtering tracking method combined with background information
CN115050045A (en) Vision MLP-based pedestrian re-identification method
Yu et al. Deep metric learning with dynamic margin hard sampling loss for face verification
CN114298160A (en) Twin knowledge distillation and self-supervised learning based small sample classification method
CN114972904B (en) A zero-shot knowledge distillation method and system based on adversarial triplet loss
CN114708479B (en) Self-adaptive defense method based on graph structure and characteristics
CN115481716A (en) Physical world counter attack method based on deep network foreground activation feature transfer
Zheng et al. Template‐aware transformer for person reidentification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201106

RJ01 Rejection of invention patent application after publication