CN117523342A - High-mobility countermeasure sample generation method, equipment and medium - Google Patents
- Publication number
- CN117523342A (application number CN202410013633.XA)
- Authority
- CN
- China
- Prior art keywords
- representing
- image
- loss function
- sample
- original
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses a high-transferability adversarial example generation method, device, and medium. The transferability of the adversarial example is improved by further reinforcing the newly generated features while interfering with the original features. Unlike other feature-level methods, which only interfere with the original features, the disclosed method constructs its loss function by aggregating both the original feature gradients and the newly generated feature gradients: the newly generated features are reinforced while the original image features are disturbed. When other models are attacked, the transferred attack more easily pushes the input into the newly generated feature category, so adversarial examples with higher transferability can be generated.
Description
Technical Field
The invention relates to a high-transferability adversarial example (high-mobility countermeasure sample) generation method, device, and medium, and belongs to the technical field of image processing.
Background
In recent years, with the rapid development of deep neural networks, deep learning has been applied to a variety of computer vision tasks, such as object detection, image classification, and semantic segmentation, and has made remarkable progress. At the same time, artificial intelligence security problems have received a great deal of attention from researchers, because deep neural networks are fragile, unstable, and vulnerable to attack. Numerous studies have shown that adversarial examples can be generated by adding fine perturbations, too subtle to attract human notice, to an original benign sample; such adversarial examples can mislead a deep learning model into producing erroneous results. For example, in an image recognition scene, a picture originally recognized as a cat by an image recognition model is misclassified as a fish after a small perturbation imperceptible to the human eye is added. This creates a potential safety hazard for deep learning models after actual deployment.
Adversarial examples are mainly used in two kinds of scenarios. In the first, the properties of adversarial examples are used as a means of checking the classification accuracy and security of a deep learning model, so that potential safety hazards arising after actual deployment of the model can be avoided. In the second, in order to cope with attacks and improve model classification accuracy, adversarial examples with high transferability are generated in advance using an existing image classification model; various image classification models are then trained with these adversarial examples so that the models classify them correctly, thereby resisting external attacks. Both scenarios require researchers to be able to generate more transferable adversarial examples.
Currently, there are many methods for generating adversarial examples with high transferability, for example feature-level methods, which reduce the influence of features specific to the local surrogate model by perturbing the intermediate-layer output of the original image in the network, further improving the transferability of the adversarial example. For example, the Feature Importance-aware Attack (FIA) uses aggregated gradients to find and destroy the important features of an image.
Existing feature-level methods generate an adversarial example by perturbing the original target features of the image. However, the parameters and structures of different models differ, and the perturbed original target features also differ from model to model, so the transfer effect is not ideal. This is because existing feature-level approaches focus only on interfering with the original target features of the image while ignoring the impact on transferability of the new features generated during that interference. To further enhance the transferability of adversarial examples, improvements over the existing generation methods are needed. The invention therefore proposes to further reinforce the newly generated features while interfering with the original features, so as to improve the transferability of the adversarial example.
Disclosure of Invention
Purpose: to overcome the deficiencies in the prior art, the invention provides a high-transferability adversarial example generation method, device, and medium, and in particular a high-transferability adversarial example generation method based on reinforcing new features. The transferability of the adversarial example is improved by further reinforcing the newly generated features while interfering with the original features.
The technical scheme is as follows: in order to solve the above technical problems, the invention adopts the following technical scheme:
In a first aspect, a high-transferability adversarial example generation method includes the following steps:
step 1: to the original imageInput classification model->Obtaining a classification model->First->Feature map output by layer intermediate layer。
Step 2: replace randomly selected pixels of the original image x with random noise to obtain the random-noise-perturbed image x_rd.
Step 3: input the random-noise-perturbed image x_rd into the classification model f to obtain the output l_y(x_rd) for the original feature class label y of the image and the output l_t(x_rd) for the new feature class label t that appears after the feature attack. Back-propagate the gradients of the two outputs separately to the k-th intermediate layer to obtain the original feature gradient of the image, Δy = ∂l_y(x_rd)/∂f_k(x_rd), and the newly generated feature gradient, Δt = ∂l_t(x_rd)/∂f_k(x_rd). Here l(·) denotes the class confidence output by the classification model, f_k(x_rd) denotes the feature map output by the k-th convolution layer after x_rd is input into the classification model f, and ∂ denotes the derivative.
Step 4: repeat steps 2 to 3 until a preset number of times N is reached; aggregate the N obtained original feature gradients of the image to obtain Δ̄y = (Σ_{n=1}^{N} Δy_n) / ‖Σ_{n=1}^{N} Δy_n‖₂, and aggregate the N newly generated feature gradients to obtain Δ̄t = (Σ_{n=1}^{N} Δt_n) / ‖Σ_{n=1}^{N} Δt_n‖₂, where ‖·‖₂ denotes taking the 2-norm of the summed result.
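The aggregation in step 4 can be sketched as follows; this is a minimal illustration assuming the N per-repetition gradients are already available as arrays (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def aggregate(grads):
    """Sum the per-repetition feature gradients, then divide by the
    2-norm of the sum (step 4: aggregate = sum(grad_n) / ||sum(grad_n)||_2)."""
    total = np.sum(np.stack(grads), axis=0)
    return total / np.linalg.norm(total)

# Toy example: a single 2x2 "gradient" with 2-norm 5 is scaled to unit norm.
agg = aggregate([np.array([[3.0, 0.0], [0.0, 4.0]])])
print(np.isclose(np.linalg.norm(agg), 1.0))  # the aggregate has unit 2-norm
```

Because each aggregate is normalized to unit 2-norm, the two aggregated gradients in step 5 contribute on a comparable scale regardless of how many repetitions were accumulated.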
Step 5: construct the loss function L(x_adv) = Σ((Δ̄y − λ·Δ̄t) ⊙ f_k(x_adv)), where ⊙ denotes the element-wise (corresponding-point) product, λ denotes the influence factor, x_adv denotes the adversarial example to be determined, and f_k(x_adv) denotes the feature map output by the k-th convolution layer after x_adv is input into the classification model f.
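A sketch of the step-5 loss on toy arrays (names are illustrative; in practice Δ̄y, Δ̄t come from step 4 and the feature map from the model's k-th layer):

```python
import numpy as np

def feature_loss(agg_y, agg_t, feat_adv, lam=1.0):
    """Step-5 loss: sum of (agg_y - lam * agg_t) element-wise times the
    k-th-layer feature map of the candidate adversarial example.
    Minimizing it suppresses features aligned with the original-class
    gradient while reinforcing those aligned with the new-class gradient."""
    return float(np.sum((agg_y - lam * agg_t) * feat_adv))

agg_y = np.array([[1.0, 0.0], [0.0, 0.0]])  # aggregated original-feature gradient
agg_t = np.array([[0.0, 1.0], [0.0, 0.0]])  # aggregated new-feature gradient
feat  = np.ones((2, 2))                     # stand-in feature map f_k(x_adv)
print(feature_loss(agg_y, agg_t, feat, lam=2.0))  # -> -1.0
```

With λ = 0 this reduces to an FIA-style loss on the original features only; the −λ·Δ̄t term is what makes minimization additionally reinforce the newly generated features.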
Step 6: construct an optimization model from the loss function L(x_adv), and solve the optimization model to obtain the final adversarial example.
As a preferred scheme, the optimization model is specifically:
x_adv = argmin_{‖x_adv − x‖_∞ ≤ ε} L(x_adv),
i.e., the adversarial example obtained by modifying the pixel values of the original image x within the range ‖x_adv − x‖_∞ ≤ ε that minimizes the loss function L(x_adv). Here ‖·‖_∞ denotes the infinity norm and ε denotes the hyper-parameter controlling the perturbation magnitude.
As a preferred scheme, solving the optimization model to obtain the final adversarial example specifically includes:
step 6.1: obtaining Newton acceleration samples of the jth roundThe calculation formula is as follows:
wherein: when j is initialized to 0, gradient,/>Is the original image. />Represents the gradient of the j-th round,>newton acceleration sample representing the j-th round, +.>Challenge sample representing the jth round, +.>Indicating newton's acceleration control factor.
Step 6.2: input x_nes_j into the classification model f to obtain the feature map f_k(x_nes_j) output by the k-th convolution layer.
Step 6.3: substitute f_k(x_nes_j) into the loss function to obtain L(x_nes_j).
Step 6.4: back-propagate L(x_nes_j) from the intermediate layer to the input layer to obtain the gradient g′ = ∂L(x_nes_j)/∂x_nes_j.
Step 6.5: according to the gradient g′, compute g_{j+1}; the calculation formula is:
g_{j+1} = μ2·g_j + g′/‖g′‖₁,
where ‖·‖₁ denotes the 1-norm operation and μ2 denotes the gradient accumulation control factor.
Step 6.6: according to x_adv_j and g_{j+1}, compute the adversarial example of round j+1; the calculation formula is:
x_adv_{j+1} = Clip_{x,ε}(x_adv_j − α·sign(g_{j+1})),
where α denotes the step size of the iterative attack and Clip_{x,ε}(·) denotes clipping the element values; its calculation formula is Clip_{x,ε}(z) = min(max(z, x − ε), x + ε), so that the result stays within [x − ε, x + ε].
step 6.7: repeating the iteration steps 6.1-6.6, and judging whether the iteration times reach the preset times. If so, a final challenge sample is generated. If not, returning to the step 6.1.
As a preferred scheme, the noise added in step 2 and the random pixels selected in the original image differ in each repetition of step 4.
In a second aspect, a computer-readable storage medium has a computer program stored thereon which, when executed by a processor, implements a high-transferability adversarial example generation method according to any one of the first aspect.
In a third aspect, a computer device comprises:
a memory for storing instructions; and
a processor for executing the instructions, so that the computer device performs the operations of a high-transferability adversarial example generation method according to any one of the first aspect.
Beneficial effects: compared with other feature-level methods, which only interfere with the original features, the high-transferability adversarial example generation method, device, and medium provided by the invention construct the loss function by aggregating the original feature gradients and the newly generated feature gradients. The newly generated features are reinforced while the original image features are disturbed, so that when other models are attacked the input is more easily pushed into the newly generated feature category, and adversarial examples with higher transferability can be generated.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings, in which embodiments of the invention are shown. It is evident that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of the present invention.
The invention will be further described with reference to specific examples.
Example 1:
This embodiment describes a high-transferability adversarial example generation method, comprising the following steps:
step 1: to the original imageInput classification model->Obtaining a classification model->First->Feature map output by layer intermediate layer。
In one embodiment, the classification model f refers to a current mainstream image classification model, such as a VGG model.
The intermediate layer refers to a convolution layer of the classification model f before the fully connected layers; the k-th layer means the k-th convolution layer.
The feature map f_k(x) refers to the content output after the original image x, fed to the input of the classification model f, has passed through the first k convolution layers.
Step 2: replace randomly selected pixels of the original image x with random noise to obtain the random-noise-perturbed image x_rd.
In one embodiment, this is done by randomly selecting a proportion of the pixels of the original image and then replacing those pixels with random noise perturbations.
Step 3: input the random-noise-perturbed image x_rd into the classification model f to obtain the output l_y(x_rd) for the original feature class label y of the image and the output l_t(x_rd) for the new feature class label t after the feature attack, and back-propagate the gradients separately to the k-th intermediate layer to obtain the original feature gradient of the image, Δy = ∂l_y(x_rd)/∂f_k(x_rd), and the newly generated feature gradient, Δt = ∂l_t(x_rd)/∂f_k(x_rd).
In one embodiment, the invention adopts a classification model trained on the ImageNet dataset, which has 1000 categories in total; l(·) is the confidence over all categories output by the classification model.
The label y is the category annotated after manual identification of the original image and is taken as the true category of the image; l_y(x_rd) is the confidence of category label y output after the random-noise-perturbed image x_rd is input into the classification model f.
The label t is the erroneous result category obtained by generating an adversarial example from the original image with an existing feature-level method and classifying that adversarial example again on the original classification model; l_t(x_rd) is then the confidence of category label t output after x_rd is input into the classification model f.
The two class outputs are back-propagated separately, but only as far as the k-th intermediate layer mentioned in step 1. The mathematical expression of this process is Δy = ∂l_y(x_rd)/∂f_k(x_rd) and Δt = ∂l_t(x_rd)/∂f_k(x_rd), where f_k(x_rd) denotes the feature map output by the k-th convolution layer after the random-noise-perturbed image x_rd is input into the classification model f; Δy denotes the derivative of the output for label y with respect to f_k(x_rd), and Δt denotes the derivative of the output for label t with respect to f_k(x_rd).
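To make the two backward passes concrete, consider a toy model whose k-th-layer feature map feeds a linear classifier head: for a linear head, the gradient of the class-c confidence with respect to the feature map is simply row c of the weight matrix. The weights, labels, and helper names below are assumptions for illustration, not the patent's model:

```python
import numpy as np

# Toy head: class confidences l(x) = W @ f_k(x) for a 2-dim feature map.
W = np.array([[1.0, -1.0],    # row 0: original class label y
              [-1.0, 2.0]])   # row 1: new feature class label t

def confidence(feat, c):
    """l_c as a function of the intermediate feature map."""
    return W[c] @ feat

def grad_wrt_feature(c):
    """d l_c / d f_k: for a linear head this is row c of W."""
    return W[c].copy()

feat_rd = np.array([0.3, 0.7])       # f_k(x_rd) for some perturbed input
delta_y = grad_wrt_feature(0)        # original-feature gradient at layer k
delta_t = grad_wrt_feature(1)        # newly generated feature gradient at layer k

# Finite-difference check of d l_y / d f_k against the analytic row.
eps = 1e-6
fd = np.array([(confidence(feat_rd + eps * np.eye(2)[i], 0)
                - confidence(feat_rd, 0)) / eps for i in range(2)])
print(np.allclose(fd, delta_y, atol=1e-4))  # -> True
```

In a real network the same quantities would be obtained by back-propagating the two class confidences to the k-th layer (e.g., with autograd and a forward hook on that layer), once for label y and once for label t.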
Step 4: repeat steps 2 to 3 until the preset number of times is reached; aggregate the obtained N original feature gradients of the image to obtain Δ̄y, and aggregate the N newly generated feature gradients to obtain Δ̄t.
Repeat the operations of steps 2 and 3 N times. The noise added and the random pixels selected in the original image differ in each repetition of step 2, so the Δy and Δt obtained in step 3 also differ each time; Δy_n denotes the gradient obtained in the n-th operation, and f_k denotes the result obtained with the first k convolution blocks. The class gradients are then accumulated separately to obtain
Δ̄y = (Σ_{n=1}^{N} Δy_n) / ‖Σ_{n=1}^{N} Δy_n‖₂ and Δ̄t = (Σ_{n=1}^{N} Δt_n) / ‖Σ_{n=1}^{N} Δt_n‖₂,
where ‖·‖₂ denotes taking the 2-norm of the summed result.
Step 5: construct the loss function by multiplying the obtained feature map by the difference of the two aggregated gradients:
L(x_adv) = Σ((Δ̄y − λ·Δ̄t) ⊙ f_k(x_adv)).
In one embodiment, the Δ̄y and Δ̄t obtained in the preceding steps are used to construct the loss function, in preparation for subsequently generating an adversarial example by optimizing L(x_adv). Here ⊙ denotes the element-wise (corresponding-point) product, and λ denotes the influence factor used to adjust the weight of the new features. Substituting x_adv into the loss function gives the value L(x_adv), where x_adv denotes the adversarial example obtained by altering the pixel values of the original image x and serves as the variable to be determined, and f_k(x_adv) denotes the feature map output by the k-th convolution layer after x_adv is input into the classification model f.
Step 6: iteratively generate the adversarial example using the optimization model. With the loss function L(x_adv) constructed in the previous step, the adversarial example generation problem can be converted into an optimization problem and then solved by the Newton iteration method; the defining formula is:
x_adv = argmin_{‖x_adv − x‖_∞ ≤ ε} L(x_adv),
i.e., among the samples obtained by modifying the pixel values of the original image x within the range ‖x_adv − x‖_∞ ≤ ε, the one whose substitution into the loss function gives the minimum value. Here ‖·‖_∞ denotes the infinity norm and ε denotes the hyper-parameter used to control the perturbation magnitude.
In one embodiment, the optimization model is solved with the Newton momentum accumulation (NI) method to obtain the final x_adv; the specific process is as follows:
Step 6.1: the Newton momentum acceleration method makes it easier for the computation to jump out of a local optimum. Obtain the Newton-accelerated sample of the j-th round as x_nes_j = x_adv_j − α·μ1·g_j (the descent form, since the loss is minimized), where j denotes the j-th iteration round; at initialization, j = 0, the gradient g_0 = 0, and x_adv_0 = x is the original image; g_j denotes the gradient of the j-th round, x_nes_j the Newton-accelerated sample of the j-th round, x_adv_j the adversarial example of the j-th round, and μ1 the Newton acceleration control factor.
Step 6.2: input x_nes_j into the classification model f to obtain the feature map f_k(x_nes_j) output by the k-th convolution layer.
Step 6.3: substitute f_k(x_nes_j) into the loss function to obtain L(x_nes_j).
Step 6.4: back-propagate L(x_nes_j) from the intermediate layer to the input layer to obtain the gradient g′ = ∂L(x_nes_j)/∂x_nes_j.
Step 6.5: according to the gradient g′, compute g_{j+1}; the calculation formula is:
g_{j+1} = μ2·g_j + g′/‖g′‖₁,
where ‖·‖₁ denotes the 1-norm operation and μ2 denotes the gradient accumulation control factor.
Step 6.6: add the perturbation to the image generated in the previous round and clip the sample, obtaining the newly generated adversarial example of round j+1:
x_adv_{j+1} = Clip_{x,ε}(x_adv_j − α·sign(g_{j+1})),
where α denotes the step size of the iterative attack and Clip_{x,ε}(·) denotes clipping the element values so that they stay within [x − ε, x + ε]; its calculation formula is Clip_{x,ε}(z) = min(max(z, x − ε), x + ε).
Step 6.7: repeat iteration steps 6.1 to 6.6 and judge whether the number of iterations has reached the preset count; if so, generate the final adversarial example; if not, return to step 6.1.
Example 2:
This embodiment describes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the high-transferability adversarial example generation method described in Embodiment 1.
Example 3:
This embodiment introduces a computer device, comprising:
a memory for storing instructions; and
a processor configured to execute the instructions, so that the computer device performs the operations of the high-transferability adversarial example generation method described in Embodiment 1.
Example 4:
A high-transferability adversarial example generation method will now be described in detail with reference to FIG. 1:
In the present embodiment, f is used to represent the image classification model; when the clean original image x is input into the classification model, the probability output f(x) can be obtained.
The purpose of the invention is to add an imperceptible perturbation to the original image x to generate an adversarial example x_adv that makes the image classification model produce a misclassification result. The adversarial example generation process may be defined as follows:
f_θ(x_adv) ≠ f_θ(x), subject to ‖x_adv − x‖_p ≤ ε,
where x denotes the original image, x_adv is its adversarial example, f denotes the image classification model, θ denotes the parameters of the image classification model f, ‖x_adv − x‖_p denotes the p-norm distance between x_adv and x, and ε is a hyper-parameter used to control the perturbation magnitude. The adversarial examples generated by the invention on the local surrogate (original) model can also successfully mislead the decisions of other target models, which realizes the transferability of the generated adversarial examples.
An embodiment of the present invention provides a high-transferability adversarial example generation method, comprising:
step 1: to the original imageInput classification model->Obtaining a classification model->First->Feature map output by layer intermediate layer。
Step 2: randomly discard pixels of the original image x and fill the discarded positions with random noise drawn from a 0–1 distribution, obtaining x_rd, which is expressed as:
x_rd = M_p ⊙ x + (1 − M_p) ⊙ R,
where M_p is a matrix of the same size as the image x containing only the two values 0 and 1: pixels at positions with value 1 are retained and pixels at positions with value 0 are discarded, and p denotes the probability of a 0 entry. (1 − M_p) denotes the element-wise negation of M_p, R denotes a random noise matrix of the same size as the image x, ⊙ denotes the element-wise product, and the probability of a pixel being discarded is p.
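The step-2 transform can be sketched as follows. The uniform [0, 1) noise and the helper names are assumptions for illustration (the disclosure only specifies a 0–1 noise distribution):

```python
import numpy as np

def random_noise_perturb(x, p, rng):
    """x_rd = M * x + (1 - M) * R: each pixel is kept with probability
    1 - p and replaced by random noise with probability p."""
    mask = (rng.random(x.shape) >= p).astype(x.dtype)  # 0 entries occur with prob. p
    noise = rng.random(x.shape)                        # noise matrix R, same shape as x
    return mask * x + (1.0 - mask) * noise

rng = np.random.default_rng(42)
x = np.full((4, 4), 0.5)          # a toy 4x4 "image"
x_rd = random_noise_perturb(x, p=0.3, rng=rng)
print(x_rd.shape)  # -> (4, 4)
```

With p = 0 the transform returns the image unchanged; larger p discards more pixels, trading semantic fidelity for more diverse gradients in the step-4 aggregation.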
Step 3: input the random-noise-perturbed image x_rd from the previous step into the classification model f to obtain the output l_y(x_rd) for the original feature class label y of the image and the output l_t(x_rd) for the new feature class label t after the feature attack, and back-propagate the gradients separately to the k-th intermediate layer to obtain the original feature gradient of the image, Δy, and the newly generated feature gradient, Δt.
The label y is the category annotated after manual identification of the original image and is taken as the true category of the image; l_y(x_rd) is the confidence of category label y output after the random-noise-perturbed image x_rd is input into the classification model f.
The label t is the erroneous result category obtained by generating an adversarial example from the original image with an existing feature-level method and classifying that adversarial example again on the original classification model; l_t(x_rd) is then the confidence of category label t output after x_rd is input into the classification model f. The formula process is expressed as follows:
x_t = D(x), t = argmax f(x_t),
where D(·) represents an existing feature-level attack method; applying it to the original image x yields an adversarial example x_t, and feeding this adversarial sample into the original image classification model f gives the category label t.
Step 4: differences in parameters and network structure between different classification models result in feature differences between them. The classification features obtained from an image carry the characteristics of the particular classification model, which makes the transferability of the adversarial example poor. Therefore, the image is transformed multiple times in a way that retains its semantic features, and the feature gradients obtained from the transformed images are aggregated. The aggregated gradient weakens the characteristics carried over from the original classification model, thereby improving the transferability of the adversarial example. By repeating the second and third steps until the set number of times N is reached, N class-y feature gradients Δy_n and N class-t feature gradients Δt_n are obtained. The aggregation operations are performed separately, and the aggregated gradients are calculated by the following formulas:
Δ̄y = (Σ_{n=1}^{N} Δy_n) / ‖Σ_{n=1}^{N} Δy_n‖₂, Δ̄t = (Σ_{n=1}^{N} Δt_n) / ‖Σ_{n=1}^{N} Δt_n‖₂.
step 5: the present invention implements the following loss function to guide the generation of challenge samples:
Step 6: iteratively generate the adversarial example using the optimization model. With the loss function L(x_adv) constructed in the previous step, the adversarial example generation problem can be converted into an optimization problem and solved by the Newton iteration method; the defining formula is:
x_adv = argmin_{‖x_adv − x‖_∞ ≤ ε} L(x_adv),
i.e., among the samples obtained by modifying the pixel values of the original image x within the range ‖x_adv − x‖_∞ ≤ ε, the one whose substitution into the loss function gives the minimum value. Here ‖·‖_∞ denotes the infinity norm and ε denotes the hyper-parameter used to control the perturbation magnitude.
One embodiment solves the optimized loss function model with the Newton momentum accumulation method (NI) to obtain the final adversarial sample x*^adv.
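A minimal sketch of the Newton-accelerated iteration, assuming a look-ahead of the form x_ne = x_adv + λ·g, a 1-norm-normalized momentum accumulation, and a descent step projected into the ℓ∞ ball (the descent direction follows the argmin formulation; `grad_fn` stands in for back-propagating the loss through a real model):

```python
import numpy as np

def ni_generate(x0, grad_fn, eps=16/255, alpha=1.6/255, T=10, mu=1.0, lam=1.0):
    x_adv = x0.astype(float).copy()
    g = np.zeros_like(x_adv)                  # g_0 = 0
    for _ in range(T):
        x_ne = x_adv + lam * g                # Newton acceleration (look-ahead) sample
        grad = grad_fn(x_ne)                  # gradient of the loss at the look-ahead point
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # momentum accumulation
        x_adv = x_adv - alpha * np.sign(g)    # descend, since the loss is minimized
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)        # project into the l_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)      # keep a valid pixel range
    return x_adv
```

For a quick check, a toy differentiable loss L(x) = 0.5·||x − x_target||² with gradient x − x_target can replace the model's back-propagated gradient.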
Example 5:
To evaluate the effectiveness of the present method in generating adversarial samples with high transferability, the adversarial samples generated by the method of this example are compared with those of the existing feature-level methods FIA (Feature Importance-aware Attack), RPA (Random Patch Attack), and NAA (Neuron Attribution-based Attack). The method of the present invention is hereinafter denoted Ours.
The attack performance is evaluated with 5 classification models as target models. The 4 normally trained classification models are:
Vgg-16 (Visual Geometry Group 16), Res-152 (a 152-layer deep residual network, ResNet), Inc-v3 (Google's Inception-v3 convolutional neural network), and Inc-v4 (Google's Inception-v4 convolutional neural network).
The 1 defense classification model obtained by adversarial training is Inc-v3-adv (an adversarially trained Inception-v3 model).
Four local surrogate models are selected here to generate the adversarial samples: Inc-v3, Inc-v4, Res-152, and Vgg-16.
The parameters of the adversarial sample generation methods are set as follows.
For each classification model, the intermediate layer is set to layer 3.
For FIA, the aggregation number N is set to 30; the drop probability is p = 0.3 when attacking the normally trained classification models and p = 0.1 when attacking the adversarially trained model.
For RPA, the aggregation number N is set to 60 and the pixel modification probability p_m is set to 0.3.
For NAA, the aggregation number N is set to 30, and the forward feature influence factor follows the setting of the NAA method.
For the method of the present invention (Ours), the result of the RPA attack is taken as the new feature class label t, the random pixel perturbation probability p is set to 0.3, and the Newton accumulation control factor is set to 1.0.
All adversarial sample generation methods set the maximum perturbation to 16 and the number of iterations to T = 10; the step size and the decay factor are set identically across all methods.
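For convenience, the settings above can be collected into a configuration table ("Ours" denotes the patent's proposed method; the key names are illustrative, and the step size and decay factor values, which do not survive in this text, are omitted):

```python
# Hyperparameter settings of Example 5; key names are illustrative.
ATTACK_CONFIG = {
    "common": {"max_perturbation": 16, "iterations": 10, "middle_layer": 3},
    "FIA": {"N": 30, "drop_prob_normal": 0.3, "drop_prob_adv_trained": 0.1},
    "RPA": {"N": 60, "pixel_modify_prob": 0.3},
    "NAA": {"N": 30},
    "Ours": {"pixel_perturb_prob": 0.3, "newton_control_factor": 1.0,
             "new_label_source": "RPA"},
}
```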
As shown in Table 1, the first column gives the original (surrogate) model used to generate the adversarial samples, and each table entry gives the attack success rate obtained when the samples generated on that model are transferred to another model. Entries marked * are the success rates of the adversarial samples on the model that generated them; all other entries are black-box success rates on the corresponding target models. The transfer attack success rate is the proportion of images misclassified by the attacked model under the corresponding generation model; the higher the rate, the better the attack performance. The best transfer result in each setting is highlighted in bold.
The results show that the attack method provided by the invention achieves the highest success rate in every setting. Moreover, compared with the best results of the baseline methods, the overall attack success rate is improved by more than 2.0%.
The experimental results show that, relative to the baseline methods, the strategy of strengthening the newly generated features maximizes the transferability of the generated adversarial samples.
Table 1: Comparison of transfer attack success rates
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is only a preferred embodiment of the invention, it being noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the invention.
Claims (6)
1. A high-mobility adversarial sample generation method, characterized in that the method comprises the following steps:
Step 1: input the original image x into the classification model f to obtain the feature map f_k(x) output by the k-th intermediate layer of the classification model f;
Step 2: replace random pixel points of the original image x with random noise to obtain a random-noise-perturbed image x_r;
Step 3: input the random-noise-perturbed image x_r into the classification model f to obtain the output l(x_r, y_o) for the original image feature class label y_o and the output l(x_r, t) for the new feature class label t after the feature attack; back-propagate the gradients of l(x_r, y_o) and l(x_r, t) to the k-th intermediate layer to obtain the original image feature gradient g_y = ∂l(x_r, y_o)/∂f_k(x_r) and the newly generated feature gradient g_t = ∂l(x_r, t)/∂f_k(x_r); wherein l(·,·) denotes the class confidence output by the classification model, f_k(x_r) denotes the feature map output by the k-th convolutional layer after the randomly perturbed image x_r is input into the classification model f, and ∂ denotes taking the derivative;
Step 4: repeat steps 2 to 3 until the preset number N is reached; aggregate the obtained N original image feature gradients to obtain Δ_y, and aggregate the N newly generated feature gradients to obtain Δ_t; wherein Δ_y denotes the result of normalizing the sum of the N original feature gradients by its 2-norm, and Δ_t denotes the result of normalizing the sum of the N newly generated feature gradients by its 2-norm;
step 5: construction of a loss functionThe method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Representing the product of the corresponding points, +.>The influence factor is represented by a factor of influence,representing the challenge sample to be determined,>the representation will->Input classification model->Back from->A feature map output by the layer convolution layer;
Step 6: construct an optimized loss function model from the loss function L(x^adv), and solve the optimized loss function model to obtain the final adversarial sample.
2. The high-mobility adversarial sample generation method according to claim 1, wherein the optimized loss function model is specifically:
x*^adv = argmin_{||x^adv − x||_∞ ≤ ε} L(x^adv);
wherein x*^adv denotes the x^adv that minimizes the loss function L(x^adv); argmin_{||x^adv − x||_∞ ≤ ε} denotes searching among the adversarial samples obtained by modifying pixel values of the original image x within the range ε; ||·||_∞ denotes the infinity norm, and ε denotes the hyperparameter.
3. The high-mobility adversarial sample generation method according to claim 2, wherein solving the optimized loss function model to obtain the final adversarial sample specifically comprises the following steps:
Step 6.1: obtain the Newton acceleration sample x_ne^j of the j-th round; the calculation formula is:
x_ne^j = x_j^adv + λ · g_j;
wherein, when j is initialized to 0, the gradient g_0 = 0 and x_0^adv is the original image; g_j denotes the gradient of the j-th round; x_ne^j denotes the Newton acceleration sample of the j-th round, x_j^adv denotes the adversarial sample of the j-th round, and λ denotes the Newton acceleration control factor;
Step 6.2: input x_ne^j into the classification model f and obtain the feature map f_k(x_ne^j) output by the k-th convolutional layer;
Step 6.3: substitute f_k(x_ne^j) into the loss function L to obtain L(x_ne^j);
Step 6.4: back-propagate L(x_ne^j) from the intermediate layer to the input layer to obtain the gradient ∇L_j;
Step 6.5: according to gradientCalculate->The calculation formula is as follows:
;
wherein:;
representing 1-norm arithmetic,/-norm arithmetic,>representing a gradient accumulation control factor;
Step 6.6: calculate the adversarial sample x_{j+1}^adv of round j+1 from x_j^adv and g_{j+1}; the calculation formula is:
x_{j+1}^adv = Clip_{x,ε}( x_j^adv − α · sign(g_{j+1}) );
wherein α denotes the step size of the iterative attack, and Clip_{x,ε}(·) denotes clipping element values; the calculation formula of Clip_{x,ε} is:
Clip_{x,ε}(x') = min( x + ε, max( x − ε, x' ) );
Step 6.7: repeat steps 6.1 to 6.6, judging whether the number of iterations has reached the preset number; if so, output the final adversarial sample; if not, return to step 6.1.
4. The high-mobility adversarial sample generation method according to claim 1, wherein the noise added in step 2 and the random pixel points selected in the original image are different in each repetition of step 4.
5. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the high-mobility adversarial sample generation method as claimed in any one of claims 1 to 4.
6. A computer device, characterized by: comprising the following steps:
a memory for storing instructions;
a processor for executing the instructions to cause the computer device to perform the operations of the high-mobility adversarial sample generation method as claimed in any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410013633.XA CN117523342B (en) | 2024-01-04 | 2024-01-04 | High-mobility countermeasure sample generation method, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117523342A true CN117523342A (en) | 2024-02-06 |
CN117523342B CN117523342B (en) | 2024-04-16 |
Family
ID=89751699
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410013633.XA Active CN117523342B (en) | 2024-01-04 | 2024-01-04 | High-mobility countermeasure sample generation method, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117523342B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110045335A (en) * | 2019-03-01 | 2019-07-23 | 合肥工业大学 | Based on the Radar Target Track recognition methods and device for generating confrontation network |
CN111325324A (en) * | 2020-02-20 | 2020-06-23 | 浙江科技学院 | Deep learning confrontation sample generation method based on second-order method |
CN111461307A (en) * | 2020-04-02 | 2020-07-28 | 武汉大学 | General disturbance generation method based on generation countermeasure network |
CN111652290A (en) * | 2020-05-15 | 2020-09-11 | 深圳前海微众银行股份有限公司 | Detection method and device for confrontation sample |
CN114283341A (en) * | 2022-03-04 | 2022-04-05 | 西南石油大学 | High-transferability confrontation sample generation method, system and terminal |
CN114842242A (en) * | 2022-04-11 | 2022-08-02 | 上海大学 | Robust countermeasure sample generation method based on generative model |
US20220261626A1 (en) * | 2021-02-08 | 2022-08-18 | International Business Machines Corporation | Distributed Adversarial Training for Robust Deep Neural Networks |
CN115115905A (en) * | 2022-06-13 | 2022-09-27 | 苏州大学 | High-mobility image countermeasure sample generation method based on generation model |
US20230022943A1 (en) * | 2021-07-22 | 2023-01-26 | Xidian University | Method and system for defending against adversarial sample in image classification, and data processing terminal |
CN116011558A (en) * | 2023-01-31 | 2023-04-25 | 南京航空航天大学 | High-mobility countermeasure sample generation method and system |
CN116993893A (en) * | 2023-09-26 | 2023-11-03 | 南京信息工程大学 | Method and device for generating antagonism map for resisting AI self-aiming cheating |
Non-Patent Citations (5)
Title |
---|
ANIRBAN CHAKRABORTY et al.: "Adversarial Attack and Defense: A Survey", arXiv:1810.00069, 28 September 2018 (2018-09-28), pages 1-31 *
SICONG HAN et al.: "Interpreting Adversarial Examples in Deep Learning", ACM Computing Surveys, vol. 55, 17 July 2023 (2023-07-17), pages 1-38, XP059183886, DOI: 10.1145/3594869 *
ZHIBO WANG et al.: "Towards Transferable Targeted Adversarial Examples", IEEE, 31 December 2023 (2023-12-31), pages 20534-20543 *
ZHANG Shudong: "Research on Adversarial Example Attack and Defense Techniques in Deep Neural Networks", Wanfang Data Knowledge Service Platform, 4 May 2023 (2023-05-04), pages 2-4 *
CHEN Xianyi et al.: "Dual adversarial attacks against license plate recognition systems", Chinese Journal of Network and Information Security, vol. 9, no. 3, 30 June 2023 (2023-06-30), pages 16-27 *
Also Published As
Publication number | Publication date |
---|---|
CN117523342B (en) | 2024-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liang et al. | Detecting adversarial image examples in deep neural networks with adaptive noise reduction | |
Silva et al. | Opportunities and challenges in deep learning adversarial robustness: A survey | |
Yuan et al. | Adversarial examples: Attacks and defenses for deep learning | |
CN111310915B (en) | Data anomaly detection defense method oriented to reinforcement learning | |
Chen et al. | POBA-GA: Perturbation optimized black-box adversarial attacks via genetic algorithm | |
CN111753881B (en) | Concept sensitivity-based quantitative recognition defending method against attacks | |
Kang et al. | Shakeout: A new approach to regularized deep neural network training | |
CN103745482B (en) | A kind of Dual-threshold image segmentation method based on bat algorithm optimization fuzzy entropy | |
CN111242166A (en) | Universal countermeasure disturbance generation method | |
Pal et al. | A game theoretic analysis of additive adversarial attacks and defenses | |
CN115860112B (en) | Model inversion method-based countermeasure sample defense method and equipment | |
CN110322003B (en) | Gradient-based graph confrontation sample generation method for document classification by adding false nodes | |
CN109766259B (en) | Classifier testing method and system based on composite metamorphic relation | |
CN115719085B (en) | Deep neural network model inversion attack defense method and device | |
CN113283590A (en) | Defense method for backdoor attack | |
CN112434213A (en) | Network model training method, information pushing method and related device | |
CN114330652A (en) | Target detection attack method and device | |
CN115758337A (en) | Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium | |
CN111680291A (en) | Countermeasure sample generation method and device, electronic equipment and storage medium | |
Rodrigues | Machine learning in physics: a short guide | |
CN117523342B (en) | High-mobility countermeasure sample generation method, equipment and medium | |
Dai et al. | A targeted universal attack on graph convolutional network | |
CN116596045A (en) | Apparatus and method for determining an countermeasure patch for a machine learning system | |
Mukeri et al. | Towards Query Efficient and Derivative Free Black Box Adversarial Machine Learning Attack | |
CN113283537B (en) | Method and device for protecting privacy of depth model based on parameter sharing and oriented to membership inference attack |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||