CN112464230B

CN112464230B - Black-box attack defense system and method based on neural network middle layer regularization

Info

Publication number: CN112464230B
Application number: CN202011281842.0A
Authority: CN
Inventors: 李晓锐; 崔炜煜; 王文一; 陈建文
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2020-11-16
Filing date: 2020-11-16
Publication date: 2022-05-17
Anticipated expiration: 2040-11-16
Also published as: CN112464230A

Abstract

The invention relates to the field of artificial intelligence safety, in particular to a black box attack type defense system based on regularization of a neural network intermediate layer, which comprises a first source model, a second source model and a third source model; a black box attack type defense method based on regularization of a neural network middle layer comprises the steps of S1, inputting a picture into a first source model for white box attack, outputting a first pair of anti-sample sequence, S2, inputting the first pair of anti-sample sequence into a second source model, outputting a second pair of anti-sample sequence, S3, inputting the second pair of anti-sample sequence into a third source model for black box attack, outputting a third recognition sample sequence, S4, inputting the third recognition sample sequence into the third source model for countermeasure training, and updating the third source model; the countermeasure sample generated by the algorithm has the characteristic of high mobility to the target model, and can effectively defend the target model from being attacked through countermeasure training.

Description

Black-box attack defense system and method based on neural network middle layer regularization

技术领域technical field

本发明涉及人工智能安全领域，具体是指基于神经网络中间层正则化的黑盒攻击型防御系统及方法。The invention relates to the field of artificial intelligence security, in particular to a black-box attack-type defense system and method based on the regularization of the middle layer of a neural network.

背景技术Background technique

当对图像信号添加微小扰动，被添加扰动的图像信号输入用于分类任务的卷积神经网络时，会被该网络识别错误，该技术应用广泛，在车辆检测系统中，通过对车牌号图像进行微小扰动的方式欺骗车辆检测系统，有助于提升车辆检测系统鲁棒性和稳健性；在人脸识别检测系统中，通过对人脸图像进行微小扰动的方式欺骗人脸识别检测系统，有助于检验人脸识别网络的鲁棒性和安全性；在无人驾驶系统中，通过对路标图像进行微小扰动的方式欺骗自动驾驶系统，有助于检验机器视觉中物体分类和目标检测网络的稳健性和安全性，随着5G时代的到来，图像视频数据将成为主流网络数据，神经网络攻击生成图像对抗样本技术，在网络对抗领域扮演关键角色，对防御算法性能的提升有着重要作用。When a slight perturbation is added to the image signal, when the perturbed image signal is input to the convolutional neural network used for classification tasks, it will be recognized incorrectly by the network. This technology is widely used. In the vehicle detection system, the license plate number image is processed by The way of deceiving the vehicle detection system by the way of slight disturbance is helpful to improve the robustness and robustness of the vehicle detection system; It is used to test the robustness and security of the face recognition network; in the unmanned system, by deceiving the automatic driving system by slightly perturbing the road sign image, it is helpful to test the robustness of the object classification and target detection network in machine vision With the advent of the 5G era, image and video data will become the mainstream network data. The neural network attack generates image confrontation sample technology, which plays a key role in the field of network confrontation and plays an important role in improving the performance of defense algorithms.

现在比较常见的攻击方式为黑盒攻击和白盒攻击，其中黑盒攻击分为基于迁移性的训练替代模型攻击方式，以及基于决策的多次查询估计梯度攻击方式，二者在生成接近黑盒模型的替代模型后和估计接近黑盒模型梯度后，利用主流的白盒攻击的方法来进行攻击，前者在训练替代模型时多数需要得知被攻击模型的训练数据集，以及输入输出等除模型内部参数以外的众多信息，而这些信息特别是训练数据集在实际应用中是很难得知的，或者是被限制获取数量的，所以通过以上方式生成替代模型的方法在很多情况下是有所限制的，后者通过对对抗模型多次进行查询输入输出并且估计梯度，当查询次数足够多时估计得到的梯度将接近对抗模型的真实梯度以获得决策边界，但是该方法的问题是当多次查询带来的计算复杂度，同时在限制查询次数的黑盒模型中无法得到进展，从而严重影响了黑盒攻击的效率。Now the more common attack methods are black-box attack and white-box attack. Black-box attack is divided into migration-based training substitution model attack method and decision-based multiple query estimation gradient attack method. After the replacement model of the model and the estimated gradient of the black-box model, the mainstream white-box attack method is used to attack. When training the replacement model, most of the former need to know the training data set of the attacked model, as well as input and output other than the model. There is a lot of information other than internal parameters, and this information, especially the training data set, is difficult to know in practical applications, or the number of acquisitions is limited, so the method of generating alternative models through the above methods is in many cases. In the latter, the input and output of the adversarial model are queried multiple times and the gradient is estimated. When the number of queries is sufficient, the estimated gradient will be close to the true gradient of the adversarial model to obtain the decision boundary, but the problem of this method is that when multiple queries bring In addition, it cannot make progress in the black-box model that limits the number of queries, which seriously affects the efficiency of black-box attacks.

发明内容SUMMARY OF THE INVENTION

基于以上问题，本发明提供了基于神经网络中间层正则化的黑盒攻击型防御系统及方法，该攻击算法不需生成替代模型，也无需获取查询黑盒模型的数据集及对应标签，便可对黑盒模型进行攻击，在图像分类任务中，利用该算法生成的对抗样本具有对目标模型高迁移性的特性，也可以有效的通过对抗训练防御目标模型被攻击。Based on the above problems, the present invention provides a black-box attack-type defense system and method based on the regularization of the middle layer of a neural network. The attack algorithm does not need to generate a substitute model, and does not need to obtain a data set and corresponding labels for querying the black-box model. To attack the black-box model, in the image classification task, the adversarial samples generated by this algorithm have the characteristics of high transferability to the target model, and can also effectively defend the target model from being attacked through adversarial training.

为解决以上技术问题，本发明采用的技术方案如下：For solving the above technical problems, the technical scheme adopted in the present invention is as follows:

基于神经网络中间层正则化的黑盒攻击型防御系统，包括A black-box attack defense system based on the regularization of the neural network intermediate layer, including

第一源模型，用于输出第一对抗样本序列；a first source model for outputting a first adversarial sample sequence;

第二源模型，用于输出第二对抗样本序列；a second source model for outputting a second adversarial sample sequence;

第三源模型，用于输出第三识别样本序列，并将第三识别样本序列输入第三源模型进行对抗训练，更新第三源模型。The third source model is used for outputting the third identification sample sequence, and inputting the third identification sample sequence into the third source model for adversarial training, and updating the third source model.

进一步，所述第一源模型和第二源模型采用以残差模块为基础的ResNet网络，第三源模型采用DenseNet网络，所述第二源模型划分不同的神经网络结构层，所述第二源模型的每一层均加入正则化损失函数。Further, the first source model and the second source model use the ResNet network based on the residual module, the third source model uses the DenseNet network, the second source model is divided into different neural network structure layers, and the second source model is divided into different neural network structure layers. A regularization loss function is added to each layer of the source model.

基于神经网络中间层正则化的黑盒攻击型防御方法，采用基于神经网络中间层正则化的黑盒攻击型防御系统，包括The black-box attack defense method based on the regularization of the neural network intermediate layer adopts a black-box attack defense system based on the regularization of the neural network intermediate layer, including

S1、将图片输入第一源模型进行白盒攻击，输出第一对抗样本序列；S1. Input the picture into the first source model for white-box attack, and output the first confrontation sample sequence;

S2、将第一对抗样本序列输入到第二源模型中，在第二源模型的每一层均利用正则化损失函数对第一对抗样本序列进行攻击，输出第二对抗样本序列；S2. Input the first adversarial sample sequence into the second source model, use the regularization loss function to attack the first adversarial sample sequence at each layer of the second source model, and output the second adversarial sample sequence;

S3、将第二对抗样本序列输入到第三源模型中进行黑盒攻击，输出第三识别样本序列；S3. Input the second adversarial sample sequence into the third source model to perform a black-box attack, and output the third identification sample sequence;

S4、将第三识别样本序列输入第三源模型进行对抗训练，更新第三源模型。S4. Input the third identification sample sequence into the third source model for adversarial training, and update the third source model.

进一步，所述步骤S2中，正则化损失函数对第一对抗样本序列进行攻击包括如下两方面：Further, in the step S2, the regularization loss function to attack the first confrontation sample sequence includes the following two aspects:

一方面为找出生成的第二对抗样本序列中的最优扰动方向；On the one hand, it is to find the optimal disturbance direction in the generated second adversarial sample sequence;

另一方面为过滤对抗扰动的高频成分，在第二源模型的每一层都产生与第一对抗样本序列对应的输出，生成一组对抗样本，在该生成的对抗样本中选取最优层的对抗样本作为第二对抗样本序列。On the other hand, in order to filter the high-frequency components of the adversarial disturbance, each layer of the second source model generates an output corresponding to the first adversarial sample sequence, generates a set of adversarial samples, and selects the optimal layer from the generated adversarial samples. The adversarial example is used as the second adversarial example sequence.

进一步，找出生成的第二对抗样本序列的最优扰动方向的公式为Further, the formula for finding the optimal disturbance direction of the generated second adversarial sample sequence is:

L1＝[f_t(x')-f_t(x)]*[f_t(x”)-f_t(x)]L1=[f _t (x')-f _t (x)]*[f _t (x")-f _t (x)]

其中，L1的结果为第二对抗样本序列的扰动方向，f_t(x)为第一对抗样本序列经过第二源模型第t层的输出结果，[f_t(x')-f_t(x)]为第一对抗样本序列的扰动，[f_t(x”)-f_t(x)]表征的扰动方向以基础扰动方向做指引；Among them, the result of L1 is the perturbation direction of the second adversarial sample sequence, f _t (x) is the output result of the first adversarial sample sequence passing through the t-th layer of the second source model, [f _t (x')-f _t (x )] is the perturbation of the first confrontation sample sequence, and the perturbation direction represented by [f _t (x”)-f _t (x)] is guided by the basic perturbation direction;

过滤对抗扰动的高频成分的公式为The formula for filtering high frequency components against perturbation is

L2＝F[f_t(x”)-f_t(x)]L2=F[f _t (x”)-f _t (x)]

其中，L2的结果为过滤对抗扰动的高频成分，F()为正则化函数；Among them, the result of L2 is to filter the high-frequency components of the anti-disturbance, and F() is the regularization function;

正则化损失函数L的公式为The formula for the regularization loss function L is

L＝-L1-L2。L=-L1-L2.

与现有技术相比，本发明的有益效果是：Compared with the prior art, the beneficial effects of the present invention are:

1、利用第二源模型中每一层添加正则化损失函数的方法攻击第一对抗样本序列，解决了使用传统方法的多次查询带来的计算复杂度较高的问题。1. The method of adding a regularization loss function to each layer in the second source model is used to attack the first adversarial sample sequence, which solves the problem of high computational complexity caused by multiple queries using the traditional method.

2、第二源模型中每一层添加正则化损失函数攻击第一对抗样本序列，一方面旨在寻找迁移性最强的最优决策方向，另一方面过滤对抗扰动的高频成分，增强与传统方法相比生成对抗样本的迁移性。2. A regularization loss function is added to each layer in the second source model to attack the first adversarial sample sequence. On the one hand, it aims to find the optimal decision-making direction with the strongest mobility, and on the other hand, it filters the high-frequency components of adversarial disturbances to enhance and The transferability of generating adversarial examples compared to traditional methods.

3、通过对第三源模型添加对抗训练，解决了传统方法对抗样本迁移质量差并且强度低的问题，使得对抗训练更加鲁棒。3. By adding adversarial training to the third source model, the problem of poor quality and low intensity of adversarial sample transfer in traditional methods is solved, making adversarial training more robust.

附图说明Description of drawings

图1为本实施例的流程图。FIG. 1 is a flowchart of this embodiment.

具体实施方式Detailed ways

下面结合附图对本发明作进一步的说明。本发明的实施方式包括但不限于下列实施例。The present invention will be further described below in conjunction with the accompanying drawings. Embodiments of the present invention include, but are not limited to, the following examples.

基于神经网络中间层正则化的黑盒攻击型防御系统，包括：A black-box attack defense system based on the regularization of the middle layer of the neural network, including:

第一源模型，采用以残差模块为基础的ResNet网络，该第一源模型在本实施例中采用白盒攻击方式进行攻击，最终输出第一对抗样本序列，以输入原始图片为例，输入一组原始图片，利用白盒攻击方法添加适当的对抗扰动对该第一源模型进行攻击，生成第一对抗样本序列，该第一对抗样本序列也有一定的迁移性，但针对第二源模型来讲，因为第一对抗样本序列的决策方向并不是迁移性最强的方向，因此，第一对抗样本序列并不为迁移最优的对抗样本，另外，本实施例中攻击模式分为无目标及有目标两种攻击模式，对于无目标攻击模式，即只要攻击后的预测标签不是攻击前的标签就可以，首先向损失函数最大化的方向等步长进行梯度上升，并且在每次梯度上升时对输入原始图片进行相应的微扰，生成相对应的第一对抗样本序列，对于有目标攻击模式，即攻击后的预测标签必须是指定的标签，首先向损失函数最小化的方向等步长进行梯度下降，并且在每次梯度下降时对输入原始图片进行相应的微扰，生成相应额的第一对抗样本序列；The first source model adopts the ResNet network based on the residual module. In this embodiment, the first source model adopts the white box attack method to attack, and finally outputs the first confrontation sample sequence. Taking the input original image as an example, the input A set of original pictures, using the white box attack method to add appropriate adversarial disturbances to attack the first source model to generate a first adversarial sample sequence, the first adversarial sample sequence also has a certain degree of migration, but for the second source model In other words, because the decision direction of the first adversarial sample sequence is not the direction with the strongest migration, the first adversarial sample sequence is not the best adversarial sample for migration. There are two target attack modes. For the untargeted attack mode, as long as the predicted label after the attack is not the label before the attack, the gradient is first raised in the direction of maximizing the loss function with equal steps, and each time the gradient rises Perform the corresponding perturbation on the input original image to generate the corresponding first adversarial sample sequence. For the targeted attack mode, that is, the predicted label after the attack must be the specified label. First, the step size is carried out in the direction of minimizing the loss function. Gradient descent, and the input original image is perturbed correspondingly at each gradient descent to generate the corresponding first adversarial sample sequence;

第二源模型，采用以残差模块为基础的ResNet网络，将第一对抗样本序列输入第二源模型，并输出第二对抗样本序列，其中无论是无目标攻击模式还是有目标攻击模式，第二源模型采用中间层正则化方法旨在寻找最优的迁移决策方向及对抗样本，本实施例中将第二源模型采用中间正则化划分为不同的神经网络结构层，其中包括不同的卷积层及池化层；将第一对抗样本序列输入到不同的层中，然后在每一层后加入正则化损失函数，该正则化攻击方法是对第一对抗样本序列的再次攻击，主要包含两个方面，一方面是基于第一对抗样本序列，找到最优的对抗样本决策方向；另一方面是将最优对抗样本的对抗扰动的高频成分作以滤波，进而在每一层都会产生与第一对抗样本序列对应的输出，生成一组对抗样本，在该生成的对抗样本中选取最优层的对抗样本作为第二对抗样本序列；The second source model uses the ResNet network based on the residual module to input the first adversarial sample sequence into the second source model, and output the second adversarial sample sequence, in which whether it is an untargeted attack mode or a targeted attack mode, the third The two-source model adopts the intermediate layer regularization method to find the optimal migration decision direction and confrontation samples. In this embodiment, the second source model is divided into different neural network structure layers using intermediate regularization, including different convolutional layers. layer and pooling layer; input the first confrontation sample sequence into different layers, and then add a regularization loss function after each layer. The regularization attack method is a re-attack on the first confrontation sample sequence, which mainly includes two In one aspect, on the one hand, the optimal adversarial sample decision-making direction is found based on the first adversarial sample sequence; The output corresponding to the first adversarial sample sequence generates a set of adversarial samples, and selects the adversarial sample of the optimal layer among the generated adversarial samples as the second adversarial sample sequence;

第三源模型，采用DenseNet网络，将第二对抗样本序列攻击第三源模型，在攻击第三源模型时采用黑盒攻击方式，针对每层的第二对抗样本序列，将该序列中所有对抗样本逐次攻击第三源模型，其中无目标攻击模式下只要第三源模型预测结果不是原始数据标签即代表攻击成功；而有目标攻击模式下，必须要第三源模型预测结果为指定的预测结果才代表攻击成功，从而根据攻击成功率选取出最优层的对抗样本，最终将统计攻击成功的对抗样本个数并且记录攻击成功的对抗样本，即第三识别样本序列，利用第三识别样本序列来重新训练第三源模型，该过程在实际黑盒攻击方式中是属于防御方的工作，但在本实施例中，仅限于此步骤可以对第三源模型进行操作，在第三源模型的对抗训练中，加入第三识别样本序列，利用对抗样本和原始样本共同训练第三源模型，经过多次迭代对抗训练，更新原始的第三源模型，能够有效防御在上一轮中攻击成功的对抗样本。The third source model uses the DenseNet network to attack the second adversarial sample sequence to the third source model. When attacking the third source model, a black box attack method is used. For the second adversarial sample sequence of each layer, all adversarial samples in the sequence are attacked. The samples attack the third source model successively. In the untargeted attack mode, as long as the prediction result of the third source model is not the original data label, the attack is successful; while in the target attack mode, the prediction result of the third source model must be the specified prediction result. Only when the attack is successful, the optimal layer of adversarial samples will be selected according to the attack success rate. Finally, the number of adversarial samples that have been successfully attacked will be counted and the successful adversarial samples will be recorded, that is, the third identification sample sequence, and the third identification sample sequence will be used. to retrain the third source model. In the actual black box attack method, this process belongs to the work of the defender, but in this embodiment, only this step can operate on the third source model. In the adversarial training, the third identification sample sequence is added, and the third source model is jointly trained by the adversarial samples and the original samples. After multiple iterations of adversarial training, the original third source model is updated, which can effectively defend against the successful attack in the previous round. adversarial example.

基于神经网络中间层正则化的黑盒攻击型防御方法，采用基于神经网络中间层正则化的黑盒攻击型防御系统，包括：The black-box attack defense method based on the regularization of the middle layer of the neural network adopts the black box attack defense system based on the regularization of the middle layer of the neural network, including:

S1、将图片输入第一源模型，利用白盒攻击方法添加适当的对抗扰动对该第一源模型进行攻击，输出第一对抗样本序列；S1, input the picture into the first source model, use the white box attack method to add appropriate confrontation disturbance to attack the first source model, and output the first confrontation sample sequence;

因为白盒攻击算法所生成对抗样本并非具有最优迁移性，所以需要添加最优扰动方向足够接近敌对网络，那么生成的对抗样本就可以完全攻击敌对网络，从而达到黑盒攻击的效果，在生成对抗样本时并非针对一张图片只生成一张对抗样本，而是选择一幅图片向着决策边界方向在源模型划分的层中生成多个对抗样本，组成对抗样本序列，来覆盖被攻击模型决策边界可能存在的区域，以达到高性能的黑盒攻击，因为对黑盒模型决策边界未知，所以基于第一源模型白盒攻击方法决策边界两侧，以正则化损失函数的攻击方式在第二源模型每层生成一组对抗样本，用来攻击真正的黑盒模型，并且利用第二源模型每层生成的对抗样本来攻击第三源模型，选择攻击成功的一个最优层输出作为新类型的对抗样本，并进行记录，基于上述原理，进行步骤S2的操作。Because the adversarial samples generated by the white-box attack algorithm do not have optimal migration, it is necessary to add the optimal disturbance direction close enough to the adversarial network, then the generated adversarial samples can completely attack the adversarial network, so as to achieve the effect of black-box attack. The adversarial sample is not only one adversarial sample for one image, but one image is selected to generate multiple adversarial samples in the layer divided by the source model in the direction of the decision boundary to form a sequence of adversarial samples to cover the decision boundary of the attacked model There may exist areas to achieve high-performance black-box attack. Because the decision boundary of the black-box model is unknown, the white-box attack method based on the first source model is based on both sides of the decision boundary, and the attack method of the regularized loss function is used in the second source. Each layer of the model generates a set of adversarial samples to attack the real black-box model, and uses the adversarial samples generated by each layer of the second source model to attack the third source model, and selects an optimal layer output that is successfully attacked as a new type of The adversarial sample is recorded, and based on the above principle, the operation of step S2 is performed.

S2、将第一对抗样本序列输入第二源模型的每一层中，第一对抗样本序列在第二源模型的每一层受到正则化损失函数的攻击，正则化损失函数对第一对抗样本序列进行攻击包括如下两方面：S2. Input the first adversarial sample sequence into each layer of the second source model. The first adversarial sample sequence is attacked by the regularization loss function in each layer of the second source model, and the regularization loss function is effective against the first adversarial sample. Sequence attacks include the following two aspects:

一方面为找出生成的第二对抗样本序列中的最优扰动方向，其公式为：On the one hand, to find the optimal disturbance direction in the generated second adversarial sample sequence, the formula is:

其中，L1的结果为第二对抗样本序列的扰动方向，f_t(x)为第一对抗样本序列经过第二源模型第t层的输出结果，[f_t(x')-f_t(x)]为第一对抗样本序列的扰动，该扰动为一个矢量，其基础扰动方向并不是迁移性最强的扰动方向，[f_t(x”)-f_t(x)]表征的扰动方向以基础扰动方向做指引，目的在于逼近迁移性最强的扰动方向；Among them, the result of L1 is the perturbation direction of the second adversarial sample sequence, f _t (x) is the output result of the first adversarial sample sequence passing through the t-th layer of the second source model, [f _t (x')-f _t (x )] is the perturbation of the first adversarial sample sequence, which is a vector whose basic perturbation direction is not the most mobile perturbation direction. The perturbation direction represented by [f _t (x”)-f _t (x)] is represented by The basic disturbance direction is used as a guide, and the purpose is to approach the disturbance direction with the strongest mobility;

另一方面，在找到迁移性最强的扰动方向后，需过滤对抗扰动的高频成分，其公式为On the other hand, after finding the perturbation direction with the strongest mobility, it is necessary to filter the high-frequency components that resist perturbation, and the formula is

L2＝F[f_t(x”)-f_t(x)]L2=F[f _t (x”)-f _t (x)]

其中，L2的结果为过滤对抗扰动的高频成分，F()为正则化函数，相当于一个平滑滤波器，过滤最优对抗扰动的高频成分，从而增强对抗样本的迁移性，基于L1和L2，提出在第二源模型的每一层都加入一个正则化损失函数L，其公式为：Among them, the result of L2 is to filter the high-frequency components of the adversarial disturbance, and F() is a regularization function, which is equivalent to a smoothing filter to filter the high-frequency components of the optimal adversarial disturbance, thereby enhancing the migration of the adversarial samples. Based on L1 and L2, it is proposed to add a regularization loss function L to each layer of the second source model, and its formula is:

L＝-L1-L2L=-L1-L2

其中，在第二源模型的每一层均利用正则化损失函数L对第一对抗样本序列进行攻击，进而在第二源模型每一层都产生与第一对抗样本序列对应的输出，生成一组对抗样本，在该生成的对抗样本中选取最优层的对抗样本作为第二对抗样本序列。Among them, the regularization loss function L is used to attack the first adversarial sample sequence at each layer of the second source model, and then each layer of the second source model generates an output corresponding to the first adversarial sample sequence, generating a A set of adversarial samples is selected, and the adversarial sample of the optimal layer is selected as the second adversarial sample sequence from the generated adversarial samples.

利用第三源模型对其进行分类与预测，将分类错误的对抗样本记录下来作为攻击成功的样本，其他样本则表示攻击失败，利用以上攻击成功的样本即第三识别样本序列对第三源模型进行对抗训练，使得第三源模型对该此类对抗样本可以正确判别，增强防御系统的鲁棒性，因为在通过修改输入数据的防御方法中，输入对抗样本的质量又起到至关重要的作用，通过本实施例生成的所需对抗样本序列，能够使得被攻击模型在对抗训练中的决策边界更加鲁棒，从而使得防御效果更加稳健，基于以上原理，进行步骤S4的操作。The third source model is used to classify and predict it, and the misclassified adversarial samples are recorded as the successful attack samples, and the other samples indicate that the attack failed. Adversarial training is performed so that the third source model can correctly discriminate against such adversarial samples and enhance the robustness of the defense system, because in the defense method by modifying the input data, the quality of the input adversarial samples plays a crucial role. Therefore, the required confrontation sample sequence generated in this embodiment can make the decision boundary of the attacked model in the confrontation training more robust, thereby making the defense effect more robust. Based on the above principles, the operation of step S4 is performed.

S4、将第三识别样本序列输入第三源模型进行对抗训练，更新第三源模型；S4, input the third identification sample sequence into the third source model for adversarial training, and update the third source model;

对抗训练是一种有效的防御方法，在对第三源模型训练过程中，训练样本不再只是原始样本，而是原始样本加上对抗样本，就相当于把产生的第三识别样本序列当做新的训练样本加入到训练集中，随着第三源模型受到越来越多的训练，一方面原始图片的准确率会增加，另一方面，第三源模型对抗样本的鲁棒性也会增加，所以对抗训练是指在第三源模型的训练过程中构建对抗样本并将对抗样本和原始样本混合在一起训练模型的方法，也就是在第三源模型训练过程中对第三源模型进行对抗攻击从而提升第三源模型对于对抗攻击的鲁棒性，即防御力，通过以上方式训练更新后的第三源模型，即为最终所需要的神经网络模型。Adversarial training is an effective defense method. In the process of training the third source model, the training sample is no longer just the original sample, but the original sample plus the adversarial sample, which is equivalent to treating the generated third recognition sample sequence as a new one. The training samples are added to the training set. As the third source model is trained more and more, on the one hand, the accuracy of the original image will increase, and on the other hand, the robustness of the third source model against the samples will also increase. Therefore, adversarial training refers to the method of constructing adversarial samples and mixing the adversarial samples and original samples to train the model during the training process of the third source model, that is, conducting adversarial attacks on the third source model during the training process of the third source model. In this way, the robustness of the third source model against adversarial attacks, that is, the defensive power, is improved. The updated third source model trained in the above manner is the final required neural network model.

如上即为本发明的实施例。上述实施例以及实施例中的具体参数仅是为了清楚表述发明人的发明验证过程，并非用以限制本发明的专利保护范围，本发明的专利保护范围仍然以其权利要求书为准，凡是运用本发明的说明书及附图内容所作的等同结构变化，同理均应包含在本发明的保护范围内。The above is an embodiment of the present invention. The above examples and the specific parameters in the examples are only to clearly describe the inventor's invention verification process, not to limit the scope of the patent protection of the present invention. The scope of the patent protection of the present invention is still based on the claims. Equivalent structural changes made in the contents of the description and drawings of the present invention shall be included within the protection scope of the present invention.

Claims

1. Black box attack type defense system based on neural network intermediate layer regularization, its characterized in that: comprises that

A first source model for outputting a first sequence of anti-sample;

a second source model for outputting a second antagonizing sample sequence;

the third source model is used for outputting a third recognition sample sequence, inputting the third recognition sample sequence into the third source model for countermeasure training, and updating the third source model;

inputting the picture into the first source model by adopting a ResNet network based on a residual error module, attacking the first source model by utilizing white box attack, and generating a first antagonistic sample sequence;

a ResNet network based on a residual error module is adopted in a second source model, the second source model is divided into different neural network structure layers, and each layer of the second source model is added with a regularization loss function; inputting the first antagonizing sample sequence into a second source model, attacking the first antagonizing sample sequence by utilizing a regularization loss function at each layer of the second source model, and outputting a second antagonizing sample sequence;

the third source model adopts a DenseNet, a second antagonistic sample sequence is input into the third source model, the third source model is attacked by using black box attack, all antagonistic samples in the sequence successively attack the third source model aiming at the second antagonistic sample sequence of each layer, wherein the attack success is represented as long as the prediction result of the third source model is not an original data label under the non-target attack mode; in the target attack mode, the attack success is represented only by the fact that the third source model prediction result is the appointed prediction result, so that the countermeasure sample of the optimal layer is selected according to the attack success rate, finally, the number of the countermeasure samples which are successfully attacked is counted, and the countermeasure samples which are successfully attacked, namely, the third identification sample sequence, are recorded; and adding a third recognition sample sequence in the confrontation training of the third source model, training the third source model by using the confrontation sample and the original sample together, and updating the original third source model after multiple iterations of confrontation training.

2. The black box attack type defense method based on the regularization of the neural network intermediate layer adopts the black box attack type defense system based on the regularization of the neural network intermediate layer, and is characterized in that: comprises that

S1, inputting the picture into a first source model to carry out white box attack, and outputting a first anti-sample sequence;

s2, inputting the first anti-aliasing sample sequence into a second source model, attacking the first anti-aliasing sample sequence by utilizing a regularization loss function at each layer of the second source model, and outputting a second anti-aliasing sample sequence;

s3, inputting the second anti-challenge sample sequence into a third source model for black box attack, and outputting a third identification sample sequence;

and S4, inputting the third recognition sample sequence into a third source model for countertraining, and updating the third source model.

3. The black-box attack type defense method based on neural network interlayer regularization as claimed in claim 2, wherein: in S2, the attack of the regularization loss function on the first antagonizing sample sequence includes the following two aspects:

on one hand, finding out the optimal disturbance direction in the generated second antagonizing sample sequence;

in another aspect, filtering the high frequency components of the countermeasure disturbance, generating an output corresponding to the first countermeasure sample sequence at each layer of the second source model, generating a set of countermeasure samples, and selecting the countermeasure sample of the best layer among the generated countermeasure samples as the second countermeasure sample sequence.

4. The black-box attack type defense method based on neural network interlayer regularization as claimed in claim 3, wherein: the formula for finding the optimal disturbance direction of the generated second antagonizing sample sequence is

L1＝[f_t(x')-f_t(x)]*[f_t(x”)-f_t(x)]

Where the result of L1 is the perturbation direction of the second antagonizing sample sequence, f_t(x) For the output result of the first antagonizing sample sequence passing through the t-th layer of the second source model, [ f_t(x')-f_t(x)]For the first pair against perturbation of the sample sequence, [ f_t(x”)-f_t(x)]The characterized disturbance direction is guided by a basic disturbance direction;

the formula for filtering the high frequency components against disturbance is

L2＝F[f_t(x”)-f_t(x)]

Wherein the result of L2 is to filter the high frequency components against the perturbation, and F () is the regularization function;

the regularization loss function L is formulated as

L＝-L1-L2。