CN114048474A - Group intelligence-based image recognition backdoor defense method, device and medium - Google Patents

Group intelligence-based image recognition backdoor defense method, device and medium

Info

Publication number
CN114048474A
CN114048474A
Authority
CN
China
Prior art keywords
neural network
seed
network model
attention
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111303400.6A
Other languages
Chinese (zh)
Inventor
郭克华
胡斌
任盛
奎晓燕
赵颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202111303400.6A
Publication of CN114048474A
Pending legal-status Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Virology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition backdoor defense method, device and medium based on group intelligence. Based on the association relations among a plurality of neural network image recognition models in a distributed cluster, the neural network models ranked as most densely connected are selected as seed neural network models. For every two seed neural network models, a distillation operation is performed on the pair using attention distillation characterization, refining an attention activation map A_l on which backdoor attacks against the seed neural network models are ineffective. A robust distillation loss function L is designed to measure the Euclidean distance and cosine distance between the activation attention maps A_l of the two seed neural network models; each seed neural network model calculates gradient values based on L, back-propagates them, and updates its parameters. The performance and efficiency of the proposed group intelligence image recognition backdoor defense algorithm on the image recognition backdoor defense task reach the current state of the art.

Description

Group intelligence-based image recognition backdoor defense method, device and medium
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a group intelligence-based image recognition backdoor defense method, device, product and storage medium.
Background
Neural network image recognition is developing rapidly in applications such as automatic driving and medical diagnosis. However, because most neural network image recognition is optimized in a performance-driven manner, little attention is paid to whether the neural network has been attacked or tampered with by a malicious party, and concern about the security of neural networks in image recognition applications is growing. Owing to the opacity of neural networks, adversarial attacks and backdoor attacks have become the main threats to neural network security. An adversarial attack misleads the model into wrong predictions by crafting adversarial examples; however, producing powerful adversarial training samples is computationally expensive, because it requires iterative gradient steps with respect to the loss function [1]. By comparison, implanting a backdoor attack in an image recognition task is cheaper: a backdoor can be implanted into an image recognition model simply through data poisoning. Moreover, a backdoor attack has little effect on the accuracy of the image recognition model and is difficult to perceive when triggered. This attack mode is therefore favored by malicious attackers in practical applications.
Although many methods exist for detecting neural network image recognition backdoor attacks, they detect backdoor attacks only in a single neural network. Today, as computational demands grow sharply, the servers that train neural networks obtain their computing resources from distributed clusters; detecting backdoor attacks in only a single neural network not only yields low erasing efficiency, but also allows backdoor triggers to be cross-implanted within the distributed cluster, reducing the erasing capability. Second, in the real world there is no god's-eye view that reveals whether an image recognition model contains a backdoor attack before the algorithm is deployed. Some previous work [2] first trains a model on an image data subset without implanted backdoor triggers, and then fine-tunes or distills the model containing the backdoor attack based on that trigger-free image recognition model, improving robustness against backdoor attacks; this robustness can even transfer across tasks, but a model free of backdoor attacks is required, which is a major limitation.
In summary, the prior art has the following drawbacks:
(1) Existing image recognition backdoor defense methods do not study the efficiency of backdoor attack defense, so they can hardly mount an effective defense in the distributed cluster environments found in real deployments.
(2) Traditional image recognition neural networks erase backdoor triggers by training an image recognition model without implanted backdoor triggers on trigger-free image data, which greatly limits the erasing efficiency.
Disclosure of Invention
The technical problem to be solved by the invention is, in view of the defects of the prior art, to provide a group intelligence-based image recognition backdoor defense method, device, product and storage medium that significantly improve the defense efficiency of a cluster of neural network models against backdoor attacks.
In order to solve the above technical problem, the technical scheme adopted by the invention is as follows: a group intelligence-based image recognition backdoor defense method, comprising the following steps:
s1, selecting the neural network model with dense ranking connection as a seed neural network model based on the incidence relation of a plurality of neural network image recognition models in the distributed cluster (for example, one server is connected with 5 servers in the whole distributed cluster, the number is 5, all the servers in the distributed cluster are sorted according to the connection numerical value, and the first 10% of the servers are selected as the seed neural network model); then, for every two seed neural network models, the two are characterized by attention distillationThe seed neural network model performs a distillation operation to refine the attention activation map that is not valid for the seed neural network model backdoor attacksA l
S2, by using channel summation function
Figure 673824DEST_PATH_IMAGE001
Summing the attention activation maps of multiple channels
Figure 528648DEST_PATH_IMAGE002
S3, multi-channel summation using active attention map
Figure 229757DEST_PATH_IMAGE002
Measuring Euclidean distance and cosine distance of the activation attention diagrams of the two seed neural network models, and designing a robust distillation loss functionLBased on said robust distillation loss functionLAnd calculating the gradient value, then carrying out backward propagation on the seed neural network model by using the gradient value, training the seed neural network model, and normally executing an image recognition task, wherein the obtained seed neural network model is not controlled by backdoor attack any more.
By selecting the most densely connected neural network models as seed neural network models according to the association relations among the neural network image recognition models in the distributed cluster, the efficiency with which the distributed cluster erases backdoor attacks is significantly improved, and the neural network models in the cluster are prevented from being cross-implanted with backdoor triggers. The summed attention activation map g^2_sum(A_l) of one seed neural network model is used to erase the backdoor trigger of the other seed neural network model. In addition, a robust distillation loss function L is proposed that measures the Euclidean distance and cosine distance between the attention activation maps of two seed neural network models; it erases the backdoor attack while inheriting the image recognition capability of the original neural network, significantly improving the defense efficiency of the neural network model cluster against backdoor attacks.
In step S1, the specific implementation of selecting the most densely connected neural network models as seed neural network models comprises:
sorting the plurality of neural network models by their connection values using the PageRank algorithm, selecting the top 10% as seed neural network models, and adding them to the seed neural network model set;
dividing the seed neural network models in the seed neural network model set into a plurality of groups, and selecting at most one neural network model in each group as a seed neural network model.
Erasing backdoor attacks across the plurality of neural network image recognition models in the distributed cluster based on the PageRank algorithm significantly improves the cluster's erasing efficiency compared with random erasure, and selecting seed neural network models prevents the models in the cluster from being cross-implanted with backdoor triggers.
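To make step S1 concrete, the following is a minimal sketch of the selection, assuming the distributed cluster is given as an undirected connection graph. The function name select_seed_models, the networkx dependency, and the default 10% ratio are illustrative assumptions, not part of the patent text.

```python
import networkx as nx

def select_seed_models(cluster: nx.Graph, ratio: float = 0.10) -> list:
    """Rank servers by PageRank over the connection graph and keep the top `ratio`."""
    scores = nx.pagerank(cluster)                        # connection-density ranking
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * ratio))                 # top 10% become seed candidates
    return ranked[:k]

# Example: a server connected to 5 others ranks above one connected to 2.
g = nx.Graph([("s1", "s2"), ("s1", "s3"), ("s1", "s4"),
              ("s1", "s5"), ("s1", "s6"), ("s2", "s3")])
print(select_seed_models(g))                             # e.g. ['s1']
```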
The seed neural network model v is selected according to the following recurrence:

$$S[k][b] = \max\Big\{\, S[k-1][b],\ \max_{v \in group(k)} \big( S[k-1][b - Time(v)] + R(v) \big) \Big\}$$

where b is a hyper-parameter; S[k][b] denotes the erase range of the seed neural network model set searched in the first k groups when the erase time budget is b; Time(v) denotes the time it takes for the backdoor trigger of seed neural network model v to be erased by another neural network model; R(v) is the final erase range of seed neural network model v; and group(k) denotes the k-th group of seed neural network models.
Through S[k][b], the erase range of each seed neural network model set is bounded dynamically; compared with a fixed, pre-specified erase range, the most efficient erase schedule can be planned dynamically.
In step S2, the attention activation map A_l is summed over its channels as:

$$g^2_{sum}(A_l) = \sum_{i=1}^{C} \left| A_l^{i} \right|^{2}$$

where A_l^i denotes the i-th channel of the attention activation map at the l-th layer; |·| denotes the absolute value function; C is the number of channels of the attention activation map; and Σ denotes summation.
The attention activation map shows which regions of an image the trained neural network image recognition model focuses on, and embodies the learning result of the seed neural network model. The summed attention activation map g^2_sum(A_l) of one seed neural network model is used to erase the backdoor trigger of the other seed neural network model.
In step S3, the robust distillation loss function L is expressed as:

$$L = \mathbb{E}_{(x,y)\sim D}\Big[ CE\big(A(x), y\big) + \alpha \sum_{l=1}^{J} L_{kd\_e}\big(A_l^{A}, A_l^{B}\big) + \beta \sum_{l=1}^{J} L_{kd\_c}\big(A_l^{A}, A_l^{B}\big) \Big]$$

where (x,y)~D denotes an image x and its corresponding label y drawn from the image data subset D into which no backdoor trigger has ever been implanted; E[·] denotes the expectation; L_kd_e(·) is the Euclidean distance between the activation attention maps of the two seed neural network models:

$$L_{kd\_e}\big(A_l^{A}, A_l^{B}\big) = \big\| g^2_{sum}(A_l^{A}) - g^2_{sum}(A_l^{B}) \big\|_2$$

L_kd_c(·) is the cosine distance between the activation attention maps of the two seed neural network models:

$$L_{kd\_c}\big(A_l^{A}, A_l^{B}\big) = 1 - \frac{\big\langle g^2_{sum}(A_l^{A}),\, g^2_{sum}(A_l^{B}) \big\rangle}{\big\| g^2_{sum}(A_l^{A}) \big\|_2 \, \big\| g^2_{sum}(A_l^{B}) \big\|_2}$$

‖·‖₂ is the L2 norm; A_l^A and A_l^B denote the l-th layer activation attention maps of the two seed neural network models; CE(A(x), y) measures the classification error of seed neural network model A; α and β are hyper-parameters controlling the attention distillation strength; and J is the number of neural network layers from which activation attention maps are taken.
The proposed robust distillation loss function erases the backdoor attack, while inheriting the image recognition capability of the original neural network, by measuring the Euclidean distance and cosine distance between the attention activation maps of the two seed neural network models.
As an inventive concept, the present invention also provides a computer apparatus comprising a memory, a processor, and a computer program stored in the memory; the processor executes the computer program to implement the steps of the above method of the invention.
As an inventive concept, the present invention also provides a computer-readable storage medium having a computer program/instructions stored thereon; the computer program/instructions, when executed by a processor, implement the steps of the above method of the invention.
As an inventive concept, the present invention also provides a computer program product comprising a computer program/instructions; the computer program/instructions, when executed by a processor, implement the steps of the above method of the invention.
Compared with the prior art, the invention has the following beneficial effects. The invention is the first to propose a group intelligence algorithm against backdoor attacks based on a plurality of neural network image recognition models in a distributed cluster, significantly improving the defense efficiency of the cluster of neural network image recognition models against backdoor attacks. The proposed group intelligence image recognition backdoor defense algorithm, by selecting the most densely connected neural network models as seed neural network models according to the association relations among the neural network models in the distributed cluster, brings the efficiency of erasing backdoor triggers among the models of the distributed cluster to the current state of the art. In addition, for every two seed neural network models, a distillation operation is performed on the pair using attention distillation characterization, and a robust distillation loss function L is designed. Compared with the prior art, the resulting seed network models escape the control of the backdoor attack and can perform the image recognition task normally.
Drawings
FIG. 1 is a connection diagram between neural network image recognition models of distributed clusters according to an embodiment of the present invention;
FIG. 2 is a comparison of the efficiency of group intelligence-based and random erasure in accordance with an embodiment of the present invention;
FIG. 3 is a diagram illustrating a back-door erase network architecture based on a neural distillation strategy according to an embodiment of the present invention.
Detailed Description
The invention sets the defense scenario as a multi-model distributed cluster environment. Incremental training of multiple models allows several image recognition models in one cluster to be infected by poisoned data crafted by a malicious attacker. The object of the invention is, when the multi-neural-network models of a distributed cluster have each been implanted with different types of backdoor triggers, to erase the backdoor triggers of the models in the cluster while maintaining their performance on image data without implanted backdoor triggers. The connection diagram between the neural network image recognition models of the distributed cluster according to the invention is shown in FIG. 1.
The invention provides a group intelligence algorithm and a neural distillation strategy to improve the precision and efficiency with which the plurality of neural network models in a distributed cluster erase backdoor attacks.
The invention improves the erasing efficiency of the distributed cluster against backdoor attacks by erasing the backdoor attacks across the plurality of neural network models in the cluster. Compared with randomly erasing the backdoor attacks of the neural network models, the proposed group intelligence algorithm for selecting seed neural network models significantly improves the efficiency of erasing backdoor triggers. The efficiency of the group intelligence-based and random erasure of the invention is compared in FIG. 2.
The group intelligence algorithm comprises the following steps:
First, the PageRank [4] algorithm sorts the plurality of neural network models in the distributed cluster by their connection values, and the top 10% are selected as seed neural network models. The set of seed neural network models is then divided into groups. Because the distance between neural network models in the same group does not exceed 2, the interaction between models within a group is strong. In addition, only a model's influence on its close-range neighbors is considered, which reduces the computational workload.
Then, at most one model in each group is selected as a seed neural network model. Because the models within a group interact strongly and a seed neural network model can propagate many times, the probability that the other neural network models in the group are erased is high. However, the neural network models chosen as seeds must satisfy two conditions: 1) the total erase time of all seed neural network models cannot exceed the time budget B for erasing the cluster's backdoor attacks; 2) the seed neural network model set must have the widest reachable erase range. The strategy for selecting seed neural network models can be described by the following expression:
$$S[k][b] = \max\Big\{\, S[k-1][b],\ \max_{v \in group(k)} \big( S[k-1][b - Time(v)] + R(v) \big) \Big\} \qquad (1)$$

where b is a hyper-parameter; S[k][b] denotes the erase range of the seed neural network model set searched in the first k groups when the erase time budget is b; Time(v) denotes the time it takes for the backdoor trigger of seed neural network model v to be erased by another neural network model; R(v) is the final erase range of seed neural network model v; and group(k) denotes the k-th group of seed neural network models.
Briefly, the recurrence decides whether, under backdoor erase time budget b, a seed neural network model should be taken from the k-th group. If, under erase time budget b, the seed neural network model set found in the first k-1 groups covers a larger erase range than any set that takes a seed from the k-th group, the function reduces to S[k][b] = S[k-1][b]. Otherwise a seed neural network model v is selected in the k-th group and the remaining neural network models are searched in the first k-1 groups; the estimated backdoor erase time budget is reduced by the cost Time(v) of v, and the erase range of the whole seed neural network model set is increased by the propagation range R(v) of node v.
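As a concrete reading of expression (1), the sketch below solves the grouped seed selection as a budgeted dynamic program (a grouped knapsack), assuming integer erase times. The names plan_seeds, time_cost and erase_range are hypothetical stand-ins for the measured quantities Time(v) and R(v).

```python
def plan_seeds(groups, time_cost, erase_range, budget):
    """Grouped knapsack: choose at most one seed per group so that total erase
    time stays within `budget` and the covered erase range is maximised."""
    S = [[0] * (budget + 1) for _ in range(len(groups) + 1)]
    for k in range(1, len(groups) + 1):
        for b in range(budget + 1):
            S[k][b] = S[k - 1][b]                    # take no seed from group k
            for v in groups[k - 1]:
                if time_cost[v] <= b:                # seed v fits the remaining budget
                    S[k][b] = max(S[k][b],
                                  S[k - 1][b - time_cost[v]] + erase_range[v])
    return S[len(groups)][budget]                    # widest reachable erase range
```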
After the seed neural network models are selected by the group intelligence algorithm, a neural distillation strategy is executed on every pair of seed neural networks to erase the backdoor triggers. The backdoor erasing network architecture based on the neural distillation strategy disclosed by the invention is shown in FIG. 3.
For simplicity of description, the two seed neural network models are called model A and model B respectively. First, an attention activation map robust to backdoor attacks is refined for each layer of the neural network model. Specifically, given model A, let A_l ∈ R^{C×H×W} denote the activation output of its l-th layer, where R denotes the real space and C, H and W are respectively the channels, height and width of the attention activation map. The absolute value of each element of the map expresses the importance of that element to the final output, so the mapping function can be constructed from statistics of these values across the channel dimension. The invention applies the following mapping function:

$$g^2_{sum}(A_l) = \sum_{i=1}^{C} \left| A_l^{i} \right|^{2} \qquad (2)$$

where A_l^i denotes the i-th channel of the attention activation map at the l-th layer; |·| denotes the absolute value function; C is the number of channels of the attention activation map; Σ denotes summation; and g^2_sum(·) is the channel summation function. The invention adopts the channel summation function g^2_sum(·) to activate and combine all channels in each layer, generating an activation attention map that characterizes the current layer. Because the activation attention map represented by g^2_sum(·) carries rich feature information, and in order for the neural network image recognition model to inherit this feature information to the greatest extent during distillation, the invention proposes a robust distillation loss function based on the attention activation map, which erases the backdoor attack while inheriting the recognition capability of the original neural network.
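A minimal sketch of the channel summation function g^2_sum of equation (2), assuming PyTorch activations with a leading batch dimension; the helper name attention_map is an assumption for illustration.

```python
import torch

def attention_map(activation: torch.Tensor) -> torch.Tensor:
    """g^2_sum of equation (2): collapse an (N, C, H, W) activation into
    (N, H, W) attention maps by summing |.|^2 over the channel dimension."""
    return activation.abs().pow(2).sum(dim=1)

# Example: 8 activations with 64 channels yield 8 spatial attention maps.
maps = attention_map(torch.randn(8, 64, 32, 32))     # shape: (8, 32, 32)
```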
Therefore, after the activation attention map A_l ∈ R^{C×H×W} is obtained for each layer, model A and model B are fed the same input and the distillation loss is computed. Specifically:

$$L_{kd\_e}\big(A_l^{A}, A_l^{B}\big) = \big\| g^2_{sum}(A_l^{A}) - g^2_{sum}(A_l^{B}) \big\|_2 \qquad (3)$$

$$L_{kd\_c}\big(A_l^{A}, A_l^{B}\big) = 1 - \frac{\big\langle g^2_{sum}(A_l^{A}),\, g^2_{sum}(A_l^{B}) \big\rangle}{\big\| g^2_{sum}(A_l^{A}) \big\|_2 \, \big\| g^2_{sum}(A_l^{B}) \big\|_2} \qquad (4)$$

where the distillation loss is expressed as the Euclidean distance L_kd_e(·) and the cosine distance L_kd_c(·) between the activation attention maps of model A and model B; ‖·‖₂ is the L2 norm; and A_l^A and A_l^B denote the l-th layer activation attention maps of the two seed neural network models. It is worth noting that the attention activation map shows which regions of an image the trained neural network image recognition model focuses on, and embodies the learning result of the seed neural network model; the summed attention activation map g^2_sum(A_l) of one seed neural network model is used to erase the backdoor trigger of the other.

$$L = \mathbb{E}_{(x,y)\sim D}\Big[ CE\big(A(x), y\big) + \alpha \sum_{l=1}^{J} L_{kd\_e}\big(A_l^{A}, A_l^{B}\big) + \beta \sum_{l=1}^{J} L_{kd\_c}\big(A_l^{A}, A_l^{B}\big) \Big] \qquad (5)$$

Thus, the overall training loss of the invention is the sum of the cross-entropy (CE) loss and the distillation loss, where (x,y)~D denotes an image x and its corresponding label y drawn from the image data subset D into which no backdoor trigger has ever been implanted; E[·] denotes the expectation; CE(A(x), y) measures the classification error of seed neural network model A; α and β are hyper-parameters controlling the attention distillation strength; and J is the number of neural network layers from which activation attention maps are taken. The loss above uses model A to distill model B, so as to erase the backdoor trigger of model B; to use model B to distill model A, models A and B simply exchange positions.
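The sketch below assembles losses (3)-(5) for one batch, assuming PyTorch and per-layer attention maps already produced by g^2_sum; the function name robust_distill_loss and the default values of α and β are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def robust_distill_loss(logits_a, labels, maps_a, maps_b, alpha=0.1, beta=0.1):
    """Loss (5): cross-entropy on model A plus, per layer l = 1..J, the
    Euclidean distance (3) and cosine distance (4) between the two models'
    attention maps of shape (N, H, W)."""
    loss = F.cross_entropy(logits_a, labels)             # CE(A(x), y)
    for qa, qb in zip(maps_a, maps_b):                   # one pair per layer
        fa, fb = qa.flatten(1), qb.flatten(1)            # (N, H*W) vectors
        l_e = (fa - fb).norm(p=2, dim=1).mean()          # Euclidean term (3)
        l_c = (1.0 - F.cosine_similarity(fa, fb, dim=1)).mean()  # cosine term (4)
        loss = loss + alpha * l_e + beta * l_c
    return loss
```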
The invention first uses the group intelligence algorithm to optimize seed selection and thereby improve the efficiency with which the distributed cluster erases backdoor attacks; then attention distillation characterization [3] is used to refine the attention activation maps that are robust to image recognition backdoor attacks; finally, a robust distillation loss function is designed to erase the backdoor attacks. The specific process is as follows:
The first step: based on the association relations among the plurality of neural network image recognition models in the distributed cluster, the PageRank [4] algorithm proposed by Google is used to select the most densely connected neural network models as seed neural network models. For example: if one server is connected to 5 servers in the whole distributed cluster, its connection value is 5; all servers in the distributed cluster are sorted by connection value, and the top 10% are selected as seed neural network models, thereby improving the efficiency with which the multi-neural-network models in the distributed cluster erase backdoor attacks.
The second step: for every two seed neural network models, a distillation operation is performed on the pair using attention distillation characterization, refining an attention activation map A_l on which backdoor attacks against the seed neural network models are ineffective, and the attention activation maps of the channels are summed with the channel summation function to obtain g^2_sum(A_l), which is used to erase the backdoor triggers of the two seed neural network models.
The third step: using the multi-channel sum g^2_sum(A_l) of the activation attention maps, the Euclidean distance and cosine distance between the activation attention maps of the two seed neural network models are measured and a robust distillation loss function is designed; gradient values are calculated based on the robust distillation loss function and then back-propagated [5] through the seed neural network models to train them. The resulting seed neural network models are no longer controlled by the backdoor attack and can perform the image recognition task normally.
The performance of the group intelligence-based image recognition backdoor defense method was evaluated against 6 backdoor attacks using two metrics, attack success rate (ASR) and classification accuracy (ACC), and compared with 2 other existing backdoor defense methods, as shown in Table 1.
TABLE 1. Comparison of the defense performance of the method of the invention with current state-of-the-art backdoor defense methods on the CIFAR-10 dataset
(Table 1 is reproduced as an image in the original publication and is not recoverable here.)
Experiments show that the proposed group intelligence-based image recognition backdoor defense method reduces the average attack success rate (ASR) from nearly 100% to 12.83%. By contrast, Fine-pruning [12] and MCR [13] can only reduce the average ASR to 37.36% and 25.59% respectively. Under the six attacks, this fully demonstrates that the proposed group intelligence-based image recognition backdoor defense method has practical value.
Reference documents:
[1] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
[2] Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. Neural attention distillation: Erasing backdoor triggers from deep neural networks. arXiv preprint arXiv:2101.05930, 2021.
[3] Yuenan Hou, Zheng Ma, Chunxiao Liu, and Chen Change Loy. Learning lightweight lane detection CNNs by self attention distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1013–1021, 2019.
[4] Wenpu Xing and Ali Ghorbani. Weighted PageRank algorithm. In Proceedings of the Second Annual Conference on Communication Networks and Services Research, pages 305–314. IEEE, 2004.
[5] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533–536.
[6] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.
[7] Yingqi Liu. Trojaning attack on neural networks. In NDSS, 2018.
[8] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
[9] Alexander Turner, Dimitris Tsipras, and Aleksander Madry. Clean-label backdoor attacks. https://people.csail.mit.edu/madry/lab/, 2018.
[10] Mauro Barni, Kassem Kallas, and Benedetta Tondi. A new backdoor attack in CNNs by training set corruption without label poisoning. In 2019 IEEE International Conference on Image Processing (ICIP), pages 101–105. IEEE, 2019.
[11] Yunfei Liu, Xingjun Ma, James Bailey, and Feng Lu. Reflection backdoor: A natural backdoor attack on deep neural networks. In European Conference on Computer Vision, pages 182–199. Springer, 2020.
[12] Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks, Intrusions, and Defenses, pages 273–294. Springer, 2018.
[13] Pu Zhao, Pin-Yu Chen, Payel Das, Karthikeyan Natesan Ramamurthy, and Xue Lin. Bridging mode connectivity in loss landscapes and adversarial robustness. In International Conference on Learning Representations, 2019.

Claims (9)

1. An image recognition backdoor defense method based on group intelligence, characterized by comprising the following steps:
S1, selecting densely connected neural network models as seed neural network models based on the association relations among a plurality of neural network image recognition models in a distributed cluster; for every two seed neural network models, performing a distillation operation on the two seed neural network models using attention distillation characterization, and refining an attention activation map A_l on which backdoor attacks against the seed neural network models are ineffective;
S2, performing a summation operation on the attention activation maps A_l of the multiple channels using the channel summation function g^2_sum(·) to obtain the summation result g^2_sum(A_l);
S3, using g^2_sum(A_l) to measure the Euclidean distance and cosine distance between the activation attention maps of the two seed neural network models, designing a robust distillation loss function L, calculating gradient values based on the robust distillation loss function L, back-propagating the gradient values through the seed neural network models, and training the seed neural network models.
2. The group intelligence-based image recognition backdoor defense method of claim 1, characterized in that the specific implementation of selecting densely connected neural network models as seed neural network models in step S1 comprises:
ranking the plurality of neural network models based on a PageRank algorithm, selecting the neural network models with dense connection according to a proportion, and adding the neural network models into a seed neural network model set;
and dividing the seed neural network models in the seed neural network model set into a plurality of groups, and selecting at most one neural network model in each group as the seed neural network model.
3. The group intelligence-based image recognition backdoor defense method of claim 1, characterized in that the seed neural network model v is described by the following expression:

$$S[k][b] = \max\Big\{\, S[k-1][b],\ \max_{v \in group(k)} \big( S[k-1][b - Time(v)] + R(v) \big) \Big\}$$

wherein b is a hyper-parameter; S[k][b] denotes the erase range of the seed neural network model set searched in the first k groups when the erase time budget is b; Time(v) denotes the time it takes for the backdoor trigger of seed neural network model v to be erased by another neural network model; R(v) is the final erase range of seed neural network model v; and group(k) denotes the k-th group of seed neural network models.
4. The group intelligence-based image recognition backdoor defense method of claim 1, characterized in that in step S2 the attention activation map A_l is summed as:

$$g^2_{sum}(A_l) = \sum_{i=1}^{C} \left| A_l^{i} \right|^{2}$$

wherein A_l^i denotes the i-th channel of the attention activation map at the l-th layer; |·| denotes the absolute value function; C is the number of channels of the attention activation map; and Σ denotes summation.
5. The group intelligence-based image recognition backdoor defense method of claim 1, characterized in that in step S3 the robust distillation loss function L is expressed as:

$$L = \mathbb{E}_{(x,y)\sim D}\Big[ CE\big(A(x), y\big) + \alpha \sum_{l=1}^{J} L_{kd\_e}\big(A_l^{A}, A_l^{B}\big) + \beta \sum_{l=1}^{J} L_{kd\_c}\big(A_l^{A}, A_l^{B}\big) \Big]$$

wherein (x,y)~D denotes an image x and its corresponding label y drawn from the image data subset D into which no backdoor trigger has ever been implanted; E[·] denotes the expectation; L_kd_e(·) is the Euclidean distance between the activation attention maps of the two seed neural network models:

$$L_{kd\_e}\big(A_l^{A}, A_l^{B}\big) = \big\| g^2_{sum}(A_l^{A}) - g^2_{sum}(A_l^{B}) \big\|_2$$

L_kd_c(·) is the cosine distance between the activation attention maps of the two seed neural network models:

$$L_{kd\_c}\big(A_l^{A}, A_l^{B}\big) = 1 - \frac{\big\langle g^2_{sum}(A_l^{A}),\, g^2_{sum}(A_l^{B}) \big\rangle}{\big\| g^2_{sum}(A_l^{A}) \big\|_2 \, \big\| g^2_{sum}(A_l^{B}) \big\|_2}$$

‖·‖₂ is the L2 norm; A_l^A and A_l^B denote the l-th layer activation attention maps of the two seed neural network models; CE(A(x), y) measures the classification error of seed neural network model A; α and β are hyper-parameters controlling the attention distillation strength; and J is the number of neural network layers from which activation attention maps are taken.
6. The group intelligence-based image recognition backdoor defense method of any one of claims 1 to 5, characterized in that an image is classified by taking the image as the input of a trained seed neural network model.
7. A computer apparatus comprising a memory, a processor and a computer program stored on the memory; characterized in that the processor executes the computer program to carry out the steps of the method according to one of claims 1 to 6.
8. A computer readable storage medium having stored thereon a computer program/instructions; characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of one of claims 1 to 6.
9. A computer program product comprising a computer program/instructions; characterized in that the computer program/instructions, when executed by a processor, performs the steps of the method according to one of claims 1 to 6.
CN202111303400.6A 2021-11-05 2021-11-05 Group intelligence-based image recognition backdoor defense method, device and medium Pending CN114048474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111303400.6A CN114048474A (en) 2021-11-05 2021-11-05 Group intelligence-based image recognition backdoor defense method, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111303400.6A CN114048474A (en) 2021-11-05 2021-11-05 Group intelligence-based image recognition backdoor defense method, device and medium

Publications (1)

Publication Number Publication Date
CN114048474A (en) 2022-02-15

Family

ID=80207171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111303400.6A Pending CN114048474A (en) 2021-11-05 2021-11-05 Group intelligence-based image recognition backdoor defense method, device and medium

Country Status (1)

Country Link
CN (1) CN114048474A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758286A (en) * 2023-06-25 2023-09-15 中国人民解放军总医院 Medical image segmentation method, system, device, storage medium and product
CN116758286B (en) * 2023-06-25 2024-02-06 中国人民解放军总医院 Medical image segmentation method, system, device, storage medium and product
CN116543268A (en) * 2023-07-04 2023-08-04 西南石油大学 Channel enhancement joint transformation-based countermeasure sample generation method and terminal
CN116543268B (en) * 2023-07-04 2023-09-15 西南石油大学 Channel enhancement joint transformation-based countermeasure sample generation method and terminal

Similar Documents

Publication Publication Date Title
Long et al. A pragmatic approach to membership inferences on machine learning models
Rahman et al. Membership inference attack against differentially private deep learning model.
Aamir et al. DDoS attack detection with feature engineering and machine learning: the framework and performance evaluation
Haddadpajouh et al. A multikernel and metaheuristic feature selection approach for IoT malware threat hunting in the edge layer
Triastcyn et al. Generating artificial data for private deep learning
Dias et al. Using artificial neural network in intrusion detection systems to computer networks
CN114048474A (en) Group intelligence-based image recognition backdoor defense method, device and medium
Hou et al. Mitigating the backdoor attack by federated filters for industrial IoT applications
Liang et al. A large-scale multiple-objective method for black-box attack against object detection
Shi et al. Multi-objective neural architecture search via predictive network performance optimization
Basati et al. DFE: efficient IoT network intrusion detection using deep feature extraction
Kumar et al. Deep residual convolutional neural network: an efficient technique for intrusion detection system
Kundu et al. Subsidiary prototype alignment for universal domain adaptation
He et al. Semi-leak: Membership inference attacks against semi-supervised learning
Payne et al. Towards deep federated defenses against malware in cloud ecosystems
Wei et al. Shared adversarial unlearning: Backdoor mitigation by unlearning shared adversarial examples
CN115580430A (en) Attack tree-pot deployment defense method and device based on deep reinforcement learning
Deshmukh et al. Attacker behaviour profiling using stochastic ensemble of hidden Markov models
Jiang et al. Incremental learning, incremental backdoor threats
Fonseca et al. Model-agnostic approaches to handling noisy labels when training sound event classifiers
Zhan et al. AMGmal: Adaptive mask-guided adversarial attack against malware detection with minimal perturbation
Yan et al. A novel clustering algorithm based on fitness proportionate sharing
Almazini et al. Heuristic Initialization Using Grey Wolf Optimizer Algorithm for Feature Selection in Intrusion Detection
CN116886398A (en) Internet of things intrusion detection method based on feature selection and integrated learning
Wang et al. psoResNet: An improved PSO-based residual network search algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination