CN116109886A - Adversarial attack method and system using class activation maps - Google Patents

Adversarial attack method and system using class activation maps

Info

Publication number
CN116109886A
CN116109886A CN202310094149.XA
Authority
CN
China
Prior art keywords
attack
round
disturbance
challenge
cam
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310094149.XA
Other languages
Chinese (zh)
Inventor
张寒萌
姜雪
刘兴钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202310094149.XA priority Critical patent/CN116109886A/en
Publication of CN116109886A publication Critical patent/CN116109886A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 - Active pattern-learning, e.g. online learning of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of deep learning and provides an adversarial attack method using class activation maps, comprising the following steps. S1: loading a trained deep learning model for generating adversarial perturbations, setting the number of attack rounds, and reading the original image data to be attacked. S2: attacking the original image data for the set number of rounds, wherein the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map of that example. S3: after the set number of attack rounds, generating the final adversarial example. By using the class activation map (Classification Activation Map, CAM) to limit the attack intensity in different regions of the image, the perturbation amount is reduced without affecting the attack success rate, and the method can be combined with any gradient-based attack method.

Description

Adversarial attack method and system using class activation maps
Technical Field
The invention relates to the technical field of deep learning, and in particular to an adversarial attack method and system using class activation maps.
Background
With the rapid development of deep learning, deep neural networks have been widely used in the field of computer vision in recent years. At the same time, deep learning also raises some security concerns. Researchers have found that deep neural networks are vulnerable to adversarial examples: when certain deliberately crafted minor perturbations are added to an input, the resulting adversarial example can cause the model to classify incorrectly. This way of generating adversarial examples is called an adversarial attack.
Adversarial attacks can be classified into white-box attacks and black-box attacks. In a white-box attack, the attacker has access to all information about the target model, such as its network architecture and parameters, whereas in a black-box attack the attacker cannot obtain this information. The attack method provided by the invention targets the white-box scenario. In a white-box attack, two important indicators reflect the performance of an attack method: the attack success rate and the perturbation amount. The attack success rate is the percentage of adversarial examples that are misclassified among all adversarial examples. The perturbation amount is the distance between the original image and the adversarial example under an l_p norm; the common norms are l_2, the square root of the sum of squares of the perturbation elements (for image data, a smaller l_2 norm means the adversarial example is harder for the human eye to distinguish from the original), and l_∞, the maximum absolute value of the perturbation elements. The higher the attack success rate and the lower the perturbation amount, the stronger the attack method.
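Written out explicitly (a standard formulation, not quoted from the patent), with δ = x^adv - x denoting the perturbation added to the original image, the two measures are:

```latex
\|\delta\|_{2} = \Big( \sum_{i} \delta_{i}^{2} \Big)^{1/2},
\qquad
\|\delta\|_{\infty} = \max_{i} \, |\delta_{i}| .
```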
Researchers have proposed a series of white-box attack methods, which fall into two main categories: gradient-based attacks and optimization-based attacks. The former constrain the perturbation amount within a given budget and aim for the highest possible attack success rate; representative methods include FGSM, PGD, MIM and TIM. The latter optimize the perturbation amount to be as small as possible under the constraint that the generated adversarial example is still misclassified.
Compared with optimization-based methods, gradient-based adversarial attack methods are more widely used in practice because of their higher computation speed, but their drawback is that the perturbation amount of the adversarial example is large. Existing gradient-based attack methods, including FGSM, PGD, MIM and TIM, add perturbations to the image globally, i.e., every pixel value of the image may change. However, this approach has the following problem: the image is altered excessively, and the perturbation amount, especially under the l_2 norm, is large, so the concealment is poor and the perturbation is easily perceived by the human eye.
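As an illustration of the global nature of such perturbations, the following is a minimal FGSM-style sketch (not part of the original disclosure), assuming a PyTorch classifier, a differentiable loss, and an arbitrary ε value:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps=8 / 255):
    """Single-step FGSM: every pixel is shifted by eps along the sign of the gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # sign(grad) is nonzero at almost every pixel, so the whole image is altered
    # and the l2 perturbation amount grows with the image size.
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1)
    return x_adv.detach()
```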
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an adversarial attack method and system using class activation maps, in which the attack intensity in different regions of an image is limited by means of the class activation map (Classification Activation Map, CAM), so that the perturbation amount is reduced without affecting the attack success rate. The method can be combined with any gradient-based attack method.
The above object of the present invention is achieved by the following technical solutions:
a method of counterattack using class activation graphs, comprising the steps of:
s1: loading a trained deep learning module for generating disturbance resistance, setting attack times of attack resistance, and reading original image data to be attacked;
s2: carrying out attack on the original image data according to the attack times, wherein the disturbance added to each round of the original image data is obtained by carrying out weighting operation on the initial disturbance and the CAM (CAM) graph calculated according to the countermeasure sample generated in the previous round;
s3: after the number of attacks, a final challenge sample is generated.
Further, step S1 also includes:
selecting any gradient attack method, including FGSM, PGD, MIM and TIM;
and setting the maximum number of iterations according to the type of the selected gradient attack method.
Further, step S2 also includes: the adversarial example generated in each round is obtained by adding the current round's perturbation to the adversarial example generated in the previous round. Specifically:
let the original image data be x, the perturbation added in round t be p_t, and the adversarial example generated in round t be x_t^adv. The adversarial example generated in round t+1 is
x_{t+1}^adv = x_t^adv + p_{t+1},
where x_0^adv = x, i.e., the adversarial example input to the first round is the original image data.
Further, step S2 also includes: computing the initial perturbation of the current round from the adversarial example generated in the previous round. Specifically:
the adversarial example generated in the previous round is input into the deep learning model, and the gradient of the deep learning model with respect to that adversarial example is computed by back-propagation; this gradient gives the initial perturbation of the current round.
Further, when the gradient attack method is PGD, perturbations are added to the image iteratively along the direction of increasing gradient. Specifically:
let the original image be x, its class be y, the model parameters be θ, and the loss function be L(θ, x, y); the gradient of the loss with respect to the original image is ∇_x L(θ, x, y).
Let the coefficient α denote the limit on the perturbation of each round. The initial perturbation is obtained by passing the gradient through the sign function and multiplying by α:
p̃_{t+1} = α · sign(∇_{x_t^adv} L(θ, x_t^adv, y)),
where p̃_{t+1} is the initial perturbation of round t+1 and ∇_{x_t^adv} L(θ, x_t^adv, y) is the gradient with respect to the adversarial example generated in the previous round.
Further, step S2 also includes: computing the CAM map of the current round from the adversarial example generated in the previous round. Specifically:
the CAM map of the previous round's adversarial example is an image of the same size as the input image data;
in the CAM map, pixel values distinguish how much different regions contribute to the specified class: the larger the pixel value, the higher the saliency score and the greater the contribution of the corresponding region to the prediction of the specified class. In round t+1, the CAM map computed from the adversarial example of the previous round is denoted C_{t+1}.
Further, in step S2, the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map. Specifically:
the input is the initial perturbation p̃_{t+1}, the weight is the CAM map C_{t+1}, and the output is the weighted perturbation p_{t+1}. The initial perturbation is directly multiplied element-wise with the CAM map, which amounts to weighting the initial perturbation pixel by pixel;
for the pixel at position (i, j) in the image, the perturbation value at (i, j) in round t+1 equals the initial perturbation at (i, j) multiplied by the score at the corresponding position (i, j) in this round's CAM map, namely
p_{t+1}(i, j) = p̃_{t+1}(i, j) · C_{t+1}(i, j).
The adversarial example finally generated in each round is
x_{t+1}^adv = x_t^adv + p̃_{t+1} ⊙ C_{t+1},
where ⊙ denotes the Hadamard product, i.e., multiplication of elements at corresponding positions.
After attacking for the set number of rounds, x_{max_iter}^adv is obtained as the final adversarial example.
An adversarial attack system for performing the adversarial attack method using class activation maps described above, comprising:
an attack preparation module, configured to load the trained deep learning model for generating adversarial perturbations, set the number of attack rounds, and read the original image data to be attacked;
an adversarial attack module, configured to attack the original image data for the set number of rounds, wherein the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map; and
a final sample generation module, configured to generate the final adversarial example after the set number of attack rounds.
A computer device comprising a memory and one or more processors, the memory having stored therein computer code which, when executed by the one or more processors, causes the one or more processors to perform a method as described above.
A computer readable storage medium storing computer code which, when executed, performs a method as described above.
Compared with the prior art, the invention has at least one of the following beneficial effects:
(1) An adversarial attack method using class activation maps is provided, comprising the following steps. S1: loading a trained deep learning model for generating adversarial perturbations, setting the number of attack rounds, and reading the original image data to be attacked. S2: attacking the original image data for the set number of rounds, wherein the perturbation added in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map. S3: after the set number of attack rounds, generating the final adversarial example. In this technical solution, the class activation map CAM is used to limit the attack intensity in different regions of the image, so that the perturbation amount can be reduced without affecting the attack success rate.
(2) The adversarial attack method using class activation maps provided by the invention can be combined with all gradient-based attack methods, including single-step attack methods and iterative attack methods.
Drawings
FIG. 1 is a schematic view of an original aerial image of the class "river" according to the present invention;
FIG. 2 is a schematic view of the CAM map for the class "river" according to the present invention;
FIG. 3 is a schematic view of an original image of the class "beach" according to the present invention;
FIG. 4 is a schematic view of the CAM map for the class "beach" according to the present invention;
FIG. 5 is a flowchart of the adversarial attack method using class activation maps according to the present invention;
FIG. 6 is a detailed flowchart of the adversarial attack method using class activation maps according to the present invention;
FIG. 7 is an overall block diagram of the adversarial attack system using class activation maps according to the present invention.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The gradient-based adversarial attack method is more widely used in practice because of its faster computation speed, but its disadvantage is that the perturbation amount of the adversarial example is large. The invention therefore aims to design a new gradient-based attack method that reduces the perturbation amount compared with existing methods.
With the development of the theory of deep learning interpretability, researchers have proposed the class activation map (Classification Activation Map, CAM), which uses feature visualization to analyze the decision mechanism of a model. Studies have found that, for a given class, a deep neural network makes its decision based on certain regions of the image rather than the entire image. As shown in FIG. 1 to FIG. 4, two images from the AID dataset (a large-scale aerial image dataset) and the corresponding class activation maps CAM are selected. As can be seen from FIG. 1 and FIG. 2, the image class is river and, as the class activation map CAM shows, the model pays more attention to the river channel region; as can be seen from FIG. 3 and FIG. 4, the image class is beach and the model pays more attention to the coastline region.
Therefore, the invention makes full use of this characteristic of deep neural networks and uses the class activation map CAM as a weight on the perturbation, so that the perturbation is added in the regions the model attends to most, thereby reducing the perturbation amount while maintaining the attack success rate. The invention can be used as a plug-and-play module and combined with any gradient-based method.
The invention is described below through specific embodiments.
first embodiment
As shown in FIG. 5 and FIG. 6, this embodiment provides an adversarial attack method using class activation maps, comprising the following steps:
S1: loading a trained deep learning model for generating adversarial perturbations, setting the number of attack rounds, and reading the original image data to be attacked.
Specifically, in this embodiment, a trained deep learning model for generating the subsequent adversarial perturbations is first loaded. The deep learning model can be any kind of model trained with existing techniques for this purpose; since it is not the core of the invention, it is not described in detail in this embodiment.
In addition, this embodiment does not limit the gradient attack method used; the technical point of the invention can be combined with any gradient-based method. The gradient attack method can be any one of FGSM, PGD, MIM and TIM. The maximum number of iterations (i.e., the number of attack rounds) is set according to the type of the selected gradient attack method. For example, FGSM is a single-step attack method, so its maximum number of iterations is 1; PGD, MIM and TIM are iterative attack methods, whose maximum number of iterations is set according to actual needs. Taking PGD as an example in this embodiment, the maximum number of iterations max_iter can be set to 5.
The original image data is then read, either as a single image or as a batch of images.
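A minimal sketch of step S1 in PyTorch is shown below; the ResNet-18 backbone, the file names, and the parameter values are illustrative assumptions rather than requirements of the invention:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# S1: load a trained model, set the attack parameters, read the image to attack.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(num_classes=30)  # hypothetical backbone; AID has 30 scene classes
model.load_state_dict(torch.load("aid_resnet18.pt", map_location=device))  # hypothetical weights
model = model.to(device).eval()

max_iter = 5     # number of attack rounds (PGD-style iterative attack, as in this embodiment)
alpha = 2 / 255  # per-round perturbation limit (illustrative value)

preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
x = preprocess(Image.open("aerial_river.jpg")).unsqueeze(0).to(device)  # hypothetical image file
y = torch.tensor([0], device=device)  # index of the true class, assumed known
```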
S2: attacking the original image data for the set number of rounds, wherein the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map.
Specifically, in the attack part, max_iter rounds of attack are performed on the original image data. Counting from 1, the starting round is 1 and the ending round is max_iter. Let the original image data be x, the perturbation added in round t be p_t, and the adversarial example generated in round t be x_t^adv. The adversarial example of each round is generated by adding the current round's perturbation to the adversarial example of the previous round, so the adversarial example generated in round t+1 is
x_{t+1}^adv = x_t^adv + p_{t+1},
where x_0^adv = x, i.e., the adversarial example input to the first round is the original image data.
For each attack round, the initial perturbation of the current round is first computed from the adversarial example generated in the previous round. Here, the initial perturbation refers to the perturbation obtained directly from the gradient attack method, as opposed to the weighted perturbation. It is computed as follows: the adversarial example generated in the previous round is input into the deep learning model, and the gradient of the deep learning model with respect to that adversarial example is computed by back-propagation, giving the initial perturbation of the current round.
Taking the gradient attack method PGD as an example, perturbations are added to the image iteratively along the direction of increasing gradient. Specifically: let the original image be x, its class be y, the model parameters be θ, and the loss function be L(θ, x, y); the gradient of the loss with respect to the original image is ∇_x L(θ, x, y).
Let the coefficient α denote the limit on the perturbation of each round. The initial perturbation is obtained by passing the gradient through the sign function and multiplying by α:
p̃_{t+1} = α · sign(∇_{x_t^adv} L(θ, x_t^adv, y)),
where p̃_{t+1} is the initial perturbation of round t+1 and ∇_{x_t^adv} L(θ, x_t^adv, y) is the gradient with respect to the adversarial example generated in the previous round.
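A sketch of this gradient step, assuming a PyTorch model and a cross-entropy loss (the loss choice is an assumption; the patent only requires a differentiable loss L(θ, x, y)):

```python
import torch
import torch.nn.functional as F

def initial_perturbation(model, x_adv, y, alpha):
    """Initial (unweighted) perturbation for the next round: alpha times the sign of the
    loss gradient with respect to the previous round's adversarial example."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (alpha * grad.sign()).detach()
```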
The CAM map of the current round is then computed from the adversarial example generated in the previous round. Specifically: the CAM map of the previous round's adversarial example is an image of the same size as the input image data. In the CAM map, pixel values distinguish how much different regions contribute to the specified class: the larger the pixel value, the higher the saliency score and the greater the contribution of the corresponding region to the prediction of the specified class; the pixel values lie in the range [0, 1]. There is a family of CAM algorithms, such as CAM, Grad-CAM and Grad-CAM++. Because the original CAM requires the fully connected layer of the network to be replaced by a global average pooling (GAP) layer, the model must be retrained whenever its structure does not match, which makes CAM inconvenient in practice; therefore, later refinements of CAM, such as Grad-CAM and Grad-CAM++, are more commonly used. The specific algorithms are not described in detail here. In round t+1, the CAM map generated from the adversarial example of the previous round is denoted C_{t+1}.
For every round of an iterative attack except the first, the CAM map of the previous round's adversarial example is computed. In the first round of an iterative attack, and in a single-step attack, the CAM map of the original image is computed.
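Since the patent does not prescribe a particular CAM algorithm, the sketch below computes a Grad-CAM style map by hand for a CNN whose last convolutional layer is passed in as `conv_layer`; the use of forward/backward hooks and the normalization of the scores to [0, 1] are implementation assumptions:

```python
import torch
import torch.nn.functional as F

def grad_cam_map(model, conv_layer, x, target_class):
    """Grad-CAM style saliency map, normalized to [0, 1] and resized to the input resolution."""
    feats, grads = [], []
    fwd = conv_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    bwd = conv_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        x = x.clone().detach().requires_grad_(True)
        score = model(x)[0, target_class]  # logit of the specified class
        score.backward()
    finally:
        fwd.remove()
        bwd.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)            # channel weights: spatial mean of gradients
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True))  # weighted channel sum, negative evidence removed
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).detach()  # saliency scores in [0, 1]
```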
The following is the key step of the invention, namely the weighting of the initial adversarial perturbation. The input is the initial perturbation p̃_{t+1}, the weight is the CAM map C_{t+1}, and the output is the weighted perturbation p_{t+1}. The initial adversarial perturbation and the CAM map both have the same size as the original image, so they have the same size as each other; the initial perturbation is directly multiplied element-wise with the CAM map, which amounts to weighting the initial perturbation pixel by pixel.
For the pixel at position (i, j) in the image, the perturbation value at (i, j) in round t+1 equals the initial perturbation at (i, j) multiplied by the score at the corresponding position (i, j) in this round's CAM map, namely
p_{t+1}(i, j) = p̃_{t+1}(i, j) · C_{t+1}(i, j).
The adversarial example finally generated in each round is
x_{t+1}^adv = x_t^adv + p̃_{t+1} ⊙ C_{t+1},
where ⊙ denotes the Hadamard product, i.e., multiplication of elements at corresponding positions.
After attacking for the set number of rounds, x_{max_iter}^adv is obtained.
Furthermore, it should be noted that the attack may be terminated early during the iteration: if the attack has already succeeded before round max_iter is reached, the iteration can be stopped.
In another embodiment, the CAM map can be binarized and then used as a mask for the adversarial perturbation. The difference from this embodiment is that no binarization is performed here; the floating-point values are retained, so the weighting is finer-grained.
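For comparison, the binarized variant mentioned above could be sketched as follows (the 0.5 threshold is an arbitrary assumption; `cam` is the floating-point map in [0, 1]):

```python
import torch

def binarize_cam(cam: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Hard 0/1 mask variant of the CAM weights; this embodiment instead keeps the float values."""
    return (cam > threshold).float()
```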
S3: after the set number of attack rounds, the final adversarial example x_{max_iter}^adv is generated.
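Putting steps S1 to S3 together, a minimal sketch of the CAM-weighted iterative attack might look as follows; `initial_perturbation` and `grad_cam_map` are the hypothetical helpers sketched above, the clamp to [0, 1] is an implementation detail not spelled out in the patent, and the early exit follows the note on terminating once the attack succeeds:

```python
import torch

def cam_weighted_pgd(model, conv_layer, x, y, alpha=2 / 255, max_iter=5):
    """Steps S2-S3: iteratively add CAM-weighted gradient-sign perturbations,
    stopping early once the model misclassifies the adversarial example."""
    x_adv = x.clone().detach()
    for _ in range(max_iter):
        p_init = initial_perturbation(model, x_adv, y, alpha)   # gradient-sign step (see sketch above)
        cam = grad_cam_map(model, conv_layer, x_adv, y.item())  # saliency weights in [0, 1]
        x_adv = (x_adv + p_init * cam).clamp(0, 1).detach()     # Hadamard-weighted update
        with torch.no_grad():
            if model(x_adv).argmax(dim=1).item() != y.item():   # attack already succeeded
                break
    return x_adv
```

For the ResNet-18 example above, `conv_layer` could be `model.layer4[-1]` (an assumption about the backbone), so the final adversarial example would be obtained as `x_adv = cam_weighted_pgd(model, model.layer4[-1], x, y, alpha, max_iter)`.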
Second embodiment
As shown in FIG. 7, this embodiment provides an adversarial attack system for performing the adversarial attack method using class activation maps of the first embodiment, comprising:
an attack preparation module 1, configured to load the trained deep learning model for generating adversarial perturbations, set the number of attack rounds, and read the original image data to be attacked;
an adversarial attack module 2, configured to attack the original image data for the set number of rounds, wherein the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map; and
a final sample generation module 3, configured to generate the final adversarial example after the set number of attack rounds.
A computer readable storage medium storing computer code which, when executed, performs the method described above. Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer readable storage medium, and the storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and the like.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples; all technical solutions falling under the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations that may occur to those skilled in the art without departing from the principles of the present invention are also intended to be within the scope of the present invention.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as there is no contradiction between them, such combinations should be considered within the scope of this description.
It should be noted that the above embodiments can be freely combined as needed. The foregoing is merely a preferred embodiment of the present invention, and modifications and adaptations may be made by those skilled in the art without departing from the principles of the present invention; such modifications are also to be regarded as within the scope of the present invention.

Claims (10)

1. An adversarial attack method using class activation maps, comprising the following steps:
S1: loading a trained deep learning model for generating adversarial perturbations, setting the number of attack rounds, and reading the original image data to be attacked;
S2: attacking the original image data for the set number of rounds, wherein the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map of that example;
S3: after the set number of attack rounds, generating the final adversarial example.
2. The adversarial attack method using class activation maps according to claim 1, wherein step S1 further comprises:
selecting any gradient attack method, including FGSM, PGD, MIM and TIM;
and setting the maximum number of iterations according to the type of the selected gradient attack method.
3. The adversarial attack method using class activation maps according to claim 1, wherein step S2 further comprises: the adversarial example generated in each round is obtained by adding the current round's perturbation to the adversarial example generated in the previous round. Specifically:
let the original image data be x, the perturbation added in round t be p_t, and the adversarial example generated in round t be x_t^adv; the adversarial example generated in round t+1 is
x_{t+1}^adv = x_t^adv + p_{t+1},
where x_0^adv = x, i.e., the adversarial example input to the first round is the original image data.
4. The adversarial attack method using class activation maps according to claim 1, wherein step S2 further comprises: computing the initial perturbation of the current round from the adversarial example generated in the previous round. Specifically:
the adversarial example generated in the previous round is input into the deep learning model, and the gradient of the deep learning model with respect to that adversarial example is computed by back-propagation, giving the initial perturbation of the current round.
5. The adversarial attack method using class activation maps according to claim 4, further comprising:
when the gradient attack method is PGD, adding perturbations to the image iteratively along the direction of increasing gradient. Specifically:
let the original image be x, its class be y, the model parameters be θ, and the loss function be L(θ, x, y); the gradient of the loss with respect to the original image is ∇_x L(θ, x, y);
let the coefficient α denote the limit on the perturbation of each round; the initial perturbation is obtained by passing the gradient through the sign function and multiplying by α:
p̃_{t+1} = α · sign(∇_{x_t^adv} L(θ, x_t^adv, y)),
where p̃_{t+1} is the initial perturbation of round t+1 and ∇_{x_t^adv} L(θ, x_t^adv, y) is the gradient with respect to the adversarial example generated in the previous round.
6. The adversarial attack method using class activation maps according to claim 5, wherein step S2 further comprises: computing the CAM map of the current round from the adversarial example generated in the previous round. Specifically:
the CAM map of the previous round's adversarial example is an image of the same size as the input image data;
in the CAM map, pixel values distinguish how much different regions contribute to the specified class: the larger the pixel value, the higher the saliency score and the greater the contribution of the corresponding region to the prediction of the specified class; in round t+1, the CAM map computed from the adversarial example of the previous round is denoted C_{t+1}.
7. The adversarial attack method using class activation maps according to claim 6, wherein in step S2, the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map. Specifically:
the input is the initial perturbation p̃_{t+1}, the weight is the CAM map C_{t+1}, and the output is the weighted perturbation p_{t+1}; the initial perturbation is directly multiplied element-wise with the CAM map, which amounts to weighting the initial perturbation pixel by pixel;
for the pixel at position (i, j) in the image, the perturbation value at (i, j) in round t+1 equals the initial perturbation at (i, j) multiplied by the score at the corresponding position (i, j) in this round's CAM map, namely
p_{t+1}(i, j) = p̃_{t+1}(i, j) · C_{t+1}(i, j);
the adversarial example finally generated in each round is
x_{t+1}^adv = x_t^adv + p̃_{t+1} ⊙ C_{t+1},
where ⊙ denotes the Hadamard product, i.e., multiplication of elements at corresponding positions;
after attacking for the set number of rounds, x_{max_iter}^adv is obtained as the final adversarial example.
8. An adversarial attack system for executing the adversarial attack method using class activation maps according to any one of claims 1 to 7, comprising:
an attack preparation module, configured to load the trained deep learning model for generating adversarial perturbations, set the number of attack rounds, and read the original image data to be attacked;
an adversarial attack module, configured to attack the original image data for the set number of rounds, wherein the perturbation added to the original image data in each round is obtained by weighting the initial perturbation, computed from the adversarial example generated in the previous round, with the CAM map; and
a final sample generation module, configured to generate the final adversarial example after the set number of attack rounds.
9. A computer device comprising a memory and one or more processors, the memory having stored therein computer code that, when executed by the one or more processors, causes the one or more processors to perform the method of any of claims 1-7.
10. A computer readable storage medium storing computer code which, when executed, performs the method of any one of claims 1 to 7.
CN202310094149.XA 2023-02-10 2023-02-10 Adversarial attack method and system using class activation maps Pending CN116109886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310094149.XA CN116109886A (en) Adversarial attack method and system using class activation maps

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310094149.XA CN116109886A (en) Adversarial attack method and system using class activation maps

Publications (1)

Publication Number Publication Date
CN116109886A true CN116109886A (en) 2023-05-12

Family

ID=86261286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310094149.XA Pending CN116109886A (en) Adversarial attack method and system using class activation maps

Country Status (1)

Country Link
CN (1) CN116109886A (en)

Similar Documents

Publication Publication Date Title
CN110222831A (en) Robustness appraisal procedure, device and the storage medium of deep learning model
CN112883874B (en) Active defense method aiming at deep face tampering
Li et al. Black-box attack against handwritten signature verification with region-restricted adversarial perturbations
Ye et al. Detection defense against adversarial attacks with saliency map
CN113627543A (en) Anti-attack detection method
Chen et al. Patch selection denoiser: An effective approach defending against one-pixel attacks
CN112528675A (en) Confrontation sample defense algorithm based on local disturbance
CN116071797B (en) Sparse face comparison countermeasure sample generation method based on self-encoder
CN116109886A (en) Adversarial attack method and system using class activation maps
CN116188439A (en) False face-changing image detection method and device based on identity recognition probability distribution
CN114638356A (en) Static weight guided deep neural network back door detection method and system
CN115620100A (en) Active learning-based neural network black box attack method
CN113486736A (en) Black box anti-attack method based on active subspace and low-rank evolution strategy
Zhang et al. Certified defense against patch attacks via mask-guided randomized smoothing
CN111723864A (en) Method and device for performing countermeasure training by using internet pictures based on active learning
CN117786682B (en) Physical challenge attack resisting method, device, equipment and medium based on enhanced framework
Mao et al. Object-free backdoor attack and defense on semantic segmentation
Jellali et al. Data Augmentation for Convolutional Neural Network DeepFake Image Detection
Yang et al. Network traffic threat feature recognition based on a convolutional neural network
Wang et al. The Security Threat of Adversarial Samples to Deep Learning Networks
Luo et al. Defective Convolutional Networks
Liu et al. Adversarial examples generated from sample subspace
Yang et al. LpAdvGAN: Noise Optimization Based Adversarial Network Generation Adversarial Example
Zhu et al. Adversarial Example Defense via Perturbation Grading Strategy
CN117786682A (en) Physical challenge attack resisting method, device, equipment and medium based on enhanced framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination