CN113255816A - Directional attack countermeasure patch generation method and device - Google Patents
- Publication number
- CN113255816A (application CN202110646139.3A)
- Authority
- CN
- China
- Prior art keywords
- loss
- countermeasure
- patch
- white
- attack
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045 — Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/084 — Learning methods; Backpropagation, e.g. using gradient descent
Abstract
The invention provides a directional attack countermeasure patch generation method and device. The method iteratively updates the countermeasure patch over a succession of white-box models with different structures, so that the resulting target universal countermeasure patch achieves a strong attack effect on a black-box model of unknown structure. Introducing a triplet loss improves the success rate of outputting the target class during a directional attack. Introducing an attention transfer loss improves how well the target universal countermeasure patch migrates the model's attention region, greatly improving its directional attack effect. Introducing a smoothing loss reduces the differences between pixels of the target universal countermeasure patch, so that it is less likely to draw the attention of human eyes. Furthermore, because the attack is carried out by attaching a patch, it can be performed at both the physical level and the digital level, and is more convenient to implement.
Description
Technical Field
The invention relates to the technical field of artificial intelligence security, in particular to a directional attack countermeasure patch generation method and device.
Background
Deep Neural Networks (DNNs) have achieved tremendous success in fields such as image classification, object detection, text classification and speech recognition, and have been widely used in production and daily life. However, research in recent years has shown that deep learning networks are fragile and susceptible to countermeasure samples. A countermeasure sample is produced by modifying and perturbing a clean sample so that a trained neural network misclassifies or misidentifies it and cannot complete the target task.
The significance of countermeasure samples is twofold. On the one hand, countermeasure samples attack or mislead applications built on deep learning, such as autonomous driving and face recognition systems, posing potential security threats that may cause economic loss or casualties. On the other hand, countermeasure samples are valuable for training deep neural networks: using them for adversarial training can effectively enhance a network's defense capability and robustness. The study of countermeasure samples therefore plays an important role in advancing the field of artificial intelligence security. However, the prior art lacks a method for generating a countermeasure patch for a black-box model of unknown structure, and it is difficult to meet the application requirements of attacking, and improving the defenses of, black-box models.
Disclosure of Invention
The embodiments of the invention provide a directional attack countermeasure patch generation method and device, which address the problems in the prior art that the generated countermeasure patch ignores the attention characteristics shared among models, migrates the model attention region poorly, and achieves a low success rate when performing a directional attack on a black-box model of uncertain structure.
The technical scheme of the invention is as follows:
in one aspect, the present invention provides a method for generating a directional attack countermeasure patch, including:
acquiring a plurality of white box models with the same task as a black box model to be attacked, wherein model structures and parameters of the white box models are different;
acquiring a random initialization countermeasure patch, determining the target category of the directional attack, and updating and iterating the initialization countermeasure patch by adopting each white box model in a plurality of continuous iteration cycles to obtain a target universal countermeasure patch; wherein the output of a preceding iteration loop is taken as the input of a following iteration loop, each iteration loop comprising:
obtaining a plurality of undisturbed clean pictures, inputting each clean picture into a first white box model corresponding to a current iteration cycle, and outputting a first prediction contribution weight matrix and a first attention key area corresponding to each clean picture according to attention characteristics of the first white box model;
replacing and connecting random positions in each clean picture by adopting a first countermeasure patch input by the current iteration cycle to obtain a countermeasure sample corresponding to each clean picture;
adding the target category into the labels of each countermeasure sample, inputting the countermeasure samples into the first white box model, and calculating a joint loss by adopting a preset loss function, wherein the preset loss function at least comprises a countermeasure loss, an attention transfer loss, a triplet loss and a smoothing loss, and the attention transfer loss is calculated according to the first prediction contribution weight matrix corresponding to each clean picture, the first attention key area and the random position adopted when the first countermeasure patch is connected;
and performing back propagation to update the countermeasure patch according to the joint loss value by a gradient descent method, repeating the iteration, inputting the countermeasure sample corresponding to each iteration into the black box model to obtain a first confidence of the output target class, and stopping the iteration and outputting the current first countermeasure patch when the first confidence is greater than a preset confidence or the number of iterations reaches a preset value.
In some embodiments, the preset loss function is the joint loss of the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss, calculated as follows:

$$L = L_{adv} + \lambda_1 L_{att} + \lambda_2 L_{tv} + \lambda_3 L_{trip}$$

wherein $L$ is the preset loss function; $L_{adv}$ is the countermeasure loss associated with the output probability of the target class label; $L_{att}$ is the attention transfer loss associated with migration of the first white-box model's region of interest, and $\lambda_1$ is the weight coefficient of the attention transfer loss; $L_{tv}$ is the smoothing loss, and $\lambda_2$ is the weight coefficient of the smoothing loss; $L_{trip}$ is the triplet loss, and $\lambda_3$ is the weight coefficient of the triplet loss.
$$L_{adv} = -\log p_t(x_{adv})$$

wherein $x_{adv}$ is the countermeasure sample, and $p_t(x_{adv})$ is the probability of the target class $t$ output by the softmax layer after the countermeasure sample is input into the first white-box model.
$$L_{att} = \sum_{i,j}\big[M \odot R \odot (1 - mask)\big]_{i,j}$$

wherein $M$ is the first prediction contribution weight matrix, used for representing the contribution degree of each area of the countermeasure sample to the model prediction; $R$ is the first attention key region; $mask$ is a binary mask marking the location of the first countermeasure patch, with value 1 in the area of the first countermeasure patch and 0 elsewhere;
$$\alpha_k^t = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^t}{\partial A_{ij}^k}, \qquad M = ReLU\Big(\sum_k \alpha_k^t A^k\Big)$$

wherein $\alpha_k^t$ represents the sensitivity of the $k$-th channel of the feature map $A$ output by the last convolutional layer of the first white-box model to the class-$t$ target; $y^t$ represents the output probability of the $t$-th class target of the first white-box model; $A^k$ is the $k$-th channel of the feature map output by the last convolutional layer of the white-box model; $Z$ is a normalization constant; and $i$ and $j$ respectively represent the row index and column index of a pixel in the image;
in some embodiments, the triplet penalty is calculated as:
wherein,,for the first pair of anti-samples,is a one-hot vector of the target class,is a one-hot vector of the true class,a logits value for a target class label derived for the first antagonizing sample input to the first white-box model,is a threshold value.
In some embodiments, the smoothing loss is calculated as:

$$L_{tv} = \sum_{i,j}\Big((p_{i,j} - p_{i+1,j})^2 + (p_{i,j} - p_{i,j+1})^2\Big)$$

wherein $p_{i,j}$ is the pixel value of the countermeasure patch at row $i$ and column $j$.
In some embodiments, the initialization countermeasure patch is generated in a set size and shape.
In some embodiments, the initialization countermeasure patch is noise following a Gaussian distribution.
In another aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above-mentioned method.
The invention has the beneficial effects that:
In the directional attack countermeasure patch generation method and device, the method iteratively updates the countermeasure patch over a succession of white-box models with different structures, so that the resulting target universal countermeasure patch achieves a strong attack effect on a black-box model of unknown structure. Introducing a triplet loss improves the success rate of outputting the target class during a directional attack. Introducing an attention transfer loss improves how well the target universal countermeasure patch migrates the model's attention region, greatly improving its directional attack effect. Introducing a smoothing loss reduces the differences between pixels of the target universal countermeasure patch, so that it is less likely to draw the attention of human eyes.

Furthermore, because the attack is carried out by attaching a patch, it can be performed at both the physical level and the digital level, and is more convenient to implement.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a schematic flow chart of a method for generating a directional attack countermeasure patch according to an embodiment of the present invention;
FIG. 2 is a comparison of the regions of interest of the Vgg16, Resnet50 and Inception V3 models on the same input image;
fig. 3 is a logic diagram of a method for generating a directional attack countermeasure patch according to an embodiment of the present invention;
FIG. 4 is a logic diagram of a single iteration loop in the directional attack countermeasure patch generation method of FIG. 3;
fig. 5 is a logic diagram of a method for generating a directional attack countermeasure patch according to another embodiment of the present invention;
fig. 6 is a logic diagram of a single iteration loop in the directional attack countermeasure patch generation method described in fig. 5.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
In the era of data computation driven by Deep Learning (DL) algorithms, it is important to ensure the safety and robustness of the algorithms. The increase in computer processing power has enabled Deep Learning (DL) to be widely applied to various Machine Learning (ML) tasks, such as image classification, natural language processing and game theory, but it has also exposed the potential safety hazards of deep learning. It has been found that by adding specific noise or disturbances to benign samples, a neural network model can easily be tricked into making false judgments, while the added noise or disturbances are difficult to perceive. Such counterattacks pose a significant risk to neural network models deployed in real life. Counterattacks can be classified into white-box attacks and black-box attacks according to the attacker's knowledge of the model structure and parameters. In a white-box attack, the attacker has complete knowledge of the white-box model, including the model architecture and parameters; in a black-box attack, the attacker does not know the structure and parameters of the black-box model.
To research and defend against such attacks, one first needs to generate countermeasure samples capable of causing erroneous model predictions, which can be produced by perturbing a clean picture (the original benign sample). For the white-box model in a white-box attack, since its structure and parameters are known, countermeasure samples can be made by various methods. For the black-box model in a black-box attack, generating countermeasure samples mostly relies on the results returned by query access. Based on the attack effect, attacks can further be divided into directional and non-directional attacks. A directional attack means the attack method can specify, for an input sample, the class predicted after the attack; specifying the post-attack class makes the attack more difficult. A non-directional attack does not care about the specific class, as long as the prediction for the countermeasure sample generated by the attack method is not the correct class.
In the prior art, the methods of generating countermeasure samples for white-box models are mature, and a strong countermeasure effect can be obtained by generating a countermeasure patch and placing it at random positions in a clean picture. When applied to a black-box model, however, the countermeasure effect is poor because the countermeasure patch cannot be generated efficiently. Furthermore, in the case of a directional attack, since the structure and parameters of the black-box model are unknown, completing the attack is more difficult, and the prior art cannot generate corresponding countermeasure samples.
The invention provides a directional attack countermeasure patch generation method, which is used for generating a universal patch capable of being applied to various pictures to carry out directional attack on a specific black box model, and as shown in figures 1, 3 and 5, the method comprises the following steps of S101-S102:
it should be noted that, the steps S101 to S102 and S1021 to S1024 described in this embodiment are not limited to the order of the steps, and it should be understood that, under certain conditions, some steps may be parallel or the order may be changed.
Step S101: and acquiring a plurality of white box models with the same task as the black box model to be attacked, wherein the model structures and parameters of the white box models are different.
Step S102: and obtaining a random initialization countermeasure patch, determining the target category of the directional attack, and updating and iterating the initialization countermeasure patch by adopting each white box model in a plurality of continuous iteration cycles to obtain a target universal countermeasure patch.
The output of the previous iteration loop is used as the input of the next iteration loop, as shown in fig. 4 and 6, each iteration loop comprises steps S1021 to S1024:
step S1021: the method comprises the steps of obtaining a plurality of undisturbed clean pictures, inputting each clean picture into a first white box model corresponding to a current iteration cycle, and outputting a first prediction contribution weight matrix and a first attention key area corresponding to each clean picture according to attention characteristics of the first white box model.
Step S1022: and replacing and connecting the random positions in each clean picture by adopting a first countermeasure patch input by the current iteration cycle to obtain a countermeasure sample corresponding to each clean picture.
Step S1023: and adding the target category into the labels of each countermeasure sample, inputting the countermeasure samples into the first white box model, and calculating the joint loss by adopting a preset loss function, wherein the preset loss function at least comprises the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss, and the attention transfer loss is calculated according to the first prediction contribution weight matrix corresponding to each clean picture, the first attention key area and the random position adopted when connecting the first countermeasure patch.
Step S1024: and performing back propagation to update the countermeasure patch according to the joint loss value by a gradient descent method, repeating the iteration, inputting the countermeasure sample corresponding to each iteration into the black box model to obtain a first confidence of the output target class, and stopping the iteration and outputting the current first countermeasure patch when the first confidence is greater than a preset confidence or the number of iterations reaches a preset value.
In step S101, in order to obtain an efficient countermeasure patch for the black box model, the embodiment updates the iterative countermeasure patch with a plurality of white box models with different structures and parameters to obtain higher robustness. Specifically, the tasks executed by the white box models are the same as those of the black box model to be attacked, and different neural network structures can be adopted among the white box models, or models with different parameters generated by training based on the same neural network structure can be adopted. The architecture, parameter values, training methods of the respective white-box models should be known.
In step S102, an initialization patch is randomly generated, and may be generated in a set size and shape. Further, the patch size can be scaled from the size of the clean picture according to a set ratio. The shape of the initialization patch can be set according to the needs of the actual application scenario, such as a circle, an ellipse, a square, a rectangle, or another shape. Since the present embodiment performs a directional attack, it is further necessary to determine the target category of the directional attack, i.e. the class that the black-box model is desired to output. In step S102, the initialization patch is continuously updated, based on a gradient descent method, using a plurality of white-box models with known structures and parameters, so that the continuously updated countermeasure patch adapts to white-box models with different structures and parameters, and can therefore better adapt to the black-box model. Each white-box model is given an iteration loop, so that the input countermeasure patch adapts to the white-box model corresponding to the current loop.
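The update procedure of steps S101 and S102 can be summarized as a nested loop: an outer loop over the white-box models and an inner gradient-descent loop with black-box early stopping. The following is a minimal illustrative sketch; the names (`generate_universal_patch`, `update_steps`, `black_box_confidence`) and the callable-based structure are assumptions for illustration, not the patent's actual implementation:

```python
def generate_universal_patch(update_steps, patch, black_box_confidence,
                             conf_threshold, max_iters):
    """Sequentially adapt one patch to each white-box model; the output
    of a preceding iteration loop is the input of the following one."""
    for step in update_steps:              # one iteration loop per white-box model
        for _ in range(max_iters):         # preset iteration budget
            patch = step(patch)            # one gradient-descent update on the joint loss
            if black_box_confidence(patch) > conf_threshold:
                return patch               # first confidence exceeds preset confidence
    return patch
```

Here each `step` would internally build countermeasure samples, compute the joint loss on its white-box model and back-propagate, while `black_box_confidence` queries the black-box model for the target-class confidence.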
Specifically, in an iteration loop, the countermeasure patch is updated by back propagation with a gradient descent method, under the condition that the structure and parameters of a single white-box model are known. In step S1021, the clean pictures may be obtained from an existing database, or may be collected according to actual needs. Specifically, in order to introduce the model's attention features on the input image and use them as the basis for subsequently adjusting the countermeasure patch, this embodiment inputs each obtained clean picture into the first white-box model corresponding to the current iteration loop; since the structure and parameters of the first white-box model are known, the first prediction contribution weight matrix and the first attention key region can be calculated. Specifically, after a clean picture is input into the first white-box model, the feature map $A$ output by its last convolutional layer is obtained. The sensitivity of the $k$-th channel of the feature map to the class-$t$ target is then $\alpha_k^t$, and the weighted linear combination of the last-layer feature maps, with $\alpha_k^t$ as weights, is passed through an activation function to obtain the first prediction contribution weight matrix $M$:

$$\alpha_k^t = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^t}{\partial A_{ij}^k}, \qquad M = ReLU\Big(\sum_k \alpha_k^t A^k\Big)$$

wherein $\alpha_k^t$ represents the sensitivity of the $k$-th channel of the feature map $A$ output by the last convolutional layer of the first white-box model to the class-$t$ target; $y^t$ represents the output probability of the $t$-th class target of the first white-box model; $A^k$ is the $k$-th channel of the feature map output by the last convolutional layer of the white-box model; $Z$ is a normalization constant; and $i$ and $j$ respectively represent the row index and column index of a pixel in the image.
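As a concrete illustration of this computation, the following NumPy sketch derives the prediction contribution weight matrix from a feature map and its target-class gradients (assumed precomputed by the white-box framework's autograd), taking the normalization constant as the number of spatial positions; the thresholding helper for the attention key region is likewise an illustrative assumption:

```python
import numpy as np

def prediction_contribution_matrix(feature_maps, gradients):
    """feature_maps: (K, H, W) last-conv-layer activations A^k.
    gradients: (K, H, W) gradients of the target-class score y^t w.r.t. A^k."""
    # alpha_k^t = (1/Z) * sum_ij dy^t / dA^k_ij, with Z = H * W
    alphas = gradients.mean(axis=(1, 2))
    # M = ReLU(sum_k alpha_k^t * A^k)
    return np.maximum((alphas[:, None, None] * feature_maps).sum(axis=0), 0.0)

def attention_key_region(M, tau):
    """Binarize the weight matrix with an adjustable threshold tau."""
    return (M >= tau).astype(np.float32)
```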
The first attention key region $R$ is then obtained by binarizing the first prediction contribution weight matrix $M$ with a threshold, wherein the threshold value can be adjusted according to the actual application requirement.
In step S1022, each clean picture is processed with the first countermeasure patch input into the current iteration loop to obtain a countermeasure sample. The first countermeasure patch is added to the clean picture by random replacement connection, and the connection position is marked with a two-dimensional mask, whose value is 1 in the area of the countermeasure patch and 0 elsewhere. Thus, the countermeasure sample can be expressed as:

$$x_{adv} = (1 - mask) \odot x + mask \odot p$$

wherein $x$ is the clean picture, $p$ is the first countermeasure patch, and $\odot$ denotes element-wise multiplication.
In a specific implementation, the first countermeasure patch can also be randomly translated, scaled and rotated before being connected to the clean picture.
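A minimal NumPy sketch of the replacement connection of step S1022, assuming the countermeasure sample is formed as `(1 - mask) * image + mask * patch` with the mask marking a random patch position; the random translation, scaling and rotation mentioned above are omitted for brevity:

```python
import numpy as np

def apply_patch(image, patch, rng):
    """Replace a random region of a clean picture (C, H, W) with the patch (C, h, w)."""
    _, H, W = image.shape
    _, h, w = patch.shape
    top = int(rng.integers(0, H - h + 1))      # random row of the patch position
    left = int(rng.integers(0, W - w + 1))     # random column of the patch position
    mask = np.zeros_like(image)
    mask[:, top:top + h, left:left + w] = 1.0  # 1 inside the patch area, 0 elsewhere
    padded = np.zeros_like(image)
    padded[:, top:top + h, left:left + w] = patch
    return (1.0 - mask) * image + mask * padded, mask
```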
In step S1023, based on the first white-box model with known structure and parameters, the first countermeasure patch is updated and iterated so as to complete the directional attack on the first white-box model. Specifically, in order to enable the first countermeasure patch to effectively migrate the region of interest of the first white-box model to the position of the patch, this embodiment introduces an attention transfer loss into the loss function. In order to separate the target class produced by the generated countermeasure sample from the original class, this embodiment introduces a triplet loss, so that features with the same label are as close as possible in spatial position while features with different labels are as far apart as possible; at the same time, to prevent the sample features from collapsing into a very small space, it requires that for a positive example and a negative example of the same class, the negative example is at least a threshold farther away than the positive example. Further, to improve the naturalness of the countermeasure patch so that it conforms to human vision, a smoothing loss is also introduced. In addition, there is the countermeasure loss on the attack success rate, i.e. the output probability of the target class label. Together, the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss constitute the preset loss function.
Specifically, in some embodiments, the preset loss function is the joint loss of the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss, calculated as follows:

$$L = L_{adv} + \lambda_1 L_{att} + \lambda_2 L_{tv} + \lambda_3 L_{trip}$$

wherein $L$ is the preset loss function; $L_{adv}$ is the countermeasure loss associated with the output probability of the target class label; $L_{att}$ is the attention transfer loss associated with migration of the first white-box model's region of interest, and $\lambda_1$ is the weight coefficient of the attention transfer loss; $L_{tv}$ is the smoothing loss, and $\lambda_2$ is the weight coefficient of the smoothing loss; $L_{trip}$ is the triplet loss, and $\lambda_3$ is the weight coefficient of the triplet loss.

$$L_{adv} = -\log p_t(x_{adv})$$

wherein $x_{adv}$ is the countermeasure sample, and $p_t(x_{adv})$ is the probability of the target class $t$ output by the softmax layer after the countermeasure sample is input into the first white-box model.

$$L_{att} = \sum_{i,j}\big[M \odot R \odot (1 - mask)\big]_{i,j}$$

wherein $M$ is the first prediction contribution weight matrix, used for representing the contribution degree of each area of the countermeasure sample to the model prediction; $R$ is the first attention key region; $mask$ is a binary mask marking the location of the first countermeasure patch, with value 1 in the area of the first countermeasure patch and 0 elsewhere.
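A minimal NumPy sketch of the smoothing loss, assuming a total-variation form (sum of squared differences between vertically and horizontally adjacent patch pixels); the exact form used by the patent is not fully recoverable from the text, so this is an illustrative assumption consistent with the stated goal of reducing pixel-to-pixel differences:

```python
import numpy as np

def smoothing_loss(patch):
    """Sum of squared differences between adjacent pixels of a (C, H, W) patch."""
    dh = patch[:, 1:, :] - patch[:, :-1, :]   # vertical neighbor differences
    dw = patch[:, :, 1:] - patch[:, :, :-1]   # horizontal neighbor differences
    return float((dh ** 2).sum() + (dw ** 2).sum())
```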
Specifically, before making a correct decision, different neural network models first extract different features and then assign appropriate weights to them, i.e., allocate appropriate attention to the extracted features. Although network architectures differ, the features the models attend to tend to be the same. As shown in fig. 2, when VGG16, ResNet50 and Inception V3 identify images of cats, the regions of interest of the three models differ significantly (the highlighted areas indicated by the arrows are the regions of interest): VGG16 focuses only on the cat's face, ResNet50 focuses on the face and neck, and Inception V3 combines features of the face, neck and part of the forelimbs; overall, however, all three models tend to focus on features related to the cat's face. In view of this characteristic, the present embodiment introduces the attention transfer loss so that, during the update iterations, the first countermeasure patch suppresses the attention features of the first white-box model and transfers them from the target region to a non-target region. Since the first white-box model then no longer focuses on objects within the key region, the model misclassifies.
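The prediction contribution weight matrix behind these regions of interest can be obtained with a Grad-CAM-style computation. The sketch below is illustrative and assumes the last-layer activations and target-class gradients have already been extracted from the network:

```python
import numpy as np

def contribution_weight_matrix(activations, gradients):
    """activations : (K, H, W) last convolutional feature map A^k
    gradients   : (K, H, W) d y^t / d A^k for the target class t
    returns     : (H, W) non-negative, max-normalized contribution map"""
    # Channel weights: global-average-pooled gradients (the 1/Z double sum).
    weights = gradients.mean(axis=(1, 2))             # (K,)
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum
    cam = np.maximum(cam, 0.0)                        # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam
```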
In some embodiments, the triplet loss is calculated as:

$L_{trip} = \max\big( d(l(x_{adv}), y_t) - d(l(x_{adv}), y_o) + m,\ 0 \big)$

wherein $x_{adv}$ is the first countermeasure sample; $y_t$ is the one-hot vector of the target class; $y_o$ is the one-hot vector of the true class; $l(x_{adv})$ is the logits value obtained after the first countermeasure sample is input into the first white-box model; $d(\cdot,\cdot)$ is a distance measure; and $m$ is a threshold.
In a directional attack, the loss function is usually related only to the target class. However, the generated countermeasure sample may remain too close to its original class, so that the target model still classifies it as the original class. The present embodiment therefore introduces the triplet loss $L_{trip}$, whose goal is to make features with the same label as close as possible in feature space while keeping features with different labels as far apart as possible, and, to prevent the sample features from collapsing into a very small space, to require that for a positive example and a negative example of the same class, the negative example is at least a threshold $m$ farther away than the positive example.
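A numpy sketch of such a triplet loss, taking the logits as the anchor, the target one-hot vector as the positive, and the true-class one-hot vector as the negative (the Euclidean distance is an assumed choice):

```python
import numpy as np

def triplet_loss(logits, y_target, y_true, margin=1.0):
    # Pull the logits toward the target-class one-hot vector and push
    # them at least `margin` farther from the true-class one-hot vector.
    d_pos = np.linalg.norm(logits - y_target)
    d_neg = np.linalg.norm(logits - y_true)
    return max(d_pos - d_neg + margin, 0.0)
```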
In some embodiments, the smoothing loss is calculated as:

$L_{smooth} = \sum_{i,j} \big( (p_{i,j} - p_{i+1,j})^2 + (p_{i,j} - p_{i,j+1})^2 \big)$

wherein $p_{i,j}$ is the pixel value of the countermeasure patch at row $i$ and column $j$. To further improve the naturalness of the countermeasure patch so that it conforms to human vision, the smoothness of the patch can be improved by reducing the squared differences between adjacent pixels. Smoothing is also useful for improving the robustness of the countermeasure sample in a physical environment; this embodiment therefore introduces the smoothing loss.
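This smoothing loss is the usual total-variation-style penalty; a minimal numpy sketch:

```python
import numpy as np

def smoothing_loss(patch):
    # Sum of squared differences between vertically and horizontally
    # adjacent pixels; small values mean a smoother, more natural patch.
    dh = patch[1:, :] - patch[:-1, :]
    dw = patch[:, 1:] - patch[:, :-1]
    return float((dh ** 2).sum() + (dw ** 2).sum())
```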
In step S1024, the first countermeasure patch is updated by a gradient descent method according to the joint loss. The countermeasure sample obtained in each iteration is input into the black-box model to be attacked, which outputs a first confidence coefficient for the target class; this first confidence coefficient can be used as a parameter for deciding whether to end the update iterations of the first countermeasure patch. Only when the first confidence coefficient meets the requirement is the first countermeasure patch obtained through the update iterations considered capable of producing the expected attack effect on the black-box model to be attacked. Alternatively, updating stops when the number of iterations reaches a set value.
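The update-and-stop logic of step S1024 can be sketched as follows; the gradient and confidence callables stand in for the white-box back-propagation and the black-box query, and all names are illustrative assumptions:

```python
import numpy as np

def update_patch(patch, grad_fn, black_box_confidence, lr=0.1,
                 target_confidence=0.9, max_iters=100):
    """Gradient descent on the patch with two stopping rules: the
    black-box target-class confidence reaching the preset threshold,
    or the iteration count reaching max_iters."""
    for _ in range(max_iters):
        patch = patch - lr * grad_fn(patch)            # descend the joint loss
        if black_box_confidence(patch) >= target_confidence:
            break                                      # expected attack effect reached
    return patch

# Toy usage: the "joint loss" is ||patch||^2, so its gradient is 2*patch,
# and "confidence" rises as the patch shrinks toward zero.
p = update_patch(np.full(4, 5.0),
                 grad_fn=lambda q: 2.0 * q,
                 black_box_confidence=lambda q: 1.0 / (1.0 + np.abs(q).sum()))
```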
In another aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method are implemented.
In another aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the above-mentioned method.
The invention is illustrated below with reference to specific examples:
example 1
The embodiment provides a directional attack countermeasure patch generation method for performing a directional attack on a black-box model executing a specific task. As shown in fig. 3 and fig. 4, the method specifically includes the following steps:
1. Acquire a plurality of white-box models having the same task as the black-box model to be attacked, wherein the model structures and parameters of the white-box models differ from one another.
2. Obtain a randomly initialized countermeasure patch, determine the target class of the directional attack, and update and iterate the initialized countermeasure patch with each white-box model in a plurality of consecutive iteration cycles to obtain a target universal countermeasure patch. By continuously and iteratively updating the countermeasure patch through a plurality of white-box models with known structures and parameters, the final countermeasure patch gains universality across all the white-box models and can realize a directional attack on the black-box model to be attacked under the same task.
Specifically, when a white-box model is used to iteratively update a countermeasure patch, the method includes:
2.1, obtaining a plurality of undisturbed clean pictures, inputting each clean picture into the first white-box model corresponding to the current iteration cycle, and outputting a first prediction contribution weight matrix and a first attention key region for each clean picture according to the attention features of the first white-box model. The features of interest of the first white-box model for each clean picture are obtained here so that, in step 2.3, the migration effect of the first countermeasure patch on those features can be calculated as a parameter.
2.2, pasting the first countermeasure patch input to the current iteration cycle over a random position in each clean picture, obtaining a countermeasure sample corresponding to each clean picture.
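Step 2.2 can be sketched as follows; the function also returns the binary mask recording the random patch position, which the attention transfer loss needs (names are illustrative):

```python
import numpy as np

def paste_patch(image, patch, rng=None):
    """Replace a random region of a clean picture with the countermeasure
    patch, returning the countermeasure sample and the binary mask of
    the patch location."""
    rng = rng if rng is not None else np.random.default_rng()
    H, W = image.shape[:2]
    h, w = patch.shape[:2]
    top = int(rng.integers(0, H - h + 1))    # random row of the patch corner
    left = int(rng.integers(0, W - w + 1))   # random column of the patch corner
    sample = image.copy()
    sample[top:top + h, left:left + w] = patch
    mask = np.zeros((H, W))
    mask[top:top + h, left:left + w] = 1.0
    return sample, mask
```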
2.3, adding the target class to the label of each countermeasure sample, inputting the samples into the first white-box model, and calculating the joint loss with the preset loss function, which comprises at least the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss; the attention transfer loss is calculated from the first prediction contribution weight matrix and first attention key region corresponding to each clean picture, together with the random position used when pasting the first countermeasure patch.
2.4, performing back propagation with a gradient descent method to update the countermeasure patch according to the joint loss value, repeating the iteration, stopping when the number of iterations reaches a preset value, and outputting the current first countermeasure patch.
Finally, the countermeasure patch output by the previous iteration cycle is used as the input of the next iteration cycle. Through continuous update iterations, a countermeasure patch with universality is finally obtained, which can be used for a directional attack on the black-box model to be attacked.
Example 2
On the basis of embodiment 1, as shown in fig. 5 and fig. 6, in each iteration cycle the countermeasure sample of each update iteration is input into the black-box model to be attacked to output a first confidence coefficient for the target class, and this first confidence coefficient is used as the condition for stopping the update: when it reaches a preset confidence, the update in the current iteration cycle stops and the current first countermeasure patch is output.
The beneficial effects of the method are as follows. Most existing attack methods generate pixel-level perturbations superimposed on the original image, which are difficult to realize in the physical world; the countermeasure patch generated by this method can be printed out and applied in the physical world, which gives it practical significance. Existing methods ignore the features that different models attend to in common and do not exploit them, so that an attack effective on a white box performs poorly on a black box. The attention transfer loss adopted in the present invention suppresses the features attended to by different models and transfers the model's attention from the key region to the region where the countermeasure patch is located; the model then misclassifies because it no longer attends to objects in the key region, so the attack effect on the black box is good. For the requirement of directional attack, the method introduces the triplet loss used in face recognition, which improves the success rate of the directional attack. From a visual perspective, if the difference between values of adjacent pixels is too large, the patch looks unnatural and easily draws human attention; a smoothing loss is therefore proposed to avoid excessive differences between adjacent pixel values. In terms of attack mode, the method concatenates a plurality of white-box models in series to continuously fit the gradient of the black box, finally generating a universal countermeasure patch and improving the success rate of the countermeasure sample's attack on the black box.
In summary, in the directional attack countermeasure patch generation method and apparatus, the method employs a plurality of consecutive white-box models with different structures to iteratively update the countermeasure patch, so that the obtained target universal countermeasure patch has a better attack effect on a black-box model with unknown structure. Introducing the triplet loss improves the success rate of outputting the target class during the directional attack. Introducing the attention transfer loss improves the migration effect of the target universal countermeasure patch on the model's attention area, greatly improving its directional attack effect. Introducing the smoothing loss reduces the differences between pixels of the target universal countermeasure patch so that it does not easily draw human attention.
Furthermore, because the attack is carried out by adding a countermeasure patch, the directional attack can be performed at the physical level and the digital level simultaneously, and is more convenient to implement.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A directional attack countermeasure patch generation method is characterized by comprising the following steps:
acquiring a plurality of white box models with the same task as a black box model to be attacked, wherein model structures and parameters of the white box models are different;
acquiring a randomly initialized countermeasure patch, determining a target class of the directional attack, and updating and iterating the initialized countermeasure patch by adopting each white box model in a plurality of consecutive iteration cycles to obtain a target universal countermeasure patch; wherein the output of a preceding iteration cycle is taken as the input of a following iteration cycle, each iteration cycle comprising:
obtaining a plurality of undisturbed clean pictures, inputting each clean picture into a first white box model corresponding to a current iteration cycle, and outputting a first prediction contribution weight matrix and a first attention key area corresponding to each clean picture according to attention characteristics of the first white box model;
pasting a first countermeasure patch input to the current iteration cycle over a random position in each clean picture to obtain a countermeasure sample corresponding to each clean picture;
adding the target class to the label of each countermeasure sample, inputting the samples into the first white box model, and calculating a joint loss by adopting a preset loss function, wherein the preset loss function comprises at least a countermeasure loss, an attention transfer loss, a triplet loss and a smoothing loss, and the attention transfer loss is calculated from the first prediction contribution weight matrix and first attention key region corresponding to each clean picture together with the random position used when pasting the first countermeasure patch;
and performing back propagation by a gradient descent method to update the countermeasure patch according to the joint loss value, repeating the iteration, inputting the countermeasure sample corresponding to each iteration into the black box model to obtain a first confidence coefficient of the output target class, and stopping the iteration and outputting the current first countermeasure patch when the first confidence coefficient is greater than a preset confidence coefficient or the number of iterations reaches a preset value.
2. The directional attack countermeasure patch generation method of claim 1, wherein the preset loss function is the joint loss of the countermeasure loss, the attention transfer loss, the triplet loss and the smoothing loss, calculated as follows:

$L = L_{adv} + \lambda_{1} L_{att} + \lambda_{2} L_{smooth} + \lambda_{3} L_{trip}$

wherein $L$ is the preset loss function; $L_{adv}$ is the countermeasure loss related to the output probability of the target class label; $L_{att}$ is the attention transfer loss related to the migration of the first white-box model's region of interest, and $\lambda_{1}$ is the weight coefficient of the attention transfer loss; $L_{smooth}$ is the smoothing loss, and $\lambda_{2}$ is the weight coefficient of the smoothing loss; $L_{trip}$ is the triplet loss, and $\lambda_{3}$ is the weight coefficient of the triplet loss.
3. The directional attack countermeasure patch generation method as claimed in claim 2, wherein the countermeasure loss $L_{adv}$ is calculated as follows:

$L_{adv} = -\log\, p_t(x_{adv})$

wherein $x_{adv}$ is the countermeasure sample, and $p_t(x_{adv})$ is the probability of the target class output by the softmax layer after the countermeasure sample is input into the first white-box model.
4. The directional attack countermeasure patch generation method as claimed in claim 3, wherein the attention transfer loss $L_{att}$ is calculated as follows:

$L_{att} = \sum_{i,j} G_{i,j}\, S_{i,j}\, (1 - mask_{i,j})$

wherein $G$ is the first prediction contribution weight matrix, used for representing the contribution degree of each area of the countermeasure sample to the model prediction; $S$ is the first attention key region; $mask$ is a binary mask marking the location of the first countermeasure patch, with value 1 in the area of the first countermeasure patch and 0 elsewhere;

wherein $G = \mathrm{ReLU}\big( \sum_{k} w_{k}^{t} A^{k} \big)$ and $w_{k}^{t} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{t}}{\partial A^{k}_{i,j}}$; $A^{k}$ is the $k$-th channel of the feature map output by the last convolutional layer of the first white-box model, and $w_{k}^{t}$ represents the sensitivity of the $k$-th channel to the $t$-th class target; $y^{t}$ represents the output probability of the $t$-th class target of the first white-box model; $A$ is the feature map output by the last convolutional layer of the white-box model; $Z$ is a normalization constant; and $i$ and $j$ respectively represent the row and column indices of the pixels in the image;
5. The directional attack countermeasure patch generation method of claim 4, wherein the triplet loss is calculated as:

$L_{trip} = \max\big( d(l(x_{adv}), y_t) - d(l(x_{adv}), y_o) + m,\ 0 \big)$

wherein $x_{adv}$ is the first countermeasure sample; $y_t$ is the one-hot vector of the target class; $y_o$ is the one-hot vector of the true class; $l(x_{adv})$ is the logits value obtained after the first countermeasure sample is input into the first white-box model; $d(\cdot,\cdot)$ is a distance measure; and $m$ is a threshold.
7. A directional attack countermeasure patch generation method according to claim 1, wherein the initialization countermeasure patch is generated in a set size and shape.
8. A directional attack countermeasure patch generation method according to claim 1, wherein the initialized countermeasure patch is noise conforming to a Gaussian distribution.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110646139.3A CN113255816B (en) | 2021-06-10 | 2021-06-10 | Directional attack countermeasure patch generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113255816A true CN113255816A (en) | 2021-08-13 |
CN113255816B CN113255816B (en) | 2021-10-01 |
Family
ID=77187320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110646139.3A Active CN113255816B (en) | 2021-06-10 | 2021-06-10 | Directional attack countermeasure patch generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255816B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956185A (en) * | 2019-11-21 | 2020-04-03 | 大连理工大学人工智能大连研究院 | Method for detecting image salient object |
CN111898645A (en) * | 2020-07-03 | 2020-11-06 | 贵州大学 | Movable sample attack resisting method based on attention mechanism |
CN112085069A (en) * | 2020-08-18 | 2020-12-15 | 中国人民解放军战略支援部队信息工程大学 | Multi-target countermeasure patch generation method and device based on integrated attention mechanism |
US20210012188A1 (en) * | 2019-07-09 | 2021-01-14 | Baidu Usa Llc | Systems and methods for defense against adversarial attacks using feature scattering-based adversarial training |
Non-Patent Citations (3)
Title |
---|
CHANGCHUN ZHANG等: "Transferable attention networks for adversarial domain", 《INFORMATION SCIENCE》 * |
MAOSEN LI等: "Towards Transferable Targeted Attack", 《 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
CHEN, Ke: "Research on Adversarial Example Generation Methods for Neural Networks Based on Heuristic Search", China Master's Theses Full-text Database (Electronic Journal) *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113689338A (en) * | 2021-09-08 | 2021-11-23 | 北京邮电大学 | Method for generating scaling robustness countermeasure patch |
CN113689338B (en) * | 2021-09-08 | 2024-03-22 | 北京邮电大学 | Method for generating scaling robustness countermeasure patch |
CN113792806A (en) * | 2021-09-17 | 2021-12-14 | 中南大学 | Anti-patch generation method |
CN114742170A (en) * | 2022-04-22 | 2022-07-12 | 马上消费金融股份有限公司 | Countermeasure sample generation method, model training method, image recognition method and device |
CN114742170B (en) * | 2022-04-22 | 2023-07-25 | 马上消费金融股份有限公司 | Countermeasure sample generation method, model training method, image recognition method and device |
CN115544499A (en) * | 2022-11-30 | 2022-12-30 | 武汉大学 | Migratable black box anti-attack sample generation method and system and electronic equipment |
CN117253094A (en) * | 2023-10-30 | 2023-12-19 | 上海计算机软件技术开发中心 | Method, system and electronic equipment for generating contrast sample by image classification system |
CN117253094B (en) * | 2023-10-30 | 2024-05-14 | 上海计算机软件技术开发中心 | Method, system and electronic equipment for generating contrast sample by image classification system |
Also Published As
Publication number | Publication date |
---|---|
CN113255816B (en) | 2021-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113255816B (en) | Directional attack countermeasure patch generation method and device | |
CN109948658B (en) | Feature diagram attention mechanism-oriented anti-attack defense method and application | |
Huang et al. | Adversarial attacks on neural network policies | |
Carlini et al. | Towards evaluating the robustness of neural networks | |
Wang et al. | Fca: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack | |
CN111475797B (en) | Method, device and equipment for generating countermeasure image and readable storage medium | |
CN111753881B (en) | Concept sensitivity-based quantitative recognition defending method against attacks | |
CN111737691B (en) | Method and device for generating confrontation sample | |
CN112215251A (en) | System and method for defending against attacks using feature dispersion based countermeasure training | |
CN111340214A (en) | Method and device for training anti-attack model | |
CN112396129A (en) | Countermeasure sample detection method and general countermeasure attack defense system | |
CN110941794A (en) | Anti-attack defense method based on universal inverse disturbance defense matrix | |
CN111027628B (en) | Model determination method and system | |
CN111754519B (en) | Class activation mapping-based countermeasure method | |
Gragnaniello et al. | Perceptual quality-preserving black-box attack against deep learning image classifiers | |
CN113066002A (en) | Generation method of countermeasure sample, training method of neural network, training device of neural network and equipment | |
CN111178504B (en) | Information processing method and system of robust compression model based on deep neural network | |
CN113935396A (en) | Manifold theory-based method and related device for resisting sample attack | |
CN113435264A (en) | Face recognition attack resisting method and device based on black box substitution model searching | |
Khan et al. | A hybrid defense method against adversarial attacks on traffic sign classifiers in autonomous vehicles | |
CN115481716A (en) | Physical world counter attack method based on deep network foreground activation feature transfer | |
Guesmi et al. | Advart: Adversarial art for camouflaged object detection attacks | |
Zhao et al. | Resilience of pruned neural network against poisoning attack | |
CN113240080A (en) | Prior class enhancement based confrontation training method | |
CN117011508A (en) | Countermeasure training method based on visual transformation and feature robustness |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |