CN114693973A - Black box confrontation sample generation method based on Transformer model - Google Patents

Black box confrontation sample generation method based on Transformer model

Info

Publication number
CN114693973A
Authority
CN
China
Prior art keywords
coding block
model
original image
self
sample
Prior art date
Legal status
Pending
Application number
CN202210332993.7A
Other languages
Chinese (zh)
Inventor
刘琚
韩艳阳
刘晓玺
顾凌晨
江潇
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University
Priority to CN202210332993.7A
Publication of CN114693973A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/047: Probabilistic or stochastic networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a black-box adversarial example generation method based on the Transformer model, belonging to the technical field of artificial intelligence. The method considers the influence of different encoder blocks on adversarial example performance and classifies the encoder blocks by an encoder-block weight score. Different strategies are applied to different encoder blocks to generate adversarial examples, balancing the influence of image information and model information on the perturbation, stabilizing the update direction of the perturbation, and improving the attack success rate and transferability of the adversarial examples. Finally, an adaptive weight is designed to adjust the perturbation magnitude at key pixels, so that the attack capability of the adversarial perturbation is improved while the perturbation remains imperceptible to the human eye. The invention significantly improves the transferability of adversarial examples, can effectively evaluate and improve the security of artificial intelligence technology, and its effectiveness is fully demonstrated by tests on an image classification task.

Description

Black box confrontation sample generation method based on Transformer model
Technical Field
The invention relates to a black-box adversarial example generation method based on the Transformer model and belongs to the technical field of artificial intelligence.
Background
With the rapid development of artificial intelligence, neural network models play an important role in many areas of society, especially in computer vision, for example face recognition, autonomous driving, and public security. However, studies have shown that neural network models are highly vulnerable to adversarial examples with strong transferability: by adding noise to an original image on a local model, an image that is visually similar to the original can be generated that causes other neural network models to produce erroneous outputs. As more and more neural network models are deployed at scale in real-world scenarios, their security has raised concerns. Meanwhile, the lack of interpretability of neural network models makes their performance depend on the training dataset, limiting further improvement and wide application. Therefore, how to build trustworthy artificial intelligence and how to evaluate and improve the security of neural network models are problems that urgently need to be solved.
To improve the security of neural network models in applications and reduce the potential threat posed by attackers in real scenarios, on the one hand, the security of a model can be evaluated by testing it with adversarial examples; on the other hand, adding adversarial examples to the original training set can improve the stability and security of the model. Adversarial examples have therefore become an important means of evaluating and improving model performance and are a hot research topic in artificial intelligence. At present, various neural network architectures have emerged, such as convolutional neural networks, generative adversarial networks, and ViTs (Vision Transformers). Because of their wider receptive field, ViTs integrate global information better and, especially when trained on large-scale datasets, achieve excellent performance in many computer vision tasks, including object detection and image classification. Most existing adversarial attack techniques were designed for convolutional neural networks: they rely excessively on convolutional network model information, have difficulty generating adversarial examples that can attack multiple different types of models simultaneously, suffer from low attack success rates and weak transferability on ViTs models, and therefore cannot be used to evaluate and improve the security of ViTs. Studying adversarial example generation methods for ViTs is therefore important.
Disclosure of Invention
To address the low attack success rate and weak transferability on ViTs models in the prior art, the invention provides a black-box adversarial example generation method based on the Transformer model. The method has strong attack performance and reduces the dependence on model information, thereby improving the transferability of the adversarial examples. In the black-box setting, it can simultaneously attack various convolutional neural networks and ViTs models. During model training, fusing the adversarial examples generated by this method with the original training set can effectively improve the overall performance of the model. The invention also provides a more effective way to evaluate the security of neural network models in application scenarios.
The technical solution adopted by the invention is as follows:
A black-box adversarial example generation method based on the Transformer model. For the different attributes of the encoder blocks in a Transformer, the method adopts two strategies to jointly generate adversarial examples: a cross-entropy loss strategy for the robust encoder blocks and a self-attention feature loss strategy for the non-robust encoder blocks. This avoids redundant model information and improves the transferability of the adversarial examples. The method comprises the following steps:
Step 1: acquire original images and their corresponding labels to form an original dataset, and initialize an adversarial perturbation of the same size as the original image;
Step 2: generate several noise images of the same size as the original image to form a negative sample set;
Step 3: divide the encoder blocks of a Vision Transformer (ViT) model into robust encoder blocks and non-robust encoder blocks according to the original image and the negative sample set; the ViT model is a pretrained model containing several encoder blocks, and each encoder block can extract classification information for computing a target loss value;
Step 4: linearly superimpose the adversarial perturbation on the original image, input the result into the ViT model, iteratively update the perturbation until a stopping condition is met, output the final adversarial perturbation, and linearly superimpose it on the original image to obtain the corresponding adversarial example.
In particular, forming the negative sample set in step 2 specifically comprises:
determining the number of images contained in the negative sample set corresponding to the original image as M;
obtaining the m-th image x'_m of the negative sample set corresponding to the original image according to the following formula:
x'_m = n_m ⊙ x
where n_m is the m-th noise mask; p is the probability that a pixel of the noise mask is 0 and 1 − p the probability that it is 1; x is the original image; ⊙ denotes the element-wise (Hadamard) product.
In particular, dividing the encoder blocks of the ViT model into robust encoder blocks and non-robust encoder blocks in step 3 specifically comprises:
determining, from the original image and the ViT model, the target loss value of the model with respect to the original image and the self-attention feature map of each encoder block with respect to the original image, called the first loss value and the first self-attention feature map respectively;
obtaining a first encoder-block weight score for each encoder block from the first loss value and the first self-attention feature map;
determining, from each image in the negative sample set and the ViT model, the target loss value of the model with respect to the negative sample and the self-attention feature map of each encoder block with respect to the negative sample, called the second loss value and the second self-attention feature map respectively;
determining a second encoder-block weight score for each encoder block from the second loss value and the second self-attention feature map;
obtaining an average second encoder-block weight score for each encoder block from the second encoder-block weight scores over all images in the negative sample set;
determining the interference resistance of each encoder block from the first encoder-block weight score and the average second encoder-block weight score, dividing the K encoder blocks with the strongest interference resistance into robust encoder blocks, and dividing the remaining encoder blocks into non-robust encoder blocks.
In particular, the first encoder-block weight score of each encoder block is obtained according to the following formula:
w_i(x) = [formula shown as an image in the original document], i = 1, …, B
where B is the number of encoder blocks in the ViT model; i is the index of the encoder block; J(x, y) is the loss function of the ViT model; SAM_i(x) is the self-attention feature map extracted by the i-th encoder block; ∇_x denotes the gradient with respect to x.
In particular, the average second encoder-block weight score of each encoder block is obtained according to the following formula:
w̄_i(x) = (1/M) Σ_{m=1}^{M} w_i(x'_m)
where M is the number of images contained in the negative sample set and w_i(x'_m) is the second encoder-block weight score of the i-th encoder block for the m-th negative sample.
In particular, determining the robust and non-robust encoder blocks described in step 3 specifically comprises:
obtaining the interference resistance of each encoder block according to the following formula:
d_i = | w_i(x) − w̄_i(x) |
where | · | denotes the absolute value, w_i(x) is the first encoder-block weight score of the i-th encoder block, and w̄_i(x) is the average second encoder-block weight score of the i-th encoder block;
obtaining the positions of the encoder blocks ordered from strong to weak interference resistance according to the following formula:
S = [s_1, s_2, …, s_B]
The encoder blocks at the first K positions are the robust encoder blocks, and the rest are the non-robust encoder blocks.
In particular, iteratively updating the adversarial perturbation in step 4 specifically comprises:
linearly superimposing the adversarial perturbation on the original image to obtain a temporary adversarial example and inputting it into the ViT model;
computing a loss between the adversarial example and the original image based on the self-attention feature maps extracted by the non-robust encoder blocks;
computing a cross-entropy loss between the adversarial example and the original image based on the robust encoder blocks;
computing a pixel-level loss between the adversarial example and the original image;
updating the adversarial example according to the total loss function.
The total loss function is as follows:
L_total = [formula shown as an image in the original document], combining the cross-entropy term of the robust encoder blocks, the self-attention feature term of the non-robust encoder blocks and the pixel-level term
where J_{s_k}(x*, y) is the cross-entropy loss function of the s_k-th encoder block; x* is the adversarial example and y is the true label corresponding to the original image; ‖ · ‖_2 denotes the Euclidean distance; SAM_{s_k}(x*) is the self-attention feature map extracted by the s_k-th encoder block; λ and μ are constants that balance the loss terms.
In particular, the adversarial example is updated in step 4 according to the following formula:
x*_{t+1} = x*_t + α · AW(x) ⊙ sign( ∇_{x*} L_total )
where x*_t is the adversarial example obtained at the t-th iteration; α is the step size of a single perturbation update; sign(·) is the sign function used to determine the direction of the perturbation; ∇_{x*} L_total is the gradient of the loss; AW(x) is the adaptive weight; ReLU(·) is the linear rectification function used in computing the adaptive weight.
It can be seen from the above technical solution that the black-box adversarial example generation method based on the Transformer model can effectively attack various unknown neural network models. The attributes of the different encoder blocks in the ViT are fully considered: the blocks are divided into robust and non-robust encoder blocks according to their weight scores, and two attack strategies are combined to generate adversarial examples. Balancing the influence of the image and of local model information suppresses redundant model information and image self-attention features that are highly model-specific, which significantly improves the transferability of the adversarial examples. The update direction of the adversarial perturbation is stabilized by constraining the pixel-level and self-attention-feature-level differences between the adversarial example and the original image. Furthermore, an adaptive weight is designed to adjust the perturbation magnitude of different pixels and highlight key pixel information, so that the adversarial example has a stronger attack capability without degrading the visual quality.
In summary, the adversarial example generation method provided by the invention effectively solves the problems of low attack success rate and weak transferability on ViTs models, also has strong attack performance against convolutional neural network models, is better suited to security testing of neural network models in real scenarios, and can be used to train more stable and secure neural network models.
Drawings
FIG. 1 is a flow chart of the adversarial example generation method according to an embodiment of the present invention;
FIG. 2 is a flow chart of encoder-block selection provided by an embodiment of the present invention;
FIG. 3 is a block diagram of encoder-block selection provided by an embodiment of the present invention;
FIG. 4 is a block diagram of adversarial example generation provided by an embodiment of the present invention.
Detailed Description
The invention provides a black-box adversarial example generation method based on the Transformer model. To address the low attack success rate and weak transferability on ViTs models in the prior art, the attributes of the different encoder blocks in the ViT are fully considered, and two strategies are applied to the two kinds of encoder blocks to jointly generate adversarial examples, reducing the dependence on model information and improving the transferability of the adversarial examples. To further improve the attack capability, an adaptive weight adjusts the perturbation magnitude at key pixels of the image; destroying these key pixels significantly improves the attack performance of the adversarial examples. Fig. 1 shows the flow chart of the method; the specific implementation steps are as follows:
(1) Acquire the original images and their corresponding labels to form an original dataset, and initialize an adversarial perturbation of the same size as the original image. In this embodiment, 1 image is randomly selected from each class of the ILSVRC 2012 validation set, for a total of 1000 original images.
(2) Generate several noise images of the same size as the original image to form a negative sample set. In this embodiment the negative sample set contains M images; specifically, the m-th image of the negative sample set is obtained according to the following formula:
x'_m = n_m ⊙ x
where n_m is the m-th noise mask; p is the probability that a pixel of the noise mask is 0 and 1 − p the probability that it is 1; x is the original image; ⊙ denotes the element-wise (Hadamard) product.
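The following minimal sketch illustrates this negative-sample construction with PyTorch; the tensor layout (an image in [0, 1] of shape C×H×W), the function name and the default p = 0.5 are assumptions of the sketch, not part of the published method.

    import torch

    def generate_negative_samples(x, m, p=0.5):
        """Build the negative sample set x'_m = n_m ⊙ x, m = 1..M, where every entry
        of the Bernoulli mask n_m is 0 with probability p and 1 with probability 1 - p."""
        masks = torch.bernoulli(torch.full((m, *x.shape), 1.0 - p))  # entries are 1 w.p. 1-p, 0 w.p. p
        return masks * x.unsqueeze(0)            # element-wise product with the original image

    # Example: 20 masked copies of one 3x224x224 image in [0, 1]
    x = torch.rand(3, 224, 224)
    negatives = generate_negative_samples(x, m=20, p=0.5)   # shape (20, 3, 224, 224)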
(3) As shown in Fig. 2, the encoder blocks are divided according to their encoder-block weight scores, using the original image and the ViT model. The ViT model is a pretrained model that correctly predicts the original image; it contains several encoder blocks, and each encoder block can extract classification information for computing the target loss value. To obtain the first encoder-block weight score of each encoder block (the encoder-block selection diagram is shown in Fig. 3), the original image is first transformed into image patches matching the ViT input, the patches are then fed into the ViT model to obtain the first loss value of the model, and a first self-attention feature map is obtained from each encoder block according to the following formula:
SAM_i(x) = softmax( Q_i K_i^T / √d_q ) · V_i
where softmax(·) is the neural network activation function; Q_i, K_i and V_i are the self-attention (query, key and value) matrices of the i-th encoder block of the ViT model; d_q is the dimension of the learnable matrices in the ViT model; (·)^T denotes transposition.
(4) The first encoder-block weight score of the i-th encoder block is obtained from the first loss value and the first self-attention feature map of each encoder block; the specific calculation formula is:
w_i(x) = [formula shown as an image in the original document], i = 1, …, B
where B is the number of encoder blocks in the ViT model; i is the index of the encoder block; J(x, y) is the loss function of the ViT model; SAM_i(x) is the self-attention feature map extracted by the i-th encoder block; ∇_x denotes the gradient with respect to x; W(x) = [w_1(x), w_2(x), …, w_B(x)] are the first encoder-block weight scores of the model.
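Because the weight-score formula itself appears only as an image in the source, the sketch below adopts one plausible reading as an explicit assumption: a Grad-CAM-style score that weights each block's self-attention map by the gradient of the classification loss with respect to that map and sums the result to one scalar per block.

    import torch
    import torch.nn.functional as F

    def block_weight_scores(logits, y, sams):
        """First encoder-block weight scores w_1..w_B (assumed Grad-CAM-style form).
        `logits` and `sams` (the B self-attention maps SAM_i(x)) must come from the
        same forward pass of the ViT so that they share one autograd graph."""
        loss = F.cross_entropy(logits, y)                           # J(x, y)
        grads = torch.autograd.grad(loss, sams, retain_graph=True)  # dJ/dSAM_i for each block
        # Assumed aggregation: gradient-weighted self-attention map, summed to a per-block scalar.
        return torch.stack([(g * s).sum() for g, s in zip(grads, sams)])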
(5) From the negative sample set and the ViT model, determine the second loss value of the target loss function and the second self-attention feature map of each encoder block, then compute the second encoder-block weight score of each encoder block, and finally obtain the average second encoder-block weight score of each encoder block from the second encoder-block weight scores over all images in the negative sample set. The specific formula is:
w̄_i(x) = (1/M) Σ_{m=1}^{M} w_i(x'_m)
where M is the number of images contained in the negative sample set, and W̄(x) = [w̄_1(x), w̄_2(x), …, w̄_B(x)] are the average second encoder-block weight scores of the model.
(6) Determine the interference resistance of each encoder block from the first encoder-block weight score and the average second encoder-block weight score; the specific calculation formula is:
d_i = | w_i(x) − w̄_i(x) |
where | · | denotes the absolute value, w_i(x) is the first encoder-block weight score of the i-th encoder block, and w̄_i(x) is its average second encoder-block weight score; D(x) = [d_1, d_2, …, d_B] represents the interference resistance of all encoder blocks of the model. On the one hand, the larger d_i is, the larger the difference between the first encoder-block weight score and the average second encoder-block weight score of the i-th block, i.e. the block is easily disturbed by noise, has weaker interference resistance and poorer robustness. On the other hand, the weaker the interference resistance of an encoder block, the more sensitive it is to noise, which makes it better at perceiving the difference between two images.
(7) Sort the encoder blocks from strong to weak interference resistance and record the positions of the corresponding blocks; the first K blocks with the strongest interference resistance are the robust encoder blocks and the remaining blocks are the non-robust encoder blocks. The specific formula is:
S = [s_1, s_2, …, s_B]
The encoder blocks at the first K positions are the robust encoder blocks, and the rest are the non-robust encoder blocks.
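A sketch of steps (5) to (7) under the reconstruction above: the average second weight score is the mean over the M negative samples, the interference resistance is taken as d_i = |w_i(x) − w̄_i(x)|, and the blocks are sorted from strong to weak resistance (small d_i first); the scalar-per-block representation and the helper name are assumptions.

    import torch

    def partition_blocks(w_clean, w_negatives, k):
        """w_clean: tensor of shape (B,), first weight scores on the original image.
        w_negatives: tensor of shape (M, B), second weight scores, one row per negative sample.
        Returns (robust_idx, nonrobust_idx): the K blocks with the strongest
        interference resistance (smallest d_i) form the robust set."""
        w_bar = w_negatives.mean(dim=0)          # average second weight score per block
        d = (w_clean - w_bar).abs()              # d_i: sensitivity of block i to noise
        order = torch.argsort(d)                 # strong resistance (small d_i) listed first
        return order[:k].tolist(), order[k:].tolist()

    # Example with assumed sizes: B = 12 encoder blocks, M = 20 negative samples, K = 4
    robust_idx, nonrobust_idx = partition_blocks(torch.rand(12), torch.rand(20, 12), k=4)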
(8) As shown in Fig. 4, the adversarial perturbation is linearly superimposed on the original image and fed into the ViT model. By dividing the encoder blocks, information is extracted from them purposefully, which effectively reduces redundant model information and image self-attention features that are highly model-specific, improving the transferability of the adversarial examples. Concretely, the invention combines two strategies to generate the adversarial example and balances the image itself against the local model information, as follows:
A. Use a cross-entropy loss strategy for the robust encoder blocks.
Since a robust encoder block is less susceptible to noise interference, it tends to extract the image information that is critical to the model's prediction. From the image information extracted by the robust encoder blocks, the influence of the adversarial example on the model prediction can be measured accurately and the update direction of the adversarial perturbation determined. In the ViT model, the image information extracted by each encoder block can produce a final prediction through the multi-head linear classifier, so image information can be extracted from the robust encoder blocks to obtain a multi-block cross-entropy loss.
B. Use a self-attention feature loss strategy for the non-robust encoder blocks.
A non-robust encoder block has the advantage of being sensitive to noise and can effectively measure the difference between the adversarial example and the original image, thereby stabilizing the update direction of the perturbation, encouraging larger perturbations and improving the attack performance of the adversarial example. The self-attention feature map loss between the adversarial example and the original image is determined from the self-attention feature maps extracted by the non-robust encoder blocks.
The final loss function integrates the cross-entropy loss strategy and the self-attention feature map loss strategy; in addition, a pixel-level loss is used to improve the attack capability of the adversarial example. The total loss function is:
L_total = [formula shown as an image in the original document], combining the cross-entropy term of the robust encoder blocks, the self-attention feature term of the non-robust encoder blocks and the pixel-level term
where J_{s_k}(x*, y) is the cross-entropy loss function of the s_k-th encoder block; x* is the adversarial example and y is the true label corresponding to the original image; ‖ · ‖_2 denotes the Euclidean distance; SAM_{s_k}(x*) is the self-attention feature map extracted by the s_k-th encoder block; λ and μ are constants that balance the loss terms.
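Since the exact combination is shown only as an image in the source, the sketch below assumes a weighted sum of the three terms named above: cross-entropy on the robust blocks, plus λ times the self-attention feature distance on the non-robust blocks, plus μ times the pixel-level distance; the container types and index lists are illustrative.

    import torch
    import torch.nn.functional as F

    def total_loss(block_logits, y, sams_adv, sams_orig, x_adv, x_orig,
                   robust_idx, nonrobust_idx, lam=1.0, mu=1.0):
        """Assumed combination of the three loss terms.
        block_logits[i]: classifier output read from encoder block i for x_adv, shape (N, num_classes).
        sams_adv[i] / sams_orig[i]: self-attention maps of block i for x_adv / the original image."""
        ce = sum(F.cross_entropy(block_logits[i], y) for i in robust_idx)               # robust blocks
        feat = sum(torch.norm(sams_adv[i] - sams_orig[i], p=2) for i in nonrobust_idx)  # non-robust blocks
        pix = torch.norm(x_adv - x_orig, p=2)                                           # pixel-level term
        return ce + lam * feat + mu * pix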
(9) According to the average second encoder-block weight score obtained from the most robust encoder block, the magnitude of the temporary adversarial perturbation is adjusted through an adaptive weight; enlarging the perturbation step at key pixels improves the attack capability of the adversarial example. Finally, the perturbation is linearly superimposed on the original image to obtain the adversarial example for the next iteration. The specific formula is:
x*_{t+1} = x*_t + α · AW(x) ⊙ sign( ∇_{x*} L_total )
where x*_t is the adversarial example obtained at the t-th iteration; sign(·) is the sign function, taking the value +1 or −1, which determines the update direction of the perturbation; α is the step size of a single perturbation update; AW(x) is the adaptive weight matrix, computed with the linear rectification function ReLU(x) = max(0, x), so that the adaptive weight satisfies 1 ≤ AW(x) ≤ 2.
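A sketch of this update step under the reconstruction above: a sign-gradient step scaled element-wise by the adaptive weight AW(x) = 1 + ReLU(·), followed by clipping; the exact argument of the ReLU, the ε-ball projection and the valid-pixel clipping are assumptions of the sketch.

    import torch

    def update_adversarial(x_adv, x_orig, grad, key_map, step_size=2/255, epsilon=16/255):
        """One iteration of the assumed update rule
            x*_{t+1} = clip( x*_t + step_size * AW(x) * sign(grad) )
        grad: gradient of the total loss with respect to x_adv.
        key_map: assumed per-pixel importance map in [0, 1], derived from the average
        second weight score of the most robust encoder block."""
        aw = 1.0 + torch.relu(key_map)                      # adaptive weight, 1 <= AW(x) <= 2
        x_new = x_adv + step_size * aw * torch.sign(grad)   # enlarge the step on key pixels
        x_new = torch.clamp(x_new, x_orig - epsilon, x_orig + epsilon)  # assumed epsilon-ball constraint
        return torch.clamp(x_new, 0.0, 1.0)                 # keep a valid image in [0, 1]

    # Example with assumed shapes
    x = torch.rand(1, 3, 224, 224)
    x_adv = update_adversarial(x.clone(), x, torch.randn_like(x), torch.rand_like(x))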
The method is evaluated on an image classification task. One image is randomly selected from each class of the ILSVRC 2012 validation set, giving 1000 original images in total; the adversarial examples are generated on a DeiT-T model, tested on 5 ViTs models and 5 convolutional neural network models, and finally compared with 5 adversarial example generation methods. The experiments measure performance by the attack success rate: the higher the attack success rate, the better the attack performance and transferability of the adversarial examples. The attack performance of the adversarial examples on the 5 ViTs models is shown in Table 1, and on the 5 convolutional neural networks in Table 2. The method achieves better attack and transfer performance on both convolutional neural networks and ViTs models, and can therefore better evaluate and improve the security of neural network models in application scenarios.
TABLE 1: attack success rates of the adversarial examples on the 5 ViTs models (reported as an image in the original document).
TABLE 2: attack success rates of the adversarial examples on the 5 convolutional neural network models (reported as an image in the original document).

Claims (8)

1. A black-box adversarial example generation method based on the Transformer model, wherein, for the different attributes of the encoder blocks in a Transformer, the method adopts two strategies to jointly generate adversarial examples, using a cross-entropy loss strategy for the robust encoder blocks and a self-attention feature loss strategy for the non-robust encoder blocks, so as to avoid redundant model information and improve the transferability of the adversarial examples, the method being characterized by comprising the following steps:
Step 1: acquiring an original image and its corresponding label to form an original dataset, and initializing an adversarial perturbation of the same size as the original image;
Step 2: generating several noise images of the same size as the original image to form a negative sample set;
Step 3: dividing the encoder blocks of a Vision Transformer (ViT) model into robust encoder blocks and non-robust encoder blocks according to the original image and the negative sample set, wherein the ViT model is a pretrained model comprising several encoder blocks, and each encoder block can extract classification information for computing a target loss value;
Step 4: linearly superimposing the adversarial perturbation on the original image, inputting the result into the ViT model, iteratively updating the adversarial perturbation until a stopping condition is met, outputting the final adversarial perturbation, and linearly superimposing it on the original image to obtain the corresponding adversarial example.
2. The Transformer-model-based black-box adversarial example generation method according to claim 1, wherein forming the negative sample set in step 2 specifically comprises:
determining the number of images contained in the negative sample set corresponding to the original image as M;
obtaining the m-th image of the negative sample set corresponding to the original image according to the following formula:
x'_m = n_m ⊙ x
where n_m is the m-th noise image; p is the probability that a pixel of the noise image is 0 and 1 − p the probability that it is 1; x is the original image; ⊙ denotes the element-wise (Hadamard) product.
3. The Transformer-model-based black-box adversarial example generation method according to claim 1, wherein dividing the encoder blocks of the ViT model into robust encoder blocks and non-robust encoder blocks in step 3 specifically comprises:
determining, from the original image and the ViT model, the target loss value of the model with respect to the original image and the self-attention feature map of each encoder block with respect to the original image, called the first loss value and the first self-attention feature map respectively; obtaining a first encoder-block weight score for each encoder block from the first loss value and the first self-attention feature map;
determining, from each image in the negative sample set and the ViT model, the target loss value of the model with respect to the negative sample and the self-attention feature map of each encoder block with respect to the negative sample, called the second loss value and the second self-attention feature map respectively;
determining a second encoder-block weight score for each encoder block from the second loss value and the second self-attention feature map;
obtaining an average second encoder-block weight score for each encoder block from the second encoder-block weight scores over all images in the negative sample set;
determining the interference resistance of each encoder block from the first encoder-block weight score and the average second encoder-block weight score, dividing the K encoder blocks with the strongest interference resistance into robust encoder blocks, and dividing the remaining encoder blocks into non-robust encoder blocks.
4. The Transformer-model-based black-box adversarial example generation method according to claim 3, wherein the first encoder-block weight score of each encoder block is obtained according to the following formula:
w_i(x) = [formula shown as an image in the original document], i = 1, …, B
where B is the number of encoder blocks in the ViT model; i is the index of the encoder block; J(x, y) is the loss function of the ViT model; SAM_i(x) is the self-attention feature map extracted by the i-th encoder block; ∇_x denotes the gradient with respect to x.
5. The Transformer-model-based black-box adversarial example generation method according to claim 3, wherein the average second encoder-block weight score of each encoder block is obtained according to the following formula:
w̄_i(x) = (1/M) Σ_{m=1}^{M} w_i(x'_m)
where M is the number of images contained in the negative sample set.
6. The Transformer-model-based black-box adversarial example generation method according to claim 3, wherein determining the robust encoder blocks and the non-robust encoder blocks specifically comprises:
obtaining the interference resistance of each encoder block according to the following formula:
d_i = | w_i(x) − w̄_i(x) |
where | · | denotes the absolute value, w_i(x) is the first encoder-block weight score of the i-th encoder block, and w̄_i(x) is the average second encoder-block weight score of the i-th encoder block;
obtaining the positions of the encoder blocks ordered from strong to weak interference resistance according to the following formula:
S = [s_1, s_2, …, s_B]
wherein the encoder blocks at the first K positions are the robust encoder blocks, and the rest are the non-robust encoder blocks.
7. The Transformer-model-based black-box adversarial example generation method according to claim 1, wherein iteratively updating the adversarial perturbation in step 4 specifically comprises:
linearly superimposing the adversarial perturbation on the original image to obtain a temporary adversarial example and inputting it into the ViT model;
computing a loss between the adversarial example and the original image based on the self-attention feature maps extracted by the non-robust encoder blocks;
computing a cross-entropy loss between the adversarial example and the original image based on the robust encoder blocks;
computing a pixel-level loss between the adversarial example and the original image;
updating the adversarial example according to the total loss function;
the total loss function is as follows:
L_total = [formula shown as an image in the original document], combining the cross-entropy term of the robust encoder blocks, the self-attention feature term of the non-robust encoder blocks and the pixel-level term
where J_{s_k}(x*, y) is the cross-entropy loss function of the s_k-th encoder block; x* is the adversarial example and y is the true label corresponding to the original image; ‖ · ‖_2 denotes the Euclidean distance; SAM_{s_k}(x*) is the self-attention feature map extracted by the s_k-th encoder block; λ and μ are constants that balance the loss terms.
8. The Transformer-model-based black-box adversarial example generation method according to claim 7, wherein the adversarial example is updated according to the following formula:
x*_{t+1} = x*_t + α · AW(x) ⊙ sign( ∇_{x*} L_total )
where x*_t is the adversarial example obtained at the t-th iteration; α is the step size of a single perturbation update; sign(·) is the sign function used to determine the direction of the perturbation; ∇_{x*} L_total is the gradient of the loss; AW(x) is the adaptive weight, computed with the linear rectification function ReLU(·).
CN202210332993.7A 2022-03-31 2022-03-31 Black box confrontation sample generation method based on Transformer model Pending CN114693973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210332993.7A CN114693973A (en) 2022-03-31 2022-03-31 Black box confrontation sample generation method based on Transformer model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210332993.7A CN114693973A (en) 2022-03-31 2022-03-31 Black box confrontation sample generation method based on Transformer model

Publications (1)

Publication Number Publication Date
CN114693973A (en) 2022-07-01

Family

ID=82140259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210332993.7A Pending CN114693973A (en) 2022-03-31 2022-03-31 Black box confrontation sample generation method based on Transformer model

Country Status (1)

Country Link
CN (1) CN114693973A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943641A (en) * 2022-07-26 2022-08-26 北京航空航天大学 Method and device for generating anti-texture image based on model sharing structure
CN114943641B (en) * 2022-07-26 2022-10-28 北京航空航天大学 Method and device for generating confrontation texture image based on model sharing structure

Similar Documents

Publication Publication Date Title
CN109948658B (en) Feature diagram attention mechanism-oriented anti-attack defense method and application
CN108491837B (en) Anti-attack method for improving license plate attack robustness
CN111368886B (en) Sample screening-based label-free vehicle picture classification method
CN113554089B (en) Image classification countermeasure sample defense method and system and data processing terminal
CN111401407B (en) Countermeasure sample defense method based on feature remapping and application
CN113674140B (en) Physical countermeasure sample generation method and system
CN110941794A (en) Anti-attack defense method based on universal inverse disturbance defense matrix
CN113283599B (en) Attack resistance defense method based on neuron activation rate
CN111754519B (en) Class activation mapping-based countermeasure method
CN112396129A (en) Countermeasure sample detection method and general countermeasure attack defense system
CN113254927B (en) Model processing method and device based on network defense and storage medium
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN113435264A (en) Face recognition attack resisting method and device based on black box substitution model searching
Wu et al. Defense against adversarial attacks in traffic sign images identification based on 5G
CN114693973A (en) Black box confrontation sample generation method based on Transformer model
Ahmad et al. A novel image tamper detection approach by blending forensic tools and optimized CNN: Sealion customized firefly algorithm
CN113034332B (en) Invisible watermark image and back door attack model construction and classification method and system
CN113936140A (en) Evaluation method of sample attack resisting model based on incremental learning
CN117152486A (en) Image countermeasure sample detection method based on interpretability
CN111950635A (en) Robust feature learning method based on hierarchical feature alignment
CN116232699A (en) Training method of fine-grained network intrusion detection model and network intrusion detection method
CN114863132A (en) Method, system, equipment and storage medium for modeling and capturing image spatial domain information
CN114638356A (en) Static weight guided deep neural network back door detection method and system
CN113487506A (en) Countermeasure sample defense method, device and system based on attention denoising
CN113344814A (en) High-resolution countermeasure sample synthesis method based on generation mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination