CN117079053A - Artificial intelligent image recognition attack resistance method and system based on gradient average - Google Patents

Publication number
CN117079053A
Authority
CN
China
Legal status
Pending
Application number
CN202311115161.0A
Other languages
Chinese (zh)
Inventor
张恒巍
杨博
尹衡
李晨蔚
耿致远
王晋东
蔡国明
徐开勇
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202311115161.0A
Publication of CN117079053A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image recognition, and in particular to a gradient-averaging-based adversarial attack method and system for artificial intelligence image recognition. In each round of iterative optimization, the original input sample undergoes multiple image transformations, producing several transformed image samples. A pre-trained model is run on each transformed sample. The target loss function then gives the cross entropy between each model output and the true label of the corresponding input in the original sample set, and this cross-entropy loss yields a gradient for each transformed sample. These gradients are averaged, and the averaged gradient is used to compute the adversarial perturbation. The perturbation is added to the input sample for the next round, so that iterative optimization produces an adversarial example for attacking the model. Averaging the gradients reduces the influence of excessive randomness, so the method generates adversarial examples with higher transferability.

Description

Artificial intelligent image recognition attack resistance method and system based on gradient average
Technical Field
The invention relates to the technical field of image recognition, and in particular to a gradient-averaging-based adversarial attack method and system for artificial intelligence image recognition.
Background
Deep neural networks (DNNs) are widely used in real-world applications such as autonomous driving, object detection and face verification. However, DNNs are vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to benign examples. Adversarial examples can mislead a model into wrong predictions and pose a serious threat to the security of DNNs and their applications. Generating adversarial examples, also called adversarial attack, has attracted considerable attention because it deepens our understanding of how DNN models work and helps evaluate and improve the robustness of different models. Adversarial examples generally transfer between deep learning models: an adversarial example generated on a surrogate model may also mislead a victim model. By exploiting this transferability, an attacker can attack real-world AI applications without any knowledge of the target model, which makes transfer-based attacks more practical in many real-world scenarios. Transfer-based attacks have therefore become a very important class of black-box attack.
However, as adversarial defenses evolve, the transferability achieved by existing methods is substantially weakened. The gap between the attack strength of an adversarial example in the white-box and black-box settings resembles the gap between a neural network's performance on its training and test sets; that is, the transferability of adversarial examples is naturally analogous to the generalization of neural networks. Accordingly, existing schemes either design advanced optimization algorithms to avoid poor local optima or use model augmentation to strengthen adversarial example generation. Data augmentation strategies are also used: the original sample is randomly transformed before it is fed into the model, so that the attack sees more input patterns, which improves the transferability of the adversarial examples. The diverse input method is a typical example of this data-augmentation approach. While randomly transforming the input sample, it introduces a probability p to balance the original image against the transformed image, but the randomness introduced by p can negatively affect the transferability of the adversarial examples.
Disclosure of Invention
Therefore, the invention provides a gradient-averaging-based adversarial attack method and system for artificial intelligence image recognition. It addresses the prior-art problem that the randomness introduced by the probability p impairs the transferability of adversarial examples, and it generates more transferable adversarial examples by reducing the influence of this randomness.
According to the design provided by the invention, one aspect provides a gradient-averaging-based adversarial attack method for artificial intelligence image recognition, comprising the following steps:
acquiring a pre-trained image recognition deep neural network model and the original input samples used in model pre-training, and setting an iterative-optimization termination condition and the target loss function for iterative optimization in the attack;
in each round of iterative optimization, applying multiple image transformations to the original input sample to obtain multiple transformed image samples; obtaining the pre-trained model's outputs for the transformed samples; computing, with the target loss function, the cross entropy between the model outputs and the true labels of the inputs in the original sample set; computing the gradient with respect to each transformed sample from the cross-entropy loss; averaging these gradients; computing the adversarial perturbation from the averaged gradient; and adding the perturbation to the input sample for the next round of iterative optimization;
and, once the termination condition is met, obtaining the averaged gradient of the final round and using it to generate the adversarial example, which is then used to perform an adversarial attack test on the pre-trained image recognition deep neural network model.
As a further aspect of the method, the iterative-optimization termination condition is set to a maximum number of iterations or an iteration time threshold.
As a further aspect of the method, setting the target loss function for iterative optimization in the adversarial attack comprises:
first, constructing a model cross-entropy loss function from the original input sample, its true label, and the parameters of the pre-trained image recognition deep neural network model;
then, casting adversarial example generation as a constrained optimization problem over the model cross-entropy loss. The constraints are as follows: maximizing the cross-entropy loss must produce an adversarial example that is visually indistinguishable from the original input sample, and this adversarial example must mislead the pre-trained model into outputting a wrong class that differs from the true label.
As a further aspect of the method, the target loss function for iterative optimization is expressed as:
$$\arg\max_{x^{adv}} \frac{1}{m}\sum_{i=1}^{m} J\big(\theta, T_i(x^{adv}; p), y\big), \quad \text{s.t. } \|x^{adv} - x\|_\infty \le \epsilon$$
where x and y are the original input sample and its true label, θ denotes the parameters of the pre-trained image recognition deep neural network model, J(θ, x, y) is the model cross-entropy loss, x^adv is the adversarial example, ε bounds the perturbation, m is the number of randomly transformed images sampled in each iteration, and T_i(x^adv; p) is the random transformation applied to x^adv with a given probability p at the i-th sample of the iteration.
As a further aspect of the method, the method further comprises:
setting the number of image transformations per iteration according to the number of sampled randomly transformed images. For that number of transformations, the gradients of the multiple transformed images sampled under the same transformation conditions are averaged, and the averaged gradient is used to compute the adversarial perturbation.
As a further aspect of the method, the target loss function for iterative optimization is expressed as:
$$\arg\max_{x^{adv}} \frac{1}{m}\sum_{i=1}^{m} J\big(\theta, T_i(x^{adv}; \sigma_i), y\big), \quad \text{s.t. } \|x^{adv} - x\|_\infty \le \epsilon$$
where x and y are the original input sample and its true label, θ denotes the parameters of the pre-trained image recognition deep neural network model, J(θ, x, y) is the model cross-entropy loss, x^adv is the adversarial example, ε bounds the perturbation, m is the number of randomly transformed images sampled in each iteration, and T_i(x^adv; σ_i) is the random transformation applied to x^adv at the i-th sample with transformation amplitude parameter σ_i.
As a further aspect of the method, the method further comprises:
setting the number of image transformations per iteration according to the number of sampled randomly transformed images, and spacing the amplitudes of these transformations uniformly. According to the number of transformations and their amplitudes, the gradients of the images transformed with amplitudes uniformly spaced from small to large are averaged, and the averaged gradient is used to compute the adversarial perturbation.
Further, the invention also provides a gradient-averaging-based adversarial attack system for artificial intelligence image recognition, comprising a data acquisition module, an iterative optimization module and an adversarial generation module, wherein:
the data acquisition module is configured to acquire a pre-trained image recognition deep neural network model and the original input samples used in model pre-training, and to set the iterative-optimization termination condition and the target loss function for iterative optimization in the attack;
the iterative optimization module is configured to obtain the adversarial perturbation with an iterative optimization algorithm. In each round of iterative optimization, it applies multiple image transformations to the original input sample to obtain multiple transformed image samples; obtains the pre-trained model's outputs for the transformed samples; computes, with the target loss function, the cross entropy between the model outputs and the true labels of the inputs in the original sample set; computes the gradient with respect to each transformed sample from the cross-entropy loss; averages these gradients; computes the adversarial perturbation from the averaged gradient; and adds the perturbation to the input sample for the next round of iterative optimization;
and the adversarial generation module is configured to obtain, once the termination condition is met, the averaged gradient of the final round, and to use it to generate the adversarial example, which is then used to perform an adversarial attack test on the pre-trained image recognition deep neural network model.
Beneficial effects of the invention:
From a statistical perspective, the invention approximates the expected gradient by averaging the gradients of multiple transformed images. These images are either sampled under the same conditions or transformed with amplitudes set uniformly from small to large, and the averaged gradient is used throughout iterative optimization. This reduces the influence of excessive randomness and yields more transferable adversarial examples, which can in turn help further improve model robustness. Experimental data show that the scheme substantially strengthens diverse-input transfer attacks under various forms of random transformation (translation, cropping, rotation, etc.) and outperforms the baselines by a considerable margin. It also combines seamlessly with other attack methods and generates more transferable adversarial examples than the current state-of-the-art transfer attack method, which verifies its effectiveness and superiority.
Description of the drawings:
FIG. 1 is a schematic diagram of the gradient-averaging-based adversarial attack framework for artificial intelligence image recognition in an embodiment;
FIG. 2 shows, in an embodiment, the success rates against seven models of adversarial examples generated on the Inc-v3 model with DIM-GAA and CIM-GAA, for different numbers m of sampled images;
FIG. 3 shows, in an embodiment, the success rates against seven models of adversarial examples generated on the Inc-v3 model with DIM-GAA and CIM-GAA, for different random transformation probabilities p;
FIG. 4 is a schematic diagram of random transformation outputs in an embodiment.
Detailed description of the embodiments:
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and the technical solutions.
Let x and y be the clean image and its true label, respectively, and let θ be the model parameters. J(θ, x, y) is the loss function of the neural network, typically the cross-entropy loss. The goal of an adversarial attack is to generate, by maximizing J(θ, x, y), an adversarial example x^adv that is visually indistinguishable from x yet misleads the model into a wrong classification. The infinity norm can be used to bound the adversarial perturbation, i.e., ‖x^adv − x‖_∞ ≤ ε. Adversarial example generation can therefore be cast as the following constrained optimization problem:
$$\arg\max_{x^{adv}} J(\theta, x^{adv}, y), \quad \text{s.t. } \|x^{adv} - x\|_\infty \le \epsilon \tag{1}$$
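In practice, the constraint ‖x^adv − x‖_∞ ≤ ε is usually enforced by projecting each candidate back into the allowed region. The following is a minimal sketch in pure Python. It assumes flattened images stored as lists of floats with pixel values in [0, 255], which is consistent with ε = 16 in the experiments but is not stated in the text. `project_linf` and `linf_distance` are hypothetical helper names.

```python
def project_linf(x_adv, x, eps, lo=0.0, hi=255.0):
    """Project x_adv onto the eps-ball around x intersected with [lo, hi]^n."""
    out = []
    for a, c in zip(x_adv, x):
        a = min(max(a, c - eps), c + eps)  # stay inside the eps-ball around x
        a = min(max(a, lo), hi)            # stay inside the valid pixel range
        out.append(a)
    return out


def linf_distance(a, b):
    """Infinity-norm distance between two equal-length vectors."""
    return max(abs(u - v) for u, v in zip(a, b))


x = [10.0, 100.0, 250.0]
x_adv = [40.0, 90.0, 270.0]
projected = project_linf(x_adv, x, eps=16.0)
print(projected)                    # [26.0, 90.0, 255.0]
print(linf_distance(projected, x))  # 16.0
```

Because both constraints are per-coordinate intervals, clipping coordinate-wise gives the exact projection onto their intersection, provided the clean pixel itself lies in [lo, hi].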
Solving optimization problem (1) directly is complex and computationally expensive. Drawing on the training process of neural networks, FGSM was therefore proposed to simplify adversarial example generation.
Fast Gradient Sign Method (FGSM): FGSM is one of the most basic methods for generating adversarial examples. It moves along the ascent direction of the loss gradient with respect to the input, ∇_x J(θ, x, y), and bounds the adversarial perturbation in the infinity norm:
$$x^{adv} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x J(\theta, x, y)\big) \tag{2}$$
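A minimal FGSM sketch follows. It replaces the deep network with a hypothetical toy linear softmax classifier (logits z = Wx + b). For that model, ∇_x J has the closed form Wᵀ(softmax(z) − onehot(y)). All names are illustrative, and pixel-range clipping is omitted.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce_loss_and_input_grad(W, b, x, y):
    """Cross-entropy J(theta, x, y) of a linear softmax model and its gradient w.r.t. x."""
    z = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    p = softmax(z)
    loss = -math.log(p[y])
    delta = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # dJ/dz
    grad = [sum(delta[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(W, b, x, y, eps):
    """Single step x + eps * sign(grad_x J): every coordinate moves by exactly eps."""
    _, grad = ce_loss_and_input_grad(W, b, x, y)
    return [xj + eps * sign(gj) for xj, gj in zip(x, grad)]


W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x, y = [1.0, 0.0], 0
x_adv = fgsm(W, b, x, y, eps=0.5)
print(x_adv)  # [0.5, 0.5]
```

In this example, the perturbed input sits on the decision boundary, and the loss rises from about 0.127 to log 2.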
Momentum Iterative Fast Gradient Sign Method (MI-FGSM): MI-FGSM was the first method to introduce momentum into adversarial example generation. Momentum stabilizes the update direction, improves convergence and greatly increases the attack success rate. The update can be summarized as:
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(\theta, x_t^{adv}, y)}{\|\nabla_x J(\theta, x_t^{adv}, y)\|_1} \tag{3}$$
$$x_{t+1}^{adv} = x_t^{adv} + \alpha \cdot \operatorname{sign}(g_{t+1}) \tag{4}$$
where μ is the decay factor of the momentum term, g_t is the weighted accumulation of the gradients from the first t iterations, and α is the step size.
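The MI-FGSM update can be sketched with the same hypothetical toy linear softmax classifier. The step size α = ε/T matches the later experimental setting (α = 1.6 for ε = 16, T = 10). The final clip is an illustrative safeguard; it changes nothing when α = ε/T.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce_loss_and_input_grad(W, b, x, y):
    """Cross-entropy of a toy linear softmax model and its gradient w.r.t. the input."""
    z = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    p = softmax(z)
    delta = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    grad = [sum(delta[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return -math.log(p[y]), grad


def sign(v):
    return (v > 0) - (v < 0)


def mi_fgsm(W, b, x, y, eps, num_iters, mu=1.0):
    alpha = eps / num_iters          # num_iters steps of size alpha span exactly eps
    g = [0.0] * len(x)               # accumulated momentum g_t
    x_adv = list(x)
    for _ in range(num_iters):
        _, grad = ce_loss_and_input_grad(W, b, x_adv, y)
        l1 = sum(abs(v) for v in grad) or 1e-12  # L1 normalisation, guarded against zero
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Redundant when alpha = eps / num_iters, but keeps the eps-ball invariant explicit.
        x_adv = [min(max(xa, xc - eps), xc + eps) for xa, xc in zip(x_adv, x)]
    return x_adv


W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x, y = [1.0, 0.0], 0
x_adv = mi_fgsm(W, b, x, y, eps=0.5, num_iters=5)
```

Normalizing by the L1 norm makes each gradient contribute on a comparable scale, so μ alone controls how much of the accumulated direction is kept.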
Diverse Input Method (DIM): at each iteration, DIM randomly transforms the original input with a given probability to mitigate overfitting during adversarial example generation. The transformation consists of random resizing and random padding. The method combines naturally with other transfer attack methods to further improve transferability. The random transformation is:
$$T(x^{adv}; p) = \begin{cases} T(x^{adv}), & \text{with probability } p \\ x^{adv}, & \text{with probability } 1 - p \end{cases} \tag{5}$$
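A minimal sketch of the transform T(x; p) in pure Python follows. It assumes a single-channel image stored as a list of rows, nearest-neighbour resizing, and zero padding. The output size `out_size` and the helper names are illustrative assumptions, not values from this document. As written, transformed and untransformed outputs differ in size, so a model fed with them must accept both sizes (or resize back).

```python
import random


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list image."""
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]


def resize_and_pad(img, out_size, rng):
    """Randomly resize to rnd x rnd (h <= rnd < out_size), then zero-pad to out_size."""
    h = len(img)
    rnd = rng.randrange(h, out_size)
    resized = resize_nearest(img, rnd, rnd)
    top = rng.randint(0, out_size - rnd)    # randint is inclusive on both ends
    left = rng.randint(0, out_size - rnd)
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(rnd):
        for j in range(rnd):
            out[top + i][left + j] = resized[i][j]
    return out


def dim_transform(img, p, out_size, rng):
    """T(x; p): transform with probability p, otherwise return x unchanged."""
    if rng.random() < p:
        return resize_and_pad(img, out_size, rng)
    return img


rng = random.Random(0)
img = [[float(4 * i + j) for j in range(4)] for i in range(4)]
print(len(dim_transform(img, 1.0, 6, rng)))   # 6 (always transformed)
print(dim_transform(img, 0.0, 6, rng) is img)  # True (never transformed)
```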
The diverse input method improves transferability from a data-augmentation perspective. Introducing a probability (randomness) lets it balance the white-box and black-box attack success rates of the adversarial examples, trading off attack strength (against white-box models) and transferability (against black-box models) reasonably well. However, this probability makes the gradient obtained in each iteration fluctuate considerably. At p = 0.5 in particular, the gradient variance and the randomness are largest, which in turn affects the transferability of the adversarial examples. Randomness can, to some extent, reduce the chance of getting stuck in a local optimum during generation, but excessive randomness has negative effects. For example, as shown in Table 1, experiments found that adversarial examples generated by DIM achieve much higher attack success rates than those from MI-FGSM, but these rates fluctuate considerably. The black-box success rates fluctuate most, by up to 3.4%, which highlights the negative effect of excessive randomness.
Table 1: Attack success rates (%) on seven models in the single-model setting. Adversarial examples were generated on Inc-v3. * indicates a white-box attack. Maximum and minimum values are underlined.
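The claim that gradient variance peaks at p = 0.5 can be made concrete with a deliberately simplified two-point model. This model is an illustrative assumption, not taken from the document. For one gradient component, suppose the transformed branch yields a fixed value g_T and the untransformed branch yields g_0, and ignore any additional randomness inside T itself. Then:

$$
\begin{aligned}
\mathbb{E}[g] &= p\,g_T + (1-p)\,g_0,\\
\operatorname{Var}[g] &= p\,g_T^2 + (1-p)\,g_0^2 - \big(p\,g_T + (1-p)\,g_0\big)^2 = p(1-p)\,(g_T - g_0)^2,\\
\frac{d}{dp}\,p(1-p) &= 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2},\\
\operatorname{Var}\Big[\tfrac{1}{m}\textstyle\sum_{i=1}^{m} g_i\Big] &= \frac{p(1-p)\,(g_T - g_0)^2}{m} \quad \text{for i.i.d. draws } g_i.
\end{aligned}
$$

Under this model, averaging m = 5 independent draws at p = 0.5 shrinks the per-iteration variance from (g_T − g_0)²/4 to (g_T − g_0)²/20 while leaving the expected gradient unchanged.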
To mitigate the influence of excessive randomness and further improve the transferability of adversarial examples, an embodiment of the invention provides a gradient-averaging-based adversarial attack method for artificial intelligence image recognition, comprising the following steps:
S101: acquiring a pre-trained image recognition deep neural network model and the original input samples used in model pre-training, and setting the iterative-optimization termination condition and the target loss function for iterative optimization in the attack;
S102: in each round of iterative optimization, applying multiple image transformations to the original input sample to obtain multiple transformed image samples; obtaining the pre-trained model's outputs for the transformed samples; computing, with the target loss function, the cross entropy between the model outputs and the true labels of the inputs in the original sample set; computing the gradient with respect to each transformed sample from the cross-entropy loss; averaging these gradients; computing the adversarial perturbation from the averaged gradient; and adding the perturbation to the input sample for the next round of iterative optimization;
S103: once the termination condition is met, obtaining the averaged gradient of the final round and using it to generate the adversarial example, which is then used to perform an adversarial attack simulation test on the pre-trained image recognition deep neural network model.
The iterative-optimization termination condition can be set to a maximum number of iterations or an iteration time threshold.
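The two termination conditions can be checked together, as in the minimal sketch below, which assumes a monotonic clock. `should_stop` and its parameters are hypothetical names, and the injectable `clock` argument exists only to make the sketch testable.

```python
import time


def should_stop(iteration, start_time, max_iters=None, time_budget_s=None,
                clock=time.monotonic):
    """True once either the iteration cap or the wall-clock budget is reached."""
    if max_iters is not None and iteration >= max_iters:
        return True
    if time_budget_s is not None and clock() - start_time >= time_budget_s:
        return True
    return False


start = time.monotonic()
it = 0
while not should_stop(it, start, max_iters=10, time_budget_s=60.0):
    it += 1  # one round of iterative optimization would run here
print(it)  # 10
```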
Specifically, setting the target loss function for iterative optimization in the adversarial attack may comprise:
first, constructing a model cross-entropy loss function from the original input sample, its true label, and the parameters of the pre-trained image recognition deep neural network model;
then, casting adversarial example generation as a constrained optimization problem over the model cross-entropy loss. The constraints are as follows: maximizing the cross-entropy loss must produce an adversarial example that is visually indistinguishable from the original input sample, and this adversarial example must mislead the pre-trained model into outputting a wrong class that differs from the true label.
To mitigate the influence of randomness and further improve the transferability of adversarial examples, this embodiment can adopt either of two strategies. (1) Keep the probability p and average the gradients of multiple transformed images sampled under the same conditions, thereby approximating the expected gradient. (2) Remove the probability p and average the gradients of multiple transformed images whose transformation amplitudes are set uniformly from small to large, which again approximates the expected gradient. Both averaging schemes mitigate the influence of excessive randomness and enhance transferability. These gradient averaging attack (GAA) methods address the excessive randomness of the diverse input method in adversarial example generation, and are named GAA(1) and GAA(2), respectively.
The idea of GAA(1) is as follows. In each image transformation, the probability p is kept and the parameters of the random transformation are unchanged. Only the number of random transformations per iteration is increased, and the gradients of the resulting transformed images are averaged, i.e., the gradients of multiple transformed images sampled under the same conditions are averaged. Taking DIM as an example, the optimization objective of DIM-GAA(1) changes relative to DIM as follows:
$$\arg\max_{x^{adv}} \frac{1}{m}\sum_{i=1}^{m} J\big(\theta, T_i(x^{adv}; p), y\big), \quad \text{s.t. } \|x^{adv} - x\|_\infty \le \epsilon \tag{6}$$
where m is the number of randomly transformed images sampled in each iteration. As equation (6) shows, DIM-GAA(1) and DIM can be converted into each other by setting m; with m = 1, DIM-GAA(1) reduces to DIM.
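A minimal sketch of the GAA(1) averaged gradient in equation (6) follows, under explicit assumptions: a hypothetical toy linear softmax model, a 1-D signal standing in for the image, and random translation with zero fill in place of resize-and-pad. Translation is one of the transforms this document lists (TIM). It keeps the chain rule through T a simple index shift (`shift_adjoint`).

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def input_grad(W, b, v, y):
    """Gradient of the cross-entropy of a toy linear softmax model w.r.t. its input v."""
    z = [sum(w * vj for w, vj in zip(row, v)) + bk for row, bk in zip(W, b)]
    p = softmax(z)
    delta = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(delta[k] * W[k][j] for k in range(len(W))) for j in range(len(v))]


def shift(x, k):
    """Translate a 1-D signal by k positions, filling with zeros."""
    n = len(x)
    return [x[j - k] if 0 <= j - k < n else 0.0 for j in range(n)]


def shift_adjoint(g, k):
    """Chain rule through shift(., k): maps dJ/d(shift(x, k)) back to dJ/dx."""
    n = len(g)
    return [g[i + k] if 0 <= i + k < n else 0.0 for i in range(n)]


def gaa1_gradient(W, b, x_adv, y, m, p, max_shift, rng):
    """(1/m) * sum_i grad_x J(theta, T_i(x_adv; p), y), all samples drawn with the same p."""
    total = [0.0] * len(x_adv)
    for _ in range(m):
        # T(x; p): a random shift with probability p, identity otherwise.
        k = rng.randint(-max_shift, max_shift) if rng.random() < p else 0
        g = shift_adjoint(input_grad(W, b, shift(x_adv, k), y), k)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / m for t in total]


W = [[0.5, -0.2, 0.1, 0.0, 0.3], [-0.4, 0.6, 0.0, 0.2, -0.1], [0.1, 0.1, -0.5, 0.4, 0.2]]
b = [0.0, 0.1, -0.1]
x, y = [0.2, 0.8, 0.5, 0.1, 0.9], 1
avg = gaa1_gradient(W, b, x, y, m=5, p=0.5, max_shift=2, rng=random.Random(0))
```

With p = 0 every sample is the untransformed input, so the average equals the plain gradient. With m = 1 the function returns one DIM-style sample, which mirrors the reduction noted for equation (6).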
The idea of GAA(2) is as follows. Each time the input is transformed, the probability p is removed and the number of transformations per iteration is increased. To cover a range of transformation amplitudes, the amplitudes of the multiple transformations are set uniformly, and the gradients of the transformed images are averaged, i.e., the gradients of multiple images transformed with amplitudes uniformly set from small to large are averaged. Compared with DIM, the optimization objective becomes:
$$\arg\max_{x^{adv}} \frac{1}{m}\sum_{i=1}^{m} J\big(\theta, T_i(x^{adv}; \sigma_i), y\big), \quad \text{s.t. } \|x^{adv} - x\|_\infty \le \epsilon \tag{7}$$
Note that in equation (7) the transformation T_i(x^adv; σ_i) differs for each i, and σ_i is the amplitude parameter of the i-th transformation.
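A minimal sketch of the GAA(2) averaged gradient in equation (7) follows, with the same hypothetical toy model and translation transform as above. The amplitude schedule is an assumption: σ_i = max_amp · i / m for i = 1, …, m is one way to space amplitudes uniformly from small to large. Each sample translates by round(σ_i) positions in a random direction. There is no probability p, so every sample is transformed.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def input_grad(W, b, v, y):
    """Gradient of the cross-entropy of a toy linear softmax model w.r.t. its input v."""
    z = [sum(w * vj for w, vj in zip(row, v)) + bk for row, bk in zip(W, b)]
    p = softmax(z)
    delta = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(delta[k] * W[k][j] for k in range(len(W))) for j in range(len(v))]


def shift(x, k):
    n = len(x)
    return [x[j - k] if 0 <= j - k < n else 0.0 for j in range(n)]


def shift_adjoint(g, k):
    n = len(g)
    return [g[i + k] if 0 <= i + k < n else 0.0 for i in range(n)]


def amplitude_schedule(m, max_amp):
    """Amplitudes sigma_1 < ... < sigma_m spaced uniformly over (0, max_amp]."""
    return [max_amp * i / m for i in range(1, m + 1)]


def gaa2_gradient(W, b, x_adv, y, m, max_shift, rng):
    """(1/m) * sum_i grad_x J(theta, T_i(x_adv; sigma_i), y), with no probability p."""
    total = [0.0] * len(x_adv)
    for sigma in amplitude_schedule(m, max_shift):
        k = round(sigma) * rng.choice((-1, 1))  # magnitude fixed by sigma_i; only the direction is random
        g = shift_adjoint(input_grad(W, b, shift(x_adv, k), y), k)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / m for t in total]


W = [[0.5, -0.2, 0.1, 0.0, 0.3], [-0.4, 0.6, 0.0, 0.2, -0.1], [0.1, 0.1, -0.5, 0.4, 0.2]]
b = [0.0, 0.1, -0.1]
x, y = [0.2, 0.8, 0.5, 0.1, 0.9], 1
avg = gaa2_gradient(W, b, x, y, m=4, max_shift=2, rng=random.Random(0))
```

Compared with GAA(1), the transformation amplitude here is deterministic given i, so the remaining randomness is confined to the direction of each shift.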
In this embodiment, FIG. 1 shows the framework of the gradient averaging attack. Given an original image, the image is transformed multiple times. The gradients of the transformed images are then computed and averaged to reduce the influence of randomness, yielding adversarial examples with higher transferability.
Note that the scheme is general. As equations (6) and (7) show, it uses gradient averaging to address the excessive randomness of diverse inputs. The diverse input method transforms its inputs by random resizing and random padding. If these are replaced by random rotation, random translation, random cropping, etc., the gradient averaging attack of this scheme still applies.
Because the diverse input method has large randomness, the embodiment averages the gradients obtained over multiple samples to approximate the expected gradient. This reduces the influence of excessive randomness and achieves better transferability. In the DIM attack, replacing the original gradient with the average gradient of multiple transformed images sampled under the same conditions yields DIM-GAA(1), whose pseudo-code is given in Algorithm 1. Similarly, replacing the original gradient with the average gradient of multiple images transformed with amplitudes uniformly set from small to large yields DIM-GAA(2), whose pseudo-code is summarized in Algorithm 2. When the input transformation in DIM is replaced by random cropping, random rotation, random translation, etc., the corresponding CIM-GAA(1)/(2), RIM-GAA(1)/(2), TIM-GAA(1)/(2), etc. are obtained. The scheme also extends well and can be combined with other transfer attack methods to further improve the transferability of adversarial examples.
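Algorithms 1 and 2 are not reproduced in this text. The following sketch shows one way a DIM-GAA(1)-style attack might plug the averaged gradient into the MI-FGSM momentum update. It is a sketch only, under the same hypothetical assumptions as the earlier examples: a toy linear softmax model, a 1-D signal, and random translation in place of resize-and-pad. Replacing the per-sample transform with an amplitude schedule and dropping p would give a GAA(2)-style variant.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def loss_and_grad(W, b, v, y):
    """Cross-entropy of a toy linear softmax model and its gradient w.r.t. the input v."""
    z = [sum(w * vj for w, vj in zip(row, v)) + bk for row, bk in zip(W, b)]
    p = softmax(z)
    delta = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    grad = [sum(delta[k] * W[k][j] for k in range(len(W))) for j in range(len(v))]
    return -math.log(p[y]), grad


def shift(x, k):
    n = len(x)
    return [x[j - k] if 0 <= j - k < n else 0.0 for j in range(n)]


def shift_adjoint(g, k):
    n = len(g)
    return [g[i + k] if 0 <= i + k < n else 0.0 for i in range(n)]


def sign(v):
    return (v > 0) - (v < 0)


def averaged_gradient(W, b, x_adv, y, m, p, max_shift, rng):
    """GAA(1)-style estimate: mean input gradient over m samples of T(x_adv; p)."""
    total = [0.0] * len(x_adv)
    for _ in range(m):
        k = rng.randint(-max_shift, max_shift) if rng.random() < p else 0
        _, g = loss_and_grad(W, b, shift(x_adv, k), y)
        total = [t + gi for t, gi in zip(total, shift_adjoint(g, k))]
    return [t / m for t in total]


def mi_dim_gaa1(W, b, x, y, eps, num_iters, mu=1.0, m=5, p=0.5, max_shift=1, seed=0):
    rng = random.Random(seed)
    alpha = eps / num_iters
    g = [0.0] * len(x)
    x_adv = list(x)
    for _ in range(num_iters):
        # The averaged gradient replaces DIM's single-sample gradient.
        grad = averaged_gradient(W, b, x_adv, y, m, p, max_shift, rng)
        l1 = sum(abs(v) for v in grad) or 1e-12
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]  # momentum accumulation
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xc - eps), xc + eps) for xa, xc in zip(x_adv, x)]
    return x_adv


W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x, y = [1.0, 0.0], 0
x_adv = mi_dim_gaa1(W, b, x, y, eps=0.5, num_iters=5, p=0.0)  # p = 0: no transform
```

With p = 0 the loop reduces to plain MI-FGSM, and with m = 1 it reduces to a single-sample MI-DIM-style update.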
Further, based on the above method, an embodiment of the invention also provides a gradient-averaging-based adversarial attack system for artificial intelligence image recognition, comprising a data acquisition module, an iterative optimization module and an adversarial generation module, wherein:
the data acquisition module is configured to acquire a pre-trained image recognition deep neural network model and the original input samples used in model pre-training, and to set the iterative-optimization termination condition and the target loss function for iterative optimization in the attack;
the iterative optimization module is configured to obtain the adversarial perturbation with an iterative optimization algorithm. In each round of iterative optimization, it applies multiple image transformations to the original input sample to obtain multiple transformed image samples; obtains the pre-trained model's outputs for the transformed samples; computes, with the target loss function, the cross entropy between the model outputs and the true labels of the inputs in the original sample set; computes the gradient with respect to each transformed sample from the cross-entropy loss; averages these gradients; computes the adversarial perturbation from the averaged gradient; and adds the perturbation to the input sample for the next round of iterative optimization;
and the adversarial generation module is configured to obtain, once the termination condition is met, the averaged gradient of the final round, and to use it to generate the adversarial example, which is then used to perform an adversarial attack test on the pre-trained image recognition deep neural network model.
To verify the effectiveness of this scheme, it is further explained below with experimental data.
Experimental setup:
Dataset: 1000 images from 1000 categories were randomly selected from the ImageNet validation set, all of which are correctly classified by the networks under test. All images were resized to 299×299×3 beforehand.
Models: Four normally trained networks were considered in the experiments, namely Inception-v3 (Inc-v3), Inception-v4 (Inc-v4), Inception-ResNet-v2 (IncRes-v2) and ResNet-v2-101 (Res-101), together with three adversarially trained networks with strong defenses, namely ens3-adv-Inception-v3 (Inc-v3_ens3), ens4-adv-Inception-v3 (Inc-v3_ens4) and ens-adv-Inception-ResNet-v2 (IncRes-v2_ens). In addition, the scheme was verified against several typical defense models, including the high-level representation guided denoiser (HGD), random resizing and padding (R&P), NIPS-r3, feature distillation (FD), purifying perturbations via an image compression model (ComDefend), and randomized smoothing (RS).
Baseline methods: The scheme starts from DIM, analyzing and modifying it, so DIM is taken as a baseline. In addition, replacing the original random transformation in the diverse-input process with other random transformations, such as random rotation, cropping and translation, yields the methods RIM, CIM and TIM, respectively, which also serve as baselines. To better verify the effectiveness and superiority of the scheme, the state-of-the-art transfer attack S²I-SI-TI-DIM was also used as a baseline.
Hyperparameter settings: The maximum perturbation is ε = 16, the number of iterations T = 10, and the step size α = 1.6. For MI-FGSM the decay factor defaults to μ = 1.0, and for DIM the transformation probability p is set to 0.5. For the present scheme, the number of averaged samples is m = 5. For all GAA(1) methods, the transformation probability p in the random transform function T(x^adv; p) is set to 0.5. The random transform operation T(·) denotes a different operation in each method: in RIM, CIM and TIM, the image is rotated, cropped and translated, respectively. The specific transformation operations are set as follows:
For the transformation operation T(·) in DIM and DIM-GAA(1), the original input image x is first randomly resized to rnd×rnd×3, where rnd ∈ [299, 330), and then randomly padded to 330×330×3. For T(x; σ) in DIM-GAA(2), σ denotes the value of rnd, which is 299, 307, 314, 322 and 329 in the five transformations, respectively.
For the transformation operation T(·) in CIM and CIM-GAA(1), the original input image x is first randomly cropped to rnd×rnd×3, where rnd ∈ [279, 299), and then randomly padded to 299×299×3. For T(x; σ) in CIM-GAA(2), σ denotes the value of rnd, which is 279, 284, 289, 294 and 298 in the five transformations, respectively.
For the transformation operation T(·) in RIM and RIM-GAA(1), the input image x is randomly rotated by θ degrees about its center, where θ ∈ [-10, 10]. For T(x; σ) in RIM-GAA(2), σ denotes the value of θ, and each of the five transformations takes a uniform random value between -10 and 10.
For the transformation operation T(·) in TIM and TIM-GAA(1), the input image x is randomly translated by rnd pixels in the up, down, left and right directions, where rnd ∈ [0, 10]. For T(x; σ) in TIM-GAA(2), σ denotes the value of rnd, which is 0, 2, 5, 7 and 10 in the five transformations, respectively.
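To make these parameter settings concrete, the following sketch (hypothetical helper names, for illustration only) encodes the amplitude ranges that GAA(1) samples from, the fixed amplitude lists σ1 to σ5 that GAA(2) uses, and the step size α = ε/T implied by ε = 16 and T = 10. Applying each transform with probability p and otherwise leaving the copy unchanged is this sketch's reading of T(x^adv; p); RIM-GAA(2) is left out because the text gives random rather than fixed angles for it.

```python
import random

EPS, STEPS = 16, 10
ALPHA = EPS / STEPS  # step size 16 / 10 = 1.6, matching the setting above

# Amplitude ranges sampled by GAA(1) (inclusive integer bounds, except RIM, which is continuous).
RANGES = {
    "DIM": (299, 329),  # resize side rnd in [299, 330), then pad to 330x330x3
    "CIM": (279, 298),  # crop side rnd in [279, 299), then pad to 299x299x3
    "RIM": (-10, 10),   # rotation angle theta in [-10, 10] degrees
    "TIM": (0, 10),     # translation rnd in [0, 10] pixels
}

# Fixed amplitudes for the five transforms of the GAA(2) variants, as listed above.
GAA2_SIGMAS = {
    "DIM": [299, 307, 314, 322, 329],
    "CIM": [279, 284, 289, 294, 298],
    "TIM": [0, 2, 5, 7, 10],
}


def gaa1_params(method, m=5, p=0.5, rng=None):
    """GAA(1): draw m independent amplitudes; None means the copy is left untransformed."""
    rng = rng or random.Random()
    lo, hi = RANGES[method]
    params = []
    for _ in range(m):
        if rng.random() < p:
            params.append(rng.uniform(lo, hi) if method == "RIM" else rng.randint(lo, hi))
        else:
            params.append(None)
    return params


def gaa2_params(method):
    """GAA(2): one transform per listed amplitude, ordered from small to large."""
    return list(GAA2_SIGMAS[method])


if __name__ == "__main__":
    print(ALPHA)  # 1.6
    print(gaa1_params("DIM", rng=random.Random(0)))
    print(gaa2_params("TIM"))  # [0, 2, 5, 7, 10]
```

The design difference is visible in the return values: GAA(1) averages over random amplitudes (some copies possibly untransformed), while GAA(2) averages over a deterministic sweep of amplitudes covering the same range.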
To help visualize the random transformations, Fig. 4 shows the original input in the first row and the outputs of the various random transformations in the following rows (a subset of the transformed images is displayed). As the figure shows, the random transformations do not change the semantic content of the image.
1. Verification experiment: simulated attacks on a single network
Adversarial examples were generated on each of the four normally trained networks using the baseline methods and the present scheme, and were tested on all 7 networks (four normally trained and three adversarially trained). The results are shown in Table 2, where the attack success rate is the model's misclassification rate with the adversarial examples as input.
The results show that the scheme not only maintains a high attack success rate on the white-box models but also significantly improves the attack success rate on the black-box models. For example, when adversarial examples are generated on Inc-v3, the attack success rates of DIM, CIM, RIM and TIM on Inc-v4 are 71.4%, 67.4%, 70.1% and 70.4%, respectively. In contrast, DIM-GAA(2), CIM-GAA(2), RIM-GAA(2) and TIM-GAA(2) achieve 85.6%, 82.4%, 84.5% and 83.9%, which are 14.2, 15.0, 14.4 and 13.5 percentage points higher than the corresponding baseline attacks. Moreover, against the advanced adversarially trained models, the GAA attacks of the scheme consistently outperform the baseline attacks and substantially raise the attack success rate. These results demonstrate that the method effectively improves the transferability of adversarial examples.
Table 2: Attack success rates (%) of adversarial attacks on seven models in the single-model setting. Adversarial examples were crafted on Inc-v3, Inc-v4, IncRes-v2 and Res-101, respectively. * denotes a white-box attack.
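The improvements quoted above are absolute differences between success rates, i.e., percentage points, not relative percentages. A quick check against the reported Inc-v3 to Inc-v4 numbers (the variable names are illustrative):

```python
# Attack success rates (%) on Inc-v4 for adversarial examples crafted on Inc-v3, as reported above.
baseline = {"DIM": 71.4, "CIM": 67.4, "RIM": 70.1, "TIM": 70.4}
gaa2 = {"DIM": 85.6, "CIM": 82.4, "RIM": 84.5, "TIM": 83.9}

# Absolute gain in percentage points for each baseline / GAA(2) pair.
gains_pp = {k: round(gaa2[k] - baseline[k], 1) for k in baseline}

# Relative gain with respect to the baseline success rate, for comparison.
gains_rel = {k: round(100 * (gaa2[k] - baseline[k]) / baseline[k], 1) for k in baseline}

print(gains_pp)  # {'DIM': 14.2, 'CIM': 15.0, 'RIM': 14.4, 'TIM': 13.5}
print(gains_rel)
```

The relative gains (roughly 19% to 22% of the baseline rates) are larger than the percentage-point figures, which is why the two should not be conflated when reading the tables.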
2. Verification experiment: simulated attacks on an ensemble of networks
The performance of the scheme is further evaluated by attacking multiple models simultaneously. Using the ensemble attack method, the logit outputs of the different models are fused. Specifically, adversarial examples were generated on an ensemble of the four normally trained models Inc-v3, Inc-v4, IncRes-v2 and Res-101, all given equal weights, and their transferability was tested on both the normally trained and the adversarially trained models.
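A minimal sketch of the logit fusion used in the ensemble setting, assuming the fusion is a weighted sum of the models' logits followed by a single softmax cross entropy (the usual form of logit ensembling; the text does not spell out the formula). The function names and the example logits are hypothetical.

```python
import math


def fuse_logits(logits_per_model, weights=None):
    """Weighted sum of per-model logit vectors; equal weights by default."""
    n = len(logits_per_model)
    weights = weights if weights is not None else [1.0 / n] * n
    num_classes = len(logits_per_model[0])
    return [sum(w * logits[j] for w, logits in zip(weights, logits_per_model))
            for j in range(num_classes)]


def ensemble_ce_loss(logits_per_model, y, weights=None):
    """Cross entropy of softmax(fused logits) against the true label y."""
    fused = fuse_logits(logits_per_model, weights)
    z_max = max(fused)
    log_sum_exp = z_max + math.log(sum(math.exp(z - z_max) for z in fused))
    return log_sum_exp - fused[y]


if __name__ == "__main__":
    # Hypothetical logits of four surrogate models (e.g. Inc-v3, Inc-v4, IncRes-v2, Res-101) over 3 classes.
    logits = [[2.0, 0.5, -1.0], [1.5, 1.0, 0.0], [3.0, -0.5, 0.5], [1.0, 0.0, 0.0]]
    print(fuse_logits(logits))
    print(ensemble_ce_loss(logits, y=0))
```

Under this assumption, the gradient of the fused loss with respect to each transformed copy would then be averaged over the m copies in the same way as in the single-model case.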
As shown in Table 3, the scheme achieves the highest attack success rate in both the white-box and the black-box settings. For example, the average attack success rates of DIM-GAA(1) and DIM-GAA(2) are 79.8% and 84.4%, respectively, which are 6.0 and 10.6 percentage points higher than the baseline attack DIM. Notably, the improvement is larger on the adversarially trained models, where the scheme exceeds the baseline attacks by 6.6 to 21.4 percentage points on average. These results show that gradient averaging can effectively enhance the transferability of adversarial attacks.
Table 3: Attack success rates (%) of adversarial attacks on seven models in the ensemble-model setting. Adversarial examples were generated on the ensemble of Inc-v3, Inc-v4, IncRes-v2 and Res-101. * denotes a white-box model.
3. Simulated attack experiment against defense models
Beyond the normally and adversarially trained models, the effectiveness of the scheme is also quantified against other advanced defenses, including the high-level representation guided denoiser (HGD), random resizing and padding (R&P), NIPS-r3, feature distillation (FD), purifying perturbations via an image compression model (ComDefend), and randomized smoothing (RS). All adversarial examples were generated on the equally weighted ensemble of Inc-v3, Inc-v4, IncRes-v2 and Res-101 and were then used to test the advanced defenses.
The experimental results are shown in Table 4. Across all advanced defenses, the gradient averaging attack of the scheme greatly strengthens the baseline attacks. For example, the average success rates of DIM and CIM over the six defenses are 44.8% and 43.9%, respectively, whereas DIM-GAA(2) and CIM-GAA(2) of the present scheme reach 65.6% and 59.1%, which are 20.8 and 15.2 percentage points higher than the corresponding baseline attacks. This substantial improvement demonstrates the effectiveness of the scheme against other defense models. It also shows that current defense mechanisms remain vulnerable to well-designed adversarial examples and fall well short of the requirements of real-world security.
Table 4: Attack success rates (%) against advanced defense methods. Adversarial examples were generated on the ensemble of Inc-v3, Inc-v4, IncRes-v2 and Res-101.
4. Hyperparameter experiments
A series of ablation experiments was used to study the effect of different parameters, including the number of image transformations m and the transformation probability p. To simplify the analysis, only DIM-GAA(1) and CIM-GAA(1) were analyzed.
Number of image transformations (number of averaged samples) m: First, the effect on the attack success rate on the seven models of the number of transformed image copies, i.e., the number of averaged samples m per iteration, is studied. m is increased from 1 to 10 in steps of 1; when m = 1, DIM-GAA(1) and CIM-GAA(1) degenerate to DIM and CIM, respectively. Fig. 2 shows the attack success rates of DIM-GAA(1) and CIM-GAA(1) on the various networks, where the dotted and solid lines represent the success rates of white-box and black-box attacks, respectively. As m increases, the white-box success rate stays at about 100%, while the black-box success rate gradually rises. Furthermore, for all attacks, even a small m, i.e., gradient averaging over only a few transformed inputs, significantly increases the black-box success rate, which indicates the importance of gradient averaging in the attack. Intuitively, more transformed image copies yield better adversarial transferability. However, a larger m incurs a higher computational cost, so m can be chosen to balance computational cost against attack success rate. Weighing both factors, m is set to 5.
Transformation probability p: The effect of the transformation probability p on the attack success rate in the white-box and black-box settings is studied. p is varied from 0 to 1 in steps of 0.1. When p = 0, no transformation is applied, so all m copies are identical, their averaged gradient equals the plain gradient, and both DIM-GAA(1) and CIM-GAA(1) degenerate into MI-FGSM. Fig. 3 shows the attack success rates of DIM-GAA(1) and CIM-GAA(1) on the various networks, where the dashed and solid lines represent the success rates of white-box and black-box attacks, respectively. The trends of DIM-GAA(1) and CIM-GAA(1) as p increases are broadly similar: their black-box success rates gradually rise, while their white-box success rates drop only slightly and stay above 98.6%. When p = 1, the white-box success rate of DIM-GAA(1) is still at least 98.6%, whereas that of DIM (see Table 1) is 97.1%. This shows that introducing gradient averaging better balances white-box and black-box attacks, improving the white-box and black-box success rates of the adversarial examples simultaneously.
5. Extended experiments
Extensive additional experiments are used to analyze and evaluate the scheme comprehensively. First, the scheme is compared with the state-of-the-art transfer attack in terms of attack success rate; second, the image quality of the adversarial examples generated by different attack methods is compared; finally, the time cost of the baseline attacks and of the corresponding gradient averaging attacks is compared.
Comparison of success rates with the state-of-the-art transfer attack: To better evaluate the attack performance of the scheme, the gradient averaging attack was compared with the state-of-the-art transfer attack S²I-SI-TI-DIM. The results in Table 5 show that, when combined with this strong attack, the scheme achieves a higher attack success rate. For example, S²I-SI-TI-DI-RIM-GAA(1) achieves an attack success rate of 92.4% on Res-101, compared with 89.2% for S²I-SI-TI-DIM, which highlights the strong attack performance of the scheme.
Table 5: Attack success rates (%) of adversarial attacks on seven models in the single-model setting. Adversarial examples were generated on Inc-v3. * denotes a white-box attack.
Comparison of the image quality of adversarial examples generated by different attack methods: Table 6 reports the image quality of the adversarial examples generated by the different methods under three typical perceptual metrics: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS). The results show that the adversarial examples generated by the scheme have better image quality than those generated by the baseline attacks, which further illustrates its advantages. Taking SSIM as an example, CIM-GAA(2) of the present scheme obtains the highest score of 0.678, while the baseline attack CIM scores 0.673.
Table 6: Image quality of the generated adversarial examples evaluated under four criteria. Adversarial examples were generated on Inc-v3.
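PSNR is the simplest of the three metrics to compute. The sketch below uses the standard definition PSNR = 10·log10(MAX² / MSE) with MAX = 255 (an assumption about the pixel scale, consistent with ε = 16) and shows that the L∞ budget alone bounds PSNR from below: every pixel differs by at most ε, so MSE ≤ ε². SSIM and LPIPS are not sketched here, and the function names are illustrative.

```python
import math


def psnr(x, x_adv, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_adv)) / len(x)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)


def psnr_lower_bound(eps, max_val=255.0):
    """Worst case under ||x_adv - x||_inf <= eps: every pixel moves by exactly eps."""
    return 10.0 * math.log10(max_val ** 2 / eps ** 2)


if __name__ == "__main__":
    print(round(psnr_lower_bound(16.0), 2))  # 24.05
    x = [120.0, 80.0, 200.0, 30.0]
    x_adv = [136.0, 70.0, 190.0, 30.0]  # a perturbation within eps = 16
    print(round(psnr(x, x_adv), 2))
```

Because every method here shares the same ε, differences in PSNR between methods come from how much of the budget each one uses across pixels, not from the budget itself.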
Attack time comparison of different attack methods: The scheme transforms each image multiple times and then averages the gradients of the transformed images, which adds to the time cost of adversarial example generation. The time consumption of the baseline attacks and of the scheme is therefore analyzed. Because GAA(1) and GAA(2) use the same number of image transformations, their time consumption is almost identical, so only GAA(2) is reported. Table 7 gives the average time for the baseline attacks and the corresponding GAA(2) methods to generate one adversarial example. For example, with Inc-v3 as the surrogate model, DIM and DIM-GAA(2) take 0.089 s and 0.339 s on average to generate an adversarial example, both of which are short. The results show that the gradient averaging attack does increase the time cost, but the increase is generally acceptable, especially given the large improvement in attack success rate. The experiments were run on a single RTX 2080 Ti GPU.
Table 7: Average time (s) to generate one adversarial example on Inc-v3, Inc-v4, IncRes-v2 and Res-101, respectively. The value to the left of the slash is the time for the baseline method, and the value to the right is the time for the proposed method.
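A back-of-the-envelope cost model helps read Table 7. Assuming each iteration of a DIM-style baseline needs one forward/backward pass while the gradient averaging variant needs m of them, the number of gradient evaluations grows by a factor of m; the measured wall-clock ratio quoted above can be compared against it (variable names are illustrative):

```python
# Setting from the hyperparameter section: T = 10 iterations, m = 5 transformed copies.
steps, m = 10, 5

grad_evals_baseline = steps * 1  # DIM-style: one transformed input per iteration
grad_evals_gaa = steps * m       # gradient averaging: m transformed inputs per iteration

# Average seconds per adversarial example on Inc-v3, as reported above.
t_dim, t_dim_gaa2 = 0.089, 0.339

print(grad_evals_gaa / grad_evals_baseline)  # 5.0
print(round(t_dim_gaa2 / t_dim, 2))          # 3.81
```

The measured slowdown (about 3.8x) is below the 5x increase in gradient evaluations, so in this setting the wall-clock overhead grew less than linearly in m; the document does not state whether the m copies were processed as a single batch.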
Compared with existing diverse-input transfer attacks, the gradient averaging attack proposed in the scheme achieves higher attack success rates on both normally trained and adversarially trained models. In addition, the scheme was verified on diverse-input transfer attacks based on other random transformations (translation, cropping, rotation, etc.), which further underlines its effectiveness and superiority, reveals the potential of gradient averaging for improving adversarial transferability, and provides insight for further exploration of the adversarial attack field.
Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Since the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
The units and method steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of functionality. Whether such functionality is implemented in hardware or software depends on the particular application and the design constraints imposed on the technical solution. Those of ordinary skill in the art may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the above methods may be performed by a program that instructs associated hardware, and that the program may be stored on a computer readable storage medium, such as: read-only memory, magnetic or optical disk, etc. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits, and accordingly, each module/unit in the above embodiments may be implemented in hardware or may be implemented in a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with this technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A gradient-averaging-based adversarial attack method for artificial intelligence image recognition, characterized by comprising the following steps:
acquiring a pre-trained deep neural network model for image recognition and the original input samples used in model pre-training, and setting a termination condition and a target loss function for the iterative optimization of the attack;
in each iteration of the optimization, applying multiple image transformations to the original input sample to obtain multiple transformed image samples; obtaining the model outputs for the transformed image samples with the pre-trained model; computing, based on the target loss function, the cross entropy between the model outputs and the ground-truth label of the input sample in the original sample set; computing the gradient of the cross-entropy loss with respect to each transformed image sample; averaging the gradients of the transformed image samples; solving for the adversarial perturbation from the averaged gradient; and adding the adversarial perturbation to the original input sample in the next iteration;
and, once the termination condition is met, obtaining the averaged gradient of the final iteration and using it to generate the adversarial example of the adversarial attack, so that the adversarial example is used to perform an adversarial attack test on the pre-trained image recognition deep neural network model.
2. The gradient-averaging-based adversarial attack method for artificial intelligence image recognition according to claim 1, wherein the termination condition of the iterative optimization is a maximum number of iterations or an iteration time threshold.
3. The gradient-averaging-based adversarial attack method for artificial intelligence image recognition according to claim 1, wherein setting the target loss function for the iterative optimization of the attack comprises:
first, constructing a model cross-entropy loss function based on the original input sample, its ground-truth label and the parameters of the pre-trained image recognition deep neural network model;
then, converting adversarial example generation into a constrained optimization problem over the model cross-entropy loss function, wherein the constraints comprise: generating, by maximizing the model cross-entropy loss function, an adversarial example that is visually indistinguishable from the original input sample, and using the adversarial example to mislead the pre-trained image recognition deep neural network model into outputting an incorrect category that is inconsistent with the corresponding ground-truth label.
4. The gradient-averaging-based adversarial attack method for artificial intelligence image recognition according to claim 1 or 3, wherein the target loss function of the iterative optimization is expressed as:
wherein x and y are the original input sample and its ground-truth label, respectively; θ denotes the parameters of the pre-trained image recognition deep neural network model; J(θ, x, y) is the model cross-entropy loss function; x^adv is the adversarial example; m is the number of randomly transformed images sampled in each iteration; and T(x^adv; p) is a random transformation function that transforms x^adv with a given probability p.
5. The gradient-averaging-based adversarial attack method for artificial intelligence image recognition according to claim 4, further comprising, in each iteration of the optimization:
setting the number of image transformations in the iteration according to the number of sampled randomly transformed images, and obtaining, according to that number, the average gradient of multiple transformed images sampled under the same transformation condition, so as to solve for the adversarial perturbation from the averaged gradient.
6. The gradient-averaging-based adversarial attack method for artificial intelligence image recognition according to claim 1 or 3, wherein the target loss function of the iterative optimization is expressed as:
wherein x and y are the original input sample and its ground-truth label, respectively; θ denotes the parameters of the pre-trained image recognition deep neural network model; J(θ, x, y) is the model cross-entropy loss function; x^adv is the adversarial example; m is the number of randomly transformed images sampled in each iteration; T(x^adv; p) is a random transformation function that transforms x^adv with a given probability p; and T_i(x^adv; σ_i) is a random transformation function that transforms x^adv with the transformation amplitude parameter σ_i in the i-th sample of the iteration.
7. The gradient-averaging-based adversarial attack method for artificial intelligence image recognition according to claim 6, further comprising, in each iteration of the optimization:
setting the number of image transformations in the iteration according to the number of sampled randomly transformed images, setting the transformation amplitudes of these transformations uniformly, and obtaining, according to the number of transformations and the amplitudes, the average gradient of the transformed images whose amplitudes are set uniformly from small to large, so as to solve for the adversarial perturbation from the averaged gradient.
8. A gradient-averaging-based adversarial attack system for artificial intelligence image recognition, comprising a data acquisition module, an iterative optimization module and an adversarial example generation module, wherein:
the data acquisition module is configured to acquire a pre-trained deep neural network model for image recognition and the original input samples used in model pre-training, and to set the termination condition and the target loss function for the iterative optimization of the attack;
the iterative optimization module is configured to obtain the adversarial perturbation with an iterative optimization algorithm, and, in each iteration, to apply multiple image transformations to the original input sample to obtain multiple transformed image samples; obtain the model outputs for the transformed image samples with the pre-trained model; compute, based on the target loss function, the cross entropy between the model outputs and the ground-truth label of the input sample in the original sample set; compute the gradient of the cross-entropy loss with respect to each transformed image sample; average the gradients of the transformed image samples; solve for the adversarial perturbation from the averaged gradient; and add the adversarial perturbation to the original input sample in the next iteration;
and the adversarial example generation module is configured to obtain the averaged gradient of the final iteration once the termination condition is met and to generate the adversarial example of the adversarial attack from it, so that the adversarial example is used to perform an adversarial attack test on the pre-trained image recognition deep neural network model.
9. An electronic device, comprising: at least one processor, and a memory coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to implement the artificial intelligence image recognition adversarial attack method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed, implements the artificial intelligence image recognition adversarial attack method according to any one of claims 1 to 7.
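Claims 4 and 6 each introduce a formula that is not reproduced in this text. Using only the symbols defined in claims 3, 4 and 6, a plausible reconstruction is sketched below under stated assumptions: the visual-indistinguishability constraint of claim 3 is taken to be an L∞ bound ε (the experiments use a maximum perturbation of ε = 16); the m terms of claim 4 use independent draws of T(x^adv; p); the amplitudes σ_i of claim 6 increase from small to large as in claim 7; and how T(x^adv; p) enters claim 6 is not recoverable, so it is omitted. These are hedged readings, not necessarily the formulas as filed.

```latex
% Claim 3 (assumed L-infinity constraint)
\arg\max_{x^{adv}} \; J(\theta, x^{adv}, y)
\quad \text{s.t.} \quad \lVert x^{adv} - x \rVert_{\infty} \le \epsilon

% Claim 4, GAA(1): average over m independent random transforms T(x^{adv}; p)
\arg\max_{x^{adv}} \; \frac{1}{m}\sum_{i=1}^{m} J\big(\theta,\, T(x^{adv}; p),\, y\big)
\quad \text{s.t.} \quad \lVert x^{adv} - x \rVert_{\infty} \le \epsilon

% Claim 6, GAA(2): m transforms with fixed amplitudes \sigma_1 \le \dots \le \sigma_m
\arg\max_{x^{adv}} \; \frac{1}{m}\sum_{i=1}^{m} J\big(\theta,\, T_i(x^{adv}; \sigma_i),\, y\big)
\quad \text{s.t.} \quad \lVert x^{adv} - x \rVert_{\infty} \le \epsilon
```

In both averaged forms, the gradient of the objective with respect to x^adv is, by linearity, the mean of the m per-copy gradients, which is the averaged gradient the description uses to compute the adversarial perturbation.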
CN202311115161.0A 2023-08-31 2023-08-31 Artificial intelligent image recognition attack resistance method and system based on gradient average Pending CN117079053A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311115161.0A CN117079053A (en) 2023-08-31 2023-08-31 Artificial intelligent image recognition attack resistance method and system based on gradient average

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311115161.0A CN117079053A (en) 2023-08-31 2023-08-31 Artificial intelligent image recognition attack resistance method and system based on gradient average

Publications (1)

Publication Number Publication Date
CN117079053A true CN117079053A (en) 2023-11-17

Family

ID=88702207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311115161.0A Pending CN117079053A (en) 2023-08-31 2023-08-31 Artificial intelligent image recognition attack resistance method and system based on gradient average

Country Status (1)

Country Link
CN (1) CN117079053A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407690A (en) * 2023-12-14 2024-01-16 之江实验室 Task execution method, device and equipment based on model migration evaluation
CN117407690B (en) * 2023-12-14 2024-03-22 之江实验室 Task execution method, device and equipment based on model migration evaluation

Similar Documents

Publication Publication Date Title
CN111460426B (en) Deep learning resistant text verification code generation system and method based on antagonism evolution framework
CN113449783B (en) Countermeasure sample generation method, system, computer device and storage medium
CN117079053A (en) Artificial intelligent image recognition attack resistance method and system based on gradient average
KR20210081769A (en) Attack-less Adversarial Training for a Robust Adversarial Defense
CN111488904A (en) Image classification method and system based on confrontation distribution training
CN113033822A (en) Antagonistic attack and defense method and system based on prediction correction and random step length optimization
CN110855716B (en) Self-adaptive security threat analysis method and system for counterfeit domain names
CN115830369A (en) Countermeasure sample generation method and system based on deep neural network
Qiu et al. Fencebox: A platform for defeating adversarial examples with data augmentation techniques
Chen et al. Patch selection denoiser: An effective approach defending against one-pixel attacks
CN115147682A (en) Method and device for generating concealed white box confrontation sample with mobility
CN115619616A (en) Method, device, equipment and medium for generating confrontation sample based on watermark disturbance
Macas et al. Adversarial examples: A survey of attacks and defenses in deep learning-enabled cybersecurity systems
CN113034332A (en) Invisible watermark image and backdoor attack model construction and classification method and system
Short et al. Defending Against Adversarial Examples.
CN113487506B (en) Attention denoising-based countermeasure sample defense method, device and system
CN115620100A (en) Active learning-based neural network black box attack method
CN114861796A (en) Confrontation sample mixed defense method aiming at large-size image classification
Chakraborty et al. Dynamarks: Defending against deep learning model extraction using dynamic watermarking
Peterson et al. The importance of generalizability for anomaly detection
Wu et al. Detecting Adversarial Examples Using Rich Residual Models to Improve Data Security in CNN Models
CN117689954A (en) Artificial intelligent image recognition attack resistance method and system based on image average
CN113569897B (en) Anti-sample defense method for obtaining low-frequency information based on fixed pixel points
CN115527084A (en) Intelligent system confrontation sample generation method and system based on diversified input strategy
Xu On the Neural Representation for Adversarial Attack and Defense

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination