CN110969242A - Defense method for generating a universal inverse perturbation based on a generative adversarial network - Google Patents


Info

Publication number
CN110969242A
CN110969242A (application CN201911183790.0A)
Authority
CN
China
Prior art keywords
model
sample
loss
generative
generated
Prior art date
Legal status (assumed; not a legal conclusion): Pending
Application number
CN201911183790.0A
Other languages
Chinese (zh)
Inventor
陈晋音
朱伟鹏
吴长安
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority application: CN201911183790.0A
Published as: CN110969242A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning


Abstract

The invention discloses a defense method for generating a universal inverse perturbation based on a generative adversarial network, comprising the following steps: (1) building a generative adversarial network comprising a generator model and a discriminator model; (2) using a variety of attack methods to obtain a comprehensive set of adversarial examples; (3) pre-training the generator and discriminator of the generative adversarial network with normal samples; (4) establishing a loss function for the generator and training the generator; (5) establishing a loss function for the discriminator and training the discriminator; (6) repeating steps (4) and (5) until the number of iterations reaches a preset upper limit or the loss functions of the two models reach preset thresholds; (7) evaluating the performance of the trained generator and deploying it. By building a suitable generative adversarial network, the invention extracts the feature distribution of universal perturbations and generates a suitable universal inverse perturbation, thereby improving the robustness of the model.

Description

Defense method for generating a universal inverse perturbation based on a generative adversarial network
Technical Field
The invention belongs to the technical field of deep learning security, and in particular relates to a defense method for generating a universal inverse perturbation based on a generative adversarial network.
Background
By learning the latent relations in large amounts of data, deep learning can obtain more accurate classification results than conventional algorithms, and it has strong feature learning and feature expression capabilities. Accordingly, deep learning is widely applied across artificial intelligence, including autonomous driving, augmented reality, computer vision, biomedical diagnosis, and natural language processing. Deep learning uses neural networks with enormous numbers of parameters to extract features and fit the distribution of massive data, and therefore exhibits good image processing capability.
At present, deep learning is applied ever more widely in image recognition, including the use of convolutional neural networks for the detection and recognition of target objects, the use of MTCNN (Multi-task Cascaded Convolutional Networks) and FaceNet for face recognition and detection, and the use of deep models to classify objects over a large number of classes. However, as Szegedy et al. pointed out, deep models are vulnerable to subtle perturbations: experiments show that even a well-trained deep learning model can be misled by a slight perturbation into producing a high-confidence result on an incorrect class label. The existence of adversarial examples threatens the security of deep learning models, drawing increasing attention to the practical deployment of deep learning and neural networks in real life.
In the past, the defensive measures added to a model to resist adversarial examples have generally been adversarial training, or adding scaling and shifting operations in front of the model to reduce the influence of the attack perturbation. However, these defenses only weaken the harmfulness of certain adversarial perturbations to a certain extent, relying on the feature extraction capability of the deep learning model. Omid Poursaeed et al. ("Generative Adversarial Perturbations") showed that a universal perturbation may exist for a deep learning model, forcing the model's recognition and classification to shift and produce erroneous classification results. In the face of such universal perturbations, the above defenses are difficult to apply effectively.
Disclosure of Invention
In order to improve the defense capability of various models against various known or unknown attack methods, the invention provides a defense method for generating a universal inverse perturbation based on a generative adversarial network.
The technical scheme of the invention is as follows:
a defense method for generating a universal inverse perturbation based on a generative adversarial network, characterized by comprising the following steps:
(1) building a generative adversarial network (GAN) comprising a generator model G and a discriminator model D based on convolutional neural networks; the generator G learns the feature distribution of the universal inverse perturbation through the parameters of a neural network and adds the universal inverse perturbation to a sample to generate a benign sample; the discriminator D judges the confidence that a sample generated by the generator is benign;
(2) attacking normal samples with a variety of attack methods to generate adversarial examples containing many types of perturbations;
(3) pre-training the generator G and the discriminator D of the generative adversarial network with normal samples;
(4) establishing a loss function for the generator G and training G;
(5) establishing a loss function for the discriminator D and training D;
(6) repeating steps (4) and (5) until the number of iterations reaches a preset upper limit or the loss functions of the two models reach preset thresholds;
(7) evaluating the performance of the trained generator G and then deploying it: samples to be classified are first processed by the trained generator G and then input to the deep learning model for classification, so that adversarial examples among them are correctly identified, completing the defense against adversarial examples.
The invention trains in an unsupervised manner, following the idea of the zero-sum game from game theory: the generator model G and the discriminator model D play against each other continuously so that G learns the distribution of the data; in image generation, for example, a trained G can produce a realistic image from a segment of random numbers. Continuous training improves the recognition capability of the discriminator D, which in turn, through the game, improves the capability of the generator G. At the same time, the network complexities of the generator and discriminator are kept as similar as possible, so that the mutual training of the two models can approach the maximal effect of the game.
In step (1), the generative adversarial network adopts the zero-sum-game idea, so that the generator is perfected through its continuous game with the discriminator. Therefore, within the structural-complexity constraints of deep networks, the generator and discriminator are built with similar structural complexity so as to achieve as good a training effect as possible. This not only maintains a dynamic balance between the two during training, but also lets the overall structure move faster toward the final Nash equilibrium.
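As a rough illustration of the complexity-matching requirement (the layer sizes below are invented for this sketch and are not taken from the invention), one can compare the parameter counts of candidate generator and discriminator architectures and keep their ratio close to 1:

```python
# Hypothetical sketch: keep the parameter counts of the generator G and
# discriminator D close, per the similar-structural-complexity requirement.
# The layer sizes below are illustrative assumptions only.

def param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))

g_layers = [784, 256, 256, 784]  # G: sample in -> benign sample out
d_layers = [784, 400, 300, 1]    # D: sample in -> benign-confidence out

g_params = param_count(g_layers)
d_params = param_count(d_layers)
ratio = g_params / d_params       # keep this near 1 to balance the game

print(g_params, d_params, round(ratio, 2))
```

If the ratio drifts far from 1, one side of the game dominates and alternating training tends toward the collapse the patent warns about.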
In step (2), the attack methods include the DeepFool attack (which generates a minimum-norm adversarial perturbation by iterative computation), the Jacobian-based Saliency Map Attack (JSMA, which crafts adversarial examples while limiting the l0 norm of the perturbation), the Boundary attack (a decision-based attack that starts from a large adversarial perturbation and then tries to reduce it while remaining adversarial), the universal perturbation attack (which pushes images across classification boundaries and produces a single perturbation that attacks almost any image), the FGSM attack (a single-step attack that exploits the high-dimensional linearity of deep neural networks, perturbing along the sign of the gradient), and the ZOO attack (a zeroth-order optimization attack that optimizes the perturbation using estimated gradients).
The method uses a variety of black-box and white-box adversarial attacks to generate adversarial examples, denoted perturb, for training the GAN model. This compensates for possible blind spots of any single universal perturbation and maximizes the generalization of the adversarial perturbations, so that the finally generated universal inverse perturbation is strong enough.
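As one concrete example of the white-box attacks listed above, a minimal FGSM sketch on a toy logistic-regression model (the weights, input, and epsilon are invented for this illustration; the invention applies such attacks to deep models):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast gradient sign method: x + eps * sign(grad_x of the loss)."""
    p = sigmoid(w @ x + b)         # predicted probability of class 1
    grad_x = (p - y) * w           # cross-entropy gradient w.r.t. the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=4)                        # toy model weights
x = rng.uniform(size=4)                       # a "normal sample"
x_adv = fgsm(x, y=1.0, w=w, b=0.0, eps=0.1)   # adversarial example perturb

print(np.abs(x_adv - x).max())                # perturbation bounded by eps
```

The other listed attacks (DeepFool, JSMA, Boundary, ZOO) follow the same pattern of producing a perturbed copy of each normal sample, differing only in how the perturbation is searched for.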
In step (3), the pre-training is intended to keep the game with the generator as balanced as possible: the discriminator is trained to have a certain ability to distinguish real from fake samples, which avoids model collapse during alternating training. The training of the generator uses only the discriminator's feedback as its criterion; that is, whether a generated sample is good or bad is judged solely by the discriminator's evaluation of it. If, for some unknown reason, the discriminator starts out inside the confidence range of the adversarial examples, the two networks will keep deceiving each other during training, so that the finally generated samples lack information, have incomplete features, and cannot serve as proper samples.
The specific process of step (4) is as follows:
(4-1) taking the adversarial example perturb as the input of the generator G to obtain a generated benign sample, denoted perturb';
(4-2) taking the generated benign sample perturb' as the input of the discriminator D to obtain confidence feedback, denoted conv_per';
(4-3) using the obtained confidence feedback, calculating the loss function G_loss of the generator according to the following formula:
G_loss = ||conv_per' - G_best||₂
where G_best denotes the confidence feedback that would be obtained in the optimal state, in which every sample produced by the generator is judged to be a benign sample;
(4-4) repeating steps (4-1) to (4-3) G_iter times, where G_iter is the iteration count of generator training;
(4-5) comparing the change in the loss function G_loss during this round of generator training with the change in the loss function D_loss during the previous round of discriminator training, and adjusting the iteration count G_iter for the next round of generator training. The specific adjustment is:
if the change in G_loss is significantly smaller than the change in D_loss, amplify G_iter by the offset coefficient ρ, without considering the deviation in model strength; otherwise, scale G_iter down by the offset coefficient ρ. The criterion for judging whether the change in G_loss is significantly smaller than the change in D_loss varies with the model; for example, the difference between the two changes being greater than 1/10, 1/5, or 1/3 may be used.
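The generator loss of step (4-3) and the G_iter adjustment rule of step (4-5) can be sketched as follows (the confidence values, ρ, and the 1/5 margin are illustrative assumptions, not values fixed by the invention):

```python
import numpy as np

def g_loss(conv_per_prime, g_best):
    """G_loss = ||conv_per' - G_best||_2."""
    return float(np.linalg.norm(conv_per_prime - g_best))

# Confidence feedback for three generated benign samples, and the ideal
# feedback G_best in which every generated sample is judged benign.
conv_per_prime = np.array([0.7, 0.8, 0.6])
g_best = np.ones(3)
loss = g_loss(conv_per_prime, g_best)

def adjust_g_iter(g_iter, delta_g_loss, delta_d_loss, rho=1.5, margin=0.2):
    """Amplify G_iter when G_loss is changing much more slowly than D_loss."""
    if delta_d_loss - delta_g_loss > margin:
        return int(g_iter * rho)          # scale up by offset coefficient rho
    return max(1, int(g_iter / rho))      # otherwise scale down

print(round(loss, 3), adjust_g_iter(10, 0.05, 0.40))
```

The adjustment keeps the network whose loss is stalling in the game for more iterations, which is the balancing mechanism the text describes.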
The specific process of step (5) is as follows:
(5-1) taking a normal sample as the input of the discriminator and obtaining its confidence feedback, denoted conv_good;
(5-2) taking the adversarial example perturb as the input of the discriminator and obtaining its confidence feedback, denoted conv_per;
(5-3) taking the benign sample perturb' generated by feeding the adversarial example perturb through the generator as the input of the discriminator and obtaining its confidence feedback, denoted conv_per';
(5-4) using the obtained confidence feedback for the normal, benign, and adversarial samples, calculating the loss function of the discriminator, denoted D_loss, according to the following formulas:
Dreal_loss = ||conv_good - D_best||₂ + ||conv_per' - D_best||₂
Dfake_loss = ||conv_per - G_worst||₂
D_loss = Dreal_loss + Dfake_loss
where D_best denotes the confidence feedback that would be obtained in the optimal state, in which every real sample is judged benign, and G_worst denotes the confidence feedback that would be obtained in the optimal state, in which every adversarial example is judged to be a fake signal;
(5-5) repeating steps (5-1) to (5-4) D_iter times, where D_iter is the iteration count of discriminator training;
(5-6) comparing the change in D_loss during this round of discriminator training with the change in G_loss during the previous round of generator training, and adjusting the iteration count D_iter for the next round of discriminator training. The specific adjustment is:
if the change in D_loss is significantly smaller than the change in G_loss, amplify D_iter by the offset coefficient ρ, without considering the deviation in model strength; otherwise, scale D_iter down by the offset coefficient ρ. The criterion for judging whether the change in D_loss is significantly smaller than the change in G_loss varies with the model; for example, the difference between the two changes being greater than 1/10, 1/5, or 1/3 may be used.
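Under the same kind of illustrative assumptions (all confidence vectors below are made-up values, not measurements), the discriminator loss of step (5-4) can be computed as:

```python
import numpy as np

def d_loss(conv_good, conv_per_prime, conv_per, d_best, g_worst):
    """D_loss = ||conv_good - D_best||_2 + ||conv_per' - D_best||_2
              + ||conv_per - G_worst||_2."""
    d_real = (np.linalg.norm(conv_good - d_best)
              + np.linalg.norm(conv_per_prime - d_best))
    d_fake = np.linalg.norm(conv_per - g_worst)
    return float(d_real + d_fake)

n = 4
d_best = np.ones(n)    # ideal feedback: every real sample judged benign
g_worst = np.zeros(n)  # ideal feedback: every adversarial sample judged fake

conv_good = np.full(n, 0.9)       # feedback on normal samples
conv_per_prime = np.full(n, 0.8)  # feedback on generated benign samples
conv_per = np.full(n, 0.2)        # feedback on adversarial examples

print(d_loss(conv_good, conv_per_prime, conv_per, d_best, g_worst))
```

Driving Dreal_loss down pulls real and generated-benign confidences toward D_best, while driving Dfake_loss down pushes adversarial-example confidences toward G_worst.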
In step (7), the performance evaluation covers the defense behavior on random samples and the defense success rate over the sample set. The former means randomly selecting an adversarial example and observing how its class label and confidence change after the universal inverse perturbation is added: if the class label changes back to that of the benign sample, the defense effect of the universal inverse perturbation meets the standard, and the higher the confidence, the stronger its defense capability. The defense success rate over the sample set is obtained by adding the universal inverse perturbation to all adversarial examples and measuring the proportion of samples that are successfully defended.
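The two checks of step (7) can be sketched with toy labels (the label values below are invented; "after defense" stands for "add the universal inverse perturbation, then classify"):

```python
# Toy illustration of the step (7) performance indexes. true_labels are the
# benign class labels, adv_labels the labels forced by the attack, and
# def_labels the labels observed after adding the universal inverse
# perturbation. All values are made up for the example.

true_labels = [0, 1, 2, 1, 0]
adv_labels  = [3, 3, 2, 0, 3]
def_labels  = [0, 1, 2, 0, 0]

# Random-sample check: the class label should flip back to the benign one.
i = 0
sample_ok = def_labels[i] == true_labels[i]

# Sample-set check: proportion of adversarial examples whose label is
# restored, i.e. the defense success rate.
restored = sum(d == t for d, t in zip(def_labels, true_labels))
success_rate = restored / len(true_labels)

print(sample_ok, success_rate)
```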
Compared with the prior art, the invention has the following beneficial effects:
The invention exploits the high degree of freedom of a generative adversarial network and, symmetrically to the observation that a universal perturbation can mount adversarial attacks against most images, designs a universal inverse perturbation whose defense covers most adversarial attacks. The universal inverse perturbation is produced by having the high-degree-of-freedom generative adversarial network learn its generation. Consequently, the universal inverse perturbation generated by the invention does not require feedback information from the protected model and can defend indiscriminately against a variety of attack methods, achieving defense against known attacks and some unknown attacks without changing the internal structure of the model.
Drawings
FIG. 1 is a schematic flow chart of the defense method for generating a universal inverse perturbation based on a generative adversarial network according to the present invention;
FIG. 2 is a schematic diagram of the process of defending against an adversarial attack using the universal inverse perturbation of the present invention;
FIG. 3 shows the universal inverse perturbation generated for each adversarial example in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments, which are intended to facilitate understanding of the invention without limiting it in any way.
As shown in FIG. 1, a defense method for generating a universal inverse perturbation based on a generative adversarial network includes the following steps:
Step 1: build a suitable generative adversarial network structure (GAN).
The GAN contains two models, a generator G and a discriminator D. The generator G learns the feature distribution of the universal inverse perturbation through the large number of parameters of a neural network and captures the data distribution of interference-resistant samples; the discriminator D is a recognition and detection model that estimates whether a sample is a clean sample. Continuous training improves the recognition capability of the discriminator D, which in turn, through the game, improves the capability of the generator G. At the same time, the network complexities of the generator and discriminator are kept as similar as possible, so that the mutual training of the two models can approach the maximal effect of the game.
Because the generative adversarial network adopts the zero-sum-game idea from game theory, the generator is perfected through its continuous game with the discriminator. Therefore, within the structural-complexity constraints of deep networks, the generator and discriminator are built with similar structural complexity so as to achieve as good a training effect as possible. This not only maintains a dynamic balance between the two during training, but also lets the overall structure move faster toward the final Nash equilibrium.
Step 2: obtain a relatively comprehensive set of adversarial examples using a variety of attack methods.
In addition to examples generated by the universal perturbation, the adversarial examples include those obtained under various attacks such as DeepFool and the Jacobian-based Saliency Map Attack (JSMA). This ensures the maximal generalization of the adversarial perturbations and makes the finally obtained universal inverse perturbation as strong as possible.
Step 3: complete the pre-training of the discriminator and the generator in the GAN.
The pre-training of the discriminator and generator in the GAN is completed using the real sample data set and the various generated samples. The pre-training is intended to keep the game with the generator as balanced as possible: the discriminator is trained to have a certain ability to distinguish real from fake samples, which avoids model collapse during alternating training. The training of the generator uses only the discriminator's feedback as its criterion; that is, whether a generated sample is good or bad is judged solely by the discriminator's evaluation of it. If, for some unknown reason, the discriminator starts out inside the confidence range of the adversarial examples, the two networks will keep deceiving each other during training, so that the finally generated samples lack information, have incomplete features, and cannot serve as proper samples.
Step 4: train G with a loss function that converges G.
(4-1) The adversarial example perturb is used as the input of the generator to obtain a generated benign sample, denoted perturb'.
(4-2) The generated benign sample is used as the input of the discriminator to obtain confidence feedback, denoted conv_per'.
(4-3) The loss function G_loss of the generator is calculated from the obtained confidence feedback according to formula (1):
G_loss = ||conv_per' - G_best||₂    (1)
Here G_best is the confidence feedback the generated benign samples should obtain when the generator is strong enough, i.e., when all generated samples are judged benign. This drives the generated benign samples toward absolutely benign ones and keeps optimizing their confidence. Correspondingly, when the discriminator has good discriminating capability, a universal inverse perturbation with a poor effect is labeled with the class of a fake signal. The invention trains the generator with the reduction of this loss as the index, so that the generated benign samples approach real benign samples.
(4-4) Steps (4-1) to (4-3) are repeated G_iter times, where G_iter is the iteration count of generator training.
(4-5) The change in G_loss during this round of generator training is compared with the change in D_loss during the previous round of discriminator training to adjust the iteration count G_iter of the next round of generator training.
If the change in G_loss is significantly smaller than the change in D_loss, G_iter should be scaled up by the offset coefficient ρ, without considering the deviation in model strength; otherwise, G_iter should be scaled down by the offset coefficient ρ. This improves the balance of the two networks in the game and prevents model collapse.
Step 5: train D with a loss function that converges D.
(5-1) A normal sample is used as the input of the discriminator to obtain its confidence feedback, denoted conv_good.
(5-2) The adversarial example perturb is used as the input of the discriminator to obtain its confidence feedback, denoted conv_per.
(5-3) The generated benign sample is used as the input of the discriminator to obtain confidence feedback, denoted conv_per'.
(5-4) The loss function of the discriminator, denoted D_loss, is calculated from the obtained confidence feedback for the generated benign samples, the normal samples, and the adversarial examples according to the following formulas:
Dreal_loss = ||conv_good - D_best||₂ + ||conv_per' - D_best||₂
Dfake_loss = ||conv_per - G_worst||₂
D_loss = Dreal_loss + Dfake_loss
Here D_best is the confidence feedback the real samples should obtain when the discriminator is strong enough, i.e., when all real samples are judged benign. G_worst is the confidence feedback the adversarial examples should obtain in the optimal case, i.e., when the discriminator is strong enough that all adversarial signals are judged fake. This makes the discriminating capability of the discriminator joining the game more effective, which in general makes the generated signals more realistic. Dreal_loss is the discriminator's loss caused by the deviation of the confidence from its ideal value when a benign sample is input, and Dfake_loss is the loss caused by the deviation of the confidence from its ideal value when an adversarial example is input.
(5-5) Steps (5-1) to (5-4) are repeated D_iter times, where D_iter is the iteration count of discriminator training.
(5-6) The iteration count D_iter of the next round of discriminator training is adjusted by comparing the change in D_loss during this round of discriminator training with the change in G_loss during the previous round of generator training.
If the change in D_loss is significantly smaller than the change in G_loss, D_iter should be scaled up by the offset coefficient ρ, without considering the deviation in model strength; otherwise, D_iter should be scaled down by the offset coefficient ρ. This improves the balance of the two networks in the game and prevents model collapse.
Step 6: repeat steps 4 and 5 until the iteration upper limit is reached or a sufficiently good network structure is obtained.
Using the iteration count or the sample effect as the criterion, the search continuously iterates toward the optimal universal-inverse-perturbation generator, continuously improving the defense capability of the generated universal inverse perturbation.
Step 7: examine the performance indexes of the generated universal inverse perturbation.
The performance indexes of the universal inverse perturbation mainly comprise the defense behavior on random samples and the defense success rate over the sample set. The former means randomly selecting an adversarial example and observing how its class label and confidence change after the universal inverse perturbation is added: if the class label changes back to that of the benign sample, the defense effect meets the standard, and the higher the confidence, the stronger the defense capability. The defense success rate over the sample set is obtained by adding the universal inverse perturbation to all adversarial examples and measuring the proportion of samples that are successfully defended. A good universal inverse perturbation can defend a large number of adversarial examples, and on the basis of defending against existing attacks it can also defend against some unknown attacks.
As shown in FIG. 2, once the universal inverse perturbation of the present invention has been obtained, it is added to every sample before the sample is input into the classification model, achieving the effect of filtering or counteracting adversarial perturbations and thereby improving the robustness of the model.
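The deployment step in FIG. 2 amounts to a fixed preprocessing applied to every incoming sample; a minimal sketch follows (the perturbation values and the stand-in classifier are assumptions made up for the example, not the trained generator of the invention):

```python
import numpy as np

# Learned offline by the generator G; here a made-up constant offset.
universal_inverse_perturbation = np.full(4, 0.05)

def preprocess(x):
    """Add the universal inverse perturbation, keeping values in [0, 1]."""
    return np.clip(x + universal_inverse_perturbation, 0.0, 1.0)

def classify(x):
    # Stand-in classifier: threshold on the mean intensity.
    return int(x.mean() > 0.5)

x_incoming = np.array([0.4, 0.6, 0.5, 0.5])  # possibly adversarial input
label = classify(preprocess(x_incoming))
print(label)
```

Because the perturbation is fixed, this defense adds only one vector addition per sample and requires no access to the classifier's internals, matching the claim that the model's internal structure is unchanged.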
FIG. 3 shows the universal inverse perturbation obtained by the method of the present invention on the MNIST data set.
The embodiments described above are intended to illustrate the technical solution and advantages of the present invention. It should be understood that they are only specific embodiments and do not limit the invention; any modifications, additions, and equivalents made within the scope of the principles of the present invention shall fall within the protection scope of the invention.

Claims (7)

1. A defense method for generating a universal inverse perturbation based on a generative adversarial network, characterized by comprising the following steps:
(1) building a generative adversarial network comprising a generator model G and a discriminator model D based on convolutional neural networks; the generator G learning the feature distribution of the universal inverse perturbation through the parameters of a neural network and adding the universal inverse perturbation to a sample to generate a benign sample; the discriminator D judging the confidence that a sample generated by the generator is benign;
(2) attacking normal samples with a variety of attack methods to generate adversarial examples containing many types of perturbations;
(3) pre-training the generator G and the discriminator D of the generative adversarial network with normal samples;
(4) establishing a loss function for the generator G and training G;
(5) establishing a loss function for the discriminator D and training D;
(6) repeating steps (4) and (5) until the number of iterations reaches a preset upper limit or the loss functions of the two models reach preset thresholds;
(7) evaluating the performance of the trained generator G and then deploying it: samples to be classified are first processed by the trained generator G and then input to the deep learning model for classification, so that adversarial examples among them are correctly identified, completing the defense against adversarial examples.
2. The defense method according to claim 1, wherein in step (2), the attack methods include the DeepFool attack, the Jacobian-based Saliency Map Attack, the Boundary attack, the universal perturbation attack, the FGSM attack, and the ZOO attack.
3. The defense method for generating a universal inverse perturbation based on generative adversarial training according to claim 1, wherein step (4) comprises:
(4-1) feeding the adversarial sample perturb into the generative model G to obtain a generated benign sample, denoted perturb';
(4-2) feeding the generated benign sample perturb' into the discriminative model D to obtain confidence feedback, denoted conv_per';
(4-3) computing the loss function G_loss of the generative model from the obtained confidence feedback as
G_loss = ||conv_per' - G_best||²
where G_best denotes the confidence feedback that would be obtained in the optimal state, in which every sample produced by the generative model is judged to be benign;
(4-4) repeating steps (4-1) to (4-3) for the iteration count G_iter of generative-model training;
(4-5) comparing the change of the loss function G_loss during the current generative-model training with the change of the loss function D_loss during the preceding discriminative-model training, and adjusting the iteration count G_iter for the next round of generative-model training.
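The generator loss of step (4-3) is a squared L2 distance between the discriminator's confidence feedback and the target feedback G_best. A minimal numpy sketch (the batch shape and the choice G_best = 1 per sample are illustrative assumptions):

```python
import numpy as np

def generator_loss(conv_per_prime: np.ndarray, g_best: np.ndarray) -> float:
    """G_loss = ||conv_per' - G_best||²: distance between the discriminator's
    confidence on generated benign samples and the feedback expected when
    every generated sample is judged benign."""
    return float(np.sum((conv_per_prime - g_best) ** 2))

# Confidence feedback for a batch of four generated samples.
conv_per_prime = np.array([0.9, 0.8, 1.0, 0.7])
g_best = np.ones(4)  # optimal state: all generated samples judged benign
print(generator_loss(conv_per_prime, g_best))
```

Driving this loss to zero means the discriminator can no longer distinguish the generated benign samples from real ones.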
4. The defense method for generating a universal inverse perturbation based on generative adversarial training according to claim 3, wherein in step (4-5) the iteration count G_iter for the next round of generative-model training is adjusted as follows:
if the change in G_loss is significantly smaller than the change in D_loss, G_iter is enlarged by the offset coefficient ρ to account for the disparity in model strength; otherwise, G_iter is reduced by the offset coefficient ρ.
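The adjustment rule of claim 4 can be sketched as follows; the value of the offset coefficient ρ and the threshold used for "significantly smaller" are illustrative assumptions, since the claim does not fix them:

```python
def adjust_g_iter(delta_g_loss: float, delta_d_loss: float,
                  g_iter: int, rho: float = 1.5, margin: float = 0.5) -> int:
    """If G_loss changed significantly less than D_loss, enlarge G_iter by the
    offset coefficient rho; otherwise reduce it (claim 4). "Significantly
    smaller" is modelled here as |ΔG_loss| < margin * |ΔD_loss|."""
    if abs(delta_g_loss) < margin * abs(delta_d_loss):
        return max(1, int(g_iter * rho))
    return max(1, int(g_iter / rho))

print(adjust_g_iter(0.1, 1.0, 10))  # generator lagging -> more iterations
print(adjust_g_iter(1.0, 0.1, 10))  # generator ahead -> fewer iterations
```

Clamping at one iteration keeps both players training every round, which matches the alternating scheme of step (6).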
5. The defense method for generating a universal inverse perturbation based on generative adversarial training according to claim 1, wherein step (5) comprises:
(5-1) feeding a normal sample into the discriminative model to obtain its confidence feedback, denoted conv_good;
(5-2) feeding the adversarial sample perturb into the discriminative model to obtain its confidence feedback, denoted conv_per;
(5-3) feeding the benign sample perturb', produced by passing the adversarial sample perturb through the generative model, into the discriminative model to obtain confidence feedback, denoted conv_per';
(5-4) using the confidence feedback obtained for the normal, benign, and adversarial samples, computing the loss function of the discriminative model, denoted D_loss, as
Dreal_loss = ||conv_good - D_best||² + ||conv_per' - D_best||²
Dfake_loss = ||conv_per - G_worst||²
D_loss = Dreal_loss + Dfake_loss
where D_best denotes the confidence feedback that would be obtained in the optimal state, in which every real sample is judged to be benign, and G_worst denotes the confidence feedback that would be obtained in the optimal state, in which every adversarial sample is judged to be fake;
(5-5) repeating steps (5-1) to (5-4) for the iteration count D_iter of discriminative-model training;
(5-6) comparing the change of D_loss during the current discriminative-model training with the change of G_loss during the preceding generative-model training, and adjusting the iteration count D_iter for the next round of discriminative-model training.
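The discriminator loss of step (5-4) combines two squared L2 terms. A numpy sketch (the scalar targets D_best = 1 for "benign" and G_worst = 0 for "fake" are illustrative assumptions):

```python
import numpy as np

def sq_l2(a, b) -> float:
    """Squared L2 distance ||a - b||²; b may be a broadcastable scalar target."""
    return float(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def discriminator_loss(conv_good, conv_per, conv_per_prime, d_best, g_worst) -> float:
    """D_loss of step (5-4): normal samples (conv_good) and generated benign
    samples (conv_per') are pulled toward the benign target D_best, while raw
    adversarial samples (conv_per) are pulled toward the fake target G_worst."""
    d_real = sq_l2(conv_good, d_best) + sq_l2(conv_per_prime, d_best)
    d_fake = sq_l2(conv_per, g_worst)
    return d_real + d_fake

# Illustrative batch of two confidence scores per feedback vector.
print(discriminator_loss([0.9, 1.0], [0.2, 0.0], [0.8, 1.0], 1.0, 0.0))
```

Note the asymmetry with the generator: the discriminator wants conv_per' near D_best only for real/benign inputs, while the generator separately pushes conv_per' toward G_best.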
6. The defense method for generating a universal inverse perturbation based on generative adversarial training according to claim 5, wherein in step (5-6) the iteration count D_iter for the next round of discriminative-model training is adjusted as follows:
if the change in D_loss is significantly smaller than the change in G_loss, D_iter is enlarged by the offset coefficient ρ to account for the disparity in model strength; otherwise, D_iter is reduced by the offset coefficient ρ.
7. The defense method for generating a universal inverse perturbation based on generative adversarial training according to claim 1, wherein in step (7) the performance indices evaluated include the defense behavior on random samples and the defense success rate on the sample set; the defense behavior on random samples is assessed by randomly selecting an adversarial sample and observing how its class label and confidence change after the universal inverse perturbation is added: if the class label changes to that of the benign sample, the universal inverse perturbation achieves the intended defense, and the higher the confidence, the stronger its defensive capability; the defense success rate on the sample set is obtained by adding the universal inverse perturbation to all adversarial samples and measuring the proportion of samples for which the defense succeeds.
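The sample-set metric of claim 7 is simply the fraction of adversarial samples that the downstream classifier labels correctly once G has added the universal inverse perturbation. A sketch with scalar stand-ins (the toy generator, classifier, and data are assumptions for illustration only):

```python
def defense_success_rate(adversarial_samples, true_labels, generator, classifier) -> float:
    """Add the universal inverse perturbation (via G) to every adversarial
    sample and return the proportion classified correctly afterwards."""
    hits = sum(int(classifier(generator(x)) == y)
               for x, y in zip(adversarial_samples, true_labels))
    return hits / len(true_labels)

# Toy setup: G cancels a known adversarial offset; classifier is a sign test.
generator = lambda x: x - 1.0
classifier = lambda x: int(x > 0)

rate = defense_success_rate([2.0, 0.5, 3.0], [1, 0, 1], generator, classifier)
print(rate)
```

A rate of 1.0 would mean every adversarial sample is recovered to its correct class after the inverse perturbation is applied.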
CN201911183790.0A 2019-11-27 2019-11-27 Defense method for generating general inverse disturbance based on generative confrontation Pending CN110969242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911183790.0A CN110969242A (en) 2019-11-27 2019-11-27 Defense method for generating general inverse disturbance based on generative confrontation

Publications (1)

Publication Number Publication Date
CN110969242A 2020-04-07

Family

ID=70031867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911183790.0A Pending CN110969242A (en) 2019-11-27 2019-11-27 Defense method for generating general inverse disturbance based on generative confrontation

Country Status (1)

Country Link
CN (1) CN110969242A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287943A (en) * 2020-09-28 2021-01-29 北京航空航天大学 Anti-attack defense method based on image enhancement technology
CN113723358A (en) * 2021-09-15 2021-11-30 中国电子科技集团公司第三十六研究所 Countermeasure signal detection method and device based on generation of countermeasure network and electronic equipment
CN114724014A (en) * 2022-06-06 2022-07-08 杭州海康威视数字技术股份有限公司 Anti-sample attack detection method and device based on deep learning and electronic equipment
CN115719085A (en) * 2023-01-10 2023-02-28 武汉大学 Deep neural network model inversion attack defense method and equipment

Similar Documents

Publication Publication Date Title
CN110969242A (en) Defense method for generating general inverse disturbance based on generative confrontation
CN111163472B (en) Signal identification attack defense method based on generative countermeasure network
CN111680292B (en) High-concealment general disturbance-based countering sample generation method
CN111354017B (en) Target tracking method based on twin neural network and parallel attention module
CN111047006B (en) Dual generation network-based anti-attack defense model and application
CN113554089B (en) Image classification countermeasure sample defense method and system and data processing terminal
CN111753881A (en) Defense method for quantitatively identifying anti-attack based on concept sensitivity
CN114549933A (en) Countermeasure sample generation method based on target detection model feature vector migration
CN113505855A (en) Training method for anti-attack model
CN110647916A (en) Pornographic picture identification method and device based on convolutional neural network
CN114078276A (en) Face living body detection method with condition-to-immunity domain generalization and network model architecture
CN113469085B (en) Face living body detection method and device, electronic equipment and storage medium
CN114049537A (en) Convergence neural network-based countermeasure sample defense method
Guo et al. A white-box false positive adversarial attack method on contrastive loss-based offline handwritten signature verification models
Shiqerukaj et al. Fusion of face demorphing and deep face representations for differential morphing attack detection
Chen et al. Latent regularized generative dual adversarial network for abnormal detection
CN113033345A (en) V2V video face recognition method based on public feature subspace
CN112200075A (en) Face anti-counterfeiting method based on anomaly detection
CN114972783A (en) Countermeasure sample generation method for enhancing gradient low-frequency information and application thereof
Sang et al. A biologically-inspired top-down learning model based on visual attention
CN116188439A (en) False face-changing image detection method and device based on identity recognition probability distribution
CN110378414A (en) The personal identification method of multi-modal biological characteristic fusion based on evolution strategy
CN108629311A (en) A kind of action identification method based on biological pulsation
CN110502996B (en) Dynamic identification method for fuzzy finger vein image
CN114913607A (en) Finger vein counterfeit detection method based on multi-feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200407)