CN116883736A - Challenge defense method based on difficulty guiding variable attack strategy - Google Patents
- Publication number
- CN116883736A (application number CN202310831043.3A)
- Authority
- CN
- China
- Prior art keywords
- challenge
- sample
- target network
- image
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a challenge defense method based on a difficulty-guided variable attack strategy. First, a difficulty threshold ρ_i for an image is determined from the classification loss function L(f_θ(x_i), y_i) of the image x_i; then, according to the difficulty threshold ρ_i, the number of attack steps I_i and the maximum perturbation intensity ε_i of the attack strategy S_i are adjusted dynamically. The attack strategy thus improves the generation of adversarial samples without depending on fixed parameters, so that, from a spatial-distribution point of view, each sample makes a consistent contribution to the robustness of the target network, and attack information can be better learned to enhance that robustness. At the same time, the difficulty threshold ρ_i of the present invention increases with the number of training epochs t, so that the difficulty of the adversarial samples used for adversarial training grows as training proceeds, making the robustness of the target network converge and approach the robust boundary. In addition, the invention eliminates misclassified images as outliers, which reduces the negative effect of misclassification on the overall improvement of the robustness of the target network and preserves the original data structure as much as possible, thereby reducing the attenuation of the classification accuracy of the target network.
Description
Technical Field
The invention belongs to the technical field of adversarial defense, and particularly relates to a challenge defense method based on a difficulty-guided variable attack strategy.
Background
Deep neural networks exhibit excellent performance in academic and industrial fields; however, they are easily misled by adversarial samples, which are created by introducing almost imperceptible perturbations into benign images. In recent years, many studies have focused on the generation of adversarial samples, and several practical applications of deep networks have proven vulnerable to them, such as image classification, object detection, and neural machine translation. The sensitivity of deep networks to adversarial samples raises concerns about artificial-intelligence security and presents new challenges for the deployment of deep learning.
The goal of adversarial defense methods is to mitigate the vulnerability of existing deep-learning target networks under attack. Existing methods for countering adversarial samples fall broadly into three types: preprocessing methods, improved neural-network structures, and target-network enhancement using external information. Preprocessing methods aim to improve the robustness of the target network by applying data enhancement or filtering techniques during training; improving the neural-network structure involves modifying the architecture or training method to increase robustness; target-network enhancement using external information uses external target networks or knowledge to strengthen the target network.
At present, adversarial training is considered the most effective method for improving the robustness of a deep-learning target network, and belongs to the class of methods that enhance the target network using external information. Adversarial training enhances the target network by incorporating adversarial examples (Adversarial Examples, AEs) generated by attack methods, such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), into the training data.
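The two attack methods named above can be sketched in a few lines. The following NumPy toy uses a logistic model standing in for the target network; all function names and numeric values are illustrative assumptions, not the patent's code:

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Cross-entropy loss of a toy logistic model p = sigmoid(w.x) and its gradient w.r.t. the input x."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_x = (p - y) * w          # dL/dx for this toy model
    return loss, grad_x

def fgsm(w, x, y, eps):
    """Fast Gradient Sign Method: a single signed-gradient step of size eps."""
    _, g = loss_and_grad(w, x, y)
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)   # keep a valid image range

def pgd(w, x, y, eps, alpha, steps):
    """Projected Gradient Descent: iterate small signed steps, projecting back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(w, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)     # project into the l_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)             # keep a valid image range
    return x_adv
```

Both attacks raise the loss of the perturbed sample relative to the clean one while keeping the perturbation within the ε budget, which is exactly the property adversarial training exploits.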
A common phenomenon faced by adversarial training is that, while it enhances robustness to adversarial samples, it typically reduces standard accuracy on natural, non-adversarial test data. This trade-off between robustness and accuracy is an important concern in adversarial learning. It not only affects the utility of existing methods, but also highlights the competing objectives within adversarial training caused by the inconsistent distributions of adversarial and natural data, which significantly impact the training process and create considerable difficulty. Many existing studies have attempted to explain the potential causes of this phenomenon from the perspective of the training phase, including sharp loss landscapes, gradient masking, and so forth. From the perspective of adversarial-sample generation, two major factors contribute to the problem. First, the individual variability of clean samples means that the same attack strategy generates different adversarial samples, which make different contributions to the robustness of the target network during adversarial training (aggressive data lies closer to the decision boundary, protected data farther from it). Second, the introduction of adversarial perturbations destroys the basic structure of the original data, thereby affecting the accuracy of the target network. Addressing these two factors is therefore critical to improving both robustness and accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an adversarial defense method based on a difficulty-guided variable attack strategy, so that each clean sample contributes uniformly to the robustness of the deep neural network and attack information is better learned to enhance that robustness, while at the same time damage to the original data structure of clean samples is alleviated, improving the accuracy of the deep neural network.
In order to achieve the above object, the challenge defense method based on the difficulty guiding variable attack strategy of the present invention is characterized by comprising the steps of:
(1) Determine the maximum number of training epochs T, with the initial epoch t = 1
(2) Difficulty-guided adversarial sample generation
2.1) Generating a sample-dependent attack strategy
In the training data set, take a batch of n images. For the i-th image x_i, given the initial attack strategy S_i^0 = (I^0, ε^0) of a standard adversarial attack, generate the initial adversarial sample x_i^in = g(x_i; S_i^0, θ) by the standard adversarial attack, where g(·) is the standard adversarial-sample generator, δ is the perturbation, θ is the trainable parameter of the target network (a deep neural network designed for image classification), I^0 denotes the initial number of attack steps, and ε^0 denotes the initial maximum perturbation intensity;
Then the attack strategy S_i = (I_i, ε_i) associated with image x_i is generated. When the classification result of the initial adversarial sample x_i^in is equal to the classification result of the clean image x_i, the attack strategy S_i is:
When the classification result of the initial adversarial sample x_i^in is not equal to the classification result of the clean image x_i, the attack strategy S_i is:
where I_i is the number of attack steps, ε_i is the maximum perturbation intensity, f_θ(·) denotes the predicted probability value that the target network with trainable parameter θ assigns to an input sample, K_I denotes the upper limit of the selectable number of attack steps in the attack strategy, K_ε denotes the upper limit of the selectable maximum perturbation intensity in the attack strategy, clip(·, min, max) denotes limiting the value of the variable · to the range [min, max], and ρ_i is the difficulty threshold (the larger the difficulty threshold, the harder the image), determined from the classification loss function L(f_θ(x_i), y_i) of image x_i:
where β and γ are scaling weights used to ensure that the difficulty threshold ρ_i satisfies:
2.2) Generating the adversarial sample
The sample-dependent adversarial sample x'_i of image x_i is generated as follows:
(3) Training of target network
Input image x_i into the target network for classification to obtain its classification result. If the classification result is not equal to the true class label y_i of image x_i, replace the adversarial sample x'_i with the image x_i itself, i.e., discard x'_i and use x_i as the adversarial sample x'_i; then use the n adversarial samples x'_i to update the trainable parameter θ of the target network;
Then take another batch of n images from the training data set, generate n adversarial samples x'_i according to steps 2.1) and 2.2), and update the trainable parameter θ of the target network; the target network is updated continuously in this way until all images in the training data set have been taken out, completing the training for this epoch;
(4) Judge whether the number of training epochs t is equal to the maximum number of training epochs T. If so, the training of the target network is complete; otherwise set t = t + 1, return to step (2), and again take out images from the training data set in batches to generate adversarial samples and train the target network.
The purpose of the invention is realized in the following way:
the invention relates to a challenge defense method based on a difficulty guiding variable attack strategy, which comprises the following steps of firstly, according to an image x i Class loss function of (2)Determining a difficulty threshold ρ for an image i Then according to the difficulty threshold value rho of the image i Dynamically adjusting attack strategy->Number of attack steps I i And maximum disturbance strength e i Attack strategy->The generation of the resistance samples is improved without depending on fixed parameters, each sample has a consistent contribution to the robustness of the target network, i.e. the target network, from a spatial distribution point of view, and attack information can be better learned to enhance the robustness of the target network. At the same time, the difficulty threshold ρ of the present invention i Will increase according to the increase of the training times t, so that the difficulty of the countermeasure sample for countermeasure training needs to be increased along with the progress of trainingThe lines are increasing to converge and approach the robustness of the target network to the robust boundary. In addition, the invention eliminates images which are misclassified as outliers, thus reducing the negative effect of misclassification on the overall improvement of the robustness of the target network, and maintaining the original data structure as much as possible, thereby reducing the attenuation of the classification accuracy of the target network.
Drawings
FIG. 1 is a schematic diagram of the process of countermeasure training;
FIG. 2 is a flow chart of one embodiment of the challenge defense method of the present invention based on a difficulty guided variable attack strategy;
FIG. 3 is a schematic diagram of one embodiment of the challenge defense method based on the difficulty guided variable attack strategy of the present invention;
FIG. 4 is a schematic diagram of difficulty threshold versus distance.
Detailed Description
The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings to provide those skilled in the art with a better understanding of the invention. It is to be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
The standard adversarial training (AT) method is defined as a min–max optimization problem, in which the objective minimizes the target-network loss over all examples while maximizing the loss of the worst-case adversarial sample within the perturbation set. The effectiveness of adversarial training generally depends on the attack strength of the generated adversarial samples. Although current defense approaches have made significant progress in improving robustness, most still use fixed parameters to generate adversarial samples, e.g., PGD (Projected Gradient Descent) or its variants, making it difficult to control the attack strength of the generated samples. Some studies explore different attack strategies for improving robustness in different training phases, such as Curriculum Adversarial Training (CAT), Dynamic Adversarial Training (DART), and Friendly Adversarial Training (FAT). These approaches aim to enhance the robustness of the target network against adversarial attacks. They use manually designed metrics to assess the difficulty of adversarial samples, but do not consider the potential advantages of more customized and adaptive adversarial-sample generation. Methods that control the attack strength of adversarial samples through manually designed attack strategies therefore not only require expert knowledge, but also yield limited improvements in robustness. Meanwhile, when updating target-network parameters during adversarial training, most prior-art methods treat the generated adversarial samples uniformly, without distinguishing their individual characteristics. Furthermore, from a temporal point of view, existing methods do not consider the evolving characteristics of adversarial samples across training phases; these limitations seriously affect the effectiveness of the training process.
Learnable attack strategies automatically generate a sample-based attack strategy through adversarial learning, so as to capture information specific to each sample and overcome the influence of statistical differences between samples. However, they do not consider two underlying problems of adversarial training, and therefore do not provide a guided attack strategy: 1) clean-sample variability causes the same attack strategy to generate different adversarial examples; 2) adversarial perturbations damage the underlying data structure.
To solve the above problems, the invention innovatively proposes a new adversarial training framework that integrates the concept of a "difficulty-guided attack strategy". The invention aims to improve the generation of adversarial samples by dynamically adjusting the attack parameters according to the difficulty of each sample, instead of relying on fixed parameters. Not all adversarial samples are equally important in adversarial training. Some data points may be geometrically distant from class boundaries, making them easy to classify; conversely, other data points may lie close to class boundaries, making them difficult to classify. This patent defines the difficulty of a sample by its likelihood of being misclassified: samples more prone to misclassification are considered more difficult, and they tend to lie close to class boundaries.
As shown in fig. 1, the target network is assumed to have an exact robust boundary θ*, and the course of adversarial training can be regarded as the target network f continuously approximating this boundary. Therefore, the difficulty of the samples used for adversarial training must increase as training proceeds, so that the robustness of the target network converges and approaches the robust boundary. From a spatial-distribution perspective, the invention expects each sample to contribute consistently to the robustness of the target network, in order to examine the direct impact of the attack method on the target network and to better learn the attack information to enhance that robustness. The proposed method uses the predicted value of the target network to represent the strength of a sample indirectly, and obtains a sample-based attack strategy from the constraint conditions. In addition, to alleviate damage to the original data structure, the invention filters samples according to the classification result of the clean samples. By using the two types of constraints constructed from the temporal and spatial-distribution perspectives, the invention directly learns the influence of the attack method on the target network rather than the indirect influence related to the sample distribution, and improves the generation of adversarial examples to enhance the robustness and accuracy of the DNN.
1. Rethinking the adversarial training process
First, we re-examine the relationship between adversarial training and standard training. Let θ and L denote the trainable parameters and the loss function of the DNN f, respectively. Given input samples x_i and their associated labels y_i, a C-class dataset D = {(x_i, y_i)}_{i=1}^N can be constructed, where y_i ∈ {0, 1, ..., C − 1}. In the context of natural training, many machine learning tasks can be formulated as the optimization problem:
min_θ E_{(x_i, y_i) ∈ D} [ L(f_θ(x_i), y_i) ]     (1)
standard trainingThe main objective of (a) is to obtain a neural network with minimal risk of misclassification experience in the face of inputs from natural data distribution. However, standard trained target networks exhibit weak robustness, making them vulnerable to attack by the antagonistic example, which is documented in the relevant literature. Fight attacks adds a disturbance delta to the input that is imperceptible to humans i To deceive DNN, generate an antagonistic example x i +δ i The objectives are generally as follows:
wherein f θ (x i +δ i ) Representing the output of the network and,is a loss function. E is +.>Limitation of norms. { delta i ∈Δ:||δ i || p And E is less than or equal to, wherein p can be 1,2 and infinity.
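The norm constraint above is typically enforced by projecting the perturbation back into the allowed set Δ after each attack step. A minimal NumPy sketch of that projection for the two common cases, p = 2 and p = ∞ (illustrative, not the patent's implementation):

```python
import numpy as np

def project_lp(delta, eps, p):
    """Project a perturbation delta onto the l_p ball {d : ||d||_p <= eps}, for p in {2, inf}.
    This is the step that keeps delta inside the constraint set of equation (2)."""
    if p == np.inf:
        return np.clip(delta, -eps, eps)                       # l_inf: clamp each coordinate
    if p == 2:
        norm = np.linalg.norm(delta)
        return delta if norm <= eps else delta * (eps / norm)  # l_2: rescale onto the sphere
    raise ValueError("only p = 2 and p = inf are sketched here")
```

A perturbation already inside the ball is returned unchanged; one outside is mapped to the nearest point on the ball's surface.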
Adversarial training is an effective method to increase robustness by training the target network on adversarial examples. The main learning goal of standard adversarial training (AT) is to train the neural network to minimize the risk of misclassification of input samples at a predefined perturbation strength. The adversary introduces an adversarial perturbation to each sample during adversarial training, transforming the dataset D into D' = {(x_i + δ_i, y_i)}_{i=1}^N. To mitigate the vulnerability of machine learning target networks to adversarial attacks, conventional adversarial training generally aims to optimize the following objective:
min_θ E_{(x_i, y_i) ∈ D} [ max_{δ_i ∈ Δ} L(f_θ(x_i + δ_i), y_i) ]     (3)
as described above, in the countermeasure training process proposed by the present invention, two types of constraints are involved: 1) From a time perspective, the difficulty of the sample for the challenge training needs to increase gradually as the training proceeds. 2) From the perspective of spatial distribution, the robustness contribution of each sample to the target network should be consistent, and the difficulty of different challenge samples in the same training phase should be consistent. Let h (x' i )∈[0,1]ρ and T e {0, 1..the, T } represent challenge samples x ', respectively' i =x i +δ i Is a threshold of difficulty in countering samples and training times, wherein 1/ρ is E [0,1]Our new expression for an AT can be defined as:
2. Challenge training based on difficulty guidance
The flow chart and schematic diagram of the proposed method are shown in figs. 2 and 3. The invention comprises two key components: difficulty-guided adversarial-sample generation and adversarial training based on a robust convergence rule. The two parts cooperate to improve the robustness of the target network against adversarial examples. The first process generates an adversarial example tailored to each sample's specific features using a difficulty-based attack strategy, while the adversarial training process iteratively updates the target-network parameters and controls the overall difficulty of the samples, ensuring that the target network converges toward the robust boundary. Together, these components constitute a new framework that enhances the robustness of the target network against adversarial attacks.
Specifically, in this embodiment, as shown in fig. 2 and 3, the challenge defense method based on the difficulty guiding variable attack strategy of the present invention includes the following steps:
step S1: initialization of
The maximum training number T is determined, the initial training number t=1.
Step S2: difficulty guided challenge sample generation
Adversarial samples are generated under difficulty guidance. The proposed difficulty-guided, sample-dependent attack-strategy generator produces different strategies for different samples according to the classification performance and robustness of the target network in different training phases. Let S_i denote the attack strategy of x_i, with M the number of attack-strategy parameters, which depends on the attack method used. In HGSD-AT, the difficulty of a sample is defined as the distance of the sample from the classification hyperplane. Adding an adversarial perturbation moves the original sample toward, and beyond, the classification boundary. Thus, as the difficulty of the sample increases, so does the distance moved by the adversarial perturbation. We therefore choose the parameters most strongly correlated with the distance moved to construct the strategy set.
Our method adjusts two key parameters, the number of attack steps I_i ∈ {0, 1, ..., K_I − 1} and the maximum perturbation intensity ε_i ∈ {0, 1, ..., K_ε − 1}, to guide the generation of adversarial samples and increase the robustness of the target network. These two parameters are chosen because they are closely related to the difficulty of the sample and are included in most attack methods; their upper limits are K_I and K_ε, respectively.
To achieve the goal of equation (4), the relationship between the sample difficulty h(·) and the attack strategy S must be obtained from the difficulty difference between the initial adversarial sample and the original sample. First, the larger the parameter values of the attack strategy, the farther the distance between the adversarial sample and the original sample. Second, the generation strategy for all initial adversarial samples is the same, as shown in fig. 4. Thus, when the difficulty change is smaller, a longer distance must be moved to reach the target difficulty, and the corresponding strategy parameters are larger. For adversarial samples far from the decision boundary, the difficulty of the sample needs to be increased; conversely, for samples close to the decision boundary, the difficulty needs to be reduced.
Directly using the distance between a sample and the classification boundary to measure difficulty is overly complex, since we do not need specific distance values. Instead, we can use the predicted probability value as a more practical measure. Next, we explain in detail the process of acquiring a new adversarial sample based on the predicted probability value:
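The predicted probability can stand in for the distance to the decision boundary: the lower the probability the network assigns to the true class, the closer the sample sits to the boundary and the harder it is. A small sketch of such a proxy (the specific mapping below is an illustrative assumption, not the patent's exact formula):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def difficulty(logits, true_class):
    """Difficulty proxy in [0, 1]: one minus the probability assigned to the true class.
    Samples near the classification boundary receive a low true-class probability and
    hence a high difficulty, without computing any explicit distance to the boundary."""
    p = softmax(np.asarray(logits, dtype=float))
    return 1.0 - p[true_class]
```

A confidently classified sample gets a difficulty near 0, while one whose logits are nearly tied across classes gets a difficulty close to 1 − 1/C.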
step S2.1: generating sample-dependent attack strategies
In the training data set, take a batch of n images. For the i-th image x_i, given the initial attack strategy S_i^0 = (I^0, ε^0) of a standard adversarial attack, generate the initial adversarial sample x_i^in = g(x_i; S_i^0, θ) by the standard adversarial attack, where g(·) is the standard adversarial-sample generator, δ is the perturbation, θ is the trainable parameter of the target network (a deep neural network designed for image classification), I^0 denotes the initial number of attack steps, and ε^0 denotes the initial maximum perturbation intensity;
Then the attack strategy S_i = (I_i, ε_i) associated with image x_i is generated. When the classification result of the initial adversarial sample x_i^in is equal to the classification result of the clean image x_i, the attack strategy S_i is:
When the classification result of the initial adversarial sample x_i^in is not equal to the classification result of the clean image x_i, the attack strategy S_i is:
where I_i is the number of attack steps, ε_i is the maximum perturbation intensity, f_θ(·) denotes the predicted probability value that the target network with trainable parameter θ assigns to an input sample, K_I denotes the upper limit of the selectable number of attack steps in the attack strategy, K_ε denotes the upper limit of the selectable maximum perturbation intensity in the attack strategy, clip(·, min, max) denotes limiting the value of the variable · to the range [min, max], and ρ_i is the difficulty threshold (the larger the difficulty threshold, the harder the image), determined from the classification loss function L(f_θ(x_i), y_i) of image x_i:
where β and γ are scaling weights used to ensure that the difficulty threshold ρ_i satisfies:
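The threshold-and-clip logic of step S2.1 can be sketched as follows. The linear form β·loss + γ for ρ_i, the proportional mapping from ρ_i to (I_i, ε_i), and the direction of the adjustment are all assumptions for illustration; the patent's exact formulas are given by its equations, which may differ:

```python
import numpy as np

def difficulty_threshold(cls_loss, beta=0.5, gamma=1.0):
    """Difficulty threshold rho_i from the classification loss of image x_i.
    The linear form beta * loss + gamma, floored at 1 so that 1/rho stays in [0, 1],
    is an assumed stand-in for the patent's scaling-weight formula."""
    return max(1.0, beta * cls_loss + gamma)

def attack_strategy(rho, attack_failed, I0=4, eps0=4, K_I=10, K_eps=8):
    """Pick (number of attack steps I_i, perturbation level eps_i) for one sample.
    Assumed direction: if the initial attack failed (the adversarial sample is still
    classified like the clean image), strengthen the attack in proportion to rho;
    otherwise weaken it. Both values are clipped to the selectable ranges,
    mirroring the patent's clip(., min, max)."""
    scale = rho if attack_failed else 1.0 / rho
    I_i = int(np.clip(round(I0 * scale), 1, K_I))
    eps_i = int(np.clip(round(eps0 * scale), 1, K_eps))
    return I_i, eps_i
```

The clipping guarantees the strategy never leaves the selectable sets {1, ..., K_I} and {1, ..., K_ε}, however large the per-sample loss becomes.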
step S2.2: generating challenge samples
The sample-dependent adversarial sample x'_i of image x_i is generated as follows:
step S3: target network training
Input image x_i into the target network for classification to obtain its classification result. If the classification result is not equal to the true class label y_i of image x_i, replace the adversarial sample x'_i with the image x_i itself, i.e., discard x'_i and use x_i as the adversarial sample x'_i; then use the n adversarial samples x'_i to update the trainable parameter θ of the target network.
Then take another batch of n images from the training data set, generate n adversarial samples x'_i according to steps S2.1 and S2.2, and update the trainable parameter θ of the target network; θ is updated continuously in this way until all images in the training data set have been taken out, completing the training for this epoch.
Step S4: Determine whether the training round t is equal to the maximum number of training rounds T. If so, the training of the target network is complete; otherwise set t = t + 1, return to step S2, and again take images from the training data set in batches to generate adversarial samples and train the target network.
Experimental verification
To evaluate the effectiveness of the invention, experiments were performed on three datasets: CIFAR-10, CIFAR-100, and Tiny ImageNet. Early-stopped PGD-AT was used as the base model for verifying the adversarial defense method of the invention based on the difficulty-guided variable attack strategy. The invention and the base model use the same training settings, including data partitioning, target-network training loss, and training parameters. The invention (HGSD-AT) was compared with several baseline methods: PGD-AT, TRADES, SAT, MART, CAT, DART, FAT, GAIRAT, AWP, LBGAT, and LASAT. In addition, HGSD-AT was compared with two combined defenses that pair LASAT with two existing representative methods, allowing the adversarial defense method of the invention to be assessed against combinations of various prior techniques. Furthermore, CAT, DART, FAT, and LASAT, which apply different attack strategies at different training stages, were selected for a more targeted comparison and analysis. Comparing the sample-dependent method of the invention with sample-independent methods further demonstrates that the sample-dependence concept eliminates the influence of the data distribution on adversarial training, so that the target network can learn the direct influence of the attack method on the network itself.
To reflect the overall improvement across test items, classification results were evaluated by test accuracy, i.e. the proportion of correct predictions among all predictions of the target network. Several adversarial attack techniques, namely FGSM, PGD, and C&W, were selected to test the trained target network. For all attack methods the maximum perturbation intensity was set to 8, the attack step size to 2, and the number of attack steps to 20 under the L-infinity norm; following the standard adversarial-training setup, clean accuracy and robust accuracy were used as evaluation indices. To evaluate the robustness of the target network, a normalized score computed over a set of white-box attacks, called the Average Robustness Score (ARS), was used. ARS measures the success rate of defense against a series of white-box attacks; the higher the value, the better. The attack set for ARS comprises FGSM, PGD, and C&W. The degradation of classification accuracy (D-degree) relative to the original non-robust target network was also calculated.
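Since the text specifies ARS only as a normalized score over the white-box attack set {FGSM, PGD, C&W} without giving the formula, the following minimal sketch assumes the simplest normalization, an unweighted mean of per-attack robust accuracies; the actual normalization used in the patent may differ:

```python
def average_robustness_score(robust_acc):
    """Hypothetical Average Robustness Score (ARS) over the white-box
    attack set named in the text; an unweighted mean is assumed."""
    attacks = ("FGSM", "PGD", "C&W")
    return sum(robust_acc[a] for a in attacks) / len(attacks)
```

A single aggregate like this makes the comparison tables easier to rank: a defense must do well against every attack in the set to score highly, rather than excelling against one attack only.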
Experimental results
TABLE 1
TABLE 2
The experimental results on CIFAR-10 and CIFAR-100 are shown in Tables 1 and 2. Since the comparison methods are all improvements on PGD-AT, the AT result is taken as the benchmark and the differences in test accuracy (Diff.) are reported. The invention (HGSD-AT) shows excellent performance in most attack scenarios and resolves the trade-off between accuracy and robustness, achieving the best robust performance under all attack scenarios. For example, with WRN34-10 as the target network on CIFAR-10, it improves the robustness of the base method PGD-AT by about 20.36% and 20.11% under PGD and C&W attacks, respectively, while clean accuracy drops by only 3.65% relative to the original data, making it the most accurate target network. Compared with the state-of-the-art method AWP, HGSD-AT also achieves excellent performance in Average Robustness Score (ARS), reaching 18.09%. We attribute these improvements to the use of attack strategies generated under difficulty guidance rather than unguided attack strategies.
Furthermore, the robustness improvement of HGSD-AT on CIFAR-100 is more significant than on CIFAR-10, because CIFAR-100 has more classes and more complex attack scenarios. Specifically, compared with LBGAT, the best of the comparison methods on CIFAR-100, the accuracy of HGSD-AT improves by 9.64% under PGD attack and by 12.08% under C&W attack. This shows that the invention can successfully generate adversarial samples that better reflect attack-strategy information.
Compared with other existing defense methods that generate adversarial samples with non-fixed strategies, LASAT is currently the most effective. However, HGSD-AT outperforms LASAT on both the CIFAR-10 and CIFAR-100 datasets: its robustness under PGD and C&W attacks is higher by 18.95% and 18.29% on CIFAR-10, and by 11.62% and 11.61% on CIFAR-100, respectively. We further compared HGSD-AT with combinations of LASAT and methods that use a fixed adversarial-sample generation strategy (TRADES and AWP). HGSD-AT is superior to TRADES+LASAT and AWP+LASAT, improving the robustness score (ARS) by 18.00% and 15.59% on CIFAR-10 and by 13.49% and 8.95% on CIFAR-100, respectively. These results clearly show that HGSD-AT is very effective at improving the robustness of the deep learning model, i.e. the target network, an important step toward addressing the attack vulnerability the target network faces. With performance superior to other existing defense methods, HGSD-AT has the potential to improve the safety and reliability of deep learning methods.
TABLE 3
The invention (HGSD-AT) was also evaluated on Tiny ImageNet, using PreActResNet18 as the target network. Since Tiny ImageNet has more classes than CIFAR-10 and CIFAR-100, defending against adversarial samples becomes more challenging. To evaluate the effectiveness of HGSD-AT, tests were performed on four reference models and the results were compared with previous state-of-the-art methods; the results obtained are shown in Table 3. Notably, HGSD-AT improves both clean accuracy and adversarial robustness on all four reference models, indicating its effectiveness in enhancing robustness against adversarial samples on a challenging dataset.
The invention (HGSD-AT) achieves a significant improvement in robustness, 7.47% and 8.93% higher than PGD-AT and TRADES under C&W attack, and more than a fifth higher than the other two approaches under the same attack. HGSD-AT also shows superior performance under the other attacks compared with existing approaches; for example, under FGSM and PGD attacks its performance exceeds that of TRADES by 12.08% and 10.33%, respectively. The results on Tiny ImageNet verify that HGSD-AT also achieves promising results on a dataset with higher-quality images and more categories.
While the foregoing describes illustrative embodiments of the present invention to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of these embodiments; various changes, insofar as they fall within the spirit and scope of the invention as defined by the appended claims, are to be construed as protected.
Claims (1)
1. An adversarial defense method based on a difficulty-guided variable attack strategy, characterized by comprising the following steps:
(1) Determine the maximum number of training rounds T, with the initial training round t = 1;
(2) Difficulty-guided adversarial sample generation
2.1 Generating a sample-dependent attack strategy)
A batch of n images is taken from the training data set. For the i-th image x_i, given the initial attack strategy (I^0, ε^0) of the standard adversarial attack, an initial adversarial sample x_i^in is generated by the standard adversarial attack, wherein g(·) is the standard adversarial-sample generator together with its perturbation term, θ is the trainable parameter of the target network, which is a deep neural network designed for image classification, I^0 denotes the initial number of attack steps, and ε^0 denotes the initial maximum perturbation intensity;
then, the attack strategy (I_i, ε_i) related to the image x_i is generated: when the classification result f_θ(x_i^in) of the initial adversarial sample x_i^in is equal to the classification result f_θ(x_i) of the clean image x_i, the attack strategy (I_i, ε_i) is:
when the classification result f_θ(x_i^in) of the initial adversarial sample x_i^in is not equal to the classification result f_θ(x_i) of the clean image x_i, the attack strategy (I_i, ε_i) is:
wherein I_i is the number of attack steps, ε_i is the maximum perturbation intensity, f_θ(·) denotes the predictive probability that the target network with trainable parameter θ outputs for an input sample, K_I is the upper limit of the selectable number of attack steps in the attack strategy, K_ε is the upper limit of the selectable maximum perturbation intensity in the attack strategy, and clip(·, min, max) limits the value of · to the range [min, max]; ρ_i is the difficulty threshold, and the larger the difficulty threshold, the greater the difficulty of the image; it is determined from the classification loss of the image x_i:
where β and γ are scaling weights used to ensure that the difficulty threshold ρ_i satisfies:
2.2 Generating an challenge sample)
Based on the sample-dependent attack strategy (I_i, ε_i), the adversarial sample x'_i of the image x_i is generated as follows:
(3) Training of target network
The image x_i is input into the target network for classification, yielding the classification result f_θ(x_i); if the classification result is not equal to the true class label y_i of the image x_i, the adversarial sample x'_i is replaced by the image x_i, i.e. x'_i is discarded and x_i itself is used as the adversarial sample x'_i; the n adversarial samples x'_i are then used to update the trainable parameter θ of the target network;
then another batch of n images is taken from the training data set, n adversarial samples x'_i are generated according to steps 2.1) and 2.2), and the trainable parameter θ of the target network is updated again; the parameter θ is updated continuously in this way until all images in the training data set have been used, completing the current round of training;
(4) Determine whether the training round t is equal to the maximum number of training rounds T; if so, the training of the target network is complete; otherwise set t = t + 1, return to step (2), and again take images from the training data set in batches to generate adversarial samples and train the target network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310831043.3A CN116883736A (en) | 2023-07-07 | 2023-07-07 | Challenge defense method based on difficulty guiding variable attack strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116883736A true CN116883736A (en) | 2023-10-13 |
Family
ID=88265602
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117197589A (*) | 2023-11-03 | 2023-12-08 | Wuhan University | Target classification model countermeasure training method and system |
CN117197589B (*) | 2023-11-03 | 2024-01-30 | Wuhan University | Target classification model countermeasure training method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||