CN116883736A - Challenge defense method based on difficulty guiding variable attack strategy - Google Patents
- Publication number
- CN116883736A (application number CN202310831043.3A)
- Authority
- CN
- China
- Prior art keywords
- challenge
- sample
- target network
- image
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a challenge defense method based on a difficulty-guided variable attack strategy. First, a difficulty threshold ρ_i for an image is determined from the classification loss function L(f_θ(x_i), y_i) of the image x_i; then, according to the difficulty threshold ρ_i, the number of attack steps I_i and the maximum perturbation intensity ε_i of the attack strategy S_i are adjusted dynamically. The attack strategy thus improves the generation of adversarial samples without depending on fixed parameters, so that, from a spatial-distribution point of view, each sample makes a consistent contribution to the robustness of the target network, and attack information can be better learned to enhance that robustness. At the same time, the difficulty threshold ρ_i of the present invention increases with the number of training epochs t, so that the difficulty of the adversarial samples used for adversarial training grows as training proceeds, making the robustness of the target network converge and approach the robust boundary. In addition, the invention eliminates misclassified images as outliers, which reduces the negative effect of misclassification on the overall improvement of the robustness of the target network and preserves the original data structure as much as possible, thereby reducing the attenuation of the classification accuracy of the target network.
Description
Technical Field
The invention belongs to the technical field of adversarial defense, and particularly relates to a challenge defense method based on a difficulty-guided variable attack strategy.
Background
Deep neural networks exhibit excellent performance in academic and industrial fields; however, they are easily misled by adversarial samples, which are created by introducing almost imperceptible perturbations into benign images. In recent years, many studies have focused on the generation of adversarial samples, and several practical applications of deep networks have proven vulnerable to them, such as image classification, object detection, and neural machine translation. The sensitivity of deep networks to adversarial samples raises concerns about artificial-intelligence security and presents new challenges for the deployment of deep learning.
The goal of adversarial defense methods is to mitigate the vulnerability of existing deep-learning target networks under attack. Existing methods for countering adversarial samples fall broadly into three types: preprocessing methods, improved neural-network structures, and target-network enhancement using external information. Preprocessing methods aim to improve the robustness of the target network by applying data enhancement or filtering techniques during training; improving the neural-network structure involves modifying the architecture or training method to increase robustness; target-network enhancement using external information uses external target networks or knowledge to strengthen the target network.
At present, adversarial training is considered the most effective method for improving the robustness of a deep-learning target network, and belongs to the class of methods that enhance the target network using external information. Adversarial training enhances the target network by incorporating adversarial examples (Adversarial Examples, AEs) generated by attack methods, such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), into the training data.
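The two attack methods named above can be sketched in a few lines. The following NumPy toy uses a logistic model standing in for the target network; all function names and numeric values are illustrative assumptions, not the patent's code:

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Cross-entropy loss of a toy logistic model p = sigmoid(w.x) and its gradient w.r.t. the input x."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad_x = (p - y) * w          # dL/dx for this toy model
    return loss, grad_x

def fgsm(w, x, y, eps):
    """Fast Gradient Sign Method: a single signed-gradient step of size eps."""
    _, g = loss_and_grad(w, x, y)
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)   # keep a valid image range

def pgd(w, x, y, eps, alpha, steps):
    """Projected Gradient Descent: iterate small signed steps, projecting back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, g = loss_and_grad(w, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)     # project into the l_inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)             # keep a valid image range
    return x_adv
```

Both attacks raise the loss of the perturbed sample relative to the clean one while keeping the perturbation within the ε budget, which is exactly the property adversarial training exploits.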
A common phenomenon faced by adversarial training is that, while it enhances robustness to adversarial samples, it typically reduces standard accuracy on natural, non-adversarial test data. This trade-off between robustness and accuracy is an important concern in adversarial learning. It not only affects the utility of existing methods, but also highlights the competing objectives within adversarial training caused by the inconsistent distributions of adversarial and natural data, which significantly impact the training process and create considerable difficulty. Many existing studies have attempted to explain the potential causes of this phenomenon from the perspective of the training phase, including sharp loss landscapes, gradient masking, and so forth. From the perspective of adversarial-sample generation, two major factors contribute to the problem. First, the individual variability of clean samples means that the same attack strategy generates different adversarial samples, which make different contributions to the robustness of the target network during adversarial training (aggressive data lies closer to the decision boundary, protected data farther from it). Second, the introduction of adversarial perturbations destroys the basic structure of the original data, thereby affecting the accuracy of the target network. Addressing these two factors is therefore critical to improving both robustness and accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an adversarial defense method based on a difficulty-guided variable attack strategy, so that each clean sample contributes uniformly to the robustness of the deep neural network and attack information is better learned to enhance that robustness, while at the same time damage to the original data structure of clean samples is alleviated, improving the accuracy of the deep neural network.
In order to achieve the above object, the challenge defense method based on the difficulty guiding variable attack strategy of the present invention is characterized by comprising the steps of:
(1) Determine the maximum number of training epochs T, with the initial epoch t = 1
(2) Difficulty-guided adversarial sample generation
2.1) Generating a sample-dependent attack strategy
In the training data set, take a batch of n images. For the i-th image x_i, given the initial attack strategy S_i^0 = (I^0, ε^0) of a standard adversarial attack, generate the initial adversarial sample x_i^in = g(x_i; S_i^0, θ) by the standard adversarial attack, where g(·) is the standard adversarial-sample generator, δ is the perturbation, θ is the trainable parameter of the target network (a deep neural network designed for image classification), I^0 denotes the initial number of attack steps, and ε^0 denotes the initial maximum perturbation intensity;
Then the attack strategy S_i = (I_i, ε_i) associated with image x_i is generated. When the classification result of the initial adversarial sample x_i^in is equal to the classification result of the clean image x_i, the attack strategy S_i is:
When the classification result of the initial adversarial sample x_i^in is not equal to the classification result of the clean image x_i, the attack strategy S_i is:
where I_i is the number of attack steps, ε_i is the maximum perturbation intensity, f_θ(·) denotes the predicted probability value that the target network with trainable parameter θ assigns to an input sample, K_I denotes the upper limit of the selectable number of attack steps in the attack strategy, K_ε denotes the upper limit of the selectable maximum perturbation intensity in the attack strategy, clip(·, min, max) denotes limiting the value of the variable · to the range [min, max], and ρ_i is the difficulty threshold (the larger the difficulty threshold, the harder the image), determined from the classification loss function L(f_θ(x_i), y_i) of image x_i:
where β and γ are scaling weights used to ensure that the difficulty threshold ρ_i satisfies:
2.2) Generating the adversarial sample
The sample-dependent adversarial sample x'_i of image x_i is generated as follows:
(3) Training of target network
Input image x_i into the target network for classification to obtain its classification result. If the classification result is not equal to the true class label y_i of image x_i, replace the adversarial sample x'_i with the image x_i itself, i.e., discard x'_i and use x_i as the adversarial sample x'_i; then use the n adversarial samples x'_i to update the trainable parameter θ of the target network;
Then take another batch of n images from the training data set, generate n adversarial samples x'_i according to steps 2.1) and 2.2), and update the trainable parameter θ of the target network; the target network is updated continuously in this way until all images in the training data set have been taken out, completing the training for this epoch;
(4) Judge whether the number of training epochs t is equal to the maximum number of training epochs T. If so, the training of the target network is complete; otherwise set t = t + 1, return to step (2), and again take out images from the training data set in batches to generate adversarial samples and train the target network.
The purpose of the invention is realized in the following way:
the invention relates to a challenge defense method based on a difficulty guiding variable attack strategy, which comprises the following steps of firstly, according to an image x i Class loss function of (2)Determining a difficulty threshold ρ for an image i Then according to the difficulty threshold value rho of the image i Dynamically adjusting attack strategy->Number of attack steps I i And maximum disturbance strength e i Attack strategy->The generation of the resistance samples is improved without depending on fixed parameters, each sample has a consistent contribution to the robustness of the target network, i.e. the target network, from a spatial distribution point of view, and attack information can be better learned to enhance the robustness of the target network. At the same time, the difficulty threshold ρ of the present invention i Will increase according to the increase of the training times t, so that the difficulty of the countermeasure sample for countermeasure training needs to be increased along with the progress of trainingThe lines are increasing to converge and approach the robustness of the target network to the robust boundary. In addition, the invention eliminates images which are misclassified as outliers, thus reducing the negative effect of misclassification on the overall improvement of the robustness of the target network, and maintaining the original data structure as much as possible, thereby reducing the attenuation of the classification accuracy of the target network.
Drawings
FIG. 1 is a schematic diagram of the process of countermeasure training;
FIG. 2 is a flow chart of one embodiment of the challenge defense method of the present invention based on a difficulty guided variable attack strategy;
FIG. 3 is a schematic diagram of one embodiment of the challenge defense method based on the difficulty guided variable attack strategy of the present invention;
FIG. 4 is a schematic diagram of difficulty threshold versus distance.
Detailed Description
The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings to provide those skilled in the art with a better understanding of the invention. It is to be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
The standard adversarial training (AT) method is defined as a min–max optimization problem, in which the objective minimizes the target-network loss over all examples while maximizing the loss of the worst-case adversarial sample within the perturbation set. The effectiveness of adversarial training generally depends on the attack strength of the generated adversarial samples. Although current defense approaches have made significant progress in improving robustness, most still use fixed parameters to generate adversarial samples, e.g., PGD (Projected Gradient Descent) or its variants, making it difficult to control the attack strength of the generated samples. Some studies explore different attack strategies for improving robustness in different training phases, such as Curriculum Adversarial Training (CAT), Dynamic Adversarial Training (DART), and Friendly Adversarial Training (FAT). These approaches aim to enhance the robustness of the target network against adversarial attacks. They use manually designed metrics to assess the difficulty of adversarial samples, but do not consider the potential advantages of more customized and adaptive adversarial-sample generation. Methods that control the attack strength of adversarial samples through manually designed attack strategies therefore not only require expert knowledge, but also yield limited improvements in robustness. Meanwhile, when updating target-network parameters during adversarial training, most prior-art methods treat the generated adversarial samples uniformly, without distinguishing their individual characteristics. Furthermore, from a temporal point of view, existing methods do not consider the evolving characteristics of adversarial samples across training phases; these limitations seriously affect the effectiveness of the training process.
Learnable attack strategies automatically generate a sample-based attack strategy through adversarial learning, so as to capture information specific to each sample and overcome the influence of statistical differences between samples. However, they do not consider two underlying problems of adversarial training, and therefore do not provide a guided attack strategy: 1) clean-sample variability causes the same attack strategy to generate different adversarial examples; 2) adversarial perturbations damage the underlying data structure.
To solve the above problems, the invention innovatively proposes a new adversarial training framework that integrates the concept of a "difficulty-guided attack strategy". The invention aims to improve the generation of adversarial samples by dynamically adjusting the attack parameters according to the difficulty of each sample, instead of relying on fixed parameters. Not all adversarial samples are equally important in adversarial training. Some data points may be geometrically distant from class boundaries, making them easy to classify; conversely, other data points may lie close to class boundaries, making them difficult to classify. This patent defines the difficulty of a sample by its likelihood of being misclassified: samples more prone to misclassification are considered more difficult, and they tend to lie close to class boundaries.
As shown in fig. 1, the target network is assumed to have an exact robust boundary θ*, and the course of adversarial training can be regarded as the target network f continuously approximating this boundary. Therefore, the difficulty of the samples used for adversarial training must increase as training proceeds, so that the robustness of the target network converges and approaches the robust boundary. From a spatial-distribution perspective, the invention expects each sample to contribute consistently to the robustness of the target network, in order to examine the direct impact of the attack method on the target network and to better learn the attack information to enhance that robustness. The proposed method uses the predicted value of the target network to represent the strength of a sample indirectly, and obtains a sample-based attack strategy from the constraint conditions. In addition, to alleviate damage to the original data structure, the invention filters samples according to the classification result of the clean samples. By using the two types of constraints constructed from the temporal and spatial-distribution perspectives, the invention directly learns the influence of the attack method on the target network rather than the indirect influence related to the sample distribution, and improves the generation of adversarial examples to enhance the robustness and accuracy of the DNN.
1. Rethinking the adversarial training process
First, we re-examine the relationship between adversarial training and standard training. Let θ and L denote the trainable parameters and the loss function of the DNN f, respectively. Given input samples x_i and their associated labels y_i, a C-class dataset D = {(x_i, y_i)}_{i=1}^N can be constructed, where y_i ∈ {0, 1, ..., C − 1}. In the context of natural training, many machine learning tasks can be formulated as the optimization problem:
min_θ E_{(x_i, y_i) ∈ D} [ L(f_θ(x_i), y_i) ]     (1)
standard trainingThe main objective of (a) is to obtain a neural network with minimal risk of misclassification experience in the face of inputs from natural data distribution. However, standard trained target networks exhibit weak robustness, making them vulnerable to attack by the antagonistic example, which is documented in the relevant literature. Fight attacks adds a disturbance delta to the input that is imperceptible to humans i To deceive DNN, generate an antagonistic example x i +δ i The objectives are generally as follows:
wherein f θ (x i +δ i ) Representing the output of the network and,is a loss function. E is +.>Limitation of norms. { delta i ∈Δ:||δ i || p And E is less than or equal to, wherein p can be 1,2 and infinity.
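The norm constraint above is typically enforced by projecting the perturbation back into the allowed set Δ after each attack step. A minimal NumPy sketch of that projection for the two common cases, p = 2 and p = ∞ (illustrative, not the patent's implementation):

```python
import numpy as np

def project_lp(delta, eps, p):
    """Project a perturbation delta onto the l_p ball {d : ||d||_p <= eps}, for p in {2, inf}.
    This is the step that keeps delta inside the constraint set of equation (2)."""
    if p == np.inf:
        return np.clip(delta, -eps, eps)                       # l_inf: clamp each coordinate
    if p == 2:
        norm = np.linalg.norm(delta)
        return delta if norm <= eps else delta * (eps / norm)  # l_2: rescale onto the sphere
    raise ValueError("only p = 2 and p = inf are sketched here")
```

A perturbation already inside the ball is returned unchanged; one outside is mapped to the nearest point on the ball's surface.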
Adversarial training is an effective method to increase robustness by training the target network on adversarial examples. The main learning goal of standard adversarial training (AT) is to train the neural network to minimize the risk of misclassification of input samples at a predefined perturbation strength. The adversary introduces an adversarial perturbation to each sample during adversarial training, transforming the dataset D into D' = {(x_i + δ_i, y_i)}_{i=1}^N. To mitigate the vulnerability of machine learning target networks to adversarial attacks, conventional adversarial training generally aims to optimize the following objective:
min_θ E_{(x_i, y_i) ∈ D} [ max_{δ_i ∈ Δ} L(f_θ(x_i + δ_i), y_i) ]     (3)
as described above, in the countermeasure training process proposed by the present invention, two types of constraints are involved: 1) From a time perspective, the difficulty of the sample for the challenge training needs to increase gradually as the training proceeds. 2) From the perspective of spatial distribution, the robustness contribution of each sample to the target network should be consistent, and the difficulty of different challenge samples in the same training phase should be consistent. Let h (x' i )∈[0,1]ρ and T e {0, 1..the, T } represent challenge samples x ', respectively' i =x i +δ i Is a threshold of difficulty in countering samples and training times, wherein 1/ρ is E [0,1]Our new expression for an AT can be defined as:
2. Challenge training based on difficulty guidance
The flow chart and schematic diagram of the proposed method are shown in figs. 2 and 3. The invention comprises two key components: difficulty-guided adversarial-sample generation and adversarial training based on a robust convergence rule. The two parts cooperate to improve the robustness of the target network against adversarial examples. The first process generates an adversarial example tailored to each sample's specific features using a difficulty-based attack strategy, while the adversarial training process iteratively updates the target-network parameters and controls the overall difficulty of the samples, ensuring that the target network converges toward the robust boundary. Together, these components constitute a new framework that enhances the robustness of the target network against adversarial attacks.
Specifically, in this embodiment, as shown in fig. 2 and 3, the challenge defense method based on the difficulty guiding variable attack strategy of the present invention includes the following steps:
step S1: initialization of
The maximum training number T is determined, the initial training number t=1.
Step S2: difficulty guided challenge sample generation
Adversarial samples are generated under difficulty guidance. The proposed difficulty-guided, sample-dependent attack-strategy generator produces different strategies for different samples according to the classification performance and robustness of the target network in different training phases. Let S_i denote the attack strategy of x_i, with M the number of attack-strategy parameters, which depends on the attack method used. In HGSD-AT, the difficulty of a sample is defined as the distance of the sample from the classification hyperplane. Adding an adversarial perturbation moves the original sample toward, and beyond, the classification boundary. Thus, as the difficulty of the sample increases, so does the distance moved by the adversarial perturbation. We therefore choose the parameters most strongly correlated with the distance moved to construct the strategy set.
Our method adjusts two key parameters, the number of attack steps I_i ∈ {0, 1, ..., K_I − 1} and the maximum perturbation intensity ε_i ∈ {0, 1, ..., K_ε − 1}, to guide the generation of adversarial samples and increase the robustness of the target network. These two parameters are chosen because they are closely related to the difficulty of the sample and are included in most attack methods; their upper limits are K_I and K_ε, respectively.
To achieve the goal of equation (4), the relationship between the sample difficulty h(·) and the attack strategy S must be obtained from the difficulty difference between the initial adversarial sample and the original sample. First, the larger the parameter values of the attack strategy, the farther the distance between the adversarial sample and the original sample. Second, the generation strategy for all initial adversarial samples is the same, as shown in fig. 4. Thus, when the difficulty change is smaller, a longer distance must be moved to reach the target difficulty, and the corresponding strategy parameters are larger. For adversarial samples far from the decision boundary, the difficulty of the sample needs to be increased; conversely, for samples close to the decision boundary, the difficulty needs to be reduced.
Directly using the distance between a sample and the classification boundary to measure difficulty is overly complex, since we do not need specific distance values. Instead, we can use the predicted probability value as a more practical measure. Next, we explain in detail the process of acquiring a new adversarial sample based on the predicted probability value:
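The predicted probability can stand in for the distance to the decision boundary: the lower the probability the network assigns to the true class, the closer the sample sits to the boundary and the harder it is. A small sketch of such a proxy (the specific mapping below is an illustrative assumption, not the patent's exact formula):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def difficulty(logits, true_class):
    """Difficulty proxy in [0, 1]: one minus the probability assigned to the true class.
    Samples near the classification boundary receive a low true-class probability and
    hence a high difficulty, without computing any explicit distance to the boundary."""
    p = softmax(np.asarray(logits, dtype=float))
    return 1.0 - p[true_class]
```

A confidently classified sample gets a difficulty near 0, while one whose logits are nearly tied across classes gets a difficulty close to 1 − 1/C.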
step S2.1: generating sample-dependent attack strategies
In the training data set, take a batch of n images. For the i-th image x_i, given the initial attack strategy S_i^0 = (I^0, ε^0) of a standard adversarial attack, generate the initial adversarial sample x_i^in = g(x_i; S_i^0, θ) by the standard adversarial attack, where g(·) is the standard adversarial-sample generator, δ is the perturbation, θ is the trainable parameter of the target network (a deep neural network designed for image classification), I^0 denotes the initial number of attack steps, and ε^0 denotes the initial maximum perturbation intensity;
Then the attack strategy S_i = (I_i, ε_i) associated with image x_i is generated. When the classification result of the initial adversarial sample x_i^in is equal to the classification result of the clean image x_i, the attack strategy S_i is:
When the classification result of the initial adversarial sample x_i^in is not equal to the classification result of the clean image x_i, the attack strategy S_i is:
where I_i is the number of attack steps, ε_i is the maximum perturbation intensity, f_θ(·) denotes the predicted probability value that the target network with trainable parameter θ assigns to an input sample, K_I denotes the upper limit of the selectable number of attack steps in the attack strategy, K_ε denotes the upper limit of the selectable maximum perturbation intensity in the attack strategy, clip(·, min, max) denotes limiting the value of the variable · to the range [min, max], and ρ_i is the difficulty threshold (the larger the difficulty threshold, the harder the image), determined from the classification loss function L(f_θ(x_i), y_i) of image x_i:
where β and γ are scaling weights used to ensure that the difficulty threshold ρ_i satisfies:
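The threshold-and-clip logic of step S2.1 can be sketched as follows. The linear form β·loss + γ for ρ_i, the proportional mapping from ρ_i to (I_i, ε_i), and the direction of the adjustment are all assumptions for illustration; the patent's exact formulas are given by its equations, which may differ:

```python
import numpy as np

def difficulty_threshold(cls_loss, beta=0.5, gamma=1.0):
    """Difficulty threshold rho_i from the classification loss of image x_i.
    The linear form beta * loss + gamma, floored at 1 so that 1/rho stays in [0, 1],
    is an assumed stand-in for the patent's scaling-weight formula."""
    return max(1.0, beta * cls_loss + gamma)

def attack_strategy(rho, attack_failed, I0=4, eps0=4, K_I=10, K_eps=8):
    """Pick (number of attack steps I_i, perturbation level eps_i) for one sample.
    Assumed direction: if the initial attack failed (the adversarial sample is still
    classified like the clean image), strengthen the attack in proportion to rho;
    otherwise weaken it. Both values are clipped to the selectable ranges,
    mirroring the patent's clip(., min, max)."""
    scale = rho if attack_failed else 1.0 / rho
    I_i = int(np.clip(round(I0 * scale), 1, K_I))
    eps_i = int(np.clip(round(eps0 * scale), 1, K_eps))
    return I_i, eps_i
```

The clipping guarantees the strategy never leaves the selectable sets {1, ..., K_I} and {1, ..., K_ε}, however large the per-sample loss becomes.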
step S2.2: generating challenge samples
The sample-dependent adversarial sample x'_i of image x_i is generated as follows:
step S3: target network training
Input image x_i into the target network for classification to obtain its classification result. If the classification result is not equal to the true class label y_i of image x_i, replace the adversarial sample x'_i with the image x_i itself, i.e., discard x'_i and use x_i as the adversarial sample x'_i; then use the n adversarial samples x'_i to update the trainable parameter θ of the target network.
Then take another batch of n images from the training data set, generate n adversarial samples x'_i according to steps S2.1 and S2.2, and update the trainable parameter θ of the target network; θ is updated continuously in this way until all images in the training data set have been taken out, completing the training for this epoch.
Step S4: Determine whether the training round t is equal to the maximum number of training rounds T. If so, the training of the target network is complete; otherwise set t = t + 1, return to step S2, and again take images from the training data set in batches to generate adversarial samples and train the target network.
Experimental verification
To evaluate the effectiveness of the invention, experiments were performed on three datasets: CIFAR-10, CIFAR-100, and Tiny ImageNet. Early-stopped PGD-AT was used as the base model for verifying the adversarial defense method of the invention based on the difficulty-guided variable attack strategy. The invention and the base model use the same training settings, including data partitioning, target-network training loss, and training parameters. The invention (HGSD-AT) was compared with several baseline methods: PGD-AT, TRADES, SAT, MART, CAT, DART, FAT, GAIRAT, AWP, LBGAT, and LASAT. In addition, HGSD-AT was compared with two combined defenses that pair LASAT with two existing representative methods, allowing the adversarial defense method of the invention to be assessed against combinations of various prior techniques. Furthermore, CAT, DART, FAT, and LASAT, which apply different attack strategies at different training stages, were selected for a more targeted comparison and analysis. Comparing the sample-dependent method of the invention with sample-independent methods further demonstrates that the sample-dependence concept eliminates the influence of the data distribution on adversarial training, so that the target network can learn the direct influence of the attack method on the network itself.
To reflect the overall improvement across test items, classification results were evaluated by test accuracy, i.e. the proportion of correct predictions among all predictions of the target network. Several adversarial attack techniques, namely FGSM, PGD, and C&W, were selected to test the trained target network. For all attack methods the maximum perturbation intensity was set to 8, the attack step size to 2, and the number of attack steps to 20 under the L-infinity norm; following the standard adversarial-training setup, clean accuracy and robust accuracy were used as evaluation indices. To evaluate the robustness of the target network, a normalized score computed over a set of white-box attacks, called the Average Robustness Score (ARS), was used. ARS measures the success rate of defense against a series of white-box attacks; the higher the value, the better. The attack set for ARS comprises FGSM, PGD, and C&W. The degradation of classification accuracy (D-degree) relative to the original non-robust target network was also calculated.
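Since the text specifies ARS only as a normalized score over the white-box attack set {FGSM, PGD, C&W} without giving the formula, the following minimal sketch assumes the simplest normalization, an unweighted mean of per-attack robust accuracies; the actual normalization used in the patent may differ:

```python
def average_robustness_score(robust_acc):
    """Hypothetical Average Robustness Score (ARS) over the white-box
    attack set named in the text; an unweighted mean is assumed."""
    attacks = ("FGSM", "PGD", "C&W")
    return sum(robust_acc[a] for a in attacks) / len(attacks)
```

A single aggregate like this makes the comparison tables easier to rank: a defense must do well against every attack in the set to score highly, rather than excelling against one attack only.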
Experimental results
TABLE 1
TABLE 2
The experimental results on CIFAR-10 and CIFAR-100 are shown in Tables 1 and 2. Since the comparison methods are all improvements on PGD-AT, the AT result is taken as the benchmark and the differences in test accuracy (Diff.) are reported. The invention (HGSD-AT) shows excellent performance in most attack scenarios and resolves the trade-off between accuracy and robustness, achieving the best robust performance under all attack scenarios. For example, with WRN34-10 as the target network on CIFAR-10, it improves the robustness of the base method PGD-AT by about 20.36% and 20.11% under PGD and C&W attacks, respectively, while clean accuracy drops by only 3.65% relative to the original data, making it the most accurate target network. Compared with the state-of-the-art method AWP, HGSD-AT also achieves excellent performance in Average Robustness Score (ARS), reaching 18.09%. We attribute these improvements to the use of attack strategies generated under difficulty guidance rather than unguided attack strategies.
Furthermore, the robustness improvement of HGSD-AT on CIFAR-100 is more significant than on CIFAR-10, because CIFAR-100 has more classes and more complex attack scenarios. Specifically, compared with LBGAT, the best of the comparison methods on CIFAR-100, the accuracy of HGSD-AT improves by 9.64% under PGD attack and by 12.08% under C&W attack. This shows that the invention can successfully generate adversarial samples that better reflect attack-strategy information.
Compared with other existing defense methods that generate adversarial samples with non-fixed strategies, LASAT is currently the most effective. However, HGSD-AT outperforms LASAT on both the CIFAR-10 and CIFAR-100 datasets: its robustness under PGD and C&W attacks is higher by 18.95% and 18.29% on CIFAR-10, and by 11.62% and 11.61% on CIFAR-100, respectively. We further compared HGSD-AT with combinations of LASAT and methods that use a fixed adversarial-sample generation strategy (TRADES and AWP). HGSD-AT is superior to TRADES+LASAT and AWP+LASAT, improving the robustness score (ARS) by 18.00% and 15.59% on CIFAR-10 and by 13.49% and 8.95% on CIFAR-100, respectively. These results clearly show that HGSD-AT is very effective at improving the robustness of the deep learning model, i.e. the target network, an important step toward addressing the attack vulnerability the target network faces. With performance superior to other existing defense methods, HGSD-AT has the potential to improve the safety and reliability of deep learning methods.
TABLE 3
The invention (HGSD-AT) was also evaluated on Tiny ImageNet, using PreActResNet18 as the target network. Since Tiny ImageNet has more classes than CIFAR-10 and CIFAR-100, defending against adversarial samples becomes more challenging. To evaluate the effectiveness of HGSD-AT, tests were performed on four reference models and the results were compared with previous state-of-the-art methods; the results obtained are shown in Table 3. Notably, HGSD-AT improves both clean accuracy and adversarial robustness on all four reference models, indicating its effectiveness in enhancing robustness against adversarial samples on a challenging dataset.
The invention (HGSD-AT) achieves a significant improvement in robustness, 7.47% and 8.93% higher than PGD-AT and TRADES under C&W attack, and more than a fifth higher than the other two approaches under the same attack. HGSD-AT also shows superior performance under the other attacks compared with existing approaches; for example, under FGSM and PGD attacks its performance exceeds that of TRADES by 12.08% and 10.33%, respectively. The results on Tiny ImageNet verify that HGSD-AT also achieves promising results on a dataset with higher-quality images and more categories.
While the foregoing describes illustrative embodiments of the present invention to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of these embodiments; various changes, insofar as they fall within the spirit and scope of the invention as defined by the appended claims, are to be construed as protected.
Claims (1)
1. An adversarial defense method based on a difficulty-guided variable attack strategy, characterized by comprising the following steps:
(1) Determine the maximum number of training rounds T, with the initial training round t = 1;
(2) Difficulty-guided adversarial sample generation
2.1 Generating a sample-dependent attack strategy)
A batch of n images is taken from the training data set. For the i-th image x_i, given the initial attack strategy (I^0, ε^0) of the standard adversarial attack, an initial adversarial sample x_i^in is generated by the standard adversarial attack, wherein g(·) is the standard adversarial-sample generator together with its perturbation term, θ is the trainable parameter of the target network, which is a deep neural network designed for image classification, I^0 denotes the initial number of attack steps, and ε^0 denotes the initial maximum perturbation intensity;
then, the attack strategy (I_i, ε_i) related to the image x_i is generated: when the classification result f_θ(x_i^in) of the initial adversarial sample x_i^in is equal to the classification result f_θ(x_i) of the clean image x_i, the attack strategy (I_i, ε_i) is:
when the classification result f_θ(x_i^in) of the initial adversarial sample x_i^in is not equal to the classification result f_θ(x_i) of the clean image x_i, the attack strategy (I_i, ε_i) is:
wherein I_i is the number of attack steps, ε_i is the maximum perturbation intensity, f_θ(·) denotes the predictive probability that the target network with trainable parameter θ outputs for an input sample, K_I is the upper limit of the selectable number of attack steps in the attack strategy, K_ε is the upper limit of the selectable maximum perturbation intensity in the attack strategy, and clip(·, min, max) limits the value of · to the range [min, max]; ρ_i is the difficulty threshold, and the larger the difficulty threshold, the greater the difficulty of the image; it is determined from the classification loss of the image x_i:
where β and γ are scaling weights used to ensure that the difficulty threshold ρ_i satisfies:
2.2 Generating an challenge sample)
Based on the sample-dependent attack strategy (I_i, ε_i), the adversarial sample x'_i of the image x_i is generated as follows:
(3) Training of target network
The image x_i is input into the target network for classification, yielding the classification result f_θ(x_i); if the classification result is not equal to the true class label y_i of the image x_i, the adversarial sample x'_i is replaced by the image x_i, i.e. x'_i is discarded and x_i itself is used as the adversarial sample x'_i; the n adversarial samples x'_i are then used to update the trainable parameter θ of the target network;
then another batch of n images is taken from the training data set, n adversarial samples x'_i are generated according to steps 2.1) and 2.2), and the trainable parameter θ of the target network is updated again; the parameter θ is updated continuously in this way until all images in the training data set have been used, completing the current round of training;
(4) Determine whether the training round t is equal to the maximum number of training rounds T; if so, the training of the target network is complete; otherwise set t = t + 1, return to step (2), and again take images from the training data set in batches to generate adversarial samples and train the target network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310831043.3A CN116883736A (en) | 2023-07-07 | 2023-07-07 | Challenge defense method based on difficulty guiding variable attack strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116883736A true CN116883736A (en) | 2023-10-13 |
Family
ID=88265602
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117197589A (*) | 2023-11-03 | 2023-12-08 | Wuhan University | Target classification model countermeasure training method and system |
CN117197589B (*) | 2023-11-03 | 2024-01-30 | Wuhan University | Target classification model countermeasure training method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||