CN113111349A - Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning - Google Patents
Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
- Publication number
- CN113111349A CN113111349A CN202110449810.5A CN202110449810A CN113111349A CN 113111349 A CN113111349 A CN 113111349A CN 202110449810 A CN202110449810 A CN 202110449810A CN 113111349 A CN113111349 A CN 113111349A
- Authority
- CN
- China
- Prior art keywords
- target
- neural network
- pruning
- model
- backdoor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013138 pruning Methods 0.000 title claims abstract description 68
- 238000010586 diagram Methods 0.000 title claims abstract description 29
- 238000000034 method Methods 0.000 title claims abstract description 29
- 230000007123 defense Effects 0.000 title claims abstract description 15
- 238000013528 artificial neural network Methods 0.000 claims abstract description 57
- 210000002569 neuron Anatomy 0.000 claims abstract description 51
- 230000004913 activation Effects 0.000 claims abstract description 28
- 230000001419 dependent effect Effects 0.000 claims abstract description 8
- 238000013450 outlier detection Methods 0.000 claims abstract description 7
- 238000012216 screening Methods 0.000 claims abstract description 6
- 238000001514 detection method Methods 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 4
- 238000012795 verification Methods 0.000 claims description 3
- 238000003062 neural network model Methods 0.000 abstract description 5
- 239000011159 matrix material Substances 0.000 description 7
- 238000012549 training Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000002159 abnormal effect Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000013079 data visualisation Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 239000000243 solution Substances 0.000 description 2
- 230000005856 abnormality Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 239000003086 colorant Substances 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000005059 dormancy Effects 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012946 outsourcing Methods 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000009966 trimming Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Virology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a backdoor attack defense method based on thermodynamic diagrams (heat maps), reverse engineering and model pruning, and relates to the field of neural network model pruning. The method comprises: determining a backdoor trigger for each category of a target neural network, comparing the L1 norms of the backdoor triggers with an outlier detection algorithm, and determining the target label and its target label data; performing model inversion on the target category of the target neural network to compute a corresponding data set, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger from the thermodynamic diagram; inputting the target label data into the target neural network in turn, and screening target neurons in the target neural network according to the weights and activation values obtained for the input target label data; and performing model pruning on the target neural network according to the target neurons. The invention can effectively defend against backdoor attacks based on both random triggers and model-dependent triggers.
Description
Technical Field
The invention relates to the field of neural network model pruning, in particular to a backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning.
Background
Recently, a new attack mode, Backdoor Attacks, has been attracting attention. After an attacker trains a model on malicious data carrying a Backdoor Trigger, a backdoor is injected into the model. The backdoored model classifies all benign data correctly, but misclassification occurs when data carrying the backdoor trigger is input. Backdoor attacks are extremely stealthy, which makes attack detection very challenging and poses a considerable risk to resource-limited users who need to outsource the training process.
Thermodynamic diagrams (heat maps, also known as correlation diagrams) are a principal method of data visualization. A user can judge the correlation between variables from the correlation coefficients indicated by the colors of the cells in the diagram: the larger the correlation coefficient, the higher the degree of linear correlation between the variables; the smaller the correlation coefficient, the lower the degree of linear correlation. An attacker can generate a thermodynamic diagram from the target-class data and then determine the optimal position of the backdoor trigger, thereby achieving a stronger attack.
Reverse engineering can infer inputs from known facts and conclusions through reverse analysis techniques. For an attacker, reverse engineering helps analyze information about a target model or its training data set, with the aim of stealing the model or the data; for a defender, it helps predict attacker behavior and design corresponding defenses. In a deep neural network backdoor attack, a defender can apply reverse-engineering thinking to infer a possible backdoor trigger by observing output values and activation values, thereby assisting the formulation of a defense scheme.
Model pruning is a method of model compression: relatively unimportant neurons are pruned from a neural network to obtain a lighter network while preserving its functionality. There are currently two main pruning strategies. The first uses neuron weights to judge importance: neurons with smaller weights usually contribute less to the network, so the low-weight neurons of each layer can be pruned. The second uses neuron activation values: neurons with smaller activation values likewise tend to contribute less, so the low-activation neurons of each layer can be pruned.
Studies have shown that normal data usually leaves the neurons contaminated by backdoor implantation in a dormant state, and only data carrying the backdoor trigger activates them strongly. A backdoor attack can therefore be defended against by pruning these dormant neurons: input normal data, prune neurons in order from low to high activation value (these low-activation neurons are the backdoor-contaminated ones), and stop once the accuracy on test data falls below a threshold. However, such an activation-value-based pruning strategy is limited to backdoor attacks with random triggers and fails against model-dependent triggers (triggers generated from high-weight neurons of the model, triggers generated from neurons with high activation values under normal inputs, and the like).
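For reference, the following is a minimal sketch of such an activation-value-based pruning defense, assuming a PyTorch classifier whose first fully connected layer is exposed as `model.fc1` and a loader of clean data; the layer name, the 97% accuracy bound and the loader are illustrative assumptions, not part of the patent.

```python
import torch

@torch.no_grad()
def activation_based_pruning(model, clean_loader, acc_threshold=0.97, device="cpu"):
    """Prune low-activation neurons of model.fc1 until clean accuracy degrades."""
    model.eval()
    acts = []
    handle = model.fc1.register_forward_hook(
        lambda mod, inp, out: acts.append(out.detach().abs().mean(dim=0)))

    def accuracy():
        correct = total = 0
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            pred = model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
        return correct / total

    baseline = accuracy()                      # hook records per-batch activations
    handle.remove()
    mean_act = torch.stack(acts).mean(dim=0)   # per-neuron mean activation on clean data

    for idx in mean_act.argsort():             # dormant (low-activation) neurons first
        saved = model.fc1.weight.data[idx].clone()
        model.fc1.weight.data[idx].zero_()     # "prune" the neuron by zeroing its weights
        if accuracy() < acc_threshold * baseline:
            model.fc1.weight.data[idx].copy_(saved)  # restore and stop once accuracy drops
            break
    return model
```

Because this ranking only uses clean-data activations, neurons that a model-dependent trigger ties to high-weight or already highly active units are never reached, which is the gap the invention targets.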
Disclosure of Invention
In view of the defects of the prior art, the invention provides a method for defending deep neural networks against backdoor attacks based on thermodynamic diagram, reverse engineering and model pruning techniques, solves the problem of model-dependent backdoor attacks that are difficult to defend against with conventional schemes, and fills a gap in this area.
In order to achieve the purpose, the invention adopts the following technical scheme:
a backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning comprises the following steps:
determining a backdoor trigger for each category of a target neural network, comparing the L1 norms of the backdoor triggers with an outlier detection algorithm, and determining the target label and its target label data;
performing model inversion on the target category of the target neural network to compute a corresponding data set, wherein the data set corresponds to the training set of the target neural network, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger from the thermodynamic diagram;
inputting the target label data into the target neural network in turn, and screening target neurons in the target neural network according to the weights and activation values obtained for the input target label data;
and performing model pruning on the target neural network according to the target neuron.
Preferably, the target tag is determined according to reverse engineering.
Preferably, the method further comprises the steps of carrying out attack success rate detection and model accuracy rate detection on the target neural network, and completing pruning on the target neural network through multiple times of debugging.
Preferably, the backdoor trigger T_t satisfies the following formulas:
δ_{∀→t} ≤ |T_t|;
|T_t| ≪ δ_{∀→i}, i ≠ t;
where δ_{∀→t} is the minimum perturbation required to classify any benign data carrying the backdoor trigger into the target class t, |T_t| is the size of the backdoor trigger, and δ_{∀→i} is the perturbation required to classify data of an uninfected label i into its correct class.
Preferably, the outlier detection algorithm satisfies the following three formulas:
b_i = |a_i − a_{0.5}|;
M = b_{0.5};
δ_i = b_i / M;
where a_i (0 ≤ i ≤ N) denotes the benign data input when reversing a certain trigger Δ_i, and a_{0.5} is the median of these benign data; for each a_i, the absolute deviation b_i between the corresponding data point and the median a_{0.5} is computed, and the median of these absolute deviations is denoted M; δ_i is the anomaly index of the data point.
Preferably, the back door trigger includes a random type and a model-dependent type.
Preferably, a benign validation data set D_valid is input into the target neural network and the calculation results are recorded; after each round of pruning, D_valid is input into the pruned deep neural network again to obtain results D_i (i = 1, 2, 3, …), and the similarity between D_i and the recorded results is noted; the higher the similarity, the better the prediction accuracy of the pruned target neural network.
Preferably, for the attack success rate of the backdoor attack after pruning: after each round of pruning, the rate at which the pruned model misclassifies data carrying the backdoor trigger is tested; this rate is the attack success rate.
Compared with the prior art, the disclosed backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning solves the problem that conventional activation-value-based pruning cannot defend against backdoor attacks that use model-dependent triggers. In addition, because a potential backdoor trigger is reverse-generated before model pruning, the search range for malicious neurons is narrowed and pruning efficiency is effectively improved. The model accuracy and the attack success rate are monitored throughout the pruning process, with appropriate local fine-tuning, so that model accuracy is maintained while the attack success rate is reduced.
Testing shows that the method can effectively defend against backdoor attacks based on both random and model-dependent triggers without affecting the usability of the models.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a flow chart illustrating the reverse recovery of the backdoor trigger and the target label;
FIG. 3 is a flow chart illustrating the determination of the optimal position of the backdoor trigger;
FIG. 4 is a flow chart illustrating the screening of a backdoor model for possible presence of malicious neurons;
FIG. 5 is a flow chart of back door model pruning.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning, which comprises the following steps as shown in figures 1 and 2:
determining a backdoor trigger for each category of the target neural network, comparing the L1 norms of the backdoor triggers with an outlier detection algorithm, and determining the target label and its target label data;
performing model inversion on the target category of the target neural network to compute a corresponding data set, wherein the data set corresponds to the training set of the target neural network, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger from the thermodynamic diagram;
inputting the target label data into the target neural network in turn, and screening target neurons in the target neural network according to the weights and activation values obtained for the input target label data;
and performing model pruning on the target neural network according to the target neuron.
In this embodiment, Step 1: reverse-derive a potential backdoor trigger for each category of the target neural network through reverse engineering, and find the target label of the backdoor attack.
The basic idea of a backdoor attack is to construct a backdoor trigger that makes the target neural network classify non-target-class data carrying the trigger into the target class. The backdoor trigger should satisfy two conditions: first, benign data carrying the backdoor trigger must be classified into the target class, which is the attacker's basic objective; second, the backdoor trigger should be as small as possible in order to avoid detection by the defender.
To satisfy these two conditions, the backdoor trigger T_t should satisfy the following two formulas:
δ_{∀→t} ≤ |T_t|   (1);
|T_t| ≪ δ_{∀→i}, i ≠ t   (2);
In formula (1), δ_{∀→t} is the minimum perturbation required to classify any benign data carrying the backdoor trigger into the target class t, and |T_t| is the size of the backdoor trigger. The formula means that the backdoor trigger capable of classifying any input into the target class cannot be smaller than this minimum perturbation, which is what allows the misclassification attack to succeed.
In formula (2), δ_{∀→i} is the perturbation required to classify data of an uninfected label i into its correct class. The backdoor trigger should be much smaller than this in order to evade detection by the defender.
The specific steps of step 1 are as follows:
the general form of the trigger injection is first defined:
A(x, m, Δ) = x′   (3);
x′_{i,j,c} = (1 − m_{i,j}) · x_{i,j,c} + m_{i,j} · Δ_{i,j,c}   (4);
In formula (3), A(·) is the function that applies the trigger to the original image x. Δ is the trigger pattern, a three-dimensional matrix of pixel color intensities with the same size as the input image. m is a two-dimensional mask matrix that determines how much of the original image the trigger covers. Formula (4) explains m further: when m_{i,j} of a specific pixel (i, j) is 1, the trigger completely overwrites the original color; when m_{i,j} is 0, the original color is left unmodified. Δ and m together determine the shape of the trigger. For a given potential target label y_t of the backdoor attack, the trigger has to classify any input into y_t, yet be small enough to avoid detection. In addition, the L1 norm of m is used to measure the size of the backdoor trigger.
The objective function for generating the backdoor trigger is as follows:
min_{m,Δ} Σ_{x∈X} ℓ(y_t, f(A(x, m, Δ))) + λ · |m|   (5);
In formula (5), f(·) is the prediction function of the target neural network, and ℓ is the loss function measuring classification error, i.e. cross entropy in our experiments. X is the accessible benign data set, and λ is a weight parameter controlling the size of the backdoor trigger: a smaller λ yields a larger backdoor trigger and also a higher misclassification success rate. In the experiments, λ is dynamically adjusted during the optimization with an Adam optimizer to ensure that more than 99% of benign images are successfully misclassified.
After this optimization we obtain, for the target label y_t, the backdoor trigger required to misclassify any other input into y_t, together with its L1 norm. Performing this procedure for every other label of the target neural network yields, for a network with N labels, N potential backdoor triggers Δ_i (0 ≤ i ≤ N) and their L1 norms.
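The following is a minimal sketch of this reverse-generation step for a single candidate target label, assuming a PyTorch classifier that returns logits and a loader of benign images; the sigmoid re-parameterization, the learning rate, and the fixed λ (the description adjusts λ dynamically) are illustrative choices, not prescribed by the patent.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, benign_loader, target_label,
                             epochs=10, lam=0.01, device="cpu"):
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)                # only the trigger is optimized
    x0, _ = next(iter(benign_loader))
    _, c, h, w = x0.shape
    # Optimize unconstrained tensors; sigmoid keeps mask m and pattern delta in [0, 1].
    m_raw = torch.zeros(1, 1, h, w, device=device, requires_grad=True)
    d_raw = torch.zeros(1, c, h, w, device=device, requires_grad=True)
    opt = torch.optim.Adam([m_raw, d_raw], lr=0.1)

    for _ in range(epochs):
        for x, _ in benign_loader:
            x = x.to(device)
            m, delta = torch.sigmoid(m_raw), torch.sigmoid(d_raw)
            x_adv = (1 - m) * x + m * delta                  # formula (4): apply trigger
            y_t = torch.full((x.size(0),), target_label,
                             dtype=torch.long, device=device)
            loss = F.cross_entropy(model(x_adv), y_t) + lam * m.abs().sum()  # formula (5)
            opt.zero_grad()
            loss.backward()
            opt.step()

    with torch.no_grad():
        m, delta = torch.sigmoid(m_raw), torch.sigmoid(d_raw)
    return m, delta, m.abs().sum().item()      # candidate trigger mask, pattern, L1 norm
```

Running this once per label produces the set of candidate triggers and L1 norms that the next step feeds to the outlier detection.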
Then, the backdoor trigger Δ_0 with the smallest L1 norm among these backdoor triggers is found by an outlier detection algorithm; this is the backdoor trigger used by the attacker, and the label corresponding to Δ_0 is the attacker's target label. The algorithm satisfies the following three formulas:
b_i = |a_i − a_{0.5}|;
M = b_{0.5};
δ_i = b_i / M;
In the above formulas, a_i (0 ≤ i ≤ N) denotes the benign data input when reversing a certain trigger Δ_i, and a_{0.5} is the median of these data. First, for each a_i, the absolute deviation b_i between the corresponding data point and the median a_{0.5} is computed, and the median of these absolute deviations is denoted M. δ_i is the anomaly index of the data point, obtained by dividing b_i by M. When the assumed underlying distribution is normal, the anomaly index is normalized with a constant; data points whose anomaly index exceeds a certain threshold are likely outliers, and the labels corresponding to them are likely contaminated. The smaller the value of M, the more likely the corresponding label is to be contaminated.
This calculation is performed for each label to obtain its M value; the labels with lower M values are the suspect labels. The suspect labels are examined further and M is calculated again; the label whose M value is significantly lower than that of the other labels is the target label.
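A minimal sketch of this outlier detection over the per-label trigger L1 norms follows; the 1.4826 consistency constant and the anomaly threshold of 2 are common median-absolute-deviation conventions and are assumptions here, as is the rule of flagging only abnormally small norms.

```python
import numpy as np

def find_suspect_labels(trigger_l1_norms, threshold=2.0):
    """Flag labels whose reversed trigger is abnormally small (candidate backdoor targets)."""
    a = np.asarray(trigger_l1_norms, dtype=float)
    median = np.median(a)                   # a_0.5
    b = np.abs(a - median)                  # b_i = |a_i - a_0.5|
    M = np.median(b)                        # M = b_0.5
    delta = b / (1.4826 * M)                # anomaly index, normalized for a normal distribution
    # Only abnormally *small* L1 norms point to an infected label.
    return [i for i, (d, v) in enumerate(zip(delta, a)) if d > threshold and v < median]
```

For example, `find_suspect_labels([norms[k] for k in range(N)])` returns the indices of labels whose trigger norm deviates far below the median, which are then examined further as described above.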
Step 2: perform model inversion for the target class y_t of the neural network to obtain a data set, and draw a thermodynamic diagram to determine the optimal position of the backdoor trigger Δ_0 within the y_t data.
First, model inversion is performed for the target label y_t of the deep neural network model to obtain a related data set. The thermodynamic diagram can then be rendered with existing tools, such as the heatmap function in Python's seaborn module, to visualize the data. With reference to FIG. 3, the implementation details are as follows:
and calculating a correlation relation matrix of the data set obtained by inversion by using corercoef of numpy, and transmitting the correlation relation matrix into a heatmap function as a data parameter. And setting matrix block color parameters of the heatmap function, matrix block annotation parameters, matrix block interval and interval line parameters and other parameters, and completing the drawing of the thermodynamic diagram. The position with the highest color is the position with the object class ytIs also the backdoor trigger delta0The optimum position of (a). When delta0When the position is located, the attack success rate is the largest, the difference from benign data is the largest, and the step 3 is more favorably carried out.
Step 3: screen out the potentially contaminated malicious neurons in the target network by inputting benign data and the malicious data designed in Step 2 into the neural network.
To inject a backdoor, an attacker typically trains the deep neural network on malicious data carrying backdoor triggers (both random and model-dependent), causing some neurons to be contaminated by the backdoor trigger. These malicious neurons are strongly activated in the presence of the backdoor trigger, exhibiting a higher activation value or a higher weight, and show a lower weight and activation value when benign data is input.
Based on this analysis, when a large number of samples carrying the backdoor trigger and a large number of benign samples are input into the deep neural network, the difference in weight or activation of the contaminated neurons between the two input conditions will be significantly larger than that of ordinary neurons. Therefore, by examining the weight or activation-value differences of neurons under the two input conditions, a set of neurons potentially contaminated by the attacker can be found.
As shown in fig. 4, the specific implementation details of step 3 are as follows:
first, data with a back-gate trigger is input into the deep neural network, and the activation value and weight value of each neuron in the first full link layer (the neuron of this layer is usually activated by the back-gate trigger) are calculated. For the activation value of neuron, assuming that the first layer of the neural network is composed of K neurons, the activation value of the nth neuron of the first layerCan be obtained from the following equation:
wherein philIs the activation function of the l-th layer,is the degree of activation of the neurons of the previous layer,is the weight of the neuron in the previous layer,to correct for the deviation.
Then, benign data without backdoor triggers is input into the deep neural network, and the activation value a_n^(l) and the weight w_{n,k}^(l) of each neuron in the first fully connected layer are computed with the same formula.
Finally, the activation values and weights obtained when normal data is input are subtracted from those obtained when data carrying the backdoor trigger is input, giving the activation difference Δa and the weight difference Δw. Two sets are formed by sorting the activation differences Δa and the weight differences Δw from high to low; the neurons ranked near the top of these two sets are the most likely to be malicious.
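A minimal sketch of this screening step follows, assuming a PyTorch model whose first fully connected layer is exposed as `model.fc1` and loaders of triggered and benign data; the use of a forward hook and the weight-magnitude ranking kept alongside the activation-difference ranking are illustrative choices.

```python
import torch

@torch.no_grad()
def rank_suspect_neurons(model, triggered_loader, benign_loader, device="cpu"):
    model.eval()

    def mean_activation(loader):
        acts = []
        handle = model.fc1.register_forward_hook(
            lambda mod, inp, out: acts.append(out.detach().mean(dim=0)))
        for x, _ in loader:
            model(x.to(device))
        handle.remove()
        return torch.stack(acts).mean(dim=0)           # per-neuron mean activation

    delta_a = mean_activation(triggered_loader) - mean_activation(benign_loader)
    act_ranking = delta_a.argsort(descending=True)     # largest activation gap first
    # Model-dependent triggers target high-weight neurons, so a weight-based ranking
    # is kept as a second candidate list.
    weight_ranking = model.fc1.weight.abs().sum(dim=1).argsort(descending=True)
    return act_ranking, weight_ranking, delta_a
```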
Step 4: perform model pruning on the backdoored deep neural network.
The previous step produced target neuron sets ordered from high to low by the activation difference Δa and the weight difference Δw; final pruning is now performed on them. Model pruning may reduce neural network precision, so a check is performed after every pruning operation against two criteria: the prediction accuracy of the pruned model and the attack success rate of backdoor attacks after pruning.
For the prediction accuracy of the pruned model, a control group is set up first: a benign validation data set D_valid is input into the original deep neural network and the results are recorded. After each round of pruning, D_valid is input into the pruned deep neural network again to obtain results D_i (i = 1, 2, 3, …), and the similarity between D_i and the recorded results is noted. The higher the similarity, the better the prediction accuracy of the pruned deep neural network. During pruning this value should remain high; if it falls below a preset threshold (set by the user as appropriate, generally not lower than 97%), a rollback operation is performed and the pruning of the corresponding neurons is cancelled.
For the attack success rate of the backdoor attack after pruning: after each round of pruning, the rate at which the pruned model misclassifies data carrying the backdoor trigger is tested; this is the attack success rate. The larger the drop in attack success rate, the more accurate the pruning.
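A minimal sketch of the two checks follows, assuming the reference predictions on D_valid were recorded from the original (un-pruned) network and that a loader of trigger-carrying data is available; loader and variable names are illustrative.

```python
import torch

@torch.no_grad()
def prediction_similarity(model, valid_loader, reference_preds, device="cpu"):
    """Share of D_valid samples classified the same way as by the original model."""
    model.eval()
    preds = torch.cat([model(x.to(device)).argmax(dim=1) for x, _ in valid_loader])
    return (preds == reference_preds.to(device)).float().mean().item()

@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_label, device="cpu"):
    """Share of trigger-carrying samples misclassified into the target label."""
    model.eval()
    hits = total = 0
    for x, _ in triggered_loader:
        pred = model(x.to(device)).argmax(dim=1)
        hits += (pred == target_label).sum().item()
        total += pred.numel()
    return hits / total
```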
With reference to fig. 5, the step 4 is implemented as follows:
a series of neurons sorted according to the weight value difference obtained in the step 3And a series of neurons ordered by activation value differencePruning operation is carried out in turn. And when one neuron is pruned, the prediction accuracy and attack success rate of the model after pruning are retested.
Let Q_i denote the prediction accuracy of the deep neural network model after the i-th round of operation, and S_i the attack success rate after the i-th round. If the attack success rate S_i shows no significant reduction, i.e. S_{i−1} ≈ S_i, the round is an invalid pruning; or if the model accuracy Q_i drops significantly, i.e. Q_i falls below the defense requirement threshold (here 97%), pruning that neuron does not help defend against the attack of malicious data. In either case a rollback operation is performed: the pruned neuron is restored into the target neural network, and the (i+1)-th round of pruning proceeds. If the i-th round of pruning does not meet the rollback conditions, i.e. the attack success rate S_i is reduced while the model prediction accuracy Q_i is essentially unchanged, pruning that neuron helps defend against the backdoor attack. After each round of pruning, if no rollback is needed, the pruned neural network is locally fine-tuned with a least squares method, and the (i+1)-th round of pruning proceeds after part of the model precision is recovered. After n rounds of pruning, if the attack success rate S_n is below a predetermined threshold (here 10%) and the model accuracy Q_n on the validation data set is above the defense requirement (a threshold set by the user according to the defense requirement), the preset defense effect is considered to have been achieved and the pruning operation ends.
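A minimal sketch of this round-based pruning loop with rollback follows, reusing the evaluation helpers above and again assuming the suspect neurons live in `model.fc1`; the 97% and 10% thresholds are the example values from the description, while the tolerance used to decide that S_{i−1} ≈ S_i is an assumption, and the least-squares local fine-tuning between rounds is omitted for brevity.

```python
import torch

@torch.no_grad()
def prune_with_rollback(model, ranking, eval_acc, eval_asr,
                        acc_threshold=0.97, asr_target=0.10, tol=1e-3):
    """ranking: neuron indices of model.fc1, most suspicious first.
    eval_acc / eval_asr: callables returning Q_i and S_i for the current model."""
    prev_asr = eval_asr(model)
    for idx in ranking:
        saved = model.fc1.weight.data[idx].clone()
        model.fc1.weight.data[idx].zero_()         # prune neuron idx
        q_i, s_i = eval_acc(model), eval_asr(model)
        # Roll back if the cut is ineffective (S_{i-1} ~ S_i) or hurts accuracy too much.
        if q_i < acc_threshold or prev_asr - s_i < tol:
            model.fc1.weight.data[idx].copy_(saved)
            continue
        prev_asr = s_i
        if s_i < asr_target:                       # preset defense effect reached
            break
    return model
```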
After pruning is finished, the pruned neural network is initialized with the pre-trained neural network weights and fine-tuned once more. The degradation in classification accuracy on benign inputs caused by pruning can be (at least partially) recovered by this fine-tuning. After fine-tuning, the final benign neural network model is obtained.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (8)
1. A backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning is characterized by comprising the following steps:
determining a backdoor trigger for each category of a target neural network, comparing the L1 norms of the backdoor triggers with an outlier detection algorithm, and determining the target label and its target label data;
performing model inversion on the target category of the target neural network to compute a corresponding data set, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger from the thermodynamic diagram;
inputting the target label data into the target neural network in turn, and screening target neurons in the target neural network according to the weights and activation values obtained for the input target label data;
and performing model pruning on the target neural network according to the target neuron.
2. The method of claim 1, wherein the target tag is determined according to reverse engineering.
3. The backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning according to claim 1, further comprising performing attack success rate detection and model accuracy rate detection on the target neural network, and completing pruning on the target neural network through multiple debugging.
4. The method for defending against backdoor attacks based on thermodynamic diagrams, reverse engineering and model pruning according to claim 1, wherein the backdoor trigger T_t satisfies the following formulas:
δ_{∀→t} ≤ |T_t|;
|T_t| ≪ δ_{∀→i}, i ≠ t;
where δ_{∀→t} is the minimum perturbation required to classify any benign data carrying the backdoor trigger into the target class t, |T_t| is the size of the backdoor trigger, and δ_{∀→i} is the perturbation required to classify data of an uninfected label i into its correct class.
5. The method for defending against backdoor attacks based on thermodynamic diagrams, reverse engineering and model pruning according to claim 1, wherein the outlier detection algorithm satisfies the following three formulas:
b_i = |a_i − a_{0.5}|;
M = b_{0.5};
δ_i = b_i / M;
in the above formulas, a_i (0 ≤ i ≤ N) denotes the benign data input when reversing a certain trigger Δ_i, and a_{0.5} is the median of these benign data; for each a_i, the absolute deviation b_i between the corresponding data point and the median a_{0.5} is computed, and the median of these absolute deviations is denoted M; δ_i represents the anomaly index of the data point.
6. A backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning according to claim 1, wherein the backdoor trigger comprises a random type and a model-dependent type.
7. The method of claim 1, wherein a benign validation data set D_valid is input into the target neural network and the calculation results are recorded; after each round of pruning, D_valid is input into the pruned deep neural network again to obtain results D_i (i = 1, 2, 3, …), and the similarity between D_i and the recorded results is noted.
8. The backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning according to claim 1, characterized in that, for the attack success rate of the backdoor attack after pruning, after a round of pruning is performed, the success rate of misclassification of the data with backdoor triggers by the model after pruning is tested.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110449810.5A CN113111349B (en) | 2021-04-25 | 2021-04-25 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110449810.5A CN113111349B (en) | 2021-04-25 | 2021-04-25 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113111349A true CN113111349A (en) | 2021-07-13 |
CN113111349B CN113111349B (en) | 2022-04-29 |
Family
ID=76720004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110449810.5A Active CN113111349B (en) | 2021-04-25 | 2021-04-25 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113111349B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113609482A (en) * | 2021-07-14 | 2021-11-05 | 中国科学院信息工程研究所 | Back door detection and restoration method and system for image classification model |
CN114048466A (en) * | 2021-10-28 | 2022-02-15 | 西北大学 | Neural network backdoor attack defense method based on YOLO-V3 algorithm |
CN114610885A (en) * | 2022-03-09 | 2022-06-10 | 江南大学 | Text classification backdoor attack method, system and equipment |
CN114074390B (en) * | 2021-11-23 | 2024-04-26 | 苏州博宇科技有限公司 | Machining system and method for automation of plastic mold electrode |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108596336A (en) * | 2018-04-24 | 2018-09-28 | 清华大学 | For the software and hardware combined attack method and device of neural network |
CN111242291A (en) * | 2020-04-24 | 2020-06-05 | 支付宝(杭州)信息技术有限公司 | Neural network backdoor attack detection method and device and electronic equipment |
CN111260059A (en) * | 2020-01-23 | 2020-06-09 | 复旦大学 | Backdoor attack method of video analysis neural network model |
US20200387608A1 (en) * | 2019-05-29 | 2020-12-10 | Anomalee Inc. | Post-Training Detection and Identification of Human-Imperceptible Backdoor-Poisoning Attacks |
US20200410098A1 (en) * | 2019-06-26 | 2020-12-31 | Hrl Laboratories, Llc | System and method for detecting backdoor attacks in convolutional neural networks |
CN112182576A (en) * | 2020-10-14 | 2021-01-05 | 桂林电子科技大学 | Virus-putting attack method based on feature collision in deep learning |
-
2021
- 2021-04-25 CN CN202110449810.5A patent/CN113111349B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108596336A (en) * | 2018-04-24 | 2018-09-28 | 清华大学 | For the software and hardware combined attack method and device of neural network |
US20200387608A1 (en) * | 2019-05-29 | 2020-12-10 | Anomalee Inc. | Post-Training Detection and Identification of Human-Imperceptible Backdoor-Poisoning Attacks |
US20200410098A1 (en) * | 2019-06-26 | 2020-12-31 | Hrl Laboratories, Llc | System and method for detecting backdoor attacks in convolutional neural networks |
CN111260059A (en) * | 2020-01-23 | 2020-06-09 | 复旦大学 | Backdoor attack method of video analysis neural network model |
CN111242291A (en) * | 2020-04-24 | 2020-06-05 | 支付宝(杭州)信息技术有限公司 | Neural network backdoor attack detection method and device and electronic equipment |
CN112182576A (en) * | 2020-10-14 | 2021-01-05 | 桂林电子科技大学 | Virus-putting attack method based on feature collision in deep learning |
Non-Patent Citations (1)
Title |
---|
YANJIAO CHEN ET AL.: "Backdoor Attacks and Defenses for Deep Neural Networks in Outsourced Cloud Environments", IEEE NETWORK *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113609482A (en) * | 2021-07-14 | 2021-11-05 | 中国科学院信息工程研究所 | Back door detection and restoration method and system for image classification model |
CN113609482B (en) * | 2021-07-14 | 2023-10-17 | 中国科学院信息工程研究所 | Back door detection and restoration method and system for image classification model |
CN114048466A (en) * | 2021-10-28 | 2022-02-15 | 西北大学 | Neural network backdoor attack defense method based on YOLO-V3 algorithm |
CN114048466B (en) * | 2021-10-28 | 2024-03-26 | 西北大学 | Neural network back door attack defense method based on YOLO-V3 algorithm |
CN114074390B (en) * | 2021-11-23 | 2024-04-26 | 苏州博宇科技有限公司 | Machining system and method for automation of plastic mold electrode |
CN114610885A (en) * | 2022-03-09 | 2022-06-10 | 江南大学 | Text classification backdoor attack method, system and equipment |
CN114610885B (en) * | 2022-03-09 | 2022-11-08 | 江南大学 | Text classification backdoor attack method, system and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113111349B (en) | 2022-04-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113111349A (en) | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning | |
CN111753881B (en) | Concept sensitivity-based quantitative recognition defending method against attacks | |
Singla et al. | Second-order provable defenses against adversarial attacks | |
US20150134578A1 (en) | Discriminator, discrimination program, and discrimination method | |
CN112365005B (en) | Federal learning poisoning detection method based on neuron distribution characteristics | |
CN115186816B (en) | Back door detection method based on decision shortcut search | |
CN109242223A (en) | The quantum support vector machines of city Public Buildings Fire Risk is assessed and prediction technique | |
CN110874471B (en) | Privacy and safety protection neural network model training method and device | |
Wang et al. | ADDITION: Detecting adversarial examples with image-dependent noise reduction | |
CN116757273A (en) | An improved method to combat perturbation backdoor attacks | |
Vaddadi et al. | An efficient convolutional neural network for adversarial training against adversarial attack | |
CN109272036B (en) | Random fern target tracking method based on depth residual error network | |
Ma et al. | Releasing malevolence from benevolence: The menace of benign data on machine unlearning | |
Zhang et al. | Defending against backdoor attack on deep neural networks based on multi-scale inactivation | |
CN113010888A (en) | Neural network backdoor attack defense method based on key neurons | |
US20230259658A1 (en) | Device and method for determining adversarial patches for a machine learning system | |
Li et al. | Backdoor mitigation by correcting the distribution of neural activations | |
CN110796237B (en) | Method and device for detecting attack resistance of deep neural network | |
Gala et al. | Evaluating the effectiveness of attacks and defenses on machine learning through adversarial samples | |
Abdukhamidov et al. | Singleadv: single-class target-specific attack against interpretable deep learning systems | |
Mukeri et al. | Towards query efficient and derivative free black box adversarial machine learning attack | |
Zhao et al. | Uma: Facilitating backdoor scanning via unlearning-based model ablation | |
Dunnett et al. | Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies | |
Halim et al. | Automated Adversarial-Attack Removal with SafetyNet Using ADGIT | |
Shen et al. | Reimagining linear probing: Kolmogorov-arnold networks in transfer learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |