CN113111349A - Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning - Google Patents

Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning

Info

Publication number
CN113111349A
Authority
CN
China
Prior art keywords
target
neural network
pruning
model
backdoor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110449810.5A
Other languages
Chinese (zh)
Other versions
CN113111349B (en)
Inventor
陈艳姣
龚雪鸾
徐文渊
李晓媛
彭艺欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110449810.5A priority Critical patent/CN113111349B/en
Publication of CN113111349A publication Critical patent/CN113111349A/en
Application granted granted Critical
Publication of CN113111349B publication Critical patent/CN113111349B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Virology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a backdoor attack defense method based on thermodynamic diagrams (heat maps), reverse engineering and model pruning, and relates to the field of neural network model pruning. The method comprises the steps of: determining a backdoor trigger for each category of a target neural network, comparing the L1 norms of the backdoor triggers through an outlier detection algorithm, and determining the target label and the corresponding target label data; performing model inversion on the categories of the target neural network to compute a corresponding data set, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger according to the thermodynamic diagram; sequentially inputting the target label data into the target neural network, and screening target neurons in the target neural network according to the weights and activation values observed under this input; and performing model pruning on the target neural network according to the target neurons. The invention can effectively defend against backdoor attacks based on both random triggers and model-dependent triggers.

Description

Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
Technical Field
The invention relates to the field of neural network model pruning, in particular to a backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning.
Background
Recently, a new attack mode, Backdoor Attacks, has attracted wide attention. After an attacker trains a model with malicious data carrying a Backdoor Trigger, a backdoor is injected into the model. The backdoored model classifies all benign data correctly, but misclassifies any input that carries the backdoor trigger. Backdoor attacks are extremely stealthy, which poses a great challenge to attack detection and a considerable risk to resource-limited users who have to outsource the training process.
Thermodynamic diagrams (heat maps, also known as correlation graphs) are a main method of data visualization. A user can judge the correlation between variables from the correlation coefficients indicated by the colors of the cells in the graph: the larger the correlation coefficient, the higher the degree of linear correlation between the variables; the smaller the coefficient, the lower the degree of linear correlation. An attacker can generate a thermodynamic diagram from data of the target class and then determine the optimal position for the backdoor trigger, thereby achieving a better attack effect.
Reverse engineering deduces the input from known facts and conclusions through reverse analysis techniques. For an attacker, reverse engineering helps analyze information about a target model or its training data set, so as to steal the model or the data; for a defender, it helps predict the attacker's behavior and design a corresponding defense. In deep neural network backdoor attacks, a defender can apply the idea of reverse engineering to deduce possible backdoor triggers from observed outputs and activation values, thereby assisting the formulation of a defense scheme.
Model pruning is a method of model compression: relatively unimportant neurons are pruned from a neural network to obtain a lighter-weight network while preserving its function. There are currently two main pruning criteria. The first judges a neuron's importance by its weight: neurons with smaller weights usually contribute less to the neural network, so the smaller-weight neurons of each layer can be cut. The second judges a neuron's importance by its activation value: neurons with smaller activation values tend to contribute less to the neural network, so the smaller-activation neurons of each layer can be pruned.
Studies have shown that normal data usually leaves the neurons contaminated by backdoor implantation in a dormant state, and only data carrying the backdoor trigger can strongly activate them. A backdoor attack can therefore be defended against by pruning these dormant neurons: input normal data and prune neurons in order from low activation value to high activation value (the low-activation neurons being the backdoor-contaminated ones), stopping once the accuracy on test data falls below a threshold. However, such an activation-value-based pruning strategy only works against backdoor attacks with random triggers and fails against model-dependent triggers (e.g., triggers generated from neurons with large weights in the model, or from neurons with high activation values under normal data input).
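For reference only, the following is a minimal sketch of this prior-art activation-value-based pruning baseline (not the invention's method). It assumes a PyTorch classifier whose first fully connected layer is exposed as model.fc1, a loader of benign data, and a caller-supplied test_accuracy function; these names and the 0.97 accuracy threshold are illustrative assumptions.

```python
import torch

@torch.no_grad()
def activation_based_prune(model, clean_loader, test_accuracy, threshold=0.97):
    """Prior-art baseline: prune neurons that stay dormant on benign data,
    from lowest mean activation upward, until test accuracy drops below threshold."""
    model.eval()
    acts = []
    hook = model.fc1.register_forward_hook(
        lambda mod, inp, out: acts.append(out.detach()))
    for x, _ in clean_loader:                 # record activations on benign data only
        model(x)
    hook.remove()
    mean_act = torch.cat(acts).mean(dim=0)    # mean activation of each fc1 neuron

    for n in torch.argsort(mean_act):         # lowest-activation (most dormant) first
        w_row, b_val = model.fc1.weight[n].clone(), model.fc1.bias[n].clone()
        model.fc1.weight[n].zero_()           # "prune" neuron n by zeroing its parameters
        model.fc1.bias[n] = 0.0
        if test_accuracy(model) < threshold:  # stop once benign accuracy suffers
            model.fc1.weight[n] = w_row       # undo the last prune
            model.fc1.bias[n] = b_val
            break
    return model
```

As the paragraph above notes, this baseline is blind to model-dependent triggers, which is the gap the invention targets.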
Disclosure of Invention
To address these defects of the prior art, the invention provides a method for defending deep neural networks against backdoor attacks based on thermodynamic diagrams, reverse engineering and model pruning, which solves the problem of model-dependent backdoor attacks that conventional schemes find difficult to defend against and fills the gap in this area.
In order to achieve the purpose, the invention adopts the following technical scheme:
a backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning comprises the following steps:
determining a backdoor trigger for each category of a target neural network, comparing the L1 norms of the backdoor triggers through an outlier detection algorithm, and determining the target label and the corresponding target label data;
performing model inversion on the categories of the target neural network to compute a corresponding data set, wherein the data set serves as a training set of the target neural network, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger according to the thermodynamic diagram;
sequentially inputting the target label data into the target neural network, and screening target neurons in the target neural network according to the weights and activation values observed when the target label data is input;
and performing model pruning on the target neural network according to the target neuron.
Preferably, the target tag is determined according to reverse engineering.
Preferably, the method further comprises performing attack success rate detection and model accuracy detection on the target neural network, and completing the pruning of the target neural network through multiple rounds of debugging.
Preferably, the backdoor trigger should satisfy the following formulas:

|T_t| ≥ δ_{∀→t};

|T_t| ≪ δ_{∀→i}, i ≠ t;

where δ_{∀→t} is the minimum perturbation required to classify any benign data into the target class t, |T_t| is the size of the backdoor trigger, and δ_{∀→i} is the perturbation required to cause classification into an uninfected label i.
Preferably, the outlier detection algorithm satisfies the following three formulas:

b_i = |a_i − a_0.5|;

M = b_0.5;

δ_i = b_i / M;

In the above formulas, a_i (0 ≤ i ≤ N) denotes the data point obtained for a certain trigger Δ_i reversed from benign data inputs, and a_0.5 is the median of these data points; b_i is the absolute deviation of each data point a_i from the median a_0.5, and M is the median of these absolute deviations; δ_i is the anomaly index of the data point.
Preferably, the back door trigger includes a random type and a model-dependent type.
Preferably, a malicious-free verification data set D_valid is input into the target neural network and the computation result is recorded; after each round of pruning, D_valid is input again into the pruned deep neural network to obtain a result D_i (i = 1, 2, 3, …), and the similarity between D_i and the result recorded on D_valid is noted; the higher the similarity, the better the prediction accuracy of the pruned target neural network.
Preferably, for the attack success rate of the backdoor attack after pruning, after each round of pruning the rate at which the pruned model misclassifies data carrying the backdoor trigger is tested; this rate is the attack success rate.
Compared with the prior art, the invention discloses a backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning, which solves the problem that traditional activation-value-based pruning cannot defend against backdoor attacks based on model-dependent triggers. In addition, because a potential backdoor trigger is generated by reverse engineering before model pruning, the search range for malicious neurons is narrowed, which effectively improves pruning efficiency. The model accuracy and the attack success rate are monitored synchronously during pruning and appropriate local fine-tuning is applied, so that the attack success rate is reduced while the model accuracy is preserved.
Experiments show that the method can effectively defend against backdoor attacks based on random triggers and model-dependent triggers without affecting the usability of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a flow diagram illustrating recovery of a back door trigger and a target tag;
FIG. 3 is a flow chart illustrating the determination of the optimal position of the back door trigger;
FIG. 4 is a flow chart illustrating the screening of a backdoor model for possible presence of malicious neurons;
FIG. 5 is a flow chart of back door model pruning.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning, which comprises the following steps as shown in figures 1 and 2:
determining a backdoor trigger for each category of the target neural network, comparing the L1 norms of the backdoor triggers through an outlier detection algorithm, and determining the target label and the corresponding target label data;
performing model inversion on the categories of the target neural network to compute a corresponding data set, wherein the data set serves as a training set of the target neural network, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger according to the thermodynamic diagram;
sequentially inputting the target label data into the target neural network, and screening target neurons in the target neural network according to the weights and activation values observed when the target label data is input;
and performing model pruning on the target neural network according to the target neuron.
In this embodiment, Step 1: deduce potential backdoor triggers in reverse for each category of the target neural network through reverse engineering, and find the target label of the backdoor attack.
The basic idea of a backdoor attack is to make the target neural network classify non-target-class data carrying the backdoor trigger into the target class by constructing a backdoor trigger. The backdoor trigger should satisfy two conditions: first, benign data carrying the backdoor trigger is classified into the target class, which is the attacker's basic goal; second, the backdoor trigger should be as small as possible in order to avoid detection by the defender.
To achieve these two conditions, the backdoor trigger T_t should satisfy the following two formulas:

|T_t| ≥ δ_{∀→t}    (1);

|T_t| ≪ δ_{∀→i}, i ≠ t    (2);

In formula (1), δ_{∀→t} is the minimum perturbation required to classify any benign data into the target class t, and |T_t| is the size of the backdoor trigger. The formula means that the backdoor trigger that classifies any input into the target class is no smaller than this minimum perturbation, which guarantees that the misclassification attack succeeds.

In formula (2), δ_{∀→i} is the perturbation required to cause classification into an uninfected label i. The backdoor trigger should be much smaller than this perturbation in order to evade detection by the defender.
The specific steps of step 1 are as follows:
First, the general form of trigger injection is defined:

A(x, m, Δ) = x′    (3);

x′_{i,j,c} = (1 − m_{i,j}) · x_{i,j,c} + m_{i,j} · Δ_{i,j,c}    (4);

A(·) in formula (3) is the function that applies the trigger to the original image x. Δ is the trigger pattern, a three-dimensional matrix of pixel color intensities with the same size as the input image. m is a two-dimensional mask matrix that determines how much of the original image the trigger covers. Formula (4) explains m further: when m_{i,j} = 1 for a specific pixel (i, j), the trigger completely overwrites the original color; when m_{i,j} = 0, the original color is not modified. Δ and m together determine the shape of the trigger. For a given potential target label y_t of the backdoor attack, the trigger has to classify any input into the target label y_t, yet be small enough to avoid detection. In addition, we use the L1 norm of m to measure the size of the backdoor trigger.
The objective function for generating the backdoor trigger is:

min_{m,Δ}  ℓ(y_t, f(A(x, m, Δ))) + λ·|m|,  for x ∈ X    (5);

In formula (5), f(·) is the prediction function of the target neural network. ℓ is the loss function measuring the classification error, i.e., cross entropy in our experiments. X is the accessible benign data set. λ is a weight parameter that controls the size of the backdoor trigger: a smaller λ allows a larger trigger and also increases the misclassification success rate. In the experiments we dynamically adjust λ during the optimization with an Adam optimizer to ensure that more than 99% of benign images can be successfully misclassified.

After the above optimization we obtain, for the target label y_t, the backdoor trigger required to misclassify any other input into y_t, together with its L1 norm. Performing this for every other label in the target neural network yields, for a network with N labels, N potential backdoor triggers Δ_i (0 ≤ i ≤ N) and their L1 norms.
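A sketch of this trigger reverse-engineering optimization is shown below, under stated assumptions: a PyTorch classifier model, a loader of accessible benign data, and a fixed λ instead of the dynamic adjustment described above; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def reverse_trigger(model, benign_loader, target_label, height, width,
                    channels=3, lam=0.01, steps=1000, lr=0.1, device="cpu"):
    """Optimize a mask m and pattern Δ so that A(x, m, Δ) is classified as target_label
    (formula (5)), and return the trigger together with its L1 norm."""
    model.eval()
    for p in model.parameters():                  # the classifier itself stays frozen
        p.requires_grad_(False)
    # unconstrained parameters; a sigmoid keeps m and Δ inside [0, 1]
    m_raw = torch.zeros(height, width, device=device, requires_grad=True)
    d_raw = torch.zeros(channels, height, width, device=device, requires_grad=True)
    opt = torch.optim.Adam([m_raw, d_raw], lr=lr)

    data_iter = iter(benign_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:                     # cycle through the benign data
            data_iter = iter(benign_loader)
            x, _ = next(data_iter)
        x = x.to(device)
        m, delta = torch.sigmoid(m_raw), torch.sigmoid(d_raw)
        x_adv = (1 - m) * x + m * delta           # formula (4), broadcast over the batch
        target = torch.full((x.size(0),), target_label,
                            dtype=torch.long, device=device)
        loss = F.cross_entropy(model(x_adv), target) + lam * m.abs().sum()  # formula (5)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        m, delta = torch.sigmoid(m_raw), torch.sigmoid(d_raw)
    return m, delta, m.abs().sum().item()         # mask, pattern, and L1 norm of the mask
```

Running this once per label and collecting the returned L1 norms provides the inputs for the outlier detection described next.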
Then, the backdoor trigger Δ_0 with the smallest L1 norm is found among these backdoor triggers through an outlier detection algorithm; this is the trigger used by the attacker, and the label corresponding to Δ_0 is the attacker's target label. The algorithm satisfies the following three formulas:

b_i = |a_i − a_0.5|

M = b_0.5

δ_i = b_i / M

In the above formulas, a_i (0 ≤ i ≤ N) denotes the data point obtained for a certain trigger Δ_i reversed from benign inputs, and a_0.5 is the median of these data points. First, the absolute deviation b_i of each data point a_i from the median a_0.5 is computed, and the median of these absolute deviations is denoted M. δ_i is the anomaly index of the data point, given by b_i divided by M. When the assumed underlying distribution is normal, the anomaly index is normalized with a constant; data points whose anomaly index exceeds a certain threshold are likely to be outliers, and the corresponding label is likely to be contaminated. The smaller the value of M, the more likely the corresponding label is contaminated.

This calculation is performed for every label to obtain each label's M value; labels with lower M values are the suspect labels. The suspect labels are examined further and M is computed again; the label whose M value is significantly lower than those of the other labels is the target label.
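A small numpy sketch of this median-absolute-deviation style outlier detection, applied to the per-label L1 norms collected in step 1; the 1.4826 normalizing constant (for an assumed normal distribution) and the anomaly threshold of 2 are common choices and are assumptions here rather than values fixed by the patent.

```python
import numpy as np

def suspect_labels(l1_norms, threshold=2.0):
    """Flag labels whose reversed-trigger L1 norm is an abnormally small outlier."""
    a = np.asarray(l1_norms, dtype=float)
    median = np.median(a)                    # a_0.5
    b = np.abs(a - median)                   # b_i = |a_i - a_0.5|
    M = np.median(b)                         # median of the absolute deviations
    delta = b / (1.4826 * M)                 # anomaly index, normalized for a normal distribution
    # a contaminated label sits on the small side of the median with a large anomaly index
    return [i for i, (ai, di) in enumerate(zip(a, delta))
            if di > threshold and ai < median]
```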
Step 2: perform model inversion for the target class y_t of the neural network to obtain a data set, and draw a thermodynamic diagram to determine the optimal position of the backdoor trigger Δ_0 for y_t.

First, model inversion is performed for the target label y_t of the deep neural network model to obtain the related data set. The thermodynamic diagram can then be drawn with existing tools, such as the heatmap function in the seaborn module of python, to visualize the data. With reference to fig. 3, the implementation details are as follows:

The correlation matrix of the data set obtained by inversion is computed with numpy's corrcoef and passed to the heatmap function as its data parameter. The color, annotation, and cell-spacing/line parameters of the heatmap function are set, completing the drawing of the thermodynamic diagram. The position with the deepest color is the position most strongly correlated with the target class y_t, and is therefore the optimal position of the backdoor trigger Δ_0. When Δ_0 is placed at this position, the attack success rate is the highest and the difference from benign data is the largest, which facilitates step 3.
Step 3: screen out the malicious neurons that may be contaminated in the target network by inputting benign data and the malicious data designed in step 2 into the neural network.
To inject a backdoor, an attacker typically trains the deep neural network with malicious data carrying backdoor triggers (either random or model-dependent), causing some neurons to be contaminated by the backdoor trigger. These malicious neurons are strongly activated in the presence of the backdoor trigger, showing higher activation values or higher weights, while having lower weights and activation values when benign data is input.
Based on the above analysis, when a large number of samples carrying the backdoor trigger and a large number of benign samples are input into the deep neural network, the difference in weight or activation of the contaminated neurons between the two input conditions will be significantly larger than that of uncontaminated neurons. Therefore, by measuring each neuron's weight difference or activation-value difference under the two input conditions, a set of neurons possibly contaminated by the attacker can be found.
As shown in fig. 4, the specific implementation details of step 3 are as follows:
First, data carrying the backdoor trigger is input into the deep neural network, and the activation value and weight of each neuron in the first fully connected layer (the neurons of this layer are usually the ones activated by the backdoor trigger) are computed. For the activation values, assuming the l-th layer of the neural network consists of K neurons, the activation value a_n^l of the n-th neuron of the l-th layer is obtained from:

a_n^l = φ_l( Σ_{k=1}^{K} w_{k,n}^{l−1} · a_k^{l−1} + b_n^l )

where φ_l is the activation function of the l-th layer, a_k^{l−1} is the activation of the k-th neuron of the previous layer, w_{k,n}^{l−1} is the corresponding weight from the previous layer, and b_n^l is the bias.
Then, benign data without the backdoor trigger is input into the deep neural network, and the activation value â_n^l and weight ŵ_n^l of each neuron in the first fully connected layer are computed with the same formula.
Finally, the activation values and weights obtained with normal input are subtracted from those obtained with backdoor-trigger input to obtain the activation difference Δa and the weight difference Δw. Sorting the neurons by Δa and by Δw from high to low yields two ranked sets; the neurons ranked near the top of both sets are the most likely to be malicious.
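A sketch of the activation-difference half of this screening step, assuming a PyTorch model whose first fully connected layer is exposed as model.fc1 and loaders of triggered and benign data; the weight-difference ranking is handled analogously and is omitted here.

```python
import torch

@torch.no_grad()
def rank_by_activation_difference(model, triggered_loader, benign_loader):
    """Rank the neurons of the first fully connected layer by Δa = a(trigger) − a(benign)."""
    model.eval()

    def mean_activation(loader):
        outs = []
        hook = model.fc1.register_forward_hook(
            lambda mod, inp, out: outs.append(out.detach()))
        for x, _ in loader:
            model(x)
        hook.remove()
        return torch.cat(outs).mean(dim=0)          # one mean activation per fc1 neuron

    delta_a = mean_activation(triggered_loader) - mean_activation(benign_loader)
    order = torch.argsort(delta_a, descending=True) # most suspicious neurons first
    return order, delta_a
```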
Step 4: perform model pruning on the backdoored deep neural network.
The previous step produced sets of target neurons ordered from high to low by the activation difference Δa and the weight difference Δw, on which the final pruning is now performed. Model pruning may cause a loss of neural network precision, so a check is carried out after each pruning operation against two criteria: the prediction accuracy of the pruned model, and the attack success rate of the backdoor attack after pruning.
For the prediction accuracy of the pruned model, a control group is first established: a malicious-free verification data set D_valid is input into the original deep neural network and the computation result is recorded. After each round of pruning, D_valid is input again into the pruned deep neural network to obtain a result D_i (i = 1, 2, 3, …), and the similarity between D_i and the recorded result on D_valid is noted. The higher the similarity, the better the prediction accuracy of the pruned deep neural network; during pruning this value should be kept at a high level. If it falls below a preset threshold (set by the user as appropriate, generally not lower than 97%), a rollback operation is performed and the pruning of the corresponding neuron is cancelled.
For the attack success rate of the backdoor attack after pruning, after each round of pruning the rate at which the pruned model misclassifies data carrying the backdoor trigger is tested; this rate is the attack success rate. The more the attack success rate drops, the more accurate the pruning.
With reference to fig. 5, the step 4 is implemented as follows:
The neurons obtained in step 3, sorted by the weight difference Δw, and the neurons sorted by the activation difference Δa are pruned in turn. Each time a neuron is pruned, the prediction accuracy and the attack success rate of the pruned model are retested.
Let Q_i denote the prediction accuracy of the deep neural network model after the i-th round of operation, and S_i the attack success rate after the i-th round. If the attack success rate S_i is not significantly reduced, i.e., S_{i−1} ≈ S_i, the round is an invalid pruning; likewise, if the model accuracy Q_i drops significantly, i.e., Q_i falls below the defense-requirement threshold (here 97%), pruning that neuron does not help defend against the attack of the malicious data. In either case a rollback operation is performed to restore the neuron: the pruned neuron is recovered into the target neural network, and the (i+1)-th round of pruning proceeds. If the i-th round of pruning finishes without meeting the rollback condition, i.e., the attack success rate S_i has decreased while the model prediction accuracy Q_i is essentially unchanged, then pruning that neuron can defend against the backdoor attack. After each round of pruning, if no rollback is needed, the pruned neural network is locally fine-tuned with a least-squares method, and the (i+1)-th round of pruning starts after part of the model precision has been recovered. After n rounds of pruning, if the attack success rate S_n on the verification data set falls below the preset threshold (here 10%) and the model accuracy Q_n remains above the defense requirement (a threshold determined by the user according to the defense requirement), the preset defense effect is considered to have been achieved and the pruning operation ends.
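A condensed sketch of this prune-test-rollback loop, assuming the same model.fc1 layer name as above, a ranked list of suspect neuron indices from step 3, and caller-supplied eval_accuracy and eval_attack_success functions; the least-squares local fine-tuning between rounds is omitted for brevity.

```python
import torch

@torch.no_grad()
def prune_with_rollback(model, suspect_order, eval_accuracy, eval_attack_success,
                        acc_threshold=0.97, asr_target=0.10, eps=1e-3):
    """Prune ranked suspect neurons of model.fc1 one per round, rolling back any
    prune that leaves the attack success rate unchanged or hurts accuracy."""
    s_prev = eval_attack_success(model)
    for n in suspect_order:                        # most suspicious neuron first
        w_row = model.fc1.weight[n].clone()        # save parameters for possible rollback
        b_val = model.fc1.bias[n].clone()
        model.fc1.weight[n].zero_()                # prune neuron n
        model.fc1.bias[n] = 0.0

        q_i, s_i = eval_accuracy(model), eval_attack_success(model)
        invalid = abs(s_prev - s_i) < eps          # S_{i-1} ≈ S_i: pruning had no effect
        harmful = q_i < acc_threshold              # Q_i fell below the defense requirement
        if invalid or harmful:
            model.fc1.weight[n] = w_row            # rollback: restore the neuron
            model.fc1.bias[n] = b_val
        else:
            s_prev = s_i                           # keep the prune
            if s_i < asr_target and q_i >= acc_threshold:
                break                              # preset defense effect reached
    return model
```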
After pruning is finished, the pruned neural network is initialized with the pre-trained neural network weights and fine-tuned again. Fine-tuning can recover, at least partially, the degradation in classification accuracy on benign inputs caused by pruning. After fine-tuning, the final benign neural network model is obtained.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A backdoor attack defense method based on thermodynamic diagrams, reverse engineering and model pruning is characterized by comprising the following steps:
determining a backdoor trigger for each category of a target neural network, comparing the L1 norms of the backdoor triggers through an outlier detection algorithm, and determining the target label and the corresponding target label data;
performing model inversion on the categories of the target neural network to compute a corresponding data set, drawing a thermodynamic diagram from the data set, and determining the optimal position of the backdoor trigger according to the thermodynamic diagram;
sequentially inputting the target label data into the target neural network, and screening target neurons in the target neural network according to the weights and activation values observed when the target label data is input;
and performing model pruning on the target neural network according to the target neuron.
2. The method of claim 1, wherein the target tag is determined according to reverse engineering.
3. The backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning according to claim 1, further comprising performing attack success rate detection and model accuracy detection on the target neural network, and completing the pruning of the target neural network through multiple rounds of debugging.
4. The method for defending against backdoor attacks based on thermodynamic diagrams, reverse engineering and model pruning according to claim 1, wherein the backdoor trigger should satisfy the following formulas:

|T_t| ≥ δ_{∀→t};

|T_t| ≪ δ_{∀→i}, i ≠ t;

where δ_{∀→t} is the minimum perturbation required to classify any benign data into the target class t, |T_t| is the size of the backdoor trigger, and δ_{∀→i} is the perturbation required to cause classification into an uninfected label i.
5. The method for defending against backdoor attacks based on thermodynamic diagrams, reverse engineering and model pruning according to claim 1, wherein the outlier detection algorithm satisfies the following three formulas:

b_i = |a_i − a_0.5|;

M = b_0.5;

δ_i = b_i / M;

In the above formulas, a_i (0 ≤ i ≤ N) denotes the data point obtained for a certain trigger Δ_i reversed from benign data inputs, and a_0.5 is the median of these data points; b_i is the absolute deviation of each data point a_i from the median a_0.5, and M is the median of these absolute deviations; δ_i is the anomaly index of the data point.
6. A backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning according to claim 1, wherein the backdoor trigger comprises a random type and a model-dependent type.
7. The method of claim 1, wherein a malicious-free verification data set D_valid is input into the target neural network and the computation result is recorded; after each round of pruning, D_valid is input again into the pruned deep neural network to obtain a result D_i (i = 1, 2, 3, …), and the similarity between D_i and the result recorded on D_valid is noted.
8. The backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning according to claim 1, wherein, for the attack success rate of the backdoor attack after pruning, after each round of pruning the rate at which the pruned model misclassifies data carrying the backdoor trigger is tested.
CN202110449810.5A 2021-04-25 2021-04-25 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning Active CN113111349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110449810.5A CN113111349B (en) 2021-04-25 2021-04-25 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110449810.5A CN113111349B (en) 2021-04-25 2021-04-25 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning

Publications (2)

Publication Number Publication Date
CN113111349A true CN113111349A (en) 2021-07-13
CN113111349B CN113111349B (en) 2022-04-29

Family

ID=76720004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110449810.5A Active CN113111349B (en) 2021-04-25 2021-04-25 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning

Country Status (1)

Country Link
CN (1) CN113111349B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609482A (en) * 2021-07-14 2021-11-05 中国科学院信息工程研究所 Back door detection and restoration method and system for image classification model
CN114048466A (en) * 2021-10-28 2022-02-15 西北大学 Neural network backdoor attack defense method based on YOLO-V3 algorithm
CN114610885A (en) * 2022-03-09 2022-06-10 江南大学 Text classification backdoor attack method, system and equipment
CN114074390B (en) * 2021-11-23 2024-04-26 苏州博宇科技有限公司 Machining system and method for automation of plastic mold electrode

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596336A (en) * 2018-04-24 2018-09-28 清华大学 For the software and hardware combined attack method and device of neural network
CN111242291A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Neural network backdoor attack detection method and device and electronic equipment
CN111260059A (en) * 2020-01-23 2020-06-09 复旦大学 Backdoor attack method of video analysis neural network model
US20200387608A1 (en) * 2019-05-29 2020-12-10 Anomalee Inc. Post-Training Detection and Identification of Human-Imperceptible Backdoor-Poisoning Attacks
US20200410098A1 (en) * 2019-06-26 2020-12-31 Hrl Laboratories, Llc System and method for detecting backdoor attacks in convolutional neural networks
CN112182576A (en) * 2020-10-14 2021-01-05 桂林电子科技大学 Virus-putting attack method based on feature collision in deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596336A (en) * 2018-04-24 2018-09-28 清华大学 For the software and hardware combined attack method and device of neural network
US20200387608A1 (en) * 2019-05-29 2020-12-10 Anomalee Inc. Post-Training Detection and Identification of Human-Imperceptible Backdoor-Poisoning Attacks
US20200410098A1 (en) * 2019-06-26 2020-12-31 Hrl Laboratories, Llc System and method for detecting backdoor attacks in convolutional neural networks
CN111260059A (en) * 2020-01-23 2020-06-09 复旦大学 Backdoor attack method of video analysis neural network model
CN111242291A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Neural network backdoor attack detection method and device and electronic equipment
CN112182576A (en) * 2020-10-14 2021-01-05 桂林电子科技大学 Virus-putting attack method based on feature collision in deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANJIAO CHEN ET AL.: "Backdoor Attacks and Defenses for Deep Neural Networks in Outsourced Cloud Environments", 《 IEEE NETWORK》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609482A (en) * 2021-07-14 2021-11-05 中国科学院信息工程研究所 Back door detection and restoration method and system for image classification model
CN113609482B (en) * 2021-07-14 2023-10-17 中国科学院信息工程研究所 Back door detection and restoration method and system for image classification model
CN114048466A (en) * 2021-10-28 2022-02-15 西北大学 Neural network backdoor attack defense method based on YOLO-V3 algorithm
CN114048466B (en) * 2021-10-28 2024-03-26 西北大学 Neural network back door attack defense method based on YOLO-V3 algorithm
CN114074390B (en) * 2021-11-23 2024-04-26 苏州博宇科技有限公司 Machining system and method for automation of plastic mold electrode
CN114610885A (en) * 2022-03-09 2022-06-10 江南大学 Text classification backdoor attack method, system and equipment
CN114610885B (en) * 2022-03-09 2022-11-08 江南大学 Text classification backdoor attack method, system and equipment

Also Published As

Publication number Publication date
CN113111349B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN113111349A (en) Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
CN111753881B (en) Concept sensitivity-based quantitative recognition defending method against attacks
Singla et al. Second-order provable defenses against adversarial attacks
US20150134578A1 (en) Discriminator, discrimination program, and discrimination method
CN112365005B (en) Federal learning poisoning detection method based on neuron distribution characteristics
CN115186816B (en) Back door detection method based on decision shortcut search
CN109242223A (en) The quantum support vector machines of city Public Buildings Fire Risk is assessed and prediction technique
CN110874471B (en) Privacy and safety protection neural network model training method and device
Wang et al. ADDITION: Detecting adversarial examples with image-dependent noise reduction
CN116757273A (en) An improved method to combat perturbation backdoor attacks
Vaddadi et al. An efficient convolutional neural network for adversarial training against adversarial attack
CN109272036B (en) Random fern target tracking method based on depth residual error network
Ma et al. Releasing malevolence from benevolence: The menace of benign data on machine unlearning
Zhang et al. Defending against backdoor attack on deep neural networks based on multi-scale inactivation
CN113010888A (en) Neural network backdoor attack defense method based on key neurons
US20230259658A1 (en) Device and method for determining adversarial patches for a machine learning system
Li et al. Backdoor mitigation by correcting the distribution of neural activations
CN110796237B (en) Method and device for detecting attack resistance of deep neural network
Gala et al. Evaluating the effectiveness of attacks and defenses on machine learning through adversarial samples
Abdukhamidov et al. Singleadv: single-class target-specific attack against interpretable deep learning systems
Mukeri et al. Towards query efficient and derivative free black box adversarial machine learning attack
Zhao et al. Uma: Facilitating backdoor scanning via unlearning-based model ablation
Dunnett et al. Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies
Halim et al. Automated Adversarial-Attack Removal with SafetyNet Using ADGIT
Shen et al. Reimagining linear probing: Kolmogorov-arnold networks in transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant