CN113010888B - Neural network backdoor attack defense method based on key neurons - Google Patents

Neural network backdoor attack defense method based on key neurons

Info

Publication number
CN113010888B
CN113010888B (application CN202110228938.9A)
Authority
CN
China
Prior art keywords
neurons
neural network
control gate
key
neuron
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110228938.9A
Other languages
Chinese (zh)
Other versions
CN113010888A (en)
Inventor
詹瑾瑜
江维
温翔宇
周星志
孙若旭
宋子微
廖炘可
范翥峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110228938.9A priority Critical patent/CN113010888B/en
Publication of CN113010888A publication Critical patent/CN113010888A/en
Application granted granted Critical
Publication of CN113010888B publication Critical patent/CN113010888B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 - Detecting local intrusion or implementing counter-measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a neural network backdoor attack defense method based on key neurons, applied to the field of network security. For a deployed neural network model, runtime input samples and the corresponding inference results are first collected; a control gate is then deployed on every neuron of the neural network model, and the control gates are trained on the collected input samples and results to obtain the optimal control gate for each picture and for each class of pictures; the key neurons are then determined from the optimal control gates, the key neurons with a lower activation frequency are statistically analyzed, and it is judged whether the neural network model is abnormal. If it is, the screened abnormal neurons are clipped with a fine-grained clipping strategy to complete the backdoor defense; otherwise the original neural network model is output.

Description

Neural network backdoor attack defense method based on key neurons
Technical Field
The invention belongs to the field of network security, and particularly relates to an application program attack defense technology.
Background
Backdoor attacks are a serious threat to artificial intelligence (AI) applications based on neural networks (NN). In a backdoor attack, an attacker trains a compromised model on a poisoned data set and releases it to a public community. The user deploys the compromised model unknowingly, and trigger pictures carefully designed by the attacker are mixed into the model's runtime input, so the classification accuracy of the model drops sharply and the model may even become unusable. The goal of the backdoor attack is to embed an attacker-designed backdoor in the neural network model so that the attacker can attack the user's AI system at any time through that backdoor.
On the detection side, B. Chen et al. propose a backdoor attack detection method based on analyzing the data-set distribution and activation clustering; Wang et al. observe that the attacked class is unstable, so that small perturbations can cause classification failure, and therefore propose an anomaly-detection-based method to detect backdoor attacks; Liu et al. propose a backdoor attack detection method based on the distribution of prediction results, i.e. a normal model's classification results are uniformly distributed over the data set, whereas a backdoored model favors one class over the others.
On the defense side, the currently popular backdoor elimination methods are fine-tuning the model with a clean data set and retraining after clipping. Fine-tuning with a clean data set is proposed by Y. Ji and B. Chen et al., who believe that fine-tuning a compromised neural network model on a clean data set can eliminate backdoors that may exist in the model. Clipping-based elimination of hidden backdoors is proposed by K. Liu and B. Wang et al., who argue that a neural network backdoor hides in a subset of neurons, that retraining directly on a data set may not remove it well, and that destroying the association among these neurons is a better approach; they therefore clip some of the model's neurons with a clipping strategy and retrain the clipped model on a clean data set to preserve normal classification of clean data.
The present system considers the common backdoor attack scenario: an attacker trains and releases a compromised model, the user may then be attacked while using it, and a defender provides a feasible detection method to check whether the model has been attacked and, on top of that detection, a feasible backdoor elimination strategy to remove the existing backdoor. Unlike existing detection methods, this system proposes a backdoor attack detection method based on key neurons and a backdoor elimination method based on fine-grained clipping, grounded in the interpretability of the neural network model and in how the model is triggered. The system analyzes the activation frequency of the key neurons generated for the model under test, finds the difference between the mathematical characteristics of an attacked model and a normal model, and thereby completes the detection of the model under test; it then eliminates the backdoor with a fine-grained clipping strategy based on the localization of abnormal neurons, producing a secure, repaired neural network model.
Some neurons in a neural network not only support the network's inference but also reflect particular features of the input picture. Neurons that are closely associated with an input picture can be regarded as key neurons of that picture. Merging the key neurons of all pictures belonging to the same class gives the key neurons of that class; likewise, merging the key neurons of all classes gives the key neurons of the whole model.
A control gate, on the other hand, is a structure added to the neural network. A control gate is deployed on every neuron of every layer and, as a parameter, is multiplied with the neuron's output to form the neuron's final output. The magnitude of a control gate indicates the sensitivity and contribution of the corresponding neuron to the current class. For example, if the gate of a neuron has the value 3.2, that neuron's contribution to the current classification is amplified 3.2 times relative to the output obtained by normal training; conversely, a gate value below 1 indicates that the neuron's contribution to the final classification should be reduced.
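The gating operation can be pictured with a short PyTorch sketch (the class name GatedLinear and its layout are illustrative assumptions, not part of the patent): each neuron of a fully connected layer carries its own learnable, non-negative gate that multiplies the neuron's activation, so a gate above 1 amplifies the neuron's contribution and a gate below 1 suppresses it.

import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Fully connected layer whose neurons each carry a learnable control gate."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        # one gate per neuron, initialized to 1 so the layer behaves normally at first
        self.gates = nn.Parameter(torch.ones(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.fc(x))
        # the gate is multiplied with the neuron's activation to give the final output;
        # gates below 0 are clamped to 0, as the method requires non-negative gates
        return out * self.gates.clamp(min=0.0)

Only the gate parameters would be trained in the later control-gate training stage, while the original weights are left unchanged.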
Disclosure of Invention
To solve the above technical problems, the invention provides a neural network backdoor attack defense method based on key neurons, aimed at the backdoor attack problem faced by neural-network-based computer vision applications.
The technical scheme adopted by the invention is as follows: a neural network backdoor attack defense method based on key neurons comprises the following steps:
S1, collecting input samples and the corresponding inference results at runtime for the deployed computer vision application program based on the neural network model; the program performs a classification task on the input samples, i.e. for an input picture it outputs, through inference of the neural network model, the class to which the picture belongs;
S2, after deploying a control gate on each neuron of the neural network model, training the control gates with the input samples collected in step S1 and the corresponding inference results to obtain the optimal control gate for each class of input samples;
S3, determining a plurality of key neurons from the optimal control gates;
S4, statistically analyzing the neurons with low activation frequency among the key neurons and judging whether the neural network model is abnormal; if so, executing step S5, otherwise outputting the original neural network model;
S5, clipping the screened abnormal neurons with a fine-grained clipping strategy to complete the backdoor defense.
Each class of input samples in step S2 is obtained as follows: the input samples are classified according to the inference results, and N classes are obtained in total.
The optimal control gate for each class of input samples in step S2 is obtained as follows: the input samples belonging to the same class are fed into the neural network, and for each input sample of the class the control gates are updated by gradient descent, in combination with the input sample and the optimal output result; after the set number of iterations, the updated control gates are the optimal control gates of that input sample, and merging the optimal control gates of all the class's input samples gives the optimal control gate of the class.
Step S3 is specifically: setting a first threshold and a second threshold, where the first threshold is used to screen the key neurons of a single input sample and the second threshold is used to screen the key neurons of a class containing at least 2 input samples; the screening process is:
for a single classified input sample, among the control gate values of all its neurons, every value exceeding the first threshold marks its neuron as a key neuron of that input sample, and the control gate value of that key neuron is then set to 1;
for a class containing at least 2 input samples, the activation frequency of each neuron over all input samples of the class is calculated, and a neuron whose activation frequency exceeds the second threshold is regarded as a key neuron of the class.
Step S4 is specifically: taking the key neurons of each class of input samples from step S3 as one group to obtain N groups of key neurons, and calculating the correlation coefficient and the anomaly index of each group; if the anomaly index is greater than the third threshold, the neural network model may have been attacked and step S5 is executed; otherwise the neural network model is safe and is output.
Step S5 is specifically: calculating the neuron activation frequency of all layers of the neural network model and taking the neurons with an excessively high activation frequency as suspected abnormal neurons; calculating the dispersion of each layer of the neural network model from the suspected abnormal neurons and, according to the dispersion, locating the abnormal neurons whose activation frequency is abnormal; and clipping the located abnormal neurons.
The dispersion β is calculated as:

β = card(Ψ_hg) / N_l

where card(Ψ_hg) is the number of neurons of layer l of the neural network model with a high activation frequency and N_l is the total number of neurons in layer l of the neural network model.
The invention has the following beneficial effects: the method expresses the internal information of the model in the form of key neurons, which improves the interpretability of the backdoor attack detection method. Most traditional methods must perform backdoor detection offline and basically need to design or obtain a fairly large data set as the basis of the detection method; in contrast, the present method only needs to collect some runtime input samples and run the detection on them to complete the backdoor detection of the model under test. The invention also locates the abnormal neurons and applies the clipping strategy only within those possibly abnormal neurons, so the backdoor can be eliminated by clipping a small number of neurons without retraining. This lowers the complexity of the defense measures during backdoor defense, requires no specific clean data set, and reduces the cost and overhead of the process.
Drawings
FIG. 1 is a flow chart of a method for collecting runtime input samples and corresponding output results in accordance with the present invention.
FIG. 2 is a flow chart of the control gate deployment and iterative optimization of the present invention.
FIG. 3 is a flow chart of key neuron generation of the present invention.
FIG. 4 is a flowchart of the anomaly index-based backdoor attack detection method according to the present invention.
FIG. 5 is a flow chart of a backdoor attack defense method based on fine-grained clipping according to the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
The method mainly comprises runtime input data collection, control gate deployment and training, key neuron generation, model detection based on the anomaly index, and neural network backdoor elimination based on fine-grained clipping. The following description takes the detection of a backdoor attack on a neural-network-based image processing program as an example:
1. Runtime input data collection gathers, for the deployed neural network model, the runtime input samples and the corresponding inference results. Every picture fed into the neural network model should be valid and correspond to a valid output, so the pictures are cropped by a preprocessing module before entering the model. The runtime data are then classified and organized by the input sample collection method; the pictures are stored locally directly with OpenCV's image-saving routine. After the pictures are fed into the model, the corresponding inference results are collected continuously; the results are saved in JSON format and matched to the previously collected pictures. Finally, it must be ensured that enough data are collected, i.e. that the collected data reach a given number threshold n; testing shows that 30-40 input samples are enough to guarantee the feasibility of the algorithm. As shown in fig. 1, the method comprises the following steps:
step A1: the input samples input to the model are preprocessed to conform to the input criteria of the model.
Step A2: a counter is initialized that ensures that the sample data collected is sufficient.
Step A3: the input data are placed into a buffer and fed into the neural network model for inference. Once the model's result is obtained, the input picture and the result are combined into one data record.
Step A4: store the pair of input picture and inference result; the picture is saved directly with OpenCV, and the result together with the picture's name is saved in JSON format, ensuring a one-to-one correspondence between pictures and results.
Step A5: increment the counter, judge whether the required amount of data has been reached, and otherwise continue collecting data.
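A minimal Python sketch of steps A1-A5 (the file names, directory layout and the helper name save_runtime_sample are assumptions made for illustration): the picture is written with OpenCV and its inference result is appended to a JSON record keyed by the picture's file name, which keeps pictures and results in one-to-one correspondence.

import json
import os
import cv2  # opencv-python

def save_runtime_sample(image, predicted_class, index,
                        out_dir="runtime_data",
                        label_file="runtime_data/results.json"):
    """Store one preprocessed input picture and record its inference result."""
    os.makedirs(out_dir, exist_ok=True)
    name = f"sample_{index:05d}.png"
    cv2.imwrite(os.path.join(out_dir, name), image)   # picture saved directly (step A4)
    try:
        with open(label_file) as f:
            records = json.load(f)
    except FileNotFoundError:
        records = {}
    records[name] = int(predicted_class)              # result keyed by picture name
    with open(label_file, "w") as f:
        json.dump(records, f, indent=2)

# Collection loop (steps A2 and A5): a counter stops the collection once a given
# threshold n of samples has been reached, e.g. n = 40 as suggested above.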
2. Deployment and training of control gates: a control gate is deployed after each neuron in the neural network. Once deployed, a control gate serves as a criterion for whether its neuron plays a critical role in the final classification. In practice the control gate is simply placed as a parameter after the neuron and multiplied with the neuron's activation output to give the neuron's final output. If the control gate of a neuron is greater than 1, the neuron is considered more important for the classification output of the neural network model; if it is smaller than 1, the neuron is considered to have little influence on the final output. In practice the control gate is a non-negative number: gates smaller than 0 are set directly to 0. The goal of the training stage is to obtain the optimal control gate for each picture and for each class of pictures. First the control gate of each input sample is obtained by updating the gates with a gradient descent algorithm, combining the input picture and the final output result; after 100 iterations the optimized gate values are obtained. The input samples are then classified: based on each sample's inference result, all pictures are grouped according to the model's output. For the control gate of a class, only the input samples belonging to that class need gate optimization; merging the resulting gates gives the control gate of the class. As shown in fig. 2, the method comprises the following steps:
step B1: and acquiring an original neural network model.
Step B2: control gates deployed into the neural network are initialized.
Step B3: for each picture, inferences are made through the original model and the model with the deployed control gates.
Step B4: collect the results of the two models, compute their cross entropy, and update the deployed control gates by gradient descent; after 100 iterations the control gate training for a single picture is complete.
Step B5: the control gate belonging to a single picture is saved.
Step B6: judge whether all pictures have finished control gate training, ensuring that every picture is trained, and finally obtain the optimal control gate for each single picture and for each class of pictures.
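A possible PyTorch sketch of steps B1-B6 (the function name, the convention that gate parameters are named "...gates", the SGD optimizer and the learning rate are assumptions): only the gate parameters are updated, the loss is the cross entropy between the gated model's output and the class predicted by the original model, and the gates are clamped so they stay non-negative.

import torch
import torch.nn.functional as F

def train_gates_for_picture(gated_model, original_model, image,
                            n_iters=100, lr=0.1):
    """Optimize the control gates for one input picture (steps B3-B5)."""
    original_model.eval()
    gated_model.eval()
    with torch.no_grad():
        target = original_model(image).argmax(dim=1)        # original model's result

    # only the control gates are trainable
    for name, p in gated_model.named_parameters():
        p.requires_grad_(name.endswith("gates"))
    gate_params = [p for name, p in gated_model.named_parameters()
                   if name.endswith("gates")]
    optimizer = torch.optim.SGD(gate_params, lr=lr)

    for _ in range(n_iters):                                # 100 iterations (step B4)
        optimizer.zero_grad()
        loss = F.cross_entropy(gated_model(image), target)  # cross entropy of the two results
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            for g in gate_params:
                g.clamp_(min=0.0)                           # gates stay non-negative

    # the trained gates of this picture, merged per class afterwards (steps B5-B6)
    return {name: p.detach().clone()
            for name, p in gated_model.named_parameters() if name.endswith("gates")}

Running this for every collected picture of a class and merging the returned gate tensors gives that class's optimal control gate, as described in step B6.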
3. Key neuron generation: as the name implies, some neurons are critical to a class, i.e. they matter greatly for that class's classification result. Viewed another way, if the control gates are generated for all pictures belonging to the same class, the gate values of certain neurons are consistently larger; those neurons are critical to these pictures and can be called the key neurons of these pictures. The key neurons of the pictures are gathered, and the influence of their control gates on the class's classification result is observed further: if certain neurons have relatively high gate values for most pictures, they are regarded as the key neurons of this class. As shown in fig. 3, the method comprises the following steps:
step C1: two different thresholds are set for screening key neurons belonging to a single picture and a single class, respectively. Wherein the first threshold value gamma1The method is used for screening the key neurons of a single picture, the threshold value is derived from the subsequent calculation of a control gate, and the neurons which are important for the inference result of a single input sample can be screened out based on the trained values of the control gate. Second threshold value gamma2For obtaining the key neurons belonging to the class, the threshold value is to ensure that the screened key neurons of a certain class can still maintain the normal classification function of the class.
Step C2: traverse the control gates of all pictures belonging to the same class.
Step C3: for all neurons of a single picture, if a gate value exceeds the threshold γ1, the corresponding neuron is regarded as a key neuron of that picture, and the gate of that neuron is set to 1.
Step C4: finish generating the key neurons of all pictures belonging to the same class.
Step C5: calculate the activation frequency of each neuron; if its activation frequency exceeds the threshold γ2, the neuron is regarded as a key neuron of this class.
Step C6: judge whether the key neurons of all classes, and of all pictures within the classes, have been generated; once the traversal is complete, collect all the key neurons.
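Steps C2-C6 reduce to a few array operations; the sketch below (the function name and the example threshold values in the comment are assumptions) marks per-picture key neurons with the first threshold γ1, computes each neuron's activation frequency across the class, and keeps the neurons whose frequency exceeds the second threshold γ2.

import numpy as np

def key_neurons_of_class(gates_per_picture, gamma1, gamma2):
    """gates_per_picture: array of shape (num_pictures, num_neurons) holding the
    trained control-gate values of every picture belonging to one class."""
    # step C3: gates above gamma1 mark each picture's key neurons
    picture_key = gates_per_picture > gamma1
    # step C3 (continued): the gates of the per-picture key neurons are set to 1
    normalized_gates = np.where(picture_key, 1.0, gates_per_picture)
    # step C5: activation frequency = fraction of pictures in which a neuron is key
    activation_freq = picture_key.mean(axis=0)
    # neurons whose activation frequency exceeds gamma2 are the class's key neurons
    class_key = np.flatnonzero(activation_freq > gamma2)
    return normalized_gates, activation_freq, class_key

# hypothetical usage with illustrative threshold values:
# gates, freq, key_idx = key_neurons_of_class(class_gates, gamma1=2.0, gamma2=0.6)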
4. Backdoor attack detection based on the anomaly index uses two indexes. The correlation coefficient index statistically analyzes the key neurons with a lower activation frequency: since pictures originally belonging to different classes are, once a trigger is added, all classified into the target class, the correlation coefficient analyzes how the features of those original classes influence the activation frequency of the target class's key neurons. Combining the correlation coefficient indexes of different layers and different classes completes the detection of the neural network model; this is done with the anomaly index. The anomaly index expresses the degree of abnormality of a neural network model: if the model has not suffered a backdoor attack, the anomaly index stays within a normal range, and if it has been attacked, the anomaly index is in a dangerous state, i.e. it exceeds the threshold of the normal range. As shown in fig. 4, the method comprises the following steps:
step D1: for N classes in a certain classification model (i.e. a specific neural network model, such as the VGG16 model exemplified below), key neurons belonging to all N classes, i.e. N groups of key neurons, may be obtained based on step C. After obtaining the neurons, performing mathematical analysis on the key neurons of the groups, and calculating to obtain N groups of two mathematical indexes, namely correlation coefficients and abnormal indexes.
Step D2: traverse all classes and all layers of the model and, for each class, compute at every layer the covariance matrix of the key-neuron activation frequencies over the class's multiple input samples; in the VGG16 model, for example, all 16 layers are computed. The covariance matrix expresses the differences between the data groups in matrix form:

C = [cov(A_i, A_j)], i, j = 1, ..., K

where C is the covariance matrix, cov(A_i, A_j) is the covariance between two groups of data, A_i denotes the i-th group of data, and K is the total number of groups.
Step D3: compute the variance of the key-neuron activation frequency of a single sample at each layer and the final correlation coefficient; the correlation coefficient index is used for the statistical analysis of the key neurons with a lower activation frequency:

α_{i,j} = C_{i,j} / (σ_{A_i} · σ_{A_j})

where C_{i,j} is the element in row i and column j of the covariance matrix, and σ_{A_k} is the standard deviation of the data group A_k.
Step D4: take the correlation coefficients of all N classes as the objects to be analyzed.
Step D5: compute the mean ᾱ of the correlation coefficients and, based on this mean, compute for the whole model the variance of the correlation coefficients over each layer and all N classes combined.
Step D6: compute the anomaly index. The anomaly index is derived from the correlation coefficients and combines all layers and all classes of the model; integrating these parameters yields the anomaly index of the model under test, i.e. the spread of the correlation coefficients around their mean:

AI = (1 / (L·N)) Σ_{i=1..L} Σ_{j=1..N} (α_{ij} - ᾱ)²

where α_{ij} is the correlation coefficient of the i-th layer for the j-th class, ᾱ is the mean from step D5, L is the number of layers and N the number of classes.
step D7: if the anomaly index AI of the model to be determined is greater than a threshold value TAIThe model is considered to be attacked, otherwise the model is considered to be a security model. Wherein T isAIRepresenting an anomaly index threshold calculated from a number of security models, above which the model is likely to be attacked, and below which the model is secure. For example, T can be measured during the course of the experiments of the present inventionAISet to 0.06. The abnormal index of the safe and clean neural network model is lower than 0.06, and the abnormal index (0.15) of the model with the backdoor is far beyond the threshold value, so that whether the backdoor exists in the neural network model can be judged based on the abnormal index.
5. Defense against backdoor attacks based on fine-grained clipping: fine-grained clipping first requires determining the objects to be clipped, so the notion of abnormal neurons is defined first; the key neurons with an abnormally high activation frequency must be located, and these are the possible abnormal neurons. After screening, it is not yet certain whether these neurons are genuinely abnormal or simply have a high activation frequency for a legitimate reason in a normal model. The dispersion index is therefore used to decide whether the screening is reasonable; it statistically analyzes the key neurons with a high activation frequency, which correspond to trigger-pattern features that may be present in the input pictures. Since every picture containing a trigger is classified into the target class regardless of its content, some key neurons of the target class correspond to the trigger pattern. Once the abnormal neurons are determined, a fine-grained clipping strategy decides which layers to clip and the clipping ratio, the screened abnormal neurons are clipped, and the backdoor defense step is complete. As shown in fig. 5, the method comprises the following steps:
step E1: and D, taking the result in the step D as a pilot condition, carrying out the step if the result in the step D is that the model is attacked, and directly outputting the original neural network model if the model is not attacked.
Step E2: locate abnormal neurons. Compute the activation frequency of the neurons of all layers and screen the neurons with a high activation frequency as possible abnormal neurons:

Ψ_hg = {ψ | ψ ≥ τ_h, ψ ∈ Ψ_cr}

where ψ denotes the activation frequency of a single neuron, τ_h denotes the activation-frequency threshold used to screen abnormal neurons (abnormal neurons consistently show such larger activation frequency values), and Ψ_cr denotes the set of activation frequencies corresponding to all key neurons.
Step E3: compute the dispersion. Abnormally high activation frequencies of some neurons can also occur in a normal neural network; for an attacked model, however, the trigger pattern present during training causes the trigger's features to be extracted at every layer of the network, so that in each layer some neurons are strongly associated with the trigger. Based on this observation and analysis, a dispersion index is introduced to statistically analyze the key neurons with a higher activation frequency; these neurons correspond to trigger-pattern features that may be present in the input pictures. First count the number card(Ψ_hg) of neurons with a higher activation frequency in a layer, then count the total number N_l of neurons in that layer; the dispersion is computed from these two values:

β = card(Ψ_hg) / N_l
step E4: the dispersion index is used as an auxiliary index, and the neuron generating high activation is judged to be an abnormal neuron caused by the existence of the trigger, so that the abnormal neuron with abnormal activation frequency is ensured to be really positioned: the number of abnormal neurons due to the presence of triggers is much greater than the number of high activation frequency neurons that normally occur. In this case, the dispersion is expressed in that the dispersion index of the model with the back door is much higher than that of the normal model. And traversing all layers which can implement the cutting by using a concept similar to grid search, collecting cutting proportion intervals, and determining the optimal cutting rate and the layers which implement the cutting.
Step E5: apply the clipping strategy: assign the value 0 to the weights corresponding to the abnormal neurons, which blocks their propagation and completes the defense against the backdoor attack.
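A minimal PyTorch sketch of steps E2-E5 (the function name, the parameter naming convention layer_name + ".weight"/".bias", and the detail of also zeroing the bias are assumptions): the key neurons whose activation frequency reaches τ_h form Ψ_hg, the layer's dispersion β is computed from their count, and the located abnormal neurons are clipped by zeroing their weights so that their output can no longer propagate.

import torch

def clip_abnormal_neurons(model, layer_name, activation_freq, key_idx, tau_h):
    """activation_freq[i]: activation frequency of neuron i of the layer;
    key_idx: indices of the layer's key neurons; tau_h: screening threshold."""
    # step E2: Psi_hg, key neurons with an abnormally high activation frequency
    abnormal = [i for i in key_idx if activation_freq[i] >= tau_h]
    # step E3: dispersion beta = card(Psi_hg) / N_l
    beta = len(abnormal) / len(activation_freq)

    # step E5: zero the weights (and bias) of the located abnormal neurons
    params = dict(model.named_parameters())
    with torch.no_grad():
        weight = params[layer_name + ".weight"]
        for i in abnormal:
            weight[i].zero_()                     # blocks the neuron's propagation
        bias = params.get(layer_name + ".bias")
        if bias is not None:
            for i in abnormal:
                bias[i] = 0.0
    return beta, abnormal

Step E4's search over layers and clipping ratios would call this function for candidate layers and keep the configuration with the best trade-off; that search loop is omitted here.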
The method of the invention has the following characteristics:
1. The method performs mathematical analysis on the key neurons generated from the control gates. It looks deep inside the model, expresses the model's internal information in the form of key neurons, and improves the interpretability of the backdoor attack detection method. This interpretability makes the reliability of the method easier to explain and justify, and the detection method can be extended further based on the interpretability of the model.
2. Runtime input samples are used as the test data. Most traditional methods must perform backdoor detection offline and basically need to design or obtain a relatively large data set as the basis of the detection method. Compared with such methods, the backdoor detection of the model under test can be completed simply by collecting some runtime input samples and running the detection method on them.
3. A fine-grained clipping strategy locates the abnormal neurons and applies clipping only within those possibly abnormal neurons, so the backdoor can be eliminated by clipping a small number of neurons without retraining. This lowers the complexity of the defense measures during backdoor defense, requires no specific clean data set, and reduces the cost and overhead of the process.
In view of the three characteristics, the method is suitable for defending the backdoor attack of the neural network model during the operation of the deployment phase.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (3)

1. A neural network backdoor attack defense method based on key neurons is characterized by comprising the following steps:
s1, collecting input samples and corresponding operation results when the computer vision application program is operated aiming at the computer vision application program with the deployed neural network model;
the program is used for completing the classification task of the input sample;
s2, after the control gate is deployed in each neuron in the neural network model, training the control gate according to the input samples collected in the step S1 and the corresponding operation results to obtain the optimal control gate corresponding to each type of input samples; the optimal control gate corresponding to each type of input sample in step S2 is specifically: inputting input samples belonging to the same class into a neural network for training, performing gradient descent updating on a control gate by adopting a gradient descent algorithm for each input sample in the class in combination with the input sample and an optimal output operation result, obtaining an updated control gate which is an optimal control gate corresponding to each input sample after iteration times are reached, and combining all optimal control gates corresponding to the input samples of the class to obtain the optimal control gate of the input samples of the class;
s3, determining a plurality of key neurons according to the optimal control gate; step S3 specifically includes: setting a first threshold and a second threshold, wherein the first threshold is used for screening key neurons corresponding to the condition that a single input sample is classified into one class, the second threshold is used for screening key neurons corresponding to the condition that the class at least comprises 2 input samples, and the specific screening process comprises the following steps:
for the condition that a single input sample is classified, judging the control gate value exceeding a first threshold value in the control gate values corresponding to all the neurons of the input sample so that the corresponding neuron is the key neuron of the input sample, and then setting the control gate value of the key neuron to be 1;
for the case of a class at least comprising 2 input samples, calculating the activation frequency of each neuron corresponding to all the input samples in the class, and if the activation frequency exceeds a second threshold value, determining that the neuron belongs to a key neuron of the class;
s4, statistically analyzing neurons with low activation frequency in the plurality of key neurons, judging whether the neural network model is abnormal, if so, executing the step S5, otherwise, outputting the original neural network model; neurons with a lower activation frequency, i.e. an activation frequency less than the threshold τhThe neuron of (a);
s5, cutting the screened abnormal neurons by using a fine-grained cutting strategy to complete the back door defense; step S5 specifically includes: calculating the neuron activation frequency of all layers of the neural network model, and taking the neurons with overhigh activation frequency as suspected abnormal neurons; calculating the corresponding dispersion of each layer of the neural network model based on the suspected abnormal neurons, and positioning the abnormal neurons with abnormal activation frequency according to the dispersion; cutting the positioned abnormal neurons;
the dispersion β is calculated as:

β = card(Ψ_hg) / N_l

wherein card(Ψ_hg) denotes the number of neurons of layer l of the neural network model having a high activation frequency, N_l denotes the total number of neurons in layer l of the neural network model, and Ψ_hg denotes the set of neurons with a high activation frequency:

Ψ_hg = {ψ | ψ ≥ τ_h, ψ ∈ Ψ_cr}

where ψ denotes the activation frequency of an individual neuron, τ_h denotes the threshold of activation frequency used to screen abnormal neurons, and Ψ_cr denotes the set of activation frequencies corresponding to all key neurons.
2. The method for defending against neural network backdoor attacks based on key neurons according to claim 1, wherein each class of input samples in step S2 is obtained as follows: the input samples are classified according to the operation results, and the number of resulting classes is N.
3. The neural network backdoor attack defense method based on key neurons according to claim 2, wherein step S4 specifically comprises: taking the key neurons corresponding to each class of input samples in step S3 as one group to obtain N groups of key neurons, and calculating the correlation coefficient and the anomaly index of each group of neurons; if the anomaly index is greater than the third threshold, the neural network model may have been attacked, and step S5 is executed; otherwise the neural network model is safe and is output.
CN202110228938.9A 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons Active CN113010888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110228938.9A CN113010888B (en) 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110228938.9A CN113010888B (en) 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons

Publications (2)

Publication Number Publication Date
CN113010888A CN113010888A (en) 2021-06-22
CN113010888B true CN113010888B (en) 2022-04-19

Family

ID=76402163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110228938.9A Active CN113010888B (en) 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons

Country Status (1)

Country Link
CN (1) CN113010888B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918717B (en) * 2021-10-18 2023-07-04 中国人民解放军国防科技大学 Text backdoor defense method for cleaning data

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102647421B (en) * 2012-04-09 2016-06-29 北京百度网讯科技有限公司 The web back door detection method of Behavior-based control feature and device
CN106790186B (en) * 2016-12-30 2020-04-24 中国人民解放军信息工程大学 Multi-step attack detection method based on multi-source abnormal event correlation analysis
RU2739865C2 (en) * 2018-12-28 2020-12-29 Акционерное общество "Лаборатория Касперского" System and method of detecting a malicious file
US11601468B2 (en) * 2019-06-25 2023-03-07 International Business Machines Corporation Detection of an adversarial backdoor attack on a trained model at inference time
CN110910328B (en) * 2019-11-26 2023-01-24 电子科技大学 Defense method based on antagonism sample classification grade
CN111045330B (en) * 2019-12-23 2020-12-29 南方电网科学研究院有限责任公司 Attack identification method based on Elman neural network and grid-connected interface device
CN111260059B (en) * 2020-01-23 2023-06-02 复旦大学 Back door attack method of video analysis neural network model
CN112183717A (en) * 2020-08-28 2021-01-05 北京航空航天大学 Neural network training method and device based on critical path
CN112132262B (en) * 2020-09-08 2022-05-20 西安交通大学 Recurrent neural network backdoor attack detection method based on interpretable model
CN112365005B (en) * 2020-12-11 2024-03-19 浙江工业大学 Federal learning poisoning detection method based on neuron distribution characteristics

Also Published As

Publication number Publication date
CN113010888A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN111914256B (en) Defense method for machine learning training data under toxic attack
CN112765607B (en) Neural network model backdoor attack detection method
Obeidat et al. Intensive pre-processing of kdd cup 99 for network intrusion classification using machine learning techniques
CN111783442A (en) Intrusion detection method, device, server and storage medium
JP4484643B2 (en) Time series data abnormality determination program and time series data abnormality determination method
CN111835707B (en) Malicious program identification method based on improved support vector machine
CN110874471B (en) Privacy and safety protection neural network model training method and device
CN113111349B (en) Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
CN113204745B (en) Deep learning back door defense method based on model pruning and reverse engineering
CN101364263A (en) Method and system for detecting skin texture to image
CN113361397B (en) Face mask wearing condition detection method based on deep learning
Jiang et al. Color backdoor: A robust poisoning attack in color space
CN102045357A (en) Affine cluster analysis-based intrusion detection method
CN113660196A (en) Network traffic intrusion detection method and device based on deep learning
CN113010888B (en) Neural network backdoor attack defense method based on key neurons
CN111352926A (en) Data processing method, device, equipment and readable storage medium
CN115174170B (en) VPN encryption flow identification method based on ensemble learning
Sheikholeslami et al. Efficient randomized defense against adversarial attacks in deep convolutional neural networks
KR102405799B1 (en) Method and system for providing continuous adaptive learning over time for real time attack detection in cyberspace
Al-Nafjan et al. Intrusion detection using PCA based modular neural network
Tariq et al. Towards an awareness of time series anomaly detection models' adversarial vulnerability
CN116739073B (en) Online back door sample detection method and system based on evolution deviation
Iliashov Synthesis of algorithms for recognition of vulnerabilities in web resources using signatures of fuzzy linguistic features
CN117614742B (en) Malicious traffic detection method with enhanced honey point perception
Tettey et al. Conflict modelling and knowledge extraction using computational intelligence methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant