CN113010888B - Neural network backdoor attack defense method based on key neurons - Google Patents

Neural network backdoor attack defense method based on key neurons

Info

Publication number
CN113010888B
CN113010888B (application CN202110228938.9A)
Authority
CN
China
Prior art keywords
neurons
neural network
control gate
key
neuron
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110228938.9A
Other languages
Chinese (zh)
Other versions
CN113010888A (en)
Inventor
詹瑾瑜
江维
温翔宇
周星志
孙若旭
宋子微
廖炘可
范翥峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110228938.9A priority Critical patent/CN113010888B/en
Publication of CN113010888A publication Critical patent/CN113010888A/en
Application granted granted Critical
Publication of CN113010888B publication Critical patent/CN113010888B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 - Detecting local intrusion or implementing counter-measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a neural network backdoor attack defense method based on key neurons, applied to the field of network security. For a deployed neural network model, runtime input samples and the corresponding inference results are first collected; a control gate is then deployed on every neuron of the neural network model, and the control gates are trained on the collected input samples and results to obtain the optimal control gate for each picture and for each class of pictures; the key neurons are then determined from the optimal control gates, the key neurons with a lower activation frequency are statistically analyzed, and it is judged whether the neural network model is abnormal. If it is, the screened abnormal neurons are clipped with a fine-grained clipping strategy to complete the backdoor defense; otherwise the original neural network model is output.

Description

Neural network backdoor attack defense method based on key neurons
Technical Field
The invention belongs to the field of network security, and particularly relates to an application program attack defense technology.
Background
Backdoor attacks are a serious threat to artificial intelligence (AI) applications based on neural networks (NN). In a backdoor attack, an attacker trains a compromised model on a poisoned data set and releases it to a public community. The user deploys the compromised model unknowingly, and trigger pictures carefully designed by the attacker are mixed into the model's runtime input, so the classification accuracy of the model drops sharply and the model may even become unusable. The goal of the backdoor attack is to embed an attacker-designed backdoor in the neural network model so that the attacker can attack the user's AI system at any time through that backdoor.
On the detection side, B. Chen et al. propose a backdoor attack detection method based on analyzing the data-set distribution and activation clustering; Wang et al. observe that the attacked class is unstable, so that small perturbations can cause classification failure, and therefore propose an anomaly-detection-based method to detect backdoor attacks; Liu et al. propose a backdoor attack detection method based on the distribution of prediction results, i.e. a normal model's classification results are uniformly distributed over the data set, whereas a backdoored model favors one class over the others.
On the defense side, the currently popular backdoor elimination methods are fine-tuning the model with a clean data set and retraining after clipping. Fine-tuning with a clean data set is proposed by Y. Ji and B. Chen et al., who believe that fine-tuning a compromised neural network model on a clean data set can eliminate backdoors that may exist in the model. Clipping-based elimination of hidden backdoors is proposed by K. Liu and B. Wang et al., who argue that a neural network backdoor hides in a subset of neurons, that retraining directly on a data set may not remove it well, and that destroying the association among these neurons is a better approach; they therefore clip some of the model's neurons with a clipping strategy and retrain the clipped model on a clean data set to preserve normal classification of clean data.
The present system considers the common backdoor attack scenario: an attacker trains and releases a compromised model, the user may then be attacked while using it, and a defender provides a feasible detection method to check whether the model has been attacked and, on top of that detection, a feasible backdoor elimination strategy to remove the existing backdoor. Unlike existing detection methods, this system proposes a backdoor attack detection method based on key neurons and a backdoor elimination method based on fine-grained clipping, grounded in the interpretability of the neural network model and in how the model is triggered. The system analyzes the activation frequency of the key neurons generated for the model under test, finds the difference between the mathematical characteristics of an attacked model and a normal model, and thereby completes the detection of the model under test; it then eliminates the backdoor with a fine-grained clipping strategy based on the localization of abnormal neurons, producing a secure, repaired neural network model.
Some neurons in a neural network not only support the network's inference but also reflect particular features of the input picture. Neurons that are closely associated with an input picture can be regarded as key neurons of that picture. Merging the key neurons of all pictures belonging to the same class gives the key neurons of that class; likewise, merging the key neurons of all classes gives the key neurons of the whole model.
A control gate, on the other hand, is a structure added to the neural network. A control gate is deployed on every neuron of every layer and, as a parameter, is multiplied with the neuron's output to form the neuron's final output. The magnitude of a control gate indicates the sensitivity and contribution of the corresponding neuron to the current class. For example, if the gate of a neuron has the value 3.2, that neuron's contribution to the current classification is amplified 3.2 times relative to the output obtained by normal training; conversely, a gate value below 1 indicates that the neuron's contribution to the final classification should be reduced.
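The gating operation can be pictured with a short PyTorch sketch (the class name GatedLinear and its layout are illustrative assumptions, not part of the patent): each neuron of a fully connected layer carries its own learnable, non-negative gate that multiplies the neuron's activation, so a gate above 1 amplifies the neuron's contribution and a gate below 1 suppresses it.

import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Fully connected layer whose neurons each carry a learnable control gate."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        # one gate per neuron, initialized to 1 so the layer behaves normally at first
        self.gates = nn.Parameter(torch.ones(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.fc(x))
        # the gate is multiplied with the neuron's activation to give the final output;
        # gates below 0 are clamped to 0, as the method requires non-negative gates
        return out * self.gates.clamp(min=0.0)

Only the gate parameters would be trained in the later control-gate training stage, while the original weights are left unchanged.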
Disclosure of Invention
To solve the above technical problems, the invention provides a neural network backdoor attack defense method based on key neurons, aimed at the backdoor attack problem faced by neural-network-based computer vision applications.
The technical scheme adopted by the invention is as follows: a neural network backdoor attack defense method based on key neurons comprises the following steps:
S1, collecting input samples and the corresponding inference results at runtime for the deployed computer vision application program based on the neural network model; the program performs a classification task on the input samples, i.e. for an input picture it outputs, through inference of the neural network model, the class to which the picture belongs;
S2, after deploying a control gate on each neuron of the neural network model, training the control gates with the input samples collected in step S1 and the corresponding inference results to obtain the optimal control gate for each class of input samples;
S3, determining a plurality of key neurons from the optimal control gates;
S4, statistically analyzing the neurons with low activation frequency among the key neurons and judging whether the neural network model is abnormal; if so, executing step S5, otherwise outputting the original neural network model;
S5, clipping the screened abnormal neurons with a fine-grained clipping strategy to complete the backdoor defense.
Each class of input samples in step S2 is obtained as follows: the input samples are classified according to the inference results, and N classes are obtained in total.
The optimal control gate for each class of input samples in step S2 is obtained as follows: the input samples belonging to the same class are fed into the neural network, and for each input sample of the class the control gates are updated by gradient descent, in combination with the input sample and the optimal output result; after the set number of iterations, the updated control gates are the optimal control gates of that input sample, and merging the optimal control gates of all the class's input samples gives the optimal control gate of the class.
Step S3 is specifically: setting a first threshold and a second threshold, where the first threshold is used to screen the key neurons of a single input sample and the second threshold is used to screen the key neurons of a class containing at least 2 input samples; the screening process is:
for a single classified input sample, among the control gate values of all its neurons, every value exceeding the first threshold marks its neuron as a key neuron of that input sample, and the control gate value of that key neuron is then set to 1;
for a class containing at least 2 input samples, the activation frequency of each neuron over all input samples of the class is calculated, and a neuron whose activation frequency exceeds the second threshold is regarded as a key neuron of the class.
Step S4 is specifically: taking the key neurons of each class of input samples from step S3 as one group to obtain N groups of key neurons, and calculating the correlation coefficient and the anomaly index of each group; if the anomaly index is greater than the third threshold, the neural network model may have been attacked and step S5 is executed; otherwise the neural network model is safe and is output.
Step S5 is specifically: calculating the neuron activation frequency of all layers of the neural network model and taking the neurons with an excessively high activation frequency as suspected abnormal neurons; calculating the dispersion of each layer of the neural network model from the suspected abnormal neurons and, according to the dispersion, locating the abnormal neurons whose activation frequency is abnormal; and clipping the located abnormal neurons.
The dispersion β is calculated as:

β = card(Ψ_hg) / N_l

where card(Ψ_hg) is the number of neurons of layer l of the neural network model with a high activation frequency and N_l is the total number of neurons in layer l of the neural network model.
The invention has the following beneficial effects: the method expresses the internal information of the model in the form of key neurons, which improves the interpretability of the backdoor attack detection method. Most traditional methods must perform backdoor detection offline and basically need to design or obtain a fairly large data set as the basis of the detection method; in contrast, the present method only needs to collect some runtime input samples and run the detection on them to complete the backdoor detection of the model under test. The invention also locates the abnormal neurons and applies the clipping strategy only within those possibly abnormal neurons, so the backdoor can be eliminated by clipping a small number of neurons without retraining. This lowers the complexity of the defense measures during backdoor defense, requires no specific clean data set, and reduces the cost and overhead of the process.
Drawings
FIG. 1 is a flow chart of a method for collecting runtime input samples and corresponding output results in accordance with the present invention.
FIG. 2 is a flow chart of the control gate deployment and iterative optimization of the present invention.
FIG. 3 is a flow chart of key neuron generation of the present invention.
FIG. 4 is a flowchart of the anomaly index-based backdoor attack detection method according to the present invention.
FIG. 5 is a flow chart of a backdoor attack defense method based on fine-grained clipping according to the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
The method mainly comprises runtime input data collection, control gate deployment and training, key neuron generation, model detection based on the anomaly index, and neural network backdoor elimination based on fine-grained clipping. The following description takes the detection of a backdoor attack on a neural-network-based image processing program as an example:
1. Runtime input data collection gathers, for the deployed neural network model, the runtime input samples and the corresponding inference results. Every picture fed into the neural network model should be valid and correspond to a valid output, so the pictures are cropped by a preprocessing module before entering the model. The runtime data are then classified and organized by the input sample collection method; the pictures are stored locally directly with OpenCV's image-saving routine. After the pictures are fed into the model, the corresponding inference results are collected continuously; the results are saved in JSON format and matched to the previously collected pictures. Finally, it must be ensured that enough data are collected, i.e. that the collected data reach a given number threshold n; testing shows that 30-40 input samples are enough to guarantee the feasibility of the algorithm. As shown in fig. 1, the method comprises the following steps:
step A1: the input samples input to the model are preprocessed to conform to the input criteria of the model.
Step A2: a counter is initialized that ensures that the sample data collected is sufficient.
Step A3: the input data are placed into a buffer and fed into the neural network model for inference. Once the model's result is obtained, the input picture and the result are combined into one data record.
Step A4: store the pair of input picture and inference result; the picture is saved directly with OpenCV, and the result together with the picture's name is saved in JSON format, ensuring a one-to-one correspondence between pictures and results.
Step A5: increment the counter, judge whether the required amount of data has been reached, and otherwise continue collecting data.
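A minimal Python sketch of steps A1-A5 (the file names, directory layout and the helper name save_runtime_sample are assumptions made for illustration): the picture is written with OpenCV and its inference result is appended to a JSON record keyed by the picture's file name, which keeps pictures and results in one-to-one correspondence.

import json
import os
import cv2  # opencv-python

def save_runtime_sample(image, predicted_class, index,
                        out_dir="runtime_data",
                        label_file="runtime_data/results.json"):
    """Store one preprocessed input picture and record its inference result."""
    os.makedirs(out_dir, exist_ok=True)
    name = f"sample_{index:05d}.png"
    cv2.imwrite(os.path.join(out_dir, name), image)   # picture saved directly (step A4)
    try:
        with open(label_file) as f:
            records = json.load(f)
    except FileNotFoundError:
        records = {}
    records[name] = int(predicted_class)              # result keyed by picture name
    with open(label_file, "w") as f:
        json.dump(records, f, indent=2)

# Collection loop (steps A2 and A5): a counter stops the collection once a given
# threshold n of samples has been reached, e.g. n = 40 as suggested above.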
2. Deployment and training of control gates: a control gate is deployed after each neuron in the neural network. Once deployed, a control gate serves as a criterion for whether its neuron plays a critical role in the final classification. In practice the control gate is simply placed as a parameter after the neuron and multiplied with the neuron's activation output to give the neuron's final output. If the control gate of a neuron is greater than 1, the neuron is considered more important for the classification output of the neural network model; if it is smaller than 1, the neuron is considered to have little influence on the final output. In practice the control gate is a non-negative number: gates smaller than 0 are set directly to 0. The goal of the training stage is to obtain the optimal control gate for each picture and for each class of pictures. First the control gate of each input sample is obtained by updating the gates with a gradient descent algorithm, combining the input picture and the final output result; after 100 iterations the optimized gate values are obtained. The input samples are then classified: based on each sample's inference result, all pictures are grouped according to the model's output. For the control gate of a class, only the input samples belonging to that class need gate optimization; merging the resulting gates gives the control gate of the class. As shown in fig. 2, the method comprises the following steps:
step B1: and acquiring an original neural network model.
Step B2: control gates deployed into the neural network are initialized.
Step B3: for each picture, inferences are made through the original model and the model with the deployed control gates.
Step B4: collect the results of the two models, compute their cross entropy, and update the deployed control gates by gradient descent; after 100 iterations the control gate training for a single picture is complete.
Step B5: the control gate belonging to a single picture is saved.
Step B6: judge whether all pictures have finished control gate training, ensuring that every picture is trained, and finally obtain the optimal control gate for each single picture and for each class of pictures.
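A possible PyTorch sketch of steps B1-B6 (the function name, the convention that gate parameters are named "...gates", the SGD optimizer and the learning rate are assumptions): only the gate parameters are updated, the loss is the cross entropy between the gated model's output and the class predicted by the original model, and the gates are clamped so they stay non-negative.

import torch
import torch.nn.functional as F

def train_gates_for_picture(gated_model, original_model, image,
                            n_iters=100, lr=0.1):
    """Optimize the control gates for one input picture (steps B3-B5)."""
    original_model.eval()
    gated_model.eval()
    with torch.no_grad():
        target = original_model(image).argmax(dim=1)        # original model's result

    # only the control gates are trainable
    for name, p in gated_model.named_parameters():
        p.requires_grad_(name.endswith("gates"))
    gate_params = [p for name, p in gated_model.named_parameters()
                   if name.endswith("gates")]
    optimizer = torch.optim.SGD(gate_params, lr=lr)

    for _ in range(n_iters):                                # 100 iterations (step B4)
        optimizer.zero_grad()
        loss = F.cross_entropy(gated_model(image), target)  # cross entropy of the two results
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            for g in gate_params:
                g.clamp_(min=0.0)                           # gates stay non-negative

    # the trained gates of this picture, merged per class afterwards (steps B5-B6)
    return {name: p.detach().clone()
            for name, p in gated_model.named_parameters() if name.endswith("gates")}

Running this for every collected picture of a class and merging the returned gate tensors gives that class's optimal control gate, as described in step B6.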
3. Key neuron generation: as the name implies, some neurons are critical to a class, i.e. they matter greatly for that class's classification result. Viewed another way, if the control gates are generated for all pictures belonging to the same class, the gate values of certain neurons are consistently larger; those neurons are critical to these pictures and can be called the key neurons of these pictures. The key neurons of the pictures are gathered, and the influence of their control gates on the class's classification result is observed further: if certain neurons have relatively high gate values for most pictures, they are regarded as the key neurons of this class. As shown in fig. 3, the method comprises the following steps:
step C1: two different thresholds are set for screening key neurons belonging to a single picture and a single class, respectively. Wherein the first threshold value gamma1The method is used for screening the key neurons of a single picture, the threshold value is derived from the subsequent calculation of a control gate, and the neurons which are important for the inference result of a single input sample can be screened out based on the trained values of the control gate. Second threshold value gamma2For obtaining the key neurons belonging to the class, the threshold value is to ensure that the screened key neurons of a certain class can still maintain the normal classification function of the class.
Step C2: traverse the control gates of all pictures belonging to the same class.
Step C3: for all neurons of a single picture, if a gate value exceeds the threshold γ1, the corresponding neuron is regarded as a key neuron of that picture, and the gate of that neuron is set to 1.
Step C4: finish generating the key neurons of all pictures belonging to the same class.
Step C5: calculate the activation frequency of each neuron; if its activation frequency exceeds the threshold γ2, the neuron is regarded as a key neuron of this class.
Step C6: judge whether the key neurons of all classes, and of all pictures within the classes, have been generated; once the traversal is complete, collect all the key neurons.
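Steps C2-C6 reduce to a few array operations; the sketch below (the function name and the example threshold values in the comment are assumptions) marks per-picture key neurons with the first threshold γ1, computes each neuron's activation frequency across the class, and keeps the neurons whose frequency exceeds the second threshold γ2.

import numpy as np

def key_neurons_of_class(gates_per_picture, gamma1, gamma2):
    """gates_per_picture: array of shape (num_pictures, num_neurons) holding the
    trained control-gate values of every picture belonging to one class."""
    # step C3: gates above gamma1 mark each picture's key neurons
    picture_key = gates_per_picture > gamma1
    # step C3 (continued): the gates of the per-picture key neurons are set to 1
    normalized_gates = np.where(picture_key, 1.0, gates_per_picture)
    # step C5: activation frequency = fraction of pictures in which a neuron is key
    activation_freq = picture_key.mean(axis=0)
    # neurons whose activation frequency exceeds gamma2 are the class's key neurons
    class_key = np.flatnonzero(activation_freq > gamma2)
    return normalized_gates, activation_freq, class_key

# hypothetical usage with illustrative threshold values:
# gates, freq, key_idx = key_neurons_of_class(class_gates, gamma1=2.0, gamma2=0.6)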
4. Backdoor attack detection based on the anomaly index uses two indexes. The correlation coefficient index statistically analyzes the key neurons with a lower activation frequency: since pictures originally belonging to different classes are, once a trigger is added, all classified into the target class, the correlation coefficient analyzes how the features of those original classes influence the activation frequency of the target class's key neurons. Combining the correlation coefficient indexes of different layers and different classes completes the detection of the neural network model; this is done with the anomaly index. The anomaly index expresses the degree of abnormality of a neural network model: if the model has not suffered a backdoor attack, the anomaly index stays within a normal range, and if it has been attacked, the anomaly index is in a dangerous state, i.e. it exceeds the threshold of the normal range. As shown in fig. 4, the method comprises the following steps:
step D1: for N classes in a certain classification model (i.e. a specific neural network model, such as the VGG16 model exemplified below), key neurons belonging to all N classes, i.e. N groups of key neurons, may be obtained based on step C. After obtaining the neurons, performing mathematical analysis on the key neurons of the groups, and calculating to obtain N groups of two mathematical indexes, namely correlation coefficients and abnormal indexes.
Step D2: traverse all classes and all layers of the model and, for each class, compute at every layer the covariance matrix of the key-neuron activation frequencies over the class's multiple input samples; in the VGG16 model, for example, all 16 layers are computed. The covariance matrix expresses the differences between the data groups in matrix form:

C = [cov(A_i, A_j)], i, j = 1, ..., K

where C is the covariance matrix, cov(A_i, A_j) is the covariance between two groups of data, A_i denotes the i-th group of data, and K is the total number of groups.
Step D3: compute the variance of the key-neuron activation frequency of a single sample at each layer and the final correlation coefficient; the correlation coefficient index is used for the statistical analysis of the key neurons with a lower activation frequency:

α_{i,j} = C_{i,j} / (σ_{A_i} · σ_{A_j})

where C_{i,j} is the element in row i and column j of the covariance matrix, and σ_{A_k} is the standard deviation of the data group A_k.
Step D4: take the correlation coefficients of all N classes as the objects to be analyzed.
Step D5: compute the mean ᾱ of the correlation coefficients and, based on this mean, compute for the whole model the variance of the correlation coefficients over each layer and all N classes combined.
Step D6: compute the anomaly index. The anomaly index is derived from the correlation coefficients and combines all layers and all classes of the model; integrating these parameters yields the anomaly index of the model under test, i.e. the spread of the correlation coefficients around their mean:

AI = (1 / (L·N)) Σ_{i=1..L} Σ_{j=1..N} (α_{ij} - ᾱ)²

where α_{ij} is the correlation coefficient of the i-th layer for the j-th class, ᾱ is the mean from step D5, L is the number of layers and N the number of classes.
step D7: if the anomaly index AI of the model to be determined is greater than a threshold value TAIThe model is considered to be attacked, otherwise the model is considered to be a security model. Wherein T isAIRepresenting an anomaly index threshold calculated from a number of security models, above which the model is likely to be attacked, and below which the model is secure. For example, T can be measured during the course of the experiments of the present inventionAISet to 0.06. The abnormal index of the safe and clean neural network model is lower than 0.06, and the abnormal index (0.15) of the model with the backdoor is far beyond the threshold value, so that whether the backdoor exists in the neural network model can be judged based on the abnormal index.
5. Defense against backdoor attacks based on fine-grained clipping: fine-grained clipping first requires determining the objects to be clipped, so the notion of abnormal neurons is defined first; the key neurons with an abnormally high activation frequency must be located, and these are the possible abnormal neurons. After screening, it is not yet certain whether these neurons are genuinely abnormal or simply have a high activation frequency for a legitimate reason in a normal model. The dispersion index is therefore used to decide whether the screening is reasonable; it statistically analyzes the key neurons with a high activation frequency, which correspond to trigger-pattern features that may be present in the input pictures. Since every picture containing a trigger is classified into the target class regardless of its content, some key neurons of the target class correspond to the trigger pattern. Once the abnormal neurons are determined, a fine-grained clipping strategy decides which layers to clip and the clipping ratio, the screened abnormal neurons are clipped, and the backdoor defense step is complete. As shown in fig. 5, the method comprises the following steps:
step E1: and D, taking the result in the step D as a pilot condition, carrying out the step if the result in the step D is that the model is attacked, and directly outputting the original neural network model if the model is not attacked.
Step E2: locate abnormal neurons. Compute the activation frequency of the neurons of all layers and screen the neurons with a high activation frequency as possible abnormal neurons:

Ψ_hg = {ψ | ψ ≥ τ_h, ψ ∈ Ψ_cr}

where ψ denotes the activation frequency of a single neuron, τ_h denotes the activation-frequency threshold used to screen abnormal neurons (abnormal neurons consistently show such larger activation frequency values), and Ψ_cr denotes the set of activation frequencies corresponding to all key neurons.
Step E3: compute the dispersion. Abnormally high activation frequencies of some neurons can also occur in a normal neural network; for an attacked model, however, the trigger pattern present during training causes the trigger's features to be extracted at every layer of the network, so that in each layer some neurons are strongly associated with the trigger. Based on this observation and analysis, a dispersion index is introduced to statistically analyze the key neurons with a higher activation frequency; these neurons correspond to trigger-pattern features that may be present in the input pictures. First count the number card(Ψ_hg) of neurons with a higher activation frequency in a layer, then count the total number N_l of neurons in that layer; the dispersion is computed from these two values:

β = card(Ψ_hg) / N_l
step E4: the dispersion index is used as an auxiliary index, and the neuron generating high activation is judged to be an abnormal neuron caused by the existence of the trigger, so that the abnormal neuron with abnormal activation frequency is ensured to be really positioned: the number of abnormal neurons due to the presence of triggers is much greater than the number of high activation frequency neurons that normally occur. In this case, the dispersion is expressed in that the dispersion index of the model with the back door is much higher than that of the normal model. And traversing all layers which can implement the cutting by using a concept similar to grid search, collecting cutting proportion intervals, and determining the optimal cutting rate and the layers which implement the cutting.
Step E5: apply the clipping strategy: assign the value 0 to the weights corresponding to the abnormal neurons, which blocks their propagation and completes the defense against the backdoor attack.
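A minimal PyTorch sketch of steps E2-E5 (the function name, the parameter naming convention layer_name + ".weight"/".bias", and the detail of also zeroing the bias are assumptions): the key neurons whose activation frequency reaches τ_h form Ψ_hg, the layer's dispersion β is computed from their count, and the located abnormal neurons are clipped by zeroing their weights so that their output can no longer propagate.

import torch

def clip_abnormal_neurons(model, layer_name, activation_freq, key_idx, tau_h):
    """activation_freq[i]: activation frequency of neuron i of the layer;
    key_idx: indices of the layer's key neurons; tau_h: screening threshold."""
    # step E2: Psi_hg, key neurons with an abnormally high activation frequency
    abnormal = [i for i in key_idx if activation_freq[i] >= tau_h]
    # step E3: dispersion beta = card(Psi_hg) / N_l
    beta = len(abnormal) / len(activation_freq)

    # step E5: zero the weights (and bias) of the located abnormal neurons
    params = dict(model.named_parameters())
    with torch.no_grad():
        weight = params[layer_name + ".weight"]
        for i in abnormal:
            weight[i].zero_()                     # blocks the neuron's propagation
        bias = params.get(layer_name + ".bias")
        if bias is not None:
            for i in abnormal:
                bias[i] = 0.0
    return beta, abnormal

Step E4's search over layers and clipping ratios would call this function for candidate layers and keep the configuration with the best trade-off; that search loop is omitted here.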
The method of the invention has the following characteristics:
1. The method performs mathematical analysis on the key neurons generated from the control gates. It looks deep inside the model, expresses the model's internal information in the form of key neurons, and improves the interpretability of the backdoor attack detection method. This interpretability makes the reliability of the method easier to explain and justify, and the detection method can be extended further based on the interpretability of the model.
2. Runtime input samples are used as the test data. Most traditional methods must perform backdoor detection offline and basically need to design or obtain a relatively large data set as the basis of the detection method. Compared with such methods, the backdoor detection of the model under test can be completed simply by collecting some runtime input samples and running the detection method on them.
3. A fine-grained clipping strategy locates the abnormal neurons and applies clipping only within those possibly abnormal neurons, so the backdoor can be eliminated by clipping a small number of neurons without retraining. This lowers the complexity of the defense measures during backdoor defense, requires no specific clean data set, and reduces the cost and overhead of the process.
In view of the three characteristics, the method is suitable for defending the backdoor attack of the neural network model during the operation of the deployment phase.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (3)

1. A neural network backdoor attack defense method based on key neurons is characterized by comprising the following steps:
s1, collecting input samples and corresponding operation results when the computer vision application program is operated aiming at the computer vision application program with the deployed neural network model;
the program is used for completing the classification task of the input sample;
s2, after the control gate is deployed in each neuron in the neural network model, training the control gate according to the input samples collected in the step S1 and the corresponding operation results to obtain the optimal control gate corresponding to each type of input samples; the optimal control gate corresponding to each type of input sample in step S2 is specifically: inputting input samples belonging to the same class into a neural network for training, performing gradient descent updating on a control gate by adopting a gradient descent algorithm for each input sample in the class in combination with the input sample and an optimal output operation result, obtaining an updated control gate which is an optimal control gate corresponding to each input sample after iteration times are reached, and combining all optimal control gates corresponding to the input samples of the class to obtain the optimal control gate of the input samples of the class;
s3, determining a plurality of key neurons according to the optimal control gate; step S3 specifically includes: setting a first threshold and a second threshold, wherein the first threshold is used for screening key neurons corresponding to the condition that a single input sample is classified into one class, the second threshold is used for screening key neurons corresponding to the condition that the class at least comprises 2 input samples, and the specific screening process comprises the following steps:
for the condition that a single input sample is classified, judging the control gate value exceeding a first threshold value in the control gate values corresponding to all the neurons of the input sample so that the corresponding neuron is the key neuron of the input sample, and then setting the control gate value of the key neuron to be 1;
for the case of a class at least comprising 2 input samples, calculating the activation frequency of each neuron corresponding to all the input samples in the class, and if the activation frequency exceeds a second threshold value, determining that the neuron belongs to a key neuron of the class;
s4, statistically analyzing neurons with low activation frequency in the plurality of key neurons, judging whether the neural network model is abnormal, if so, executing the step S5, otherwise, outputting the original neural network model; neurons with a lower activation frequency, i.e. an activation frequency less than the threshold τhThe neuron of (a);
s5, cutting the screened abnormal neurons by using a fine-grained cutting strategy to complete the back door defense; step S5 specifically includes: calculating the neuron activation frequency of all layers of the neural network model, and taking the neurons with overhigh activation frequency as suspected abnormal neurons; calculating the corresponding dispersion of each layer of the neural network model based on the suspected abnormal neurons, and positioning the abnormal neurons with abnormal activation frequency according to the dispersion; cutting the positioned abnormal neurons;
the dispersion β is calculated as:

β = card(Ψ_hg) / N_l

wherein card(Ψ_hg) denotes the number of neurons of layer l of the neural network model having a high activation frequency, N_l denotes the total number of neurons in layer l of the neural network model, and Ψ_hg denotes the set of neurons with a high activation frequency:

Ψ_hg = {ψ | ψ ≥ τ_h, ψ ∈ Ψ_cr}

where ψ denotes the activation frequency of an individual neuron, τ_h denotes the threshold of activation frequency used to screen abnormal neurons, and Ψ_cr denotes the set of activation frequencies corresponding to all key neurons.
2. The method for defending against neural network backdoor attacks based on key neurons according to claim 1, wherein each class of input samples in step S2 is obtained as follows: the input samples are classified according to the operation results, and the number of resulting classes is N.
3. The neural network backdoor attack defense method based on key neurons according to claim 2, wherein step S4 specifically comprises: taking the key neurons corresponding to each class of input samples in step S3 as one group to obtain N groups of key neurons, and calculating the correlation coefficient and the anomaly index of each group of neurons; if the anomaly index is greater than the third threshold, the neural network model may have been attacked, and step S5 is executed; otherwise the neural network model is safe and is output.
CN202110228938.9A 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons Active CN113010888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110228938.9A CN113010888B (en) 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110228938.9A CN113010888B (en) 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons

Publications (2)

Publication Number Publication Date
CN113010888A CN113010888A (en) 2021-06-22
CN113010888B true CN113010888B (en) 2022-04-19

Family

ID=76402163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110228938.9A Active CN113010888B (en) 2021-03-02 2021-03-02 Neural network backdoor attack defense method based on key neurons

Country Status (1)

Country Link
CN (1) CN113010888B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918717B (en) * 2021-10-18 2023-07-04 中国人民解放军国防科技大学 Text backdoor defense method for cleaning data

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102647421B (en) * 2012-04-09 2016-06-29 北京百度网讯科技有限公司 The web back door detection method of Behavior-based control feature and device
CN106790186B (en) * 2016-12-30 2020-04-24 中国人民解放军信息工程大学 Multi-step attack detection method based on multi-source abnormal event correlation analysis
RU2739865C2 (en) * 2018-12-28 2020-12-29 Акционерное общество "Лаборатория Касперского" System and method of detecting a malicious file
US11601468B2 (en) * 2019-06-25 2023-03-07 International Business Machines Corporation Detection of an adversarial backdoor attack on a trained model at inference time
CN110910328B (en) * 2019-11-26 2023-01-24 电子科技大学 Defense method based on antagonism sample classification grade
CN111045330B (en) * 2019-12-23 2020-12-29 南方电网科学研究院有限责任公司 Attack identification method based on Elman neural network and grid-connected interface device
CN111260059B (en) * 2020-01-23 2023-06-02 复旦大学 Back door attack method of video analysis neural network model
CN112183717A (en) * 2020-08-28 2021-01-05 北京航空航天大学 Neural network training method and device based on critical path
CN112132262B (en) * 2020-09-08 2022-05-20 西安交通大学 Recurrent neural network backdoor attack detection method based on interpretable model
CN112365005B (en) * 2020-12-11 2024-03-19 浙江工业大学 Federal learning poisoning detection method based on neuron distribution characteristics

Also Published As

Publication number Publication date
CN113010888A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN111914256B (en) Defense method for machine learning training data under toxic attack
CN112765607B (en) Neural network model backdoor attack detection method
Obeidat et al. Intensive pre-processing of kdd cup 99 for network intrusion classification using machine learning techniques
CN111783442A (en) Intrusion detection method, device, server and storage medium
JP4484643B2 (en) Time series data abnormality determination program and time series data abnormality determination method
CN111835707B (en) Malicious program identification method based on improved support vector machine
CN110874471B (en) Privacy and safety protection neural network model training method and device
CN113111349B (en) Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
CN113204745B (en) Deep learning back door defense method based on model pruning and reverse engineering
CN101364263A (en) Method and system for detecting skin texture to image
CN113361397B (en) Face mask wearing condition detection method based on deep learning
Jiang et al. Color backdoor: A robust poisoning attack in color space
CN102045357A (en) Affine cluster analysis-based intrusion detection method
CN113660196A (en) Network traffic intrusion detection method and device based on deep learning
CN113010888B (en) Neural network backdoor attack defense method based on key neurons
CN111352926A (en) Data processing method, device, equipment and readable storage medium
CN115174170B (en) VPN encryption flow identification method based on ensemble learning
Sheikholeslami et al. Efficient randomized defense against adversarial attacks in deep convolutional neural networks
KR102405799B1 (en) Method and system for providing continuous adaptive learning over time for real time attack detection in cyberspace
Al-Nafjan et al. Intrusion detection using PCA based modular neural network
Tariq et al. Towards an awareness of time series anomaly detection models' adversarial vulnerability
CN116739073B (en) Online back door sample detection method and system based on evolution deviation
Iliashov Synthesis of algorithms for recognition of vulnerabilities in web resources using signatures of fuzzy linguistic features
CN117614742B (en) Malicious traffic detection method with enhanced honey point perception
Tettey et al. Conflict modelling and knowledge extraction using computational intelligence methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant