CN112765607B - Neural network model backdoor attack detection method - Google Patents
- Publication number: CN112765607B (application CN202110068380.2A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a neural network model backdoor attack detection method, which comprises the following steps: S1, collecting input data while the neural network runs; S2, performing control gate optimization training to obtain the optimal control gate corresponding to each picture and each class; S3, generating key neurons; S4, calculating indices based on the numerical characteristics of the critical path; and S5, calculating an anomaly index from these indices and judging whether the neural network model has suffered a backdoor attack. The method performs mathematical analysis on the critical path generated from the control gates and expresses the internal information of the model in the form of the critical path, improving the reliability of backdoor attack detection; it completes backdoor detection of the model under test using only runtime input samples as detection data, making it well suited to detecting backdoor attacks on a deployed neural network model at run time.
Description
Technical Field
The invention relates to a neural network model backdoor attack detection method, mainly applied to runtime backdoor attack detection in safety-critical intelligent systems.
Background
Backdoor attacks are a serious threat to Artificial Intelligence (AI) applications based on Neural Networks (NN). In a backdoor attack, an attacker trains a compromised model on a poisoned data set and releases it to the public community. A user adopts the compromised model unknowingly; when trigger pictures carefully designed by the attacker are mixed into the model's runtime input, the classification accuracy of the model drops sharply, and the model may even become unusable. The goal of a backdoor attack is to embed an attacker-designed backdoor in the neural network model so that the attacker can strike the user's AI system at any time through that backdoor.
On the detection side, B. Chen et al. propose a backdoor attack detection method based on analyzing data-set distribution and activation clustering; Wang et al. observe that the attacked class is unstable, with small perturbations sufficing to cause misclassification, and therefore propose an anomaly-detection-based method; Liu et al. propose detection based on the distribution of prediction results: a normal model's classification results are evenly distributed over the data set, whereas a backdoored model favors one class over the others.
The present invention considers the common backdoor attack scenario: an attacker trains and releases a compromised model, a user may then be attacked while using it, and a defender needs a practical method to detect whether the model has been attacked. Unlike existing detection methods, the invention starts from the interpretability of the neural network model and proposes a backdoor attack detection method based on neuron critical paths. It analyzes the critical paths generated for the model under test and finds the differences between the critical paths of an attacked model and those of a normal model, thereby completing detection of the model in question.
Neural network critical path generation techniques analyze the routing paths of critical neurons in a neural network model. Some neurons in the network not only support the network's inference but also reflect particular features of the input picture; neurons closely associated with the input picture may be considered key neurons. The routing path formed by the key neuron groups of the successive layers is called the critical path of that class, and the combination of the critical paths of all classes is called the critical path of the whole model.
A control gate, on the other hand, is a structure added to a neural network. A control gate is deployed after each neuron in each layer, and its value is multiplied, as a parameter, with that neuron's output to form the neuron's final output, as shown in FIG. 1. The magnitude of the control gate indicates the sensitivity and contribution of the corresponding neuron to the current class. For example, if the control gate of a certain neuron has the value 3.2, the neuron's contribution to the current classification is considered to be 3.2 times the output obtained by normal training. Conversely, a control gate with a value below 1 indicates that the corresponding neuron's contribution to the final classification should be reduced.
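As a minimal numeric illustration of the gating mechanism described above (the activation and gate values are hypothetical, chosen to mirror the 3.2x example):

```python
import numpy as np

# Hypothetical activations for four neurons in one layer.
activations = np.array([0.5, 1.2, 0.8, 2.0])

# One control gate per neuron; gates are non-negative by construction.
gates = np.array([3.2, 1.0, 0.4, 0.0])

# The gated output: each neuron's activation scaled by its gate.
# A gate of 3.2 amplifies the neuron's contribution 3.2x; a gate
# below 1 suppresses it; a gate of 0 silences the neuron entirely.
gated_output = gates * activations
```

A neuron whose gate stays well above 1 across many pictures of a class is a candidate key neuron for that class.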
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a neural network model backdoor attack detection method which uses runtime input samples as detection data, performs mathematical analysis on the critical path generated from the control gates, and expresses the internal information of the model in the form of the critical path, thereby improving the reliability of backdoor attack detection.
The purpose of the invention is realized by the following technical scheme: a neural network model backdoor attack detection method comprises the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
s2, performing control gate optimization training to obtain the optimal control gate corresponding to each picture and each class;
s3, generating key neurons;
s4, calculating indexes based on the numerical characteristics of the critical path;
and S5, calculating the abnormal index based on the index, and judging whether the neural network model is attacked by the backdoor.
Further, the specific implementation method of step S1 is as follows:
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
Further, the specific implementation method of step S2 is as follows:
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
s26, executing the operations of steps S24 and S25 for each picture, ensuring that a control gate is trained for every picture.
Further, the specific implementation method of step S3 is as follows:
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
s32, traversing the control gates of all pictures belonging to the same class and generating the key neurons of each of those pictures; the specific generation mode is as follows: for the control gates corresponding to all neurons of a single picture, if a control gate's value exceeds the set threshold, the corresponding neuron is considered a key neuron of the picture and its control gate is set to 1; if the value does not exceed the threshold, the corresponding neuron is not a key neuron of the picture and its control gate is set to 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
Further, the specific implementation method of step S4 is as follows:
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing the key paths of all classes to obtain a key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
For a neural network model with L layers in total, each layer is selected in turn and the covariance matrix of the critical path at that layer is computed; the covariance matrix expresses the differences among the groups of data in matrix form:

c_{p,q}^l = cov(A_p^l, A_q^l),  C^l = [c_{p,q}^l]_{K×K}

where c_{p,q}^l denotes the covariance between two sets of critical path data at layer l, A_p^l denotes the critical path data corresponding to the p-th group of pictures, A_q^l that corresponding to the q-th group, and K is the total number of picture groups; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
s43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l at layer l;
The correlation coefficient index statistically analyzes the key neurons in the critical path whose activation frequency is below the set threshold τ:

α_l = (1 / (K(K−1))) · Σ_{i≠j} c_{i,j} / (σ_i · σ_j)

where c_{i,j} is the element in row i, column j of the matrix C^l, σ_i is the standard deviation of the critical path data A_i^l at layer l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
s44, calculating the dispersion; the dispersion index statistically analyzes the key neurons in the critical path whose activation frequency is above 80%; these high-activation-frequency neurons correspond to trigger pattern features that may be present in the input picture; first the number CARD(Ψ_hg) of neurons with a high activation frequency is counted, then the total number N_l of neurons in the layer, and the dispersion is computed from the two values:

D_l = CARD(Ψ_hg) / N_l
and S45, repeating the operations of the steps S42-S44, and finishing the calculation of the correlation coefficients and the dispersion of all the classes.
Further, the specific implementation method of step S5 is as follows:
s51, taking the dispersion and correlation coefficient of all classes as undetermined data;
s52, traversing all classes and all model layers, and calculating the mean ᾱ of the correlation coefficients and the mean D̄ of the dispersions; the two means serve as the basis for subsequently computing the variances of the correlation coefficient and the dispersion;
s53, calculating the anomaly index; the anomaly index is computed from the two indices, correlation coefficient and dispersion, over all layers and all classes of the model; integrating these parameters, the anomaly index of the model under test is:

AI = (1 / (N·L)) · Σ_{n=1}^{N} Σ_{l=1}^{L} [ (α_l^n − ᾱ)² + (D_l^n − D̄)² ]

where l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes, and L is the total number of layers of the neural network model;
and S54, if the anomaly index AI of the model under test is larger than the threshold, the model is considered to have been attacked; otherwise it is considered a safe model.
The invention has the beneficial effects that:
1. the method carries out mathematical analysis on the generated key path of the control gate, extends into the model, expresses the internal information of the model in the form of the key path, improves the effectiveness of the backdoor attack detection method by utilizing a neural network interpretability method, can explain and explain the reliability of the method more reasonably, and can expand the detection method more based on the interpretability of the model.
2. The present invention uses runtime input samples as detection data. Most conventional methods must perform backdoor detection in an offline state, and these methods generally require a fairly large data set to be designed or obtained as the basis of the detection method. By contrast, the present method completes backdoor detection of the model under test by merely collecting some runtime input samples and running the detection method on them, which makes it very suitable for detecting backdoor attacks on a deployed neural network model at run time.
Drawings
FIG. 1 is a schematic diagram of a neural network architecture with control gates deployed;
FIG. 2 is a flow chart of a neural network model backdoor attack detection method of the present invention;
FIG. 3 is a flow chart of collecting input data during operation of a neural network;
FIG. 4 is a flow chart of control gate optimization training;
FIG. 5 is a flow chart of key neuron generation;
FIG. 6 is a flow chart of calculating an indicator based on a numerical feature of a critical path;
FIG. 7 is a flow chart of anomaly index calculation based on indicators.
Detailed Description
Neural networks are widely used in safety-critical fields such as face recognition and autonomous driving. Attack methods aimed at neural network models are now quite common, and backdoor attacks are an important part of them. Once an attacked neural network is applied in face recognition, autonomous driving and similar applications, dangerous consequences may follow, such as information leakage or a traffic accident caused by a vehicle failing to recognize traffic signs. Therefore, designing and applying a defense against neural network backdoor attacks is important and urgent. The invention builds on interpretability and, combined with concrete tasks such as face recognition and road sign recognition, completes backdoor attack detection from collected runtime data such as captured faces or road signs. The technical scheme of the invention is further explained with reference to the drawings.
As shown in fig. 2, the method for detecting a back door attack of a neural network model of the present invention includes the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
The input sample collection module collects, for the deployed neural network model, the input samples and the corresponding run results at run time. Every picture input into the neural network model is expected to be legal and to correspond to a legal output, so the pictures are cropped by a preprocessing module before being input into the model. The runtime data are then classified and sorted by the input sample collection method; the pictures are stored locally directly, using OpenCV's image-writing facility. After the pictures are input into the model, the corresponding run results are collected continuously; the results are saved in JSON format and matched to the previously collected pictures. Finally, it must be ensured that sufficient data are collected. As shown in FIG. 3, the specific implementation method is as follows:
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
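A minimal sketch of the collection loop in steps S11-S15, using a toy stand-in model; in practice the pictures would be written to disk with OpenCV (e.g. `cv2.imwrite`) and the JSON payload saved to a file, but here everything stays in memory so the sketch is self-contained (the function and variable names are illustrative, not from the patent):

```python
import json
import numpy as np

def collect_runtime_inputs(model, picture_stream, target_count):
    """Collect runtime input pictures and their classification results."""
    records, by_class = [], {}
    counter = 0                                      # S12: counter starts at 0
    for name, picture in picture_stream:
        x = picture.astype(np.float32) / 255.0       # S11: preprocessing sketch
        label = int(model(x))                        # S13: model inference
        records.append({"picture": name, "result": label})
        by_class.setdefault(label, []).append(name)  # same result -> same class set
        counter += 1                                 # S15: stop at the set data volume
        if counter >= target_count:
            break
    return records, by_class

# Toy "model": classifies a picture by its mean brightness.
model = lambda x: x.mean() > 0.5
stream = [(f"img{i}.png", np.full((4, 4), v, dtype=np.uint8))
          for i, v in enumerate([10, 200, 30, 240])]
records, by_class = collect_runtime_inputs(model, stream, target_count=4)
payload = json.dumps(records)   # S14: run results + picture names as JSON
```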
S2, performing control gate optimization training to obtain each picture and an optimal control gate corresponding to each class;
Once deployed, a control gate serves as a criterion for whether its neuron plays a critical role in the final classification. In practice, the control gate is simply placed into the neuron as a parameter and multiplied with the neuron's activation output to form the neuron's final output. If the control gate of a certain neuron has a value greater than 1, the neuron is considered more important to the classification output of the model; if the value is below 1, the neuron is considered to have a smaller influence on the final output. In practice the control gate is non-negative: any control gate below 0 is set directly to 0. The aim of the training phase is to obtain the optimal control gate corresponding to each picture and each class. Specifically, the control gate corresponding to each input sample is obtained first, which requires updating the control gate by gradient descent, combining the input picture with the final output result. After 100 iterations, the optimized value of the control gate is obtained. For the control gates of each class, once all control gates have been optimized, it suffices to gather those of the input samples belonging to the class; that set forms the control gates of the class. As shown in FIG. 4, the specific implementation method is as follows:
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
s26, executing the operations of steps S24 and S25 for each picture, ensuring that a control gate is trained for every picture.
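The control-gate training of steps S22-S26 can be sketched as follows for a toy one-layer softmax model: the gated model's output is pushed, by gradient descent on the cross-entropy, towards the original model's output, and gates are clipped to stay non-negative. The model, gate initialisation, and learning rate are illustrative assumptions, not the patent's exact configuration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_control_gates(W, a, iters=100, lr=0.05):
    """Fit per-neuron control gates for one picture (sketch of S22-S24)."""
    target = softmax(W @ a)                    # original model's run result
    g = np.full(a.shape, 0.2)                  # S22: initialised gates
    for _ in range(iters):
        p = softmax(W @ (g * a))               # S23: gated model's run result
        grad = (W.T @ (p - target)) * a        # d(cross-entropy)/d(gates)
        g = np.clip(g - lr * grad, 0.0, None)  # gates below 0 set to 0
    return g

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))                    # toy one-layer "model"
a = rng.uniform(0.1, 1.0, size=5)              # activations for one picture
gates = train_control_gates(W, a)
```

Per step S25, each picture's trained gate vector would be saved; per S26 the loop runs over every collected picture.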
S3, generating key neurons;
key neurons, as their name implies, are the existence of neurons that are critical to this class. That is, these neurons are very important for the classification result of this class. From another perspective, if the gates of the pictures belonging to the same class are generated, the gate values corresponding to some neurons are always larger. That is, these neurons are critical to these pictures. These neurons can be said to be key neurons in these pictures. And (4) gathering key neurons corresponding to the pictures, and continuously observing the influence of the control gates of the neurons on the classification result of the class. If there are always some neurons whose control gates are of relatively high value for most pictures, these are considered as key neurons belonging to this class. As shown in fig. 5, the specific implementation method is as follows:
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
s32, traversing control gates of all pictures belonging to the same class, and completing generation of key neurons of all pictures belonging to the same class; the specific generation mode is as follows: for the control gates corresponding to all the neurons of a single picture, if the values of the control gates exceed a set threshold, considering the neurons corresponding to the control gates exceeding the threshold as the key neurons belonging to the picture, and setting the control gates of the corresponding neurons as 1; if the threshold value is not exceeded, the neuron corresponding to the control gate is not considered to be a key neuron belonging to the picture, and the value of the control gate is set to be 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
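The two-threshold screening of steps S31-S33 can be sketched as follows, with hypothetical gate values and thresholds:

```python
import numpy as np

def key_neurons_for_class(gates_per_picture, pic_thresh, class_thresh):
    """S32: binarise each picture's gates (1 if key, else 0);
    S33: keep neurons whose activation frequency across the class
    exceeds the class-level threshold."""
    G = np.asarray(gates_per_picture)          # shape (pictures, neurons)
    binary = (G > pic_thresh).astype(float)    # per-picture key neurons
    freq = binary.mean(axis=0)                 # activation frequency per neuron
    return freq > class_thresh, freq

# Hypothetical trained gates for 3 pictures of one class, 4 neurons.
gates = [[2.5, 0.3, 1.8, 0.1],
         [2.1, 0.2, 0.4, 0.2],
         [3.0, 1.6, 2.2, 0.0]]
is_key, freq = key_neurons_for_class(gates, pic_thresh=1.0, class_thresh=0.6)
```

Per step S34, the same screening runs for every class to yield all key neurons of the model.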
S4, calculating indices based on the numerical characteristics of the critical path; as shown in FIG. 6, the specific implementation method is as follows:
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing all the key paths to obtain the key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
For a neural network model with L layers in total, each layer is selected in turn and the covariance matrix of the critical path at that layer is computed; the covariance matrix expresses the differences among the groups of data in matrix form:

c_{p,q}^l = cov(A_p^l, A_q^l),  C^l = [c_{p,q}^l]_{K×K}

where c_{p,q}^l denotes the covariance between two sets of critical path data at layer l, A_p^l denotes the critical path data corresponding to the p-th group of pictures, A_q^l that corresponding to the q-th group, and K is the total number of picture groups; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
s43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l at layer l;
The correlation coefficient index statistically analyzes the key neurons in the critical path whose activation frequency is below the set threshold τ:

α_l = (1 / (K(K−1))) · Σ_{i≠j} c_{i,j} / (σ_i · σ_j)

where c_{i,j} is the element in row i, column j of the matrix C^l, σ_i is the standard deviation of the critical path data A_i^l at layer l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
s44, calculating the dispersion; the dispersion index statistically analyzes the key neurons in the critical path whose activation frequency is above 80%; these high-activation-frequency neurons correspond to trigger pattern features that may be present in the input picture; first the number CARD(Ψ_hg) of neurons with a high activation frequency is counted, then the total number N_l of neurons in the layer, and the dispersion is computed from the two values:

D_l = CARD(Ψ_hg) / N_l
and S45, repeating the operations of the steps S42-S44, and finishing the calculation of the correlation coefficients and the dispersion of all the classes.
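The per-layer index computation of steps S42-S44 can be sketched as follows. Since the patent's exact formulas are reproduced as images rather than text, this version follows the stated definitions — covariances between the K groups of critical path data, the Pearson normalisation c_{i,j}/(σ_i·σ_j) averaged over pairs, and the ratio CARD(Ψ_hg)/N_l — as an interpretation, not a verbatim implementation:

```python
import numpy as np

def layer_indices(A, high_freq=0.8):
    """Sketch of S42-S44 for one layer l. A has shape (K, m): critical
    path data of K picture groups over the layer's m neurons."""
    K = A.shape[0]
    C = np.cov(A)                        # S42: K x K covariance matrix C^l
    sigma = np.sqrt(np.diag(C))          # per-group standard deviations
    rho = C / np.outer(sigma, sigma)     # Pearson terms c_ij / (s_i * s_j)
    off = ~np.eye(K, dtype=bool)         # pairs with p != q only
    alpha = rho[off].mean()              # S43: correlation coefficient alpha_l
    freq = (A > 0).mean(axis=0)          # activation frequency per neuron
    dispersion = (freq > high_freq).sum() / A.shape[1]  # S44: CARD(Psi_hg)/N_l
    return alpha, dispersion

# Hypothetical critical path data: 3 picture groups, 4 neurons.
A = np.array([[1.0, 0.9, 0.0, 1.0],
              [0.9, 1.0, 0.0, 1.0],
              [1.0, 0.8, 0.1, 1.0]])
alpha_l, D_l = layer_indices(A)
```

Per step S45, the same computation repeats over every layer and every class.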
S5, calculating the anomaly index based on the indices and judging whether the neural network model has been attacked; as shown in FIG. 7, this comprises the following sub-steps:
s51, taking the dispersion and the correlation coefficient of all the classes as undetermined data;
s52, traversing all classes and all model layers, and calculating the mean ᾱ of the correlation coefficients and the mean D̄ of the dispersions; the two means serve as the basis for subsequently computing the variances of the correlation coefficient and the dispersion;
s53, calculating the anomaly index; the anomaly index reflects the degree of abnormality of a neural network model: if the model has not suffered a backdoor attack, its anomaly index lies within the normal range, whereas if it has been attacked, its anomaly index is in the dangerous state, i.e. exceeds the threshold of the normal range. The anomaly index is computed from the two indices, correlation coefficient and dispersion, over all layers and all classes of the model; integrating these parameters, the anomaly index of the model under test is:

AI = (1 / (N·L)) · Σ_{n=1}^{N} Σ_{l=1}^{L} [ (α_l^n − ᾱ)² + (D_l^n − D̄)² ]
where l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes, and L is the total number of layers of the neural network model;
and S54, if the anomaly index AI of the model under test is larger than the threshold, the model is considered to have been attacked; otherwise it is considered a safe model.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (4)
1. A neural network model backdoor attack detection method is characterized by comprising the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
s2, performing control gate optimization training to obtain the optimal control gate corresponding to each picture and each class;
s3, generating key neurons;
s4, calculating indexes based on the numerical characteristics of the critical path; the specific implementation method comprises the following steps:
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing the key paths of all classes to obtain a key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
For a neural network model with L layers in total, each layer is selected in turn and the covariance matrix of the critical path at that layer is computed; the covariance matrix expresses the differences among the groups of data in matrix form:

c_{p,q}^l = cov(A_p^l, A_q^l),  C^l = [c_{p,q}^l]_{K×K}

where c_{p,q}^l denotes the covariance between two sets of critical path data at layer l, A_p^l denotes the critical path data corresponding to the p-th group of pictures, A_q^l that corresponding to the q-th group, and K is the total number of picture groups; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
s43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l at layer l;
The correlation coefficient index statistically analyzes the key neurons in the critical path whose activation frequency is below the set threshold τ:

α_l = (1 / (K(K−1))) · Σ_{i≠j} c_{i,j} / (σ_i · σ_j)

where c_{i,j} is the element in row i, column j of the matrix C^l, σ_i is the standard deviation of the critical path data A_i^l at layer l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
S44, calculating the dispersion; the dispersion index statistically analyzes the neurons in the critical path whose activation frequency exceeds 80%; these highly activated neurons correspond to trigger-pattern features that may be present in the input pictures; first, the number of highly activated neurons CARD(Ψ_hg) is counted, then the total number N_l of neurons in the layer, and the dispersion is computed from these two values:

d_l = CARD(Ψ_hg) / N_l;
S45, repeating steps S42-S44 to complete the calculation of the correlation coefficients and dispersions of all classes;
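The per-layer indices of step S4 can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the function names are invented, and the exact averaging used for α_l (here, the mean of pairwise normalized covariances over p ≠ q) is an assumption, since the claim's formula images are not reproduced in the text.

```python
# Illustrative sketch of steps S42-S44 (names and exact averaging are assumptions).
from statistics import mean, pstdev

def covariance(x, y):
    """Population covariance between two equal-length data sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation_coefficient(paths):
    """paths: K lists of critical-path activations at one layer.
    Returns the average pairwise correlation over p != q (assumed form of alpha_l)."""
    K = len(paths)
    stds = [pstdev(p) for p in paths]
    total, count = 0.0, 0
    for p in range(K):
        for q in range(K):
            if p == q:
                continue
            total += covariance(paths[p], paths[q]) / (stds[p] * stds[q])
            count += 1
    return total / count

def dispersion(activation_freq, total_neurons, high=0.8):
    """CARD(Psi_hg) / N_l: fraction of the layer's neurons whose
    activation frequency exceeds 80%."""
    card = sum(1 for f in activation_freq if f > high)
    return card / total_neurons
```

For identical critical paths the pairwise correlation is exactly 1, which matches the intuition that a clean class activates the same neurons consistently.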
S5, calculating an anomaly index based on these indices and judging whether the neural network model has suffered a backdoor attack, implemented as follows:
S51, taking the dispersions and correlation coefficients of all classes as the data to be examined;
S52, traversing all classes and all layers of the model, and calculating the mean ᾱ of the correlation coefficients and the mean d̄ of the dispersions; these two means serve as the basis for subsequently calculating the variances of the correlation coefficient and the dispersion;
S53, calculating the anomaly index; the anomaly index is computed from the two indices, the correlation coefficient and the dispersion, combined over all layers and all classes of the model; integrating these parameters yields the anomaly index of the model under test:

AI = (1 / (N·L)) Σ_{n=1..N} Σ_{l=1..L} [ (α_l^n − ᾱ)² + (d_l^n − d̄)² ]

where l denotes the l-th layer of the current neural network model, n denotes the n-th output class, N denotes the number of output classes of the neural network model, and L denotes the total number of layers of the neural network model;
S54, if the anomaly index AI of the model under test exceeds a threshold, the model is judged to have been attacked; otherwise it is judged to be a safe model.
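Step S5 amounts to measuring how far each class's per-layer indices deviate from the global means. A minimal sketch, assuming the anomaly index is the mean squared deviation of both indices (the claim's AI formula is an image, so this combination is an assumption):

```python
# Illustrative sketch of steps S51-S54 (the AI combination is an assumption).
def anomaly_index(alpha, disp):
    """alpha, disp: nested lists indexed [class n][layer l].
    Mean squared deviation of both indices from their global means."""
    vals_a = [a for row in alpha for a in row]
    vals_d = [d for row in disp for d in row]
    mean_a = sum(vals_a) / len(vals_a)   # S52: global mean of correlations
    mean_d = sum(vals_d) / len(vals_d)   # S52: global mean of dispersions
    n = len(vals_a)                      # n = N * L index pairs in total
    return (sum((a - mean_a) ** 2 for a in vals_a) / n
            + sum((d - mean_d) ** 2 for d in vals_d) / n)

def is_backdoored(alpha, disp, threshold):
    """S54: model is flagged when the anomaly index exceeds the threshold."""
    return anomaly_index(alpha, disp) > threshold
```

Uniform indices across classes and layers yield AI = 0, while a single outlying class (as a backdoor target class would be) drives AI up.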
2. The neural network model backdoor attack detection method according to claim 1, wherein step S1 is implemented as follows:
S11, preprocessing the pictures input to the neural network model so that they conform to the model's input specification;
S12, initializing a counter to 0;
S13, placing an input picture in a buffer and feeding it to the neural network model for inference; after the inference result is obtained, taking the input picture and the result as one data group; after inference, pictures with the same classification result belong to the same class, and pictures of the same class are collected into one set;
S14, storing the data group: the pictures are saved directly with OpenCV, and the inference results and picture names are saved in JSON format, ensuring a one-to-one correspondence between pictures and results;
S15, incrementing the counter and checking whether it meets the required data volume; if so, ending the collection, otherwise returning to step S13.
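The collection loop of steps S11-S15 can be sketched as below. The model interface and file layout are assumptions for illustration; in a real deployment `save_image` would be `cv2.imwrite` as the claim specifies OpenCV for picture storage.

```python
# Illustrative sketch of steps S11-S15 (model API and save hook are assumptions).
import json

def collect(pictures, model, required, save_image=lambda name, img: None):
    """pictures: iterable of (name, image); model: callable returning a label."""
    records, by_class = [], {}
    counter = 0                                      # S12: counter initialized to 0
    for name, img in pictures:
        label = model(img)                           # S13: inference on buffered input
        save_image(name, img)                        # S14: picture saved (OpenCV in practice)
        records.append({"picture": name, "result": label})
        by_class.setdefault(label, []).append(name)  # same result -> same class set
        counter += 1                                 # S15: count and stop check
        if counter >= required:
            break
    # S14: results and picture names serialized as JSON, one-to-one with pictures
    return json.dumps(records), by_class
```

A usage example: `collect([("a", 1), ("b", 2)], lambda img: img % 2, required=2)` returns the JSON record list and a class-to-pictures mapping.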
3. The neural network model backdoor attack detection method according to claim 1, wherein step S2 is implemented as follows:
S21, acquiring the original neural network model;
S22, initializing the control gates deployed in the neural network, one control gate being deployed behind each neuron;
S23, inputting the pictures into the original neural network model and the control-gated neural network model respectively for inference;
S24, collecting the outputs of the two models, calculating the cross entropy between the two outputs, and updating the deployed control gates by gradient descent; the control-gate training for a single picture is complete after 100 iterations;
S25, saving the control gates belonging to the single picture;
S26, performing steps S24 and S25 for every picture, ensuring that control gates are trained for all pictures.
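The gate-training loop of steps S21-S26 can be sketched on a toy one-layer model. This is a heavily simplified illustration: a real implementation would use a deep-learning framework's autograd, whereas here a finite-difference gradient stands in for backpropagation, and all names are assumptions.

```python
# Toy sketch of steps S21-S26: optimize per-neuron control gates so the gated
# model's output matches the original model's output (cross-entropy loss, S24).
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, weights, gates):
    # A gate g_i scales neuron i's pre-activation (gates deployed behind neurons, S22).
    return softmax([g * sum(w * xi for w, xi in zip(row, x))
                    for row, g in zip(weights, gates)])

def train_gates(x, weights, iters=100, lr=0.5, eps=1e-4):
    target = forward(x, weights, [1.0] * len(weights))  # original model output (S23)
    gates = [0.5] * len(weights)                        # S22: gate initialization

    def loss(g):  # cross entropy between original and gated outputs (S24)
        out = forward(x, weights, g)
        return -sum(t * math.log(o + 1e-12) for t, o in zip(target, out))

    for _ in range(iters):                              # 100 iterations per picture
        for i in range(len(gates)):                     # finite-difference gradient
            g_up = gates[:]
            g_up[i] += eps
            grad = (loss(g_up) - loss(gates)) / eps
            gates[i] -= lr * grad                       # gradient-descent update
    return gates
```

After training, the gated output closely reproduces the original output, which is the optimization target the claim describes; the resulting gate values are what steps S25-S26 save per picture.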
4. The neural network model backdoor attack detection method according to claim 1, wherein step S3 is implemented as follows:
S31, setting two different thresholds, used respectively to screen the key neurons belonging to a single picture and to a single class;
S32, traversing the control gates of all pictures belonging to the same class to generate the key neurons of each of those pictures, as follows: for the control gates of all neurons of a single picture, if a gate's value exceeds the set threshold, the neuron corresponding to that gate is regarded as a key neuron of the picture and its gate is set to 1; if the threshold is not exceeded, the neuron is not regarded as a key neuron of the picture and its gate is set to 0;
S33, calculating the activation frequency of each neuron; a neuron whose activation frequency exceeds the set threshold is a key neuron of the class, while a neuron that does not exceed the threshold is a non-key neuron and unimportant to the class;
S34, performing steps S32 and S33 on the pictures of all classes to obtain all key neurons.
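The two-level screening of steps S31-S34 can be sketched as follows; the two threshold values are illustrative assumptions (the claim only requires that they differ).

```python
# Illustrative sketch of steps S31-S34 (threshold values are assumptions).
def picture_key_neurons(gates, theta_pic=0.7):
    """S32: binarize one picture's control gates; a neuron is a key neuron of
    the picture (gate set to 1) if its gate exceeds the picture threshold."""
    return [1 if g > theta_pic else 0 for g in gates]

def class_key_neurons(all_gates, theta_pic=0.7, theta_cls=0.6):
    """S33: from the binarized gates of every picture in one class, keep the
    neurons whose activation frequency across pictures exceeds the class
    threshold; returns their indices."""
    binarized = [picture_key_neurons(g, theta_pic) for g in all_gates]
    n = len(binarized)
    freq = [sum(col) / n for col in zip(*binarized)]  # per-neuron activation frequency
    return [i for i, f in enumerate(freq) if f > theta_cls]
```

For example, with three pictures whose gates are `[0.9, 0.1, 0.8]`, `[0.95, 0.2, 0.5]`, and `[0.8, 0.9, 0.75]`, neurons 0 and 2 clear both thresholds and become key neurons of the class.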
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110068380.2A CN112765607B (en) | 2021-01-19 | 2021-01-19 | Neural network model backdoor attack detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112765607A CN112765607A (en) | 2021-05-07 |
CN112765607B true CN112765607B (en) | 2022-05-17 |
Family
ID=75703117
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110068380.2A Active CN112765607B (en) | 2021-01-19 | 2021-01-19 | Neural network model backdoor attack detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112765607B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113222120B (en) * | 2021-05-31 | 2022-09-16 | 北京理工大学 | Neural network back door injection method based on discrete Fourier transform |
CN113283590B (en) * | 2021-06-11 | 2024-03-19 | 浙江工业大学 | Defending method for back door attack |
CN114897161B (en) * | 2022-05-17 | 2023-02-07 | 中国信息通信研究院 | Mask-based graph classification backdoor attack defense method and system, electronic equipment and storage medium |
CN115659171B (en) * | 2022-09-26 | 2023-06-06 | 中国工程物理研究院计算机应用研究所 | Model back door detection method and device based on multi-element feature interaction and storage medium |
CN116383814B (en) * | 2023-06-02 | 2023-09-15 | 浙江大学 | Neural network model back door detection method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110941855A (en) * | 2019-11-26 | 2020-03-31 | 电子科技大学 | Stealing and defending method for neural network model under AIoT scene |
CN111242291A (en) * | 2020-04-24 | 2020-06-05 | 支付宝(杭州)信息技术有限公司 | Neural network backdoor attack detection method and device and electronic equipment |
CN111260059A (en) * | 2020-01-23 | 2020-06-09 | 复旦大学 | Back door attack method of video analysis neural network model |
CN112132262A (en) * | 2020-09-08 | 2020-12-25 | 西安交通大学 | Recurrent neural network backdoor attack detection method based on interpretable model |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111049786A (en) * | 2018-10-12 | 2020-04-21 | 北京奇虎科技有限公司 | Network attack detection method, device, equipment and storage medium |
US11514297B2 (en) * | 2019-05-29 | 2022-11-29 | Anomalee Inc. | Post-training detection and identification of human-imperceptible backdoor-poisoning attacks |
US11609990B2 (en) * | 2019-05-29 | 2023-03-21 | Anomalee Inc. | Post-training detection and identification of human-imperceptible backdoor-poisoning attacks |
Non-Patent Citations (5)
Title |
---|
《Heatmap-Aware Low-Cost Design to Resist Adversarial Attacks: Work-in-Progress》;Zhiyuan He等;《2020 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)》;20201109;第32-33页 * |
《Interpretability Derived Backdoor Attacks Detection in Deep Neural Networks: Work-in-Progress》;Xiangyu Wen等;《2020 International Conference on Embedded Software (EMSOFT)》;20201109;正文第Ⅱ-Ⅴ节 * |
"Optimized design for resisting fault injection attacks in distributed systems" (《分布式系统中抵御错误注入攻击的优化设计》); Wen Liang et al.; Journal of Computer Applications (《计算机应用》); 2016-02-29; Vol. 36, No. 2; pp. 495-498 *
"A static detection method for pointer dereferences based on finite state machines" (《基于有限状态机的指针解引用静态检测方法》); Zhan Jinyu et al.; Journal of Sichuan University (Engineering Science Edition) (《四川大学学报(工程科学版)》); 2011-04-30; Vol. 43, No. 4; pp. 135-142 *
"A survey of poisoning attacks and defenses for deep learning models" (《深度学习模型的中毒攻击与防御综述》); Chen Jinyin et al.; Journal of Cyber Security (《信息安全学报》); 2020-08-31; Vol. 5, No. 4; pp. 14-29 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||