CN112765607B - Neural network model backdoor attack detection method - Google Patents


Info

Publication number
CN112765607B
Authority
CN
China
Prior art keywords
neural network
neurons
network model
key
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110068380.2A
Other languages
Chinese (zh)
Other versions
CN112765607A (en)
Inventor
江维
詹瑾瑜
温翔宇
周星志
宋子微
孙若旭
廖炘可
范翥峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110068380.2A priority Critical patent/CN112765607B/en
Publication of CN112765607A publication Critical patent/CN112765607A/en
Application granted granted Critical
Publication of CN112765607B publication Critical patent/CN112765607B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 Detecting local intrusion or implementing counter-measures
    • G06F 21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Virology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network model backdoor attack detection method, which comprises the following steps: S1, collecting input data while the neural network is running; S2, performing control-gate optimization training to obtain the optimal control gates corresponding to each picture and each class; S3, generating key neurons; S4, calculating indexes based on the numerical characteristics of the critical path; and S5, calculating the abnormality index from these indexes and judging whether the neural network model has been attacked by a backdoor. The method performs mathematical analysis on the critical path generated from the control gates and expresses the internal information of the model in the form of the critical path, which improves the reliability of the backdoor attack detection method. Backdoor attack detection on the model to be examined can be completed using only input samples collected at runtime as detection data, making the method well suited to runtime backdoor attack detection of neural network models in the deployment stage.

Description

Neural network model backdoor attack detection method
Technical Field
The invention relates to a neural network model backdoor attack detection method, which is mainly applied to runtime backdoor attack detection scenarios in safety-critical intelligent systems.
Background
Backdoor attacks are a serious threat to Artificial Intelligence (AI) applications based on Neural Networks (NN). In a backdoor attack, an attacker trains a compromised model on a poisoned data set and releases it to the public community. The user unknowingly adopts the compromised model; trigger pictures carefully designed by the attacker are then mixed into the model's runtime input, greatly reducing the model's classification accuracy or even rendering the model unusable. The goal of the backdoor attack is to embed an attacker-designed backdoor in the neural network model so that the attacker can attack the user's AI system at any time through that backdoor.
Regarding the detection of backdoor attacks, B. Chen et al. propose a detection method based on analyzing the data-set distribution and activation clustering; Wang et al. observe that the attacked class is unstable, so that small perturbations can cause classification failure, and therefore propose an anomaly-detection-based method for detecting backdoor attacks; Liu et al. likewise propose a detection method based on the distribution of prediction results, i.e., a normal model's classification results are uniformly distributed over the data set, whereas a backdoored model favors one class over the others.
The invention considers the common backdoor attack scenario: an attacker trains and releases a compromised model, the user may then be attacked while using it, and the defender provides a feasible detection method to determine whether the model has been attacked. Different from existing detection methods, the invention starts from the characteristics of the model itself, building on the interpretability of neural network models, and proposes a backdoor attack detection method based on neuron critical paths. It analyzes the generated critical paths of the model under test and finds the differences between the critical paths of an attacked model and those of a normal model, thereby completing the detection of the model in question.
Neural network critical path generation techniques are used to analyze the routing paths of key neurons in a neural network model. Some neurons not only support the inference of the neural network but also reflect particular features of the input picture; neurons closely associated with the input picture can be regarded as key neurons. The routing path formed by the key-neuron groups of the different layers is called the critical path of that class, and the combination of the critical paths of all classes is called the critical path of the whole model.
On the other hand, the control gate is a structure added to the neural network. A control gate is deployed on every neuron of every layer and is multiplied, as a parameter, with the neuron's activation output to form the neuron's final output, as shown in fig. 1. The magnitude of the control gate indicates the sensitivity and contribution of the corresponding neuron to the current class. For example, if the control gate of a neuron has the value 3.2, that neuron's contribution to the current classification is considered to be 3.2 times the output obtained by normal training; conversely, a control gate with a value below 1 indicates that the corresponding neuron's contribution to the final classification should be reduced.
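For illustration, the following minimal NumPy sketch applies control gates to a layer's activations; the function name `gated_output` and the example values are illustrative and not taken from the patent.

```python
import numpy as np

def gated_output(activations, gates):
    """Apply control gates to a layer's activations.

    A gate above 1 amplifies the corresponding neuron's contribution to the
    current class; a gate below 1 suppresses it. Gates are kept non-negative,
    as described in the text.
    """
    gates = np.clip(gates, 0.0, None)   # control gates are non-negative
    return activations * gates          # element-wise gating used as the neuron's final output

# A neuron whose gate is 3.2 contributes 3.2 times its normally trained output.
activations = np.array([0.5, 1.0, 0.2])
gates = np.array([3.2, 1.0, 0.4])
print(gated_output(activations, gates))  # [1.6  1.   0.08]
```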
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a neural network model backdoor attack detection method which uses input samples collected at runtime as detection data, performs mathematical analysis on the critical path generated from the control gates, expresses the internal information of the model in the form of the critical path, and improves the reliability of the backdoor attack detection method.
The purpose of the invention is realized by the following technical scheme: a neural network model backdoor attack detection method comprises the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
s2, performing control gate optimization training to obtain each picture and an optimal control gate corresponding to each class;
s3, generating key neurons;
s4, calculating indexes based on the numerical characteristics of the critical path;
and S5, calculating the abnormal index based on the index, and judging whether the neural network model is attacked by the backdoor.
Further, the specific implementation method of step S1 is as follows:
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
Further, the specific implementation method of step S2 is as follows:
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
S26, executing the operations of steps S24 and S25 for each picture, ensuring that control gates are trained for all pictures.
Further, the specific implementation method of step S3 is as follows:
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
S32, traversing the control gates of all pictures belonging to the same class and completing the generation of the key neurons of all pictures belonging to that class; the specific generation mode is as follows: for the control gates corresponding to all the neurons of a single picture, if the value of a control gate exceeds the set threshold, the neuron corresponding to that control gate is considered a key neuron belonging to the picture, and its control gate is set to 1; if the value does not exceed the threshold, the neuron corresponding to the control gate is not a key neuron belonging to the picture, and its control gate is set to 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
Further, the specific implementation method of step S4 is as follows:
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing the key paths of all classes to obtain a key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
for the neural network model under analysis there are L layers in total; in each pass one layer is selected in turn and the covariance matrix of the critical path at that layer is calculated; the covariance matrix represents the differences among the groups of critical-path data and is written in matrix form as
C^l = [ cov(P_p^l, P_q^l) ]_{K×K},
where cov(P_p^l, P_q^l) is the covariance between two sets of critical-path data at layer l, P_p^l is the critical-path data corresponding to the p-th group of pictures, P_q^l is the critical-path data corresponding to the q-th group of pictures, and K is the total number of groups of picture data; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
S43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l of layer l;
the correlation coefficient index performs a statistical analysis of the key neurons on the critical path whose activation frequency is below a set threshold τ:
α_l = (1 / (K(K−1))) · Σ_{i=1..K} Σ_{j≠i} c_{i,j} / (σ_i^l σ_j^l),
where c_{i,j} is the element in row i, column j of matrix C^l, σ_i^l is the standard deviation of the layer-l critical-path data P_i^l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
S44, calculating the dispersion; the dispersion index performs a statistical analysis of the key neurons on the critical path whose activation frequency is higher than 80%; these high-activation-frequency neurons correspond to trigger-pattern features that may be present in the input pictures; first the number of high-activation-frequency neurons CARD(Ψ_hg) is counted, then the total number N_l of neurons in the layer, and the dispersion β_l is calculated from the two values:
β_l = CARD(Ψ_hg) / N_l;
and S45, repeating the operations of the steps S42-S44, and finishing the calculation of the correlation coefficients and the dispersion of all the classes.
Further, the specific implementation method of step S5 is as follows:
s51, taking the dispersion and correlation coefficient of all classes as undetermined data;
S52, traversing all classes and all layers of the model, and calculating the mean value ᾱ of the correlation coefficients and the mean value β̄ of the dispersions; the two means serve as the basis for the subsequent variance calculation of the correlation coefficient and the dispersion;
S53, calculating the abnormality index; the abnormality index is calculated from the two indicators, the correlation coefficient and the dispersion, over all layers and all classes of the model; integrating these parameters, the abnormality index of the model to be detected is obtained as
AI = (1 / (N·L)) · Σ_{n=1..N} Σ_{l=1..L} [ (α_{l,n} − ᾱ)² + (β_{l,n} − β̄)² ],
where α_{l,n} and β_{l,n} denote the correlation coefficient and the dispersion of the n-th class at the l-th layer, l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes of the neural network model, and L is the total number of layers of the neural network model;
S54, if the abnormality index AI of the model to be detected is larger than the threshold value, the model is considered to have been attacked; otherwise it is considered a safe model.
The invention has the beneficial effects that:
1. The method performs mathematical analysis on the critical path generated from the control gates, reaching into the interior of the model and expressing the model's internal information in the form of the critical path. By using a neural-network interpretability method it improves the effectiveness of the backdoor attack detection method, allows the reliability of the method to be explained more reasonably, and makes it possible to further extend the detection method based on the interpretability of the model.
2. The present invention uses runtime input samples as detection data. Most conventional methods need to perform backdoor attack detection offline, and these methods basically need to design or obtain a relatively large data set as the basis for developing the detection method. Compared with such methods, the present invention can complete backdoor attack detection on the model to be examined merely by collecting some input samples at runtime and applying the detection method to those samples, which makes it very suitable for runtime backdoor attack detection of neural network models in the deployment stage.
Drawings
FIG. 1 is a schematic diagram of a neural network architecture with control gates deployed;
FIG. 2 is a flow chart of a neural network model backdoor attack detection method of the present invention;
FIG. 3 is a flow chart of collecting input data during operation of a neural network;
FIG. 4 is a flow chart of control gate optimization training;
FIG. 5 is a flow chart of key neuron generation;
FIG. 6 is a flow chart of calculating an indicator based on a numerical feature of a critical path;
FIG. 7 is a flow chart of anomaly index calculation based on indicators.
Detailed Description
Neural networks are widely used in safety-critical fields such as face recognition and autonomous driving. Attacks against neural network models are now quite common, and backdoor attacks are an important part of them. Once an attacked neural network is applied to concrete tasks such as face recognition or autonomous driving, dangerous consequences can follow, such as information leakage or traffic accidents caused by a vehicle failing to recognize traffic signs. Therefore, designing and applying a defense against neural network backdoor attacks is both important and urgent. The invention is based on interpretability and, combined with concrete tasks such as face recognition and road-sign recognition, completes the detection of backdoor attacks using collected runtime data such as face images or road signs captured by vehicles. The technical scheme of the invention is further explained below in combination with the attached drawings.
As shown in fig. 2, the method for detecting a back door attack of a neural network model of the present invention includes the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
The input sample collection module mainly collects the input samples and the corresponding operation results at runtime for the deployed neural network model. All pictures input into the neural network model are expected to be legal and to correspond to a legal output, so the pictures need to be cropped by a preprocessing module before they enter the model. The runtime data are then classified and sorted by the input sample collection method; each image is stored locally directly using the opencv image storage method. After the pictures are input into the model, the corresponding operation results are collected continuously; the results are saved in json format and matched to the previously collected pictures. Finally, it must be ensured that enough data has been collected. As shown in fig. 3, the specific implementation method is as follows (a minimal sketch of the collection loop follows these sub-steps):
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
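A minimal sketch of this collection loop (steps S11-S15), assuming the deployed system supplies the input frames, the `model` callable and the `preprocess` function; these names, the file names and the output directory are illustrative placeholders.

```python
import json
import cv2  # opencv-python, used here only to save the collected pictures

def collect_runtime_inputs(frames, model, preprocess, n_required, out_dir="runtime_data"):
    """Collect runtime input pictures and their inference results (steps S11-S15)."""
    records, counter = [], 0                        # S12: initialize the counter
    for frame in frames:
        x = preprocess(frame)                       # S11: fit the model's input standard
        result = int(model(x).argmax())             # S13: run inference on the buffered picture
        name = f"img_{counter:06d}.png"
        cv2.imwrite(f"{out_dir}/{name}", frame)     # S14: store the picture directly with opencv
        records.append({"picture": name, "result": result})
        counter += 1                                # S15: stop once enough data has been collected
        if counter >= n_required:
            break
    with open(f"{out_dir}/results.json", "w") as f: # S14: json keeps pictures and results one to one
        json.dump(records, f, indent=2)
    return records
```

Pictures with the same classification result can then be grouped into per-class sets by filtering the saved records on their `result` field.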
S2, performing control gate optimization training to obtain each picture and an optimal control gate corresponding to each class;
After the control gates are deployed, they serve as a criterion for whether a neuron plays a critical role in the final classification. In practice, the control gate is simply placed in the neuron as a parameter and multiplied with the neuron's activation output to form the neuron's final output. If the value of the control gate corresponding to a neuron is greater than 1, the neuron is considered more important for the classification output of the neural network model; if it is less than 1, the neuron is considered to have a smaller influence on the final output. In practice the control gate is a non-negative number, and control gates smaller than 0 are set directly to 0. The goal of the control-gate training phase is to obtain the optimal control gates corresponding to each picture and each class. Specifically, the control gate corresponding to each input sample is obtained first, which requires updating the control gates by gradient descent using the input picture and the final output result. After 100 iterations, the optimized control-gate values are obtained. For the control gates of each class, it suffices to collect, after optimization, the control gates of all input samples belonging to that class; this set constitutes the control gates of the class. As shown in fig. 4, the specific implementation method is as follows (a minimal sketch of the optimization loop follows these sub-steps):
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
S26, executing the operations of steps S24 and S25 for each picture to ensure that control gates are trained for all pictures.
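A minimal PyTorch sketch of this optimization loop (steps S23-S25), under the assumption that a wrapper `gated_model` multiplies every neuron's output by a trainable gate and exposes those gates as the list `gated_model.gates`; the wrapper and its attribute name are hypothetical, not defined by the patent.

```python
import torch
import torch.nn.functional as F

def train_control_gates(model, gated_model, picture, n_iters=100, lr=0.1):
    """Optimize the control gates for one input picture (steps S23-S25)."""
    with torch.no_grad():
        target = F.softmax(model(picture), dim=-1)          # S23: original model's result
    optimizer = torch.optim.SGD(gated_model.gates, lr=lr)
    for _ in range(n_iters):                                # S24: 100 gradient-descent iterations
        logits = gated_model(picture)                       # S23: gated model's result
        log_probs = F.log_softmax(logits, dim=-1)
        loss = -(target * log_probs).sum(dim=-1).mean()     # cross entropy of the two results
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        for g in gated_model.gates:                         # gates are kept non-negative
            g.data.clamp_(min=0.0)
    return [g.detach().clone() for g in gated_model.gates]  # S25: this picture's optimal gates
```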
S3, generating key neurons;
Key neurons are, as the name implies, neurons that are critical to a class; that is, they are very important for the classification result of that class. From another perspective, if the control gates of the pictures belonging to the same class are generated, the gate values corresponding to certain neurons are always larger, meaning that those neurons are critical to these pictures and can be called their key neurons. The key neurons corresponding to the pictures are then gathered, and the influence of their control gates on the classification result of the class is observed further. If some neurons always have relatively high control-gate values for most pictures, they are regarded as key neurons belonging to this class. As shown in fig. 5, the specific implementation method is as follows (a minimal sketch of the screening follows these sub-steps):
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
s32, traversing control gates of all pictures belonging to the same class, and completing generation of key neurons of all pictures belonging to the same class; the specific generation mode is as follows: for the control gates corresponding to all the neurons of a single picture, if the values of the control gates exceed a set threshold, considering the neurons corresponding to the control gates exceeding the threshold as the key neurons belonging to the picture, and setting the control gates of the corresponding neurons as 1; if the threshold value is not exceeded, the neuron corresponding to the control gate is not considered to be a key neuron belonging to the picture, and the value of the control gate is set to be 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
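A minimal NumPy sketch of the two-threshold screening (steps S31-S34); the array layout and function name are illustrative assumptions.

```python
import numpy as np

def key_neurons_for_class(gates_per_picture, gate_threshold, freq_threshold):
    """Screen key neurons for one class (steps S31-S34).

    `gates_per_picture` has shape (num_pictures, num_neurons) and holds the
    optimized control gates of every picture in the class; the two thresholds
    correspond to step S31 and are chosen by the defender.
    """
    # S32: binarize each picture's gates -- 1 for that picture's key neurons, 0 otherwise
    per_picture_keys = (gates_per_picture > gate_threshold).astype(float)
    # S33: activation frequency = share of the class's pictures in which a neuron is key
    activation_freq = per_picture_keys.mean(axis=0)
    class_key_mask = activation_freq > freq_threshold       # key neurons of the class
    return per_picture_keys, activation_freq, class_key_mask
```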
S4, calculating indexes based on the numerical characteristics of the critical path; as shown in fig. 6, the specific implementation method is as follows (a numerical sketch follows these sub-steps):
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing all the key paths to obtain the key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
for the neural network model under analysis there are L layers in total; in each pass one layer is selected in turn and the covariance matrix of the critical path at that layer is calculated; the covariance matrix represents the differences among the groups of critical-path data and is written in matrix form as
C^l = [ cov(P_p^l, P_q^l) ]_{K×K},
where cov(P_p^l, P_q^l) is the covariance between two sets of critical-path data at layer l, P_p^l is the critical-path data corresponding to the p-th group of pictures, P_q^l is the critical-path data corresponding to the q-th group of pictures, and K is the total number of groups of picture data; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
S43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l of layer l;
the correlation coefficient index performs a statistical analysis of the key neurons on the critical path whose activation frequency is below a set threshold τ:
α_l = (1 / (K(K−1))) · Σ_{i=1..K} Σ_{j≠i} c_{i,j} / (σ_i^l σ_j^l),
where c_{i,j} is the element in row i, column j of matrix C^l, σ_i^l is the standard deviation of the layer-l critical-path data P_i^l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
S44, calculating the dispersion; the dispersion index performs a statistical analysis of the key neurons on the critical path whose activation frequency is higher than 80%; these high-activation-frequency neurons correspond to trigger-pattern features that may be present in the input pictures; first the number of high-activation-frequency neurons CARD(Ψ_hg) is counted, then the total number N_l of neurons in the layer, and the dispersion β_l is calculated from the two values:
β_l = CARD(Ψ_hg) / N_l;
and S45, repeating the operations of the steps S42-S44, and finishing the calculation of the correlation coefficients and the dispersion of all the classes.
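A minimal NumPy sketch of the per-layer indicator calculation (steps S42-S44); since the exact formulas appear only as equation images in the source, the correlation coefficient is taken here as the mean pairwise Pearson correlation of the K critical paths, which is one consistent reading of the surrounding definitions.

```python
import numpy as np

def layer_indicators(paths_l, activation_freq_l, high_freq=0.8):
    """Numerical features of one layer's critical path (steps S42-S44).

    `paths_l` has shape (K, n_l): the layer-l critical-path data of the K pictures.
    `activation_freq_l` has shape (n_l,): each neuron's activation frequency.
    """
    K, n_l = paths_l.shape
    C = np.cov(paths_l)                                   # S42: K x K covariance matrix
    sigma = np.sqrt(np.diag(C))                           # S43: per-picture standard deviations
    corr = C / np.outer(sigma, sigma)
    off_diag = ~np.eye(K, dtype=bool)
    alpha_l = corr[off_diag].mean()                       # correlation coefficient of the layer
    beta_l = float((activation_freq_l > high_freq).sum()) / n_l  # S44: dispersion = |Psi_hg| / N_l
    return alpha_l, beta_l
```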
S5, calculating the abnormality index based on the indexes and judging whether the neural network model has been attacked by a backdoor; as shown in fig. 7, this specifically includes the following sub-steps (a sketch of the calculation follows these sub-steps):
s51, taking the dispersion and the correlation coefficient of all the classes as undetermined data;
S52, traversing all classes and all layers of the model, and calculating the mean value ᾱ of the correlation coefficients and the mean value β̄ of the dispersions; the two means serve as the basis for the subsequent variance calculation of the correlation coefficient and the dispersion;
S53, calculating the abnormality index; the abnormality index indicates the degree of abnormality of a neural network model: if the model has not been attacked by a backdoor, its abnormality index stays within a normal range, and if it has been attacked, its abnormality index is in a dangerous state, i.e., it exceeds the threshold of the normal range. The abnormality index is calculated from the two indicators, the correlation coefficient and the dispersion, over all layers and all classes of the model; integrating these parameters, the abnormality index of the model to be detected is obtained as
AI = (1 / (N·L)) · Σ_{n=1..N} Σ_{l=1..L} [ (α_{l,n} − ᾱ)² + (β_{l,n} − β̄)² ],
where α_{l,n} and β_{l,n} denote the correlation coefficient and the dispersion of the n-th class at the l-th layer, l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes of the neural network model, and L is the total number of layers of the neural network model;
S54, if the abnormality index AI of the model to be detected is larger than the threshold value, the model is considered to have been attacked; otherwise it is considered a safe model.
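A minimal NumPy sketch of the abnormality-index calculation (steps S51-S54); the pooled-variance form of AI is a reconstruction consistent with the text above, not a verbatim reproduction of the patent's formula, and the threshold value shown is purely illustrative.

```python
import numpy as np

def anomaly_index(alpha, beta):
    """Fuse the per-class, per-layer indicators into one abnormality index (S51-S54).

    `alpha` and `beta` are arrays of shape (N, L): the correlation coefficient and
    dispersion of every class at every layer.
    """
    alpha_mean, beta_mean = alpha.mean(), beta.mean()       # S52: global means
    return float(((alpha - alpha_mean) ** 2 + (beta - beta_mean) ** 2).mean())  # S53

# S54: the model is flagged as attacked when AI exceeds a chosen threshold.
# ai = anomaly_index(alpha, beta); attacked = ai > 0.05   # 0.05 is an illustrative threshold
```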
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention and should not be construed as limiting the invention to the specifically described embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (4)

1. A neural network model backdoor attack detection method is characterized by comprising the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
s2, performing control gate optimization training to obtain each picture and an optimal control gate corresponding to each class;
s3, generating key neurons;
s4, calculating indexes based on the numerical characteristics of the critical path; the specific implementation method comprises the following steps:
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing the key paths of all classes to obtain a key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
for the neural network model under analysis there are L layers in total; in each pass one layer is selected in turn and the covariance matrix of the critical path at that layer is calculated; the covariance matrix represents the differences among the groups of critical-path data and is written in matrix form as
C^l = [ cov(P_p^l, P_q^l) ]_{K×K},
where cov(P_p^l, P_q^l) is the covariance between two sets of critical-path data at layer l, P_p^l is the critical-path data corresponding to the p-th group of pictures, P_q^l is the critical-path data corresponding to the q-th group of pictures, and K is the total number of groups of picture data; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
S43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l of layer l;
the correlation coefficient index performs a statistical analysis of the key neurons on the critical path whose activation frequency is below a set threshold τ:
α_l = (1 / (K(K−1))) · Σ_{i=1..K} Σ_{j≠i} c_{i,j} / (σ_i^l σ_j^l),
where c_{i,j} is the element in row i, column j of matrix C^l, σ_i^l is the standard deviation of the layer-l critical-path data P_i^l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
S44, calculating the dispersion; the dispersion index performs a statistical analysis of the key neurons on the critical path whose activation frequency is higher than 80%; these high-activation-frequency neurons correspond to trigger-pattern features that may be present in the input pictures; first the number of high-activation-frequency neurons CARD(Ψ_hg) is counted, then the total number N_l of neurons in the layer, and the dispersion β_l is calculated from the two values:
β_l = CARD(Ψ_hg) / N_l;
s45, repeating the operations of the steps S42-S44, and completing the calculation of the correlation coefficients and the dispersion of all the classes;
s5, calculating an abnormal index based on the index, and judging whether the neural network model is attacked by a backdoor; the specific implementation method comprises the following steps:
s51, taking the dispersion and correlation coefficient of all classes as undetermined data;
S52, traversing all classes and all layers of the model, and calculating the mean value ᾱ of the correlation coefficients and the mean value β̄ of the dispersions; the two means serve as the basis for the subsequent variance calculation of the correlation coefficient and the dispersion;
S53, calculating the abnormality index; the abnormality index is calculated from the two indicators, the correlation coefficient and the dispersion, over all layers and all classes of the model; integrating these parameters, the abnormality index of the model to be detected is obtained as
AI = (1 / (N·L)) · Σ_{n=1..N} Σ_{l=1..L} [ (α_{l,n} − ᾱ)² + (β_{l,n} − β̄)² ],
where α_{l,n} and β_{l,n} denote the correlation coefficient and the dispersion of the n-th class at the l-th layer, l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes of the neural network model, and L is the total number of layers of the neural network model;
S54, if the abnormality index AI of the model to be detected is larger than the threshold value, the model is considered to have been attacked; otherwise it is considered a safe model.
2. The method for detecting the back door attack of the neural network model according to claim 1, wherein the step S1 is specifically implemented by:
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
3. The method for detecting the back door attack of the neural network model according to claim 1, wherein the step S2 is specifically implemented by:
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
S26, executing the operations of steps S24 and S25 for each picture, ensuring that control gates are trained for all pictures.
4. The method for detecting the back door attack of the neural network model according to claim 1, wherein the step S3 is specifically implemented by:
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
s32, traversing control gates of all pictures belonging to the same class, and completing generation of key neurons of all pictures belonging to the same class; the specific generation mode is as follows: for the control gates corresponding to all the neurons of a single picture, if the values of the control gates exceed a set threshold, considering the neurons corresponding to the control gates exceeding the threshold as the key neurons belonging to the picture, and setting the control gates of the corresponding neurons as 1; if the threshold value is not exceeded, the neuron corresponding to the control gate is not considered to be a key neuron belonging to the picture, and the value of the control gate is set to be 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
CN202110068380.2A 2021-01-19 2021-01-19 Neural network model backdoor attack detection method Active CN112765607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110068380.2A CN112765607B (en) 2021-01-19 2021-01-19 Neural network model backdoor attack detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110068380.2A CN112765607B (en) 2021-01-19 2021-01-19 Neural network model backdoor attack detection method

Publications (2)

Publication Number Publication Date
CN112765607A CN112765607A (en) 2021-05-07
CN112765607B true CN112765607B (en) 2022-05-17

Family

ID=75703117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110068380.2A Active CN112765607B (en) 2021-01-19 2021-01-19 Neural network model backdoor attack detection method

Country Status (1)

Country Link
CN (1) CN112765607B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222120B (en) * 2021-05-31 2022-09-16 北京理工大学 Neural network back door injection method based on discrete Fourier transform
CN113283590B (en) * 2021-06-11 2024-03-19 浙江工业大学 Defending method for back door attack
CN114897161B (en) * 2022-05-17 2023-02-07 中国信息通信研究院 Mask-based graph classification backdoor attack defense method and system, electronic equipment and storage medium
CN115659171B (en) * 2022-09-26 2023-06-06 中国工程物理研究院计算机应用研究所 Model back door detection method and device based on multi-element feature interaction and storage medium
CN116383814B (en) * 2023-06-02 2023-09-15 浙江大学 Neural network model back door detection method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941855A (en) * 2019-11-26 2020-03-31 电子科技大学 Stealing and defending method for neural network model under AIoT scene
CN111242291A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Neural network backdoor attack detection method and device and electronic equipment
CN111260059A (en) * 2020-01-23 2020-06-09 复旦大学 Back door attack method of video analysis neural network model
CN112132262A (en) * 2020-09-08 2020-12-25 西安交通大学 Recurrent neural network backdoor attack detection method based on interpretable model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111049786A (en) * 2018-10-12 2020-04-21 北京奇虎科技有限公司 Network attack detection method, device, equipment and storage medium
US11514297B2 (en) * 2019-05-29 2022-11-29 Anomalee Inc. Post-training detection and identification of human-imperceptible backdoor-poisoning attacks
US11609990B2 (en) * 2019-05-29 2023-03-21 Anomalee Inc. Post-training detection and identification of human-imperceptible backdoor-poisoning attacks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941855A (en) * 2019-11-26 2020-03-31 电子科技大学 Stealing and defending method for neural network model under AIoT scene
CN111260059A (en) * 2020-01-23 2020-06-09 复旦大学 Back door attack method of video analysis neural network model
CN111242291A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Neural network backdoor attack detection method and device and electronic equipment
CN112132262A (en) * 2020-09-08 2020-12-25 西安交通大学 Recurrent neural network backdoor attack detection method based on interpretable model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Heatmap-Aware Low-Cost Design to Resist Adversarial Attacks: Work-in-Progress"; Zhiyuan He et al.; 2020 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS); 2020-11-09; pp. 32-33 *
"Interpretability Derived Backdoor Attacks Detection in Deep Neural Networks: Work-in-Progress"; Xiangyu Wen et al.; 2020 International Conference on Embedded Software (EMSOFT); 2020-11-09; Sections II-V *
"Optimized Design for Resisting Fault Injection Attacks in Distributed Systems"; Wen Liang et al.; Journal of Computer Applications; 2016-02-29; Vol. 36, No. 2; pp. 495-498 *
"Static Detection Method for Pointer Dereference Based on Finite State Machines"; Zhan Jinyu et al.; Journal of Sichuan University (Engineering Science Edition); 2011-04-30; Vol. 43, No. 4; pp. 135-142 *
"A Survey of Poisoning Attacks and Defenses for Deep Learning Models"; Chen Jinyin et al.; Journal of Cyber Security; 2020-08-31; Vol. 5, No. 4; pp. 14-29 *

Also Published As

Publication number Publication date
CN112765607A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN112765607B (en) Neural network model backdoor attack detection method
US11514297B2 (en) Post-training detection and identification of human-imperceptible backdoor-poisoning attacks
US11565721B2 (en) Testing a neural network
Bejani et al. Convolutional neural network with adaptive regularization to classify driving styles on smartphones
CN111310814A (en) Method and device for training business prediction model by utilizing unbalanced positive and negative samples
CN111783442A (en) Intrusion detection method, device, server and storage medium
CN110874471B (en) Privacy and safety protection neural network model training method and device
CN112115761B (en) Countermeasure sample generation method for detecting vulnerability of visual perception system of automatic driving automobile
CN113283599B (en) Attack resistance defense method based on neuron activation rate
CN113537284B (en) Deep learning implementation method and system based on mimicry mechanism
CN113811894B (en) Monitoring of a KI module for driving functions of a vehicle
CN116192500A (en) Malicious flow detection device and method for resisting tag noise
CN114220097A (en) Anti-attack-based image semantic information sensitive pixel domain screening method and application method and system
Kirichek et al. System for detecting network anomalies using a hybrid of an uncontrolled and controlled neural network
CN117454187B (en) Integrated model training method based on frequency domain limiting target attack
CN112084936B (en) Face image preprocessing method, device, equipment and storage medium
CN116383814B (en) Neural network model back door detection method and system
CN113010888B (en) Neural network backdoor attack defense method based on key neurons
CN116305103A (en) Neural network model backdoor detection method based on confidence coefficient difference
CN111666985B (en) Deep learning confrontation sample image classification defense method based on dropout
CN114021136A (en) Back door attack defense system for artificial intelligence model
CN114119382A (en) Image raindrop removing method based on attention generation countermeasure network
Dhonthi et al. Backdoor mitigation in deep neural networks via strategic retraining
CN116739073B (en) Online back door sample detection method and system based on evolution deviation
CN118101326B (en) Lightweight Internet of vehicles intrusion detection method based on improved MobileNetV model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant