CN112765607B - Neural network model backdoor attack detection method - Google Patents


Info

Publication number
CN112765607B
Authority
CN
China
Prior art keywords
neural network
neurons
network model
key
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110068380.2A
Other languages
Chinese (zh)
Other versions
CN112765607A (en)
Inventor
江维
詹瑾瑜
温翔宇
周星志
宋子微
孙若旭
廖炘可
范翥峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110068380.2A priority Critical patent/CN112765607B/en
Publication of CN112765607A publication Critical patent/CN112765607A/en
Application granted granted Critical
Publication of CN112765607B publication Critical patent/CN112765607B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 Detecting local intrusion or implementing counter-measures
    • G06F 21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Virology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network model backdoor attack detection method, which comprises the following steps: S1, collecting input data while the neural network is running; S2, performing control-gate optimization training to obtain the optimal control gates corresponding to each picture and each class; S3, generating key neurons; S4, calculating indexes based on the numerical characteristics of the critical path; and S5, calculating the abnormality index from these indexes and judging whether the neural network model has been attacked by a backdoor. The method performs mathematical analysis on the critical path generated from the control gates and expresses the internal information of the model in the form of the critical path, which improves the reliability of the backdoor attack detection method. Backdoor attack detection on the model to be examined can be completed using only input samples collected at runtime as detection data, making the method well suited to runtime backdoor attack detection of neural network models in the deployment stage.

Description

Neural network model backdoor attack detection method
Technical Field
The invention relates to a neural network model backdoor attack detection method, which is mainly applied to runtime backdoor attack detection scenarios in safety-critical intelligent systems.
Background
Backdoor attacks are a serious threat to Artificial Intelligence (AI) applications based on Neural Networks (NN). In a backdoor attack, an attacker trains a compromised model on a poisoned data set and releases it to the public community. The user unknowingly adopts the compromised model; trigger pictures carefully designed by the attacker are then mixed into the model's runtime input, greatly reducing the model's classification accuracy or even rendering the model unusable. The goal of the backdoor attack is to embed an attacker-designed backdoor in the neural network model so that the attacker can attack the user's AI system at any time through that backdoor.
Regarding the detection of backdoor attacks, B. Chen et al. propose a detection method based on analyzing the data-set distribution and activation clustering; Wang et al. observe that the attacked class is unstable, so that small perturbations can cause classification failure, and therefore propose an anomaly-detection-based method for detecting backdoor attacks; Liu et al. likewise propose a detection method based on the distribution of prediction results, i.e., a normal model's classification results are uniformly distributed over the data set, whereas a backdoored model favors one class over the others.
The invention considers the common backdoor attack scenario: an attacker trains and releases a compromised model, the user may then be attacked while using it, and the defender provides a feasible detection method to determine whether the model has been attacked. Different from existing detection methods, the invention starts from the characteristics of the model itself, building on the interpretability of neural network models, and proposes a backdoor attack detection method based on neuron critical paths. It analyzes the generated critical paths of the model under test and finds the differences between the critical paths of an attacked model and those of a normal model, thereby completing the detection of the model in question.
Neural network critical path generation techniques are used to analyze the routing paths of key neurons in a neural network model. Some neurons not only support the inference of the neural network but also reflect particular features of the input picture; neurons closely associated with the input picture can be regarded as key neurons. The routing path formed by the key-neuron groups of the different layers is called the critical path of that class, and the combination of the critical paths of all classes is called the critical path of the whole model.
On the other hand, the control gate is a structure added to the neural network. A control gate is deployed on every neuron of every layer and is multiplied, as a parameter, with the neuron's activation output to form the neuron's final output, as shown in fig. 1. The magnitude of the control gate indicates the sensitivity and contribution of the corresponding neuron to the current class. For example, if the control gate of a neuron has the value 3.2, that neuron's contribution to the current classification is considered to be 3.2 times the output obtained by normal training; conversely, a control gate with a value below 1 indicates that the corresponding neuron's contribution to the final classification should be reduced.
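For illustration, the following minimal NumPy sketch applies control gates to a layer's activations; the function name `gated_output` and the example values are illustrative and not taken from the patent.

```python
import numpy as np

def gated_output(activations, gates):
    """Apply control gates to a layer's activations.

    A gate above 1 amplifies the corresponding neuron's contribution to the
    current class; a gate below 1 suppresses it. Gates are kept non-negative,
    as described in the text.
    """
    gates = np.clip(gates, 0.0, None)   # control gates are non-negative
    return activations * gates          # element-wise gating used as the neuron's final output

# A neuron whose gate is 3.2 contributes 3.2 times its normally trained output.
activations = np.array([0.5, 1.0, 0.2])
gates = np.array([3.2, 1.0, 0.4])
print(gated_output(activations, gates))  # [1.6  1.   0.08]
```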
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a neural network model backdoor attack detection method which uses input samples collected at runtime as detection data, performs mathematical analysis on the critical path generated from the control gates, expresses the internal information of the model in the form of the critical path, and improves the reliability of the backdoor attack detection method.
The purpose of the invention is realized by the following technical scheme: a neural network model backdoor attack detection method comprises the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
s2, performing control gate optimization training to obtain each picture and an optimal control gate corresponding to each class;
s3, generating key neurons;
s4, calculating indexes based on the numerical characteristics of the critical path;
and S5, calculating the abnormal index based on the index, and judging whether the neural network model is attacked by the backdoor.
Further, the specific implementation method of step S1 is as follows:
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
Further, the specific implementation method of step S2 is as follows:
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
S26, executing the operations of steps S24 and S25 for each picture, ensuring that control gates are trained for all pictures.
Further, the specific implementation method of step S3 is as follows:
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
S32, traversing the control gates of all pictures belonging to the same class and completing the generation of the key neurons of all pictures belonging to that class; the specific generation mode is as follows: for the control gates corresponding to all the neurons of a single picture, if the value of a control gate exceeds the set threshold, the neuron corresponding to that control gate is considered a key neuron belonging to the picture, and its control gate is set to 1; if the value does not exceed the threshold, the neuron corresponding to the control gate is not a key neuron belonging to the picture, and its control gate is set to 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
Further, the specific implementation method of step S4 is as follows:
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing the key paths of all classes to obtain a key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
for the neural network model under analysis there are L layers in total; in each pass one layer is selected in turn and the covariance matrix of the critical path at that layer is calculated; the covariance matrix represents the differences among the groups of critical-path data and is written in matrix form as
C^l = [ cov(P_p^l, P_q^l) ]_{K×K},
where cov(P_p^l, P_q^l) is the covariance between two sets of critical-path data at layer l, P_p^l is the critical-path data corresponding to the p-th group of pictures, P_q^l is the critical-path data corresponding to the q-th group of pictures, and K is the total number of groups of picture data; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
S43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l of layer l;
the correlation coefficient index performs a statistical analysis of the key neurons on the critical path whose activation frequency is below a set threshold τ:
α_l = (1 / (K(K−1))) · Σ_{i=1..K} Σ_{j≠i} c_{i,j} / (σ_i^l σ_j^l),
where c_{i,j} is the element in row i, column j of matrix C^l, σ_i^l is the standard deviation of the layer-l critical-path data P_i^l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
S44, calculating the dispersion; the dispersion index performs a statistical analysis of the key neurons on the critical path whose activation frequency is higher than 80%; these high-activation-frequency neurons correspond to trigger-pattern features that may be present in the input pictures; first the number of high-activation-frequency neurons CARD(Ψ_hg) is counted, then the total number N_l of neurons in the layer, and the dispersion β_l is calculated from the two values:
β_l = CARD(Ψ_hg) / N_l;
and S45, repeating the operations of the steps S42-S44, and finishing the calculation of the correlation coefficients and the dispersion of all the classes.
Further, the specific implementation method of step S5 is as follows:
s51, taking the dispersion and correlation coefficient of all classes as undetermined data;
S52, traversing all classes and all layers of the model, and calculating the mean value ᾱ of the correlation coefficients and the mean value β̄ of the dispersions; the two means serve as the basis for the subsequent variance calculation of the correlation coefficient and the dispersion;
S53, calculating the abnormality index; the abnormality index is calculated from the two indicators, the correlation coefficient and the dispersion, over all layers and all classes of the model; integrating these parameters, the abnormality index of the model to be detected is obtained as
AI = (1 / (N·L)) · Σ_{n=1..N} Σ_{l=1..L} [ (α_{l,n} − ᾱ)² + (β_{l,n} − β̄)² ],
where α_{l,n} and β_{l,n} denote the correlation coefficient and the dispersion of the n-th class at the l-th layer, l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes of the neural network model, and L is the total number of layers of the neural network model;
S54, if the abnormality index AI of the model to be detected is larger than the threshold value, the model is considered to have been attacked; otherwise it is considered a safe model.
The invention has the beneficial effects that:
1. The method performs mathematical analysis on the critical path generated from the control gates, reaching into the interior of the model and expressing the model's internal information in the form of the critical path. By using a neural-network interpretability method it improves the effectiveness of the backdoor attack detection method, allows the reliability of the method to be explained more reasonably, and makes it possible to further extend the detection method based on the interpretability of the model.
2. The present invention uses runtime input samples as detection data. Most conventional methods need to perform backdoor attack detection offline, and these methods basically need to design or obtain a relatively large data set as the basis for developing the detection method. Compared with such methods, the present invention can complete backdoor attack detection on the model to be examined merely by collecting some input samples at runtime and applying the detection method to those samples, which makes it very suitable for runtime backdoor attack detection of neural network models in the deployment stage.
Drawings
FIG. 1 is a schematic diagram of a neural network architecture with control gates deployed;
FIG. 2 is a flow chart of a neural network model backdoor attack detection method of the present invention;
FIG. 3 is a flow chart of collecting input data during operation of a neural network;
FIG. 4 is a flow chart of control gate optimization training;
FIG. 5 is a flow chart of key neuron generation;
FIG. 6 is a flow chart of calculating an indicator based on a numerical feature of a critical path;
FIG. 7 is a flow chart of anomaly index calculation based on indicators.
Detailed Description
Neural networks are widely used in safety-critical fields such as face recognition and autonomous driving. Attacks against neural network models are now quite common, and backdoor attacks are an important part of them. Once an attacked neural network is applied to concrete tasks such as face recognition or autonomous driving, dangerous consequences can follow, such as information leakage or traffic accidents caused by a vehicle failing to recognize traffic signs. Therefore, designing and applying a defense against neural network backdoor attacks is both important and urgent. The invention is based on interpretability and, combined with concrete tasks such as face recognition and road-sign recognition, completes the detection of backdoor attacks using collected runtime data such as face images or road signs captured by vehicles. The technical scheme of the invention is further explained below in combination with the attached drawings.
As shown in fig. 2, the method for detecting a back door attack of a neural network model of the present invention includes the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
The input sample collection module mainly collects the input samples and the corresponding operation results at runtime for the deployed neural network model. All pictures input into the neural network model are expected to be legal and to correspond to a legal output, so the pictures need to be cropped by a preprocessing module before they enter the model. The runtime data are then classified and sorted by the input sample collection method; each image is stored locally directly using the opencv image storage method. After the pictures are input into the model, the corresponding operation results are collected continuously; the results are saved in json format and matched to the previously collected pictures. Finally, it must be ensured that enough data has been collected. As shown in fig. 3, the specific implementation method is as follows (a minimal sketch of the collection loop follows these sub-steps):
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
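A minimal sketch of this collection loop (steps S11-S15), assuming the deployed system supplies the input frames, the `model` callable and the `preprocess` function; these names, the file names and the output directory are illustrative placeholders.

```python
import json
import cv2  # opencv-python, used here only to save the collected pictures

def collect_runtime_inputs(frames, model, preprocess, n_required, out_dir="runtime_data"):
    """Collect runtime input pictures and their inference results (steps S11-S15)."""
    records, counter = [], 0                        # S12: initialize the counter
    for frame in frames:
        x = preprocess(frame)                       # S11: fit the model's input standard
        result = int(model(x).argmax())             # S13: run inference on the buffered picture
        name = f"img_{counter:06d}.png"
        cv2.imwrite(f"{out_dir}/{name}", frame)     # S14: store the picture directly with opencv
        records.append({"picture": name, "result": result})
        counter += 1                                # S15: stop once enough data has been collected
        if counter >= n_required:
            break
    with open(f"{out_dir}/results.json", "w") as f: # S14: json keeps pictures and results one to one
        json.dump(records, f, indent=2)
    return records
```

Pictures with the same classification result can then be grouped into per-class sets by filtering the saved records on their `result` field.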
S2, performing control gate optimization training to obtain each picture and an optimal control gate corresponding to each class;
After the control gates are deployed, they serve as a criterion for whether a neuron plays a critical role in the final classification. In practice, the control gate is simply placed in the neuron as a parameter and multiplied with the neuron's activation output to form the neuron's final output. If the value of the control gate corresponding to a neuron is greater than 1, the neuron is considered more important for the classification output of the neural network model; if it is less than 1, the neuron is considered to have a smaller influence on the final output. In practice the control gate is a non-negative number, and control gates smaller than 0 are set directly to 0. The goal of the control-gate training phase is to obtain the optimal control gates corresponding to each picture and each class. Specifically, the control gate corresponding to each input sample is obtained first, which requires updating the control gates by gradient descent using the input picture and the final output result. After 100 iterations, the optimized control-gate values are obtained. For the control gates of each class, it suffices to collect, after optimization, the control gates of all input samples belonging to that class; this set constitutes the control gates of the class. As shown in fig. 4, the specific implementation method is as follows (a minimal sketch of the optimization loop follows these sub-steps):
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
S26, executing the operations of steps S24 and S25 for each picture to ensure that control gates are trained for all pictures.
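A minimal PyTorch sketch of this optimization loop (steps S23-S25), under the assumption that a wrapper `gated_model` multiplies every neuron's output by a trainable gate and exposes those gates as the list `gated_model.gates`; the wrapper and its attribute name are hypothetical, not defined by the patent.

```python
import torch
import torch.nn.functional as F

def train_control_gates(model, gated_model, picture, n_iters=100, lr=0.1):
    """Optimize the control gates for one input picture (steps S23-S25)."""
    with torch.no_grad():
        target = F.softmax(model(picture), dim=-1)          # S23: original model's result
    optimizer = torch.optim.SGD(gated_model.gates, lr=lr)
    for _ in range(n_iters):                                # S24: 100 gradient-descent iterations
        logits = gated_model(picture)                       # S23: gated model's result
        log_probs = F.log_softmax(logits, dim=-1)
        loss = -(target * log_probs).sum(dim=-1).mean()     # cross entropy of the two results
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        for g in gated_model.gates:                         # gates are kept non-negative
            g.data.clamp_(min=0.0)
    return [g.detach().clone() for g in gated_model.gates]  # S25: this picture's optimal gates
```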
S3, generating key neurons;
Key neurons are, as the name implies, neurons that are critical to a class; that is, they are very important for the classification result of that class. From another perspective, if the control gates of the pictures belonging to the same class are generated, the gate values corresponding to certain neurons are always larger, meaning that those neurons are critical to these pictures and can be called their key neurons. The key neurons corresponding to the pictures are then gathered, and the influence of their control gates on the classification result of the class is observed further. If some neurons always have relatively high control-gate values for most pictures, they are regarded as key neurons belonging to this class. As shown in fig. 5, the specific implementation method is as follows (a minimal sketch of the screening follows these sub-steps):
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
s32, traversing control gates of all pictures belonging to the same class, and completing generation of key neurons of all pictures belonging to the same class; the specific generation mode is as follows: for the control gates corresponding to all the neurons of a single picture, if the values of the control gates exceed a set threshold, considering the neurons corresponding to the control gates exceeding the threshold as the key neurons belonging to the picture, and setting the control gates of the corresponding neurons as 1; if the threshold value is not exceeded, the neuron corresponding to the control gate is not considered to be a key neuron belonging to the picture, and the value of the control gate is set to be 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
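A minimal NumPy sketch of the two-threshold screening (steps S31-S34); the array layout and function name are illustrative assumptions.

```python
import numpy as np

def key_neurons_for_class(gates_per_picture, gate_threshold, freq_threshold):
    """Screen key neurons for one class (steps S31-S34).

    `gates_per_picture` has shape (num_pictures, num_neurons) and holds the
    optimized control gates of every picture in the class; the two thresholds
    correspond to step S31 and are chosen by the defender.
    """
    # S32: binarize each picture's gates -- 1 for that picture's key neurons, 0 otherwise
    per_picture_keys = (gates_per_picture > gate_threshold).astype(float)
    # S33: activation frequency = share of the class's pictures in which a neuron is key
    activation_freq = per_picture_keys.mean(axis=0)
    class_key_mask = activation_freq > freq_threshold       # key neurons of the class
    return per_picture_keys, activation_freq, class_key_mask
```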
S4, calculating indexes based on the numerical characteristics of the critical path; as shown in fig. 6, the specific implementation method is as follows (a numerical sketch follows these sub-steps):
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing all the key paths to obtain the key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
for the neural network model under analysis there are L layers in total; in each pass one layer is selected in turn and the covariance matrix of the critical path at that layer is calculated; the covariance matrix represents the differences among the groups of critical-path data and is written in matrix form as
C^l = [ cov(P_p^l, P_q^l) ]_{K×K},
where cov(P_p^l, P_q^l) is the covariance between two sets of critical-path data at layer l, P_p^l is the critical-path data corresponding to the p-th group of pictures, P_q^l is the critical-path data corresponding to the q-th group of pictures, and K is the total number of groups of picture data; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
S43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l of layer l;
the correlation coefficient index performs a statistical analysis of the key neurons on the critical path whose activation frequency is below a set threshold τ:
α_l = (1 / (K(K−1))) · Σ_{i=1..K} Σ_{j≠i} c_{i,j} / (σ_i^l σ_j^l),
where c_{i,j} is the element in row i, column j of matrix C^l, σ_i^l is the standard deviation of the layer-l critical-path data P_i^l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
S44, calculating the dispersion; the dispersion index performs a statistical analysis of the key neurons on the critical path whose activation frequency is higher than 80%; these high-activation-frequency neurons correspond to trigger-pattern features that may be present in the input pictures; first the number of high-activation-frequency neurons CARD(Ψ_hg) is counted, then the total number N_l of neurons in the layer, and the dispersion β_l is calculated from the two values:
β_l = CARD(Ψ_hg) / N_l;
and S45, repeating the operations of the steps S42-S44, and finishing the calculation of the correlation coefficients and the dispersion of all the classes.
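A minimal NumPy sketch of the per-layer indicator calculation (steps S42-S44); since the exact formulas appear only as equation images in the source, the correlation coefficient is taken here as the mean pairwise Pearson correlation of the K critical paths, which is one consistent reading of the surrounding definitions.

```python
import numpy as np

def layer_indicators(paths_l, activation_freq_l, high_freq=0.8):
    """Numerical features of one layer's critical path (steps S42-S44).

    `paths_l` has shape (K, n_l): the layer-l critical-path data of the K pictures.
    `activation_freq_l` has shape (n_l,): each neuron's activation frequency.
    """
    K, n_l = paths_l.shape
    C = np.cov(paths_l)                                   # S42: K x K covariance matrix
    sigma = np.sqrt(np.diag(C))                           # S43: per-picture standard deviations
    corr = C / np.outer(sigma, sigma)
    off_diag = ~np.eye(K, dtype=bool)
    alpha_l = corr[off_diag].mean()                       # correlation coefficient of the layer
    beta_l = float((activation_freq_l > high_freq).sum()) / n_l  # S44: dispersion = |Psi_hg| / N_l
    return alpha_l, beta_l
```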
S5, calculating the abnormality index based on the indexes and judging whether the neural network model has been attacked by a backdoor; as shown in fig. 7, this specifically includes the following sub-steps (a sketch of the calculation follows these sub-steps):
s51, taking the dispersion and the correlation coefficient of all the classes as undetermined data;
S52, traversing all classes and all layers of the model, and calculating the mean value ᾱ of the correlation coefficients and the mean value β̄ of the dispersions; the two means serve as the basis for the subsequent variance calculation of the correlation coefficient and the dispersion;
S53, calculating the abnormality index; the abnormality index indicates the degree of abnormality of a neural network model: if the model has not been attacked by a backdoor, its abnormality index stays within a normal range, and if it has been attacked, its abnormality index is in a dangerous state, i.e., it exceeds the threshold of the normal range. The abnormality index is calculated from the two indicators, the correlation coefficient and the dispersion, over all layers and all classes of the model; integrating these parameters, the abnormality index of the model to be detected is obtained as
AI = (1 / (N·L)) · Σ_{n=1..N} Σ_{l=1..L} [ (α_{l,n} − ᾱ)² + (β_{l,n} − β̄)² ],
where α_{l,n} and β_{l,n} denote the correlation coefficient and the dispersion of the n-th class at the l-th layer, l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes of the neural network model, and L is the total number of layers of the neural network model;
S54, if the abnormality index AI of the model to be detected is larger than the threshold value, the model is considered to have been attacked; otherwise it is considered a safe model.
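A minimal NumPy sketch of the abnormality-index calculation (steps S51-S54); the pooled-variance form of AI is a reconstruction consistent with the text above, not a verbatim reproduction of the patent's formula, and the threshold value shown is purely illustrative.

```python
import numpy as np

def anomaly_index(alpha, beta):
    """Fuse the per-class, per-layer indicators into one abnormality index (S51-S54).

    `alpha` and `beta` are arrays of shape (N, L): the correlation coefficient and
    dispersion of every class at every layer.
    """
    alpha_mean, beta_mean = alpha.mean(), beta.mean()       # S52: global means
    return float(((alpha - alpha_mean) ** 2 + (beta - beta_mean) ** 2).mean())  # S53

# S54: the model is flagged as attacked when AI exceeds a chosen threshold.
# ai = anomaly_index(alpha, beta); attacked = ai > 0.05   # 0.05 is an illustrative threshold
```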
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention and should not be construed as limiting the invention to the specifically described embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (4)

1. A neural network model backdoor attack detection method is characterized by comprising the following steps:
s1, collecting input data during the operation of the neural network: aiming at a deployed neural network model, collecting an input sample and a corresponding operation result when the neural network model operates;
s2, performing control gate optimization training to obtain each picture and an optimal control gate corresponding to each class;
s3, generating key neurons;
s4, calculating indexes based on the numerical characteristics of the critical path; the specific implementation method comprises the following steps:
s41, after calculating to obtain a class of key neurons, connecting all the key neurons belonging to the class to obtain a key path belonging to the class; splicing the key paths of all classes to obtain a key path belonging to the model;
s42, calculating a covariance matrix of each layer of the key path of the plurality of input pictures corresponding to each class;
for the neural network model under analysis there are L layers in total; in each pass one layer is selected in turn and the covariance matrix of the critical path at that layer is calculated; the covariance matrix represents the differences among the groups of critical-path data and is written in matrix form as
C^l = [ cov(P_p^l, P_q^l) ]_{K×K},
where cov(P_p^l, P_q^l) is the covariance between two sets of critical-path data at layer l, P_p^l is the critical-path data corresponding to the p-th group of pictures, P_q^l is the critical-path data corresponding to the q-th group of pictures, and K is the total number of groups of picture data; p = 1, 2, ..., K, q = 1, 2, ..., K, p ≠ q, 1 ≤ l ≤ L;
S43, calculating the variance of the critical paths of all input pictures at layer l and the final correlation coefficient α_l of layer l;
the correlation coefficient index performs a statistical analysis of the key neurons on the critical path whose activation frequency is below a set threshold τ:
α_l = (1 / (K(K−1))) · Σ_{i=1..K} Σ_{j≠i} c_{i,j} / (σ_i^l σ_j^l),
where c_{i,j} is the element in row i, column j of matrix C^l, σ_i^l is the standard deviation of the layer-l critical-path data P_i^l, and α_l is the correlation coefficient of the critical paths of the K pictures at layer l;
S44, calculating the dispersion; the dispersion index performs a statistical analysis of the key neurons on the critical path whose activation frequency is higher than 80%; these high-activation-frequency neurons correspond to trigger-pattern features that may be present in the input pictures; first the number of high-activation-frequency neurons CARD(Ψ_hg) is counted, then the total number N_l of neurons in the layer, and the dispersion β_l is calculated from the two values:
β_l = CARD(Ψ_hg) / N_l;
s45, repeating the operations of the steps S42-S44, and completing the calculation of the correlation coefficients and the dispersion of all the classes;
s5, calculating an abnormal index based on the index, and judging whether the neural network model is attacked by a backdoor; the specific implementation method comprises the following steps:
s51, taking the dispersion and correlation coefficient of all classes as undetermined data;
S52, traversing all classes and all layers of the model, and calculating the mean value ᾱ of the correlation coefficients and the mean value β̄ of the dispersions; the two means serve as the basis for the subsequent variance calculation of the correlation coefficient and the dispersion;
S53, calculating the abnormality index; the abnormality index is calculated from the two indicators, the correlation coefficient and the dispersion, over all layers and all classes of the model; integrating these parameters, the abnormality index of the model to be detected is obtained as
AI = (1 / (N·L)) · Σ_{n=1..N} Σ_{l=1..L} [ (α_{l,n} − ᾱ)² + (β_{l,n} − β̄)² ],
where α_{l,n} and β_{l,n} denote the correlation coefficient and the dispersion of the n-th class at the l-th layer, l denotes the l-th layer of the current neural network model, n denotes the n-th class, N is the number of output classes of the neural network model, and L is the total number of layers of the neural network model;
S54, if the abnormality index AI of the model to be detected is larger than the threshold value, the model is considered to have been attacked; otherwise it is considered a safe model.
2. The method for detecting the back door attack of the neural network model according to claim 1, wherein the step S1 is specifically implemented by:
s11, preprocessing the picture input into the neural network model to make the picture accord with the input standard of the neural network model;
s12, initializing the counter value to be 0;
s13, placing the input picture in a buffer area, and inputting the picture into a neural network model for inference; after the operation result of the neural network model is obtained, taking the input picture and the operation result as a data set; after the operation of the neural network model, the pictures with the same classification result are of the same class, and the pictures of the same class are collected to form a set;
s14, storing the data group, directly storing the pictures by using opencv, storing the operation results and the names of the pictures by using a json format, and ensuring that the pictures correspond to the results one to one;
and S15, adding one to the counter value, judging whether the counter meets the set data volume requirement, if so, ending the collection, otherwise, returning to the step S13.
3. The method for detecting the back door attack of the neural network model according to claim 1, wherein the step S2 is specifically implemented by:
s21, acquiring an original neural network model;
s22, initializing a control gate deployed in the neural network, wherein the control gate is deployed behind each neuron in the neural network;
s23, inputting the pictures into the original neural network model and the neural network model with the control gate for inference respectively;
s24, collecting the operation results of the two models, calculating the cross entropy of the two operation results, then updating the deployed control gate by using a gradient descent method, and completing the training of the control gate of a single picture after 100 iterations;
s25, saving the control gate belonging to a single picture;
S26, executing the operations of steps S24 and S25 for each picture, ensuring that control gates are trained for all pictures.
4. The method for detecting the back door attack of the neural network model according to claim 1, wherein the step S3 is specifically implemented by:
s31, setting two different thresholds which are respectively used for screening key neurons belonging to a single picture and a single class;
s32, traversing control gates of all pictures belonging to the same class, and completing generation of key neurons of all pictures belonging to the same class; the specific generation mode is as follows: for the control gates corresponding to all the neurons of a single picture, if the values of the control gates exceed a set threshold, considering the neurons corresponding to the control gates exceeding the threshold as the key neurons belonging to the picture, and setting the control gates of the corresponding neurons as 1; if the threshold value is not exceeded, the neuron corresponding to the control gate is not considered to be a key neuron belonging to the picture, and the value of the control gate is set to be 0;
s33, calculating the activation frequency of each neuron, and if the activation frequency exceeds a set threshold value, determining that the neuron is a key neuron belonging to the class; neurons that do not exceed a set threshold are considered non-critical neurons and are not important for this class;
s34, executing the operations of the steps S32 and S33 on all the types of pictures to obtain all the key neurons.
CN202110068380.2A 2021-01-19 2021-01-19 Neural network model backdoor attack detection method Active CN112765607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110068380.2A CN112765607B (en) 2021-01-19 2021-01-19 Neural network model backdoor attack detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110068380.2A CN112765607B (en) 2021-01-19 2021-01-19 Neural network model backdoor attack detection method

Publications (2)

Publication Number Publication Date
CN112765607A CN112765607A (en) 2021-05-07
CN112765607B true CN112765607B (en) 2022-05-17

Family

ID=75703117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110068380.2A Active CN112765607B (en) 2021-01-19 2021-01-19 Neural network model backdoor attack detection method

Country Status (1)

Country Link
CN (1) CN112765607B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222120B (en) * 2021-05-31 2022-09-16 北京理工大学 Neural network back door injection method based on discrete Fourier transform
CN113283590B (en) * 2021-06-11 2024-03-19 浙江工业大学 Defending method for back door attack
CN114897161B (en) * 2022-05-17 2023-02-07 中国信息通信研究院 Mask-based graph classification backdoor attack defense method and system, electronic equipment and storage medium
CN115659171B (en) * 2022-09-26 2023-06-06 中国工程物理研究院计算机应用研究所 Model back door detection method and device based on multi-element feature interaction and storage medium
CN116383814B (en) * 2023-06-02 2023-09-15 浙江大学 Neural network model back door detection method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941855A (en) * 2019-11-26 2020-03-31 电子科技大学 Stealing and defending method for neural network model under AIoT scene
CN111242291A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Neural network backdoor attack detection method and device and electronic equipment
CN111260059A (en) * 2020-01-23 2020-06-09 复旦大学 Back door attack method of video analysis neural network model
CN112132262A (en) * 2020-09-08 2020-12-25 西安交通大学 Recurrent neural network backdoor attack detection method based on interpretable model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111049786A (en) * 2018-10-12 2020-04-21 北京奇虎科技有限公司 Network attack detection method, device, equipment and storage medium
US11514297B2 (en) * 2019-05-29 2022-11-29 Anomalee Inc. Post-training detection and identification of human-imperceptible backdoor-poisoning attacks
US11609990B2 (en) * 2019-05-29 2023-03-21 Anomalee Inc. Post-training detection and identification of human-imperceptible backdoor-poisoning attacks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941855A (en) * 2019-11-26 2020-03-31 电子科技大学 Stealing and defending method for neural network model under AIoT scene
CN111260059A (en) * 2020-01-23 2020-06-09 复旦大学 Back door attack method of video analysis neural network model
CN111242291A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Neural network backdoor attack detection method and device and electronic equipment
CN112132262A (en) * 2020-09-08 2020-12-25 西安交通大学 Recurrent neural network backdoor attack detection method based on interpretable model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Heatmap-Aware Low-Cost Design to Resist Adversarial Attacks: Work-in-Progress"; Zhiyuan He et al.; 2020 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS); 2020-11-09; pp. 32-33 *
"Interpretability Derived Backdoor Attacks Detection in Deep Neural Networks: Work-in-Progress"; Xiangyu Wen et al.; 2020 International Conference on Embedded Software (EMSOFT); 2020-11-09; Sections II-V *
"Optimized Design for Resisting Fault Injection Attacks in Distributed Systems"; Wen Liang et al.; Journal of Computer Applications; 2016-02-29; Vol. 36, No. 2; pp. 495-498 *
"Static Detection Method for Pointer Dereference Based on Finite State Machines"; Zhan Jinyu et al.; Journal of Sichuan University (Engineering Science Edition); 2011-04-30; Vol. 43, No. 4; pp. 135-142 *
"A Survey of Poisoning Attacks and Defenses for Deep Learning Models"; Chen Jinyin et al.; Journal of Cyber Security; 2020-08-31; Vol. 5, No. 4; pp. 14-29 *

Also Published As

Publication number Publication date
CN112765607A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN112765607B (en) Neural network model backdoor attack detection method
US11514297B2 (en) Post-training detection and identification of human-imperceptible backdoor-poisoning attacks
US11565721B2 (en) Testing a neural network
Bejani et al. Convolutional neural network with adaptive regularization to classify driving styles on smartphones
CN111310814A (en) Method and device for training business prediction model by utilizing unbalanced positive and negative samples
CN111783442A (en) Intrusion detection method, device, server and storage medium
CN110874471B (en) Privacy and safety protection neural network model training method and device
CN112115761B (en) Countermeasure sample generation method for detecting vulnerability of visual perception system of automatic driving automobile
CN113283599B (en) Attack resistance defense method based on neuron activation rate
CN113537284B (en) Deep learning implementation method and system based on mimicry mechanism
CN113811894B (en) Monitoring of a KI module for driving functions of a vehicle
CN116192500A (en) Malicious flow detection device and method for resisting tag noise
CN114220097A (en) Anti-attack-based image semantic information sensitive pixel domain screening method and application method and system
Kirichek et al. System for detecting network anomalies using a hybrid of an uncontrolled and controlled neural network
CN117454187B (en) Integrated model training method based on frequency domain limiting target attack
CN112084936B (en) Face image preprocessing method, device, equipment and storage medium
CN116383814B (en) Neural network model back door detection method and system
CN113010888B (en) Neural network backdoor attack defense method based on key neurons
CN116305103A (en) Neural network model backdoor detection method based on confidence coefficient difference
CN111666985B (en) Deep learning confrontation sample image classification defense method based on dropout
CN114021136A (en) Back door attack defense system for artificial intelligence model
CN114119382A (en) Image raindrop removing method based on attention generation countermeasure network
Dhonthi et al. Backdoor mitigation in deep neural networks via strategic retraining
CN116739073B (en) Online back door sample detection method and system based on evolution deviation
CN118101326B (en) Lightweight Internet of vehicles intrusion detection method based on improved MobileNetV model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant