CN114021136A - Back door attack defense system for artificial intelligence model - Google Patents
- Publication number
- CN114021136A (application CN202111424165.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- back door
- layer
- image
- scanning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/52—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
- G06F21/54—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A backdoor attack defense system for artificial intelligence models, comprising a model pre-detection module and a real-time defense module, wherein: the model pre-detection module scans the image classification model to be detected for (i) operation behavior that reads or modifies local files and (ii) the data output by each layer, so as to judge whether the model contains a back door; the real-time defense module adds a filter to pictures containing a trigger, so that any back door in the image classification model is invalidated. The system can be applied to models in the fields of image classification and image recognition, strengthening the defense of such models and thereby improving security in the field of deep learning.
Description
Technical Field
The invention relates to a technology in the field of information security, in particular to a backdoor attack defense system for artificial intelligence models. The technology can be applied to the protection of artificial intelligence models in the image field in industry, such as picture classification, face recognition, and automatic driving.
Background
Adding or altering neurons in a normal artificial intelligence model can turn it into a model containing a back door. Once an input picture contains a trigger that activates the back door, the deep learning model not only outputs a classification result but also executes the model's malicious neurons, damaging the local system. There is currently no effective defense against this kind of back-door attack. Tests show that mainstream antivirus software such as Norton, Kaspersky, and McAfee cannot identify whether a model contains a back door.
Disclosure of Invention
Addressing the deficiencies of the prior art, the invention provides a back-door attack defense system for artificial intelligence models. Exploiting the characteristics of the malicious neurons in a back-door model and the fact that a back door must be activated by a trigger, the system can pre-detect an artificial intelligence model to judge whether it is a back-door model, and can invalidate the trigger at the model operation stage for protection. The system can be applied to models in the fields of image classification and image recognition, strengthening their defense and thereby improving security in the field of deep learning.
The invention is realized by the following technical scheme:
the invention relates to a backdoor attack defense system aiming at an artificial intelligence model, which comprises the following components: the model is module and real-time defense module in advance, wherein: in the model pre-detection module scanning the classification model of the image to be detected: firstly, whether the operation behavior of reading or modifying the local file is performed and secondly, whether the model comprises a back door or not is judged according to data output by each layer; the real-time defense module adds a filter to the picture containing the trigger, so that a back door in the image classification model is invalid.
The model pre-detection module comprises: the keyword library unit, the scanning difference unit and the scanning result analysis unit, wherein: the keyword library unit adds keywords according to the information of the operating system operated by the model to obtain a keyword library; the scanning difference unit carries out model scanning processing according to the keyword library and the image input by the model to obtain a model scanning result; the scanning result analysis unit performs analysis processing according to the model scanning result information to determine whether the model is a back door model.
The real-time defense module comprises: a filter unit and a model operation unit, wherein: the filter unit adds filter processing to the image input by the model; and the model operation unit obtains a classification result according to the image with the filter.
The invention relates to a backdoor attack defense method aiming at an artificial intelligence model based on the system, which comprises the following steps:
step one, according to the back door attack principle of the artificial intelligence model, namely maliciously tampering system files, constructing a back door model on the basis of a normal artificial intelligence model, and selecting a plurality of image adding triggers on a data set for activating the back door.
Step two, scanning the back door model in the step one through a model pre-detection module, and judging whether the model comprises a back door according to the difference of scanning results: and when the scanned model contains keywords in the keyword library or the input of the model is not changed before and after passing through a certain model layer, judging that the model has implantation backdoor risk.
And step three, modifying the trigger on the picture to be incapable of triggering the model back door through an image filter algorithm in the model operation stage according to the characteristic that the model back door needs to be activated, and simultaneously not influencing the normal classification of the model, thereby realizing real-time defense.
The deep learning model is a convolution neural network.
The model back door means: when the trigger is detected, embedded malicious code is executed to modify system files. The consequences include, but are not limited to, remote login by a malicious user and website hijacking attacks.
The trigger means: the input picture carries a specific feature, and an input with this feature triggers the model's back door, so that the model not only outputs a classification result but also executes the malicious code within the model and damages the local system.
The classification means: the deep learning model's predicted classification result for a single input image. The result is expressed as a vector p = [p_1, p_2, p_3, ...], each component of which represents the prediction probability of the input image for the corresponding class.
The model scanning refers to: deconstructing and field matching the trained deep learning model, specifically comprising:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 – layer(x)_1| && |layer(x)_3 – layer(x)_2| && …;
③ f(x)_match = match(keyword_1) || match(keyword_2) || …
where x denotes an input picture, f(x)_layer is the layer-by-layer scan difference function, f(x)_match is the static scan difference function, layer(x)_i denotes the post-processed output of the i-th layer, |·| denotes the Euclidean distance between two classification results (probability vectors), and match(keyword_i) denotes a binary match of keyword_i against the model.
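The layer-by-layer scan difference f(x)_layer above can be sketched as follows — a minimal illustration in Python, assuming each layer's output has already been post-processed into a probability vector of fixed length (the `layer_outputs` values below are hypothetical, not taken from the patent):

```python
import numpy as np

def layer_scan_difference(layer_outputs, eps=1e-9):
    # f(x)_layer: True only if every consecutive pair of post-processed
    # layer outputs differs (Euclidean distance above eps). A layer that
    # passes its input through unchanged -- the signature this scheme
    # attributes to an injected malicious layer -- yields distance ~0.
    return all(np.linalg.norm(curr - prev) > eps
               for prev, curr in zip(layer_outputs, layer_outputs[1:]))

# Hypothetical post-processed outputs for one input image:
benign = [np.array([0.7, 0.3]), np.array([0.6, 0.4]), np.array([0.2, 0.8])]
suspicious = [np.array([0.7, 0.3]), np.array([0.7, 0.3]),  # unchanged pair
              np.array([0.2, 0.8])]

print(layer_scan_difference(benign))      # True  -> no pass-through layer
print(layer_scan_difference(suspicious))  # False -> possible back-door layer
```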
The image filter algorithms comprise: the Gaussian blur algorithm and the median blur algorithm.
Technical effects
The method performs pre-detection against the attack purpose of the back door and the attacker's intent: when the attack purpose is malicious tampering with local system files, the interior of the back-door model contains operations for that malicious behavior. Meanwhile, a filter is added: exploiting the back-door model's high sensitivity to the trigger, a slight change to the input image modifies the trigger so that the model's back door is no longer activated.
In contrast to traditional mainstream antivirus software, which cannot detect back-door models, the method can effectively detect a back-door model and also effectively reduce back-door triggering at the model operation stage. The model scanning and image filter algorithms adopted by the invention place low demands on computer performance: no additional expensive graphics computing resources need be installed, since only model file scanning and image pre-processing are performed. The method can be flexibly applied to any image classification deep learning model and can be combined with other types of model back-door defense methods.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence model backdoor attack.
Fig. 2 is an overall architecture diagram of a model backdoor attack defense system.
FIG. 3 is a schematic diagram of a model pre-detection structure according to the present invention.
FIG. 4 is a schematic diagram of the inconsistency analysis of model scan results.
FIG. 5 is an experimental design of model real-time defense.
Detailed Description
As shown in fig. 1, a schematic diagram of a deep learning back-door model attack: a normal model becomes a model containing a back door after a malicious code layer is inserted; when a picture capable of triggering the back door is input to the model, a classification result is output and a local system file is tampered with.
As shown in figs. 2 to 5, the backdoor attack defense system for an artificial intelligence model according to this embodiment comprises a model pre-detection module and a real-time defense module, wherein: the model pre-detection module detects, through model scanning, whether the image classification model to be examined performs operation behavior that reads or modifies local files, so as to judge whether it contains a back door; the real-time defense module invalidates the back door in the image classification model by adding a filter to pictures containing the trigger.
The pre-detection performs keyword-library matching and layer-by-layer inspection of the model input to determine whether the operation behavior in the image classification model includes reading or modification, specifically comprising:
i) a keyword library is created.
ii) input the image x into the model to obtain the output of each layer: layer(x)_1, layer(x)_2, and so on.
iii) obtain the outputs match(keyword_1), match(keyword_2), and so on, by field matching against the keyword library.
iv) judging the model scanning difference of the output results, which specifically comprises the following steps:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 – layer(x)_1| && |layer(x)_3 – layer(x)_2| && …;
③ f(x)_match = match(keyword_1) || match(keyword_2) || …
where x denotes an input picture, layer(x)_i denotes the post-processed output of the i-th layer, f(x) denotes the classification result for input image x, |·| denotes the Euclidean distance between two classification results, i.e. probability vectors, and match(keyword_i) denotes a binary match of keyword_i against the model; layer(x) calls a function provided by the artificial intelligence framework, and match(keyword) calls a character-matching function provided by the operating system.
The keyword library comprises: self, environment, ssh, bashrc, writeFile.
When the value of F(x) is 1, the input model is judged to contain a back door; otherwise the model is judged to be normal.
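A minimal sketch of the full scan decision F(x) = ~f(x)_layer && f(x)_match, combining the keyword match against the raw model file with the layer-difference check. The model byte strings and layer outputs below are hypothetical stand-ins, not real serialized models:

```python
import numpy as np

KEYWORDS = [b"self", b"environment", b"ssh", b"bashrc", b"writeFile"]

def keyword_match(model_bytes, keywords=KEYWORDS):
    # f(x)_match: binary match of each keyword against the raw model
    # file; true as soon as any keyword occurs (|| over all keywords).
    return any(kw in model_bytes for kw in keywords)

def layer_scan_difference(layer_outputs, eps=1e-9):
    # f(x)_layer: false if any consecutive pair of post-processed layer
    # outputs is identical, which flags a pass-through (malicious) layer.
    return all(np.linalg.norm(b - a) > eps
               for a, b in zip(layer_outputs, layer_outputs[1:]))

def scan(model_bytes, layer_outputs):
    # F(x) = ~f(x)_layer && f(x)_match: a back door is reported when a
    # pass-through layer exists AND a suspicious keyword is present.
    return (not layer_scan_difference(layer_outputs)) and keyword_match(model_bytes)

# Hypothetical back-door model: embeds a writeFile call and contains a
# layer whose output equals its input.
bad_bytes = b"conv1 relu writeFile('/etc/bashrc') dense softmax"
bad_outs = [np.array([0.9, 0.1]), np.array([0.9, 0.1]), np.array([0.7, 0.3])]

clean_bytes = b"conv1 relu dense softmax"
clean_outs = [np.array([0.9, 0.1]), np.array([0.6, 0.4]), np.array([0.7, 0.3])]

print(scan(bad_bytes, bad_outs))      # True  -> judged to contain a back door
print(scan(clean_bytes, clean_outs))  # False -> judged normal
```

Note that step two of the method treats either signal alone as an implantation risk; the conjunction here follows formula ① literally.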
1) The back-door model is trained using the model's training data set. The model's validation data set is divided 1:1 into a first data set, Dataset1, and a second data set, Dataset2, and a trigger capable of activating the model's back door is added to each image in Dataset1.
2) Without using a filter algorithm, the malicious-behavior trigger rates of the back-door model on the two data sets are tested. The results show that the trigger rate is extremely high on the data set containing the trigger, while the false-trigger rate is extremely low on the data set without an artificially added trigger.
3) A filter is added to each data set and the trigger rates of the back-door model on the two data sets are tested again. The results show that, after the filter is added, the back-door model's trigger rate is very low on both data sets, with or without the trigger.
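The trigger-rate experiment above can be sketched with a toy stand-in: a "back door" that fires on an exact white patch, and a simple blending filter that perturbs exact pixel values. All names and the trigger pattern here are illustrative assumptions, not the patent's actual trigger or filter:

```python
import numpy as np

def has_trigger(img):
    # Stub back-door predicate: fires on an exact 3x3 all-white patch in
    # the top-left corner (a toy stand-in for a real trigger pattern).
    return bool(np.all(img[:3, :3] == 255))

def blend_filter(img):
    # Toy filter: blending each pixel 50/50 with the image mean perturbs
    # exact values, so the brittle trigger no longer matches.
    return (img * 0.5 + img.mean() * 0.5).astype(img.dtype)

def trigger_rate(images, filter_fn=None):
    # Fraction of images on which the back-door predicate fires,
    # optionally after applying a filter first.
    hits = sum(has_trigger(filter_fn(im) if filter_fn else im) for im in images)
    return hits / len(images)

rng = np.random.default_rng(0)
dataset2 = [rng.integers(0, 250, (8, 8)) for _ in range(20)]   # no trigger
dataset1 = []
for im in dataset2:                                            # with trigger
    p = im.copy(); p[:3, :3] = 255; dataset1.append(p)

print(trigger_rate(dataset1))                # 1.0 -- trigger always fires
print(trigger_rate(dataset2))                # 0.0 -- no false triggers
print(trigger_rate(dataset1, blend_filter))  # 0.0 -- filter defeats trigger
```

The same shape of result is what the tables below report for the real Gaussian and median blur filters.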
The model back door defense system provided by the invention has the characteristics of low hardware cost and high efficiency while keeping high detection rate and good defense performance.
After the model back-door pre-detection method in the defense system and internationally known antivirus software were used to test back-door and non-back-door models trained on the MNIST and CIFAR10 data sets, the results were as follows:
a) MNIST:
I. Norton: back-door model detection failed
II. Kaspersky: back-door model detection failed
III. McAfee: back-door model detection failed
IV. Model pre-detection method (the present method): back-door model detected successfully
b) CIFAR10:
I. Norton: back-door model detection failed
II. Kaspersky: back-door model detection failed
III. McAfee: back-door model detection failed
IV. Model pre-detection method (the present method): back-door model detected successfully
From these results, the model pre-detection method in the system detects the back-door model on both the MNIST and CIFAR10 data sets, whereas the antivirus software cannot identify the potential malicious behavior in the back-door model.
The model backdoor real-time defense method in the defense system is used for testing on MNIST and CIFAR10 data sets, and the effects are as follows:
a) MNIST data set back-door trigger rate:
| Dataset1 | Dataset2 |
---|---|---|
Without filter | 100% | 0% |
Using a Gaussian blur filter | 2.3% | 0% |
Using a median blur filter | 4.7% | 0% |
b) CIFAR10 data set back-door trigger rate:
From the results, it can be seen that the real-time defense method in the system significantly reduces the back-door model's trigger rate on both the MNIST and CIFAR10 data sets.
Gaussian blur filter parameters: kernel size 5, variance 1.1.
Median blur filter parameters: kernel size 5.
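As an illustration of the Gaussian blur parameters reported above (kernel size 5, variance 1.1), here is a minimal separable Gaussian blur in plain NumPy — a sketch under those stated parameters, not the exact filter implementation used in the experiments:

```python
import numpy as np

def gaussian_kernel(size=5, variance=1.1):
    # 1-D Gaussian kernel, normalized to sum to 1; size and variance
    # follow the parameters reported above.
    ax = np.arange(size) - size // 2
    k = np.exp(-ax.astype(float) ** 2 / (2.0 * variance))
    return k / k.sum()

def gaussian_blur(img, size=5, variance=1.1):
    # Separable blur: convolve every row, then every column, with the
    # same 1-D kernel ('same' mode keeps the image size; borders are
    # zero-padded, which is crude but fine for a sketch).
    k = gaussian_kernel(size, variance)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

img = np.zeros((9, 9)); img[4, 4] = 1.0      # single bright pixel
out = gaussian_blur(img)
print(out[4, 4] < 1.0)                        # True: energy spread to neighbors
print(abs(out.sum() - 1.0) < 1e-9)            # True: total mass preserved here
```

Smearing a sharp trigger pattern across its neighbors in this way is what lowers the back-door trigger rate in the tables above.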
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (8)
1. A backdoor attack defense system for an artificial intelligence model, comprising a model pre-detection module and a real-time defense module, wherein: the model pre-detection module scans the image classification model to be detected for, first, operation behavior that reads or modifies local files and, second, the data output by each layer, so as to judge whether the model contains a back door; the real-time defense module adds a filter to pictures containing a trigger, so that any back door in the image classification model is invalidated;
the model back door is as follows: when the trigger is detected, embedded malicious codes are executed to modify the system files, and the caused consequences include but are not limited to malicious user remote login and website holding attack;
the trigger means that: the input picture has a particular feature, the input of which is provided to trigger the back door of the model.
2. A backdoor attack defense system against an artificial intelligence model as recited in claim 1, wherein the model pre-detection module comprises: the keyword library unit, the scanning difference unit and the scanning result analysis unit, wherein: the keyword library unit adds keywords according to the information of the operating system operated by the model to obtain a keyword library; the scanning difference unit carries out model scanning processing according to the keyword library and the image input by the model to obtain a model scanning result; the scanning result analysis unit performs analysis processing according to the model scanning result information to determine whether the model is a back door model.
3. A backdoor attack defense system against an artificial intelligence model as recited in claim 1, wherein said real-time defense module comprises: a filter unit and a model operation unit, wherein: the filter unit adds filter processing to the image input by the model; and the model operation unit obtains a classification result according to the image with the filter.
4. A backdoor attack defense method aiming at an artificial intelligence model based on the system of any one of claims 1-3, characterized by comprising the following steps:
step one, constructing a back-door model on the basis of a normal artificial intelligence model, following the back-door attack principle for artificial intelligence models, namely malicious tampering with system files, and selecting a number of images from a data set to which triggers for activating the back door are added;
step two, scanning the back-door model of step one with the model pre-detection module and judging from the differences in the scan results whether the model contains a back door: when the scanned model contains keywords from the keyword library, or the model input is unchanged before and after some model layer, the model is judged to be at risk of having an implanted back door;
and step three, at the model operation stage, exploiting the fact that the model back door must be activated, modifying the trigger on the picture through an image filter algorithm so that it can no longer trigger the back door, while leaving normal classification unaffected, thereby achieving real-time defense.
5. The method as claimed in claim 4, wherein the deep learning model is a convolutional neural network.
6. The backdoor attack defense method according to claim 4, wherein the classification is: the deep learning model's predicted classification result for a single input image, expressed as a vector p = [p_1, p_2, p_3, ...], each component of which represents the prediction probability of the input image for the corresponding class.
7. The method as claimed in claim 4, wherein the model scan is: deconstructing and field matching the trained deep learning model, specifically comprising:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 – layer(x)_1| && |layer(x)_3 – layer(x)_2| && …;
③ f(x)_match = match(keyword_1) || match(keyword_2) || …
where x denotes an input picture, f(x)_layer is the layer-by-layer scan difference function, f(x)_match is the static scan difference function, layer(x)_i denotes the post-processed output of the i-th layer, |·| denotes the Euclidean distance between two classification results (probability vectors), and match(keyword_i) denotes a binary match of keyword_i against the model.
8. The method of claim 4, wherein the image filter algorithm comprises: gaussian fuzzy algorithm and median fuzzy algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111424165.8A CN114021136A (en) | 2021-11-26 | 2021-11-26 | Back door attack defense system for artificial intelligence model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111424165.8A CN114021136A (en) | 2021-11-26 | 2021-11-26 | Back door attack defense system for artificial intelligence model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114021136A true CN114021136A (en) | 2022-02-08 |
Family
ID=80066759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111424165.8A Pending CN114021136A (en) | 2021-11-26 | 2021-11-26 | Back door attack defense system for artificial intelligence model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114021136A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115659171A (en) * | 2022-09-26 | 2023-01-31 | 中国工程物理研究院计算机应用研究所 | Model backdoor detection method and device based on multivariate feature interaction and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||