CN114021136A - Back door attack defense system for artificial intelligence model - Google Patents
- Publication number
- CN114021136A (application CN202111424165.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- back door
- layer
- image
- scanning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/52—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
- G06F21/54—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A backdoor attack defense system for artificial intelligence models, comprising a model pre-detection module and a real-time defense module, wherein: the model pre-detection module scans the image classification model to be detected for (i) operation behavior that reads or modifies local files and (ii) the data output by each layer, so as to judge whether the model contains a back door; the real-time defense module adds a filter to pictures containing a trigger, so that any back door in the image classification model is invalidated. The system can be applied to models in the fields of image classification and image recognition, strengthening the defense of such models and thereby improving security in the field of deep learning.
Description
Technical Field
The invention relates to a technology in the field of information security, in particular to a backdoor attack defense system for artificial intelligence models. The technology can be applied to the protection of artificial intelligence models in the image field in industry, such as picture classification, face recognition, and automatic driving.
Background
Adding or altering neurons in a normal artificial intelligence model can turn it into a model containing a back door. Once an input picture contains a trigger that activates the back door, the deep learning model not only outputs a classification result but also executes the model's malicious neurons, damaging the local system. There is currently no effective defense against this kind of back-door attack. Tests show that mainstream antivirus software such as Norton, Kaspersky, and McAfee cannot identify whether a model contains a back door.
Disclosure of Invention
Addressing the deficiencies of the prior art, the invention provides a back-door attack defense system for artificial intelligence models. Exploiting the characteristics of the malicious neurons in a back-door model and the fact that a back door must be activated by a trigger, the system can pre-detect an artificial intelligence model to judge whether it is a back-door model, and can invalidate the trigger at the model operation stage for protection. The system can be applied to models in the fields of image classification and image recognition, strengthening their defense and thereby improving security in the field of deep learning.
The invention is realized by the following technical scheme:
the invention relates to a backdoor attack defense system aiming at an artificial intelligence model, which comprises the following components: the model is module and real-time defense module in advance, wherein: in the model pre-detection module scanning the classification model of the image to be detected: firstly, whether the operation behavior of reading or modifying the local file is performed and secondly, whether the model comprises a back door or not is judged according to data output by each layer; the real-time defense module adds a filter to the picture containing the trigger, so that a back door in the image classification model is invalid.
The model pre-detection module comprises: the keyword library unit, the scanning difference unit and the scanning result analysis unit, wherein: the keyword library unit adds keywords according to the information of the operating system operated by the model to obtain a keyword library; the scanning difference unit carries out model scanning processing according to the keyword library and the image input by the model to obtain a model scanning result; the scanning result analysis unit performs analysis processing according to the model scanning result information to determine whether the model is a back door model.
The real-time defense module comprises: a filter unit and a model operation unit, wherein: the filter unit adds filter processing to the image input by the model; and the model operation unit obtains a classification result according to the image with the filter.
The invention relates to a backdoor attack defense method aiming at an artificial intelligence model based on the system, which comprises the following steps:
step one, according to the back door attack principle of the artificial intelligence model, namely maliciously tampering system files, constructing a back door model on the basis of a normal artificial intelligence model, and selecting a plurality of image adding triggers on a data set for activating the back door.
Step two, scanning the back door model in the step one through a model pre-detection module, and judging whether the model comprises a back door according to the difference of scanning results: and when the scanned model contains keywords in the keyword library or the input of the model is not changed before and after passing through a certain model layer, judging that the model has implantation backdoor risk.
And step three, modifying the trigger on the picture to be incapable of triggering the model back door through an image filter algorithm in the model operation stage according to the characteristic that the model back door needs to be activated, and simultaneously not influencing the normal classification of the model, thereby realizing real-time defense.
The deep learning model is a convolution neural network.
The model back door means: when the trigger is detected, embedded malicious code is executed to modify system files. The consequences include, but are not limited to, remote login by a malicious user and website hijacking attacks.
The trigger means: the input picture carries a specific feature, and an input with this feature triggers the model's back door, so that the model not only outputs a classification result but also executes the malicious code within the model and damages the local system.
The classification means: the deep learning model's predicted classification result for a single input image. The result is expressed as a vector p = [p_1, p_2, p_3, ...], each component of which represents the prediction probability of the input image for the corresponding class.
The model scanning refers to: deconstructing and field matching the trained deep learning model, specifically comprising:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 – layer(x)_1| && |layer(x)_3 – layer(x)_2| && …;
③ f(x)_match = match(keyword_1) || match(keyword_2) || …
where x denotes an input picture, f(x)_layer is the layer-by-layer scan difference function, f(x)_match is the static scan difference function, layer(x)_i denotes the post-processed output of the i-th layer, |·| denotes the Euclidean distance between two classification results (probability vectors), and match(keyword_i) denotes a binary match of keyword_i against the model.
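The layer-by-layer scan difference f(x)_layer above can be sketched as follows — a minimal illustration in Python, assuming each layer's output has already been post-processed into a probability vector of fixed length (the `layer_outputs` values below are hypothetical, not taken from the patent):

```python
import numpy as np

def layer_scan_difference(layer_outputs, eps=1e-9):
    # f(x)_layer: True only if every consecutive pair of post-processed
    # layer outputs differs (Euclidean distance above eps). A layer that
    # passes its input through unchanged -- the signature this scheme
    # attributes to an injected malicious layer -- yields distance ~0.
    return all(np.linalg.norm(curr - prev) > eps
               for prev, curr in zip(layer_outputs, layer_outputs[1:]))

# Hypothetical post-processed outputs for one input image:
benign = [np.array([0.7, 0.3]), np.array([0.6, 0.4]), np.array([0.2, 0.8])]
suspicious = [np.array([0.7, 0.3]), np.array([0.7, 0.3]),  # unchanged pair
              np.array([0.2, 0.8])]

print(layer_scan_difference(benign))      # True  -> no pass-through layer
print(layer_scan_difference(suspicious))  # False -> possible back-door layer
```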
The image filter algorithms comprise: the Gaussian blur algorithm and the median blur algorithm.
Technical effects
The method performs pre-detection against the attack purpose of the back door and the attacker's intent: when the attack purpose is malicious tampering with local system files, the interior of the back-door model contains operations for that malicious behavior. Meanwhile, a filter is added: exploiting the back-door model's high sensitivity to the trigger, a slight change to the input image modifies the trigger so that the model's back door is no longer activated.
In contrast to traditional mainstream antivirus software, which cannot detect back-door models, the method can effectively detect a back-door model and also effectively reduce back-door triggering at the model operation stage. The model scanning and image filter algorithms adopted by the invention place low demands on computer performance: no additional expensive graphics computing resources need be installed, since only model file scanning and image pre-processing are performed. The method can be flexibly applied to any image classification deep learning model and can be combined with other types of model back-door defense methods.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence model backdoor attack.
Fig. 2 is an overall architecture diagram of a model backdoor attack defense system.
FIG. 3 is a schematic diagram of a model pre-detection structure according to the present invention.
FIG. 4 is a schematic diagram of the inconsistency analysis of model scan results.
FIG. 5 is an experimental design of model real-time defense.
Detailed Description
As shown in fig. 1, a schematic diagram of a deep learning back-door model attack: a normal model becomes a model containing a back door after a malicious code layer is inserted; when a picture capable of triggering the back door is input to the model, a classification result is output and a local system file is tampered with.
As shown in figs. 2 to 5, the backdoor attack defense system for an artificial intelligence model according to this embodiment comprises a model pre-detection module and a real-time defense module, wherein: the model pre-detection module detects, through model scanning, whether the image classification model to be examined performs operation behavior that reads or modifies local files, so as to judge whether it contains a back door; the real-time defense module invalidates the back door in the image classification model by adding a filter to pictures containing the trigger.
The pre-detection performs keyword-library matching and layer-by-layer inspection of the model input to determine whether the operation behavior in the image classification model includes reading or modification, specifically comprising:
i) a keyword library is created.
ii) input the image x into the model to obtain the output of each layer: layer(x)_1, layer(x)_2, and so on.
iii) obtain the outputs match(keyword_1), match(keyword_2), and so on, by field matching against the keyword library.
iv) judging the model scanning difference of the output results, which specifically comprises the following steps:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 – layer(x)_1| && |layer(x)_3 – layer(x)_2| && …;
③ f(x)_match = match(keyword_1) || match(keyword_2) || …
where x denotes an input picture, layer(x)_i denotes the post-processed output of the i-th layer, f(x) denotes the classification result for input image x, |·| denotes the Euclidean distance between two classification results, i.e. probability vectors, and match(keyword_i) denotes a binary match of keyword_i against the model; layer(x) calls a function provided by the artificial intelligence framework, and match(keyword) calls a character-matching function provided by the operating system.
The keyword library comprises: self, environment, ssh, bashrc, writeFile.
When the value of F(x) is 1, the input model is judged to contain a back door; otherwise the model is judged to be normal.
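A minimal sketch of the full scan decision F(x) = ~f(x)_layer && f(x)_match, combining the keyword match against the raw model file with the layer-difference check. The model byte strings and layer outputs below are hypothetical stand-ins, not real serialized models:

```python
import numpy as np

KEYWORDS = [b"self", b"environment", b"ssh", b"bashrc", b"writeFile"]

def keyword_match(model_bytes, keywords=KEYWORDS):
    # f(x)_match: binary match of each keyword against the raw model
    # file; true as soon as any keyword occurs (|| over all keywords).
    return any(kw in model_bytes for kw in keywords)

def layer_scan_difference(layer_outputs, eps=1e-9):
    # f(x)_layer: false if any consecutive pair of post-processed layer
    # outputs is identical, which flags a pass-through (malicious) layer.
    return all(np.linalg.norm(b - a) > eps
               for a, b in zip(layer_outputs, layer_outputs[1:]))

def scan(model_bytes, layer_outputs):
    # F(x) = ~f(x)_layer && f(x)_match: a back door is reported when a
    # pass-through layer exists AND a suspicious keyword is present.
    return (not layer_scan_difference(layer_outputs)) and keyword_match(model_bytes)

# Hypothetical back-door model: embeds a writeFile call and contains a
# layer whose output equals its input.
bad_bytes = b"conv1 relu writeFile('/etc/bashrc') dense softmax"
bad_outs = [np.array([0.9, 0.1]), np.array([0.9, 0.1]), np.array([0.7, 0.3])]

clean_bytes = b"conv1 relu dense softmax"
clean_outs = [np.array([0.9, 0.1]), np.array([0.6, 0.4]), np.array([0.7, 0.3])]

print(scan(bad_bytes, bad_outs))      # True  -> judged to contain a back door
print(scan(clean_bytes, clean_outs))  # False -> judged normal
```

Note that step two of the method treats either signal alone as an implantation risk; the conjunction here follows formula ① literally.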
1) The back-door model is trained using the model's training data set. The model's validation data set is divided 1:1 into a first data set, Dataset1, and a second data set, Dataset2, and a trigger capable of activating the model's back door is added to each image in Dataset1.
2) Without using a filter algorithm, the malicious-behavior trigger rates of the back-door model on the two data sets are tested. The results show that the trigger rate is extremely high on the data set containing the trigger, while the false-trigger rate is extremely low on the data set without an artificially added trigger.
3) A filter is added to each data set and the trigger rates of the back-door model on the two data sets are tested again. The results show that, after the filter is added, the back-door model's trigger rate is very low on both data sets, with or without the trigger.
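The trigger-rate experiment above can be sketched with a toy stand-in: a "back door" that fires on an exact white patch, and a simple blending filter that perturbs exact pixel values. All names and the trigger pattern here are illustrative assumptions, not the patent's actual trigger or filter:

```python
import numpy as np

def has_trigger(img):
    # Stub back-door predicate: fires on an exact 3x3 all-white patch in
    # the top-left corner (a toy stand-in for a real trigger pattern).
    return bool(np.all(img[:3, :3] == 255))

def blend_filter(img):
    # Toy filter: blending each pixel 50/50 with the image mean perturbs
    # exact values, so the brittle trigger no longer matches.
    return (img * 0.5 + img.mean() * 0.5).astype(img.dtype)

def trigger_rate(images, filter_fn=None):
    # Fraction of images on which the back-door predicate fires,
    # optionally after applying a filter first.
    hits = sum(has_trigger(filter_fn(im) if filter_fn else im) for im in images)
    return hits / len(images)

rng = np.random.default_rng(0)
dataset2 = [rng.integers(0, 250, (8, 8)) for _ in range(20)]   # no trigger
dataset1 = []
for im in dataset2:                                            # with trigger
    p = im.copy(); p[:3, :3] = 255; dataset1.append(p)

print(trigger_rate(dataset1))                # 1.0 -- trigger always fires
print(trigger_rate(dataset2))                # 0.0 -- no false triggers
print(trigger_rate(dataset1, blend_filter))  # 0.0 -- filter defeats trigger
```

The same shape of result is what the tables below report for the real Gaussian and median blur filters.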
The model back door defense system provided by the invention has the characteristics of low hardware cost and high efficiency while keeping high detection rate and good defense performance.
After the model back-door pre-detection method in the defense system and internationally known antivirus software were used to test back-door and non-back-door models trained on the MNIST and CIFAR10 data sets, the results were as follows:
a) MNIST:
I. Norton: back-door model detection failed
II. Kaspersky: back-door model detection failed
III. McAfee: back-door model detection failed
IV. Model pre-detection method (the present method): back-door model detected successfully
b) CIFAR10:
I. Norton: back-door model detection failed
II. Kaspersky: back-door model detection failed
III. McAfee: back-door model detection failed
IV. Model pre-detection method (the present method): back-door model detected successfully
From these results, the model pre-detection method in the system detects the back-door model on both the MNIST and CIFAR10 data sets, whereas the antivirus software cannot identify the potential malicious behavior in the back-door model.
The model backdoor real-time defense method in the defense system is used for testing on MNIST and CIFAR10 data sets, and the effects are as follows:
a) MNIST data set back-door trigger rate:
| Dataset1 | Dataset2 |
---|---|---|
Without filter | 100% | 0% |
Using a Gaussian blur filter | 2.3% | 0% |
Using a median blur filter | 4.7% | 0% |
b) CIFAR10 data set back-door trigger rate:
From the results, it can be seen that the real-time defense method in the system significantly reduces the back-door model's trigger rate on both the MNIST and CIFAR10 data sets.
Gaussian blur filter parameters: kernel size 5, variance 1.1.
Median blur filter parameters: kernel size 5.
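As an illustration of the Gaussian blur parameters reported above (kernel size 5, variance 1.1), here is a minimal separable Gaussian blur in plain NumPy — a sketch under those stated parameters, not the exact filter implementation used in the experiments:

```python
import numpy as np

def gaussian_kernel(size=5, variance=1.1):
    # 1-D Gaussian kernel, normalized to sum to 1; size and variance
    # follow the parameters reported above.
    ax = np.arange(size) - size // 2
    k = np.exp(-ax.astype(float) ** 2 / (2.0 * variance))
    return k / k.sum()

def gaussian_blur(img, size=5, variance=1.1):
    # Separable blur: convolve every row, then every column, with the
    # same 1-D kernel ('same' mode keeps the image size; borders are
    # zero-padded, which is crude but fine for a sketch).
    k = gaussian_kernel(size, variance)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

img = np.zeros((9, 9)); img[4, 4] = 1.0      # single bright pixel
out = gaussian_blur(img)
print(out[4, 4] < 1.0)                        # True: energy spread to neighbors
print(abs(out.sum() - 1.0) < 1e-9)            # True: total mass preserved here
```

Smearing a sharp trigger pattern across its neighbors in this way is what lowers the back-door trigger rate in the tables above.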
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (8)
1. A backdoor attack defense system for an artificial intelligence model, comprising a model pre-detection module and a real-time defense module, wherein: the model pre-detection module scans the image classification model to be detected for, first, operation behavior that reads or modifies local files and, second, the data output by each layer, so as to judge whether the model contains a back door; the real-time defense module adds a filter to pictures containing a trigger, so that any back door in the image classification model is invalidated;
the model back door is as follows: when the trigger is detected, embedded malicious codes are executed to modify the system files, and the caused consequences include but are not limited to malicious user remote login and website holding attack;
the trigger means that: the input picture has a particular feature, the input of which is provided to trigger the back door of the model.
2. A backdoor attack defense system against an artificial intelligence model as recited in claim 1, wherein the model pre-detection module comprises: the keyword library unit, the scanning difference unit and the scanning result analysis unit, wherein: the keyword library unit adds keywords according to the information of the operating system operated by the model to obtain a keyword library; the scanning difference unit carries out model scanning processing according to the keyword library and the image input by the model to obtain a model scanning result; the scanning result analysis unit performs analysis processing according to the model scanning result information to determine whether the model is a back door model.
3. A backdoor attack defense system against an artificial intelligence model as recited in claim 1, wherein said real-time defense module comprises: a filter unit and a model operation unit, wherein: the filter unit adds filter processing to the image input by the model; and the model operation unit obtains a classification result according to the image with the filter.
4. A backdoor attack defense method aiming at an artificial intelligence model based on the system of any one of claims 1-3, characterized by comprising the following steps:
step one, constructing a back-door model on the basis of a normal artificial intelligence model, following the back-door attack principle for artificial intelligence models, namely malicious tampering with system files, and selecting a number of images from a data set to which triggers for activating the back door are added;
step two, scanning the back-door model of step one with the model pre-detection module and judging from the differences in the scan results whether the model contains a back door: when the scanned model contains keywords from the keyword library, or the model input is unchanged before and after some model layer, the model is judged to be at risk of having an implanted back door;
and step three, at the model operation stage, exploiting the fact that the model back door must be activated, modifying the trigger on the picture through an image filter algorithm so that it can no longer trigger the back door, while leaving normal classification unaffected, thereby achieving real-time defense.
5. The method as claimed in claim 4, wherein the deep learning model is a convolutional neural network.
6. The backdoor attack defense method according to claim 4, wherein the classification is: the deep learning model's predicted classification result for a single input image, expressed as a vector p = [p_1, p_2, p_3, ...], each component of which represents the prediction probability of the input image for the corresponding class.
7. The method as claimed in claim 4, wherein the model scan is: deconstructing and field matching the trained deep learning model, specifically comprising:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 – layer(x)_1| && |layer(x)_3 – layer(x)_2| && …;
③ f(x)_match = match(keyword_1) || match(keyword_2) || …
where x denotes an input picture, f(x)_layer is the layer-by-layer scan difference function, f(x)_match is the static scan difference function, layer(x)_i denotes the post-processed output of the i-th layer, |·| denotes the Euclidean distance between two classification results (probability vectors), and match(keyword_i) denotes a binary match of keyword_i against the model.
8. The method of claim 4, wherein the image filter algorithm comprises: gaussian fuzzy algorithm and median fuzzy algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111424165.8A CN114021136A (en) | 2021-11-26 | 2021-11-26 | Back door attack defense system for artificial intelligence model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111424165.8A CN114021136A (en) | 2021-11-26 | 2021-11-26 | Back door attack defense system for artificial intelligence model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114021136A true CN114021136A (en) | 2022-02-08 |
Family
ID=80066759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111424165.8A Pending CN114021136A (en) | 2021-11-26 | 2021-11-26 | Back door attack defense system for artificial intelligence model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114021136A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115659171A (en) * | 2022-09-26 | 2023-01-31 | 中国工程物理研究院计算机应用研究所 | Model backdoor detection method and device based on multivariate feature interaction and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||