CN114021136A - Back door attack defense system for artificial intelligence model - Google Patents

Back door attack defense system for artificial intelligence model

Info

Publication number
CN114021136A
CN114021136A (application number CN202111424165.8A)
Authority
CN
China
Prior art keywords
model
back door
layer
image
scanning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111424165.8A
Other languages
Chinese (zh)
Inventor
闫续 (Yan Xu)
易平 (Yi Ping)
谢宸琪 (Xie Chenqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202111424165.8A
Publication of CN114021136A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 - Detecting local intrusion or implementing counter-measures
    • G06F 21/56 - Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F 21/566 - Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F 21/54 - Monitoring users, programs or devices to maintain the integrity of platforms during program execution by adding security routines or objects to programs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 - Detecting local intrusion or implementing counter-measures
    • G06F 21/56 - Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F 21/562 - Static detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2221/00 - Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/03 - Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F 2221/033 - Test or assess software
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Abstract

A backdoor attack defense system for artificial intelligence models, comprising a model pre-detection module and a real-time defense module, wherein: the model pre-detection module scans the image classification model to be detected for operation behaviors that read or modify local files, and for the data output by each layer, so as to judge whether the model contains a backdoor; the real-time defense module adds a filter to pictures containing the trigger, so that the backdoor in the image classification model is invalidated. The system can be applied to models in the fields of image classification and image recognition, strengthening their defenses and thereby improving security in the field of deep learning.

Description

Back door attack defense system for artificial intelligence model
Technical Field
The invention relates to a technology in the field of information security, in particular to a backdoor attack defense system for artificial intelligence models. The technology can be applied to protecting industrial artificial intelligence models in the image domain, for example in picture classification, face recognition and automatic driving.
Background
Adding or modifying neurons in a normal artificial intelligence model can turn it into a model containing a backdoor. Once an input picture contains a trigger that activates the backdoor, the deep learning model not only outputs a classification result but also executes the model's malicious neurons, damaging the local system. Against this kind of backdoor attack there is currently no effective defense: tests show that mainstream antivirus software such as Norton, Kaspersky and McAfee cannot identify whether a model contains a backdoor.
Disclosure of Invention
To remedy the shortcomings of the prior art, the invention provides a backdoor attack defense system for artificial intelligence models. Exploiting the characteristics of the malicious neurons in a backdoor model and the fact that the backdoor must be activated by a trigger, the system can pre-detect an artificial intelligence model to judge whether it is a backdoor model, and can invalidate the trigger at the model operation stage for protection. The system can be applied to models in the fields of image classification and image recognition, strengthening their defenses and thereby improving security in the field of deep learning.
The invention is realized by the following technical scheme:
the invention relates to a backdoor attack defense system aiming at an artificial intelligence model, which comprises the following components: the model is module and real-time defense module in advance, wherein: in the model pre-detection module scanning the classification model of the image to be detected: firstly, whether the operation behavior of reading or modifying the local file is performed and secondly, whether the model comprises a back door or not is judged according to data output by each layer; the real-time defense module adds a filter to the picture containing the trigger, so that a back door in the image classification model is invalid.
The model pre-detection module comprises a keyword library unit, a scanning difference unit and a scanning result analysis unit, wherein: the keyword library unit adds keywords according to information about the operating system on which the model runs, obtaining a keyword library; the scanning difference unit performs model scanning according to the keyword library and the model's input image, obtaining a model scanning result; and the scanning result analysis unit analyses the model scanning result to determine whether the model is a backdoor model.
The real-time defense module comprises a filter unit and a model operation unit, wherein: the filter unit applies filter processing to the model's input image; and the model operation unit obtains a classification result from the filtered image.
The invention further relates to a backdoor attack defense method for artificial intelligence models based on the above system, comprising the following steps:
step one, according to the back door attack principle of the artificial intelligence model, namely maliciously tampering system files, constructing a back door model on the basis of a normal artificial intelligence model, and selecting a plurality of image adding triggers on a data set for activating the back door.
Step two, scan the backdoor model from step one with the model pre-detection module and judge whether the model contains a backdoor from the differences in the scanning results: when the scanned model contains keywords from the keyword library, or when the model's input is unchanged before and after some model layer, the model is judged to be at risk of having an implanted backdoor.
And step three, exploiting the fact that the model backdoor must be activated, modify the trigger on the picture at the model operation stage with an image filter algorithm so that it can no longer trigger the model backdoor, while leaving the model's normal classification unaffected, thereby achieving real-time defense.
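To make step one concrete, the following sketch stamps a trigger onto images from a data set. The patent does not specify the trigger's form; the solid corner patch, its size, and the add_trigger helper are illustrative assumptions only (Python with NumPy):

    import numpy as np

    def add_trigger(image: np.ndarray, patch_size: int = 4, value: int = 255) -> np.ndarray:
        """Stamp a solid square patch into the bottom-right corner of an image.

        The patch stands in for the backdoor trigger of step one; its shape,
        size and position are assumptions, not taken from the patent.
        """
        triggered = image.copy()
        triggered[-patch_size:, -patch_size:, ...] = value  # overwrite corner pixels
        return triggered

    # Example: add the trigger to every selected image.
    # dataset1 = [add_trigger(x) for x in dataset1]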
The deep learning model is a convolutional neural network.
The model backdoor means: embedded malicious code that is executed to modify system files when the trigger is detected. The consequences include, but are not limited to, remote login by malicious users and website hijacking attacks.
The trigger means: a specific feature in the input picture; an input bearing this feature triggers the model's backdoor, so that the model not only outputs a classification result but also executes the malicious code inside the model, damaging the local system.
The classification means: the classification result predicted by the deep learning model for a single input image. The result is expressed as a vector p = [p_1, p_2, p_3, ...], each component of which is the prediction probability of the input image for the corresponding class.
The model scanning means: deconstructing and field-matching the trained deep learning model, specifically comprising:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 - layer(x)_1| && |layer(x)_3 - layer(x)_2| && ...;
③ f(x)_match = match(keyword)_1 || match(keyword)_2 || ...;
wherein: x denotes an input picture, f(x)_layer is the model's layer-by-layer scan difference function, f(x)_match is the model's static scan difference function, layer(x)_i denotes the post-processed output of the i-th layer, |·| denotes the Euclidean distance between two classification results (probability vectors), and match(keyword)_i denotes binary matching of keyword_i against the model.
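A minimal sketch of the layer-by-layer half of this scan, assuming a PyTorch model whose top-level children execute in sequence (for example an nn.Sequential); the hook-based collection and the eps threshold are assumptions, since the patent only states that consecutive layer outputs are compared by Euclidean distance:

    import torch
    import torch.nn as nn

    def layer_scan(model: nn.Module, x: torch.Tensor, eps: float = 1e-6) -> bool:
        """f(x)_layer: run one forward pass, collect each top-level module's
        output with forward hooks, and check that consecutive same-shaped
        outputs differ by a nonzero Euclidean distance."""
        outputs = []
        hooks = [m.register_forward_hook(lambda _m, _inp, out: outputs.append(out.detach()))
                 for m in model.children()]
        with torch.no_grad():
            model(x)
        for h in hooks:
            h.remove()
        # A zero distance means some layer passed its input through unchanged,
        # which the patent treats as a sign of an implanted backdoor layer.
        for prev, cur in zip(outputs, outputs[1:]):
            if prev.shape == cur.shape and torch.dist(prev, cur).item() < eps:
                return False
        return True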
The image filter algorithms comprise: the Gaussian blur algorithm and the median blur algorithm.
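As a minimal sketch of the filter defense, assuming OpenCV: the kernel size 5 and variance 1.1 are the parameters reported in the experiments below, and passing the variance as sigmaX is an interpretation, since OpenCV expects a standard deviation:

    import cv2
    import numpy as np

    def apply_defense_filter(image: np.ndarray, kind: str = "gaussian") -> np.ndarray:
        """Blur the input picture before classification so that fine trigger
        patterns are destroyed while the dominant image content survives.
        Expects an 8-bit image (cv2.medianBlur with kernel size 5 requires it)."""
        if kind == "gaussian":
            return cv2.GaussianBlur(image, (5, 5), sigmaX=1.1)  # kernel 5, 'variance' 1.1
        return cv2.medianBlur(image, 5)  # median blur, kernel size 5

    # Example: classify the filtered image instead of the raw input.
    # prediction = model(apply_defense_filter(picture))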
Technical effects
The invention pre-detects according to the backdoor's attack goal and the attacker's intent: when the goal is to maliciously tamper with local system files, the backdoor model contains the corresponding malicious operations internally. Meanwhile, a filter is added: exploiting the backdoor model's high sensitivity to the trigger, the trigger is destroyed by slightly altering the input image, so that the model backdoor is no longer activated.
Compared with traditional mainstream antivirus software, which cannot detect backdoor models, the invention can effectively detect a backdoor model and effectively reduce backdoor triggering at the model operation stage. The model scanning and image filter algorithms place low demands on computer performance: only model file scanning and image pre-processing are performed, so no additional expensive graphics computing resources need to be installed. The method can be flexibly applied to any image classification deep learning model and can be combined with other types of model backdoor defenses.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence model backdoor attack.
Fig. 2 is an overall architecture diagram of a model backdoor attack defense system.
FIG. 3 is a schematic diagram of a model pre-detection structure according to the present invention.
FIG. 4 is a schematic diagram of the inconsistency analysis of model scan results.
FIG. 5 is an experimental design of model real-time defense.
Detailed Description
As shown in fig. 1, a schematic diagram of a deep learning backdoor model attack: a normal model becomes a model containing a backdoor after a malicious code layer is inserted; when a picture capable of triggering the backdoor is input to the model, a classification result is output and a local system file is tampered with.
As shown in fig. 2 to 5, the backdoor attack defense system for an artificial intelligence model according to this embodiment comprises a model pre-detection module and a real-time defense module, wherein: the model pre-detection module detects, through model scanning, whether the image classification model to be examined performs operation behaviors that read or modify local files, so as to judge whether it contains a backdoor; the real-time defense module invalidates the backdoor in the image classification model by adding a filter to pictures containing the trigger.
The pre-detection performs keyword-library matching detection and layer-by-layer inspection of the model input to determine whether the image classification model contains read or modification operation behaviors, specifically:
i) create a keyword library;
ii) feed the input image x into the model to obtain the output of each layer: layer(x)_1, layer(x)_2, and so on;
iii) obtain the matching outputs match(keyword)_1, match(keyword)_2, and so on, by field matching against the keyword library;
iv) judge the model scanning difference over the output results, specifically:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 - layer(x)_1| && |layer(x)_3 - layer(x)_2| && ...;
③ f(x)_match = match(keyword)_1 || match(keyword)_2 || ...;
wherein: x denotes an input picture, layer(x)_i denotes the post-processed output of the i-th layer, f(x) denotes the classification result for the input image x, |·| denotes the Euclidean distance between two classification results, i.e. probability vectors, and match(keyword)_i denotes binary matching of keyword_i against the model; layer(x) calls a function provided by the artificial intelligence framework, and match(keyword) calls the character matching function provided by the operating system.
The keyword library is self, environment, ssh, bashrc, writeFile.
When the value of F(x) is 1, the input model is judged to contain a backdoor; otherwise the model is judged to be normal.
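The keyword half of the scan and the combined decision F(x) might look like the following sketch. It reuses layer_scan from the earlier sketch, reads the serialized model file as raw bytes, and uses Python's substring search in place of the operating system's character matching routine; all three choices are assumptions:

    KEYWORDS = [b"self", b"environment", b"ssh", b"bashrc", b"writeFile"]

    def keyword_scan(model_path: str) -> bool:
        """f(x)_match: binary-match each keyword against the bytes of the
        serialized model file; True if any keyword is found."""
        with open(model_path, "rb") as fh:
            blob = fh.read()
        return any(kw in blob for kw in KEYWORDS)

    def pre_detect(model, x, model_path: str) -> bool:
        """F(x) = ~f(x)_layer && f(x)_match, mirroring formula ①: flag the
        model when some layer passes its input through unchanged AND a
        suspicious keyword occurs in the model file."""
        return (not layer_scan(model, x)) and keyword_scan(model_path)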
1) Train the backdoor model using the model's training data set. Divide the model's validation data set into a first data set Dataset1 and a second data set Dataset2 in a 1:1 ratio, and add to each image in Dataset1 a trigger capable of activating the model's backdoor.
2) Without using a filter algorithm, test the backdoor model's malicious-behavior trigger rate on the two data sets. The results show that the trigger rate on the data set containing triggers is extremely high, while the false trigger rate on the data set without artificially added triggers is extremely low.
3) Add a filter to the data sets and test the backdoor model's malicious-behavior trigger rate on each. The results show that once the filter is added, the trigger rate is very low on both data sets, whether or not they contain the trigger (a measurement sketch follows).
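The trigger-rate measurements in 2) and 3) can be written as the short sketch below; backdoor_fired is a hypothetical probe, not part of the patent, that reports whether the backdoor's malicious side effect occurred for one input:

    def trigger_rate(model, dataset, filter_fn=None) -> float:
        """Fraction of inputs in dataset for which the backdoor fires,
        optionally passing each image through a defense filter first."""
        fired = 0
        for image in dataset:
            if filter_fn is not None:
                image = filter_fn(image)      # e.g. apply_defense_filter
            if backdoor_fired(model, image):  # hypothetical instrumentation
                fired += 1
        return fired / len(dataset)

    # Reproducing the experiment: compare trigger_rate(model, dataset1) with
    # trigger_rate(model, dataset1, filter_fn=apply_defense_filter).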
The model backdoor defense system provided by the invention combines a high detection rate and good defense performance with low hardware cost and high efficiency.
Testing the model backdoor pre-detection method of the defense system, alongside internationally known antivirus software, on backdoor and non-backdoor models trained on the MNIST and CIFAR10 data sets gives the following results:
a) MNIST:
I. Norton: backdoor model detection failed
II. Kaspersky: backdoor model detection failed
III. McAfee: backdoor model detection failed
IV. Model pre-detection method (this invention): backdoor model detected successfully
b) CIFAR10:
I. Norton: backdoor model detection failed
II. Kaspersky: backdoor model detection failed
III. McAfee: backdoor model detection failed
IV. Model pre-detection method (this invention): backdoor model detected successfully
These results show that the model pre-detection method in the system detects the backdoor model on both the MNIST and CIFAR10 data sets, whereas the antivirus software cannot identify the potential malicious behavior in the backdoor model.
Testing the model backdoor real-time defense method of the defense system on the MNIST and CIFAR10 data sets gives the following results:
a) MNIST data set backdoor trigger rate:

                                Dataset1    Dataset2
    Without filter              100%        0%
    Gaussian blur filter        2.3%        0%
    Median blur filter          4.7%        0%
b) CIFAR10 data set backdoor trigger rate:
[The CIFAR10 trigger-rate table appears only as an image in the original publication; its figures are not recoverable from the text.]
These results show that the real-time defense method in the system significantly reduces the backdoor model's trigger rate on both the MNIST and CIFAR10 data sets.
Gaussian blur filter parameters: Gaussian convolution kernel: 5; variance: 1.1.
Median blur filter parameters: kernel size: 5.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims; all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (8)

1. A backdoor attack defense system for an artificial intelligence model, comprising a model pre-detection module and a real-time defense module, wherein: the model pre-detection module scans the image classification model to be detected for, firstly, operation behaviors that read or modify local files and, secondly, the data output by each layer, so as to judge whether the model contains a backdoor; the real-time defense module adds a filter to pictures containing the trigger so that the backdoor in the image classification model is invalidated;
the model back door is as follows: when the trigger is detected, embedded malicious codes are executed to modify the system files, and the caused consequences include but are not limited to malicious user remote login and website holding attack;
the trigger means that: the input picture has a particular feature, the input of which is provided to trigger the back door of the model.
2. The backdoor attack defense system for an artificial intelligence model according to claim 1, wherein the model pre-detection module comprises a keyword library unit, a scanning difference unit and a scanning result analysis unit, wherein: the keyword library unit adds keywords according to information about the operating system on which the model runs, obtaining a keyword library; the scanning difference unit performs model scanning according to the keyword library and the model's input image, obtaining a model scanning result; and the scanning result analysis unit analyses the model scanning result to determine whether the model is a backdoor model.
3. The backdoor attack defense system for an artificial intelligence model according to claim 1, wherein the real-time defense module comprises a filter unit and a model operation unit, wherein: the filter unit applies filter processing to the model's input image; and the model operation unit obtains a classification result from the filtered image.
4. A backdoor attack defense method for an artificial intelligence model based on the system of any one of claims 1-3, characterized by comprising the following steps:
constructing a backdoor model on the basis of a normal artificial intelligence model according to the backdoor attack principle of the artificial intelligence model, namely malicious tampering with system files, and selecting a number of images from a data set to which triggers for activating the backdoor are added;
step two, scanning the backdoor model from step one with the model pre-detection module and judging whether the model contains a backdoor from the differences in the scanning results: when the scanned model contains keywords from the keyword library, or when the model's input is unchanged before and after some model layer, judging that the model is at risk of having an implanted backdoor;
and step three, exploiting the fact that the model backdoor must be activated, modifying the trigger on the picture at the model operation stage with an image filter algorithm so that it can no longer trigger the model backdoor, while leaving the model's normal classification unaffected, thereby achieving real-time defense.
5. The backdoor attack defense method according to claim 4, wherein the deep learning model is a convolutional neural network.
6. The backdoor attack defense method according to claim 4, wherein the classification is: the classification result predicted by the deep learning model for a single input image, expressed as a vector p = [p_1, p_2, p_3, ...], each component of which is the prediction probability of the input image for the corresponding class.
7. The backdoor attack defense method according to claim 4, wherein the model scanning is: deconstructing and field-matching the trained deep learning model, specifically comprising:
① F(x) = ~f(x)_layer && f(x)_match;
② f(x)_layer = |layer(x)_2 - layer(x)_1| && |layer(x)_3 - layer(x)_2| && ...;
③ f(x)_match = match(keyword)_1 || match(keyword)_2 || ...;
wherein: x denotes an input picture, f(x)_layer is the model's layer-by-layer scan difference function, f(x)_match is the model's static scan difference function, layer(x)_i denotes the post-processed output of the i-th layer, |·| denotes the Euclidean distance between two classification results (probability vectors), and match(keyword)_i denotes binary matching of keyword_i against the model.
8. The backdoor attack defense method according to claim 4, wherein the image filter algorithm comprises: a Gaussian blur algorithm and a median blur algorithm.
CN202111424165.8A 2021-11-26 2021-11-26 Back door attack defense system for artificial intelligence model Pending CN114021136A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111424165.8A CN114021136A (en) 2021-11-26 2021-11-26 Back door attack defense system for artificial intelligence model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111424165.8A CN114021136A (en) 2021-11-26 2021-11-26 Back door attack defense system for artificial intelligence model

Publications (1)

Publication Number Publication Date
CN114021136A true CN114021136A (en) 2022-02-08

Family

ID=80066759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111424165.8A Pending CN114021136A (en) 2021-11-26 2021-11-26 Back door attack defense system for artificial intelligence model

Country Status (1)

Country Link
CN (1) CN114021136A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115659171A (en) * 2022-09-26 2023-01-31 中国工程物理研究院计算机应用研究所 Model backdoor detection method and device based on multivariate feature interaction and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination