CN113609482A - Back door detection and restoration method and system for image classification model - Google Patents

Back door detection and restoration method and system for image classification model

Info

Publication number
CN113609482A
CN113609482A (application CN202110796626.8A)
Authority
CN
China
Prior art keywords
model
trigger
back door
potential
backdoor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110796626.8A
Other languages
Chinese (zh)
Other versions
CN113609482B (en)
Inventor
陈恺
朱宏
赵月
梁瑞刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202110796626.8A
Publication of CN113609482A
Application granted
Publication of CN113609482B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • Neurology (AREA)
  • Computer Hardware Design (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Virology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a backdoor detection and restoration method and system for an image classification model, belonging to the technical fields of software technology and information security. Model pruning, transfer learning, and shallow-model training are used to obtain a series of comparison models that perform the same task as the backdoor model but contain no backdoor. With the help of the comparison models, each class of the backdoor model is reversed by optimizing an objective function to obtain a series of potential triggers. The potential triggers are refined using a contribution heatmap so that only the key features that influence the model's classification result are kept. Based on the difference in transferability between backdoor triggers and adversarial patches on the comparison models, the refined potential triggers are separated into backdoor triggers and adversarial patches. The identified backdoor triggers are added to a clean data set, and the backdoor in the backdoor model is removed through adversarial training. Using only a small amount of clean data, the method can detect and repair the backdoor of an image classification model and produce a normal model.

Description

Back door detection and restoration method and system for image classification model
Technical Field
The invention belongs to the technical fields of software technology and information security, relates to security technology for artificial intelligence, and particularly relates to a backdoor detection and restoration method and system for deep neural network image classification models.
Background
In recent years, deep neural networks (DNNs) have been widely used in fields such as computer vision, speech recognition, and natural language processing because of their accurate predictions. Deep neural networks are even used in security-critical areas such as access control systems, autonomous driving, and medical diagnosis, where their accuracy sometimes exceeds that of human experts.
However, despite their wide use, deep neural networks also face serious security threats, such as data poisoning attacks, adversarial attacks, and backdoor attacks. In particular, an attacker may inject a backdoor into a deep neural network during model training in order to control the behavior of the model. On normal input data, a DNN model containing a backdoor behaves essentially the same as a model without one, but when a special "trigger" (a special pattern overlaid on the original image) appears in the input, the abnormal behavior of the model is triggered and the result desired by the attacker is produced. Backdoor attacks therefore pose a potential safety hazard to deep neural networks. For example, a backdoor can be injected into a DNN model so that a stop sign with a special sticker (the trigger) attached is misrecognized as a speed-limit sign. If an autonomous vehicle uses such a backdoored model, a fatal traffic accident may occur.
Disclosure of Invention
The invention aims to provide a backdoor detection and restoration method for deep neural network image classification models. Using only a small amount of clean data, and without knowing the backdoor trigger or the target of the backdoor attack, the method can detect a backdoor that may exist in the model and repair it to produce a normal model.
In order to achieve the purpose, the invention adopts the following technical scheme:
a backdoor detection and restoration method for an image classification model comprises the following steps:
based on a clean data set, obtaining a series of comparison models that perform the same task as the backdoor model but contain no backdoor, using model pruning, transfer learning, and shallow-model training;
reversing each class of the backdoor model by optimizing an objective function, with the help of the comparison models and the clean data set, to obtain a series of potential triggers, where the potential triggers comprise backdoor triggers and adversarial patches;
calculating a contribution heatmap from the clean data set and the potential triggers, refining the potential triggers using the contribution heatmap, and keeping only the key features that influence the model's classification result;
distinguishing backdoor triggers from adversarial patches among the refined potential triggers, based on the difference in their transferability to the comparison models;
adding the identified backdoor triggers to a clean data set, and removing the backdoor in the backdoor model through adversarial training.
Further, the clean data set is drawn from the contaminated training set of the backdoor attack or from a data set whose distribution is similar to that of the contaminated training set, where a similar distribution means that the similarity of the data distributions is above a preset threshold; the amount of data in the clean data set is 10%-20% of the contaminated training set.
Further, the model pruning method is: removing the backdoor by cutting off neurons with a low activation rate in the backdoor model, and restoring the classification accuracy of the model by fine-tuning training;
the transfer learning method is: obtaining a comparison model through transfer learning, starting from a neural network model whose classification task is similar to that of the backdoor model;
the shallow-model training method is: simplifying the structure of the backdoor model and training on the simplified model structure to obtain a comparison model.
Further, the objective function is optimized by adjusting the weights of its loss functions, and its formula is as follows:
$L = \alpha L_{backdoor} + \beta L_{clean} + \gamma L_{noise}$
$L_{backdoor} = \frac{1}{n}\sum_{i=1}^{n} CE\left(f_b\left(\Delta \cdot m + x_i \cdot (J - m)\right),\ y_t\right)$
$L_{clean} = \frac{1}{n}\sum_{i=1}^{n} CE\left(f_c\left(\Delta \cdot m + x_i \cdot (J - m)\right),\ y_i\right)$
$L_{noise} = \sum_{j}\sum_{k}\sum_{a,b \in \{0,1\}} \left| m_{j+a,\,k+b} - m_{j,k} \right|$
wherein the loss functions L_backdoor and L_clean respectively represent the influence of the backdoor trigger on the classification results of the backdoor model and of the comparison model, and the loss function L_noise is a noise-reduction function applied to m; α, β and γ are the weight coefficients of the loss functions; Δ and m are the two variables optimized in the objective function, both three-dimensional matrices of the same size as the images in the clean data set, where Δ is the pattern that holds the potential trigger and m is a transparency matrix that controls the location of the potential trigger; x_i is an image randomly selected from the clean data set; J is an all-ones matrix with the same dimensions as Δ; Δ·m + x_i·(J − m) denotes overlaying the trigger on image x_i; f_b and f_c are the prediction functions of the backdoor model and of the comparison model, respectively; CE is the cross-entropy loss function; n is the total number of images in the clean data set; i is the index of the current image; on the backdoor model the image carrying the trigger should be classified into the target class y_t, while on the comparison model it should be classified into its correct class y_i; j and k denote the rows and columns of the matrix m, and a and b are the indices of the summation symbols.
Further, the step of calculating a contribution heatmap from the clean data set and the potential triggers includes:
randomly selecting a set of images from the clean data set and overlaying the potential trigger on them;
and, for each of these images, calculating a heatmap that represents the degree of contribution to the classification result, i.e., the contribution heatmap.
Further, the step of refining the potential triggers using the contribution heatmap includes:
averaging all the contribution heatmaps to obtain an average heatmap;
removing from the potential trigger the region with the currently lowest contribution, according to the average heatmap;
and calculating the current attack success rate of the potential trigger: if it is below a threshold, stopping; otherwise, continuing to remove the region with the currently lowest contribution from the potential trigger.
Further, the step of distinguishing backdoor triggers from adversarial patches among the refined potential triggers includes:
randomly selecting a set of images from the clean data set and overlaying the potential trigger on them;
calculating the attack success rate of the potential trigger on the backdoor model; if it is below a threshold, judging the potential trigger to be an adversarial patch and stopping;
and if the attack success rate is not below the threshold, calculating the attack success rate of the potential trigger on all the comparison models; if the attack success rate on any comparison model is above another threshold, judging the potential trigger to be an adversarial patch, otherwise judging it to be a backdoor trigger.
Further, a certain proportion of the images is first randomly selected from the clean data set and overlaid with the identified backdoor trigger; these images are then added to the clean data set.
Further, the step of removing the backdoor in the backdoor model through adversarial training includes: adding the images overlaid with the identified backdoor trigger to the clean data set while keeping their class labels unchanged, to obtain an adversarial training data set; and fine-tuning the backdoor model on the adversarial training data set to remove the backdoor from the backdoor model.
A backdoor detection and restoration system for an image classification model comprises a memory and a processor, wherein a computer program is stored on the memory, and the processor implements the steps of the above method when executing the program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Compared with the prior art, the invention has the following positive effects:
the invention has stronger detection capability on the back door, has wider detection range on the back doors of different types of triggers, is less influenced by factors such as area occupation ratio, position, shape, pattern and the like of the triggers, and has lower false alarm rate and missing report rate. Compared with the existing backdoor detection method (such as neural clean, ABS and TABOR), the method has the advantage that the assumption is put forward and limited on the area proportion of the trigger, so that an attacker can avoid detection by adopting the trigger with larger area proportion (more than 10%) at the cost of sacrificing the concealment of the trigger, and the method can still maintain the detection capability when the area proportion of the trigger reaches 25% and is more difficult to attack adaptively.
Drawings
Fig. 1 is an overall flowchart of a backdoor detection and restoration method for an image classification model according to the present invention.
Fig. 2 is a flowchart of potential trigger refinement.
Fig. 3 is a flowchart of backdoor trigger identification.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
The embodiment discloses a backdoor detection and restoration method for an image classification model; as shown in Fig. 1, the steps are as follows:
1. The invention comprises the following key points:
1.1. Generation of comparison models: a series of comparison models that perform the same task as the backdoor model but contain no backdoor is obtained, using model pruning, transfer learning, and shallow-model training simultaneously.
1.2. Reversal of potential triggers: an objective function is designed with the help of the comparison models and the clean data set, and each class of the backdoor model is reversed to obtain a series of potential triggers (consisting of backdoor triggers and adversarial patches).
1.3. Potential trigger refinement: the potential triggers are refined by means of a contribution heatmap, removing their redundant features to obtain refined potential triggers.
1.4. Backdoor trigger identification: the refined potential triggers are divided into two categories, backdoor triggers and adversarial patches, based on the difference in their transferability to the comparison models.
1.5. Repair of the backdoor model: the backdoor triggers are added to the clean data set, and the backdoor in the backdoor model is removed through adversarial training to obtain a normal model without a backdoor.
2. The comparison models are generated in the following three ways, which are used simultaneously:
2.1. Model pruning: the backdoor is removed by cutting off the neurons with a low activation rate in the model, while the model's classification accuracy is restored by fine-tuning training.
2.2. Transfer learning: a comparison model is trained by transfer learning, starting from a model whose classification task is similar to that of the backdoor model.
2.3. Shallow-model training: the structure of the backdoor model is simplified, and a comparison model is trained on the simplified structure. A minimal sketch of the pruning variant (2.1) is given after this list.
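The following is a minimal PyTorch-style sketch of item 2.1, building a comparison model by pruning low-activation channels and then fine-tuning; the choice of layer, the pruning ratio, and the optimizer settings are illustrative assumptions and are not specified by the invention.

```python
import torch
import torch.nn.functional as F

def prune_low_activation_channels(model, layer, clean_loader, prune_ratio=0.1, device="cpu"):
    """Zero out the conv channels of `layer` with the lowest mean activation on clean data."""
    acts = []
    hook = layer.register_forward_hook(lambda mod, inp, out: acts.append(out.detach()))
    model.eval()
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x.to(device))
    hook.remove()
    mean_act = torch.cat(acts).mean(dim=(0, 2, 3))   # per-channel mean activation (assumes a conv layer)
    k = int(prune_ratio * mean_act.numel())
    pruned = torch.argsort(mean_act)[:k]             # least-activated channels
    with torch.no_grad():
        layer.weight[pruned] = 0.0                    # cut these neurons
        if layer.bias is not None:
            layer.bias[pruned] = 0.0
    return pruned

def fine_tune(model, clean_loader, epochs=2, lr=1e-4, device="cpu"):
    """Recover the clean classification accuracy after pruning."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            F.cross_entropy(model(x.to(device)), y.to(device)).backward()
            opt.step()
    return model
```

The pruned and fine-tuned copy then serves as one comparison model; transfer learning (2.2) and shallow-model training (2.3) would supply further comparison models in the same role.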
3. The potential trigger reversal is completed by optimizing an objective function:
$L = \alpha L_{backdoor} + \beta L_{clean} + \gamma L_{noise}$
$L_{backdoor} = \frac{1}{n}\sum_{i=1}^{n} CE\left(f_b\left(\Delta \cdot m + x_i \cdot (J - m)\right),\ y_t\right)$
$L_{clean} = \frac{1}{n}\sum_{i=1}^{n} CE\left(f_c\left(\Delta \cdot m + x_i \cdot (J - m)\right),\ y_i\right)$
$L_{noise} = \sum_{j}\sum_{k}\sum_{a,b \in \{0,1\}} \left| m_{j+a,\,k+b} - m_{j,k} \right|$
Δ and m are the two variables optimized in the objective function, both three-dimensional matrices of the same size as the images in the clean data set. Δ is the pattern that holds the potential trigger; m is a transparency matrix that controls the location of the potential trigger. The objective function consists of three loss functions, weighted by α, β and γ.
x_i is an image randomly selected from the clean data set. J is an all-ones matrix with the same dimensions as Δ. Δ·m + x_i·(J − m) denotes overlaying the trigger on image x_i. f_b and f_c are the prediction functions of the backdoor model and of the comparison model, respectively. CE is the cross-entropy loss function. n is the total number of images in the clean data set and i is the index of the current image. L_backdoor and L_clean respectively represent the effect of the backdoor trigger on the classification results of the backdoor model and of the comparison model: on the backdoor model, the image carrying the trigger should be classified into the target class y_t, while on the comparison model it should be classified into its correct class y_i. Only one comparison model needs to be used here. L_noise is the noise-reduction function applied to m; j and k represent the rows and columns of the matrix m, and a and b are the indices of the summation symbols. L_noise achieves noise reduction by taking the absolute differences between adjacent pixels of m and summing them.
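The following is a minimal PyTorch-style sketch of this optimization for a single target class; f_b is the backdoor model and f_c is one comparison model. The weights alpha, beta and gamma, the optimizer settings, and the sigmoid parameterization of Δ and m (here m is a single-channel (H, W) mask broadcast over the color channels, whereas the description treats m as three-dimensional) are illustrative assumptions rather than values specified by the invention.

```python
import torch
import torch.nn.functional as F

def reverse_trigger(f_b, f_c, clean_loader, target_class, img_shape,
                    alpha=1.0, beta=1.0, gamma=1e-3, epochs=20, lr=0.1, device="cpu"):
    """Recover a potential trigger (pattern delta, transparency mask m) for one class."""
    delta_raw = torch.randn(img_shape, device=device, requires_grad=True)    # trigger pattern Δ
    m_raw = torch.randn(img_shape[1:], device=device, requires_grad=True)    # transparency matrix m
    opt = torch.optim.Adam([delta_raw, m_raw], lr=lr)
    f_b.eval(); f_c.eval()
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            delta, m = torch.sigmoid(delta_raw), torch.sigmoid(m_raw)        # keep both in [0, 1]
            stamped = delta * m + x * (1.0 - m)                              # Δ·m + x_i·(J − m)
            y_t = torch.full_like(y, target_class)
            l_backdoor = F.cross_entropy(f_b(stamped), y_t)                  # backdoor model -> target class
            l_clean = F.cross_entropy(f_c(stamped), y)                       # comparison model -> true class
            l_noise = ((m[1:, :] - m[:-1, :]).abs().sum()                    # smoothness of m
                       + (m[:, 1:] - m[:, :-1]).abs().sum())
            loss = alpha * l_backdoor + beta * l_clean + gamma * l_noise
            opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(delta_raw).detach(), torch.sigmoid(m_raw).detach()
```

Running this once for every output class of the backdoor model yields the series of potential triggers referred to above.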
4. The flow of potential trigger refinement is shown in FIG. 2 and includes the following steps:
4.1. A set of original images is randomly selected from the clean data set and overlaid with the potential trigger.
4.2. For all of these images, a heatmap representing the degree of contribution to the classification result is calculated (a two-dimensional matrix of the same size as the original image, where a larger value at a point in the matrix means that the pixel at the same position in the original image contributes more to the classification result), and all the heatmaps are averaged to obtain an average heatmap.
4.3. The region of the potential trigger with the lowest contribution is removed according to the average heatmap.
4.4. The current attack success rate of the potential trigger is calculated; if it is below a threshold (95% of the attack success rate of the unrefined original potential trigger), the refinement ends; otherwise, the process returns to step 4.3.
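The following is a minimal sketch of steps 4.1-4.4; a plain gradient saliency map stands in for the contribution heatmap (the description does not fix a particular heatmap technique here), and the number of pixels removed per iteration is an illustrative assumption.

```python
import torch

def attack_success_rate(model, x, delta, m, target_class):
    """Fraction of stamped images classified into the target class."""
    with torch.no_grad():
        stamped = delta * m + x * (1.0 - m)
        return (model(stamped).argmax(dim=1) == target_class).float().mean().item()

def contribution_heatmap(model, x, delta, m, target_class):
    """Average per-pixel contribution to the target-class score (gradient saliency)."""
    stamped = (delta * m + x * (1.0 - m)).requires_grad_(True)
    model(stamped)[:, target_class].sum().backward()
    return stamped.grad.abs().sum(dim=1).mean(dim=0)         # (H, W) heatmap averaged over the batch

def refine_trigger(f_b, x, delta, m, target_class, keep_ratio=0.95, block=16):
    """Iteratively drop the lowest-contribution region while the trigger stays effective."""
    baseline = attack_success_rate(f_b, x, delta, m, target_class)
    m = m.clone()
    while (m > 0).sum() > block:
        heat = contribution_heatmap(f_b, x, delta, m, target_class)
        heat = heat.masked_fill(m <= 0, float("inf"))         # ignore pixels already removed
        drop = torch.topk(heat.flatten(), block, largest=False).indices
        trial = m.clone()
        trial.view(-1)[drop] = 0.0                            # remove the lowest-contribution region
        if attack_success_rate(f_b, x, delta, trial, target_class) < keep_ratio * baseline:
            break                                             # stop at 95% of the original success rate
        m = trial
    return m
```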
5. The process of identifying backdoor triggers is shown in Fig. 3 and includes the following steps:
5.1. A set of images is randomly selected from the clean data set and overlaid with the potential trigger.
5.2. The attack success rate of the potential trigger on the backdoor model is calculated.
5.3. If the attack success rate is below a threshold (a preset hyperparameter, set to 60%), the potential trigger is judged to be an adversarial patch and the process ends; otherwise, the process continues to step 5.4.
5.4. The attack success rate of the potential trigger on all the comparison models is calculated.
5.5. If the attack success rate on any comparison model is above another threshold (a preset hyperparameter related to the number of classes: 40% on data sets with few classes such as MNIST and GTSRB, and 20% on data sets with many classes such as Youtube-Face and VGG-Face), the potential trigger is judged to be an adversarial patch; otherwise, it is judged to be a backdoor trigger.
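A minimal sketch of steps 5.1-5.5 follows, reusing the attack_success_rate helper from the refinement sketch; the 60% and 40%/20% thresholds follow the description above, while the function and argument names are illustrative.

```python
def identify_trigger(f_b, comparison_models, x, delta, m, target_class,
                     backdoor_thresh=0.60, transfer_thresh=0.40):
    """Separate a refined potential trigger into 'backdoor trigger' vs 'adversarial patch'."""
    # A genuine backdoor trigger must still fool the backdoor model after refinement ...
    if attack_success_rate(f_b, x, delta, m, target_class) < backdoor_thresh:
        return "adversarial patch"
    # ... but must NOT transfer to the independently obtained comparison models.
    for f_c in comparison_models:
        if attack_success_rate(f_c, x, delta, m, target_class) > transfer_thresh:
            return "adversarial patch"
    return "backdoor trigger"
```

Following the values stated above, transfer_thresh would be set to 0.40 for data sets with few classes (MNIST, GTSRB) and to 0.20 for data sets with many classes (Youtube-Face, VGG-Face).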
6. The backdoor model is repaired as follows:
6.1. A proportion of the images is randomly selected from the clean data set and overlaid with the backdoor trigger.
6.2. These images are added to the clean data set with their class labels kept unchanged, yielding an adversarial training data set.
6.3. The backdoor model is fine-tuned on the adversarial training data set, removing the backdoor from the backdoor model.
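A minimal sketch of steps 6.1-6.3 follows; the stamping ratio, batch size, learning rate, and epoch count are illustrative assumptions, the only requirements taken from the description being that a proportion of clean images is overlaid with the backdoor trigger and that their class labels stay unchanged.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def build_repair_dataset(clean_x, clean_y, delta, m, stamp_ratio=0.2):
    """Overlay the recovered trigger on a fraction of clean images, keeping their true labels."""
    n = clean_x.size(0)
    idx = torch.randperm(n)[: int(stamp_ratio * n)]
    stamped_x = clean_x.clone()
    stamped_x[idx] = delta * m + clean_x[idx] * (1.0 - m)    # trigger no longer maps to the target class
    return TensorDataset(stamped_x, clean_y)

def repair_backdoor_model(f_b, repair_dataset, epochs=5, lr=1e-4, device="cpu"):
    """Fine-tune the backdoor model on the adversarial training set to unlearn the backdoor."""
    loader = DataLoader(repair_dataset, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(f_b.parameters(), lr=lr)
    f_b.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(f_b(x.to(device)), y.to(device)).backward()
            opt.step()
    return f_b
```

Because the trigger-stamped images keep their correct labels, fine-tuning forces the model to ignore the trigger, removing the backdoor, while the clean portion of the data preserves normal classification accuracy.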
In this embodiment, from the perspective of a backdoor attacker, 60 backdoor models were first generated with two mainstream backdoor attack modes, namely poisoning the training set (BadNets) and modifying a pre-trained model (TrojanNN), on four data sets in three application fields: handwritten digit classification (MNIST), traffic sign classification (GTSRB), and face classification (Youtube-Face and VGG-Face); at the same time, 30 normal (backdoor-free) models were generated with a normal training method. The "trigger" of a backdoor model is a special pattern covering the original image, with an area ratio of 2%-25% and varying positions, shapes, and patterns. On these 90 models, the invention achieves a false-positive rate (number of normal models falsely detected as backdoored / total number of normal models) and a false-negative rate (number of backdoor models whose backdoor is not detected / total number of backdoor models) both below 10%.
The above embodiments are only intended to illustrate the technical solution of the present invention, but not to limit it, and a person skilled in the art can modify the technical solution of the present invention or substitute it with an equivalent, and the protection scope of the present invention is subject to the claims.

Claims (10)

1. A backdoor detection and restoration method for an image classification model is characterized by comprising the following steps:
based on a clean data set, obtaining a series of comparison models that perform the same task as the backdoor model but contain no backdoor, using model pruning, transfer learning, and shallow-model training;
reversing each class of the backdoor model by optimizing an objective function, with the help of the comparison models and the clean data set, to obtain a series of potential triggers, where the potential triggers comprise backdoor triggers and adversarial patches;
calculating a contribution heatmap from the clean data set and the potential triggers, refining the potential triggers using the contribution heatmap, and keeping only the key features that influence the model's classification result;
distinguishing backdoor triggers from adversarial patches among the refined potential triggers, based on the difference in their transferability to the comparison models;
adding the identified backdoor triggers to a clean data set, and removing the backdoor in the backdoor model through adversarial training.
2. The method of claim 1, wherein the model pruning method is: removing the backdoor by cutting off neurons with a low activation rate in the backdoor model, and restoring the classification accuracy of the model by fine-tuning training;
the transfer learning method is: obtaining a comparison model through transfer learning, starting from a neural network model whose classification task is similar to that of the backdoor model;
the shallow-model training method is: simplifying the structure of the backdoor model and training on the simplified model structure to obtain a comparison model.
3. The method of claim 1, wherein the objective function is optimized by adjusting the weights of its loss functions, as follows:
$L = \alpha L_{backdoor} + \beta L_{clean} + \gamma L_{noise}$
$L_{backdoor} = \frac{1}{n}\sum_{i=1}^{n} CE\left(f_b\left(\Delta \cdot m + x_i \cdot (J - m)\right),\ y_t\right)$
$L_{clean} = \frac{1}{n}\sum_{i=1}^{n} CE\left(f_c\left(\Delta \cdot m + x_i \cdot (J - m)\right),\ y_i\right)$
$L_{noise} = \sum_{j}\sum_{k}\sum_{a,b \in \{0,1\}} \left| m_{j+a,\,k+b} - m_{j,k} \right|$
wherein the loss functions L_backdoor and L_clean respectively represent the influence of the backdoor trigger on the classification results of the backdoor model and of the comparison model, and the loss function L_noise is a noise-reduction function applied to m; α, β and γ are the weight coefficients of the loss functions; Δ and m are the two variables optimized in the objective function, both three-dimensional matrices of the same size as the images in the clean data set, where Δ is the pattern that holds the potential trigger and m is a transparency matrix that controls the location of the potential trigger; x_i is an image randomly selected from the clean data set; J is an all-ones matrix with the same dimensions as Δ; Δ·m + x_i·(J − m) denotes overlaying the trigger on image x_i; f_b and f_c are the prediction functions of the backdoor model and of the comparison model, respectively; CE is the cross-entropy loss function; n is the total number of images in the clean data set; i is the index of the current image; on the backdoor model the image carrying the trigger should be classified into the target class y_t, while on the comparison model it should be classified into its correct class y_i; j and k denote the rows and columns of the matrix m, and a and b are the indices of the summation symbols.
4. The method of claim 1, wherein the step of calculating a contribution heatmap from the clean data set and the potential triggers comprises:
randomly selecting a set of images from the clean data set and overlaying the potential trigger on them;
and, for each of these images, calculating a heatmap that represents the degree of contribution to the classification result, i.e., the contribution heatmap.
5. The method of claim 1 or 4, wherein the step of refining the potential triggers using the contribution heatmap comprises:
averaging all the contribution heatmaps to obtain an average heatmap;
removing from the potential trigger the region with the currently lowest contribution, according to the average heatmap;
and calculating the current attack success rate of the potential trigger: if it is below a threshold, stopping; otherwise, continuing to remove the region with the currently lowest contribution from the potential trigger.
6. The method of claim 1, wherein the step of distinguishing backdoor triggers from adversarial patches among the refined potential triggers comprises:
randomly selecting a set of images from the clean data set and overlaying the potential trigger on them;
calculating the attack success rate of the potential trigger on the backdoor model; if it is below a threshold, judging the potential trigger to be an adversarial patch and stopping;
and if the attack success rate is not below the threshold, calculating the attack success rate of the potential trigger on all the comparison models; if the attack success rate on any comparison model is above another threshold, judging the potential trigger to be an adversarial patch, otherwise judging it to be a backdoor trigger.
7. The method of claim 1, wherein a certain proportion of the images is first randomly selected from the clean data set and overlaid with the identified backdoor trigger; these images are then added to the clean data set.
8. The method of claim 1, wherein the step of removing the backdoor in the backdoor model through adversarial training comprises: adding the images overlaid with the identified backdoor trigger to the clean data set while keeping their class labels unchanged, to obtain an adversarial training data set; and fine-tuning the backdoor model on the adversarial training data set to remove the backdoor from the backdoor model.
9. A backdoor detection and restoration system for an image classification model, comprising a memory on which a computer program is stored and a processor which, when executing the program, carries out the steps of the method of any one of claims 1 to 8.
10. A computer-readable storage medium in which a computer program is stored which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202110796626.8A 2021-07-14 2021-07-14 Back door detection and restoration method and system for image classification model Active CN113609482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110796626.8A CN113609482B (en) 2021-07-14 2021-07-14 Back door detection and restoration method and system for image classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110796626.8A CN113609482B (en) 2021-07-14 2021-07-14 Back door detection and restoration method and system for image classification model

Publications (2)

Publication Number Publication Date
CN113609482A (en) 2021-11-05
CN113609482B CN113609482B (en) 2023-10-17

Family

ID=78304643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110796626.8A Active CN113609482B (en) 2021-07-14 2021-07-14 Back door detection and restoration method and system for image classification model

Country Status (1)

Country Link
CN (1) CN113609482B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003511A (en) * 2021-12-24 2022-02-01 支付宝(杭州)信息技术有限公司 Evaluation method and device for model interpretation tool
CN114154589A (en) * 2021-12-13 2022-03-08 成都索贝数码科技股份有限公司 Similarity-based module branch reduction method
CN116091871A (en) * 2023-03-07 2023-05-09 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Physical countermeasure sample generation method and device for target detection model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318099A1 (en) * 2018-04-16 2019-10-17 International Business Machines Corporation Using Gradients to Detect Backdoors in Neural Networks
US20200410098A1 (en) * 2019-06-26 2020-12-31 Hrl Laboratories, Llc System and method for detecting backdoor attacks in convolutional neural networks
CN112989438A (en) * 2021-02-18 2021-06-18 上海海洋大学 Detection and identification method for backdoor attack of privacy protection neural network model
CN113111349A (en) * 2021-04-25 2021-07-13 浙江大学 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190318099A1 (en) * 2018-04-16 2019-10-17 International Business Machines Corporation Using Gradients to Detect Backdoors in Neural Networks
US20200410098A1 (en) * 2019-06-26 2020-12-31 Hrl Laboratories, Llc System and method for detecting backdoor attacks in convolutional neural networks
CN112989438A (en) * 2021-02-18 2021-06-18 上海海洋大学 Detection and identification method for backdoor attack of privacy protection neural network model
CN113111349A (en) * 2021-04-25 2021-07-13 浙江大学 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG X et al.: "NeuronInspect: Detecting backdoors in neural networks via output explanations", arXiv, pages 1-7 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154589A (en) * 2021-12-13 2022-03-08 成都索贝数码科技股份有限公司 Similarity-based module branch reduction method
CN114154589B (en) * 2021-12-13 2023-09-29 成都索贝数码科技股份有限公司 Module branch reduction method based on similarity
CN114003511A (en) * 2021-12-24 2022-02-01 支付宝(杭州)信息技术有限公司 Evaluation method and device for model interpretation tool
CN114003511B (en) * 2021-12-24 2022-04-15 支付宝(杭州)信息技术有限公司 Evaluation method and device for model interpretation tool
CN116091871A (en) * 2023-03-07 2023-05-09 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Physical countermeasure sample generation method and device for target detection model
CN116091871B (en) * 2023-03-07 2023-08-25 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Physical countermeasure sample generation method and device for target detection model

Also Published As

Publication number Publication date
CN113609482B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
CN113609482A (en) Back door detection and restoration method and system for image classification model
CN112597993B (en) Patch detection-based countermeasure model training method
Liu et al. Visualization of driving behavior using deep sparse autoencoder
CN113111349B (en) Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
CN109086797A (en) A kind of accident detection method and system based on attention mechanism
CN112434599B (en) Pedestrian re-identification method based on random occlusion recovery of noise channel
CN110991568A (en) Target identification method, device, equipment and storage medium
CN112016499A (en) Traffic scene risk assessment method and system based on multi-branch convolutional neural network
WO2024051183A1 (en) Backdoor detection method based on decision shortcut search
CN113609784A (en) Traffic limit scene generation method, system, equipment and storage medium
CN111814644B (en) Video abnormal event detection method based on disturbance visual interpretation
CN114332829A (en) Driver fatigue detection method based on multiple strategies
CN113537284A (en) Deep learning implementation method and system based on mimicry mechanism
Parasnis et al. RoadScan: A Novel and Robust Transfer Learning Framework for Autonomous Pothole Detection in Roads
CN116071797B (en) Sparse face comparison countermeasure sample generation method based on self-encoder
CN115098855A (en) Trigger sample detection method based on custom back door behavior
CN113283520B (en) Feature enhancement-based depth model privacy protection method and device for membership inference attack
CN113807541B (en) Fairness repair method, system, equipment and storage medium for decision system
CN110796237B (en) Method and device for detecting attack resistance of deep neural network
CN108647592A (en) Group abnormality event detecting method and system based on full convolutional neural networks
CN118587561B (en) Action recognition migration attack method based on self-adaptive gradient time sequence characteristic pruning
Chen et al. A Defense Method against Backdoor Attacks in Neural Networks Using an Image Repair Technique
CN114639007B (en) Fire detection model training method and detection method based on improved DETR
Chen et al. Investigating the Backdoor on DNNs Based on Recolorization and Reconstruction: From A Multi-Channel Perspective
Chen et al. Functional safety of deep learning techniques in autonomous driving systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant