CN109086884B - Neural network attack defense method based on gradient reverse countermeasure sample restoration

Neural network attack defense method based on gradient reverse countermeasure sample restoration

Info

Publication number
CN109086884B
CN109086884B, CN201810781467.2A, CN201810781467A
Authority
CN
China
Prior art keywords
sample
algorithm
classification
samples
countermeasure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810781467.2A
Other languages
Chinese (zh)
Other versions
CN109086884A (en)
Inventor
易平
胡嘉尚
张浩
倪洁
何芷珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810781467.2A priority Critical patent/CN109086884B/en
Publication of CN109086884A publication Critical patent/CN109086884A/en
Application granted granted Critical
Publication of CN109086884B publication Critical patent/CN109086884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A neural network attack defense method based on gradient reverse countermeasure sample restoration detects adversarial samples in a sample set and then restores them into ordinary samples by means of an attack, thereby optimizing the sample set.

Description

Neural network attack defense method based on gradient reverse countermeasure sample restoration
Technical Field
The invention relates to a technology in the field of artificial-intelligence adversarial engineering, and in particular to a method for restoring an adversarial sample into a non-adversarial sample by means of a gradient-reversed adversarial perturbation.
Background
Artificial intelligence is being widely applied in many areas of life, and as the technology develops, its security problems are increasingly exposed. These problems are especially serious for AI classifiers: an attacker can cause a classifier to misclassify by adding a carefully constructed perturbation to a sample. Many studies have therefore tried to resist adversarial attacks by training sufficiently robust models, but satisfactory results have proved hard to achieve. More recent studies attempt to detect adversarial samples by their characteristic properties, but merely detecting an adversarial sample still cannot improve classification accuracy.
Disclosure of Invention
The invention provides a neural network attack defense method based on gradient reverse countermeasure sample restoration. Addressing the problem of how to handle adversarial samples, it adds a perturbation to an adversarial sample so that the sample crosses the decision boundary and is restored to a normal sample that can be treated as such; it also improves the reusability of the system.
The invention is realized by the following technical scheme:
The invention relates to a neural network attack defense method based on gradient reverse countermeasure sample restoration, which detects adversarial samples in a sample set and then restores them into ordinary samples by means of an attack, thereby optimizing the sample set.
The adversarial samples in the sample set are generated by, but not limited to, the FGSM algorithm, the C&W algorithm, the DeepFool algorithm and the JSMA algorithm.
The attack modes comprise: the fast gradient sign method (FGSM), the optimization-based adversarial distance attack of Carlini and Wagner (C&W), the DeepFool method, and the Jacobian-based saliency map attack (JSMA); the adversarial sample is preferably restored using the DeepFool method.
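For illustration only, a minimal sketch of FGSM-style adversarial sample generation follows (PyTorch; the model, the loss function, the pixel range [0, 1] and the step size eps are assumptions, as the patent only names the algorithm):

import torch

def fgsm_attack(model, loss_fn, x, y, eps=0.03):
    # Fast gradient sign method: one step along the sign of the input gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes inputs live in [0, 1]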
The restoration comprises the following specific steps:
Step 1: calculate the minimum perturbation distance, i.e. the shortest distance from the current input point to the separating plane; derive the perturbation generation method for the binary classification case, then extend it from the binary model to the multi-class case (the linear-case equations after this list illustrate step 1).
Step 2: through iterative calculation, add minimum-norm adversarial perturbations that gradually push the image lying within the classification boundary back across that boundary until its classification changes, i.e. the image is restored to a normal sample.
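As an illustration of step 1 (the notation here is assumed, not taken from the patent), for a linear binary classifier f(x) = w^T x + b the shortest distance to the separating plane and the corresponding minimal perturbation are:

\[
d_{\min}(x) = \frac{|f(x)|}{\lVert w \rVert_2},
\qquad
r^{*} = -\frac{f(x)}{\lVert w \rVert_2^{2}}\, w .
\]

For a nonlinear classifier the same formulas are applied to the local linearization of f at x, and the multi-class case uses the nearest of the pairwise decision boundaries.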
The iterative computation is specifically: initialize x_0 = x_adv, i = 0, where x_adv is the adversarial sample; while argmax(f(x_i)) = argmax(f(x_0)), compute in a loop:

\[
r_i = -\frac{f(x_i)}{\lVert \nabla f(x_i) \rVert_2^{2}}\,\nabla f(x_i),
\qquad
x_{i+1} = x_i + r_i,
\qquad
i \leftarrow i + 1,
\]

until f(x_i) and f(x_0) have opposite signs; the resulting x_i is the restored sample.
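A minimal sketch of this restoration loop in PyTorch, assuming `model` returns logits for a single-image batch; the overshoot factor, the iteration cap and the multi-class handling follow the DeepFool paper rather than being spelled out here:

import torch

def restore_sample(model, x_adv, overshoot=0.02, max_iter=50):
    # Start from the adversarial sample and add minimum-norm perturbations
    # until the predicted label flips, i.e. the decision boundary is crossed.
    x = x_adv.clone().detach()
    y0 = model(x).argmax(dim=1).item()     # adversarial label to escape from
    for _ in range(max_iter):
        x.requires_grad_(True)
        logits = model(x)[0]
        if logits.argmax().item() != y0:   # boundary crossed: restored
            break
        best_r, best_dist = None, float("inf")
        for k in range(logits.numel()):    # find the nearest pairwise boundary
            if k == y0:
                continue
            g = logits[k] - logits[y0]     # signed margin toward class k
            grad = torch.autograd.grad(g, x, retain_graph=True)[0]
            dist = (g.abs() / (grad.norm() + 1e-12)).item()
            if dist < best_dist:
                best_dist = dist
                best_r = (g.abs() / (grad.norm() ** 2 + 1e-12)) * grad
        x = (x + (1 + overshoot) * best_r).detach()
    return x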
Technical effects
Compared with the prior art, the method adds a perturbation to an adversarial sample in a reversed-attack manner to restore it into a normal sample, solving the problem of how to handle adversarial samples and making up for the limitation of a detector that can only flag them. In the experiments, the classifier correctly classified 90.2% of adversarial samples after the perturbation was added, so the method can effectively improve the robustness of a neural network classifier.
Drawings
FIG. 1 is a schematic diagram of the embodiment;
FIG. 2 is a schematic diagram of the DeepFool method used in the embodiment.
Detailed Description
As shown in FIG. 1, this embodiment uses the BelgiumTS road sign recognition data set, for example BelgiumTSC_Testing (76.5 MB) from the "BelgiumTS for classification (cropped images)" part of http://btsd.ethz.ch/shareddata/.
The detector in this embodiment detects adversarial samples using an LID-based detection method. LID (local intrinsic dimensionality) characterizes the dimensional properties of the space surrounding a sample. Experiments show that the LID values of adversarial samples are significantly higher than those of normal samples, i.e. the adversarial subspace has a higher intrinsic dimensionality than the normal sample space, and the LID value increases during the transition from a normal sample to an adversarial one. The LID-based adversarial sample detection method performs well, with a detection accuracy of about 95.2% on BelgiumTS.
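A sketch of the maximum-likelihood LID estimator such detectors typically use (NumPy; the choice of activations for `batch`, the neighbour count k = 20 and the small epsilon are assumptions, as the patent only names the method):

import numpy as np

def lid_mle(x, batch, k=20):
    # Maximum-likelihood estimate of local intrinsic dimensionality:
    # LID(x) = -( (1/k) * sum_i log(r_i / r_k) )^(-1),
    # where r_1 <= ... <= r_k are distances to the k nearest neighbours.
    dists = np.sort(np.linalg.norm(batch - x, axis=1))[:k]  # assumes x not in batch
    r_k = dists[-1]
    return -1.0 / np.mean(np.log(dists / r_k + 1e-12))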
The embodiment specifically comprises the following steps:
First, 70% of the BelgiumTS data set is put directly into the sample set (since normal samples are the majority in practice); the remaining 30% is divided into four parts of 7.5% each, and each part generates adversarial samples through the FGSM algorithm, the C&W algorithm, the DeepFool algorithm and the JSMA algorithm respectively, which are then added to the sample set.
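A sketch of this sample-set construction, assuming four attack callables with a common interface (the 70/30 split and the four equal shares come from the text; everything else is an assumption):

import random

def build_sample_set(dataset, attacks):
    # 70% of the data stays clean; the remaining 30% is split evenly
    # among the attack algorithms (FGSM, C&W, DeepFool, JSMA).
    data = list(dataset)
    random.shuffle(data)
    n_clean = int(0.7 * len(data))
    mixed = data[:n_clean]
    rest = data[n_clean:]
    share = len(rest) // len(attacks)
    for j, attack in enumerate(attacks):
        mixed += [attack(x) for x in rest[j * share:(j + 1) * share]]
    return mixed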
Secondly, the samples in the sample set are fed into the detector. A sample confirmed not to be adversarial enters the classifier directly for normal classification; when a sample is detected as adversarial, minimum-norm adversarial perturbations are added through the iterative calculation so that the sample is restored to a normal sample before entering the classifier.
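Putting the two steps together, a sketch of the defense pipeline (the `detector` predicate and the `restore_sample` function from the sketch above are assumed interfaces, not defined by the patent):

def defended_classify(model, detector, x):
    # Detect; restore only samples flagged as adversarial; then classify.
    if detector(x):
        x = restore_sample(model, x)
    return model(x).argmax(dim=1)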
The classifier adopts, but is not limited to, a neural network model with a five-layer structure.
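For concreteness, one possible five-layer classifier (PyTorch; the layer sizes, the 32x32 RGB input and the 62 BelgiumTS sign classes are assumptions, as the patent only states "five-layer"):

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),   # layer 1
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # layer 2
    nn.MaxPool2d(2), nn.Flatten(),
    nn.Linear(64 * 16 * 16, 256), nn.ReLU(),     # layer 3
    nn.Linear(256, 128), nn.ReLU(),              # layer 4
    nn.Linear(128, 62),                          # layer 5: class logits
)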
The iterative computation is specifically: initialize x_0 = x_adv, i = 0, where x_adv is the adversarial sample; while argmax(f(x_i)) = argmax(f(x_0)), compute in a loop:

\[
r_i = -\frac{f(x_i)}{\lVert \nabla f(x_i) \rVert_2^{2}}\,\nabla f(x_i),
\qquad
x_{i+1} = x_i + r_i,
\qquad
i \leftarrow i + 1,
\]

until f(x_i) and f(x_0) have opposite signs; the resulting x_i is the restored sample.
In this environment, FGSM and BIM are used to repair the samples, and the repair success rate of the method is tested on attack samples generated by the different attack methods (the repair success rate being the proportion of attack samples that become normal samples after the method is applied, out of all attack samples). In the table, each row indicates which repair method is used and each column indicates which attack sample set the test is performed on. Experimental data for this example are given on both the MNIST and CIFAR data sets.
[Table of repair success rates: rows are the repair methods (FGSM, BIM); columns are the attack sample sets; rendered only as an image in the original patent.]
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims; all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (1)

1. A neural network attack defense method based on gradient reverse countermeasure sample restoration, characterized in that adversarial samples are detected in a sample set and then restored into ordinary samples by means of an attack, thereby optimizing the sample set;
the adversarial samples in the sample set are generated using the FGSM algorithm, the C&W algorithm, the DeepFool algorithm and the JSMA algorithm;
the attack modes comprise: the fast gradient sign method, the optimization-based adversarial distance attack of Carlini and Wagner, the DeepFool method, and the Jacobian-based saliency map attack;
the restoration comprises the following specific steps:
step 1, calculating the minimum perturbation distance, i.e. the shortest distance from the current input point to the separating plane, deriving the perturbation generation method for the binary classification case, and extending from the binary model to the multi-class case;
step 2, through iterative calculation, adding minimum-norm adversarial perturbations that gradually push the image lying within the classification boundary back across that boundary until its classification changes, whereupon the image is restored to a normal sample;
the iterative computation specifically includes: initialization x0=xadvI is 0, wherein xadvIs a challenge sample; when argmax (f (x)i))=argmax(f(x0) Time-loop calculation:
Figure FDA0002534109790000011
xi+1=xi+rii ═ i +1, up to f (x)i) And f (x)0) Up to an odd sign, x obtainediI.e. the recovery sample.
CN201810781467.2A 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration Active CN109086884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810781467.2A CN109086884B (en) 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810781467.2A CN109086884B (en) 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration

Publications (2)

Publication Number Publication Date
CN109086884A CN109086884A (en) 2018-12-25
CN109086884B true CN109086884B (en) 2020-09-01

Family

ID=64838063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810781467.2A Active CN109086884B (en) 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration

Country Status (1)

Country Link
CN (1) CN109086884B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784411B (en) * 2019-01-23 2021-01-05 四川虹微技术有限公司 Defense method, device and system for confrontation sample and storage medium
CN111488898B (en) * 2019-01-28 2023-09-19 北京达佳互联信息技术有限公司 Countermeasure data acquisition method, device, equipment and storage medium
CN110768959B (en) * 2019-09-20 2021-12-21 浙江工业大学 Defense method based on signal boundary exploration attack
CN111209370A (en) * 2019-12-27 2020-05-29 同济大学 Text classification method based on neural network interpretability
CN114724014B (en) * 2022-06-06 2023-06-30 杭州海康威视数字技术股份有限公司 Deep learning-based method and device for detecting attack of countered sample and electronic equipment
CN114861893B (en) * 2022-07-07 2022-09-23 西南石油大学 Multi-channel aggregated countermeasure sample generation method, system and terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022273A (en) * 2016-05-24 2016-10-12 华东理工大学 Handwritten form identification system of BP neural network based on dynamic sample selection strategy
CN107463951A (en) * 2017-07-19 2017-12-12 清华大学 A kind of method and device for improving deep learning model robustness
CN108198179A (en) * 2018-01-03 2018-06-22 华南理工大学 A kind of CT medical image pulmonary nodule detection methods for generating confrontation network improvement
US10007866B2 (en) * 2016-04-28 2018-06-26 Microsoft Technology Licensing, Llc Neural network image classifier

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007866B2 (en) * 2016-04-28 2018-06-26 Microsoft Technology Licensing, Llc Neural network image classifier
CN106022273A (en) * 2016-05-24 2016-10-12 华东理工大学 Handwritten form identification system of BP neural network based on dynamic sample selection strategy
CN107463951A (en) * 2017-07-19 2017-12-12 清华大学 A kind of method and device for improving deep learning model robustness
CN108198179A (en) * 2018-01-03 2018-06-22 华南理工大学 A kind of CT medical image pulmonary nodule detection methods for generating confrontation network improvement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tsui-Wei Weng et al., "Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach", arXiv, 2018-01-31, full text. *

Also Published As

Publication number Publication date
CN109086884A (en) 2018-12-25

Similar Documents

Publication Publication Date Title
CN109086884B (en) Neural network attack defense method based on gradient reverse countermeasure sample restoration
Liu et al. Detection based defense against adversarial examples from the steganalysis point of view
Hosseini et al. Google's cloud vision api is not robust to noise
CN109543760B (en) Confrontation sample detection method based on image filter algorithm
CN109961444B (en) Image processing method and device and electronic equipment
CN110348475B (en) Confrontation sample enhancement method and model based on spatial transformation
CN113554089A (en) Image classification countermeasure sample defense method and system and data processing terminal
CN112396129B (en) Challenge sample detection method and universal challenge attack defense system
CN111753290B (en) Software type detection method and related equipment
CN108416343B (en) Face image recognition method and device
Lv et al. Chinese character CAPTCHA recognition based on convolution neural network
CN117134958A (en) Information processing method and system for network technology service
CN117152486A (en) Image countermeasure sample detection method based on interpretability
Kaur et al. Performance Evaluation of various thresholding methods using canny edge detector
US11349856B2 (en) Exploit kit detection
Kang et al. Identification of multiple image steganographic methods using hierarchical ResNets
CN111209567B (en) Method and device for judging perceptibility of improving robustness of detection model
CN115631333B (en) Countermeasure training method for improving robustness of target detection model and target detection method
CN113139187B (en) Method and device for generating and detecting pre-training language model
WO2021098801A1 (en) Data cleaning device, data cleaning method and face verification method
WO2024115580A1 (en) A method of assessing inputs fed to an ai model and a framework thereof
EP4328813A1 (en) Detection device, detection method, and detection program
Hanyu et al. Incremental Training of SVM-Based Human Detector
Chua et al. Using Adversarial Defences Against Image Classification CAPTCHA
Rangslang et al. Feature Space Perturbation for Transferable Adversarial Examples in Image Forensics Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant