CN109086884B - Neural network attack defense method based on gradient reverse countermeasure sample restoration

Neural network attack defense method based on gradient reverse countermeasure sample restoration

Info

Publication number
CN109086884B
CN109086884B, CN201810781467.2A, CN201810781467A
Authority
CN
China
Prior art keywords
sample
algorithm
classification
samples
countermeasure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810781467.2A
Other languages
Chinese (zh)
Other versions
CN109086884A (en)
Inventor
易平
胡嘉尚
张浩
倪洁
何芷珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201810781467.2A priority Critical patent/CN109086884B/en
Publication of CN109086884A publication Critical patent/CN109086884A/en
Application granted granted Critical
Publication of CN109086884B publication Critical patent/CN109086884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A neural network attack defense method based on gradient reverse countermeasure sample restoration detects adversarial samples in a sample set and then restores them into ordinary samples by means of an attack, thereby optimizing the sample set.

Description

Neural network attack defense method based on gradient reverse countermeasure sample restoration
Technical Field
The invention relates to a technology in the field of artificial-intelligence adversarial engineering, and in particular to a method for restoring an adversarial sample into a non-adversarial sample by means of a gradient-reversed adversarial perturbation.
Background
Artificial intelligence is being widely applied in many areas of life, and as the technology develops, its security problems are increasingly exposed. These problems are especially serious for AI classifiers: an attacker can cause a classifier to misclassify by adding a carefully constructed perturbation to a sample. Many studies have therefore tried to resist adversarial attacks by training sufficiently robust models, but satisfactory results have proved hard to achieve. More recent studies attempt to detect adversarial samples by their characteristic properties, but merely detecting an adversarial sample still cannot improve classification accuracy.
Disclosure of Invention
The invention provides a neural network attack defense method based on gradient reverse countermeasure sample restoration. Addressing the problem of how to handle adversarial samples, it adds a perturbation to an adversarial sample so that the sample crosses the decision boundary and is restored to a normal sample that can be treated as such; it also improves the reusability of the system.
The invention is realized by the following technical scheme:
The invention relates to a neural network attack defense method based on gradient reverse countermeasure sample restoration, which detects adversarial samples in a sample set and then restores them into ordinary samples by means of an attack, thereby optimizing the sample set.
The adversarial samples in the sample set are generated by, but not limited to, the FGSM algorithm, the C&W algorithm, the DeepFool algorithm and the JSMA algorithm.
The attack modes comprise: the fast gradient sign method (FGSM), the optimization-based adversarial distance attack of Carlini and Wagner (C&W), the DeepFool method, and the Jacobian-based saliency map attack (JSMA); the adversarial sample is preferably restored using the DeepFool method.
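For illustration only, a minimal sketch of FGSM-style adversarial sample generation follows (PyTorch; the model, the loss function, the pixel range [0, 1] and the step size eps are assumptions, as the patent only names the algorithm):

import torch

def fgsm_attack(model, loss_fn, x, y, eps=0.03):
    # Fast gradient sign method: one step along the sign of the input gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes inputs live in [0, 1]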
The restoration comprises the following specific steps:
Step 1: calculate the minimum perturbation distance, i.e. the shortest distance from the current input point to the separating plane; derive the perturbation generation method for the binary classification case, then extend it from the binary model to the multi-class case (the linear-case equations after this list illustrate step 1).
Step 2: through iterative calculation, add minimum-norm adversarial perturbations that gradually push the image lying within the classification boundary back across that boundary until its classification changes, i.e. the image is restored to a normal sample.
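As an illustration of step 1 (the notation here is assumed, not taken from the patent), for a linear binary classifier f(x) = w^T x + b the shortest distance to the separating plane and the corresponding minimal perturbation are:

\[
d_{\min}(x) = \frac{|f(x)|}{\lVert w \rVert_2},
\qquad
r^{*} = -\frac{f(x)}{\lVert w \rVert_2^{2}}\, w .
\]

For a nonlinear classifier the same formulas are applied to the local linearization of f at x, and the multi-class case uses the nearest of the pairwise decision boundaries.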
The iterative computation is specifically: initialize x_0 = x_adv, i = 0, where x_adv is the adversarial sample; while argmax(f(x_i)) = argmax(f(x_0)), compute in a loop:

\[
r_i = -\frac{f(x_i)}{\lVert \nabla f(x_i) \rVert_2^{2}}\,\nabla f(x_i),
\qquad
x_{i+1} = x_i + r_i,
\qquad
i \leftarrow i + 1,
\]

until f(x_i) and f(x_0) have opposite signs; the resulting x_i is the restored sample.
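A minimal sketch of this restoration loop in PyTorch, assuming `model` returns logits for a single-image batch; the overshoot factor, the iteration cap and the multi-class handling follow the DeepFool paper rather than being spelled out here:

import torch

def restore_sample(model, x_adv, overshoot=0.02, max_iter=50):
    # Start from the adversarial sample and add minimum-norm perturbations
    # until the predicted label flips, i.e. the decision boundary is crossed.
    x = x_adv.clone().detach()
    y0 = model(x).argmax(dim=1).item()     # adversarial label to escape from
    for _ in range(max_iter):
        x.requires_grad_(True)
        logits = model(x)[0]
        if logits.argmax().item() != y0:   # boundary crossed: restored
            break
        best_r, best_dist = None, float("inf")
        for k in range(logits.numel()):    # find the nearest pairwise boundary
            if k == y0:
                continue
            g = logits[k] - logits[y0]     # signed margin toward class k
            grad = torch.autograd.grad(g, x, retain_graph=True)[0]
            dist = (g.abs() / (grad.norm() + 1e-12)).item()
            if dist < best_dist:
                best_dist = dist
                best_r = (g.abs() / (grad.norm() ** 2 + 1e-12)) * grad
        x = (x + (1 + overshoot) * best_r).detach()
    return x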
Technical effects
Compared with the prior art, the method adds a perturbation to an adversarial sample in a reversed-attack manner to restore it into a normal sample, solving the problem of how to handle adversarial samples and making up for the limitation of a detector that can only flag them. In the experiments, the classifier correctly classified 90.2% of adversarial samples after the perturbation was added, so the method can effectively improve the robustness of a neural network classifier.
Drawings
FIG. 1 is a schematic diagram of the embodiment;
FIG. 2 is a schematic diagram of the DeepFool method used in the embodiment.
Detailed Description
As shown in FIG. 1, this embodiment uses the BelgiumTS road sign recognition data set, for example BelgiumTSC_Testing (76.5 MB) from the "BelgiumTS for classification (cropped images)" part of http://btsd.ethz.ch/shareddata/.
The detector in this embodiment detects adversarial samples using an LID-based detection method. LID (local intrinsic dimensionality) characterizes the dimensional properties of the space surrounding a sample. Experiments show that the LID values of adversarial samples are significantly higher than those of normal samples, i.e. the adversarial subspace has a higher intrinsic dimensionality than the normal sample space, and the LID value increases during the transition from a normal sample to an adversarial one. The LID-based adversarial sample detection method performs well, with a detection accuracy of about 95.2% on BelgiumTS.
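A sketch of the maximum-likelihood LID estimator such detectors typically use (NumPy; the choice of activations for `batch`, the neighbour count k = 20 and the small epsilon are assumptions, as the patent only names the method):

import numpy as np

def lid_mle(x, batch, k=20):
    # Maximum-likelihood estimate of local intrinsic dimensionality:
    # LID(x) = -( (1/k) * sum_i log(r_i / r_k) )^(-1),
    # where r_1 <= ... <= r_k are distances to the k nearest neighbours.
    dists = np.sort(np.linalg.norm(batch - x, axis=1))[:k]  # assumes x not in batch
    r_k = dists[-1]
    return -1.0 / np.mean(np.log(dists / r_k + 1e-12))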
The embodiment specifically comprises the following steps:
First, 70% of the BelgiumTS data set is put directly into the sample set (since normal samples are the majority in practice); the remaining 30% is divided into four parts of 7.5% each, and each part generates adversarial samples through the FGSM algorithm, the C&W algorithm, the DeepFool algorithm and the JSMA algorithm respectively, which are then added to the sample set.
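A sketch of this sample-set construction, assuming four attack callables with a common interface (the 70/30 split and the four equal shares come from the text; everything else is an assumption):

import random

def build_sample_set(dataset, attacks):
    # 70% of the data stays clean; the remaining 30% is split evenly
    # among the attack algorithms (FGSM, C&W, DeepFool, JSMA).
    data = list(dataset)
    random.shuffle(data)
    n_clean = int(0.7 * len(data))
    mixed = data[:n_clean]
    rest = data[n_clean:]
    share = len(rest) // len(attacks)
    for j, attack in enumerate(attacks):
        mixed += [attack(x) for x in rest[j * share:(j + 1) * share]]
    return mixed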
Secondly, the samples in the sample set are fed into the detector. A sample confirmed not to be adversarial enters the classifier directly for normal classification; when a sample is detected as adversarial, minimum-norm adversarial perturbations are added through the iterative calculation so that the sample is restored to a normal sample before entering the classifier.
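Putting the two steps together, a sketch of the defense pipeline (the `detector` predicate and the `restore_sample` function from the sketch above are assumed interfaces, not defined by the patent):

def defended_classify(model, detector, x):
    # Detect; restore only samples flagged as adversarial; then classify.
    if detector(x):
        x = restore_sample(model, x)
    return model(x).argmax(dim=1)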
The classifier adopts, but is not limited to, a neural network model with a five-layer structure.
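For concreteness, one possible five-layer classifier (PyTorch; the layer sizes, the 32x32 RGB input and the 62 BelgiumTS sign classes are assumptions, as the patent only states "five-layer"):

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),   # layer 1
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # layer 2
    nn.MaxPool2d(2), nn.Flatten(),
    nn.Linear(64 * 16 * 16, 256), nn.ReLU(),     # layer 3
    nn.Linear(256, 128), nn.ReLU(),              # layer 4
    nn.Linear(128, 62),                          # layer 5: class logits
)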
The iterative computation is specifically: initialize x_0 = x_adv, i = 0, where x_adv is the adversarial sample; while argmax(f(x_i)) = argmax(f(x_0)), compute in a loop:

\[
r_i = -\frac{f(x_i)}{\lVert \nabla f(x_i) \rVert_2^{2}}\,\nabla f(x_i),
\qquad
x_{i+1} = x_i + r_i,
\qquad
i \leftarrow i + 1,
\]

until f(x_i) and f(x_0) have opposite signs; the resulting x_i is the restored sample.
In this environment, FGSM and BIM are used to repair the samples, and the repair success rate of the method is tested on attack samples generated by the different attack methods (the repair success rate being the proportion of attack samples that become normal samples after the method is applied, out of all attack samples). In the table, each row indicates which repair method is used and each column indicates which attack sample set the test is performed on. Experimental data for this example are given on both the MNIST and CIFAR data sets.
[Table of repair success rates: rows are the repair methods (FGSM, BIM); columns are the attack sample sets; rendered only as an image in the original patent.]
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims; all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (1)

1. A neural network attack defense method based on gradient reverse countermeasure sample restoration, characterized in that adversarial samples are detected in a sample set and then restored into ordinary samples by means of an attack, thereby optimizing the sample set;
the adversarial samples in the sample set are generated using the FGSM algorithm, the C&W algorithm, the DeepFool algorithm and the JSMA algorithm;
the attack modes comprise: the fast gradient sign method, the optimization-based adversarial distance attack of Carlini and Wagner, the DeepFool method, and the Jacobian-based saliency map attack;
the restoration comprises the following specific steps:
step 1, calculating the minimum perturbation distance, i.e. the shortest distance from the current input point to the separating plane, deriving the perturbation generation method for the binary classification case, and extending from the binary model to the multi-class case;
step 2, through iterative calculation, adding minimum-norm adversarial perturbations that gradually push the image lying within the classification boundary back across that boundary until its classification changes, whereupon the image is restored to a normal sample;
the iterative computation specifically includes: initialization x0=xadvI is 0, wherein xadvIs a challenge sample; when argmax (f (x)i))=argmax(f(x0) Time-loop calculation:
Figure FDA0002534109790000011
xi+1=xi+rii ═ i +1, up to f (x)i) And f (x)0) Up to an odd sign, x obtainediI.e. the recovery sample.
CN201810781467.2A 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration Active CN109086884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810781467.2A CN109086884B (en) 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810781467.2A CN109086884B (en) 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration

Publications (2)

Publication Number Publication Date
CN109086884A CN109086884A (en) 2018-12-25
CN109086884B true CN109086884B (en) 2020-09-01

Family

ID=64838063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810781467.2A Active CN109086884B (en) 2018-07-17 2018-07-17 Neural network attack defense method based on gradient reverse countermeasure sample restoration

Country Status (1)

Country Link
CN (1) CN109086884B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784411B (en) * 2019-01-23 2021-01-05 四川虹微技术有限公司 Defense method, device and system for confrontation sample and storage medium
CN111488898B (en) * 2019-01-28 2023-09-19 北京达佳互联信息技术有限公司 Countermeasure data acquisition method, device, equipment and storage medium
CN110768959B (en) * 2019-09-20 2021-12-21 浙江工业大学 Defense method based on signal boundary exploration attack
CN111209370A (en) * 2019-12-27 2020-05-29 同济大学 Text classification method based on neural network interpretability
CN114724014B (en) * 2022-06-06 2023-06-30 杭州海康威视数字技术股份有限公司 Deep learning-based method and device for detecting attack of countered sample and electronic equipment
CN114861893B (en) * 2022-07-07 2022-09-23 西南石油大学 Multi-channel aggregated countermeasure sample generation method, system and terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022273A (en) * 2016-05-24 2016-10-12 华东理工大学 Handwritten form identification system of BP neural network based on dynamic sample selection strategy
CN107463951A (en) * 2017-07-19 2017-12-12 清华大学 A kind of method and device for improving deep learning model robustness
CN108198179A (en) * 2018-01-03 2018-06-22 华南理工大学 A kind of CT medical image pulmonary nodule detection methods for generating confrontation network improvement
US10007866B2 (en) * 2016-04-28 2018-06-26 Microsoft Technology Licensing, Llc Neural network image classifier

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007866B2 (en) * 2016-04-28 2018-06-26 Microsoft Technology Licensing, Llc Neural network image classifier
CN106022273A (en) * 2016-05-24 2016-10-12 华东理工大学 Handwritten form identification system of BP neural network based on dynamic sample selection strategy
CN107463951A (en) * 2017-07-19 2017-12-12 清华大学 A kind of method and device for improving deep learning model robustness
CN108198179A (en) * 2018-01-03 2018-06-22 华南理工大学 A kind of CT medical image pulmonary nodule detection methods for generating confrontation network improvement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tsui-Wei Weng et al., "Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach", arXiv, 2018-01-31, full text. *

Also Published As

Publication number Publication date
CN109086884A (en) 2018-12-25

Similar Documents

Publication Publication Date Title
CN109086884B (en) Neural network attack defense method based on gradient reverse countermeasure sample restoration
Liu et al. Detection based defense against adversarial examples from the steganalysis point of view
Hosseini et al. Google's cloud vision api is not robust to noise
CN109543760B (en) Confrontation sample detection method based on image filter algorithm
CN109961444B (en) Image processing method and device and electronic equipment
CN110348475B (en) Confrontation sample enhancement method and model based on spatial transformation
CN113554089A (en) Image classification countermeasure sample defense method and system and data processing terminal
CN112396129B (en) Challenge sample detection method and universal challenge attack defense system
CN111753290B (en) Software type detection method and related equipment
CN108416343B (en) Face image recognition method and device
Lv et al. Chinese character CAPTCHA recognition based on convolution neural network
CN117134958A (en) Information processing method and system for network technology service
CN117152486A (en) Image countermeasure sample detection method based on interpretability
Kaur et al. Performance Evaluation of various thresholding methods using canny edge detector
US11349856B2 (en) Exploit kit detection
Kang et al. Identification of multiple image steganographic methods using hierarchical ResNets
CN111209567B (en) Method and device for judging perceptibility of improving robustness of detection model
CN115631333B (en) Countermeasure training method for improving robustness of target detection model and target detection method
CN113139187B (en) Method and device for generating and detecting pre-training language model
WO2021098801A1 (en) Data cleaning device, data cleaning method and face verification method
WO2024115580A1 (en) A method of assessing inputs fed to an ai model and a framework thereof
EP4328813A1 (en) Detection device, detection method, and detection program
Hanyu et al. Incremental Training of SVM-Based Human Detector
Chua et al. Using Adversarial Defences Against Image Classification CAPTCHA
Rangslang et al. Feature Space Perturbation for Transferable Adversarial Examples in Image Forensics Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant