CN109165671A

CN109165671A - Adversarial example detection method based on sample-to-decision-boundary distance

Info

Publication number: CN109165671A
Application number: CN201810768347.9A
Authority: CN
Inventors: 易平; 胡嘉尚; 张�浩; 倪洁; 何芷珊; 胡又佳
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2018-07-13
Filing date: 2018-07-13
Publication date: 2019-01-08

Abstract

An adversarial example detection method based on the distance from a sample to the decision boundary. Adversarial examples are generated from normal samples, features are extracted from all samples by estimating each sample's distance to the decision boundary, and a classifier is then trained with the distance estimate as the sample's feature; the trained classifier is the detector used to detect adversarial examples. The invention is broadly applicable to classifier-based machine learning models in fields such as speech recognition and image classification, and improves the adversarial example detection rate. Applied to an artificial intelligence API, it can filter input samples and markedly improve the security of the artificial intelligence system.

Description

Adversarial example detection method based on sample-to-decision-boundary distance

Technical field

The present invention relates to a technique in the field of adversarial attacks on artificial intelligence, and specifically to an adversarial example detection method based on the distance from a sample to the decision boundary.

Background art

Artificial intelligence has developed rapidly in recent years and is being applied in more and more fields. Research has found, however, that artificial intelligence classifiers contain serious security vulnerabilities: by applying a small perturbation to a normally recognized sample, a malicious attacker can turn it into an adversarial example that causes the classifier to misclassify. Adversarial training can resist adversarial example attacks to some extent, but its effect has remained unsatisfactory. Many researchers therefore hope to detect adversarial examples through some of their intrinsic properties and thereby resist adversarial attacks.

Summary of the invention

Aimed at adversarial example attacks, the present invention proposes an adversarial example detection method based on the distance from a sample to the decision boundary. The distance from a sample to the decision boundary is used as the sample's feature, and whether the sample is adversarial is used as the class label, to train a classifier that then serves as the adversarial example detector. The invention defends artificial intelligence against adversarial attacks and is broadly applicable to classifier-based machine learning models in fields such as speech recognition and image classification, improving the adversarial example detection rate. Applied to an artificial intelligence API, it can filter input samples and markedly improve the security of the artificial intelligence system.

The present invention is achieved by the following technical solutions:

The present invention generates adversarial examples from normal samples and performs feature extraction on all samples, that is, it computes an estimate of each sample's distance to the decision boundary. A classifier is then trained with the distance estimate as the sample's feature; the trained classifier is the detector, used to detect adversarial examples.

The adversarial examples are generated by several adversarial example generation methods and then mixed in equal proportion. The generation methods include the iterative fast gradient sign method (iter-FGSM), the optimization-based adversarial example distance calculation method (C&W), the DeepFool method, and the greedy algorithm based on the Jacobian matrix (JSMA).

Before feature extraction, invalid samples are preferably removed from all samples. Invalid samples comprise normal samples that the classifier misclassifies and adversarial examples that fail to fool the classifier (i.e. that do not cross the decision boundary).

The classifier is implemented as a neural network comprising fully connected layers and Dropout layers.

The distance estimate is bounded by a distance upper bound dist_U and a distance lower bound dist_L. Estimating both bounds narrows the distance from the sample to the decision boundary to a more precise range, namely [dist_L, dist_U].

The distance upper bound is obtained with an attack-based distance calculation method; the distance lower bound is obtained with the cross-Lipschitz bound method.

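The attack-based distance calculation method is as follows: adversarial examples are computed for a sample with the iterative fast gradient sign method (iter-FGSM), and the perturbation needed to produce the adversarial example is taken as the estimate of the distance upper bound. Specifically:

$$x_i = x_{i-1} + \varepsilon \cdot \operatorname{sign}\big(\nabla_x J(x_{i-1}, y)\big), \quad i = 1, \dots, k$$

where $x_i$ is the sample computed in round $i$ of FGSM, $x_0$ is the normal sample, $J(x_{i-1}, y)$ is the loss function at $x_{i-1}$ (with $y$ the label of the normal sample), and $\varepsilon$ is the perturbation constant of each FGSM round. When an adversarial example is produced after $k$ rounds of FGSM, the distance upper bound is the norm of the sum of the perturbation vectors of the $k$ rounds:

$$\mathrm{dist}_U = \Big\lVert \sum_{i=1}^{k} \varepsilon \cdot \operatorname{sign}\big(\nabla_x J(x_{i-1}, y)\big) \Big\rVert_p$$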

The perturbation constant ε is preferably 1 in the present invention.
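The iter-FGSM upper bound described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the two-class linear softmax model (`W`, `B`), the cross-entropy loss, the use of the l_2 norm for the summed perturbation, and all function names are hypothetical choices made only so that the example runs with the standard library.

```python
import math

# Hypothetical toy classifier: logits = W x + B, one weight row per class.
W = [[1.0, 0.0], [-1.0, 0.0]]
B = [0.0, 0.0]

def logits(x):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]

def predict(x):
    z = logits(x)
    return max(range(len(z)), key=lambda k: z[k])

def loss_gradient(x, label):
    """Gradient of the softmax cross-entropy loss J(x, label) with respect to x."""
    z = logits(x)
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    probs = [e / sum(exps) for e in exps]
    grad = [0.0] * len(x)
    for k, row in enumerate(W):
        coeff = probs[k] - (1.0 if k == label else 0.0)
        for d, w in enumerate(row):
            grad[d] += coeff * w
    return grad

def sign(v):
    return (v > 0) - (v < 0)

def iter_fgsm_upper_bound(x0, label, eps=1.0, max_rounds=100):
    """Return (dist_U, k) once the prediction flips, or None if it never does."""
    x = list(x0)
    total = [0.0] * len(x0)  # running sum of the per-round perturbation vectors
    for k in range(1, max_rounds + 1):
        step = [eps * sign(g) for g in loss_gradient(x, label)]
        x = [xi + si for xi, si in zip(x, step)]
        total = [ti + si for ti, si in zip(total, step)]
        if predict(x) != label:
            # Assumption: the distance is the l_2 norm of the summed perturbation.
            return math.sqrt(sum(t * t for t in total)), k
    return None

print(iter_fgsm_upper_bound([2.5, 0.3], label=0))  # (3.0, 3)
```

In this toy case the true distance from the sample to the boundary is 2.5, so the value 3.0 found after three rounds with ε = 1 is indeed an upper bound; a smaller ε would tighten it at the cost of more rounds.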

The cross-Lipschitz bound method is as follows:

$$\mathrm{dist}_L = \min_{j \neq c} \frac{f_c(x_0) - f_j(x_0)}{L_q^j}$$

where $\mathrm{dist}_L$ is the lower bound on the distance from the sample point to the decision boundary, $f_j(x_0)$ is the $j$-th component of the model's output vector for the sample, the subscript $c$ in $f_c(x_0)$ is the class to which $x_0$ belongs, and $L_q^j$ is the local Lipschitz constant of the function. With $g(x) = f_c(x) - f_j(x)$,

$$L_q^j = \max_{x \in B_p(x_0, R)} \lVert \nabla g(x) \rVert_q$$

where $B_p(x_0, R)$ is the ball of radius $R$ centered at $x_0$ under the $l_p$ norm, and $p$ and $q$ are related by $\frac{1}{p} + \frac{1}{q} = 1$. In practice, sufficiently many points $x$ are drawn at random from the ball $B_p(x_0, R)$, $\lVert \nabla g(x) \rVert_q$ is computed at each $x$ by backpropagation, and the maximum is taken.

In the present invention the radius R is preferably 5 and the number of sampled points is preferably 500.
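The sampling-based estimate of the cross-Lipschitz lower bound can be sketched as follows. This is an illustrative sketch, not the patent's code, and it makes several assumptions: p = 2 (hence q = 2), central finite differences stand in for backpropagation, and `toy_model` and all function names are hypothetical. The defaults R = 5 and 500 samples follow the preferred values stated above.

```python
import math
import random

def sample_in_l2_ball(center, radius, rng):
    """Draw one point uniformly from the l_2 ball B_2(center, radius)."""
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    scale = radius * rng.random() ** (1.0 / len(center)) / norm
    return [c + scale * v for c, v in zip(center, direction)]

def numeric_gradient(func, x, h=1e-5):
    """Central finite differences; a stand-in for backpropagation."""
    grad = []
    for d in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[d] += h
        lo[d] -= h
        grad.append((func(hi) - func(lo)) / (2.0 * h))
    return grad

def cross_lipschitz_lower_bound(model, x0, radius=5.0, n_samples=500, seed=0):
    """Estimate dist_L = min over j != c of (f_c(x0) - f_j(x0)) / L_q^j."""
    rng = random.Random(seed)
    out = model(x0)
    c = max(range(len(out)), key=lambda k: out[k])  # class assigned to x0
    points = [sample_in_l2_ball(x0, radius, rng) for _ in range(n_samples)]
    best = math.inf
    for j in range(len(out)):
        if j == c:
            continue
        g = lambda x, j=j: model(x)[c] - model(x)[j]
        # Local Lipschitz estimate: largest sampled gradient norm (q = 2).
        lip = max(
            math.sqrt(sum(v * v for v in numeric_gradient(g, x))) for x in points
        )
        if lip > 0.0:
            best = min(best, g(x0) / lip)
    return best

def toy_model(x):
    """Hypothetical three-class linear model used only for illustration."""
    return [x[0], -x[0], x[1]]

print(cross_lipschitz_lower_bound(toy_model, [2.5, 0.3]))
```

For a linear model the gradient of g is constant, so the l_2 estimate coincides with the exact distance to the nearest boundary, here 2.2/√2 ≈ 1.556. For a nonlinear network the sampled maximum can underestimate the true local Lipschitz constant, so the result is an estimate of the lower bound rather than a guarantee.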

Technical effect

The present invention uses the distance from a sample to the decision boundary as the feature for identifying adversarial examples; the effect is clear and the recognition rate is high. For the distance upper bound, iter-FGSM is used as the attack, which seeks the shortest path from the sample to the decision boundary, so the measurement is more accurate. For the distance lower bound, the cross-Lipschitz bound method is used with a local Lipschitz constant rather than a global one, and sampling sufficiently many points makes the measured Lipschitz constant accurate enough. The trained detector achieves good detection performance, with detection accuracy higher than the existing local intrinsic dimensionality (LID) method.

Description of the drawings

Fig. 1 is a flow diagram of the embodiment;

Fig. 2 is the neural network structure of the detector;

Fig. 3 is a comparison of detection performance in the embodiment.

Specific embodiment

As shown in Fig. 1, the present embodiment uses the BelgiumTS dataset and applies the method to protect a traffic sign recognition API against adversarial example attacks.

The present embodiment specifically includes:

Step 1, adversarial example generation: the API's training sample set is taken as the normal samples, and a portion of the normal samples is used to generate adversarial examples with each of four attacks (iter-FGSM, C&W, DeepFool and JSMA), which are then mixed in equal proportion.
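The equal-proportion mixing in Step 1 can be illustrated with a short sketch. It assumes that "equal proportion" means taking the same number of adversarial examples from each attack; the function name and the dictionary layout are hypothetical, and the attacks themselves are not implemented here.

```python
import random

def mix_equal_proportion(adv_by_attack, seed=0):
    """Take the same number of adversarial examples from every attack, then shuffle."""
    rng = random.Random(seed)
    # Assumption: the smallest pool decides how many examples each attack contributes.
    per_attack = min(len(samples) for samples in adv_by_attack.values())
    mixed = []
    for name, samples in adv_by_attack.items():
        chosen = rng.sample(samples, per_attack)
        mixed.extend((name, s) for s in chosen)
    rng.shuffle(mixed)
    return mixed

# Placeholder pools; integers stand in for generated adversarial examples.
pools = {
    "iter-FGSM": list(range(10)),
    "C&W": list(range(6)),
    "DeepFool": list(range(8)),
    "JSMA": list(range(4)),
}
print(len(mix_equal_proportion(pools)))  # 16
```

Keeping the attack name next to each example makes it possible to report detection rates per attack later, which is a design convenience rather than something the source specifies.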

Step 2, invalid sample rejection: invalid samples are removed from both the normal samples and the adversarial examples. Invalid samples comprise: 1. samples that are normal but are misrecognized by the API; these lie close to the decision boundary and are therefore removed; 2. samples that are adversarial but are recognized correctly by the API; such attacks have failed and pose no threat to the API.
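The two rejection rules of Step 2 amount to a simple filter. The sketch below is illustrative only: the tuple layout `(x, true_label, is_adversarial)`, the `classify` callback standing in for the protected API, and the function name are assumptions. For an adversarial example, `true_label` is taken to be the label of the normal sample it was generated from.

```python
def reject_invalid(samples, classify):
    """Keep only correctly classified normal samples and successful adversarial examples.

    samples: iterable of (x, true_label, is_adversarial) tuples.
    classify: callable returning the predicted label for x (hypothetical API stand-in).
    """
    kept = []
    for x, true_label, is_adversarial in samples:
        correct = classify(x) == true_label
        if is_adversarial and correct:
            continue  # failed attack: the example did not cross the decision boundary
        if not is_adversarial and not correct:
            continue  # normal sample that the classifier already misclassifies
        kept.append((x, true_label, is_adversarial))
    return kept

# Toy one-dimensional classifier: label 1 for positive inputs, 0 otherwise.
toy_classify = lambda x: int(x > 0)
data = [(1.0, 1, False), (-1.0, 1, False), (2.0, 0, True), (-2.0, 0, True)]
print(reject_invalid(data, toy_classify))  # [(1.0, 1, False), (2.0, 0, True)]
```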

Step 3, distance bound calculation: the distance upper bound is computed first. For a sample x₀, an adversarial example is generated with iter-FGSM, and the perturbation vectors of the k iterations of the generation process are summed to obtain the distance upper bound. The distance lower bound is then obtained by the following steps:

3.1) For a sample $x_0$ correctly classified as $c$, the local Lipschitz constant $L_q^j$ is found for each class $j$ other than $c$. To do so, 500 points are drawn at random from the ball $B_p(x_0, R)$ centered at $x_0$, and the point $x$ that maximizes the $l_q$ norm $\lVert \nabla g(x) \rVert_q$ of the gradient of $g(x) = f_c(x) - f_j(x)$ is found;

3.2) The value $\lVert \nabla g(x) \rVert_q$ at that point is taken as $L_q^j$, from which $\frac{f_c(x_0) - f_j(x_0)}{L_q^j}$ is computed. The class $j$ that minimizes this quotient is then found, and the corresponding quotient is the distance lower bound.
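Why the quotient in step 3.2 bounds the distance from below can be sketched with the local Lipschitz property. The derivation assumes that the nearest point $x^\ast$ on the boundary between classes $c$ and $j$ lies inside the ball $B_p(x_0, R)$ and that $L_q^j$ is the true local Lipschitz constant rather than a sampled estimate; neither assumption is stated in the source.

```latex
\begin{align}
g(x) &= f_c(x) - f_j(x), \qquad g(x_0) > 0, \\
\lvert g(x_0) - g(x) \rvert &\le L_q^j \, \lVert x - x_0 \rVert_p
  \quad \text{for all } x \in B_p(x_0, R), \\
g(x^\ast) = 0 \;&\Longrightarrow\;
  \lVert x^\ast - x_0 \rVert_p \ge \frac{g(x_0)}{L_q^j}
  = \frac{f_c(x_0) - f_j(x_0)}{L_q^j}.
\end{align}
```

Taking the minimum of the right-hand side over all $j \neq c$ gives a bound that holds for whichever class boundary is closest.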

Step 4, detector training: the network structure of the detector is shown in Fig. 2 and comprises three fully connected layers with two Dropout layers between them. During training, each batch has size 64.
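The layer arrangement of Step 4 (three fully connected layers with a Dropout layer after each of the first two) can be sketched as a plain forward pass. The source does not give layer widths, activations, the dropout rate or the output form, so the hidden width of 16, the ReLU activations, the rate of 0.5, the sigmoid output and all function names below are assumptions; training with batches of 64 is not shown.

```python
import math
import random

def dense(x, weights, biases):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

def dropout(x, rate, rng, training):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]

def init_layer(n_in, n_out, rng):
    limit = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_detector(hidden=16, seed=0):
    """Three fully connected layers; the input is the feature pair [dist_L, dist_U]."""
    rng = random.Random(seed)
    return [
        init_layer(2, hidden, rng),
        init_layer(hidden, hidden, rng),
        init_layer(hidden, 1, rng),
    ]

def detector_forward(layers, features, rate=0.5, rng=None, training=False):
    """Return the (untrained) probability that the input is adversarial."""
    rng = rng or random.Random(0)
    h = relu(dense(features, *layers[0]))
    h = dropout(h, rate, rng, training)   # first Dropout layer
    h = relu(dense(h, *layers[1]))
    h = dropout(h, rate, rng, training)   # second Dropout layer
    logit = dense(h, *layers[2])[0]
    return 1.0 / (1.0 + math.exp(-logit))

detector = build_detector()
print(detector_forward(detector, [1.5, 3.0]))
```

Dropout is active only when `training=True`; at inference time the forward pass is deterministic, which is what an API-side filter would rely on.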

As shown in Fig. 3, after training, the detection accuracy is compared with that of other existing adversarial example detection methods.

BelgiumTS (the dataset of this embodiment):

Local intrinsic dimensionality (LID): 93.5%

Kernel density (Kernel Density): 90.7%

k-mean distance: 86.0%

Adversarial example detection method based on sample-to-decision-boundary distance: 95.2%

The same experiment was carried out on other datasets for comparison.

MNIST:

Local intrinsic dimensionality (LID): 96.8%

Kernel density (Kernel Density): 95.7%

k-mean distance: 93.0%

Adversarial example detection method based on sample-to-decision-boundary distance: 98.4%

CIFAR:

Local intrinsic dimensionality (LID): 91.1%

Kernel density (Kernel Density): 83.5%

k-mean distance: 80.7%

Adversarial example detection method based on sample-to-decision-boundary distance: 94.3%

It can be seen that on these datasets the detection performance of the present invention is higher than that of the existing detection methods; the bar chart of detection performance is shown in Fig. 3.
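The size of the improvement can be read off the reported figures with a little arithmetic. The snippet below only restates the percentages listed above and computes the margin over the best baseline on each dataset; the dictionary keys and the function name are illustrative, and the first block of figures is assumed to belong to BelgiumTS, the dataset of the embodiment.

```python
# Detection accuracy (%) as reported in the comparison above.
reported = {
    "BelgiumTS": {"LID": 93.5, "Kernel Density": 90.7,
                  "k-mean distance": 86.0, "boundary distance": 95.2},
    "MNIST": {"LID": 96.8, "Kernel Density": 95.7,
              "k-mean distance": 93.0, "boundary distance": 98.4},
    "CIFAR": {"LID": 91.1, "Kernel Density": 83.5,
              "k-mean distance": 80.7, "boundary distance": 94.3},
}

def margin_over_best_baseline(scores, ours="boundary distance"):
    """Percentage-point gap between the proposed method and the best other method."""
    best = max(value for name, value in scores.items() if name != ours)
    return round(scores[ours] - best, 1)

margins = {name: margin_over_best_baseline(s) for name, s in reported.items()}
print(margins)  # {'BelgiumTS': 1.7, 'MNIST': 1.6, 'CIFAR': 3.2}
```

In every case the strongest baseline is LID, and the margin is largest on CIFAR.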

Those skilled in the art may make local adjustments to the above specific implementation in different ways without departing from the principle and purpose of the present invention. The scope of protection of the present invention is defined by the claims and is not limited by the above specific implementation; every implementation within that scope is bound by the present invention.

Claims

1. An adversarial example detection method based on the distance from a sample to the decision boundary, characterized in that adversarial examples are generated from normal samples, feature extraction is performed on all samples, that is, an estimate of each sample's distance to the decision boundary is computed, and a classifier is then trained with the distance estimate as the sample's feature, the trained classifier being used to detect adversarial examples.

2. The method according to claim 1, characterized in that the adversarial examples are generated by several adversarial example generation methods and then mixed in equal proportion, the generation methods including the iterative fast gradient sign method, the optimization-based adversarial example distance calculation method, the DeepFool method, and the greedy algorithm based on the Jacobian matrix.

3. The method according to claim 1, characterized in that, before the feature extraction, invalid samples are removed from all samples, the invalid samples comprising normal samples that are misclassified and adversarial examples that fail to fool the classifier (i.e. that do not cross the decision boundary).

4. The method according to claim 1, characterized in that the classifier is implemented as a neural network comprising fully connected layers and Dropout layers.

5. The method according to claim 1, characterized in that the distance estimate is bounded by a distance upper bound dist_U and a distance lower bound dist_L, and estimating both bounds narrows the distance from the sample to the decision boundary to a more precise range, namely [dist_L, dist_U], wherein the distance upper bound is obtained with an attack-based distance calculation method and the distance lower bound is calculated with the cross-Lipschitz bound method.

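6. The method according to claim 5, characterized in that the attack-based distance calculation method is as follows: adversarial examples are computed for a sample with the iterative fast gradient sign method (iter-FGSM), and the perturbation needed to produce the adversarial example is taken as the estimate of the distance upper bound, specifically:

$$x_i = x_{i-1} + \varepsilon \cdot \operatorname{sign}\big(\nabla_x J(x_{i-1}, y)\big), \quad i = 1, \dots, k$$

where $x_i$ is the sample computed in round $i$ of FGSM, $x_0$ is the normal sample, $J(x_{i-1}, y)$ is the loss function at $x_{i-1}$ (with $y$ the label of the normal sample), and $\varepsilon$ is the perturbation constant of each FGSM round; when an adversarial example is produced after $k$ rounds of FGSM, the distance upper bound is the norm of the sum of the perturbation vectors of the $k$ rounds:

$$\mathrm{dist}_U = \Big\lVert \sum_{i=1}^{k} \varepsilon \cdot \operatorname{sign}\big(\nabla_x J(x_{i-1}, y)\big) \Big\rVert_p$$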

7. The method according to claim 5, characterized in that the cross-Lipschitz bound method is as follows:

$$\mathrm{dist}_L = \min_{j \neq c} \frac{f_c(x_0) - f_j(x_0)}{L_q^j}$$

where $\mathrm{dist}_L$ is the lower bound on the distance from the sample point to the decision boundary, $f_j(x_0)$ is the $j$-th component of the model's output vector for the sample, the subscript $c$ in $f_c(x_0)$ is the class to which $x_0$ belongs, and $L_q^j$ is the local Lipschitz constant of the function; with $g(x) = f_c(x) - f_j(x)$,

$$L_q^j = \max_{x \in B_p(x_0, R)} \lVert \nabla g(x) \rVert_q$$

where $B_p(x_0, R)$ is the ball of radius $R$ centered at $x_0$ under the $l_p$ norm, and $p$ and $q$ are related by $\frac{1}{p} + \frac{1}{q} = 1$; in practice, sufficiently many points $x$ are drawn at random from the ball $B_p(x_0, R)$, $\lVert \nabla g(x) \rVert_q$ is computed at each $x$ by backpropagation, and the maximum is taken.