CN114821227B - Deep neural network countermeasures sample scoring method - Google Patents
Deep neural network countermeasures sample scoring method
- Publication number
- CN114821227B CN114821227B CN202210378464.0A CN202210378464A CN114821227B CN 114821227 B CN114821227 B CN 114821227B CN 202210378464 A CN202210378464 A CN 202210378464A CN 114821227 B CN114821227 B CN 114821227B
- Authority
- CN
- China
- Prior art keywords
- sample
- challenge
- model
- calculating
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a deep neural network challenge sample scoring method and provides a novel way to evaluate challenge sample (adversarial example) attack effects in a black-box setting: the attack effects are evaluated and quantified with a fuzzy comprehensive evaluation method and an index named the Adversarial Examples Score (AES). The method comprises the steps of calculating the mobility, imperceptibility, attack success rate and label offset of a challenge sample, determining a membership subset table, determining the evaluation weight A of each aspect by the analytic hierarchy process, and applying fuzzy comprehensive evaluation to the matrix to obtain the score index of the challenge sample. The output of the AES index is a score that measures the effect of a challenge sample attack and can be used to evaluate the hazard that the challenge sample poses to a deep neural network.
Description
Technical Field
The invention relates to the field of deep neural networks, and in particular to a deep neural network challenge sample scoring method.
Background
Deep neural network technology has made major breakthroughs in solving complex tasks. However, deep neural networks (especially artificial neural networks and other data-driven artificial intelligence) are very vulnerable to challenge sample attacks during training or testing, and such attacks can easily subvert the original output of a machine learning system. For example, for an image classification deep neural network model, a challenge sample can be generated by adding a small disturbance to a given image; the human eye sees no difference from the original image, yet a deep neural network model known to perform well misclassifies it. As adversarial machine learning techniques become more advanced, more complicated and faster to update, deep neural networks exhibit strong vulnerability to such attacks. It is therefore necessary to evaluate the attack effect of challenge samples, the performance and defensive ability of the deep neural network model, and so on, in order to discover the potential safety hazards that challenge samples pose to the model, and to recommend a defense strategy for improving model safety according to the evaluation result of the challenge sample, thereby improving the safety of the deep neural network model.
Existing work evaluates the attack effect of a challenge sample on a target neural network in a white-box manner, according to whether the given neural network can correctly classify the challenge sample. This approach is unstable and highly random, and in many confidentiality scenarios the assessment becomes impractical because it is difficult for an evaluator to access the internal structure of the deep learning model.
Thus, a new method for evaluating the effect of challenge sample attacks is needed. Currently there is no systematic, intuitive index that reflects the effect of challenge samples on a deep neural network, nor a standard system for remotely evaluating the harmfulness of challenge samples in a black-box manner. The invention therefore provides a deep neural network challenge sample scoring method for evaluating and quantifying the attack effect of challenge samples.
Disclosure of Invention
In order to overcome the problems of the prior art, the invention provides a deep neural network challenge sample scoring method. The invention comprises a challenge sample mobility calculation module, a challenge sample imperceptibility calculation module, a challenge sample attack success rate calculation module, a challenge sample label offset calculation module and a challenge sample score calculation module. The challenge sample score calculation module calculates the final total destructive power score of the challenge sample to evaluate and quantify the vulnerability of the deep neural network and the harmfulness of the challenge sample.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a deep neural network challenge sample scoring method comprising the steps of:
step one, calculating the mobility, imperceptibility, attack success rate and label offset degree of the challenge sample, wherein the challenge sample is an image challenge sample and/or a text challenge sample.
And step two, determining a membership degree subset table.
And thirdly, determining the evaluation weight A of each aspect by using an analytic hierarchy process.
And step four, performing fuzzy comprehensive evaluation on the matrix to obtain the score index of the challenge sample.
The advantages and beneficial effects of the invention are as follows:
The invention proposes a challenge sample score, the AES (Adversarial Examples Score) index, to evaluate the effect of challenge samples on image and text deep learning networks. The advantages are as follows:
The AES index provides an evaluation score for the impact of challenge samples. In computer vision, application scenarios of image challenge samples include image classification, face recognition, image semantic segmentation, target detection, automatic driving and the like; in natural language processing, application scenarios of text challenge samples include text classification, machine translation, text summarization and the like. Because the AES index integrates different factors (such as the sample, the challenge sample generation algorithm and the deep neural network model) and is designed for the characteristics of both image and text samples, it can be used universally to evaluate the harmfulness of image-type and text-type challenge samples to a deep neural network, and it can also serve as a reference index for evaluating the quality of a certain type of sample for a target model and for measuring the vulnerability of the model.
First, the AES index may be used to quantify the quality of the challenge samples generated by different challenge sample generation algorithms and the effect of their attacks on a neural network. With the AES index, after obtaining the characteristics and attack effects of different challenge sample generation algorithms, a practitioner can select the most appropriate and most efficient challenge sample generation algorithm according to the actual conditions of the neural network model and the samples in the attack-and-defense scenario. For example, in application scenarios such as image classification, face recognition and machine translation, a practitioner can attack and test the neural network model more effectively with the help of the AES index, and can recommend a defense strategy for improving model safety according to the evaluation result of the challenge sample, thereby improving the safety of the deep neural network model.
Second, the AES index may be used as a reference for the quality of the selected training samples. For a target neural network, given the current training sample, if the model can correctly classify the original training sample but cannot correctly classify the challenge sample, this may indicate that the model needs to be further trained on more or better quality training samples to make the model sufficiently robust.
Finally, the AES index may be used to measure and evaluate the security and vulnerability of a model. Traditionally, deep learning researchers and practitioners have focused mainly on the performance of deep neural network models while ignoring security and vulnerability. In fields such as image recognition, target detection, automatic driving and text classification there are a large number of deep neural network models but no safety evaluation scheme for them; with the AES index, a model can be tested and its safety measured at the same time. This enables practitioners to determine the best deep neural network model to use, and even to improve a model when new vulnerability issues are found.
Drawings
FIG. 1 is a diagram illustrating generation of a deep learning model challenge sample in accordance with the present invention;
FIG. 2 is a flowchart of the challenge sample mobility calculation algorithm according to the present invention;
FIG. 3 is a flowchart of an algorithm for calculating the LO index according to the present invention;
fig. 4 is a flowchart of the challenge sample scoring AES calculation of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
referring to fig. 1-4, in an embodiment of the present invention, the system includes a challenge sample mobility calculation module, a challenge sample unaware calculation module, a challenge sample attack success rate calculation module, a challenge sample label offset calculation module, and a challenge sample score calculation module. The challenge sample score calculation module calculates the total destructive power score of the final challenge sample to evaluate and quantify the vulnerability of the deep neural network and the harmfulness of the challenge sample.
1. Calculating mobility
Mobility, as shown in FIG. 2, represents the ability of a challenge sample generated by one method to retain a certain attack effect under different deep learning models; it reflects the applicability of the challenge sample. Challenge samples possess some mobility mainly because deep learning classifiers are discriminant models. When a discriminant model is used to solve a classification problem, the goal is to separate the data as well as possible, so the model maximizes the distance between samples and the decision boundary and expands the region of each class. The advantage is that classification becomes easier; the disadvantage is that each region contains redundant space that does not truly belong to the class, and challenge samples exist in that space. Mobility is thus the ability of an adversarial perturbation computed on one model to migrate to another independently trained model: since any two models may learn similar non-robust features, a perturbation that manipulates such features can be applied to both. The calculation process of challenge sample mobility is as follows:
Step 1: M_1, M_2, ..., M_N is a group of neural network models used for evaluation. The challenge sample a_c is generated on the target neural network model M_1 by the challenge sample generation algorithm a to be evaluated. For example, M_1 is a BiLSTM model, M_2 is a FastText model, M_3 is a BERT model, and the challenge sample generation algorithm a is the WordHandling algorithm, which is used to generate the challenge sample a_c.
Step 2: Retrain the target neural network model M_1 and test it with the challenge sample a_c to obtain the recognition accuracy AR_1;
Step 3: Train the neural network model M_i (i = 2, 3, ..., N) and test it with the challenge sample a_c to obtain AR_i, repeating until i > N, where N is the number of test neural network models (N = 3 in this embodiment);
Step 4: Calculate the mobility Tf of the challenge sample from the recognition accuracies AR_1, AR_2, ..., AR_N.
2. Calculating imperceptibility
Referring to FIG. 1, the fine disturbance that a challenge sample adds to the original sample is difficult for the human eye to perceive, yet it can cause the deep learning model to misclassify with high confidence. If the disturbance could be perceived by a human once the challenge sample is generated, the attack could be avoided; imperceptibility therefore also represents the attack capability of the challenge sample, in the sense that a challenge sample that is hard to detect by human senses alone allows the attack to be camouflaged. An adversarial attack deliberately adds a small, imperceptible disturbance to the input sample so that the model produces a false output with high confidence. Imperceptibility is thus an important indicator of a challenge sample.
For image samples, considering that it is difficult to define a metric of human visual ability, the p-norm is most commonly used to measure the magnitude and number of disturbances added to an image. The p-norm $L_p$ computes the input-space distance $\|x - x'\|_p$ between the clean image $x$ and the generated challenge sample $x'$, where $p \in \{0, 1, 2, \infty\}$. The specific distance calculation formula is shown below (the p-norm is the Manhattan distance when p = 1 and the Euclidean distance when p = 2):

$$\|x - x'\|_p = \Big(\sum_{i} |x_i - x'_i|^p\Big)^{1/p}$$
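A minimal NumPy sketch of this p-norm distance between a clean image x and its challenge sample x', covering the p ∈ {0, 1, 2, ∞} cases listed above; the function name is illustrative.

```python
import numpy as np

def lp_distance(x, x_adv, p=2):
    """p-norm ||x - x'||_p between a clean image and its challenge sample."""
    diff = (np.asarray(x, dtype=np.float64) - np.asarray(x_adv, dtype=np.float64)).ravel()
    if p == 0:
        return float(np.count_nonzero(diff))           # number of changed pixels
    if p == np.inf:
        return float(np.max(np.abs(diff)))              # largest single change
    return float(np.sum(np.abs(diff) ** p) ** (1.0 / p))  # Manhattan (p=1), Euclidean (p=2)
```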
in the aspect of text samples, the invention adopts the score of language model confusion (perplexity) to evaluate the fluency of sentences, so as to judge the disturbance size and the semantic authenticity of the sentences. The basic idea of confusion is: the language model giving the sentence of the test set with the higher probability value is better, and after the language model is trained, the sentence of the test set is a normal sentence, so that the trained model is better when the probability on the test set is higher, and the calculation formula is as follows:
wherein w is i Representing word sequence w 1 ,w 2 ,…,w i-1 The i-th word in (a), N represents the total number of words, p (w i |w 1 ,w 2 ,…,w i-1 ) Representing the first i-1 words in a given sentence, the language model can predict the probability distribution that the ith word may appear, the greater the sentence probability, the better the language model, the less confusion, and the higher the imperceptibility of the text against the sample.
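A short sketch of the perplexity computation from the conditional word probabilities p(w_i | w_1, ..., w_{i-1}); how those probabilities are obtained from a concrete language model is outside the sketch, and the example numbers are illustrative only.

```python
import math

def perplexity(token_probs):
    """PP(W) = (prod_i p(w_i | w_1..w_{i-1}))^(-1/N), computed in log space for stability."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)  # log of the sentence probability
    return math.exp(-log_sum / n)

# A fluent sentence (higher word probabilities) gets lower perplexity,
# hence higher imperceptibility of the text challenge sample.
print(perplexity([0.2, 0.3, 0.25, 0.4]))     # ≈ 3.6
print(perplexity([0.02, 0.03, 0.01, 0.05]))  # ≈ 42.7
```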
3. Calculating attack success rate
The attack success rate is the percentage of samples that are misclassified by the target model after the attack; a model may output erroneous results once attacked, and if the attack effect is good, the classification accuracy of the attacked target model drops greatly. The attack success rate is therefore an important aspect of measuring the attack effect. For a directional (targeted) attack, the attack success rate is calculated as:

$$ASR = \frac{1}{N}\sum_{i=1}^{N} I\big(f(A(x_i)) = y_i^{*}\big)$$

where A is the challenge sample generation algorithm, f is the classification algorithm of the target model, $y_i^{*}$ is the target class of the directional attack, N is the number of samples, $x_i$ is the i-th original sample, and $A(x_i)$ is the challenge sample generated from $x_i$ under algorithm A. For a non-directional attack, it is only necessary to count the cases in which the classification result differs from the original label $y_i$:

$$ASR = \frac{1}{N}\sum_{i=1}^{N} I\big(f(A(x_i)) \neq y_i\big)$$
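Both success-rate formulas can be sketched as below; `generate` (the challenge sample generation algorithm A) and `classify` (the target model f) are placeholders for whatever implementations are being evaluated.

```python
import numpy as np

def attack_success_rate(classify, generate, samples, labels, target_labels=None):
    """Targeted: fraction with f(A(x_i)) == y_i*; untargeted: fraction with f(A(x_i)) != y_i."""
    adv = [generate(x) for x in samples]             # A(x_i)
    preds = np.asarray([classify(x) for x in adv])   # f(A(x_i))
    if target_labels is not None:                    # directional attack
        return float(np.mean(preds == np.asarray(target_labels)))
    return float(np.mean(preds != np.asarray(labels)))  # non-directional attack
```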
4. Calculating the label offset
The label offset refers to how far the model's classification of the challenge sample shifts away from the correct label; specifically, it is the difference between the probability of classifying the challenge sample as the correct label and the probability of classifying the original sample as correct, i.e., the confidence offset of the correct label. Recall that the output layer of a deep learning model determines the final class from the probabilities assigned to each class, so the predicted probability of each category provides a basis for how the model classifies the challenge sample. A robust deep learning model should assign the highest probability to the correct class. For a given original sample and its challenge sample, the model produces two different probability distributions over the categories, and the difference between the probability of predicting the challenge sample as the correct class and the probability of predicting the original sample as the correct class reflects how far the prediction has moved. For the original sample the probability of the correct class is necessarily the largest over the whole class space; for the challenge sample this probability may be reduced to varying degrees depending on the effectiveness of the attack, although if it remains the largest the final classification result is still correct. The more the predicted outcome of the challenge sample deviates from the correct category, the more destructive the challenge sample is. As shown in FIG. 3, the detailed procedure for calculating the label offset is as follows:
Step 1: Input the target neural network model M, the original sample set x_c and the challenge sample generation algorithm a.
Step 2: For the i-th original sample x_c^i (i = 1, 2, ..., n, where n is the number of samples), calculate the category ŷ_i predicted by the target neural network model M, the set of probabilities P_i that M predicts for each class of x_c^i, and the probability p_i with which M predicts the category result ŷ_i, where p_i = max(P_i). If ŷ_i is not the correct label of x_c^i, skip this sample, return to step 2 and calculate the next sample.
Step 3: Generate the challenge sample x_a^i of the original sample x_c^i according to the challenge sample generation algorithm a, then calculate the set of probabilities P_i' that the model M predicts for each class of the challenge sample x_a^i, and the probability p_i' with which M predicts x_a^i as the category result ŷ_i.
Step 4: Calculate the deviation degree ΔP_i = p_i - p_i' with which the challenge sample x_a^i moves away from the prediction category ŷ_i in model M; let i = i + 1 and repeat from step 2 until i > n.
Step 5: Calculate the label offset LO of the challenge sample from the deviation degrees ΔP_1, ..., ΔP_n.
The inputs of the algorithm are the target neural network model M, the original sample set x_c and the challenge sample generation algorithm a; the output is the label offset of the challenge samples.
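The following sketch follows the steps above; because the LO formula figure is not reproduced in this text, it assumes LO is the average of the deviation degrees ΔP_i over the samples that the model classifies correctly in the first place, and it assumes a `predict_proba`-style interface that returns a per-class probability vector.

```python
import numpy as np

def label_offset(model, originals, labels, generate):
    """Assumed LO: mean drop in the correct-class probability caused by the challenge samples."""
    deviations = []
    for x, y in zip(originals, labels):
        probs_clean = model.predict_proba([x])[0]        # P_i over classes for x_c^i
        if int(np.argmax(probs_clean)) != y:
            continue                                      # skip samples M already misclassifies
        x_adv = generate(x)                               # challenge sample from algorithm a
        probs_adv = model.predict_proba([x_adv])[0]       # P_i' over classes for the challenge sample
        deviations.append(probs_clean[y] - probs_adv[y])  # deviation degree ΔP_i
    return float(np.mean(deviations)) if deviations else 0.0
```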
5. Calculating challenge sample score (AES index)
As shown in FIG. 4, the AES index is calculated by a fuzzy comprehensive evaluation method and is intended to provide a measure of the ability of a given challenge sample to destroy the target deep learning model. The steps for calculating the AES index are as follows:
Step 1: Determine the membership degree subset tables of the mobility, imperceptibility, attack success rate and label offset of the challenge sample, and from them determine the membership degree matrix.
Step 2: Construct the pairwise comparison matrix and determine the weights of mobility, imperceptibility, attack success rate and label offset, together with the maximum eigenvalue of the comparison matrix.
Step 3: Perform the consistency test.
Step 4: Compute the evaluation result vector by the formula $B = A \circ R$, where A is the weight vector of the four indexes and R is the membership matrix, and then perform defuzzification to obtain the AES index.
The membership subset tables for the mobility, imperceptibility, attack success rate and label offset of the challenge sample are as follows:
table 1 mobility membership subset table
Table 2 text challenge sample imperceptible membership subset table
Table 3 image challenge sample imperceptible membership subset table
Table 4 attack success rate membership subset table
Table 5 tag offset membership subset table
A pairwise comparison matrix is created, the weight vector is calculated, and a consistency check is made. The weight of each index is calculated as follows:
Normalize each column of the judgment matrix B:

$$\bar{b}_{ij} = \frac{b_{ij}}{\sum_{k=1}^{n} b_{kj}}$$

Average the column-normalized values across each row:

$$W_i = \frac{1}{n}\sum_{j=1}^{n} \bar{b}_{ij}$$

Then $W = (W_1, W_2, \ldots, W_n)^{T}$ is the feature (weight) vector that is sought.
Calculate the maximum eigenvalue of the judgment matrix:

$$\lambda_{\max} = \frac{1}{n}\sum_{i=1}^{n} \frac{(BW)_i}{W_i}$$

where $W_i$ is the i-th element of the normalized feature vector and $(BW)_i$ is the i-th element of the vector $BW$.
The four indexes of the invention are weighted to obtain the following results:
TABLE 6 index weight calculation
According to the above formula, λ_max = 4.048 is calculated; the consistency index is therefore CI = 0.01598, and looking up the random consistency index value RI = 0.90 gives CR = 0.01796 < 0.1, meeting the consistency requirement.
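A sketch of the analytic hierarchy process computation described above (column normalization, row averaging, λ_max, CI and CR). The 4x4 pairwise comparison matrix shown is an illustrative placeholder, since Table 6 itself is not reproduced here; with four indexes the random consistency index RI = 0.90 matches the value used in the text.

```python
import numpy as np

RI_TABLE = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}  # standard random consistency indexes

def ahp_weights(B):
    """Weights W, max eigenvalue lambda_max, CI and CR for a pairwise comparison matrix B."""
    B = np.asarray(B, dtype=np.float64)
    n = B.shape[0]
    col_norm = B / B.sum(axis=0)        # normalize each column
    W = col_norm.mean(axis=1)           # average across each row -> weight vector
    BW = B @ W
    lam_max = float(np.mean(BW / W))    # lambda_max = (1/n) * sum_i (BW)_i / W_i
    CI = (lam_max - n) / (n - 1)
    CR = CI / RI_TABLE[n]
    return W, lam_max, CI, CR

# Illustrative 4x4 comparison matrix (not the one from Table 6):
B = [[1,   2,   1/2, 1],
     [1/2, 1,   1/3, 1/2],
     [2,   3,   1,   2],
     [1,   2,   1/2, 1]]
W, lam_max, CI, CR = ahp_weights(B)
print(W, lam_max, CI, CR < 0.1)  # CR < 0.1 means the consistency requirement is met
```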
Using the membership degree subset tables constructed by the invention for mobility, imperceptibility, attack success rate and label offset, the membership degree matrix of the indexes can be obtained as follows. The element r_ij of the matrix belongs to the membership vector of mobility when i = 1, of imperceptibility when i = 2, of attack success rate when i = 3, and of label offset when i = 4.
The weights of the four indexes, mobility, imperceptibility, attack success rate and label offset, are $A = (A_1, A_2, A_3, A_4)$. The fuzzy comprehensive evaluation formula is as follows:

$$B = A \circ R$$

where A is the weight vector of the four indexes of mobility, imperceptibility, attack success rate and label offset, R is the membership matrix obtained from the index calculation results, and B is the final evaluation result vector. Since this result is only a fuzzy vector, the hazard of the challenge sample cannot be seen intuitively, so the membership vector must be defuzzified to obtain the final AES index that scores the challenge sample.
Taking, for each grade of the evaluation set $v_j$, the judgment value $b_j$ and using their weighted average as the judgment result gives the defuzzified value $b^{*}$, as shown below, where m is the number of elements of the evaluation result vector B:

$$b^{*} = \frac{\sum_{j=1}^{m} b_j v_j}{\sum_{j=1}^{m} b_j}$$

If the judgment indexes $b_j$ are normalized, this reduces to:

$$b^{*} = \sum_{j=1}^{m} b_j v_j$$

With the evaluation set of the invention, $v_j = (1, 2, 3, 4)$, the final formula for calculating the AES index is:

$$AES = \frac{\sum_{j=1}^{4} b_j v_j}{\sum_{j=1}^{4} b_j}$$
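A sketch of the final scoring step: the weight vector A is composed with the membership matrix R and the resulting fuzzy vector B is defuzzified by the weighted average over the evaluation set v = (1, 2, 3, 4). Treating the composition as a plain matrix product, and the numbers used below, are assumptions for illustration; the membership subset tables themselves are not reproduced in this text.

```python
import numpy as np

def aes_score(A, R, v=(1, 2, 3, 4)):
    """Fuzzy comprehensive evaluation B = A . R followed by weighted-average defuzzification."""
    A = np.asarray(A, dtype=np.float64)
    R = np.asarray(R, dtype=np.float64)
    B = A @ R                                # evaluation result vector (one value per grade)
    v = np.asarray(v, dtype=np.float64)
    return float(np.sum(B * v) / np.sum(B))  # defuzzified AES index

# Illustrative numbers: A from the AHP step; rows of R are the membership vectors of
# mobility, imperceptibility, attack success rate and label offset over the four grades.
A = [0.22, 0.12, 0.44, 0.22]
R = [[0.1, 0.2, 0.4, 0.3],
     [0.3, 0.4, 0.2, 0.1],
     [0.0, 0.1, 0.3, 0.6],
     [0.2, 0.3, 0.3, 0.2]]
print(aes_score(A, R))  # a score between 1 and 4; higher means a more harmful challenge sample
```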
the above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.
Claims (1)
1. A method for scoring a challenge sample by a deep neural network, comprising the steps of:
step one, calculating the mobility, imperceptibility, attack success rate and label offset of a challenge sample, wherein the challenge sample is an image challenge sample and/or a text challenge sample;
the step of calculating the mobility of the challenge sample comprises:
step 1: M_1, M_2, ..., M_N being a group of neural network models used for evaluation, generating the challenge sample a_c on the target neural network model M_1 by the challenge sample generation algorithm a to be evaluated;
step 2: retraining the target neural network model M_1 and testing it with the challenge sample a_c to obtain the recognition accuracy AR_1;
step 3: training the neural network model M_i, i = 2, 3, ..., N, and testing it with the challenge sample a_c to obtain AR_i, until i > N, where N is the number of test neural network models;
step 4: calculating the mobility Tf of the challenge sample from the recognition accuracies AR_1, AR_2, ..., AR_N;
the calculating of imperceptibility comprises calculating the imperceptibility of an image challenge sample and calculating the imperceptibility of a text challenge sample;
the imperceptibility of the image challenge sample is calculated as follows: the p-norm $L_p$ computes the input-space distance $\|x - x'\|_p$ between the clean image x and the generated image challenge sample x', where $p \in \{0, 1, 2, \infty\}$, the specific distance calculation formula being:

$$\|x - x'\|_p = \Big(\sum_{i} |x_i - x'_i|^p\Big)^{1/p}$$

the imperceptibility of the text challenge sample is calculated as follows: the language model perplexity score is used to judge the size of the disturbance and the authenticity of the semantics, a smaller perplexity meaning a higher imperceptibility of the text challenge sample, the perplexity PP(W) of the text challenge sample being calculated as:

$$PP(W) = \Big(\prod_{i=1}^{N} p(w_i \mid w_1, w_2, \ldots, w_{i-1})\Big)^{-\frac{1}{N}}$$

where $w_i$ is the i-th word of the word sequence $w_1, w_2, \ldots, w_N$, N is the total number of words, and $p(w_i \mid w_1, w_2, \ldots, w_{i-1})$ is the probability distribution that the language model predicts for the i-th word given the first i-1 words of the sentence; the greater the sentence probability, the better the language model and the lower the perplexity;
the calculating of the attack success rate comprises:
for a directional attack, calculating the attack success rate as:

$$ASR = \frac{1}{N}\sum_{i=1}^{N} I\big(f(A(x_i)) = y_i^{*}\big)$$

where A is the challenge sample generation algorithm, f is the classification algorithm of the target model, $y_i^{*}$ is the target class of the directional attack, N is the number of samples, $x_i$ is the i-th original sample, $A(x_i)$ is the challenge sample generated from $x_i$ under algorithm A, and $I(f(A(x_i)) = y_i^{*})$ indicates the cases in which the attack succeeds, reflecting the decrease in the recognition accuracy of the model;
for a non-directional attack, only the cases in which the classification result differs from the original label $y_i$ need to be counted:

$$ASR = \frac{1}{N}\sum_{i=1}^{N} I\big(f(A(x_i)) \neq y_i\big)$$

where N is the number of samples, $x_i$ is the i-th original sample, $A(x_i)$ is the challenge sample generated from $x_i$ under algorithm A, f is the classification algorithm of the target model, and $I(f(A(x_i)) \neq y_i)$ indicates the cases of $f(A(x_i)) \neq y_i$, reflecting the decrease in the recognition accuracy of the model;
the step of calculating the label offset specifically comprises:
step 1: inputting the target neural network model M, the original sample set x_c and the challenge sample generation algorithm a;
step 2: calculating, for the i-th original sample x_c^i, where i = 1, 2, ..., n and n is the number of samples, the category ŷ_i predicted by the target neural network model M, the set of probabilities P_i that M predicts for each class of x_c^i, and the probability p_i with which M predicts the category result ŷ_i, where p_i = max(P_i); if ŷ_i is not the correct label of x_c^i, skipping this sample, returning to step 2 and calculating the next sample;
step 3: generating the challenge sample x_a^i of the original sample x_c^i according to the challenge sample generation algorithm a, and calculating the set of probabilities P_i' that the model M predicts for each class of the challenge sample x_a^i, and the probability p_i' with which M predicts x_a^i as the category result ŷ_i;
step 4: calculating the deviation degree ΔP_i = p_i - p_i' with which the challenge sample x_a^i moves away from the prediction category ŷ_i in model M, and letting i = i + 1 until i > n;
step 5: calculating the label offset LO of the challenge sample from the deviation degrees ΔP_1, ..., ΔP_n;
step two, determining the membership degree subset tables, and obtaining from them the membership degree matrix R of the indexes, wherein the element r_ij of the matrix belongs to the membership vector of mobility when i = 1, of imperceptibility when i = 2, of attack success rate when i = 3, and of label offset when i = 4;
step three, determining the evaluation weight A of each aspect by the analytic hierarchy process, wherein the evaluation weight A comprises the weights of mobility, imperceptibility, attack success rate and label offset, $A = (A_1, A_2, A_3, A_4)$;
step four, obtaining the score index of the challenge sample by fuzzy comprehensive evaluation of the matrix, wherein the fuzzy comprehensive evaluation formula is:

$$B = A \circ R$$

where A is the weight vector of the four indexes of mobility, imperceptibility, attack success rate and label offset, R is the membership matrix obtained from the index calculation results, and B is the final evaluation result vector; since this result is only a fuzzy vector, the harmfulness of the challenge sample cannot be seen intuitively, so the membership vector is defuzzified to obtain the final AES index that scores the challenge sample, the final formula for calculating the AES index being:

$$AES = \frac{\sum_{j} b_j v_j}{\sum_{j} b_j}$$

where $b_j$ is the judgment value and $v_j$ is the evaluation set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210378464.0A CN114821227B (en) | 2022-04-12 | 2022-04-12 | Deep neural network countermeasures sample scoring method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210378464.0A CN114821227B (en) | 2022-04-12 | 2022-04-12 | Deep neural network countermeasures sample scoring method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114821227A (en) | 2022-07-29
CN114821227B (en) | 2024-03-22
Family
ID=82534421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210378464.0A Active CN114821227B (en) | 2022-04-12 | 2022-04-12 | Deep neural network countermeasures sample scoring method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114821227B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111858343A (en) * | 2020-07-23 | 2020-10-30 | 深圳慕智科技有限公司 | Countermeasure sample generation method based on attack capability |
CN112465015A (en) * | 2020-11-26 | 2021-03-09 | 重庆邮电大学 | Adaptive gradient integration adversity attack method oriented to generalized nonnegative matrix factorization algorithm |
CN112464245A (en) * | 2020-11-26 | 2021-03-09 | 重庆邮电大学 | Generalized security evaluation method for deep learning image classification model |
CN112882382A (en) * | 2021-01-11 | 2021-06-01 | 大连理工大学 | Geometric method for evaluating robustness of classified deep neural network |
CN113947016A (en) * | 2021-09-28 | 2022-01-18 | 浙江大学 | Vulnerability assessment method for deep reinforcement learning model in power grid emergency control system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220100867A1 (en) * | 2020-09-30 | 2022-03-31 | International Business Machines Corporation | Automated evaluation of machine learning models |
CN115438337A (en) * | 2022-08-23 | 2022-12-06 | 中国电子科技网络信息安全有限公司 | Method for evaluating safety of deep learning confrontation sample |
Non-Patent Citations (3)
Title |
---|
Adversarial Examples: Attacks and Defenses for Deep Learning; Xiaoyong Yuan et al.; Machine Learning; 2018-07-07; pp. 1-20 *
Analysis and Evaluation of the Harm Degree of Adversarial Examples; Ai Rui; Wanfang Data; 2023-07-06; pp. 1-58 *
Word-Level Adversarial Example Generation Method for Chinese Text Classification; Tong Xin; Wang Luona; Wang Runzheng; Wang Jingya; Netinfo Security; 2020-09-10 (No. 09); pp. 12-16 *
Also Published As
Publication number | Publication date |
---|---|
CN114821227A (en) | 2022-07-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111971698B (en) | Detection of back door using gradient in neural network | |
Zhong et al. | Backdoor embedding in convolutional neural network models via invisible perturbation | |
JP7059368B2 (en) | Protecting the cognitive system from gradient-based attacks through the use of deceptive gradients | |
CN110941794B (en) | Challenge attack defense method based on general inverse disturbance defense matrix | |
Nesti et al. | Detecting adversarial examples by input transformations, defense perturbations, and voting | |
CN115186816B (en) | Back door detection method based on decision shortcut search | |
Bountakas et al. | Defense strategies for adversarial machine learning: A survey | |
US20220374606A1 (en) | Systems and methods for utility-preserving deep reinforcement learning-based text anonymization | |
He et al. | Semi-leak: Membership inference attacks against semi-supervised learning | |
Xiao et al. | Latent imitator: Generating natural individual discriminatory instances for black-box fairness testing | |
CN113988293A (en) | Method for generating network by antagonism of different hierarchy function combination | |
Hu et al. | EAR: an enhanced adversarial regularization approach against membership inference attacks | |
Tuna et al. | Closeness and uncertainty aware adversarial examples detection in adversarial machine learning | |
CN118350436A (en) | Multimode invisible back door attack method, system and medium based on disturbance countermeasure | |
Bharath Kumar et al. | Analysis of the impact of white box adversarial attacks in resnet while classifying retinal fundus images | |
Moskal et al. | Translating intrusion alerts to cyberattack stages using pseudo-active transfer learning (PATRL) | |
CN112613032B (en) | Host intrusion detection method and device based on system call sequence | |
CN114821227B (en) | Deep neural network countermeasures sample scoring method | |
Stock et al. | Lessons learned: How (not) to defend against property inference attacks | |
CN113378985A (en) | Countermeasure sample detection method and device based on layer-by-layer correlation propagation | |
Liu et al. | GAN-based classifier protection against adversarial attacks | |
Diepgrond | Can prediction explanations be trusted? On the evaluation of interpretable machine learning methods | |
Kumar | Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges | |
Gunasekaran | Evasion and Poison attacks on Logistic Regression-based Machine Learning Classification Model | |
Shi et al. | Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |