CN114821227A - Deep neural network confrontation sample scoring method - Google Patents

Deep neural network adversarial sample scoring method

Info

Publication number
CN114821227A
Authority
CN
China
Prior art keywords
sample
neural network
model
confrontation
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210378464.0A
Other languages
Chinese (zh)
Other versions
CN114821227B (en)
Inventor
陈龙
艾锐
欧阳柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210378464.0A priority Critical patent/CN114821227B/en
Publication of CN114821227A publication Critical patent/CN114821227A/en
Application granted granted Critical
Publication of CN114821227B publication Critical patent/CN114821227B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a deep neural network adversarial sample scoring method and provides a new way to evaluate the attack effect of adversarial samples in a black-box setting. The method calculates the transferability, imperceptibility, attack success rate and label offset of the adversarial sample, determines a membership-grade subset table, determines the evaluation weights A of each aspect with the analytic hierarchy process, and obtains an adversarial sample score index through fuzzy comprehensive evaluation of the resulting matrix. The output of the AES index is a score that measures the effectiveness of the adversarial sample attack and can be used to assess the harm an adversarial sample poses to a deep neural network.

Description

Deep neural network adversarial sample scoring method
Technical Field
The invention relates to the field of deep neural networks, and in particular to a method for scoring adversarial samples of a deep neural network.
Background
Government and business organizations worldwide increasingly recognize the economic and strategic importance of artificial intelligence. Deep neural networks are one of the core research fields of artificial intelligence. Deep learning has spread to many branches of artificial intelligence, such as expert systems, cognitive simulation, planning and problem solving, data mining, network information services, image recognition, fault diagnosis, natural language understanding, robotics and gaming. Deep neural network technology has permeated daily life and is gradually being integrated into national infrastructure, so the safety of deep neural network models bears on both civil and national security.
Deep neural network technology has made major breakthroughs in solving complex tasks; however, it (especially artificial neural networks and data-driven artificial intelligence) is very vulnerable to adversarial samples during training or testing, and such samples can easily subvert the original output of a machine learning system. For example, for an image classification model, an adversarial sample can be generated by adding a small perturbation to a given image; the perturbed image looks unchanged to the human eye, yet is misclassified by deep neural network models that are known to perform well. It is therefore necessary to evaluate the attack effect of adversarial samples, the performance of the deep neural network model, its defense capability, and so on, and to find the potential safety hazards that adversarial samples may pose to the model. A defense strategy for improving model security can then be recommended according to the evaluation result of the adversarial samples, thereby improving the security of the deep neural network model.
Existing work evaluates the effect of adversarial samples on a target neural network in a white-box manner, depending on whether the given network can correctly classify the adversarial samples. This approach is unstable and highly random, and in many confidentiality scenarios it becomes impractical because the evaluator cannot access the internal structure of the deep learning model.
A new method for evaluating the effectiveness of adversarial sample attacks is therefore needed. At present there is no systematic, intuitive index that reflects the attack effect of adversarial samples on deep neural networks, and no standard framework for remotely evaluating the harmfulness of adversarial samples in a black-box manner. The invention therefore provides a deep neural network adversarial sample scoring method for evaluating and quantifying the attack effect of adversarial samples.
Disclosure of Invention
To overcome the problems in the prior art, the invention provides a deep neural network adversarial sample scoring method. The method comprises an adversarial sample transferability calculation module, an imperceptibility calculation module, an attack success rate calculation module, a label offset calculation module and an adversarial sample score calculation module. The first four modules respectively calculate the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, and the score calculation module computes the final overall damage-capability score of the adversarial sample, so as to evaluate and quantify the vulnerability of the deep neural network and the harm of the adversarial sample.
To achieve this purpose, the invention adopts the following technical scheme. A deep neural network adversarial sample scoring method comprises the following steps:
Step one, calculate the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, where the adversarial sample is an image adversarial sample and/or a text adversarial sample.
Step two, determine a membership-grade subset table.
Step three, determine the evaluation weight A of each aspect using the analytic hierarchy process.
Step four, obtain the adversarial sample score index by fuzzy comprehensive evaluation of the matrix.
The invention has the following advantages and beneficial effects:
The invention provides an adversarial sample score (AES) index for evaluating the effect of adversarial sample attacks on image and text deep learning networks. Its advantages are as follows:
the AES index provides an evaluation score against the sample attack effect. In the aspect of computer vision, the application scenes of the image countermeasure samples include image classification, face recognition, image semantic segmentation, target detection, automatic driving and the like, and in the aspect of natural language processing, the application scenes of the text countermeasure samples include text classification, machine translation, text summarization and the like. Because the AES index is designed by integrating different factors (e.g., a sample, a countermeasure generation algorithm, a deep neural network model) and aiming at the characteristics of an image sample and a text sample, the AES index can be used for universally evaluating the harmfulness of the image type countermeasure sample and the text type countermeasure sample to the deep neural network, and can also be used as other indexes, such as a reference index for evaluating and measuring the quality of a certain type of sample of a target model and measuring the vulnerability of the model.
First, the AES index can quantify the quality of adversarial samples generated by different adversarial sample generation algorithms and their attack effect on a neural network. Once the characteristics and attack effects of different generation algorithms are known through the AES index, a practitioner can select the most suitable and efficient generation algorithm according to the actual conditions of the neural network model and the samples in an attack-and-defense scenario. For example, in application scenarios such as image classification, face recognition and machine translation, the AES index helps a practitioner attack and test the neural network model more effectively, and a defense strategy for improving model security can be recommended according to the evaluation result of the adversarial samples, thereby improving the security of the deep neural network model.
Second, the AES index can be used as a reference for the quality of the selected training samples. For a target neural network with its current training samples, if the model can correctly classify the original training samples but not their adversarial counterparts, this may indicate that the model needs to be trained further with more, or higher-quality, training samples to become sufficiently robust.
Finally, the AES index can be used to measure and evaluate the security and vulnerability of a model. Traditionally, deep learning researchers and practitioners have focused primarily on the performance of deep neural network models while ignoring security and vulnerability. In fields such as image recognition, object detection, autonomous driving and text classification there are a large number of deep neural network models but no security evaluation scheme for them; with the AES index, candidate models can be tested and their security measured at the same time. This enables a practitioner to determine the best deep neural network model to use, and even to improve a model based on newly discovered vulnerability issues.
Drawings
FIG. 1 is a diagram illustrating the generation of an adversarial sample for a deep learning model according to the present invention;
FIG. 2 is a flow chart of the adversarial sample transferability algorithm according to the present invention;
FIG. 3 is a flow chart of the algorithm for calculating the LO (label offset) index according to the present invention;
FIG. 4 is a flow chart of the adversarial sample score (AES) calculation according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
Referring to FIGS. 1-4, the embodiment of the present invention comprises an adversarial sample transferability calculation module, an imperceptibility calculation module, an attack success rate calculation module, a label offset calculation module and an adversarial sample score calculation module. The first four modules respectively calculate the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, and the score calculation module computes the final overall damage-capability score of the adversarial sample, so as to evaluate and quantify the vulnerability of the deep neural network and the harm of the adversarial sample.
1. Calculating transferability
Transferability, shown in FIG. 2, represents the ability of an adversarial sample generated by one method to remain adversarial under different deep learning models, and thus indicates the applicable scope of the adversarial sample. Adversarial samples possess a certain transferability mainly because deep learning classifiers are discriminant models. When a discriminant model is used to solve a classification problem, the goal is to separate the data as well as possible, so the model maximizes the distance between samples and the decision boundary and expands the region of each class. The advantage is that classification becomes easier; the disadvantage is that each region contains redundant space that does not belong to the class, and adversarial samples live in this space. Transferability means that an adversarial perturbation computed on one model can be transferred to another, independently trained model: since any two models are likely to learn similar non-robust features, a perturbation that manipulates such features applies to both. The transferability of an adversarial sample is calculated as follows:
step 1: m N Is a group of neural network models for evaluation, and generates an algorithm a to a target neural network model M based on a confrontation sample to be evaluated 1 Generating a confrontation sample a c (ii) a E.g. M 1 As a BidLSTM model, M 2 As a Fastext model, M 3 For the Bert model, the confrontation sample generation algorithm a is a WordHandling algorithm, and the WordHandling algorithm is used for generating the confrontation sample a c
Step 2: retraining a target neural network model M 1 Using challenge specimen a c Testing the same to obtain the identification accuracy rate AR 1
And step 3: training neural network model M i (i-2, 3, … N), challenge sample a was used c Testing it to obtain AR i Up to i>N, N represents the number of neural network models tested, and in this embodiment, N is 3;
and 4, step 4: calculating the mobility Tf of the confrontation sample by the formula
Figure BDA0003591180800000031
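As an illustration, the following Python sketch computes a transferability score from the recognition accuracies $AR_1, \ldots, AR_N$. The exact formula is not reproduced here; the sketch assumes Tf is the average relative drop in recognition accuracy across the evaluated models, which is one plausible reading of the description above.

```python
def transferability(ar, baseline_acc=1.0):
    """Assumed transferability score: average relative accuracy drop.

    ar           -- recognition accuracies AR_1..AR_N of the N evaluated models
                    on the adversarial samples (AR_1 is the retrained target model).
    baseline_acc -- accuracy the models reach on clean samples (assumed here).
    The patent's exact formula is not reproduced; this is an assumption.
    """
    drops = [(baseline_acc - a) / baseline_acc for a in ar]
    return sum(drops) / len(drops)

# Example with target model M_1 and two transfer models M_2, M_3 (N = 3)
print(transferability([0.12, 0.55, 0.61]))  # higher value = more transferable
```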
2. Calculating imperceptibility
Referring to FIG. 1, an adversarial sample applies subtle perturbations to the original sample that are difficult for a person to perceive visually, yet cause the deep learning model to misclassify with high confidence. If the perturbation in a generated adversarial sample can be perceived by a person, the attack can be avoided, so the imperceptibility of an adversarial sample also reflects its attack capability. Imperceptibility means that it is difficult to tell, through human senses alone, that a sample is adversarial, which enables disguised attacks. The attack approach of an adversarial attack is to deliberately add an imperceptible, subtle perturbation to the input sample so that the model gives a wrong output with high confidence. Imperceptibility is therefore an important measure of an adversarial sample.
In terms of image samples, the p-norm is most often used to measure the magnitude and number of perturbations added to an image, given the difficulty in defining a metric that measures human visual ability. p norm L p Calculating the distance | | x-x' | of the input space between the clean image x and the generated countermeasure sample x | | p Where p ∈ {0,1,2, ∞ }, the specific distance calculation formula is as follows (where p-norm denotes manhattan distance when p ═ 1; and euclidean distance when p ═ 2):
Figure BDA0003591180800000041
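For illustration, a minimal NumPy sketch of this distance follows; the variable names and image shape are assumptions, not part of the invention.

```python
import numpy as np

def lp_distance(x, x_adv, p):
    """p-norm distance between a clean image x and its adversarial version x_adv."""
    diff = (x.astype(np.float64) - x_adv.astype(np.float64)).ravel()
    if p == 0:
        return np.count_nonzero(diff)          # number of changed pixels
    if np.isinf(p):
        return np.max(np.abs(diff))            # largest single-pixel change
    return np.sum(np.abs(diff) ** p) ** (1.0 / p)

x = np.random.rand(32, 32, 3)
x_adv = x + np.random.uniform(-0.01, 0.01, x.shape)   # small perturbation
print(lp_distance(x, x_adv, 2), lp_distance(x, x_adv, np.inf))
```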
For text samples, the method uses the language-model perplexity score to evaluate the fluency of a sentence and thus judge the size of the perturbation and the semantic authenticity. The basic idea of perplexity is that a better language model assigns higher probability to the sentences in the test set; since the test sentences are normal sentences, a trained model is better the higher the probability it assigns on the test set. The perplexity is calculated as follows:

$$PP(W) = p(w_1 w_2 \cdots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1, w_2, \ldots, w_{i-1})}}$$

where $w_1, w_2, \ldots, w_{i-1}$ is the word sequence preceding $w_i$, $N$ denotes the total number of words, and $p(w_i \mid w_1, w_2, \ldots, w_{i-1})$ is the probability distribution the language model predicts for the $i$-th word given the first $i-1$ words of the sentence. The higher the probability of the sentence, the better the language model, the lower the perplexity, and the higher the imperceptibility of the text adversarial sample.
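As a sketch, perplexity can be computed from the per-token conditional probabilities returned by a language model; the helper below and its inputs are assumptions for illustration only.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sentence from the conditional probabilities
    p(w_i | w_1..w_{i-1}) a language model assigns to each of its N tokens."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# A fluent sentence gets high token probabilities -> low perplexity;
# a clumsily perturbed sentence gets low probabilities -> high perplexity.
print(perplexity([0.20, 0.35, 0.15, 0.25]))   # clean-ish sentence (illustrative numbers)
print(perplexity([0.02, 0.01, 0.05, 0.03]))   # perturbed sentence (illustrative numbers)
```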
3. Calculating the success rate of attack
The attack success rate refers to the percentage of samples that cause the target model to misclassify after an attack. After an adversarial attack the model may output wrong results, and if the attack is effective the classification accuracy of the target model drops sharply, so the attack success rate is an important aspect of measuring the attack effect. For a targeted attack, the attack success rate is calculated as:

$$ASR_{targeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) = y_i^{*}\right)$$

where $a$ denotes the adversarial sample generation algorithm, $f$ the classification algorithm of the target model, $y_i^{*}$ the target class of the targeted attack, $N$ the number of samples, $x_i$ the $i$-th original sample and $a(x_i)$ the adversarial sample generated from $x_i$ under algorithm $a$. For a non-targeted attack it is only necessary to count the cases in which the classification result differs from the original label $y_i$, as follows:

$$ASR_{untargeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) \neq y_i\right)$$
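A small sketch of both success rates follows; the `predict` and `attack` helpers stand in for the target model's classifier $f$ and the generation algorithm $a$ and are assumptions, as is the data-loading interface.

```python
def attack_success_rate(predict, attack, samples, labels, target_label=None):
    """Fraction of samples whose adversarial version is (mis)classified as desired.

    predict(x)   -- classifier f of the target model, returns a label (assumed helper)
    attack(x)    -- adversarial sample generation algorithm a (assumed helper)
    target_label -- if given, computes the targeted ASR; otherwise the untargeted ASR
    """
    hits = 0
    for x, y in zip(samples, labels):
        pred = predict(attack(x))
        if target_label is not None:
            hits += int(pred == target_label)   # targeted: reached the chosen class
        else:
            hits += int(pred != y)              # untargeted: any misclassification
    return hits / len(samples)
```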
4. Calculating the label offset
The label offset refers to how far the model's classification of the adversarial sample has shifted away from the correct label; concretely, it is the difference between the probability the model assigns to the correct label for the adversarial sample and the probability it assigns to the correct label for the original sample, i.e., the confidence offset on the correct label. Recall that the output layer of a deep learning model determines the final class from the probability assigned to each class, so the predicted probability of each class shows how the model classifies the adversarial sample. A robust deep learning model should assign the maximum probability to the correct class. For a given original sample and its adversarial sample, the model produces two probability distributions over the classes, and the difference between the probability of predicting the adversarial sample as the correct class and the probability of predicting the original sample as the correct class reflects how far the prediction has shifted. The probability of the correct class is necessarily the largest over the whole class space for the original sample; for the adversarial sample it may be reduced to different degrees depending on the attack effect, and if it nevertheless remains the maximum probability the final classification result is still correct. The more the prediction for the adversarial sample deviates from the correct class, the more destructive the adversarial sample. As shown in FIG. 3, the label offset is calculated as follows:
Step 1: input the target neural network model $M$, the original sample set $x_c = \{x_c^1, \ldots, x_c^n\}$ and the adversarial sample generation algorithm $a$.
Step 2: compute the prediction of model $M$ for the original sample $x_c^i$: the predicted class $\hat{y}_i$, the set of per-class probabilities $P_i$ that $M$ predicts for $x_c^i$, and the probability $p_i$ that $M$ assigns to the predicted class $\hat{y}_i$, where $p_i = \max(P_i)$. If $\hat{y}_i$ is not the correct label, return to step 2 and calculate the next sample.
Step 3: generate the adversarial sample $x_a^i = a(x_c^i)$ from the original sample $x_c^i$ according to the adversarial sample generation algorithm $a$, then compute the set of per-class probabilities $Q_i$ that $M$ predicts for $x_a^i$ and the probability $q_i$ that $M$ assigns to the class $\hat{y}_i$ for $x_a^i$.
Step 4: compute the offset of the adversarial sample $x_a^i$ from the class $\hat{y}_i$ under model $M$, $LO_i = p_i - q_i$; let $i = i + 1$ until $i > n$.
Step 5: calculate the LO index, $LO = \frac{1}{n}\sum_{i=1}^{n} LO_i$.
The input of the algorithm is the target neural network model $M$, the original sample set $x_c$ and the adversarial sample generation algorithm $a$; the output is the label offset of the adversarial sample.
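As an illustration, a minimal sketch of this LO computation follows; the probability-vector helpers and the averaging in step 5 follow the description above and are assumptions, not the invention's exact notation.

```python
import numpy as np

def label_offset(predict_proba, attack, originals, labels):
    """LO index: mean drop in the correct-class probability caused by the attack.

    predict_proba(x) -- per-class probability vector of the target model M (assumed helper)
    attack(x)        -- adversarial sample generation algorithm a (assumed helper)
    """
    offsets = []
    for x, y in zip(originals, labels):
        p = predict_proba(x)
        if int(np.argmax(p)) != y:       # skip samples M already misclassifies
            continue
        q = predict_proba(attack(x))
        offsets.append(p[y] - q[y])      # confidence offset on the correct label
    return float(np.mean(offsets)) if offsets else 0.0
```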
5. Calculating the adversarial sample score (AES index)
As shown in FIG. 4, the AES index is calculated by fuzzy comprehensive evaluation and is intended to measure the ability of a given adversarial sample to damage the target deep learning model. The AES index is calculated in the following steps:
Step 1: determine the membership subset tables for the transferability, imperceptibility, attack success rate and label offset of the adversarial sample, and from them determine the membership matrix.
Step 2: construct the pairwise comparison matrix, and determine the weights of the transferability, imperceptibility, attack success rate and label offset of the adversarial sample together with the maximum characteristic root.
Step 3: carry out the consistency check.
Step 4: compute the evaluation result matrix by the formula

$$B = A \circ R$$

where $A$ is the weight vector of the four indexes and $R$ is the membership matrix, then defuzzify the matrix to obtain the AES index.
The subset table with membership respectively for the mobility, imperceptibility, attack success rate and label offset of the confrontation sample is as follows:
Table 1. Transferability membership subset table
Table 2. Text adversarial sample imperceptibility membership subset table
Table 3. Image adversarial sample imperceptibility membership subset table
Table 4. Attack success rate membership subset table
Table 5. Label offset membership subset table
The pairwise comparison matrix is then created, the weight vector is calculated and a consistency check is made. The weight of each index is calculated as follows:
Normalize each column of the judgment matrix $B$:

$$\bar{b}_{ij} = \frac{b_{ij}}{\sum_{k=1}^{n} b_{kj}}, \quad i, j = 1, 2, \ldots, n$$

Average the normalized entries over each row:

$$W_i = \frac{1}{n}\sum_{j=1}^{n} \bar{b}_{ij}, \quad i = 1, 2, \ldots, n$$

Then $W = (W_1, W_2, \ldots, W_n)^T$ is the obtained feature (weight) vector.
Calculate the maximum characteristic root $\lambda_{max}$ of the judgment matrix:

$$\lambda_{max} = \frac{1}{n}\sum_{i=1}^{n} \frac{(BW)_i}{W_i}$$

where $W_i$ represents the $i$-th component of the normalized feature vector and $(BW)_i$ represents the $i$-th element of the vector $BW$.
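For illustration, a short NumPy sketch of this AHP weight computation and consistency check follows; the pairwise comparison values below are made-up placeholders, not the values used by the invention, and the RI values are the standard AHP random consistency table.

```python
import numpy as np

def ahp_weights(judgment):
    """Weights, lambda_max and consistency ratio for a pairwise comparison matrix."""
    B = np.asarray(judgment, dtype=float)
    n = B.shape[0]
    norm = B / B.sum(axis=0)            # normalize each column
    W = norm.mean(axis=1)               # average normalized rows -> weight vector
    lam_max = float(np.mean((B @ W) / W))
    CI = (lam_max - n) / (n - 1)        # consistency index
    RI = {3: 0.58, 4: 0.90, 5: 1.12}[n] # random consistency index (standard table)
    return W, lam_max, CI / RI          # CR must be < 0.1

# Placeholder 4x4 comparison of transferability, imperceptibility, ASR, label offset
B = [[1,   2, 1/2, 1],
     [1/2, 1, 1/3, 1/2],
     [2,   3, 1,   2],
     [1,   2, 1/2, 1]]
A, lam, CR = ahp_weights(B)
print(A, lam, CR)
```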
The four indexes of the invention are weighted to obtain the following results:
Table 6. Index weight calculation
According to the above formula, $\lambda_{max}$ is calculated; the consistency index is CI = 0.01598 and the random consistency index is RI = 0.90, so the consistency ratio CR = 0.01796 < 0.1, and the consistency requirement is met.
Using the membership subset tables constructed above for transferability, imperceptibility, attack success rate and label offset, the membership matrix $R$ of the indexes is obtained. The element $r_{ij}$ of $R$ belongs to the membership vector of transferability when $i = 1$, of imperceptibility when $i = 2$, of attack success rate when $i = 3$, and of label offset when $i = 4$.
The weights of the four indexes, transferability, imperceptibility, attack success rate and label offset, form the vector $A = (A_1, A_2, A_3, A_4)$. The fuzzy comprehensive evaluation formula is:

$$B = A \circ R = (b_1, b_2, \ldots, b_m)$$
where $A$ is the weight vector of the four indexes of transferability, imperceptibility, attack success rate and label offset, $R$ is the membership matrix obtained from the index calculation results, and $B$ is the resulting evaluation result matrix. Because the calculation yields only a fuzzy vector, the harmfulness of the adversarial sample cannot be seen directly, so the membership vector must be defuzzified to obtain the final AES index that scores the adversarial sample.
Taking $b_j$ as the weight of the evaluation value $v_j$ in the evaluation set, the weighted average is taken as the judgment result, giving the defuzzified result $b^{*}$ as in the following formula, where $m$ represents the number of elements of the evaluation result matrix $B$:

$$b^{*} = \frac{\sum_{j=1}^{m} b_j \cdot v_j}{\sum_{j=1}^{m} b_j}$$
If the evaluation indexes $b_j$ are normalized, i.e.

$$\sum_{j=1}^{m} b_j = 1,$$

then

$$b^{*} = \sum_{j=1}^{m} b_j \cdot v_j.$$
The evaluation set of the invention is $v = (1, 2, 3, 4)$, so the final formula for calculating the AES index is:

$$AES = \frac{\sum_{j=1}^{4} b_j \cdot v_j}{\sum_{j=1}^{4} b_j}$$
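Putting the fuzzy evaluation and defuzzification together, a compact sketch follows. The weight vector, membership matrix and the plain weighted-average composition (matrix product) used for B = A ∘ R here are illustrative assumptions rather than the invention's exact values and composition operator.

```python
import numpy as np

def aes_score(A, R, v=(1, 2, 3, 4)):
    """Adversarial sample score via fuzzy comprehensive evaluation.

    A -- AHP weights of transferability, imperceptibility, ASR, label offset
    R -- 4 x m membership matrix from the membership subset tables
    v -- evaluation set (grades); (1, 2, 3, 4) as in the description
    The matrix-product composition B = A.R used here is an assumption.
    """
    A = np.asarray(A, dtype=float)
    R = np.asarray(R, dtype=float)
    B = A @ R                                   # evaluation result vector b_1..b_m
    return float(np.dot(B, v) / B.sum())        # defuzzified weighted average

# Illustrative values only
A = [0.22, 0.12, 0.44, 0.22]
R = [[0.1, 0.3, 0.4, 0.2],
     [0.2, 0.4, 0.3, 0.1],
     [0.0, 0.2, 0.3, 0.5],
     [0.1, 0.2, 0.4, 0.3]]
print(aes_score(A, R))
```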
the above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure in any way whatsoever. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A deep neural network adversarial sample scoring method, characterized by comprising the following steps:
step one, calculating the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, wherein the adversarial sample is an image adversarial sample and/or a text adversarial sample;
step two, determining a membership-grade subset table;
step three, determining the evaluation weight A of each aspect by using the analytic hierarchy process;
step four, obtaining the adversarial sample scoring index by fuzzy comprehensive evaluation of the matrix.
2. The deep neural network adversarial sample scoring method according to claim 1, wherein the step of calculating the transferability of the adversarial sample comprises:
step 1: letting $M_1, \ldots, M_N$ be a group of neural network models used for evaluation, and generating an adversarial sample $a_c$ for the target neural network model $M_1$ with the adversarial sample generation algorithm $a$ to be evaluated;
step 2: retraining the target neural network model $M_1$ and testing it with the adversarial sample $a_c$ to obtain the recognition accuracy $AR_1$;
step 3: training the neural network model $M_i$ ($i = 2, 3, \ldots, N$) and testing it with the adversarial sample $a_c$ to obtain $AR_i$, until $i > N$, where $N$ represents the number of neural network models tested;
step 4: calculating the transferability Tf of the adversarial sample from the recognition accuracies $AR_1, \ldots, AR_N$.
3. The deep neural network adversarial sample scoring method according to claim 1, wherein calculating the imperceptibility comprises calculating the imperceptibility of an image adversarial sample and calculating the imperceptibility of a text adversarial sample;
the imperceptibility of an image adversarial sample is calculated as follows: the p-norm $L_p$ computes the input-space distance $\|x - x'\|_p$ between the clean image $x$ and the generated image adversarial sample $x'$, where $p \in \{0, 1, 2, \infty\}$, with the specific distance calculation formula:

$$\|x - x'\|_p = \left(\sum_{i} |x_i - x'_i|^p\right)^{1/p}$$

the imperceptibility of a text adversarial sample is calculated as follows: the language-model perplexity score is used to judge the perturbation size and semantic authenticity of the sentence, where the smaller the perplexity, the higher the imperceptibility of the text adversarial sample, and the perplexity $PP(W)$ of the text adversarial sample is calculated as:

$$PP(W) = p(w_1 w_2 \cdots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1, w_2, \ldots, w_{i-1})}}$$

wherein $w_1, w_2, \ldots, w_{i-1}$ is the word sequence preceding $w_i$, $N$ denotes the total number of words, and $p(w_i \mid w_1, w_2, \ldots, w_{i-1})$ is the probability the language model predicts for the $i$-th word given the first $i-1$ words of the sentence; the higher the probability of the sentence, the better the language model and the lower the perplexity.
4. The deep neural network adversarial sample scoring method according to claim 1, wherein calculating the attack success rate comprises:
for a targeted attack, the attack success rate is calculated as:

$$ASR_{targeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) = y_i^{*}\right)$$

wherein $a$ represents the adversarial sample generation algorithm, $f$ represents the classification algorithm of the target model, $y_i^{*}$ is the target class of the targeted attack, $N$ represents the number of samples, $x_i$ is the $i$-th original sample, $a(x_i)$ represents the adversarial sample generated from $x_i$ under algorithm $a$, and the indicator $I(\cdot)$ equals 1 when the condition holds, reflecting the drop in model recognition accuracy;
for a non-targeted attack, it is only necessary to count the cases in which the classification result differs from the original label $y_i$, as follows:

$$ASR_{untargeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) \neq y_i\right)$$

wherein $N$ denotes the number of samples, $x_i$ is the $i$-th original sample, $a(x_i)$ represents the adversarial sample generated from $x_i$ under algorithm $a$, $f$ denotes the classification algorithm of the target model, and $I(f(a(x_i)) \neq y_i)$ equals 1 when the model misclassifies the adversarial sample, reflecting the drop in model recognition accuracy.
5. The deep neural network adversarial sample scoring method according to claim 1, wherein calculating the label offset specifically comprises:
step 1: inputting the target neural network model $M$, the original sample set $x_c = \{x_c^1, \ldots, x_c^n\}$, where $n$ represents the number of samples, and the adversarial sample generation algorithm $a$;
step 2: computing the prediction of the target neural network model $M$ for the original sample $x_c^i$: the predicted class $\hat{y}_i$, the set of per-class probabilities $P_i$ predicted by $M$ for $x_c^i$, and the probability $p_i$ that $M$ assigns to the predicted class $\hat{y}_i$, where $p_i = \max(P_i)$; if $\hat{y}_i$ is not the correct label, returning to step 2 and calculating the next sample;
step 3: generating the adversarial sample $x_a^i = a(x_c^i)$ from the original sample $x_c^i$ according to the adversarial sample generation algorithm $a$, and computing the set of per-class probabilities $Q_i$ predicted by $M$ for $x_a^i$ and the probability $q_i$ that $M$ assigns to the class $\hat{y}_i$ for $x_a^i$;
step 4: computing the offset of the adversarial sample $x_a^i$ from the class $\hat{y}_i$ under model $M$ as $LO_i = p_i - q_i$, and letting $i = i + 1$ until $i > n$;
step 5: calculating the label offset of the adversarial sample as $LO = \frac{1}{n}\sum_{i=1}^{n} LO_i$.
6. The deep neural network adversarial sample scoring method according to claim 1, wherein the membership-grade subset tables of transferability, imperceptibility, attack success rate and label offset are used to obtain the membership matrix $R$ of the indexes, in which the element $r_{ij}$ belongs to the membership vector of transferability when $i = 1$, of imperceptibility when $i = 2$, of attack success rate when $i = 3$, and of label offset when $i = 4$.
7. The deep neural network adversarial sample scoring method according to any one of claims 1 to 6, wherein the evaluation weight A comprises the weights of transferability, imperceptibility, attack success rate and label offset, $A = (A_1, A_2, A_3, A_4)$.
8. The deep neural network adversarial sample scoring method according to claim 7, wherein step four specifically comprises the following fuzzy comprehensive evaluation formula:

$$B = A \circ R = (b_1, b_2, \ldots, b_m)$$

wherein $A$ is the weight vector of the four indexes of transferability, imperceptibility, attack success rate and label offset, $R$ is the membership matrix obtained from the index calculation results, and $B$ is the resulting evaluation result matrix; because the calculation yields only a fuzzy vector, the harmfulness of the adversarial sample cannot be seen directly, so the membership vector is defuzzified to obtain the final AES index for scoring the adversarial sample, the final AES index being calculated as:

$$AES = \frac{\sum_{j=1}^{m} b_j \cdot v_j}{\sum_{j=1}^{m} b_j}$$

wherein $b_j$ is the weight and $v_j$ is the evaluation set.
CN202210378464.0A 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method Active CN114821227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210378464.0A CN114821227B (en) 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210378464.0A CN114821227B (en) 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method

Publications (2)

Publication Number Publication Date
CN114821227A true CN114821227A (en) 2022-07-29
CN114821227B CN114821227B (en) 2024-03-22

Family

ID=82534421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210378464.0A Active CN114821227B (en) 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method

Country Status (1)

Country Link
CN (1) CN114821227B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858343A (en) * 2020-07-23 2020-10-30 深圳慕智科技有限公司 Countermeasure sample generation method based on attack capability
US20220100867A1 (en) * 2020-09-30 2022-03-31 International Business Machines Corporation Automated evaluation of machine learning models
CN112465015A (en) * 2020-11-26 2021-03-09 重庆邮电大学 Adaptive gradient integration adversity attack method oriented to generalized nonnegative matrix factorization algorithm
CN112464245A (en) * 2020-11-26 2021-03-09 重庆邮电大学 Generalized security evaluation method for deep learning image classification model
CN112882382A (en) * 2021-01-11 2021-06-01 大连理工大学 Geometric method for evaluating robustness of classified deep neural network
CN113947016A (en) * 2021-09-28 2022-01-18 浙江大学 Vulnerability assessment method for deep reinforcement learning model in power grid emergency control system
CN115438337A (en) * 2022-08-23 2022-12-06 中国电子科技网络信息安全有限公司 Method for evaluating safety of deep learning confrontation sample

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOYONG YUAN et al.: "Adversarial Examples: Attacks and Defenses for Deep Learning", Machine Learning, 7 July 2018 (2018-07-07), pages 1-20 *
仝鑫; 王罗娜; 王润正; 王靖亚: "Word-level adversarial example generation method for Chinese text classification", Netinfo Security (信息网络安全), no. 09, 10 September 2020 (2020-09-10), pages 12-16 *
艾锐: "Analysis and evaluation of the harmfulness of adversarial examples", Wanfang Data (万方数据), 6 July 2023 (2023-07-06), pages 1-58 *

Also Published As

Publication number Publication date
CN114821227B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
US11132444B2 (en) Using gradients to detect backdoors in neural networks
Lee et al. Fair selective classification via sufficiency
Hu et al. EAR: an enhanced adversarial regularization approach against membership inference attacks
Tuna et al. Closeness and uncertainty aware adversarial examples detection in adversarial machine learning
Tsiligkaridis Failure prediction by confidence estimation of uncertainty-aware Dirichlet networks
Xiao et al. Latent imitator: Generating natural individual discriminatory instances for black-box fairness testing
CN112613032B (en) Host intrusion detection method and device based on system call sequence
CN116432184A (en) Malicious software detection method based on semantic analysis and bidirectional coding characterization
Bharath Kumar et al. Analysis of the impact of white box adversarial attacks in resnet while classifying retinal fundus images
CN111639688B (en) Local interpretation method of Internet of things intelligent model based on linear kernel SVM
CN114821227A (en) Deep neural network confrontation sample scoring method
CN115438337A (en) Method for evaluating safety of deep learning confrontation sample
Sha et al. Rationalizing predictions by adversarial information calibration
Dong et al. Towards Intrinsic Adversarial Robustness Through Probabilistic Training
Dao et al. Demystifying deep neural networks through interpretation: A survey
Huang et al. Focus-Shifting Attack: An Adversarial Attack That Retains Saliency Map Information and Manipulates Model Explanations
Vij et al. Vizai: Selecting accurate visualizations of numerical data
Abdukhamidov et al. Single-Class Target-Specific Attack against Interpretable Deep Learning Systems
Stock et al. Lessons learned: How (not) to defend against property inference attacks
Ingle et al. Enhancing Model Robustness and Accuracy Against Adversarial Attacks via Adversarial Input Training.
Xie Towards Interpretable and Reliable Deep Neural Networks for Visual Intelligence
Shi et al. Enhancing IoT Flow Anomaly Detection with Differential Optimal Feature Subspace
Wang Towards the Robustness of Deep Learning Systems Against Adversarial Examples in Sequential Data
Akhtom et al. Enhancing Trustworthy Deep Learning for Image Classification against Evasion Attacks: A systematic literature review
Wu et al. Multi-scale Features Destructive Universal Adversarial Perturbations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant