CN114821227A - Deep neural network confrontation sample scoring method - Google Patents

Deep neural network adversarial sample scoring method

Info

Publication number
CN114821227A
Authority
CN
China
Prior art keywords
sample
neural network
model
confrontation
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210378464.0A
Other languages
Chinese (zh)
Other versions
CN114821227B (en)
Inventor
陈龙
艾锐
欧阳柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210378464.0A priority Critical patent/CN114821227B/en
Publication of CN114821227A publication Critical patent/CN114821227A/en
Application granted granted Critical
Publication of CN114821227B publication Critical patent/CN114821227B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a deep neural network adversarial sample scoring method and provides a new way to evaluate the attack effect of adversarial samples in a black-box setting. The method calculates the transferability, imperceptibility, attack success rate and label offset of the adversarial sample, determines a membership-grade subset table, determines the evaluation weights A of each aspect with the analytic hierarchy process, and obtains an adversarial sample score index through fuzzy comprehensive evaluation of the resulting matrix. The output of the AES index is a score that measures the effectiveness of the adversarial sample attack and can be used to assess the harm an adversarial sample poses to a deep neural network.

Description

Deep neural network adversarial sample scoring method
Technical Field
The invention relates to the field of deep neural networks, and in particular to a method for scoring adversarial samples of a deep neural network.
Background
Government and business organizations worldwide increasingly recognize the economic and strategic importance of artificial intelligence. Deep neural networks are one of the core research fields of artificial intelligence. Deep learning has spread to many branches of artificial intelligence, such as expert systems, cognitive simulation, planning and problem solving, data mining, network information services, image recognition, fault diagnosis, natural language understanding, robotics and gaming. Deep neural network technology has permeated daily life and is gradually being integrated into national infrastructure, so the safety of deep neural network models bears on both civil and national security.
Deep neural network technology has made major breakthroughs in solving complex tasks; however, it (especially artificial neural networks and data-driven artificial intelligence) is very vulnerable to adversarial samples during training or testing, and such samples can easily subvert the original output of a machine learning system. For example, for an image classification model, an adversarial sample can be generated by adding a small perturbation to a given image; the perturbed image looks unchanged to the human eye, yet is misclassified by deep neural network models that are known to perform well. It is therefore necessary to evaluate the attack effect of adversarial samples, the performance of the deep neural network model, its defense capability, and so on, and to find the potential safety hazards that adversarial samples may pose to the model. A defense strategy for improving model security can then be recommended according to the evaluation result of the adversarial samples, thereby improving the security of the deep neural network model.
Existing work evaluates the effect of adversarial samples on a target neural network in a white-box manner, depending on whether the given network can correctly classify the adversarial samples. This approach is unstable and highly random, and in many confidentiality scenarios it becomes impractical because the evaluator cannot access the internal structure of the deep learning model.
A new method for evaluating the effectiveness of adversarial sample attacks is therefore needed. At present there is no systematic, intuitive index that reflects the attack effect of adversarial samples on deep neural networks, and no standard framework for remotely evaluating the harmfulness of adversarial samples in a black-box manner. The invention therefore provides a deep neural network adversarial sample scoring method for evaluating and quantifying the attack effect of adversarial samples.
Disclosure of Invention
To overcome the problems in the prior art, the invention provides a deep neural network adversarial sample scoring method. The method comprises an adversarial sample transferability calculation module, an imperceptibility calculation module, an attack success rate calculation module, a label offset calculation module and an adversarial sample score calculation module. The first four modules respectively calculate the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, and the score calculation module computes the final overall damage-capability score of the adversarial sample, so as to evaluate and quantify the vulnerability of the deep neural network and the harm of the adversarial sample.
To achieve this purpose, the invention adopts the following technical scheme. A deep neural network adversarial sample scoring method comprises the following steps:
Step one, calculate the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, where the adversarial sample is an image adversarial sample and/or a text adversarial sample.
Step two, determine a membership-grade subset table.
Step three, determine the evaluation weight A of each aspect using the analytic hierarchy process.
Step four, obtain the adversarial sample score index by fuzzy comprehensive evaluation of the matrix.
The invention has the following advantages and beneficial effects:
The invention provides an adversarial sample score (AES) index for evaluating the effect of adversarial sample attacks on image and text deep learning networks. Its advantages are as follows:
the AES index provides an evaluation score against the sample attack effect. In the aspect of computer vision, the application scenes of the image countermeasure samples include image classification, face recognition, image semantic segmentation, target detection, automatic driving and the like, and in the aspect of natural language processing, the application scenes of the text countermeasure samples include text classification, machine translation, text summarization and the like. Because the AES index is designed by integrating different factors (e.g., a sample, a countermeasure generation algorithm, a deep neural network model) and aiming at the characteristics of an image sample and a text sample, the AES index can be used for universally evaluating the harmfulness of the image type countermeasure sample and the text type countermeasure sample to the deep neural network, and can also be used as other indexes, such as a reference index for evaluating and measuring the quality of a certain type of sample of a target model and measuring the vulnerability of the model.
First, the AES index can quantify the quality of adversarial samples generated by different adversarial sample generation algorithms and their attack effect on a neural network. Once the characteristics and attack effects of different generation algorithms are known through the AES index, a practitioner can select the most suitable and efficient generation algorithm according to the actual conditions of the neural network model and the samples in an attack-and-defense scenario. For example, in application scenarios such as image classification, face recognition and machine translation, the AES index helps a practitioner attack and test the neural network model more effectively, and a defense strategy for improving model security can be recommended according to the evaluation result of the adversarial samples, thereby improving the security of the deep neural network model.
Second, the AES index can be used as a reference for the quality of the selected training samples. For a target neural network with its current training samples, if the model can correctly classify the original training samples but not their adversarial counterparts, this may indicate that the model needs to be trained further with more, or higher-quality, training samples to become sufficiently robust.
Finally, the AES index can be used to measure and evaluate the security and vulnerability of a model. Traditionally, deep learning researchers and practitioners have focused primarily on the performance of deep neural network models while ignoring security and vulnerability. In fields such as image recognition, object detection, autonomous driving and text classification there are a large number of deep neural network models but no security evaluation scheme for them; with the AES index, candidate models can be tested and their security measured at the same time. This enables a practitioner to determine the best deep neural network model to use, and even to improve a model based on newly discovered vulnerability issues.
Drawings
FIG. 1 is a diagram illustrating the generation of an adversarial sample for a deep learning model according to the present invention;
FIG. 2 is a flow chart of the adversarial sample transferability algorithm according to the present invention;
FIG. 3 is a flow chart of the algorithm for calculating the LO (label offset) index according to the present invention;
FIG. 4 is a flow chart of the adversarial sample score (AES) calculation according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
Referring to FIGS. 1-4, the embodiment of the present invention comprises an adversarial sample transferability calculation module, an imperceptibility calculation module, an attack success rate calculation module, a label offset calculation module and an adversarial sample score calculation module. The first four modules respectively calculate the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, and the score calculation module computes the final overall damage-capability score of the adversarial sample, so as to evaluate and quantify the vulnerability of the deep neural network and the harm of the adversarial sample.
1. Calculating transferability
Transferability, shown in FIG. 2, represents the ability of an adversarial sample generated by one method to remain adversarial under different deep learning models, and thus indicates the applicable scope of the adversarial sample. Adversarial samples possess a certain transferability mainly because deep learning classifiers are discriminant models. When a discriminant model is used to solve a classification problem, the goal is to separate the data as well as possible, so the model maximizes the distance between samples and the decision boundary and expands the region of each class. The advantage is that classification becomes easier; the disadvantage is that each region contains redundant space that does not belong to the class, and adversarial samples live in this space. Transferability means that an adversarial perturbation computed on one model can be transferred to another, independently trained model: since any two models are likely to learn similar non-robust features, a perturbation that manipulates such features applies to both. The transferability of an adversarial sample is calculated as follows:
step 1: m N Is a group of neural network models for evaluation, and generates an algorithm a to a target neural network model M based on a confrontation sample to be evaluated 1 Generating a confrontation sample a c (ii) a E.g. M 1 As a BidLSTM model, M 2 As a Fastext model, M 3 For the Bert model, the confrontation sample generation algorithm a is a WordHandling algorithm, and the WordHandling algorithm is used for generating the confrontation sample a c
Step 2: retraining a target neural network model M 1 Using challenge specimen a c Testing the same to obtain the identification accuracy rate AR 1
And step 3: training neural network model M i (i-2, 3, … N), challenge sample a was used c Testing it to obtain AR i Up to i>N, N represents the number of neural network models tested, and in this embodiment, N is 3;
and 4, step 4: calculating the mobility Tf of the confrontation sample by the formula
Figure BDA0003591180800000031
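As an illustration, the following Python sketch computes a transferability score from the recognition accuracies $AR_1, \ldots, AR_N$. The exact formula is not reproduced here; the sketch assumes Tf is the average relative drop in recognition accuracy across the evaluated models, which is one plausible reading of the description above.

```python
def transferability(ar, baseline_acc=1.0):
    """Assumed transferability score: average relative accuracy drop.

    ar           -- recognition accuracies AR_1..AR_N of the N evaluated models
                    on the adversarial samples (AR_1 is the retrained target model).
    baseline_acc -- accuracy the models reach on clean samples (assumed here).
    The patent's exact formula is not reproduced; this is an assumption.
    """
    drops = [(baseline_acc - a) / baseline_acc for a in ar]
    return sum(drops) / len(drops)

# Example with target model M_1 and two transfer models M_2, M_3 (N = 3)
print(transferability([0.12, 0.55, 0.61]))  # higher value = more transferable
```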
2. Calculating imperceptibility
Referring to FIG. 1, an adversarial sample applies subtle perturbations to the original sample that are difficult for a person to perceive visually, yet cause the deep learning model to misclassify with high confidence. If the perturbation in a generated adversarial sample can be perceived by a person, the attack can be avoided, so the imperceptibility of an adversarial sample also reflects its attack capability. Imperceptibility means that it is difficult to tell, through human senses alone, that a sample is adversarial, which enables disguised attacks. The attack approach of an adversarial attack is to deliberately add an imperceptible, subtle perturbation to the input sample so that the model gives a wrong output with high confidence. Imperceptibility is therefore an important measure of an adversarial sample.
In terms of image samples, the p-norm is most often used to measure the magnitude and number of perturbations added to an image, given the difficulty in defining a metric that measures human visual ability. p norm L p Calculating the distance | | x-x' | of the input space between the clean image x and the generated countermeasure sample x | | p Where p ∈ {0,1,2, ∞ }, the specific distance calculation formula is as follows (where p-norm denotes manhattan distance when p ═ 1; and euclidean distance when p ═ 2):
Figure BDA0003591180800000041
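For illustration, a minimal NumPy sketch of this distance follows; the variable names and image shape are assumptions, not part of the invention.

```python
import numpy as np

def lp_distance(x, x_adv, p):
    """p-norm distance between a clean image x and its adversarial version x_adv."""
    diff = (x.astype(np.float64) - x_adv.astype(np.float64)).ravel()
    if p == 0:
        return np.count_nonzero(diff)          # number of changed pixels
    if np.isinf(p):
        return np.max(np.abs(diff))            # largest single-pixel change
    return np.sum(np.abs(diff) ** p) ** (1.0 / p)

x = np.random.rand(32, 32, 3)
x_adv = x + np.random.uniform(-0.01, 0.01, x.shape)   # small perturbation
print(lp_distance(x, x_adv, 2), lp_distance(x, x_adv, np.inf))
```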
For text samples, the method uses the language-model perplexity score to evaluate the fluency of a sentence and thus judge the size of the perturbation and the semantic authenticity. The basic idea of perplexity is that a better language model assigns higher probability to the sentences in the test set; since the test sentences are normal sentences, a trained model is better the higher the probability it assigns on the test set. The perplexity is calculated as follows:

$$PP(W) = p(w_1 w_2 \cdots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1, w_2, \ldots, w_{i-1})}}$$

where $w_1, w_2, \ldots, w_{i-1}$ is the word sequence preceding $w_i$, $N$ denotes the total number of words, and $p(w_i \mid w_1, w_2, \ldots, w_{i-1})$ is the probability distribution the language model predicts for the $i$-th word given the first $i-1$ words of the sentence. The higher the probability of the sentence, the better the language model, the lower the perplexity, and the higher the imperceptibility of the text adversarial sample.
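As a sketch, perplexity can be computed from the per-token conditional probabilities returned by a language model; the helper below and its inputs are assumptions for illustration only.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sentence from the conditional probabilities
    p(w_i | w_1..w_{i-1}) a language model assigns to each of its N tokens."""
    n = len(token_probs)
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / n)

# A fluent sentence gets high token probabilities -> low perplexity;
# a clumsily perturbed sentence gets low probabilities -> high perplexity.
print(perplexity([0.20, 0.35, 0.15, 0.25]))   # clean-ish sentence (illustrative numbers)
print(perplexity([0.02, 0.01, 0.05, 0.03]))   # perturbed sentence (illustrative numbers)
```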
3. Calculating the success rate of attack
The attack success rate refers to the percentage of samples that cause the target model to misclassify after an attack. After an adversarial attack the model may output wrong results, and if the attack is effective the classification accuracy of the target model drops sharply, so the attack success rate is an important aspect of measuring the attack effect. For a targeted attack, the attack success rate is calculated as:

$$ASR_{targeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) = y_i^{*}\right)$$

where $a$ denotes the adversarial sample generation algorithm, $f$ the classification algorithm of the target model, $y_i^{*}$ the target class of the targeted attack, $N$ the number of samples, $x_i$ the $i$-th original sample and $a(x_i)$ the adversarial sample generated from $x_i$ under algorithm $a$. For a non-targeted attack it is only necessary to count the cases in which the classification result differs from the original label $y_i$, as follows:

$$ASR_{untargeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) \neq y_i\right)$$
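A small sketch of both success rates follows; the `predict` and `attack` helpers stand in for the target model's classifier $f$ and the generation algorithm $a$ and are assumptions, as is the data-loading interface.

```python
def attack_success_rate(predict, attack, samples, labels, target_label=None):
    """Fraction of samples whose adversarial version is (mis)classified as desired.

    predict(x)   -- classifier f of the target model, returns a label (assumed helper)
    attack(x)    -- adversarial sample generation algorithm a (assumed helper)
    target_label -- if given, computes the targeted ASR; otherwise the untargeted ASR
    """
    hits = 0
    for x, y in zip(samples, labels):
        pred = predict(attack(x))
        if target_label is not None:
            hits += int(pred == target_label)   # targeted: reached the chosen class
        else:
            hits += int(pred != y)              # untargeted: any misclassification
    return hits / len(samples)
```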
4. Calculating the label offset
The label offset refers to how far the model's classification of the adversarial sample has shifted away from the correct label; concretely, it is the difference between the probability the model assigns to the correct label for the adversarial sample and the probability it assigns to the correct label for the original sample, i.e., the confidence offset on the correct label. Recall that the output layer of a deep learning model determines the final class from the probability assigned to each class, so the predicted probability of each class shows how the model classifies the adversarial sample. A robust deep learning model should assign the maximum probability to the correct class. For a given original sample and its adversarial sample, the model produces two probability distributions over the classes, and the difference between the probability of predicting the adversarial sample as the correct class and the probability of predicting the original sample as the correct class reflects how far the prediction has shifted. The probability of the correct class is necessarily the largest over the whole class space for the original sample; for the adversarial sample it may be reduced to different degrees depending on the attack effect, and if it nevertheless remains the maximum probability the final classification result is still correct. The more the prediction for the adversarial sample deviates from the correct class, the more destructive the adversarial sample. As shown in FIG. 3, the label offset is calculated as follows:
Step 1: input the target neural network model $M$, the original sample set $x_c = \{x_c^1, \ldots, x_c^n\}$ and the adversarial sample generation algorithm $a$.
Step 2: compute the prediction of model $M$ for the original sample $x_c^i$: the predicted class $\hat{y}_i$, the set of per-class probabilities $P_i$ that $M$ predicts for $x_c^i$, and the probability $p_i$ that $M$ assigns to the predicted class $\hat{y}_i$, where $p_i = \max(P_i)$. If $\hat{y}_i$ is not the correct label, return to step 2 and calculate the next sample.
Step 3: generate the adversarial sample $x_a^i = a(x_c^i)$ from the original sample $x_c^i$ according to the adversarial sample generation algorithm $a$, then compute the set of per-class probabilities $Q_i$ that $M$ predicts for $x_a^i$ and the probability $q_i$ that $M$ assigns to the class $\hat{y}_i$ for $x_a^i$.
Step 4: compute the offset of the adversarial sample $x_a^i$ from the class $\hat{y}_i$ under model $M$, $LO_i = p_i - q_i$; let $i = i + 1$ until $i > n$.
Step 5: calculate the LO index, $LO = \frac{1}{n}\sum_{i=1}^{n} LO_i$.
The input of the algorithm is the target neural network model $M$, the original sample set $x_c$ and the adversarial sample generation algorithm $a$; the output is the label offset of the adversarial sample.
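As an illustration, a minimal sketch of this LO computation follows; the probability-vector helpers and the averaging in step 5 follow the description above and are assumptions, not the invention's exact notation.

```python
import numpy as np

def label_offset(predict_proba, attack, originals, labels):
    """LO index: mean drop in the correct-class probability caused by the attack.

    predict_proba(x) -- per-class probability vector of the target model M (assumed helper)
    attack(x)        -- adversarial sample generation algorithm a (assumed helper)
    """
    offsets = []
    for x, y in zip(originals, labels):
        p = predict_proba(x)
        if int(np.argmax(p)) != y:       # skip samples M already misclassifies
            continue
        q = predict_proba(attack(x))
        offsets.append(p[y] - q[y])      # confidence offset on the correct label
    return float(np.mean(offsets)) if offsets else 0.0
```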
5. Calculating the adversarial sample score (AES index)
As shown in FIG. 4, the AES index is calculated by fuzzy comprehensive evaluation and is intended to measure the ability of a given adversarial sample to damage the target deep learning model. The AES index is calculated in the following steps:
Step 1: determine the membership subset tables for the transferability, imperceptibility, attack success rate and label offset of the adversarial sample, and from them determine the membership matrix.
Step 2: construct the pairwise comparison matrix, and determine the weights of the transferability, imperceptibility, attack success rate and label offset of the adversarial sample together with the maximum characteristic root.
Step 3: carry out the consistency check.
Step 4: compute the evaluation result matrix by the formula

$$B = A \circ R$$

where $A$ is the weight vector of the four indexes and $R$ is the membership matrix, then defuzzify the matrix to obtain the AES index.
The subset table with membership respectively for the mobility, imperceptibility, attack success rate and label offset of the confrontation sample is as follows:
Table 1. Transferability membership subset table
Table 2. Text adversarial sample imperceptibility membership subset table
Table 3. Image adversarial sample imperceptibility membership subset table
Table 4. Attack success rate membership subset table
Table 5. Label offset membership subset table
The pairwise comparison matrix is then created, the weight vector is calculated and a consistency check is made. The weight of each index is calculated as follows:
Normalize each column of the judgment matrix $B$:

$$\bar{b}_{ij} = \frac{b_{ij}}{\sum_{k=1}^{n} b_{kj}}, \quad i, j = 1, 2, \ldots, n$$

Average the normalized entries over each row:

$$W_i = \frac{1}{n}\sum_{j=1}^{n} \bar{b}_{ij}, \quad i = 1, 2, \ldots, n$$

Then $W = (W_1, W_2, \ldots, W_n)^T$ is the obtained feature (weight) vector.
Calculate the maximum characteristic root $\lambda_{max}$ of the judgment matrix:

$$\lambda_{max} = \frac{1}{n}\sum_{i=1}^{n} \frac{(BW)_i}{W_i}$$

where $W_i$ represents the $i$-th component of the normalized feature vector and $(BW)_i$ represents the $i$-th element of the vector $BW$.
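For illustration, a short NumPy sketch of this AHP weight computation and consistency check follows; the pairwise comparison values below are made-up placeholders, not the values used by the invention, and the RI values are the standard AHP random consistency table.

```python
import numpy as np

def ahp_weights(judgment):
    """Weights, lambda_max and consistency ratio for a pairwise comparison matrix."""
    B = np.asarray(judgment, dtype=float)
    n = B.shape[0]
    norm = B / B.sum(axis=0)            # normalize each column
    W = norm.mean(axis=1)               # average normalized rows -> weight vector
    lam_max = float(np.mean((B @ W) / W))
    CI = (lam_max - n) / (n - 1)        # consistency index
    RI = {3: 0.58, 4: 0.90, 5: 1.12}[n] # random consistency index (standard table)
    return W, lam_max, CI / RI          # CR must be < 0.1

# Placeholder 4x4 comparison of transferability, imperceptibility, ASR, label offset
B = [[1,   2, 1/2, 1],
     [1/2, 1, 1/3, 1/2],
     [2,   3, 1,   2],
     [1,   2, 1/2, 1]]
A, lam, CR = ahp_weights(B)
print(A, lam, CR)
```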
The four indexes of the invention are weighted to obtain the following results:
Table 6. Index weight calculation
According to the above formula, $\lambda_{max}$ is calculated; the consistency index is CI = 0.01598 and the random consistency index is RI = 0.90, so the consistency ratio CR = 0.01796 < 0.1, and the consistency requirement is met.
Using the membership subset tables constructed above for transferability, imperceptibility, attack success rate and label offset, the membership matrix $R$ of the indexes is obtained. The element $r_{ij}$ of $R$ belongs to the membership vector of transferability when $i = 1$, of imperceptibility when $i = 2$, of attack success rate when $i = 3$, and of label offset when $i = 4$.
The weights of the four indexes, transferability, imperceptibility, attack success rate and label offset, form the vector $A = (A_1, A_2, A_3, A_4)$. The fuzzy comprehensive evaluation formula is:

$$B = A \circ R = (b_1, b_2, \ldots, b_m)$$
where $A$ is the weight vector of the four indexes of transferability, imperceptibility, attack success rate and label offset, $R$ is the membership matrix obtained from the index calculation results, and $B$ is the resulting evaluation result matrix. Because the calculation yields only a fuzzy vector, the harmfulness of the adversarial sample cannot be seen directly, so the membership vector must be defuzzified to obtain the final AES index that scores the adversarial sample.
Taking $b_j$ as the weight of the evaluation value $v_j$ in the evaluation set, the weighted average is taken as the judgment result, giving the defuzzified result $b^{*}$ as in the following formula, where $m$ represents the number of elements of the evaluation result matrix $B$:

$$b^{*} = \frac{\sum_{j=1}^{m} b_j \cdot v_j}{\sum_{j=1}^{m} b_j}$$
If the evaluation indexes $b_j$ are normalized, i.e.

$$\sum_{j=1}^{m} b_j = 1,$$

then

$$b^{*} = \sum_{j=1}^{m} b_j \cdot v_j.$$
The evaluation set of the invention is $v = (1, 2, 3, 4)$, so the final formula for calculating the AES index is:

$$AES = \frac{\sum_{j=1}^{4} b_j \cdot v_j}{\sum_{j=1}^{4} b_j}$$
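Putting the fuzzy evaluation and defuzzification together, a compact sketch follows. The weight vector, membership matrix and the plain weighted-average composition (matrix product) used for B = A ∘ R here are illustrative assumptions rather than the invention's exact values and composition operator.

```python
import numpy as np

def aes_score(A, R, v=(1, 2, 3, 4)):
    """Adversarial sample score via fuzzy comprehensive evaluation.

    A -- AHP weights of transferability, imperceptibility, ASR, label offset
    R -- 4 x m membership matrix from the membership subset tables
    v -- evaluation set (grades); (1, 2, 3, 4) as in the description
    The matrix-product composition B = A.R used here is an assumption.
    """
    A = np.asarray(A, dtype=float)
    R = np.asarray(R, dtype=float)
    B = A @ R                                   # evaluation result vector b_1..b_m
    return float(np.dot(B, v) / B.sum())        # defuzzified weighted average

# Illustrative values only
A = [0.22, 0.12, 0.44, 0.22]
R = [[0.1, 0.3, 0.4, 0.2],
     [0.2, 0.4, 0.3, 0.1],
     [0.0, 0.2, 0.3, 0.5],
     [0.1, 0.2, 0.4, 0.3]]
print(aes_score(A, R))
```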
the above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure in any way whatsoever. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (8)

1. A deep neural network adversarial sample scoring method, characterized by comprising the following steps:
step one, calculating the transferability, imperceptibility, attack success rate and label offset of an adversarial sample, wherein the adversarial sample is an image adversarial sample and/or a text adversarial sample;
step two, determining a membership-grade subset table;
step three, determining the evaluation weight A of each aspect by using the analytic hierarchy process;
step four, obtaining the adversarial sample scoring index by fuzzy comprehensive evaluation of the matrix.
2. The deep neural network adversarial sample scoring method according to claim 1, wherein the step of calculating the transferability of the adversarial sample comprises:
step 1: letting $M_1, \ldots, M_N$ be a group of neural network models used for evaluation, and generating an adversarial sample $a_c$ for the target neural network model $M_1$ with the adversarial sample generation algorithm $a$ to be evaluated;
step 2: retraining the target neural network model $M_1$ and testing it with the adversarial sample $a_c$ to obtain the recognition accuracy $AR_1$;
step 3: training the neural network model $M_i$ ($i = 2, 3, \ldots, N$) and testing it with the adversarial sample $a_c$ to obtain $AR_i$, until $i > N$, where $N$ represents the number of neural network models tested;
step 4: calculating the transferability Tf of the adversarial sample from the recognition accuracies $AR_1, \ldots, AR_N$.
3. The deep neural network adversarial sample scoring method according to claim 1, wherein calculating the imperceptibility comprises calculating the imperceptibility of an image adversarial sample and calculating the imperceptibility of a text adversarial sample;
the imperceptibility of an image adversarial sample is calculated as follows: the p-norm $L_p$ computes the input-space distance $\|x - x'\|_p$ between the clean image $x$ and the generated image adversarial sample $x'$, where $p \in \{0, 1, 2, \infty\}$, with the specific distance calculation formula:

$$\|x - x'\|_p = \left(\sum_{i} |x_i - x'_i|^p\right)^{1/p}$$

the imperceptibility of a text adversarial sample is calculated as follows: the language-model perplexity score is used to judge the perturbation size and semantic authenticity of the sentence, where the smaller the perplexity, the higher the imperceptibility of the text adversarial sample, and the perplexity $PP(W)$ of the text adversarial sample is calculated as:

$$PP(W) = p(w_1 w_2 \cdots w_N)^{-\frac{1}{N}} = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1, w_2, \ldots, w_{i-1})}}$$

wherein $w_1, w_2, \ldots, w_{i-1}$ is the word sequence preceding $w_i$, $N$ denotes the total number of words, and $p(w_i \mid w_1, w_2, \ldots, w_{i-1})$ is the probability the language model predicts for the $i$-th word given the first $i-1$ words of the sentence; the higher the probability of the sentence, the better the language model and the lower the perplexity.
4. The deep neural network adversarial sample scoring method according to claim 1, wherein calculating the attack success rate comprises:
for a targeted attack, the attack success rate is calculated as:

$$ASR_{targeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) = y_i^{*}\right)$$

wherein $a$ represents the adversarial sample generation algorithm, $f$ represents the classification algorithm of the target model, $y_i^{*}$ is the target class of the targeted attack, $N$ represents the number of samples, $x_i$ is the $i$-th original sample, $a(x_i)$ represents the adversarial sample generated from $x_i$ under algorithm $a$, and the indicator $I(\cdot)$ equals 1 when the condition holds, reflecting the drop in model recognition accuracy;
for a non-targeted attack, it is only necessary to count the cases in which the classification result differs from the original label $y_i$, as follows:

$$ASR_{untargeted} = \frac{1}{N}\sum_{i=1}^{N} I\!\left(f(a(x_i)) \neq y_i\right)$$

wherein $N$ denotes the number of samples, $x_i$ is the $i$-th original sample, $a(x_i)$ represents the adversarial sample generated from $x_i$ under algorithm $a$, $f$ denotes the classification algorithm of the target model, and $I(f(a(x_i)) \neq y_i)$ equals 1 when the model misclassifies the adversarial sample, reflecting the drop in model recognition accuracy.
5. The deep neural network adversarial sample scoring method according to claim 1, wherein calculating the label offset specifically comprises:
step 1: inputting the target neural network model $M$, the original sample set $x_c = \{x_c^1, \ldots, x_c^n\}$, where $n$ represents the number of samples, and the adversarial sample generation algorithm $a$;
step 2: computing the prediction of the target neural network model $M$ for the original sample $x_c^i$: the predicted class $\hat{y}_i$, the set of per-class probabilities $P_i$ predicted by $M$ for $x_c^i$, and the probability $p_i$ that $M$ assigns to the predicted class $\hat{y}_i$, where $p_i = \max(P_i)$; if $\hat{y}_i$ is not the correct label, returning to step 2 and calculating the next sample;
step 3: generating the adversarial sample $x_a^i = a(x_c^i)$ from the original sample $x_c^i$ according to the adversarial sample generation algorithm $a$, and computing the set of per-class probabilities $Q_i$ predicted by $M$ for $x_a^i$ and the probability $q_i$ that $M$ assigns to the class $\hat{y}_i$ for $x_a^i$;
step 4: computing the offset of the adversarial sample $x_a^i$ from the class $\hat{y}_i$ under model $M$ as $LO_i = p_i - q_i$, and letting $i = i + 1$ until $i > n$;
step 5: calculating the label offset of the adversarial sample as $LO = \frac{1}{n}\sum_{i=1}^{n} LO_i$.
6. The deep neural network adversarial sample scoring method according to claim 1, wherein the membership-grade subset tables of transferability, imperceptibility, attack success rate and label offset are used to obtain the membership matrix $R$ of the indexes, in which the element $r_{ij}$ belongs to the membership vector of transferability when $i = 1$, of imperceptibility when $i = 2$, of attack success rate when $i = 3$, and of label offset when $i = 4$.
7. The deep neural network adversarial sample scoring method according to any one of claims 1 to 6, wherein the evaluation weight A comprises the weights of transferability, imperceptibility, attack success rate and label offset, $A = (A_1, A_2, A_3, A_4)$.
8. The deep neural network adversarial sample scoring method according to claim 7, wherein step four specifically comprises the following fuzzy comprehensive evaluation formula:

$$B = A \circ R = (b_1, b_2, \ldots, b_m)$$

wherein $A$ is the weight vector of the four indexes of transferability, imperceptibility, attack success rate and label offset, $R$ is the membership matrix obtained from the index calculation results, and $B$ is the resulting evaluation result matrix; because the calculation yields only a fuzzy vector, the harmfulness of the adversarial sample cannot be seen directly, so the membership vector is defuzzified to obtain the final AES index for scoring the adversarial sample, the final AES index being calculated as:

$$AES = \frac{\sum_{j=1}^{m} b_j \cdot v_j}{\sum_{j=1}^{m} b_j}$$

wherein $b_j$ is the weight and $v_j$ is the evaluation set.
CN202210378464.0A 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method Active CN114821227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210378464.0A CN114821227B (en) 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210378464.0A CN114821227B (en) 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method

Publications (2)

Publication Number Publication Date
CN114821227A true CN114821227A (en) 2022-07-29
CN114821227B CN114821227B (en) 2024-03-22

Family

ID=82534421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210378464.0A Active CN114821227B (en) 2022-04-12 2022-04-12 Deep neural network countermeasures sample scoring method

Country Status (1)

Country Link
CN (1) CN114821227B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858343A (en) * 2020-07-23 2020-10-30 深圳慕智科技有限公司 Countermeasure sample generation method based on attack capability
US20220100867A1 (en) * 2020-09-30 2022-03-31 International Business Machines Corporation Automated evaluation of machine learning models
CN112465015A (en) * 2020-11-26 2021-03-09 重庆邮电大学 Adaptive gradient integration adversity attack method oriented to generalized nonnegative matrix factorization algorithm
CN112464245A (en) * 2020-11-26 2021-03-09 重庆邮电大学 Generalized security evaluation method for deep learning image classification model
CN112882382A (en) * 2021-01-11 2021-06-01 大连理工大学 Geometric method for evaluating robustness of classified deep neural network
CN113947016A (en) * 2021-09-28 2022-01-18 浙江大学 Vulnerability assessment method for deep reinforcement learning model in power grid emergency control system
CN115438337A (en) * 2022-08-23 2022-12-06 中国电子科技网络信息安全有限公司 Method for evaluating safety of deep learning confrontation sample

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOYONG YUAN et al.: "Adversarial Examples: Attacks and Defenses for Deep Learning", Machine Learning, 7 July 2018 (2018-07-07), pages 1-20 *
仝鑫; 王罗娜; 王润正; 王靖亚: "Word-level adversarial example generation method for Chinese text classification", Netinfo Security (信息网络安全), no. 09, 10 September 2020 (2020-09-10), pages 12-16 *
艾锐: "Analysis and evaluation of the harmfulness of adversarial examples", Wanfang Data (万方数据), 6 July 2023 (2023-07-06), pages 1-58 *

Also Published As

Publication number Publication date
CN114821227B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
US11132444B2 (en) Using gradients to detect backdoors in neural networks
Lee et al. Fair selective classification via sufficiency
Hu et al. EAR: an enhanced adversarial regularization approach against membership inference attacks
Tuna et al. Closeness and uncertainty aware adversarial examples detection in adversarial machine learning
Tsiligkaridis Failure prediction by confidence estimation of uncertainty-aware Dirichlet networks
Xiao et al. Latent imitator: Generating natural individual discriminatory instances for black-box fairness testing
CN112613032B (en) Host intrusion detection method and device based on system call sequence
CN116432184A (en) Malicious software detection method based on semantic analysis and bidirectional coding characterization
Bharath Kumar et al. Analysis of the impact of white box adversarial attacks in resnet while classifying retinal fundus images
CN111639688B (en) Local interpretation method of Internet of things intelligent model based on linear kernel SVM
CN114821227A (en) Deep neural network confrontation sample scoring method
CN115438337A (en) Method for evaluating safety of deep learning confrontation sample
Sha et al. Rationalizing predictions by adversarial information calibration
Dong et al. Towards Intrinsic Adversarial Robustness Through Probabilistic Training
Dao et al. Demystifying deep neural networks through interpretation: A survey
Huang et al. Focus-Shifting Attack: An Adversarial Attack That Retains Saliency Map Information and Manipulates Model Explanations
Vij et al. Vizai: Selecting accurate visualizations of numerical data
Abdukhamidov et al. Single-Class Target-Specific Attack against Interpretable Deep Learning Systems
Stock et al. Lessons learned: How (not) to defend against property inference attacks
Ingle et al. Enhancing Model Robustness and Accuracy Against Adversarial Attacks via Adversarial Input Training.
Xie Towards Interpretable and Reliable Deep Neural Networks for Visual Intelligence
Shi et al. Enhancing IoT Flow Anomaly Detection with Differential Optimal Feature Subspace
Wang Towards the Robustness of Deep Learning Systems Against Adversarial Examples in Sequential Data
Akhtom et al. Enhancing Trustworthy Deep Learning for Image Classification against Evasion Attacks: A systematic literature review
Wu et al. Multi-scale Features Destructive Universal Adversarial Perturbations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant