CN115439880A - Deep neural network robustness evaluation method and tuning method

Info

Publication number: CN115439880A
Application number: CN202210894180.7A
Authority: CN (China)
Legal status: Pending
Prior art keywords: sample, target picture, perturbation, attack, neural network
Other languages: Chinese (zh)
Inventors: 范洺源, 周文猛
Current Assignee: Alibaba China Co Ltd
Original Assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd; priority to CN202210894180.7A; publication of CN115439880A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/764: Arrangements using classification, e.g. of video objects
    • G06V 10/765: Arrangements using rules for classification or partitioning the feature space
    • G06V 10/82: Arrangements using neural networks

Abstract

Disclosed is an adversarial-attack evaluation method, including: determining an attack success rate of an adversarial attack against a target deep neural network; adding an initial perturbation to an original sample of a target picture to obtain an initial adversarial sample of the target picture; obtaining an initial value of the distance between the original sample and the adversarial sample; adjusting the initial perturbation to obtain a minimal perturbation that satisfies the attack success rate and minimizes the distance; and evaluating the robustness of the deep neural network against adversarial attacks based on the minimal perturbation. By searching for the minimal perturbation under a given attack success rate, the method avoids the prior-art need to set a different perturbation budget for each data set, and by computing a single minimal perturbation per picture sample it greatly reduces the computational cost of DNN robustness evaluation. In addition, through a reasonable choice of evaluation indices, the method can evaluate the robustness of the target DNN against adversarial attacks more accurately.

Description

Deep neural network robustness evaluation method and tuning method
Technical Field
The disclosure relates to the field of deep learning, and in particular relates to a deep neural network robustness assessment method and a tuning method.
Background
In recent years, deep neural networks (DNNs) have made significant progress and become a core technology in many industries. Many studies have shown, however, that deep neural networks are vulnerable to attack. In particular, an adversarial attack can fool a deep neural network by feeding it an adversarial sample, i.e., a normal sample to which slight, carefully designed hostile noise has been added, causing the network to produce incorrect predictions. This vulnerability has become a major obstacle to deploying deep neural networks in scenarios with high safety requirements, such as autonomous driving and medical treatment. Since adversarial attacks can expose the blind spots of a deep neural network, developing an effective and efficient method for evaluating DNN robustness against adversarial attacks has become a basic task in the field of deep learning security.
Disclosure of Invention
One technical problem to be solved by the present disclosure is to provide an adversarial-attack evaluation method capable of effectively and efficiently evaluating the true robustness of a DNN against adversarial attacks. By searching for the minimal perturbation under a given Attack Success Rate (ASR), the method avoids the prior-art need to set a different perturbation budget for each data set, and by computing a single minimal perturbation value per picture sample it greatly reduces the computational cost of DNN robustness evaluation. Furthermore, through a reasonable choice of evaluation indices, the robustness of the DNN against adversarial attacks can be evaluated more accurately.
According to a first aspect of the present disclosure, there is provided a deep neural network robustness evaluation method, including: determining an attack success rate of an adversarial attack against a target deep neural network; adding an initial perturbation to a target picture original sample to obtain an initial target picture adversarial sample; acquiring an initial value of the distance between the target picture original sample and the target picture adversarial sample; adjusting the initial perturbation to obtain a minimal perturbation that satisfies the attack success rate and minimizes the distance; and evaluating the robustness of the deep neural network against adversarial attacks based on the minimal perturbation.
Optionally, the distance comprises a human-perceptible distance between the target picture original sample and the target picture adversarial sample, the target picture original sample and the target picture adversarial sample being mapped to a color-difference space to compute the human-perceptible distance.
Optionally, the method further comprises characterizing the attack success rate with an attack effectiveness index, wherein the attack effectiveness index comprises at least one of: an attack effectiveness index that penalizes the output of the target deep neural network based on the true label of the target picture original sample; an attack effectiveness index that implicitly adjusts the step size during the minimal perturbation search; and an attack effectiveness index that fuses classification information of the target deep neural network.
Optionally, adding an initial perturbation to the target picture original sample to obtain an initial target picture adversarial sample comprises: adding a first initial perturbation to the target picture original sample to obtain the initial target picture adversarial sample, wherein the first initial perturbation causes the initial target picture adversarial sample to be misclassified by the target neural network; and adjusting the initial perturbation to obtain a minimal perturbation that satisfies the attack success rate and minimizes the distance comprises: finding the minimal perturbation through iterative calculation, wherein each iteration moves the perturbation in the direction in which the human-perceptible distance decreases the most.
Optionally, adding an initial perturbation to the target picture original sample to obtain an initial target picture adversarial sample comprises: adding a second initial perturbation to the target picture original sample to obtain the initial target picture adversarial sample, wherein the second initial perturbation is an all-zero vector; and the minimal perturbation is found through iterative calculation, wherein each iteration moves the perturbation in the direction that causes the target picture adversarial sample to be misclassified by the target neural network.
Optionally, finding the minimal perturbation through iterative calculation further comprises: reducing the search step size for the minimal perturbation in any iteration round in which the classification of the target picture adversarial sample by the target neural network changes.
Optionally, adjusting the initial perturbation to obtain a minimal perturbation that satisfies the attack success rate and minimizes the distance comprises: constructing an objective function that simultaneously characterizes the classification result of the perturbed target picture adversarial sample by the target neural network and minimizes the distance; and searching for the minimal perturbation based on gradient optimization of the objective function.
Optionally, the attack success rate characterizes the probability that a target picture data set comprising n target picture samples is misclassified by the target deep neural network after the minimal perturbations are added, wherein adjusting the initial perturbation to obtain the minimal perturbation that satisfies the attack success rate and minimizes the distance comprises: solving the minimal perturbation once for each target picture sample in the target picture data set; and evaluating the robustness of the deep neural network against adversarial attacks based on the minimal perturbation comprises: calculating, for each target picture sample, the human-perceptible distance between its original sample and the attack sample to which the corresponding minimal perturbation has been added; and evaluating the robustness of the deep neural network against adversarial attacks according to the n human-perceptible distances.
According to a second aspect of the present disclosure, there is provided a deep neural network tuning method, including: constructing a target picture adversarial sample using the minimal perturbation obtained by the method of the first aspect; iteratively tuning the deep neural network using the target picture adversarial sample and its original label; and, based on the result of the iterative tuning, obtaining an optimized deep neural network that classifies the target picture adversarial sample into the class corresponding to the original label.
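By way of illustration only, the following Python sketch (using PyTorch) shows one possible form of such a tuning loop; the helper find_min_perturbation stands in for the minimal-perturbation search of the first aspect, and all names and hyperparameters are illustrative assumptions rather than part of the claimed method.

    import torch

    def tune_with_adversarial_samples(model, loader, find_min_perturbation, epochs=1, lr=1e-4):
        # Fine-tune the network so that adversarial samples built with the minimal
        # perturbations are classified back to their original labels.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                delta = find_min_perturbation(model, x, y)  # minimal perturbation from the first aspect
                opt.zero_grad()
                loss = loss_fn(model(x + delta), y)         # adversarial sample + original label
                loss.backward()
                opt.step()
        return model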
According to a third aspect of the present disclosure, there is provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described in the first aspect above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method as described in the first aspect above.
Therefore, the present disclosure provides a DNN robustness evaluation method against adversarial attacks that can effectively and efficiently evaluate the true robustness of a DNN. By searching for the minimal perturbation under a given Attack Success Rate (ASR), the method avoids the prior-art need to set a different perturbation budget for each data set, and by computing a single minimal perturbation value per picture sample it greatly reduces the computational cost of DNN robustness evaluation. Furthermore, through a reasonable design of the human-perceptibility index and the attack effectiveness index, DNN robustness can be evaluated more accurately.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 shows an example of an adversarial sample constructed for an adversarial attack.
FIG. 2 shows a schematic flow diagram of a deep neural network robustness assessment method according to one embodiment of the present invention.
Fig. 3 shows a schematic flow chart of a deep neural network tuning method according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a computing device that can be used to implement the deep neural network robustness assessment method according to an embodiment of the present invention.
Fig. 5 shows an example of adversarial sample construction according to the present invention and according to the evaluation method of the prior art.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
A deep neural network (DNN) is a mathematical computation model with strong data-fitting capability, and is widely used in fields such as computer vision and natural language processing. An adversarial attack is an attack specific to neural networks: it can fool a deep neural network with adversarial samples so that the network produces incorrect predictions, for example causing the classifier attached to the network to output a wrong class. Here, an "adversarial sample" (also called an adversarial example) is a sample made by adding slight, carefully designed hostile noise (hereinafter also called an adversarial perturbation) to a normal sample. An adversarial sample often looks indistinguishable from the original sample to the human eye, yet it causes the neural network to misclassify.
Fig. 1 shows an example of constructing an adversarial sample for an adversarial attack. As shown in the figure, when the original panda picture on the left is fed into a trained picture classification model, the model predicts that the picture contains a panda with roughly half confidence, indicating that the model is able to classify the panda picture correctly. After an attacker adds a small amount of specially crafted noise to the picture to generate an adversarial sample, the picture on the right is obtained. The right picture (i.e., the adversarial sample obtained by adding an adversarial perturbation, for example a small perturbation not exceeding the perturbation budget ε, to the original picture) still looks like a panda to the human eye, and a human cannot tell the left and right pictures apart, yet the neural network may produce an unexpected output for the adversarial sample, for example predicting with very high confidence that the picture contains a gibbon.
The vulnerability of deep neural networks to adversarial attacks has become a major obstacle to their deployment in scenarios with high security requirements such as autonomous driving and medical care. Since evaluating the robustness of a DNN against adversarial attacks can help further improve the security of the network, developing an efficient and effective DNN robustness evaluation scheme has become an important task in the field of deep learning security.
To facilitate understanding of the principles of the present invention, the conventional adversarial sample generation method and the corresponding DNN robustness evaluation method in the deep learning security field are described first.
For convenience of description and comparison, the existing evaluation approach may be referred to as the first type of evaluation method (hereinafter also the class I evaluation method, or method I). The class I evaluation method can be roughly summarized in two steps: 1) under a given distance constraint (a human perception metric), generate threatening adversarial perturbations as far as possible by maximizing an attack effectiveness metric; 2) use these perturbations to generate adversarial samples and estimate the corresponding probability that the DNN is fooled, called the Attack Success Rate (ASR). In short, class I evaluation methods report the probability that a DNN is fooled under a given constraint.
The existing class I evaluation method suffers from both low efficiency and low effectiveness.
First, in terms of efficiency, the class I evaluation method requires an appropriate constraint magnitude to be set in advance. However, it is difficult to determine an appropriate constraint magnitude beforehand, because differences between data sets mean that a constraint magnitude suitable for one reference data set is not transferable to another. The evaluator has to select the constraint magnitude for a given data set manually, usually by experience. Such operations are cumbersome and computationally expensive, since obtaining threatening adversarial samples is costly. Moreover, most evaluators are not experts and cannot adjust the constraint magnitude ε skillfully and effectively, which makes the overhead even harder to bear.
Second, in terms of effectiveness, class I evaluation methods typically employ adversarial attacks that follow a norm-based constraint (e.g., an infinity-norm constraint) to make the adversarial perturbation hard for humans to perceive. Because norm-based metrics cannot cover all humanly imperceptible adversarial perturbations, many of the more threatening adversarial perturbations lie outside the constraint, so class I evaluation methods typically overestimate the robustness of DNNs (they are unable to find the more threatening adversarial samples), which makes such methods less effective.
Therefore, the present invention provides a new method for generating adversarial samples and evaluating DNN robustness. For convenience, the evaluation method proposed by the present invention is referred to as the second type of evaluation method (hereinafter also the class II evaluation method, or method II).
Unlike class I evaluation methods, which evaluate the robustness of a DNN by reporting the ASR under a specified perturbation constraint, the class II evaluation method of the present invention evaluates robustness by estimating at least how much perturbation must be applied to reach a given ASR.
From the viewpoint of data distribution, there is a large difference between the distribution of adversarial samples and the distribution of natural samples, i.e., a distribution-shift phenomenon. The size of the perturbation constraint can be regarded as a domain constraint, namely a tolerance for distribution shift, while attack effectiveness measures how much the model's performance degrades when the shift occurs. It is therefore reasonable to constrain one of the two and adopt the maximum deviation of the other as the robustness evaluation index.
In addition, from an overhead perspective, to achieve a specified ASR it is necessary to find, among all N samples, the most vulnerable combination of size N·ASR, i.e., a combination such that no other combination of perturbations leads to complete misclassification with a smaller minimal perturbation. Here the total number of samples is N; if, for example, the ASR equals 25% (meaning that N/4 adversarial samples must be misclassified by the deep neural network after perturbations are added), the weakest combination of N/4 (i.e., N·ASR) samples must be found among the N samples.
However, finding the weakest combination requires enumerating all combinations of the same size, which is an NP-hard problem. As will be shown below, the problem can be reduced to independently finding the smallest threatening adversarial perturbation for each sample, so that it can be solved in polynomial time; the overhead is therefore much lower than that of a class I evaluation method, which crafts adversarial perturbations for all samples at every constraint magnitude. In the class II evaluation method, only one adversarial sample is generated per sample, and there is no need to evaluate repeatedly under different perturbation sizes as in the class I evaluation method, which greatly reduces the computational overhead.
To facilitate an understanding of the principles of the present invention, the basic concepts and representations involved in deep neural network robustness assessment will be first set forth below, and a detailed description of the class II assessment method of the present invention will be subsequently introduced.
1. Deep neural network
A deep neural network is a complex function composed of several to hundreds of neural network layers. Each layer is essentially a simple non-linear function, typically the combination of a linear function and an activation function (e.g., ReLU(x) = max(x, 0)). Herein, for convenience of description, F_θ: ℝ^(c×h×w) → ℝ^m denotes a DNN with network parameters θ, and F_θ(x)[i] (i = 1, 2, …, m) denotes the prediction confidence of the DNN for classifying x into the i-th class, where m is the total number of classes. This prediction confidence is also referred to as the logit of the DNN (the logit is the vector predicted by the DNN that is fed into the softmax layer). Since the final prediction is usually desired to be a probability distribution over all classes, a softmax function can be applied to normalize the logits. The probability of the i-th class output by the softmax function is:
softmax(F_θ(x))[i] = exp(F_θ(x)[i]) / Σ_{j=1,…,m} exp(F_θ(x)[j])    (1)
The DNN needs to be trained to perform well on a given data set D = {(x_i, y_i)}_{i=1,…,n}, with x_i ∈ ℝ^(c×h×w) and y_i ∈ {1, …, m}, where c, h, w denote the channel, height and width of the input image, respectively. (No distinction is made between training and validation sets in this description, since the invention aims to assess the robustness of the DNN rather than its performance; conventionally, the training set is used to optimize the DNN and the validation set to verify its performance. Furthermore, adversarial attacks and defenses mainly focus on the computer vision domain, i.e., the data set is usually a set of pictures.) The performance of the DNN is quantified by the accuracy (acc) as follows:
acc = (1/n) Σ_{i=1,…,n} I(argmax_{j=1,…,m} F_θ(x_i)[j] = y_i)    (2)
where I(·) is the indicator function that outputs 1 if the input condition is satisfied and 0 otherwise.
In order to obtain (approximately) optimal performance of the DNN on the data set D, the standard practice is to optimize θ end-to-end with respect to accuracy using a mini-batch gradient descent algorithm or one of its variants. However, because accuracy blocks gradient propagation, gradient-based optimization algorithms cannot be applied to it directly; accuracy is therefore usually replaced by a differentiable objective function (also called a loss function). The most commonly used objective function is the cross-entropy function CE(·,·):
CE(x, y) = -log(softmax(F_θ(x))[y])    (3)
The quality of the objective function has a great influence on the resulting DNN, and a poor choice of objective function can lead to a degraded θ.
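As a concrete illustration of equations (1) to (3), the following Python sketch (assuming a PyTorch model that returns logits for a batch x with integer label tensor y; the function name is an illustrative assumption) computes the softmax probabilities, the accuracy, and the cross-entropy loss:

    import torch
    import torch.nn.functional as F

    def predict_and_score(model, x, y):
        logits = model(x)                                  # F_theta(x), shape (batch, m)
        probs = F.softmax(logits, dim=1)                   # equation (1): normalized logits
        acc = (logits.argmax(dim=1) == y).float().mean()   # equation (2): accuracy via the indicator
        ce = F.cross_entropy(logits, y)                    # equation (3): cross-entropy objective
        return probs, acc, ce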
2. Adversarial attack
Due to its powerful learning capabilities, DNN has become popular in many fields such as image analysis. However, since DNNs are extremely vulnerable to adversarial attacks, it is still difficult at this stage to deploy DNNs into security-critical scenarios such as medical image processing.
An adversarial attack is an operation that misleads a neural network into producing a wrong prediction by using an adversarial sample to which a tailored perturbation, indistinguishable to the human eye, has been added, as shown on the right side of Fig. 1. Since a DNN is by nature a highly non-linear and non-convex function, it is difficult to determine the robustness of a DNN accurately in theory, so a viable approach to DNN robustness evaluation is to construct powerful adversarial attacks.
The core of an adversarial attack is to construct adversarial samples that are imperceptible to humans (usually to the human eye) yet effective; the concrete form of the attack depends on the evaluation method. With the existing class I evaluation method, the attack should report the ASR under a given perturbation constraint; in contrast, with the class II evaluation method, the attack should report the magnitude of the adversarial perturbation needed to achieve a given attack effect. Considering that the evaluation overhead should be as small as possible, the class II evaluation method of the present invention is advantageous. In the present invention, it may be assumed that the evaluator has full access to the target DNN, i.e., the white-box setting. White-box adversarial attacks are more threatening than black-box attacks, so the lower bound of robustness can be estimated more accurately.
3. Adversarial sample
As mentioned above, evaluating DNN robustness requires constructing appropriate adversarial samples to perform adversarial attacks. For a given neural network and a natural sample (i.e., an original sample to which no perturbation has been added), an adversarial sample can be constructed by adding a tailored minute perturbation (an adversarial perturbation) to the natural sample.
To avoid ambiguity, a distinction is first made between adversarial perturbations and random perturbations. An adversarial perturbation here refers to a perturbation specifically tailored for a given input with the purpose of fooling the target model, whereas a random perturbation merely follows some distribution and is independent of the input and the model. Adversarial perturbations can further be divided into threatening adversarial perturbations and weak adversarial perturbations. If a perturbation only reduces the prediction confidence without causing a misclassification, it is a weak adversarial perturbation; if adding the perturbation causes the target model to misclassify, it is a threatening adversarial perturbation.
Formal definitions of adversarial attacks, adversarial perturbations, and adversarial samples are given as follows.
The aim of an adversarial attack is to fool a target DNN F_θ(x) by finding a human-imperceptible (e.g., imperceptible to the human eye) perturbation δ for a given input x with true label y. Thus, if a method constructs δ by approximately or exactly solving the following optimization task, or an equivalent of it, the method is referred to as an adversarial attack:
min_δ M(x, x + δ)   s.t.   argmax_{j=1,…,m} F_θ(x + δ)[j] ≠ y    (4)
where M(·,·) is a human-perceptible distance function between two images, i.e., a human perception metric. Equation 4 states that the perturbation δ is sought that minimizes the human-perceptible distance between the natural sample and the perturbed sample, under the constraint that the target neural network classifies the perturbed sample incorrectly.
It should be noted that the magnitude of the adversarial perturbation δ is measured by M(x, x + δ) rather than by δ itself. In other words, what is minimized is the human-perceptible distance caused by the perturbation δ, not necessarily the value of δ itself. In addition, since it is difficult to derive an analytical solution of equation 4 directly, a gradient-based optimization method may be employed to solve it. To this end, the adversarial attack can be approximated by solving the following instead of equation 4:
max_δ L(F_θ(x + δ), y)   s.t.   M(x, x + δ) ≤ ε    (5)
where L(·,·) is a proxy function for the constraint argmax_{j=1,…,m} F_θ(x + δ)[j] ≠ y, typically a cross-entropy loss function introduced to avoid the non-differentiability problem, and ε is a perturbation budget that constrains the distance between x and x + δ to remain imperceptible to humans. The value of L(·,·) is positively correlated with the probability that the model misclassifies the input, so L(·,·) can be referred to as an attack effectiveness metric. Equation 5 states that the perturbation δ is sought that makes the attack most effective, under the constraint that the human-perceptible distance does not exceed the perturbation budget ε.
Further, given an input x with true label y and a target DNN, if the perturbation δ is specifically constructed by an adversarial attack, δ and x + δ are referred to as the adversarial perturbation and the adversarial sample, respectively. If the target DNN misclassifies x + δ, then δ is a threatening adversarial perturbation; otherwise, it is a weak adversarial perturbation. Furthermore, saying that an adversarial perturbation is minimal for x means that it minimizes M(x, x + δ); the minimal perturbation therefore varies with the chosen distance function M.
Different adversarial attacks can be obtained by solving equation 4 in different ways. For example, the sample may be pushed in the direction that increases its loss as much as possible (i.e., using equation 5), and the gradient direction effectively matches this direction. Equation 4 can be solved with the Fast Gradient Sign Method (FGSM), its iterative refinement the Basic Iterative Method (BIM), and Projected Gradient Descent (PGD). In the present invention, several new loss functions are designed (described below), and one of them can be selected to replace the cross-entropy loss function for better performance.
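For reference, a minimal PGD-style sketch in Python/PyTorch is given below; it assumes a classifier returning logits, inputs scaled to [0, 1], and a label tensor y, and the ∞-norm budget and step values are illustrative assumptions rather than settings prescribed by the invention.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        # Maximize the cross-entropy proxy (equation 5) under an L-infinity budget eps.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
                x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the eps-ball
                x_adv = x_adv.clamp(0.0, 1.0)
        return x_adv.detach()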
As mentioned above, the class I evaluation method incurs an excessive amount of computation because the constraint magnitude is difficult to determine; the present invention therefore adopts the class II evaluation method.
4. DNN robustness assessment method
FIG. 2 shows a schematic flow diagram of a deep neural network robustness assessment method according to one embodiment of the present invention. This method corresponds to the class II assessment method employed in the present invention.
In step S210, the attack success rate of the adversarial attack against the target deep neural network is determined. The Attack Success Rate (ASR), denoted by p, is the proportion of picture adversarial samples that the DNN misclassifies when the DNN is attacked with n picture adversarial samples to which adversarial perturbations have been added (i.e., the DNN takes the n adversarial samples as input for inference); for example, p = 75% if 3n/4 of the n picture adversarial samples are misclassified. Here, the target deep neural network is the trained neural network whose robustness is to be evaluated. That is, the target deep neural network is able to classify the n original pictures (natural samples without perturbation) correctly; however, with carefully constructed and added perturbations, the deep neural network can be made to misclassify the n picture adversarial samples.
In one embodiment, the attack success rate can be referred to directly by the percentage p (see equation 6 below). In a preferred embodiment, the attack success rate may be characterized using different attack effectiveness indices (see equations 7, 8, 9 and 11 below).
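For illustration only, the proportion p can be computed as in the short Python sketch below, assuming a PyTorch model and per-sample tensors that already carry a batch dimension; the function name is an assumption:

    def attack_success_rate(model, adv_samples, labels):
        # ASR p: fraction of adversarial samples misclassified by the target DNN
        wrong = sum(int(model(x_adv).argmax(dim=1).item() != int(y))
                    for x_adv, y in zip(adv_samples, labels))
        return wrong / len(adv_samples)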
Subsequently, in step S220, an initial perturbation may be added to the target picture original sample to obtain an initial target picture adversarial sample, and in step S230 an initial value of the distance between the target picture original sample and the target picture adversarial sample is obtained.
The initial perturbation δ_0 (the perturbation added to a picture sample is denoted by δ in the description below) can be a vector with arbitrary values; after the initial perturbation δ_0 is added to the original picture, an initial adversarial sample is obtained. Since the optimized perturbation has to be searched for starting from the initial perturbation, the adversarial sample to which the initial perturbation is added generally differs from the final target picture adversarial sample to which the optimized minimal perturbation is added. In addition, to improve the efficiency of the search for the optimized value, the initial perturbation δ_0 may take a particular value: for example, as described below, it can be a first initial perturbation that causes the neural network to misclassify, or a second initial perturbation that is an all-zero vector.
In the scheme of the present invention, which solves for the minimal perturbation under a given ASR, "minimal" does not mean the minimal value of the perturbation δ itself, but that the target picture adversarial sample after adding the perturbation looks closest to the target picture original sample. For this purpose, the closeness of the two pictures in human perception is characterized by the distance between the original sample and the adversarial sample. In one embodiment, a norm-based distance between the two pictures can be used directly (see equation 10 below). In the preferred embodiment of the present invention, a human-perceptible distance is used, which characterizes more closely how different the two pictures "look" before and after the perturbation is added.
The human-perceptible distance characterizes the true perceptual distance between two different images as seen by a human. In one embodiment, the target picture original sample and the target picture adversarial sample may be mapped to a color-difference space to compute the human-perceptible distance. CIE color-difference formulas, and especially CIEDE2000, may be used as the human-perceptible distance index, i.e., the "human imperceptibility index" detailed below. In the following description, M(x, x + δ) denotes the human-perceptible distance between the original sample and the adversarial sample.
Subsequently, the search for the optimized perturbation value may be performed. In step S240, the initial perturbation is adjusted to obtain the minimal perturbation that satisfies the attack success rate and minimizes the distance; that is, the minimal perturbation for the target picture sample is searched for. It should be noted that the minimal perturbation is the perturbation that, while satisfying the attack success rate, minimizes the human-perceptible distance between the target picture original sample and the target picture adversarial sample. In other words, what is minimized is how different the adversarial sample looks from the target picture original sample to the human eye, not the value of the perturbation δ itself. After the minimal perturbation is found, it may be added to the original picture to obtain the adversarial sample picture corresponding to the original picture, and this adversarial sample picture can be used for subsequent tuning of the deep neural network model.
In the present invention, the minimal perturbation must satisfy the ASR requirement (e.g., satisfy the attack effectiveness index) while making M(x, x + δ) as small as possible; the search for the minimal perturbation can therefore be regarded as a joint optimization problem. Accordingly, the search includes finding the minimal perturbation through iterative calculation starting from the initial value.
In one embodiment, the search may start from a point where the initial perturbation already satisfies the ASR requirement (e.g., satisfies the attack effectiveness index), and then successively approximate the minimum of M(x, x + δ). In this case, step S220 may include: adding a first initial perturbation to the target picture original sample (so that x + δ = x', where the label of x' differs from y) to obtain the initial target picture adversarial sample, the first initial perturbation causing the initial target picture adversarial sample to be misclassified by the target neural network. Correspondingly, step S240 may include: finding the minimal perturbation value through iterative calculation, where each iteration moves the perturbation value in the direction in which the human-perceptible distance decreases the most. This embodiment corresponds to the internal optimization scheme detailed below in connection with equation 8, i.e., the initial perturbation satisfies the constraint F_θ(x + δ) ≠ y.
In another embodiment, the search may start from a point where the initial perturbation does not satisfy the ASR requirement (e.g., does not satisfy the attack effectiveness index). Preferably, the initial value may be an all-zero vector, so that x + δ = x at the start of the search. Since M(x, x + δ) is already minimal at this point, the search direction should successively approach the constraint F_θ(x + δ) ≠ y. To this end, step S220 may include: adding a second initial perturbation, which is an all-zero vector, to the target picture original sample to obtain the initial target picture adversarial sample. Preferably, step S240 may include finding the minimal perturbation value through iterative calculation, where each iteration moves the perturbation value in the direction that causes the target picture adversarial sample to be misclassified by the target neural network. This embodiment corresponds to the external optimization scheme detailed below in connection with equation 8, i.e., the initial perturbation does not satisfy the constraint F_θ(x + δ) ≠ y.
An adaptive step-size strategy can be used for both the internal and the external optimization scheme. In this case, obtaining the minimal perturbation value through iterative calculation further includes: reducing the search step size in any iteration round in which the classification of the target picture adversarial sample by the target neural network changes. Specifically, in the internal optimization scheme, if the current iteration would cause the constraint F_θ(x + δ) ≠ y to no longer hold, i.e., the classification changes from wrong to correct, the step size of this round can be reduced to approach the optimum. Similarly, in the external optimization scheme, if the current iteration causes the constraint F_θ(x + δ) ≠ y to become true, i.e., the classification changes from correct to wrong, the step size of this round can be reduced to approach the optimum.
Additionally, in a preferred embodiment, a jointly optimized objective function (e.g., a loss function, see equation 9 below and its associated description) may be constructed. To this end, step S240 may include: constructing an objective function that simultaneously characterizes the classification result, by the target neural network, of the target picture adversarial sample to which the perturbation value is added and minimizes the human-perceptible distance; and searching for the minimal perturbation based on gradient optimization of the objective function.
Subsequently, in step S250, the robustness of the deep neural network against adversarial attacks is evaluated based on the minimal perturbations. In one embodiment, the attack success rate characterizes the probability that a target picture data set comprising n target picture samples is misclassified by the target deep neural network after the minimal perturbations are added. To this end, step S240 may include: solving the minimal perturbation once for each target picture sample in the target picture data set. Correspondingly, evaluating the robustness of the deep neural network against adversarial attacks in step S250 includes: calculating, for each target picture sample, the human-perceptible distance between its original sample and the attack sample to which the corresponding minimal perturbation has been added; and evaluating the robustness of the deep neural network against adversarial attacks according to the n human-perceptible distances. It is shown below that the expected time complexity of finding the minimal perturbation once for each target picture sample is O(n).
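To make the flow of steps S210 to S250 concrete, the following Python sketch outlines the per-sample evaluation loop; find_min_perturbation and perceptual_distance are placeholders for the minimal-perturbation search and the human-perceptibility metric (both described below), and aggregating the n·p smallest distances follows the formulation in equation 6 below.

    def evaluate_robustness(model, dataset, asr, find_min_perturbation, perceptual_distance):
        # One minimal threatening perturbation per sample, then aggregate the
        # n*asr smallest human-perceptible distances.
        dists = []
        for x, y in dataset:
            delta = find_min_perturbation(model, x, y)       # minimal perturbation (equation 7)
            dists.append(perceptual_distance(x, x + delta))
        dists.sort()
        k = int(len(dists) * asr)
        return sum(dists[:k])   # smaller total distance indicates a less robust model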
In addition, as described above, the attack success rate may be characterized using an attack effectiveness index, and different attack effectiveness indices may be constructed in different embodiments. Preferably, the attack effectiveness index includes at least one of: an attack effectiveness index that penalizes the output of the target deep neural network based on the true label of the target picture original sample; an attack effectiveness index that implicitly adjusts the step size during the minimal perturbation search; and an attack effectiveness index that fuses classification information of the target deep neural network. These will be described below in connection with f_1 to f_7 in equation 11.
The deep neural network robustness evaluation method of the present invention has been described in detail above in conjunction with Fig. 2. To further the understanding of the principles of the present invention, the problem to be solved by the class II evaluation method is expressed mathematically below, and a linear-time-complexity algorithm is introduced for its solution. Since existing adversarial attacks are no longer suitable for the class II evaluation method, the present invention customizes several effective adversarial attacks. Specifically, tailoring an adversarial attack to the class II evaluation method amounts to solving equation 4, i.e., choosing a particular M and enforcing the constraint argmax_{j=1,…,m} F_θ(x + δ)[j] ≠ y (equation 4 can likewise be solved with a gradient-based optimization method). Subsequently, a human imperceptibility index and an attack effectiveness index are given. Finally, the search scheme for finding the minimal perturbation is detailed; such a scheme generally consists of three elements: an initialization strategy, a search direction, and a step size.
5. Formulation of problems
For conceptual simplicity, let D be the evaluation data set. Given an ASR p, the goal of the class II evaluation method is to obtain the minimal adversarial perturbations on D that achieve the given ASR. The goal can be expressed as the following optimization task over the adversarial perturbations δ_1, …, δ_n:
min_{δ_1,…,δ_n} Σ_{i=1,…,n} M(x_i, x_i + δ_i)   s.t.   Σ_{i=1,…,n} I_i = n·p,  I_i = I(argmax_{j=1,…,m} F_θ(x_i + δ_i)[j] ≠ y_i)    (6)
where n·p is an integer (here the number of classification errors is used directly to represent the ASR) and I(·) is the indicator function that outputs 1 if the input condition is satisfied and 0 otherwise. Thus, I_i = 1 indicates that x_i + δ_i is misclassified, and I_i = 0 indicates that x_i + δ_i is not misclassified. After obtaining the solution of equation 6, the sum Σ_{i=1,…,n} M(x_i, x_i + δ_i) can be used to evaluate the robustness of the DNN against adversarial attacks. In other words, in a preferred embodiment, the class II evaluation method of the present invention uses the sum of the human-perceptible distances between each natural sample and its adversarial sample in the data set D to characterize the amount of perturbation added under the predetermined ASR.
Before solving equation 6, consider the special case p = 100%. If p = 100%, then I_i = 1 for all i, and for every x_i one must search for the adversarial perturbation δ_i that causes the DNN F_θ(x) to misclassify x_i while minimizing M(x_i, x_i + δ_i). Since the searches for such adversarial perturbations δ_i are independent for different x_i, equation 6 can be converted into the following optimization task performed separately for each x_i:
min_{δ_i} M(x_i, x_i + δ_i)   s.t.   argmax_{j=1,…,m} F_θ(x_i + δ_i)[j] ≠ y_i    (7)
Solving the task in equation 7 is greatly simplified compared with equation 6.
Further, assume that δ_i, i = 1, …, n, have been obtained for p = 100% by solving equation 7. Now consider resetting p to a new value p', with the goal of finding the most vulnerable combination of n·p' instances (i.e., the combination of samples most prone to misclassification). This goal is equivalent to finding, among {δ_1, …, δ_n}, the elements that do not belong to the weakest combination. To this end, the n·(p - p') elements of {δ_1, …, δ_n} with the largest M(x_i, x_i + δ_i) are set to zero; because the δ_i are independent of one another, the resulting sum Σ M(x_i, x_i + δ_i) is the smallest among all combinations of size n·p'. Moreover, δ_i is the minimal adversarial perturbation that causes the model F_θ(x) to misclassify x_i, which implies that if δ_i is reduced (i.e., its perceptible magnitude M(x_i, x_i + δ_i) is reduced), the model will recognize x_i correctly. The perturbations {δ_1, …, δ_n} obtained in this way, with an ASR of exactly p', therefore constitute a solution of equation 6.
Suppose that generating each δ_i has time complexity O(1). Then the expected time complexity of solving equation 6 with the above method of the present invention is O(n), in contrast to the combinatorial cost of solving equation 6 directly, which requires enumerating all combinations of size n·p.
The results of the class I evaluation method can easily be derived from the results of the class II evaluation method. For the class I evaluation method, the result is the maximum ASR under a given perturbation budget ε. Because the class II evaluation method at ASR = 100% produces, for every instance x_i, a threatening adversarial perturbation of minimal magnitude, applying any perturbation below this magnitude means that x_i is classified correctly. Hence, under a perturbation budget ε, the maximum ASR equals the proportion of samples whose minimal threatening adversarial perturbation magnitude is smaller than ε. Evaluation with the class II evaluation method is therefore always preferable to the class I evaluation method, since the results of the class I method can easily be obtained from the class II method, whereas the converse does not hold. A further, more important reason for using the class II evaluation method is that it removes the heavy burden of tuning the hyperparameter ε.
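The two consequences above can be sketched in a few lines of Python; min_dists is assumed to hold the human-perceptible magnitude of the minimal threatening perturbation found for each sample at ASR = 100%, and the function names are illustrative:

    def weakest_combination(min_dists, p):
        # Indices of the n*p samples with the smallest minimal perturbation distances.
        n = len(min_dists)
        k = int(n * p)
        return sorted(range(n), key=lambda i: min_dists[i])[:k]

    def max_asr_at_budget(min_dists, eps):
        # Class I result derived from class II output: fraction of samples whose
        # minimal threatening perturbation magnitude is below the budget eps.
        return sum(d < eps for d in min_dists) / len(min_dists)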
For simplicity of notation, the subscript i in equation 7 is omitted, and the problem to be solved becomes:
min_δ M(x, x + δ)   s.t.   F_θ(x + δ) ≠ y    (8)
Besides optimizing directly under the constraint as written in equation 8, another approach is to relax the constraint argmax_{j=1,…,m} F_θ(x + δ)[j] ≠ y and move it into the objective as a penalty term. Formally, this idea can be expressed as:
min_δ M(x, x + δ) - α · L(F_θ(x + δ), y)    (9)
where the constraint F_θ(x + δ) ≠ y is represented by the attack effectiveness term L(F_θ(x + δ), y), so that gradient-based optimization methods can be applied to the task.
Searching for δ based on equation 9 is generally more efficient and effective than using equation 8. Either way, δ is expected to be attack-effective while being as imperceptible as possible; in other words, a search direction is better the more information it carries about both attack effectiveness and human imperceptibility. The search direction used for equation 9 is influenced by both the attack effectiveness metric and the human imperceptibility metric (a gradient-based optimization method is used here, and the search direction of such a method is the gradient direction of the objective function), whereas the search direction for equation 8 uses only one of the two indices. However, solving equation 9 presents a significant difficulty: a small α does not guarantee the effectiveness of the attack for the crafted δ, while a large α ignores the human perception metric. The value of α can be determined by a two-way approximation search.
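A single gradient step on the relaxed objective of equation 9 might look as follows in Python/PyTorch; perceptual_distance and attack_loss are assumed to be differentiable surrogates for M and L, and the step size lr is an illustrative assumption:

    import torch

    def joint_objective_step(model, x, x_adv, y, alpha, lr, perceptual_distance, attack_loss):
        # Minimize M(x, x + delta) - alpha * L(F_theta(x + delta), y) by gradient descent.
        x_adv = x_adv.clone().detach().requires_grad_(True)
        obj = perceptual_distance(x, x_adv) - alpha * attack_loss(model(x_adv), y)
        grad = torch.autograd.grad(obj, x_adv)[0]
        return (x_adv - lr * grad).detach()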
6. Characterization index
A threatening adversarial sample has two properties: attack effectiveness and human imperceptibility. Adversarial attacks therefore typically craft adversarial samples with both properties by solving an optimization task that involves both an attack effectiveness metric and a human imperceptibility metric. The metrics used determine to a large extent the quality of the resulting adversarial samples.
The core of the human imperceptibility metric is how accurately it approximates the true human-perceptual distance between two different images. (The similarity distance function used here is a loose version of a mathematically defined distance metric: a strict metric must satisfy non-negativity, symmetry, and the triangle inequality, but the human perceptual distance may sometimes violate the triangle inequality.) Most previous work, however, adopts a norm-based distance function as the similarity distance function, as shown below:
M(x, y) = ||x - y||_p    (10)
More specifically, the most commonly used ∞-norm distance function is the maximum over all elements of |x - y|. The main drawback of norm-based distance functions is that they are not sufficiently consistent with the human-perceptible distance, so the present invention preferably uses CIEDE2000 instead. CIEDE2000 is currently the distance that best matches human-perceptible distance. Because the human-perceived difference between two images does not change uniformly with their distance in RGB space, CIEDE2000 first maps the two images from RGB space to CIELAB space, whose distances match the human perceptual system better; the CIEDE2000 value of the two images is then a weighted combination of the lightness, chroma, and hue differences between them. Finally, it should be understood that no similarity distance function currently serves as an unbiased proxy for the human-perceptible distance, but CIEDE2000 matches it better than norm-based distance functions.
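As a sketch of this choice, the CIEDE2000 distance between an original picture and its adversarial counterpart can be approximated with scikit-image as shown below; averaging the per-pixel values into a single score is an illustrative assumption, not the aggregation prescribed by the invention:

    import numpy as np
    from skimage.color import rgb2lab, deltaE_ciede2000

    def ciede2000_distance(x_rgb, x_adv_rgb):
        # Inputs: RGB images in [0, 1] with shape (H, W, 3); map to CIELAB first.
        lab1 = rgb2lab(x_rgb)
        lab2 = rgb2lab(x_adv_rgb)
        return float(np.mean(deltaE_ciede2000(lab1, lab2)))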
Further, the quality of the proxy function for the ASR greatly affects the crafted adversarial samples. The following seven proxy functions f_1 to f_7 (defined in equation 11) may be considered.
f_1 to f_3 are the most straightforward proxy functions. f_1 and f_2 directly penalize the logit and the probability (i.e., the normalized logit) of the true label of x, while f_3 is essentially a negative cross-entropy loss function and is widely used in most adversarial attacks, such as FGSM, BIM, and PGD.
f_4 is an improved version of f_2. A larger f_2(x, y) indicates that x is likely to be classified correctly by the DNN, in which case the attack effectiveness index needs a higher weight than the human imperceptibility index, and vice versa. To attach this weight to the attack effectiveness index, f_2(x, y) is rescaled into a scaling factor: the larger f_2(x, y) is, the larger this factor becomes. The weight adjustment can also be regarded as implicitly adjusting the step size during the search.
f_1 to f_4 only consider the correct class and do not exploit information about other classes, which can be useful for guiding the search direction. In particular, if the true label of x is "dog", it is easier to fool the model into classifying x as a cat than as an airplane, because dogs and cats share many common features. The model can therefore be guided to classify x into the class most similar to class y, which generates adversarial samples more efficiently and effectively. This knowledge is fused into f_5 to f_7, where argmax_{j≠y} {f_2(x, j)} is the class that the model considers most similar to the true class y.
Similar to the improvement of f_2 into f_4, f_6 and f_7 also introduce an adaptive magnitude function, but in a different way. In particular, if the model already misclassifies x with an additional confidence margin C, f_6 no longer considers attack effectiveness, whereas f_7 always considers attack effectiveness throughout the search.
Finally, f_4 to f_7 are derived from f_2 rather than from f_1 or f_3, because with predictions in probability form it is convenient to tune the hyperparameter C and to infer the model's prediction tendency for x.
7. Search strategy
The minimal-perturbation search based on equations 8 and 9 can use the attack effectiveness index and the human imperceptibility index described above. A search scheme typically comprises three parts: an initialization strategy, a search direction, and a step size. The present invention uses a gradient-based optimization method, i.e., the search direction is the gradient direction of the objective function. The step size mainly affects the convergence speed and has little influence on the overall design of the search algorithm; the following therefore focuses on the initialization strategy. Initialization strategies can be roughly divided into internal initialization and external initialization: internal initialization selects a point that satisfies the constraint as the initialization point, while external initialization selects a point that violates the constraint. Thus equations 8 and 9 each correspond to two different search schemes. Here, the search algorithms that solve equation 8 with internal and external initialization are labeled internal optimization and external optimization, respectively; similarly, for equation 9, the algorithms using internal and external initialization are referred to as internal joint optimization and external joint optimization.
For internal optimization, the initialization of the antagonistic perturbation δ needs to meet the constraint F θ (x + δ) ≠ y, which is equivalent to the model identifying x + δ as other classes. Thus, realize F θ A simple method of (x + δ) ≠ y is to initialize δ, presenting x + δNow samples belonging to other classes (at which point the model has been trained with the ability to correctly classify natural text), i.e. x + δ = x ', where the label of x' is different from y. Wherein x' may be selected from D train δ = x' -x was extracted and obtained (training data set). At this time, since x and x + δ belong to different categories, the value of M (x, x + δ) is very large, and for this reason, the value needs to be reduced to make δ imperceptible to humans. The gradient descent algorithm may be used to move δ in the direction in which M (x, x + δ) is the most reduced, i.e., the negative gradient direction of M (x, x + δ) with respect to δ.
When the above search scheme is implemented, the similarity between x and x + δ increases with the number of iterations, which in turn increases the probability that the model correctly identifies x + δ. In other words, the scheme may fail to find a feasible solution because the constraint is violated. To solve this problem, before updating δ in each iteration, the algorithm should check whether the update would result in F_θ(x + δ) = y. If F_θ(x + δ) = y, the update is abandoned and the search process is terminated; otherwise, the algorithm continues normally.
Since the occurrence of F_θ(x + δ) = y may be caused by a too-large initialization step size, an adaptive step-size strategy is introduced into the search process, which allows the step size to be reduced to achieve a finer-grained search. In particular, if an update would result in F_θ(x + δ) = y, the step size is halved and the condition is checked again. This procedure is typically attempted multiple times; if all attempts fail, the search process ends.
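The constraint check and step halving just described could look roughly as follows; this is a sketch of a single guarded iteration under the same assumptions as above (delta is assumed to require gradients), and max_halvings is an illustrative parameter.

```python
import torch

def guarded_update(model, M, x, y, delta, step, max_halvings=5):
    """Attempt a descent step on M(x, x + delta); if the step would lead to
    F_theta(x + delta) = y, halve the step size and retry, giving up after
    max_halvings failed attempts."""
    dist = M(x, x + delta)
    grad, = torch.autograd.grad(dist, delta)
    for _ in range(max_halvings):
        candidate = (delta - step * grad).detach()
        with torch.no_grad():
            pred = model((x + candidate).unsqueeze(0)).argmax(dim=1).item()
        if pred != y:                              # constraint still satisfied: accept
            return candidate.requires_grad_(True), step
        step = step / 2                            # update too aggressive: halve and retry
    return delta, step                             # all attempts failed: search ends
```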
For external optimization, the initialization of δ no longer needs to satisfy F_θ(x + δ) ≠ y; that is, the initial δ may satisfy F_θ(x + δ) = y. One approach is to set δ to an all-zero vector, so that x + δ = x and therefore F_θ(x + δ) = y.
With the initialization strategy described above, the objective function directly attains its best possible value (i.e., 0), but the perturbation is not threatening. Therefore, the external optimization algorithm should move the perturbation towards the direction that causes F_θ(x + δ) ≠ y with the least perturbation, and this direction should be the optimal search direction. However, since the perturbation is required to cause F_θ(x + δ) ≠ y, simply following the gradient ∇_δ M(x, x + δ) is not sufficient to indicate the optimal direction, because ∇_δ M(x, x + δ) contains no information about F_θ. There are two ways to approximate the optimal direction. The first is to jointly optimize the two indicators of attack validity and human imperceptibility, i.e., the internal and external joint optimization search algorithms; the second is to optimize the two indicators alternately, which is discussed in the next section. Here we focus on using the gradient direction of one of the two indicators as the search direction. The initial δ makes x and x + δ most similar, so more attention should be paid to how to move δ so as to satisfy the constraint F_θ(x + δ) ≠ y. If this constraint were differentiable, the most efficient direction would be its gradient direction; unfortunately it is not differentiable, so a differentiable attack validity indicator is used in its place. Furthermore, once x + δ is misclassified by the model, the search process should end as early as possible, since intuitively further movement would increase M(x, x + δ).
Also, an adaptive step-size strategy can be used here to search for adversarial perturbations that lie closer to the optimum.
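A corresponding sketch of the external optimization is given below. It starts from delta = 0 and follows the gradient of a differentiable attack surrogate; the cross-entropy loss on the true label is used purely as a stand-in for a differentiable attack validity indicator and is an assumption, not the specific indicator prescribed above. The loop stops as soon as x + delta is misclassified, and the adaptive step-size strategy is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def external_optimization(model, x, y, steps=100, lr=0.01):
    """External optimization sketch: initialize delta = 0 (so F_theta(x + delta) = y),
    then push delta in the direction that most increases a differentiable attack
    surrogate, stopping as soon as the model misclassifies x + delta."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits = model((x + delta).unsqueeze(0))
        if logits.argmax(dim=1).item() != y:
            break                                            # attack succeeded: stop early
        loss = F.cross_entropy(logits, torch.tensor([y]))    # differentiable surrogate
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + lr * grad).detach().requires_grad_(True)  # ascend the surrogate
    return delta.detach()
```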
As previously mentioned, another way to solve equation 8 is to jointly optimize the two indicators of attack validity and human imperceptibility, i.e., equation 9. Searching for the minimum perturbation using equation 9 generally performs better than optimizing one of the indicators alone, but it requires determining an appropriate α. A small α means that the search strategy focuses more on the human imperceptibility indicator than on the attack validity indicator, which may result in an ineffective adversarial perturbation. For example, if α = 0, the algorithm will focus only on making δ = 0.
With different values of α, the optimized perturbation obtained is different; as α increases, the model's classification of the perturbed sample changes from correct to incorrect. This indicates that there is a critical point leading to misclassification, and that this critical point corresponds to the optimal value of α, i.e., the resulting adversarial perturbation is least perceptible to humans while still being threatening. This optimal value can be found by an adaptive method. In particular, at each iteration, if the sample with the adversarial perturbation is misclassified, the weight of the human imperceptibility indicator may be increased; otherwise, the weight of the attack validity indicator is increased. In this way, the most suitable α can be obtained, yielding the optimal adversarial perturbation.
Further, a step-size adjustment strategy may be added to the optimization process, in which the step size decreases linearly to 0 over the iterations. In addition, if the initialization strategy of the internal optimization is used, the algorithm is called internal joint optimization; otherwise, it is called external joint optimization.
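A sketch of the joint optimization with an adaptive alpha and a linearly decaying step size might look as follows. The combined loss M(x, x + delta) + alpha * g(x + delta), with g taken as the softmax probability of the true class, and the 0.9/1.1 adjustment factors are illustrative assumptions rather than the exact form of equation 9.

```python
import torch

def joint_optimization(model, M, x, y, steps=200, step0=0.05, alpha=1.0):
    """Joint optimization sketch: minimize M(x, x + delta) + alpha * g(x + delta),
    where g is a differentiable attack-validity stand-in (softmax probability of
    the true class y); alpha is adapted every iteration and the step size decays
    linearly to 0."""
    delta = torch.zeros_like(x, requires_grad=True)
    for t in range(steps):
        step = step0 * (1.0 - t / steps)                 # linear decay to 0
        logits = model((x + delta).unsqueeze(0))
        p_true = torch.softmax(logits, dim=1)[0, y]      # g(x + delta)
        loss = M(x, x + delta) + alpha * p_true
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - step * grad).detach().requires_grad_(True)
        with torch.no_grad():
            hit = model((x + delta).unsqueeze(0)).argmax(dim=1).item() != y
        # adaptive alpha: once the attack succeeds, emphasize imperceptibility
        alpha = alpha * 0.9 if hit else alpha * 1.1
    return delta.detach()
```

Because delta starts from the all-zero vector, this sketch corresponds to the external joint optimization variant; initializing delta = x' − x as in the internal optimization would give the internal joint optimization variant.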
Further, the minimal perturbation obtained by the class II evaluation method according to the present invention can be used for tuning the target deep neural network. FIG. 3 shows a schematic flow diagram of a deep neural network tuning method, according to one embodiment of the present invention.
In step S310, a target picture adversarial sample may be constructed using the minimum perturbation found by the DNN robustness assessment method described above. In step S320, the deep neural network is iteratively tuned using the target picture adversarial sample and its original label. In step S330, based on the result of the iterative tuning, a tuned deep neural network is obtained that classifies the target picture adversarial sample into the classification corresponding to the original label. In this way, the ability of the deep neural network to defend against adversarial attacks can be further improved, providing a basis for deploying the deep neural network in security-critical fields.
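As an illustration of steps S310 to S330, a minimal adversarial fine-tuning sketch is given below; find_min_perturbation stands in for the minimum-perturbation search described above and, like the other names and hyperparameters, is an assumption for illustration only.

```python
import torch
import torch.nn.functional as F

def tune_with_adversarial_samples(model, dataset, find_min_perturbation,
                                  epochs=5, lr=1e-4):
    """S310: build adversarial samples from the minimal perturbations;
    S320: iteratively tune the model on them with their original labels;
    S330: return the tuned network."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    adv_pairs = [(x + find_min_perturbation(model, x, y), y) for x, y in dataset]
    for _ in range(epochs):
        for x_adv, y in adv_pairs:
            logits = model(x_adv.unsqueeze(0))
            loss = F.cross_entropy(logits, torch.tensor([y]))   # original label as target
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```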
Fig. 4 is a schematic structural diagram of a computing device that can be used to implement the deep neural network robustness assessment method according to an embodiment of the present invention.
Referring to fig. 4, computing device 400 includes memory 410 and processor 420.
The processor 420 may be a multi-core processor or may include a plurality of processors. In some embodiments, processor 420 may include a general-purpose host processor and one or more special coprocessors such as a Graphics Processor (GPU), a Digital Signal Processor (DSP), or the like. In some embodiments, processor 420 may be implemented using custom circuits, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 410 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions required by the processor 420 or other modules of the computer. The permanent storage device may be a readable and writable storage device, and may be a non-volatile storage device that does not lose the stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the permanent storage device. In other embodiments, the permanent storage device may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a readable and writable memory device or a volatile readable and writable memory device, such as dynamic random access memory, and may store instructions and data that some or all of the processors require at runtime. Furthermore, the memory 410 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory) as well as magnetic and/or optical disks. In some embodiments, the memory 410 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD, mini SD, micro-SD, etc.), a magnetic floppy disk, and the like. Computer-readable storage media do not include carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 410 has stored thereon executable code that, when processed by the processor 420, may cause the processor 420 to perform the deep neural network robustness assessment methods described above.
Further, fig. 5 shows an example of adversarial sample construction by the evaluation method of the present invention and by a related-art evaluation method. As shown on the left side of fig. 5, the raw data set may include 12 pictures (i.e., 12 original samples), and the classification accuracy (acc) of the trained deep neural network is 93.75%. In the prior art, i.e., the class I evaluation method, the maximum ASR is sought under a given perturbation budget ε. As shown in the middle of fig. 5, as the perturbation budget increases (e.g., as ε is successively doubled from 8 to 32), the adversarial samples deviate more and more from the original samples, and the ASR also increases. In contrast, with the present invention the adversarial sample needs to be generated only once for each picture under a given ASR (e.g., ASR = 100%, shown on the right side of the figure), and the adversarial samples show smaller human-perceptible differences from the original samples, thereby increasing the confidence of the DNN robustness assessment and enabling more robust DNNs to be tuned based on the resulting adversarial samples.
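To make the evaluation step concrete, the sketch below aggregates the human-perceptible distances of the n minimal perturbations into a single robustness score; averaging the distances is an illustrative choice, not a form mandated by the present description.

```python
def robustness_score(M, originals, adversarials):
    """Average the human-perceptible distances between each original sample and
    its minimally perturbed counterpart; under a fixed ASR, a larger average
    distance suggests a more robust network."""
    distances = [float(M(x, x_adv)) for x, x_adv in zip(originals, adversarials)]
    return sum(distances) / len(distances)
```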
The deep neural network robustness assessment method according to the present invention has been described in detail above with reference to the accompanying drawings. The invention discloses a DNN robustness evaluation method against adversarial attacks, which can effectively and efficiently evaluate the real robustness of a DNN to adversarial attacks. By searching for the minimum perturbation under a given attack success rate (ASR), the method avoids the need in the prior art to set different perturbation budgets for different data sets, and by obtaining a single minimum perturbation value for each picture sample it can greatly reduce the computational cost of DNN robustness evaluation. Furthermore, through the reasonable design of the human-perceptibility indicator and the attack validity indicator, the DNN robustness assessment can be carried out more accurately.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the steps defined in the above-described method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A deep neural network robustness assessment method comprises the following steps:
determining the attack success rate of an adversarial attack against the target deep neural network;
adding an initial perturbation to an original sample of a target picture to obtain an initial confrontation sample of the target picture;
acquiring an initial value of a distance between the target picture original sample and the target picture confrontation sample;
adjusting the initial perturbation to obtain a minimum perturbation that satisfies the attack success rate and minimizes the distance; and
evaluating the robustness of the target deep neural network against adversarial attacks based on the minimal perturbation.
2. The method of claim 1, wherein the distance comprises a human perceivable distance between the target picture original sample and the target picture confrontation sample, and wherein, for computing the human perceivable distance, the target picture original sample and the target picture confrontation sample are mapped to a color difference space.
3. The method of claim 1, further comprising:
using an attack validity index to represent the attack success rate, wherein the attack validity index comprises at least one of the following:
an attack validity index that penalizes the output of the target deep neural network based on the real label of the target picture original sample;
an attack validity index that implicitly adjusts the step size during the search for the minimum perturbation; and
an attack validity index that fuses classification information of the target deep neural network.
4. The method of claim 1, wherein adding an initial perturbation to the target picture original sample to obtain an initial target picture confrontation sample comprises:
adding a first initial perturbation to the target picture original sample to obtain the initial target picture confrontation sample, wherein the first initial perturbation causes the initial target picture confrontation sample to be misclassified by the target neural network;
and adjusting the initial perturbation to obtain a minimum perturbation that satisfies the attack success rate and minimizes the distance comprises:
the minimum perturbation is found by iterative calculations, wherein each iteration round moves the perturbation towards the direction in which the human perceivable distance decreases the most.
5. The method of claim 1, wherein adding an initial perturbation to the target picture original sample to obtain an initial target picture confrontation sample comprises:
adding a second initial disturbance to the target picture original sample to obtain the initial target picture confrontation sample, wherein the second initial disturbance is an all-zero vector; and
and obtaining the minimum perturbation through iterative calculation, wherein each iteration round moves the perturbation value towards the direction that causes the target picture confrontation sample to be misclassified by the target neural network.
6. The method of claim 4 or 5, wherein the obtaining the minimum perturbation through iterative computation further comprises:
reducing the search step size for the minimum perturbation in an iteration round in which the classification of the target picture confrontation sample by the target neural network changes.
7. The method of claim 1, wherein adjusting the initial perturbation to obtain a minimum perturbation that satisfies the attack success rate and minimizes the distance comprises:
constructing an objective function that simultaneously characterizes the classification result, by the target neural network, of the target picture confrontation sample to which the perturbation value is added and enables the distance to be minimized; and
searching for the minimum perturbation based on a gradient optimization of the objective function.
8. The method of claim 1, wherein the attack success rate characterizes a probability that a target picture data set comprising n target picture samples is misclassified by the target deep neural network after the minimum perturbation is added, wherein adjusting the initial perturbation to obtain the minimum perturbation that satisfies the attack success rate and minimizes the distance comprises:
for each target picture sample in the target picture data set, obtaining the minimum perturbation for the sample once;
and, based on the minimal perturbation, evaluating the robustness of the deep neural network against adversarial attacks includes:
calculating the human perceptible distance between the original sample of each target picture sample and the attack sample to which the corresponding minimum perturbation is added; and
evaluating the robustness of the deep neural network against adversarial attacks according to the n human perceptible distances.
9. A deep neural network tuning method comprises the following steps:
constructing a target picture confrontation sample using the minimum perturbation found by the method of any one of claims 1-8;
iteratively tuning the deep neural network using the target picture countermeasure sample and an original label; and
and obtaining, based on the result of the iterative tuning, a tuned deep neural network capable of classifying the target picture confrontation sample into the classification corresponding to the original label.
10. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1 to 9.
11. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-9.
CN202210894180.7A 2022-07-27 2022-07-27 Deep neural network robustness evaluation method and tuning method Pending CN115439880A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210894180.7A CN115439880A (en) 2022-07-27 2022-07-27 Deep neural network robustness evaluation method and tuning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210894180.7A CN115439880A (en) 2022-07-27 2022-07-27 Deep neural network robustness evaluation method and tuning method

Publications (1)

Publication Number Publication Date
CN115439880A true CN115439880A (en) 2022-12-06

Family

ID=84242107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210894180.7A Pending CN115439880A (en) 2022-07-27 2022-07-27 Deep neural network robustness evaluation method and tuning method

Country Status (1)

Country Link
CN (1) CN115439880A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030312A (en) * 2023-03-30 2023-04-28 中国工商银行股份有限公司 Model evaluation method, device, computer equipment and storage medium
CN116030312B (en) * 2023-03-30 2023-06-16 中国工商银行股份有限公司 Model evaluation method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
DeVries et al. Learning confidence for out-of-distribution detection in neural networks
US8325999B2 (en) Assisted face recognition tagging
Nesti et al. Detecting adversarial examples by input transformations, defense perturbations, and voting
CN112396129A (en) Countermeasure sample detection method and general countermeasure attack defense system
Jha et al. Detecting adversarial examples using data manifolds
US20220129758A1 (en) Clustering autoencoder
CN115860112B (en) Model inversion method-based countermeasure sample defense method and equipment
CN113269241B (en) Soft threshold defense method for remote sensing image confrontation sample
CN114187483A (en) Method for generating countermeasure sample, training method of detector and related equipment
CN115439880A (en) Deep neural network robustness evaluation method and tuning method
CN118316699B (en) Malicious client detection method and device for encryption federal learning, electronic equipment and storage medium
CN117940936A (en) Method and apparatus for evaluating robustness against
US20220138494A1 (en) Method and apparatus for classification using neural network
Wang et al. Understanding universal adversarial attack and defense on graph
Tsiligkaridis et al. Second order optimization for adversarial robustness and interpretability
Yin et al. Adversarial attack, defense, and applications with deep learning frameworks
CN116917899A (en) Method and apparatus for deep neural networks with capability for resistance detection
CN111950635A (en) Robust feature learning method based on hierarchical feature alignment
US20230206589A1 (en) Apparatus and method for detecting object using object boundary localization uncertainty aware network and attention module
CN113486736B (en) Black box anti-attack method based on active subspace and low-rank evolution strategy
US20230259619A1 (en) Inference apparatus, inference method and computer-readable storage medium
Jha et al. Trinity: Trust resilience and interpretability of machine learning models
US11599827B2 (en) Method and apparatus for improving the robustness of a machine learning system
Gunasekaran Evasion and Poison attacks on Logistic Regression-based Machine Learning Classification Model
Zhao Towards Robust Image Classification with Deep Learning and Real-Time DNN Inference on Mobile

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination