CN115797747A

CN115797747A - Adversarial sample detection method based on model weight mutation and confidence distance

Info

Publication number: CN115797747A
Application number: CN202211565742.XA
Authority: CN
Inventors: 陈晋音; 陈若曦; 金海波; 郑海斌
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2022-12-07
Filing date: 2022-12-07
Publication date: 2023-03-14

Abstract

The invention discloses a method for detecting adversarial samples based on model weight mutation and confidence distance. The method comprises: selecting an image data set and dividing it into a training set and a test set; inputting the image data set into a corresponding neural network model and training until a preset accuracy is reached; modifying the weights of each trained neural network model to perform model weight mutation, obtaining mutant models, and screening the mutant models by classification accuracy; passing the benign samples in the training set and their corresponding adversarial samples through the screened mutant models, calculating confidence distances, and obtaining an adversarial-sample confidence distance matrix and a benign-sample confidence distance matrix; constructing a binary classifier, concatenating the two confidence distance matrices to form a binary data set, and training and optimizing the binary classifier on that data set until a preset accuracy is reached; and using the optimized binary classifier to distinguish adversarial samples from benign samples.

Description

Adversarial sample detection method based on model weight mutation and confidence distance

Technical Field

The invention relates to the field of data security, and in particular to an adversarial sample detection method based on model weight mutation and confidence distance.

Background

Adversarial attacks on deep learning are becoming increasingly widespread, and the risks posed by adversarial samples and poisoned samples are increasingly prominent. Dimension-reduction attacks can be mounted by exploiting the data-processing pipeline of deep learning, and adversarial samples that pass for genuine inputs can cause AI-driven recognition systems to miss or misjudge targets. For example, an attacker can attach a tiny sticker to a "stop" sign as noise to create an adversarial sample. To the human eye the sign looks no different from the original, but a recognition system based on a deep learning model may recognize it as a "speed limit" sign or some other sign, which constitutes an attack on an autonomous vehicle. It has also been found that injecting a small number of poisoned samples, or even a single one (for example, a note affixed to a stop sign), into the training data set alters how the deep model recognizes image content; the note, or a note-like mark, becomes a backdoor that can trigger the AI autopilot to recognize the picture as a "stop" sign, causing potential harm. Research on defense and detection techniques against adversarial samples is therefore important for improving the robustness and security of models facing unknown threats, for improving reliability in deployment, and for achieving safe and trustworthy artificial intelligence algorithms.

Adversarial attacks on deep learning are a research hotspot in artificial intelligence security. An adversarial attack is defined as follows: at the model testing stage, an attacker adds a carefully designed small perturbation to the original data to obtain an adversarial sample, which causes the deep learning model to fail and to misclassify the input with high confidence. Attacks are divided into white-box and black-box attacks according to whether the structure of the target model is known; into targeted and untargeted attacks according to the attacker's intent; and into digital-space attacks and physical-space attacks according to the form of the adversarial sample. By crafting a specific adversarial sample, an attacker causes a machine learning model to misclassify a sample that looks quite different to a human as the sample the attacker wants to imitate. Generating high-quality adversarial samples is the key to such spoofing attacks. Related techniques include methods based on direct gradients (L-BFGS, FGSM, BIM, JSMA, DeepFool, C&W attacks), methods based on gradient estimation (ZOO), methods based on adversarial transformation (ATN), methods based on generative networks (UPSET), and methods based on differential evolution (One Pixel Attack).

Depending on the defense effect, defense methods can be divided into detection-only defense and complete defense. Unlike adversarial defense, which attempts to classify adversarial samples correctly, adversarial detection identifies and filters out adversarial samples using the differences between adversarial samples and normal samples. Adversarial sample detection methods fall mainly into detection based on empirical statistics, detection based on image preprocessing and reconstruction, and detection based on a detection network. Hendrycks and Gimpel found that the later principal components of adversarial samples generally have larger variance than those of benign samples, and used this difference to compute a threshold separating benign and adversarial samples. Liang et al. treat image perturbations as noise and use scalar quantization and spatial smoothing filters, respectively, to detect adversarial samples of different pixel magnitudes. Cohen et al. combine the k-nearest-neighbor algorithm with influence functions to extract Nearest Neighbor Influence Function (NNIF) features for adversarial sample detection. Feinman et al. propose adversarial detection using kernel density and Bayesian uncertainty estimates: a Kernel Density Estimate (KDE) identifies whether a data point lies far from the class manifold, and a Bayesian Uncertainty Estimate (BUE) detects whether a data point lies near a low-confidence region where the KDE is ineffective. Another approach determines whether an input sample is adversarial by examining the outputs of each internal convolutional layer of the original model. Gong et al. discriminate between benign and adversarial samples by training a binary classifier network. The binary classifier is a network completely separate from the main classifier; adversarial samples are generated against the pre-trained classifier rather than against the detector, and the binary classifier is trained by adding these adversarial samples to the original training data. Grosse et al. augment the classifier with an additional adversarial class: given a pre-trained model, a new model with the extra class is trained on benign samples and on adversarial samples generated against that model.

Although existing detection methods work well, they still face the following challenges:

(1) Existing state-of-the-art adversarial sample detection techniques, whether based on empirical statistics or on image preprocessing, suffer from high algorithmic complexity, require additional model parameters, and depend heavily on adversarial samples generated by particular attack methods. Designing a lightweight detection technique that reduces this dependence on adversarial samples is therefore one of the technical difficulties to be overcome.

(2) The ongoing contest between attack and defense, together with the black-box nature of artificial intelligence, means that most existing defenses are after-the-fact defenses designed from experience; they cannot defend against the variety of unknown attacks, and their detection range is limited.

(3) Current adversarial sample detection methods rely mainly on the adversarial samples themselves and ignore how changes inside the model affect adversarial samples. Interpretable attack and defense techniques still need to be developed to explain why attacks succeed and whether defense is feasible, and thereby to guide adversarial sample detection.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides an adversarial sample detection method based on model weight mutation and confidence distance. The method mutates a neural network model, calculates the confidence distribution by observing how the outputs vary across a series of mutant models, and trains a binary classifier on the differences in confidence distance, thereby achieving accurate and efficient detection of adversarial samples.

To achieve the above purpose, the technical solution of the invention is as follows. A first aspect of the embodiments of the present invention provides an adversarial sample detection method based on model weight mutation and confidence distance, the method including the following steps:

1) Selecting an image data set, dividing the image data set into a training set and a test set by class, and applying one-hot encoding to the class labels of all pictures in the image data set;

2) Inputting the image data set selected in step 1) into the corresponding neural network model and training until a preset accuracy is reached;

3) Modifying the model weights of each neural network model trained in step 2) to perform model weight mutation, obtaining mutant models, and screening the mutant models by classification accuracy;

4) Passing the benign samples in the training set and their corresponding adversarial samples through the screened mutant models, calculating the confidence distances, and concatenating them to obtain an adversarial-sample confidence distance matrix and a benign-sample confidence distance matrix;

5) Constructing a binary classifier, concatenating the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix to form a binary data set, and training and optimizing the binary classifier on the binary data set until a preset accuracy is reached; and using the optimized binary classifier to distinguish adversarial samples from benign samples.

A second aspect of the embodiments of the present invention provides an adversarial sample detection apparatus based on model weight mutation and confidence distance, including one or more processors configured to implement the above adversarial sample detection method based on model weight mutation and confidence distance.

A third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, it implements the adversarial sample detection method based on model weight mutation and confidence distance described above.

Compared with the prior art, the invention has the following beneficial effects. To address the limited detection range, the dependence on prior knowledge of the adversarial attack, and the poor interpretability of existing adversarial sample detection methods, the invention provides an adversarial sample detection method based on model weight mutation and confidence distance. Experimental results on real deep learning models show that the method is widely applicable, can effectively detect unknown adversarial samples, has low space complexity, and preserves the classification accuracy on benign samples.

Drawings

FIG. 1 is a block diagram of an adversarial sample detection method based on model weight mutation and confidence distance according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of the overall framework of the adversarial sample detection method based on model weight mutation and confidence distance.

FIG. 3 is a schematic diagram of an adversarial sample detection apparatus based on model weight mutation and confidence distance according to an embodiment of the present invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings. The features of the following examples and embodiments may be combined with each other without conflict.

Referring to FIG. 1 and FIG. 2, an adversarial sample detection method based on model weight mutation and confidence distance includes the following steps:

1) Selecting an image data set, dividing the image data set into a training set and a test set by class, and applying one-hot encoding to the class labels of all pictures in the image data set. The specific process is as follows:

1.1) Selecting an image data set: in the embodiment of the invention, the image data set is selected from the MNIST, CIFAR-10 and GTSRB data sets.

The MNIST data set consists of 70,000 grayscale images of size 28 × 28 showing the handwritten digits 0-9, split into 60,000 training images and 10,000 test images. The CIFAR-10 data set consists of 60,000 color images of size 32 × 32 in 10 classes of 6,000 images each, with a training set of 50,000 and a test set of 10,000. The GTSRB data set contains more than 50,000 traffic sign pictures of size 48 × 48 in 43 classes. The image samples and their corresponding class labels are saved; the sample set is denoted $X = \{x_1, x_2, \ldots, x_m\}$ and the class label of each picture is denoted $y$.

1.2) The initial image data set is divided by class into a training set and a test set, and the class labels of all pictures in the initial image data set are one-hot encoded.

For example, for the GTSRB data set, 80% of the pictures in each class are randomly drawn as the training set in this example, and the remaining pictures are used as the test set. The class labels y of all data sets are one-hot encoded to facilitate subsequent training.
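The per-class split and one-hot encoding of step 1.2 can be sketched as follows. This is a minimal illustration on toy data, not the embodiment's code: the function names `split_by_class` and `one_hot`, the fixed random seed, and the placeholder "images" are assumptions introduced here.

```python
import random
from collections import defaultdict

def one_hot(label, num_classes):
    """Return a one-hot list for an integer class label."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

def split_by_class(samples, labels, train_fraction=0.8, seed=0):
    """Randomly draw train_fraction of each class for training; the rest is the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test

# Toy data (assumption): 3 classes with 10 placeholder "images" each.
labels = [c for c in range(3) for _ in range(10)]
samples = [f"img_{i}" for i in range(len(labels))]
train, test = split_by_class(samples, labels)
train_targets = [one_hot(y, num_classes=3) for _, y in train]
```

Splitting inside each class keeps the class proportions of the training and test sets equal, which matters for GTSRB because its 43 classes are not equally sized.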

2) Inputting the image data set selected in step 1) into the corresponding neural network model and training until a preset accuracy is reached. The specific steps are as follows:

In this example, different neural network models are used to train on different image data sets. For example, the MNIST data set is input to the LeNet-5 model, the CIFAR-10 data set to the VGG19 model, and the GTSRB data set to the ResNet20 model. The input layer size of the classifier model matches the image size, [H, W, C], and the output layer size is [H × W × C, 1], where H is the image height, W is the image width, and C is the number of input channels.

The process of training the neural network model is as follows. A deep neural network (DNN) model can be expressed as a mapping:

$$f: X \rightarrow Y$$

where $x \in X$ represents the input of the deep neural network model and $y \in Y$ represents its output. During training, the loss function of the deep neural network model is calculated as:

[formula (1) not reproduced in the source text]

where the formula expresses the loss function of the deep neural network model, y is the true class label of the sample x, θ denotes the model parameters, C is the total number of classes, and i is the index value of the sample. After training, if the classification accuracy of the deep neural network model exceeds 95%, the model and its trained parameters are saved; otherwise training continues.

3) Modifying the model weights of each neural network model trained in step 2) to perform model weight mutation, obtaining mutant models, and screening the mutant models by classification accuracy. The specific steps are as follows:

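3.1) For a neural network model trained in step 2), a clean training set is given:

[clean training set definition not reproduced in the source text]

The loss function of equation (1) can then be rewritten as:

[formula (2) not reproduced in the source text]

where the neural network model parameter θ satisfies:

[condition on θ not reproduced in the source text]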

3.2) Define the model weight change δ, calculated as follows:

$$\delta = -H^{-1} g + \eta \qquad (3)$$

where $H$ is the Hessian matrix of the model's loss function on the clean data set, that is, the matrix of second derivatives of the loss with respect to θ; $g$ is the first derivative (gradient) of the loss function with respect to θ; and $\eta \sim N(0, 1)$ is a random value drawn from the standard normal distribution.
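To make the weight-change formula δ = -H⁻¹g + η concrete, the sketch below evaluates it for a two-parameter toy quadratic loss whose Hessian and gradient are known in closed form. The toy loss, the helper names `solve_2x2` and `weight_change`, the per-weight sampling of η, and the `noise_scale` argument (set to 1.0 to match the N(0, 1) noise in the text) are all assumptions made for illustration; a real network would need an approximate Hessian-inverse-vector product rather than an explicit solve.

```python
import random

def solve_2x2(H, g):
    """Solve H @ x = g for a 2x2 matrix H using Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("Hessian is singular")
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]

def weight_change(H, g, rng, noise_scale=1.0):
    """delta = -H^{-1} g + eta, with eta drawn per weight from N(0, 1)."""
    newton = solve_2x2(H, g)
    return [-step + noise_scale * rng.gauss(0.0, 1.0) for step in newton]

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta - b^T theta (assumption),
# whose Hessian is A and whose gradient is A @ theta - b.
A = [[2.0, 0.0], [0.0, 4.0]]
b = [2.0, 4.0]
theta = [3.0, -1.0]
grad = [sum(A[i][j] * theta[j] for j in range(2)) - b[i] for i in range(2)]

# Ten weight mutations of the same seed parameters, as in n = 10 in this example.
rng = random.Random(0)
n = 10
mutants = []
for _ in range(n):
    delta = weight_change(A, grad, rng)
    mutants.append([t + d for t, d in zip(theta, delta)])
```

With the noise switched off, θ + δ lands exactly on the minimizer of the toy loss; the random term η is what makes the n mutants differ from one another.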

3.3) Performing model weight mutation by modifying the weights of the neural network model trained in step 2): a neural network model trained in step 2) is taken as the seed model $M_s$, and $n$ model weight mutation operations are performed on it (in this example $n = 10$), yielding $n$ mutant models $M_m = \{M_1, M_2, \ldots, M_n\}$.

The model weight mutation process is as follows: the neural network model parameters are modified to θ + δ, and the loss function is modified accordingly:

[modified loss function not reproduced in the source text]

In particular, to preserve the model's classification accuracy on benign samples, the weight change must also satisfy the following condition:

[constraint formula not reproduced in the source text]

where $\|\cdot\|_2$ denotes the two-norm of a vector, a further symbol (not reproduced) denotes the amount of change in the loss function, and $o(1)$ denotes a higher-order infinitesimal.

3.4) Screening the mutant models by classification accuracy: the classification accuracy of each mutant model in $M_m = \{M_1, M_2, \ldots, M_n\}$ is calculated, and any mutant model whose classification accuracy is below 90% is rejected. The remaining mutant models are sorted in descending order of classification accuracy, and the top-$k$ mutant models, where $k < n$, are kept for subsequent use. If too few mutant models meet the accuracy requirement, mutation continues until $k$ mutant models satisfying the classification accuracy are obtained. In this example, $k = 5$.
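The screening rule can be sketched as below: reject mutants under the accuracy threshold, sort the rest in descending order, keep the top k, and keep mutating if fewer than k pass. The function names, the callback interface (`mutate`, `evaluate`), the `max_rounds` safety limit, and the toy accuracy values are assumptions for illustration only.

```python
def screen_mutants(accuracies, k, min_accuracy=0.90):
    """Return indices of the top-k mutants by accuracy, rejecting those below min_accuracy."""
    kept = [(acc, idx) for idx, acc in enumerate(accuracies) if acc >= min_accuracy]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in kept[:k]]

def collect_mutants(mutate, evaluate, k, n, min_accuracy=0.90, max_rounds=50):
    """Mutate in rounds of n until at least k mutants pass the accuracy screen."""
    passed = []  # (accuracy, mutant) pairs that met the threshold
    for _ in range(max_rounds):
        for _ in range(n):
            mutant = mutate()
            acc = evaluate(mutant)
            if acc >= min_accuracy:
                passed.append((acc, mutant))
        if len(passed) >= k:
            break
    else:
        raise RuntimeError("could not obtain k mutants above the accuracy threshold")
    passed.sort(key=lambda pair: pair[0], reverse=True)
    return [mutant for _, mutant in passed[:k]]

# Toy accuracies for n = 10 mutants (assumption); k = 5 as in this example.
accuracies = [0.97, 0.85, 0.93, 0.99, 0.91, 0.89, 0.95, 0.92, 0.96, 0.90]
top_k = screen_mutants(accuracies, k=5)
```

Here `top_k` holds the indices of the five most accurate mutants that reached 90%; mutants 1 and 5 are rejected outright.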

4) Passing the benign samples and their corresponding adversarial samples through the screened mutant models, calculating the confidence distances, and concatenating them to obtain an adversarial-sample confidence distance matrix and a benign-sample confidence distance matrix, respectively.

The confidence distance is defined as follows:

[confidence distance formula not reproduced in the source text]

where d(p) is the probability distance and v(p) is the probability variance; p' is p sorted in descending order, and a further symbol (not reproduced) denotes the mean of the values in the sequence p.

Given a series of model inputs $X = \{x_1, x_2, \ldots\}$ and a series of mutant models $M_m = \{M_1, M_2, \ldots, M_k\}$, a sample is input into the j-th mutant model, where j ≤ k, and the output confidence of that mutant model is computed. All k mutant models are traversed, the confidence distance is calculated for each, and the resulting values are concatenated to obtain a confidence sequence:

[confidence sequence formula not reproduced in the source text]

where C represents the total number of categories.
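The traversal over the screened mutant models can be sketched as follows. Because the confidence distance formula itself is not available in this text, the sketch uses a stand-in, the population variance of the confidence vector, purely so that the loop is runnable; that stand-in, the one-value-per-mutant layout of each row, and the toy softmax "mutants" are assumptions and should be replaced by the actual definition.

```python
import math
from statistics import pvariance

def stand_in_distance(p):
    """Placeholder for the confidence distance: variance of the confidence vector (assumption)."""
    return pvariance(p)

def confidence_matrix(samples, mutants, distance_fn=stand_in_distance):
    """One row per sample, one column per screened mutant model."""
    return [[distance_fn(model(x)) for model in mutants] for x in samples]

def make_toy_mutant(shift):
    """Toy 3-class 'model' returning a softmax confidence vector for a scalar input."""
    def model(x):
        logits = [x + shift, 0.0, -x]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]
    return model

mutants = [make_toy_mutant(0.1 * j) for j in range(5)]  # k = 5 as in this example
benign_matrix = confidence_matrix([0.5, 2.0, -1.0], mutants)
```

The same function would be called once with benign samples and once with their adversarial counterparts, giving the two matrices used in step 5).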

Adversarial samples are generated using FGSM, and each initial benign sample x and its corresponding adversarial sample are input into the screened mutant models to obtain the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix, respectively.
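FGSM perturbs an input by a step of size ε in the direction of the sign of the loss gradient with respect to that input. The sketch below applies one such step to a logistic-regression stand-in rather than to a deep network; the model, its weights, the value of `epsilon`, and the clipping to the [0, 1] pixel range are assumptions chosen to keep the example self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Confidence of class 1 under the toy model p = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, epsilon):
    """One FGSM step: x_adv = clip(x + epsilon * sign(dL/dx)) for the cross-entropy loss."""
    p = predict(x, w, b)
    grad_x = [(p - y) * wi for wi in w]  # gradient of the cross-entropy loss w.r.t. x
    x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]
    return [min(1.0, max(0.0, v)) for v in x_adv]  # keep pixel values in [0, 1]

# Toy model and input (assumptions).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.8, 0.1, 0.5], 1
epsilon = 0.3
x_adv = fgsm(x, y, w, b, epsilon)
p_clean = predict(x, w, b)
p_adv = predict(x_adv, w, b)
```

The perturbed input lowers the model's confidence in the true class while each pixel moves by at most ε.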

5) Constructing a binary classifier, concatenating the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix to form a binary data set, and training and optimizing the binary classifier on the binary data set until a preset accuracy is reached; and using the optimized binary classifier to distinguish adversarial samples from benign samples.

Step 5) specifically comprises the following substeps:

5.1) Building a binary classifier: because the confidence distance matrix is a simple two-dimensional matrix, a simple fully connected network suffices for classification. Three fully connected layers are stacked; the first two layers use the ReLU activation function, and the last layer uses softmax.
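A forward pass through three stacked fully connected layers with ReLU, ReLU and softmax activations might look like the sketch below. The hidden layer widths (16 and 8), the uniform weight initialization, and the function names are assumptions, since the text does not specify them; only the layer count and the activations come from the description.

```python
import math
import random

def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, t) for t in v]

def softmax(v):
    m = max(v)
    exps = [math.exp(t - m) for t in v]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_detector(n_in, hidden=(16, 8), seed=0):
    """Three fully connected layers; the output layer has 2 units (benign, adversarial)."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, 2]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(layers, x):
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(dense(x, w1, b1))
    h = relu(dense(h, w2, b2))
    return softmax(dense(h, w3, b3))

detector = build_detector(n_in=5)            # 5 inputs: one distance per screened mutant
probs = forward(detector, [0.1, 0.2, 0.1, 0.3, 0.2])
```

The two softmax outputs can be read as the detector's confidence that the input row is benign or adversarial.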

5.2) The class label of the benign-sample confidence distance matrix is set to 0 and the class label of the adversarial-sample confidence distance matrix is set to 1. The two matrices are concatenated to construct a binary data set, and the data set is divided by 8 (the division ratio is truncated in the source text). The class labels are then one-hot encoded.

Specifically, for the MNIST data set, 200 adversarial samples and 100 benign samples were used. For the CIFAR-10 data set, 200 benign samples and 200 FGSM adversarial samples were used. For the GTSRB data set, 400 FGSM adversarial samples and 15 benign samples were used.
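Assembling the binary data set can be sketched as follows: benign rows get label 0, adversarial rows get label 1, the rows are concatenated, divided, and the labels one-hot encoded. The function name, the shuffling before the split, the toy rows, and the `train_fraction` value of 0.8 used in the example are assumptions; the text does not state the division ratio in full.

```python
import random

def build_binary_dataset(benign_rows, adversarial_rows, train_fraction, seed=0):
    """Label benign rows 0 and adversarial rows 1, concatenate, shuffle, split, one-hot encode."""
    labelled = [(row, 0) for row in benign_rows] + [(row, 1) for row in adversarial_rows]
    random.Random(seed).shuffle(labelled)
    encoded = [(row, [1, 0] if label == 0 else [0, 1]) for row, label in labelled]
    cut = int(len(encoded) * train_fraction)
    return encoded[:cut], encoded[cut:]

# Toy confidence distance rows (assumption): 4 benign and 6 adversarial, 5 values each.
benign = [[0.01 * i] * 5 for i in range(4)]
adversarial = [[0.10 + 0.01 * i] * 5 for i in range(6)]
train_set, test_set = build_binary_dataset(benign, adversarial, train_fraction=0.8)
```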

5.3) Training the binary classifier on the divided data set. The training hyperparameters are set as follows: the loss function is cross-entropy, the optimizer is SGD, and the batch size is 5. The number of epochs is set to 10 for the CIFAR-10 data set, 15 for the GTSRB data set, and 5 for the MNIST data set.
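The hyperparameters above translate into a minibatch SGD loop along the following lines. To stay short, the sketch trains a single softmax layer instead of the three-layer network, and the learning rate of 0.1, the toy data, and the function names are assumptions; only the cross-entropy loss, the SGD optimizer, the batch size of 5 and the epoch counts come from the text.

```python
import math
import random

EPOCHS = {"MNIST": 5, "CIFAR-10": 10, "GTSRB": 15}
BATCH_SIZE = 5

def softmax(v):
    m = max(v)
    exps = [math.exp(t - m) for t in v]
    total = sum(exps)
    return [e / total for e in exps]

def batches(data, batch_size, rng):
    """Yield shuffled minibatches of at most batch_size items."""
    data = data[:]
    rng.shuffle(data)
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def train(data, n_features, epochs, lr=0.1, seed=0):
    """Minibatch SGD on a single softmax layer with cross-entropy loss."""
    rng = random.Random(seed)
    W = [[0.0] * n_features for _ in range(2)]
    b = [0.0, 0.0]
    for _ in range(epochs):
        for batch in batches(data, BATCH_SIZE, rng):
            gW = [[0.0] * n_features for _ in range(2)]
            gb = [0.0, 0.0]
            for x, y in batch:
                logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(2)]
                p = softmax(logits)
                for c in range(2):
                    err = p[c] - y[c]  # gradient of cross-entropy w.r.t. logit c
                    gb[c] += err
                    for j in range(n_features):
                        gW[c][j] += err * x[j]
            for c in range(2):
                b[c] -= lr * gb[c] / len(batch)
                for j in range(n_features):
                    W[c][j] -= lr * gW[c][j] / len(batch)
    return W, b

# Toy one-feature data (assumption): benign rows near 0, adversarial rows near 1.
data = [([0.0], [1, 0])] * 10 + [([1.0], [0, 1])] * 10
W, b = train(data, n_features=1, epochs=EPOCHS["GTSRB"])
```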

5.4) Testing and optimizing the binary classifier.

The confidence distance matrix of the test set is input into the trained binary classifier. If the classification accuracy is insufficient, the number of training epochs is modified or fully connected layers are added to the binary classifier, and the binary classifier is retrained.

The decision scores of benign samples are input into the binary classifier. If the classification accuracy is insufficient, the number of benign samples in the training set is increased and the binary classifier is retrained, so that the detector can accurately identify adversarial samples while preserving the classification accuracy on benign samples.

5.5) Using the optimized binary classifier to distinguish adversarial samples from benign samples.

Corresponding to the foregoing embodiments of the adversarial sample detection method based on model weight mutation and confidence distance, the present invention further provides embodiments of an adversarial sample detection apparatus based on model weight mutation and confidence distance.

Referring to FIG. 3, the adversarial sample detection apparatus based on model weight mutation and confidence distance according to the embodiment of the present invention includes one or more processors configured to implement the adversarial sample detection method based on model weight mutation and confidence distance of the above embodiment.

The embodiment of the adversarial sample detection apparatus based on model weight mutation and confidence distance according to the present invention can be applied to any device with data processing capability, such as a computer. The apparatus embodiment may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the apparatus, as a logical device, is formed when the processor of the device with data processing capability on which it resides reads the corresponding computer program instructions from non-volatile memory into main memory and runs them. In terms of hardware, FIG. 3 shows the hardware structure of the device with data processing capability on which the adversarial sample detection apparatus resides. In addition to the processor, memory, network interface and non-volatile memory shown in FIG. 3, the device may include other hardware according to its actual function, which is not described further here.

The implementation of the functions and roles of each unit in the above apparatus is described in the implementation of the corresponding steps of the above method and is not repeated here.

Because the apparatus embodiment essentially corresponds to the method embodiment, the relevant parts of the method embodiment description apply. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. A person of ordinary skill in the art can understand and implement this without inventive effort.

An embodiment of the present invention further provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the adversarial sample detection method based on model weight mutation and confidence distance of the above embodiments is implemented.

The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any device with data processing capability described in the foregoing embodiments. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. The computer-readable storage medium is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.

The embodiments described in this specification merely illustrate forms in which the inventive concept may be implemented. The scope of protection of the present invention should not be regarded as limited to the specific forms set forth in the embodiments; it also extends to equivalent technical means that a person skilled in the art can conceive based on the inventive concept.

Claims

1. An adversarial sample detection method based on model weight mutation and confidence distance, the method comprising the steps of:

2. The adversarial sample detection method based on model weight mutation and confidence distance according to claim 1, wherein the image data set is selected from data sets including MNIST, CIFAR-10 and GTSRB; and wherein inputting the image data set selected in step 1) into the corresponding neural network model for training in step 2) comprises: inputting the MNIST data set into a LeNet-5 model for training, inputting the CIFAR-10 data set into a VGG19 model for training, and inputting the GTSRB data set into a ResNet20 model for training.

3. The adversarial sample detection method based on model weight mutation and confidence distance according to claim 1, wherein the loss function of the neural network model in step 2) is calculated as follows:

[loss function formula not reproduced in the source text]

where the formula expresses the loss function of the deep neural network model, y is the true class label of the sample x, θ denotes the model parameters, C is the total number of classes, and i is the index value of the sample.

4. The adversarial sample detection method based on model weight mutation and confidence distance according to claim 1, wherein modifying the model weights of each neural network model trained in step 2) to perform model weight mutation and obtain the mutant models comprises:

modifying the neural network model parameters to θ + δ and modifying the loss function accordingly:

[modified loss function not reproduced in the source text]

Specifically, for a neural network model trained in step 2), a clean training set is given:

[clean training set definition not reproduced in the source text]

The loss function of the neural network model is rewritten as:

[formula not reproduced in the source text]

where the neural network model parameter θ satisfies:

[condition on θ not reproduced in the source text]

The model weight change δ is defined as:

$$\delta = -H^{-1} g + \eta$$

where $H$ is the Hessian matrix of the neural network model's loss function on the clean data set, $g$ is the first derivative of the loss function, and $\eta \sim N(0, 1)$ is a random value drawn from the standard normal distribution.

Model weight mutation is performed by modifying the weights of each neural network model trained in step 2): a neural network model trained in step 2) is taken as the seed model $M_s$, and $n$ model weight mutation operations are performed on it to obtain $n$ mutant models $M_m = \{M_1, M_2, \ldots, M_n\}$.

5. The method according to claim 4, wherein the model weight change δ further satisfies:

[constraint formula not reproduced in the source text]

where $\|\cdot\|_2$ denotes the two-norm of a vector, a further symbol (not reproduced) denotes the amount of change in the loss function, and $o(1)$ denotes a higher-order infinitesimal.

6. The adversarial sample detection method based on model weight mutation and confidence distance according to claim 1, wherein the binary classifier constructed in step 5) is a network of three fully connected layers, the activation function of the first two layers is ReLU, and the activation function of the last layer is softmax.

7. The adversarial sample detection method based on model weight mutation and confidence distance according to claim 1, wherein concatenating the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix to form the binary data set in step 5) comprises:

setting the class label of the benign-sample confidence distance matrix to 0, setting the class label of the adversarial-sample confidence distance matrix to 1, and concatenating the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix to construct the binary data set.

8. The adversarial sample detection method based on model weight mutation and confidence distance according to claim 1, wherein optimizing the binary classifier in step 5) comprises:

inputting the confidence distance matrix into the trained binary classifier for testing and, if the classification accuracy does not reach the preset accuracy, modifying the number of training epochs or adding fully connected layers to the binary classifier, and retraining the binary classifier;

and inputting the decision scores of benign samples into the binary classifier and, if the classification accuracy does not reach the preset accuracy, increasing the number of benign samples in the training set and retraining the binary classifier.

9. An adversarial sample detection apparatus based on model weight mutation and confidence distance, comprising one or more processors configured to implement the adversarial sample detection method based on model weight mutation and confidence distance according to any one of claims 1 to 8.

10. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, implements the adversarial sample detection method based on model weight mutation and confidence distance according to any one of claims 1 to 8.