CN115797747A - Adversarial sample detection method based on model weight variation and confidence distance

Publication number
CN115797747A
Authority
CN
China
Prior art keywords
model
sample
confidence
data set
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211565742.XA
Other languages
Chinese (zh)
Inventor
陈晋音
陈若曦
金海波
郑海斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202211565742.XA
Publication of CN115797747A

Abstract

The invention discloses an adversarial sample detection method based on model weight variation and confidence distance. The method comprises: selecting an image data set and dividing it into a training set and a test set; inputting each image data set into its corresponding neural network model and training until a preset accuracy is reached; modifying the weights of each trained neural network model (model weight variation) to obtain variation models, and screening them by classification accuracy; passing the benign samples in the training set and their corresponding adversarial samples through every screened variation model, calculating confidence distances, and obtaining an adversarial-sample confidence distance matrix and a benign-sample confidence distance matrix; constructing a binary classifier, concatenating the two confidence distance matrices into a binary data set, and training and optimizing the binary classifier on it until a preset accuracy is reached; and using the optimized binary classifier to distinguish adversarial samples from benign samples.

Description

Adversarial sample detection method based on model weight variation and confidence distance
Technical Field
The invention relates to the field of data security, and in particular to an adversarial sample detection method based on model weight variation and confidence distance.
Background
Adversarial attacks on deep learning are becoming increasingly widespread, and the risks posed by adversarial samples and poisoned samples are increasingly prominent. Dimension-reduction attacks can be mounted by exploiting data-stream processing in deep learning, and adversarial samples that pass for genuine inputs can cause AI-driven recognition systems to miss or misjudge targets. For example, an attacker can attach a tiny sticker to a "stop" sign as noise to create an adversarial sample: to the human eye it is indistinguishable from the original "stop" sign, but a recognition system based on a deep learning model may recognize it as a "speed limit" sign or some other sign, thereby attacking a self-driving car. It has also been found that injecting a small number of poisoned samples, or even a single one (for example, a note stuck on a stop sign), into the training data set alters how the deep model recognizes image content; the note, or a note-like mark, becomes a backdoor that can trigger the AI autopilot to recognize the picture as a "stop" sign, causing potential harm. Research on defense and detection techniques for adversarial samples, which improves the robustness and security of models facing unknown threats, is therefore important for making applications more reliable and for achieving safe and trustworthy artificial intelligence algorithms.
Adversarial attacks on deep learning are a research hotspot in artificial intelligence security. An adversarial attack is defined as follows: at the model testing stage, an attacker adds a carefully designed small perturbation to the original data to obtain an adversarial sample, so that the deep learning model fails completely and misclassifies with high confidence. Depending on whether the structure of the target model is known, attacks are divided into white-box and black-box attacks; depending on the attacker's intent, into targeted and non-targeted attacks; and depending on the form of the adversarial sample, into attacks in the virtual digital space and attacks in the real physical space. By crafting a specific adversarial sample, an attacker makes a machine learning model classify a sample as the class the attacker wants to imitate, even though the sample looks quite different to a human. Generating high-quality adversarial samples is the key to such spoofing attacks. Related techniques include methods based on direct gradients (L-BFGS, FGSM, BIM, JSMA, DeepFool, C&W attacks), methods based on gradient estimation (ZOO), methods based on adversarial transformation (ATN), methods based on generative networks (UPSET), and methods based on differential evolution (One Pixel Attack).
Depending on the defense effect, defense methods can be divided into detection-only defense and full defense. Unlike adversarial defense, which tries to classify adversarial samples correctly, adversarial detection identifies and filters out adversarial samples by exploiting the differences between adversarial samples and normal samples. Adversarial sample detection methods fall mainly into three groups: detection based on empirical statistics, detection based on image preprocessing and reconstruction, and detection based on a detection network. Hendrycks and Gimpel found that the later principal components of adversarial samples generally have larger variance than those of benign samples, and used this difference to compute a threshold separating benign from adversarial samples. Liang et al. treat image perturbations as noise and use scalar quantization and spatial smoothing filters to detect adversarial samples with perturbations of different pixel magnitudes. Cohen et al. combine the k-nearest-neighbor algorithm with influence functions to extract the Nearest Neighbor Influence Function (NNIF) for adversarial sample detection. Feinman et al. propose adversarial detection using kernel density and Bayesian uncertainty estimates: a Kernel Density Estimate (KDE) identifies whether a data point lies far from the class manifold, and a Bayesian Uncertainty Estimate (BUE) detects whether a data point lies near a low-confidence region where the KDE is ineffective. By examining the outputs of the inner convolution layers of the original model, it can be determined whether an input sample is adversarial. Gong et al. discriminate between benign and adversarial samples by training a binary classifier network. The binary classifier is a network completely separate from the main classifier; adversarial samples are generated against the pre-trained classifier rather than against the detector, and the binary classifier is trained by adding these adversarial samples to the original training data. Grosse et al. augment the classifier network with an additional adversarial class: given a pre-trained model, they train a new model with the extra class on benign samples and on adversarial samples generated against that model.
Although existing detection methods work well, they still face the following challenges:
(1) Existing advanced adversarial sample detection techniques, whether based on empirical statistics or on image preprocessing, suffer from high algorithmic complexity, require additional model parameters, and depend heavily on the adversarial samples generated by particular attack methods. Designing a lightweight adversarial sample detection technique that reduces this dependence is therefore one of the technical difficulties to be overcome.
(2) The ongoing contest between attack and defense, together with the black-box nature of artificial intelligence, means that most existing defenses are after-the-fact designs based on experience; they cannot defend against the variety of unknown attacks, and their detection range is limited.
(3) Current adversarial sample detection methods focus mainly on the adversarial samples themselves and ignore how internal changes in the model affect them. Interpretable attack and defense techniques are still needed to understand the cause of attacks and the feasibility of defense, and thereby to guide adversarial sample detection.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an adversarial sample detection method based on model weight variation and confidence distance. The method applies weight variation to a neural network model, computes the confidence distribution by observing how the outputs of a series of neural network models change, and trains a binary classifier on the differences in confidence distance, achieving accurate and efficient detection of adversarial samples.
To achieve this purpose, the technical scheme of the invention is as follows. A first aspect of the embodiments of the present invention provides an adversarial sample detection method based on model weight variation and confidence distance, the method including the following steps:
1) Selecting an image data set, dividing it by class into a training set and a test set, and one-hot encoding the class labels of all pictures in the image data set;
2) Inputting each image data set selected in step 1) into its corresponding neural network model and training until a preset accuracy is reached;
3) Modifying the weights of each neural network model trained in step 2) (model weight variation) to obtain variation models, and screening the variation models by classification accuracy;
4) Passing the benign samples in the training set and their corresponding adversarial samples through every screened variation model, calculating confidence distances, and concatenating them to obtain an adversarial-sample confidence distance matrix and a benign-sample confidence distance matrix;
5) Constructing a binary classifier, concatenating the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix into a binary data set, and training and optimizing the binary classifier on this data set until a preset accuracy is reached; then using the optimized binary classifier to distinguish adversarial samples from benign samples.
A second aspect of the embodiments of the present invention provides an adversarial sample detection apparatus based on model weight variation and confidence distance, including one or more processors configured to implement the above adversarial sample detection method based on model weight variation and confidence distance.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the above adversarial sample detection method based on model weight variation and confidence distance.
Compared with the prior art, the invention has the following beneficial effects. Existing adversarial sample detection methods have a limited detection range, depend on prior knowledge of the adversarial attack, and are poorly interpretable; to address these problems, the invention provides an adversarial sample detection method based on model weight variation and confidence distance. Experimental results on real deep learning models show that the method is widely applicable, can effectively detect unknown adversarial samples, has low space complexity, and preserves the classification accuracy on benign samples.
Drawings
Fig. 1 is a block diagram of an adversarial sample detection method based on model weight variation and confidence distance according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the overall framework of the adversarial sample detection method based on model weight variation and confidence distance.
Fig. 3 is a schematic diagram of an adversarial sample detection apparatus based on model weight variation and confidence distance according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings. The features of the following examples and embodiments may be combined with each other without conflict.
Referring to figs. 1 and 2, an adversarial sample detection method based on model weight variation and confidence distance includes the following steps:
1) Select an image data set, divide it by class into a training set and a test set, and one-hot encode the class labels of all pictures in the image data set. The specific process is as follows:
1.1) Select an image data set: in the embodiment of the invention, the image data sets are MNIST, CIFAR-10 and GTSRB.
The MNIST data set consists of 70,000 grayscale images of size 28 × 28 showing the handwritten digits 0-9, of which 60,000 are training data and 10,000 are test data. The CIFAR-10 data set consists of 60,000 color images of size 32 × 32 in 10 classes of 6,000 images each, with a training set of 50,000 and a test set of 10,000. The GTSRB data set contains more than 50,000 traffic sign pictures of size 48 × 48 in 43 classes. The image samples and their class labels are saved; the sample set is denoted X = {x_1, x_2, …, x_m} and the class label of each picture is denoted y.
1.2) Divide the initial image data set by class into a training set and a test set, and one-hot encode the class labels of all pictures in the initial image data set.
Illustratively, for the GTSRB data set, 80% of the pictures of each class are randomly drawn as the training set in this example, and the remaining pictures form the test set. The class labels y of all data sets are one-hot encoded to facilitate subsequent training.
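The per-class split and one-hot encoding of step 1.2 can be sketched as follows. This is a minimal illustration on toy labels, not the embodiment's own code; the helper names `one_hot` and `split_by_class`, the seed, and the toy label list are assumptions introduced here.

```python
import random
from collections import defaultdict

def one_hot(label, num_classes):
    """Return a list of length num_classes with a 1 at index `label`."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

def split_by_class(labels, train_fraction=0.8, seed=0):
    """Randomly assign `train_fraction` of each class's indices to the training set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        cut = int(round(train_fraction * len(idxs)))
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels for three classes (hypothetical data, not GTSRB)
labels = [0] * 10 + [1] * 5 + [2] * 5
train_idx, test_idx = split_by_class(labels)
encoded = [one_hot(y, 3) for y in labels]
```

Splitting inside each class keeps the class proportions of the training and test sets the same as in the full data set.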
2) Input each image data set selected in step 1) into its corresponding neural network model and train until a preset accuracy is reached. The specific steps are as follows:
In this example, different image data sets are trained with different neural network models. Illustratively, the MNIST data set is input into a LeNet-5 model, the CIFAR-10 data set into a VGG19 model, and the GTSRB data set into a ResNet20 model. The input layer size of the classifier model equals the image size, [H, W, C], and the output layer size is [H × W × C, 1], where H is the image height, W is the image width, and C is the number of input channels.
The neural network model is trained as follows. A deep neural network (DNN) model can be expressed as:
Figure BDA0003986037240000046
wherein
Figure BDA0003986037240000041
represents the input of the deep neural network model, and y ∈ Y represents its output. During training, the loss function of the deep neural network model is calculated as:
Figure BDA0003986037240000042
where
Figure BDA0003986037240000043
denotes the loss function of the deep neural network model, y is the true class label of sample x, θ denotes the model parameters, C is the total number of classes, and i is the sample index. After training, if the classification accuracy of the deep neural network model exceeds 95%, the model and its trained parameters are saved; otherwise training continues.
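The loss formula itself survives only as a figure placeholder. The sketch below assumes it is the usual cross-entropy over C classes with one-hot labels, which matches the symbols described (y, C, i) but is not confirmed by the text. The function names and toy predictions are hypothetical.

```python
import math

def cross_entropy(probs, one_hot_label, eps=1e-12):
    """Cross-entropy of one predicted distribution against a one-hot label."""
    return -sum(y * math.log(max(p, eps)) for p, y in zip(probs, one_hot_label))

def mean_loss(batch_probs, batch_labels):
    """Average cross-entropy over all samples i in the batch."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, batch_labels)]
    return sum(losses) / len(losses)

def accuracy(batch_probs, batch_labels):
    """Fraction of samples whose arg-max prediction equals the true class."""
    hits = sum(p.index(max(p)) == y.index(1) for p, y in zip(batch_probs, batch_labels))
    return hits / len(batch_probs)

# Toy predictions for three samples and two classes (hypothetical values)
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [[1, 0], [0, 1], [0, 1]]
loss = mean_loss(probs, labels)
keep_model = accuracy(probs, labels) > 0.95  # save only above the 95% threshold
```

Here the third sample is misclassified, so the accuracy is 2/3 and the model would keep training rather than be saved.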
3) Modify the weights of each neural network model trained in step 2) (model weight variation) to obtain variation models, and screen the variation models by classification accuracy. The specific steps are as follows:
3.1) For a neural network model trained in step 2), given a clean training set
Figure BDA0003986037240000044
The loss function of equation (1) can be rewritten as:
Figure BDA0003986037240000045
wherein the neural network model parameter θ satisfies
Figure BDA0003986037240000051
3.2) Define the model weight change δ, calculated as follows:
δ = -H^{-1} g + η (3)
where H is the Hessian matrix of the model's loss function on the clean data set, i.e.
Figure BDA0003986037240000052
g is the first derivative of the loss function, i.e.
Figure BDA0003986037240000053
η ~ N(0, 1) is a random value drawn from the standard normal distribution.
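Equation (3) can be illustrated on a toy quadratic loss, where the Hessian and gradient are known in closed form. This is a sketch under that assumption, not the procedure used for the real networks; `weight_change`, the `noise_scale` parameter, and the matrices `A` and `b` are hypothetical.

```python
import numpy as np

def weight_change(hessian, grad, rng, noise_scale=1.0):
    """delta = -H^{-1} g + eta, with eta drawn from N(0, noise_scale^2) per component."""
    newton_step = -np.linalg.solve(hessian, grad)  # avoids forming H^{-1} explicitly
    eta = noise_scale * rng.standard_normal(grad.shape)
    return newton_step + eta

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta - b^T theta
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # Hessian of the toy loss
b = np.array([1.0, -1.0])
theta = np.array([0.5, 0.5])
grad = A @ theta - b                     # first derivative g at theta

rng = np.random.default_rng(0)
delta = weight_change(A, grad, rng)      # noise_scale=1.0 matches eta ~ N(0, 1)
theta_mutated = theta + delta

# With the noise switched off, theta + delta lands on the minimizer of the toy loss
noise_free = theta + weight_change(A, grad, rng, noise_scale=0.0)
```

Solving the linear system instead of inverting H is a numerical convenience chosen for this sketch. For a real deep network the full Hessian is usually too large to form, so some approximation would be needed; the source does not say which.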
3.3) Perform model weight variation by modifying the weights of the neural network model trained in step 2): take a neural network model trained in step 2) as the seed model M_s and apply n model weight variation operations to it (n = 10 in this example), obtaining n variation models M_m = {M_1, M_2, ..., M_n}.
The model weight variation process is as follows: the neural network model parameters are modified to θ + δ, and the loss function is modified to
Figure BDA0003986037240000054
In particular, in order to ensure the classification accuracy of the benign samples by the model, the change amount of the weight also needs to satisfy the following formula:
Figure BDA0003986037240000055
where ‖·‖_2 denotes the two-norm of a vector,
Figure BDA0003986037240000056
denotes the change in the loss function, and o(1) denotes a higher-order infinitesimal.
3.4) Screen the variation models by classification accuracy: calculate the classification accuracy of each variation model in M_m = {M_1, M_2, ..., M_n} and reject any variation model whose accuracy is below 90%. Sort the remaining variation models in descending order of classification accuracy and keep the top k for subsequent use, where k < n. If too few variation models meet the accuracy requirement, continue applying variation to the model until a sequence of k variation models meeting the classification accuracy is obtained. In this example, k is 5.
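The screening rule of step 3.4 can be sketched as a filter followed by a sort. The accuracies below are made-up values for n = 10 variation models, and `screen_variation_models` is a hypothetical helper name.

```python
def screen_variation_models(accuracies, k=5, min_accuracy=0.90):
    """Drop models below min_accuracy, then keep the k most accurate ones.

    Returns the selected (name, accuracy) pairs and a flag telling the caller
    whether more variation rounds are needed to reach k models.
    """
    kept = [(name, acc) for name, acc in accuracies.items() if acc >= min_accuracy]
    kept.sort(key=lambda item: item[1], reverse=True)
    selected = kept[:k]
    return selected, len(selected) < k

# Hypothetical classification accuracies of n = 10 variation models
accuracies = {
    "M1": 0.97, "M2": 0.88, "M3": 0.93, "M4": 0.95, "M5": 0.91,
    "M6": 0.99, "M7": 0.96, "M8": 0.85, "M9": 0.94, "M10": 0.92,
}
selected, needs_more = screen_variation_models(accuracies)
```

When `needs_more` is true, the caller would generate further variation models from the seed model and screen again.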
4) Pass the benign samples and their corresponding adversarial samples through every screened variation model, calculate the confidence distances, and concatenate them to obtain an adversarial-sample confidence distance matrix and a benign-sample confidence distance matrix.
The confidence distance is defined as follows:
Figure BDA0003986037240000057
where d(p) is the probability distance and v(p) is the probability variance; p' is p sorted in descending order, and
Figure BDA0003986037240000061
denotes the mean of the sequence p.
Given a series of model inputs X = {x_1, x_2, …} and a series of variation models M_m = {M_1, M_2, ..., M_k}, a sample is input into the j-th variation model, where j ≤ k, and the output confidence of that variation model is computed. After traversing all k variation models and calculating the confidence distances, the resulting values are concatenated into the confidence sequence
Figure BDA0003986037240000062
Where C represents the total number of categories.
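The traversal and concatenation described above can be sketched as follows. The exact formulas for d(p) and v(p) are in a figure that is not reproduced in this text, so the two feature functions below (`top_two_gap` and `probability_variance`) are stand-in assumptions rather than the patent's definitions, and the resulting row layout may differ from the patent's. The toy models are hypothetical callables returning fixed class-probability vectors.

```python
def top_two_gap(p):
    """Assumed stand-in for d(p): gap between the two largest sorted probabilities."""
    s = sorted(p, reverse=True)  # p' = p in descending order
    return s[0] - s[1]

def probability_variance(p):
    """Assumed stand-in for v(p): variance of the probabilities around their mean."""
    mean = sum(p) / len(p)
    return sum((v - mean) ** 2 for v in p) / len(p)

def confidence_row(x, models, feature_fns):
    """Feed one sample through every variation model and concatenate the features."""
    row = []
    for model in models:
        p = model(x)  # output confidence of the j-th variation model
        row.extend(fn(p) for fn in feature_fns)
    return row

def confidence_matrix(samples, models, feature_fns):
    """One row per sample, one block of features per variation model."""
    return [confidence_row(x, models, feature_fns) for x in samples]

# Two toy "variation models" that ignore the input (hypothetical)
models = [lambda x: [0.7, 0.2, 0.1], lambda x: [0.5, 0.3, 0.2]]
matrix = confidence_matrix([0.0, 1.0, 2.0], models, (top_two_gap, probability_variance))
```

Keeping the feature functions as parameters means the real d(p) and v(p) can be dropped in without changing the traversal.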
Adversarial samples are generated with FGSM. The original benign samples x and their corresponding adversarial samples are each input into the screened variation models, yielding the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix respectively.
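FGSM perturbs an input by a step of size ε in the direction of the sign of the loss gradient with respect to the input. The sketch below applies it to a toy linear softmax classifier, for which that gradient has a closed form; the classifier, its random weights, and ε = 0.1 are assumptions for illustration, not the networks or settings of the embodiment.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss(x, y_one_hot, W, b):
    """Cross-entropy loss of the linear softmax classifier on one sample."""
    return -np.log(softmax(W @ x + b) @ y_one_hot)

def fgsm(x, y_one_hot, W, b, epsilon=0.1):
    """x_adv = clip(x + epsilon * sign(dL/dx), 0, 1) for a linear softmax model."""
    p = softmax(W @ x + b)
    grad_x = W.T @ (p - y_one_hot)  # closed-form input gradient of the loss
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))    # hypothetical weights: 3 classes, 4 input features
b = np.zeros(3)
x = rng.uniform(0.2, 0.8, size=4)  # benign input with pixel values in [0, 1]
y = np.array([1.0, 0.0, 0.0])      # one-hot true label
x_adv = fgsm(x, y, W, b, epsilon=0.1)
```

For a deep network the input gradient would come from automatic differentiation instead of the closed form used here.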
5) Construct a binary classifier, concatenate the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix into a binary data set, and train and optimize the binary classifier on this data set until a preset accuracy is reached; then use the optimized binary classifier to distinguish adversarial samples from benign samples.
Step 5) specifically comprises the following substeps:
5.1) Build the binary classifier: because the confidence distance matrix is a simple two-dimensional matrix, a simple fully connected network is sufficient for classification. Three fully connected layers are stacked; the activation function of the first two layers is ReLU, and that of the last layer is softmax.
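The three-layer structure of step 5.1 can be sketched as a plain forward pass. The layer widths (16 and 8), the input dimension of 10, and the initialization are assumptions, since the source does not state them; only the ReLU, ReLU, softmax arrangement comes from the text.

```python
import numpy as np

def init_params(in_dim, hidden=(16, 8), out_dim=2, seed=0):
    """Create weights and biases for three fully connected layers (widths are assumed)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, *hidden, out_dim]
    return [
        (rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
        for m, n in zip(dims[:-1], dims[1:])
    ]

def forward(params, x):
    """ReLU on the first two layers, softmax on the last."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

params = init_params(in_dim=10)
batch = np.random.default_rng(1).random((4, 10))  # four hypothetical feature rows
probs = forward(params, batch)                    # columns: P(benign), P(adversarial)
```

Each output row is a two-class probability vector, matching the one-hot labels 0 (benign) and 1 (adversarial).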
5.2) Label the benign-sample confidence distance matrix with class 0 and the adversarial-sample confidence distance matrix with class 1, concatenate the two matrices to construct a binary data set, and divide the data set by 8. Then one-hot encode the class labels.
Specifically, for the MNIST data set, 200 adversarial samples and 100 benign samples were used. For the CIFAR-10 data set, 200 benign samples and 200 FGSM adversarial samples were used. For the GTSRB data set, 400 FGSM adversarial samples and 15 benign samples were used.
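Building the binary data set of step 5.2 amounts to stacking the two matrices and attaching labels. The sketch below uses the MNIST sample counts from the text with random placeholder features; the feature width of 10, the shuffling step, and the helper name `build_binary_dataset` are assumptions added here.

```python
import numpy as np

def build_binary_dataset(benign, adversarial, seed=0):
    """Stack both matrices, label benign rows 0 and adversarial rows 1, one-hot encode."""
    features = np.vstack([benign, adversarial])
    labels = np.concatenate([
        np.zeros(len(benign), dtype=int),
        np.ones(len(adversarial), dtype=int),
    ])
    order = np.random.default_rng(seed).permutation(len(labels))  # shuffle (assumed step)
    return features[order], np.eye(2)[labels[order]]

rng = np.random.default_rng(0)
benign_matrix = rng.random((100, 10))       # placeholder for 100 benign rows
adversarial_matrix = rng.random((200, 10))  # placeholder for 200 adversarial rows
features, one_hot_labels = build_binary_dataset(benign_matrix, adversarial_matrix)
```

The unequal class counts carry over into the data set, so accuracy on the benign class is worth checking separately when the classifier is evaluated.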
5.3) Train the binary classifier on the divided data set, with the training hyper-parameters set as follows: the loss function is cross-entropy, the optimizer is SGD, and batch_size is 5. The number of epochs is set to 10 for the CIFAR-10 data set, 15 for the GTSRB data set, and 5 for MNIST.
5.4) Test and optimize the binary classifier.
Input the confidence distance matrix of the test set into the trained binary classifier. If the classification accuracy is insufficient, change the number of training epochs or add fully connected layers to the binary classifier, and retrain it.
Input the decision scores of the benign samples into the binary classifier. If the classification accuracy is insufficient, increase the number of benign samples in the training set and retrain the binary classifier, so that the detector accurately identifies adversarial samples while preserving the classification accuracy on benign samples.
5.5) Use the optimized binary classifier to distinguish adversarial samples from benign samples.
Corresponding to the foregoing embodiments of the adversarial sample detection method based on model weight variation and confidence distance, the present invention further provides embodiments of an adversarial sample detection apparatus based on model weight variation and confidence distance.
Referring to fig. 3, the adversarial sample detection apparatus based on model weight variation and confidence distance according to the embodiment of the present invention includes one or more processors configured to implement the adversarial sample detection method based on model weight variation and confidence distance of the above embodiment.
The embodiment of the adversarial sample detection apparatus based on model weight variation and confidence distance can be applied to any device with data processing capability, such as a computer. The apparatus embodiment may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the apparatus, as a logical device, is formed when the processor of the data-processing device on which it resides reads the corresponding computer program instructions from non-volatile memory into memory and runs them. In terms of hardware, fig. 3 shows a hardware structure diagram of the data-processing device on which the adversarial sample detection apparatus resides; in addition to the processor, memory, network interface and non-volatile memory shown in fig. 3, that device may also include other hardware according to its actual function, which is not described further here.
The implementation of the functions of each unit in the above apparatus is described in the implementation of the corresponding steps of the above method and is not repeated here.
Since the apparatus embodiment basically corresponds to the method embodiment, the relevant parts of the method embodiment's description apply. The apparatus embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. A person of ordinary skill in the art can understand and implement this without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium storing a program which, when executed by a processor, implements the adversarial sample detection method based on model weight variation and confidence distance of the above embodiments.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any device with data processing capability described in the foregoing embodiments. It may also be an external storage device of such a device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. The computer-readable storage medium is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The embodiments described in this specification merely illustrate implementation forms of the inventive concept. The scope of the present invention should not be considered limited to the specific forms set forth in the embodiments; it also covers equivalent technical means that a person skilled in the art can conceive on the basis of the inventive concept.

Claims (10)

1. An adversarial sample detection method based on model weight variation and confidence distance, the method comprising the following steps:
1) Selecting an image data set, dividing it by class into a training set and a test set, and one-hot encoding the class labels of all pictures in the image data set;
2) Inputting each image data set selected in step 1) into its corresponding neural network model and training until a preset accuracy is reached;
3) Modifying the weights of each neural network model trained in step 2) (model weight variation) to obtain variation models, and screening the variation models by classification accuracy;
4) Passing the benign samples in the training set and their corresponding adversarial samples through every screened variation model, calculating confidence distances, and concatenating them to obtain an adversarial-sample confidence distance matrix and a benign-sample confidence distance matrix;
5) Constructing a binary classifier, concatenating the adversarial-sample confidence distance matrix and the benign-sample confidence distance matrix into a binary data set, and training and optimizing the binary classifier on this data set until a preset accuracy is reached; then using the optimized binary classifier to distinguish adversarial samples from benign samples.
2. The method of claim 1, wherein the image data set is selected from MNIST, CIFAR-10, GTSRB and the like; and inputting each image data set selected in step 1) into its corresponding neural network model for training in step 2) comprises: inputting the MNIST data set into a LeNet-5 model for training, inputting the CIFAR-10 data set into a VGG19 model for training, and inputting the GTSRB data set into a ResNet20 model for training.
3. The adversarial sample detection method based on model weight variation and confidence distance as claimed in claim 1, wherein the loss function of the neural network model in step 2) is calculated as follows:
Figure FDA0003986037230000011
where
Figure FDA0003986037230000012
denotes the loss function of the deep neural network model, y is the true class label of sample x, θ denotes the model parameters, C is the total number of classes, and i is the sample index.
4. The adversarial sample detection method based on model weight variation and confidence distance according to claim 1, wherein modifying the weights of each neural network model trained in step 2) to perform model weight variation and obtain the variation models comprises:
modifying the neural network model parameters to θ + δ and modifying the loss function to
Figure FDA0003986037230000013
Specifically, for a neural network model trained in step 2), given a clean training set
Figure FDA0003986037230000021
The loss function of the neural network model is changed to:
Figure FDA0003986037230000022
wherein the neural network model parameter θ satisfies
Figure FDA0003986037230000023
The model weight change δ is defined as:
δ = -H^{-1} g + η
where H is, for the neural network model on the clean data set
Figure FDA0003986037230000024
the Hessian matrix of the loss function, i.e.
Figure FDA0003986037230000025
g is the first derivative (gradient) of the loss function, i.e.
Figure FDA0003986037230000026
and $\eta \sim \mathcal{N}(0, 1)$ denotes random values drawn from the standard normal distribution.
The model weight variation is performed on each neural network model trained in step 2) as follows: a neural network model trained in step 2) is taken as the seed model $M_s$, and the model weight variation operation is applied to it n times to obtain n variation models $M_m = \{M_1, M_2, \ldots, M_n\}$.
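The weight variation rule of claim 4 can be sketched on a toy model. The following is a minimal illustration only, assuming a two-parameter logistic regression in place of the LeNet-5, VGG19, or ResNet20 models named in the claims. The function names, the small ridge term that keeps H invertible, the `noise_scale` parameter, and the toy data are all assumptions introduced for this example and do not come from the patent text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_and_hessian(theta, xs, ys, ridge=1e-3):
    """Gradient g and Hessian H of the mean logistic loss plus a ridge term.

    The ridge term is an assumption added here so that H stays invertible.
    """
    d = len(theta)
    g = [ridge * t for t in theta]
    H = [[ridge if r == c else 0.0 for c in range(d)] for r in range(d)]
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for r in range(d):
            g[r] += (p - y) * x[r] / len(xs)
            for c in range(d):
                H[r][c] += p * (1.0 - p) * x[r] * x[c] / len(xs)
    return g, H


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def mutate(theta, xs, ys, n, rng, noise_scale=1.0):
    """Return n varied parameter vectors theta + delta, delta = -H^{-1} g + eta."""
    g, H = grad_and_hessian(theta, xs, ys)
    newton = solve(H, g)  # H^{-1} g, computed without forming the inverse
    mutants = []
    for _ in range(n):
        # eta is drawn independently per parameter from N(0, 1)
        delta = [-s + noise_scale * rng.gauss(0.0, 1.0) for s in newton]
        mutants.append([t + dlt for t, dlt in zip(theta, delta)])
    return mutants


# Toy clean data set: a bias feature plus one input feature (hypothetical values).
xs = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]]
ys = [1, 0, 1, 0]
seed_theta = [0.1, 0.8]  # stands in for the seed model M_s
mutants = mutate(seed_theta, xs, ys, n=5, rng=random.Random(0))
```

For a deep network the Hessian is far too large to build and solve this way, so a practical implementation would need some approximation of $H^{-1} g$; the claims do not specify one.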
5. The method as claimed in claim 4, wherein the model weight change δ further satisfies:
Figure FDA0003986037230000027
wherein $\|\cdot\|_2$ denotes the two-norm of a vector,
Figure FDA0003986037230000028
denotes the change in the loss function, and o(1) denotes a higher-order infinitesimal.
6. The method for detecting adversarial samples based on model weight variation and confidence distance according to claim 1, wherein the binary classifier constructed in step 5) is a network of three fully connected layers, the activation function of the first two layers is ReLU, and the activation function of the last layer is softmax.
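The classifier structure of claim 6 can be sketched as a plain forward pass. This is an illustrative sketch only: the hidden layer widths (16 and 8), the uniform weight initialization, and the assumption that the input is one row of a confidence distance matrix with one distance per variation model are choices made for this example, not details stated in the claims.

```python
import math
import random


def dense(x, W, b):
    """Fully connected layer: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def init_layer(n_in, n_out, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_classifier(n_features, hidden=(16, 8), seed=0):
    """Three fully connected layers; the last has 2 outputs (benign, adversarial)."""
    rng = random.Random(seed)
    sizes = [n_features, hidden[0], hidden[1], 2]
    return [init_layer(sizes[i], sizes[i + 1], rng) for i in range(3)]


def forward(layers, x):
    h = relu(dense(x, *layers[0]))        # layer 1: ReLU
    h = relu(dense(h, *layers[1]))        # layer 2: ReLU
    return softmax(dense(h, *layers[2]))  # layer 3: softmax over the two classes


# Hypothetical input: confidence distances of one sample across 4 variation models.
distances = [0.02, 0.31, 0.07, 0.12]
probs = forward(build_classifier(n_features=4), distances)
```

The network here is untrained, so `probs` carries no meaning beyond showing the output shape: two probabilities that sum to 1.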
7. The method for detecting adversarial samples based on model weight variation and confidence distance as claimed in claim 1, wherein splicing the adversarial sample confidence distance matrix and the benign sample confidence distance matrix in step 5) to form the binary data set comprises:
labeling the benign sample confidence distance matrix with class label 0, labeling the adversarial sample confidence distance matrix with class label 1, and splicing the two matrices to construct the binary data set.
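The labeling and splicing step of claim 7 amounts to stacking the two matrices row-wise and attaching a label per row. A minimal sketch follows, assuming each matrix is stored as a list of rows with one row per sample; the function name and the example values are hypothetical.

```python
def build_binary_dataset(benign_rows, adversarial_rows):
    """Stack both confidence distance matrices and label rows 0 (benign) or 1 (adversarial)."""
    rows = [list(r) for r in benign_rows] + [list(r) for r in adversarial_rows]
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError("all rows must have the same number of columns")
    labels = [0] * len(benign_rows) + [1] * len(adversarial_rows)
    return rows, labels


# Hypothetical confidence distances: 2 benign and 3 adversarial samples, 3 variation models each.
benign = [[0.01, 0.03, 0.02], [0.02, 0.01, 0.04]]
adversarial = [[0.40, 0.35, 0.52], [0.61, 0.48, 0.39], [0.33, 0.57, 0.45]]
features, labels = build_binary_dataset(benign, adversarial)
```

Shuffling the rows and labels together before training is a common follow-up step, although the claim does not require it.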
8. The method for detecting adversarial samples based on model weight variation and confidence distance as claimed in claim 1, wherein optimizing the binary classifier in step 5) comprises:
inputting the confidence distance matrix into the trained binary classifier for testing, and, if the classification accuracy does not reach the preset accuracy, adjusting the number of training epochs or adding fully connected layers to the binary classifier, and retraining the binary classifier;
and inputting the decision scores of the benign samples into the binary classifier, and, if the classification accuracy does not reach the preset accuracy, increasing the number of benign samples in the training set and retraining the binary classifier.
9. A device for detecting adversarial samples based on model weight variation and confidence distance, comprising one or more processors configured to implement the method for detecting adversarial samples based on model weight variation and confidence distance according to any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method for detecting adversarial samples based on model weight variation and confidence distance according to any one of claims 1 to 8.
CN202211565742.XA 2022-12-07 2022-12-07 Countermeasure sample detection method based on model weight variation and confidence degree distance Pending CN115797747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211565742.XA CN115797747A (en) 2022-12-07 2022-12-07 Countermeasure sample detection method based on model weight variation and confidence degree distance

Publications (1)

Publication Number Publication Date
CN115797747A true CN115797747A (en) 2023-03-14

Family

ID=85418921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211565742.XA Pending CN115797747A (en) 2022-12-07 2022-12-07 Countermeasure sample detection method based on model weight variation and confidence degree distance

Country Status (1)

Country Link
CN (1) CN115797747A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117494220A (en) * 2023-12-29 2024-02-02 武汉大学 Deep learning classification model privacy protection method and system based on model orthogonalization

Similar Documents

Publication Publication Date Title
Chakraborty et al. A survey on adversarial attacks and defences
Zhang et al. A survey on neural network interpretability
Yuan et al. Adversarial examples: Attacks and defenses for deep learning
CN106358444B (en) Method and system for face verification
Yeganejou et al. Interpretable deep convolutional fuzzy classifier
Zhu et al. Toward understanding and boosting adversarial transferability from a distribution perspective
CN111046673A (en) Countermeasure generation network for defending text malicious samples and training method thereof
Wu et al. GoDP: Globally Optimized Dual Pathway deep network architecture for facial landmark localization in-the-wild
Kabisha et al. Face and Hand Gesture Recognition Based Person Identification System using Convolutional Neural Network
CN115797747A (en) Countermeasure sample detection method based on model weight variation and confidence degree distance
CN111694954A (en) Image classification method and device and electronic equipment
Lu et al. Dance: Enhancing saliency maps using decoys
Ge et al. Contributions of shape, texture, and color in visual recognition
Yeganejou et al. Improved deep fuzzy clustering for accurate and interpretable classifiers
Sun et al. Open‐set iris recognition based on deep learning
Hui et al. FoolChecker: A platform to evaluate the robustness of images against adversarial attacks
CN108985382A (en) The confrontation sample testing method indicated based on critical data path
Wang et al. Interpret neural networks by extracting critical subnetworks
Lin et al. Towards interpretable ensemble learning for image-based malware detection
Pu et al. Differential residual learning for facial expression recognition
Yu et al. Two strategies to optimize the decisions in signature verification with the presence of spoofing attacks
Ding et al. Interpreting Universal Adversarial Example Attacks on Image Classification Models
Kwon et al. FriendNet backdoor: indentifying backdoor attack that is safe for friendly deep neural network
Liu et al. Adversarial examples generated from sample subspace
Wu et al. Few-shot malicious traffic classification based on Siamese Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination