LU505793B1 - Defensive method against interpretability camouflage samples in deep recognition neural networks - Google Patents


Info

Publication number
LU505793B1
Authority
LU
Luxembourg
Prior art keywords
samples
adversarial
model
interpretability
camouflage
Prior art date
Application number
LU505793A
Other languages
French (fr)
Inventor
Ming Kang
Xiangxing Tao
Original Assignee
Univ Zhejiang Sience & Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Zhejiang Sience & Technology filed Critical Univ Zhejiang Sience & Technology
Priority to LU505793A priority Critical patent/LU505793B1/en
Application granted granted Critical
Publication of LU505793B1 publication Critical patent/LU505793B1/en

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses a defensive method against interpretability camouflage samples in deep recognition neural networks, comprising the following steps: Step 1, Constructing a Model: building a deep neural network model for image classification; Step 2, Detecting Model: detecting adversarial samples within the model and extracting effective adversarial sample models. The invention improves the model's training stability and reduces its parameter count and computation cost. A denoising network for removing adversarial perturbations is proposed, which incorporates batch normalization and makes the model easier to train. The introduction of depthwise separable convolutions and a stepwise reduction in the number of hidden-layer channels significantly reduce the model's parameter count and computation cost. In the adversarial sample detection module based on difference scoring, the distance between the softmax outputs of the original image and the denoised image serves as a difference score measuring the change introduced by denoising. This score, compared against a detection threshold obtained on the training dataset, is used for adversarial detection of the input image.

Description

Defensive Method Against Interpretability Camouflage Samples in
Deep Recognition Neural Networks
Technical Field
The present invention relates to the field of defense technologies, specifically to a defensive method against interpretability camouflage samples in deep recognition neural networks.
Background Technology
Deep learning, a machine learning method, enables computers to learn from experience and data without explicit programming. It extracts useful patterns from raw data. Traditional machine learning algorithms struggle to extract features with good representation due to limitations like the curse of dimensionality, computational bottlenecks, and the need for domain-specific expertise. Deep learning addresses these issues by constructing multiple simple features to represent a complex concept.
For instance, image classification systems based on deep learning represent objects by describing edges, contours, and structures in their hidden layers. With growing training data and hardware acceleration, deep learning models have solved many complex problems. Neural network layers consist of sets of perceptrons, or artificial neurons, each using an activation function to map a set of inputs to an output value. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are among the most widely used neural network architectures.
CNNs deploy convolution operations in hidden layers to share weights and reduce the number of parameters. They extract local information from grid-like input data and have achieved significant success in computer vision tasks such as image classification, object detection, and semantic segmentation. RNNs, designed to process variable-length sequential input data, generate outputs at each time step, with hidden neurons computed from the current input and the previous time step's hidden neurons. Long Short-Term Memory and Gated Recurrent Units use controllable gates to avoid vanishing or exploding gradients in RNNs' long-term dependencies. Deep neural network architectures such as LeNet, VGG, AlexNet, GoogLeNet (Inception V1-V4), and ResNet have been used extensively in computer vision tasks. In the ImageNet 2012 challenge, AlexNet was the first to demonstrate that deep learning models could significantly outperform traditional machine learning algorithms, spurring further research in deep learning. These architectures have achieved great breakthroughs in the ImageNet challenge and are milestones in image classification, and attackers often generate adversarial samples against these benchmark architectures. MNIST, CIFAR-10, and ImageNet are three standard datasets widely used in computer vision tasks: MNIST for handwritten digit recognition, and CIFAR-10 and ImageNet for image recognition. CIFAR-10 consists of 60,000 tiny color images of 32 x 32 pixels, divided into ten categories. The ImageNet dataset contains 14,197,122 images in 1,000 categories. Owing to the vast number of images in ImageNet, most adversarial methods are evaluated on only a portion of the dataset. The Street View House Numbers dataset, similar to MNIST, consists of digits extracted from real-world house numbers in Google Street View images. The YouTube dataset, sourced from YouTube, contains about ten million images.
In various domains of machine learning, deep learning has made significant progress, such as in image classification, object recognition, target detection, speech recognition, language translation, and speech synthesis. Driven by big data and hardware acceleration, deep learning requires fewer artificial features and expertise.
Complex data can be represented at higher and more abstract levels, extracted from raw input features. An increasing number of applications and systems are supported by deep learning. Companies from the IT to automotive industries, like Google, Tesla,
Mercedes, and Uber, are testing self-driving cars, requiring extensive deep learning technologies such as object recognition, reinforcement learning, and multimodal learning. Facial recognition systems, a biometric authentication method, have been deployed in ATMs. Apple offers facial authentication to unlock phones. Malware detection and anomaly detection solutions based on behavior are also built on deep learning, discovering semantic features. As deep learning progresses rapidly and achieves significant success in various applications, it is increasingly implemented in many security-critical environments. However, deep neural networks are susceptible to carefully designed input samples known as adversarial samples. These are imperceptible to humans but can easily deceive deep neural networks during testing and deployment phases, making adversarial samples one of the primary risks in applying deep neural networks in security-critical environments. Hence, attacks and defenses against adversarial samples have garnered great attention.
However, traditional defense methods have the following drawbacks:
Defenses against adversarial attacks depend on the target model's parameters. White-box defense strategies change the gradient transmission process of the target model, whereas black-box attacks use substitute models to construct adversarial samples. The inherent transferability of these samples allows them to generalize well in black-box attacks, rendering the white-box defense strategies adopted by models ineffective.
Contents of the Invention
The purpose of the present invention is to provide a defensive method against interpretability camouflage samples in deep recognition neural networks, addressing the issue raised in the background technology that defenses against adversarial attacks depend on the target model's parameters: a white-box defense strategy changes the target model's gradient transmission process, while black-box attacks use substitute models to construct adversarial samples, and the inherent transferability of these samples allows them to generalize well in black-box attacks, making the white-box defense strategies used by models ineffective.
To achieve the above purpose, the present invention provides the following technical solution: A defensive method against interpretability camouflage samples in deep recognition neural networks, including the following steps:
Step 1, Constructing a Model: Building a deep neural network model for image classification.
Step 2, Detecting Model: Detecting adversarial samples in the model and extracting effective adversarial sample models.
Step 3, Sample Preprocessing: Preprocessing adversarial sample images in the model.
Step 4, Sample Detection: Comparing and detecting preprocessed adversarial samples with original samples.
Step 5, First Defense Simulation: Conducting the first defense simulation using attack experience.
Step 6, Second Defense Simulation: Abstracting the attack. In practice, the attack method is considered an abstract operation with range constraints. Defenders only need to ensure the model remains correct within the range of the abstract operation, thereby completing the second defense simulation.
Step 7, Effect Verification: Deploying the defense model in the deep recognition neural network for simulated attacks and validation.
As a preferred embodiment of the present invention, the preprocessing in Step 3 specifically involves compressing features of input samples to mitigate disturbances, predicting the model's output for samples before and after compression, and identifying adversarial samples based on the differences in predictions before and after compression.
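A minimal sketch of this compression-based check, assuming a PyTorch image classifier `model` with inputs scaled to [0, 1]; bit-depth reduction is used here as one possible feature-compression operation, and the function names and detection threshold are illustrative rather than prescribed by the method:

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Compress input features by quantizing pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def is_adversarial_by_squeezing(model, x: torch.Tensor, threshold: float = 0.5) -> bool:
    """Flag a single-image batch whose predictions before and after compression diverge too much."""
    model.eval()
    with torch.no_grad():
        p_orig = F.softmax(model(x), dim=1)
        p_squeezed = F.softmax(model(reduce_bit_depth(x)), dim=1)
    # L1 distance between the two softmax vectors measures the prediction shift.
    score = (p_orig - p_squeezed).abs().sum(dim=1).item()
    return score > threshold
```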
As a preferred embodiment of the present invention, the preprocessing in Step 3 specifically involves selecting sample classification labels that need protection, training trapdoor embeddings corresponding to these protected labels into the model, and identifying adversarial samples based on the activation state of neurons from the input samples.
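One possible realization of this trapdoor-style check is sketched below, assuming the trapdoor has already been trained into the model for a protected label and that its mean penultimate-layer activation (`trapdoor_signature`) has been recorded; the names, the cosine-similarity measure, and the threshold are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def matches_trapdoor(feature_extractor, x, trapdoor_signature, threshold=0.8):
    """Compare the neuron activations of an input with the stored trapdoor signature.

    feature_extractor: module returning penultimate-layer activations
    trapdoor_signature: mean activation vector recorded for trapdoored inputs
    Returns True for samples whose activations closely track the trapdoor,
    which suggests an adversarial input steering the network toward it.
    """
    with torch.no_grad():
        act = feature_extractor(x).flatten(1)            # (batch, features)
    sig = trapdoor_signature.flatten().unsqueeze(0)      # (1, features)
    cos = F.cosine_similarity(act, sig, dim=1)           # high => likely adversarial
    return cos > threshold
```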
As a preferred embodiment of the present invention, the preprocessing in Step 3 specifically involves input denoising and feature denoising. Input denoising, during the testing phase of the model, entails processing the input data to eliminate part or all of the adversarial perturbations. Feature denoising aims to mitigate the impact of adversarial interference on the high-level features learned by the Deep Neural Network (DNN).
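An illustrative input-denoising network in the spirit of the description and the beneficial effects below: batch normalization for training stability, depthwise separable convolutions, and a stepwise reduction of hidden-layer channels. The layer counts and channel widths are assumptions, not values prescribed by the invention.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)   # batch normalization eases training
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class Denoiser(nn.Module):
    """Input-denoising network with stepwise channel reduction in the hidden layers."""
    def __init__(self, channels=(3, 64, 32, 16)):
        super().__init__()
        layers = [DepthwiseSeparableConv(c_in, c_out)
                  for c_in, c_out in zip(channels[:-1], channels[1:])]
        layers.append(nn.Conv2d(channels[-1], channels[0], kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Predict the noise residual and subtract it from the input image.
        return x - self.net(x)
```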
As a preferred embodiment of the present invention, the comparative detection in Step 4 specifically exploits the different numerical features of adversarial samples and original samples, such as the shape of the probability distribution obtained after the samples pass through the network. By checking whether the input conforms to the distribution of normal samples, it is determined whether the input has adversarial properties. For example, the softmax output vector of a classification network reflects the numerical feature distribution of a sample well, and in many cases there is a significant difference between the softmax output vectors of adversarial and normal samples. The softmax output vector of a normal sample tends to be more concentrated, i.e., farther from a uniform distribution, with the highest probability value typically much larger than the others and standing out in a specific category. This is because adversarial samples are crafted only to change which class receives the highest probability, without regard to the output probabilities of the other classes. Therefore, by measuring how concentrated a sample's softmax output vector is, it can be determined whether the sample has adversarial properties. For adversarial attacks that stop perturbing as soon as the decision boundary is crossed, i.e., that use the smallest possible perturbation, distinguishing adversarial samples through this difference in softmax distribution is very effective.
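A minimal sketch of this dispersion-based check, assuming a PyTorch classifier; the L1 distance from a uniform distribution is used here as one possible concentration measure, and the threshold would in practice be calibrated on normal training samples:

```python
import torch
import torch.nn.functional as F

def softmax_dispersion_score(logits: torch.Tensor) -> torch.Tensor:
    """Distance of the softmax output from a uniform distribution.

    Normal samples tend to produce a peaked softmax (large distance from uniform),
    while minimally perturbed adversarial samples sit near the decision boundary
    and produce a flatter softmax (small distance).
    """
    p = F.softmax(logits, dim=1)
    uniform = torch.full_like(p, 1.0 / p.shape[1])
    return (p - uniform).abs().sum(dim=1)      # L1 distance per sample

def flag_adversarial(model, x: torch.Tensor, threshold: float) -> torch.Tensor:
    """Flag samples whose softmax output is too close to uniform."""
    model.eval()
    with torch.no_grad():
        score = softmax_dispersion_score(model(x))
    return score < threshold                    # True => suspected adversarial
```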
As a preferred embodiment of the present invention, the comparative detection in Step 4 specifically involves using the output of the middle part of the Deep Neural Network as the input to a detector for detecting adversarial samples. For example, the original input produces one output after passing through a convolutional layer and another output after entering a ResNet architecture; all of these outputs are utilized, meaning a detector is trained for each output. Each detector is a binary network that outputs the probability that the sample is adversarial. Specifically, normal samples and adversarial samples are input into the original classification network, and the mid-output of the original classification network at the interface with the adversarial detection network is obtained: normal sample data (x1, 0) and adversarial sample data (x2, 1). These two types of data with different labels are collected as the training dataset for the adversarial detection network at that interface.
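A sketch of one such per-interface detector, assuming the intermediate activations have already been extracted from the classification network (e.g., via forward hooks); the detector architecture, optimizer, and training loop below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MidLayerDetector(nn.Module):
    """Binary classifier attached to one intermediate output of the main network."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, 1),        # single logit: probability of being adversarial
        )

    def forward(self, feat):
        return self.net(feat)

def train_detector(detector, feats, labels, epochs=10, lr=1e-3):
    """Train on (intermediate activation, label) pairs: 0 = normal, 1 = adversarial."""
    opt = torch.optim.Adam(detector.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(detector(feats).squeeze(1), labels.float())
        loss.backward()
        opt.step()
    return detector
```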
As a preferred embodiment of the present invention, the first defense simulation in Step 5 specifically involves disrupting the preconditions on which existing attack methods rely; faced with a myriad of new attacks, such disruptions are easily breached. For example, if the attack method requires the model's complete output, the output is truncated; if the attack method needs to find the minimum of the gradient, traps that appear to be minimum values are set on the gradient.
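As an illustration of the output-truncation idea, a minimal sketch assuming a PyTorch classifier `model`: exposing only the predicted label deprives score-based attacks of the confidence vector they rely on. The helper name is hypothetical.

```python
import torch

def truncated_output(model, x: torch.Tensor) -> torch.Tensor:
    """Expose only the predicted label, withholding the full softmax vector
    that score-based attack methods require."""
    model.eval()
    with torch.no_grad():
        logits = model(x)
    return logits.argmax(dim=1)       # class index only, no confidence scores
```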
As a preferred embodiment of the present invention, the attack experience in Step 5 builds on the fact that a model is typically trained only with normal samples. Therefore, to make the model more robust, adversarial samples are generated mainly during the model's training phase and included in training the neural network, achieving the purpose of defending against adversarial samples. The generated adversarial samples are added to the training set as data augmentation, allowing the model to learn about adversarial samples during training.
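A minimal sketch of this adversarial-training augmentation, assuming a PyTorch classifier and using the fast gradient sign method (FGSM) as one possible generator of adversarial samples; the generator, the perturbation budget `eps`, and the function names are illustrative rather than prescribed by the method:

```python
import torch
import torch.nn.functional as F

def fgsm_samples(model, x, y, eps=0.03):
    """Generate adversarial samples from a clean batch with one gradient-sign step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step on a batch augmented with its adversarial counterparts."""
    model.train()
    x_aug = torch.cat([x, fgsm_samples(model, x, y)], dim=0)
    y_aug = torch.cat([y, y], dim=0)
    optimizer.zero_grad()                      # clear gradients left by sample generation
    loss = F.cross_entropy(model(x_aug), y_aug)
    loss.backward()
    optimizer.step()
    return loss.item()
```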
Compared to existing technologies, the beneficial effects of the present invention are:
1. The model is improved in terms of training stability, parameter count, and computation cost. A denoising network for denoising adversarial samples is proposed, which includes batch normalization and makes the model easier to train.
2. The introduction of depthwise separable convolutions and the stepwise reduction of the number of hidden-layer channels significantly reduce the model's parameter count and computation cost. In the adversarial sample detection module based on difference scoring, the distance between the softmax outputs of the original image and the denoised image is used as a difference score measuring the change introduced by denoising the input image. This score, combined with the detection threshold obtained on the training dataset, is used for adversarial detection of the input image.
3. The adversarial sample detection scheme has been implemented and experimentally validated: the image denoising module was tested first, followed by adversarial sample detection experiments. The detection scheme effectively distinguishes regular samples from adversarial samples, showing good performance in detection accuracy and F1 score.
Description of the Drawing
FIG.1: flowchart of the present invention.
Specific Embodiments
The following detailed and complete description of the technical solutions in the embodiments of the present invention is provided in conjunction with the embodiments of the invention. It is clear that the described embodiments are only part of the embodiments of the present invention and not all embodiments. Based on the embodiments in the present invention, all other embodiments obtained by those skilled in the art without creative efforts fall within the protection scope of the present invention.
Please refer to FIG.1. The present invention provides a defensive method against interpretability camouflage samples in deep recognition neural networks, including the following steps:
Step 1, Constructing a Model: Building a deep neural network model for image classification.
Step 2, Detecting Model: Detecting adversarial samples within the model and extracting effective adversarial sample models.
Step 3, Sample Preprocessing: Preprocessing adversarial sample images within the model.
Step 4, Sample Detection: Comparing and detecting preprocessed adversarial samples with original samples.
Step 5, First Defense Simulation: Conducting the first defense simulation using attack experience.
Step 6, Second Defense Simulation: Abstracting the attack as a range-constrained abstract operation in practical operations. Defenders only need to ensure the model remains correct within the range of the abstract operation, thereby completing the second defense simulation.
Step 7, Effect Verification: Deploying the defense model within the deep recognition neural network for simulated attacks and validation.
The preprocessing in Step 3 specifically involves feature compression to alleviate disturbances in input samples, obtaining the model's predictions for samples before and after compression, and identifying adversarial samples based on the differences in the prediction results.
In another aspect of Step 3, preprocessing involves selecting sample classification labels that need protection, training trapdoor embeddings corresponding to these protected labels into the model, and identifying adversarial samples from input samples based on the activation state of neurons.
Preprocessing in Step 3 also specifically includes input denoising and feature denoising. Input denoising, during the testing phase of the model, involves processing the input data to eliminate part or all of the adversarial perturbations. Feature denoising attempts to reduce the impact of adversarial interference on the high-level features learned by the DNN.
In Step 4, the comparative detection specifically exploits the different numerical features of adversarial samples and original samples, such as the shape of the probability distribution obtained after the samples pass through the network. By checking whether the input conforms to the distribution of normal samples, it is determined whether the input has adversarial properties. For example, the softmax output vector of a classification network reflects the numerical feature distribution of a sample well, and in many cases there is a significant difference between the softmax output vectors of adversarial and normal samples. The softmax output vector of a normal sample tends to be more concentrated, i.e., farther from a uniform distribution, with the highest probability value typically much larger than the others and standing out in a specific category. This is because adversarial samples are crafted only to change which class receives the highest probability, without regard to the output probabilities of the other classes. Therefore, by measuring how concentrated a sample's softmax output vector is, it can be determined whether the sample has adversarial properties. For adversarial attacks that stop perturbing as soon as the decision boundary is crossed, i.e., that use the smallest possible perturbation, distinguishing adversarial samples through this difference in softmax distribution is very effective.
Comparative detection in Step 4 also involves using the output of the middle part of the Deep Neural Network as the input for the detector to detect adversarial samples.
For example, the original input will produce an output after passing through a convolutional layer, and then another output after entering a ResNet architecture. All these outputs are utilized, meaning a detector is trained for each output. Each detector is a binary network that outputs the probability that the sample is an adversarial sample. Specifically, normal samples and adversarial samples are input into the original classification network, and the mid-output of the original classification network at the interface with the adversarial detection network is obtained: normal sample data (x1, 0) and adversarial sample data (x2, 1). These two types of data with different labels are collected as the training dataset for the adversarial detection network at this interface.
The first defense simulation in Step 5 specifically involves disrupting the preconditions of existing attack methods, for example by truncating the model's output if the attack method requires the complete output, or by setting traps on the gradient that appear to be minimum values.
The attack experience in Step 5 builds on the fact that a model is typically trained only with normal samples. Therefore, to make the model more robust, adversarial samples are generated mainly during the model's training phase and included in training the neural network, achieving the purpose of defending against adversarial samples. The generated adversarial samples are added to the training set as data augmentation, allowing the model to learn about adversarial samples during training.
In the present invention, a deep neural network model for image classification is constructed; adversarial samples within the model are detected and effective adversarial sample models are extracted; adversarial sample images within the model are preprocessed; the preprocessed adversarial samples are compared against original samples for detection; the first defense simulation is conducted using attack experience; the attack is then abstracted: in practical operations the attack method is treated as a range-constrained abstract operation, and defenders only need to ensure that the model remains correct within the range of that abstract operation, completing the second defense simulation; finally, the defense model is deployed in the deep recognition neural network for simulated attacks and validation.
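Tying Steps 3 and 4 together, the following sketch computes the difference score between the softmax outputs of the original image and the denoised image and compares it with a detection threshold calibrated on the clean training set; the denoiser may be any module such as the one sketched earlier, and the percentile used for calibration is an assumption for illustration:

```python
import torch
import torch.nn.functional as F

def difference_score(model, denoiser, x: torch.Tensor) -> torch.Tensor:
    """L1 distance between softmax outputs of the original and the denoised image."""
    model.eval()
    denoiser.eval()
    with torch.no_grad():
        p_orig = F.softmax(model(x), dim=1)
        p_den = F.softmax(model(denoiser(x)), dim=1)
    return (p_orig - p_den).abs().sum(dim=1)

def calibrate_threshold(model, denoiser, clean_loader, percentile=0.95) -> float:
    """Choose the detection threshold from difference scores of clean training images."""
    scores = torch.cat([difference_score(model, denoiser, x) for x, _ in clean_loader])
    return torch.quantile(scores, percentile).item()

def detect(model, denoiser, x, threshold) -> torch.Tensor:
    """Inputs whose score exceeds the calibrated threshold are treated as adversarial."""
    return difference_score(model, denoiser, x) > threshold
```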
Despite the detailed description of the present invention with reference to the aforementioned embodiments, those skilled in the art may still modify the technical solutions recorded in the aforementioned embodiments or equivalently replace some technical features. Any modification, equivalent replacement, improvement, etc., made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (8)

Claims
1. A defensive method against interpretability camouflage samples in deep recognition neural networks, characterized by including the following steps: Step 1, Constructing a Model: Building a deep neural network model for image classification. Step 2, Detecting Model: Detecting adversarial samples within the model and extracting effective adversarial sample models. Step 3, Sample Preprocessing: Preprocessing adversarial sample images within the model. Step 4, Sample Detection: Comparing and detecting preprocessed adversarial samples with original samples. Step 5, First Defense Simulation: Conducting the first defense simulation using attack experience. Step 6, Second Defense Simulation: Abstracting the attack as a range-constrained abstract operation in practical operations. Defenders only need to ensure the model remains correct within the range of the abstract operation, thereby completing the second defense simulation. Step 7, Effect Verification: Deploying the defense model within the deep recognition neural network for simulated attacks and validation.
2. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the preprocessing in step 3 specifically involves feature compression to alleviate disturbances in input samples, obtaining the model's predictions for samples before and after compression, and identifying adversarial samples based on the differences in the prediction results.
3. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the preprocessing in step 3 specifically involves selecting sample classification labels that need protection, training trapdoor embeddings corresponding to these protected labels into the model, and identifying adversarial samples from input samples based on the activation state of neurons.
4. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the preprocessing in step 3 specifically includes input denoising and feature denoising. Input denoising, during the testing phase of the model, involves processing the input data to eliminate part or all of the adversarial perturbations. Feature denoising attempts to reduce the impact of adversarial interference on the high-level features learned by the DNN.
5. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the comparative detection in step 4 specifically utilizes the different numerical features of adversarial samples and original samples, such as the shape of the probability distribution obtained after the samples pass through the network, to determine the adversarial nature of the input.
6. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the comparative detection in step 4 specifically involves using the output of the middle part of the Deep Neural Network as the input for the detector to detect adversarial samples.
7. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the first defense simulation in step 5 specifically involves disrupting existing attack methods as a precondition, to address the vulnerability to emerging new-style attacks.
8. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the attack experience in step 5 builds on the fact that a model is typically trained only with normal samples; therefore, to make the model more robust, adversarial samples are generated mainly during the model's training phase and included in training the neural network, achieving the purpose of defending against adversarial samples; the generated adversarial samples are added to the training set as data augmentation, allowing the model to learn about adversarial samples during training.
LU505793A 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks LU505793B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
LU505793A LU505793B1 (en) 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
LU505793A LU505793B1 (en) 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks

Publications (1)

Publication Number Publication Date
LU505793B1 true LU505793B1 (en) 2024-06-14

Family

ID=91539649

Family Applications (1)

Application Number Title Priority Date Filing Date
LU505793A LU505793B1 (en) 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks

Country Status (1)

Country Link
LU (1) LU505793B1 (en)
