LU505793B1 - Defensive method against interpretability camouflage samples in deep recognition neural networks - Google Patents


Info

Publication number
LU505793B1
Authority
LU
Luxembourg
Prior art keywords
samples
adversarial
model
interpretability
camouflage
Prior art date
Application number
LU505793A
Other languages
French (fr)
Inventor
Ming Kang
Xiangxing Tao
Original Assignee
Univ Zhejiang Sience & Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Zhejiang Sience & Technology filed Critical Univ Zhejiang Sience & Technology
Priority to LU505793A priority Critical patent/LU505793B1/en
Application granted granted Critical
Publication of LU505793B1 publication Critical patent/LU505793B1/en

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses a defensive method against interpretability camouflage samples in deep recognition neural networks, comprising the following steps: Step 1, Constructing a Model: building a deep neural network model for image classification; Step 2, Detecting Model: detecting adversarial samples within the model and extracting effective adversarial sample models. The invention improves the model's training stability and reduces its parameter count and computation cost. A denoising network for removing adversarial perturbations is proposed, which incorporates batch normalization and makes the model easier to train. The introduction of depthwise separable convolutions and a stepwise reduction in the number of hidden-layer channels significantly reduce the model's parameter count and computation cost. In the adversarial sample detection module based on difference scoring, the distance between the softmax outputs of the original image and the denoised image serves as a difference score measuring the change introduced by denoising. This score, compared against a detection threshold obtained on the training dataset, is used for adversarial detection of the input image.

Description

Defensive Method Against Interpretability Camouflage Samples in
Deep Recognition Neural Networks
Technical Field
The present invention relates to the field of defense technologies, specifically to a defensive method against interpretability camouflage samples in deep recognition neural networks.
Background Technology
Deep learning, a machine learning method, enables computers to learn from experience and data without explicit programming. It extracts useful patterns from raw data. Traditional machine learning algorithms struggle to extract features with good representation due to limitations like the curse of dimensionality, computational bottlenecks, and the need for domain-specific expertise. Deep learning addresses these issues by constructing multiple simple features to represent a complex concept.
For instance, image classification systems based on deep learning represent objects by describing edges, contours, and structures in their hidden layers. With growing training data and hardware acceleration, deep learning models have solved many complex problems. Neural network layers consist of sets of perceptrons, or artificial neurons, each using an activation function to map a set of inputs to an output value. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are among the most widely used neural network architectures.
CNNs deploy convolution operations in hidden layers to share weights and reduce the number of parameters. They extract local information from grid-like input data and have achieved significant success in computer vision tasks such as image classification, object detection, and semantic segmentation. RNNs, designed to process variable-length sequential input data, generate outputs at each time step, with hidden neurons computed from the current input and the previous time step's hidden neurons. Long Short-Term Memory and Gated Recurrent Units use controllable gates to avoid vanishing or exploding gradients in RNNs' long-term dependencies. Deep neural network architectures such as LeNet, VGG, AlexNet, GoogLeNet (Inception V1-V4), and ResNet have been used extensively in computer vision tasks. In the ImageNet 2012 challenge, AlexNet was the first to demonstrate that deep learning models could significantly outperform traditional machine learning algorithms, spurring further research in deep learning. These architectures have achieved great breakthroughs in the ImageNet challenge and are milestones in image classification, and attackers often generate adversarial samples against these benchmark architectures. MNIST, CIFAR-10, and ImageNet are three standard datasets widely used in computer vision tasks: MNIST for handwritten digit recognition, and CIFAR-10 and ImageNet for image recognition. CIFAR-10 consists of 60,000 tiny color images of 32 x 32 pixels, divided into ten categories. The ImageNet dataset contains 14,197,122 images in 1,000 categories. Owing to the vast number of images in ImageNet, most adversarial methods are evaluated on only a portion of the dataset. The Street View House Numbers dataset, similar to MNIST, consists of digits extracted from real-world house numbers in Google Street View images. The YouTube dataset, sourced from YouTube, contains about ten million images.
In various domains of machine learning, deep learning has made significant progress, such as in image classification, object recognition, target detection, speech recognition, language translation, and speech synthesis. Driven by big data and hardware acceleration, deep learning requires fewer artificial features and expertise.
Complex data can be represented at higher and more abstract levels, extracted from raw input features. An increasing number of applications and systems are supported by deep learning. Companies from the IT to automotive industries, like Google, Tesla,
Mercedes, and Uber, are testing self-driving cars, requiring extensive deep learning technologies such as object recognition, reinforcement learning, and multimodal learning. Facial recognition systems, a biometric authentication method, have been deployed in ATMs. Apple offers facial authentication to unlock phones. Malware detection and anomaly detection solutions based on behavior are also built on deep learning, discovering semantic features. As deep learning progresses rapidly and achieves significant success in various applications, it is increasingly implemented in many security-critical environments. However, deep neural networks are susceptible to carefully designed input samples known as adversarial samples. These are imperceptible to humans but can easily deceive deep neural networks during testing and deployment phases, making adversarial samples one of the primary risks in applying deep neural networks in security-critical environments. Hence, attacks and defenses against adversarial samples have garnered great attention.
However, traditional defense methods have the following drawbacks:
Defenses against adversarial attacks depend on the target model's parameters. White-box defense strategies change the gradient transmission process of the target model, whereas black-box attacks use substitute models to construct adversarial samples. The inherent transferability of these samples allows them to generalize well in black-box attacks, rendering the white-box defense strategies adopted by models ineffective.
Contents of the Invention
The purpose of the present invention is to provide a defensive method against interpretability camouflage samples in deep recognition neural networks, addressing the issue raised in the background technology that defenses against adversarial attacks depend on the target model's parameters: a white-box defense strategy changes the target model's gradient transmission process, while black-box attacks use substitute models to construct adversarial samples, and the inherent transferability of these samples allows them to generalize well in black-box attacks, making the white-box defense strategies used by models ineffective.
To achieve the above purpose, the present invention provides the following technical solution: A defensive method against interpretability camouflage samples in deep recognition neural networks, including the following steps:
Step 1, Constructing a Model: Building a deep neural network model for image classification.
Step 2, Detecting Model: Detecting adversarial samples in the model and extracting effective adversarial sample models.
Step 3, Sample Preprocessing: Preprocessing adversarial sample images in the model.
Step 4, Sample Detection: Comparing and detecting preprocessed adversarial samples with original samples.
Step 5, First Defense Simulation: Conducting the first defense simulation using attack experience.
Step 6, Second Defense Simulation: Abstracting the attack. In practice, the attack method is considered an abstract operation with range constraints. Defenders only need to ensure the model remains correct within the range of the abstract operation, thereby completing the second defense simulation.
Step 7, Effect Verification: Deploying the defense model in the deep recognition neural network for simulated attacks and validation.
As a preferred embodiment of the present invention, the preprocessing in Step 3 specifically involves compressing features of input samples to mitigate disturbances, predicting the model's output for samples before and after compression, and identifying adversarial samples based on the differences in predictions before and after compression.
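A minimal sketch of this compression-based check, assuming a PyTorch image classifier `model` with inputs scaled to [0, 1]; bit-depth reduction is used here as one possible feature-compression operation, and the function names and detection threshold are illustrative rather than prescribed by the method:

```python
import torch
import torch.nn.functional as F

def reduce_bit_depth(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Compress input features by quantizing pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def is_adversarial_by_squeezing(model, x: torch.Tensor, threshold: float = 0.5) -> bool:
    """Flag a single-image batch whose predictions before and after compression diverge too much."""
    model.eval()
    with torch.no_grad():
        p_orig = F.softmax(model(x), dim=1)
        p_squeezed = F.softmax(model(reduce_bit_depth(x)), dim=1)
    # L1 distance between the two softmax vectors measures the prediction shift.
    score = (p_orig - p_squeezed).abs().sum(dim=1).item()
    return score > threshold
```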
As a preferred embodiment of the present invention, the preprocessing in Step 3 specifically involves selecting sample classification labels that need protection, training trapdoor embeddings corresponding to these protected labels into the model, and identifying adversarial samples based on the activation state of neurons from the input samples.
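One possible realization of this trapdoor-style check is sketched below, assuming the trapdoor has already been trained into the model for a protected label and that its mean penultimate-layer activation (`trapdoor_signature`) has been recorded; the names, the cosine-similarity measure, and the threshold are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def matches_trapdoor(feature_extractor, x, trapdoor_signature, threshold=0.8):
    """Compare the neuron activations of an input with the stored trapdoor signature.

    feature_extractor: module returning penultimate-layer activations
    trapdoor_signature: mean activation vector recorded for trapdoored inputs
    Returns True for samples whose activations closely track the trapdoor,
    which suggests an adversarial input steering the network toward it.
    """
    with torch.no_grad():
        act = feature_extractor(x).flatten(1)            # (batch, features)
    sig = trapdoor_signature.flatten().unsqueeze(0)      # (1, features)
    cos = F.cosine_similarity(act, sig, dim=1)           # high => likely adversarial
    return cos > threshold
```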
As a preferred embodiment of the present invention, the preprocessing in Step 3 specifically involves input denoising and feature denoising. Input denoising, during the testing phase of the model, entails processing the input data to eliminate part or all of the adversarial perturbations. Feature denoising aims to mitigate the impact of adversarial interference on the high-level features learned by the Deep Neural Network (DNN).
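An illustrative input-denoising network in the spirit of the description and the beneficial effects below: batch normalization for training stability, depthwise separable convolutions, and a stepwise reduction of hidden-layer channels. The layer counts and channel widths are assumptions, not values prescribed by the invention.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)   # batch normalization eases training
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class Denoiser(nn.Module):
    """Input-denoising network with stepwise channel reduction in the hidden layers."""
    def __init__(self, channels=(3, 64, 32, 16)):
        super().__init__()
        layers = [DepthwiseSeparableConv(c_in, c_out)
                  for c_in, c_out in zip(channels[:-1], channels[1:])]
        layers.append(nn.Conv2d(channels[-1], channels[0], kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Predict the noise residual and subtract it from the input image.
        return x - self.net(x)
```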
As a preferred embodiment of the present invention, the comparative detection in Step 4 specifically exploits the different numerical features of adversarial samples and original samples, such as the shape of the probability distribution obtained after the samples pass through the network. By checking whether the input conforms to the distribution of normal samples, it is determined whether the input has adversarial properties. For example, the softmax output vector of a classification network reflects the numerical feature distribution of a sample well, and in many cases there is a significant difference between the softmax output vectors of adversarial and normal samples. The softmax output vector of a normal sample tends to be more concentrated, i.e., farther from a uniform distribution, with the highest probability value typically much larger than the others and standing out in a specific category. This is because adversarial samples are crafted only to change which class receives the highest probability, without regard to the output probabilities of the other classes. Therefore, by measuring how concentrated a sample's softmax output vector is, it can be determined whether the sample has adversarial properties. For adversarial attacks that stop perturbing as soon as the decision boundary is crossed, i.e., that use the smallest possible perturbation, distinguishing adversarial samples through this difference in softmax distribution is very effective.
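A minimal sketch of this dispersion-based check, assuming a PyTorch classifier; the L1 distance from a uniform distribution is used here as one possible concentration measure, and the threshold would in practice be calibrated on normal training samples:

```python
import torch
import torch.nn.functional as F

def softmax_dispersion_score(logits: torch.Tensor) -> torch.Tensor:
    """Distance of the softmax output from a uniform distribution.

    Normal samples tend to produce a peaked softmax (large distance from uniform),
    while minimally perturbed adversarial samples sit near the decision boundary
    and produce a flatter softmax (small distance).
    """
    p = F.softmax(logits, dim=1)
    uniform = torch.full_like(p, 1.0 / p.shape[1])
    return (p - uniform).abs().sum(dim=1)      # L1 distance per sample

def flag_adversarial(model, x: torch.Tensor, threshold: float) -> torch.Tensor:
    """Flag samples whose softmax output is too close to uniform."""
    model.eval()
    with torch.no_grad():
        score = softmax_dispersion_score(model(x))
    return score < threshold                    # True => suspected adversarial
```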
As a preferred embodiment of the present invention, the comparative detection in Step 4 specifically involves using the output of the middle part of the Deep Neural Network as the input to a detector for detecting adversarial samples. For example, the original input produces one output after passing through a convolutional layer and another output after entering a ResNet architecture; all of these outputs are utilized, meaning a detector is trained for each output. Each detector is a binary network that outputs the probability that the sample is adversarial. Specifically, normal samples and adversarial samples are input into the original classification network, and the mid-output of the original classification network at the interface with the adversarial detection network is obtained: normal sample data (x1, 0) and adversarial sample data (x2, 1). These two types of data with different labels are collected as the training dataset for the adversarial detection network at that interface.
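A sketch of one such per-interface detector, assuming the intermediate activations have already been extracted from the classification network (e.g., via forward hooks); the detector architecture, optimizer, and training loop below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MidLayerDetector(nn.Module):
    """Binary classifier attached to one intermediate output of the main network."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, 1),        # single logit: probability of being adversarial
        )

    def forward(self, feat):
        return self.net(feat)

def train_detector(detector, feats, labels, epochs=10, lr=1e-3):
    """Train on (intermediate activation, label) pairs: 0 = normal, 1 = adversarial."""
    opt = torch.optim.Adam(detector.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(detector(feats).squeeze(1), labels.float())
        loss.backward()
        opt.step()
    return detector
```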
As a preferred embodiment of the present invention, the first defense simulation in Step 5 specifically involves disrupting the preconditions on which existing attack methods rely; faced with a myriad of new attacks, such disruptions are easily breached. For example, if the attack method requires the model's complete output, the output is truncated; if the attack method needs to find the minimum of the gradient, traps that appear to be minimum values are set on the gradient.
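As an illustration of the output-truncation idea, a minimal sketch assuming a PyTorch classifier `model`: exposing only the predicted label deprives score-based attacks of the confidence vector they rely on. The helper name is hypothetical.

```python
import torch

def truncated_output(model, x: torch.Tensor) -> torch.Tensor:
    """Expose only the predicted label, withholding the full softmax vector
    that score-based attack methods require."""
    model.eval()
    with torch.no_grad():
        logits = model(x)
    return logits.argmax(dim=1)       # class index only, no confidence scores
```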
As a preferred embodiment of the present invention, the attack experience in Step 5 builds on the fact that a model is typically trained only with normal samples. Therefore, to make the model more robust, adversarial samples are generated mainly during the model's training phase and included in training the neural network, achieving the purpose of defending against adversarial samples. The generated adversarial samples are added to the training set as data augmentation, allowing the model to learn about adversarial samples during training.
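A minimal sketch of this adversarial-training augmentation, assuming a PyTorch classifier and using the fast gradient sign method (FGSM) as one possible generator of adversarial samples; the generator, the perturbation budget `eps`, and the function names are illustrative rather than prescribed by the method:

```python
import torch
import torch.nn.functional as F

def fgsm_samples(model, x, y, eps=0.03):
    """Generate adversarial samples from a clean batch with one gradient-sign step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step on a batch augmented with its adversarial counterparts."""
    model.train()
    x_aug = torch.cat([x, fgsm_samples(model, x, y)], dim=0)
    y_aug = torch.cat([y, y], dim=0)
    optimizer.zero_grad()                      # clear gradients left by sample generation
    loss = F.cross_entropy(model(x_aug), y_aug)
    loss.backward()
    optimizer.step()
    return loss.item()
```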
Compared to existing technologies, the beneficial effects of the present invention are:
1. The model is improved in terms of training stability, parameter count, and computation cost. A denoising network for denoising adversarial samples is proposed, which includes batch normalization and makes the model easier to train.
2. The introduction of depthwise separable convolutions and the stepwise reduction of the number of hidden-layer channels significantly reduce the model's parameter count and computation cost. In the adversarial sample detection module based on difference scoring, the distance between the softmax outputs of the original image and the denoised image is used as a difference score measuring the change introduced by denoising the input image. This score, combined with the detection threshold obtained on the training dataset, is used for adversarial detection of the input image.
3. The adversarial sample detection scheme has been implemented and experimentally validated: the image denoising module was tested first, followed by adversarial sample detection experiments. The detection scheme effectively distinguishes regular samples from adversarial samples, showing good performance in detection accuracy and F1 score.
Description of the Drawing
FIG.1: flowchart of the present invention.
Specific Embodiments
The following detailed and complete description of the technical solutions in the embodiments of the present invention is provided in conjunction with the embodiments of the invention. It is clear that the described embodiments are only part of the embodiments of the present invention and not all embodiments. Based on the embodiments in the present invention, all other embodiments obtained by those skilled in the art without creative efforts fall within the protection scope of the present invention.
Please refer to FIG.1. The present invention provides a defensive method against interpretability camouflage samples in deep recognition neural networks, including the following steps:
Step 1, Constructing a Model: Building a deep neural network model for image classification.
Step 2, Detecting Model: Detecting adversarial samples within the model and extracting effective adversarial sample models.
Step 3, Sample Preprocessing: Preprocessing adversarial sample images within the model.
Step 4, Sample Detection: Comparing and detecting preprocessed adversarial samples with original samples.
Step 5, First Defense Simulation: Conducting the first defense simulation using attack experience.
Step 6, Second Defense Simulation: Abstracting the attack as a range-constrained abstract operation in practical operations. Defenders only need to ensure the model remains correct within the range of the abstract operation, thereby completing the second defense simulation.
Step 7, Effect Verification: Deploying the defense model within the deep recognition neural network for simulated attacks and validation.
The preprocessing in Step 3 specifically involves feature compression to alleviate disturbances in input samples, obtaining the model's predictions for samples before and after compression, and identifying adversarial samples based on the differences in the prediction results.
In another aspect of Step 3, preprocessing involves selecting sample classification labels that need protection, training trapdoor embeddings corresponding to these protected labels into the model, and identifying adversarial samples from input samples based on the activation state of neurons.
Preprocessing in Step 3 also specifically includes input denoising and feature denoising. Input denoising, during the testing phase of the model, involves processing the input data to eliminate part or all of the adversarial perturbations. Feature denoising attempts to reduce the impact of adversarial interference on the high-level features learned by the DNN.
In Step 4, the comparative detection specifically exploits the different numerical features of adversarial samples and original samples, such as the shape of the probability distribution obtained after the samples pass through the network. By checking whether the input conforms to the distribution of normal samples, it is determined whether the input has adversarial properties. For example, the softmax output vector of a classification network reflects the numerical feature distribution of a sample well, and in many cases there is a significant difference between the softmax output vectors of adversarial and normal samples. The softmax output vector of a normal sample tends to be more concentrated, i.e., farther from a uniform distribution, with the highest probability value typically much larger than the others and standing out in a specific category. This is because adversarial samples are crafted only to change which class receives the highest probability, without regard to the output probabilities of the other classes. Therefore, by measuring how concentrated a sample's softmax output vector is, it can be determined whether the sample has adversarial properties. For adversarial attacks that stop perturbing as soon as the decision boundary is crossed, i.e., that use the smallest possible perturbation, distinguishing adversarial samples through this difference in softmax distribution is very effective.
Comparative detection in Step 4 also involves using the output of the middle part of the Deep Neural Network as the input for the detector to detect adversarial samples.
For example, the original input will produce an output after passing through a convolutional layer, and then another output after entering a ResNet architecture. All these outputs are utilized, meaning a detector is trained for each output. Each detector is a binary network that outputs the probability that the sample is an adversarial sample. Specifically, normal samples and adversarial samples are input into the original classification network, and the mid-output of the original classification network at the interface with the adversarial detection network is obtained: normal sample data (x1, 0) and adversarial sample data (x2, 1). These two types of data with different labels are collected as the training dataset for the adversarial detection network at this interface.
The first defense simulation in Step 5 specifically involves disrupting the preconditions of existing attack methods, for example by truncating the model's output if the attack method requires the complete output, or by setting traps on the gradient that appear to be minimum values.
The attack experience in Step 5 builds on the fact that a model is typically trained only with normal samples. Therefore, to make the model more robust, adversarial samples are generated mainly during the model's training phase and included in training the neural network, achieving the purpose of defending against adversarial samples. The generated adversarial samples are added to the training set as data augmentation, allowing the model to learn about adversarial samples during training.
In the present invention, a deep neural network model for image classification is constructed; adversarial samples within the model are detected and effective adversarial sample models are extracted; adversarial sample images within the model are preprocessed; the preprocessed adversarial samples are compared against original samples for detection; the first defense simulation is conducted using attack experience; the attack is then abstracted: in practical operations the attack method is treated as a range-constrained abstract operation, and defenders only need to ensure that the model remains correct within the range of that abstract operation, completing the second defense simulation; finally, the defense model is deployed in the deep recognition neural network for simulated attacks and validation.
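Tying Steps 3 and 4 together, the following sketch computes the difference score between the softmax outputs of the original image and the denoised image and compares it with a detection threshold calibrated on the clean training set; the denoiser may be any module such as the one sketched earlier, and the percentile used for calibration is an assumption for illustration:

```python
import torch
import torch.nn.functional as F

def difference_score(model, denoiser, x: torch.Tensor) -> torch.Tensor:
    """L1 distance between softmax outputs of the original and the denoised image."""
    model.eval()
    denoiser.eval()
    with torch.no_grad():
        p_orig = F.softmax(model(x), dim=1)
        p_den = F.softmax(model(denoiser(x)), dim=1)
    return (p_orig - p_den).abs().sum(dim=1)

def calibrate_threshold(model, denoiser, clean_loader, percentile=0.95) -> float:
    """Choose the detection threshold from difference scores of clean training images."""
    scores = torch.cat([difference_score(model, denoiser, x) for x, _ in clean_loader])
    return torch.quantile(scores, percentile).item()

def detect(model, denoiser, x, threshold) -> torch.Tensor:
    """Inputs whose score exceeds the calibrated threshold are treated as adversarial."""
    return difference_score(model, denoiser, x) > threshold
```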
Despite the detailed description of the present invention with reference to the aforementioned embodiments, those skilled in the art may still modify the technical solutions recorded in the aforementioned embodiments or equivalently replace some technical features. Any modification, equivalent replacement, improvement, etc., made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (8)

Claims
1. A defensive method against interpretability camouflage samples in deep recognition neural networks, characterized by including the following steps: Step 1, Constructing a Model: Building a deep neural network model for image classification. Step 2, Detecting Model: Detecting adversarial samples within the model and extracting effective adversarial sample models. Step 3, Sample Preprocessing: Preprocessing adversarial sample images within the model. Step 4, Sample Detection: Comparing and detecting preprocessed adversarial samples with original samples. Step 5, First Defense Simulation: Conducting the first defense simulation using attack experience. Step 6, Second Defense Simulation: Abstracting the attack as a range-constrained abstract operation in practical operations. Defenders only need to ensure the model remains correct within the range of the abstract operation, thereby completing the second defense simulation. Step 7, Effect Verification: Deploying the defense model within the deep recognition neural network for simulated attacks and validation.
2. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the preprocessing in step 3 specifically involves feature compression to alleviate disturbances in input samples, obtaining the model's predictions for samples before and after compression, and identifying adversarial samples based on the differences in the prediction results.
3. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the preprocessing in step 3 specifically involves selecting sample classification labels that need protection, training trapdoor embeddings corresponding to these protected labels into the model, and identifying adversarial samples from input samples based on the activation state of neurons.
4. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the preprocessing in step 3 specifically includes input denoising and feature denoising. Input denoising, during the testing phase of the model, involves processing the input data to eliminate part or all of the adversarial perturbations. Feature denoising attempts to reduce the impact of adversarial interference on the high-level features learned by the DNN.
5. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the comparative detection in step 4 specifically utilizes the different numerical features of adversarial samples and original samples, such as the shape of the probability distribution obtained after the samples pass through the network, to determine the adversarial nature of the input.
6. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the comparative detection in step 4 specifically involves using the output of the middle part of the Deep Neural Network as the input for the detector to detect adversarial samples.
7. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the first defense simulation in step 5 specifically involves disrupting existing attack methods as a precondition, to address the vulnerability to emerging new-style attacks.
8. The defensive method against interpretability camouflage samples in deep recognition neural networks according to claim 1, characterized in that the attack experience in step 5 builds on the fact that a model is typically trained only with normal samples; therefore, to make the model more robust, adversarial samples are generated mainly during the model's training phase and included in training the neural network, achieving the purpose of defending against adversarial samples; the generated adversarial samples are added to the training set as data augmentation, allowing the model to learn about adversarial samples during training.
LU505793A 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks LU505793B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
LU505793A LU505793B1 (en) 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
LU505793A LU505793B1 (en) 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks

Publications (1)

Publication Number Publication Date
LU505793B1 true LU505793B1 (en) 2024-06-14

Family

ID=91539649

Family Applications (1)

Application Number Title Priority Date Filing Date
LU505793A LU505793B1 (en) 2023-12-14 2023-12-14 Defensive method against interpretability camouflage samples in deep recognition neural networks

Country Status (1)

Country Link
LU (1) LU505793B1 (en)
