CN115797711B - Improved classification method for adversarial samples based on a reconstruction model

Improved classification method for adversarial samples based on a reconstruction model

Info

Publication number
CN115797711B
CN115797711B
Authority
CN
China
Prior art keywords
model
training
classifier
loss function
classification
Prior art date
Legal status
Active
Application number
CN202310132811.6A
Other languages
Chinese (zh)
Other versions
CN115797711A (en)
Inventor
郭杰龙
魏宪
俞辉
阳帆
李�杰
张剑锋
邵东恒
Current Assignee
Quanzhou Institute of Equipment Manufacturing
Original Assignee
Quanzhou Institute of Equipment Manufacturing
Priority date
Filing date
Publication date
Application filed by Quanzhou Institute of Equipment Manufacturing
Priority to CN202310132811.6A
Publication of CN115797711A
Application granted
Publication of CN115797711B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems


Abstract

The invention discloses an improved classification method for adversarial samples based on a reconstruction model, which belongs to the field of automatic-driving image classification and comprises the following steps: constructing a reconstruction model, and extracting feature information of an image dataset based on the reconstruction model; learning an abstract representation from the reconstruction model using an adversarial attack, and generating adversarial samples based on a generator; constructing a classifier, and training the classifier based on the adversarial samples to obtain a trained classifier; and jointly training the adversarial samples and the trained classifier based on a common loss function to obtain a target classification model, and classifying the image dataset based on the target classification model to obtain a classification result. Based on the reconstruction model, the invention can learn features that assist classification from the adversarial samples, thereby improving the classification accuracy of the model.

Description

Improved classification method for adversarial samples based on a reconstruction model
Technical Field
The invention belongs to the field of automatic-driving image classification, and particularly relates to an improved classification method for adversarial samples based on a reconstruction model.
Background
Deep neural networks (DNNs) are powerful learning models that achieve excellent performance in a variety of fields, such as language translation, image recognition, and object detection and recognition. However, deep neural networks are susceptible to adversarial samples and output erroneous results with high confidence. Szegedy et al. first proposed the concept of adversarial samples in deep learning: small, non-random perturbations of the original image that can arbitrarily alter network predictions. Such perturbations are often imperceptible to the human eye, yet they mislead the model into erroneous predictions. This defect has relatively little impact on applications in non-safety-critical systems, but it may cause serious practical problems in safety-critical systems. For example, a vulnerability in the image-classification detection model of an unmanned driving system may be exploited to maliciously cause the vehicle-mounted recognition camera to fail. The significant safety hazards created by the nature of adversarial samples and their transferability severely hamper the application and development of deep neural networks in real life. Accordingly, adversarial samples have received extensive attention and research. At present, most research treats adversarial samples as attacks that destroy model robustness, either proposing new attack methods that greatly reduce the classification performance of classification models or trying to improve the defensive ability of models against adversarial samples. Analysis from the opposite perspective, namely that adversarial samples may contain potential information related to model prediction and can aid model prediction and classification, has rarely been reported.
Disclosure of Invention
Since adversarial samples may also contain potential information related to model prediction, the present invention aims to obtain more accurate classification results by taking the information contained in adversarial samples into account when performing model prediction and classification.
In order to achieve the above object, the present invention provides the following solution: an improved classification method for adversarial samples based on a reconstruction model, comprising:
constructing a reconstruction model, and extracting feature information of an image dataset based on the reconstruction model; learning an abstract representation from the reconstruction model using an adversarial attack, and generating adversarial samples based on a generator;
constructing a classifier, and training the classifier based on the adversarial samples to obtain a trained classifier;
and jointly training the adversarial samples and the trained classifier based on a common loss function to obtain a target classification model, and classifying the image dataset based on the target classification model to obtain a classification result.
Preferably, the reconstruction model comprises a self-encoder network structure and a variational self-encoder network structure for compressing the original high-dimensional image $x$ into a low-dimensional space to obtain a low-dimensional-space image, and reconstructing from it an image $\hat{x}$ approximating the original high-dimensional image $x$.
Preferably, the reconstruction model of the self-encoder network structure obtains its model parameters by minimizing a loss function;
the minimized loss function expression is:
$$L_{AE}(x,\hat{x}) = \|x - \hat{x}\|^{2}$$
Preferably, the reconstruction-model loss function of the variational self-encoder network structure comprises two parts, a reconstruction loss and a KL divergence, and training is optimized by the following formula:
$$\min_{\theta} L_{VAE}(x,\hat{x}) = \|x - \hat{x}\|^{2} + \beta\, D_{KL}\big(q(z\mid x)\,\|\,p(z)\big), \qquad z = E(x)$$
where $z$ is the low-dimensional latent variable obtained after the image $x$ passes through the encoder; $E(\cdot)$ is the functional representation of the encoder; $q(z\mid x)$ denotes the conditional probability of the latent variable $z$ given the input $x$; $p(z)$ is the actual data distribution of the latent space; $\beta$ denotes a constant; $D_{KL}$ denotes the KL divergence, which measures the difference between two distributions, so that $D_{KL}(q(z\mid x)\,\|\,p(z))$ is the KL divergence between the two data distributions $q(z\mid x)$ and $p(z)$; and $\min_{\theta}$ denotes minimizing the loss function $L_{VAE}$ to complete the training of the VAE model parameters.
Preferably, learning an abstract representation from the reconstruction model using an adversarial attack and generating adversarial samples based on the generator comprises:
after training of the reconstruction model is completed, constructing a generator whose input and output images have the same size; feeding the image $x$ to the generator as input and training the generator until the loss function value of the reconstruction model is maximized, at which point the reconstruction performance of the reconstruction model is destroyed, so that the reconstruction model exhibits adversarial behavior.
Preferably, the training process is formulated as:
$$\theta_G^{*} = \arg\max_{\theta_G} L\big(x',\hat{x}'\big), \qquad x' = G(x;\theta_G)$$
where $\arg\max_{\theta_G}$ denotes training the parameters $\theta_G$ of the generator $G$ by maximizing the loss function $L(x',\hat{x}')$; $x'$ denotes the adversarial sample; $z'$ denotes the intermediate-layer variable of the adversarial sample after dimension reduction by the self-encoding model AE or VAE; $\hat{x}'$ denotes the reconstructed image of the adversarial sample; and $\theta_G$ denotes the model parameters of the generator. Based on the generator $G$, the adversarial sample $x'$ corresponding to any clean sample $x$ can be generated directly:
$$x' = G(x;\theta_G)$$
Preferably, the classifier is trained based on the adversarial samples, and the trained classifier is obtained by the formula:
$$\theta_C^{*} = \arg\min_{\theta_C} \ell_{CE}\big(\hat{y},\,y\big), \qquad \hat{y} = C(x';\theta_C)$$
where $\arg\min_{\theta_C}$ denotes obtaining the classifier parameters $\theta_C$ by minimizing the loss function $\ell_{CE}$; $\theta_C$ are the model parameters of the classifier; $\hat{y}$ is the predicted value of the classification model; and $y$ is the true label of the clean sample $x$.
Preferably, jointly training the adversarial samples and the trained classifier based on the common loss function to obtain the target classification model comprises:
combining the training processes of the generator and the classifier, performing an end-to-end global training step, and fine-tuning the parameters of the generator and the classifier using a common loss function.
Preferably, the formula for fine-tuning the parameters of the generator and the classifier using the common loss function is:
$$\min_{\theta_G,\theta_C}\Big[\ell_{CE}\big(\hat{y},\,y\big) - L\big(x',\hat{x}'\big)\Big]$$
where $\min_{\theta_G,\theta_C}$ denotes training the parameters $\theta_G$ and $\theta_C$ by minimizing the loss function in brackets; $L$ denotes the loss function of the AE or VAE; and $\ell_{CE}$ denotes the deviation between the classifier's predicted value and the true value for the input image, obtained by computing the cross entropy $CE(\hat{y}, y)$.
The invention discloses the following technical effects:
According to the reconstruction-model-based improved classification method for adversarial samples, in the first stage, feature information of the dataset images is extracted by means of the reconstruction model, and an abstract representation is then learned from the reconstruction model by means of an adversarial attack. In the second stage, information beneficial to classification is fed back to the classification model through the generated adversarial samples. Finally, by combining the two stages, an end-to-end global training structure is constructed to complete the overall training of the two-stage joint-learning paradigm, thereby improving the generalization accuracy of the classification model. Compared with existing adversarial training methods, the method of the invention achieves higher generalization accuracy and larger improvements in image recognition across several datasets, such as MNIST, CIFAR10, and CIFAR100, and across different classification models. Meanwhile, experiments prove that, compared with adversarial samples based on a classification model and even with clean samples, the reconstruction-model-based adversarial samples of the invention contain richer feature information that is more conducive to the learning and understanding of the classification model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a two-stage joint learning process according to an embodiment of the present invention;
FIG. 3 is a comparison of an adversarial sample and a Gaussian-noise sample according to an embodiment of the invention;
fig. 4 is a diagram of classification accuracy of different models under different noise intensities according to an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
In order that the above-recited objects, features, and advantages of the present invention become more readily apparent, a more particular description of the invention is given below with reference to the accompanying drawings and the following detailed description.
As shown in fig. 1-2, the present invention provides an improved classification method for adversarial samples based on a reconstruction model, which mainly comprises two training parts. The first part is separation training: the reconstruction model is trained to learn the feature information of the dataset, while attacking the reconstruction model to generate adversarial samples and training the classifier are carried out independently. The second part is global training: the common loss function is used to jointly train the adversarial samples and the classifier, realizing end-to-end information transfer.
Further, the separation training comprises three parts, training the reconstruction model, generating the adversarial samples, and training the classifier, each of which is performed independently.
The reconstruction model employs the network architectures of a self-encoder (AE) and a variational self-encoder (VAE). Both can compress the original high-dimensional image $x$ into a low-dimensional space and reconstruct from it a picture $\hat{x}$ similar to the original. For the AE model, the parameters are obtained by minimizing the loss function $L_{AE}(x,\hat{x}) = \|x - \hat{x}\|^{2}$. For the VAE model, the loss function includes two parts, a reconstruction loss and a KL divergence, and training is typically optimized by the following formula:
$$\min_{\theta} L_{VAE}(x,\hat{x}) = \|x - \hat{x}\|^{2} + \beta\, D_{KL}\big(q(z\mid x)\,\|\,p(z)\big), \qquad z = E(x)$$
where $z$ is the low-dimensional latent variable obtained after the image $x$ passes through the encoder; $E(\cdot)$ is the functional representation of the encoder; $q(z\mid x)$ denotes the conditional probability of the latent variable $z$ given the input $x$; and $p(z)$ is the actual data distribution of the latent space. $\beta$ denotes a constant; obviously, when $\beta = 0$, the loss function of the VAE, $L_{VAE}$, coincides with the loss function of the AE, $L_{AE}$. $D_{KL}$ denotes the KL divergence, which measures the difference between two distributions, so that $D_{KL}(q(z\mid x)\,\|\,p(z))$ is the KL divergence between the two data distributions $q(z\mid x)$ and $p(z)$; $\min_{\theta}$ denotes minimizing the loss function $L_{VAE}$ to complete the training of the VAE model parameters.
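As a concrete illustration of this objective, the following is a minimal PyTorch sketch of a VAE and its loss. The fully-connected architecture, the dimensionalities, the Gaussian form of $q(z\mid x)$, and all identifiers here are illustrative assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal fully-connected VAE; the architecture is an illustrative assumption."""
    def __init__(self, in_dim=784, z_dim=32):
        super().__init__()
        self.enc = nn.Linear(in_dim, 256)
        self.mu = nn.Linear(256, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(256, z_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction term ||x - x_hat||^2 plus beta-weighted KL(q(z|x) || N(0, I));
    # with beta = 0 this reduces to the plain AE loss, as noted above.
    rec = F.mse_loss(x_hat, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld
```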
After the training of the reconstruction model is completed, adversarial samples are generated for it; this generation is based on a generator. A generator is constructed whose input and output are images of the same size. To make the images it produces adversarial, the generator is trained so that, for the generated image $x'$, the loss function $L(x',\hat{x}')$ of the reconstruction model becomes as large as possible. When the loss function value is maximal, the reconstruction performance of the reconstruction model is destroyed, i.e., the reconstruction model exhibits adversarial behavior. The training process can be formulated as:
$$\theta_G^{*} = \arg\max_{\theta_G} L\big(x',\hat{x}'\big), \qquad x' = G(x;\theta_G)$$
where $\arg\max_{\theta_G}$ denotes training the parameters $\theta_G$ of the generator $G$ by maximizing the loss function $L(x',\hat{x}')$; $x'$ denotes the adversarial sample; $z'$ denotes the intermediate-layer variable of the adversarial sample after dimension reduction by the self-encoding model AE or VAE; $\hat{x}'$ denotes the reconstructed image of the adversarial sample; and $\theta_G$ denotes the model parameters of the generator. Based on the generator $G$, the adversarial sample $x'$ corresponding to any clean sample $x$ can be generated directly:
$$x' = G(x;\theta_G)$$
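A minimal sketch of this generator-training step is given below, reusing the `VAE` and `vae_loss` from the previous sketch. The $\ell_\infty$ clamp reflects the noise-intensity constraint used in the experiments; the optimizer choice and loop structure are assumptions.

```python
import torch

def train_generator(gen, recon_model, loader, eps=0.03, epochs=10, lr=1e-3):
    """Train the generator G so that x' = G(x) maximizes the frozen
    reconstruction model's loss (uses vae_loss from the previous sketch)."""
    recon_model.eval()
    for p in recon_model.parameters():
        p.requires_grad_(False)            # attack the reconstruction model, never update it
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            raw = gen(x)                                   # candidate x', same size as x
            x_adv = x + torch.clamp(raw - x, -eps, eps)    # enforce ||x' - x||_inf <= eps
            x_hat, mu, logvar = recon_model(x_adv)
            loss = -vae_loss(x_adv, x_hat, mu, logvar)     # maximize the reconstruction loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return gen
```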
Attacking the reconstruction model in this way makes the key structure and underlying distribution of the generated adversarial samples deviate from those of the clean samples, so that more discriminative class information is obtained. Thus, in the final step of the separation stage, the adversarial samples $x'$ obtained by the above training are used to train the classifier:
$$\theta_C^{*} = \arg\min_{\theta_C} \ell_{CE}\big(\hat{y},\,y\big), \qquad \hat{y} = C(x';\theta_C)$$
where $\arg\min_{\theta_C}$ denotes obtaining the classifier parameters $\theta_C$ by minimizing the loss function $\ell_{CE}$; $\theta_C$ are the model parameters of the classifier; $\hat{y}$ is the predicted value of the classification model; and $y$ is the true label of the clean sample $x$.
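The separation-stage classifier training can be sketched as follows; only the objective $\min_{\theta_C}\ell_{CE}(\hat{y}, y)$ on adversarial samples is taken from the text, while the optimizer and loop details are assumptions.

```python
import torch
import torch.nn.functional as F

def train_classifier(clf, gen, loader, epochs=10, lr=1e-3):
    """Separation stage: train the classifier C only on adversarial samples
    x' = G(x), keeping the clean labels y and the generator fixed."""
    gen.eval()
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                x_adv = gen(x)                       # adversarial sample keeps the clean label y
            loss = F.cross_entropy(clf(x_adv), y)    # minimize l_CE(y_hat, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```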
Further, after the separation training is completed, the generator $G$ and the classifier $C$ are combined and an end-to-end global training step is performed to realize global optimization and further improve the accuracy of the classifier. At this stage, the generator and classifier parameters are fine-tuned using a common loss function:
$$\min_{\theta_G,\theta_C}\Big[\ell_{CE}\big(\hat{y},\,y\big) - L\big(x',\hat{x}'\big)\Big]$$
where $\min_{\theta_G,\theta_C}$ denotes training the parameters $\theta_G$ and $\theta_C$ by minimizing the loss function in brackets; $L$ denotes the loss function of the AE or VAE; and $\ell_{CE}$ denotes the deviation between the classifier's predicted value and the true value for the input image, obtained by computing the cross entropy $CE(\hat{y}, y)$.
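A sketch of this global stage, under the same assumptions as the previous sketches and with the sign convention $\ell_{CE} - L$ for the bracketed common loss (an assumption consistent with the separation-stage objectives):

```python
import itertools
import torch
import torch.nn.functional as F

def global_finetune(gen, clf, recon_model, loader, epochs=5, lr=1e-4):
    """End-to-end global stage: fine-tune G and C together on the common loss
    l_CE - L_rec (uses vae_loss from the first sketch; AE/VAE stays fixed)."""
    recon_model.eval()
    for p in recon_model.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(itertools.chain(gen.parameters(), clf.parameters()), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x_adv = gen(x)                           # x' = G(x)
            x_hat, mu, logvar = recon_model(x_adv)
            l_rec = vae_loss(x_adv, x_hat, mu, logvar)
            l_ce = F.cross_entropy(clf(x_adv), y)
            loss = l_ce - l_rec                      # gradients flow into both G and C
            opt.zero_grad()
            loss.backward()
            opt.step()
    return gen, clf
```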
Example 1
This part takes the automatic-driving image-classification process as the application scenario. First, it is verified that the samples obtained with the generator are adversarial. Second, comparison tests demonstrate that training with only the adversarial samples generated by the method improves the generalization accuracy of the classification model. Finally, ablation experiments are carried out on global training, attack methods, and other aspects.
Experimental setup
Datasets: the datasets used in this embodiment are MNIST, CIFAR10, and CIFAR100. The MNIST dataset consists of black-and-white images of handwritten digits from 0 to 9, containing 60000 training images and 10000 test images; all images are single-channel with size 28x28. The CIFAR10 and CIFAR100 datasets each consist of 60000 color images of size 3x32x32, of which 50000 are training images and 10000 are test images. The pictures in CIFAR10 are real-world object images in 10 categories, whereas CIFAR100 contains 100 categories, each with 600 images. To minimize potential effects, the image pixel values used in the experiments are normalized from the original range [0,255] to [0,1].
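For reference, the normalization from [0,255] to [0,1] corresponds to the standard `ToTensor()` conversion; the torchvision-based loading below is an illustrative assumption, not the patent's stated tooling.

```python
import torchvision
import torchvision.transforms as T

# ToTensor() converts HxWxC uint8 images in [0, 255] to CxHxW floats in [0, 1].
transform = T.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
```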
Models: the models used in the experiments mainly include the reconstruction model, the classification model, and the generator. The reconstruction model mainly employs the AE and VAE architectures, with the VAE used in most experiments. The classification model mainly adopts ResNet-20, which is designed specifically for the CIFAR datasets; VGG-19 is also used in the ablation experiments. The generator used to produce the adversarial samples is a generative model based on the VAE structure.
Parameters: model training in the experiments is optimized with the Adam optimizer, and the learning rate is adjusted with MultiStepLR. When generating the adversarial samples, the infinite norm $\ell_\infty$ is used to limit the magnitude of the adversarial noise. The accuracy reported in the experiments is the generalization accuracy obtained on clean samples, and all classifiers are trained without model pre-training or data augmentation.
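A minimal sketch of this optimizer and scheduler setup follows; the milestone epochs, learning rates, and the stand-in model are illustrative, as the patent does not specify them.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the classifier or generator being trained
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Decay the learning rate by 10x at the (illustrative) milestone epochs below.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120], gamma=0.1)

for epoch in range(150):
    optimizer.step()   # placeholder for one training epoch
    scheduler.step()   # then adjust the learning rate
```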
To demonstrate that the samples generated by the above method are adversarial, i.e., that they destroy the reconstruction effect of the reconstruction model, two evaluation metrics are introduced: structural similarity (SSIM) and peak signal-to-noise ratio (PSNR). Both measure the similarity between the reconstructed image and the original image; the larger the value, the better the reconstruction.
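Both metrics are available in scikit-image; a small sketch follows, where the sample arrays are placeholders.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

x = np.random.rand(32, 32, 3)                                   # stand-in original in [0, 1]
x_hat = np.clip(x + 0.05 * np.random.randn(32, 32, 3), 0, 1)    # stand-in reconstruction

# channel_axis=-1 for color images; use channel_axis=None for single-channel MNIST.
ssim = structural_similarity(x, x_hat, data_range=1.0, channel_axis=-1)
psnr = peak_signal_noise_ratio(x, x_hat, data_range=1.0)
print(f"SSIM={ssim:.3f}, PSNR={psnr:.2f} dB")
```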
In the experiment, samples with ordinary Gaussian noise added are used as a control group to compare how adversarial samples and samples containing ordinary Gaussian noise affect the reconstruction. As shown in fig. 3, the reconstruction quality of the adversarial samples is significantly lower than that of the Gaussian-noise samples, and this gap widens as the noise intensity increases. To match the attack effect achieved by the adversarial noise at intensity 0.03, ordinary Gaussian noise needs more than 8 times that intensity (0.25). Thus, the adversarial samples are more misleading, i.e., more adversarial, to the reconstruction model performing the reconstruction task.
The reconstruction-model-based adversarial samples proposed in this embodiment are compared with the classification-model adversarial samples of mainstream studies and with clean samples. Compared with models trained on the latter two, the adversarial samples provided by this embodiment improve the accuracy of the classification model. To make the difference more visible, the noise intensity of the adversarial samples is increased to 0.3. Clearly, the reconstruction-model-based adversarial samples differ from the noise samples aimed at the classification model: the noise of the former is regular, in a flat grid pattern, whereas the noise of the latter is chaotic and disordered.
As shown in Table 1, across the different datasets, the accuracy of models trained on classifier-targeted adversarial samples is lower than that of models trained on clean samples, indicating that adversarial training reduces the generalization accuracy of the model, which is substantially consistent with the conclusions of prior studies. In contrast, the reconstruction-model-based adversarial samples provided by this embodiment improve the generalization accuracy of the classification model, with gains of 0.06%, 1.34%, and 0.98% on the MNIST, CIFAR10, and CIFAR100 datasets respectively, without data augmentation or model pre-training.
TABLE 1
[table image not reproduced]
In addition, compared with other classification models trained using adversarial samples (see Tables 2 and 3), the ICRAE method of this embodiment achieves the highest classification accuracy of 99.7% on MNIST and the largest accuracy improvement of 1.34% on the CIFAR10 dataset.
TABLE 2
[table image not reproduced]
TABLE 3
[table image not reproduced]
Next, the influence of different factors, including the attack method, the noise intensity, and the models used, on the experimental results is explored.
In the second step of the training process, different adversarial attack methods can be used, such as the common fast gradient sign method (FGSM), the projected gradient descent method (PGD), and the generator-based method used in this embodiment. FGSM and PGD are both iteration-based adversarial attack methods that obtain the adversarial sample $x'$ by adding a perturbation $\delta$ to the input $x$. Specifically, FGSM first computes the gradient $\nabla_x L$ of the loss function with respect to the input $x$; the attack strength $\epsilon$ is multiplied by the sign of the gradient, and the resulting perturbation $\delta$ is added to $x$, so that the adversarial sample is obtained in a single iteration step. The calculation is formulated as:
$$x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x L(x, y)\big)$$
where $\nabla_x L$, the ratio of the change in the loss function to the change in the input image, is the gradient of the loss with respect to the input; $\epsilon$ denotes the attack strength; and $\mathrm{sign}(\nabla_x L)$ denotes the sign of the computed gradient $\nabla_x L$.
PGD is equivalent to repeating the FGSM process over multiple iterations, and its attack effect is therefore better than FGSM's. In contrast to both, the method of deriving adversarial samples with the generator is optimization-based, the parameters of the generator being trained by an optimizer. Table 4 compares the classification accuracy corresponding to the adversarial samples generated by the different attack methods, where G-s denotes the model accuracy after the generator-based separation training is completed, and G-g denotes the model accuracy after the generator-based global training. According to the experimental results, once the global training is completed, the generator-based method (i.e., the ICRAE algorithm proposed in this embodiment) improves the model classification effect the most. Combined with Table 5, the end-to-end global training further improves the classification accuracy of the model and realizes global optimization of the extraction of classification features.
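For reference, minimal sketches of FGSM and PGD as described above; hyperparameters such as the step size and iteration count are assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM: x' = x + eps * sign(grad_x L(x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()   # keep pixels in [0, 1]

def pgd(model, x, y, eps, alpha=None, steps=10):
    """PGD as iterated FGSM, projecting back into the eps-ball so that
    ||x' - x||_inf <= eps (the noise-intensity constraint below)."""
    alpha = alpha if alpha is not None else eps / 4
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)   # l_inf projection
            x_adv = torch.clamp(x_adv, 0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```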
TABLE 4
[table image not reproduced]
The noise intensity $\epsilon$ refers to the bound on the infinite norm $\ell_\infty$ of the adversarial perturbation, namely:
$$\|x' - x\|_{\infty} \le \epsilon$$
The larger the noise intensity $\epsilon$, the stronger the noise on the adversarial sample and the more noticeable the gap from the original image. Table 5 shows the generalization accuracy of the trained classification models at different noise intensities. Across the datasets, the accuracy first increases and then decreases as the noise grows, with the peak for each dataset appearing at a noise intensity of 0.01. Meanwhile, when the noise intensity lies within the interval [0.01, 0.05], training the classification model on the adversarial samples yields higher accuracy than the corresponding model trained on clean samples. This means that excessively strong adversarial noise still damages the feature information in the image, thereby reducing classification accuracy.
TABLE 5
[table image not reproduced]
Further, experiments are also performed with different reconstruction models and classification models; the results are shown in fig. 4, where the abscissa is the noise intensity and the ordinate is the classification accuracy. The left plot shows the results with the ResNet model, the right plot the results with the VGG model, and the dashed line the generalization accuracy obtained by training the classifier on clean samples. Clearly, the noise intensity at which classification accuracy peaks differs between classification models: ResNet reaches its highest accuracy at 0.01, whereas VGG is optimal at a noise intensity of 0.03. On the other hand, under the ResNet model, using the AE as the reconstruction model improves the final classification effect the most, whereas for the VGG model reconstruction with the VAE works better. In general, however, the trend of classification accuracy with noise magnitude is substantially consistent across models, and at noise intensities of 0.01 to 0.05 training on the adversarial samples likewise improves classification-model accuracy. This shows that the ICRAE algorithm provided in this embodiment can be applied to different reconstruction models and classification models to improve classification accuracy.
In most prior studies, adversarial samples are regarded as images whose feature information has been destroyed, so that an excellent classification model cannot be trained from them. This embodiment studies adversarial samples from the opposite, positive direction and proposes a reconstruction-model-based algorithm for improving classification with adversarial samples: using only adversarial samples, a classifier can be trained that performs better than one trained on clean samples, improving the generalization accuracy of the model. This demonstrates that adversarial samples contain sufficient classification information and motivates further investigation of adversarial samples. On this basis, future work could decompose the feature information in adversarial samples and explore the mechanisms by which its different components affect classification, so as to obtain classification models with stronger predictive power and robustness.
The above embodiments are only illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solutions of the present invention should fall within the protection scope defined by the claims of the present invention without departing from the design spirit of the present invention.

Claims (1)

1. A method for improved classification of adversarial samples based on a reconstruction model, comprising:
in a scene of automatic-driving image-classification processing, constructing a reconstruction model, and extracting feature information of an image dataset based on the reconstruction model; learning an abstract representation from the reconstruction model using an adversarial attack, and generating adversarial samples based on a generator;
constructing a classifier, and training the classifier based on the adversarial samples to obtain a trained classifier;
jointly training the adversarial samples and the trained classifier based on a common loss function to obtain a target classification model, and classifying the image dataset based on the target classification model to obtain a classification result;
the reconstruction model comprises a self-encoder network structure and a variational self-encoder network structure for transforming the original high-dimensional image
Figure QLYQS_1
Compressing to a low-dimensional space to obtain a low-dimensional space image, and reconstructing the low-dimensional space image to be the same as the original high-dimensional image +.>
Figure QLYQS_2
Approximate picture->
Figure QLYQS_3
The reconstruction model of the self-encoder network structure obtains model parameters by minimizing a loss function;
the minimization loss function expression is:
Figure QLYQS_4
the reconstruction-model loss function of the variational self-encoder network structure comprises two parts, a reconstruction loss and a KL divergence, and training is optimized by the following formula:
$$\min_{\theta} L_{VAE}(x,\hat{x}) = \|x - \hat{x}\|^{2} + \beta\, D_{KL}\big(q(z\mid x)\,\|\,p(z)\big), \qquad z = E(x)$$
where $z$ is the low-dimensional latent variable obtained after the image $x$ passes through the encoder; $E(\cdot)$ is the functional representation of the encoder; $q(z\mid x)$ denotes the conditional probability of the latent variable $z$ given the input $x$; $p(z)$ is the actual data distribution of the latent space; $\beta$ denotes a constant; $D_{KL}$ denotes the KL divergence, which measures the difference between two distributions, so that $D_{KL}(q(z\mid x)\,\|\,p(z))$ is the KL divergence between the two data distributions $q(z\mid x)$ and $p(z)$; and $\min_{\theta}$ denotes minimizing the loss function $L_{VAE}$ to complete the training of the VAE model parameters;
learning an abstract representation from the reconstruction model using an adversarial attack and generating adversarial samples based on the generator comprises:
after training of the reconstruction model is completed, constructing a generator whose input and output images have the same size; feeding the image $x$ to the generator as input and training the generator until the loss function value of the reconstruction model is maximal, at which point the reconstruction performance of the reconstruction model is destroyed, so that the reconstruction model exhibits adversarial behavior;
the training process is formulated as:
$$\theta_G^{*} = \arg\max_{\theta_G} L\big(x',\hat{x}'\big), \qquad x' = G(x;\theta_G)$$
where $\arg\max_{\theta_G}$ denotes training the parameters $\theta_G$ of the generator $G$ by maximizing the loss function $L(x',\hat{x}')$; $x'$ denotes the adversarial sample; $z'$ denotes the intermediate-layer variable of the adversarial sample after dimension reduction by the self-encoding model AE or VAE; $\hat{x}'$ denotes the reconstructed image of the adversarial sample; and $\theta_G$ denotes the model parameters of the generator; based on the generator $G$, the adversarial sample $x'$ corresponding to any clean sample $x$ can be generated directly:
$$x' = G(x;\theta_G)$$
training the classifier based on the adversarial samples to obtain the trained classifier uses the formula:
$$\theta_C^{*} = \arg\min_{\theta_C} \ell_{CE}\big(\hat{y},\,y\big), \qquad \hat{y} = C(x';\theta_C)$$
where $\arg\min_{\theta_C}$ denotes obtaining the classifier parameters $\theta_C$ by minimizing the loss function $\ell_{CE}$; $\theta_C$ are the model parameters of the classifier; $\hat{y}$ is the predicted value of the classification model; and $y$ is the true label of the clean sample $x$;
jointly training the adversarial samples and the trained classifier based on the common loss function to obtain the target classification model comprises:
combining the training processes of the generator and the classifier, performing an end-to-end global training step, and fine-tuning the parameters of the generator and the classifier using the common loss function;
the formula for fine-tuning the parameters of the generator and the classifier using the common loss function is:
$$\min_{\theta_G,\theta_C}\Big[\ell_{CE}\big(\hat{y},\,y\big) - L\big(x',\hat{x}'\big)\Big]$$
where $\min_{\theta_G,\theta_C}$ denotes training the parameters $\theta_G$ and $\theta_C$ by minimizing the loss function in brackets; $L$ denotes the loss function of the AE or VAE; and $\ell_{CE}$ denotes the deviation between the classifier's predicted value and the true value for the input image, obtained by computing the cross entropy $CE(\hat{y}, y)$.
CN202310132811.6A 2023-02-20 2023-02-20 Improved classification method for adversarial samples based on a reconstruction model Active CN115797711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310132811.6A CN115797711B (en) Improved classification method for adversarial samples based on a reconstruction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310132811.6A CN115797711B (en) Improved classification method for adversarial samples based on a reconstruction model

Publications (2)

Publication Number Publication Date
CN115797711A (en) 2023-03-14
CN115797711B (en) 2023-04-21

Family

ID=85431017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310132811.6A Active CN115797711B (en) Improved classification method for adversarial samples based on a reconstruction model

Country Status (1)

Country Link
CN (1) CN115797711B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114707572A (en) * 2022-02-24 2022-07-05 浙江工业大学 Deep learning sample testing method and device based on loss function sensitivity
CN115641471A (en) * 2022-10-18 2023-01-24 西安交通大学 Countermeasure sample generation method and system based on generation of countermeasure network

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647645A (en) * 2019-08-06 2020-01-03 厦门大学 Attack image retrieval method based on general disturbance
CN110674836B (en) * 2019-08-06 2024-03-22 厦门大学 Sparse countermeasure sample generation method based on generation network
CN111598805A (en) * 2020-05-13 2020-08-28 华中科技大学 Confrontation sample defense method and system based on VAE-GAN
CN113222002B (en) * 2021-05-07 2024-04-05 西安交通大学 Zero sample classification method based on generative discriminative contrast optimization
CN113283599B (en) * 2021-06-11 2024-03-19 浙江工业大学 Attack resistance defense method based on neuron activation rate
CN113554089B (en) * 2021-07-22 2023-04-18 西安电子科技大学 Image classification countermeasure sample defense method and system and data processing terminal
CN113723564A (en) * 2021-09-14 2021-11-30 燕山大学 Method and system for training defense model of confrontation sample and application of method and system
CN113963213A (en) * 2021-10-27 2022-01-21 上海交通大学 Method and system for removing antagonistic noise aiming at antagonistic sample of deep neural network
CN114881103A (en) * 2022-03-25 2022-08-09 重庆邮电大学 Countermeasure sample detection method and device based on universal disturbance sticker
CN114648642A (en) * 2022-03-30 2022-06-21 京东科技信息技术有限公司 Model training method, image detection method, image classification method and device
CN115294399A (en) * 2022-08-18 2022-11-04 中国人民解放军国防科技大学 Image classification method and device for defending against attack and computer equipment
CN115439719B (en) * 2022-10-27 2023-03-28 泉州装备制造研究所 Deep learning model defense method and model for resisting attack


Also Published As

Publication number Publication date
CN115797711A (en) 2023-03-14

Similar Documents

Publication Publication Date Title
Tang et al. An automatic cost learning framework for image steganography using deep reinforcement learning
You et al. Adversarial noise layer: Regularize neural network by adding noise
CN113674140B (en) Physical countermeasure sample generation method and system
Guo et al. Robust student network learning
CN111160343A (en) Off-line mathematical formula symbol identification method based on Self-Attention
Zhang et al. Defense against adversarial attacks by reconstructing images
CN112801280B (en) One-dimensional convolution position coding method of visual depth self-adaptive neural network
CN113378949A (en) Dual-generation confrontation learning method based on capsule network and mixed attention
Lv et al. Chinese character CAPTCHA recognition based on convolution neural network
CN115422518A (en) Text verification code identification method based on data-free knowledge distillation
Li et al. Image operation chain detection with machine translation framework
Jalwana et al. Attack to explain deep representation
Zanddizari et al. Generating black-box adversarial examples in sparse domain
Biswas et al. Deepfake detection using 3d-xception net with discrete fourier transformation
WO2021189364A1 (en) Method and device for generating adversarial image, equipment, and readable storage medium
CN115797711B (en) Improved classification method for countermeasure sample based on reconstruction model
CN115481719B (en) Method for defending against attack based on gradient
Yang et al. APE-GAN++: An improved APE-GAN to eliminate adversarial perturbations
Zhou et al. Markov chain based efficient defense against adversarial examples in computer vision
Sun et al. CAMA: Class activation mapping disruptive attack for deep neural networks
Ishii et al. Training deep neural networks with adversarially augmented features for small-scale training datasets
CN113887504B (en) Strong-generalization remote sensing image target identification method
Liang et al. Large-scale image classification using fast svm with deep quasi-linear kernel
Ma et al. DIHBA: Dynamic, invisible and high attack success rate boundary backdoor attack with low poison ratio
Sun et al. Instance-level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant