CN116228537A - Attack image defense method based on denoising and super-resolution reconstruction fusion - Google Patents

Attack image defense method based on denoising and super-resolution reconstruction fusion

Info

Publication number
CN116228537A
Authority
CN
China
Prior art keywords
network
image
super
noise
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310135980.5A
Other languages
Chinese (zh)
Inventor
蒋晓悦
王众鹏
冯晓毅
夏召强
韩逸飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Publication of CN116228537A

Classifications

    • G06T 3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4046 Scaling of whole images or parts thereof using neural networks
    • G06T 5/70 Denoising; Smoothing
    • G06N 3/08 Learning methods (computing arrangements based on neural networks)
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y02T 10/40 Engine management systems


Abstract

The invention relates to an attack image defense method based on the fusion of denoising and super-resolution reconstruction. Because adversarial noise is mainly high-frequency information, a multi-scale encoder-decoder network can use the low-frequency information of the image to guide the reconstruction of high-frequency information and thereby remove high-frequency noise. A super-resolution reconstruction network then performs fine-grained noise cleaning: by injecting high-frequency components during super-resolution reconstruction, it destroys the distribution of the residual adversarial noise. The method effectively combines autoencoder denoising with super-resolution reconstruction for adversarial defense. In experiments, the proposed sample-cleaning defense algorithm achieves an excellent defense effect on a self-made satellite adversarial-sample data set: after cleaning, the attack noise is completely removed, and recognition accuracy on the EfficientNet network improves by 42.67%.

Description

Attack image defense method based on denoising and super-resolution reconstruction fusion
Technical Field
The invention belongs to the field of image processing, and particularly relates to an attack image defense method based on denoising and super-resolution reconstruction fusion.
Background
In recent years, convolutional neural networks have been widely used in computer vision tasks, including image classification, object detection, semantic segmentation, visual question answering, and the like. These basic computer vision tasks in turn underpin many intelligent platforms, such as autonomous driving, disease diagnosis, and military decision systems. However, deep neural networks also have certain problems: they are poorly interpretable and susceptible to specially designed perturbation noise. That is, if carefully crafted perturbation noise is superimposed on the input image, the neural network is disturbed by the attack noise and can no longer output a correct recognition result.
In the real world, such errors can be fatal. For example, hackers may deliberately spread images with added attack noise across a network so that a search system is attacked and outputs results inconsistent with the input, allowing them to disseminate harmful information and pictures that endanger the physical and mental health of teenagers. In the military domain, an adversary can algorithmically craft false targets to deceive and attack a machine-learning system, so that the military intelligence system cannot work normally or even makes subversively wrong decisions, leading to catastrophic battlefield failure.
Adversarial defense methods fall into two categories according to their strategy: active defense methods and passive defense methods.
Active defense methods adjust the parameters of a specific model through techniques such as adversarial training or defensive distillation. Adversarial training regularizes the network by softening its decision boundary so that it also covers nearby adversarial images. Defensive distillation improves the robustness of a given model in a broadly similar way, by retraining the model with soft labels obtained through a distillation mechanism. Active defense methods typically require computationally intensive differentiable transformations. Moreover, these transformations remain vulnerable to further attacks, since adversaries can exploit the differentiable modules to circumvent the defense.
Passive defense methods assume a black-box model whose network structure is unknown, and mainly mitigate the impact of adversarial perturbations in the input-image domain by applying various transformations. This largely avoids the problems above, but such methods easily lose key image content while removing adversarial noise, resulting in poor classification performance on non-adversarial images.
Disclosure of Invention
Technical problem to be solved
Aiming at the problem in existing passive defense that removing adversarial perturbations from the input image by transformation easily loses key image content, leading to poor classification performance on non-adversarial images, the invention provides an attack image defense method based on the fusion of denoising and super-resolution reconstruction.
Technical proposal
An attack image defense method based on the fusion of denoising and super-resolution reconstruction, characterized by comprising the following steps:
step 1: constructing a training image sample library and evaluation indexes;
step 2: constructing an adversarial defense network; the adversarial defense network comprises an autoencoder sub-network and a super-resolution reconstruction network; the features at each scale of the autoencoder sub-network are fused with the output of the super-resolution reconstruction network, and the fused feature map passes through an up-sampling layer that enriches the high-frequency information of the image, generating a super-resolution image free of adversarial noise;
the autoencoder sub-network comprises two encoder-decoder (codec) sub-networks, with skip-connection layers arranged between encoder and decoder following the standard U-Net structure; a channel attention block (CAB) is used at each layer of the codec to extract features at each scale; the U-Net skip-connection layers likewise use CABs to process feature maps of the same scale; an "image feature splicing" operation is designed between encoder and decoder: in the first-stage network, the encoder outputs 4 encoder feature maps from the input original image blocks; before being fed to the decoder, these 4 encoder feature maps are spliced pairwise into 2 larger encoder feature maps; similarly, in the second-stage network, the encoder outputs 2 encoder feature maps from the input original image block and the decoder feature map produced by the first-stage network; before being fed to the decoder, these 2 encoder feature maps are spliced into one complete encoder feature map; finally, the decoder outputs the corresponding decoder feature map from its input feature map;
the super-resolution reconstruction network comprises 3 super-resolution blocks, each containing 8 residual blocks; a batch-normalization layer and a convolution layer are removed from each residual block;
step 3: defining a loss function of the countermeasure network;
(1) The autoencoder network loss function is shown in equation (1):
L1 = Σ_s [ L_char(X_s, Y) + λ · L_edge(X_s, Y) ]   (1)
wherein L_char denotes the Charbonnier loss, L_edge denotes the edge loss, X_s denotes the denoised image at stage s, Y denotes the ground-truth image, s indexes the stages, and λ is taken as 0.05;
the Charbonnier loss is computed as in equation (2):
L_char = sqrt( ||X_s - Y||^2 + ε^2 )   (2)
the edge loss is computed as in equation (3):
L_edge = sqrt( ||Δ(X_s) - Δ(Y)||^2 + ε^2 )   (3)
wherein Δ denotes the Laplacian operator and ε is taken as 10^-3;
(2) The super-resolution reconstruction network loss function is shown in equation (4):
L2 = (1/N) Σ_i |y_i - f(x_i)|   (4)
wherein L2 denotes the L1-type loss of the super-resolution network, y_i denotes the ground-truth image, and f(x_i) denotes the super-resolution image;
the total loss function L is shown in equation (5):
L=L1+L2 (5)
step 4: training the adversarial defense network;
step 5: inputting the original image to be processed into the adversarial defense network that fuses denoising and super-resolution reconstruction; the autoencoder sub-network and the super-resolution reconstruction network clean the noise of the input image and restore its original feature information, finally yielding the passively defended image.
A further technical scheme of the invention: the training image sample library in step 1 comprises a purely digital adversarial data set and a semi-digital, semi-physical adversarial data set; in the purely digital adversarial data set, all pixel regions of a clean image are attacked by adversarial noise, whereas in the semi-digital, semi-physical adversarial data set a set rectangular pixel region of the clean image is attacked by adversarial noise while other regions are not; each data set contains 3 noise classes: BIM, FGSM, and PGD.
A further technical scheme of the invention: the evaluation indexes in step 1 comprise MSE, SSIM, and EfficientNet network recognition accuracy.
A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
A computer readable storage medium, characterized by storing computer executable instructions that when executed are configured to implement the method described above.
Advantageous effects
The attack image defense method based on the fusion of denoising and super-resolution reconstruction solves both the excessive computational cost of active defense methods and the tendency of other passive defense methods to lose key image content and thereby degrade classification performance on non-adversarial images.
1. The invention selects two encoder-decoder sub-networks as the backbone. Through feature fusion and attention mechanisms between the sub-networks, the network learns the adversarial-noise branch step by step while retaining as much useful information as possible, finally achieving the defense goal.
2. The invention selects a traditional residual network as backbone, removes a batch-normalization layer and a convolution layer in each residual block, and adds an up-sampling layer at the tail, finally obtaining the super-resolution network. This network destroys the original adversarial noise by adding high-frequency components to the sample, thereby achieving the defense effect.
In experiments, the proposed sample-cleaning defense algorithm achieves an excellent defense effect on a self-made satellite adversarial-sample data set. After cleaning, the attack noise is completely removed, and recognition accuracy on the EfficientNet network improves by 42.67%.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.
FIG. 1 is a schematic diagram of the autoencoder / super-resolution reconstruction defense network architecture;
FIG. 2 is a schematic diagram of the supervised attention module;
FIG. 3 is a schematic diagram of the residual block network architecture;
FIG. 4 illustrates the three types of adversarial noise samples;
FIG. 5 compares the cleaning effect on purely digital adversarial samples;
FIG. 6 compares the results of different feature fusion strategies;
FIG. 7 compares the cleaning effect on semi-digital, semi-physical adversarial samples;
FIG. 8 compares the results of mask preprocessing.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions, and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
The invention provides an adversarial defense method based on the fusion of denoising and super-resolution reconstruction, and constructs an adversarial defense network on this basis. The network can be regarded as a "parallel + serial" structure divided into three stages: the first and second stages consist mainly of encoder-decoders forming the autoencoder sub-network, while the third stage consists of a residual-block network and an up-sampling module forming the super-resolution sub-network. Experiments show that the autoencoder structure is suited to processing adversarial samples with higher adversarial-noise intensity: with its strong learning capability, the autoencoder can learn the noise in the image. The more complex the network structure, the more parameters it has and the stronger its learning ability, but the slower its inference. The super-resolution reconstruction network has a comparatively simple structure, its main part being stacked residual blocks with fewer parameters, and can handle adversarial samples with lower noise intensity. Therefore, when adversarial samples with high noise intensity must be processed, the autoencoder network can be employed for defense; when adversarial samples with lower noise intensity must be processed, the super-resolution network can be used.
In order to achieve the above object, the present invention provides an adversarial defense method based on the fusion of denoising and super-resolution reconstruction, comprising the following steps:
step 1: training image sample library and construction of evaluation index
(1) Experimental data set construction
For the adversarial-sample cleaning experiments, the data set is an adversarial-sample data set produced by our laboratory, comprising a purely digital adversarial data set and a semi-digital, semi-physical adversarial data set. In the purely digital adversarial data set, the whole pixel area of a clean image is attacked by adversarial noise, whereas in the semi-digital, semi-physical adversarial data set a set rectangular pixel region of the clean image is attacked while other regions are not. Each data set contains 3 noise classes: BIM, FGSM, and PGD.
(2) Description of evaluation index
Evaluating the cleaning effect on adversarial samples requires corresponding evaluation indexes. Three indexes are used to compare the cleaned adversarial sample with its label image: mean squared error (MSE), structural similarity (SSIM), and EfficientNet network recognition accuracy.
Step 2: construction of an antagonistic defense network
(1) Self-encoder network construction
Convolutional neural networks for image denoising generally adopt one of two structural designs: the autoencoder, or the single-scale feature channel. An autoencoder network first maps the input image down to a low-resolution representation and then maps the low-scale image features back to the original resolution. Such a model encodes multi-scale information effectively, but sacrifices spatial detail because of the repeated down-sampling operations. Conversely, a single-scale feature-channel network preserves spatial information in the image well, but its outputs are semantically less reliable. This reflects the inherent limitation of the two designs: each can produce either spatially accurate or contextually reliable outputs, but not both. To exploit the advantages of both, the invention uses a two-stage framework comprising two autoencoders in the autoencoder sub-network, as shown in fig. 1.
The autoencoder sub-network consists mainly of two encoder-decoders (codecs), with skip-connection layers arranged between encoder and decoder following the standard U-Net structure. A channel attention block (CAB) is used at each layer of the codec to extract features at each scale, and the U-Net skip-connection layers likewise use CABs to process feature maps of the same scale. An "image feature splicing" operation is designed between encoder and decoder. In the first-stage network, the encoder outputs 4 encoder feature maps from the input original image blocks; before being fed to the decoder, these are spliced pairwise into 2 larger encoder feature maps. Similarly, in the second-stage network, the encoder outputs 2 encoder feature maps from the input original image block and the decoder feature map produced by the first stage; before being fed to the decoder, these are spliced into one complete encoder feature map. Finally, the decoder outputs the corresponding decoder feature map from its input. To increase the spatial resolution of features in the decoder, the invention does not use transposed convolution but bilinear up-sampling after a convolution layer; this helps reduce the checkerboard artifacts that transposed convolution produces in the output image.
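The patent names the channel attention block (CAB) but does not spell out its internals; the following numpy sketch assumes the common squeeze-and-excitation pattern (global average pooling, two channel-mixing layers, sigmoid gating). The reduction ratio and weight names are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

def channel_attention(feat, w1, b1, w2, b2):
    """Squeeze-and-excitation style channel attention (assumed CAB form).

    feat: (C, H, W) feature map; (w1, b1) and (w2, b2) are the two
    channel-mixing layers (equivalent to 1x1 convs on pooled features).
    Returns the feature map with each channel rescaled by a weight in (0, 1).
    """
    squeeze = feat.mean(axis=(1, 2))                    # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze + b1, 0.0)         # channel reduction + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # sigmoid gate -> (C,)
    return feat * scale[:, None, None]                  # per-channel rescaling

# toy example: 8 channels, reduction ratio 4
rng = np.random.default_rng(0)
C, r = 8, 4
feat = rng.normal(size=(C, 16, 16))
w1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)
out = channel_attention(feat, w1, b1, w2, b2)
assert out.shape == feat.shape
```

Each channel is multiplied by a single learned gate, which is how such a block lets the network emphasize scales/channels that carry adversarial-noise evidence.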
Between stages, the invention introduces cross-stage feature fusion (CSFF). The codec has image features at 3 different scales. The features of each scale are refined with a 1×1 convolution, and the encoder and decoder features of the same scale are then propagated to the encoder of the next-stage network and fused with it. Cross-stage feature fusion has several advantages. First, since up- and down-sampling operations are used repeatedly in the codec, it makes the network less vulnerable to information loss. Second, the multi-scale features of the current stage help enrich the features of the next stage. Finally, the optimization of the whole network becomes more stable. Because cross-stage feature fusion simplifies the information flow, it also allows more stages to be added to the overall architecture. At the end of the first two stages, the prediction is not passed directly to the next stage; instead, a supervised attention module is added at the end, which brings a significant performance gain. Its schematic diagram is shown in fig. 2, and it has two advantages. First, it provides useful ground-truth supervision for image denoising at each stage. Second, with the help of locally supervised prediction, the generated attention can suppress the less informative features of the current stage and allow only useful features to propagate to the next stage. The supervised attention module first takes the output feature F_in of the previous stage and generates a residual image R_s through a simple 1×1 convolution; the residual image is then added to the original image I to obtain the restored image X_s.
The supervised attention module then obtains a per-pixel supervision mask M through a 1×1 convolution plus an activation layer; this mask re-weights the feature F_in transformed by a 1×1 convolution, and the result is finally added to F_in to obtain the attention-enhanced adversarial-noise feature F_out, which is passed to the next stage for processing.
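The supervised attention module described above can be sketched in numpy: a 1×1 convolution is just a per-pixel linear map over channels, which `einsum` expresses directly. The weight shapes and names here are illustrative assumptions (biases omitted for brevity), not the patent's actual parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def supervised_attention(f_in, image, w_res, w_mask, w_feat):
    """Supervised attention module (SAM) sketch.

    f_in  : (C, H, W) output feature of the previous stage
    image : (3, H, W) original input image I
    w_res : (3, C) 1x1 conv producing the residual image R_s
    w_mask: (C, 3) 1x1 conv producing the per-pixel mask M
    w_feat: (C, C) 1x1 conv transforming f_in
    Returns (x_s, f_out): the stage's restored image and the
    attention-enhanced feature passed to the next stage.
    """
    # 1x1 convolution == channel-mixing matmul applied at every pixel
    residual = np.einsum('oc,chw->ohw', w_res, f_in)    # R_s
    x_s = residual + image                              # restored image X_s
    mask = sigmoid(np.einsum('co,ohw->chw', w_mask, x_s))
    f_out = f_in + mask * np.einsum('dc,chw->dhw', w_feat, f_in)
    return x_s, f_out

# toy shapes: C=6 feature channels, 4x4 image
rng = np.random.default_rng(0)
f_in = rng.normal(size=(6, 4, 4))
image = rng.normal(size=(3, 4, 4))
x_s, f_out = supervised_attention(f_in, image,
                                  rng.normal(size=(3, 6)) * 0.1,
                                  rng.normal(size=(6, 3)) * 0.1,
                                  rng.normal(size=(6, 6)) * 0.1)
assert x_s.shape == (3, 4, 4) and f_out.shape == (6, 4, 4)
```

Note that `x_s` is where the per-stage ground-truth supervision attaches during training, while only `f_out` flows onward.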
(2) Super-resolution reconstruction network construction
This sub-network contains 3 super-resolution blocks, each containing 8 residual blocks. The network structure of the residual block is shown in fig. 3. Compared with earlier super-resolution networks, this residual block removes the BN layer, because the BN layer normalizes the features and thereby reduces the range flexibility of the network. After each super-resolution block, the features of each scale of the second-stage codec are fused with the output of the super-resolution block. Finally, the fused feature map passes through an up-sampling layer that enriches the high-frequency information of the image, generating a super-resolution image free of adversarial noise.
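Fig. 3 holds the exact residual-block layout; as a sketch, the BN-free residual block can be assumed to follow the common EDSR-style conv-ReLU-conv form with an identity skip. The numpy convolution helper below is for illustration only (frameworks would supply it).

```python
import numpy as np

def conv2d(x, w):
    """'Same'-padded 2D cross-correlation (deep-learning convention).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with k odd."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    for i in range(k):          # accumulate one kernel tap at a time
        for j in range(k):
            out += np.einsum('oc,chw->ohw', w[:, :, i, j],
                             xp[:, i:i + h, j:j + wd])
    return out

def residual_block(x, w1, w2):
    """BN-free residual block (assumed EDSR-style): conv -> ReLU -> conv,
    with an identity skip connection, as in the patent's residual blocks."""
    return x + conv2d(np.maximum(conv2d(x, w1), 0.0), w2)
```

Removing batch normalization keeps absolute pixel magnitudes intact across the block, which is the "range flexibility" argument quoted above.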
Step 3: defining a loss function
(1) The autoencoder network loss function is shown in equation (1):
L1 = Σ_s [ L_char(X_s, Y) + λ · L_edge(X_s, Y) ]   (1)
wherein L_char denotes the Charbonnier loss, L_edge denotes the edge loss, X_s denotes the denoised image at stage s, Y denotes the ground-truth image, s indexes the stages, and λ is typically taken as 0.05.
The Charbonnier loss is computed as in equation (2):
L_char = sqrt( ||X_s - Y||^2 + ε^2 )   (2)
The edge loss is computed as in equation (3):
L_edge = sqrt( ||Δ(X_s) - Δ(Y)||^2 + ε^2 )   (3)
wherein Δ denotes the Laplacian operator and ε is taken as 10^-3.
(2) The super-resolution reconstruction network loss function is shown in equation (4):
L2 = (1/N) Σ_i |y_i - f(x_i)|   (4)
wherein L2 denotes the L1-type loss of the super-resolution network, y_i denotes the ground-truth image, and f(x_i) denotes the super-resolution image.
Therefore, the total loss function L is shown in equation (5):
L=L1+L2 (5)
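Equations (1) through (5) can be sketched directly. The patent does not fix the discrete Laplacian; the standard 3×3 kernel is assumed here, and images are taken as 2-D grayscale arrays for brevity.

```python
import numpy as np

EPS = 1e-3     # epsilon in equations (2) and (3)
LAMBDA = 0.05  # lambda in equation (1)

def laplacian(img):
    """3x3 Laplacian filter (assumed kernel; the patent leaves it open)."""
    p = np.pad(img, 1)
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * img)

def charbonnier(x, y):
    """Equation (2): sqrt(||x - y||^2 + eps^2)."""
    return np.sqrt(np.sum((x - y) ** 2) + EPS ** 2)

def edge_loss(x, y):
    """Equation (3): Charbonnier distance between Laplacians."""
    return np.sqrt(np.sum((laplacian(x) - laplacian(y)) ** 2) + EPS ** 2)

def l1_stage_loss(stage_outputs, y):
    """Equation (1): sum over stages of Charbonnier + lambda * edge loss."""
    return sum(charbonnier(x, y) + LAMBDA * edge_loss(x, y)
               for x in stage_outputs)

def l2_sr_loss(preds, targets):
    """Equation (4): mean absolute (L1-type) error of the SR network."""
    return np.mean([np.abs(p - t).mean() for p, t in zip(preds, targets)])

def total_loss(stage_outputs, y_denoise, preds, targets):
    """Equation (5): L = L1 + L2."""
    return l1_stage_loss(stage_outputs, y_denoise) + l2_sr_loss(preds, targets)
```

With identical prediction and ground truth, each Charbonnier term bottoms out at ε rather than 0, which keeps the gradient well-behaved near the optimum.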
step 4: network training
For all experiments, the network optimizer is Adam with parameters β1 = 0.9 and β2 = 0.999 and a weight decay of 1e-8. The network is trained for 100 epochs with an initial learning rate of 0.0002, and the batch size for model training is uniformly set to 4. The input images are all 128×128×3, with pixel values normalized to [0, 1]; the test stage is normalized in the same way. Because the network is fully convolutional and down-samples 3 times, test images are zero-padded so that their height and width are multiples of 8, and the padding is removed from the output to obtain the test result. For image augmentation, flipping and rotation are combined. The denoising defense network is trained on the satellite adversarial-sample library, the network parameters are updated with the Adam optimization method, and training stops when the loss function defined in step 3 reaches its minimum, yielding the final trained network.
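The zero-padding of test images to height/width multiples of 8 (since the network down-samples three times) and the subsequent crop of the output can be sketched as:

```python
import numpy as np

def pad_to_multiple(img, m=8):
    """Zero-pad an (H, W, C) image on the bottom/right so H and W become
    multiples of m; also return the original size so the network output
    can be cropped back afterwards."""
    h, w = img.shape[:2]
    ph, pw = (-h) % m, (-w) % m          # amount needed to reach a multiple
    return np.pad(img, ((0, ph), (0, pw), (0, 0))), (h, w)

def crop_back(out, size):
    """Remove the padding from the network output."""
    h, w = size
    return out[:h, :w]

img = np.ones((100, 130, 3))
padded, size = pad_to_multiple(img)
assert padded.shape[:2] == (104, 136)
assert crop_back(padded, size).shape == img.shape
```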
Step 5: the original image to be processed is fed into the adversarial defense network that fuses denoising and super-resolution reconstruction; the autoencoder sub-network and the super-resolution reconstruction network clean the noise of the input image and restore its original feature information, finally yielding the passively defended image.
In order that those skilled in the art will better understand the present invention, the following detailed description of the present invention will be provided with reference to specific examples.
Example 1:
step 1: training image sample library and construction of evaluation index
First a training image sample library is constructed. There are 3 types of adversarial noise in total: BIM, FGSM, and PGD. Examples of these 3 types of adversarial samples and their label samples are shown in fig. 4. Each adversarial noise contains 4 noise intensities, so one data set corresponds to 12 small adversarial-sample data sets. Each small adversarial-sample data set contains 4 types of satellites, 13095 images in total; the images of each satellite type are photographed over one full 360° rotation in pose. The training and test sets are split in an 8:2 ratio, so each adversarial-sample data set has 10476 training images and 2619 test images. Considering the huge data volume, training on the adversarial samples of all noise types together would be very costly in time; the invention therefore builds the adversarial-sample cleaning data set from only the highest and lowest noise intensities of each adversarial-noise type. This data set contains 6 noise types, half of the original, which greatly reduces training time. The training set of this data set then totals 62856 images and the test set 15714 images. Because the network outputs a 2-fold enlarged high-resolution image, the training set and its label images are preprocessed by down-sampling to a quarter of the original image: the original resolution is 256×256, and the down-sampled images are 128×128.
Then, an appropriate evaluation index is selected.
(1) Mean Square Error (MSE)
The mean squared error is commonly used to evaluate the global difference between prediction and label; here it measures the point-to-point gray-level difference between the predicted image and the label image. Its mathematical definition is shown in equation (6). The inputs of the formula are not normalized first, because most areas of the images in the data set used here are near black and many pixel values are near zero; after normalization the results would become very small, and the index could hardly characterize the effect of sample cleaning.
MSE = (1/(M·N)) · Σ_{i=1}^{M} Σ_{j=1}^{N} (X(i,j) − Y(i,j))^2    (6)

where X is the predicted image, Y is the label image, and M and N are the image height and width.
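A direct implementation of equation (6), keeping the inputs in the raw 0–255 gray-level range as described above, might look like this (the function name and array-based interface are illustrative):

```python
import numpy as np

def mse(pred: np.ndarray, label: np.ndarray) -> float:
    """Point-to-point mean squared error of equation (6).

    Inputs are kept in the raw 0-255 range: as noted above, the
    satellite images are mostly near-black, so normalizing to [0, 1]
    would shrink the index until it no longer characterizes the
    cleaning effect.
    """
    pred = pred.astype(np.float64)
    label = label.astype(np.float64)
    return float(np.mean((pred - label) ** 2))
```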
(2) Structural Similarity (SSIM)
Structural similarity is a quantitative index that measures the degree of similarity of two images in brightness, contrast, and structure. Given two images X and Y, the structural similarity index of X and Y can be expressed as shown in equations (7) through (10).
SSIM(X,Y)=L(X,Y)*C(X,Y)*S(X,Y) (7)
L(X,Y) = (2·μ_X·μ_Y + C_1) / (μ_X^2 + μ_Y^2 + C_1)    (8)
C(X,Y) = (2·σ_X·σ_Y + C_2) / (σ_X^2 + σ_Y^2 + C_2)    (9)
S(X,Y) = (σ_XY + C_3) / (σ_X·σ_Y + C_3)    (10)
where μ_X and μ_Y are the means of the images, σ_X and σ_Y are their standard deviations, σ_XY denotes the covariance of the images, and C_1, C_2, and C_3 are constants, usually C_1 = (k_1·L)^2, C_2 = (k_2·L)^2,
and C_3 = C_2/2. These constants are introduced to avoid division by zero; k_1 = 0.01, k_2 = 0.03, and L = 255. Substituting the formulas for L(X,Y), C(X,Y), and S(X,Y) into equation (7) yields the complete structural similarity formula shown in equation (11).
SSIM(X,Y) = ((2·μ_X·μ_Y + C_1)·(2·σ_XY + C_2)) / ((μ_X^2 + μ_Y^2 + C_1)·(σ_X^2 + σ_Y^2 + C_2))    (11)
The range of structural similarity is [−1, 1]; the structural similarity equals 1 when the two images are identical, and the value becomes smaller the more dissimilar the images are.
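Equation (11) can be sketched as a single global computation. Note this single-window form is a simplification: practical SSIM implementations average the index over local windows, which is omitted here for brevity.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray,
                k1: float = 0.01, k2: float = 0.03, L: float = 255.0) -> float:
    """Structural similarity of equation (11), computed over the
    whole image rather than over local windows (a simplification)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2   # constants avoiding zero division
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```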
(3) EfficientNet network identification accuracy
The invention inputs the cleaned adversarial samples into the EfficientNet network for identification. The network is a four-class network trained on a dataset free of adversarial noise attacks. When the input image is a normal image, the network correctly outputs the satellite category to which the image belongs; when the input image has been attacked by adversarial noise, the network misidentifies it and assigns it to a wrong class. The identification accuracy of the network therefore indicates whether the noise cleaning of the adversarial samples was successful. The identification accuracy of the network is the number of correctly identified samples divided by the total number of input samples × 100%.
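The accuracy definition above amounts to a one-line computation; a sketch follows (the list-based interface is illustrative, not the patent's actual evaluation code):

```python
def recognition_accuracy(predicted: list, labels: list) -> float:
    """EfficientNet identification accuracy as defined above:
    correctly identified samples / total input samples x 100%."""
    assert predicted and len(predicted) == len(labels)
    correct = sum(p == t for p, t in zip(predicted, labels))
    return 100.0 * correct / len(labels)
```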
Step 2: denoising and super-resolution reconstruction fusion defense
The invention builds the codec sub-network and the super-resolution reconstruction network separately with the PyTorch framework, and combines them into the complete denoising defense framework.
After the denoising defense network is built, the invention trains it on the adversarial sample database for a total of 100 epochs. The invention adopts the Charbonnier loss, the edge loss, and the L1 loss as loss functions; the losses converged steadily during training, and the final prediction results are shown in step 3.
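The loss combination described above can be sketched in NumPy (the patent implements it in PyTorch). The single-stage form, the 4-neighbour discrete Laplacian, and the same-resolution input shapes are simplifying assumptions drawn from equations (1)–(5) of the claims, which give λ = 0.05 and ε = 10^-3:

```python
import numpy as np

EPS = 1e-3     # epsilon in claim equations (2)-(3)
LAMBDA = 0.05  # edge-loss weight in claim equation (1)

def charbonnier(pred: np.ndarray, target: np.ndarray, eps: float = EPS) -> float:
    """Charbonnier loss, a smooth L1 variant (claim equation (2))."""
    return float(np.sqrt(np.mean((pred - target) ** 2) + eps ** 2))

def laplacian(img: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian with edge-replicated borders
    (an assumed discretization of the operator in equation (3))."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * img

def total_loss(denoised: np.ndarray, sr_out: np.ndarray,
               target: np.ndarray) -> float:
    """Single-stage sketch of L = L1 + L2: Charbonnier plus weighted
    edge loss on the denoising branch, plus an L1 term on the
    super-resolution output (the claims sum the first two terms over
    stages, which is omitted here)."""
    l_cha = charbonnier(denoised, target)
    l_edge = charbonnier(laplacian(denoised), laplacian(target))
    l1_sr = float(np.mean(np.abs(sr_out - target)))
    return l_cha + LAMBDA * l_edge + l1_sr
```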
Step 3: experimental results and analysis
(1) Experimental results and analysis on the purely digital adversarial dataset
The invention first experiments on the purely digital adversarial dataset. A purely digital adversarial dataset is one in which the whole image is attacked by adversarial noise, so that the noise is evenly distributed over the entire image. Following the dataset partitioning strategy described in step 1, the invention trains and tests the network and quantitatively evaluates the final results with two indexes, mean square error (MSE) and structural similarity (SSIM). To verify the effectiveness of the invention against each attack, tests are performed on each noise dataset; the experimental results are shown in table 1.
Table 1 Quantitative cleaning results on the purely digital adversarial dataset
(Table 1 is reproduced as an image in the original publication.)
As shown in table 1, the MSE value of the cleaned samples for each noise type is smaller than that of the adversarial samples, and the SSIM value is larger. The MSE and SSIM values for FGSM0.3, however, differ significantly from those of the other noise types: its noise intensity is so large that it exceeds the range over which effective denoising can be carried out. For each noise type, the greater the noise intensity, the higher the MSE value and the lower the SSIM value, which matches theory. The results demonstrate that the adversarial noise cleaning of the invention is essentially effective.
To visualize the denoising effect, the invention selects adversarial samples with the same background, and the corresponding cleaned images, from the FGSM0.3, FGSM0.1, and FGSM0.01 datasets for display, as shown in fig. 5. In principle the resolution of an adversarial sample is 128×128, and after super-resolution enlargement the cleaned sample becomes 256×256; to enhance the visualization, however, the invention resizes the output image and the label image to match the input image. It can be seen that, for adversarial samples of different noise intensities, the proposed method cleans the adversarial noise and the results are essentially consistent with the labels, indicating that the method accomplishes the purely digital adversarial sample cleaning task.
Meanwhile, the invention also compares the self-encoder feature fusion strategy with the residual-network feature fusion strategy, evaluating them by the EfficientNet network accuracy index. The experimental results are shown in table 2, where "ED" denotes the fusion strategy using only the codec and "SR" denotes the strategy using only super-resolution.
Table 2 Comparison of feature fusion strategies on the purely digital adversarial dataset
(Table 2 is reproduced as an image in the original publication.)
As can be seen from table 2, for every adversarial sample the identification accuracy of the "codec + super-resolution" fusion strategy is higher than that of the "codec only" and "super-resolution only" strategies, which indicates that the network is very effective at removing adversarial noise. Meanwhile, for each type of adversarial sample, the greater the noise intensity, the lower the identification accuracy, and vice versa. For datasets with higher noise intensity (noise coefficient 0.05 and above), cleaning with the codec only outperforms processing with super-resolution only: the codec approach suits adversarial samples of higher noise intensity, while the super-resolution approach suits adversarial samples of lower noise intensity (noise coefficient 0.01 and below).
To visualize the denoising effect, the invention compares the results of the FGSM0.1 sample images under the different feature fusion strategies, as shown in fig. 6. Again, to enhance the visualization, the resolution of the output image is adjusted to match the input image. The "codec only" fusion strategy removes the adversarial noise of the input image but also removes the inherent texture features of the original satellite, so the output image appears blurred. The "super-resolution only" strategy does not remove the noise but instead injects high-frequency components into the image to reconstruct high-frequency details, so to the naked eye its output looks similar to the input image. Combining the two strategies both eliminates the adversarial noise and reconstructs the high-frequency detail features, finally yielding a losslessly enlarged image; thus the proposed fusion strategy accomplishes the adversarial noise cleaning task.
(2) Semi-digital semi-physical adversarial samples
A semi-digital semi-physical adversarial sample is one in which a rectangular region of the image is attacked by adversarial noise. For this type of adversarial sample the invention denoises the attacked pixel region using a masking method. Since the noise region of a semi-digital semi-physical adversarial sample is known to be rectangular, the invention obtains a mask image for each adversarial sample from the pixel coordinates of the upper-left and lower-right corners of the noise region. During cleaning, the mask shields the noise-free area, leaving only the noise region to be cleaned. After cleaning, the cleaned result is superimposed on the unmasked part of the original image to obtain the final denoising result. To verify the effectiveness of the invention against each attack, tests are performed on each noise dataset; the experimental results are shown in table 3.
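The mask-assisted cleaning procedure above can be sketched as follows. `denoise_fn` stands in for the trained defense network, and the (row, col) coordinate convention is an assumption:

```python
import numpy as np

def clean_with_mask(image, denoise_fn, top_left, bottom_right):
    """Mask-assisted cleaning for a semi-digital semi-physical sample.

    A binary mask is built from the pixel coordinates of the upper-left
    and lower-right corners of the attacked rectangle; the cleaned
    output replaces only the pixels inside that rectangle, while all
    other pixels are taken unchanged from the input.
    """
    (r0, c0), (r1, c1) = top_left, bottom_right
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[r0:r1, c0:c1] = True          # noise region to be cleaned
    cleaned = denoise_fn(image)        # defense network on the full image
    out = image.copy()
    out[mask] = cleaned[mask]          # superimpose cleaned region only
    return out
```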
Table 3 Quantitative results on the semi-digital semi-physical adversarial dataset
(Table 3 is reproduced as an image in the original publication.)
As shown in table 3, the MSE value of the cleaned samples for each noise type is smaller than that of the adversarial samples, and the SSIM value is larger. The MSE and SSIM values of FGSM0.3 and PGD0.5, however, differ significantly from those of the other noise types: their noise intensities are so large that they exceed the range over which effective denoising can be carried out. For each noise type, the greater the noise intensity, the higher the MSE value and the lower the SSIM value, which matches theory. The results demonstrate that the adversarial noise cleaning of the invention is essentially effective.
To visualize the denoising effect, the invention selects adversarial samples with the same background, and the corresponding cleaned images, from the FGSM0.3, FGSM0.1, and FGSM0.01 datasets for display, as shown in fig. 7. In the figure, the resolution of the input images is 128×128, and after super-resolution enlargement the resolution of the output cleaned images is 256×256; to enhance the visualization, however, the invention resizes the output image and the label image to match the input image. It can be seen that the invention cleans the adversarial noise well regardless of its intensity, so the output images are essentially consistent with the labels, indicating that the invention accomplishes the semi-digital semi-physical adversarial sample cleaning task.
Meanwhile, the invention also carries out a comparison experiment on whether a mask is used, quantifying the final results with the EfficientNet network identification accuracy index. The experimental results are shown in table 4, where "With Mask" denotes preprocessing with a mask and "Without Mask" denotes preprocessing without one.
Table 4 Classification accuracy of the mask comparison on the semi-digital semi-physical adversarial dataset
(Table 4 is reproduced as an image in the original publication.)
As can be seen from table 4, for every semi-digital semi-physical adversarial sample, the accuracy with mask-assisted denoising is higher than without a mask, and both are higher than the accuracy on the raw adversarial samples. This shows that using a mask helps the overall denoising effect: without a mask, the denoising network also processes the areas outside the noisy region, and since those areas were never attacked, the network not only removes no noise there but introduces additional noise, lowering the final identification accuracy. Meanwhile, for each type of adversarial sample, the greater the noise intensity, the lower the identification accuracy, and vice versa.
To visualize the denoising effect, the invention visually compares the results of masking on the FGSM0.1 sample images, as shown in fig. 8. Again, to enhance the visualization, the resolution of the output image is adjusted to match the input image. In the result preprocessed without a mask, the noise is not completely removed and faint shadows remain; in the result preprocessed with a mask, the noise is completely removed. This shows that mask preprocessing is very effective for the adversarial noise cleaning task when handling semi-digital semi-physical adversarial samples.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.

Claims (5)

1. An attack image defense method based on denoising and super-resolution reconstruction fusion is characterized by comprising the following steps:
step 1: constructing a training image sample library and evaluation indexes;
step 2: constructing an adversarial defense network; the adversarial defense network comprises a self-encoder sub-network and a super-resolution reconstruction network; the features of each scale of the self-encoder sub-network are fused with the output of the super-resolution reconstruction network, and the fused feature map is passed through an up-sampling layer to enrich the high-frequency information of the image and generate a super-resolution image free of adversarial noise;
the self-encoder sub-network comprises two codec sub-networks, with skip-connection layers arranged between encoder and decoder based on a standard U-Net structure; a channel attention block (CAB) extracts features of each scale at each layer of the codec, and the skip-connection layers of the U-Net also use CABs to process feature maps of the same scale; an "image feature splicing" operation is designed between the encoder and decoder: in the first-stage network, the encoder outputs 4 encoder feature maps from the input original image patches; before being input to the decoder, these 4 encoder feature maps are spliced pairwise into 2 larger encoder feature maps; similarly, in the second-stage network, the encoder outputs 2 encoder feature maps from the input original image patches and the decoder feature maps output by the first-stage network; before being input to the decoder, these 2 encoder feature maps are spliced into one complete encoder feature map; finally, the decoder outputs the corresponding decoder feature maps from the input feature maps;
the super-resolution reconstruction network comprises 3 super-resolution blocks, each containing 8 residual blocks; the batch normalization layer and one convolution layer are removed from each residual block;
step 3: defining a loss function of the countermeasure network;
(1) The self-encoder network loss function is shown in equation (1):
L1 = Σ_{s=1}^{S} [ L_cha(X_s, Y) + λ·L_edge(X_s, Y) ]    (1)
where L_cha denotes the Charbonnier loss, L_edge the edge loss, X_s the denoised image of stage s, Y the ground-truth image, and S the number of stages; λ is set to 0.05;
the calculation formula of Charbonnier loss is shown as formula (2):
L_cha = sqrt( ||X_s − Y||^2 + ε^2 )    (2)
the calculation formula of the edge loss is shown as formula (3):
L_edge = sqrt( ||Δ(X_s) − Δ(Y)||^2 + ε^2 )    (3)
where Δ denotes the Laplacian operator and ε is set to 10^-3;
(2) The super-resolution reconstruction network loss function is shown in formula (4):
L2 = (1/N) · Σ_{i=1}^{N} | y_i − f(x_i) |    (4)
where L2 denotes the L1 loss of the super-resolution branch, y_i the ground-truth image, and f(x_i) the super-resolution output image;
the total loss function L is shown in equation (5):
L=L1+L2 (5)
step 4: training a countermeasure defense network;
step 5: inputting an original image to be processed into the adversarial defense network fusing denoising and super-resolution reconstruction; the self-encoder sub-network and the super-resolution reconstruction network clean the noise of the input image and restore its initial feature information, finally yielding the passively defended image.
2. The attack image defense method based on denoising and super-resolution reconstruction fusion according to claim 1, wherein: the training image sample library in step 1 includes a purely digital adversarial dataset and a semi-digital semi-physical adversarial dataset; the purely digital adversarial dataset refers to clean images whose entire pixel area is attacked by adversarial noise, and the semi-digital semi-physical adversarial dataset refers to clean images in which a set rectangular pixel region, but no other area, is attacked by adversarial noise; each dataset contains 3 noise classes: BIM, FGSM, and PGD.
3. The attack image defense method based on denoising and super-resolution reconstruction fusion according to claim 1, wherein the attack image defense method is characterized in that: the evaluation indexes in the step 1 comprise MSE, SSIM, and EfficientNet network identification accuracy.
4. A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.
5. A computer readable storage medium, characterized by storing computer executable instructions that, when executed, are adapted to implement the method of claim 1.
CN202310135980.5A 2022-03-09 2023-02-20 Attack image defense method based on denoising and super-resolution reconstruction fusion Pending CN116228537A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022102292988 2022-03-09
CN202210229298.8A CN114881852A (en) 2022-03-09 2022-03-09 Attack image defense method based on denoising and super-resolution reconstruction fusion

Publications (1)

Publication Number Publication Date
CN116228537A true CN116228537A (en) 2023-06-06
