CN114936976A - Haze image restoration method based on a generative adversarial network with a memory perception module - Google Patents

Haze image restoration method based on a generative adversarial network with a memory perception module

Info

Publication number
CN114936976A
Authority
CN
China
Prior art keywords
network
image
haze
loss
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210579337.7A
Other languages
Chinese (zh)
Inventor
许晓燕 (Xu Xiaoyan)
董文德 (Dong Wende)
徐贵力 (Xu Guili)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210579337.7A
Publication of CN114936976A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a haze image restoration method based on a generative adversarial network with a memory perception module. First, under the Pix2Pix conditional generative adversarial network framework, a generator G that produces a restored image from a haze image and a discriminator D that judges whether the restored image belongs to the real image domain are introduced, and network training is performed. The adversarial loss, the pixel reconstruction loss and the feature perception loss together form the overall target loss function of the network. During training, a paired haze-clear image data set is fed into the network, the generator G and the discriminator D are trained alternately, their network parameters are updated with an optimizer, and convergence of the target loss function value is taken as the condition for ending training, finally yielding a locally optimal solution of the network model. Compared with traditional image defogging methods based on physical models, the method restores the haze image in one step, the network runs more efficiently, and a high-quality restored image is obtained.

Description

Haze image restoration method based on a generative adversarial network with a memory perception module
Technical Field
The invention belongs to the technical field of computer digital image processing, and particularly relates to a haze image restoration method based on a generative adversarial network with a memory perception module.
Background
In application fields such as everyday photography and medical image detection, imaging quality is inevitably affected by physical factors such as radiation, absorption and scattering by particles in the atmosphere, and image quality degrades especially severely in haze, rain or snow. Such degradation seriously impairs the recognizability of image details and reduces the practical value of the images, so research on image defogging methods answers an urgent application demand and is of great significance for fully exploiting the application value of images.
Traditional single-image defogging methods are generally based on physical models and suffer from two defects. First, physical-model defogging is a typical ill-posed problem: it requires strong prior conditions or assumptions and is therefore not suitable for defogging images of arbitrary scenes. Second, restoring the image requires first estimating the intermediate variables of the haze imaging model, which leads to accumulation of intermediate errors and instability of the final result.
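For context, the physical model referred to above is typically the atmospheric scattering model, a standard formulation in the dehazing literature (stated here for illustration; it is not recited verbatim in this disclosure):

I(x) = J(x)·t(x) + A·(1 − t(x)),   with   t(x) = e^(−β·d(x)),

where I(x) is the observed haze image, J(x) is the clear scene radiance to be recovered, A is the global atmospheric light, β is the atmospheric scattering coefficient and d(x) is the scene depth. Recovering J requires first estimating the intermediate variables t(x) and A, which is precisely the source of the intermediate-error accumulation described above.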
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a haze image restoration method based on a generative adversarial network with a memory perception module, which can restore images captured in hazy weather in one step.
To achieve the above purpose, the technical solution adopted by the invention is as follows:
the invention discloses a haze image restoration method based on a generative adversarial network with a memory perception module, which comprises the following steps:
step 1, constructing a haze-clear data set S: the data set S comprises haze images and clear images, and the haze images are in one-to-one correspondence with the clear images;
step 2, based on the Pix2Pix conditional generative adversarial network framework, constructing and introducing a generator G and a discriminator D in combination with the image defogging task; the generator G is used for learning the mapping relation between the haze image domain and the clear image domain, converting the input haze image into the clear image domain to obtain a restored image; the discriminator D is used for judging whether the restored image is a real image, and cooperates with the generator G in adversarial training;
step 3, initializing network parameters, and constructing a target loss function to constrain the optimization directions of the generator G and the discriminator D in adversarial training; the target loss function comprises the pixel reconstruction loss, the feature perception loss and the adversarial loss generated by adversarial training;
step 4, using the paired haze and clear images in the haze-clear data set S, performing forward calculation through the generator G and the discriminator D to obtain the target loss function values of G and D respectively;
step 5, fixing the network parameters of the generator G, and updating the parameters of the discriminator D network by using an Adam optimizer;
step 6, fixing network parameters of the discriminator D, and updating the parameters of the generator G network by using an Adam optimizer;
step 7, executing steps 4, 5 and 6 cyclically until the loss function values of the generator G and the discriminator D calculated in step 4 both converge; a locally optimal solution of the generator network parameters is then obtained, and the fully trained generator is used to compute the restored image estimate.
Further, in step 2, the defogging network uses the Pix2Pix generative adversarial network as the basic framework for image defogging and uses the U-Net network as the backbone of the generator network G. To enable G to better remove image haze and restore features such as image texture and structure, an attention mechanism and a recurrent neural network are introduced and G is designed as a network with stronger generative capacity: the generator G network is built from a haze information perception memory sub-network and an improved U-Net sub-network. The improved U-Net sub-network performs image restoration, including defogging and texture restoration; the haze information perception memory sub-network applies different degrees of attention to different haze concentrations.
Further, the structure of the generator G comprises a haze information perception memory sub-network and an improved U-Net sub-network. Specifically, the G network is constructed as follows:
step 2-1, introducing a haze information perception memory sub-network to guide the network to focus on regions of dense fog and to output a haze information attention map;
step 2-2, the improved U-Net sub-network is derived from the U-Net network: on the basis of U-Net, the pooling layers and batch normalization layers are discarded, residual dense blocks are introduced to replace plain convolution layers, and the LeakyReLU function replaces the ReLU function, improving network performance in terms of feature extraction capability, nonlinear expression capability and so on;
the system comprises a Residual dense module (RDB), a characteristic fusion layer and a plurality of modules, wherein the RDB is constructed by the dense communication layer and the characteristic fusion layer, the dense communication layer comprises 3 ResNet Residual modules, the input of each Residual module is the sum of the outputs of all the Residual modules in the front on corresponding channels, shallow characteristics can be directly transmitted to a deeper network, the characteristics of all the modules in the front are connected by the characteristic fusion layer, finally, the characteristics are fused by a convolution layer with convolution kernel 1 multiplied by 1, and the dense characteristics are maintained in a self-adapting mode;
step 2-3, the haze image is input into the haze information perception memory sub-network to obtain a haze information attention map; the attention map and the haze image are concatenated in the channel direction and input into the improved U-Net sub-network, which outputs the final restored image.
Further, the haze information perception memory sub-network is constructed from Inception perception modules, ResNet residual modules and LSTM long short-term memory modules. The sub-network comprises several layers, each formed by connecting these modules in sequence: in each layer the Inception perception module and the ResNet residual module expand the width and depth of the network respectively, while the LSTM module transmits information, connecting to the LSTM module of the next layer so that low-level feature information is passed to higher layers and the network attends, stage by stage, ever more closely to haze-related feature information.
Further, the residual dense block in the improved U-Net sub-network is constructed on the basis of a dense connection layer and a feature fusion layer. The dense connection layer comprises 3 ResNet residual modules; the input of each residual module is the sum, on corresponding channels, of the outputs of all preceding residual modules, so shallow features can be transmitted directly to deeper layers. The feature fusion layer concatenates the features of all residual modules and finally performs feature fusion through a 1×1 convolution layer, adaptively retaining the dense features.
Further, in step 2, the discriminator D is used to judge whether the image generated by G belongs to the clear image domain, and it cooperates with the generator in adversarial training. The network mainly comprises the following three parts:
the first part is a multi-scale convolution module which performs size halving and channel multiplication operations on the input image. After the input image passes through the convolution modules with convolution kernel of 5 × 5, step size of 1, step size of 2 and step size of 4, an original image size feature map, 1/2 original image size feature map and 1/4 original image size feature map are obtained respectively. Wherein, the convolution module of each scale comprises 1 convolution layer and 1 LeakyReLU activation function layer.
The second part is a feature extraction module. It is formed by connecting several convolution modules in series and performs repeated convolutional feature extraction on the input feature map; each convolution module consists of a convolution layer with a 5×5 kernel and 1 activation function layer. The features of different scales extracted by the first part's multi-scale convolution module are fed into convolution layers at different depths of this module to offset the loss of shallow feature information in the deep network.
The third part is an output module. It consists of a convolution with stride 1 and produces an output feature map with 1 channel.
Further, the overall target loss function in step 3 consists of the WGAN-GP adversarial loss, the Smooth-L1 pixel reconstruction loss and the feature perception loss, calculated as follows:
and 3-1, the antagonistic loss refers to the loss generated in the antagonistic training between the generator G and the discriminator D and is composed of a generator loss component and a discriminator loss component. By taking the idea of WGAN-GP as reference, wasserstein distance is adopted as the resistance loss, and the loss function calculates the trueness degree of a real image relative to a false image.
step 3-2, measuring the loss value between the generated image and the original input image through the reconstruction loss. Conventional L1 and L2 losses do not drive the reconstruction loss to converge to a meaningful state, so the Smooth-L1 loss function is adopted as the pixel reconstruction loss to compare the difference between the generated image and the real image and apply constrained optimization.
step 3-3, the feature perception loss addresses the need for higher-level features. The network's perception loss consists of a haze information perception loss and a multi-scale perception loss; the haze information perception loss is given by the MSE loss between the attention map and the difference image of the clear and haze images.
Further, the multi-scale perception loss is given by a VGG19 classification network trained on the ImageNet data set. The restored image generated by the generator G and the clear image are input into the VGG19 network simultaneously, layers 2, 7, 12, 21 and 30 are extracted as feature perception layers, and the L2 function computes the loss value between each pair of corresponding feature layers, recorded as the multi-scale perception loss.
Further, the forward calculation in step 4 means that the haze image is input into G to obtain the restored image and the loss function value of G, and the restored image and the clear image are input into D to compute the loss function value of D; the loss functions of G and D are the corresponding overall loss functions in step 4.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a network schematic in an embodiment of the invention;
FIG. 3 is a structural diagram of the haze information perception memory sub-network of the generator in an embodiment of the present invention;
FIG. 4 is a structural diagram of the encoder-decoder (improved U-Net) sub-network of the generator in an embodiment of the present invention;
FIG. 5 is a diagram illustrating a network structure of a discriminator according to an embodiment of the present invention;
FIGS. 6(a), (b), and (c) are comparative images of a first set of clear images, haze images, and restored images according to an embodiment of the present invention;
FIGS. 7(a), (b), and (c) are comparative images of a second set of clear images, haze images, and restored images according to an embodiment of the present invention;
FIGS. 8(a), (b), and (c) are comparative images of a third set of clear images, haze images, and restored images according to an embodiment of the present invention.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention.
The invention discloses a haze image restoration method based on a generative adversarial network with a memory perception module. The method is based on a conditional generative adversarial network framework, into which an improved generator G and an improved discriminator D are introduced: the generator G learns the mapping from the haze image domain to the clear image domain, and the discriminator D cooperates with G in adversarial training. During adversarial training, loss functions are introduced to constrain the optimization directions of G and D; when the loss functions converge, the generator reaches a local optimum, and the fully trained generator can be used to solve for the restored image estimate.
The method provided by the invention restores haze images captured in hazy weather in one step, runs efficiently and obtains high-quality restored images.
As shown in fig. 1, the haze image restoration method based on a generative adversarial network with a memory perception module according to the present invention includes the following steps:
step 1, constructing a haze-clear data set S, wherein the data set S comprises haze images and clear images, and the haze images correspond to the clear images one to one.
Step 2, as shown in fig. 2, the defogging network is based on the Pix2Pix generative adversarial network framework, and an improved generator G and an improved discriminator D are introduced in combination with the image defogging task. The generator G learns the mapping relation between the haze image domain and the clear image domain and converts the input haze image into the clear image domain to obtain a restored image; the discriminator D judges whether the restored image generated by the generator G is a real image, and cooperates with G in adversarial training.
To enable the generator G to better remove haze information and recover the texture and structural features of the image, the invention designs G as a network with stronger generative capacity based on the ideas of the attention mechanism and the residual network. The structure of G comprises a haze information perception memory sub-network and an improved U-Net sub-network. The G network is constructed as follows:
Step 2-1, the haze information perception memory sub-network, shown in fig. 3(a), guides the network to focus on regions of dense fog and outputs a haze information attention map. The network is constructed from Inception perception modules (fig. 3(b)), ResNet residual modules and LSTM long short-term memory modules: the Inception perception module and the ResNet residual module expand the width and depth of the network respectively, and the LSTM module transmits low-level feature information to higher layers, so that the network attends, stage by stage, ever more closely to haze-related feature information.
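By way of illustration, a minimal PyTorch-style sketch of one such stage-wise sub-network is given below; the channel width, the number of stages and the convolutional LSTM formulation are illustrative assumptions, not values recited in this disclosure:

```python
import torch
import torch.nn as nn

class InceptionPerception(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 branches, concatenated: expands network width."""
    def __init__(self, ch):
        super().__init__()
        self.b1 = nn.Conv2d(ch, ch // 2, kernel_size=1)
        self.b3 = nn.Conv2d(ch, ch // 4, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(ch, ch // 4, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

class ResidualBlock(nn.Module):
    """ResNet-style residual module: expands network depth."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell; (h, c) carries low-level feature
    information from stage to stage."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class HazeAttentionNet(nn.Module):
    """Several Inception + residual + ConvLSTM stages; a final 1x1 convolution
    and sigmoid yield the single-channel haze information attention map."""
    def __init__(self, ch=32, stages=4):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.incep = nn.ModuleList(InceptionPerception(ch) for _ in range(stages))
        self.res = nn.ModuleList(ResidualBlock(ch) for _ in range(stages))
        self.lstm = nn.ModuleList(ConvLSTMCell(ch) for _ in range(stages))
        self.tail = nn.Conv2d(ch, 1, 1)

    def forward(self, x):
        f = self.head(x)
        h, c = torch.zeros_like(f), torch.zeros_like(f)
        for incep, res, lstm in zip(self.incep, self.res, self.lstm):
            f, (h, c) = lstm(res(incep(f)), (h, c))
        return torch.sigmoid(self.tail(f))
```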
Step 2-2, the improved U-Net sub-network is shown in fig. 4(c). On the basis of the U-Net network, the invention discards the pooling layers and batch normalization layers, introduces residual dense blocks to replace plain convolution layers, and introduces the LeakyReLU function to replace the ReLU function, improving network performance in terms of feature extraction capability, nonlinear expression capability and so on.
The residual dense block (RDB) is constructed from a dense connection layer and a feature fusion layer. The dense connection layer comprises 3 ResNet residual modules; the input of each residual module is the sum, on corresponding channels, of the outputs of all preceding residual modules, so shallow features can be transmitted directly to deeper layers. The feature fusion layer concatenates the features of all preceding layers and finally performs feature fusion through a convolution layer with a 1×1 kernel, adaptively retaining the dense features.
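A minimal PyTorch sketch of the residual dense block as described follows; the 3-module count comes from the text, while kernel sizes, the LeakyReLU slope, and whether the block input itself participates in the sum and fusion are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Dense connection layer of 3 ResNet residual modules plus a feature
    fusion layer with a 1x1 convolution."""
    def __init__(self, ch, n_modules=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1),
            )
            for _ in range(n_modules)
        )
        # Feature fusion: concatenate the block input and every module output,
        # then fuse with a 1x1 convolution to adaptively retain dense features.
        self.fuse = nn.Conv2d(ch * (n_modules + 1), ch, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            inp = sum(feats)                # channel-wise sum of all preceding outputs
            feats.append(inp + block(inp))  # residual connection around each module
        return self.fuse(torch.cat(feats, dim=1))
```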
Step 2-3, the input image is concatenated with the output of the haze information perception memory sub-network in the channel direction and fed into the improved U-Net sub-network to obtain the final restored image.
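A short sketch of this assembly follows; the attention sub-network can be the HazeAttentionNet sketched above, and the improved U-Net module (expecting 3 + 1 input channels) is assumed to be supplied by the caller rather than defined here:

```python
import torch
import torch.nn as nn

class GeneratorG(nn.Module):
    def __init__(self, attention_net: nn.Module, unet: nn.Module):
        super().__init__()
        self.attention = attention_net  # haze information perception memory sub-network
        self.unet = unet                # improved U-Net sub-network (4 input channels)

    def forward(self, hazy):
        att = self.attention(hazy)             # single-channel haze attention map
        x = torch.cat([hazy, att], dim=1)      # concatenate in the channel direction
        return self.unet(x)                    # final restored image
```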
In step 2, the discriminator D is shown in fig. 5. D is used to judge whether the image generated by G belongs to the clear image domain, and it cooperates with the generator in adversarial training. The network mainly comprises the following three parts:
the first part is a multi-scale convolution module which performs size halving and channel multiplication operations on the input image. After the input image passes through convolution modules with convolution kernel of 5 × 5, step size of 1, step size of 2 and step size of 4, an original image size feature map, 1/2 original image size feature map and 1/4 original image size feature map are obtained. Wherein, the convolution module of each scale comprises 1 convolution layer and 1 LeakyReLU activation function layer.
The second part is a feature extraction module. It is formed by connecting several convolution modules in series and performs repeated convolutional feature extraction on the input feature map; each convolution module consists of a convolution layer with a 5×5 kernel and 1 activation function layer. The shallow features of different scales extracted by the first part's multi-scale convolution module are fed into convolution layers at different depths of this module to offset the loss of shallow feature information in the deep network.
The third part is an output module. It consists of a convolution with stride 1 and produces an output feature map with 1 channel.
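A minimal PyTorch sketch of this three-part discriminator follows; the channel widths and the exact depths at which the multi-scale features are injected are illustrative assumptions:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 5, stride=stride, padding=2),
        nn.LeakyReLU(0.2, inplace=True),
    )

class DiscriminatorD(nn.Module):
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        # Part 1: multi-scale convolution module (5x5 kernels, strides 1/2/4).
        self.s1 = conv_block(in_ch, ch, stride=1)      # full size
        self.s2 = conv_block(in_ch, ch * 2, stride=2)  # 1/2 size
        self.s4 = conv_block(in_ch, ch * 4, stride=4)  # 1/4 size
        # Part 2: feature extraction; scale features are injected at matching
        # depths to offset the loss of shallow information.
        self.f1 = conv_block(ch, ch * 2, stride=2)      # full -> 1/2
        self.f2 = conv_block(ch * 4, ch * 4, stride=2)  # 1/2 -> 1/4, after concat
        self.f3 = conv_block(ch * 8, ch * 8, stride=1)  # 1/4, after concat
        # Part 3: output module, stride-1 convolution to one channel.
        self.out = nn.Conv2d(ch * 8, 1, 5, stride=1, padding=2)

    def forward(self, x):
        a, b, c = self.s1(x), self.s2(x), self.s4(x)
        f = self.f1(a)                       # 1/2 size features
        f = self.f2(torch.cat([f, b], 1))    # inject 1/2-scale features
        f = self.f3(torch.cat([f, c], 1))    # inject 1/4-scale features
        return self.out(f)                   # 1-channel output feature map
```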
Step 3, initializing network parameters, and constructing a target loss function to constrain the optimization directions of the generator G and the discriminator D in adversarial training; the target loss function comprises the pixel reconstruction loss, the feature perception loss and the adversarial loss generated by adversarial training.
The adversarial loss means the loss generated in the adversarial training between the generator G and the discriminator D, and consists of a generator loss component and a discriminator loss component. Drawing on the idea of WGAN-GP, the Wasserstein distance is adopted as the adversarial loss; this loss function measures the degree of realism of the generated image relative to the real image.
The reconstruction loss is used to measure the loss value between the generated image and the original input image. Conventional L1 and L2 losses do not drive the reconstruction loss to converge to a meaningful state, so the Smooth-L1 loss function is adopted as the pixel reconstruction loss to compare the difference between the generated image and the real image and apply constrained optimization.
The feature perception loss addresses the need for higher-level features; the network's perception loss consists of a haze information perception loss and a multi-scale perception loss. The haze information perception loss is given by the MSE loss between the attention map and the difference image of the clear and haze images. The multi-scale perception loss is given by a VGG19 classification network trained on the ImageNet data set: the restored image generated by the generator G and the clear image are input into the VGG19 network simultaneously, layers 2, 7, 12, 21 and 30 are extracted as feature perception layers, and the L2 function computes the loss value between each pair of corresponding feature layers, recorded as the multi-scale perception loss.
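For illustration, the three loss components could be sketched as below (PyTorch/torchvision assumed; the weighting coefficients and the construction of the haze-attention target are assumptions of this sketch, and the VGG inputs are assumed to be ImageNet-normalized):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Fixed VGG19 feature extractor trained on ImageNet.
_vgg = vgg19(pretrained=True).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)
_PERC_LAYERS = {2, 7, 12, 21, 30}   # feature perception layers named above

def multiscale_perceptual_loss(restored, clear):
    loss, x, y = 0.0, restored, clear
    for i, layer in enumerate(_vgg):
        x, y = layer(x), layer(y)
        if i in _PERC_LAYERS:
            loss = loss + F.mse_loss(x, y)   # L2 between corresponding feature layers
        if i >= max(_PERC_LAYERS):
            break
    return loss

def haze_attention_loss(att_map, hazy, clear):
    # MSE between the attention map and the clear/haze difference image,
    # collapsed to one channel (the exact target construction is an assumption).
    target = (clear - hazy).abs().mean(dim=1, keepdim=True)
    return F.mse_loss(att_map, target)

def gradient_penalty(D, real, fake):
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(x).sum(), x, create_graph=True)[0]
    return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

def d_loss(D, real, fake, lambda_gp=10.0):
    # WGAN-GP critic loss; `fake` is assumed detached from the generator.
    return D(fake).mean() - D(real).mean() + lambda_gp * gradient_penalty(D, real, fake)

def g_loss(D, restored, clear, att_map, hazy, w_pix=100.0, w_perc=1.0):
    adv = -D(restored).mean()                   # generator adversarial component
    pix = F.smooth_l1_loss(restored, clear)     # Smooth-L1 pixel reconstruction
    perc = (multiscale_perceptual_loss(restored, clear)
            + haze_attention_loss(att_map, hazy, clear))
    return adv + w_pix * pix + w_perc * perc
```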
Step 4, performing forward calculation on the generator G and the discriminator D using the paired data set to obtain their loss function values. Forward calculation means inputting the haze images into G to obtain G's loss value and the restored images, then inputting the restored images and the clear images into D to obtain D's loss function value.
Step 5, fixing the network parameters of the generator G and updating the parameters of the discriminator D network with an Adam optimizer. The optimizer's momentum term is 0.5 and its decay rate is 0.999; the initial learning rate is 0.0001 with a linear decay strategy: when training reaches the 50th epoch, the learning rate begins to decay linearly from 0.0001 until it reaches 0 at the 100th epoch, at which point training stops.
Step 6, fixing the network parameters of the discriminator D and updating the parameters of the generator G network with an Adam optimizer. The optimizer's learning rate settings are the same as in step 5.
Step 7, executing steps 4, 5 and 6 cyclically until the loss function values calculated in step 4 converge; a locally optimal solution of the generator network parameters is then obtained, and the fully trained generator is used to compute the restored image estimate.
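The alternating procedure of steps 4 to 7 can be sketched as follows (G, D, the paired data loader and the d_loss / g_loss helpers sketched earlier are assumed to exist):

```python
import torch

opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
# Linear decay of the learning rate from epoch 50, reaching 0 at epoch 100.
decay = lambda e: 1.0 if e < 50 else max(0.0, (100 - e) / 50.0)
sch_d = torch.optim.lr_scheduler.LambdaLR(opt_d, decay)
sch_g = torch.optim.lr_scheduler.LambdaLR(opt_g, decay)

for epoch in range(100):
    for hazy, clear in loader:             # paired haze-clear data set S
        att = G.attention(hazy)            # haze information attention map
        restored = G(hazy)                 # step 4: forward calculation
        # Step 5: fix G, update D; detaching stops gradients reaching G.
        opt_d.zero_grad()
        d_loss(D, clear, restored.detach()).backward()
        opt_d.step()
        # Step 6: fix D, update G.
        opt_g.zero_grad()
        g_loss(D, restored, clear, att, hazy).backward()
        opt_g.step()
    sch_d.step()
    sch_g.step()
# Step 7: iterate until both loss values converge; the trained G then
# produces the restored image estimate for a given haze image.
```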
As shown in figs. 6(a)-(c), 7(a)-(c) and 8(a)-(c), which compare three groups of clear images, haze images and restored images, it can be seen intuitively that the haze image restoration method based on a generative adversarial network with a memory perception module provided by the invention restores haze images captured in hazy weather in one step and with high quality, significantly improving image quality and the resolvability of image details.
While the invention has been described in terms of its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (8)

1. A haze image restoration method based on a generative adversarial network with a memory perception module, characterized by comprising the following steps:
step 1, constructing a haze-clear data set S: the data set S comprises haze images and clear images, and the haze images correspond to the clear images one to one;
step 2, based on the Pix2Pix conditional generative adversarial network framework, constructing and introducing a generator G and a discriminator D in combination with the image defogging task; the generator G is used for learning the mapping relation between the haze image domain and the clear image domain, converting the input haze image into the clear image domain to obtain a restored image; the discriminator D is used for judging whether the restored image is a real image, and cooperates with the generator G in adversarial training;
step 3, initializing the network parameters of G and D, and constructing a target loss function to constrain the optimization directions of the generator G and the discriminator D in adversarial training; the target loss function comprises the pixel reconstruction loss, the feature perception loss and the adversarial loss generated by adversarial training;
step 4, using the paired haze and clear images in the haze-clear data set S, performing forward calculation through the generator G and the discriminator D to obtain the target loss function values of G and D respectively;
step 5, fixing the network parameters of the generator G, and updating the network parameters of the discriminator D by using an Adam optimizer;
step 6, fixing the network parameters of the discriminator D, and updating the network parameters of the generator G by using an Adam optimizer;
step 7, judging whether the target loss functions of the generator G and the discriminator D have both converged; if not, returning to step 4; otherwise, ending the training to obtain a locally optimal solution of the network parameters of the generator G, and using the trained generator to compute the restored image estimate, i.e. the defogged image.
2. The haze image restoration method based on a generative adversarial network with a memory perception module as claimed in claim 1, wherein the generator G network uses an improved U-Net sub-network as its backbone and introduces a haze information perception memory sub-network constructed on the basis of an attention mechanism and a recurrent neural network, so that the generator G realizes both the defogging function and the restoration of the texture and structural features of the image.
3. The haze image restoration method based on a generative adversarial network with a memory perception module as claimed in claim 2, wherein the specific steps of constructing the generator G in step 2 are as follows:
step 2-1, introducing a haze information perception memory sub-network, which outputs a haze information attention map;
step 2-2, the improved U-Net sub-network discards the pooling layers and batch normalization layers on the basis of the classical U-Net network, introduces residual dense blocks to replace plain convolution layers, and introduces the LeakyReLU function to replace the ReLU function, improving network performance in terms of feature extraction capability and nonlinear expression capability respectively;
step 2-3, concatenating the input image with the output of the haze information perception memory sub-network in the channel direction and inputting the result into the improved U-Net sub-network to obtain the final restored image.
4. The haze image restoration method based on a generative adversarial network with a memory perception module as claimed in claim 3, wherein the haze information perception memory sub-network is constructed on the basis of Inception perception modules, ResNet residual modules and LSTM long short-term memory modules; the sub-network comprises several layers, each formed by connecting these modules in sequence; in each layer the Inception perception module and the ResNet residual module are used to expand the width and depth of the network respectively, the LSTM module is used to transmit information, and the LSTM module is connected with the LSTM module of the next layer so as to transmit low-level feature information into the higher layers of the network and make the network attend, stage by stage, ever more closely to haze-related feature information.
5. The haze image restoration method based on a generative adversarial network with a memory perception module as claimed in claim 4, wherein the residual dense block in the improved U-Net sub-network is constructed on the basis of a dense connection layer and a feature fusion layer; the dense connection layer comprises 3 ResNet residual modules, and the input of each residual module is the sum, on corresponding channels, of the outputs of all preceding residual modules, so that shallow features can be transmitted directly to deeper layers of the network; the feature fusion layer concatenates the features of all residual modules and finally performs feature fusion through a 1×1 convolution layer, adaptively retaining the dense features.
6. The haze image restoration method based on a generative adversarial network with a memory perception module as claimed in any one of claims 1 to 5, wherein in step 2 the discriminator D is used for judging whether the image generated by G belongs to the clear image domain and cooperates with the generator G in adversarial training; the discriminator D network comprises a multi-scale convolution module, a feature extraction module and an output module; the input image first undergoes size-halving and channel-doubling operations through the multi-scale convolution module, the feature extraction module then performs repeated convolutional feature extraction on the input feature map, and finally the output module outputs the feature map.
7. The haze image restoration method based on a generative adversarial network with a memory perception module as claimed in claim 6, wherein the pixel reconstruction loss in step 3 is used for measuring the loss value between the generated image and the original input image, comparing the difference between the generated image and the real image, and applying constrained optimization;
the feature perception loss comprises a haze information perception loss and a multi-scale perception loss, wherein the haze information perception loss is given by the MSE loss between the attention map and the difference image of the clear image and the haze image; the multi-scale perception loss is obtained from a classification network: the restored image generated by the generator G and the clear image are input into the classification network simultaneously, feature perception layers are extracted, and the loss value between the corresponding feature layers is computed as the multi-scale perception loss;
the adversarial loss comprises a generator loss component and a discriminator loss component, and the Wasserstein distance is adopted as the adversarial loss function to measure the degree of realism of the generated image relative to the real image.
8. The haze image restoration method based on a generative adversarial network with a memory perception module as claimed in claim 7, wherein the forward calculation in step 4 is: inputting the haze image into the generator G to obtain the restored image and the loss function value of G, then inputting the restored image and the clear image into the discriminator D, and computing the loss function value of D.
CN202210579337.7A 2022-05-25 2022-05-25 Haze image restoration method based on a generative adversarial network with a memory perception module Pending CN114936976A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210579337.7A CN114936976A (en) 2022-05-25 2022-05-25 Haze image restoration method based on a generative adversarial network with a memory perception module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210579337.7A CN114936976A (en) 2022-05-25 2022-05-25 Haze image restoration method based on a generative adversarial network with a memory perception module

Publications (1)

Publication Number Publication Date
CN114936976A true CN114936976A (en) 2022-08-23

Family

ID=82865408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210579337.7A Pending CN114936976A (en) 2022-05-25 2022-05-25 Restoration method for generating anti-network haze image based on memory perception module

Country Status (1)

Country Link
CN (1) CN114936976A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274627A (en) * 2023-09-19 2023-12-22 昆明理工大学 Multi-temporal snow remote sensing image matching method and system based on image conversion
CN117197002A (en) * 2023-11-07 2023-12-08 松立控股集团股份有限公司 Image restoration method based on perception diffusion
CN117197002B (en) * 2023-11-07 2024-02-02 松立控股集团股份有限公司 Image restoration method based on perception diffusion

Similar Documents

Publication Publication Date Title
CN114936976A (en) Haze image restoration method based on a generative adversarial network with a memory perception module
Hu et al. Underwater image restoration based on convolutional neural network
CN111507182B (en) Skeleton point fusion cyclic cavity convolution-based littering behavior detection method
CN109564687B (en) Learning method and recording medium
Ju et al. BDPK: Bayesian dehazing using prior knowledge
CN112884671B (en) Fuzzy image restoration method based on unsupervised generation countermeasure network
CN106447668A (en) Small object detection method based on random sampling and sparse matrix restoration under infrared scene
CN112632311A (en) Cloud layer change trend prediction method based on deep learning
CN114091598B (en) Multi-vehicle cooperative environment sensing method based on semantic-level information fusion
CN112801900A (en) Video blur removing method for generating countermeasure network based on bidirectional cyclic convolution
CN113222825A (en) Infrared image super-resolution reconstruction method based on visible light image training and application
CN111709888A (en) Aerial image defogging method based on improved generation countermeasure network
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN115330653A (en) Multi-source image fusion method based on side window filtering
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN115100545A (en) Target detection method for small parts of failed satellite under low illumination
CN115063463A (en) Fish-eye camera scene depth estimation method based on unsupervised learning
CN112734645B (en) Lightweight image super-resolution reconstruction method based on feature distillation multiplexing
Wang et al. Gridformer: Residual dense transformer with grid structure for image restoration in adverse weather conditions
CN114119694A (en) Improved U-Net based self-supervision monocular depth estimation algorithm
CN111275751B (en) Unsupervised absolute scale calculation method and system
CN108986047A (en) Image denoising method
CN116862779A (en) Real image self-supervision denoising method and system
CN110689510B (en) Sparse representation-based image fusion method introducing dictionary information
CN116863285A (en) Infrared and visible light image fusion method for multiscale generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination