CN115222592A

CN115222592A - Underwater image enhancement method based on super-resolution network and U-Net network and training method of network model

Info

Publication number: CN115222592A
Application number: CN202210733444.0A
Authority: CN
Inventors: 刘磊; 陈海秀; 金肃钦
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2022-06-27
Filing date: 2022-06-27
Publication date: 2022-10-21

Abstract

The invention discloses an underwater image enhancement method based on a super-resolution network and a U-Net network, together with a training method for the network models, and belongs to the technical field of underwater image processing. A deep residual super-resolution network and a U-Net underwater image enhancement network based on an SK attention mechanism and a K estimation module are constructed, so that image resolution is improved, image blur is removed, and an enhanced image with natural color is generated.

Description

Underwater image enhancement method based on super-resolution network and U-Net network and training method of network model

Technical Field

The invention relates to an underwater image enhancement method based on a super-resolution network and a U-Net network and a training method of a network model, belonging to the technical field of underwater image processing.

Background

With the progress of science and technology, people continue to explore underwater organisms, underwater resources and similar objects. However, the underwater environment is complex and the water body strongly attenuates light: water molecules, microorganisms and other matter in the water absorb and reflect it. As a result, captured underwater images suffer from low clarity, low contrast, blurred contours and disordered color, and such low-quality images make it very difficult for researchers to analyze, recognize and detect underwater targets. How to enhance the details of an underwater image and recover the information it contains has therefore become a challenging problem.

Disclosure of Invention

The invention aims to overcome the defects in the prior art, and provides an underwater image enhancement method based on a super-resolution network and a U-Net network and a training method of a network model, so as to solve the problem that underwater images are difficult to enhance in the prior art.

In order to solve the technical problem, the invention is realized by adopting the following scheme:

the invention provides an underwater image enhancement method based on a super-resolution network and a U-Net network, which comprises the following steps:

acquiring an underwater original image;

inputting the underwater original image into a trained deep residual super-resolution network model, and outputting a high-resolution underwater image;

inputting the high-resolution underwater image into a trained U-Net network model, and outputting an underwater enhanced image;

the deep residual super-resolution network model and the U-Net network model both comprise a generator network and a discriminator network;

the generator network of the deep residual super-resolution network model comprises a plurality of convolution blocks, each consisting of a deep residual channel attention block (DRCAB) and an additional convolution layer with a tanh activation function;

the generator network of the U-Net network model comprises an encoder module, a K estimation module, a converter module, an SK attention module and a decoder module which are sequentially connected.
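The two-stage flow described in the steps above can be sketched as a simple composition of two callables. This is only an illustration: `enhance_underwater_image`, `toy_sr` and `toy_unet` are hypothetical names, and the toy functions are stand-ins (nearest-neighbour upscaling and value clamping), not the trained generators of the invention.

```python
from typing import Callable, List

Image = List[List[float]]  # hypothetical stand-in: a single-channel image as nested lists


def enhance_underwater_image(
    raw: Image,
    sr_generator: Callable[[Image], Image],
    unet_generator: Callable[[Image], Image],
) -> Image:
    """Run the two trained generators in sequence: super-resolution, then enhancement."""
    high_res = sr_generator(raw)
    return unet_generator(high_res)


# Toy stand-ins for the trained networks (assumptions, not the patented models).
def toy_sr(img: Image) -> Image:
    # Nearest-neighbour 2x upscaling: repeat every pixel and every row twice.
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def toy_unet(img: Image) -> Image:
    # Placeholder enhancement: clamp values to [0, 1], keeping the size unchanged.
    return [[min(1.0, max(0.0, p)) for p in row] for row in img]


result = enhance_underwater_image([[0.2, 1.4], [-0.1, 0.5]], toy_sr, toy_unet)
```

The point of the sketch is the ordering: the enhancement stage always receives the output of the super-resolution stage, never the raw image.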

The invention also provides a training method for the deep residual super-resolution network model and the U-Net network model, the trained models being used to enhance underwater images. The training method comprises the following steps:

acquiring a training data set, wherein the training data set comprises original images and distorted images paired with the original images;

inputting the image samples of the training data set into a pre-established deep residual super-resolution network model and a pre-established U-Net network model respectively, and training the models by alternating iterations until the loss function value of the model no longer decreases, at which point training is complete.

Preferably, the training data sets are respectively a USR-248 super-resolution data set and an EUVP paired underwater image data set.

Preferably, the pre-established deep residual super-resolution network model and the pre-established U-Net network model both comprise a generator network and a discriminator network.

Preferably, the generator network of the pre-established deep residual super-resolution network model comprises three convolution blocks connected in sequence, and each convolution block comprises a deep residual channel attention block (DRCAB), a convolution layer and a tanh activation function layer.

Preferably, the DRCAB includes a first convolutional layer, a first BN batch normalization layer, a first softmax activation function layer, a second convolutional layer, a second BN batch normalization layer, a third convolutional layer, an avgpool2d average pooling layer, a fourth convolutional layer, a second softmax activation function layer, a fifth convolutional layer, and an upsampling layer, which are sequentially connected.

Preferably, the generator network of the pre-established U-Net network model comprises an encoder module, a K estimation module, a converter module, an SK attention module and a decoder module which are connected in sequence; the encoder and the decoder of each layer in the generator network are connected in a jump connection mode.

Preferably, the SK attention module comprises a Split module, a Fuse module and a Select module which are connected in sequence.

Preferably, the Split module performs multiple convolutions on the input image by using 2 convolution kernels with different sizes;

the Fuse module calculates the weight of each of the 2 convolution kernels, and sums the feature maps of the two branches element by element:

$$U_c = \tilde{U}_c + \hat{U}_c$$

in the above formula, $\tilde{U}_c$ is the weight feature map extracted by the first convolution kernel, and $\hat{U}_c$ is the weight feature map extracted by the second convolution kernel;

$U_c$ passes through a global average pooling layer to generate the feature map $S_c$, and $S_c$ passes through a fully connected layer to generate a compact feature map $z$, calculated as:

$$z = f(S_c) = \delta(B(S_c W))$$

in the formula, C, H and W are the dimensions of the input image, $\delta$ is the ReLU activation function, and B is the BN batch normalization layer;

and the Select module calculates the weights of the 2 convolution kernels through a softmax activation function, applies the weights to the feature map z to obtain 2 new feature maps, and then connects and fuses them to obtain the final output image.

Preferably, the method for calculating the model loss function value includes:

calculating global similarity loss:

$$L_2(G) = \mathbb{E}_{X \sim Y}\left[\lVert Y - G(X) \rVert_2\right]$$

in the above formula, X is a distorted image, Y is a real image, G is the generator network, and G(X) is the image generated by the generator; $\mathbb{E}_{X \sim Y}$ denotes the expectation over pairs of a distorted image and its real image, and $\lVert Y - G(X) \rVert_2$ is the distance between the real image and the image generated by the generator;

calculating the perception loss:

in the above formula, r, g and b respectively denote the differences between the normalized red, green and blue channel values of the image G(X) generated by the generator and the real image Y, and the remaining term is the average of the red channel;

calculating the content loss:

in the above formula, X and Y are respectively a distorted image and a real image; the feature maps are those extracted from the fourth and fifth convolutional layers of the pre-trained VGG-19 network, and the loss is the distance between the feature map of the real image and the feature map of the distorted image;

calculating the total loss:

$$L_g(G) = \lambda_c L_c(G) + \lambda_p L_p(G) + \lambda_2 L_2(G)$$

in the above formula, $\lambda_c$, $\lambda_p$ and $\lambda_2$ are the weights of the content loss, the perceptual loss and the global similarity loss, respectively.

Compared with the prior art, the invention has the following beneficial effects:

1. According to the invention, a deep residual super-resolution network (SRDRCAM) and a U-Net underwater image enhancement network based on an SK attention mechanism and a K estimation module are constructed, so that image resolution is improved, image blur is removed, and an enhanced image with natural color is generated.

2. In addition to the global similarity loss and the perceptual loss, the invention adds a content loss term. Optimizing the global similarity loss preserves the overall structure of the input image; optimizing the perceptual loss lets the network better recover the detail information of the underwater image; and the content loss makes the content of the output image more similar to that of the input image.

3. The invention provides an end-to-end network structure, and the method does not need any underwater image imaging model parameters in the stages of training and testing.

4. The network parameters are fewer, the network training speed is higher, and the performance is better compared with other models.

Drawings

FIG. 1 is a network architecture of an SK attention mechanism provided by an embodiment of the present invention;

FIG. 2 is a network structure of a depth residual channel attention block provided by an embodiment of the present invention;

FIG. 3 is a generator network structure of a deep residual super-resolution network model provided by an embodiment of the invention;

FIG. 4 is a generator network structure of the U-Net network model provided by an embodiment of the present invention;

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

In the description of the present invention, it is to be understood that terms such as "central," "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be construed as limiting the present invention.

Example 1:

This embodiment provides a training method for a deep residual super-resolution network (SRDRCAM) model and a U-Net network model; the trained deep residual super-resolution network model and U-Net network model can effectively enhance underwater images. The training steps are as follows:

Step one: preparing a training data set

The 2x, 4x and 8x super-resolution networks are trained on the previously published super-resolution data set USR-248, which contains 2x, 4x and 8x images, in order to remove detail blur from underwater images. The EUVP paired underwater image data set is used to train the improved U-Net network. The number of training iterations for both the super-resolution network and the U-Net network is set to 10, and the batch size is set to 1. All images are resized to 256x256.

Step two: network structure for constructing SRDRCAM and U-Net network structure

The SRDRCAM network architecture includes a generator network and a discriminator network. As shown in FIG. 3, the generator network consists of three deep residual channel attention blocks (DRCAB), three convolutional layers (Conv) and three tanh activation functions; each group of one DRCAB, one convolutional layer and one tanh activation function performs one 2x super-resolution step. The network structure of the DRCAB is shown in FIG. 2: it comprises a convolutional layer, eight repeated residual channel attention multiplication blocks, a further convolutional layer and finally an upsampling layer. Specifically, it includes a first convolutional layer, a first BN batch normalization layer, a first softmax activation function layer, a second convolutional layer, a second BN batch normalization layer, a third convolutional layer, an avgpool2d average pooling layer, a fourth convolutional layer, a second softmax activation function layer, a fifth convolutional layer and an upsampling layer, which are connected in sequence. For an input image of size w x h x 3, the first pass through DRCAB, Conv and tanh yields an output of 2w x 2h x 3; the second pass yields 4w x 4h x 3; and the third pass yields 8w x 8h x 3. This scaling removes the blur of the underwater image.
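The size progression through the three generator stages can be checked with a few lines of arithmetic. The function name `srdrcam_output_shapes` and the example input size are illustrative assumptions; only the doubling per stage and the three output channels come from the description above.

```python
def srdrcam_output_shapes(w, h, stages=3, channels=3):
    """Return the (width, height, channels) shape after each 2x stage."""
    shapes = []
    for _ in range(stages):
        # Each DRCAB + Conv + tanh group doubles both spatial dimensions.
        w, h = 2 * w, 2 * h
        shapes.append((w, h, channels))
    return shapes


# Hypothetical 32 x 24 input: the stages give 2x, 4x and 8x outputs.
shapes = srdrcam_output_shapes(32, 24)
```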

The U-Net network structure is improved by designing a K estimation module and a clear image generation module. The K estimation module is the core of the improved U-Net network. Inspired by existing U-Net-based image enhancement algorithms, the improved U-Net framework is used to generate underwater image features and to enhance underwater images. An SK attention mechanism is also added to the U-Net network to modify its architecture; the network structure of the SK attention mechanism is shown in FIG. 1. The SK attention network assigns different weights to different convolution kernels, i.e. it is a network that dynamically generates convolution kernels for images of different scales. It mainly consists of three parts, Split, Fuse and Select:

The Split part performs multiple convolution operations on the input image using convolution kernels of different sizes; the present invention uses convolution kernels of sizes 3x3 and 5x5.

The Fuse part calculates the weight of each convolution kernel, and the feature maps of the two branches are summed element by element:

$$U_c = \tilde{U}_c + \hat{U}_c$$

In the above formula, $\tilde{U}_c$ is the weight feature map extracted by the 3x3 convolution kernel, and $\hat{U}_c$ is the weight feature map extracted by the 5x5 convolution kernel. $U_c$ passes through a global average pooling (GAP) layer, which gathers channel statistics and generates the feature map $S_c$ of dimension C x 1. $S_c$ then passes through a fully connected layer to generate a compact feature map $z$ of dimension d x 1, calculated as:

$$z = f(S_c) = \delta(B(S_c W))$$

$$d = \max(C/r, L)$$

where C (number of channels), H (height) and W (width) are the dimensions of the input image, $\delta$ is the ReLU activation function, B is the BN batch normalization layer, L is the optimal value selected according to the sizes of the two convolution kernels and is set to 32 in the invention, and r is the compression factor; the dimension of z is the number of convolution kernels, the dimension of the weight matrix W is d x C, and d is the feature dimension after the fully connected layer.
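A short worked example may help with the rule d = max(C/r, L). The value L = 32 comes from the text; the compression factor r = 16 and the channel counts are illustrative assumptions, as is the use of integer division for C/r.

```python
def compact_dim(channels, r, lower_bound=32):
    """Feature dimension d of the compact vector z: max(C / r, L)."""
    # Integer division is an assumption; the text only writes C/r.
    return max(channels // r, lower_bound)


# With few channels the lower bound L dominates; with many channels C/r does.
d_small = compact_dim(64, r=16)    # 64 // 16 = 4, so d falls back to L = 32
d_large = compact_dim(1024, r=16)  # 1024 // 16 = 64, which exceeds L
```

The lower bound keeps z from collapsing to a handful of values when the number of channels is small.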

The Select part calculates the weights of the 2 convolution kernels through softmax, applies the weights to the feature map z to obtain 2 new feature maps, and then connects and fuses them to obtain the final output image.
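The Fuse and Select parts can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: batch normalization is omitted, the matrices `w`, `a` and `b` are random placeholders for learned parameters, and the softmax weights are applied per channel to the two branch feature maps, as in the commonly published selective-kernel formulation, which may differ from the exact fusion used here.

```python
import math
import random


def global_average_pool(u):
    """u: C x H x W nested lists -> list of C channel means (the vector S)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def sk_fuse_select(u3, u5, w, a, b):
    """Fuse two branch feature maps and re-weight them per channel.

    u3, u5: C x H x W feature maps from the 3x3 and 5x5 branches.
    w: d x C matrix of the fully connected layer (batch normalization omitted).
    a, b: C x d matrices giving one logit per channel for each branch.
    """
    # Fuse: element-wise sum of the two branches.
    u = [[[x + y for x, y in zip(r3, r5)] for r3, r5 in zip(c3, c5)]
         for c3, c5 in zip(u3, u5)]
    s = global_average_pool(u)                 # C values
    z = [max(0.0, t) for t in matvec(w, s)]    # ReLU(W s), d values
    la, lb = matvec(a, z), matvec(b, z)        # per-channel logits
    out, weights = [], []
    for c in range(len(u3)):
        # Softmax across the two branches for channel c (numerically stabilised).
        m = max(la[c], lb[c])
        ea, eb = math.exp(la[c] - m), math.exp(lb[c] - m)
        wa, wb = ea / (ea + eb), eb / (ea + eb)
        weights.append((wa, wb))
        out.append([[wa * x + wb * y for x, y in zip(r3, r5)]
                    for r3, r5 in zip(u3[c], u5[c])])
    return out, weights


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
C, H, WIDTH, d = 4, 2, 2, 3  # hypothetical toy sizes
u3 = [rand_matrix(H, WIDTH) for _ in range(C)]
u5 = [rand_matrix(H, WIDTH) for _ in range(C)]
out, weights = sk_fuse_select(u3, u5, rand_matrix(d, C),
                              rand_matrix(C, d), rand_matrix(C, d))
```

Because the two weights of each channel sum to one, every output value is a convex combination of the corresponding 3x3 and 5x5 branch values.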

The underwater image enhancement network (U-Net) comprises a generator network and a discriminator network. As shown in FIG. 4, the generator network includes an encoder module, a K estimation module, a converter module, an SK attention module and a decoder module, which are connected in sequence; the encoder and the decoder of each layer in the generator network are linked by skip connections. The end-to-end underwater image enhancement network (U-Net) compensates the color model to generate an enhanced image with natural color.

Step three: optimizing a loss function

Because common error metrics cannot reflect every aspect of how well an image has been optimized, the invention addresses the problem by optimizing the loss function so that the output image is closer to the real image. The invention uses 3 loss functions: global similarity loss ($L_2$), perceptual loss ($L_p$) and content loss ($L_c$). The details are as follows:

global similarity loss function:

$$L_2(G) = \mathbb{E}_{X \sim Y}\left[\lVert Y - G(X) \rVert_2\right]$$

In the above equation, X is a distorted image, Y is a real image, G is the generator, and G(X) is the image generated by the generator; $\mathbb{E}_{X \sim Y}$ denotes the expectation over pairs of a distorted image and its real image, and $\lVert Y - G(X) \rVert_2$ is the distance between the real image and the image generated by the generator.

The global similarity loss function $L_2$ concerns the overall visual effect: it measures the difference between the real image and the image enhanced by the method of the present invention, and aims to improve the visual quality of the output image.
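As a small numeric illustration of the global similarity loss, the expectation can be approximated by an average over a batch of (real, generated) pairs. The function names and the toy 2x2 images below are assumptions made for the example.

```python
import math


def l2_distance(y, g):
    """Euclidean norm of the pixel-wise difference between two equally sized images."""
    return math.sqrt(sum((p - q) ** 2
                         for row_y, row_g in zip(y, g)
                         for p, q in zip(row_y, row_g)))


def global_similarity_loss(pairs):
    """Mean L2 distance over (real, generated) pairs; a batch mean stands in for the expectation."""
    return sum(l2_distance(y, g) for y, g in pairs) / len(pairs)


pairs = [
    ([[0.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [0.0, 4.0]]),  # distance 5
    ([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),  # distance 0
]
loss = global_similarity_loss(pairs)
```

A perfectly reconstructed pair contributes zero, so the loss falls as generated images approach their real counterparts.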

Perceptual loss function:

In the above equation, r, g and b respectively denote the differences between the normalized red, green and blue channel values of the generated image G(X) and the real image Y, and the remaining term is the average of the red channel.

The invention can eliminate the blue-green color cast of the image by adopting the perception loss, so that the generated image is more real, and the distortion of the image is reduced.

Content loss function:

Because the red component of light is attenuated most severely under water, underwater images tend to look greenish or bluish. In constructing the loss function, the invention therefore introduces, in addition to the global similarity loss and the perceptual loss, a content loss function $L_c$ that corrects the color distribution of the image so that the details of the enhanced image are clearer. The calculation formula is as follows:

The feature maps in the formula are the high-level feature maps extracted from the fourth and fifth convolutional layers of a pre-trained VGG-19 network, and the loss is the distance between the feature map of the real image and the feature map of the distorted image.

Total loss function, which combines the loss terms above:

$$L_g(G) = \lambda_c L_c(G) + \lambda_p L_p(G) + \lambda_2 L_2(G)$$
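The weighted combination is straightforward to compute once the three loss terms are available. In the sketch below, the loss values and the weights are made-up numbers chosen only to show the arithmetic; the text does not state the weight values.

```python
def total_generator_loss(l_c, l_p, l_2, lam_c, lam_p, lam_2):
    """Weighted sum of content, perceptual and global similarity losses."""
    return lam_c * l_c + lam_p * l_p + lam_2 * l_2


# Hypothetical loss values and weights (not taken from the source).
total = total_generator_loss(l_c=0.8, l_p=0.4, l_2=2.5,
                             lam_c=0.3, lam_p=0.2, lam_2=0.5)
# 0.3 * 0.8 + 0.2 * 0.4 + 0.5 * 2.5 = 0.24 + 0.08 + 1.25
```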

Step four: network model training and setup

Training the network model of the invention amounts to training two networks, the SRDRCAM network and the U-Net network. First, the SRDRCAM network is trained on the public super-resolution data set USR-248, with the discriminator judging whether an image is real or fake. The generator is then trained against the discriminator, and the generator network is optimized according to the global similarity loss, perceptual loss and content loss functions of step three. The optimization is carried out with the PyTorch framework: the loss is passed to the optimizer, which minimizes it, and the discriminator and the generator are updated alternately until the loss function value no longer decreases, at which point training of the network is complete. After the SRDRCAM network model has been trained, the U-Net network model is trained on the paired data set EUVP, repeating the same steps until training is complete.
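The alternating schedule with a "stop when the loss no longer decreases" criterion can be sketched independently of any framework. The names `train_alternating`, `patience` and `tol` are assumptions introduced for the example, and the two step functions are toy stand-ins for one discriminator update and one generator update.

```python
def train_alternating(d_step, g_step, max_epochs=20, patience=1, tol=1e-6):
    """Alternate discriminator and generator updates until the generator loss plateaus."""
    best = float("inf")
    stale = 0
    history = []
    for _ in range(max_epochs):
        d_step()          # update the discriminator first
        loss = g_step()   # then update the generator and read its loss
        history.append(loss)
        if loss < best - tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break     # the loss no longer decreases: stop training
    return history


# Toy stand-ins: a scripted sequence of generator losses and a call counter.
losses = iter([1.0, 0.6, 0.4, 0.4, 0.3])
calls = {"d": 0}


def toy_d_step():
    calls["d"] += 1


def toy_g_step():
    return next(losses)


history = train_alternating(toy_d_step, toy_g_step)
```

With `patience=1` the loop stops at the first epoch that fails to improve; a larger patience would tolerate short plateaus before stopping.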

A low-resolution image is input into the trained SRDRCAM network model to obtain a deblurred high-resolution image. The generated high-resolution image is then input into the trained U-Net network model, which, as an end-to-end underwater image enhancement network, compensates the color model and generates an enhanced image with natural color. At this point network training is complete and the goal of enhancing the underwater image is achieved.

This example trains the model with the Adam optimizer, with the learning rate set to 0.0002, the momentum to 0.5, the batch size to 1, and the number of iteration cycles for both the SRDRCAM and U-Net networks to 20. The network of this embodiment is implemented in the PyTorch framework, and the network model is trained on an NVIDIA RTX 3060 GPU and an i5 10440 KF CPU.

Example 2

This embodiment provides an underwater image enhancement method based on a super-resolution network and a U-Net network, in which underwater image enhancement is carried out with the deep residual super-resolution network model and the U-Net network model trained in Example 1. The underwater image enhancement method comprises the following steps:

acquiring an underwater original image;

inputting the underwater original image into the trained deep residual super-resolution network model, and outputting a high-resolution underwater image;

and inputting the high-resolution underwater image into the trained U-Net network model, and outputting an underwater enhanced image.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims

1. An underwater image enhancement method based on a super-resolution network and a U-Net network is characterized by comprising the following steps:

acquiring an underwater original image;

inputting an underwater original image into a trained depth residual super-resolution network model, and outputting a high-resolution underwater image;

2. A method for training a depth residual super-resolution network model and a U-Net network model, wherein the trained depth residual super-resolution network model and the U-Net network model are used for underwater image enhancement according to claim 1, and the method for training the depth residual super-resolution network model and the U-Net network model comprises the following steps:

3. The method for training the depth residual super-resolution network model and the U-Net network model according to claim 2, wherein the training datasets are a USR-248 super-resolution dataset and an EUVP paired underwater image dataset, respectively.

4. The method for training the deep residual super-resolution network model and the U-Net network model of claim 2, wherein the pre-established deep residual super-resolution network model and the U-Net network model each comprise a generator network and a discriminator network.

5. The method for training the depth residual super-resolution network model and the U-Net network model according to claim 4, wherein the generator network of the pre-built depth residual super-resolution network model comprises three sequentially connected convolution blocks, each convolution block comprising a depth residual channel attention block DRCAB, a convolution layer and a tanh activation function layer.

6. The method for training the deep residual super-resolution network model and the U-Net network model of claim 5, wherein the DRCAB comprises a first convolutional layer, a first BN batch normalization layer, a first softmax activation function layer, a second convolutional layer, a second BN batch normalization layer, a third convolutional layer, an avgpool2d average pooling layer, a fourth convolutional layer, a second softmax activation function layer, a fifth convolutional layer, and an upsampling layer, which are connected in sequence.

7. The method for training the deep residual super-resolution network model and the U-Net network model of claim 4, wherein the generator network of the pre-established U-Net network model comprises an encoder module, a K estimation module, a converter module, an SK attention module and a decoder module which are connected in sequence; the encoder and the decoder of each layer in the generator network are connected in a jump connection mode.

8. The method for training the deep residual super-resolution network model and the U-Net network model of claim 7, wherein the SK attention module comprises a Split module, a Fuse module and a Select module which are connected in sequence.

9. The method for training the deep residual super-resolution network model and the U-Net network model of claim 8, wherein the Split module performs multiple convolutions on the input image by using convolution kernels of 2 different sizes;

in the above formula, the first term is the weight feature map extracted by the first convolution kernel, and the second term is the weight feature map extracted by the second convolution kernel;

$$z = f(S_c) = \delta(B(S_c W))$$

10. The method for training the deep residual super-resolution network model and the U-Net network model according to claim 2, wherein the method for calculating the model loss function value comprises:

calculating global similarity loss:

$$L_2(G) = \mathbb{E}_{X \sim Y}\left[\lVert Y - G(X) \rVert_2\right]$$

in the above equation, X is a distorted image, Y is a real image, G is the generator network, and G(X) is the image generated by the generator; $\mathbb{E}_{X \sim Y}$ denotes the expectation over pairs of a distorted image and its real image, and $\lVert Y - G(X) \rVert_2$ is the distance between the real image and the image generated by the generator;

calculating the perception loss:

average of the red channels;

calculating the content loss:

calculating the total loss:

$$L_g(G) = \lambda_c L_c(G) + \lambda_p L_p(G) + \lambda_2 L_2(G)$$