CN116630188A - Underwater image enhancement method, system, electronic equipment and storage medium

Underwater image enhancement method, system, electronic equipment and storage medium

Info

Publication number: CN116630188A
Application number: CN202310583227.2A
Authority: CN (China)
Prior art keywords: image, underwater, image enhancement, network, underwater image
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 王骥, 钟远昊, 罗圳
Current assignee: Guangdong Ocean University
Original assignee: Guangdong Ocean University
Filing date / priority date: 2023-05-23
Publication date: 2023-08-22
Application filed by Guangdong Ocean University

Classifications

    • G06T 5/90
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 5/73
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/05 Underwater scenes
    • G06T 2207/10024 Color image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20172 Image enhancement details

Abstract

The application provides an underwater image enhancement method, system, electronic equipment and storage medium. The method comprises: acquiring a target underwater image; and determining an underwater enhanced image according to the target underwater image and an image enhancement network, the saturation and brightness of the underwater enhanced image being higher than those of the target underwater image. The image enhancement network is constructed according to a gated fusion framework comprising a confidence map generator and an image refiner connected in sequence. The confidence map generator is constructed from selective kernel convolution and a spatial attention module and is configured to generate predicted confidence maps. The application addresses the problems of color deviation, background blurring, low contrast and low visibility that affect underwater images in the prior art.

Description

Underwater image enhancement method, system, electronic equipment and storage medium
Technical Field
The application belongs to the field of image enhancement, and particularly relates to an underwater image enhancement method, an underwater image enhancement system, electronic equipment and a storage medium.
Background
In recent years, underwater image enhancement has played an important role in underwater resource exploration, aquatic robot detection and underwater archaeology. Although images captured underwater are valuable for marine resource development, several issues remain to be addressed, such as image distortion caused by light absorption and image blurring caused by scattering (both forward and backward scattering). In addition, the attenuation of underwater light causes further problems such as low contrast, color casts, low visibility and blurred details. These problems greatly reduce the efficiency of ocean resource development. Improving the visual quality, contrast and color characteristics of underwater images is therefore important for accurately exploring the underwater world.
Disclosure of Invention
The application aims to provide a method, a system, electronic equipment and a storage medium for enhancing an underwater image, which can enhance the visual quality, contrast and color characteristics of the underwater image.
In order to achieve the above object, the present application provides an underwater image enhancement method, comprising:
acquiring a target underwater image;
determining an underwater enhanced image according to the target underwater image and the image enhancement network; the saturation and brightness of the underwater enhanced image are higher than those of the target underwater image;
the image enhancement network is constructed according to a gated fusion framework; the gated fusion framework comprises a confidence map generator and an image refiner which are connected in sequence; the confidence map generator is constructed from selective kernel convolution and a spatial attention module;
the confidence map generator is used for generating a predicted confidence map;
the image refiner is used for preprocessing the target underwater image.
Further, the method for determining the image enhancement network comprises the following steps:
acquiring training data; the training data comprises training underwater images and corresponding underwater enhanced images;
constructing a gated fusion framework network;
and inputting the training data into the gated fusion framework network, training it according to a loss function, and taking the trained gated fusion framework network as the image enhancement network.
Further, determining an underwater enhanced image according to the target underwater image and the image enhancement network specifically includes:
applying gamma correction, white balance and histogram equalization algorithms to the target underwater image to obtain refined images;
and multiplying the refined images with the corresponding confidence maps to obtain the underwater enhanced image.
Further, the loss function combines MS-SSIM loss, perceptual loss and MAE loss;
the MS-SSIM and MAE losses are used to maintain high-frequency, color and brightness information, while the perceptual loss measures image similarity in a way that matches the human visual system.
Further, the image refiner consists of selective kernel convolutions and two-dimensional convolution layers and is used to preprocess the target underwater image.
The application also provides an underwater image enhancement system, comprising:
the acquisition module is used for acquiring training data;
and a generation module for generating the enhanced image.
The application also provides an electronic device comprising a memory for storing a computer program and a processor for running the computer program to cause the electronic device to perform the underwater image enhancement method described above.
The present application also provides a computer readable storage medium storing a computer program which when executed by a processor implements the above-described underwater image enhancement method.
The application has the technical effects that:
the application solves the problems of color deviation, background blurring, low contrast, low visibility and the like of the underwater image in the prior art.
Drawings
The accompanying drawings illustrate various embodiments by way of example in general and not by way of limitation, and together with the description and claims serve to explain the inventive embodiments. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. Such embodiments are illustrative and not intended to be exhaustive or exclusive of the present apparatus or method.
FIG. 1 shows a selective kernel convolution schematic of the present application;
FIG. 2 shows a schematic diagram of a spatial attention module of the present application;
FIG. 3 shows a schematic diagram of the network architecture of the SCAUIE-Net of the present application;
FIG. 4 shows a schematic diagram of the structure of the image refiner of the present application;
FIG. 5 shows a schematic diagram of a selective kernel block of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
The application proposes a CNN-based underwater image enhancement model using spatial and channel attention, called SCAUIE-Net. First, the generation of the inputs from the underwater images in the UIEB dataset is introduced. The network architecture, including the selective kernel block and the spatial attention module, is then described. Finally, the loss functions used in SCAUIE-Net are introduced.
Input generation
In order to cover the lighting conditions and complex underwater scenes required for the training data, the present application obtains the inputs through a variety of preprocessing operations. Three inputs are generated by applying white balance, histogram equalization and gamma correction algorithms, respectively; the result is then obtained using a fusion strategy over the blended color features. The white balance technique is applied directly to minimize the color cast of the whole scene. Histogram equalization is applied in the Lab color space to improve contrast and brighten dark areas. In the gamma correction algorithm, the gamma value is set to 0.7.
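As an illustration, the following is a minimal sketch of the three preprocessing inputs using OpenCV/NumPy. The gray-world white balance and the input file name are assumptions, since the text does not specify which white-balance variant is used.

```python
# Sketch of the three preprocessing inputs: white balance, histogram
# equalization in Lab space, and gamma correction with gamma = 0.7.
import cv2
import numpy as np

def white_balance(img: np.ndarray) -> np.ndarray:
    # Gray-world assumption: scale each channel toward the global mean.
    b, g, r = cv2.split(img.astype(np.float32))
    mean = (b.mean() + g.mean() + r.mean()) / 3.0
    b *= mean / (b.mean() + 1e-6)
    g *= mean / (g.mean() + 1e-6)
    r *= mean / (r.mean() + 1e-6)
    return np.clip(cv2.merge([b, g, r]), 0, 255).astype(np.uint8)

def hist_equalize_lab(img: np.ndarray) -> np.ndarray:
    # Equalize only the L (lightness) channel in Lab color space.
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    lab = cv2.merge([cv2.equalizeHist(l), a, b])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def gamma_correct(img: np.ndarray, gamma: float = 0.7) -> np.ndarray:
    # I_out = (I_in / 255) ** gamma, rescaled back to [0, 255].
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, table)

img = cv2.imread("underwater.png")  # hypothetical input path
inputs = [white_balance(img), hist_equalize_lab(img), gamma_correct(img)]
```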
Selective kernel convolution
The size of the receptive field plays an important role in image color perception, and for underwater images the enhancement effect may benefit from adaptively adjusting the receptive field size. The present application therefore uses an automatic selection operation, "selective kernel convolution", over multiple kernels with different kernel sizes. Specifically, SK convolution is implemented through three operators: Split, Fuse and Select, as shown in FIG. 1, which depicts the two-branch case. In this example there are only two kernels with different kernel sizes, but the operation is easily extended to the multi-branch case.
Split. For any given feature map $X \in \mathbb{R}^{H' \times W' \times C'}$, by default two transformations $\widetilde{\mathcal{F}}: X \rightarrow \widetilde{U} \in \mathbb{R}^{H \times W \times C}$ and $\widehat{\mathcal{F}}: X \rightarrow \widehat{U} \in \mathbb{R}^{H \times W \times C}$ are first performed, with kernel sizes 3 and 5, respectively. Both $\widetilde{\mathcal{F}}$ and $\widehat{\mathcal{F}}$ are composed of efficient grouped/depthwise convolutions, batch normalization and a ReLU function in sequence. To further improve efficiency, the conventional convolution with a 5 × 5 kernel is replaced by a dilated convolution with a 3 × 3 kernel and dilation size 2.
Fuse. The aim is to enable neurons to adaptively adjust their receptive field sizes according to the stimulus content. The basic idea is to use gates to control the information flows from the multiple branches, carrying different scales of information, into the neurons of the next layer. To achieve this goal, the gates need to integrate information from all branches. The results of the multiple branches are first fused via element-wise summation:

$$U = \widetilde{U} + \widehat{U} \tag{1}$$
Global information is then embedded by using global average pooling to generate channel-wise statistics $s \in \mathbb{R}^{C}$. Specifically, the $c$-th element of $s$ is calculated by shrinking $U$ through its spatial dimensions $H \times W$:

$$s_c = \mathcal{F}_{gp}(U_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j) \tag{2}$$
Furthermore, a compact feature $z \in \mathbb{R}^{d \times 1}$ is created to provide guidance for precise and adaptive selection. This is achieved by a simple fully connected (fc) layer, with dimensionality reduction for efficiency:

$$z = \mathcal{F}_{fc}(s) = \delta\left(\mathcal{B}(W s)\right) \tag{3}$$

where $\delta$ is the ReLU function, $\mathcal{B}$ denotes batch normalization, and $W \in \mathbb{R}^{d \times C}$. To study the impact of $d$ on the efficiency of the model, a reduction ratio $r$ is used to control its value:
$$d = \max(C/r, L) \tag{4}$$
where $L$ denotes the minimal value of $d$ ($L = 32$ is a typical setting in the experiments of the present application).
Select. Soft attention across channels is used to adaptively select different spatial scales of information, guided by the compact feature descriptor $z$. Specifically, a softmax operator is applied on the channel-wise digits:

$$a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \qquad b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}} \tag{5}$$
where $A, B \in \mathbb{R}^{C \times d}$, and $a$, $b$ denote the soft attention vectors for $\widetilde{U}$ and $\widehat{U}$, respectively. Note that $A_c \in \mathbb{R}^{1 \times d}$ is the $c$-th row of $A$ and $a_c$ is the $c$-th element of $a$; likewise for $B_c$ and $b_c$. In the two-branch case the matrix $B$ is redundant because $a_c + b_c = 1$. The final feature map $V$ is obtained through the attention weights on the various kernels:

$$V_c = a_c \cdot \widetilde{U}_c + b_c \cdot \widehat{U}_c, \qquad a_c + b_c = 1 \tag{6}$$
where $V = [V_1, V_2, \ldots, V_C]$ and $V_c \in \mathbb{R}^{H \times W}$. Note that the formulas here are provided for the two-branch case; the case with more branches can easily be deduced by extending equations (1), (5) and (6).
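For illustration, the following is a minimal PyTorch sketch of the two-branch selective kernel convolution following equations (1) to (6). The layer names are illustrative, and plain (ungrouped) convolutions are used here, whereas the description above uses grouped/depthwise convolutions for efficiency.

```python
# Sketch of two-branch selective kernel convolution (Split/Fuse/Select).
import torch
import torch.nn as nn

class SKConv(nn.Module):
    def __init__(self, channels: int, r: int = 16, L: int = 32):
        super().__init__()
        d = max(channels // r, L)                       # Eq. (4)
        # Split: a 3x3 conv branch, and a dilated 3x3 conv standing in
        # for the 5x5 kernel as described above.
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Fuse: bottleneck fc layer producing z = delta(B(W s)), Eq. (3).
        self.fc = nn.Sequential(
            nn.Linear(channels, d), nn.BatchNorm1d(d), nn.ReLU(inplace=True))
        # Select: per-branch fc layers A and B producing softmax logits.
        self.fc_a = nn.Linear(d, channels)
        self.fc_b = nn.Linear(d, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u_tilde, u_hat = self.branch3(x), self.branch5(x)
        u = u_tilde + u_hat                             # Eq. (1)
        s = u.mean(dim=(2, 3))                          # Eq. (2), GAP over HxW
        z = self.fc(s)                                  # Eq. (3)
        logits = torch.stack([self.fc_a(z), self.fc_b(z)], dim=1)
        attn = torch.softmax(logits, dim=1)             # Eq. (5)
        a = attn[:, 0].unsqueeze(-1).unsqueeze(-1)
        b = attn[:, 1].unsqueeze(-1).unsqueeze(-1)
        return a * u_tilde + b * u_hat                  # Eq. (6)
```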
Spatial attention module
The present application generates a spatial attention map by exploiting the spatial relationships between features. Unlike channel attention, spatial attention focuses on "where" the informative part lies, which is complementary to channel attention. To compute spatial attention, average pooling and max pooling operations are first applied along the channel axis and their results are concatenated to generate an efficient feature descriptor; applying pooling operations along the channel axis has been shown to be effective in highlighting informative regions. On the concatenated feature descriptor, a selective kernel convolution layer is applied to generate the spatial attention map $M_s(F) \in \mathbb{R}^{H \times W}$, which encodes the positions to emphasize or suppress.
Two pooling operations aggregate the channel information of a feature map into two two-dimensional maps, $F^{s}_{avg} \in \mathbb{R}^{1 \times H \times W}$ and $F^{s}_{max} \in \mathbb{R}^{1 \times H \times W}$, representing the average-pooled and max-pooled features across the channels, respectively. These maps are concatenated and convolved by a convolution layer to produce the two-dimensional spatial attention map. In short, the spatial attention is computed as:

$$M_s(F) = \sigma\left(f^{7 \times 7}\left(\left[F^{s}_{avg}; F^{s}_{max}\right]\right)\right) \tag{7}$$

where $\sigma$ denotes the sigmoid function and $f^{7 \times 7}$ denotes a selective kernel convolution operation with a filter size of 7 × 7.
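A minimal PyTorch sketch of the spatial attention of equation (7) follows; a plain 7 × 7 convolution is substituted here for the selective kernel convolution to keep the example short.

```python
# Sketch of spatial attention: channel-wise average and max pooling,
# concatenation, a 7x7 convolution and a sigmoid, per Eq. (7).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg_map = f.mean(dim=1, keepdim=True)           # F_avg^s
        max_map = f.max(dim=1, keepdim=True).values     # F_max^s
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return f * attn                                 # reweight the features
```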
Network structure
1) Overall structure. The architecture of SCAUIE-Net is shown in FIG. 3. It is a gated fusion network that learns three confidence maps, each representing the most significant features of the corresponding input. The inputs are then fused with the confidence maps to obtain fused images, and the sum of the fused images is the enhanced result.
The proposed SCAUIE-Net architecture consists of two parts: an image refiner and a confidence map generator. The key components used in SCAUIE-Net are the selective kernel block and the spatial attention module. The image refiner is a plain fully convolutional CNN, while the confidence map generator uses U-Net as its backbone. In order to reduce the color casts and artifacts introduced by the white balance, histogram equalization and gamma correction algorithms, three image refiners are added, and the three derived inputs together with the original input are fed into them. The refined inputs are then fed to the confidence map generator to predict the confidence maps. Finally, the three refined inputs are multiplied by the three learned confidence maps to obtain the final enhanced result:
$$I_{en} = R_{WB} \odot C_{WB} + R_{HE} \odot C_{HE} + R_{GC} \odot C_{GC} \tag{8}$$
where $I_{en}$ is the enhanced result; $\odot$ denotes element-wise multiplication; $R_{WB}$, $R_{HE}$ and $R_{GC}$ are the refined inputs after white balance, histogram equalization and gamma correction processing, respectively; and $C_{WB}$, $C_{HE}$ and $C_{GC}$ are the learned confidence maps.
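As a sketch, the gated fusion of equation (8) amounts to a confidence-weighted sum of the three refined inputs:

```python
# Sketch of the gated fusion of Eq. (8).
import torch

def gated_fusion(refined, confidence):
    """refined, confidence: lists of three (N, C, H, W) tensors."""
    assert len(refined) == len(confidence) == 3
    return sum(r * c for r, c in zip(refined, confidence))
```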
2) Image refiner. The image refiner is a shallow CNN, shown in FIG. 4. It consists of selective kernel convolutions and two-dimensional convolution layers, each followed by a ReLU. At the first layer, a 1 × 1 convolution increases the number of feature channels to 32. Selective kernel convolution is then applied twice to process the channel information, with the second selective kernel convolution layer doubling the number of feature channels. At the last layer, a 1 × 1 convolution reduces the dimensionality, mapping the 64 feature channels to a 3-channel refined image.
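A sketch of this refiner, reusing the SKConv sketch above, might look as follows; how the channel doubling is realized (a 1 × 1 expansion here) is an assumption.

```python
# Sketch of the shallow image refiner: 1x1 conv to 32 channels, two
# selective kernel convolutions with a doubling to 64 channels in
# between, then a 1x1 conv back to a 3-channel refined image.
import torch.nn as nn

class ImageRefiner(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 1), nn.ReLU(inplace=True),
            SKConv(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 1),            # channel doubling (assumed form)
            SKConv(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 1),
        )

    def forward(self, x):
        return self.net(x)
```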
3) Confidence map generator. The backbone of the confidence map generator is U-Net, which performs well in image processing tasks. Like U-Net, the confidence map generator consists of a contracting path and an expanding path. The contracting path consists of the repeated application of: two 3 × 3 convolutions, each followed by a ReLU; a 2 × 2 max-pooling operation with stride 2 for downsampling; a selective kernel block (a basic residual block built from selective kernel convolutions, shown in FIG. 5); and a spatial attention module for extracting spatial attention weights. The number of feature channels is doubled at each downsampling step. The expanding path consists of upsampling of the feature map followed by a 2 × 2 convolution, a selective kernel block, and a spatial attention module, used in the same way as in the contracting path. At each expansion step, the number of feature channels is halved.
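The following is a compact, shallow sketch of such a generator, reusing the SKConv and SpatialAttention sketches above; the depth, channel counts and skip-connection details are illustrative assumptions (input sides are assumed divisible by 2, e.g. 112 × 112).

```python
# Sketch of a U-Net-style confidence map generator with one
# downsampling step, where each stage is followed by a selective
# kernel block and spatial attention.
import torch
import torch.nn as nn

def double_conv(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class ConfidenceGenerator(nn.Module):
    def __init__(self, base: int = 32):
        super().__init__()
        self.enc1 = nn.Sequential(double_conv(3, base),
                                  SKConv(base), SpatialAttention())
        self.enc2 = nn.Sequential(double_conv(base, base * 2),
                                  SKConv(base * 2), SpatialAttention())
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec1 = nn.Sequential(double_conv(base * 3, base),
                                  SKConv(base), SpatialAttention())
        self.head = nn.Conv2d(base, 3, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return torch.sigmoid(self.head(d1))   # confidence map in [0, 1]
```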
Network loss function
The end-to-end training of SCAUIE-Net is supervised by three loss components: $\mathcal{L}^{MS\text{-}SSIM}$, $\mathcal{L}_{per}$ and $\mathcal{L}^{\ell_1}$.
1) SSIM loss: to enhance the underwater image in terms of brightness, contrast and structure, the perception-driven SSIM error function is effective. The SSIM at a pixel $p$ is defined as:

$$\mathrm{SSIM}(p) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{9}$$

where $x$, $y$ denote two image patches centered at pixel $p$; $\mu_x$, $\mu_y$ and $\sigma_x$, $\sigma_y$ are the means and standard deviations of the pixels; $\sigma_{xy}$ is the covariance of $x$ and $y$; and $C_1$, $C_2$ are small constants that maintain numerical stability. The first factor is the luminance comparison $l(p)$ and the second is the contrast-structure comparison $cs(p)$. With the error $\varepsilon(p) = 1 - \mathrm{SSIM}(p)$, the loss function for SSIM can thus be written as:

$$\mathcal{L}^{SSIM}(P) = \frac{1}{N} \sum_{p \in P} \left(1 - \mathrm{SSIM}(p)\right) \tag{10}$$
2) MS-SSIM loss: in practice, the appropriate SSIM settings vary from image to image, and a multi-scale approach makes it possible to include image details at different resolutions. The present application therefore uses a multi-scale version of SSIM, MS-SSIM, rather than fine-tuning the settings per image. Given a dyadic pyramid of $M$ levels, MS-SSIM is defined as:

$$\mathrm{MS\text{-}SSIM}(p) = l_M^{\alpha}(p) \cdot \prod_{j=1}^{M} cs_j^{\beta_j}(p) \tag{11}$$

where $l_M$ and $cs_j$ are the luminance and contrast-structure terms of equation (9) at scales $M$ and $j$, respectively. For convenience, the present application sets $\alpha = \beta_j = 1$ for all $j$. Analogously to equation (10), the loss function of MS-SSIM can be written as:

$$\mathcal{L}^{MS\text{-}SSIM}(P) = \frac{1}{N} \sum_{p \in P} \left(1 - \mathrm{MS\text{-}SSIM}(p)\right) \tag{12}$$
3) Perceptual loss: the perceptual loss helps produce visually pleasing and realistic results. It is defined on the ReLU activation layers of a pre-trained 19-layer VGG network. Since deep layers represent semantic information well while preserving image content and the overall spatial structure, layer 5_4 of VGG19 is selected for semantic sensitivity. The perceptual loss is expressed as the distance between the feature representations of the enhanced underwater image $I_{en}$ and the reference underwater image $I_{gt}$:

$$\mathcal{L}_{per} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{C_j H_j W_j} \left\| \phi_j\left(I_{en}^{(i)}\right) - \phi_j\left(I_{gt}^{(i)}\right) \right\|_2^2 \tag{13}$$

where $\phi_j(x)$ denotes the feature map of the $j$-th convolutional layer (after activation) of the VGG19 network pre-trained on the ImageNet dataset; $N$ is the batch size during training; and $C_j$, $H_j$ and $W_j$ are the number, height and width of the feature maps of the $j$-th convolutional layer.
4) MAE loss: since the $\ell_2$ loss function may cause artifacts, the $\ell_1$ (MAE) loss is applied instead of $\ell_2$. The $\ell_1$ loss is simply defined as:

$$\mathcal{L}^{\ell_1}(P) = \frac{1}{N} \sum_{p \in P} \left| x(p) - y(p) \right| \tag{14}$$

where $p$ is the index of a pixel and $P$ is the patch; $x(p)$ and $y(p)$ are the values of the pixel in the processed patch and in the ground truth, respectively. The derivative for back-propagation is also simple; for each pixel $p$ in $P$:

$$\frac{\partial \mathcal{L}^{\ell_1}(P)}{\partial x(p)} = \mathrm{sign}\left(x(p) - y(p)\right) \tag{15}$$

This derivative is not defined at 0, so the convention $\mathrm{sign}(0) = 0$ is used, in which case the network simply does not update the weight.
5) Loss term weights: MS-SSIM preserves the contrast in high-frequency regions, $\ell_1$ preserves color and brightness, and the perceptual loss preserves semantic information. To capture the best characteristics of these functions, the present application combines them, with each loss term carrying a weight hyperparameter $\alpha$, $\beta$ or $\gamma$:

$$\mathcal{L} = \alpha \mathcal{L}^{MS\text{-}SSIM} + \beta \mathcal{L}^{\ell_1} + \gamma \mathcal{L}_{per} \tag{16}$$

where $\alpha = 2$, $\beta = 0.000025$ and $\gamma = 0.0025$ are set empirically.
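A sketch of the combined loss of equation (16) follows. The MS-SSIM term is passed in as a callable rather than implemented here, the perceptual term uses VGG19 features up to relu5_4, and the pairing of α, β, γ with the MS-SSIM, ℓ1 and perceptual terms follows the order in which they are listed above, which is an assumption.

```python
# Sketch of the combined training loss of Eq. (16).
import torch
import torch.nn as nn
from torchvision.models import vgg19

class CombinedLoss(nn.Module):
    def __init__(self, ms_ssim_fn, alpha=2.0, beta=0.000025, gamma=0.0025):
        super().__init__()
        self.ms_ssim_fn = ms_ssim_fn      # callable returning MS-SSIM in [0, 1]
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        # Frozen VGG19 feature extractor up to relu5_4 (conv5_4 + ReLU).
        self.vgg = vgg19(weights="IMAGENET1K_V1").features[:36].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.l1 = nn.L1Loss()

    def forward(self, enhanced, reference):
        loss_msssim = 1.0 - self.ms_ssim_fn(enhanced, reference)
        loss_l1 = self.l1(enhanced, reference)
        feat_en, feat_gt = self.vgg(enhanced), self.vgg(reference)
        loss_per = torch.mean((feat_en - feat_gt) ** 2)
        return (self.alpha * loss_msssim + self.beta * loss_l1
                + self.gamma * loss_per)
```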
Experiment
The application first introduces the training details of SCAUIE-Net. The model is then trained on the UIEB dataset. In addition, qualitative and quantitative comparisons are performed against traditional, physics-based and recent deep-learning-based methods to evaluate the proposed network, including histogram equalization, GDCP, UDCP, UWGAN, Water-Net and Ucolor. Finally, an ablation study is conducted to demonstrate the effectiveness of each component of SCAUIE-Net.
Details of implementation
For training, the inputs to the network are real-world underwater images. A random set of 800 pairs of real-world images extracted from the UIEB dataset is used to train the network. Owing to limited memory, the input images are resized to 112 × 112, and flipping and rotation are used to obtain 7 augmented versions of the original training data. For testing, the remaining real-world images are used as the test set.
The proposed SCAUIE-Net is implemented with PyTorch on Ubuntu 20 with an Nvidia 2080Ti GPU. Training uses mini-batches of size 16 and runs for 300 epochs. The filter weights of each layer are initialized from a standard Gaussian distribution and the biases are initialized to a constant. The model is trained with ADAM at a learning rate of 0.0001, and ReduceLROnPlateau is used as the learning rate decay strategy: when the loss stops decreasing for 10 epochs, the learning rate is reduced by a factor of 0.5.
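A sketch of this training configuration, with `model`, `criterion` and `train_loader` assumed to be defined elsewhere:

```python
# Sketch of the training loop: Adam with lr = 1e-4, batch size 16,
# 300 epochs, and ReduceLROnPlateau halving the learning rate when
# the loss plateaus for 10 epochs.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10)

for epoch in range(300):
    epoch_loss = 0.0
    for raw, reference in train_loader:      # 112x112 pairs, batches of 16
        optimizer.zero_grad()
        enhanced = model(raw)
        loss = criterion(enhanced, reference)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss / len(train_loader))
```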
Experiments performed on the UIEB dataset
The application first selects underwater images from UIEB and then classifies these images into five categories: greenish images, bluish images, yellowish images, low-backscatter scenes (where the distance between the camera and the scene is short) and high-backscatter scenes (where the distance is long). The different categories of images are then enhanced by the different methods, and the enhancement results are compared qualitatively.
Photographs taken underwater usually exhibit color casts because red, green and blue light decay at different rates. In addition, particles suspended in the water absorb blue light, resulting in a yellowish tint that deepens as the distance the light travels underwater increases.
Furthermore, backscattering causes haze that occludes the underwater image, as ambient light is reflected by suspended particles. Histogram equalization effectively improves the contrast of the image, but it can lead to significant oversaturation. GDCP brightens the underwater image. UDCP can significantly decolorize underwater images but exacerbates the color cast. UWGAN improves the brightness and contrast of the underwater image, but the enhanced image appears bluish. Water-Net effectively reduces artifacts but suffers from local oversaturation.
Ucolor shows less color cast, and the method proposed in the present application improves contrast and saturation appropriately, making the foreground more natural, although obvious color casts still remain in some results. In summary, most methods can effectively remove haze and improve the quality of underwater images; however, introduced artifacts, over-enhancement and mottling remain problems to be overcome for deep-learning-based approaches. To quantitatively evaluate the performance of the different methods, three common full-reference metrics (MSE, PSNR and SSIM) are selected to evaluate the enhancement results on the UIEB dataset. A higher PSNR or a lower MSE means that the enhanced image is closer to the reference image in content, and a higher SSIM means that it is more similar to the reference image in structure. In addition, the underwater color image quality evaluation (UCIQE) and the underwater image quality measure (UIQM) are selected as no-reference image quality metrics: UCIQE evaluates underwater image quality through chroma, saturation and contrast, while UIQM measures it through underwater image colorfulness, sharpness and contrast.
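For the full-reference metrics, a sketch using scikit-image follows (UCIQE and UIQM have no standard library implementation and are not reproduced here):

```python
# Sketch of the full-reference metrics (MSE, PSNR, SSIM); images are
# HxWx3 uint8 arrays.
import numpy as np
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

def full_reference_scores(enhanced: np.ndarray, reference: np.ndarray):
    mse = mean_squared_error(reference, enhanced)
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced,
                                 channel_axis=-1, data_range=255)
    return mse, psnr, ssim
```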
The full-reference results of the different methods on the UIEB dataset are reported in Table 1, and the no-reference results are reported in Table 2. A higher UCIQE or UIQM score indicates better agreement with human visual perception. As shown in Table 1, the proposed SCAUIE-Net performs best on all full-reference metrics, with Ucolor second best. The highest scores obtained by SCAUIE-Net indicate that the method handles details better. The UCIQE and UIQM scores are shown in Table 2: histogram equalization (HE) performs best and GDCP second best on UCIQE, while UWGAN ranks best and Ucolor second best on UIQM. The relatively poor no-reference scores of SCAUIE-Net suggest that the underwater no-reference metrics do not measure human visual perception well.
Table 1. Full-reference image quality assessment (MSE, PSNR and SSIM) on the UIEB dataset.
Table 2. No-reference image quality assessment (UCIQE and UIQM) on the UIEB dataset.
Ablation study
To demonstrate the roles of the spatial attention module and the selective kernel block in the network, SCAUIE-Net without the spatial attention module and SCAUIE-Net without the selective kernel block are evaluated as an ablation study. As shown in Table 3, the spatial attention module significantly improves the performance of the overall model, although it reduces the UIQM score; the selective kernel block improves the UIQM score, although its improvement is less pronounced than that of the spatial attention module.
The spatial attention module and the selective kernel block can effectively remove background color cast, making the image more realistic. While these components handle color cast effectively, they perform less well at preserving image detail and edge contour information. The spatial attention module is insensitive to the local colors of the image, and the background color it produces is relatively monotonous. Compared with the spatial attention module, the selective kernel block obtains more physically reasonable underwater images, although the spatial attention module obtains more visually pleasing ones.
Table 3. Image quality assessment of SCAUIE-Net without the spatial attention module and without the selective kernel block.
The present application constructs an underwater image enhancement method named SCAUIE-Net, which uses both spatial and channel attention mechanisms. A gated fusion strategy and attention mechanisms are applied on the UIEB dataset. Compared with Water-Net, the present application uses a U-Net structure as the backbone, which enlarges the depth and width of the network. Furthermore, the spatial attention module and the selective kernel block of the network can perceive the color differences of the underwater image across color channels and spatial regions. Combined with multiple image quality loss functions, the contrast and saturation of the output image are further improved. On the full-reference metrics, the PSNR of SCAUIE-Net is 3.8156 higher than that of Water-Net, and the SSIM is 0.1289 higher. In the experimental part, the effectiveness of the model is verified through qualitative and quantitative comparisons with other underwater image enhancement methods, and ablation studies demonstrate the effectiveness of each component of SCAUIE-Net.
The foregoing is only a preferred embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any equivalent substitution or modification made, within the scope disclosed by the present application, by a person skilled in the art according to the technical solution and inventive concept of the present application shall be covered by the scope of protection of the present application.

Claims (8)

1. An underwater image enhancement method, comprising the steps of:
acquiring a target underwater image;
determining an underwater enhanced image according to the target underwater image and the image enhancement network; the saturation and brightness of the underwater enhanced image are higher than those of the target underwater image;
the image enhancement network is constructed according to a gated fusion framework; the gated fusion framework comprises a confidence map generator and an image refiner which are connected in sequence; the confidence map generator is constructed from selective kernel convolution and a spatial attention module;
the confidence map generator is configured to generate a predicted confidence map.
2. The underwater image enhancement method according to claim 1, wherein the image enhancement network is determined by:
acquiring training data; the training data comprises training underwater images and corresponding underwater enhanced images;
constructing a gated fusion framework network;
and inputting the training data into the gated fusion framework network, training it according to a loss function, and taking the trained gated fusion framework network as the image enhancement network.
3. The underwater image enhancement method according to claim 1, wherein determining an underwater enhanced image according to the target underwater image and the image enhancement network specifically comprises:
applying gamma correction, white balance and histogram equalization algorithms to the target underwater image to obtain refined images;
and multiplying the refined images with the confidence maps to obtain the underwater enhanced image.
4. The underwater image enhancement method according to claim 2, wherein the loss function combines MS-SSIM loss, perceptual loss and MAE loss.
5. The underwater image enhancement method according to claim 3, wherein the image refiner consists of selective kernel convolutions and two-dimensional convolution layers for preprocessing the target underwater image.
6. An underwater image enhancement system, comprising:
the acquisition module is used for acquiring training data;
and the generation module is used for generating the enhanced image.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the underwater image enhancement method according to any of claims 1 to 5.
8. A computer readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the underwater image enhancement method as claimed in any of claims 1 to 5.
Priority and publication data

Application number: CN202310583227.2A
Filing date / priority date: 2023-05-23
Applicant: Guangdong Ocean University
Title: Underwater image enhancement method, system, electronic equipment and storage medium
Publication number: CN116630188A
Publication date: 2023-08-22
Legal status: Pending
Family ID: 87616516
Country: CN


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036923A (en) * 2023-10-08 2023-11-10 广东海洋大学 Underwater robot target detection method based on machine vision
CN117036923B (en) * 2023-10-08 2023-12-08 广东海洋大学 Underwater robot target detection method based on machine vision


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination