CN115861094A - Lightweight GAN underwater image enhancement model fused with attention mechanism - Google Patents

Lightweight GAN underwater image enhancement model fused with attention mechanism

Info

Publication number
CN115861094A
Authority
CN
China
Prior art keywords
convolution
module
gan
image
lightweight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211465305.0A
Other languages
Chinese (zh)
Inventor
冯建新
韩亚军
潘成胜
孙传林
蔡远航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN202211465305.0A priority Critical patent/CN115861094A/en
Publication of CN115861094A publication Critical patent/CN115861094A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a lightweight GAN underwater image enhancement model integrating an attention mechanism, comprising a generation network and a discrimination network. The generation network comprises an Encoder and a Decoder: the Encoder comprises depth separable convolution modules and attention modules, and the Decoder comprises a plurality of upsampling convolution modules. The discrimination network evaluates images through a Markovian discriminator (PatchGAN). The model performs well on both synthetic and real underwater images, corrects color cast and contrast effectively, and preserves detail information.

Description

Lightweight GAN underwater image enhancement model fused with attention mechanism
Technical Field
The invention relates to the technical field of underwater image enhancement, in particular to a lightweight GAN underwater image enhancement model fused with an attention mechanism.
Background
Water bodies and oceans hold many important information resources, and acquiring them safely and reliably is essential for studying marine life and exploring resources. However, the marine environment differs from land: the underwater environment is complex, and visible light is absorbed and scattered under water, so that acquired underwater videos and images suffer from color cast, blurred details, and similar degradations. Such problems seriously affect subsequent underwater target tracking and detection. Improving underwater image quality and eliminating detail blur is therefore of great significance.
Image enhancement methods are commonly classified as physical or non-physical according to the imaging model. For underwater image enhancement, physical models are built from optical principles and are mainly applied to denoising and color correction, e.g., the dark channel prior (DCP) and underwater DCP. Accurate restoration with such models depends on prior knowledge, which is often not robust across different underwater scenes and causes serious estimation deviation; moreover, fundamental parameters such as water depth and light propagation coefficients are difficult to obtain. Non-physical models ignore the physical degradation mechanism and instead modify specific pixel values of the degraded image to enhance it; typical methods include histogram sliding stretching and multi-scale fusion. Because these methods do not rely on a physical imaging model, they are often insufficient to recover the original scene features, especially color.
Deep learning can model complex nonlinear systems end to end and improve perceived image quality, and it has achieved convincing results in low-level vision tasks such as denoising, deblurring, perceptual enhancement, and contrast adjustment. More and more researchers therefore apply deep learning to underwater image enhancement. In particular, generative adversarial networks (GANs) have developed rapidly in recent years, and more and more non-physical models are applied to underwater image enhancement. Li et al. proposed WaterGAN, an unsupervised generative adversarial network that corrects the underwater color cast of monocular images with good real-time performance. Fabbri et al. proposed UGAN (Underwater GAN), a GAN-based algorithm suited to underwater scenes; to address the lack of underwater image datasets, they expanded the dataset using CycleGAN, effectively improving the contrast and clarity of underwater images. Islam et al. proposed FUnIE-GAN, a model for real-time underwater image enhancement whose loss function evaluates image quality according to overall content, style, and local texture information. Zhang et al. improved on the original GAN: to avoid blurring of image details, they sharpened image edges by introducing a gradient loss; although the model improves image clarity, it suffers from color distortion when handling color cast. Hambarde et al. proposed UW-GAN, an end-to-end underwater generative adversarial network that performs accurate depth prediction on a single underwater image, and provided a synthetic underwater image generation method for large-scale databases.
However, models that directly learn the mapping from degraded to clear images based on GANs cannot both enhance underwater image quality under different water conditions and improve network inference speed. The UW-GAN model uses VGG16 as the encoder of its U-Net generation network; although its enhancement effect is good, its parameter count is large and inference is slow. The FUnIE-GAN model enhances underwater images in real time with high inference speed, but handles different water conditions poorly; for example, on complex color cast it may over- or under-enhance, which reduces the robustness of the model.
Disclosure of Invention
Aiming at the problems that low contrast blurs underwater image details, color distortion turns images blue or green, and existing network models infer slowly, the invention provides a lightweight GAN underwater image enhancement model integrating an attention mechanism.
To this end, the application provides a lightweight GAN underwater image enhancement model integrating an attention mechanism, comprising a generation network and a discrimination network. The generation network comprises an Encoder and a Decoder: the Encoder comprises depth separable convolution modules and an attention module, and the Decoder comprises a plurality of upsampling convolution modules. The discrimination network evaluates images through a Markovian discriminator (PatchGAN).
Further, the depth separable convolution module is formed by combining a depthwise convolution (DW) and a pointwise convolution (PW).
Further, the encoder comprises 5 depth separable convolution modules; each module extracts image features with a depthwise convolution of kernel size 3×3 and stride 2, then adjusts the number of channels with a 1×1 convolution.
Further, the 3×3 depthwise convolution and the 1×1 convolution are each followed by batch normalization (BN) and a ReLU activation function.
Further, each depth separable convolution module is followed by an attention module CBAM, which comprises a channel attention module CAM and a spatial attention module SAM.
Furthermore, the input image first undergoes feature extraction by the depth separable convolution module, then enters the channel attention module CAM, which yields a weight distribution map over the input features highlighting the important features; finally it enters the spatial attention module SAM, which locates the important features.
Furthermore, the Decoder comprises 5 upsampling convolution modules; the first 4 are convolutional layers with 3×3 filters and stride 2, each followed by batch normalization (BN) and a ReLU activation function; the 5th upsampling convolution module converts the feature map to a 256×256×3 image output.
Furthermore, the first 4 layers of the discrimination network use 3×3 convolutional layers with 2× down-sampling; a BN layer and a Leaky ReLU activation layer follow each convolutional layer, and a Tanh activation layer follows the 5th layer. The 5 convolutional layers convert a 256×256×6 input (the real image concatenated with the generated image) into a 16×16×1 output, yielding a 16×16 matrix in which each element corresponds to a receptive field in the input image, so that local features of the image, such as local texture and detail, are captured better.
As a further step, the model is trained with three loss functions combined: adversarial loss, global similarity loss, and content loss. The adopted loss function is defined as follows:
$$L = L_{WGAN} + \lambda_1 L_{L1} + \lambda_2 L_{con}$$

where $L_{WGAN}$ is the adversarial loss function, $L_{L1}$ is the global similarity loss function, $L_{con}$ is the content loss function, and $\lambda_1, \lambda_2$ are weighting factors used to balance the loss terms.
As a further step, the adversarial loss function is expressed as:
$$L_{WGAN} = \mathbb{E}\big[D(y)\big] - \mathbb{E}\big[D(G(x))\big] + \lambda_{GP}\,\mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\Big]$$

where $\hat{x}$ denotes samples on the straight line between the generated image and the corresponding point of the target image, and $\lambda_{GP}$, the weight of the gradient penalty, is 10.
the global similarity loss function is:
Figure BDA0003957228430000052
the content loss function is:
Figure BDA0003957228430000053
where $\Phi(\cdot)$ denotes the extracted high-level features; x and y respectively denote the input original underwater image and the training-set target image; and G and D respectively denote the generation network and the discrimination network.
Compared with the prior art, the adopted technical scheme has the following advantages. The underwater image enhancement model uses PatchGAN as the discrimination network, and on the basis of the FUnIE-GAN model the generation network replaces the parameter-heavy VGG16 of the original U-Net feature extraction network for extracting degraded underwater image features, reducing the parameter count of the network model and improving its inference rate. An attention mechanism is added to the feature extraction module to enhance the underwater image. The model performs well on both synthetic and real underwater images, corrects color cast and contrast better, and ensures that detail information is not lost.
Drawings
FIG. 1 is a schematic diagram of an image enhancement model, where a is a generation network and b is a discrimination network;
FIG. 2 is a diagram of a depth separable convolution;
FIG. 3 is a diagram of the CBAM architecture of the attention Module;
FIG. 4 is a comparison of the loss function training process;
FIG. 5 is a graph comparing processing results of different methods;
FIG. 6 is a graph comparing results of the presence of CBAM modules;
FIG. 7 is a comparison graph showing details of the presence of CBAM modules.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the application, i.e., the embodiments described are only a subset of, and not all embodiments of the application.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
The application provides a lightweight GAN model. On the basis of the FUnIE-GAN model, the generation network uses MobileNet to replace the parameter-heavy VGG16 of the original U-Net generation network, reducing the model's parameter count and thereby addressing the low inference rate of the generation network; compared with other models, inference speed improves by a factor of 1.2 to 3. To make the network attend more to the spatial and channel information of regions strongly affected by the water body, a channel and spatial attention mechanism is introduced into the feature extraction module, removing underwater color interference and enhancing image details.
As shown in fig. 1, the lightweight GAN underwater image enhancement model integrating an attention mechanism comprises a generation network and a discrimination network. The generation network includes an Encoder and a Decoder: the Encoder includes depth separable convolution modules and attention modules, and the Decoder includes a plurality of upsampling convolution modules, taking the output of the Encoder as its input and enhancing feature learning. This connection reduces the information the model loses during down-sampling. The discrimination network evaluates images through a Markovian discriminator (PatchGAN).
As shown on the left side of fig. 1, exploiting the small size and high accuracy of the lightweight MobileNet model, the Encoder replaces the original convolution modules of the generation network with depth separable convolutions as the feature extraction network. A depth separable convolution module combines a depthwise convolution (DW) with a pointwise convolution (PW); the basic structure of MobileNet is shown in fig. 2.
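The following is a minimal PyTorch sketch of one such encoder block, assuming the structure stated above (3×3 depthwise convolution with stride 2, then 1×1 pointwise convolution, each followed by BN and ReLU); the class and argument names are illustrative, not taken from the patent.

```python
import torch.nn as nn

class DSConvBlock(nn.Module):
    """Depthwise separable convolution block: DW 3x3 (stride 2) + PW 1x1."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            # depthwise (DW): one 3x3 filter per input channel; stride 2 halves H and W
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=2, padding=1, groups=in_ch),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # pointwise (PW): 1x1 convolution adjusts the number of channels
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```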
The encoder includes 5 depth separable convolution modules, as shown in fig. 2. In each module, image features are extracted by a depthwise convolution with kernel size 3×3 and stride 2, and the number of channels is then adjusted by a convolution with kernel size 1×1. Both the 3×3 depthwise convolution and the 1×1 convolution are followed by batch normalization (BN) and a ReLU activation function. Each depth separable convolution module is followed by an attention module CBAM, which comprises a channel attention module CAM and a spatial attention module SAM, as shown in fig. 3. The input image first undergoes feature extraction by the depth separable convolution module and then enters the channel attention module CAM, which produces a weight distribution map indicating which information in the input is important; the spatial attention module SAM then focuses on where in the input features the information matters most. Combining the two attention mechanisms strengthens the network's learning ability.
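Below is a sketch of the CBAM module following the published CBAM design (channel attention, then spatial attention); the reduction ratio of 16 and the 7×7 spatial kernel are common defaults assumed here, as the patent does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    """Channel attention (CAM) followed by spatial attention (SAM)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # CAM: shared MLP applied to average- and max-pooled channel descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # SAM: 7x7 convolution over stacked channel-wise average and max maps
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # channel attention: which feature channels are important
        w = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                          self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * w
        # spatial attention: where in the feature map the information matters
        s = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```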
The Decoder comprises 5 upsampling convolution modules. The first 4 are convolutional layers with 3×3 filters and stride 2, each followed by batch normalization (BN) and a ReLU activation function; the 5th upsampling convolution module converts the feature map to a 256×256×3 image output.
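A sketch of the decoder follows, under the assumption that each "3×3 filter, stride 2" upsampling layer is a transposed convolution (the patent does not name the upsampling operator); the channel widths and the final activation are illustrative.

```python
import torch.nn as nn

def up_block(in_ch, out_ch):
    """One upsampling convolution module: 3x3 transposed conv (stride 2) + BN + ReLU."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=2,
                           padding=1, output_padding=1),  # doubles H and W
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# four up_blocks, then a 5th module mapping the feature map to a 256x256x3 image
decoder = nn.Sequential(
    up_block(256, 128), up_block(128, 64), up_block(64, 32), up_block(32, 16),
    nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
    nn.Tanh(),  # assumed output activation to bound pixel values
)
```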
As shown on the right side of fig. 1, the discrimination network of the invention uses a Markovian discriminator (PatchGAN). Unlike a conventional discriminator, its inputs are the ground truth and the generated image, and its output is a 16×16 feature matrix rather than a single number. This is equivalent to dividing the input image into small patches, which improves discrimination accuracy; segmenting the image into patches also helps the model capture more detail during training. In the discrimination network, the first 4 layers use 3×3 convolutional layers with 2× down-sampling, each followed by a BN layer and a Leaky ReLU activation layer; a Tanh activation layer follows the 5th layer. The 5 convolutional layers convert a 256×256×6 input (the real image concatenated with the generated image) into a 16×16×1 output, yielding a 16×16 matrix in which each element corresponds to a receptive field of the input image, so that local features such as local texture and detail are captured better.
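A minimal sketch of this discriminator, matching the layer description above (four 3×3 stride-2 conv + BN + Leaky ReLU layers, then a 5th convolution with Tanh mapping a 256×256×6 input to a 16×16×1 score map); the channel widths are assumptions.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Markovian discriminator (PatchGAN): 256x256x6 input -> 16x16x1 output."""
    def __init__(self):
        super().__init__()
        chans = (6, 64, 128, 256, 512)  # illustrative channel widths
        layers = []
        for i in range(4):  # four stride-2 layers: 256 -> 128 -> 64 -> 32 -> 16
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                       nn.BatchNorm2d(chans[i + 1]),
                       nn.LeakyReLU(0.2, inplace=True)]
        # 5th layer: single-channel 16x16 patch responses, Tanh activation
        layers += [nn.Conv2d(chans[4], 1, 3, stride=1, padding=1), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, real, generated):
        # each element of the 16x16 output scores one receptive field (patch)
        return self.net(torch.cat([real, generated], dim=1))
```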
To better restore the visual effect of the image while preserving as much detail as possible, the invention trains the model with three loss functions combined: adversarial loss, global similarity loss, and content loss. The loss function of the invention is defined as follows:
$$L = L_{WGAN} + \lambda_1 L_{L1} + \lambda_2 L_{con}$$

where $L_{WGAN}$ is the adversarial loss function, $L_{L1}$ is the global similarity loss function, $L_{con}$ is the content loss function, and $\lambda_1, \lambda_2$ are weighting factors balancing the loss terms, with $\lambda_1 = 0.7$ and $\lambda_2 = 0.3$.
The traditional GAN is optimized on the basis of JS (Jensen-Shannon) divergence and KL (Kullback-Leibler) divergence, but under this optimization the loss training is unstable and suffers from problems such as vanishing gradients, which cause model training to collapse. The invention therefore uses WGAN with a gradient penalty (WGAN-GP), whose loss function is expressed as:
$$L_{WGAN} = \mathbb{E}\big[D(y)\big] - \mathbb{E}\big[D(G(x))\big] + \lambda_{GP}\,\mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\Big]$$

where $\hat{x}$ denotes samples on the straight line between the generated image and the corresponding point of the target image, and $\lambda_{GP}$, the weight of the gradient penalty, is 10.
Since the $L_1$ loss is less prone to introducing blur, the invention optimizes the $L_1$ distance between the output of the generation network and the target image, so that the output of generation network G stays consistent with the reference images of the training set. It is defined as follows:

$$L_{L1} = \mathbb{E}\big[\|y - G(x)\|_1\big]$$
in order to make the generated image more real and improve the visual effect of the enhanced image, content loss is added in the objective function, and the perception distance is used as the form of content loss. Introducing a VGG19 pre-training network, constructing content loss by extracting high-level features of the 3 rd convolutional layer output before the 4 th maximal pooling, which is defined as follows:
Figure BDA0003957228430000093
where $\Phi(\cdot)$ denotes the extracted high-level features; G and D respectively denote the generation network and the discrimination network; and x and y respectively denote the input original underwater image and the training-set target image.
The method is implemented on the PyTorch deep learning framework; the experimental CPU is a 12th Gen Intel(R) Core(TM) i5-12500H at 3.10 GHz and the GPU is an NVIDIA GeForce RTX 3050 Ti. The training model uses the Adam optimizer with batch size 8, an initial learning rate of 0.01, and 150 epochs in total, with the learning rate multiplied by 0.5 every 30 epochs. The global similarity loss weight $\lambda_1$ is 0.7 and the content loss weight $\lambda_2$ is 0.3. The public underwater dataset EUVP is selected as the training set to verify the validity of the model; the dataset comprises more than 12000 pairs of underwater images and corresponding clear ground-truth images, of which 7000 pairs are selected for training and the rest are used as test data. Compared subjectively and objectively with traditional models and deep learning models, the experimental results show that this model enhances underwater images better under complex conditions.
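A sketch of this training schedule, assuming generator G, critic D, and an EUVP data loader are defined as in the sketches above; only the scheduling details stated in the text (Adam, batch size 8, lr 0.01, 150 epochs, halved every 30) come from the patent.

```python
import torch

opt_g = torch.optim.Adam(G.parameters(), lr=0.01)
opt_d = torch.optim.Adam(D.parameters(), lr=0.01)
# learning rate multiplied by 0.5 every 30 epochs
sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=30, gamma=0.5)
sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=30, gamma=0.5)

for epoch in range(150):
    for x, y in loader:  # EUVP pairs: degraded input x, clear reference y
        ...              # alternate critic and generator updates per batch
    sched_g.step()
    sched_d.step()
```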
Starting from the training process and the training results, combined training tests were performed on the CBAM module and the depth separable convolution modules in the U-Net encoder of the generator network. The underwater image quality measure (UIQM) and the inference speed (average runtime per image, measured on a single GPU) were selected as metrics. Table 1 shows various combinations of CBAM modules and downsampling convolution modules, where F1-F4 denote the first to fourth convolution modules, respectively. After training, the evaluations were run on the same test set drawn from the EUVP dataset.
As seen from Table 1, adding a CBAM module after every convolution module is comparable in inference speed to adding a CBAM module after a single convolution module, but is clearly superior to the other combinations in the effect on test images.
TABLE 1 comparison of the results of the single layer convolution module and CBAM combinations
[Table data are reproduced only as an image in the original publication.]
Table 2 shows the test evaluation results for combinations of two convolution modules with CBAM. As seen from Table 2, adding CBAM after two convolution modules is clearly better than adding CBAM after only one, but the model's inference is relatively slower.
TABLE 2 comparison of the results of the two-layer convolution Module and CBAM combination
[Table data are reproduced only as an image in the original publication.]
Table 3 shows the test evaluation results for combinations of three convolution modules with CBAM. As seen from Table 3, combining three convolution modules with CBAM gives better results than the two-module combinations.
TABLE 3 comparison of the results of the three-layer convolution Module and CBAM combination
[Table data are reproduced only as an image in the original publication.]
In summary, adding CBAM after the first three convolution modules achieves a UIQM close to that of adding CBAM after every convolution module, with faster inference. The invention may therefore also use the combination of the first three convolution modules and the CBAM module as the encoder structure of the generation network.
The invention performs combined training and analysis on the loss functions according to the training process and training results. FIG. 4 compares loss function combinations during training, where $L_{all}$ is the combination of all three loss functions, $L_{WGAN+L1}$ is the combination of the adversarial loss with the global similarity loss, and $L_{WGAN+C}$ is the combination of the adversarial loss with the content loss; n denotes the number of training iterations and loss the training loss value.
As seen from fig. 4(a), training with only the adversarial loss oscillates strongly, converges unstably, and requires many cycles. By comparison, training with the invention's three-loss combination is more stable and less dispersed. As seen in fig. 4(b), the $L_{all}$ and $L_{WGAN+L1}$ curves behave similarly, with loss values clearly lower than the $L_{WGAN+C}$ curve; but as iterations increase, the $L_{WGAN+L1}$ curve shows small fluctuations after converging, while the $L_{all}$ curve remains more stable. In conclusion, the combination of the three loss functions performs better and more stably. In addition, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected as evaluation metrics. Table 4 shows the test evaluation results on the EUVP dataset after model training; as seen from Table 4, the combination of all three loss functions is clearly superior to the other combinations.
TABLE 4 comparison of evaluation indexes for different combinations of loss functions
[Table data are reproduced only as an image in the original publication.]
To verify the effectiveness of the model on real underwater images, 150 underwater images are used as test data. FIG. 5 compares effect plots across different models. The compared traditional models include the underwater dark channel prior (UDCP), underwater image restoration based on image blurriness (IBLA), and underwater image restoration based on a rapid scene-depth estimation model (ULAP); the compared deep learning models include the adversarial underwater image enhancement network UGAN, the residual-network-based generative model Deep SESR, and the GAN-based fast underwater image enhancement FUnIE-GAN. Fig. 5(a) is a real underwater image, and figs. 5(b)-5(h) show the results of UDCP, IBLA, ULAP, UGAN, Deep-SESR, FUnIE-GAN, and the present invention, respectively. Compared with the other models, the processing results of the proposed model are closer to real images: the color cast caused by the water body is eliminated, and the processed images have higher contrast, higher clarity, and more vivid colors.
To analyze and evaluate the performance of the model more objectively, the underwater image quality measure (UIQM) and the natural image quality evaluator (NIQE) are used as evaluation metrics. UIQM is composed of three component measures: the underwater image colorfulness measure (UICM), the underwater image sharpness measure (UISM), and the underwater image contrast measure (UIConM). The larger the UIQM value, the higher the image quality. The formula is
$$UIQM = c_1 \times UICM + c_2 \times UISM + c_3 \times UIConM$$

where $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$.
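As a direct instance of this formula (the component measures UICM, UISM, and UIConM are assumed to be computed elsewhere):

```python
def uiqm(uicm, uism, uiconm, c1=0.0282, c2=0.2953, c3=3.5753):
    """Combine the three component measures into the UIQM score."""
    return c1 * uicm + c2 * uism + c3 * uiconm
```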
NIQE requires no training on distorted images rated by human subjective evaluation, and compared with the traditional metrics PSNR and SSIM it reflects the quality of image reconstruction more effectively. The smaller the NIQE value, the higher the image quality. It is expressed by the formula
$$NIQE = \sqrt{(\nu_1 - \nu_2)^{T}\left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1}(\nu_1 - \nu_2)}$$

where $\nu_1$ and $\nu_2$ respectively denote the mean vectors of the natural multivariate Gaussian (MVG) model and the distorted image's MVG model, and $\Sigma_1$ and $\Sigma_2$ respectively denote the covariance matrices of the natural MVG model and the distorted-image MVG model.
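A sketch of this distance, assuming the means and covariances of the two MVG models have already been fitted:

```python
import numpy as np

def niqe(v1, v2, sigma1, sigma2):
    """NIQE distance between the natural MVG model (v1, sigma1)
    and the distorted image's MVG model (v2, sigma2)."""
    d = v1 - v2
    mid_cov = (sigma1 + sigma2) / 2.0  # average the two covariance matrices
    return float(np.sqrt(d.T @ np.linalg.pinv(mid_cov) @ d))
```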
Table 5 shows the UIQM results of images processed by the various methods. After enhancement by this model, most results achieve the best score; compared with the average scores of the 6 comparison models, UIQM improves markedly, and it improves by about 0.21 over the FUnIE-GAN algorithm.
TABLE 5 different UIQM evaluation index comparisons
[Table data are reproduced only as an image in the original publication.]
Table 6 shows the NIQE results of images processed by the various methods. The mean NIQE of this model is significantly lower than that of the other models, a reduction of about 0.65 compared with the FUnIE-GAN algorithm.
TABLE 6 comparison of NIQE evaluation indexes by different methods
[Table data are reproduced only as an image in the original publication.]
In conclusion, the comparison of the various algorithms on the no-reference image quality metrics UIQM and NIQE shows that the proposed algorithm restores underwater images better and resolves problems such as color cast.
To further verify the contribution of each improvement to the algorithm's performance, detailed ablation experiments were carried out to validate the effectiveness of the attention module (CBAM) and the lightweight feature extraction module, analyzing results through comparison experiments with and without the CBAM module. Fig. 6 compares underwater image processing results with and without the CBAM module.
As observed in fig. 6, underwater images processed with the CBAM module have higher contrast and clarity than images processed without it. The 2nd image shows that the CBAM module better handles the color cast caused by the water body. The 1st and 5th images show that images processed without the CBAM module suffer from oversaturated color compensation, while images processed with it are closer to the ground-truth images and the processing effect is better. The UIQM scores of the 5 groups of test images were calculated, with the experimental results shown in Table 7: the UIQM of images processed with the CBAM attention module is significantly higher than that of the variant without the attention module.
TABLE 7 comparison of UIQM/NIQE evaluation index results with CBAM Module
[Table data are reproduced only as an image in the original publication.]
Fig. 7 compares picture details processed with and without the CBAM attention module: fig. 7(a) shows the result without CBAM and fig. 7(b) the result with CBAM. As seen from the figure, pictures processed with the CBAM module are clearer in detail.
To verify the effectiveness of the lightweight feature extraction module, the running time of each model was evaluated; Deep-SESR, UGAN, and FUnIE-GAN are deep learning models. The results are shown in Table 8, as the time required to process a 256 × 256 image on the CPU. Although not the fastest, the present model's processing time is comparable to FUnIE-GAN.
TABLE 8 comparison of run times of different methods (in seconds)
[Table data are reproduced only as an image in the original publication.]
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (10)

1. A lightweight GAN underwater image enhancement model integrating an attention mechanism, comprising a generation network and a discrimination network, the generation network comprising an Encoder and a Decoder, characterized in that the Encoder comprises a depth separable convolution module and an attention module, and the Decoder comprises a plurality of upsampling convolution modules; the discrimination network acquires an image through a Markovian discriminator PatchGAN.
2. The attention mechanism fused lightweight GAN underwater image enhancement model as claimed in claim 1, wherein the depth separable convolution module is formed by combining a depthwise convolution and a pointwise convolution.
3. The lightweight GAN underwater image enhancement model fused with attention mechanism as claimed in claim 1, wherein said encoder comprises 5 depth separable convolution modules, each module extracting image features by a depthwise convolution with kernel size 3×3 and stride 2, and then adjusting the number of channels by a convolution with kernel size 1×1.
4. The lightweight GAN underwater image enhancement model fused with attention mechanism as claimed in claim 3, wherein the 3×3 depthwise convolution and the 1×1 convolution are each followed by batch normalization (BN) and ReLU activation functions.
5. The attention mechanism fused lightweight GAN underwater image enhancement model as claimed in claim 1, wherein said depth separable convolution module is followed by an attention module CBAM, said attention module CBAM comprising a channel attention module CAM and a spatial attention module SAM.
6. The attention mechanism fused lightweight GAN underwater image enhancement model as claimed in claim 5, wherein the input image first undergoes feature extraction by the depth separable convolution module, then enters the channel attention module CAM to obtain a weight distribution map over the input features, the weight distribution map highlighting important features in the input image; finally it enters the spatial attention module SAM to obtain the positions of the important features.
7. The lightweight GAN underwater image enhancement model fused with attention mechanism as claimed in claim 1, wherein said Decoder comprises 5 upsampling convolution modules, the first 4 being convolutional layers with 3×3 filters and stride 2, each followed by batch normalization (BN) and a ReLU activation function; the 5th upsampling convolution module converts the feature map to a 256×256×3 image output.
8. The lightweight GAN underwater image enhancement model fused with attention mechanism as claimed in claim 1, wherein the first 4 layers of the discrimination network use 3×3 convolutional layers with 2× down-sampling, a BN layer and a Leaky ReLU activation layer are added after each convolutional layer, a Tanh activation layer is added after the 5th layer, and the 5 convolutional layers convert 256×256×6 input images into 16×16×1 output images, finally yielding a matrix of size 16×16 in which each element corresponds to a receptive field in the input image.
9. The lightweight GAN underwater image enhancement model fused with attention mechanism as claimed in claim 1, wherein the model is trained with three loss functions combined: adversarial loss, global similarity loss, and content loss; the adopted loss function is defined as follows:
$$L = L_{WGAN} + \lambda_1 L_{L1} + \lambda_2 L_{con}$$

where $L_{WGAN}$ is the adversarial loss function, $L_{L1}$ is the global similarity loss function, $L_{con}$ is the content loss function, and $\lambda_1, \lambda_2$ are weighting factors used to balance the loss terms.
10. The lightweight GAN underwater image enhancement model fused with attention mechanism as claimed in claim 9, wherein said adversarial loss function is expressed as:
$$L_{WGAN} = \mathbb{E}\big[D(y)\big] - \mathbb{E}\big[D(G(x))\big] + \lambda_{GP}\,\mathbb{E}_{\hat{x}}\Big[\big(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\big)^2\Big]$$

where $\hat{x}$ denotes samples on the straight line between the generated image and the corresponding point of the target image, and $\lambda_{GP}$, the weight of the gradient penalty, is 10;
The global similarity loss function is:

$$L_{L1} = \mathbb{E}\big[\|y - G(x)\|_1\big]$$
the content loss function is:
Figure FDA0003957228420000033
where $\Phi(\cdot)$ denotes the extracted high-level features; x and y respectively denote the input original underwater image and the training-set target image; and G and D respectively denote the generation network and the discrimination network.
CN202211465305.0A 2022-11-22 2022-11-22 Lightweight GAN underwater image enhancement model fused with attention mechanism Pending CN115861094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211465305.0A CN115861094A (en) 2022-11-22 2022-11-22 Lightweight GAN underwater image enhancement model fused with attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211465305.0A CN115861094A (en) 2022-11-22 2022-11-22 Lightweight GAN underwater image enhancement model fused with attention mechanism

Publications (1)

Publication Number Publication Date
CN115861094A true CN115861094A (en) 2023-03-28

Family

ID=85664841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211465305.0A Pending CN115861094A (en) 2022-11-22 2022-11-22 Lightweight GAN underwater image enhancement model fused with attention mechanism

Country Status (1)

Country Link
CN (1) CN115861094A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116579918A (en) * 2023-05-19 2023-08-11 哈尔滨工程大学 Attention mechanism multi-scale image conversion method based on style independent discriminator
CN116579918B (en) * 2023-05-19 2023-12-26 哈尔滨工程大学 Attention mechanism multi-scale image conversion method based on style independent discriminator
CN117408893A (en) * 2023-12-15 2024-01-16 青岛科技大学 Underwater image enhancement method based on shallow neural network
CN117408893B (en) * 2023-12-15 2024-04-05 青岛科技大学 Underwater image enhancement method based on shallow neural network

Similar Documents

Publication Publication Date Title
Tian et al. Deep learning on image denoising: An overview
CN110992275B (en) Refined single image rain removing method based on generation of countermeasure network
CN111784602B (en) Method for generating countermeasure network for image restoration
Hu et al. Underwater image restoration based on convolutional neural network
CN111275637A (en) Non-uniform motion blurred image self-adaptive restoration method based on attention model
CN113673590B (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
CN115861094A (en) Lightweight GAN underwater image enhancement model fused with attention mechanism
CN112541877B (en) Defuzzification method, system, equipment and medium for generating countermeasure network based on condition
CN111028177A (en) Edge-based deep learning image motion blur removing method
CN112270654A (en) Image denoising method based on multi-channel GAN
CN111861894A (en) Image motion blur removing method based on generating type countermeasure network
CN113284061B (en) Underwater image enhancement method based on gradient network
Fan et al. Multiscale cross-connected dehazing network with scene depth fusion
CN116596792B (en) Inland river foggy scene recovery method, system and equipment for intelligent ship
CN117197627B (en) Multi-mode image fusion method based on high-order degradation model
Han et al. UIEGAN: Adversarial learning-based photorealistic image enhancement for intelligent underwater environment perception
CN115565056A (en) Underwater image enhancement method and system based on condition generation countermeasure network
Zhou et al. Domain adaptive adversarial learning based on physics model feedback for underwater image enhancement
CN113810683B (en) No-reference evaluation method for objectively evaluating underwater video quality
CN115272072A (en) Underwater image super-resolution method based on multi-feature image fusion
CN116757986A (en) Infrared and visible light image fusion method and device
CN115035010A (en) Underwater image enhancement method based on convolutional network guided model mapping
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
CN116402709A (en) Image enhancement method for generating countermeasure network based on underwater attention
Kumar et al. Underwater image enhancement using deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination