CN115565056A - Underwater image enhancement method and system based on conditional generative adversarial network - Google Patents
- Publication number
- CN115565056A CN115565056A CN202211179797.7A CN202211179797A CN115565056A CN 115565056 A CN115565056 A CN 115565056A CN 202211179797 A CN202211179797 A CN 202211179797A CN 115565056 A CN115565056 A CN 115565056A
- Authority
- CN
- China
- Prior art keywords
- image
- layer
- underwater
- global
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/05—Underwater scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Abstract
The invention provides an underwater image enhancement method and system based on a conditional generative adversarial network, which correct the color of degraded underwater images by extracting and fusing multi-scale local features and global features, improve the feature extraction effect by constructing an attention module for underwater image enhancement (AMU), and improve the quality of the generated images while suppressing noise by introducing a perceptual loss and a total variation loss during training. The method can provide clear underwater environment information for high-level visual tasks of intelligent aquaculture, such as behavior monitoring and disease identification, and promote the healthy and sustainable development of intelligent intensive aquaculture.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an underwater image enhancement method and system based on a conditional generative adversarial network.
Background
By 2021, the world aquaculture industry had reached a scale of US$209.42 billion. With the rapid development of the aquaculture industry, the requirements of work such as fish school behavior monitoring and fish disease identification have gradually expanded, so clear underwater images are required to provide highly usable image resources for these high-level visual tasks. Related research has shown that, compared with the original image, the enhanced image yields better results for key-point matching, target detection, target tracking, and the like. The smart aquaculture industry likewise needs clear underwater image resources for visual work such as underwater biological monitoring and underwater fish tracking. However, unlike the atmospheric environment, the water body in an underwater environment absorbs and scatters light, and suspended particles exist in the water; these cause degradation phenomena such as color cast and blurring in underwater images and hinder related underwater work.
Degraded underwater images are difficult to apply directly to the underwater work of intelligent aquaculture and pose challenges to traditional image processing technology. Researchers have therefore gradually carried out studies on underwater image enhancement. Traditional underwater image enhancement methods adopt fixed parameters and physical models, enhancing a degraded image by adjusting its pixel values. However, such methods only handle images from a single environment and cannot adapt to the wide variety of complicated underwater environments. Convolutional neural networks (CNNs) have found widespread use in many computer vision tasks due to their excellent performance, so researchers began to introduce CNNs into the field of underwater image enhancement: a CNN-based underwater image enhancement framework, UIE-Net, was proposed for color correction, and an underwater residual convolutional neural network, URCNN, was proposed by combining a CNN with a residual learning strategy. Since the emergence of the generative adversarial network, it has been widely applied in fields such as image processing, text generation, and audio/video generation. Such a network can generate data similar to real data in an adversarial manner, a characteristic that compensates for the lack of pre-degradation images in underwater image datasets.
WaterGAN was proposed for generating paired underwater image datasets and performing color correction. By combining a cycle-consistent generative adversarial network (CycleGAN) with a dark channel prior algorithm, other researchers proposed an underwater image restoration method based on a multi-scale cycle-consistent generative adversarial network (MCycleGAN), as well as FUnIE-GAN, a new underwater image enhancement model based on a conditional generative adversarial network (CGAN), and further constructed EUVP, a dataset containing paired and unpaired underwater images. These learning-based methods are trained on large amounts of data to accommodate a variety of underwater environments. Improving the clarity of underwater images therefore remains a problem that intelligent aquaculture urgently needs to solve at the present stage.
Disclosure of Invention
In order to solve the above problems, the invention provides an underwater image enhancement method and system based on a conditional generative adversarial network, which perform color correction on degraded underwater images so as to provide a clear visual environment for subsequent visual work.
In one aspect, the invention provides an underwater image enhancement method based on a conditional generative adversarial network, comprising the following steps:
Step 1: acquiring a paired image set of underwater degraded images and corresponding clean images, and dividing it into a training set and a testing set;
Step 2: scaling all images to the same size;
Step 3: model construction, comprising: performing global and local feature extraction on the image based on a codec structure; fusing the global features with the local features of each scale respectively; upsampling the global features layer by layer to perform image restoration, wherein each upsampling layer is connected with the fusion features of the corresponding scale; and sending the generated image to a discriminator network, which judges whether the image comes from real data and prompts the generator network to adjust;
Step 4: training and testing the model, and saving the tested model;
Step 5: processing actual underwater images with the tested model.
Preferably, the codec structure is a modified U-Net network comprising 8 downsampling layers, which performs global and local feature extraction on the input image by layer-by-layer convolution.
More preferably, each downsampling layer consists of a LeakyReLU layer, a two-dimensional convolution layer, and a batch normalization layer.
Preferably, in the downsampling process, the attention module for underwater image enhancement is constructed based on the SENet and NAM modules by replacing the global average pooling module in SENet with the batch normalization scale factors of the NAM module.
More preferably, in the attention module, the input feature map is processed by a batch normalization layer and a 1 × 1 convolution, multiplied by the weight coefficients, passed through a ReLU activation function, a 1 × 1 convolution layer, and a sigmoid activation function, and finally combined with the input feature map through a skip connection.
Preferably, global and local feature fusion is performed before the result of layer-by-layer upsampling is skip-connected with the downsampling result of the same resolution; the fusion process is as follows:
Step 4-1: a convolution layer with kernel size 1 × 1 and stride 1 adjusts the number of channels $c_g$ of the global feature $f_g$ to the number of channels $c_i$ of the local feature map $f_l$ at the corresponding scale $i$; this step is expressed as
$$f_{g1} = F_{conv}(f_g, W)$$
wherein $F_{conv}$ represents a convolution operation and $W$ is a learnable weight;
Step 4-2: $f_{g1}$ is copied $h_i \times w_i$ times, wherein $h_i$ and $w_i$ are the height and width of the local feature map $f_l$ at scale $i$; the operation is expressed as
$$f_{g2} = F_{copy}(f_{g1}, num = h_i \times w_i)$$
Step 4-3: $f_{g2}$ is reshaped to the same dimensions $h_i \times w_i \times c_i$ as $f_l$:
$$f_{g3} = F_{re}(f_{g2}, size = h_i \times w_i \times c_i)$$
wherein $F_{re}$ denotes a reshaping operation;
Step 4-4: $f_{g3}$ and $f_l$ are concatenated:
$$f_{out} = F_{concat}(f_l, f_{g3}).$$
Preferably, the image restoration is performed with a modified U-Net network comprising 8 upsampling layers, each corresponding to a downsampling layer.
More preferably, each upsampling layer comprises a ReLU layer, a bilinear upsampling layer, a convolution layer, and a batch normalization layer.
Preferably, the overall objective function of the model training loss is:
$$L = L_{WGAN\text{-}GP} + \lambda_1 L_1 + \lambda_2 L_p + \lambda_3 L_{TV}$$
wherein $L_{WGAN\text{-}GP}$, $L_1$, $L_p$, and $L_{TV}$ are all loss functions, $\lambda_1 = 10^{-1}$, $\lambda_2 = 10^{-2}$, $\lambda_3 = 10^{-3}$;
wherein $x$ is a degraded underwater image, $gt$ is a real underwater image with good details, $\hat{x}$ is sampled uniformly between the generated image $G(x)$ and the real image $gt$, and $\lambda = 10$.
In another aspect, the present invention provides an underwater image enhancement system based on a conditional generative adversarial network, comprising:
the data set construction module is used for acquiring a paired image set of underwater degraded images and corresponding clean images, and dividing it into a training set and a testing set;
the image processing module is used for scaling all the images into the same size;
a model building module comprising: extracting global and local features of the image based on the structure of a coder-decoder; fusing the global features with the local features of all scales respectively; performing layer-by-layer upsampling on the global features to perform image restoration, wherein each upsampling layer is connected with the fusion features of the corresponding scale; sending the generated image to a discriminator network, judging whether the image is from real data or not, and prompting the generator network to adjust;
the model training and testing module is used for inputting the image into the model for training and testing and storing the tested model;
and the model application module is used for processing the actual underwater image by using the tested model.
The invention has the following beneficial effects. Aiming at the degradation of underwater images, the invention provides an underwater image enhancement method and system based on a conditional generative adversarial network. An attention module for underwater image enhancement (AMU) is constructed at the end of the feature extraction network, improving the feature extraction effect. The weight differences of the trained model are used to highlight key features, and a weight sparsity penalty is applied to the attention module, improving computational efficiency. A perceptual loss and a total variation loss are introduced so that the generated image possesses high-level semantic information similar to that of a real image, enhancing the image generation effect of the generator network and suppressing image noise.
Drawings
FIG. 1 is a diagram of a prior art GAN model architecture;
FIG. 2 is a flow chart of an underwater image enhancement method based on a conditional generative adversarial network according to an embodiment of the present invention;
FIG. 3 is a diagram of the SE module embedded in ResNet;
FIG. 4 is a channel attention submodule diagram;
FIG. 5 is a spatial attention submodule diagram;
FIG. 6 is a block diagram of the SE module and AMU module;
FIG. 7 is a visual comparison of images enhanced using the method of the present invention on the UGAN dataset.
Detailed Description
The embodiments are described in detail below with reference to the accompanying drawings.
Owing to its adversarial training mode, the generative adversarial network (GAN) performs well in fields such as text generation and image processing. A GAN contains two models, a generator and a discriminator. During network training, the generator receives random noise z and produces an instance similar to the original data, denoted G(z), in order to deceive the discriminator. The discriminator determines whether an instance is artificially forged by the generator or comes from genuine data: its input is an instance x, and its output is D(x), the probability that x comes from the real data. The two parties are optimized alternately over successive iterations until they reach equilibrium, i.e., the generator can produce instances with good details, and the discriminator can hardly judge whether the generator's output is real. The overall flow of a GAN is shown in FIG. 1.
The objective function of the GAN model is as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
wherein $\max_D$ refers to updating the parameters of the discriminator $D$ by maximizing the cross-entropy loss $V(D, G)$ with the generator fixed, and $\min_G$ refers to the generator minimizing this cross-entropy loss once the discriminator has maximized the real/fake cross-entropy loss $V(D, G)$. During training, the parameters of the discriminator are generally updated first, because at the beginning of training the discriminator performs poorly and cannot yet push the generator to produce higher-quality instances.
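The alternating optimization described above can be sketched in a few lines of PyTorch. The models, shapes, and learning rates here are hypothetical stand-ins, not the networks of the invention; the sketch only illustrates the two-step update order (discriminator first, then generator).

```python
import torch
import torch.nn as nn

# Minimal sketch (hypothetical stand-in models) of alternating GAN optimization:
# first update the discriminator D on real vs. generated instances, then update G.
torch.manual_seed(0)
G = nn.Linear(8, 16)                                # generator: noise z -> instance G(z)
D = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())   # discriminator: instance -> P(real)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(4, 16)   # batch of "real" data
z = torch.randn(4, 8)       # random noise fed to the generator

# Discriminator step: maximize log D(x) + log(1 - D(G(z)))
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: minimize log(1 - D(G(z))), implemented in the usual
# non-saturating form, i.e. maximize log D(G(z))
loss_g = bce(D(G(z)), torch.ones(4, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Detaching the fake batch in the discriminator step keeps gradients from flowing into the generator while D is being updated.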
Compared with a traditional generative adversarial network, the underwater image enhancement method based on a conditional generative adversarial network introduces condition information into the network input, making the generation result of the whole network more stable and controllable. FIG. 2 shows its flow. The method comprises the following steps:
step 1: data set construction
The UGAN dataset, generated by using CycleGAN to learn the mapping between degraded and clean images, is selected as the training and testing dataset of the method. In this embodiment the dataset contains 6128 image pairs: 6000 pairs are selected as the training set, and the remaining 128 pairs form the test set.
Step 2: image pre-processing
Image preprocessing mainly unifies the image size, scaling every image to consistent dimensions. In embodiments of the present invention, all images are scaled to a size of 256 × 256.
Step 3: Multi-scale feature extraction
The global feature map generally contains the overall information of the image, such as color, texture, and shape, and can enhance the model's perception of the scene environment. The invention draws on the classical U-Net network and performs global and local feature extraction based on a codec structure. In the global and local feature extraction network, the 4 downsampling layers of the original U-Net are expanded to 8, so that local features at more scales are extracted and the semantic information of the global feature map is enriched. In addition, unlike the max-pooling downsampling of U-Net, global and local feature extraction is performed on the input image by layer-by-layer convolution: each downsampling layer consists of a LeakyReLU layer, a two-dimensional convolution layer (kernel size 4, stride 2), and a batch normalization layer, and the output size is $1 \times 1 \times c_g$, where $c_g$ is the number of channels. This downsampling scheme improves the extraction of local features, so that the generated image has more detail.
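The downsampling layer just described can be sketched as follows. The channel progression is an illustrative assumption; the patent specifies only the layer composition (LeakyReLU, 4 × 4 stride-2 convolution, batch normalization) and the 8-layer depth.

```python
import torch
import torch.nn as nn

# One downsampling layer as described above: LeakyReLU -> 4x4 conv, stride 2 -> BatchNorm.
def down_block(in_ch, out_ch):
    return nn.Sequential(
        nn.LeakyReLU(0.2),
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
    )

# Stacking 8 such layers halves the spatial size each time: 256 -> 128 -> ... -> 1,
# ending at the 1 x 1 x c_g global feature described in the text.
x = torch.randn(2, 3, 256, 256)
chs = [3, 64, 128, 256, 512, 512, 512, 512, 512]  # assumed channel counts
for i in range(8):
    x = down_block(chs[i], chs[i + 1])(x)
print(x.shape)  # torch.Size([2, 512, 1, 1])
```

With padding 1, each 4 × 4 stride-2 convolution exactly halves the spatial resolution, which is why eight layers map 256 × 256 down to 1 × 1.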
As the number of downsampling layers increases, the number of extracted features grows. To make the network focus on the key features of the image, an attention module for underwater image enhancement (AMU) is constructed in the downsampling process based on the SENet and NAM modules; this module attends to both detail information and context information, improving the feature extraction effect.
The SENet model can be conveniently embedded in other network structures. SENet focuses on relations along the channel dimension and comprises a Squeeze operation and an Excitation operation. In the Squeeze operation, the model uses global average pooling to encode the features of the entire spatial extent of each channel into a global feature map. In the Excitation operation, SENet learns a weight coefficient for each channel, strengthening the model's ability to discriminate between channel features. In related experiments, embedding the SE module into other networks, such as ResNet and VGG-16, significantly improved their error metrics; the SE module embedded in ResNet is shown in FIG. 3.
The normalization-based attention module (NAM) aims to suppress insignificant feature weights; by imposing a sparsity penalty on the attention weights, it maintains network performance while improving the efficiency of weight computation. The NAM module is built on the CBAM module, redesigning its channel and spatial attention submodules. In a residual network, the module is embedded at the end of the residual structure. In the channel attention submodule, the module uses the scale factor of batch normalization, whose formula is as follows:
$$B_{out} = BN(B_{in}) = \gamma \frac{B_{in} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$
wherein $\gamma$ and $\beta$ are trainable transformation parameters, and $\mu_B$ and $\sigma_B$ are respectively the mean and standard deviation of a mini-batch. The channel attention submodule is shown in FIG. 4, where $M_c$ represents the output, $\gamma$ is the scale factor of each channel, and $\omega$ is the weight of each channel.
The normalized scale factor is also applied in the spatial attention submodule, where it is named pixel normalization. The spatial attention submodule is shown in FIG. 5, where $M_s$ represents the output and $\lambda$ is a scale factor.
In the invention, the global average pooling module in SENet is replaced by the batch normalization scale factors of the NAM module to improve the suppression of non-salient features. The structure of the AMU module is shown in FIG. 6: the input feature map is processed by a batch normalization layer and a 1 × 1 convolution, multiplied by the weight coefficients, passed through a ReLU activation function, a 1 × 1 convolution layer, and a sigmoid activation function, and finally combined with the input feature map through a skip connection.
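One possible reading of the AMU pipeline above can be sketched as follows. The use of the normalized BN scale factors γ as the "weight coefficients", the constant channel width, and the additive form of the skip connection are all our assumptions, not the patent's exact design.

```python
import torch
import torch.nn as nn

class AMU(nn.Module):
    """Sketch of the AMU attention module per our reading of the description:
    BN -> 1x1 conv -> multiply by weight coefficients -> ReLU -> 1x1 conv ->
    sigmoid -> skip connection with the input feature map. The weight
    coefficients are taken as the normalized BN gamma factors (the NAM idea);
    this interpretation is an assumption."""
    def __init__(self, ch):
        super().__init__()
        self.bn = nn.BatchNorm2d(ch)
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, x):
        y = self.conv1(self.bn(x))
        gamma = self.bn.weight.abs()
        w = gamma / gamma.sum()            # per-channel weight coefficients
        y = y * w.view(1, -1, 1, 1)
        att = torch.sigmoid(self.conv2(torch.relu(y)))
        return x + x * att                 # skip connection with the input map

feat = torch.randn(2, 16, 8, 8)
out = AMU(16)(feat)
print(out.shape)  # torch.Size([2, 16, 8, 8])
```

Because the attention map is applied multiplicatively and then added back to the input, the module preserves the feature map's shape and can be dropped into any downsampling stage.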
Step 4: Global and local feature fusion
To let the global feature map with its high-level semantic information improve the processing of low-resolution images as well as the color and detail of the enhanced image, a global and local feature fusion module is constructed before the layer-by-layer upsampling result is skip-connected with the downsampling result of the same resolution, thereby suppressing artifacts in the enhanced image. The module operates as follows:
first, the global feature map f is formed by a convolution layer with a convolution kernel size of 1 × 1 and a step size of 1 g C number of channels g Local feature map f adjusted to correspond to scale i l The same number of channels c i The step is represented as f g1 =F conv (f g ,W)
Wherein, F conv Representing a convolution operation, W is a learnable weight.
Then, for f g1 Making a copy with the number h i ×w i Wherein h is i And w i Local feature map f of scale i l Length and width of, the operation being represented as
f g2 =F copy (f g1 ,num=h i ×w i )
Then, f is mixed g2 Remodelling with f l Same dimension h i ×w i ×c i
f g3 =F re (f g2 ,size=h i ×w i ×c i )
Wherein, F re A remolding operation is shown.
Finally, f is g3 And f l Performing a connecting operation
f out =F concat (f l ,f g3 )
To this end, the global feature map completes the convolution, copy, reshaping and connection steps.
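The four fusion steps can be sketched directly in tensor operations. The sizes $c_g = 512$, $c_i = 128$, $h_i = w_i = 32$ are illustrative assumptions; note that copying $h_i \times w_i$ times and reshaping is equivalent to broadcasting the global vector over the spatial grid.

```python
import torch
import torch.nn as nn

# Sketch of the fusion steps above (convolution, copy, reshape, concatenation).
f_g = torch.randn(1, 512, 1, 1)     # global feature map, 1 x 1 x c_g (assumed c_g = 512)
f_l = torch.randn(1, 128, 32, 32)   # local feature map at scale i, h_i x w_i x c_i

# Step 1: 1x1 convolution with stride 1 adjusts the channel count c_g -> c_i
conv = nn.Conv2d(512, 128, kernel_size=1, stride=1)
f_g1 = conv(f_g)                    # (1, 128, 1, 1)

# Steps 2-3: copy h_i * w_i times and reshape to h_i x w_i x c_i,
# implemented as a broadcast over the spatial grid
f_g3 = f_g1.expand(-1, -1, 32, 32)  # (1, 128, 32, 32)

# Step 4: channel-wise concatenation with the local feature map
f_out = torch.cat([f_l, f_g3], dim=1)
print(f_out.shape)  # torch.Size([1, 256, 32, 32])
```

The fused map carries $2 c_i$ channels, so every spatial position of the local map now also sees the image-level global context.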
Step 5: Feature upsampling
The global feature map is restored to an image by layer-by-layer upsampling, and each upsampling layer is concatenated with the fusion features of the same size to correct the color cast of the original image. The 4 upsampling layers of the U-Net network are expanded to 8, corresponding to the downsampling layers of the feature extraction stage. Each upsampling layer comprises a ReLU layer, a bilinear upsampling layer, a convolution layer (kernel size 4, stride 2), and a batch normalization layer, and the output size is $256 \times 256 \times c_g$, where $c_g$ is the number of channels.
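An upsampling layer of this kind can be sketched as follows. Here the bilinear layer performs the 2× size increase and the 4 × 4 convolution uses stride 1 with 'same' padding; this padding/stride arrangement is our assumption for making the sizes work out, since the text gives only the kernel size and an overall stride figure. The channel counts are likewise illustrative.

```python
import torch
import torch.nn as nn

# One upsampling layer as listed above: ReLU -> bilinear upsampling -> conv -> BatchNorm.
def up_block(in_ch, out_ch):
    return nn.Sequential(
        nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=1, padding='same'),
        nn.BatchNorm2d(out_ch),
    )

# Eight such layers mirror the downsampling path: 1 -> 2 -> ... -> 256.
x = torch.randn(2, 512, 1, 1)
chs = [512, 512, 512, 512, 256, 128, 64, 32, 3]  # assumed channel counts
for i in range(8):
    x = up_block(chs[i], chs[i + 1])(x)
print(x.shape)  # torch.Size([2, 3, 256, 256])
```

In the full generator each of these blocks would additionally be concatenated with the fusion features of matching resolution before the next block, as the text describes.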
Step 6: Image discrimination
The generated image, of size $256 \times 256 \times c_g$ where $c_g$ is the number of channels, is sent to the discriminator network PatchGAN for discrimination. PatchGAN maps the input image to an N × N matrix in which each element is a discrimination value for a small region of the image. Such discrimination can judge more of the image's details, and only when all regions have good details is the whole image judged to be real.
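A minimal PatchGAN-style discriminator illustrating the N × N patch-score map can be sketched as follows; the layer and channel counts are illustrative assumptions, not the patent's exact discriminator.

```python
import torch
import torch.nn as nn

# Each element of the output N x N matrix scores one local patch of the input
# image as real or fake, rather than scoring the whole image at once.
patch_d = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 4, stride=1, padding=1),   # per-patch realness score
)

img = torch.randn(1, 3, 256, 256)
scores = patch_d(img)
print(scores.shape)  # torch.Size([1, 1, 31, 31]) -- a 31 x 31 patch score map
```

Because the network is fully convolutional, the same weights score every patch, and the receptive field of one output element determines the patch size being judged.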
Step 7: Model training and testing
The WGAN-GP loss function is introduced in the model training stage to stabilize training; its formula is as follows:
$$L_{WGAN\text{-}GP} = \mathbb{E}[D(G(x))] - \mathbb{E}[D(gt)] + \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\left\| \nabla_{\hat{x}} D(\hat{x}) \right\|_2 - 1\right)^2\right]$$
wherein $x$ is a degraded underwater image, $gt$ is a real underwater image with good details, $\hat{x}$ is sampled uniformly between the generated image $G(x)$ and the real image $gt$, and $\lambda$ is a weighting factor.
In addition, compared with the L2 loss, the conventional L1 loss causes the generator to produce less blurring; therefore, the invention introduces the L1 loss, whose formula is as follows:
$$L_1 = \mathbb{E}\left[ \left\| gt - G(x) \right\|_1 \right]$$
the invention introduces a perception loss function, and restricts the generated image on the depth characteristic layer surface, so that the generated image has high-level semantic information similar to a real image. The perception loss model is trained on the basis of a VGG-19 network, and weight distribution is carried out on the characteristic matching of each module, wherein the formula is as follows:
wherein the content of the first and second substances,a jth convolution layer, J representing a reference image,to an enhanced image.
To reduce the noise of the generated image and increase its smoothness, the invention introduces the conventional total variation loss function, whose formula is as follows:
$$L_{TV} = \mathbb{E}\left[ \left\| \nabla_h G(x) \right\|_1 + \left\| \nabla_v G(x) \right\|_1 \right]$$
wherein $\nabla_h$ is the horizontal gradient operator and $\nabla_v$ represents the vertical gradient operator.
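A total variation loss of this kind is straightforward to write directly on image tensors. The mean-based normalization below is a standard choice and an assumption on our part; the text does not state the exact normalization.

```python
import torch

def total_variation_loss(img):
    """Anisotropic total variation of an image tensor (N, C, H, W): the mean
    absolute vertical and horizontal gradients."""
    dv = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()  # vertical gradient
    dh = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()  # horizontal gradient
    return dv + dh

flat = torch.ones(1, 3, 8, 8)   # constant image: no variation at all
noisy = torch.rand(1, 3, 8, 8)  # random noise: large pixel-to-pixel variation
print(total_variation_loss(flat).item())        # 0.0
print(total_variation_loss(noisy).item() > 0)   # True
```

Penalizing these gradients pushes the generator toward locally smooth output, which is exactly the noise-suppression role the loss plays here.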
The overall objective function is as follows:
$$L = L_{WGAN\text{-}GP} + \lambda_1 L_1 + \lambda_2 L_p + \lambda_3 L_{TV}$$
before training, all pictures involved in training were scaled to a size of 256 × 256. Model training was performed on Intel (R) Xeon (R) E5-2630 v4 and NVIDIA GTX 1080, with the environment configured as Pytroch 1.5, with the weight of the loss function set at λ =10, λ 1 =10 -1 ,λ 2 =10 -2 ,λ 3 =10 -3 . An Adam optimizer is introduced to replace the traditional gradient descent optimization algorithm, and the initial learning rate is set to be 1e -4 ,β 1 =0.5,β 2 =0.99, batch size set to 16, and number of iterations of model training was 50.
A comparison of UGAN dataset evaluation metrics is shown in table 1.
TABLE 1 UGAN data set evaluation index comparison
PSNR↑ | SSIM↑ | UIQM↑ | UCIQE↑ | |
Fusion | 18.2647 | 0.6437 | 2.7266 | 0.0625 |
IBLA | 20.2019 | 0.6059 | 3.1725 | 0.0523 |
UDCP | 18.6979 | 0.6171 | 3.5883 | 0.0415 |
ULAP | 20.6336 | 0.6535 | 3.3515 | 0.0533 |
UGAN | 23.3311 | 0.7497 | 2.8354 | 0.0392 |
FunieGAN | 22.8422 | 0.7248 | 3.1934 | 0.0788 |
WaterNet | 23.5637 | 0.7491 | 2.4786 | 0.0393 |
Style-Transfer | 24.2179 | 0.7714 | 2.9364 | 0.0695 |
UWCNN | 17.2855 | 0.6332 | 2.3561 | 0.0452 |
MLFcGAN | 25.1974 | 0.7982 | 4.1145 | 0.0533 |
MA-cGAN | 26.1698 | 0.8281 | 5.0935 | 0.0638 |
MA-cGAN has obvious advantages in the PSNR and SSIM metrics. Notably, the traditional methods generally score lower than the learning-based methods, reflecting the advantage of learning-based approaches. Among the no-reference metrics, MA-cGAN has a clear advantage in UIQM, which means the images enhanced by the method of the present invention are at a good level in terms of color balance, sharpness, and contrast.
Because real images lack matched clean counterparts, the invention selects only no-reference metrics to evaluate the quality of the enhanced results; the comparison is shown in Table 2.
TABLE 2 comparison of true data set evaluation indices
UIQM↑ | UICM↑ | UISM↑ | UIConM↑ | UCIQE↑ | |
Fusion | 3.8687 | 3.2421 | 1.8985 | 0.0536 | 0.0465 |
IBLA | 3.7646 | 3.6631 | 1.3435 | 0.0643 | 0.0476 |
UDCP | 3.4876 | 3.1727 | 1.0694 | 0.0319 | 0.0297 |
ULAP | 3.6588 | 3.6379 | 1.2364 | 0.0719 | 0.0488 |
UGAN | 2.4739 | 2.5876 | 0.9506 | 0.0374 | 0.0314 |
FunieGAN | 2.5422 | 3.1297 | 1.1004 | 0.0526 | 0.0469 |
WaterNet | 2.7389 | 2.6592 | 1.0301 | 0.0584 | 0.0421 |
Style-Transfer | 3.3795 | 3.0789 | 1.1373 | 0.0939 | 0.0513 |
UWCNN | 2.5208 | 2.2156 | 1.1123 | 0.0417 | 0.0494 |
MLFcGAN | 3.4831 | 3.0118 | 1.2366 | 0.0549 | 0.0536 |
MA-cGAN | 4.0794 | 3.3511 | 1.1626 | 0.0517 | 0.0562 |
MA-cGAN performs well on the UIQM and UCIQE metrics, indicating that the results of the method of the present invention have good color density and sharpness. On some no-reference metrics the traditional methods outperform the learning-based methods; their enhanced results may have more saturated colors, but images with excessive saturation may be unsuitable for subsequent work such as target detection. The results also show that the method of the invention (MA-cGAN) can be applied to a variety of underwater environments.
The comparison of enhanced images on the UGAN dataset is shown in FIG. 7. The results show that the learning-based methods achieve better effects than the traditional methods. The results of the traditional methods, such as UDCP, IBLA, and ULAP, mostly exhibit supersaturation, while Fusion produces overexposure. Among the learning-based methods, the GAN-based approaches such as UGAN, FUnIE-GAN, and Style-Transfer lose some texture information, while the results of the CNN-based methods, including WaterNet and UWCNN, lack detail. Unlike these results, the output of MLFcGAN appears more natural; compared with MLFcGAN, the results of the method of the invention are further improved in color saturation.
Step 8: Underwater image processing is performed using the trained model.
In addition, the invention also provides an underwater image enhancement system based on a conditional generative adversarial network, comprising:
the data set construction module is used for acquiring a set of underwater degraded images and corresponding clean images and dividing it into a training set and a testing set;
the image processing module is used for scaling all the images into the same size;
the global and local feature extraction module is used for extracting global and local features of the image based on the structure of the coder and the decoder;
the global and local feature fusion module is used for fusing the global feature map with the local features of each scale respectively;
the characteristic up-sampling module is used for carrying out up-sampling on the global characteristic diagram layer by layer to carry out image restoration, and each up-sampling layer is connected with the fusion characteristic of the corresponding scale;
the image discrimination module is used for sending the generated image into the discriminator network, judging whether the image comes from real data or not and prompting the generator network to adjust;
the model training and testing module is used for inputting the image into the model for training and testing and storing the tested model;
and the model application module is used for processing the underwater image by using the tested model.
The present invention is not limited to the above embodiments, and any changes or substitutions that can be easily made by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (10)
1. An underwater image enhancement method based on a condition generation countermeasure network, comprising the following steps:
Step 1: acquiring paired image sets of underwater degraded images and corresponding clean images, and dividing them into a training set and a testing set;
Step 2: scaling all images to the same size;
Step 3: model construction, comprising: extracting global and local features of the image based on an encoder-decoder structure; fusing the global features with the local features at each scale; upsampling the global features layer by layer to restore the image, with each upsampling layer connected to the fused features of the corresponding scale; feeding the generated image into the discriminator network, judging whether the image comes from real data, and prompting the generator network to adjust;
Step 4: training and testing the model, and saving the tested model;
Step 5: processing actual underwater images using the tested model.
2. The underwater image enhancement method based on the condition generation countermeasure network of claim 1, wherein the encoder-decoder structure is a modified U-Net network comprising 8 downsampling layers, which extracts global and local features from the input image through layer-by-layer convolution.
3. The underwater image enhancement method based on the condition generation countermeasure network of claim 2, wherein each downsampling layer is composed of a LeakyReLU layer, a two-dimensional convolution layer, and a batch normalization layer.
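As a rough sketch (not the patented implementation), a downsampling layer with the claim-3 composition could look as follows in PyTorch. The kernel size, stride, and LeakyReLU slope are assumptions borrowed from common encoder designs; the claim only fixes the layer order:

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Claim-3 downsampling layer sketch: LeakyReLU -> Conv2d -> BatchNorm.
    Kernel size 4, stride 2, padding 1, and slope 0.2 are assumed defaults;
    the claim specifies only the layer composition."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each block halves the spatial resolution, as in U-Net encoders.
        return self.block(x)
```

With these assumed hyperparameters, stacking 8 such blocks reduces a 256x256 input to a 1x1 global feature map.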
4. The underwater image enhancement method based on the condition generation countermeasure network of claim 2, wherein, in the downsampling process, an attention module for underwater image enhancement is constructed based on the SENet and NAM modules by replacing the global average pooling module of SENet with the batch normalization scale factors of the NAM module.
5. The underwater image enhancement method based on the condition generation countermeasure network of claim 4, wherein, in the attention module, the input feature map is processed by a batch normalization layer and a 1 x 1 convolution, multiplied by the weight coefficients, then processed by a ReLU activation function, a 1 x 1 convolution layer, and a sigmoid activation function, and finally connected to the input feature map through a skip connection.
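One plausible reading of claims 4 and 5 is sketched below, assuming the "weight coefficients" are the normalized batch-norm scale factors (as in NAM) and the final skip connection is a residual addition; the patent might instead intend elementwise gating, so this is an illustrative assumption, not the claimed design:

```python
import torch
import torch.nn as nn

class NAMAttention(nn.Module):
    """Claim-5 attention sketch: BN -> 1x1 conv -> multiply by BN scale
    factors (the assumed 'weight coefficients') -> ReLU -> 1x1 conv ->
    sigmoid -> skip connection with the input (assumed residual add)."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.conv1(self.bn(x))
        # NAM-style channel importance: normalized |gamma| from the BN layer,
        # replacing SENet's global average pooling.
        gamma = self.bn.weight.abs()
        y = y * (gamma / gamma.sum()).view(1, -1, 1, 1)
        y = torch.sigmoid(self.conv2(torch.relu(y)))
        return x + y
```

Because the sigmoid branch is added rather than multiplied here, the module preserves the input feature map's shape and acts as a learned residual refinement.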
6. The underwater image enhancement method based on the condition generation countermeasure network of claim 2, wherein global and local feature fusion is performed before the skip connection between each layer-by-layer upsampling result and the downsampling result of the same resolution, and the fusion process is as follows:
Step 4-1: adjusting the channel number c_g of the global feature f_g to the channel number c_i of the local feature map f_l at scale i by a convolution with kernel size 1 x 1 and stride 1, this step being expressed as
f_g1 = F_conv(f_g, W)
wherein F_conv denotes the convolution operation and W is a learnable weight;
Step 4-2: copying f_g1 a total of h_i x w_i times, wherein h_i and w_i are the height and width of the local feature map f_l at scale i, this operation being expressed as
f_g2 = F_copy(f_g1, num = h_i x w_i)
Step 4-3: reshaping f_g2 to the same dimensions as f_l, namely h_i x w_i x c_i:
f_g3 = F_re(f_g2, size = h_i x w_i x c_i)
wherein F_re denotes the reshaping operation;
Step 4-4: concatenating f_g3 with f_l:
f_out = F_concat(f_l, f_g3).
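Steps 4-2 through 4-4 amount to broadcasting a global feature vector over a local feature map and concatenating along the channel axis. A minimal NumPy sketch, assuming the learned 1 x 1 convolution of step 4-1 has already projected the global feature to c_i channels:

```python
import numpy as np

def fuse_global_local(f_g: np.ndarray, f_l: np.ndarray) -> np.ndarray:
    """Sketch of claim-6 fusion steps 4-2 to 4-4.
    f_g: (c_i,) global feature, assumed already projected by the step 4-1
         1x1 convolution (the learnable part is omitted here).
    f_l: (h_i, w_i, c_i) local feature map at scale i.
    Returns the concatenated map of shape (h_i, w_i, 2 * c_i)."""
    h, w, c = f_l.shape
    # Steps 4-2/4-3: copy the global vector h*w times, then reshape so that
    # every spatial position carries the same global descriptor.
    f_g3 = np.tile(f_g, (h * w,)).reshape(h, w, c)
    # Step 4-4: channel-wise concatenation of local and global features.
    return np.concatenate([f_l, f_g3], axis=-1)
```

The output doubles the channel count, which is why each upsampling layer that consumes fused features must accept 2 x c_i input channels.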
7. The underwater image enhancement method based on the condition generation countermeasure network of claim 2, wherein the image restoration is performed based on the modified U-Net network, which comprises 8 upsampling layers corresponding to the downsampling layers.
8. The method of claim 7, wherein each of the upsampling layers comprises a ReLU layer, a bilinear upsampling layer, a convolution layer, and a batch normalization layer.
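A hedged sketch of the claim-8 upsampling layer follows; the kernel size and bilinear settings are assumptions, since the claim fixes only the layer order:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Claim-8 upsampling layer sketch: ReLU -> bilinear upsample ->
    Conv2d -> BatchNorm. Kernel size 3 with padding 1 is an assumed
    choice that keeps the upsampled resolution unchanged."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each block doubles the spatial resolution, mirroring one
        # downsampling layer of the encoder.
        return self.block(x)
```

In the decoder described above, `in_ch` would include the concatenated fused-feature channels delivered by the skip connection at that scale.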
9. The underwater image enhancement method based on the condition generation countermeasure network of claim 1, wherein the overall objective function of the model training loss is as follows:
L = L_WGAN-GP + λ1 · L1 + λ2 · L_p + λ3 · L_TV
wherein L_WGAN-GP, L1, L_p, and L_TV are all loss functions, λ1 = 10^-1, λ2 = 10^-2, and λ3 = 10^-3.
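Assuming the four loss terms combine additively with the stated weights (an assumption; the claim text lists the terms and weights but not the combination formula), the objective could be computed as in this sketch, where the adversarial (WGAN-GP) and perceptual (L_p) terms are taken as precomputed scalars:

```python
import numpy as np

def total_loss(pred: np.ndarray, target: np.ndarray,
               l_adv: float, l_perc: float,
               lam1: float = 1e-1, lam2: float = 1e-2,
               lam3: float = 1e-3) -> float:
    """Sketch of the assumed claim-9 objective:
    L = L_WGAN-GP + lam1 * L1 + lam2 * L_p + lam3 * L_TV.
    l_adv and l_perc are assumed to be computed elsewhere (discriminator
    critic and a perceptual feature distance, respectively)."""
    # L1 term: mean absolute pixel error between generated and clean image.
    l1 = np.abs(pred - target).mean()
    # Total-variation term: mean absolute difference of neighboring pixels,
    # encouraging spatial smoothness.
    ltv = (np.abs(np.diff(pred, axis=0)).mean()
           + np.abs(np.diff(pred, axis=1)).mean())
    return l_adv + lam1 * l1 + lam2 * l_perc + lam3 * ltv
```

The descending weights (10^-1, 10^-2, 10^-3) make the adversarial term dominant while the L1, perceptual, and smoothness terms act as regularizers.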
10. An underwater image enhancement system based on a condition generation countermeasure network, comprising:
the data set construction module, which is used for acquiring paired image sets of underwater degraded images and corresponding clean images, and dividing them into a training set and a testing set;
the image processing module, which is used for scaling all images to the same size;
the model construction module, comprising: extracting global and local features of the image based on an encoder-decoder structure; fusing the global features with the local features at each scale; upsampling the global features layer by layer to restore the image, with each upsampling layer connected to the fused features of the corresponding scale; feeding the generated image into the discriminator network, judging whether the image comes from real data, and prompting the generator network to adjust;
the model training and testing module, which is used for inputting images into the model for training and testing and saving the tested model;
the model application module, which is used for processing actual underwater images using the tested model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211179797.7A CN115565056A (en) | 2022-09-27 | 2022-09-27 | Underwater image enhancement method and system based on condition generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115565056A (en) | 2023-01-03 |
Family
ID=84742138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211179797.7A Pending CN115565056A (en) | 2022-09-27 | 2022-09-27 | Underwater image enhancement method and system based on condition generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115565056A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116029947A (en) * | 2023-03-30 | 2023-04-28 | 之江实验室 | Complex optical image enhancement method, device and medium for severe environment |
CN116681627A (en) * | 2023-08-03 | 2023-09-01 | 佛山科学技术学院 | Cross-scale fusion self-adaptive underwater image generation countermeasure enhancement method |
CN116681627B (en) * | 2023-08-03 | 2023-11-24 | 佛山科学技术学院 | Cross-scale fusion self-adaptive underwater image generation countermeasure enhancement method |
CN117391975A (en) * | 2023-12-13 | 2024-01-12 | 中国海洋大学 | Efficient real-time underwater image enhancement method and model building method thereof |
CN117391975B (en) * | 2023-12-13 | 2024-02-13 | 中国海洋大学 | Efficient real-time underwater image enhancement method and model building method thereof |
CN117808712A (en) * | 2024-02-28 | 2024-04-02 | 山东科技大学 | Image correction method based on underwater camera |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112001960B (en) | Monocular image depth estimation method based on multi-scale residual error pyramid attention network model | |
CN115565056A (en) | Underwater image enhancement method and system based on condition generation countermeasure network | |
CN112132959A (en) | Digital rock core image processing method and device, computer equipment and storage medium | |
CN113392711B (en) | Smoke semantic segmentation method and system based on high-level semantics and noise suppression | |
CN111583285A (en) | Liver image semantic segmentation method based on edge attention strategy | |
CN109949200B (en) | Filter subset selection and CNN-based steganalysis framework construction method | |
CN113256494B (en) | Text image super-resolution method | |
CN115620010A (en) | Semantic segmentation method for RGB-T bimodal feature fusion | |
CN115063318A (en) | Adaptive frequency-resolved low-illumination image enhancement method and related equipment | |
CN116563693A (en) | Underwater image color restoration method based on lightweight attention mechanism | |
CN116739899A (en) | Image super-resolution reconstruction method based on SAUGAN network | |
CN112037225A (en) | Marine ship image segmentation method based on convolutional neural networks | |
CN115641391A (en) | Infrared image colorizing method based on dense residual error and double-flow attention | |
CN115100165A (en) | Colorectal cancer T staging method and system based on tumor region CT image | |
Zhang et al. | Mffe: Multi-scale feature fusion enhanced net for image dehazing | |
CN116823659A (en) | Low-light level image enhancement method based on depth feature extraction | |
CN111814693A (en) | Marine ship identification method based on deep learning | |
CN116137043A (en) | Infrared image colorization method based on convolution and transfomer | |
CN114663315B (en) | Image bit enhancement method and device for generating countermeasure network based on semantic fusion | |
CN115660979A (en) | Attention mechanism-based double-discriminator image restoration method | |
CN115205624A (en) | Cross-dimension attention-convergence cloud and snow identification method and equipment and storage medium | |
Wu et al. | Fish Target Detection in Underwater Blurred Scenes Based on Improved YOLOv5 | |
CN116523985B (en) | Structure and texture feature guided double-encoder image restoration method | |
CN117151990B (en) | Image defogging method based on self-attention coding and decoding | |
CN117314751A (en) | Remote sensing image super-resolution reconstruction method based on generation type countermeasure network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||