CN114936983A - Underwater image enhancement method and system based on depth cascade residual error network - Google Patents


Info

Publication number
CN114936983A
CN114936983A (application CN202210680325.3A)
Authority
CN
China
Prior art keywords
network
image
residual network
underwater
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210680325.3A
Other languages
Chinese (zh)
Inventor
赵铁松
蔡晓文
江楠峰
胡可鉴
陈炜玲
胡锦松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202210680325.3A
Publication of CN114936983A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/34Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/05Underwater scenes
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81Aquaculture, e.g. of fish

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Ultra Sonic Diagnosis Equipment (AREA)

Abstract

The invention relates to an underwater image enhancement method and system based on a deep cascaded residual network. The method comprises the following steps: S1: construct a deep cascaded residual network, and build a training set and a test set in a set proportion; S2: partition the input image into blocks and feed them into the three cascaded subnets of the network, which performs forward propagation to yield the clear image output by the trained network; S3: compute the loss of the output image against the target image, and back-propagate the error according to the loss value to update the network weights; S4: judge whether training of the network is complete and, if so, select the best model; S5: feed the test set into the best model for testing and judge whether it meets expectations; S6: feed the underwater degraded image into the tested deep cascaded residual network to obtain the enhanced underwater image. The method and system help correct the color cast of underwater images, improve contrast and sharpness, and improve the overall visual effect.

Description

Underwater image enhancement method and system based on depth cascade residual error network
Technical Field
The invention belongs to the technical field of image enhancement and restoration, and in particular relates to an underwater image enhancement method and system based on a deep cascaded residual network.
Background
Underwater images often suffer from noise, color distortion, and low contrast due to the attenuation of light as it propagates through water. These problems complicate various tasks, such as the automated detection and identification of fish and other marine species. Accordingly, many underwater image enhancement methods have been proposed to restore or enhance degraded underwater images. To improve underwater image quality, approaches based on enhancement priors, physical models, and deep learning have all been explored. Prior-based methods directly process image pixel values to enhance specific image attributes such as color, contrast, and brightness, while physical-model-based methods use image characteristics and a physical imaging model to recover a clear image. Recently, deep neural networks have achieved remarkable performance in both high-level vision tasks and image processing, owing to their powerful modeling capability and their ability to learn rich features from large amounts of training data. Several deep-learning-based underwater image enhancement methods have likewise been proposed that improve image quality by extracting effective features from synthetic data. Although these methods have made great progress on underwater image tasks, their performance still leaves much room for improvement: underwater images exhibit several types of distortion, and methods that rely on a fixed end-to-end supervised training scheme lack the flexibility to handle degraded images, so image details are lost.
Disclosure of Invention
The invention aims to provide an underwater image enhancement method and system based on a deep cascaded residual network that help correct the color cast of underwater images, improve contrast and sharpness, and improve the overall visual effect.
To achieve this aim, the invention adopts the following technical scheme: an underwater image enhancement method based on a deep cascaded residual network, comprising the following steps:
Step S1: construct a deep cascaded residual network and set its parameters; build a training set and a test set in a set proportion, the training set comprising underwater degraded images and corresponding real images;
Step S2: partition the underwater degraded images of the training set into blocks according to a set ratio and feed them into the three cascaded subnets of the network, so that the network performs forward propagation and outputs a clear image;
Step S3: compute the loss of the network output against the corresponding real image, and back-propagate the error according to the loss value to update the network weights;
Step S4: judge whether training of the network is complete; if so, select the best trained model and go to step S5, otherwise return to step S2;
Step S5: feed the test set into the best model for testing, and judge from the test results whether the model meets expectations; if so, go to step S6, otherwise return to step S2;
Step S6: feed the underwater degraded image to be enhanced into the tested network to obtain the enhanced underwater image.
Furthermore, the deep cascaded residual network consists of three cascaded subnets that restore the degraded underwater image step by step, from coarse to fine. The input image is partitioned in a 4-2-1 scheme before entering the network: the image is divided into 4 non-overlapping blocks fed into the first subnet, into 2 non-overlapping blocks fed into the second subnet, and the original image is fed into the third subnet. The first two subnets are gated encoder-decoder subnetworks used to learn context information, and the third is an original-resolution subnetwork that preserves the required fine textures without any up- or down-sampling. To further improve information transfer between the subnets and visual quality, different modules are embedded between the subnetworks: a detail enhancement block (DEB) learns multi-scale features of the image, and a supervised recovery block (SRB) fuses the preceding information for the final recovery.
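As a concrete illustration of the 4-2-1 blocking described above, the following NumPy sketch splits an input image into non-overlapping blocks. `split_blocks` is a hypothetical helper name, and the choice of top/bottom halves for the 2-block split is an assumption, since the text does not fix the orientation.

```python
import numpy as np

def split_blocks(img, n):
    """Split an H x W x C image into n non-overlapping blocks (4-2-1 scheme).

    Hypothetical helper: n=4 gives the four quadrants for the first subnet,
    n=2 gives two halves for the second subnet (orientation assumed), and
    n=1 returns the original image for the third subnet.
    """
    h, w = img.shape[:2]
    if n == 4:
        return [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
                img[h // 2:, :w // 2], img[h // 2:, w // 2:]]
    if n == 2:
        return [img[:h // 2], img[h // 2:]]
    return [img]

img = np.zeros((256, 256, 3), dtype=np.float32)
patches = split_blocks(img, 4)
print(len(patches), patches[0].shape)  # 4 (128, 128, 3)
```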
Furthermore, the gated encoder-decoder subnetwork first applies a channel attention module to account for the different weights carried by different channel features, and then uses dilated convolutional layers in place of transposed convolutional layers to raise the spatial resolution of the features in the decoder, further enlarging the receptive field and avoiding detail loss.
Further, the original-resolution subnetwork preserves details from the input image to the output image without any down-sampling. Considering the influence of color and of the water body on underwater images, it uses a channel attention block and a pixel attention block to capture pixel and channel information and produce a better enhancement. The subnetwork is composed of several original-resolution blocks, each containing a channel attention block and a pixel attention block.
Further, the detail enhancement block embeds detail features at different scales through a multi-level pyramid structure to obtain the final result. It comprises two 3×3 front-end convolutional layers and several 1×1 convolutional layers. First, the output of the first subnet passes through the front-end convolutional layers, and the result is down-sampled to 1/8, 1/16, and 1/32 of its size to build a three-scale detail pyramid. Second, 1×1 convolutional layers reduce the dimensionality, and the feature maps are up-sampled back to the original size. Finally, the outputs are concatenated and a 3×3 convolutional layer produces the final output. Details of the underwater image in the first subnetwork are thus reconstructed by fusing features at different scales, and the detail-rich feature map is passed on to the next subnetwork. The detail enhancement block is expressed as:
r_0 = σ(C_{3-1}(C_{3-2}(I_{net1-out}))),
r_1 = D_8(r_0), r_2 = D_16(r_0), r_3 = D_32(r_0),
r_11 = σ(C_{1-1}(r_1)), r_22 = σ(C_{1-2}(r_2)), r_33 = σ(C_{1-3}(r_3)),    (1)
r_4 = U_8(r_11), r_5 = U_16(r_22), r_6 = U_32(r_33),
D_out = C_{3-3}(Cat(r_4, r_5, r_6)),
where C_{i-j} denotes a convolutional layer with kernel size i and index j, σ is the ReLU activation function, D_p and U_p denote the pooling and up-sampling operations, and p is the scale factor.
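The three-scale pyramid of Eq. (1) can be sketched at the shape level as follows. Mean pooling for D_p, nearest-neighbour upsampling for U_p, and standing in for the learned 1×1/3×3 convolutions with a plain ReLU are simplifying assumptions made here, so only the multi-scale routing and concatenation are shown.

```python
import numpy as np

def avg_pool(x, p):
    """Downsample an H x W x C map by factor p with mean pooling (the D_p role)."""
    h, w, c = x.shape
    return x[:h - h % p, :w - w % p].reshape(h // p, p, w // p, p, c).mean(axis=(1, 3))

def upsample(x, p):
    """Nearest-neighbour upsampling by factor p (the U_p role)."""
    return x.repeat(p, axis=0).repeat(p, axis=1)

def detail_pyramid(r0):
    """Shape-level sketch of the detail enhancement block in Eq. (1)."""
    relu = lambda x: np.maximum(x, 0)         # sigma; the C_1-j convs are omitted
    branches = []
    for p in (8, 16, 32):                     # three-scale detail pyramid
        r = relu(avg_pool(r0, p))             # downsample, then activate
        branches.append(upsample(r, p))       # back to the original size
    return np.concatenate(branches, axis=-1)  # Cat(r_4, r_5, r_6)

r0 = np.random.rand(64, 64, 8).astype(np.float32)
out = detail_pyramid(r0)
print(out.shape)  # (64, 64, 24)
```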
Further, the supervised restoration block takes the output of the second subnet as a supervision signal and, aided by the supervised prediction, generates an attention map that suppresses less informative features so that only useful features are passed on for training. The process is as follows: first, the output of the second subnetwork is processed by a 1×1 convolutional layer to generate a corresponding residual image y_0; at the same time, the input image of the third subnetwork is processed in the same way to generate y_1. Then y_1 is added to y_0 to generate y_2, which passes through a 1×1 convolutional layer and a sigmoid activation function to generate an attention map. Next, the attention map is multiplied with y_0 to obtain y_3, which contains more of the useful information of the enhanced image. Third, through a skip connection, y_3 is combined with the supervision signal to generate y_4. Finally, y_4 and y_1 are combined into the final feature map, which is fed into the original-resolution subnetwork. Specifically:
y_0 = C_{1-4}(Out_{Stage2}), y_1 = C_{1-5}(In_{Stage3}),
y_2 = y_0 + y_1,
y_3 = ω(C_{1-6}(y_2)) · y_0,    (2)
y_4 = y_3 + Out_{Sub-Network-2},
S_out = Cat(y_4, y_1),
where ω is the sigmoid activation function and C_{i-j} denotes a convolutional layer with kernel size i and index j.
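The attention-gated fusion of Eq. (2) can be illustrated with the sigmoid gate alone. Replacing the 1×1 convolutions C_{1-4} to C_{1-6} with identity maps is an assumption made purely for brevity; what remains is the gating, the skip connection, and the final concatenation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def srb(out_stage2, in_stage3):
    """Sketch of Eq. (2) with the 1x1 convolutions replaced by identity maps."""
    y0 = out_stage2                  # residual from the second subnet
    y1 = in_stage3                   # projected input of the third subnet
    y2 = y0 + y1
    y3 = sigmoid(y2) * y0            # attention map suppresses weak features
    y4 = y3 + out_stage2             # skip connection to the supervision signal
    return np.concatenate([y4, y1], axis=-1)  # S_out = Cat(y4, y1)

a = np.random.rand(32, 32, 4)
b = np.random.rand(32, 32, 4)
s = srb(a, b)
print(s.shape)  # (32, 32, 8)
```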
Further, a weighted sum of the smooth L1 loss and the perceptual loss is used as the training loss of the network; the training process is evaluated in real time, and the trained network and its data are saved in real time. The smooth L1 loss function is expressed as:
L_S = (1/N) Σ_{i=1}^{N} f(y'_i − y_i),    (3)
f(x) = 0.5 x² if |x| < 1, and |x| − 0.5 otherwise,    (4)
where y'_i and y_i denote the real image and the enhanced image at pixel i, and N is the total number of pixels. To obtain a more realistic image, a perceptual loss function is also introduced, which measures the feature difference between the output image and the real image.
The perceptual loss function is expressed as:
L_per = Σ_{j=1}^{M} (1 / (C_j H_j W_j)) ‖V_j(Φ(y')) − V_j(Φ(y))‖,    (5)
where V_j(Φ(y')) and V_j(Φ(y)) denote the enhanced feature map and the real feature map at the jth layer of the VGG network; C_j, H_j, and W_j are the dimensions of the feature map of the jth convolutional layer; and M is the number of feature layers used in the perceptual loss function.
The total loss function is a weighted combination of the two losses above:
L_loss = L_S + λ · L_per,    (6)
where λ adjusts the relative weight of the perceptual loss term.
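A minimal NumPy sketch of the smooth L1 term and the weighted total loss described above. Here `l_per` is a stand-in scalar for the VGG-based perceptual term, and the value of `lam` is illustrative rather than taken from this document.

```python
import numpy as np

def smooth_l1(y_true, y_pred):
    """Smooth L1 loss: quadratic below a difference of 1, linear above."""
    d = np.abs(y_true - y_pred)
    return np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5))

def total_loss(y_true, y_pred, l_per, lam=0.2):
    """Weighted total loss L_loss = L_S + lam * L_per (Eq. (6)); lam is illustrative."""
    return smooth_l1(y_true, y_pred) + lam * l_per

y_ref = np.array([0.0, 2.0])
y_out = np.array([0.5, 0.0])
print(smooth_l1(y_ref, y_out))  # 0.8125 = (0.5*0.5**2 + (2.0-0.5)) / 2
```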
Further, the network training process is evaluated using the full-reference performance indices PSNR and SSIM and the no-reference indices UCIQE and UIQM.
Further, training uses the real-world underwater dataset UIEB, which consists of 890 real underwater degraded images, their corresponding reference images, and 60 underwater degraded images to be enhanced.
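One possible way to build the training and test sets "in proportion" from the UIEB pairs. The 90/10 ratio, the file-name pattern, and the pairing convention are assumptions for illustration; the text only states that the split is proportional.

```python
import random

def split_dataset(pairs, train_ratio=0.9, seed=0):
    """Shuffle (degraded, reference) pairs and split them proportionally."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    idx = list(range(len(pairs)))
    rng.shuffle(idx)
    cut = int(len(pairs) * train_ratio)
    return [pairs[i] for i in idx[:cut]], [pairs[i] for i in idx[cut:]]

# 890 paired images, as in the UIEB dataset; the names are hypothetical
pairs = [(f"deg_{i}.png", f"ref_{i}.png") for i in range(890)]
train, test = split_dataset(pairs)
print(len(train), len(test))  # 801 89
```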
The invention also provides an underwater image enhancement system based on the deep cascaded residual network, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor; when the processor executes the instructions, the steps of the method above are implemented.
Compared with the prior art, the invention has the following beneficial effects: the method and system solve the problem that existing underwater image enhancement algorithms cannot handle multiple underwater distortions simultaneously. A deep cascaded residual network is constructed in which several cascaded subnetworks enhance the degraded image from coarse to fine. The first two subnetworks use attention and a gated fusion strategy to learn multi-scale context information, while the last subnet preserves fine spatial detail. To further generate realistic images, detail enhancement blocks and a supervised restoration block are embedded between the subnets, progressively refining the coarse image residual through detail restoration and attention supervision. Experimental results show that the method corrects the color cast of underwater images, improves contrast and sharpness, and improves the overall visual effect.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention;
FIG. 2 is a diagram of the overall framework of the deep cascaded residual network in an embodiment of the present invention;
FIG. 3 is a block diagram of the original-resolution sub-network in an embodiment of the present invention;
FIG. 4 is a block diagram of the detail enhancement block in an embodiment of the present invention;
FIG. 5 is a block diagram of the supervised restoration block in an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in FIG. 1, the present embodiment provides an underwater image enhancement method based on a deep cascaded residual network, comprising the following steps:
Step S1: construct a deep cascaded residual network (CURE-Net) and set its parameters; build a training set and a test set in a set proportion, the training set comprising underwater degraded images and corresponding real images.
Step S2: partition the underwater degraded images of the training set into blocks according to a set ratio and feed them into the three cascaded subnets of the network, so that the network performs forward propagation and outputs a clear image.
Step S3: compute the loss of the network output against the corresponding real image, and back-propagate the error according to the loss value to update the network weights.
Step S4: judge whether training of the network is complete; if so, select the best trained model and go to step S5, otherwise return to step S2.
Step S5: feed the test set into the best model for testing, and judge from the test results whether the model meets expectations; if so, go to step S6, otherwise return to step S2.
Step S6: feed the underwater degraded image to be enhanced into the tested network to obtain the enhanced underwater image.
As shown in FIG. 2, the deep cascaded residual network consists of three cascaded subnets that restore the degraded underwater image step by step, from coarse to fine. The input image is partitioned in a 4-2-1 scheme before entering the network: the image is divided into 4 non-overlapping blocks fed into the first subnet, into 2 non-overlapping blocks fed into the second subnet, and the original image is fed into the third subnet. The first two subnets are Gated Encoder-Decoder Sub-Networks used to learn context information, and the third is an Original Resolution Sub-Network that preserves the required fine textures without any up- or down-sampling. To further improve information transfer between the subnetworks and visual quality, different modules are embedded between them: a Detail Enhancement Block (DEB) learns multi-scale features of the image, and a Supervised Recovery Block (SRB) fuses the preceding information for the final recovery.
As shown in FIG. 2, the gated encoder-decoder subnetwork (GESNet) first applies a Channel Attention Block to account for the different weights carried by different channel features, and then uses dilated convolutional layers in place of transposed convolutional layers to raise the spatial resolution of the features in the decoder, further enlarging the receptive field and avoiding detail loss.
As shown in FIG. 3, the Original Resolution Sub-Network preserves details from the input image to the output image without any down-sampling. Considering the influence of color and of the water body on underwater images, it uses a Channel Attention Block and a Pixel Attention Block to capture pixel and channel information and produce a better enhancement. The subnetwork is composed of several original resolution blocks, each containing a channel attention block and a pixel attention block.
As shown in FIG. 4, the detail enhancement block embeds detail features at different scales through a multi-level pyramid structure to obtain the final result. It comprises two 3×3 front-end convolutional layers and several 1×1 convolutional layers. First, the output of the first subnet passes through the front-end convolutional layers, and the result is down-sampled to 1/8, 1/16, and 1/32 of its size to build a three-scale detail pyramid. Second, 1×1 convolutional layers reduce the dimensionality, and the feature maps are up-sampled back to the original size. Finally, the outputs are concatenated and a 3×3 convolutional layer produces the final output. Details of the underwater image in the first subnetwork are thus reconstructed by fusing features at different scales, and the detail-rich feature map is passed on to the next subnetwork. The detail enhancement block helps recover the color of the underwater image and improve its visibility, and is expressed as:
r_0 = σ(C_{3-1}(C_{3-2}(I_{net1-out}))),
r_1 = D_8(r_0), r_2 = D_16(r_0), r_3 = D_32(r_0),
r_11 = σ(C_{1-1}(r_1)), r_22 = σ(C_{1-2}(r_2)), r_33 = σ(C_{1-3}(r_3)),    (1)
r_4 = U_8(r_11), r_5 = U_16(r_22), r_6 = U_32(r_33),
D_out = C_{3-3}(Cat(r_4, r_5, r_6)),
where C_{i-j} denotes a convolutional layer with kernel size i and index j, σ is the ReLU activation function, D_p and U_p denote the pooling and up-sampling operations, and p is the scale factor.
As shown in FIG. 5, the supervised restoration block takes the output of the second subnet as a supervision signal and, aided by the supervised prediction, generates an attention map that suppresses less informative features so that only useful features are passed on for training. The process is as follows: first, the output of the second subnetwork is processed by a 1×1 convolutional layer to generate a corresponding residual image y_0; at the same time, the input image of the third subnetwork is processed in the same way to generate y_1. Then y_1 is added to y_0 to generate y_2, which passes through a 1×1 convolutional layer and a sigmoid activation function to generate an attention map. Next, the attention map is multiplied with y_0 to obtain y_3, which contains more of the useful information of the enhanced image. Third, through a skip connection, y_3 is combined with the supervision signal to generate y_4. Finally, y_4 and y_1 are combined into the final feature map, which is fed into the original-resolution subnetwork. Specifically:
y_0 = C_{1-4}(Out_{Stage2}), y_1 = C_{1-5}(In_{Stage3}),
y_2 = y_0 + y_1,
y_3 = ω(C_{1-6}(y_2)) · y_0,    (2)
y_4 = y_3 + Out_{Sub-Network-2},
S_out = Cat(y_4, y_1),
where ω is the sigmoid activation function and C_{i-j} denotes a convolutional layer with kernel size i and index j.
In this embodiment, a weighted sum of the smooth L1 loss and the perceptual loss is used as the training loss of the network; the training process is evaluated in real time, and the trained network and its data are saved in real time. The smooth L1 loss function is expressed as:
L_S = (1/N) Σ_{i=1}^{N} f(y'_i − y_i),    (3)
f(x) = 0.5 x² if |x| < 1, and |x| − 0.5 otherwise,    (4)
where y'_i and y_i denote the real image and the enhanced image at pixel i, and N is the total number of pixels. To obtain a more realistic image, a perceptual loss function is also introduced, which measures the feature difference between the output image and the real image.
The perceptual loss function is expressed as:
L_per = Σ_{j=1}^{M} (1 / (C_j H_j W_j)) ‖V_j(Φ(y')) − V_j(Φ(y))‖,    (5)
where V_j(Φ(y')) and V_j(Φ(y)) denote the enhanced feature map and the real feature map at the jth layer of the VGG network; C_j, H_j, and W_j are the dimensions of the feature map of the jth convolutional layer; and M is the number of feature layers used in the perceptual loss function.
The total loss function is a weighted combination of the two losses above:
L_loss = L_S + λ · L_per,    (6)
where λ adjusts the relative weight of the perceptual loss term.
In this embodiment, the network training process is evaluated using the full-reference performance indices PSNR and SSIM and the no-reference indices UCIQE and UIQM.
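Of the four indices, PSNR is simple enough to state directly; the following is the standard full-reference formula (SSIM, UCIQE, and UIQM involve more machinery and are typically computed with standard toolboxes).

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    """Peak signal-to-noise ratio between a reference and an enhanced image."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)  # in dB

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)
print(round(psnr(a, b), 1))  # 20.0
```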
In this embodiment, training uses the real-world underwater dataset UIEB, which consists of 890 real underwater degraded images, their corresponding reference images, and 60 underwater degraded images to be enhanced.
The present embodiment also provides an underwater image enhancement system based on the deep cascaded residual network, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor; when the processor executes the instructions, the steps of the method above are implemented.
Experiments demonstrate that the proposed method outperforms current state-of-the-art methods. The comparison algorithms include IBLA, RGHS, ULAP, UWCNN, WaterNet, LCNet and Ucolor. The experiments were performed on the UIEB dataset, with the following results:
[Comparison results table rendered as an image in the original publication.]
In addition, the present embodiment performs ablation experiments on the modules to demonstrate the effectiveness of the modules proposed by the invention; the specific data are shown in the following table:
[Ablation results table rendered as an image in the original publication.]
The UIEB test results in the table show the performance gain from the different modules. Clearly, when the detail enhancement module (Detail Enhancement Block) and the supervised restoration module (Supervised Recovery Block) are applied together between two subnets, PSNR reaches its maximum of 26.55 dB; when either the supervised restoration module or the detail enhancement module is omitted, network performance degrades to different degrees; without any module connections between subnets, performance drops significantly, with PSNR reaching only 25.08 dB. Reasonable use of the detail enhancement module and the supervised restoration module thus contributes markedly to the network.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.

Claims (10)

1. An underwater image enhancement method based on a depth cascade residual error network is characterized by comprising the following steps:
step S1: constructing a deep cascade residual error network and setting parameters of the deep cascade residual error network; constructing a training set and a testing set in proportion, wherein the training set comprises an underwater degraded image and a corresponding real image;
step S2: partitioning the underwater degraded images in the training set according to a set proportion, and then respectively inputting the partitioned underwater degraded images into three cascade subnets of the deep cascade residual error network, so that the deep cascade residual error network performs forward propagation to obtain a clear image output by the trained network;
step S3: calculating a loss value of an output image of the deep cascade residual error network compared with a corresponding real image, and performing error back propagation according to the loss value to update a weight value of the deep cascade residual error network;
step S4: judging whether the training of the deep cascade residual error network is complete; if so, selecting the best model obtained during training and executing step S5; otherwise, returning to step S2;
step S5: inputting the test set into the best model of the deep cascade residual error network for testing, and judging from the test results whether the best model meets the expected requirements; if so, executing step S6; otherwise, returning to step S2;
step S6: and inputting the underwater degraded image to be enhanced into the tested depth cascade residual error network to obtain the enhanced underwater image.
2. The underwater image enhancement method based on the depth cascade residual error network, wherein the depth cascade residual error network consists of three cascaded subnets that restore the degraded underwater image progressively from coarse to fine; the input image is divided into blocks in a 4-2-1 proportion and fed into the depth cascade residual error network, that is, the image is divided into 4 non-overlapping blocks input into the first subnet, divided into 2 non-overlapping blocks input into the second subnet, and the original image is input into the third subnet; the first two subnets adopt a gated encoder-decoder subnetwork for learning context information, and the third subnet adopts an original-resolution subnetwork that retains the required fine textures without using any up- or down-sampling operation; to further improve information transfer between the subnets and visual quality, the depth cascade residual error network embeds different modules between different subnets: a detail enhancement module DEB is embedded to learn multi-scale features of the image, and a supervised restoration module SRB is embedded to fuse the preceding information for the final restoration.
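A minimal sketch of the 4-2-1 block partition described in this claim. The split axes (quadrants for 4 blocks, top/bottom halves for 2) are assumptions, since the claim does not specify how the non-overlapping blocks are cut.

```python
import numpy as np

def split_blocks(img, n):
    """Split an H x W x C image into n non-overlapping blocks, n in {1, 2, 4}."""
    h, w = img.shape[:2]
    if n == 1:
        return [img]                                   # original image, third subnet
    if n == 2:
        return [img[: h // 2], img[h // 2:]]           # assumed top/bottom halves
    if n == 4:
        return [img[: h // 2, : w // 2], img[: h // 2, w // 2:],
                img[h // 2:, : w // 2], img[h // 2:, w // 2:]]  # quadrants
    raise ValueError("n must be 1, 2 or 4")

def cascade_inputs(img):
    """Coarse-to-fine inputs for the three cascaded subnets (4-2-1)."""
    return split_blocks(img, 4), split_blocks(img, 2), split_blocks(img, 1)
```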
3. The underwater image enhancement method based on the depth cascade residual error network as claimed in claim 2, wherein the gated encoder-decoder subnetwork first adopts a channel attention module to account for the different weights of information carried by different channel features, and second replaces transposed convolutional layers with dilated convolutional layers to improve the spatial resolution of the features in the decoder, further enlarging the receptive field and avoiding detail loss.
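The dilated-convolution idea in this claim might be sketched as below (PyTorch): a dilated 3×3 convolution enlarges the receptive field while keeping the feature resolution, in place of a transposed convolution. The dilation rate of 2 and the single-layer layout are assumptions, not disclosed in the claim.

```python
import torch
import torch.nn as nn

class DilatedDecoderBlock(nn.Module):
    """One decoder block using a dilated 3x3 conv instead of a transposed conv."""
    def __init__(self, ch, dilation=2):
        super().__init__()
        # padding = dilation keeps the spatial size unchanged for a 3x3 kernel
        self.conv = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation)

    def forward(self, x):
        return torch.relu(self.conv(x))
```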
4. The underwater image enhancement method based on the depth cascade residual error network as claimed in claim 2, wherein the original-resolution subnetwork retains details from the input image to the output image without using any down-sampling operation; considering the influence of color and the water body on underwater images, the original-resolution subnetwork adopts channel attention blocks and pixel attention blocks to exploit pixel and channel information and generate a better enhancement; the original-resolution subnetwork is composed of a plurality of original-resolution blocks, each containing a channel attention block and a pixel attention block.
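The channel and pixel attention blocks of the original-resolution subnetwork might be sketched as follows (PyTorch). The squeeze-and-excitation layout, the reduction factor of 4, and the residual connection are common choices assumed here; the claim does not specify them.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Re-weight each channel by a learned global statistic."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)

class PixelAttention(nn.Module):
    """Re-weight each spatial location with a single attention map."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.pa = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, 1, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.pa(x)

class OriginalResolutionBlock(nn.Module):
    """Conv + channel attention + pixel attention, no up/down-sampling."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.ca = ChannelAttention(ch)
        self.pa = PixelAttention(ch)

    def forward(self, x):
        return x + self.pa(self.ca(torch.relu(self.conv(x))))
```

Stacking several such blocks gives the original-resolution subnetwork, which preserves spatial detail end to end.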
5. The underwater image enhancement method based on the depth cascade residual error network as claimed in claim 2, wherein the detail enhancement module embeds detail features of different scales based on a multilayer pyramid structure to obtain the final result; the detail enhancement module comprises two front-end 3×3 convolutional layers and several 1×1 convolutional layers; first, the output of the first subnet passes through the front-end convolutional layers, whose output is down-sampled by factors of 1/8, 1/16 and 1/32 to build a three-scale detail pyramid; second, the 1×1 convolutional layers are used for dimensionality reduction, and the feature maps are up-sampled to the original size; finally, the outputs are concatenated, and a final output is generated through a 3×3 convolutional layer; the details of the underwater image are reconstructed in the first subnet by fusing features of different scales, and the resulting rich detail feature map is passed to the next subnet; the detail enhancement module is specifically represented as follows:
[Equation rendered as an image in the original publication.]
wherein C_{i-j} denotes a convolutional layer, i denotes the size of the convolution kernel, j denotes the jth convolutional layer, σ is the ReLU activation function, D_p and U_p denote the pooling and up-sampling operations respectively, and p denotes the scale size.
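The detail pyramid of this claim can be sketched as follows (PyTorch): front-end features are pooled to 1/8, 1/16 and 1/32 scale (D_p), reduced with 1×1 convolutions, up-sampled back (U_p), concatenated, and fused by a 3×3 convolution. Channel counts, average pooling, and bilinear up-sampling are assumptions; the claim only names the operations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailEnhancementBlock(nn.Module):
    """Three-scale detail pyramid with 1x1 reduction and 3x3 fusion."""
    def __init__(self, ch):
        super().__init__()
        # two front-end 3x3 convolutional layers
        self.front = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        # one 1x1 reduction layer per pyramid scale (assumed ch//4 output)
        self.reduce = nn.ModuleList([nn.Conv2d(ch, ch // 4, 1) for _ in range(3)])
        self.fuse = nn.Conv2d(ch + 3 * (ch // 4), ch, 3, padding=1)

    def forward(self, x):
        f = self.front(x)
        h, w = f.shape[2:]
        outs = [f]
        for scale, red in zip((8, 16, 32), self.reduce):
            d = F.avg_pool2d(f, kernel_size=scale)        # D_p: 1/scale pooling
            d = red(d)                                    # 1x1 dimensionality reduction
            outs.append(F.interpolate(d, size=(h, w),     # U_p: back to full size
                                      mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(outs, dim=1))          # final 3x3 fusion
```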
6. The underwater image enhancement method based on the depth cascade residual error network as claimed in claim 2, wherein the supervised restoration module uses the output of the second subnet as a supervision signal and, with the help of the supervised prediction, generates an attention map to suppress less informative features and allow only useful features to be trained; the process is represented as follows: first, the output of the second subnet is processed by a 1×1 convolutional layer to generate a corresponding residual image y_0; at the same time, the input image of the third subnet is processed in the same way to generate y_1; then y_1 is added to y_0 to generate y_2, which passes through a 1×1 convolutional layer and a sigmoid activation function to generate an attention map; next, the generated attention map is multiplied by y_0 to obtain y_3, which contains more useful information of the enhanced image; third, using a skip connection, y_3 is combined with the supervision signal to generate y_4; finally, y_4 and y_1 are combined to obtain the final feature map, which is input into the original-resolution subnetwork; specifically:
[Equation rendered as an image in the original publication.]
where ω is the sigmoid activation function, C_{i-j} denotes a convolutional layer, i denotes the size of the convolution kernel, and j denotes the jth convolutional layer.
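The y_0 … y_4 steps of this claim can be sketched as follows (PyTorch). The channel widths and the interpretation of the two "combine" steps as additions are assumptions; the claim names the operations but not the layer sizes.

```python
import torch
import torch.nn as nn

class SupervisedRecoveryBlock(nn.Module):
    """Attention-gated fusion of the second subnet's output (supervision signal)
    with the third subnet's input, following the y_0..y_4 steps."""
    def __init__(self, in_ch, ch):
        super().__init__()
        self.conv_sup = nn.Conv2d(in_ch, ch, 1)   # 1x1 conv on supervision signal
        self.conv_in = nn.Conv2d(in_ch, ch, 1)    # 1x1 conv on third-subnet input
        self.conv_att = nn.Conv2d(ch, ch, 1)      # 1x1 conv before sigmoid

    def forward(self, sup, x):
        y0 = self.conv_sup(sup)                   # residual image from subnet 2
        y1 = self.conv_in(x)                      # processed third-subnet input
        y2 = y0 + y1
        att = torch.sigmoid(self.conv_att(y2))    # attention map
        y3 = att * y0                             # keep only informative features
        y4 = y3 + y0                              # skip connection with supervision
        return y4 + y1                            # final feature map for subnet 3
```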
7. The underwater image enhancement method based on the deep cascade residual error network, wherein a weighted sum of the smooth L1 loss and the perceptual loss is used as the training loss of the network, the training process is evaluated in real time, and the network and the data obtained by training are saved in real time; the smooth L1 loss function is expressed as:
L_S = (1/N) · Σ_{i=1}^{N} smooth_L1(y′_i − y_i)
smooth_L1(x) = 0.5 · x², if |x| < 1; |x| − 0.5, otherwise
where y′_i and y_i denote the enhanced image and the real image at pixel i, and N is the total number of pixels; to obtain a more realistic image, a perceptual loss function is introduced, which measures the feature difference between the output image and the real image;
the perceptual loss function is expressed as:
L_per = (1/M) · Σ_{j=1}^{M} (1 / (C_j · H_j · W_j)) · ||V_j(Φ(y′)) − V_j(Φ(y))||²
where V_j(Φ(y′)) and V_j(Φ(y)) respectively denote the enhanced feature map and the real feature map of the jth layer of the VGG network; C_j, H_j and W_j denote the dimensions of the feature map of the jth convolutional layer in the VGG network; and M is the number of feature layers used in the perceptual loss function;
the total loss function is the weighted sum of the two functions above:
L_loss = L_S + λ · L_per (6)
where λ adjusts the relative weight of the perceptual loss term.
8. The underwater image enhancement method based on the deep cascade residual error network as claimed in claim 7, wherein the network training process is evaluated using the full-reference performance metrics PSNR and SSIM and the no-reference performance metrics UCIQE and UIQM.
9. The underwater image enhancement method based on the depth cascade residual error network, wherein training adopts the real-world underwater dataset UIEB; the UIEB dataset consists of 890 real underwater degraded images with corresponding reference images, plus 60 underwater degraded images to be enhanced.
10. An underwater image enhancement system based on a depth cascade residual network, comprising a memory, a processor and computer program instructions stored on the memory and executable by the processor, which when executed by the processor, are capable of implementing the method steps of any of claims 1 to 9.
CN202210680325.3A 2022-06-16 2022-06-16 Underwater image enhancement method and system based on depth cascade residual error network Pending CN114936983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210680325.3A CN114936983A (en) 2022-06-16 2022-06-16 Underwater image enhancement method and system based on depth cascade residual error network

Publications (1)

Publication Number Publication Date
CN114936983A true CN114936983A (en) 2022-08-23

Family

ID=82869219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210680325.3A Pending CN114936983A (en) 2022-06-16 2022-06-16 Underwater image enhancement method and system based on depth cascade residual error network

Country Status (1)

Country Link
CN (1) CN114936983A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223234A (en) * 2019-06-12 2019-09-10 杨勇 Depth residual error network image super resolution ratio reconstruction method based on cascade shrinkage expansion
CN112070668A (en) * 2020-08-18 2020-12-11 西安理工大学 Image super-resolution method based on deep learning and edge enhancement
CN112348036A (en) * 2020-11-26 2021-02-09 北京工业大学 Self-adaptive target detection method based on lightweight residual learning and deconvolution cascade
CN112419219A (en) * 2020-11-25 2021-02-26 广州虎牙科技有限公司 Image enhancement model training method, image enhancement method and related device
US20210390339A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Depth estimation and color correction method for monocular underwater images based on deep neural network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"CURE-Net: A Cascaded Deep Network for Underwater Image Enhancement", IEEE Journal of Oceanic Engineering, vol. 49, no. 1, 31 January 2024, pages 226-236 *
吴从中; 陈曦; 季栋; 詹曙: "Image denoising combining deep residual learning and perceptual loss", Journal of Image and Graphics (中国图象图形学报), no. 10, 16 October 2018, pages 55-63 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination