CN117151987A - Image enhancement method and device and electronic equipment


Info

Publication number
CN117151987A
Authority
CN
China
Prior art keywords
image
enhancement
target
attention
underwater
Prior art date
Legal status
Pending
Application number
CN202210565061.7A
Other languages
Chinese (zh)
Inventor
张玉
周圆
李硕士
陈维强
Current Assignee
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd
Priority to CN202210565061.7A
Publication of CN117151987A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The application provides an image enhancement method and device and an electronic device, including: acquiring an underwater image to be processed; and performing image enhancement processing on the underwater image to be processed based on a target enhancement model to obtain an image-enhanced target underwater image. The target enhancement model is obtained by performing at least one parameter adjustment on an enhancement model to be trained based on a loss function. The loss function is constructed from the adversarial loss corresponding to the output image that has the same scale as the input image, among at least two output images of different scales produced by the enhancement model to be trained, and from the pixel-level losses corresponding to each of the at least two output images. By optimizing the enhancement model to be trained with a loss function that combines the adversarial loss and the at least two pixel-level losses, a target enhancement model is obtained through which a clear, accurate target underwater image consistent with the content structure of the underwater image to be processed can be produced, improving the image enhancement effect for underwater images.

Description

Image enhancement method and device and electronic equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image enhancement method, an image enhancement device, and an electronic device.
Background
Underwater images, as a primary medium for recognizing, exploring and developing the ocean, play an irreplaceable role in marine fields such as ocean engineering and resource development. However, the special underwater imaging environment causes complex degradation, such as color distortion and detail blurring, in directly acquired underwater images, which greatly reduces their usability in visual tasks such as target detection and semantic segmentation.
Therefore, in order to improve the usability of underwater images in visual tasks such as target detection and semantic segmentation, image enhancement processing is generally applied to obtain clear underwater images; image enhancement of underwater images has thus become a hot topic in computer vision.
Currently, image enhancement methods for underwater images can be classified into three types: restoration methods based on traditional imaging models, enhancement methods based on non-physical models, and methods based on deep learning. Restoration methods based on traditional imaging models and enhancement methods based on non-physical models suffer from high computational complexity, poor real-time performance, poor generalization ability and mediocre enhancement results. Methods based on deep learning have relatively low computational complexity and good enhancement results, but still suffer from problems such as incomplete color cast recovery and insufficient detail enhancement.
Therefore, how to improve the image enhancement effect of underwater images is a problem that currently needs to be solved.
Disclosure of Invention
The application provides an image enhancement method, an image enhancement device and electronic equipment, which are used for improving the image enhancement effect of an underwater image.
In a first aspect, an embodiment of the present application provides an image enhancement method, including:
acquiring an underwater image to be processed;
performing image enhancement processing on the underwater image to be processed based on the target enhancement model to obtain an image-enhanced target underwater image;
the target enhancement model is obtained by performing at least one parameter adjustment on an enhancement model to be trained based on a loss function; the loss function is constructed from the adversarial loss corresponding to a first class of output image that has the same scale as the input image, among at least two output images produced by the enhancement model to be trained, and from the pixel-level losses corresponding to each of the at least two output images, the at least two output images having different scales.
In the embodiment of the application, first, at least one parameter adjustment is performed on the enhancement model to be trained based on a loss function constructed from the adversarial loss corresponding to the first class of output image with the same scale as the input image, among the at least two output images produced by the enhancement model to be trained, and from the pixel-level losses corresponding to each of the at least two output images, so that optimization of the enhancement model to be trained is completed jointly and a target enhancement model is obtained. The optimized target enhancement model can produce an enhanced image that is clear and accurate and consistent with the content structure of the input image, where clear and accurate means that the enhanced image exhibits complete color cast recovery and sufficient detail enhancement. Then, based on the target enhancement model, image enhancement processing is performed on the acquired underwater image to be processed to obtain the image-enhanced target underwater image; the target underwater image is then an enhanced image that is clear and accurate and consistent with the content structure of the underwater image to be processed. Therefore, when the image enhancement method provided by the application is used to perform image enhancement processing on an underwater image, the image enhancement effect is improved.
In one possible implementation, the target enhancement model includes: a first encoder, a first attention network, and a first decoder;
based on the target enhancement model, performing image enhancement processing on the underwater image to be processed, including:
extracting features from the underwater image to be processed through the first encoder to obtain at least two initial feature maps;
performing attention processing on the at least two initial feature maps respectively through a first attention network to obtain the corresponding target attention maps;
and, through the first decoder, based on the scales of the obtained at least two target attention maps, splicing each target attention map in turn with the upsampled feature map of the same scale, in order of scale from small to large, until the spliced image has the same scale as the underwater image to be processed, and then determining and outputting the target underwater image based on the spliced image.
In the embodiment of the application, a specific implementation mode for carrying out enhancement processing on an underwater image to be processed based on a target enhancement model is provided, so that the target underwater image which is clear and accurate and consistent with the content structure of the underwater image to be processed is obtained based on the target enhancement model.
In one possible implementation manner, performing feature extraction on the underwater image to be processed through the first encoder to obtain at least two initial feature maps includes:
extracting features from the underwater image to be processed through the first encoder to obtain a first class of initial feature map with the same scale as the underwater image to be processed, and downsampling the first class of initial feature map at least once to obtain a second class of initial feature map.
In the embodiment of the application, feature extraction is performed on the underwater image to be processed to obtain a first class of initial feature map with the same scale as the underwater image to be processed, and a second class of initial feature map is obtained by downsampling the first class of initial feature map at least once; initial feature maps of different scales are thus extracted, so that more comprehensive feature information is captured and the accuracy of the subsequent image enhancement processing is ensured.
In one possible implementation manner, performing attention processing on the at least two initial feature maps respectively includes:
for any initial feature map, performing channel attention processing on the initial feature map to obtain a channel attention map, and multiplying the channel attention map with the initial feature map to obtain an intermediate feature map;
and performing spatial attention processing on the intermediate feature map to obtain a spatial attention map, and multiplying the spatial attention map with the intermediate feature map to obtain the corresponding target attention map.
In the embodiment of the application, channel attention processing and spatial attention processing are applied to the initial feature map in sequence to obtain the corresponding target attention map, which enriches the detail information of the underwater image to be processed and ensures the accuracy of the subsequent image enhancement processing.
In one possible implementation manner, the first decoder includes an error feedback network; the upsampled feature map corresponding to the spliced image is taken as the high-scale feature map, and the feature map from which it was upsampled (i.e., before upsampling) is taken as the low-scale feature map;
after the target attention map is spliced with the upsampled feature map of the same scale, the following operations are performed through the error feedback network:
downsampling the high-scale feature map to obtain a downsampled feature map with the same scale as the low-scale feature map, and determining the feature error between the downsampled feature map and the low-scale feature map;
and performing a deconvolution operation on the feature error to obtain the high-scale error corresponding to the high-scale feature map, and adding the high-scale error to the spliced image after a convolution operation.
In the embodiment of the application, an error feedback network is introduced into the first decoder, and error feedback correction in the feature reconstruction process is realized through the error feedback network, so that inherent relationships such as semantic consistency hold between feature maps of different scales, further ensuring that the output target underwater image is clear and accurate and consistent with the content structure of the underwater image to be processed.
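For illustration, the following is a minimal PyTorch sketch of the error feedback operation described above. The use of a stride-2 convolution for downsampling, the deconvolution kernel size, and the assumption that the spliced image has the same channel count as the feature maps are not specified by this section and are assumptions:

```python
import torch
import torch.nn as nn

class ErrorFeedback(nn.Module):
    """Error feedback correction between a high-scale and a low-scale feature map."""

    def __init__(self, channels: int):
        super().__init__()
        # Downsampling to the low-scale resolution (stride-2 convolution assumed).
        self.down = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
        # Deconvolution lifts the feature error back to the high scale.
        self.up = nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2, padding=1)
        # Convolution applied to the spliced features before the addition.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, high: torch.Tensor, low: torch.Tensor,
                spliced: torch.Tensor) -> torch.Tensor:
        error = self.down(high) - low       # feature error at the low scale
        high_scale_error = self.up(error)   # high-scale error via deconvolution
        # Add the high-scale error to the spliced features after a convolution.
        return self.conv(spliced) + high_scale_error
```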
In one possible implementation, the enhancement model to be trained includes: a second encoder, a second attention network, a second decoder, a discrimination network, and a supervisory optimization network; the target enhancement model is obtained by the following steps:
selecting a training sample pair from the training data set, wherein the training sample pair comprises: an original image and a corresponding synthesized underwater image;
performing feature extraction on the synthesized underwater image through a second encoder to obtain at least two training feature images;
respectively carrying out attention processing on at least two training feature graphs through a second attention network to acquire corresponding training attention diagrams;
splicing, through a second decoder, the at least two training attention maps with the corresponding training upsampled feature maps of the same scale, to obtain at least two output images, where the at least two output images include a first class of output image with the same scale as the synthesized underwater image and a second class of output image with a scale smaller than that of the synthesized underwater image;
determining, through the discrimination network, an adversarial loss based on a first comparison result between the first class of output image and the original image;
determining, through the supervision optimization network, the corresponding pixel-level losses based on a second comparison result between the first class of output image and the original image and a third comparison result between each second class of output image and the reference image of the same scale, where the reference images are obtained by downsampling the original image;
and constructing a loss function based on the adversarial loss and the determined at least two pixel-level losses, and performing at least one parameter adjustment on the enhancement model to be trained through the loss function until a preset condition is met, to obtain the target enhancement model.
In the embodiment of the application, a specific way of training the enhancement network to be trained is provided, so as to obtain a target enhancement model for image enhancement processing. When training the enhancement network to be trained, at least one parameter adjustment is performed on the enhancement model to be trained based on a loss function constructed from the adversarial loss corresponding to the output image with the same scale as the input image, among the at least two output images produced by the enhancement model to be trained, and from the pixel-level losses corresponding to each of the at least two output images, so that optimization of the enhancement model to be trained is completed jointly. This guarantees that the optimized target enhancement model can produce an enhanced image that is clear and accurate and consistent with the content structure of the input image, where clear and accurate means that the enhanced image exhibits complete color cast recovery and sufficient detail enhancement.
In one possible implementation, the synthesized underwater image is determined from the original image based on a successfully trained physical imaging model of underwater images.
In the embodiment of the application, the synthesized underwater image is determined based on a successfully trained physical imaging model of underwater images, so that the training sample pairs in the training data set are determined accurately. On the basis of accurate training sample pairs, the accuracy of the successfully trained target enhancement model is ensured; that is, when an underwater image is enhanced through the target enhancement model, the accuracy of the image-enhanced underwater image is ensured and the image enhancement effect is improved.
In one possible implementation, performing at least one parameter adjustment on the enhancement model to be trained based on the loss function includes:
based on the loss function, performing at least one parameter adjustment on the enhancement model to be trained through an optimizer.
In the embodiment of the application, at least one parameter adjustment is performed on the enhancement model to be trained through an optimizer based on the loss function; handling the optimization through an optimizer ensures the accuracy of the optimization of the enhancement model to be trained.
In a second aspect, an embodiment of the present application provides an image enhancement apparatus, including:
the acquisition unit is used for acquiring the underwater image to be processed;
the processing unit is used for carrying out image enhancement processing on the underwater image to be processed based on the target enhancement model, and obtaining the target underwater image after image enhancement;
the target enhancement model is obtained by performing at least one parameter adjustment on an enhancement model to be trained based on a loss function; the loss function is constructed from the adversarial loss determined from a first class of output image with the same scale as the input image, among at least two output images produced by the enhancement model to be trained, and from the pixel-level losses corresponding to the at least two output images, the at least two output images having different scales.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, wherein the memory is used for storing a computer program; and a processor for executing a computer program to implement the steps of the image enhancement method provided by the embodiment of the application.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, which when executed by a processor, implements the steps of the image enhancement method provided by the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium; when the processor of the electronic device reads the computer program from the computer-readable storage medium, the processor executes the computer program, so that the electronic device executes the steps of the image enhancement method provided by the embodiment of the application.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a first enhancement model to be trained according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a second enhancement model to be trained according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a second encoder according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a second attention network according to an embodiment of the present application;
FIG. 6 is a first schematic diagram of a second decoder according to an embodiment of the present application;
fig. 7 is a schematic diagram of a discrimination network according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a specific structure of an enhancement model to be trained according to an embodiment of the present application;
FIG. 9 is a flowchart of a method for training an enhancement model to be trained according to an embodiment of the present application;
FIG. 10 is a second schematic diagram of a second decoder according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a specific structure of another enhancement model to be trained according to an embodiment of the present application;
FIG. 12 is a flowchart of another method for training an enhancement model to be trained according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a target enhancement model according to an embodiment of the present application;
FIG. 14 is a flowchart of a method for image enhancement according to an embodiment of the present application;
FIG. 15 is a schematic diagram of the enhancement results of each algorithm on a synthesized underwater image according to an embodiment of the present application;
FIG. 16 is a schematic diagram of the enhancement results of each algorithm on a real underwater image according to an embodiment of the present application;
FIG. 17 is a flowchart of an embodiment of an image enhancement method according to the present application;
FIG. 18 is a block diagram of an image enhancement device according to an embodiment of the present application;
FIG. 19 is a block diagram of another image enhancement device according to an embodiment of the present application;
FIG. 20 is a block diagram of an electronic device according to an embodiment of the present application;
fig. 21 is a block diagram of another electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The word "exemplary" is used hereinafter to mean "serving as an example, embodiment, or illustration. Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The terms "first," "second," and the like, herein below are used for descriptive purposes only and are not to be construed as either explicit or implicit relative importance or as indicative of the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more features, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.
The following briefly describes the design concept of the embodiment of the present application.
The application relates to the technical field of image processing, in particular to the technical field of image enhancement processing of underwater images.
Because the underwater image directly acquired in the underwater imaging environment has complex degradation phenomenon, the usability of the underwater image in visual tasks such as target detection, semantic segmentation and the like is reduced. Therefore, in order to improve the usability of the underwater image in visual tasks such as target detection, semantic segmentation and the like, image enhancement processing is generally performed on the underwater image to obtain a clear underwater image.
In the related art, image enhancement methods for performing image enhancement processing on an underwater image can be classified into three types, that is, a restoration method based on a conventional imaging model, an enhancement method based on a non-physical model, and a method based on deep learning, respectively.
However, restoration methods based on traditional imaging models and enhancement methods based on non-physical models have high computational complexity, poor real-time performance, poor generalization ability and mediocre enhancement results when performing image enhancement processing on underwater images. Methods based on deep learning therefore stand out for their relatively low computational complexity and good enhancement results. Nevertheless, deep learning based methods still suffer from problems such as incomplete color cast recovery and insufficient detail enhancement. Here, color cast means that the hue and saturation of a certain color in the image differ noticeably from those of the real image, and incomplete color cast recovery means that pixels whose hue and saturation have not been recovered remain in the enhanced image. That is, after an underwater image is enhanced by a deep learning method, the resulting image is not sufficiently clear and accurate, and the enhancement effect is poor.
Therefore, how to improve the image enhancement effect of the underwater image is a problem that needs to be solved at present when the image enhancement processing is performed on the underwater image.
In view of the above, embodiments of the present application provide an image enhancement method, apparatus and electronic device, and in particular, to an underwater image multi-scale progressive enhancement algorithm based on a coding and decoding structure and an attention mechanism, so as to enhance an image enhancement effect of an underwater image when the underwater image is subjected to image enhancement processing.
In the embodiment of the application, firstly, parameter adjustment is carried out on an enhancement model to be trained at least once so as to obtain a target enhancement model which is successfully trained; then, performing image enhancement processing on the underwater image to be processed through a target enhancement model which is successfully trained, and obtaining a target underwater image after image enhancement;
when the target enhancement model which is successfully trained is obtained by carrying out parameter adjustment on the enhancement model to be trained at least once, the enhancement model to be trained is constructed through a second encoder, a second attention network, a second decoder, a discrimination network and a supervision optimization network, and the enhancement model to be trained is trained in the following manner, so that the target enhancement model which is successfully trained is obtained:
First, a training sample pair is selected from a training data set, wherein the training sample pair comprises: an original image and a corresponding synthesized underwater image;
secondly, inputting the synthesized underwater image in the training sample pair into a second encoder, and extracting features of the synthesized underwater image through the second encoder to obtain at least two training feature images;
then, inputting the obtained at least two training feature maps into a second attention network, and performing attention processing on the at least two training feature maps respectively through the second attention network to obtain the corresponding training attention maps;
then, inputting the obtained at least two training attention maps into a second decoder, and splicing each of the at least two training attention maps with the corresponding training upsampled feature map of the same scale through the second decoder, to obtain at least two output images, where the at least two output images include a first class of output image with the same scale as the synthesized underwater image and a second class of output image with a scale smaller than that of the synthesized underwater image;
finally, inputting the first class of output image into a discrimination network, and determining the adversarial loss through the discrimination network based on a first comparison result between the first class of output image and the original image; inputting the obtained at least two output images into a supervision optimization network, and determining the corresponding pixel-level losses through the supervision optimization network based on a second comparison result between the first class of output image and the original image and a third comparison result between each second class of output image and the reference image of the same scale, where the reference images are obtained by downsampling the original image; and constructing a loss function based on the adversarial loss and the determined at least two pixel-level losses, and performing at least one parameter adjustment on the enhancement model to be trained through the loss function until a preset condition is met, obtaining the successfully trained target enhancement model.
In the embodiment of the present application, artificial intelligence (Artificial Intelligence, AI) and Machine Learning (ML) technologies are involved.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence.
Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision. Artificial intelligence techniques mainly include computer vision techniques, natural language processing techniques, machine learning/deep learning, and other major directions. As artificial intelligence technology research and advances, artificial intelligence has been developed and applied in a variety of fields.
Machine learning is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and the like. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Compared with the data mining, which finds the mutual characteristics among big data, the machine learning is more focused on the design of an algorithm, so that a computer can automatically learn the rules from the data and predict unknown data by utilizing the rules.
Machine learning is the core of artificial intelligence and the fundamental approach to giving computers intelligence; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning. Reinforcement Learning (RL), also known as re-excitation learning or evaluative learning, is a paradigm and methodology of machine learning that describes and solves the problem of an agent learning strategies to maximize returns or achieve specific goals during its interaction with an environment.
In the embodiment of the application, first, at least one parameter adjustment is performed on the enhancement model to be trained based on a loss function constructed from the adversarial loss corresponding to the output image with the same scale as the input image, among the at least two output images produced by the enhancement model to be trained, and from the pixel-level losses corresponding to each of the at least two output images, so that optimization of the enhancement model to be trained is completed jointly and a target enhancement model is obtained. The optimized target enhancement model can produce an enhanced image that is clear and accurate and consistent with the content structure of the input image, where clear and accurate means complete color cast recovery and sufficient detail enhancement of the image obtained after enhancement. Then, based on the target enhancement model, image enhancement processing is performed on the acquired underwater image to be processed to obtain the image-enhanced target underwater image; at this point the target underwater image is an enhanced image that is clear and accurate and consistent with the content structure of the underwater image to be processed. Therefore, when the image enhancement method provided by the application performs image enhancement processing on underwater images, the image enhancement effect is improved.
After the design idea of the embodiment of the present application is introduced, some simple descriptions are made below for application scenarios applicable to the technical solution of the embodiment of the present application, and it should be noted that the application scenarios described below are only used for illustrating the embodiment of the present application and are not limiting. In the specific implementation process, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario according to an embodiment of the present application. The application scenario includes a terminal device 110 and a server 120, where the terminal device 110 and the server 120 may communicate through a communication network.
In an alternative embodiment, the communication network may be a wired network or a wireless network. Accordingly, the terminal device 110 and the server 120 may be directly or indirectly connected through wired or wireless communication. For example, the terminal device 110 may be indirectly connected to the server 120 through a wireless access point, or the terminal device 110 may be directly connected to the server 120 through the internet, which is not limited herein.
In the embodiment of the present application, the terminal device 110 includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a desktop computer, an electronic book reader, an intelligent voice interaction device, an intelligent home appliance, a vehicle-mounted terminal, and the like; the terminal equipment can be provided with a client for acquiring and displaying the underwater image, wherein the client can be a software application (such as a browser, video software and the like) or a webpage, an applet and the like;
The server 120 is a background server corresponding to software, web pages, applets, etc., or a server dedicated for image enhancement processing, and the present application is not limited in detail. The server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), basic cloud computing services such as big data and an artificial intelligence platform.
It should be noted that, the image enhancement method in the embodiment of the present application may be performed by an electronic device, which may be the server 120 or the terminal device 110, that is, the method may be performed by the server 120 or the terminal device 110 separately, or may be performed by both the server 120 and the terminal device 110 together.
When the terminal device 110 performs alone, for example, the terminal device 110 may acquire the underwater image to be processed, and then perform image enhancement processing on the underwater image to be processed based on the target enhancement model, so as to acquire the target underwater image after image enhancement.
When the server 120 performs alone, for example, the terminal device 110 may acquire the underwater image to be processed, and then send the underwater image to be processed to the server 120, and the server 120 performs image enhancement processing on the underwater image to be processed based on the target enhancement model, so as to acquire the target underwater image after image enhancement.
When the server 120 and the terminal device 110 perform the method together, for example, the terminal device 110 may perform feature extraction on the acquired underwater image to be processed to obtain at least two initial feature maps, and transmit the at least two initial feature maps to the server 120; the server 120 then obtains the image-enhanced target underwater image based on the at least two initial feature maps.
In the following, the server alone is mainly used as an example, and the present application is not limited thereto. That is, in a specific implementation, the terminal device 110 obtains an underwater image to be processed, and transmits the underwater image to be processed to the server 120, after the server 120 obtains the underwater image to be processed, the server 120 may perform image enhancement processing on the underwater image to be processed by adopting the image enhancement method of the embodiment of the present application, so as to obtain a target underwater image after image enhancement.
It should be noted that, the number of the terminal devices 110 and the servers 120 is not limited in practice, and is not particularly limited in the embodiment of the present application, which is shown in fig. 1 for illustration only.
In the embodiment of the present application, when the number of servers 120 is plural, the plural servers 120 may form a blockchain, with each server 120 being a node on the blockchain; in the image enhancement method disclosed in the embodiment of the application, the training data sets involved can be stored on the blockchain.
Based on the above application scenario, the image enhancement method provided by the exemplary embodiment of the present application is described below with reference to the above application scenario described above, and it should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principle of the present application, the embodiment of the present application is not limited in this respect, and the embodiments described herein are only used for illustrating and explaining the present application, and are not used for limiting the present application, and the embodiments of the present application and features in the embodiments may be combined with each other without conflict.
The image enhancement method provided by the embodiment of the application is realized through the target enhancement model; therefore, before the image enhancement method can be applied, the enhancement model to be trained must first be trained to obtain a successfully trained target enhancement model. Next, the training method of the image enhancement model is described in detail with reference to the first embodiment.
Embodiment one: a training method for the image enhancement model, used to obtain the target image enhancement model.
Referring to fig. 2, fig. 2 schematically provides a schematic diagram of a first enhancement model to be trained in an embodiment of the present application, where the enhancement model to be trained 20 includes: a generation network 21, a discrimination network 22, and a supervision optimization network 23;
wherein the generation network 21 is configured to output at least two output images, the at least two output images including a first class of output image with the same scale as the input synthesized underwater image and a second class of output image with a scale smaller than that of the synthesized underwater image; the discrimination network 22 is used to determine the adversarial loss based on the first class of output image and the corresponding original image; the supervision optimization network 23 is configured to determine the corresponding pixel-level losses based on the at least two output images and the corresponding reference images of the same scale, where the reference images include the original image and images obtained by downsampling the original image, the original image being the image corresponding to the synthesized underwater image, i.e., the image before underwater synthesis; and at least one parameter adjustment is performed on the enhancement model to be trained with a loss function constructed based on the adversarial loss and the at least two pixel-level losses.
In one possible implementation, the generation network 21 is composed of a second encoder 210, a second attention network 211 and a second decoder 212, and aims to learn the mapping from underwater images to clear, in-air-style images, so as to generate high-quality target underwater images of underwater scenes. Referring to FIG. 3, FIG. 3 schematically provides a schematic diagram of a second enhancement model to be trained in an embodiment of the present application;
the second encoder 210 is configured to perform feature extraction on the input synthesized underwater image, and obtain at least two training feature graphs; referring to fig. 4, fig. 4 schematically provides a schematic diagram of a second encoder according to an embodiment of the present application, where the second encoder 210 includes five sets of convolution layers, and the five sets of convolution layers perform four downsampling operations as an example; all convolution layers in the second encoder are cascade connection of convolution operation, instance regularization operation and linear rectification function activation operation, and the downsampling operation is maximum pooling operation, wherein the convolution operation is at least one of convolution operation with a convolution kernel size of 3×3, convolution operation with a convolution kernel size of 5×5 and convolution operation with a convolution kernel size of 7×7. It should be noted that fig. 4 is only an exemplary illustration, and the second encoder provided in the present application may further include more convolution layers.
The second attention network 211 is configured to perform attention processing on the training feature maps output by the second encoder and obtain the corresponding training attention maps. The second attention network is a mixed-domain attention network comprising a channel attention network and a spatial attention network; that is, it applies an attention mechanism in the channel and spatial dimensions in sequence to obtain the corresponding training attention map. Referring to fig. 5, fig. 5 is a schematic diagram of a second attention network in an embodiment of the present application, taking as an example channel attention processing performed first, followed by spatial attention processing. In the embodiment of the application, adding the mixed-domain second attention network to the skip connection structure between the second encoder and the second decoder effectively enhances the content information of the underwater scene and removes the underwater color tone.
The second decoder 212 is configured to perform feature stitching on the at least two training attention maps output by the second attention network with the obtained training upsampled feature maps of the same scale, and to output at least two output images. The second decoder 212 includes at least two groups of feature stitching layers, a group of convolution layers connected to each group of feature stitching layers, and image output layers connected to the convolution layers, with an upsampling operation between adjacent groups of convolution layers. Referring to fig. 6, fig. 6 schematically provides a first schematic diagram of a second decoder according to an embodiment of the present application, in which the second decoder 212 includes four groups of feature stitching layers, five groups of convolution layers and three image output layers. It should be noted that fig. 6 is only an exemplary illustration, and the second decoder provided in the present application may include more convolution layers.
All convolution layers in the second decoder are cascades of a convolution operation, an instance regularization operation and a linear rectification activation operation, where the convolution operation is at least one of a convolution with kernel size 3×3, a convolution with kernel size 5×5 and a convolution with kernel size 7×7; the feature stitching layer splices two feature matrices of the same scale along the channel dimension; the image output layer is a convolution operation with kernel size 1×1; and the upsampling operation is a deconvolution operation.
It should be noted that, in the embodiment of the present application, the upsampling operation in the second decoder may be replaced by a bilinear interpolation operation.
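As an illustration, a minimal PyTorch sketch of one decoder stage follows: feature stitching of an attention map with the upsampled feature map of the same scale, a convolution group, a 1×1 image output layer, and a deconvolution that feeds the next stage. Channel counts are assumptions:

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    def __init__(self, skip_ch: int, up_ch: int, out_ch: int):
        super().__init__()
        # Convolution + instance regularization + linear rectification cascade.
        self.refine = nn.Sequential(
            nn.Conv2d(skip_ch + up_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.to_image = nn.Conv2d(out_ch, 3, kernel_size=1)  # 1x1 image output layer
        # Deconvolution as the upsampling operation toward the next (larger) scale.
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=4, stride=2, padding=1)

    def forward(self, attention_map: torch.Tensor, upsampled: torch.Tensor):
        # Feature stitching: concatenate along the channel dimension.
        x = self.refine(torch.cat([attention_map, upsampled], dim=1))
        # Output image at this scale, plus upsampled features for the next stage.
        return self.to_image(x), self.up(x)
```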
In one possible implementation, the discrimination network 22 is configured to compare the first class of output image, output by the generation network with the same scale as the synthesized underwater image, against the original image, and to determine the corresponding adversarial loss, so that the network parameters of the enhancement network to be trained are adjusted through the adversarial loss and the image-enhanced output image becomes closer to the original image.
In the embodiment of the present application, the discrimination network uses the architecture of a Markovian (patch-based) discriminator. Referring to fig. 7, fig. 7 exemplarily provides a schematic diagram of the discrimination network in an embodiment of the present application, in which the discrimination network includes five convolution layers. The first four convolution layers are cascades of a convolution operation, an instance regularization operation and a linear rectification activation operation, where the convolution operation is at least one of a convolution with kernel size 3×3, a convolution with kernel size 5×5 and a convolution with kernel size 7×7; the last convolution layer is a convolution operation with kernel size 3×3.
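A minimal sketch of such a Markovian (patch-based) discriminator follows; the strides, channel widths and the choice of concatenating the two input images along the channel dimension are assumptions, chosen so that a 256×256 input yields the patch decision matrix described later:

```python
import torch.nn as nn

def disc_layer(in_ch: int, out_ch: int) -> nn.Sequential:
    # Convolution + instance regularization + linear rectification cascade.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            disc_layer(6, 64),     # assumed input: two RGB images concatenated
            disc_layer(64, 128),
            disc_layer(128, 256),
            disc_layer(256, 512),
            nn.Conv2d(512, 1, kernel_size=3, padding=1),  # per-patch realness score
        )

    def forward(self, x):
        # For a 256x256 input, four stride-2 layers give a 1x16x16 decision matrix.
        return self.net(x)
```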
In the embodiment of the application, the supervision optimization network is used to determine the corresponding pixel-level losses based on the at least two output images and the corresponding reference images of the same scale, so that the network parameters of the enhancement network to be trained are adjusted through the plurality of pixel-level losses. This realizes a multi-scale output, multi-supervision process and thus coarse-to-fine reconstruction of the high-resolution image, improving the clarity and accuracy of the target underwater image obtained after enhancing the underwater image.
Specifically, referring to fig. 8, fig. 8 provides a schematic diagram of a specific structure of an enhancement model to be trained according to an embodiment of the present application; each part of the enhancement model to be trained and its corresponding function are described above.
Based on the enhancement model to be trained shown in fig. 8, in the embodiment of the present application, a model training method for training the enhancement model to be trained is further provided, so as to obtain a target enhancement model that is successfully trained. Referring to fig. 9, fig. 9 is a flowchart for exemplary providing a method for training an enhancement model to be trained in an embodiment of the present application, including the following steps:
step S900, selecting a training sample pair from the training data set, wherein the training sample pair comprises: an original image and a synthesized underwater image corresponding to the original image;
In the embodiment of the application, the enhancement model to be trained is implemented on the PyTorch deep learning framework, and deep learning is a data-driven modeling approach. In the model training process, a training data set is first determined: images from the NYU-V2 data set are used as original images to meet the model's demand for training data, and for each original image a corresponding synthesized underwater image is generated using a physical imaging model of underwater images; each original image and its corresponding synthesized underwater image are then used as a training sample pair to train the enhancement model to be trained.
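For illustration, the following is a hedged sketch of how a synthesized underwater image might be generated from an NYU-V2 RGB image and its depth map. The patent only states here that a physical imaging model of underwater images is used; the simplified formulation I_c = J_c * t_c + B_c * (1 - t_c) with t_c = exp(-beta_c * d), together with the attenuation coefficients and ambient-light values below, are assumptions for illustration:

```python
import numpy as np

def synthesize_underwater(rgb: np.ndarray, depth: np.ndarray,
                          beta=(0.40, 0.10, 0.05),        # assumed attenuation (R, G, B)
                          background=(0.10, 0.50, 0.60)): # assumed ambient water light
    """rgb: HxWx3 float image in [0, 1]; depth: HxW depth map in meters."""
    out = np.empty_like(rgb)
    for c in range(3):
        t = np.exp(-beta[c] * depth)                 # per-channel transmission
        out[..., c] = rgb[..., c] * t + background[c] * (1.0 - t)
    return np.clip(out, 0.0, 1.0)
```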
Step S901, inputting a synthesized underwater image into a second encoder, and carrying out feature extraction on the synthesized underwater image through the second encoder to obtain at least two training feature images;
referring to fig. 8, when the synthesized underwater image is input into the second encoder, feature extraction is first performed on the synthesized underwater image through a group of convolution layers, obtaining a first class of training feature map with the same scale as the synthesized underwater image; a downsampling operation is then applied to the first class of training feature map to obtain a training downsampled feature map, and feature extraction is performed on the training downsampled feature map through another group of convolution layers, obtaining a second class of training feature map with a scale smaller than that of the synthesized underwater image.
In one possible implementation, at least one downsampling operation is performed in the second encoder, so that at least two training feature maps are available through the second encoder.
Step S902, inputting the obtained at least two training feature images into a second attention network, and respectively performing attention processing on the at least two training feature images through the second attention network to obtain corresponding training attention diagrams;
the second attention network provided by the embodiment of the application consists of a channel attention network and a space attention network; thus, when the training feature map is attentively processed through the second attentiveness network: firstly, carrying out channel attention processing on a training feature map through a channel attention network to acquire a channel training attention map; then, carrying out feature map multiplication operation on the channel training attention map and the training feature map to obtain an intermediate training feature map; then, the intermediate training feature map is subjected to spatial attention processing through a spatial attention network, and a spatial training attention map is obtained; and finally, performing feature map multiplication operation on the spatial training attention map and the intermediate training feature map to obtain a corresponding training attention map.
The following describes in detail an example of performing attention processing on one training feature map to obtain a corresponding training attention map.
The training feature map output by the second encoder is denoted F, with dimensions h×w×c, where h is the height, w the width and c the number of channels. Channel attention processing is performed on F first, and spatial attention processing is then performed on the result of channel attention. Channel attention focuses on which channel features are meaningful: the training feature map is first compressed in the spatial dimensions (h×w), aggregating spatial information through max pooling and global average pooling to obtain two 1×1×c channel descriptors. The two channel descriptors are then each passed through a shared network consisting of a multilayer perceptron (Multilayer Perceptron, MLP), producing transformed outputs. The two resulting feature vectors are added element by element and passed through a Sigmoid function (σ) to obtain the weight coefficients M_c of the normalized channel training attention map. The process can be represented by the following formula:
M_c(F) = σ(MLP(MaxPool(F)) + MLP(AvgPool(F)))
Finally, the weight coefficients M_c are multiplied with the training feature map F to obtain the new feature matrix corresponding to the intermediate training feature map F' produced by the channel attention mechanism, thereby focusing on meaningful channels and ignoring useless ones.
The spatial attention network takes as input an intermediate training feature map F' (dimension h×w×c) that completes the channel attention process, which positions in the focus feature map are the areas that need attention. Similar to the implementation principle of channel attention, the spatial attention network firstly performs maximum pooling and global average pooling operations on the input intermediate training feature diagram F' along the channel dimension to obtain two feature diagrams with the size of h×w×1, and splices the two feature diagrams with the size of h×w×1 in the channel dimension to generate a feature descriptor. Then, through a 7×7 convolution layer, the weighting coefficient M of the spatial training attention diagram is obtained by activating the same with the Sigmoid function (sigma) s . The process is expressed as:
M_s(F′) = σ(f^(7×7)([MaxPool(F′); AvgPool(F′)]))
Finally, the weight coefficient M_s is multiplied with the input intermediate training feature map F′ to obtain the feature matrix of the training attention map after the spatial attention mechanism, thereby focusing on meaningful regions and ignoring useless ones.
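As a concrete illustration of the channel-then-spatial attention described above, the following is a minimal PyTorch sketch. The module structure mirrors the formulas for M_c and M_s; the reduction ratio and module names are assumptions, as the application does not specify them.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Channel attention: aggregate spatial information with max pooling and
    # average pooling, pass both descriptors through a shared MLP, then
    # normalize the summed result with a Sigmoid to obtain M_c.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        max_desc = torch.amax(f, dim=(2, 3))   # 1x1xc descriptor (flattened)
        avg_desc = torch.mean(f, dim=(2, 3))   # 1x1xc descriptor (flattened)
        m_c = self.sigmoid(self.mlp(max_desc) + self.mlp(avg_desc))
        return f * m_c.view(b, c, 1, 1)        # intermediate feature map F'

class SpatialAttention(nn.Module):
    # Spatial attention: pool along the channel dimension, concatenate the
    # two h*w*1 maps, apply a 7x7 convolution, and normalize with a Sigmoid.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_prime: torch.Tensor) -> torch.Tensor:
        max_map = torch.amax(f_prime, dim=1, keepdim=True)
        avg_map = torch.mean(f_prime, dim=1, keepdim=True)
        m_s = self.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return f_prime * m_s                   # training attention map
```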
Step S903, inputting the obtained at least two training attention maps into the corresponding second decoder, and, through the feature stitching layers in the second decoder, sequentially stitching the training attention maps with training upsampled feature maps of the same scale in order of scale from small to large, based on the scales of the obtained at least two training attention maps, to obtain at least two output images;
Wherein the scales of the at least two output images are different, i.e. the resolutions of the at least two output images are different; and the at least two output images include: a first type of output image having the same scale as the synthesized underwater image, and a second type of output image having a scale smaller than the synthesized underwater image.
It should be noted that, in the embodiment of the present application, the scale may be replaced by the resolution.
Taking one set of feature stitching layers as an example, the acquisition of an output image is described in detail with reference to fig. 8: the training attention map and the training upsampled feature map are input into the feature stitching layer and stitched together; the stitched feature map is then processed by a convolution layer, and finally the corresponding output image is produced through an image output layer.
It should be noted that fig. 8 is a schematic structural diagram in which three output images are acquired through three sets of convolution layers; fig. 8 is merely an exemplary illustration and not the only possible configuration.
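The feature stitching layer and image output layer described above might be sketched as follows in PyTorch; the channel counts and the Tanh output activation are illustrative assumptions, since the application does not fix them.

```python
import torch
import torch.nn as nn

class StitchAndOutput(nn.Module):
    # One feature stitching layer of the second decoder: concatenate the
    # attention map with the same-scale upsampled feature map, fuse with a
    # convolution layer, and emit an output image through an image output layer.
    def __init__(self, attn_ch: int, up_ch: int, out_ch: int = 3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(attn_ch + up_ch, up_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_image = nn.Conv2d(up_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, attn_map: torch.Tensor, upsampled: torch.Tensor):
        stitched = torch.cat([attn_map, upsampled], dim=1)  # feature stitching
        fused = self.fuse(stitched)                         # convolution layer
        return fused, torch.tanh(self.to_image(fused))      # output image
```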
Step S904, inputting the obtained first-class output image with the same scale as the synthesized underwater image into a discrimination network, and determining the adversarial loss through the discrimination network based on a first comparison result between the first-class output image and the original image;
in the model training process, the first-class output image produced by the generating network and the corresponding original image are input into the discrimination network to obtain a discrimination matrix of size 1×16, where each element of the matrix corresponds to a larger receptive field on the two input images; the two input images are judged within each local receptive field, and the adversarial loss is determined. Through joint optimization of the discrimination network and the generating network, high-frequency supervision information on style, local semantic content, and the like is provided for the underwater image enhancement task.
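A generic patch-style discrimination network of the kind described above might look like the following PyTorch sketch; the number of layers and the exact size of the score matrix here are assumptions and do not reproduce the application's configuration.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    # A patch-style discrimination network: strided convolutions enlarge the
    # receptive field, and each element of the output matrix judges one local
    # receptive field of the input image pair.
    def __init__(self, in_ch: int = 6, base: int = 64):
        super().__init__()
        layers, ch = [], in_ch
        for mult in (1, 2, 4, 8):
            layers += [nn.Conv2d(ch, base * mult, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = base * mult
        layers.append(nn.Conv2d(ch, 1, 4, stride=1, padding=1))  # score matrix
        self.net = nn.Sequential(*layers)

    def forward(self, enhanced: torch.Tensor, original: torch.Tensor):
        # the first-class output image and the original image are concatenated
        # along the channel dimension and judged jointly
        return self.net(torch.cat([enhanced, original], dim=1))
```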
Step S905, inputting the acquired at least two output images into a supervision optimization network, and respectively determining corresponding pixel level losses based on a second comparison result between the first class output image and an original image and a third comparison result between the second class output image and a reference image with the same scale through the supervision optimization network, wherein the reference image is obtained by downsampling the original image;
in the embodiment of the application, each low-resolution output of the second class is supervised by a reference image obtained by downsampling the original image to the same resolution, while the high-resolution output of the first class is supervised by the original image itself. Through this multi-scale output and multi-level supervision process, coarse-to-fine high-resolution image reconstruction is achieved, improving the sharpness and accuracy of the enhancement result.
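The reference images for multi-scale supervision can be produced, for example, by repeatedly downsampling the original image; the sketch below assumes three scales and bilinear interpolation, neither of which is mandated by the application.

```python
import torch.nn.functional as F

def build_reference_images(original, num_scales: int = 3):
    # refs[0] is the original image itself, supervising the first-class
    # (full-resolution) output; the remaining downsampled reference images
    # supervise the second-class (lower-resolution) outputs.
    refs = [original]
    for s in range(1, num_scales):
        refs.append(F.interpolate(original, scale_factor=0.5 ** s,
                                  mode='bilinear', align_corners=False))
    return refs
```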
Step S906, constructing a loss function based on the determined at least two pixel-level losses and the adversarial loss;
in an embodiment of the present application, the loss function may be expressed as:
L = L_adv + λL_multi
where L_adv is the adversarial loss, L_multi is the multi-scale pixel loss, and λ is the weight parameter corresponding to the multi-scale pixel loss, determined by cross-validation.
Specifically, L_adv can be expressed as:

L_adv = E{[log D(x_i)] + [log(1 − D(G(y_i)))]}

where x_i denotes the original image, y_i denotes the synthetic underwater image synthesized from x_i by the underwater physical imaging model, G(y_i) denotes the enhancement result output by the generating network, and D(G(y_i)) and D(x_i) denote the discrimination results output by the discrimination network.
L_multi can be expressed as:

L_multi = τ_1·L_p1 + τ_2·L_p2 + τ_3·L_p3

where L_p1, L_p2, and L_p3 denote the pixel-level losses of the model at the three scales; an L1 loss function is adopted for the pixel-level losses to ensure the sharpness of the generated images. τ_1, τ_2, and τ_3 denote the weights, from high resolution to low, of the pixel losses of the different-scale reconstruction results within the multi-scale pixel loss, and are likewise determined by cross-validation.

It should be noted that, in the hyper-parameter settings of the loss function in the embodiment of the present application, the values of λ, τ_1, τ_2, and τ_3 are 10, 0.6, 0.3, and 0.1, respectively.
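Putting the pieces together, the combined objective L = L_adv + λL_multi might be assembled as in the sketch below; it assumes the discriminator outputs probabilities (after a Sigmoid) and uses the hyper-parameter values given above.

```python
import torch
import torch.nn.functional as F

def total_loss(d_real, d_fake, outputs, references,
               lam: float = 10.0, taus=(0.6, 0.3, 0.1)):
    # Adversarial term, L_adv = E{log D(x_i) + log(1 - D(G(y_i)))};
    # d_real and d_fake are discriminator probabilities in (0, 1).
    eps = 1e-8
    l_adv = (torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps)).mean()
    # Multi-scale pixel term, L_multi = tau_1*L_p1 + tau_2*L_p2 + tau_3*L_p3,
    # with L1 losses between each output scale and its reference image
    # (outputs and references ordered from high resolution to low).
    l_multi = sum(t * F.l1_loss(o, r)
                  for t, o, r in zip(taus, outputs, references))
    return l_adv + lam * l_multi
```

In practice the generating network and the discrimination network are updated with opposite signs on the adversarial term; the sketch only assembles the loss as written above.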
Step S907, carrying out at least one parameter adjustment on the enhancement model to be trained through the loss function until the condition is met, and obtaining a target enhancement model;
The condition may be that a convergence criterion is satisfied, or that the number of adjustments reaches an upper limit; for example, if the upper limit is set to 100, the target enhancement model is obtained once the parameters have been adjusted 100 times.
In the embodiment of the application, based on the loss function, at least one parameter adjustment is performed on the enhancement model to be trained through an optimizer. The optimizer is an ADAM optimizer whose momentum decay exponents β_1 and β_2 are set to 0.50 and 0.999, respectively, with an initial learning rate of 0.001.
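In PyTorch, this optimizer configuration corresponds to the following; the stub model is a placeholder standing in for the enhancement model to be trained.

```python
import torch
import torch.nn as nn

# The stub below stands in for the enhancement model to be trained.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

# ADAM optimizer with momentum decay exponents beta_1 = 0.50, beta_2 = 0.999
# and an initial learning rate of 0.001, matching the settings above.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.50, 0.999))
```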
In the application, the optimization of the enhancement model to be trained is completed jointly by the loss function composed of the adversarial loss and the pixel losses, so that the trained model can produce enhanced images that are sharp, accurate, and consistent with the content structure of the original image.
In the embodiment of the application, to ensure the accuracy of model training, an error feedback network is introduced into the second decoder to realize error feedback correction during feature reconstruction. This preserves, as far as possible, intrinsic relations such as semantic consistency among feature maps of different scales, restores and reconstructs the features scale by scale, prevents the loss of image detail information, and improves the sharpness and accuracy of the enhanced underwater images.
Referring to fig. 10, fig. 10 schematically provides a diagram of a second decoder incorporating an error feedback network in an embodiment of the present application; the error feedback network is used to determine errors between feature maps and comprises a convolution layer and a deconvolution layer.
Therefore, in an embodiment of the present application, an enhancement model to be trained including an error feedback network is provided, and referring to fig. 11, fig. 11 schematically provides a specific structural diagram of another enhancement model to be trained in an embodiment of the present application.
Based on the enhancement model to be trained shown in fig. 11, the embodiment of the present application provides another model training method for obtaining a successfully trained target enhancement model. Referring to fig. 12, fig. 12 exemplarily provides a flowchart of this method for training the enhancement model to be trained, comprising the following steps:
step S1200, selecting a training sample pair from the training data set, where the training sample pair includes: an original image, and a composite underwater image corresponding to the original image.
Step S1201, inputting the synthesized underwater image into a second encoder, and performing feature extraction on the synthesized underwater image by the second encoder to obtain at least two training feature images.
Step S1202, inputting the obtained at least two training feature graphs into a second attention network, and performing attention processing on the at least two training feature graphs through the second attention network to obtain corresponding training attention diagrams.
Step S1203, inputting the obtained at least two training attention patterns into a corresponding second decoder, and sequentially splicing the training attention patterns with training up-sampling feature patterns of the same scale according to the scale from small to large based on the scale of the obtained at least two training attention patterns through a feature splicing layer in the second decoder to obtain at least two spliced images;
step S1204, after the acquired at least two spliced images pass through the convolution layer, inputting the spliced images into an error feedback network of a corresponding second decoder, and after correcting the spliced images subjected to convolution operation through the error feedback network, outputting at least two output images through an image output layer;
next, taking a stitched image as an example, the correction processing of the stitched image after the convolution operation by the error feedback network will be described in detail.
In one possible implementation, the training upsampled feature map corresponding to the stitched image is taken as the training high-scale feature map f_H′, i.e., the high-scale feature map obtained after upsampling and related operations, and the feature map corresponding to the stitched image before the upsampling is taken as the training low-scale feature map f_L, i.e., the low-scale feature map output by the previous layer in the second decoder. Correcting the convolved stitched image through the error feedback network then includes:
downsampling the training high-scale feature map f_H′ to obtain a corresponding downsampled feature map, and determining the feature error between the downsampled feature map and the training low-scale feature map f_L, where the downsampled feature map has the same scale as the low-scale feature map;
performing a deconvolution operation on the feature error to obtain the high-scale error corresponding to the training high-scale feature map f_H′, and adding the high-scale error to the convolved stitched image to obtain the training high-scale reconstruction feature map f_H optimized by error feedback correction.
Because f_L and f_H′ both represent enhanced feature maps that differ only in scale, downsampling the training high-scale feature map f_H′ through the convolution layer should yield a low-scale feature f_L′ that is exactly consistent with the original training low-scale feature map f_L. In model training and application, however, feature loss and deviation during upsampling and related operations mean that f_L′ and f_L may differ. The feature error E_L between the two low-scale features is therefore computed and deconvolved to obtain the corresponding high-scale error E_H. Adding E_H to the convolved stitched image corrects the features of the upsampling process and yields the training high-scale reconstruction feature map f_H optimized by error feedback correction, thereby minimizing feature loss during decoding and optimizing model performance.
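A minimal sketch of this error feedback correction, assuming f_H′ has exactly twice the spatial size of f_L and the same channel count, might look as follows; the strided convolution and transposed convolution play the roles of the downsampling and deconvolution layers.

```python
import torch
import torch.nn as nn

class ErrorFeedback(nn.Module):
    # Error feedback correction: downsample the high-scale feature f_H',
    # compare against the previous-layer low-scale feature f_L, deconvolve
    # the error E_L back to high scale (E_H), and add it to the convolved
    # stitched features to obtain the corrected reconstruction f_H.
    def __init__(self, channels: int):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 4, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)

    def forward(self, f_high, f_low, conv_stitched):
        f_low_hat = self.down(f_high)   # f_L' = downsample(f_H')
        e_low = f_low - f_low_hat       # E_L: feature error at low scale
        e_high = self.up(e_low)         # E_H: deconvolved high-scale error
        return conv_stitched + e_high   # f_H after error feedback correction
```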
Step S1205, inputting the obtained first-class output image with the same scale as the synthesized underwater image into the discrimination network, and determining the adversarial loss through the discrimination network based on the first comparison result between the first-class output image and the original image.
Step S1206, inputting the obtained at least two output images into a supervision optimization network, and determining, by the supervision optimization network, corresponding pixel level losses based on a second comparison result between the first class output image and the original image and a third comparison result between the second class output image and a reference image of the same scale, wherein the reference image is obtained by downsampling the original image.
Step S1207, constructing a loss function based on the determined at least two pixel-level losses and the adversarial loss.
Step S1208, performing parameter adjustment on the enhancement model to be trained at least once through the loss function until the condition is met, and obtaining the target enhancement model.
It should be noted that, the steps not illustrated in fig. 12 refer to fig. 9, and the detailed description is not repeated here.
In the application, when training the enhancement network to be trained, at least one parameter adjustment is performed on it based on a loss function constructed from the adversarial loss corresponding to the output image with the same scale as the input image among the at least two output images produced by the model, together with the pixel-level losses corresponding to each of the at least two output images, so as to jointly complete the optimization of the enhancement model to be trained. This ensures that the optimized model can produce enhanced images that are sharp, accurate, and consistent with the content structure of the input image, where sharp and accurate means that the enhanced image exhibits complete color cast recovery and full detail enhancement.
Based on the model training method, after a target enhancement model which is successfully trained is obtained, performing image enhancement processing on an underwater image to be processed through the target enhancement model; next, the enhancement processing of the underwater image based on the target image enhancement model will be described in detail with reference to the implementation of the second embodiment.
Embodiment two: the image enhancement method is to enhance the underwater image based on the target image enhancement model.
In the embodiment of the present application, a target enhancement model for image enhancement processing is provided. Referring to fig. 13, fig. 13 schematically provides a diagram of a target enhancement model in an embodiment of the present application. The target enhancement model is obtained by the model training method provided in the first embodiment; that is, it is obtained after at least one parameter adjustment of the enhancement model to be trained based on a loss function, where the loss function is constructed from the adversarial loss corresponding to the output image with the same scale as the input image among the at least two output images produced by the model, together with the pixel-level losses corresponding to each of the at least two output images, the scales of the at least two output images being different; the detailed training process is not repeated here.
Referring to fig. 14, fig. 14 is a flowchart illustrating a method for enhancing an image according to an embodiment of the present application, wherein the method includes the following steps:
step S1400, acquiring an underwater image to be processed;
the underwater image to be processed may be a real underwater image obtained through a terminal device, such as one photographed by the terminal device or one retrieved by the terminal device through search, or a synthetic underwater image generated using the underwater image physical imaging model; that is, the embodiment of the application is suitable for scenarios in which either real or synthetic underwater images are enhanced.
Step S1401, performing image enhancement processing on the underwater image to be processed based on the target enhancement model, and obtaining the target underwater image after image enhancement.
To verify that the image enhancement method provided in the present application outperforms other image enhancement methods in the related art, in the embodiment of the present application, image enhancement processing is performed on at least one synthetic underwater image through the target enhancement model, and the resulting target underwater image is compared with the underwater images enhanced by the other methods; referring to fig. 15, fig. 15 exemplarily provides a schematic diagram of each algorithm's enhancement result on a synthetic underwater image. The enhancement index data of the target underwater image enhanced by the target enhancement model is also compared with the enhancement index data of the underwater images enhanced by the other methods; referring to table 1, table 1 exemplarily provides the comparison of enhancement index data of the algorithms on synthetic underwater images.
TABLE 1
Similarly, in the embodiment of the present application, image enhancement processing is performed on at least one real underwater image through the target enhancement model, and the resulting target underwater image is compared with the underwater images enhanced by other methods; referring to fig. 16, fig. 16 exemplarily provides a schematic diagram of each algorithm's enhancement result on a real underwater image. The corresponding enhancement index data are likewise compared; referring to table 2, table 2 exemplarily provides the comparison of enhancement index data of the algorithms on real underwater images.
TABLE 2
From the comparison data, the target enhancement model provided by the application reaches a peak signal-to-noise ratio (Peak Signal to Noise Ratio, PSNR) of 23.9484 dB and a structural similarity (Structural Similarity Index, SSIM) of 0.9032 on synthetic underwater images. On real underwater images, its underwater image quality measure (Underwater Image Quality Measurement, UIQM) reaches 4.5284, outperforming the other image enhancement methods.
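For reference, PSNR and SSIM can be computed with scikit-image as sketched below (UIQM, being a no-reference metric with its own formula, is omitted); this assumes 8-bit images and scikit-image 0.19 or later for the channel_axis argument.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(enhanced: np.ndarray, reference: np.ndarray):
    # Full-reference metrics between the enhanced result and its reference;
    # both images are assumed to be uint8 arrays of shape (h, w, 3).
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```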
In the embodiment of the application, when the image-enhanced target underwater image is acquired based on the target enhancement model, the underwater image to be processed may be either a real underwater image or a synthetic underwater image; in both cases, the operation flow of the image enhancement method provided by the embodiment of the present application is unchanged.
Referring to fig. 17, fig. 17 is a flowchart for illustrating a specific implementation method of image enhancement in an embodiment of the present application, including the following steps:
step S1700, acquire an underwater image to be processed.
Step S1701, inputting the underwater image to be processed into a first encoder of a target enhancement model, and extracting features of the underwater image to be processed through the first encoder to obtain at least two initial feature images.
In one possible implementation, feature extraction is performed on the underwater image to be processed through the first encoder to obtain a first-type initial feature map with the same scale as the underwater image to be processed, and one or more second-type initial feature maps obtained by downsampling the first-type initial feature map at least once.
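A first encoder of this shape, producing one full-scale feature map and successively downsampled ones, might be sketched as follows; the channel widths and the number of downsampling stages are illustrative assumptions.

```python
import torch.nn as nn

class FirstEncoder(nn.Module):
    # First encoder sketch: one feature map at the input scale plus feature
    # maps produced by repeated downsampling, as described in step S1701.
    def __init__(self, in_ch: int = 3, base: int = 32, num_down: int = 2):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.downs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(base * 2**i, base * 2**(i + 1), 4,
                                    stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(num_down))

    def forward(self, x):
        feats = [self.head(x)]            # first-type initial feature map
        for down in self.downs:
            feats.append(down(feats[-1])) # second-type (downsampled) maps
        return feats
```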
In step S1702, the obtained at least two initial feature maps are input into the first attention network of the target enhancement model, and attention processing is performed on the at least two initial feature maps through the first attention network, so as to obtain corresponding target attention maps.
In one possible implementation manner, for any initial feature map, performing channel attention processing on the initial feature map, obtaining a channel attention map, and fusing the channel attention map with the initial feature map to obtain an intermediate feature map;
and performing spatial attention processing on the intermediate feature map to obtain a spatial attention map, and fusing the spatial attention map with the intermediate feature map to obtain a corresponding target attention map.
Step S1703, inputting the obtained at least two target attention diagrams into a first decoder of a target enhancement model, and respectively splicing the obtained target attention diagrams with the input up-sampling feature diagrams with the same scale through a feature splicing layer in the first decoder to obtain at least two spliced images;
Step S1704, performing convolution operation on at least two spliced images through a convolution layer in the first decoder, and determining the spliced images after the convolution operation;
step S1705, taking the input up-sampling feature map corresponding to the stitched image as a high-scale feature map, and taking the feature map corresponding to the input up-sampling feature map when up-sampling is not performed as a low-scale feature map;
step S1706, downsampling the high-scale feature map to obtain a downsampled feature map, and determining feature errors between the downsampled feature map and the low-scale feature map, wherein the downsampled feature map and the low-scale feature map have the same scale;
step S1707, deconvolution operation is carried out on the characteristic errors to obtain high-scale errors corresponding to the high-scale characteristic images, and addition processing is carried out on the high-scale errors and the spliced images after convolution operation to obtain characteristic reconstruction images;
step S1708, after determining that the feature reconstruction map obtained by the addition processing has the same scale as the underwater image to be processed, determining the target underwater image based on the stitched image and outputting it.
Steps S1703 to S1707 are executed so as to sequentially process the target attention maps in order of scale from small to large, based on the scales of the at least two obtained target attention maps.
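End to end, applying a trained target enhancement model at inference time reduces to a single forward pass, as in the hedged sketch below; the one-layer stand-in model and the random input tensor are placeholders for the real trained network and a real underwater image.

```python
import torch
import torch.nn as nn

# Placeholder for the trained target enhancement model; in practice this is
# the encoder/attention/decoder network trained as in Embodiment 1.
model = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1), nn.Tanh())
model.eval()

with torch.no_grad():
    x = torch.rand(1, 3, 256, 256)  # stands in for the underwater image to be processed
    enhanced = model(x)             # the image-enhanced target underwater image
print(enhanced.shape)               # torch.Size([1, 3, 256, 256])
```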
According to the application, first, at least one parameter adjustment is performed on the enhancement model to be trained based on a loss function constructed from the adversarial loss corresponding to the output image with the same scale as the input image among the at least two output images produced by the model, together with the pixel-level losses corresponding to each of the at least two output images, so as to jointly complete the optimization of the enhancement model to be trained and obtain the target enhancement model. The target enhancement model obtained after optimization can produce enhanced images that are sharp, accurate, and consistent with the content structure of the input image, where sharp and accurate means complete color cast recovery and full detail enhancement of the enhanced image. Then, based on the target enhancement model, image enhancement processing is performed on the acquired underwater image to be processed to obtain the image-enhanced target underwater image; at this point, the target underwater image is an enhanced image that is sharp, accurate, and consistent with the content structure of the underwater image to be processed. Therefore, when the image enhancement method provided by the application is used to enhance underwater images, the image enhancement effect on underwater images is improved.
Embodiment III: an image enhancement device.
The method embodiment of the present application is based on the same inventive concept, and the image enhancement device is provided in the embodiment of the present application, and the principle of solving the problem of the device is similar to that of the method of the embodiment, so that the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 18, fig. 18 exemplarily provides an image enhancement apparatus 1800 according to an embodiment of the present application, the image enhancement apparatus 1800 including:
an acquisition unit 1801 for acquiring an underwater image to be processed;
the processing unit 1802 is configured to perform image enhancement processing on an underwater image to be processed based on a target enhancement model, and obtain an image-enhanced target underwater image;
the target enhancement model is obtained after at least one parameter adjustment of the enhancement model to be trained based on a loss function, where the loss function is constructed from the adversarial loss determined by the first-class output image with the same scale as the input image among the at least two output images produced by the enhancement model to be trained, together with the pixel-level losses corresponding to the at least two output images, the scales of the at least two output images being different.
In one possible implementation, the target enhancement model includes: a first encoder, a first attention network, and a first decoder;
The processing unit 1802 is specifically configured to:
extracting features of an underwater image to be processed through a first encoder to obtain at least two initial feature images;
respectively carrying out attention processing on at least two initial feature graphs through a first attention network to acquire corresponding target attention diagrams;
and based on the scales of the acquired at least two target attention maps, sequentially stitching, through the first decoder, the target attention maps with upsampled feature maps of the same scale in order of scale from small to large, until the acquired stitched image has the same scale as the underwater image to be processed, and then determining and outputting the target underwater image based on the stitched image.
In one possible implementation, the processing unit 1802 is specifically configured to:
extracting features of the underwater image to be processed through a first encoder to obtain a first type initial feature map with the same scale as the underwater image to be processed, and performing downsampling on the first type initial feature map at least once to obtain a second type initial feature map.
In one possible implementation, the processing unit 1802 is specifically configured to:
for any initial feature map, carrying out channel attention processing on the initial feature map, obtaining a channel attention map, and carrying out multiplication processing on the channel attention map and the initial feature map to obtain an intermediate feature map;
And performing spatial attention processing on the intermediate feature map to obtain a spatial attention map, and performing multiplication processing on the spatial attention map and the intermediate feature map to obtain a corresponding target attention map.
In one possible implementation manner, the first decoder includes an error feedback network, and takes an input up-sampling feature map corresponding to the spliced image as a high-scale feature map, and takes a feature map corresponding to the input up-sampling feature map when up-sampling is not performed as a low-scale feature map;
the processing unit 1802 performs the following operations through the error feedback network:
downsampling the high-scale feature map to obtain a downsampled feature map, determining feature errors between the downsampled feature map and the low-scale feature map, wherein the downsampled feature map has the same scale as the low-scale feature map;
and carrying out deconvolution operation on the characteristic errors to obtain high-scale errors corresponding to the high-scale characteristic images, and carrying out addition processing on the high-scale errors and the spliced images after convolution operation.
In a possible implementation manner, the image enhancement apparatus further includes a training unit 1803, as shown in fig. 19, which is another image enhancement apparatus 1900 according to an embodiment of the present application, where an enhancement model to be trained includes: a second encoder, a second attention network, a second decoder, a discrimination network, and a supervisory optimization network; the target enhancement model is obtained by the training unit 1803 by:
Selecting a training sample pair from the training data set, wherein the training sample pair comprises: an original image and a corresponding synthesized underwater image;
performing feature extraction on the synthesized underwater image through a second encoder to obtain at least two training feature images;
respectively carrying out attention processing on at least two training feature graphs through a second attention network to acquire corresponding training attention diagrams;
at least two training attention diagrams are spliced with corresponding training up-sampling feature diagrams with the same scale through a second decoder, and at least two output images are obtained, wherein the at least two output images comprise a first type of output images with the same scale as the synthesized underwater image and a second type of output images with the scale smaller than that of the synthesized underwater image;
determining, by the discrimination network, the adversarial loss based on a first comparison result between the first-class output image and the original image;
determining corresponding pixel level loss respectively based on a second comparison result between the first class output image and the original image and a third comparison result between the second class output image and the reference image with the same scale through a supervision optimization network, wherein the reference image is obtained by downsampling the original image;
and constructing a loss function based on the adversarial loss and the determined at least two pixel-level losses, and performing at least one parameter adjustment on the enhancement model to be trained through the loss function until the condition is met, so as to obtain the target enhancement model.
In one possible implementation, the composite underwater image is determined from the original image based on a physical imaging model of the underwater image that was successfully trained.
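Although the application trains its underwater physical imaging model, the classical per-channel formation model such approaches build on can be illustrated as follows; the transmission value and background light here are assumed constants, not values from the application.

```python
import numpy as np

def synthesize_underwater(original: np.ndarray,
                          transmission: float = 0.7,
                          background=(0.1, 0.4, 0.5)) -> np.ndarray:
    # Simplified formation model I_c = J_c * t + B_c * (1 - t): attenuate the
    # in-air image J and blend in a bluish-green background light B.
    # The constants here are assumptions, not values from the application.
    j = original.astype(np.float32) / 255.0
    b = np.asarray(background, dtype=np.float32)
    i = j * transmission + b * (1.0 - transmission)
    return (i * 255.0).clip(0, 255).astype(np.uint8)
```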
In one possible implementation, the training unit 1803 is specifically configured to:
and based on the loss function, carrying out at least one parameter adjustment on the enhancement model to be trained through an optimizer.
In the embodiment of the application, first, at least one parameter adjustment is performed on the enhancement model to be trained based on a loss function constructed from the adversarial loss corresponding to the output image with the same scale as the input image among the at least two output images produced by the model, together with the pixel-level losses corresponding to each of the at least two output images, so as to jointly complete the optimization of the enhancement model to be trained and obtain the target enhancement model; the target enhancement model obtained after optimization can produce enhanced images that are sharp, accurate, and consistent with the content structure of the input image, where sharp and accurate means complete color cast recovery and full detail enhancement. Then, based on the target enhancement model, image enhancement processing is performed on the acquired underwater image to be processed to obtain the image-enhanced target underwater image; at this point, the target underwater image is an enhanced image that is sharp, accurate, and consistent with the content structure of the underwater image to be processed. Therefore, when the image enhancement method provided by the application is used to enhance underwater images, the image enhancement effect on underwater images is improved.
For convenience of description, the above parts are respectively described as functionally divided into units (or modules). Of course, the functions of each unit (or module) may be implemented in the same piece or pieces of software or hardware when implementing the present application.
Embodiment four: an electronic device.
Having described the image enhancement method and apparatus according to the exemplary embodiments of the present application, and based on the same inventive concept as the method embodiments described above, an electronic device is further provided in the embodiments of the present application, where the electronic device may be a terminal device or a server.
Next, an electronic device for image enhancement according to another exemplary embodiment of the present application is described.
In this embodiment, when the electronic device is a server, it may be the server 120 shown in fig. 1, and the electronic device may have a structure as shown in fig. 20, including a memory 2001, a communication module 2003, and one or more processors 2002.
A memory 2001 for storing a computer program for execution by the processor 2002. The memory 2001 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a program required for running an instant messaging function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 2001 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or any other medium that can be used to carry or store a desired computer program in the form of instructions or data structures and that can be accessed by a computer, without limitation. The memory 2001 may also be a combination of the above.
The processor 2002 may include one or more central processing units (central processing unit, CPU) or digital processing units, or the like. A processor 2002 for implementing the image enhancement method when invoking a computer program stored in a memory 2001.
The communication module 2003 is used for communication with the terminal device and other servers.
The specific connection medium among the memory 2001, the communication module 2003, and the processor 2002 is not limited in the embodiment of the present application. In fig. 20, the memory 2001 and the processor 2002 are connected by a bus 2004, shown as a bold line; the connections between other components are merely illustrative and not limiting. The bus 2004 may be divided into an address bus, a data bus, a control bus, and the like. For ease of description, only one bold line is depicted in fig. 20, but this does not mean that there is only one bus or one type of bus.
The memory 2001 stores therein a computer storage medium having stored therein computer executable instructions for implementing the image enhancing method of the embodiment of the present application.
In this embodiment, when the electronic device is a terminal device, it may be the terminal device 110 shown in fig. 1, and the structure of the electronic device may include, as shown in fig. 21: communication component 2110, memory 2120, display unit 2130, camera 2140, sensor 2150, audio circuitry 2160, bluetooth module 2170, processor 2180, and the like.
The communication component 2110 is used for communicating with a server. In some embodiments, it may include a wireless fidelity (Wireless Fidelity, WiFi) module; the WiFi module is a short-range wireless transmission technology, through which the electronic device can help the user send and receive information.
Memory 2120 may be used to store software programs and data. The processor 2180 performs various functions of the terminal device 110 and data processing by executing the software programs or data stored in the memory 2120. Memory 2120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The memory 2120 stores an operating system that enables the terminal device 110 to operate. The memory 2120 in the present application may store the operating system and various applications, and may store code for performing the image enhancement method of the present application.
The display unit 2130 may also be used to display information input by a user or information provided to a user and a graphical user interface (graphical user interface, GUI) of various menus of the terminal device 110. In particular, the display unit 2130 may include a display screen 2132 disposed on a front side of the terminal device 110. The display 2132 may be configured in the form of a liquid crystal display, light emitting diodes, or the like. The display unit 2130 may be used to display an underwater image to be processed, an image-enhanced target underwater image, and the like in the embodiment of the present application.
The display unit 2130 may also be used to receive input numeric or character information and to generate signal inputs related to user settings and function control of the terminal device 110. Specifically, the display unit 2130 may include a touch screen 2131 disposed on the front of the terminal device 110, which can collect touch operations by the user on or near it, such as clicking buttons or dragging scroll boxes.
The touch screen 2131 may cover the display screen 2132, or the touch screen 2131 and the display screen 2132 may be integrated to implement the input and output functions of the terminal device 110; after integration, they may be referred to simply as the touch display screen. The display unit 2130 in the present application can display application programs and their corresponding operation steps.
The camera 2140 may be used to capture still images, and a user may transmit the underwater image to be processed taken by the camera 2140 to other devices. The camera 2140 may be one or more. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the processor 2180 for conversion into a digital image signal.
The terminal device may further comprise at least one sensor 2150, such as an acceleration sensor 2151, a distance sensor 2152, a fingerprint sensor 2153, a temperature sensor 2154. The terminal device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.
Audio circuitry 2160, speakers 2161, microphone 2162 may provide an audio interface between the user and terminal device 110. The audio circuit 2160 may transmit the received electrical signal converted from audio data to the speaker 2161, and the electrical signal is converted into a sound signal by the speaker 2161 for output. The terminal device 110 may also be configured with a volume button for adjusting the volume of the sound signal. On the other hand, the microphone 2162 converts the collected sound signals into electrical signals, which are received by the audio circuit 2160 and converted into audio data, which are output to the communications component 2110 for transmission to, for example, another terminal device 110, or to the memory 2120 for further processing.
The bluetooth module 2170 is used for exchanging information with other bluetooth devices having bluetooth modules through bluetooth protocol. For example, the terminal device may establish a bluetooth connection with a wearable electronic device (e.g., a smart watch) also provided with a bluetooth module through the bluetooth module 2170, thereby performing data interaction.
The processor 2180 is a control center of the terminal device, connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal device and processes data by running or executing software programs stored in the memory 2120, and calling data stored in the memory 2120. In some embodiments, the processor 2180 may include one or more processing units; the processor 2180 may also integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., and a baseband processor that primarily handles wireless communications. It will be appreciated that the baseband processor described above may not be integrated into the processor 2180. The processor 2180 of the present application may run an operating system, application programs, user interface displays and touch responses, as well as the image enhancement methods of the embodiments of the present application. In addition, the processor 2180 is coupled to a display unit 2130.
Fifth embodiment: program product.
In some possible embodiments, aspects of the image enhancement method provided by the present application may also be implemented in the form of a program product comprising a computer program for causing an electronic device to carry out the steps of the image enhancement method according to the various exemplary embodiments of the application as described in the present specification when the program product is run on an electronic device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and comprise a computer program and may run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein generally as a "circuit," "module," or "system."
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this is not required to either imply that the operations must be performed in that particular order or that all of the illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having a computer-usable computer program embodied therein.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An image enhancement method, comprising:
acquiring an underwater image to be processed;
performing image enhancement processing on the underwater image to be processed based on a target enhancement model to obtain an image-enhanced target underwater image;
the target enhancement model is obtained after at least one parameter adjustment of the enhancement model to be trained based on a loss function, the loss function is constructed based on the adversarial loss corresponding to a first-class output image with the same scale as the input image among at least two output images output by the enhancement model to be trained, together with the pixel-level losses respectively corresponding to the at least two output images, and the scales of the at least two output images are different.
2. The method of claim 1, wherein the target enhancement model comprises: a first encoder, a first attention network, and a first decoder;
the image enhancement processing is carried out on the underwater image to be processed based on the target enhancement model, and the image enhancement processing comprises the following steps:
extracting features of the underwater image to be processed through the first encoder to obtain at least two initial feature images;
respectively carrying out attention processing on the at least two initial feature images through the first attention network to acquire corresponding target attention diagrams;
And based on the size of the obtained at least two target attention patterns, the first decoder sequentially splices the target attention patterns with the upsampled feature patterns with the same size according to the mode of the size from small to large until the obtained spliced image is the same as the size of the underwater image to be processed, and then determines and outputs the target underwater image based on the spliced image.
3. The method of claim 2, wherein the extracting features of the underwater image to be processed by the first encoder, to obtain at least two initial feature maps, includes:
and extracting the characteristics of the underwater image to be processed through the first encoder to obtain a first type initial characteristic image with the same scale as the underwater image to be processed, and performing at least one downsampling on the first type initial characteristic image to obtain a second type initial characteristic image.
4. The method of claim 2, wherein the performing attention processing on the at least two initial feature maps, respectively, comprises:
performing channel attention processing on any initial feature map to obtain a channel attention map, and multiplying the channel attention map with the initial feature map to obtain an intermediate feature map;
And carrying out spatial attention processing on the intermediate feature map to obtain a spatial attention map, and carrying out multiplication processing on the spatial attention map and the intermediate feature map to obtain a corresponding target attention map.
5. The method according to claim 2, wherein the first decoder comprises an error feedback network, an input upsampled feature map corresponding to the stitched image is taken as a high-scale feature map, and the feature map corresponding to the input upsampled feature map before upsampling is taken as a low-scale feature map;
after the target attention is spliced with the upsampled feature map with the same scale, the following operations are executed through the error feedback network:
downsampling the high-scale feature map to obtain a downsampled feature map, and determining feature errors between the downsampled feature map and the low-scale feature map, wherein the downsampled feature map and the low-scale feature map have the same scale;
and performing deconvolution operation on the characteristic errors to obtain high-scale errors corresponding to the high-scale characteristic images, and adding and processing the high-scale errors and the spliced images after convolution operation.
6. The method according to any one of claims 1 to 5, wherein the enhancement model to be trained comprises: a second encoder, a second attention network, a second decoder, a discrimination network, and a supervisory optimization network; the target enhancement model is obtained by the following steps:
selecting a training sample pair from a training dataset, the training sample pair comprising: an original image and a corresponding synthesized underwater image;
extracting features of the synthesized underwater image through the second encoder to obtain at least two training feature images;
respectively carrying out attention processing on the at least two training feature maps through the second attention network to acquire corresponding training attention diagrams;
splicing the at least two training attention diagrams with corresponding training up-sampling feature diagrams with the same scale respectively through the second decoder to obtain at least two output images, wherein the at least two output images comprise a first type of output image with the same scale as the synthesized underwater image and a second type of output image with the scale smaller than that of the synthesized underwater image;
determining, by the discrimination network, an adversarial loss based on a first comparison result between the first-class output image and the original image;
Determining, by the supervisory optimization network, respective pixel level losses based on a second comparison result between the first class output image and the original image, and a third comparison result between the second class output image and a reference image of the same scale, the reference image being obtained by downsampling the original image;
and constructing a loss function based on the adversarial loss and the determined at least two pixel-level losses, and performing at least one parameter adjustment on the enhancement model to be trained through the loss function until a condition is met, so as to obtain the target enhancement model.
7. The method of claim 6, wherein the synthetic underwater image is determined from the original image based on a successfully trained underwater image physical imaging model.
8. The method of claim 6, wherein the performing at least one parameter adjustment on the enhancement model to be trained based on the loss function comprises:
and based on the loss function, carrying out parameter adjustment on the enhancement model to be trained at least once through an optimizer.
9. An image enhancement apparatus, comprising:
The acquisition unit is used for acquiring the underwater image to be processed;
the processing unit is used for carrying out image enhancement processing on the underwater image to be processed based on the target enhancement model, and obtaining a target underwater image after image enhancement;
the target enhancement model is obtained after at least one parameter adjustment of the enhancement model to be trained based on a loss function, the loss function is constructed based on the adversarial loss determined by a first-class output image with the same scale as the input image among at least two output images output by the enhancement model to be trained, together with the pixel-level losses respectively corresponding to the at least two output images, and the scales of the at least two output images are different.
10. An electronic device, comprising:
a memory for storing a computer program executable by the processor;
the processor is connected to the memory and configured to perform the method of any of claims 1-8.
CN202210565061.7A 2022-05-23 2022-05-23 Image enhancement method and device and electronic equipment Pending CN117151987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210565061.7A CN117151987A (en) 2022-05-23 2022-05-23 Image enhancement method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210565061.7A CN117151987A (en) 2022-05-23 2022-05-23 Image enhancement method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117151987A (en) 2023-12-01

Family

ID=88906751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210565061.7A Pending CN117151987A (en) 2022-05-23 2022-05-23 Image enhancement method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117151987A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422855A (en) * 2023-12-19 2024-01-19 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium
CN117422855B (en) * 2023-12-19 2024-05-03 浙江省北大信息技术高等研究院 Machine vision-oriented image preprocessing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2021073493A1 (en) Image processing method and device, neural network training method, image processing method of combined neural network model, construction method of combined neural network model, neural network processor and storage medium
CN111402143B (en) Image processing method, device, equipment and computer readable storage medium
CN112308200B (en) Searching method and device for neural network
CN111402130B (en) Data processing method and data processing device
CN113066017B (en) Image enhancement method, model training method and equipment
CN113674146A (en) Image super-resolution
CN112257759A (en) Image processing method and device
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN115131281A (en) Method, device and equipment for training change detection model and detecting image change
CN112906721A (en) Image processing method, device, equipment and computer readable storage medium
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN116310667A (en) Self-supervision visual characterization learning method combining contrast loss and reconstruction loss
CN113066018A (en) Image enhancement method and related device
CN116977674A (en) Image matching method, related device, storage medium and program product
Wang et al. Global contextual guided residual attention network for salient object detection
CN117576248B (en) Image generation method and device based on gesture guidance
CN117151987A (en) Image enhancement method and device and electronic equipment
CN113642359B (en) Face image generation method and device, electronic equipment and storage medium
CN116957932A (en) Image generation method and device, electronic equipment and storage medium
CN110866866A (en) Image color-matching processing method and device, electronic device and storage medium
CN117011156A (en) Image processing method, device, equipment and storage medium
CN113822117B (en) Data processing method, device and computer readable storage medium
CN117274409A (en) Scene graph generation method, device, equipment and storage medium
CN114298961A (en) Image processing method, device, equipment and storage medium
CN114299105A (en) Image processing method, image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination