WO2022247232A1 - Image enhancement method and apparatus, terminal device, and storage medium - Google Patents


Info

Publication number: WO2022247232A1
Authority: WIPO (PCT)
Prior art keywords: image, network, processed, layer, image enhancement
Application number: PCT/CN2021/137821
Other languages: French (fr), Chinese (zh)
Inventors: 陈翔宇 (Xiangyu Chen), 刘翼豪 (Yihao Liu), 章政文 (Zhengwen Zhang), 乔宇 (Yu Qiao), 董超 (Chao Dong)
Original Assignee: 中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Publication of WO2022247232A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Definitions

  • the present application relates to the field of deep learning technology, and in particular to an image enhancement method, device, terminal equipment and storage medium.
  • Image enhancement tasks generally include operations such as dehazing, denoising, deraining, super-resolution, compression-artifact removal, deblurring, and high dynamic range (HDR) reconstruction.
  • The U-net structure is a special type of convolutional neural network (CNN). Although it can extract spatial features of different scales from the input image well when processing image enhancement tasks, a pure U-net network cannot process the input features in a targeted way, so it is difficult for it to simultaneously handle features that differ greatly within an enhancement task. This affects the image enhancement effect and results in poor image quality in some areas of the enhanced image.
  • the present application provides an image enhancement method, device, terminal equipment, and storage medium, which can improve the quality of an enhanced image in an image enhancement task.
  • In a first aspect, the present application provides an image enhancement method, including: acquiring an image to be processed; and inputting the image to be processed into a trained image enhancement model for processing and outputting an enhanced image. The image enhancement model includes a main network and a conditional network, and the main network is a U-net structure.
  • the main network includes M downsampling layers and M upsampling layers
  • the conditional network includes a shared convolution layer and M+1 feature extraction modules
  • The M+1 feature extraction modules include different numbers of downsampling operations. Extracting multiple feature tensors of different scales from the image to be processed through the conditional network includes: extracting intermediate features from the image to be processed through the shared convolutional layer; and respectively inputting the intermediate features into the M+1 feature extraction modules for processing to obtain M+1 feature tensors of different scales.
  • The main network also includes a first SFT layer and a plurality of residual modules, and each residual module includes alternately arranged second SFT layers and convolutional layers.
  • The first SFT layer is connected to the input side of the M downsampling layers and the output side of the M upsampling layers; the residual modules are interspersed between the M downsampling layers and the M upsampling layers; and the M+1 feature tensors of different scales are respectively input into the SFT layers of the corresponding scale among the first SFT layer and the second SFT layers.
  • The image enhancement model also includes a weight network, and the weight network includes a skip connection and multiple convolutional layers. The enhanced image is obtained by fusing the output of the main network with the original features, and the original features are extracted from the image to be processed by the weight network.
  • the image to be processed is an LDR image
  • the enhanced image is an HDR image
  • The image enhancement model is obtained by training a preset initial image enhancement model with a preset loss function and a training set. The training set includes a plurality of LDR image samples and the HDR image sample corresponding to each LDR image sample. The preset loss function describes the L1 loss between the value obtained by applying the Tanh function to the HDR predicted image and the value obtained by applying the Tanh function to the HDR image sample, where the HDR predicted image is the image obtained after the initial image enhancement model processes the LDR image sample.
  • an image enhancement device including:
  • the acquiring unit is used to acquire the image to be processed.
  • the processing unit is used to input the image to be processed into the trained image enhancement model for processing, and output the enhanced image.
  • the image enhancement model includes a main network and a conditional network.
  • the main network is a U-net structure.
  • the conditional network extracts multiple feature tensors of different scales from the image to be processed, and inputs the image to be processed and multiple feature tensors of different scales to the network layer of the corresponding scale in the main network for processing to obtain an enhanced image.
  • The image enhancement model also includes a weight network, and the weight network includes a skip connection and multiple convolutional layers. The enhanced image is obtained by fusing the output of the main network with the original features, and the original features are extracted from the image to be processed by the weight network.
  • the present application provides a terminal device, including: a memory and a processor, where the memory is used to store a computer program; and the processor is used to execute the method described in any one of the above first aspects when calling the computer program.
  • the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in any one of the above-mentioned first aspects is implemented.
  • an embodiment of the present application provides a computer program product, which, when the computer program product runs on a processor, causes the processor to execute the method described in any one of the above-mentioned first aspects.
  • The image enhancement method, device, terminal equipment, and storage medium provided by this application add a conditional network to the main network of the U-net structure, and use the conditional network to extract multiple feature tensors of different scales from the image to be processed and input them into the main network.
  • After the main network extracts spatial features of different scales from the image to be processed, it can process those spatial features in a targeted way based on the multiple feature tensors of different scales, so as to retain the effective information of the spatial features at each scale and thereby improve the quality of the enhanced image.
  • FIG. 1 is a first network architecture diagram of an image enhancement model provided by an embodiment of the present application.
  • FIG. 2 is a second network architecture diagram of an image enhancement model provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an HDR reconstruction task provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the reconstruction effect of an HDR reconstruction task provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an image enhancement device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the present application provides an image enhancement method. After the image to be processed is acquired, the image to be processed is input into the image enhancement model provided by the application for processing, and the enhanced image is output.
  • The image enhancement model provided by this application adds a conditional network to the main network of the U-net structure, and uses the conditional network to extract multiple feature tensors of different scales from the image to be processed as adjustment information, which is input into the main network.
  • After the main network extracts spatial features of different scales from the image to be processed, it can process those spatial features in a targeted way based on the feature tensors of different scales, so as to retain the effective information of the spatial features at each scale, effectively improving the enhancement effect and the quality of the enhanced image.
  • an exemplary image enhancement model provided by the present application is introduced with reference to FIG. 1 .
  • the image enhancement model is deployed in an image processing device, which may be a mobile terminal such as a smart phone, a tablet computer, or a camera, or a device capable of processing image data such as a desktop computer, a robot, or a server.
  • the image enhancement model provided by this application includes a main network and a condition network.
  • The main network adopts a U-net structure, which includes M downsampling layers and M upsampling layers connected to the M downsampling layers via skip connections.
  • When the U-net network performs an image enhancement task, it gradually extracts spatial features of different scales through the layer-by-layer downsampling layers, and restores the spatial features of the corresponding scales through the layer-by-layer upsampling layers to identify the enhancement information corresponding to each pixel and achieve image enhancement.
  • a conditional network is added to the main network of the U-net structure.
  • The conditional network extracts multiple feature tensors of different scales from the image to be processed, and these feature tensors are input into the main network as adjustment information, so that the main network can process spatial features of different scales in a targeted way and retain their effective information, thereby effectively improving the enhancement effect and the quality of the enhanced image.
  • The conditional network can be designed based on the number of different-scale spatial features processed by the main network. For example, M downsampling layers in the main network mean that the main network can process spatial features of M+1 scales, including the original scale. Correspondingly, the conditional network can output at most M+1 feature tensors of different scales, one for each of the M+1 scales.
  • The conditional network may include a shared convolutional layer and M+1 feature extraction modules, where the M+1 feature extraction modules contain different numbers of downsampling operations.
  • The conditional network first extracts intermediate features from the image to be processed through the shared convolutional layer (which may itself comprise multiple convolutional layers); the intermediate features are then respectively input into the M+1 feature extraction modules for processing to obtain M+1 feature tensors of different scales.
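As a concrete illustration of the scale bookkeeping described above, the i-th feature extraction module, containing i downsampling operations, produces a tensor at a reduced resolution. This sketch assumes each downsampling operation halves height and width (the patent does not fix the factor); the helper name is hypothetical:

```python
# Hypothetical helper: for a main network with M downsampling layers, the
# conditional network's M+1 feature extraction modules produce tensors at
# M+1 scales. Module i applies i downsampling operations; assuming each one
# halves height and width, its output scale is (H // 2**i, W // 2**i).
def feature_tensor_scales(height, width, m):
    return [(height // 2**i, width // 2**i) for i in range(m + 1)]

# For the two-downsampling-layer example of FIG. 1 (M = 2):
scales = feature_tensor_scales(256, 256, 2)
print(scales)  # → [(256, 256), (128, 128), (64, 64)]
```

The three scales match the original, intermediate, and small scales discussed below for the FIG. 1 example.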
  • The main network includes 2 downsampling (Down) layers and 2 corresponding upsampling (Up) layers.
  • The spatial features processed by the main network then include: the original-scale (large-scale) spatial features extracted from the image to be processed; the intermediate-scale (smaller than the original scale) spatial features obtained after the first downsampling layer downsamples the original scale; and the small-scale (smaller than the intermediate scale) spatial features obtained after the second downsampling layer downsamples the intermediate scale.
  • Correspondingly, the conditional network can include 3 feature extraction modules.
  • The first feature extraction module can be composed of convolution (Conv) layers (3 convolutional layers are taken as an example in FIG. 1) and includes 0 downsampling operations; the scale of its output feature tensor 1 is the original scale, which is used for targeted processing of the original-scale spatial features.
  • The second feature extraction module can be composed of convolutional layers and a downsampling layer (2 convolutional layers and 1 downsampling layer are taken as an example in FIG. 1), i.e., it includes 1 downsampling operation; the scale of its output feature tensor 2 is the intermediate scale, which is used for targeted processing of the intermediate-scale spatial features.
  • The third feature extraction module can be composed of a convolutional layer and 2 downsampling layers (1 convolutional layer and 2 downsampling layers are taken as an example in FIG. 1), i.e., it includes 2 downsampling operations; the scale of its output feature tensor 3 is the small scale, which is used for targeted processing of the small-scale spatial features.
  • Spatial feature transform (SFT) layers can be designed into the main network so that the feature tensors of different scales output by the conditional network act on the spatial features of the corresponding scale.
  • the main network further includes a first SFT layer and multiple residual modules.
  • the first SFT layer is connected to the input side of the M downsampling layers and the output side of the M upsampling layers.
  • Each residual module includes alternately arranged second SFT layers and convolutional layers. Multiple residual modules are interspersed between the M downsampling layers and M upsampling layers, and the M+1 feature tensors of different scales are respectively input into the SFT layers of the corresponding scale among the first SFT layer and the second SFT layers, so as to perform spatial feature transformation on the spatial features of different scales and realize targeted processing of them.
  • The main network includes, in order: a convolutional layer, the first SFT layer (SFT layer1), a convolutional layer, downsampling layer Down1, two residual modules (Residual block), downsampling layer Down2, N (N is an integer greater than or equal to 1) residual modules, a convolutional layer, upsampling layer Up1, two residual modules, upsampling layer Up2, another SFT layer1, and two convolutional layers.
  • the residual module includes two sets of alternately arranged second SFT layers (ie, SFT layer2) and convolutional layers.
  • Feature tensor 1 is input to the two first SFT layers (SFT layer1); feature tensor 2 is input to the SFT layer2 instances in the two residual modules between Down1 and Down2 and in the two residual modules between Up1 and Up2.
  • Feature tensor 3 is input to the SFT layer2 instances in the N residual modules located between Down2 and Up1.
  • An SFT layer can include two groups of convolutional layers (in FIG. 1, each group includes two convolutional layers as an example). The feature tensor output by the conditional network is processed by one group of convolutional layers to obtain a modulation parameter a.
  • The modulation parameter a is multiplied by the output feature of the layer preceding the SFT layer to obtain the transformed feature.
  • The feature tensor is processed by the other group of convolutional layers to obtain a modulation parameter b.
  • The modulation parameter b is added to the transformed feature to obtain the output feature of the SFT layer.
  • In this way, the modulation parameters (a, b) are learned from the feature tensor output by the conditional network through the SFT layer, and an adaptive affine transformation is then performed on the spatial features of the corresponding scale based on (a, b), realizing targeted processing of different spatial features and retaining more effective spatial information.
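The modulation described above can be sketched numerically. In this minimal NumPy sketch (an assumption for illustration: the two groups of convolutional layers are replaced by precomputed per-pixel maps `a` and `b`, since the actual convolution weights are learned), the SFT layer reduces to an element-wise affine transform:

```python
import numpy as np

def sft_transform(features, a, b):
    """Element-wise affine modulation of an SFT layer: the scale map `a`
    multiplies the incoming features and the shift map `b` is added.
    In the model, `a` and `b` are produced from the conditional network's
    feature tensor by two groups of convolutional layers."""
    return a * features + b

# Toy maps standing in for the learned modulation parameters.
features = np.ones((1, 4, 4))        # output of the preceding layer
a = np.full((1, 4, 4), 2.0)          # modulation parameter a (scale)
b = np.full((1, 4, 4), 0.5)          # modulation parameter b (shift)
out = sft_transform(features, a, b)  # every element becomes 2*1 + 0.5 = 2.5
```

Because `a` and `b` vary per spatial position, the transform can treat, e.g., highlight and non-highlight regions differently, which is the "targeted processing" the text refers to.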
  • The image enhancement model of the present application may also include a weight network; that is, a weight network is further added on top of the main network.
  • the weight network is used to extract raw features from the image to be processed.
  • The weight network includes multiple convolutional layers (four convolutional layers are taken as an example in FIG. 2), and the input of this convolutional stack is skip-connected to its output.
  • The enhanced image output by the image enhancement model is obtained by fusing the output of the main network with the original features; the fusion can be performed by superposition.
  • By designing the weight network, the original features can be learned from the image to be processed without manual estimation, so that more, and more accurate, original features are retained in the image enhancement task. Moreover, the weight network structure provided by this application is simple and easy to optimize, which reduces the training difficulty of the image enhancement network while fully preserving the original features.
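A minimal sketch of this residual design (assumptions: the four convolutional layers are collapsed into a single callable `conv_stack`, and fusion is plain addition, matching the superposition described above; both function names are hypothetical):

```python
import numpy as np

def weight_network(image, conv_stack):
    """Original features = conv_stack(image) + image: the skip connection
    adds the input of the convolutional stack to its output, so raw image
    content is preserved without manual estimation."""
    return conv_stack(image) + image

def fuse(main_output, original_features):
    """Enhanced image: main-network output superposed with the original
    features from the weight network."""
    return main_output + original_features

# Toy stand-in for the learned convolutional stack (scaled identity).
image = np.ones((3, 8, 8))
original = weight_network(image, conv_stack=lambda x: 0.1 * x)  # 1.1 everywhere
enhanced = fuse(np.zeros_like(image), original)
```

The skip connection is what makes the weight network easy to optimize: even if the conv stack initially outputs values near zero, the original image content passes through unchanged.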
  • The network framework provided by this application is universal: it can be applied to any image enhancement task, or to tasks that use image enhancement quality as an evaluation indicator, such as image dehazing, denoising, deraining, super-resolution, compression-artifact removal, deblurring, and HDR reconstruction.
  • the initial model can be trained by designing corresponding training sets and loss functions, so as to obtain image enhancement models suitable for different image enhancement tasks.
  • the image enhancement initial model can be trained to obtain an image enhancement model that can be applied to super-resolution image enhancement tasks.
  • the initial image enhancement model is trained to obtain an image enhancement model that can be applied to the image enhancement task of deblurring.
  • the image enhancement model may be pre-trained by the image processing device, or the file corresponding to the image enhancement model may be transplanted to the image processing device after being pre-trained by other devices. That is to say, the execution subject for training the image enhancement model and the execution subject for performing the image enhancement task using the image enhancement model may be the same or different. For example, when other devices are used to train the image enhancement initial model, after the other devices complete the training of the image enhancement initial model, the model parameters are fixed to obtain the corresponding file of the image enhancement model. This file is then ported to an image processing device.
  • the following uses the HDR reconstruction task to illustrate the training process and effect of the image enhancement model provided by this application.
  • An initial image enhancement model is constructed first; that is, after building the initial U-net main network, the corresponding initial conditional network and initial weight network are designed based on the initial main network and added to it.
  • the training set includes a plurality of image sample pairs, and each image sample pair includes an LDR image sample and an HDR image sample corresponding to the LDR image sample.
  • The image sample pairs may be collected by a mobile phone, a camera, or the like. It is also possible to use an open-source algorithm to generate the corresponding LDR image samples from publicly available HDR image samples.
  • In an LDR image sample, the pixels in highlight areas (i.e., overexposed areas) and the pixels in non-highlight areas (i.e., normally exposed areas) generally differ greatly, which easily causes the initial model to focus on areas with larger pixel values during training. That is to say, during training, the initial model may concentrate on restoring the brightness and texture details of the highlight areas while ignoring the noise and quantization losses that may exist in the non-highlight areas.
  • To address this, the present application provides a loss function Tanh_L1, which describes the L1 loss between the value obtained by applying the Tanh function to the HDR predicted image and the value obtained by applying the Tanh function to the HDR image sample; that is, the Tanh function is added on top of the L1 loss. Here the HDR predicted image is the image obtained after the initial image enhancement model processes the LDR image sample.
  • The expression of Tanh_L1 is as follows: Tanh_L1 = || Tanh(Y) - Tanh(H) ||_1, where Y represents the HDR predicted image obtained by processing the LDR image sample through the initial image enhancement model, and H represents the HDR image sample corresponding to the LDR image sample.
  • The Tanh function nonlinearly compresses pixel values, so applying it within the L1 loss balances the highlight-area pixels and non-highlight-area pixels of the LDR image sample, reducing the impact on training of the excessively large difference between highlight and non-highlight pixel values.
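Following the definition above, Tanh_L1 can be written in a few lines (a sketch; averaging the L1 norm over pixels is an assumption about the reduction):

```python
import numpy as np

def tanh_l1(pred, target):
    """L1 loss between tanh-compressed predictions and targets:
    Tanh_L1 = || tanh(Y) - tanh(H) ||_1, here averaged over pixels."""
    return float(np.mean(np.abs(np.tanh(pred) - np.tanh(target))))

# The compression balances highlight and non-highlight pixels: a fixed
# error of 1.0 costs much less in a highlight (large-valued) region than
# in a normally exposed one, so the loss no longer fixates on highlights.
lo = tanh_l1(np.array([0.0]), np.array([1.0]))   # error in non-highlight region
hi = tanh_l1(np.array([5.0]), np.array([6.0]))   # same error, highlight region
```

Here `lo` is about 0.76 while `hi` is below 1e-4, showing how the saturating Tanh curve flattens the contribution of large pixel values.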
  • During training, the gradient descent method can be used to iteratively train the initial image enhancement model; when the model converges (that is, the value of Tanh_L1 no longer decreases), the trained image enhancement model is obtained.
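The convergence criterion can be illustrated with a toy gradient-descent run (purely illustrative: a single scalar "prediction" is fitted to a scalar "target" by a finite-difference gradient of Tanh_L1, whereas real training updates all network weights by backpropagation):

```python
import numpy as np

def tanh_l1(y, h):
    # scalar version of the Tanh_L1 loss: |tanh(y) - tanh(h)|
    return abs(np.tanh(y) - np.tanh(h))

y, h, lr = 0.0, 1.0, 0.1   # initial prediction, target, learning rate
for _ in range(200):
    # central finite-difference estimate of d(loss)/dy
    grad = (tanh_l1(y + 1e-6, h) - tanh_l1(y - 1e-6, h)) / 2e-6
    y -= lr * grad
# after enough iterations the loss stops decreasing and y sits near h
```

When `tanh_l1(y, h)` stops decreasing between iterations, the loop has reached the "model converges" condition described in the text.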
  • When the image processing device uses the image enhancement model for HDR reconstruction, after acquiring the LDR image to be processed, it inputs the LDR image into the model, where the LDR image is respectively fed to the weight network, the main network, and the conditional network.
  • The weight network extracts the original features from the LDR image.
  • The conditional network extracts feature tensors of different scales from the LDR image.
  • The feature tensors of different scales are input into the main network, which uses them to perform targeted processing on the spatial features of the corresponding scales of the LDR image, obtaining the output features of the main network.
  • The output features are fused with the original features to obtain the HDR image.
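The four steps above compose as follows (a sketch with toy stand-ins; `weight_net`, `cond_net`, and `main_net` are hypothetical callables, not the patent's trained networks):

```python
import numpy as np

def enhance(ldr, weight_net, cond_net, main_net):
    original = weight_net(ldr)         # step 1: original features (skip connection)
    tensors = cond_net(ldr)            # step 2: multi-scale feature tensors
    main_out = main_net(ldr, tensors)  # step 3: SFT-modulated U-net output
    return main_out + original         # step 4: fusion -> HDR image

# Toy stand-ins so the pipeline runs end to end.
ldr = np.ones((3, 8, 8))
hdr = enhance(
    ldr,
    weight_net=lambda x: x,                                  # identity features
    cond_net=lambda x: [x, x[:, ::2, ::2], x[:, ::4, ::4]],  # 3 scales (M = 2)
    main_net=lambda x, t: 0.5 * x,                           # dummy main output
)
# hdr keeps the input's spatial size; with these stand-ins it is 1.5 everywhere
```

The point of the sketch is the data flow: the LDR image feeds all three networks, and only the final addition combines their results into the HDR output.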
  • The reconstruction effect of the HDR reconstruction can be seen in FIG. 4.
  • When using the image enhancement model provided by this application to perform HDR reconstruction tasks, firstly, it is not necessary to collect LDR images of the same scene at different exposures: after training is completed, only a single LDR image is needed to restore the corresponding HDR image. Secondly, compared with previous HDR reconstruction methods based on a single LDR image, this application adopts an end-to-end training approach, giving a simple model, high training efficiency, and a good model effect.
  • Since the image enhancement model provided by this application adds a conditional network to the main network, the main network can perform targeted processing on spatial features of different scales; combined with the Tanh_L1 loss function used during training, the model is able to denoise and remove quantization losses in non-highlight regions while simultaneously restoring brightness and texture details in highlight regions. That is to say, in the HDR reconstruction task, the image enhancement model provided by this application can jointly realize HDR reconstruction, denoising, and dequantization.
  • The image enhancement model can be directly added to a camera's post-processing pipeline to improve shooting quality on the software side.
  • The image enhancement model can also be used for post-hoc image/video enhancement to improve the quality of existing LDR data.
  • An embodiment of the present application provides an image enhancement device. This device embodiment corresponds to the foregoing method embodiment; for ease of reading, the details of the foregoing method embodiment are not repeated one by one here, but it should be clear that the device in this embodiment can correspondingly implement all the content of the foregoing method embodiment.
  • FIG. 5 is a schematic structural diagram of an image enhancement device provided by an embodiment of the present application.
  • the image enhancement device provided by this embodiment includes: an acquisition unit 501 and a processing unit 502 .
  • the obtaining unit 501 is configured to obtain an image to be processed.
  • the processing unit 502 is configured to input the image to be processed into a trained image enhancement model for processing, and output the enhanced image.
  • The image enhancement model includes a main network and a conditional network, the main network being a U-net structure. When processing the image to be processed, multiple feature tensors of different scales are extracted from the image to be processed through the conditional network, and the image to be processed and the feature tensors of different scales are respectively input into the network layers of the corresponding scales in the main network for processing to obtain the enhanced image.
  • the main network includes M downsampling layers and M upsampling layers
  • the conditional network includes a shared convolution layer and M+1 feature extraction modules
  • The M+1 feature extraction modules include different numbers of downsampling operations. Extracting multiple feature tensors of different scales from the image to be processed through the conditional network includes:
  • extracting intermediate features from the image to be processed through the shared convolutional layer, inputting the intermediate features into the M+1 feature extraction modules for processing, and obtaining M+1 feature tensors of different scales.
  • The main network further includes a first SFT layer and a plurality of residual modules, and each residual module includes alternately arranged second SFT layers and convolutional layers.
  • The first SFT layer is connected to the input side of the M downsampling layers and the output side of the M upsampling layers, and the residual modules are interspersed between the M downsampling layers and the M upsampling layers.
  • The M+1 feature tensors of different scales are respectively input into the SFT layers of the corresponding scales among the first SFT layer and the second SFT layers.
  • The image enhancement model also includes a weight network, and the weight network includes a skip connection and multiple convolutional layers; the enhanced image is obtained by fusing the output of the main network with the original features, and the original features are extracted from the image to be processed through the weight network.
  • the image to be processed is an LDR image
  • the enhanced image is an HDR image
  • The image enhancement model is obtained by training a preset initial image enhancement model with a preset loss function and a training set, where the training set includes a plurality of LDR image samples and the HDR image sample corresponding to each LDR image sample. The preset loss function describes the L1 loss between the value obtained by applying the Tanh function to the HDR predicted image and the value obtained by applying the Tanh function to the HDR image sample, and the HDR predicted image is the image obtained after the initial image enhancement model processes the LDR image sample.
  • the image enhancement device provided in this embodiment can execute the above-mentioned method embodiment, and its implementation principle and technical effect are similar, and details are not repeated here.
  • the embodiment of the present application also provides a terminal device.
  • the terminal device 6 of this embodiment includes: a processor 60 , a memory 61 , and a computer program 62 stored in the memory 61 and operable on the processor 60 .
  • When the processor 60 executes the computer program 62, the steps in the above image enhancement method embodiments are implemented, for example, steps S101 to S104 shown in FIG. 1.
  • Alternatively, when the processor 60 executes the computer program 62, the functions of the modules/units in the above device embodiment are realized, for example, the functions of modules 401 to 403 shown in FIG. 4.
  • the computer program 62 can be divided into one or more modules/units, and the one or more modules/units are stored in the memory 61 and executed by the processor 60 to complete this application.
  • the one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 62 in the terminal device 6 .
  • FIG. 6 is only an example of the terminal device 6 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the terminal device 6 may also include an input and output device, a network access device, a bus, and the like.
  • the processor 60 can be a central processing unit (Central Processing Unit, CPU), and can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • The memory 61 may be an internal storage unit of the terminal device 6 , such as a hard disk or memory of the terminal device 6 .
  • The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash memory card (Flash Card) equipped on the terminal device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 6.
  • the memory 61 is used to store the computer program and other programs and data required by the terminal device 6 .
  • the memory 61 can also be used to temporarily store data that has been output or will be output.
  • the terminal device provided in this embodiment can execute the foregoing method embodiment, and its implementation principle and technical effect are similar, and details are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in the foregoing method embodiment is implemented.
  • The embodiment of the present application further provides a computer program product which, when run on a terminal device, enables the terminal device to implement the method described in the foregoing method embodiments.
  • When the above integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures in the methods of the above embodiments of the present application can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium.
  • When the computer program is executed by a processor, the steps in the above-mentioned various method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • The computer-readable storage medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), electrical carrier signals, telecommunication signals, and software distribution media, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
  • In some jurisdictions, under legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
  • The disclosed apparatus/device and method may be implemented in other ways.
  • The apparatus/device embodiments described above are merely illustrative.
  • The division into the modules or units described above is only a division by logical function.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • The term “if” may be construed, depending on the context, as “when”, “once”, “in response to determining”, or “in response to detecting”.
  • The phrase “if determined” or “if [the described condition or event] is detected” may be construed, depending on the context, to mean “once determined”, “in response to the determination”, “once [the described condition or event] is detected”, or “in response to detection of [the described condition or event]”.
  • References to “one embodiment” or “some embodiments” and the like in the specification of the present application mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • Appearances of the phrases “in one embodiment,” “in some embodiments,” “in other embodiments,” etc. in various places in this specification do not necessarily all refer to the same embodiment; rather, they mean “one or more but not all embodiments” unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of deep learning, and provides an image enhancement method and apparatus, a terminal device, and a storage medium, capable of improving the quality of an enhanced image in an image enhancement task. The image enhancement method comprises: acquiring an image to be processed; inputting said image into a trained image enhancement model for processing; and outputting an enhanced image. The image enhancement model comprises a master network and a conditional network, the master network being of a U-net structure. When said image is processed, a plurality of feature tensors of different scales are extracted from said image by means of the conditional network, and said image and the plurality of feature tensors of different scales are respectively input into network layers of corresponding scales in the master network for processing to obtain the enhanced image.

Description

Image Enhancement Method, Apparatus, Terminal Device, and Storage Medium

Technical Field

The present application relates to the field of deep learning technology, and in particular to an image enhancement method, apparatus, terminal device, and storage medium.
Background

Image enhancement tasks generally include operations such as dehazing, denoising, rain removal, super-resolution, compression-artifact removal, deblurring, and high dynamic range (High Dynamic Range, HDR) reconstruction. With the wide application of deep learning in image processing and computer vision, improving the image enhancement effect has become a focal problem in neural-network-based image enhancement tasks.
Technical Problem

For example, a U-net-structured network, as a special convolutional neural network (Convolutional Neural Network, CNN), can extract spatial features of different scales from the input image well when handling image enhancement tasks. However, a plain U-net network cannot process the input features in a feature-specific way, making it difficult to simultaneously handle features that differ greatly within an enhancement task; this degrades the enhancement effect and results in poor image quality in some regions of the enhanced image.
Technical Solution

In view of this, the present application provides an image enhancement method, apparatus, terminal device, and storage medium, which can improve the quality of the enhanced image in image enhancement tasks.

In a first aspect, the present application provides an image enhancement method, including: acquiring an image to be processed; and inputting the image to be processed into a trained image enhancement model for processing and outputting an enhanced image. The image enhancement model includes a main network and a conditional network, the main network having a U-net structure. When the image to be processed is processed, a plurality of feature tensors of different scales are extracted from it by the conditional network, and the image to be processed and the feature tensors of different scales are respectively input into network layers of corresponding scales in the main network for processing to obtain the enhanced image.
Optionally, the main network includes M downsampling layers and M upsampling layers, and the conditional network includes a shared convolutional layer and M+1 feature extraction modules, the M+1 feature extraction modules containing different numbers of downsampling operations. Extracting the plurality of feature tensors of different scales from the image to be processed through the conditional network includes: extracting intermediate features from the image to be processed through the shared convolutional layer; and inputting the intermediate features into the M+1 feature extraction modules respectively for processing, to obtain M+1 feature tensors of different scales.
Optionally, the main network further includes a first SFT layer and a plurality of residual modules, each residual module including alternately arranged second SFT layers and convolutional layers. The first SFT layer is connected to the input side of the M downsampling layers and the output side of the M upsampling layers; the residual modules are interspersed between the M downsampling layers and the M upsampling layers; and the M+1 feature tensors of different scales are respectively input into SFT layers of corresponding scales among the first SFT layer and the second SFT layers.
Optionally, the image enhancement model further includes a weight network, which includes a skip connection and multiple convolutional layers. The enhanced image is obtained by fusing the output of the main network with original features, the original features being extracted from the image to be processed by the weight network.

Optionally, the image to be processed is an LDR image, and the enhanced image is an HDR image.

Optionally, the image enhancement model is obtained by training a preset initial image enhancement model with a preset loss function and a training set. The training set includes a plurality of LDR image samples and an HDR image sample corresponding to each LDR image sample. The preset loss function describes the L1 loss between the value obtained by applying the Tanh function to an HDR predicted image and the value obtained by applying the Tanh function to the HDR image sample, where the HDR predicted image is the image obtained after the initial image enhancement model processes the LDR image sample.
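As a minimal, hypothetical illustration of the Tanh-domain L1 loss described above (not the actual training code of the application), the loss can be sketched in pure Python, with images simplified to flat lists of pixel values:

```python
import math

def tanh_l1_loss(pred, target):
    """L1 loss computed in the Tanh-compressed domain.

    `pred` and `target` are flat lists of pixel values; this is a
    simplification for illustration -- a real implementation would
    operate on image tensors in a deep learning framework.
    """
    assert len(pred) == len(target)
    return sum(abs(math.tanh(p) - math.tanh(t))
               for p, t in zip(pred, target)) / len(pred)
```

Compressing both images through Tanh before taking the L1 difference down-weights errors in very bright regions, which is useful when HDR pixel values span a large dynamic range.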
In a second aspect, the present application provides an image enhancement apparatus, including:

an acquiring unit, configured to acquire an image to be processed; and

a processing unit, configured to input the image to be processed into a trained image enhancement model for processing and output an enhanced image. The image enhancement model includes a main network and a conditional network, the main network having a U-net structure. When the image to be processed is processed, the conditional network extracts a plurality of feature tensors of different scales from it, and the image to be processed and the feature tensors of different scales are respectively input into network layers of corresponding scales in the main network for processing to obtain the enhanced image.
Optionally, the image enhancement model further includes a weight network, which includes a skip connection and multiple convolutional layers; the enhanced image is obtained by fusing the output of the main network with original features extracted from the image to be processed by the weight network.

In a third aspect, the present application provides a terminal device, including a memory and a processor, the memory being configured to store a computer program, and the processor being configured to execute the method described in any manner of the first aspect when invoking the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method described in any manner of the first aspect is implemented.

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a processor, causes the processor to execute the method described in any manner of the first aspect.
Beneficial Effects

In the image enhancement method, apparatus, terminal device, and storage medium provided by the present application, a conditional network is added to the U-net-structured main network. The conditional network extracts a plurality of feature tensors of different scales from the image to be processed and inputs them into the main network. After the main network extracts spatial features of different scales from the image to be processed, it can perform scale-specific processing on those spatial features based on the feature tensors of different scales, so as to retain the effective information of the spatial features at each scale, thereby improving the quality of the enhanced image.
Description of Drawings

FIG. 1 is a first network architecture diagram of an image enhancement model provided by an embodiment of the present application;

FIG. 2 is a second network architecture diagram of an image enhancement model provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for an HDR reconstruction task provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of the reconstruction effect of an HDR reconstruction task provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an image enhancement apparatus provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
Embodiments of the Present Invention

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
For image enhancement tasks, the present application provides an image enhancement method: after an image to be processed is acquired, it is input into the image enhancement model provided herein for processing, and an enhanced image is output. In the image enhancement model provided by the present application, a conditional network is added to the U-net-structured main network, and the conditional network extracts a plurality of feature tensors of different scales from the image to be processed and inputs them into the main network as conditioning information. After the main network extracts spatial features of different scales from the image to be processed, it can perform scale-specific processing on those spatial features based on the feature tensors of different scales, so as to retain their effective information, thereby effectively improving the enhancement effect and the quality of the enhanced image.

The technical solution of the present application is described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.

First, an image enhancement model provided by the present application is introduced by way of example with reference to FIG. 1. The image enhancement model is deployed in an image processing device, which may be a mobile terminal such as a smartphone, tablet computer, or camera, or a device capable of processing image data such as a desktop computer, robot, or server.
The image enhancement model provided by the present application includes a main network and a condition network. The main network adopts a U-net structure, that is, it includes M downsampling layers and M upsampling layers connected to the M downsampling layers by skip connections. When performing an image enhancement task, a U-net-structured network progressively extracts spatial features of different scales through successive downsampling layers, and then restores the spatial features of the corresponding scales through successive upsampling layers to identify the enhancement information of the corresponding pixels, thereby achieving image enhancement.

In the embodiments of the present application, a conditional network is added to the U-net-structured main network. The conditional network can extract a plurality of feature tensors of different scales from the image to be processed; these feature tensors are input into the main network as conditioning information, so that the main network performs scale-specific processing on the spatial features of different scales based on them, retaining the effective information of the spatial features at each scale and thereby effectively improving the enhancement effect and the quality of the enhanced image.

In one example, the conditional network can be designed based on the number of spatial-feature scales processed by the main network. For example, M downsampling layers in the main network mean that the main network can process spatial features at M+1 scales, including the original scale. Correspondingly, the conditional network can output at most M+1 feature tensors of different scales corresponding to the spatial features at these M+1 scales.
For example, when the main network includes M downsampling layers and the corresponding M upsampling layers, the conditional network may include a shared convolutional layer and M+1 feature extraction modules containing different numbers of downsampling operations. When processing the image to be processed, the conditional network first extracts intermediate features from the image through the shared convolutional layer (which may, for example, consist of multiple convolutional layers), and then inputs the intermediate features into the M+1 feature extraction modules respectively, thereby obtaining M+1 feature tensors of different scales.

Exemplarily, as shown in FIG. 1, assume M=2, so the main network includes 2 downsampling (Down) layers and the corresponding 2 upsampling (Up) layers. The spatial features processed by the main network then include the original-scale (large-scale) spatial features extracted from the image to be processed, the intermediate-scale (smaller than large-scale) spatial features obtained after the first downsampling layer downsamples at the original scale, and the small-scale (smaller than intermediate-scale) spatial features obtained after the second downsampling layer downsamples at the intermediate scale.

Correspondingly, the conditional network may include 3 feature extraction modules. The first feature extraction module may consist of convolutional (Conv) layers (3 convolutional layers in FIG. 1), i.e., it contains 0 downsampling operations; its output feature tensor 1 is at the original scale and is used for scale-specific processing of the original-scale spatial features. The second feature extraction module may consist of convolutional layers and 1 downsampling layer (2 convolutional layers and 1 downsampling layer in FIG. 1), i.e., it contains 1 downsampling operation; its output feature tensor 2 is at the intermediate scale and is used for scale-specific processing of the intermediate-scale spatial features. The third feature extraction module may consist of a convolutional layer and 2 downsampling layers (1 convolutional layer and 2 downsampling layers in FIG. 1), i.e., it contains 2 downsampling operations; its output feature tensor 3 is at the small scale and is used for scale-specific processing of the small-scale spatial features.
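The multi-scale extraction performed by the feature extraction modules can be illustrated with a simplified sketch. For illustration only, 2x average pooling stands in for the learned convolution-plus-downsampling modules, and a 2D list of pixel values stands in for a feature map; the actual modules are learned convolutional layers:

```python
def downsample2x(img):
    """2x average-pool downsampling of a 2D grayscale image (list of rows)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
             for c in range(w)] for r in range(h)]

def multi_scale_tensors(img, m=2):
    """Return M+1 stand-in 'feature tensors': module k applies k
    downsampling operations, so the scales match the main network's
    original, intermediate, and small scales for M=2."""
    tensors, cur = [img], img
    for _ in range(m):
        cur = downsample2x(cur)
        tensors.append(cur)
    return tensors
```

For M=2, a 4x4 input yields three tensors at 4x4, 2x2, and 1x1 resolution, mirroring the large, intermediate, and small scales described above.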
In the embodiments of the present application, spatial feature transform (Spatial Feature Transform, SFT) layers can be designed into the main network so that the feature tensors of different scales output by the conditional network act on the spatial features of the corresponding scales. Exemplarily, the main network further includes a first SFT layer and a plurality of residual modules. The first SFT layer is connected to the input side of the M downsampling layers and the output side of the M upsampling layers. Each residual module includes alternately arranged second SFT layers and convolutional layers; the residual modules are interspersed between the M downsampling layers and the M upsampling layers. The M+1 feature tensors of different scales are respectively input into the SFT layers of corresponding scales among the first SFT layer and the second SFT layers, so as to perform spatial feature transformation on the spatial features of different scales and realize scale-specific processing.

For example, in FIG. 1, the main network includes, in order from input (Input) to output (Output): a convolutional layer, the first SFT layer (SFT layer1), a convolutional layer, downsampling layer Down1, two residual blocks (Residual block), downsampling layer Down2, N residual modules (N being an integer greater than or equal to 1), a convolutional layer, upsampling layer Up1, two residual modules, upsampling layer Up2, SFT layer1, and two convolutional layers.

The output of Down1 is skip-connected to the output of Up2, the input of Down2 is skip-connected to the output of Up1, and the output of Down2 is skip-connected to the outputs of the N residual modules. Each residual module includes two groups of alternately arranged second SFT layers (i.e., SFT layer2) and convolutional layers.

Feature tensor 1 is input into the first SFT layers (SFT layer11 and SFT layer5); feature tensor 2 is input into the SFT layer2 of the two residual modules located between Down1 and Down2 and of the two residual modules located between Up1 and Up2; feature tensor 3 is input into the SFT layer2 of the N residual modules located between Down2 and Up1.
As shown in FIG. 1, an SFT layer may include two groups of convolutional layers (in FIG. 1, each group includes two convolutional layers). The feature tensor output by the conditional network is processed by one group of convolutional layers to obtain a modulation parameter a, which is multiplied by the output features of the layer preceding the SFT layer to obtain transformed features. The feature tensor is also processed by the other group of convolutional layers to obtain a modulation parameter b, which is added to the transformed features to obtain the output features of the SFT layer.

That is, the SFT layer learns the modulation parameters (a, b) from the feature tensor output by the conditional network, and then performs an adaptive affine transformation on the spatial features of the corresponding scale based on (a, b), thereby realizing feature-specific processing of different spatial features and retaining more effective spatial information.
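The affine modulation an SFT layer applies, y = a · x + b, can be sketched as follows. This is a simplification: features and modulation parameters are flattened to lists, whereas in the model a and b are feature maps produced by small convolution stacks applied to the conditional network's feature tensor:

```python
def sft_modulate(features, a, b):
    """Spatial feature transform: element-wise affine y = a * x + b.

    `features` are the previous layer's outputs; `a` and `b` are the
    modulation parameters learned from the conditional feature tensor.
    """
    return [ai * x + bi for x, ai, bi in zip(features, a, b)]
```

Because a and b vary per element, different spatial positions receive different affine transforms, which is what enables the scale- and feature-specific processing described above.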
It can be understood that, compared with a traditional U-net-structured network, the image enhancement network provided by the present application retains more effective spatial information when enhancing the image to be processed, and can therefore recover more detailed textures while effectively removing noise and quantization losses, thereby improving the enhancement effect and the quality of the enhanced image.

Optionally, to further reduce the loss of original features of the input image incurred by a traditional U-net-structured network, the image enhancement model of the present application may further include a weight network; that is, a weight network is added on top of the main network. The weight network is used to extract original features from the image to be processed.

Exemplarily, as shown in FIG. 2, the weight network includes multiple convolutional layers (4 convolutional layers in FIG. 2), and the input of these convolutional layers is skip-connected to their output. In this example, the enhanced image output by the image enhancement model is obtained by fusing the output of the main network with the original features; the fusion may be performed by addition.
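The skip connection around the weight network and the additive fusion can be sketched as follows (a simplification, with feature maps flattened to lists; `conv` is a hypothetical stand-in for the weight network's learned convolutional stack):

```python
def weight_network(x, conv):
    """Weight network with its skip connection: output = conv(x) + x.

    `conv` stands in for the learned convolutional layers; the skip
    connection passes the input through unchanged and adds it back.
    """
    return [c + xi for c, xi in zip(conv(x), x)]

def fuse(main_output, original_features):
    """Additive fusion of the main network's output with the original
    features extracted by the weight network."""
    return [m + o for m, o in zip(main_output, original_features)]
```

The additive skip connection means the weight network only has to learn a residual correction on top of the raw input, which is one reason such a structure is easy to optimize.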
通过设计权重网络,可以从待处理图像中学习到原始特征,无需人工估计,使得在图像增强任务中保留更多且更准确的原始特征。且本申请提供的权重网络结构简单,便于优化,在保证原始特征的充分保留的情况下,能够降低图像增强网络的训练难度。By designing the weight network, the original features can be learned from the image to be processed without manual estimation, so that more and more accurate original features can be retained in the image enhancement task. Moreover, the weight network structure provided by this application is simple, easy to optimize, and can reduce the training difficulty of the image enhancement network under the condition that the original features are fully preserved.
值得说明的是,本申请提供网络框架具备泛用性。可以应用于任何图像增强任务或者以图像增强效果为评价指标的任务中。图像去雾,去噪声,去雨,超分辨率,去压缩伪影,去模糊,HDR重建等多种图像增强任务中。It is worth noting that the network framework provided by this application is universal. It can be applied to any image enhancement tasks or tasks that use image enhancement effects as evaluation indicators. Image defogging, denoising, deraining, super-resolution, decompression artifacts, deblurring, HDR reconstruction and other image enhancement tasks.
It can be understood that, for different image enhancement tasks, the initial model can be trained with a correspondingly designed training set and loss function, thereby obtaining image enhancement models suited to those tasks.
For example, training the image enhancement initial model on a training set composed of low-resolution image samples and corresponding high-resolution image samples yields an image enhancement model applicable to the super-resolution task; training it on a training set composed of blurred image samples and corresponding sharp image samples yields an image enhancement model applicable to the deblurring task.
It can be understood that the image enhancement model may be pre-trained by the image processing device itself, or pre-trained by another device, after which the file corresponding to the image enhancement model is ported to the image processing device. In other words, the entity that trains the image enhancement model and the entity that uses it to perform image enhancement tasks may be the same or different. For example, when another device trains the image enhancement initial model, that device fixes the model parameters after training is finished to obtain the file corresponding to the image enhancement model, and this file is then ported to the image processing device.
The following takes the HDR reconstruction task as an example to illustrate the training process and effect of the image enhancement model provided by this application.
Exemplarily, an image enhancement initial model is constructed first: after the U-net initial network is built, the corresponding conditional initial network and weight initial network are designed based on the initial main network and added to it.
For the HDR reconstruction task, a corresponding training set is collected. The training set includes multiple image sample pairs, each consisting of an LDR image sample and its corresponding HDR image sample. For example, image sample pairs may be captured with a mobile phone or a camera; corresponding LDR image samples may also be generated from publicly available HDR image samples using open-source algorithms.
In one embodiment, because in an LDR image the pixels of highlight regions (i.e., overexposed regions) generally differ greatly from those of non-highlight regions (i.e., normally exposed regions), the initial model tends to concentrate on regions with large pixel values during training. That is, during training the initial model may focus on restoring the brightness and texture details of highlight regions while ignoring the noise and quantization loss that may exist in non-highlight regions.
To address this, for the HDR reconstruction task the present application provides a loss function, Tanh_L1, which describes the L1 loss between the value obtained by applying the Tanh function to the HDR predicted image and the value obtained by applying the Tanh function to the HDR image sample; that is, the Tanh function is applied on top of the L1 loss. Here, the HDR predicted image is the image obtained after the image enhancement initial model processes the LDR image sample.
The expression of Tanh_L1 is as follows:

Tanh_L1 = ||Tanh(Y) - Tanh(H)||_1
where Y denotes the HDR predicted image obtained by processing the LDR image sample with the image enhancement initial model, and H denotes the HDR image sample corresponding to that LDR image sample.
The Tanh function compresses pixel values non-linearly; applied within the L1 loss, it therefore balances the highlight-region pixels and non-highlight-region pixels of the LDR image sample, reducing the adverse effect on training that an excessive difference between the two kinds of pixels would otherwise cause.
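The compression effect of Tanh inside the L1 loss can be demonstrated in a few lines of NumPy. This sketch assumes the loss averages the per-pixel L1 distance; the function name and toy pixel values are illustrative only.

```python
import numpy as np

def tanh_l1(y_pred: np.ndarray, h_true: np.ndarray) -> float:
    """Tanh_L1 loss: mean L1 distance between the tanh-compressed
    prediction Y and the tanh-compressed ground truth H."""
    return float(np.mean(np.abs(np.tanh(y_pred) - np.tanh(h_true))))

# The same raw error of 0.5 contributes far less in a highlight
# (large-valued) region than in a normally exposed region, because
# tanh saturates for large pixel values.
y_small, h_small = np.array([0.1]), np.array([0.6])  # normally exposed
y_large, h_large = np.array([5.0]), np.array([5.5])  # overexposed highlight
assert tanh_l1(y_large, h_large) < tanh_l1(y_small, h_small)
```

This is exactly the balancing effect described above: plain L1 would penalize both errors equally, so training would be dominated by the numerically larger highlight pixels.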
Based on this training set and Tanh_L1, the image enhancement initial model can be trained iteratively using gradient descent; when the model converges (i.e., the value of Tanh_L1 no longer decreases), the trained image enhancement model is obtained.
As shown in FIG. 3, when the image enhancement model is used for HDR reconstruction, the image processing device, after acquiring the LDR image to be processed, inputs the LDR image into the image enhancement model, where it is fed to the weight network, the main network, and the conditional network respectively. The weight network extracts the original features from the LDR image; the conditional network extracts feature tensors of different scales from the LDR image; the feature tensors of different scales are input into the main network, which, when processing the spatial features of the LDR image at different scales, uses the feature tensors of the corresponding scales to process those spatial features specifically, yielding the output features of the main network. The output features are then fused with the original features to obtain the HDR image.
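The three-branch data flow just described can be traced end to end with stand-in functions. In this NumPy sketch each "network" is reduced to a trivial array transform purely to make the wiring explicit; all function names and the toy modulation are hypothetical, not the application's actual layers.

```python
import numpy as np

def weight_net(ldr):
    """Stand-in weight network: extracts 'original features'."""
    return 0.1 * ldr

def condition_net(ldr, num_scales=3):
    """Stand-in conditional network: one condition tensor per scale."""
    return [ldr[::2**s, ::2**s].mean() * np.ones(1) for s in range(num_scales)]

def main_net(ldr, conditions):
    """Stand-in U-net body: modulated by each scale's condition tensor."""
    out = ldr.copy()
    for c in conditions:          # toy scale-specific modulation
        out = out * (1.0 + 0.01 * c[0])
    return out

def hdr_reconstruct(ldr):
    original = weight_net(ldr)    # branch 1: weight network
    conds = condition_net(ldr)    # branch 2: conditional network
    main_out = main_net(ldr, conds)  # branch 3: main network, conditioned
    return main_out + original    # feature fusion -> HDR image

ldr = np.random.default_rng(0).random((8, 8))
hdr = hdr_reconstruct(ldr)
```

Note that a single LDR input drives all three branches, which is why inference needs only one LDR image rather than a bracketed exposure stack.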
Exemplarily, the reconstruction effect of HDR reconstruction can be seen in FIG. 4.
In summary, when the image enhancement model provided by this application is used for the HDR reconstruction task, there is, first, no need to capture LDR images of the same scene at different exposures: once training is complete, a single LDR image suffices to recover the corresponding HDR image. Second, compared with previous HDR reconstruction methods based on a single LDR image, this application adopts end-to-end training, so the model is simple, training is efficient, and the results are good.
Finally, because the image enhancement model provided by this application introduces the conditional network into the main network, the main network can process spatial features of different scales specifically; and because the Tanh_L1 loss function is used during training, the model can restore the brightness and texture details of highlight regions while simultaneously denoising non-highlight regions and removing their quantization loss. In other words, in the HDR reconstruction task, the image enhancement model provided by this application can jointly accomplish HDR reconstruction, denoising, and quantization-loss removal.
Therefore, the image enhancement model can be added directly to a camera's post-processing pipeline to improve shooting quality from the software side. Of course, the image enhancement model can also serve as a means of image/video post-enhancement, improving the quality of existing LDR data.
Based on the same inventive concept, as an implementation of the above method, an embodiment of the present application provides an image enhancement apparatus. This apparatus embodiment corresponds to the foregoing method embodiment; for ease of reading, the details of the foregoing method embodiment are not repeated one by one here, but it should be clear that the apparatus in this embodiment can correspondingly implement all the content of the foregoing method embodiment.
FIG. 5 is a schematic structural diagram of the image enhancement apparatus provided by an embodiment of the present application. As shown in FIG. 5, the image enhancement apparatus provided by this embodiment includes an acquisition unit 501 and a processing unit 502.
The acquisition unit 501 is configured to acquire an image to be processed.
The processing unit 502 is configured to input the image to be processed into a trained image enhancement model for processing and to output an enhanced image. The image enhancement model includes a main network and a conditional network, the main network having a U-net structure. When the image to be processed is processed, multiple feature tensors of different scales are extracted from it through the conditional network, and the image to be processed and the feature tensors of different scales are respectively input into the network layers of the corresponding scales in the main network for processing, obtaining the enhanced image.
Optionally, the main network includes M downsampling layers and M upsampling layers, and the conditional network includes a shared convolutional layer and M+1 feature extraction modules, the M+1 feature extraction modules respectively including different numbers of downsampling operations. Extracting the multiple feature tensors of different scales from the image to be processed through the conditional network includes:
extracting intermediate features from the image to be processed through the shared convolutional layer; and inputting the intermediate features respectively into the M+1 feature extraction modules for processing, obtaining M+1 feature tensors of different scales.
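The shared-layer-plus-branches scheme above can be illustrated numerically: one shared transform, then M+1 branches that differ only in how many downsampling operations they apply. In this NumPy sketch the shared convolution is replaced by a simple mean-subtraction and each downsampling operation by stride-2 average pooling; both substitutions, and the function names, are assumptions for illustration.

```python
import numpy as np

def avg_pool2(x):
    """Stride-2 average pooling: one stand-in 'downsampling operation'."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])

def extract_condition_tensors(image, M=2):
    """Shared layer, then M+1 branches with 0..M downsampling operations,
    yielding M+1 condition tensors at successively coarser scales."""
    shared = image - image.mean()   # stand-in for the shared conv layer
    tensors, feat = [], shared
    for _ in range(M + 1):
        tensors.append(feat)
        feat = avg_pool2(feat)      # next branch: one more downsample
    return tensors

conds = extract_condition_tensors(np.ones((16, 16)), M=2)
# scales: (16,16), (8,8), (4,4) -- one per U-net resolution level
```

The M+1 scales line up with the M downsampling layers of the U-net body plus its full-resolution level, so each network layer receives a condition tensor of matching spatial size.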
Optionally, the main network further includes a first SFT layer and multiple residual modules, each residual module including alternately arranged second SFT layers and convolutional layers. The first SFT layer is connected to the input side of the M downsampling layers and the output side of the M upsampling layers, the residual modules are interspersed between the M downsampling layers and the M upsampling layers, and the M+1 feature tensors of different scales are respectively input into the SFT layers of the corresponding scales among the first SFT layer and the second SFT layers.
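An SFT (spatial feature transform) layer is commonly realized as a spatially varying affine modulation of the feature maps, with the scale and shift maps derived from the condition tensor. The sketch below shows that core operation only; how the scale/shift maps are produced from the condition tensor is an assumption not spelled out here.

```python
import numpy as np

def sft(features, gamma, beta):
    """Spatial Feature Transform: per-pixel affine modulation of the
    feature maps, with (gamma, beta) derived from the condition tensor."""
    return gamma * features + beta

feat = np.ones((4, 4))
gamma = np.full((4, 4), 2.0)   # scale map (from the condition tensor)
beta = np.full((4, 4), -0.5)   # shift map (from the condition tensor)
out = sft(feat, gamma, beta)   # each element: 2.0 * 1.0 - 0.5 = 1.5
```

Because gamma and beta vary per pixel and per scale, the same U-net weights can treat, e.g., overexposed and normally exposed regions differently, which is the "scale-specific processing" the description refers to.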
Optionally, the image enhancement model further includes a weight network, the weight network including a skip connection and multiple convolutional layers. The enhanced image is obtained by feature fusion of the output of the main network with original features, the original features being extracted from the image to be processed through the weight network.
Optionally, the image to be processed is an LDR image, and the enhanced image is an HDR image.
Optionally, the image enhancement model is obtained by training a preset image enhancement initial model with a preset loss function and a training set, wherein the training set includes multiple LDR image samples and the HDR image sample corresponding to each LDR image sample; the preset loss function describes the L1 loss between the value obtained by applying the Tanh function to an HDR predicted image and the value obtained by applying the Tanh function to the HDR image sample; and the HDR predicted image is the image obtained after the image enhancement initial model processes the LDR image sample.
The image enhancement apparatus provided by this embodiment can execute the above method embodiment; its implementation principle and technical effect are similar and are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is used only as an example. In practical applications, the above functions may be allocated to different functional units and modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Based on the same inventive concept, an embodiment of the present application further provides a terminal device. As shown in FIG. 6, the terminal device 6 of this embodiment includes a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60. When the processor 60 executes the computer program 62, the steps in each of the above image enhancement method embodiments are implemented, for example, steps S101 to S104 shown in FIG. 1. Alternatively, when the processor 60 executes the computer program 62, the functions of the modules/units in each of the above apparatus embodiments are implemented, for example, the functions of modules 401 to 403 shown in FIG. 4.
Exemplarily, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, the instruction segments being used to describe the execution process of the computer program 62 in the terminal device 6.
Those skilled in the art can understand that FIG. 6 is merely an example of the terminal device 6 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, or combine certain components, or use different components. For example, the terminal device 6 may further include an input/output device, a network access device, a bus, and the like.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device 6. Further, the memory 61 may include both an internal storage unit of the terminal device 6 and an external storage device. The memory 61 is used to store the computer program and other programs and data required by the terminal device 6, and may also be used to temporarily store data that has been output or is to be output.
The terminal device provided by this embodiment can execute the above method embodiment; its implementation principle and technical effect are similar and are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method described in the above method embodiment is implemented.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to implement the method described in the above method embodiment.
If the above integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application may be accomplished by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include at least: any entity or apparatus capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media may not be electric carrier signals or telecommunication signals.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented with electronic hardware, or with a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of the present application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the apparatus/device embodiments described above are merely illustrative; for instance, the division into the modules or units is only a division by logical function, and there may be other divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
It should be understood that, when used in the specification and appended claims of this application, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in the specification and appended claims of this application, the term "if" may be construed, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be construed, depending on the context, to mean "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
In addition, in the description of the specification and appended claims of this application, the terms "first", "second", "third", etc. are used only to distinguish the descriptions and shall not be understood as indicating or implying relative importance.
References in the specification of this application to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of this application. Thus, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", etc. appearing in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having" and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements for some or all of the technical features therein; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

  1. An image enhancement method, characterized in that the method comprises:
    acquiring an image to be processed; and
    inputting the image to be processed into a trained image enhancement model for processing, and outputting an enhanced image, wherein the image enhancement model comprises a main network and a conditional network, the main network has a U-net structure, and when the image to be processed is processed, multiple feature tensors of different scales are extracted from the image to be processed through the conditional network, and the image to be processed and the multiple feature tensors of different scales are respectively input into network layers of corresponding scales in the main network for processing to obtain the enhanced image.
  2. The method according to claim 1, characterized in that the main network comprises M downsampling layers and M upsampling layers, the conditional network comprises a shared convolutional layer and M+1 feature extraction modules, and the M+1 feature extraction modules respectively comprise different numbers of downsampling operations;
    wherein extracting the multiple feature tensors of different scales from the image to be processed through the conditional network comprises:
    extracting intermediate features from the image to be processed through the shared convolutional layer; and
    inputting the intermediate features respectively into the M+1 feature extraction modules for processing to obtain M+1 feature tensors of different scales.
  3. The method according to claim 2, characterized in that the main network further comprises a first SFT layer and multiple residual modules, each residual module comprising alternately arranged second SFT layers and convolutional layers; the first SFT layer is connected to the input side of the M downsampling layers and the output side of the M upsampling layers, the multiple residual modules are interspersed between the M downsampling layers and the M upsampling layers, and the M+1 feature tensors of different scales are respectively input into SFT layers of corresponding scales among the first SFT layer and the second SFT layers.
  4. The method according to claim 1, characterized in that the image enhancement model further comprises a weight network, the weight network comprising a skip connection and multiple convolutional layers; the enhanced image is obtained by feature fusion of the output of the main network with original features, the original features being extracted from the image to be processed through the weight network.
  5. The method according to any one of claims 1-4, characterized in that the image to be processed is an LDR image and the enhanced image is an HDR image.
  6. The method according to claim 5, characterized in that the image enhancement model is obtained by training a preset image enhancement initial model with a preset loss function and a training set;
    wherein the training set comprises multiple LDR image samples and an HDR image sample corresponding to each LDR image sample, the preset loss function describes the L1 loss between the value obtained by applying the Tanh function to an HDR predicted image and the value obtained by applying the Tanh function to the HDR image sample, and the HDR predicted image is the image obtained after the image enhancement initial model processes the LDR image sample.
  7. An image enhancement apparatus, comprising:
    an acquisition unit, configured to acquire an image to be processed; and
    a processing unit, configured to input the image to be processed into a trained image enhancement model for processing and to output an enhanced image, wherein the image enhancement model comprises a main network and a condition network, the main network having a U-net structure; when the image to be processed is processed, a plurality of feature tensors of different scales are extracted from the image to be processed by the condition network, and the image to be processed and the plurality of feature tensors of different scales are respectively input to network layers of corresponding scales in the main network for processing, so as to obtain the enhanced image.
  8. The image enhancement apparatus according to claim 7, wherein the image enhancement model further comprises a weight network, the weight network comprising a skip connection and a plurality of convolutional layers; the enhanced image is obtained by feature fusion of the output of the main network with original features, the original features being extracted from the image to be processed by the weight network.
  9. A terminal device, comprising a memory and a processor, the memory being configured to store a computer program, and the processor being configured to execute the method according to any one of claims 1-6 when invoking the computer program.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.
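The SFT-modulated residual module recited in claim 3 can be sketched as follows. This is a minimal NumPy illustration rather than the claimed implementation: the affine form of the spatial feature transform, the 1x1 stand-in convolution, and all shapes are assumptions, since the claims do not fix kernel sizes, channel counts, or activations.

```python
import numpy as np

def sft(features, scale, shift):
    # Spatial feature transform: per-pixel affine modulation of a feature
    # map by scale/shift maps predicted from the condition network
    # (this exact affine form is an assumption, not fixed by the claims).
    return features * scale + shift

def residual_module(features, scale, shift, conv_weights):
    # One residual module as in claim 3: an SFT layer alternating with a
    # convolution (a 1x1 convolution stands in here), plus a residual add.
    out = sft(features, scale, shift)                  # second SFT layer
    out = np.einsum('chw,oc->ohw', out, conv_weights)  # 1x1 convolution
    return features + out                              # residual connection
```

With identity convolution weights, the module simply adds the condition-modulated features back onto the input, which is the basic behaviour a residual SFT block should exhibit.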
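Claims 4 and 8 state that the enhanced image results from feature fusion of the main network's output with original features extracted by the weight network, without fixing the fusion operator. A common choice, assumed here purely for illustration, is an element-wise convex combination driven by per-pixel weights:

```python
import numpy as np

def fuse(main_out, original_features, weights):
    # Element-wise weighted fusion of the main-network output with the
    # original features. The convex-combination form is an assumption;
    # the claims only state that the two are feature-fused.
    return weights * main_out + (1.0 - weights) * original_features
```

Under this assumed form, weights near 1 keep the network's enhancement while weights near 0 pass the original features through, which is consistent with the skip connection the weight network is said to contain.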
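The loss of claim 6 compares Tanh-compressed values of the predicted and ground-truth HDR images under an L1 distance. A direct NumPy sketch (the mean reduction is an assumption; the claim does not specify how the L1 loss is reduced):

```python
import numpy as np

def tanh_l1_loss(hdr_pred, hdr_sample):
    # L1 loss in Tanh-compressed space: tanh squashes the wide dynamic
    # range of HDR values into (-1, 1) before the absolute difference,
    # so very bright regions do not dominate the loss.
    return np.mean(np.abs(np.tanh(hdr_pred) - np.tanh(hdr_sample)))
```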
PCT/CN2021/137821 2021-05-27 2021-12-14 Image enhancement method and apparatus, terminal device, and storage medium WO2022247232A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110584556.XA CN113298740A (en) 2021-05-27 2021-05-27 Image enhancement method and device, terminal equipment and storage medium
CN202110584556.X 2021-05-27

Publications (1)

Publication Number Publication Date
WO2022247232A1

Family

ID=77325578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137821 WO2022247232A1 (en) 2021-05-27 2021-12-14 Image enhancement method and apparatus, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN113298740A (en)
WO (1) WO2022247232A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298740A (en) * 2021-05-27 2021-08-24 中国科学院深圳先进技术研究院 Image enhancement method and device, terminal equipment and storage medium
CN117157665A (en) * 2022-03-25 2023-12-01 京东方科技集团股份有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190172230A1 (en) * 2017-12-06 2019-06-06 Siemens Healthcare Gmbh Magnetic resonance image reconstruction with deep reinforcement learning
CN111353939A (en) * 2020-03-02 2020-06-30 中国科学院深圳先进技术研究院 Image super-resolution method based on multi-scale feature representation and weight sharing convolution layer
CN112270644A (en) * 2020-10-20 2021-01-26 西安工程大学 Face super-resolution method based on spatial feature transformation and cross-scale feature integration
CN112419152A (en) * 2020-11-23 2021-02-26 中国科学院深圳先进技术研究院 Image super-resolution method and device, terminal equipment and storage medium
CN113298740A (en) * 2021-05-27 2021-08-24 中国科学院深圳先进技术研究院 Image enhancement method and device, terminal equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830816B (en) * 2018-06-27 2020-12-04 厦门美图之家科技有限公司 Image enhancement method and device
RU2709661C1 (en) * 2018-09-19 2019-12-19 Общество с ограниченной ответственностью "Аби Продакшн" Training neural networks for image processing using synthetic photorealistic containing image signs


Also Published As

Publication number Publication date
CN113298740A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
Zamir et al. Learning enriched features for fast image restoration and enhancement
CN111402130B (en) Data processing method and data processing device
Shi et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network
EP3948764B1 (en) Method and apparatus for training neural network model for enhancing image detail
WO2021164234A1 (en) Image processing method and image processing device
CN112308200B (en) Searching method and device for neural network
CN112233038A (en) True image denoising method based on multi-scale fusion and edge enhancement
CN112767290B (en) Image fusion method, image fusion device, storage medium and terminal device
WO2022247232A1 (en) Image enhancement method and apparatus, terminal device, and storage medium
WO2022242122A1 (en) Video optimization method and apparatus, terminal device, and storage medium
CN111932480A (en) Deblurred video recovery method and device, terminal equipment and storage medium
Guan et al. Srdgan: learning the noise prior for super resolution with dual generative adversarial networks
Xu et al. Exploiting raw images for real-scene super-resolution
CN116547694A (en) Method and system for deblurring blurred images
Zhang et al. Deep motion blur removal using noisy/blurry image pairs
CN110717864B (en) Image enhancement method, device, terminal equipment and computer readable medium
CN113628134B (en) Image noise reduction method and device, electronic equipment and storage medium
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
CN111383188A (en) Image processing method, system and terminal equipment
CN113658050A (en) Image denoising method, denoising device, mobile terminal and storage medium
CN111953888B (en) Dim light imaging method and device, computer readable storage medium and terminal equipment
US20230060988A1 (en) Image processing device and method
Zhang et al. A new image filtering method: Nonlocal image guided averaging
WO2023273515A1 (en) Target detection method, apparatus, electronic device and storage medium
CN115937121A (en) Non-reference image quality evaluation method and system based on multi-dimensional feature fusion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21942795; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21942795; Country of ref document: EP; Kind code of ref document: A1