CN116051428A - Deep learning-based combined denoising and superdivision low-illumination image enhancement method - Google Patents


Info

Publication number
CN116051428A
Authority
CN
China
Prior art keywords: image, network, low, enhancement, denoising
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310332399.2A
Other languages
Chinese (zh)
Other versions
CN116051428B (en)
Inventor
Peng Chenglei
Hong Yuchen
Su Hongli
Liu Zhihao
Pan Hongbing
Wang Yuxuan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202310332399.2A
Publication of CN116051428A
Application granted
Publication of CN116051428B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a deep-learning-based low-light image enhancement method with combined denoising and super-resolution, belonging to the field of computer vision. A trained enhancement network, a denoising network, a super-division network and a global linear brightening module are organized in a fixed order into an overall network covering the whole processing flow; a low-light image to be processed is input into this network and sequentially undergoes low-light image enhancement, global linear brightening, denoising and super-resolution reconstruction, yielding a brightness-enhanced, high-definition color image. While enhancing image brightness, the invention preserves color fidelity and detail restoration and improves the signal-to-noise ratio and sharpness. In addition, the method is flexible: whether an input image sample is processed by the global linear brightening module, the denoising network and the super-division network is decided case by case, which ensures high subjective and objective quality evaluation indexes for the output image.

Description

Deep learning-based combined denoising and superdivision low-illumination image enhancement method
Technical Field
The invention relates to a deep learning-based combined denoising and superdivision low-light image enhancement method, and belongs to the field of computer vision.
Background
Low-light image enhancement is a low-level task in the field of computer vision. It uses computer technology to deal with the low brightness, low contrast, noise, artifacts and similar problems present in images captured under insufficient illumination, improving their visual quality while preserving the texture, structure and color information of objects in the original image, so as to meet research and practical needs in various fields. When the light in the photographed scene is insufficient, the overall brightness of the captured image is too low and the human eye can hardly obtain useful information from it; moreover, low-light imaging is prone to noise and information loss, so imaging quality is low. Imaging quality can be improved by artificial supplementary lighting or prolonged exposure time, but these two methods are not applicable to all scenarios; improving the hardware of the photographing equipment also helps to some extent, but the cost is high. Therefore, in practical applications, enhancing low-light image quality with an enhancement algorithm is of great importance.
Existing low-light image enhancement methods mainly comprise methods based on Retinex theory, on histogram equalization, on defogging models, and on deep learning. Retinex-based methods consider an image to consist of illumination and object reflectance, where the reflectance does not change with illumination; by estimating the illumination of the low-light image and removing it, the image degradation caused by different lighting conditions can be eliminated. Histogram-equalization-based methods roughly assume that the histogram of a normally lit image is uniformly distributed and redistribute the value of each pixel using the statistical characteristics of the image to obtain an enhancement result. Defogging-model-based methods convert the low-light enhancement task into a defogging task: the low-light image is inverted, processed with a defogging method, and inverted again to obtain the enhancement result. Most deep-learning-based methods adopt convolutional neural networks with structures similar to U-Net and train in a supervised manner with paired low-light and normal-light images, so that the network learns the mapping from low-light images to normal-light images.
Existing deep-learning-based low-light image enhancement methods have generally achieved better enhancement effects than traditional methods, but they usually focus on raising the overall brightness of the image, which easily causes overexposure of local areas, and many of them do not consider noise suppression, leaving obvious noise in the enhanced image. In addition, in extremely weak-light environments, image noise has a complex distribution, color degradation is severe, the enhancement effect is poor, brightness gains are small, and existing algorithms are prone to blurred details, dark-area artifacts, color deviation and high noise levels. Although some low-light enhancement methods take noise into account in the network structure design, and others directly use synthesized or real noisy pictures for supervised training so that the network implicitly acquires denoising ability, the generalization of such denoising to real test sets is limited. Methods combining denoising, deblurring and super-resolution reconstruction have also been reported, but directly applying them to a low-light image gives poor results: the output remains dark, so they are not suitable for processing low-light images. Therefore, a method is needed that accounts for low-light image enhancement, denoising and super-resolution reconstruction together, so that the image quality of a low-light image can be enhanced comprehensively.
Disclosure of Invention
Based on the above background, it is necessary to provide a deep-learning-based image enhancement method that completes low-light image enhancement, denoising and super-resolution reconstruction in one pipeline, which can reduce detail blurring, local overexposure, dark-area artifacts, color shift and similar phenomena while raising the global brightness of the image, suppress noise, and improve image sharpness.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a deep learning-based combined denoising and superdivision low-light image enhancement method is composed of three sub-networks (enhancement network LLENT, denoising network DeNet and superdivision network SRNet) and a global linear brightness enhancement module GLBM, and comprises the following steps:
s1: inputting the low-light image sample into an enhancement network LLENet (Low Light Enhancement Network), wherein the network is a convolutional neural network with a Laplacian pyramid structure in whole and inside, and the low-light image enhancement work is completed by the convolutional neural network;
s2: judging whether the average brightness of the enhanced image is higher than a certain threshold L1, if yes, neglecting the step, otherwise, judging whether the maximum brightness value of the enhanced image is smaller than a certain threshold, if yes, carrying out a global linear brightening operation on the image by using a global linear brightening module GLBM (Global Linear Brightening Module), otherwise, skipping the step;
S3: judging whether the noise level estimated value of the image is higher than a certain threshold sigma and the average brightness is higher than a certain threshold L2, if so, inputting the image into a denoising network DeNet (Denoising Network), wherein the network adopts a convolutional neural network with a structure similar to U-Net, and the denoising work is completed by the convolutional neural network; otherwise, skipping the step;
s4: and judging whether the resolution of the image is smaller than a certain threshold value R, if so, inputting the image into a super-resolution network SRNet (Super Resolution Network), wherein the network adopts a Swin Transformer, and the super-resolution reconstruction work is completed by the Swin Transformer. And finally obtaining the high-definition color image subjected to low-illumination image enhancement, global linear brightness, denoising and super-resolution reconstruction.
And sequentially carrying out the 4 steps to finally obtain the high-definition color image subjected to low-light image enhancement, global linear brightness, denoising and super-resolution reconstruction. Note that the order of processing of the above steps is determined by tuning during the experiment, and the detailed description is discussed in the detailed description. Preferably, the threshold parameters (L1, L2, σ, R) may also be determined by tuning during the experiment.
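As a minimal Python sketch of how the four stages could be orchestrated (the callables llenet, glbm, denet, srnet and estimate_sigma, the numpy HWC image layout, and the concrete default thresholds are assumptions for illustration, not the patent's reference implementation; the detailed description below uses L1 = 0.2, and Comparative Experiment 3 reports L2 = 0.4 and σ = 0.04 as the best pair):

    def enhance_pipeline(img, llenet, glbm, denet, srnet, estimate_sigma,
                         l1=0.2, l2=0.4, sigma_th=0.04, r=1200):
        """S1-S4 flow: enhance, then conditionally brighten, denoise, super-resolve.
        img is assumed to be a float HWC array in [0, 1]; r (pixels) is hypothetical."""
        x = llenet(img)                                   # S1: low-light enhancement
        if float(x.mean()) < l1:                          # S2: only dark outputs are
            x = glbm(x)                                   #     linearly brightened
        if estimate_sigma(x) > sigma_th and float(x.mean()) > l2:
            x = denet(x)                                  # S3: denoise bright, noisy images
        if max(x.shape[:2]) < r:                          # S4: super-resolve small images
            x = srnet(x)
        return x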
In one embodiment of the method, the enhancement network LLENet is a convolutional neural network with a Laplacian pyramid structure both overall and internally; the denoising network DeNet adopts a convolutional neural network with a structure similar to U-Net; and the super-division network SRNet adopts a Swin Transformer.
In one embodiment, in step S1, the enhancement network LLENet is a convolutional neural network with a Laplacian pyramid structure both overall and internally. It has three branches in total, corresponding to the three levels of the overall Laplacian pyramid structure of LLENet. Two branches (denoted branch 1 and branch 2) are composed of convolution layers, transposed convolution layers, skip-connection structures, Laplace modules with a multi-scale residual structure, and UNet++ modules, with fully consistent parameter settings; the strides of the convolutions and transposed convolutions are all set to 2. In the remaining branch (denoted branch 3), the transposed convolution layers are replaced by ordinary convolution layers and the convolution stride is set to 1. The input image is first decomposed by a Laplacian pyramid into three images of different sizes, which are then fed into the three branches respectively; specifically, the three image sizes are successively halved, with the largest one (kept the same size as the original input image) fed to branch 1, the medium one to branch 2, and the smallest to branch 3.
In one embodiment, the internal Laplacian pyramid structure of the enhancement network LLENet comes from the Laplace module with a multi-scale residual structure. This module also contains N branches internally; the input feature map is decomposed by a Laplacian pyramid and fed to the N branches respectively. The constituent units and parameter settings of the N branches are fully consistent; from input to output, the constituent units are, in order: convolution layer, instance normalization, linear rectification function, convolution layer, instance normalization, and a skip connection. Specifically, the number of internal branches of the Laplace modules in branches 1 and 2 of LLENet is 2, and in branch 3 it is 3.
In one embodiment, the UNet++ module adopts a UNet++ network structure whose operations comprise only convolution, upsampling, downsampling and skip connections. This module sits in the middle of an LLENet branch, connected in front to the encoder part of the branch and behind to the decoder; overall it presents a U-shaped feature pyramid structure with dense connections between its convolution layers and serves as a powerful feature extractor. The level of a UNet++ module indicates the maximum number of downsampling operations performed inside the module; the levels differ between branches of LLENet: the UNet++ level in branches 1 and 2 is 2, and in branch 3 it is 3.
In one embodiment, the loss functions of the enhancement network LLENet include data loss, Laplace loss, color loss, structure loss, region loss and perceptual loss, and the training set adopts the published real dataset SICE (Part I) and the simulated dataset MIT-Adobe FiveK.
In one embodiment, in step S2, the average brightness of the enhanced image is first calculated as
$\bar{L} = \frac{1}{HWC}\sum_{h,w,c} I_{\text{enh}}(h,w,c)$
where $I_{\text{enh}}$ denotes the image after low-light enhancement and $H$, $W$, $C$ denote its height, width and number of channels. If $\bar{L}$ is less than the set threshold $L_1$, it is further judged whether $\frac{L_t}{\bar{L}}\max(I_{\text{enh}})$ is less than 255, where $L_t$ is the average brightness value the image is expected to reach. If so, the global linear brightening module GLBM performs the global linear brightening operation
$I_{\text{GLBM}} = \frac{L_t}{\bar{L}} \cdot I_{\text{enh}}$
This step further raises the brightness of the image on top of the enhancement network LLENet, yielding a brighter image $I_{\text{GLBM}}$.
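A small sketch of the GLBM operation under this formula; the target mean L_t (target_luma) is a free parameter here, and the [0, 1] value range stands in for the 8-bit check against 255:

    import numpy as np

    def glbm(img, target_luma=0.5):
        """Global linear brightening: scale img so its mean reaches target_luma (L_t).
        img: float array in [0, 1]. The gain is applied only if no pixel would clip,
        mirroring the 'less than 255' test for 8-bit images."""
        gain = target_luma / float(img.mean())
        if gain * float(img.max()) > 1.0:   # scaled maximum would clip: skip the step
            return img
        return (gain * img).astype(img.dtype)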
In one embodiment, in step S3, the noise level estimation uses the algorithm proposed by David L. Donoho et al. in 1994, a wavelet-based estimate of the standard deviation of Gaussian noise. The image is denoised only when its noise level estimate is greater than the threshold $\sigma$ and its average brightness value is greater than the threshold $L_2$. The reason is that denoising usually blurs details and smooths the image as a whole: if the noise level of the image is already low, denoising causes a loss of detail information, and if the average brightness of the image is too small, denoising usually causes serious detail loss. The contradiction between a low noise level or high signal-to-noise ratio on the one hand and detail restoration on the other must be weighed, so whether to denoise the image is chosen according to the situation.
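The Donoho-Johnstone median estimator can be sketched as below; the db1 wavelet is an assumed choice (skimage.restoration.estimate_sigma implements the same idea):

    import numpy as np
    import pywt

    def estimate_sigma(img):
        """Robust noise estimate sigma = median(|HH|) / 0.6745, where HH are the
        finest-scale diagonal wavelet detail coefficients (Donoho et al., 1994).
        img: 2-D grayscale array."""
        _, (_, _, hh) = pywt.dwt2(img, 'db1')
        return float(np.median(np.abs(hh)) / 0.6745)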
In one embodiment, in step S3, the denoising network DeNet is a convolutional neural network with an overall 4-stage U-Net structure; each stage of the U-Net consists of a denoising basic unit repeated N times. The denoising basic unit is composed of layer normalization, 1×1 convolution layers, 3×3 depth-separable convolution, a simple gate unit, a simplified channel attention mechanism and skip connections. The simple gate unit is equivalent to a simplified version of the GELU activation function: it splits the input feature map into two parts along the channel dimension and multiplies them element-wise to obtain the output. The simplified channel attention mechanism removes one of the original 1×1 convolution layers and the two activation function layers: it performs global average pooling on the input feature map, applies a single 1×1 convolution, and then multiplies the result channel-wise with the original input feature map to obtain the output. The training set uses SIDD, and the loss function is the peak signal-to-noise ratio loss.
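A minimal PyTorch sketch of these two units, assuming the simplified channel attention starts from a squeeze-and-excitation-style block; channel counts are placeholders:

    import torch.nn as nn

    class SimpleGate(nn.Module):
        """Split the feature map into two halves along the channel axis and
        multiply them element-wise (a simplified stand-in for GELU)."""
        def forward(self, x):
            a, b = x.chunk(2, dim=1)
            return a * b

    class SimplifiedChannelAttention(nn.Module):
        """Global average pooling followed by a single 1x1 convolution; the result
        rescales the input channel-wise, with no extra activation layers."""
        def __init__(self, channels):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, x):
            return x * self.conv(self.pool(x))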
In one embodiment, in step S4, when the resolution of the image is small, the Swin-Transformer-based super-resolution reconstruction network SRNet is used to raise it. Specifically, let S (in pixels) denote the larger of the image width and height; if $S \le R_1$, 4× upsampling super-resolution reconstruction is performed on the image; if $R_1 < S \le R_2$, 3× upsampling super-resolution reconstruction is performed; if $R_2 < S \le R_3$, 2× upsampling super-resolution reconstruction is performed; and if $S > R_3$, no operation is performed (the breakpoints $R_1 < R_2 < R_3$ are threshold parameters, with $R_3$ corresponding to the resolution threshold R above). The super-division network SRNet adopts a Swin Transformer, and its structure consists of three parts: shallow feature extraction, deep feature extraction and high-resolution image reconstruction. The shallow feature extraction part consists of two convolution layers with ReLU activation functions and a convolution module similar to an Inception module. The deep feature extraction part consists of K deep feature extraction modules, a convolution layer and a skip connection; each deep feature extraction module consists of N Swin Transformer modules and a convolution layer, and each Swin Transformer module is composed, in series, of layer normalization, a window multi-head attention mechanism and a skip connection, layer normalization, a fully connected layer and a skip connection, layer normalization, a shifted-window multi-head attention mechanism and a skip connection, and layer normalization, a fully connected layer and a skip connection. The high-resolution image reconstruction part is a sub-pixel convolution layer that performs the pixel-shuffle reconstruction operation. The loss function of the super-division network SRNet adopts a combination of L1 pixel loss and perceptual loss, and the training set includes DIV2K, Flickr2K, OST, WED, FFHQ, Manga109 and SCUT-CTW1500.
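The scale-selection rule can be written as below; the breakpoint defaults r1 < r2 < r3 are placeholders for the experimentally tuned thresholds (r3 playing the role of R):

    def choose_upsampling_factor(height, width, r1=200, r2=300, r3=400):
        """Pick the SR factor from S = max(height, width); the default breakpoint
        values are hypothetical, to be replaced by tuned thresholds."""
        s = max(height, width)
        if s <= r1:
            return 4
        if s <= r2:
            return 3
        if s <= r3:
            return 2
        return 1  # no super-resolution reconstruction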
In one embodiment, the enhancement network LLENet, the denoising network DeNet and the super-division network SRNet are respectively trained, and after the training is completed, the three sub-networks and the global linear brightness module GLBM are organized into an overall network for end-to-end low-light image enhancement.
The invention also provides a deep-learning-based low-light image enhancement system with combined denoising and super-resolution, used to execute the low-light image enhancement method of the invention, comprising:
an enhancement network LLENet for enhancing the brightness of the low-light image;
a global linear brightening module GLBM for further enhancing the global brightness of the image on the basis of the enhancement network;
the denoising network DeNet is used for reducing image noise and improving signal to noise ratio;
and the super-division network SRNet is used for improving the image definition.
The low-light image enhancement system further comprises: the average brightness analysis module is used for analyzing the brightness of the image; the noise level analysis module is used for analyzing the noise level of the image; and the resolution analysis module is used for analyzing the resolution of the image.
The invention also provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the low-light image enhancement method of the invention.
The invention also provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the low-light image enhancement method of the invention.
The invention has the following beneficial effects:
(1) The low-light image enhancement network LLENet is a convolutional neural network adopting a Laplacian pyramid structure in both the image domain and the feature domain, where the image domain refers to the set of images obtained by preprocessing an image before it is input to the network, and the feature domain refers to the set of feature maps inside the network. Before a low-light image sample is input to the network, Laplacian pyramid decomposition yields three images: original size (Laplacian residual), 1/2 original size (Laplacian residual) and 1/4 original size (1/4 downsampling result), which are input to the fine-grained branch 1, the medium-grained branch 2 and the coarse-grained branch 3, respectively; the three branches embody the overall pyramid structure of the network. In the feature domain, the Laplace module with a multi-scale residual structure completes the Laplacian pyramid decomposition and reconstruction internally. Decomposing the image into different sizes for processing produces a multi-scale feature representation that integrates the advantages of different scales: fine-grained (high-resolution) branches focus more on location and detail information, while coarse-grained (low-resolution) branches focus more on semantic information. Overall, the benefits of using the Laplacian pyramid structure both across the whole network and inside it are: the Laplacian residuals at the finer-grained pyramid levels guide the encoder-decoder architecture to restore local details accurately, while the coarsest-grained Laplacian pyramid level forces the corresponding network to adjust global illumination, giving the enhancement result a more natural color distribution and higher contrast. In addition, using skip connections both in the overall network and in the internal Laplacian pyramid structure helps strengthen the information flow in the training stage and avoids the vanishing-gradient problem, thereby achieving stable convergence and a smaller loss function value.
(2) The global linear brightening module GLBM can further raise the brightness of the image on the basis of the enhancement network LLENet, making the image easier to see clearly. The low-light image enhancement method (LLENet + GLBM) not only stands out on objective image evaluation indexes such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM), lightness order error (LOE) and learned perceptual image patch similarity (LPIPS), but also feels relatively good subjectively.
(3) The denoising basic unit in the denoising network DeNet adopts simple operations with relatively low parameter count and computation, such as 1×1 convolution, 3×3 depth-separable convolution, the simple gate unit and the simplified channel attention mechanism, and DeNet is a U-Net stacked from these simple basic units; it nevertheless achieves excellent results, showing that the U-Net structure and the denoising basic unit design adopted by DeNet are reasonable. The denoising network DeNet of the invention is superior to most methods in parameter count, computation and denoising performance on real images.
(4) Although the super-division network SRNet adopts a Swin Transformer, it still basically belongs to the category of convolutional neural networks. Its network structure consists of shallow feature extraction, deep feature extraction and high-resolution image reconstruction, where the deep feature extraction part consists of Swin Transformer modules with skip connections, whose self-attention mechanism is very good at capturing global context interactions. SRNet integrates the advantages of convolutional neural networks and the Swin Transformer: on the one hand, thanks to the local attention mechanism, it retains the advantage of convolutional neural networks in processing large images; on the other hand, since the Swin Transformer module adopts a shifted-window mechanism, it has the advantage of modeling long-range dependencies. In addition, SRNet achieves excellent performance with a small parameter count.
(5) The invention combines low-light image enhancement with denoising and super-resolution reconstruction. While enhancing brightness, it better resolves detail blurring, local overexposure, dark-area artifacts, color shift and similar problems in the low-light enhancement process; it improves color fidelity and detail restoration as much as possible under the premise of noise reduction, and can raise the image resolution, finally producing a high-definition color image with high contrast, high sharpness, high signal-to-noise ratio, vivid colors and rich details. This multi-angle enhancement of image quality helps improve the processing effect of high-level vision tasks.
Drawings
FIG. 1 is a schematic flow diagram of the combined denoising and super-resolution low-light image enhancement method of the present invention, and also a schematic diagram of organizing the sub-modules (enhancement network LLENet, global linear brightening module GLBM, denoising network DeNet and super-division network SRNet) into an overall network;
FIG. 2 is a schematic diagram of the architecture of the low-light image enhancement network LLENet;
FIG. 3 is a schematic diagram of the structure of the Laplace module with a multi-scale residual structure in the low-light image enhancement network LLENet;
FIG. 4 is a schematic diagram of the structure of the UNet++ module in the low-light image enhancement network LLENet, with the broken lines representing concatenation in the channel dimension;
Fig. 5 is a schematic structural diagram of a denoising network deet and its denoising basic unit;
fig. 6 is a schematic structural diagram of a super-resolution reconstruction network SRNet and its depth feature extraction module;
FIG. 7 is a low-light image selected from the LOL dataset in one embodiment of the invention;
FIG. 8 is the result of processing FIG. 7 with the low-light image enhancement network LLENet of the present invention;
FIG. 9 is the result of processing FIG. 8 with the global linear brightening module GLBM of the present invention;
FIG. 10 is the result of the processing of FIG. 9 with the denoising network DeNet of the present invention;
FIG. 11 is the result of processing FIG. 10 with the super-resolution reconstruction network SRNet of the present invention; the original resolution is 600×400, and after 4× super-resolution reconstruction the resolution becomes 2400×1600.
Detailed Description
The following describes the scheme of the invention in detail with reference to the accompanying drawings.
Example 1
As shown in fig. 1, a method for enhancing a low-light image by combining denoising and superdivision based on deep learning includes the following steps:
s1: the low-light image sample is input into the enhancement network LLENet, from which the low-light image enhancement work is done. The workflow and training of the enhanced network LLENet is described in detail below.
At the network input, the input image is decomposed by a Laplacian pyramid. Denote the original image by $I_0$, and apply 1/2 downsampling twice to obtain $I_1$ and $I_2$. The three-level Laplacian pyramid decomposition can be represented by the following formulas:
$r_0 = I_0 - U(D(I_0))$
$r_1 = I_1 - U(D(I_1))$
where the subscript $k$ denotes the level of the Laplacian pyramid, $U(\cdot)$ and $D(\cdot)$ denote 2× upsampling and 1/2 downsampling using a bilinear interpolation algorithm, $r_0$ and $r_1$ are Laplacian residual maps, and $I_2$ is directly taken as the 1/4 downsampling of the original image $I_0$. $r_0$, $r_1$ and $I_2$ are fed into branches 1, 2 and 3 of the network, respectively, with processing results
$\hat{r}_0 = B_1(r_0), \quad \hat{r}_1 = B_2(r_1), \quad \hat{I}_2 = B_3(I_2)$
where $B_k$ denotes branch $k$. The Laplacian pyramid reconstruction at the network output can be represented by:
$\hat{I}_1 = U(\hat{I}_2) + \hat{r}_1$
$\hat{I}_0 = U(\hat{I}_1) + \hat{r}_0$
where $\hat{I}_0$ is the final processing result of the enhancement network LLENet.
Branch 1 and branch 2 have identical composition and parameter settings; branch 2 is described in detail as an example.
$r_1$ is input to branch 2 and passes through a convolution layer with stride 2, which halves its size; it then enters a Laplace module with a multi-scale residual structure of level 2 (the multi-scale-residual Laplace module L2 in FIG. 3). Inside this module, Laplacian pyramid decomposition and reconstruction are performed again; level 2 indicates that the input feature map is downsampled only once. The Laplacian pyramid decomposition and reconstruction in the feature domain are fully consistent with the image domain; the decomposition operation can be represented by:
$F_1 = D(F_0)$
$e_0 = F_0 - U(F_1)$
where $F_0$ denotes the feature map input to the multi-scale-residual Laplace module, $F_1$ denotes the 1/2 downsampling result of $F_0$, and $U(\cdot)$ and $D(\cdot)$ are again 2× upsampling and 1/2 downsampling using bilinear interpolation. The branches inside the multi-scale-residual Laplace module have fully consistent composition and parameter settings. Let $\phi(\cdot)$ denote the processing done by each branch (in order: convolution layer, instance normalization, linear rectification function, convolution layer and instance normalization); together with a skip connection, the operation done by each branch can be expressed as:
$\hat{e}_0 = \phi(e_0) + e_0, \quad \hat{F}_1 = \phi(F_1) + F_1$
To this end, the Laplacian pyramid reconstruction operation can be expressed as:
$\hat{F}_0 = U(\hat{F}_1) + \hat{e}_0$
where $\hat{F}_0$ denotes the output feature map of the multi-scale-residual Laplace module.
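Each internal branch phi plus its skip connection could look like this in PyTorch (the 3x3 kernels and the uniform channel count are assumptions):

    import torch.nn as nn

    class ResidualBranch(nn.Module):
        """One branch of the multi-scale-residual Laplace module:
        conv -> InstanceNorm -> ReLU -> conv -> InstanceNorm, plus a skip,
        i.e. F_hat = phi(F) + F."""
        def __init__(self, channels):
            super().__init__()
            self.phi = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.InstanceNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.InstanceNorm2d(channels),
            )

        def forward(self, x):
            return self.phi(x) + x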
the output feature map representing the multi-scale-residual laplace module is then input to the unet++ module with the level 2 in the middle position through a convolution layer with the step length of 2 and a laplace module with the multi-scale residual structure with the level 2, as shown by unet++ -L2 in fig. 4, the structure adds more up-sampling nodes to the middle part of the U-shaped backbone on the basis of U-Net, more dense jump connection is introduced, and the level 2 represents that the feature map is subjected to 2 down-sampling operations at most. The reason for using the unet++ module here is: the network with the U-Net structure has excellent image feature learning and reconstruction capability, is widely applied to the image enhancement network, and compared with the U-Net network, the network with the UNet++ network structure has more compact connection among various layers, is more beneficial to the omnibearing learning of the feature information of the image and reduces the detail loss of the image. After the feature map is input to the unet++ -L2 module, the feature map passes through a first convolution layer (0, 0), transversely follows a next convolution layer (0, 1), and leads out a branch jump to be connected to an output node (0, 2); in the longitudinal direction, the data enter a convolution layer (1, 0) of the next level through 1/2 downsampling, then enter a convolution layer (2, 0) of the second level through 1/2 downsampling, the convolution results of (2, 0) are subjected to 2-time upsampling, and then are subjected to channel dimension splicing with the convolution results of (1, 0), and an obtained splicing characteristic diagram passes through one convolution layer (1, 1); in addition, the convolution result of (1, 0) is up-sampled by 2 times and then is also carried out with the convolution result of (0, 0) And (3) channel splicing, wherein the spliced characteristic diagram is subjected to channel splicing by 2 times of up-sampling results of (1, 1) and jump connections led out by (0, 0) after passing through the convolution layers (0, 1), and finally, the output characteristic diagram of the UNet++ -L2 module is obtained after passing through the convolution layers (0, 2).
The output feature map of the UNet++ module then sequentially passes through a multi-scale-residual Laplace module, a transposed convolution layer with stride 2 (doubling the size), a skip connection added element-wise before the second convolution layer, another multi-scale-residual Laplace module, and a transposed convolution layer with stride 2 (doubling the size again). After an element-wise skip connection from the branch input $r_1$, the processing result $\hat{r}_1$ of branch 2 is finally obtained. As is evident from FIG. 2, the branch is centered on the UNet++ module and presents an overall symmetric encoder-decoder structure.
From the above description, up to 5 downsampling operations are performed in one branch, and channel concatenation requires the two feature maps to have identical sizes; this restricts the input image size of the network to integer multiples of 32. It suffices to add a preprocessing operation that scales the width and height to integer multiples of 32 before the Laplacian pyramid decomposition; since 32 is small relative to most image sizes, this scaling has little effect on the image shape.
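A one-function sketch of that preprocessing step (bilinear resizing, matching the interpolation used elsewhere in the method):

    import torch.nn.functional as F

    def resize_to_multiple_of_32(x):
        """Rescale H and W of an NCHW tensor to the nearest positive multiple of 32
        before the Laplacian pyramid decomposition."""
        _, _, h, w = x.shape
        h32 = max(32, round(h / 32) * 32)
        w32 = max(32, round(w / 32) * 32)
        return F.interpolate(x, size=(h32, w32), mode='bilinear', align_corners=False)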
The network structure of branch 3 differs from branches 1 and 2 in that branch 3 employs level-3 multi-scale-residual Laplace and UNet++ modules (see FIGS. 3 and 4, respectively), its convolution stride is set to 1, and the two transposed convolution layers are replaced by convolution layers with stride 1.
The enhancement network LLENet is trained with supervised learning; the training set (the noisy real dataset SICE (Part I) and the clean simulated dataset MIT-Adobe FiveK) consists of paired low-light and normally exposed images. The loss function comprises 6 parts:
$\mathcal{L} = \lambda_1\mathcal{L}_{data} + \lambda_2\mathcal{L}_{lap} + \lambda_3\mathcal{L}_{color} + \lambda_4\mathcal{L}_{SSIM} + \lambda_5\mathcal{L}_{region} + \lambda_6\mathcal{L}_{percep}$
where $\lambda_1,\dots,\lambda_6$ are adjustable parameters weighting each term's share of the total loss.
$\mathcal{L}_{data}$ is the data loss,
$\mathcal{L}_{data} = \sum_{k} \frac{1}{H_k W_k}\left\| \hat{I}_k - G_k \right\|_2^2$
where $k$ denotes the level of the Laplacian pyramid, $\hat{I}_k$ is the enhancement result after reconstruction at branch $k$, $G_k$ is the corresponding downsampling result of the paired normally exposed ground-truth image, and $H_k$ and $W_k$ denote height and width. This L2 loss guides the enhancement result to approach the ground truth in the pixel-average sense.
$\mathcal{L}_{lap}$ is the Laplace loss,
$\mathcal{L}_{lap} = \sum_{k=0}^{1}\left\| \hat{r}_k - r_k^G \right\|_1$
where $\hat{r}_k$ denotes the Laplacian residual maps restored by branches 1 and 2, and $r_k^G$ denotes the Laplacian residual maps computed from the paired ground truth with the same Laplacian pyramid decomposition scheme; such a loss maintains the local sharpness of the enhancement result better than the data loss alone.
$\mathcal{L}_{color}$ is the color loss,
$\mathcal{L}_{color} = \sum_{p}\left( 1 - \frac{\hat{I}(p) \cdot G(p)}{\|\hat{I}(p)\|\,\|G(p)\|} \right)$
where the dot product denotes the vector inner product over the three RGB channels of pixel $p$, i.e. the cosine similarity between the predicted and true color values; this term helps improve the color fidelity of the enhancement result.
$\mathcal{L}_{SSIM}$ is the structure loss, also called SSIM loss,
$\mathcal{L}_{SSIM} = 1 - \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$
where $\mu_x$ and $\mu_y$ denote the means of the input and output images, $\sigma_{xy}$ denotes the covariance between the input and output images, $\sigma_x^2$ and $\sigma_y^2$ denote their variances, and $C_1$ and $C_2$ are constants; this loss characterizes the structural similarity error between the output image and the real image.
$\mathcal{L}_{region}$ is the region loss. The pixels of the image are first sorted by brightness value and the darkest 40% are selected as the low-light region, separating it from the rest of the image; weights are then assigned according to the enhancement amplitude required by the different brightness regions:
$\mathcal{L}_{region} = w_L\left\| E_L - G_L \right\|_1 + w_H\left\| E_H - G_H \right\|_1$
where $E_L$ and $G_L$ are the weak-light regions of the enhanced and reference images, $E_H$ and $G_H$ are the remaining parts of the enhanced and reference images, and the weights $w_L$ and $w_H$ are set to 4 and 1, respectively.
$\mathcal{L}_{percep}$ is the perceptual loss, also called LPIPS loss,
$\mathcal{L}_{percep} = \sum_{i,j}\frac{1}{C_{ij}H_{ij}W_{ij}}\left\| \phi_{ij}(E) - \phi_{ij}(G) \right\|_2^2$
where $E$ and $G$ denote the enhanced and reference images, $\phi_{ij}$ denotes the feature map output by the $j$-th convolution layer in the $i$-th convolution module of a VGG-19 network, and $C_{ij}$, $H_{ij}$, $W_{ij}$ denote the three dimensions of the feature map. Perceptual loss is adopted because, besides the low-level information of the image, high-level information is also needed to improve visual quality: the basic idea is to use a pre-trained network model as a content extractor to process the enhanced image and the real image separately and construct a loss function from their difference, whose value represents the consistency of high-dimensional features between the two images.
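Two of the less standard terms, the color loss and the region loss, could be sketched as follows (the exact reductions and the use of the reference image's brightness to define the dark region are assumptions):

    import torch
    import torch.nn.functional as F

    def color_loss(pred, ref, eps=1e-6):
        """Mean over pixels of 1 - cosine similarity between RGB vectors."""
        cos = F.cosine_similarity(pred, ref, dim=1, eps=eps)  # (B, H, W)
        return (1.0 - cos).mean()

    def region_loss(pred, ref, w_low=4.0, w_rest=1.0, frac=0.4):
        """L1 error weighted 4:1 between the darkest 40% of pixels and the rest."""
        luma = ref.mean(dim=1, keepdim=True)                  # per-pixel brightness
        thresh = torch.quantile(luma.flatten(1), frac, dim=1).view(-1, 1, 1, 1)
        low = (luma <= thresh).float()                        # low-light region mask
        err = (pred - ref).abs()
        return w_low * (err * low).mean() + w_rest * (err * (1.0 - low)).mean()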
S2: determining whether to perform global linear brightness operation by using global linear brightness module GLBM according to the average brightness and maximum brightness value of the enhanced image, specifically, firstly calculating the average brightness of the enhanced image
Figure SMS_87
Wherein->
Figure SMS_88
Representing the image after low illumination enhancement, H, W, C respectively representing the height, width and channel number of the image, if +.>
Figure SMS_89
If the total weight of the total weight is less than 0.2, further judging +.>
Figure SMS_90
Whether or not is less than 255, wherein->
Figure SMS_91
Is the average brightness value which we hope the image to reach, if so, the operation of global linear brightness enhancement is performed: />
Figure SMS_92
This step further improves the brightness of the image on the basis of enhancing the network LLENT, resulting in a brighter image +. >
Figure SMS_93
S3: and judging whether the noise level estimated value of the enhanced and lightened image is higher than a certain threshold value and whether the average brightness is higher than a certain threshold value, and if so, denoising the enhanced and lightened image by using a denoising network DeNet. The specific workflow and training mode of the denoising network DeNet are described in detail below.
The standard deviation of the gaussian noise of the image was first estimated using the wavelet-based method proposed by David L Donoho et al in 1994, and the image was de-noised only when this value was greater than 0.04 and the normalized average luminance value of the image was greater than 0.4.
As shown in FIG. 5, the denoising network DeNet is a convolutional neural network with an overall 4-stage U-Net structure; each stage $k$ of the U-Net consists of a denoising basic unit repeated $N_k$ times. The denoising basic unit is composed of layer normalization, 1×1 convolution layers, 3×3 depth-separable convolution, a simple gate unit, a simplified channel attention mechanism and skip connections. The simple gate unit is equivalent to a simplified version of the GELU activation function: it splits the input feature map into two parts along the channel dimension and multiplies them element-wise to obtain the output. The simplified channel attention mechanism removes one of the original 1×1 convolution layers and the two activation function layers: it performs global average pooling on the input feature map, applies a single 1×1 convolution, and then multiplies the result channel-wise with the original input feature map to obtain the output. The training set of the denoising network DeNet adopts SIDD, and the loss function is simply taken as the PSNR loss between the denoised image and the reference image:
$\mathcal{L}_{PSNR} = -\text{PSNR}\big(\text{DeNet}(X),\, Y\big)$
where $\text{DeNet}(X)$ denotes the result of the noisy image $X$ after processing by the denoising network DeNet, and $Y$ denotes the clean reference image.
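As a sketch, the PSNR loss can be implemented directly as a negated PSNR (images assumed in [0, 1]):

    import torch

    def psnr_loss(denoised, reference, max_val=1.0, eps=1e-8):
        """Negative peak signal-to-noise ratio, so minimizing the loss maximizes PSNR."""
        mse = torch.mean((denoised - reference) ** 2) + eps
        return -10.0 * torch.log10(max_val ** 2 / mse)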
S4: when the resolution of the image is smaller, super-resolution reconstruction is carried out on the image by using a super-resolution network SRNet, specifically, the larger size of the image in the width and height is recorded as S, if
Figure SMS_97
Then to the image4 times up-sampling super-resolution reconstruction is carried out; if->
Figure SMS_98
Then, super-resolution reconstruction of 3 times up-sampling is carried out on the image; if->
Figure SMS_99
Then, 2 times up-sampling super-resolution reconstruction is carried out on the image; if->
Figure SMS_100
No operation is performed. The method has the advantages that the method adopts a higher up-sampling rate for the lower resolution image and a lower up-sampling rate for the higher resolution image, so that the processing time is less, the occupied video memory is lower, and meanwhile, the clear image quality of the final image can be ensured.
As shown in FIG. 6, the super-division network SRNet adopts a Swin Transformer structure consisting of three parts: shallow feature extraction, deep feature extraction and high-resolution image reconstruction. (1) The shallow feature extraction part consists of two convolution layers with ReLU activation functions and a convolution module similar to an Inception module, where the differently sized convolution kernels of the convolution layers correspond to receptive fields of different sizes, and the final concatenation fuses features of different scales. (2) The deep feature extraction part consists of K deep feature extraction modules, a convolution layer and a skip connection; each deep feature extraction module consists of N Swin Transformer modules and a convolution layer, and each Swin Transformer module is composed, in series, of layer normalization, a window multi-head attention mechanism and a skip connection, layer normalization, a fully connected layer and a skip connection, layer normalization, a shifted-window multi-head attention mechanism and a skip connection, and layer normalization, a fully connected layer and a skip connection. (3) The high-resolution image reconstruction part is a sub-pixel convolution layer that performs the pixel-shuffle reconstruction operation.
The training set of the super-division network SRNet includes DIV2K, Flickr2K, OST, WED, FFHQ, Manga109 and SCUT-CTW1500. Its loss function adopts a combination of L1 pixel loss and perceptual loss, which can be expressed by the following formulas:
$\mathcal{L}_{SR} = \mathcal{L}_{1} + \lambda\,\mathcal{L}_{percep}$
$\mathcal{L}_{1} = \left\| I_{SR} - I_{HR} \right\|_1$
$\mathcal{L}_{percep} = \sum_{j}\frac{1}{C_j H_j W_j}\left\| \phi_j(I_{SR}) - \phi_j(I_{HR}) \right\|_2^2$
where $I_{SR}$ denotes the super-resolution reconstruction result of the super-division network SRNet on the low-quality image, $I_{HR}$ denotes the high-resolution reference image, $\phi_j$ denotes the output feature map of the $j$-th convolution layer of ResNet-50, and $C_j$, $H_j$, $W_j$ denote the three dimensions of the feature map.
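A sketch of the combined objective; the weight lam, the choice of ResNet-50 layers, and the feature_extractor callable (returning a list of feature maps) are assumptions:

    def srnet_loss(sr, hr, feature_extractor, lam=0.1):
        """L1 pixel loss plus a perceptual term over ResNet-50 feature maps."""
        pixel = (sr - hr).abs().mean()
        percep = 0.0
        for f_sr, f_hr in zip(feature_extractor(sr), feature_extractor(hr)):
            b, c, h, w = f_sr.shape
            percep = percep + ((f_sr - f_hr) ** 2).sum() / (b * c * h * w)
        return pixel + lam * percep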
In summary, as shown in FIG. 1, the method provided in this embodiment combines low-light image enhancement with denoising and super-resolution reconstruction. The overall processing flow consists of 4 parts: the enhancement network LLENet, the global linear brightening module GLBM, the denoising network DeNet and the super-division network SRNet. The Laplacian pyramid structure and UNet++ modules adopted by LLENet, the simple gate unit, simplified channel attention mechanism and U-Net architecture adopted by DeNet, and the Swin Transformer adopted by SRNet are the core designs of this embodiment; they allow the method to raise the signal-to-noise ratio of low-light images even under extremely low illuminance, optimize the brightness distribution of the output image, and present a high-definition color image.
In this embodiment, a combined experiment of the four modules (low-light image enhancement, global linear brightening, denoising and super-resolution reconstruction) was performed; the experimental results are shown in FIGS. 7-11.
In addition, to explore the ordering of the four modules and the influence of certain threshold parameters on the processing effect, and to test the practical performance and generalization ability of the method, 3 test sets with high processing difficulty were selected or constructed for this embodiment, and the following comparative experiments were carried out.
Comparative experiment 1: The real dataset LOL, containing 500 matched image pairs, is taken as the test set to verify the influence of the execution order of the enhancement network and the global linear brightening module on the image processing effect; specifically, the quality of the processing effect is measured by the average index values obtained by the method on the test set. The experimental results are shown in Table 1, where "↑" indicates that a larger index value is better and "↓" that a smaller one is better; LLENet-GLBM denotes the execution order of first low-light enhancement and then global linear brightening, and GLBM-LLENet denotes the order of first global linear brightening and then low-light enhancement. From the data in the table, the processing results of the former order are superior to the latter on all four indexes: PSNR, SSIM, LOE and LPIPS. It should be noted that only those pictures whose overall brightness is below the threshold L1 are processed by the global linear brightening module.
Table 1 Comparison of results of first enhancing then brightening versus first brightening then enhancing
Comparative experiment 2: A self-made synthetic dataset containing 100 matched image pairs is taken as the test set to verify the influence of the execution order of the enhancement network, the global linear brightening module and the denoising network on the image processing effect. The experimental results are shown in Table 2, where LLENet-GLBM-DeNet denotes the execution order of enhancement network, global linear brightening module, denoising network, and DeNet-LLENet-GLBM denotes the order of denoising network, enhancement network, global linear brightening module. From the data in the table, the processing effect of the former order is obviously better than the latter. It should be noted that the test set adopted in this experiment has complex image scenes and strong noise; the PSNR and SSIM values of 17.987741 and 0.533924, respectively, are excellent indexes.
Table 2 Comparison of results of enhance-brighten-denoise versus denoise-enhance-brighten
Comparative experiment 3: The test set is the same as in comparative experiment 2; the influence of the values of the threshold parameters $L_2$ and $\sigma$ on the image processing effect is verified, with results shown in Table 3. From the data in the table it can be seen that the parameter combination $L_2 = 0.4$, $\sigma = 0.04$ gives the best image processing effect. It should be noted that the values of $L_2$ and $\sigma$ cannot be taken too large, otherwise the denoising step would essentially never run and the denoising network would lose its reason to exist; therefore the experiment does not test larger values of $L_2$ and $\sigma$.
Table 3 Comparison of results for different values of the threshold parameters $L_2$ and $\sigma$
Comparative experiment 4: 50 matched image pairs are randomly extracted from the real dataset SICE (Part II) and suitably preprocessed to verify the effect of placing the super-division network at different positions in the processing flow. The experimental results are shown in Table 4, where LLENet-GLBM-DeNet-SRNet denotes the execution order of enhancement network, global linear brightening module, denoising network, super-division network; LLENet-GLBM-SRNet-DeNet denotes the order of enhancement network, global linear brightening module, super-division network, denoising network; and SRNet-LLENet-GLBM-DeNet denotes the order of super-division network, enhancement network, global linear brightening module, denoising network. From the data in the table, the image processing effect of the method is best when the execution order is LLENet-GLBM-DeNet-SRNet. Note that in this experiment the upsampling rate of the super-division network was set to 3.
Table 4 Comparison of results for the super-division network located at different positions in the processing flow
It should be noted that the processing difficulty of the three test sets adopted in comparative experiments 1-4 increases in turn, so the decrease in the PSNR and SSIM indexes is a normal phenomenon; it does not mean that the test indexes obtained by combining the 4 modules drop significantly compared with using one of the methods alone. The following comparative experiment is therefore supplemented.
Comparative experiment 5: The test set is the same as in comparative experiment 4; the influence of omitting a certain step of the processing flow on the image processing effect is verified, with results shown in Table 5. The PSNR and SSIM indexes of the four operation combinations are similar; the full flow obtains the best PSNR value and a slightly worse SSIM value, but in terms of human subjective perception the processing result of the full flow is the best, with the strongest contrast, sharpness, signal-to-noise ratio, color vividness and detail richness.
Table 5 Comparison of results of ablation experiments omitting one step of the processing flow
Embodiment 2: An electronic device for low-light image enhancement
An electronic device comprising a memory and a processor, and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the low-light image enhancement method of embodiment 1.
The processor may also be referred to as a CPU (Central Processing Unit). The processor may be an integrated circuit chip with signal processing capabilities. The processor may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Further, the memory may store the instructions and data necessary for the processor's operation.
Alternatively, the electronic device may be a notebook computer, a server, or a development board.
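Purely as an illustration of such a device executing the stored instructions, the sketch below assumes the trained pipeline of embodiment 1 has been exported to TorchScript under the hypothetical file name enhance_pipeline.pt; neither the file name nor the serialization format is specified by the patent.

```python
import torch

# Hypothetical deployment: load the serialized pipeline onto whatever
# processor the device offers (GPU if present, otherwise CPU).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.jit.load("enhance_pipeline.pt", map_location=device)
model.eval()

with torch.no_grad():
    # Dummy 1x3xHxW low-light input in [0, 1]; replace with a real image.
    img = torch.rand(1, 3, 256, 256, device=device)
    enhanced = model(img)
```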
Embodiment 3 A computer-readable storage medium
A computer-readable storage medium storing computer instructions that, when executed by a processor, perform the low-light image enhancement method of embodiment 1.
The computer-readable storage medium stores instructions/program data which, when executed, implement the method described in embodiment 1 of the present application. The instructions/program data may be stored in the storage medium as a software product in the form of a program file, so that a computer device (which may be a personal computer, a server, a network device, or the like) or a processor executes all or part of the steps of the methods of the embodiments of the present application.
Alternatively, the aforementioned storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code, or a device such as a computer, a server, a mobile phone, or a tablet.
Embodiment 4 A low-light image enhancement system
A deep learning based joint denoising and superdivision low-light image enhancement system, comprising:
an enhancement network LLENet for enhancing the brightness of the low-light image;
the global linear brightening module GLBM, for further enhancing the global brightness of the image on the basis of the output of the enhancement network;
the denoising network DeNet, for reducing image noise and improving the signal-to-noise ratio;
and the super-division network SRNet, for improving the image definition.
The low-light image enhancement system may further include: the average brightness analysis module, for analyzing the brightness of the image; the noise level analysis module, for analyzing the noise level of the image; and the resolution analysis module, for analyzing the resolution of the image.
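The patent does not specify how these analysis modules obtain their estimates. Below is a minimal Python sketch of one plausible implementation, assuming 8-bit images stored as numpy arrays; the fast noise-variance estimator of Immerkær (1996) is used purely as an illustrative stand-in for the unspecified noise level analysis, and all function names are hypothetical.

```python
import numpy as np
from scipy.signal import convolve2d

def average_brightness(img: np.ndarray) -> float:
    """Mean over all H*W*C values of an 8-bit image (used in steps S2 and S3)."""
    return float(img.astype(np.float64).mean())

def noise_level(img: np.ndarray) -> float:
    """Fast noise standard-deviation estimate (Immerkaer, 1996).

    One possible estimator for the value compared against sigma in step S3;
    the patent does not prescribe a particular method.
    """
    gray = img.astype(np.float64)
    if gray.ndim == 3:
        gray = gray.mean(axis=2)
    kernel = np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], dtype=np.float64)
    resp = convolve2d(gray, kernel, mode="valid")
    h, w = gray.shape
    return float(np.sqrt(np.pi / 2.0) * np.abs(resp).sum()
                 / (6.0 * (w - 2) * (h - 2)))

def longer_side(img: np.ndarray) -> int:
    """Larger of height and width, compared against the threshold R in step S4."""
    return int(max(img.shape[0], img.shape[1]))
```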
The low-light image enhancement system can realize the following method; a hedged code sketch of the complete flow is given after step S4:
S1: inputting the low-light image sample into the enhancement network LLENet, which completes the low-light enhancement of the image;
S2: judging whether the average brightness of the enhanced image is higher than a threshold L1; if so, skipping this step; otherwise, judging whether the maximum brightness value of the enhanced image is smaller than a certain threshold, and if so, performing global linear brightening on the image with the global linear brightening module GLBM;
S3: judging whether the estimated noise level of the image is higher than a threshold σ and the average brightness is higher than a threshold L2; if so, inputting the image into the denoising network DeNet, which completes the denoising;
S4: judging whether the resolution of the image is smaller than a threshold R; if so, inputting the image into the super-division network SRNet, which completes the super-resolution reconstruction.
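Building on the analysis helpers sketched above, the following hedged Python sketch wires steps S1-S4 together. The sub-networks are passed in as opaque callables; every threshold value is an assumed example rather than a value taken from the patent, and the maximum-brightness test in S2 follows the no-clipping reading given in claim 3.

```python
import numpy as np

def glbm(img: np.ndarray, l_exp: float) -> np.ndarray:
    """Global linear brightening: scale the image so its mean reaches l_exp."""
    scale = l_exp / max(float(img.mean()), 1e-6)
    return np.clip(img * scale, 0.0, 255.0)

def enhance_pipeline(img, llenet, denet, srnet,
                     L1=80.0, l_exp=110.0, sigma_th=5.0, L2=60.0, R=512):
    """Chain S1-S4 with the conditional gating of embodiment 4.

    llenet, denet and srnet are opaque callables; all thresholds here are
    illustrative assumptions. Uses noise_level() and longer_side() from
    the previous sketch.
    """
    out = llenet(img)                       # S1: always enhance
    mean = float(out.mean())
    # S2: skip if already bright enough; otherwise brighten only when
    # linear scaling keeps the maximum value below 255 (no clipping).
    if mean <= L1 and float(out.max()) * l_exp / max(mean, 1e-6) < 255:
        out = glbm(out, l_exp)
    # S3: denoise only when noise is noticeable and brightness suffices.
    if noise_level(out) > sigma_th and float(out.mean()) > L2:
        out = denet(out)
    # S4: super-resolve only when the longer side is below R.
    if longer_side(out) < R:
        out = srnet(out)
    return out
```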
The above description is only a specific embodiment of the present invention and is not intended to limit the present invention in any way. It should be noted that neither the resolution of the image nor the content of the image limits the invention. The scope of the present invention is not limited to the embodiments described; any changes or substitutions readily conceived by those skilled in the art within the disclosed technical scope are intended to be covered. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A deep learning-based low-light image enhancement method combining denoising and superdivision, characterized by comprising the following steps:
S1: inputting the low-light image sample into the enhancement network LLENet, which completes the low-light enhancement of the image;
S2: judging whether the average brightness of the enhanced image is higher than a threshold L1; if so, skipping this step; otherwise, judging whether the maximum brightness value of the enhanced image is smaller than a certain threshold, and if so, performing global linear brightening on the image with the global linear brightening module GLBM; otherwise, skipping this step;
S3: judging whether the estimated noise level of the image is higher than a threshold σ and the average brightness is higher than a threshold L2; if so, inputting the image into the denoising network DeNet, which completes the denoising; otherwise, skipping this step;
S4: judging whether the resolution of the image is smaller than a threshold R; if so, inputting the image into the super-division network SRNet, which completes the super-resolution reconstruction.
2. The low-light image enhancement method according to claim 1, wherein the enhancement network LLENet is a convolutional neural network whose overall and internal structures are both Laplacian pyramid structures; the denoising network DeNet adopts a convolutional neural network; and the super-division network SRNet adopts a Swin Transformer.
3. The low-light image enhancement method according to claim 1, wherein in step S2, the average brightness of the enhanced image is first calculated:

$\bar{I} = \frac{1}{H \times W \times C} \sum_{h=1}^{H} \sum_{w=1}^{W} \sum_{c=1}^{C} I_{en}(h, w, c)$

wherein $I_{en}$ represents the image after low-illumination enhancement, and H, W and C respectively represent the height, width and number of channels of the image; if $\bar{I}$ is less than the set threshold $L_1$, it is further judged whether $I_{\max} \cdot L_{exp} / \bar{I}$ is less than 255, wherein $I_{\max}$ is the maximum brightness value of the enhanced image and $L_{exp}$ is the expected average brightness value of the image; if so, global linear brightening is performed with the global linear brightening module GLBM, the operation being expressed as:

$I_{GLB} = \frac{L_{exp}}{\bar{I}} \cdot I_{en}$

obtaining the brightened image $I_{GLB}$.
4. The low-light image enhancement method according to claim 1, wherein in step S4, when the resolution of the image is smaller than the threshold R, the resolution is increased by the super-resolution reconstruction network SRNet based on the Swin Transformer; specifically, let S be the larger of the width and the height of the image: if S is not greater than a first size threshold, 4-times up-sampling super-resolution reconstruction is performed on the image; if S lies between the first size threshold and a second size threshold, 3-times up-sampling super-resolution reconstruction is performed on the image; if S lies between the second size threshold and a third size threshold, 2-times up-sampling super-resolution reconstruction is performed on the image; if S exceeds the third size threshold, no operation is performed. [The concrete values of the three size thresholds are given only as formula images in the original publication.]
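As a hedged sketch, the up-sampling factor selection described in this claim can be written as the following Python function, with the three size thresholds left as parameters since their concrete values are not recoverable from the text:

```python
def upsampling_factor(s: int, t1: int, t2: int, t3: int) -> int:
    """Choose the SRNet up-sampling rate from the longer image side s.

    t1 < t2 < t3 stand for the three size thresholds of claim 4, whose
    concrete values are not given in the recovered text; a returned
    factor of 1 means no super-resolution is performed.
    """
    if s <= t1:
        return 4
    if s <= t2:
        return 3
    if s <= t3:
        return 2
    return 1
```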
5. The method according to claim 1, wherein in step S3 the denoising network DeNet is a convolutional neural network whose overall structure is a 4-level U-Net, each level of the U-Net being composed of a denoising basic unit repeated N times; the denoising basic unit is composed of layer normalization, 1×1 convolution layers, 3×3 depth-separable convolution, a simple gate unit, a simplified channel attention mechanism, and skip connections; the simple gate unit is equivalent to a simplified version of the GELU activation function: it divides the input feature map into two parts by channel and multiplies the two parts element-wise to obtain the output; the simplified channel attention mechanism removes one of the original 1×1 convolution layers and the two activation function layers, performs global average pooling on the input feature map followed by a single 1×1 convolution, and then multiplies the result with the input feature map channel-wise to obtain the output; the loss function is the peak signal-to-noise ratio loss.
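For concreteness, here is a minimal PyTorch sketch of one plausible reading of the denoising basic unit described in this claim; the channel expansion factor, the use of GroupNorm(1, C) as a stand-in for 2-D layer normalization, and the single-branch layout are assumptions of this sketch, not details fixed by the claim.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """Split channels in half and multiply element-wise (simple gate unit)."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class DenoisingBasicUnit(nn.Module):
    """One plausible reading of the basic unit of claim 5 (NAFNet-style block)."""
    def __init__(self, c: int, expand: int = 2):
        super().__init__()
        w = c * expand                           # assumed channel expansion
        self.norm = nn.GroupNorm(1, c)           # stand-in for 2-D layer norm
        self.conv1 = nn.Conv2d(c, w, 1)          # 1x1 convolution
        self.dwconv = nn.Conv2d(w, w, 3, padding=1, groups=w)  # 3x3 depthwise
        self.gate = SimpleGate()                 # halves channels: w -> w // 2
        self.sca = nn.Sequential(                # simplified channel attention:
            nn.AdaptiveAvgPool2d(1),             # global average pooling,
            nn.Conv2d(w // 2, w // 2, 1),        # then a single 1x1 convolution
        )
        self.conv2 = nn.Conv2d(w // 2, c, 1)     # 1x1 conv back to c channels
    def forward(self, x):
        y = self.conv1(self.norm(x))
        y = self.dwconv(y)
        y = self.gate(y)
        y = y * self.sca(y)                      # channel-wise re-weighting
        return x + self.conv2(y)                 # skip connection
```

For example, DenoisingBasicUnit(32)(torch.randn(1, 32, 64, 64)) returns a tensor of the same shape; the peak signal-to-noise loss named in the claim can then be realized as the negative PSNR between the network output and the ground truth.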
6. The low-light image enhancement method according to claim 1, wherein the enhancement network LLENet, the denoising network DeNet and the super-division network SRNet are trained separately; after training is completed, the three sub-networks and the global linear brightening module GLBM are organized into an overall network for end-to-end low-light image enhancement.
7. A deep learning-based joint denoising and superdivision low-light image enhancement system for performing the low-light image enhancement method of any one of claims 1-6, comprising:
an enhancement network LLENet for enhancing the brightness of the low-light image;
the global linear brightening module GLBM, for further enhancing the global brightness of the image on the basis of the output of the enhancement network;
the denoising network DeNet, for reducing image noise and improving the signal-to-noise ratio;
and the super-division network SRNet, for improving the image definition.
8. The low-light image enhancement system of claim 7, further comprising: the average brightness analysis module is used for analyzing the brightness of the image; the noise level analysis module is used for analyzing the noise level of the image; and the resolution analysis module is used for analyzing the resolution of the image.
9. An electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the low-light image enhancement method of any one of claims 1-6.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the low-light image enhancement method of any one of claims 1-6.
CN202310332399.2A 2023-03-31 2023-03-31 Deep learning-based combined denoising and superdivision low-illumination image enhancement method Active CN116051428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310332399.2A CN116051428B (en) 2023-03-31 2023-03-31 Deep learning-based combined denoising and superdivision low-illumination image enhancement method


Publications (2)

Publication Number Publication Date
CN116051428A true CN116051428A (en) 2023-05-02
CN116051428B CN116051428B (en) 2023-07-21

Family

ID=86125886


Country Status (1)

Country Link
CN (1) CN116051428B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450275A (en) * 2021-06-28 2021-09-28 上海人工智能研究院有限公司 Image quality enhancement system and method based on meta-learning and storage medium
US11468543B1 (en) * 2021-08-27 2022-10-11 Hong Kong Applied Science and Technology Research Institute Company Limited Neural-network for raw low-light image enhancement
CN115205147A (en) * 2022-07-13 2022-10-18 福州大学 Multi-scale optimization low-illumination image enhancement method based on Transformer
CN115393225A (en) * 2022-09-07 2022-11-25 南京邮电大学 Low-illumination image enhancement method based on multilevel feature extraction and fusion


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980540A (en) * 2023-07-27 2023-10-31 湖北空间智能技术有限公司 Low-illumination image processing method and device for pod and panoramic pod system
CN117058062A (en) * 2023-10-12 2023-11-14 深圳市东视电子有限公司 Image quality improvement method based on layer-by-layer training pyramid network
CN117058062B (en) * 2023-10-12 2024-03-26 深圳市东视电子有限公司 Image quality improvement method based on layer-by-layer training pyramid network
CN117333401A (en) * 2023-11-30 2024-01-02 光宇锦业(武汉)智能科技有限公司 Image illumination enhancement method, system, medium and device with uneven brightness distribution
CN117671374A (en) * 2023-12-09 2024-03-08 北京无线电测量研究所 Method and device for identifying image target of inverse synthetic aperture radar

Also Published As

Publication number Publication date
CN116051428B (en) 2023-07-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant