CN114581337B - Low-light image enhancement method combining multi-scale feature aggregation and lifting strategies - Google Patents
Low-light image enhancement method combining multi-scale feature aggregation and lifting strategies
- Publication number: CN114581337B (application CN202210278847.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- images
- low
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Abstract
The invention relates to a low-light image enhancement method combining multi-scale feature aggregation and lifting strategies, and belongs to the technical field of image enhancement. It improves a low-light image enhancement model with an encoding-decoding architecture based on a convolutional neural network, and proposes a multi-scale feature aggregation module (FABM) combining a lifting strategy, and a noise removal module (BPM) with a pixel attention mechanism. The former is based on an error feedback mechanism and uses a back-projection technique, so that all previous features can be considered when the current features are aggregated; the latter can improve the signal-to-noise ratio of the image and model the relationship between the pixels of the image, helping the network better identify the image content and thereby emphasize commonality and remove differences.
Description
Technical Field
The invention relates to a low-light image enhancement method combining multi-scale feature aggregation and lifting strategies, and belongs to the technical field of image enhancement.
Background
Low-light image enhancement has been a research hotspot in computer vision in recent years. It is widely used in advanced vision tasks such as object detection and semantic segmentation, and also in real-world applications such as all-day autonomous driving, visual surveillance and computational photography. Low-light image enhancement technology can improve the visibility and contrast of photos taken in dim light, backlight and extremely low light, strengthen the details of the photo content and improve its aesthetic quality; low-light image enhancement therefore has strong practical significance and value. Conventional low-light enhancement algorithms fall into two families: those based on histogram equalization and those based on Retinex theory. The former expands the dynamic range of the image by equalizing its histogram, improving contrast; the latter decomposes the image into illumination and reflection components through some prior or regularization, treats the reflection component as the final enhancement result, and derives it by predicting the illumination component. Although both kinds of method can enhance low-light images to some extent, each has its limitations: histogram-equalization-based methods focus only on enhancing contrast and disregard the noise present in the image, while for Retinex-based methods it is difficult to find an effective prior or regularization. As a result, images enhanced by conventional algorithms may retain or even amplify noise, and may suffer from artifacts, color deviation, overexposure and similar problems. In recent years a large number of deep-learning-based low-light image enhancement methods have been proposed, with remarkable success.
Compared with traditional algorithms, deep-learning-based low-light image enhancement methods offer higher accuracy, better robustness and faster speed, benefiting from careful network design and large training datasets. However, almost no deep-learning-based method focuses on the detailed content of the image during enhancement, so detail information is lost in the enhanced result and the image becomes over-smoothed. In addition, most deep-learning-based methods preserve or even amplify the noise in the original image. In short, current deep-learning-based low-light enhancement methods cannot simultaneously guarantee the preservation of detail and the removal of noise, which limits the application scenarios and effectiveness of the technology. To ensure that the enhanced low-light result has natural colors, fully preserves rich details, removes redundant information, and is of higher quality, the existing model methods need to be improved.
Disclosure of Invention
The invention aims to improve a low-light image enhancement model with an encoding-decoding architecture based on a convolutional neural network, and provides a low-light image enhancement method combining multi-scale feature aggregation and lifting strategies.
The technical scheme of the invention comprises the following steps:
step 1, data collection and processing are carried out,
(1.1) Training the low-light image enhancement model requires sufficient image data; two open-source low-light image datasets are used for model training. The first dataset contains 916 low-light images and 1016 normal-light images, which are unpaired; the second dataset contains 2117 low-light images and 2117 normal-light images, which are paired;
(1.2) For the images in both datasets, training images with a resolution of 320×320 are obtained by random cropping, and then random horizontal flipping is applied for data augmentation. For testing, a total of 672 images from 6 open-source low-light image datasets are used; 5 of the test datasets are unpaired, and the remaining test dataset contains 440 low-light images, of which 100 are unpaired without corresponding reference images and the other 340 are paired with corresponding reference images;
Step 2, constructing the model,
(2.1) A low-light image enhancement model is designed based on an encoding-decoding U-net structure; the model consists of a generator and two discriminators: a global discriminator that discriminates the whole image, and a local discriminator that discriminates image blocks;
(2.2) In the encoding stage of the generator, each scale's features are aggregated to all subsequent levels using the multi-scale feature aggregation module, abbreviated as FABM; in the decoding stage, the features extracted by each level of the encoder are gathered and combined with the denoising module, abbreviated as BPM, to enhance the image;
(2.2.1) The multi-scale feature aggregation module, abbreviated as FABM: in the encoder, for the level-C feature map E_C, first compute its difference, across the different scales, with each of the previous (C-1) levels' aggregated feature maps E_{i_A} (i = 1, ..., C-1):

e_i = p^j(E_C) - E_{i_A}, i = 1, ..., C-1 (1)

where p denotes the projection operator, which upsamples E_C; j denotes the number of upsampling steps, so that E_C is upsampled to the same size as each E_{i_A} (i = 1, ..., C-1);
The back-projected differences e_i are then used to update the feature map E_C, yielding the aggregated feature E_{C_A}:

E_{C_A} = E_C + Σ_{i=1}^{C-1} bp^j(e_i) (2)

where bp denotes the back-projection operator, which downsamples the difference e_i; j denotes the number of downsampling steps, so that each e_i is downsampled to the same size as the feature map E_C;
The resulting feature E_{C_A} thus takes the features of all scales into account; if level C is the last level of the encoder, E_{C_A} is the encoder output after aggregating the features of every level;
Each level of the decoder then re-aggregates E_{C_A} in the same way, making full use of the aggregated features to obtain the enhanced feature D_{C_E}:

D_{C_E} = bp^{C-1}[p^{C-1}(D_C) - E_{C_A}] + D_C (3)

where D_C denotes the level-C feature map in the decoder, and bp and p denote the back-projection and projection operators, respectively; in the decoder, bp denotes upsampling and p denotes downsampling;
(2.2.2) The denoising module, abbreviated as BPM, uses the previous estimate to improve the current signal. First, the feature C_f currently to be enhanced and the previously enhanced feature P_f are added to obtain an enhanced feature S_f with a high signal-to-noise ratio; the higher the signal-to-noise ratio, the easier denoising becomes:

S_f = C_f + P_f (4)

this operation achieves an implicit, unconstrained fusion of C_f and P_f;
The enhanced feature S_f is then fed into the pixel attention block proposed by the invention, which compresses the feature map X ∈ R^{H×W×C} along the channel dimension to obtain a pixel attention map M_P ∈ R^{H×W×1}; this map is multiplied with the original feature map X to output a feature map X_P ∈ R^{H×W×C} in which every pixel is related to every other pixel:

X_P = X · σ(conv(avg(X))) (5)

where σ, conv and avg denote the sigmoid function, a convolution operation and an average pooling operation, respectively;
After the enhanced feature S_f passes through the pixel attention block, the relationships between the pixels of the feature map are established and the recovered, signal-enhanced result R_f is obtained:

R_f = P(S_f) (6)

where P denotes the pixel attention block proposed by the invention; R_f represents a signal in which the noise is emphasized, so that the denoising module can better identify the image content and obtain a better denoising effect;
Finally, the previous feature P_f is subtracted to remove redundant information:

O_f = R_f - P_f (7)

where O_f denotes the final output of the denoising module; the whole process narrows the gap between local block modelling and the global recovery task;
(2.3) Finally, the network output is combined residually with the input image to obtain the final result; in addition to the global discriminator, a local discriminator is used to promote and stabilize training: image blocks are randomly cropped from the output image and the normal-light image and then fed into the local discriminator;
Step 3, loss function: the low-light image enhancement model of the invention adopts an adversarial loss and a perceptual loss. The adversarial loss follows LSGAN, whose mathematical form is given by formulas (8) and (9):

L_D = E_{x~p_r(x)}[(D(x) - 1)^2] + E_{y~p_f(y)}[(D(G(y)))^2] (8)

L_G = E_{y~p_f(y)}[(D(G(y)) - 1)^2] (9)

where L_D and L_G denote the losses of the discriminator and the generator, respectively; p_r(x) denotes the distribution of normal-light images/image blocks and p_f(y) the distribution of low-light images/image blocks; x and y denote samples drawn from p_r(x) and p_f(y), respectively; D denotes the discriminator, G the generator, and E the expected value;
The perceptual loss is based on feature maps output by a pre-trained VGG-16 network and is defined as:

L_per = (1 / (w_{ij} h_{ij} c_{ij})) ||φ_{ij}(I_x) - φ_{ij}(I_y)||_2^2 (10)

where φ_{ij} denotes the feature map obtained from the j-th convolutional layer of the i-th block of the VGG-16 network; w_{ij}, h_{ij} and c_{ij} denote the dimensions of that feature map; and I_x and I_y denote the low-light image/image block and the generated normal-light image/image block, respectively.
The invention improves a low-light image enhancement model with an encoding-decoding architecture based on a convolutional neural network, and proposes a multi-scale feature aggregation module (FABM) combining a lifting strategy, and a noise removal module (BPM) with a pixel attention mechanism. The former is based on an error feedback mechanism and uses a back-projection technique, so that all previous features can be considered when the current features are aggregated. The latter can improve the signal-to-noise ratio of the image and model the relationship between the pixels of the image, helping the network better identify the image content, thereby emphasizing commonality and removing differences (commonly referred to as noise).
The invention has the main advantages that:
1. An encoding-decoding low-light image enhancement model is built that can effectively enhance low-light images with single-stage training, giving the images rich details, a clean background and natural colors;
2. The multi-scale feature aggregation module considers the features of all scales and aggregates them effectively, so that the network can preserve the image content while enhancing the low-light image and generate an image with rich details;
3. The denoising module can effectively improve the signal-to-noise ratio of the image and model the relationship between the pixels of the image, helping the network better identify the image content, thereby emphasizing commonality and eliminating differences (commonly referred to as noise).
Drawings
FIG. 1 is a schematic diagram of the overall structure of the present invention.
FIG. 2 is a schematic diagram of a multi-scale feature aggregation module.
Fig. 3 is a schematic diagram of a pixel attention module.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention;
in addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other. The invention will be described in more detail below with reference to the accompanying drawings. Like elements are denoted by like reference numerals throughout the various figures. For clarity, the various features of the drawings are not drawn to scale.
A method of low-light image enhancement incorporating a multi-scale feature aggregation and promotion strategy according to an embodiment of the present invention is described below with reference to fig. 1 to 3, comprising the steps of:
step 1, data collection and processing are carried out,
(1.1) Training the low-light image enhancement model requires sufficient image data; the invention adopts two open-source low-light image datasets for model training. The first dataset contains 916 low-light images and 1016 normal-light images, which are unpaired; the second dataset contains 2117 low-light images and 2117 normal-light images, which are paired;
(1.2) For the images in both datasets, training images with a resolution of 320×320 are obtained by random cropping, and then random horizontal flipping is applied for data augmentation. For testing, the invention uses a total of 672 images from 6 open-source low-light image datasets; 5 of the test datasets are unpaired, and the remaining test dataset contains 440 low-light images, of which 100 are unpaired without corresponding reference images and the other 340 are paired with corresponding reference images;
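The random crop and horizontal flip of step (1.2) can be sketched as follows. This is a minimal illustration only; the function name and RNG handling are assumptions, not part of the patent.

```python
import numpy as np

def augment(img, crop=320, rng=None):
    """Randomly crop a crop x crop patch and apply a random horizontal flip.

    img: H x W x C numpy array with H, W >= crop. The 320 x 320 crop size
    and the flip follow step (1.2) of the patent; everything else here is
    an illustrative assumption.
    """
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)    # random crop origin
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:                 # random horizontal flip
        patch = patch[:, ::-1]
    return patch
```

A 400×480 input, for example, yields a 320×320 training patch regardless of where the crop lands.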
step 2, constructing a model,
(2.1) The invention designs a low-light image enhancement model based on an encoding-decoding U-net structure, whose overall structure is shown in Fig. 1. The model consists of a generator and two discriminators: a global discriminator that discriminates the whole image, and a local discriminator that discriminates image blocks;
(2.2) In the encoding stage of the generator, for each scale's features, the invention uses the proposed multi-scale feature aggregation module, abbreviated as FABM, to aggregate them to all subsequent levels; in the decoding stage, the invention gathers the features extracted by each level of the encoder and combines them with the proposed denoising module, abbreviated as BPM, to enhance the image;
(2.2.1) The multi-scale feature aggregation module, abbreviated as FABM, is shown in Fig. 2. In the encoder, for the level-C feature map E_C, first compute its difference, across the different scales, with each of the previous (C-1) levels' aggregated feature maps E_{i_A} (i = 1, ..., C-1):

e_i = p^j(E_C) - E_{i_A}, i = 1, ..., C-1 (1)

where p denotes the projection operator, which upsamples E_C; j denotes the number of upsampling steps, so that E_C is upsampled to the same size as each E_{i_A} (i = 1, ..., C-1);
The back-projected differences e_i are then used to update the feature map E_C, yielding the aggregated feature E_{C_A}:

E_{C_A} = E_C + Σ_{i=1}^{C-1} bp^j(e_i) (2)

where bp denotes the back-projection operator, which downsamples the difference e_i; j denotes the number of downsampling steps, so that each e_i is downsampled to the same size as the feature map E_C;
The resulting feature E_{C_A} thus takes the features of all scales into account. If level C is the last level of the encoder, E_{C_A} is the encoder output after aggregating the features of every level. Each level of the decoder then re-aggregates E_{C_A} in the same way, making full use of the aggregated features to obtain the enhanced feature D_{C_E}:

D_{C_E} = bp^{C-1}[p^{C-1}(D_C) - E_{C_A}] + D_C (3)

where D_C denotes the level-C feature map in the decoder, and bp and p denote the back-projection and projection operators, respectively; in the decoder, bp denotes upsampling and p denotes downsampling;
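The FABM aggregation described above can be sketched on single-channel toy feature maps as follows. The choice of nearest-neighbour upsampling for the projection p and average pooling for the back-projection bp is an assumption; the patent does not pin down the exact operators.

```python
import numpy as np

def up2(x):
    """Projection p: 2x nearest-neighbour upsampling of an H x W map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def down2(x):
    """Back-projection bp: 2x average-pool downsampling of an H x W map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def repeat_op(op, x, j):
    """Apply an operator j times (p^j or bp^j in the patent's notation)."""
    for _ in range(j):
        x = op(x)
    return x

def fabm(e_levels):
    """Aggregate the last (level-C) feature map with all previous levels:
    difference e_i between the upsampled E_C and each E_{i_A}, then add the
    downsampled differences back onto E_C to form E_{C_A}.

    e_levels: list of H_i x W_i maps, each level half the size of the one
    before it. A toy single-channel sketch, not the patent's network code.
    """
    ec = e_levels[-1]
    c = len(e_levels)
    agg = ec.copy()
    for i, ei in enumerate(e_levels[:-1]):
        j = c - 1 - i                          # up/down steps to reach level i
        diff = repeat_op(up2, ec, j) - ei      # difference at level i's scale
        agg = agg + repeat_op(down2, diff, j)  # back-project and accumulate
    return agg
```

With three levels of sizes 8×8, 4×4 and 2×2, the output has the 2×2 size of the last level while folding in error terms from both earlier scales.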
(2.2.2) The denoising module, abbreviated as BPM, uses the previous estimate to improve the current signal; Fig. 1 depicts its working mechanism. First, the feature C_f currently to be enhanced and the previously enhanced feature P_f are added to obtain an enhanced feature S_f with a high signal-to-noise ratio; the higher the signal-to-noise ratio, the easier denoising becomes:

S_f = C_f + P_f (4)

this operation achieves an implicit, unconstrained fusion of C_f and P_f;
The enhanced feature S_f is then fed into the pixel attention block proposed by the invention, shown in Fig. 3. It compresses the feature map X ∈ R^{H×W×C} along the channel dimension to obtain a pixel attention map M_P ∈ R^{H×W×1}, which is multiplied with the original feature map X to output a feature map X_P ∈ R^{H×W×C} in which every pixel is related to every other pixel:

X_P = X · σ(conv(avg(X))) (5)

where σ, conv and avg denote the sigmoid function, a convolution operation and an average pooling operation, respectively;
After the enhanced feature S_f passes through the pixel attention block, the relationships between the pixels of the feature map are established and the recovered, signal-enhanced result R_f is obtained:

R_f = P(S_f) (6)

where P denotes the pixel attention block proposed by the invention; R_f represents a signal in which the noise is emphasized, so that the denoising module can better identify the image content and obtain a better denoising effect;
Finally, the previous feature P_f is subtracted to remove redundant information:

O_f = R_f - P_f (7)

where O_f denotes the final output of the denoising module; the whole process narrows the gap between local block modelling and the global recovery task;
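The BPM pipeline of formulas (4)-(7) can be sketched as follows. The channel-average pooling and the stand-in for the convolution (a single scalar weight) are simplifying assumptions; the patent only names conv, avg and the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pixel_attention(x, conv_w=1.0):
    """Pixel attention block of formula (5): X_P = X * sigmoid(conv(avg(X))).

    x: H x W x C feature map. avg is channel-wise average pooling
    (H x W x C -> H x W x 1); conv_w is a scalar stand-in for the 1x1
    convolution, an assumption made for this sketch.
    """
    m = sigmoid(conv_w * x.mean(axis=-1, keepdims=True))  # attention map M_P
    return x * m                                          # X_P, same shape as x

def bpm(c_f, p_f):
    """Denoising module, formulas (4)-(7):
    S_f = C_f + P_f;  R_f = P(S_f);  O_f = R_f - P_f."""
    s_f = c_f + p_f            # (4) fuse current and previous features
    r_f = pixel_attention(s_f) # (6) signal-enhanced result
    return r_f - p_f           # (7) subtract P_f to drop redundant content
```

Shapes are preserved throughout: an H×W×C input yields an H×W×C output, matching the residual usage in the decoder.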
(2.3) Finally, the network output is combined residually with the input image to obtain the final result; in addition to the global discriminator, the invention also uses a local discriminator to promote and stabilize training: image blocks are randomly cropped from the output image and the normal-light image and then fed into the local discriminator;
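The random patch sampling for the local discriminator can be sketched as below. The 64-pixel patch size and the patch count are illustrative assumptions; the patent only states that image blocks are randomly cropped from the output image and the normal-light image.

```python
import numpy as np

def sample_patches(fake, real, patch=64, n=4, rng=None):
    """Randomly crop n aligned-size image blocks from the generated image
    and the normal-light image, for feeding to the local discriminator.

    fake, real: H x W x C arrays. Patch size and count are assumptions
    for this sketch, not values from the patent.
    """
    rng = rng or np.random.default_rng()
    h, w = fake.shape[:2]
    pairs = []
    for _ in range(n):
        t = rng.integers(0, h - patch + 1)
        l = rng.integers(0, w - patch + 1)
        pairs.append((fake[t:t + patch, l:l + patch],
                      real[t:t + patch, l:l + patch]))
    return pairs
```

Note the crops from the fake and real images need not share coordinates in principle; sampling them together, as here, is merely a convenient choice.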
Step 3, loss function: the low-light image enhancement model of the invention adopts an adversarial loss and a perceptual loss. The adversarial loss follows LSGAN, whose mathematical form is given by formulas (8) and (9):

L_D = E_{x~p_r(x)}[(D(x) - 1)^2] + E_{y~p_f(y)}[(D(G(y)))^2] (8)

L_G = E_{y~p_f(y)}[(D(G(y)) - 1)^2] (9)

where L_D and L_G denote the losses of the discriminator and the generator, respectively; p_r(x) denotes the distribution of normal-light images/image blocks and p_f(y) the distribution of low-light images/image blocks; x and y denote samples drawn from p_r(x) and p_f(y), respectively; D denotes the discriminator, G the generator, and E the expected value;
The perceptual loss is based on feature maps output by a pre-trained VGG-16 network and is defined as:

L_per = (1 / (w_{ij} h_{ij} c_{ij})) ||φ_{ij}(I_x) - φ_{ij}(I_y)||_2^2 (10)

where φ_{ij} denotes the feature map obtained from the j-th convolutional layer of the i-th block of the VGG-16 network; w_{ij}, h_{ij} and c_{ij} denote the dimensions of that feature map; and I_x and I_y denote the low-light image/image block and the generated normal-light image/image block, respectively.
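The three loss terms of step 3 can be sketched numerically as follows. The LSGAN forms are reconstructed from the standard least-squares GAN objective, since the formulas were lost in extraction; the feature maps passed to the perceptual loss stand in for the VGG-16 activations φ_ij.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Discriminator loss, formula (8): E[(D(x) - 1)^2] + E[D(G(y))^2].
    d_real, d_fake: discriminator outputs on real and generated samples."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Generator loss, formula (9): E[(D(G(y)) - 1)^2]."""
    return np.mean((d_fake - 1.0) ** 2)

def perceptual_loss(feat_x, feat_y):
    """Perceptual loss, formula (10): squared L2 distance between feature
    maps, normalised by the map size w * h * c.

    feat_x, feat_y: h x w x c arrays standing in for phi_ij(I_x) and
    phi_ij(I_y); extracting them from a real VGG-16 is out of scope here."""
    h, w, c = feat_x.shape
    return np.sum((feat_x - feat_y) ** 2) / (w * h * c)
```

When the discriminator scores real samples as 1 and fakes as 0, the discriminator loss is zero; identical feature maps likewise give zero perceptual loss.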
It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Claims (1)
1. A low-light image enhancement method combining multi-scale feature aggregation and lifting strategies, characterized by comprising the following steps:
step 1, data collection and processing are carried out,
(1.1) Training the low-light image enhancement model requires sufficient image data; two open-source low-light image datasets are used for model training. The first dataset contains 916 low-light images and 1016 normal-light images, which are unpaired; the second dataset contains 2117 low-light images and 2117 normal-light images, which are paired;
(1.2) For the images in both datasets, training images with a resolution of 320×320 are obtained by random cropping, and then random horizontal flipping is applied for data augmentation. For testing, a total of 672 images from 6 open-source low-light image datasets are used, of which 5 test datasets are unpaired; the remaining test dataset contains 440 low-light images, 100 of which are unpaired without corresponding reference images and the other 340 of which are paired with corresponding reference images;
Step 2, constructing the model,
(2.1) A low-light image enhancement model is designed based on an encoding-decoding U-net structure; the model consists of a generator and two discriminators: a global discriminator that discriminates the whole image, and a local discriminator that discriminates image blocks;
(2.2) In the encoding stage of the generator, each scale's features are aggregated to all subsequent levels using the multi-scale feature aggregation module, abbreviated as FABM; in the decoding stage, the features extracted by each level of the encoder are gathered and combined with the denoising module, abbreviated as BPM, to enhance the image;
(2.2.1) The multi-scale feature aggregation module, abbreviated as FABM: in the encoder, for the level-C feature map E_C, first compute its difference, across the different scales, with each of the previous (C-1) levels' aggregated feature maps E_{i_A} (i = 1, ..., C-1):

e_i = p^j(E_C) - E_{i_A}, i = 1, ..., C-1 (1)

where p denotes the projection operator, which upsamples E_C; j denotes the number of upsampling steps, so that E_C is upsampled to the same size as each E_{i_A} (i = 1, ..., C-1);
The back-projected differences e_i are then used to update the feature map E_C, yielding the aggregated feature E_{C_A}:

E_{C_A} = E_C + Σ_{i=1}^{C-1} bp^j(e_i) (2)

where bp denotes the back-projection operator, which downsamples the difference e_i; j denotes the number of downsampling steps, so that each e_i is downsampled to the same size as the feature map E_C;
The resulting feature E_{C_A} thus takes the features of all scales into account; if level C is the last level of the encoder, E_{C_A} is the encoder output after aggregating the features of every level;
Each level of the decoder then re-aggregates E_{C_A} in the same way, making full use of the aggregated features to obtain the enhanced feature D_{C_E}:

D_{C_E} = bp^{C-1}[p^{C-1}(D_C) - E_{C_A}] + D_C (3)

where D_C denotes the level-C feature map in the decoder, and bp and p denote the back-projection and projection operators, respectively; in the decoder, bp denotes upsampling and p denotes downsampling;
(2.2.2) The denoising module, abbreviated as BPM, uses the previous estimate to improve the current signal. First, the feature C_f currently to be enhanced and the previously enhanced feature P_f are added to obtain an enhanced feature S_f with a high signal-to-noise ratio; the higher the signal-to-noise ratio, the easier denoising becomes:

S_f = C_f + P_f (4)

this operation achieves an implicit, unconstrained fusion of C_f and P_f;
The enhanced feature S_f is then fed into a pixel attention block, which compresses the feature map X ∈ R^{H×W×C} along the channel dimension to obtain a pixel attention map M_P ∈ R^{H×W×1}; this map is multiplied with the original feature map X to output a feature map X_P ∈ R^{H×W×C} in which every pixel is related to every other pixel:

X_P = X · σ(conv(avg(X))) (5)

where σ, conv and avg denote the sigmoid function, a convolution operation and an average pooling operation, respectively;
After the enhanced feature S_f passes through the pixel attention block, the relationships between the pixels of the feature map are established and the recovered, signal-enhanced result R_f is obtained:

R_f = P(S_f) (6)

where P denotes the pixel attention block; R_f represents a signal in which the noise is emphasized, so that the denoising module better recognizes the image content, thereby obtaining a better denoising effect;
Finally, the previous feature P_f is subtracted to remove redundant information:

O_f = R_f - P_f (7)

where O_f denotes the final output of the denoising module; the whole process narrows the gap between local block modelling and the global recovery task;
(2.3) finally, the output of the decoder and the input image are combined through a residual connection to obtain the final result; in addition to the global discriminator, a local discriminator is used to facilitate and stabilize training: image patches are randomly cropped from the output image and from the normal-light image, and are then fed to the local discriminator;
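The random patch cropping for the local discriminator might look like the sketch below; the patch size of 32 is an assumption, since the patent does not specify it:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_patch(img, patch=32):
    # Randomly crop a square patch for the local discriminator.
    # The patch size is an assumption; the patent does not give one.
    H, W = img.shape[:2]
    top = rng.integers(0, H - patch + 1)    # exclusive upper bound
    left = rng.integers(0, W - patch + 1)
    return img[top:top + patch, left:left + patch]
```

In training, the same cropping would be applied independently to the generator output and to the normal-light reference before both are passed to the local discriminator.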
Step 3: loss functions. The low-light image enhancement model adopts an adversarial loss and a perceptual loss. The adversarial loss follows LSGAN; its mathematical form, shown as formulas (8) and (9), is:
L_D = ½ E_{x~p_r(x)}[(D(x) - 1)^2] + ½ E_{y~p_f(y)}[D(G(y))^2] (8)
L_G = ½ E_{y~p_f(y)}[(D(G(y)) - 1)^2] (9)
where L_D and L_G denote the discriminator loss and the generator loss, respectively; p_r(x) denotes the distribution of normal-light images/patches and p_f(y) the distribution of low-light images/patches; x and y denote samples drawn from p_r(x) and p_f(y), respectively; D denotes the discriminator, G the generator, and E the expectation;
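A small sketch of the LSGAN discriminator and generator losses, assuming the standard least-squares form with real samples labelled 1 and fake samples labelled 0:

```python
import numpy as np

def lsgan_losses(d_real, d_fake):
    # LSGAN least-squares adversarial losses.
    # d_real: discriminator scores D(x) on normal-light samples;
    # d_fake: discriminator scores D(G(y)) on generated samples.
    L_D = 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)
    L_G = 0.5 * np.mean((d_fake - 1.0) ** 2)
    return L_D, L_G
```

A perfect discriminator (scores of 1 on real and 0 on fake samples) drives L_D to 0 while L_G reaches its maximum of 0.5, matching the usual adversarial dynamic.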
The perceptual loss is based on feature maps output by a pre-trained VGG-16 network and is defined as:
L_percep = (1 / (w_ij · h_ij · c_ij)) · ||φ_ij(I_x) - φ_ij(I_y)||_2^2 (10)
where φ_ij denotes the feature map obtained from the j-th convolutional layer of the i-th block of the VGG-16 network; w_ij, h_ij and c_ij denote the dimensions of that feature map; and I_x and I_y denote the low-light image/patch and the generated normal-light image/patch, respectively.
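Given two VGG-16 feature maps, the perceptual loss can be sketched as follows; the dimension-normalised squared L2 form is an assumption based on the symbol definitions in the text:

```python
import numpy as np

def perceptual_loss(feat_x, feat_y):
    # Perceptual loss between two feature maps phi_ij(I_x) and phi_ij(I_y),
    # each of shape (h_ij, w_ij, c_ij): squared L2 distance normalised by the
    # feature-map dimensions (the exact normalisation is an assumption).
    h, w, c = feat_x.shape
    return np.sum((feat_x - feat_y) ** 2) / (h * w * c)
```

In practice feat_x and feat_y would be the activations of the chosen VGG-16 layer for the low-light input and the generated normal-light output, respectively.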
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210278847.0A CN114581337B (en) | 2022-03-17 | 2022-03-17 | Low-light image enhancement method combining multi-scale feature aggregation and lifting strategies |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114581337A CN114581337A (en) | 2022-06-03 |
CN114581337B true CN114581337B (en) | 2024-04-05 |
Family
ID=81777353
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110889813A (en) * | 2019-11-15 | 2020-03-17 | 安徽大学 | Low-light image enhancement method based on infrared information |
CN112884668A (en) * | 2021-02-22 | 2021-06-01 | 大连理工大学 | Lightweight low-light image enhancement method based on multiple scales |
CN113052814A (en) * | 2021-03-23 | 2021-06-29 | 浙江工业大学 | Dark light image enhancement method based on Retinex and attention mechanism |
CN113628152A (en) * | 2021-09-15 | 2021-11-09 | 南京天巡遥感技术研究院有限公司 | Dim light image enhancement method based on multi-scale feature selective fusion |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112602088B (en) * | 2018-09-06 | 2024-03-12 | Oppo广东移动通信有限公司 | Method, system and computer readable medium for improving quality of low light images |
Non-Patent Citations (1)
Title |
---|
Image enhancement method under extreme low-light conditions; Yang Yong; Liu Huiyi; Journal of Graphics (图学学报); 2020-08-07 (Issue 04); full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||