WO2023236445A1

WO2023236445A1 - Low-illumination image enhancement method using long-exposure compensation

Info

Publication number: WO2023236445A1
Application number: PCT/CN2022/131018
Authority: WO
Inventors: 刘家瑛; 匡浩玮; 郑书泓; 黄浩峰; 郭宗明
Original assignee: 北京大学
Priority date: 2022-06-09
Filing date: 2022-11-10
Publication date: 2023-12-14
Also published as: CN115240022A

Abstract

Disclosed in the present invention is a low-illumination image enhancement method using long-exposure compensation. The method comprises: 1) collecting a low-illumination training data set, wherein each training sample in the low-illumination training data set comprises a low-illumination image and a normal-illumination image of the same scene, and generating, according to each training sample, a group of a short-exposure image, a long-exposure image and a real-illumination image, which correspond to each other, so as to obtain a synthetic data set S; 2) training a low-illumination enhancement model by using the synthetic data set S, wherein the low-illumination enhancement model comprises M-1 feature alignment modules and M-1 brightening modules; and 3) inputting a short-exposure image to be brightened and a corresponding blurred long-exposure image into the trained low-illumination enhancement model, so as to obtain a corresponding low-illumination enhanced image. The present invention can significantly improve the performance of low-illumination image enhancement.

Description

A low-light image enhancement method using long exposure compensation

Technical field

The invention belongs to the field of low-light image enhancement of digital images, and relates to a low-light image enhancement method using long exposure compensation.

Background art

Low light is a common image degradation. Insufficient light is usually caused by low-light shooting environment, camera failure, wrong parameter settings, etc. The enhancement of low-light images has always attracted attention from industry and academia.

Traditional low-light image enhancement methods can be divided into three categories. Methods based on uniform brightness adjustment brighten a low-light image by uniformly adjusting the global brightness of the entire image. Methods based on Retinex theory decompose the image into a reflectance layer and an illumination layer, and use prior knowledge to manually set constraints and adjustments in order to enhance the low-light image. Methods based on deep learning design a data-driven convolutional model and train it end-to-end on a large data set; at inference, only a single forward pass over the low-light image is required.

However, low-light image enhancement is an ill-posed problem with multiple solutions: one low-light image can correspond to multiple ideal normal-light images. This ambiguity in the optimization target makes accurate and flexible low-light image enhancement challenging. Methods based on uniformly adjusting image brightness cannot solve the problems of local overexposure and signal noise. Methods based on Retinex theory cannot meet the requirements of general automation. Methods based on deep learning are difficult to generalize to images under various lighting conditions. Therefore, traditional low-light image enhancement methods are difficult to generalize to low-light images under various lighting conditions, and cannot meet the needs of practical applications.

Summary of the invention

In response to the above problems, the purpose of the present invention is to provide a low-light image enhancement method using long-exposure compensation. By introducing an easily obtained blurred long-exposure image, the brightness, color and other information of the blurred long-exposure image are used to add constraints to the low-light enhancement problem. These constraints reduce the uncertainty of the low-light image enhancement problem, make the optimization target clearer, and comprehensively improve low-light enhancement performance.

The technical solutions adopted by the present invention are as follows:

A low-light image enhancement method using long exposure compensation, the steps include:

1) Collect a low-light training data set, wherein each training sample in the low-light training data set includes a low-light image and a normal-illumination image of the same scene; from each training sample, generate a corresponding group consisting of a short-exposure image, a long-exposure image and a real-illumination image, to obtain a synthetic data set S;

2) Use the synthetic data set S to train a low-light enhancement model. The low-light enhancement model includes M-1 feature alignment modules and M-1 brightening modules. For the long-exposure image I_long and the short-exposure image I_short of the same group of photos in the synthetic data set S, the low-light enhancement model maps each image to the feature space to obtain the corresponding first-scale short-exposure feature and first-scale long-exposure feature, and inputs them into the first feature alignment module;

3) The i-th feature alignment module, where i = 1 to M-1, aligns the input i-th scale long-exposure feature with the i-th scale short-exposure feature. The module first applies convolution to the i-th scale short-exposure feature to obtain an attention map Aⁱ, and then uses Aⁱ to perform a soft-threshold filtering operation on the i-th scale long-exposure feature, giving the filtered long-exposure feature Aⁱ ⊙ (i-th scale long-exposure feature), where "⊙" denotes element-wise multiplication. The filtered long-exposure feature and the i-th scale short-exposure feature are then downsampled together and passed into a convolutional layer to predict and output the (i+1)-th scale long-exposure feature, and the i-th scale short-exposure feature is downsampled separately and passed into a convolutional layer to predict and output the (i+1)-th scale short-exposure feature. The M-th scale long-exposure feature and the M-th scale short-exposure feature output by the (M-1)-th feature alignment module are concatenated to serve as the (M+1)-th scale long-exposure feature and the (M+1)-th scale short-exposure feature;

4) The i-th brightening module concatenates the (M+i)-th scale long-exposure feature with the (M+i)-th scale short-exposure feature, upsamples the concatenated feature, concatenates the upsampled feature with the (M-i)-th scale short-exposure feature, and passes the result through a convolution layer to obtain the (M+i+1)-th scale short-exposure feature. It also upsamples the (M+i)-th scale long-exposure feature, concatenates the upsampled feature with the (M-i)-th scale long-exposure feature, and passes the result through a convolution layer to obtain the (M+i+1)-th scale long-exposure feature. The 2M-th scale short-exposure feature output by the (M-1)-th brightening module is taken as the optimization target I_normal, and the 2M-th scale long-exposure feature is taken as the auxiliary output I_assist, to optimize the low-light enhancement model; wherein the total loss function for training and optimizing the low-light enhancement model is L = L_rec + λ_SSIM L_SSIM + λ_LPIPS L_LPIPS + λ_a L_a; λ_SSIM, λ_LPIPS and λ_a are weight terms; L_rec is the mean absolute error loss function between the optimization target I_normal and the ground truth I_GT under normal lighting; L_SSIM is the structural similarity loss function between the optimization target I_normal and the ground truth I_GT under normal lighting; L_LPIPS is the learned perceptual image patch similarity loss function; L_a is the mean absolute error loss function between the auxiliary output I_assist and the ground truth I_GT under normal lighting;

5) Input the short exposure image to be brightened and the corresponding blurred long exposure image into the trained low-light enhancement model to obtain the corresponding low-light enhancement image.

Further, the low-light enhancement model also includes a detail removal module, which eliminates the detail features of the long-exposure image I_long before the image is mapped to the feature space to obtain the long-exposure feature.

Further, a real-shot data set R is obtained, in which each group of images comprises three images of the same scene, namely a short-exposure image, a long-exposure image and a real-illumination image; the real-shot data set R is used to evaluate the trained low-light enhancement model.

Further, the long-exposure images in the synthetic data set S are synthesized from the normal-illumination images in the training samples; the short-exposure images in the synthetic data set S are the low-light images in the training samples; and the real-illumination images in the synthetic data set S are the normal-illumination images of the training samples.

Further, the blur kernel space model is used to process the normal illumination images in the training samples to obtain the long exposure images in the synthetic data set S.

Further, L_rec = ‖I_normal - I_GT‖; L_a = ‖I_assist - I_GT‖.

Further, L_SSIM = 1 - SSIM(I_normal, I_GT), where SSIM(x, y) represents the structural similarity of two images x and y:

SSIM(x, y) = ((2 μ_x μ_y + c₁)(2 σ_xy + c₂)) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂))

Here μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, and c₁ and c₂ are constants used to maintain stability.

Further, in the expression for L_LPIPS, φ_l(I_normal) denotes the l-th layer feature of the image I_normal extracted by the VGG network, and H_l and W_l denote the height and width of φ_l(I_normal), respectively.

The present invention also provides a server, including a memory and a processor. The memory stores a computer program configured to be executed by the processor, and the computer program includes instructions for executing each step of the above method.

The present invention also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above method are implemented.

Compared with the existing technology, the positive effects of the present invention are:

Because the present invention uses long-exposure images as illumination compensation, the alignment module and the brightening module described in Step 2 of the detailed description below can effectively use the long-exposure features to achieve feature interaction between the long-exposure and short-exposure images, and thereby brighten low-light images toward a more specific brightness target. This design alleviates the problem of unclear brightening targets encountered by other low-light enhancement techniques, allowing the present invention to significantly improve performance.

This invention significantly improves the performance of low-light image enhancement. On the LEC-LOL-Real low-light enhancement benchmark data set, it can increase the peak signal-to-noise ratio (Peak Signal to Noise Ratio) of the general low-light enhancement model AGLLNet from 14.93 to 25.15.
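For reference, the peak signal-to-noise ratio quoted above is conventionally computed from the mean squared error between the enhanced image and the ground truth. The sketch below uses the standard definition on flattened pixel lists; the function name `psnr` and the default peak value of 1.0 are assumptions for illustration, since the text does not state the pixel range used in the benchmark.

```python
import math


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in decibels between two flattened images.

    `peak` is the maximum possible pixel value (assumed 1.0 here).
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the enhanced image is closer to the ground truth in the mean-squared-error sense.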

Description of drawings

Figure 1 shows the training framework diagram of the low-light image enhancement network using long exposure compensation.

Figure 2 is the framework diagram of the feature alignment sub-module.

Figure 3 is a frame diagram of the brightening sub-module.

Figure 4 is a comparison picture before and after enhancement according to the method of the present invention, in which (1) is a low-light image, (2) is a long-exposure blurred image, and (3) is an enhanced image.

Detailed description

In order to make the above features and advantages of the present invention more obvious and easy to understand, embodiments are given below and described in detail with reference to the accompanying drawings. It should be noted that the specific numbers of layers and modules, the functions, and the settings of certain layers given in the following embodiments are only preferred implementations and are not intended to be limiting. Those skilled in the art will understand that the numbers and the settings of certain layers can be chosen according to actual needs.

This embodiment discloses a low-light enhancement method using long-exposure compensation. The specific description is as follows:

Step 1: Collect the low-light training data set. For each low-light/normal-light image pair, use the blur kernel space model to synthesize multiple different long-exposure blurred images, and build a long-exposure-compensated low-light enhancement synthetic data set S composed of real-illumination/short-exposure/synthetic long-exposure image groups, which is used for training and testing the network. The short-exposure and real-illumination images are taken directly from the collected low-light/normal-light image pairs: the collected low-light image is the short-exposure image in the set S, and the collected normal-illumination image is the real-illumination image in the set S. In addition, capture short-exposure, long-exposure and normal-illumination images of the same scene, for scenes different from those in the training data set S, to form image groups, and build a long-exposure-compensated low-light enhancement real-shot data set R for testing the network. The synthetic data set S collected in this step can be used to train this method and other subsequent methods, and the real-shot data set R can be used to evaluate and compare various low-light enhancement methods.
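The construction of one training group in S can be sketched as follows. The text does not specify the blur kernel space model, so this example substitutes a fixed uniform horizontal motion kernel purely as a stand-in; the function names, the grayscale list-of-rows image format and the edge-replication border handling are all assumptions for illustration.

```python
def blur_image(image, kernel):
    """Correlate a 2-D grayscale image (list of rows) with a small kernel.

    Borders are handled by clamping coordinates (edge replication).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy = min(max(y + i - kh // 2, 0), h - 1)
                    xx = min(max(x + j - kw // 2, 0), w - 1)
                    acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def horizontal_motion_kernel(length):
    """Hypothetical stand-in kernel: a uniform horizontal streak summing to 1."""
    return [[1.0 / length] * length]


def make_training_group(low_light, normal_light, kernel):
    """Return (short exposure, synthetic long exposure, real illumination)."""
    return low_light, blur_image(normal_light, kernel), normal_light
```

Calling `make_training_group` several times with different kernels would give the multiple long-exposure variants per image pair that the step describes.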

Step 2: Build a low-light enhanced training framework.

The structure of the network is shown in Figure 1. It includes the feature alignment module S2L, the brightening module L2S, and the detail removal module DRP.

The following takes the long-exposure image I_long and the short-exposure image I_short of one group of photos in the data set S as an example to introduce the network framework. Each of the two images first passes through a convolution layer followed by a normalization layer and a rectified linear unit (ReLU), which maps the image to the feature space to obtain the initial short-exposure feature and the initial long-exposure feature.

In particular, because the long-exposure picture is meant to provide brightness and illumination information, its detail information should not interfere with the model. Before the long-exposure image is mapped to the feature space, a detail removal module DRP, consisting of 16x downsampling followed by 16x upsampling, is added to eliminate detail features. This operation enhances the robustness of the method and allows it to adapt to the various forms of blur found in long-exposure images.
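The effect of the DRP module can be illustrated with a minimal sketch. The text only states the 16x factors; average pooling for the downsampling step and nearest-neighbour interpolation for the upsampling step are assumptions made here, as are the function names.

```python
def downsample_mean(image, factor):
    """Average-pool a 2-D image by an integer factor (sizes must divide evenly)."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return [
        [image[y // factor][x // factor] for x in range(len(image[0]) * factor)]
        for y in range(len(image) * factor)
    ]


def remove_details(image, factor=16):
    """Sketch of the DRP idea: downsample, then upsample back to full size."""
    return upsample_nearest(downsample_mean(image, factor), factor)
```

On a 16 × 16 checkerboard of zeros and ones, `remove_details` returns a flat image of 0.5: the fine pattern disappears while the average brightness is preserved, which is the information the long-exposure branch is meant to carry.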

To align the brightness features of the long-exposure image at each layer with the detail features of the short-exposure image, and thereby achieve effective feature interaction, the feature alignment module S2L is added to the model. The structure of the feature alignment module is shown in Figure 2. The short-exposure feature first passes through a convolution layer to obtain an attention map Aⁱ. This attention map is then used to perform a soft-threshold filtering operation on the long-exposure feature, giving Aⁱ ⊙ (long-exposure feature), where "⊙" denotes element-wise multiplication. This operation selectively uses the features extracted from the long-exposure image in the spatial dimension, which mitigates the effect of interfering information in the long-exposure image on this method. Afterwards, the filtered long-exposure feature and the short-exposure feature are downsampled together and passed into a convolutional layer to predict the next long-exposure feature, while the short-exposure feature is downsampled separately and passed into a convolutional layer to predict the next short-exposure feature. Each next feature is half the size of the previous feature in the spatial dimension, but twice the size of the previous feature in the channel dimension.
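The gating step of the feature alignment module can be sketched on single-channel feature maps. The text only says that a convolution produces the attention map; the 1×1 convolution (a scalar weight and bias) and the sigmoid used below to keep attention values between 0 and 1 are assumptions, and the function names are hypothetical.

```python
import math


def attention_map(short_feat, weight=1.0, bias=0.0):
    """Hypothetical 1x1 convolution followed by a sigmoid, giving values in (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-(weight * v + bias))) for v in row]
            for row in short_feat]


def soft_filter(long_feat, attn):
    """Element-wise product A ⊙ F_long: attenuate long-exposure responses."""
    return [[a * f for a, f in zip(a_row, f_row)]
            for a_row, f_row in zip(attn, long_feat)]
```

Where the short-exposure response is strongly negative the attention value approaches 0 and the long-exposure feature is suppressed; where it is strongly positive the long-exposure feature passes through almost unchanged.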

After the (M-1) feature alignment modules, multi-scale short-exposure features and long-exposure features are obtained. The model then decodes them from feature space to image space in a feature decoding stage. During the feature decoding stage, guidance proceeds in the opposite direction: the long-exposure features guide the decoding of the short-exposure features, because brightness features are needed to guide the enhancement of the short-exposure image. M is an integer greater than 2.

Similar to the feature alignment module S2L, a brightening module L2S is also added to the model, and its structure is shown in Figure 3. The brightening module takes as input the long-exposure feature and the short-exposure feature of the previous scale, together with the skip-connected long-exposure and short-exposure features of the corresponding size from the encoding stage. More specifically, the long-exposure feature is first concatenated with the short-exposure feature. The concatenated feature is then upsampled by the upsampling module, concatenated with the skip-connected short-exposure feature, and passed through a convolutional layer to obtain the short-exposure feature of the next scale. The long-exposure feature is upsampled separately, concatenated with the skip-connected long-exposure feature, and passed through a convolutional layer to obtain the long-exposure feature of the next scale. Compared with the previous scale, the features of the next scale are twice the original size in the spatial dimension, but half the original size in the channel dimension.
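The scaling rules of the encoding and decoding stages can be checked with a small shape calculation. This is a sketch under stated assumptions: "half the size in the spatial dimension" is read as halving both height and width, scale M+1 is assumed to keep the shape of scale M, and the base channel count of 32 in the usage line is hypothetical.

```python
def feature_shapes(m, channels, height, width):
    """Return {scale: (channels, height, width)} for scales 1..2M."""
    if m <= 2:
        raise ValueError("M must be an integer greater than 2")
    shapes = {1: (channels, height, width)}
    # Encoding: each of the M-1 alignment modules halves H and W, doubles C.
    for scale in range(2, m + 1):
        c, h, w = shapes[scale - 1]
        shapes[scale] = (c * 2, h // 2, w // 2)
    # Assumption: scale M+1 keeps the shape of scale M.
    shapes[m + 1] = shapes[m]
    # Decoding: each of the M-1 brightening modules doubles H and W, halves C.
    for scale in range(m + 2, 2 * m + 1):
        c, h, w = shapes[scale - 1]
        shapes[scale] = (c // 2, h * 2, w * 2)
    return shapes


shapes = feature_shapes(3, 32, 256, 256)  # hypothetical: M = 3, 32 base channels
```

Under these assumptions the output of the i-th brightening module (scale M+i+1) has the same shape as the skip-connected encoder feature at scale M-i, and the final scale 2M returns to the resolution of scale 1.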

Step 3: Train the model on the constructed data set S, using the output of the short-exposure decoding module as the optimization target I_normal and the output of the long-exposure decoding module as the auxiliary output I_assist. The total loss function of the low-light image enhancement model using long-exposure compensation is:

L = L_rec + λ_SSIM L_SSIM + λ_LPIPS L_LPIPS + λ_a L_a

Here λ_SSIM, λ_LPIPS and λ_a are weight terms, usually set to λ_SSIM = 0.4, λ_LPIPS = 1 and λ_a = 1. The model is trained with a batch size of 16 using the Adam optimizer, with an initial learning rate of 1×10⁻⁴, optimizer hyperparameters β₁ = 0.9 and β₂ = 0.999, and a weight decay of 1×10⁻⁴. In addition, to avoid gradient explosion, gradient values during backpropagation are clipped to the interval [-0.1, 0.1]. During training, 256 × 256 pixel patches are randomly cropped and a two-stage training strategy is used. In the first stage, the model is trained for 1.5×10⁵ iterations without the attention mechanism of the feature alignment module S2L. The attention mechanism is then added and the model is trained for a further 3×10⁴ iterations with an initial learning rate of 1×10⁻⁵.
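The weighted sum and the gradient clipping described above can be written out as a minimal sketch. The default weights and the clipping limit are the values stated in the text; the function names and the flat list of gradient values are assumptions for illustration.

```python
def total_loss(l_rec, l_ssim, l_lpips, l_a,
               lambda_ssim=0.4, lambda_lpips=1.0, lambda_a=1.0):
    """L = L_rec + λ_SSIM·L_SSIM + λ_LPIPS·L_LPIPS + λ_a·L_a."""
    return l_rec + lambda_ssim * l_ssim + lambda_lpips * l_lpips + lambda_a * l_a


def clip_gradients(grads, limit=0.1):
    """Clip each gradient value to the interval [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]
```

For example, with individual loss values 0.1, 0.5, 0.2 and 0.3 the total is 0.1 + 0.4·0.5 + 0.2 + 0.3 = 0.8, and gradient values of 0.5 and -0.3 are clipped to 0.1 and -0.1.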

1) L_rec is the mean absolute error loss function between the optimization target I_normal and the ground truth I_GT under normal lighting:

L_rec = ‖I_normal - I_GT‖,

2) L_SSIM is the structural similarity loss function between the optimization target I_normal and the ground truth I_GT under normal lighting:

L_SSIM = 1 - SSIM(I_normal, I_GT),

where SSIM(x, y) represents the structural similarity of two images x and y, which can be calculated as follows:

SSIM(x, y) = ((2 μ_x μ_y + c₁)(2 σ_xy + c₂)) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂))

where μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, and σ_xy is the covariance of x and y. c₁ = (k₁L)² and c₂ = (k₂L)² are constants used to maintain stability, L is the dynamic range of pixel values, k₁ = 0.01 and k₂ = 0.03.

3) L_LPIPS is the learned perceptual image patch similarity (LPIPS) loss function, computed on deep features of the two images.

The image features used to calculate L_LPIPS are extracted with a VGG network pretrained on ImageNet: the network output I_normal and the normal-illumination picture I_GT of the image pair are each input into the pretrained VGG model. Here φ_l(I) denotes the l-th layer feature of image I extracted by the VGG network, and H_l and W_l denote the height and width of φ_l(I), respectively. Together, the above three loss functions constrain the brightening result in three respects, namely absolute error, image structure and perceptual similarity, so that the brightening result is close to the user's expectations.

4) L_a is the mean absolute error loss function between the auxiliary output I_assist and the ground truth I_GT under normal lighting:

L_a = ‖I_assist - I_GT‖.

This loss function provides a shortcut for parameter optimization in the network, making parameter optimization more efficient during network training.
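The four loss terms above can be sketched on small inputs as follows. Several simplifications are assumptions rather than details from the text: images are flattened lists of pixel values, SSIM is computed over a single global window (practical implementations usually average over local windows), and the perceptual term takes precomputed feature maps indexed as `feats[layer][row][column][channel]` and sums squared feature differences normalised by H_l·W_l, without the pretrained VGG network or any learned per-channel weights.

```python
def mean_absolute_error(pred, target):
    """Mean absolute error, as used for L_rec and L_a."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def ssim_global(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """SSIM over one global window, with c1 = (k1·L)^2 and c2 = (k2·L)^2."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def ssim_loss(pred, target):
    """L_SSIM = 1 - SSIM(I_normal, I_GT)."""
    return 1.0 - ssim_global(pred, target)


def perceptual_distance(feats_a, feats_b):
    """Sum over layers of squared feature differences divided by H_l·W_l."""
    total = 0.0
    for layer_a, layer_b in zip(feats_a, feats_b):
        h, w = len(layer_a), len(layer_a[0])
        sq = sum((ca - cb) ** 2
                 for row_a, row_b in zip(layer_a, layer_b)
                 for pos_a, pos_b in zip(row_a, row_b)
                 for ca, cb in zip(pos_a, pos_b))
        total += sq / (h * w)
    return total
```

An image compared with itself gives an SSIM of 1 and therefore an SSIM loss of 0, while a zero perceptual distance requires identical features at every layer.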

Step 4: Use the real-shot data set R to evaluate the trained low-light enhancement model.

Step 5: In the inference stage, input the short exposure image to be brightened and the corresponding blurred long exposure image, and finally output the desired low-light enhancement result.

In practical applications, if images need to be captured indoors with insufficient lighting or outdoors at night, the camera's short-exposure mode can first be used to capture a short-exposure image, shown as the low-light image in Figure 4(1); the camera's long-exposure mode is then used to capture a long-exposure image. The short-exposure image suffers from insufficient brightness, and the long-exposure image suffers from blur, as shown by the long-exposure blurred image in Figure 4(2). By applying the technical solution proposed by the present invention and inputting the short-exposure image to be brightened together with the corresponding blurred long-exposure image, a clear low-light enhancement result can be obtained, shown as the enhanced image in Figure 4(3).

The above embodiments are only used to illustrate the technical solutions of the present invention but not to limit them. Those of ordinary skill in the art can modify or equivalently replace the technical solutions of the present invention without departing from the spirit and scope of the present invention. The scope of protection shall be determined by the claims.

Claims

A low-light image enhancement method using long exposure compensation, the steps include:

1) Collect a low-light training data set, wherein each training sample in the low-light training data set includes a low-light image and a normal-illumination image of the same scene; from each training sample, generate a corresponding group consisting of a short-exposure image, a long-exposure image and a real-illumination image, to obtain a synthetic data set S;

2) Use the synthetic data set S to train a low-light enhancement model. The low-light enhancement model includes M-1 feature alignment modules and M-1 brightening modules. For the long-exposure image I_long and the short-exposure image I_short of the same group of photos in the synthetic data set S, the low-light enhancement model maps each image to the feature space to obtain the corresponding first-scale short-exposure feature and first-scale long-exposure feature, and inputs them into the first feature alignment module;

3) The i-th feature alignment module, where i = 1 to M-1, aligns the input i-th scale long-exposure feature with the i-th scale short-exposure feature. The module first applies convolution to the i-th scale short-exposure feature to obtain an attention map Aⁱ, and then uses Aⁱ to perform a soft-threshold filtering operation on the i-th scale long-exposure feature, giving the filtered long-exposure feature Aⁱ ⊙ (i-th scale long-exposure feature), where "⊙" denotes element-wise multiplication. The filtered long-exposure feature and the i-th scale short-exposure feature are then downsampled together and passed into a convolutional layer to predict and output the (i+1)-th scale long-exposure feature, and the i-th scale short-exposure feature is downsampled separately and passed into a convolutional layer to predict and output the (i+1)-th scale short-exposure feature. The M-th scale long-exposure feature and the M-th scale short-exposure feature output by the (M-1)-th feature alignment module are concatenated to serve as the (M+1)-th scale long-exposure feature and the (M+1)-th scale short-exposure feature;

4) The i-th brightening module concatenates the (M+i)-th scale long-exposure feature with the (M+i)-th scale short-exposure feature, upsamples the concatenated feature, concatenates the upsampled feature with the (M-i)-th scale short-exposure feature, and passes the result through a convolution layer to obtain the (M+i+1)-th scale short-exposure feature. It also upsamples the (M+i)-th scale long-exposure feature, concatenates the upsampled feature with the (M-i)-th scale long-exposure feature, and passes the result through a convolution layer to obtain the (M+i+1)-th scale long-exposure feature. The 2M-th scale short-exposure feature output by the (M-1)-th brightening module is taken as the optimization target I_normal, and the 2M-th scale long-exposure feature is taken as the auxiliary output I_assist, to optimize the low-light enhancement model; wherein the total loss function for training and optimizing the low-light enhancement model is L = L_rec + λ_SSIM L_SSIM + λ_LPIPS L_LPIPS + λ_a L_a; λ_SSIM, λ_LPIPS and λ_a are weight terms; L_rec is the mean absolute error loss function between the optimization target I_normal and the ground truth I_GT under normal lighting; L_SSIM is the structural similarity loss function between the optimization target I_normal and the ground truth I_GT under normal lighting; L_LPIPS is the learned perceptual image patch similarity loss function; L_a is the mean absolute error loss function between the auxiliary output I_assist and the ground truth I_GT under normal lighting;

5) Input the short exposure image to be brightened and the corresponding blurred long exposure image into the trained low-light enhancement model to obtain the corresponding low-light enhancement image.
The method according to claim 1, characterized in that the low-light enhancement model further includes a detail removal module for eliminating the detail features of the long-exposure image I_long before the image is mapped to the feature space to obtain the long-exposure feature.
The method according to claim 1, characterized in that a real-shot data set R is obtained, each group of images in the real-shot data set R comprising three images of the same scene, namely a short-exposure image, a long-exposure image and a real-illumination image; and the real-shot data set R is used to evaluate the trained low-light enhancement model.
The method according to claim 1 or 2 or 3, characterized in that the long-exposure image in the synthetic data set S is synthesized from the normal-illumination image in the training sample; the short-exposure image in the synthetic data set S is the low-light image in the training sample; and the real-illumination image in the synthetic data set S is the normal-illumination image of the training sample.
The method according to claim 1 or 2 or 3, characterized in that the long-exposure image in the synthetic data set S is obtained by using a blur kernel space model to process the normal-illumination images in the training samples.
The method according to claim 1 or 2 or 3, characterized in that L_rec = ‖I_normal - I_GT‖; L_a = ‖I_assist - I_GT‖.
The method according to claim 1, characterized in that L_SSIM = 1 - SSIM(I_normal, I_GT), wherein SSIM(x, y) represents the structural similarity of two images x and y: SSIM(x, y) = ((2 μ_x μ_y + c₁)(2 σ_xy + c₂)) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂)); μ_x is the mean of x, μ_y is the mean of y, σ_x² is the variance of x, σ_y² is the variance of y, σ_xy is the covariance of x and y, and c₁ and c₂ are constants used to maintain stability.
The method according to claim 1, characterized in that, in the expression for L_LPIPS, φ_l(I_normal) denotes the l-th layer feature of the image I_normal extracted by the VGG network, and H_l and W_l denote the height and width of φ_l(I_normal), respectively.
A server, characterized in that it includes a memory and a processor, the memory stores a computer program, the computer program is configured to be executed by the processor, and the computer program includes instructions for executing each step of the method of any one of claims 1 to 8.
A computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, the steps of the method of any one of claims 1 to 8 are implemented.