CN116137023B

CN116137023B - Low-illumination image enhancement method based on background modeling and detail enhancement

Info

Publication number: CN116137023B
Application number: CN202310425949.5A
Authority: CN
Inventors: 潘磊; 田俊; 栾五洋; 郑远; 傅强; 张永; 王艾; 赵枳晴; 李俊辉; 王梦琪
Original assignee: Civil Aviation Flight University of China
Current assignee: Civil Aviation Flight University of China
Priority date: 2023-04-20
Filing date: 2023-04-20
Publication date: 2023-06-20
Anticipated expiration: 2043-04-20
Also published as: CN116137023A

Abstract

The invention relates to a low-illumination image enhancement method based on background modeling and detail enhancement, which comprises the following steps: acquiring a plurality of low-illumination images and paired normal illumination images, and preprocessing to obtain a training data set; constructing an image enhancement network model, and constraining the image enhancement network model by using a target loss function; and training the image enhancement network model by using the training data set until the error converges to a preset value, so as to obtain a trained image enhancement network model. The image enhancement network model can carry out background modeling and detail recovery on the low-illumination image, and a global feature fusion enhancement module is designed to realize global information fusion and self-adaptive enhancement. The method can enhance the captured image under the low illumination condition and solve the problems of low contrast, color distortion, noise amplification, detail loss and the like of the low illumination image.

Description

Low-illumination image enhancement method based on background modeling and detail enhancement

Technical Field

The invention relates to the technical field of artificial intelligence and computer vision, in particular to a low-illumination image enhancement method based on background modeling and detail enhancement.

Background

Images are one of the important forms of information that people express and communicate, and with the development of imaging devices and machine vision technology, people have been able to capture very high quality images or videos. However, insufficient illumination can cause problems of low contrast, color distortion, noise amplification, detail loss and the like of a captured image, reduce image quality, and severely restrict performance of downstream tasks, so that low-illumination image enhancement becomes a big research hotspot in the field of computer vision.

The low-illumination image enhancement technology has wide application scenes, and can improve the image quality shot at night, so that night scene photos become clearer and brighter. In many occasions, people often need to take pictures at night, such as night scenes, figures, animals and the like, but the pictures are often dull and fuzzy due to insufficient light. The low-illumination image enhancement technology can enhance the brightness, contrast and definition of the image by processing the image, so that the night scene photo becomes more vivid and real.

The early-starting method is mainly based on histogram equalization and Retinex theory, wherein the histogram equalization is a classical image enhancement technology, and the contrast map of the low-illumination image is improved by transforming the pixel distribution histogram of the low-illumination image into a uniform distribution. However, the main disadvantage of the method is that the real illumination factor is rarely considered, the enhanced result is inconsistent with the real scene in subjective vision, and the phenomena of color distortion and local overexposure exist; the theoretical basis of the Retinex model is trichromatic theory and color constancy, and the image enhancement method based on the Retinex theory is essentially an image decomposition and illumination estimation problem because the Retinex model can decompose the image S perceived by the human eye into a reflection component R and an illumination component L, i.e., s=r×l. Although the traditional low-illumination image enhancement method based on Retinex theory can improve the overall brightness of the image to a certain extent, the enhanced low-illumination image is often accompanied with problems of noise amplification, unbalanced brightness and the like, and the enhancement effect is not ideal.

With the rapid development of the deep learning algorithm, many scholars try to solve the problem of low-illumination image enhancement by using a neural network, and the low-illumination image enhancement algorithm based on the deep learning has become the mainstream technology of the current low-illumination image enhancement. However, the existing image enhancement method based on deep learning does not consider the correlation between background modeling and detail restoration, so that the enhanced image has the problems of color distortion or incomplete detail restoration and the like.

Disclosure of Invention

The invention aims to aggregate background modeling and detail enhancement, and provides a low-illumination image enhancement method based on background modeling and detail enhancement, which is beneficial to remarkably improving the performance of low-illumination image enhancement.

In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:

the low-illumination image enhancement method based on background modeling and detail enhancement comprises the following steps:

step 1, acquiring a plurality of low-illumination images and paired normal illumination images, and preprocessing to obtain a training data set;

step 2, constructing an image enhancement network model, and restraining the image enhancement network model by using a target loss function;

step 3, training the image enhancement network model by using a training data set until the error converges to a preset value to obtain a trained image enhancement network model;

and 4, inputting the captured image under the low-illumination condition into a trained image enhancement network model to obtain the image with normal illumination.

The step 1 specifically comprises the following steps:

acquiring a plurality of low-illumination images and paired normal illumination images, and adding labels to each low-illumination image;

randomly cutting each low-illumination image with the size of H multiplied by W multiplied by 3 into images with the size of M multiplied by 3, and adopting the same random cutting mode for the corresponding normal illumination image, wherein H, W is the height and the width of the low-illumination image and the normal illumination image, and M is the height and the width of the cut image;

and carrying out data enhancement on the cut image by adopting a random inversion, random rotation and random cutting mode to obtain a training data set.

In the step 2, the constructed image enhancement network model comprises a plurality of background modeling and detail recovering modules with different scales and a global feature fusion enhancement module;

inputting a low-illumination image I to a first background modeling and detail recovering module, and obtaining a feature map after downsampling, upsampling and channel fusion operations of other background modeling and detail recovering modules with different scales

，i=1,2,3,4；

Inputting feature graphs into global feature fusion enhancement module

Obtaining a characteristic diagram F _out ；

Map F of the characteristics _out Added with the low-illumination image I to obtain an enhanced image I _out 。

Each background modeling and detail recovery module has the same structure and comprises a position coding sub-module, a convolution sub-module, a self-attention sub-module based on self-attention and a feedforward neural network sub-module; the output end of the position coding sub-module is respectively connected with the convolution sub-module and the self-attention sub-module based on self-attention, and the output end of the convolution sub-module and the output end of the self-attention sub-module based on self-attention are respectively connected with the feedforward neural network sub-module.

Inputting a low-illumination image I into a position coding submodule, and increasing the dimension of the low-illumination image I from H multiplied by W multiplied by 3 to H multiplied by W multiplied by C through 1 multiplied by 1 convolution to obtain a characteristic diagram F _in Then, a 3 x 3 depth separable convolution is performed to perform position coding to output a characteristic diagram F _emb ：

F _in =Conv1(I)

F _emb =DWConv3(F _in )

Wherein Conv1 is a 1×1 convolution and DWConv3 is a 3×3 depth separable convolution;

inputting feature map F to self-attention sub-module based on self-attention _emb Output of characteristic diagram F _attention ：

Q,K,V=Chunk(F _emb )

Attention(Q,K,V)=V·Softmax(K·Q/α)

F _attention =Attention(Q,K,V)

Wherein Attention is a self-Attention operation; chunk is an average division operation by channel dimension, softmax is a Softmax function; q, K, V is obtained by input coding, Q is information to be queried, K is a vector to be queried, and V is a value obtained by query; alpha is the sampling factor;

inputting feature diagram F to convolution submodule _emb Restoring feature map F using a 3×3 convolution _emb Is output through GELU activation function _detail ：

F _detail =GELU(Conv3(F _emb ))

Wherein Conv3 is a 3×3 convolution and GELU is a GELU activation function;

the feedforward neural network sub-module will be based on the self-attention sub-module's output feature map F _attention And convolution submodule outputs a characteristic diagram F _detail Adding, normalizing by using layers, lifting the dimension of the feature map by using 1×1 convolution, respectively entering three branches, wherein the first branch is a 3×3 convolution extraction feature, the second branch is a 3×3 convolution extraction feature, then passing through a GELU activation function, the third branch is a 3×3 convolution extraction feature, finally performing element multiplication on the three branches, and recovering to the original dimension by using 1×1 convolution, thereby obtaining a feature map Z output by a feedforward neural network submodule:

wherein Concat is a splicing operation along the channel dimension; LN () is a layer normalization operation;

is an element multiplication;

is a 1 x 1 convolution of the first branch,/->

3 x 3 convolutions of the first branch; />

1X 1 convolution of the second branch, < >>

A 3 x 3 convolution of the second branch; />

The third branch's 1 x 1 convolution, < >>

3 x 3 convolution of the third branch;

the background modeling and detail recovery module outputs the feature map to the global feature fusion enhancement module

、/>

、/>

、/>

First of all, the feature maps of the different layers are +.>

(i=1, 2,3, 4) by upsampling the uniform resolution, then by 1×1 convolving the uniform channel dimension, by 3×3 convolving the position coding, then by self-adapting the enhanced global feature, and finally by 1×1 convolving the output feature map F _out ：

Where PS is an upsampling operation.

In the step 2, the step of constraining the image enhancement network model by using the target loss function includes:

the target loss function L1 of the image enhancement network model is:

where GT represents a normal illumination image corresponding to the low illumination image I.

Also comprises a regression Loss function Loss _rec The regression Loss function Loss _rec The method comprises the following steps:

wherein Crec is the weight of the first branch, urec is the weight of the second branch, and Grec is the weight of the third branch in the feedforward neural network sub-module;

representing a binary norm; z is a feature diagram output by a background modeling and detail recovering module, F _attention F for self-attention module output feature map based on self-attention _detail And outputting a characteristic diagram for the convolution submodule.

And also comprises a Loss recovery function Loss _res The recovery Loss function Loss _res The method comprises the following steps:

wherein m represents the number of layers of the background modeling and detail recovering module, and m= 7,j represents the j background modeling and detail recovering module; y is _GT Representing a normal illumination image corresponding to the low illumination image I,

and the feature diagram output by the j background modeling and detail recovering module is represented.

The total Loss function Loss is:

wherein, the liquid crystal display device comprises a liquid crystal display device,

for regression Loss function Loss _rec Weight of->

To recover Loss function Loss _res Is a weight of (2).

Compared with the prior art, the invention has the beneficial effects that:

the image enhancement network model can carry out background modeling and detail recovery on the low-illumination image, and a global feature fusion enhancement module is designed to realize global information fusion and self-adaptive enhancement. The method can enhance the captured image under the low illumination condition and solve the problems of low contrast, color distortion, noise amplification, detail loss and the like of the low illumination image.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an exemplary embodiment of an image enhancement network model;

FIG. 2 is a schematic diagram of a background modeling and detail restoration module according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.

Examples:

the invention is realized by the following technical scheme, as shown in figure 1, a low-illumination image enhancement method based on background modeling and detail enhancement comprises the following steps:

step 1, a plurality of low-illumination images and paired normal illumination images are obtained, and preprocessing is carried out to obtain a training data set.

A plurality of low-illuminance images and paired normal-illuminance images (group trunk images) are acquired, and a label is added to each low-illuminance image. Each low-illumination image with the size of H multiplied by W multiplied by 3 is randomly cut into images with the size of M multiplied by 3, and the same random cutting mode is adopted for the corresponding normal illumination images, wherein H, W is the height and the width of the low-illumination image and the normal illumination image, and M is the height and the width of the cut images. And carrying out data enhancement on the cut image by adopting a random inversion, random rotation and random cutting mode to obtain a training data set.

And 2, constructing an image enhancement network model, and using a target loss function to constrain the image enhancement network model.

The image enhancement network model comprises a plurality of background modeling and detail recovery modules with different scales and a global feature fusion enhancement module; each background modeling and detail recovering module has the same structure and comprises a position coding sub-module, a convolution sub-module, a self-attention sub-module based on self-attention and a feedforward neural network sub-module, and referring to fig. 2, the output end of the position coding sub-module is respectively connected with the convolution sub-module and the self-attention sub-module based on self-attention, and the output end of the convolution sub-module and the output end of the self-attention sub-module based on self-attention are respectively connected with the feedforward neural network sub-module.

In one embodiment, referring to fig. 1, the plurality of background modeling and detail restoration modules with different scales include a first background modeling and detail restoration module, a second background modeling and detail restoration module, a third background modeling and detail restoration module, a fourth background modeling and detail restoration module, a fifth background modeling and detail restoration module, a sixth background modeling and detail restoration module, and a seventh background modeling and detail restoration module, where the first background modeling and detail restoration module, the second background modeling and detail restoration module, the third background modeling and detail restoration module are downsampling mechanisms, the fourth background modeling and detail restoration module, the fifth background modeling and detail restoration module, the sixth background modeling and detail restoration module, and the seventh background modeling and detail restoration module are upsampling mechanisms, the first background modeling and detail restoration module with the same scales, the seventh background modeling and detail restoration module are fused, the second background modeling and detail restoration module with the same scales, the sixth background modeling and detail restoration module are fused, and the fifth background modeling and detail restoration module are fused.

Inputting a low-illumination image I to a position coding submodule in a first background modeling and detail restoring module, and lifting the dimension of the low-illumination image I from H multiplied by W multiplied by 3 to H multiplied by W multiplied by C through 1 multiplied by 1 convolution to obtain a feature map F _in Then, a 3 x 3 depth separable convolution is performed to perform position coding to output a characteristic diagram F _emb The specific formula is as follows:

F _in =Conv1(I)

F _emb =DWConv3(F _in )

where Conv1 is a 1×1 convolution and DWConv3 is a 3×3 depth separable convolution.

The self-attention sub-module based on self-attention consists of self-attention and inputs a feature diagram F _emb Output of characteristic diagram F _attention The specific formula is as follows:

Q,K,V=Chunk(F _emb )

Attention(Q,K,V)=V·Softmax(K·Q/α)

F _attention =Attention(Q,K,V)

wherein Attention is a self-Attention operation; chunk is an average division operation by channel dimension, softmax is a Softmax function; q, K, V is obtained by input coding, Q is information to be queried, K is a vector to be queried, and V is a value obtained by query; alpha is the sampling factor.

The convolution sub-module recovers feature map F using a 3×3 convolution _emb Is then passed through GELU activation function to output feature map F _detail The specific formula is as follows:

F _detail =GELU(Conv3(F _emb ))

where Conv3 is a 3×3 convolution and GELU is a GELU activation function.

The feedforward neural network sub-module will be based on the self-attention sub-module's output feature map F _attention And convolution submodule outputs a characteristic diagram F _detail Adding, normalizing by using layers, lifting the dimension of the feature map by using 1×1 convolution, respectively entering three branches, wherein the first branch is a 3×3 convolution extraction feature, the second branch is a 3×3 convolution extraction feature, then passing through a GELU activation function, the third branch is a 3×3 convolution extraction feature, finally carrying out element multiplication on the three branches, and recovering to the original dimension by 1×1 convolution, thereby obtaining a feature map Z output by a feedforward neural network submodule, and the specific formula is as follows:

is an element multiplication;

is a 1 x 1 convolution of the first branch,/->

3 x 3 convolutions of the first branch; />

1X 1 convolution of the second branch, < >>

A 3 x 3 convolution of the second branch; />

The third branch's 1 x 1 convolution, < >>

The 3 x 3 convolution of the third branch.

The first background modeling and detail restoring module outputs a feature map Z finally, and after downsampling, upsampling and channel fusion operations of other background modeling and detail restoring modules, the fourth background modeling and detail restoring module, the fifth background modeling and detail restoring module, the sixth background modeling and detail restoring module and the seventh background modeling and detail restoring module output the feature map to the global feature fusion enhancement module respectively

、/>

、/>

、/>

First, feature maps of different layers are obtained

(i=1, 2,3, 4) by upsampling the uniform resolution, then by 1×1 convolving the uniform channel dimension, by 3×3 convolving the position coding, then by self-adapting the enhanced global feature, and finally by 1×1 convolving the output feature map F _out The specific formula is as follows:

where PS is an upsampling operation.

In summary, the image enhancement network model is a 4-layer encoder-decoder structure, a low-illumination image I is input, a feature map Z is obtained through a background modeling and detail recovery module, and then a feature map F is obtained through a global feature fusion enhancement module _out Then the final enhanced image I is obtained by adding the low-illumination image I which is input originally after the convolution of 3 multiplied by 3 _out The specific formula is as follows:

I _out =Conv3(F _out )+I

target Loss function Loss for designing image enhancement network model _target ：

Wherein n is the number of feature graphs input into the global feature fusion enhancement module, n=4, i is the i Zhang Tezheng th graph input into the global feature fusion enhancement module; GT represents a normal illumination image corresponding to the low illumination image I.

Further, to improve the overall Loss function of the image enhancement network model, a regression Loss function and a recovery Loss function are added, the regression Loss function Loss _rec The method comprises the following steps:

The recovery Loss function Loss _res The method comprises the following steps:

The total Loss function Loss is:

for regression Loss function Loss _rec Weight of->

To recover Loss function Loss _res Is a weight of (2).

And step 3, training the image enhancement network model by using a training data set until the error converges to a preset value, and obtaining a trained image enhancement network model.

Inputting the training data set obtained in the step 1 into the image enhancement network model obtained in the step 2, obtaining an enhanced image Iout, and calculating loss by using a target loss function L1. And calculating the gradient of the parameters in the image enhancement network model by using a back propagation method according to the loss, and updating the parameters by using an Adam optimization method. And repeatedly executing the training process by taking the batch as a unit until the target loss function value of the image enhancement network model converges to a preset value, and storing parameters to obtain the trained image enhancement network model.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. The low-illumination image enhancement method based on background modeling and detail enhancement is characterized by comprising the following steps of: the method comprises the following steps:

，i=1,2,3,4；

Inputting feature graphs into global feature fusion enhancement module

Obtaining a characteristic diagram F _out ；

Map F of the characteristics _out Added with the low-illumination image I to obtain an enhanced image I _out ；

Each background modeling and detail recovery module has the same structure and comprises a position coding sub-module, a convolution sub-module, a self-attention sub-module based on self-attention and a feedforward neural network sub-module; the output end of the position coding sub-module is respectively connected with the convolution sub-module and the self-attention sub-module based on self-attention, and the output end of the convolution sub-module and the output end of the self-attention sub-module based on self-attention are respectively connected with the feedforward neural network sub-module;

inputting a low-illumination image I into a position coding submodule, and increasing the dimension of the low-illumination image I from H multiplied by W multiplied by 3 to H multiplied by W multiplied by C through 1 multiplied by 1 convolution to obtain a characteristic diagram F _in H, W, C the height, width and dimension of the low-illumination image are respectively represented, and then a 3×3 depth separable convolution is performed to perform position coding to output a feature map F _emb ：

F _in =Conv1(I)

F _emb =DWConv3(F _in )

Q,K,V=Chunk(F _emb )

Attention(Q,K,V)=V·Softmax(K·Q/α)

F _attention =Attention(Q,K,V)

F _detail =GELU(Conv3(F _emb ))

Wherein Conv3 is a 3×3 convolution and GELU is a GELU activation function;

is an element multiplication; />

Is a 1 x 1 convolution of the first branch,/->

3 x 3 convolutions of the first branch; />

A 1 x 1 convolution of the second branch,

a 3 x 3 convolution of the second branch; />

The third branch's 1 x 1 convolution, < >>

3 x 3 convolution of the third branch;

、/>

、/>

、/>

First of all, the feature maps of the different layers are +.>

Through up-sampling uniform resolution, i=1, 2,3,4, then through 1×1 convolution uniform channel dimension, through 3×3 convolution to perform position coding, then through self-adaptive enhancement global feature of self-adaptation, finally through 1×1 convolution to output feature map F _out ：

Wherein W is _3×3 Representing a 3 x 3 convolution, W _1×1 Representing a 1 x 1 convolution, PS is an upsampling operation;

2. The background modeling and detail enhancement based low-light image enhancement method according to claim 1, wherein: the step 1 specifically comprises the following steps:

3. The background modeling and detail enhancement based low-light image enhancement method according to claim 1, wherein: in the step 2, the step of constraining the image enhancement network model by using the target loss function includes:

image enhancement networkTarget Loss function Loss of model _target The method comprises the following steps: