CN116091357A - Low-light image enhancement method for fusion of depth convolution attention and multi-scale features - Google Patents
Low-light image enhancement method for fusion of depth convolution attention and multi-scale features Download PDFInfo
- Publication number
- CN116091357A CN116091357A CN202310139997.8A CN202310139997A CN116091357A CN 116091357 A CN116091357 A CN 116091357A CN 202310139997 A CN202310139997 A CN 202310139997A CN 116091357 A CN116091357 A CN 116091357A
- Authority
- CN
- China
- Prior art keywords
- image
- low
- module
- light image
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 50
- 230000004927 fusion Effects 0.000 title claims abstract description 17
- 239000011159 matrix material Substances 0.000 claims description 23
- 238000005286 illumination Methods 0.000 claims description 16
- 230000004913 activation Effects 0.000 claims description 10
- 230000002708 enhancing effect Effects 0.000 claims description 4
- 238000007634 remodeling Methods 0.000 claims description 4
- 238000013461 design Methods 0.000 abstract description 9
- 238000012545 processing Methods 0.000 abstract description 6
- 238000010586 diagram Methods 0.000 abstract description 5
- 238000002474 experimental method Methods 0.000 abstract description 5
- 230000000007 visual effect Effects 0.000 abstract description 3
- 238000010606 normalization Methods 0.000 abstract description 2
- 230000006870 function Effects 0.000 description 23
- 230000001965 increasing effect Effects 0.000 description 9
- 238000012360 testing method Methods 0.000 description 5
- 238000012549 training Methods 0.000 description 5
- 230000000694 effects Effects 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 230000008439 repair process Effects 0.000 description 4
- 239000000284 extract Substances 0.000 description 3
- 238000002679 ablation Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000011084 recovery Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000005315 distribution function Methods 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000007620 mathematical function Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000002035 prolonged effect Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Processing (AREA)
Abstract
The invention relates to the technical field of image processing, in particular to a low-light image enhancement method fusing depth convolution attention and multi-scale features. The invention designs a new low-light attention block and a multi-scale feature compensation module. The LLAB consists of a low-light multi-head self-attention module, a dual-branch equalization module and two normalization layers. The low-light multi-head self-attention module is designed to extract semantic information from different channels and to equalize the feature weights among channels by computing an attention map between them, thereby improving the visibility and contrast of the image; the dual-branch equalization module further improves image contrast. The multi-scale feature compensation module compensates for the detail information lost in the low-light attention block and in the downsampling stages, and fuses deep spatial information of images at different scales. Experiments show that the proposed method produces images with good visual quality.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a low-light image enhancement method for depth convolution attention and multi-scale feature fusion.
Background
High-quality images can improve model accuracy on advanced computer vision tasks (e.g., object detection, image classification, semantic segmentation). A large number of researchers have made many different attempts to obtain images of the highest possible quality. For example, the aperture can be enlarged and the exposure time prolonged when an image is captured; however, this not only requires the photographer to have specialized shooting skills, but it is also difficult to completely avoid noise in the captured image.
Based on the above, some researchers have attempted to address the problems of low-light images with algorithms. However, these methods still have shortcomings. For example, gray-level transformation methods map the pixel values of an image directly to other values through a mathematical function; by changing the pixel values and their range, the image is enhanced. Such methods are fast and simple to implement, but they do not take the overall distribution of pixel values into account, so their enhancement effect is limited and they perform poorly on some low-light images. Histogram equalization: the HE method adjusts the pixel values of a low-light image using its cumulative distribution function and outputs a remapped gray scale. It can bring out details in the image and improve its clarity. Over time, many existing algorithms have been combined with the HE method, so that HE-based algorithms can better improve image contrast and detail. However, such algorithms reduce image fidelity, generate significant noise and cause image distortion. Methods based on Retinex theory: these low-light image enhancement methods decompose a low-light image into a reflectance component and an illumination component, and often take the reflectance component as the low-light enhancement result. They can not only improve the contrast and brightness of the image, but also have clear advantages in handling image detail. However, Retinex-based models often ignore noise, so the enhanced image may contain considerable noise, as well as artifacts and color distortion.
In recent years, with the development of computer technology, deep learning has achieved remarkable results in image denoising, image detection, image super-resolution, image dehazing and other fields. In the field of low-light image enhancement, researchers have likewise proposed a large number of deep learning algorithms, mainly exploiting attention mechanisms and multi-stage designs, and images enhanced by some of these network models have good visual quality. However, many algorithms still have shortcomings: (1) pooling is performed before the attention mechanism is applied, which reduces the image features and leaves defects in detail recovery; (2) although multi-stage methods can extract deeper image features, the multi-stage structure also loses image detail information and hinders image restoration. Moreover, these methods can cause problems such as artifacts, halos and color deviation.
For this reason, the present invention proposes a low-light image enhancement method that fuses deep convolution attention with multi-scale features.
Disclosure of Invention
The invention aims to provide a low-light image enhancement method fusing deep convolution attention and multi-scale features, so as to solve the problems of noise, halos, distortion and the like in existing low-light image enhancement methods.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a low-light image enhancement method for fusion of deep convolution attention and multi-scale features, characterized by comprising the following steps:
constructing a paired data set, wherein the data set comprises low-illumination images and normal-illumination images, and each low-illumination image corresponds to the normal-illumination image of the same scene;
taking a normal illumination image as a reference image, and extracting shallow layer characteristics of a low illumination image through an Embedding module;
inputting the shallow features into a DAMFFN encoder, the DAMFFN encoder comprising an LLAB module, a downsampling module, and an MSFCB module; the LLAB module comprises an LL-MSAB module and a DBWEB module;
extracting semantic information of the low-light image through an LL-MSAB module, and balancing feature weights among different channels;
the contrast of the low-light image is improved through the DBWEB module;
repairing the image quality and compensating the lost detail information through an MSFCB module;
in the DAMFFN decoder, reducing the difference in frequency domain space between the enhanced image and the reference image by a multi-scale frequency domain loss function;
and outputting the final enhanced low-light image.
Further, the Embedding module uses a convolution layer to increase the channel count of the low-light image from 3 to 48.
Further, the LL-MSAB module shown in FIG. 2 extracts semantic information of the low-light image and equalizes feature weights among different channels so as to improve the visibility of the image, specifically:
extracting semantic information and local features of different channels by using depth convolution;
directly assigning the extracted feature matrix to Q, K and V, the channel attention mechanism then obtaining a channel attention map from the global features extracted through Q and K;
finally, multiplying V by the channel attention map to obtain the output of the LL-MSAB module.
The detailed processing procedure of the LL-MSAB module shown in FIG. 2 includes:
(1) The input X_input is fed into a 3×3 depth-wise convolution layer to implicitly model the local relations between pixels within a channel, yielding X, which is directly assigned to X_Q, X_K and X_V. Then X_Q, X_K and X_V are reshaped into Q, K and V, respectively, so that in practice Q, K and V are three identical feature matrices. The expressions are as follows:
X_Q = X_input · W_Q,  X_K = X_input · W_K,  X_V = X_input · W_V
W_Q = W_K = W_V,  X_Q = X_K = X_V = X
Q = Reshape(X_Q),  K = Reshape(X_K),  V = Reshape(X_V)
Q = K = V
where X_input denotes the input feature matrix of the LL-MSAB module, W_Q, W_K and W_V denote the weight matrices of the 3×3 depth-wise convolution layer, and Reshape(·) denotes the reshaping operation.
(2) Due to limited computing power, the invention divides Q = [Q_1, ..., Q_N], K = [K_1, ..., K_N] and V = [V_1, ..., V_N] into N heads head_j, the channel dimension of each head_j being dim_h = C/N (C denotes the total number of channels of the feature matrix). LL-MSAB computes a channel attention map Atten_j for each head_j. The expression is as follows:
head_j = V_j · Atten_j
(3) The N heads head_j are spliced and passed through a linear projection. The resulting feature matrix is reshaped, and the reshaped result is finally fed into a 1×1 convolution layer to obtain the output X_output of LL-MSAB,
where Conv1(·) denotes a 1×1 convolution layer, Concat(·) denotes the splicing operation, W is a learnable parameter, and Reshape(·) denotes the reshaping operation.
Further, improving the contrast of the low-light image through the DBWEB module includes:
duplicating the input image into two parts, and respectively passing through a 1x1 convolution layer and a 3x3 depth convolution layer;
performing point multiplication on the result of one branch processed by the Sigmoid activation function and the result of the other branch processed by the Sigmoid activation function;
and (3) the dot multiplication result passes through a 1x1 convolution layer to obtain the output of the DBWEB, wherein the output is expressed as follows:
where φ denotes the Sigmoid activation function, the two branch operators denote the 3×3 depth-wise convolution layer and the 1×1 convolution layer, ⊙ denotes element-wise multiplication, Concat(·) denotes the splicing operation, and w_d denotes a 1×1 depth-wise convolution layer.
Further, the MSFCB module comprises three MSC modules, whose input feature matrices are all the output of the first LLAB module of the DAMFFN encoder; the output feature matrices of the three MSC modules have different sizes, and each output feature matrix is added to the corresponding downsampling result of the DAMFFN encoder to complete the fusion of features at different scales.
Further, the multi-scale MSE loss function (Mult-MSE) is expressed as follows:
where nums = 4 denotes the 4 outputs of the decoder, H denotes the height of the feature matrix, W denotes the width of the feature matrix, the remaining two terms denote the pixel values at corresponding coordinates of the enhanced image and the reference image, respectively, and Σ denotes the summation symbol.
The method for calculating the multi-scale frequency domain loss function (Mult-SFD) is as follows:
where x_i denotes the total number of pixels, f(·) denotes the fast Fourier transform, ‖·‖_1 denotes the L1 norm, and Σ denotes the summation symbol.
Finally, the total Loss function Loss of the present invention is shown below:
Loss = Loss_Mult-MSE + λ · Loss_Mult-SFD
where λ denotes the weight hyperparameter of Mult-SFD (λ = 0.1).
The invention has at least the following beneficial effects:
(1) The invention designs DAMFFN, a deep convolution attention multi-scale feature fusion network for enhancing low-light images. The network performs multi-scale local-global representation learning on low-light images without decomposing the images into local windows, thereby exploiting long-range image context information.
(2) The invention designs a low-light multi-head self-attention module (LL-MSAB) which can aggregate local and global features and balance feature weights among different channels.
(3) The invention introduces a dual-branch equalization module (DBWEB), which suppresses features carrying little information by increasing the gradients between different pixels, and promotes the forward propagation of information-rich features.
(4) A multi-scale feature compensation module (MSFCB) is designed. By fusing deep spatial information of images at different scales, it reduces the loss of image detail information.
(5) The present invention introduces a multi-scale frequency-domain loss function (MSFD) to reduce the difference in frequency-domain space between the enhanced low-light image and the reference image.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an overall network framework of the present invention;
FIG. 2 is a block diagram of the low-light attention module according to the present invention;
FIG. 3 is a block diagram of a dual-branch equalization module employed in the present invention;
FIG. 4 is a block diagram of a multi-scale feature compensation module of the present invention;
FIG. 5 is a block diagram of a multi-scale loss function structure of the present invention;
FIG. 6 is a graph comparing images of the present invention with other algorithms after recovery.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention enhances the low-light image through the DAMFFN model shown in FIG. 1. Because the low-light image restoration task is more complex than other image restoration tasks, different modules are designed to restore different aspects of the low-light image.
First, the LL-MSAB module performs overall restoration and enhancement of the low-light image by equalizing the weights among different channels in cooperation with the loss function. Second, because local features are only weakly repaired during overall restoration and LL-MSAB alone restores image contrast insufficiently, the invention designs the DBWEB module to assign a different weight to each pixel so as to repair local regions; DBWEB also uses a Sigmoid activation function to improve image contrast. Finally, since low-light image restoration is a pixel-level task and the detail information of the image is largely lost through repeated LLAB blocks and downsampling, the MSFCB module is designed to compensate for the loss of low-light image detail and to fuse image features at different scales, thereby improving the quality of the final enhanced image. In summary, the invention designs a low-light image enhancement method that fuses deep convolution attention with multi-scale features.
The whole flow of the invention is as follows:
1. A dataset of low/normal-illumination image pairs is constructed. The dataset consists of low-light images and normal-light images, where each low-light image I_low corresponds to a normal-illumination image I_ref of the same scene. The dataset is then divided into a training set and a test set.
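By way of illustration only, the following is a minimal PyTorch sketch of such a paired low-light/normal-light dataset. The directory layout (a low/ folder and a ref/ folder with identically named files), the class name and the 256×256 crop helper are assumptions of this sketch and are not prescribed by the method itself.

import os
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedLowLightDataset(Dataset):
    # Assumed layout: root/low/xxx.png paired with root/ref/xxx.png of the same scene.
    def __init__(self, root, crop_size=256):
        self.low_dir = os.path.join(root, "low")
        self.ref_dir = os.path.join(root, "ref")
        self.names = sorted(os.listdir(self.low_dir))
        self.to_tensor = transforms.ToTensor()
        self.crop_size = crop_size

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        low = self.to_tensor(Image.open(os.path.join(self.low_dir, name)).convert("RGB"))
        ref = self.to_tensor(Image.open(os.path.join(self.ref_dir, name)).convert("RGB"))
        # Crop the same random 256x256 window from both images so the pair stays aligned.
        _, h, w = low.shape
        top = torch.randint(0, h - self.crop_size + 1, (1,)).item()
        left = torch.randint(0, w - self.crop_size + 1, (1,)).item()
        low = low[:, top:top + self.crop_size, left:left + self.crop_size]
        ref = ref[:, top:top + self.crop_size, left:left + self.crop_size]
        return low, ref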
2. And inputting a low-light image in the training set, and taking a corresponding normal-light image as a reference image.
3. Shallow features of the input image are extracted using an Embedding module (a 3x3 convolutional layer). The expression is as follows:
F_shallow = f_Conv(I_low)
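As an illustrative sketch only, the Embedding step can be realized as a single 3×3 convolution that lifts the 3-channel low-light image to 48 shallow-feature channels; the class name below is an assumption of this sketch.

import torch.nn as nn

class Embedding(nn.Module):
    # 3x3 convolution producing F_shallow = f_Conv(I_low) with 48 output channels.
    def __init__(self, in_channels=3, out_channels=48):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):        # x: (B, 3, H, W) low-light image
        return self.conv(x)      # (B, 48, H, W) shallow features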
4. The extracted shallow features are fed into the encoder of the DAMFFN, which is mainly composed of LLAB modules and downsampling operations. In order to extract semantic information from different channels of the image and to equalize the feature weights among channels, the invention designs a low-light multi-head self-attention module (LL-MSAB), as shown in FIG. 2. First, the module extracts semantic information and local features of different channels using depth-wise convolution, which increases the local contrast of the image and makes its texture clearer; second, the channel attention mechanism in the LL-MSAB module obtains a channel attention map by extracting global features and computing over them; finally, the channel attention map is used to equalize the feature weights among different channels of the image matrix, thereby improving the visibility of the image.
As shown in FIG. 2, X_input ∈ R^{H×W×C} is the input of LL-MSAB. It is fed into a 3×3 depth-wise convolution layer to extract image semantic information and implicitly model the local relations between pixels within a channel, finally outputting X ∈ R^{H×W×C}. Then X ∈ R^{H×W×C} is duplicated into 3 copies and reshaped into Q (query) ∈ R^{HW×C}, K (key) ∈ R^{HW×C} and V (value) ∈ R^{HW×C}, respectively. The above procedure is defined as:
X_Q = X_input · W_Q,  X_K = X_input · W_K,  X_V = X_input · W_V
W_Q = W_K = W_V,  X_Q = X_K = X_V = X
Q = Reshape(X_Q),  K = Reshape(X_K),  V = Reshape(X_V)
Q = K = V
where X_input denotes the input feature matrix of the LL-MSAB module, W_Q, W_K and W_V denote the weight matrices of the 3×3 depth-wise convolution layer, and Reshape(·) denotes the reshaping operation. The overall pixel values of a low-light image are low and the differences between local pixels are small, so the weights have a large influence on the pixel values, and small fluctuations in the weights of local pixels can cause blurring or over-smoothing of the image. Therefore, in the LL-MSAB module, W_Q, W_K and W_V share exactly the same data, so that X_Q, X_K and X_V are identical, and Q, K and V are identical as well.
(2) Due to limited computing power, the invention divides Q = [Q_1, ..., Q_N], K = [K_1, ..., K_N] and V = [V_1, ..., V_N] into N heads head_j, the channel dimension of each head_j being dim_h = C/N (C denotes the total number of channels of the feature matrix). LL-MSAB computes a channel attention map Atten_j for each head_j. The expression is as follows:
head_j = V_j · Atten_j
(3) The N heads head_j are spliced and passed through a linear projection. The resulting feature matrix is reshaped, and the reshaped result is finally fed into a 1×1 convolution layer to obtain the output X_output of LL-MSAB,
where Conv1(·) denotes a 1×1 convolution layer, Concat(·) denotes the splicing operation, W is a learnable parameter, and Reshape(·) denotes the reshaping operation.
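By way of illustration only, the following PyTorch sketch captures the LL-MSAB idea described above: Q, K and V are the same feature map produced by a 3×3 depth-wise convolution, attention is computed between channels per head, and a final 1×1 convolution produces the output. The exact form of Atten_j and of the learnable projection W is not reproduced in the text, so the softmax similarity used below and the folding of the projection into the final 1×1 convolution are assumptions of this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LLMSAB(nn.Module):
    def __init__(self, channels=48, heads=2):
        super().__init__()
        self.heads = heads
        # 3x3 depth-wise convolution: models local relations between pixels within a channel.
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        feat = self.dwconv(x)                                      # X (= X_Q = X_K = X_V)
        qkv = feat.reshape(b, self.heads, c // self.heads, h * w)  # Q = K = V, split into heads
        sim = F.normalize(qkv, dim=-1) @ F.normalize(qkv, dim=-1).transpose(-2, -1)
        atten = torch.softmax(sim, dim=-1)                         # channel attention map Atten_j (assumed form)
        heads = atten @ qkv                                        # head_j: attention applied to V_j
        out = heads.reshape(b, c, h, w)                            # splice the heads and reshape
        return self.project(out)                                   # 1x1 convolution -> X_output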
5. After analyzing low-light image data, the invention finds that: (1) the pixel values of a low-light image are low and, after normalization, are generally distributed near 0; (2) the differences between local pixels of a low-light image are small, so the contrast of the image is low. The characteristics of normal-illumination images are just the opposite: their pixel values are large overall, the differences between local pixels are large, and the contrast is high. Therefore, to improve the contrast of low-light images, the invention introduces a dual-branch equalization module (DBWEB), as shown in FIG. 3. First, the input is duplicated into two copies, which pass through a 1×1 convolution layer and a 3×3 depth-wise convolution layer, respectively; second, the result of the left DBWEB branch, processed by a Sigmoid activation function, is multiplied point-wise with the result of the right branch; finally, the point-wise product passes through a 1×1 convolution layer to obtain the output of DBWEB. The slope of the Sigmoid activation function near 0 is large and remains almost at the same level, so the pixel values of the whole image can be raised roughly in the same proportion, improving the overall brightness of the image; the larger slope also helps to improve the contrast between local pixels of the low-light image. The process of DBWEB is defined as:
where φ denotes the Sigmoid activation function, the two branch operators denote the 3×3 depth-wise convolution layer and the 1×1 convolution layer, ⊙ denotes element-wise multiplication, Concat(·) denotes the splicing operation, and w_d denotes a 1×1 depth-wise convolution layer.
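As an illustrative sketch only, the dual-branch structure described above can be written as follows; the branch ordering (which branch is gated by the Sigmoid) and the channel width are assumptions of this sketch.

import torch
import torch.nn as nn

class DBWEB(nn.Module):
    def __init__(self, channels=48):
        super().__init__()
        self.point_branch = nn.Conv2d(channels, channels, kernel_size=1)    # 1x1 convolution branch
        self.depth_branch = nn.Conv2d(channels, channels, kernel_size=3,
                                      padding=1, groups=channels)           # 3x3 depth-wise branch
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        a = self.point_branch(x)
        b = self.depth_branch(x)
        gated = torch.sigmoid(a) * b      # Sigmoid gate, multiplied element-wise with the other branch
        return self.out_conv(gated)       # final 1x1 convolution gives the DBWEB output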
6. After analyzing the structure of the network model and the low-light image data, it is found that in the encoder stage, after the low-light image passes through LLAB modules and downsampling, much of its detail information is lost, and the encoder structure of the model is unfavorable for the fusion of multi-scale information. The invention therefore designs a multi-scale feature compensation module (MSFCB) to ensure the quality of the final restored image; its structure is shown in FIG. 4. The MSFCB consists of 3 MSC modules, whose input feature matrices are all the output of the first LLAB module of the DAMFFN encoder. The output feature matrices of the 3 MSC modules have different sizes, and each is added to the corresponding downsampling result of the DAMFFN encoder to complete the fusion of features at different scales. The expression is as follows:
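By way of illustration only, the following sketch shows the compensation idea: three MSC branches all read the output of the first LLAB block, bring it to the three coarser encoder scales, and the results are added to the corresponding down-sampled encoder features. The internal structure of an MSC module and the channel widths per scale are not detailed in the text, so the strided convolutions and the channel tuple below are assumptions of this sketch.

import torch.nn as nn

class MSFCB(nn.Module):
    def __init__(self, channels=(48, 96, 192, 384)):
        super().__init__()
        # One placeholder MSC branch per coarser scale (stride 2, 4, 8).
        self.msc = nn.ModuleList([
            nn.Conv2d(channels[0], channels[i + 1], kernel_size=3,
                      stride=2 ** (i + 1), padding=1)
            for i in range(3)
        ])

    def forward(self, first_llab_out, encoder_feats):
        # encoder_feats: the three down-sampled encoder outputs, ordered finest to coarsest.
        return [feat + branch(first_llab_out)
                for branch, feat in zip(self.msc, encoder_feats)]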
7. The network model designed by the invention is a multi-scale low-light image enhancement network, so the invention designs a multi-scale loss function (Mult-Loss). Mult-Loss consists of a multi-scale MSE loss function and a multi-scale frequency-domain loss function. The multi-scale MSE loss function (Mult-MSE) calculates the mean square error between the output image of each decoding layer and the corresponding reference image; the multi-scale frequency-domain loss function (MSFD) reduces the difference in frequency-domain space between the output image of each decoding layer and the corresponding reference image.
The method of calculating the multiscale MSE loss function (Mult-MSE) is as follows:
where nums = 4 denotes the 4 outputs of the decoder, H denotes the height of the feature matrix, W denotes the width of the feature matrix, the remaining two terms denote the pixel values at corresponding coordinates of the enhanced image and the reference image, respectively, and Σ denotes the summation symbol.
The method for calculating the multi-scale frequency domain loss function (Mult-SFD) is as follows:
where x_i denotes the total number of pixels, f(·) denotes the fast Fourier transform, ‖·‖_1 denotes the L1 norm, and Σ denotes the summation symbol.
Finally, the total Loss function Loss of the present invention is shown below:
Loss = Loss_Mult-MSE + λ · Loss_Mult-SFD
where λ denotes the weight hyperparameter of Mult-SFD (λ = 0.1).
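As an illustrative sketch only, the total loss above can be computed as follows; resizing the reference image to each decoder scale with bilinear interpolation is an assumption of this sketch.

import torch
import torch.nn.functional as F

def multi_scale_loss(decoder_outputs, reference, lam=0.1):
    # decoder_outputs: the nums = 4 multi-scale outputs of the decoder.
    mse_total, sfd_total = 0.0, 0.0
    for out in decoder_outputs:
        ref = F.interpolate(reference, size=out.shape[-2:], mode="bilinear", align_corners=False)
        mse_total = mse_total + F.mse_loss(out, ref)    # Loss_Mult-MSE term
        # Loss_Mult-SFD term: L1 distance between the fast Fourier transforms
        sfd_total = sfd_total + torch.mean(torch.abs(torch.fft.fft2(out) - torch.fft.fft2(ref)))
    return mse_total + lam * sfd_total                  # Loss = Loss_Mult-MSE + λ·Loss_Mult-SFD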
In one embodiment:
the DAMFFN is implemented based on the Python 3.9 and Pytorch 1.21.1 environments. The invention performs data enhancement operations such as turning over and rotating (rotating an image by 90 degrees, 180 degrees or 270 degrees) on training data. During training, the batch size is 1, and the selected 1-image is randomly cut into a slice size of 256×256×3. Gradient optimization was performed using an AdamW optimizer with a motion term β1=0.9 and β2=0.999. The initial learning rate is set to 3e-4, the weight decay is set to 0.5e-4, and the learning rate is stepped down to 1e-6 using a cosine decay strategy. The invention uses peak signal to noise ratio (PSNR), structural Similarity (SSIM) and Natural Image Quality Estimator (NIQE) to estimate the performance of the model, and performs qualitative and quantitative experimental comparison on LOL, MIT-Adobe FiveK and LIME, MEF, DICM low-light image data sets with a plurality of existing low-light image enhancement algorithms. The PSNR evaluates the quality of an image by calculating a mean square error between a low-light image enhanced by a network model and a reference image (group-trunk image). The higher the PSNR value, the better the quality of the enhanced image. SSIM evaluates the similarity between the enhanced image and a reference image (group-trunk image) from three aspects of brightness, contrast, and structure. The larger the SSI M value, the more similar the enhanced image is to the reference image (group-trunk image), and the better the network model effect. The lower the NIQE value, the closer the image is to the natural image, and the higher the image quality. In the implementation of the DAMFFN, the experimental device of the present invention is configured as 16GB NVIDIA Quadro RTX5000 GPU,C =48, n1=2, n2=n3=4, n4=8.
The LOL dataset is a common real-world dataset in the field of low-light image enhancement, containing a large number of indoor and outdoor low-light scenes for a total of 500 pairs of low-light/normal-light images. As shown in Table 1 below, our method achieves the best performance on both PSNR and SSIM. The result of the invention is PSNR = 24.87 dB and SSIM = 0.856, with the PSNR value exceeding that of MIRNetv2 by 0.13 dB and the SSIM value by 0.005.
Table 1 quantitative comparison of LOL test set on PSNR and SSIM indicators (optimal outcome marker, suboptimal outcome marker #)
The MIT-Adobe FiveK dataset contains 5000 captured images, with retouched versions from 5 experts used as label images. As with other low-light image enhancement algorithms trained on this dataset, the invention adopts the color-rendering result of expert C as the ground truth, using the first 4500 pairs as the training set and the remaining 500 pairs as the test set. As shown in Table 2, by comparison with existing advanced algorithms, the invention achieves the best result at PSNR = 25.78 dB (exceeding MIRNetv2 by 0.73 dB) and a strong result at SSIM = 0.912.
TABLE 2 quantitative comparison of MIT-Adobe FiveK test set on PSNR and SSIM indicators (best results marker, suboptimal results marker #)
Referring to FIG. 6, it can be seen from several qualitative comparisons that the algorithm proposed by the invention obtains better visual results. Images enhanced by most algorithms still contain significant noise, for example KinD and MIRNetv2. Some algorithms produce results with insufficient overall brightness, such as Zero-DCE and EnlightenGAN. The Retinex-Net method increases the brightness of the whole image but at the same time adds much noise, blurring the image. In contrast, the algorithm of the invention improves the overall brightness of the image and removes noise from the enhanced image while preserving image quality. The restored image has better visibility and contrast and clear texture.
As the comparison in Table 3 shows, the invention performed 6 ablation experiments to determine the effectiveness and importance of each module. The experiments prove that DAMFFN performs best when all modules are added. As shown in Table 3, on the LOL dataset, when only LL-MSAB or only DBWEB is used, the performance of the network model is not optimal and differs considerably from the best performance. The network model performs worst when the LL-MSAB module is absent, which is sufficient to show that the LL-MSAB module improves the performance of the whole network model. With only the LL-MSAB module, the network model reaches PSNR = 23.25 dB and SSIM = 0.812. When the DBWEB module is added, the PSNR value of the network model hardly changes, but the SSIM value increases by 0.003, because the Sigmoid function in DBWEB increases the differences between pixels and the contrast of local regions, so DBWEB increases the structural similarity of the whole image. When only the MSFCB module and the LL-MSAB module are added, the performance of DAMFFN drops considerably, which further illustrates the effectiveness of the DBWEB module. After the MSFCB module is added, the PSNR value of the network model increases from 23.22 dB to 24.87 dB and the SSIM from 0.842 to 0.856, i.e., PSNR increases by 1.65 dB and SSIM by 0.014, because the MSFCB module fully compensates for the loss of low-light image detail information in the LLAB and downsampling stages and fuses deep spatial information of images at different scales. This also demonstrates that the MSFCB module greatly improves the performance of the model.
Table 3 quantitative results of ablation experiments on PSNR and SSIM metrics for components of the DAMFFN network model on the LOL test set
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and descriptions above merely illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (6)
1. A low-light image enhancement method for fusion of deep convolution attention and multi-scale features, characterized by comprising the following steps:
constructing a paired data set, wherein the data set comprises low-illumination images and normal-illumination images, and each low-illumination image corresponds to the normal-illumination image of the same scene;
taking a normal illumination image as a reference image, and extracting shallow layer characteristics of a low illumination image through an Embedding module;
inputting the shallow features into a DAMFFN encoder, the DAMFFN encoder comprising an LLAB module, a downsampling module, and an MSFCB module; the LLAB module comprises an LL-MSAB module and a DBWEB module;
extracting semantic information of the low-light image through the LL-MSAB module, and balancing feature weights among different channels so as to improve the visibility of the image;
the contrast of the low-light image is improved through the DBWEB module;
repairing the image quality and compensating the lost detail information through an MSFCB module;
in the DAMFFN decoder, reducing the difference in frequency domain space between the enhanced image and the reference image by a multi-scale frequency domain loss function;
and outputting the final enhanced low-light image.
2. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the Embedding module uses a convolution module to increase the channel count of the low-light image from 3 to 48.
3. The method for enhancing a low-light image by fusion of deep convolution attention and multi-scale features according to claim 1, wherein the extracting semantic information of a low-light image and equalizing feature weights among different channels by using an LL-MSAB module to improve the visibility of the image comprises:
shallow feature X_input is fed into a 3×3 depth-wise convolution layer to extract image semantic information and implicitly model the local relations between pixels within a channel, obtaining X, which is directly assigned to X_Q, X_K and X_V;
X_Q, X_K and X_V are reshaped into three identical feature matrices Q, K and V, respectively, as shown below:
X_Q = X_input · W_Q,  X_K = X_input · W_K,  X_V = X_input · W_V
W_Q = W_K = W_V,  X_Q = X_K = X_V = X
Q = Reshape(X_Q),  K = Reshape(X_K),  V = Reshape(X_V)
Q = K = V
X_input denotes the input feature matrix of the LL-MSAB module, W_Q, W_K and W_V denote the weight matrices of the 3×3 depth-wise convolution layer, and Reshape(·) denotes the reshaping operation;
Q = [Q_1, ..., Q_N], K = [K_1, ..., K_N] and V = [V_1, ..., V_N] are divided into N heads head_j, the channel dimension of each head_j being dim_h = C/N, where C denotes the total number of channels of the feature matrix; LL-MSAB computes a channel attention map Atten_j for each head, expressed as follows:
head_j = V_j · Atten_j
the N heads head_j are spliced and connected into a linear projection, the obtained feature matrix is reshaped, and the reshaped result is finally fed into a 1×1 convolution layer to obtain the output X_output of LL-MSAB, expressed as follows:
Conv1(·) denotes a 1×1 convolution layer, Concat(·) denotes the splicing operation, W is a learnable parameter, and Reshape(·) denotes the reshaping operation.
4. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the method for improving the contrast of the low-light image through the DBWEB module comprises the following steps:
duplicating the input image into two parts, and respectively passing through a 1x1 convolution layer and a 3x3 depth convolution layer;
performing point multiplication on the result of one branch processed by the Sigmoid activation function and the result of the other branch processed by the Sigmoid activation function;
and (3) the dot multiplication result passes through a 1x1 convolution layer to obtain the output of the DBWEB, wherein the output is expressed as follows:
5. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the MSFCB module comprises three MSC modules, and the input feature matrixes of the three MSC modules are the output of the first LLAB module of the DAMFFN encoder; the output feature matrixes of the 3 MSC modules are different in size, and the output feature matrixes of the MSC modules are respectively added with the downsampling results corresponding to the DAMFFN encoder to finish the fusion of the features with different scales.
6. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the multi-scale MSE loss function (Mult-MSE) is expressed as follows:
where nums = 4 denotes the 4 outputs of the decoder, H denotes the height of the feature matrix, W denotes the width of the feature matrix, the remaining two terms denote the pixel values at corresponding coordinates of the enhanced image and the reference image, respectively, and Σ denotes the summation symbol,
the method for calculating the multi-scale frequency domain loss function is as follows:
where x_i denotes the total number of pixels, f(·) denotes the fast Fourier transform, ‖·‖_1 denotes the L1 norm, and Σ denotes the summation symbol,
the total Loss function Loss is as follows:
Loss = Loss_Mult-MSE + λ · Loss_Mult-SFD
where λ denotes the weight hyperparameter of Mult-SFD (λ = 0.1).
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310139997.8A CN116091357A (en) | 2023-02-20 | 2023-02-20 | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features |
NL2034901A NL2034901A (en) | 2023-02-20 | 2023-05-23 | Depth-wise convolution attention and multi-scale feature fusion network for low-light image enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310139997.8A CN116091357A (en) | 2023-02-20 | 2023-02-20 | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116091357A true CN116091357A (en) | 2023-05-09 |
Family
ID=86204433
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310139997.8A Pending CN116091357A (en) | 2023-02-20 | 2023-02-20 | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116091357A (en) |
NL (1) | NL2034901A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117011194A (en) * | 2023-10-07 | 2023-11-07 | 暨南大学 | Low-light image enhancement method based on multi-scale dual-channel attention network |
CN118038025A (en) * | 2024-03-22 | 2024-05-14 | 重庆大学 | Foggy weather target detection method, device and equipment based on frequency domain and space domain |
-
2023
- 2023-02-20 CN CN202310139997.8A patent/CN116091357A/en active Pending
- 2023-05-23 NL NL2034901A patent/NL2034901A/en unknown
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117011194A (en) * | 2023-10-07 | 2023-11-07 | 暨南大学 | Low-light image enhancement method based on multi-scale dual-channel attention network |
CN117011194B (en) * | 2023-10-07 | 2024-01-30 | 暨南大学 | Low-light image enhancement method based on multi-scale dual-channel attention network |
CN118038025A (en) * | 2024-03-22 | 2024-05-14 | 重庆大学 | Foggy weather target detection method, device and equipment based on frequency domain and space domain |
Also Published As
Publication number | Publication date |
---|---|
NL2034901A (en) | 2024-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | An experimental-based review of image enhancement and image restoration methods for underwater imaging | |
Zhang et al. | Underwater image enhancement via weighted wavelet visual perception fusion | |
He et al. | Guided image filtering | |
CN116091357A (en) | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features | |
CN111161360B (en) | Image defogging method of end-to-end network based on Retinex theory | |
CN114066747B (en) | Low-illumination image enhancement method based on illumination and reflection complementarity | |
Shen et al. | Convolutional neural pyramid for image processing | |
CN111861896A (en) | UUV-oriented underwater image color compensation and recovery method | |
JP7493867B1 (en) | Low-light image enhancement method based on deep Retinex | |
CN111210395A (en) | Retinex underwater image enhancement method based on gray value mapping | |
Yang et al. | Low-light image enhancement based on Retinex theory and dual-tree complex wavelet transform | |
Rahman et al. | Diverse image enhancer for complex underexposed image | |
Ma et al. | Underwater image restoration through a combination of improved dark channel prior and gray world algorithms | |
Zhang et al. | Underwater image enhancement using improved generative adversarial network | |
Tao et al. | An effective and robust underwater image enhancement method based on color correction and artificial multi-exposure fusion | |
Zhou et al. | An improved algorithm using weighted guided coefficient and union self‐adaptive image enhancement for single image haze removal | |
Gao et al. | Image Dehazing Based on Multi-scale Retinex and Guided Filtering | |
CN113066023A (en) | SAR image speckle removing method based on self-calibration convolutional neural network | |
Yuan et al. | Defogging Technology Based on Dual‐Channel Sensor Information Fusion of Near‐Infrared and Visible Light | |
GUAN et al. | A dual-tree complex wavelet transform-based model for low-illumination image enhancement | |
CN116563133A (en) | Low-illumination color image enhancement method based on simulated exposure and multi-scale fusion | |
Xie et al. | DHD-Net: A novel deep-learning-based dehazing network | |
Chen et al. | HCSAM-Net: multistage network with a hybrid of convolution and self-attention mechanism for low-light image enhancement | |
Subramani et al. | Pixel intensity optimization and detail-preserving contextual contrast enhancement for underwater images | |
Kim | Edge-preserving and adaptive transmission estimation for effective single image haze removal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |