CN116091357A - Low-light image enhancement method for fusion of depth convolution attention and multi-scale features - Google Patents
Low-light image enhancement method for fusion of depth convolution attention and multi-scale features Download PDFInfo
- Publication number
- CN116091357A CN116091357A CN202310139997.8A CN202310139997A CN116091357A CN 116091357 A CN116091357 A CN 116091357A CN 202310139997 A CN202310139997 A CN 202310139997A CN 116091357 A CN116091357 A CN 116091357A
- Authority
- CN
- China
- Prior art keywords
- image
- low
- module
- light image
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 50
- 230000004927 fusion Effects 0.000 title claims abstract description 17
- 239000011159 matrix material Substances 0.000 claims description 23
- 238000005286 illumination Methods 0.000 claims description 16
- 230000004913 activation Effects 0.000 claims description 10
- 230000002708 enhancing effect Effects 0.000 claims description 4
- 238000007634 remodeling Methods 0.000 claims description 4
- 238000013461 design Methods 0.000 abstract description 9
- 238000012545 processing Methods 0.000 abstract description 6
- 238000010586 diagram Methods 0.000 abstract description 5
- 238000002474 experimental method Methods 0.000 abstract description 5
- 230000000007 visual effect Effects 0.000 abstract description 3
- 238000010606 normalization Methods 0.000 abstract description 2
- 230000006870 function Effects 0.000 description 23
- 230000001965 increasing effect Effects 0.000 description 9
- 238000012360 testing method Methods 0.000 description 5
- 238000012549 training Methods 0.000 description 5
- 230000000694 effects Effects 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 230000008439 repair process Effects 0.000 description 4
- 239000000284 extract Substances 0.000 description 3
- 238000002679 ablation Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000011084 recovery Methods 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000005315 distribution function Methods 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000007620 mathematical function Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000002035 prolonged effect Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Processing (AREA)
Abstract
The invention relates to the technical field of image processing, in particular to a low-light image enhancement method fusing depth convolution attention and multi-scale features. The invention designs a new low-light attention block and a multi-scale feature compensation module. The LLAB consists of a low-light multi-head self-attention module, a dual-branch equalization module and two normalization layers. The low-light multi-head self-attention module is designed to extract semantic information from different channels and to equalize the feature weights among channels by computing an attention map between them, thereby improving the visibility and contrast of the image; the dual-branch equalization module further improves image contrast. The multi-scale feature compensation module compensates for the detail information lost in the low-light attention block and in the downsampling stages, and fuses deep spatial information of images at different scales. Experiments show that the proposed method produces images with good visual quality.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a low-light image enhancement method for depth convolution attention and multi-scale feature fusion.
Background
High-quality images can improve model accuracy on advanced computer vision tasks (e.g., object detection, image classification, semantic segmentation). A large number of researchers have made many different attempts to obtain images of the highest possible quality. For example, the aperture can be enlarged and the exposure time prolonged when an image is captured; however, this not only requires the photographer to have specialized shooting skills, but it is also difficult to completely avoid noise in the captured image.
Based on the above, some researchers have attempted to address the problems of low-light images with algorithms. However, these methods still have shortcomings. For example, gray-level transformation methods map the pixel values of an image directly to other values through a mathematical function; by changing the pixel values and their range, the image is enhanced. Such methods are fast and simple to implement, but they do not take the overall distribution of pixel values into account, so their enhancement effect is limited and they perform poorly on some low-light images. Histogram equalization: the HE method adjusts the pixel values of a low-light image using its cumulative distribution function and outputs a remapped gray scale. It can bring out details in the image and improve its clarity. Over time, many existing algorithms have been combined with the HE method, so that HE-based algorithms can better improve image contrast and detail. However, such algorithms reduce image fidelity, generate significant noise and cause image distortion. Methods based on Retinex theory: these low-light image enhancement methods decompose a low-light image into a reflectance component and an illumination component, and often take the reflectance component as the low-light enhancement result. They can not only improve the contrast and brightness of the image, but also have clear advantages in handling image detail. However, Retinex-based models often ignore noise, so the enhanced image may contain considerable noise, as well as artifacts and color distortion.
In recent years, with the development of computer technology, deep learning has achieved remarkable results in image denoising, image detection, image super-resolution, image dehazing and other fields. In the field of low-light image enhancement, researchers have likewise proposed a large number of deep learning algorithms, mainly exploiting attention mechanisms and multi-stage designs, and images enhanced by some of these network models have good visual quality. However, many algorithms still have shortcomings: (1) pooling is performed before the attention mechanism is applied, which reduces the image features and leaves defects in detail recovery; (2) although multi-stage methods can extract deeper image features, the multi-stage structure also loses image detail information and hinders image restoration. Moreover, these methods can cause problems such as artifacts, halos and color deviation.
For this reason, the present invention proposes a low-light image enhancement method that fuses deep convolution attention with multi-scale features.
Disclosure of Invention
The invention aims to provide a low-light image enhancement method fusing deep convolution attention and multi-scale features, so as to solve the problems of noise, halos, distortion and the like in existing low-light image enhancement methods.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a low-light image enhancement method for fusion of deep convolution attention and multi-scale features, characterized by comprising the following steps:
constructing a paired data set, wherein the data set comprises low-illumination images and normal-illumination images, and each low-illumination image corresponds to the normal-illumination image of the same scene;
taking a normal illumination image as a reference image, and extracting shallow layer characteristics of a low illumination image through an Embedding module;
inputting the shallow features into a DAMFFN encoder, the DAMFFN encoder comprising an LLAB module, a downsampling module, and an MSFCB module; the LLAB module comprises an LL-MSAB module and a DBWEB module;
extracting semantic information of the low-light image through an LL-MSAB module, and balancing feature weights among different channels;
the contrast of the low-light image is improved through the DBWEB module;
repairing the image quality and compensating the lost detail information through an MSFCB module;
in the DAMFFN decoder, reducing the difference in frequency domain space between the enhanced image and the reference image by a multi-scale frequency domain loss function;
and outputting the final enhanced low-light image.
Further, the Embedding module uses a convolution layer to increase the channel count of the low-light image from 3 to 48.
Further, the LL-MSAB module shown in FIG. 2 extracts semantic information of the low-light image and equalizes feature weights among different channels so as to improve the visibility of the image, specifically:
extracting semantic information and local features of different channels by using depth convolution;
directly assigning the extracted feature matrix to Q, K and V, the channel attention mechanism then obtaining a channel attention map from the global features extracted through Q and K;
finally, multiplying V by the channel attention map to obtain the output of the LL-MSAB module.
The detailed processing procedure of the LL-MSAB module shown in FIG. 2 includes:
(1) The input X_input is fed into a 3×3 depth-wise convolution layer to implicitly model the local relations between pixels within a channel, yielding X, which is directly assigned to X_Q, X_K and X_V. Then X_Q, X_K and X_V are reshaped into Q, K and V, respectively, so that in practice Q, K and V are three identical feature matrices. The expressions are as follows:
X_Q = X_input · W_Q,  X_K = X_input · W_K,  X_V = X_input · W_V
W_Q = W_K = W_V,  X_Q = X_K = X_V = X
Q = Reshape(X_Q),  K = Reshape(X_K),  V = Reshape(X_V)
Q = K = V
where X_input denotes the input feature matrix of the LL-MSAB module, W_Q, W_K and W_V denote the weight matrices of the 3×3 depth-wise convolution layer, and Reshape(·) denotes the reshaping operation.
(2) Due to limited computing power, the invention divides Q = [Q_1, ..., Q_N], K = [K_1, ..., K_N] and V = [V_1, ..., V_N] into N heads head_j, the channel dimension of each head_j being dim_h = C/N (C denotes the total number of channels of the feature matrix). LL-MSAB computes a channel attention map Atten_j for each head_j. The expression is as follows:
head_j = V_j · Atten_j
(3) The N heads head_j are spliced and passed through a linear projection. The resulting feature matrix is reshaped, and the reshaped result is finally fed into a 1×1 convolution layer to obtain the output X_output of LL-MSAB,
where Conv1(·) denotes a 1×1 convolution layer, Concat(·) denotes the splicing operation, W is a learnable parameter, and Reshape(·) denotes the reshaping operation.
Further, improving the contrast of the low-light image through the DBWEB module includes:
duplicating the input image into two parts, and respectively passing through a 1x1 convolution layer and a 3x3 depth convolution layer;
performing point multiplication on the result of one branch processed by the Sigmoid activation function and the result of the other branch processed by the Sigmoid activation function;
and (3) the dot multiplication result passes through a 1x1 convolution layer to obtain the output of the DBWEB, wherein the output is expressed as follows:
where φ denotes the Sigmoid activation function, the two branch operators denote the 3×3 depth-wise convolution layer and the 1×1 convolution layer, ⊙ denotes element-wise multiplication, Concat(·) denotes the splicing operation, and w_d denotes a 1×1 depth-wise convolution layer.
Further, the MSFCB module comprises three MSC modules, whose input feature matrices are all the output of the first LLAB module of the DAMFFN encoder; the output feature matrices of the three MSC modules have different sizes, and each output feature matrix is added to the corresponding downsampling result of the DAMFFN encoder to complete the fusion of features at different scales.
Further, the multi-scale MSE loss function (Mult-MSE) is expressed as follows:
where nums = 4 denotes the 4 outputs of the decoder, H denotes the height of the feature matrix, W denotes the width of the feature matrix, the remaining two terms denote the pixel values at corresponding coordinates of the enhanced image and the reference image, respectively, and Σ denotes the summation symbol.
The method for calculating the multi-scale frequency domain loss function (Mult-SFD) is as follows:
where x_i denotes the total number of pixels, f(·) denotes the fast Fourier transform, ‖·‖_1 denotes the L1 norm, and Σ denotes the summation symbol.
Finally, the total Loss function Loss of the present invention is shown below:
Loss = Loss_Mult-MSE + λ · Loss_Mult-SFD
where λ denotes the weight hyperparameter of Mult-SFD (λ = 0.1).
The invention has at least the following beneficial effects:
(1) The invention designs DAMFFN, a deep convolution attention multi-scale feature fusion network for enhancing low-light images. The network performs multi-scale local-global representation learning on low-light images without decomposing the images into local windows, thereby exploiting long-range image context information.
(2) The invention designs a low-light multi-head self-attention module (LL-MSAB) which can aggregate local and global features and balance feature weights among different channels.
(3) The invention introduces a dual-branch equalization module (DBWEB), which suppresses features carrying little information by increasing the gradients between different pixels, and promotes the forward propagation of information-rich features.
(4) A multi-scale feature compensation module (MSFCB) is designed. By fusing deep spatial information of images at different scales, it reduces the loss of image detail information.
(5) The present invention introduces a multi-scale frequency-domain loss function (MSFD) to reduce the difference in frequency-domain space between the enhanced low-light image and the reference image.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an overall network framework of the present invention;
FIG. 2 is a block diagram of the low-light attention module according to the present invention;
FIG. 3 is a block diagram of a dual-branch equalization module employed in the present invention;
FIG. 4 is a block diagram of a multi-scale feature compensation module of the present invention;
FIG. 5 is a block diagram of a multi-scale loss function structure of the present invention;
FIG. 6 is a graph comparing images of the present invention with other algorithms after recovery.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention enhances the low-light image through the DAMFFN model shown in FIG. 1. Because the low-light image restoration task is more complex than other image restoration tasks, different modules are designed to restore different aspects of the low-light image.
First, the LL-MSAB module performs overall restoration and enhancement of the low-light image by equalizing the weights among different channels in cooperation with the loss function. Second, because local features are only weakly repaired during overall restoration and LL-MSAB alone restores image contrast insufficiently, the invention designs the DBWEB module to assign a different weight to each pixel so as to repair local regions; DBWEB also uses a Sigmoid activation function to improve image contrast. Finally, since low-light image restoration is a pixel-level task and the detail information of the image is largely lost through repeated LLAB blocks and downsampling, the MSFCB module is designed to compensate for the loss of low-light image detail and to fuse image features at different scales, thereby improving the quality of the final enhanced image. In summary, the invention designs a low-light image enhancement method that fuses deep convolution attention with multi-scale features.
The whole flow of the invention is as follows:
1. A dataset of low/normal-illumination image pairs is constructed. The dataset consists of low-light images and normal-light images, where each low-light image I_low corresponds to a normal-illumination image I_ref of the same scene. The dataset is then divided into a training set and a test set.
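By way of illustration only, the following is a minimal PyTorch sketch of such a paired low-light/normal-light dataset. The directory layout (a low/ folder and a ref/ folder with identically named files), the class name and the 256×256 crop helper are assumptions of this sketch and are not prescribed by the method itself.

import os
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PairedLowLightDataset(Dataset):
    # Assumed layout: root/low/xxx.png paired with root/ref/xxx.png of the same scene.
    def __init__(self, root, crop_size=256):
        self.low_dir = os.path.join(root, "low")
        self.ref_dir = os.path.join(root, "ref")
        self.names = sorted(os.listdir(self.low_dir))
        self.to_tensor = transforms.ToTensor()
        self.crop_size = crop_size

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        low = self.to_tensor(Image.open(os.path.join(self.low_dir, name)).convert("RGB"))
        ref = self.to_tensor(Image.open(os.path.join(self.ref_dir, name)).convert("RGB"))
        # Crop the same random 256x256 window from both images so the pair stays aligned.
        _, h, w = low.shape
        top = torch.randint(0, h - self.crop_size + 1, (1,)).item()
        left = torch.randint(0, w - self.crop_size + 1, (1,)).item()
        low = low[:, top:top + self.crop_size, left:left + self.crop_size]
        ref = ref[:, top:top + self.crop_size, left:left + self.crop_size]
        return low, ref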
2. And inputting a low-light image in the training set, and taking a corresponding normal-light image as a reference image.
3. Shallow features of the input image are extracted using an Embedding module (a 3x3 convolutional layer). The expression is as follows:
F_shallow = f_Conv(I_low)
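As an illustrative sketch only, the Embedding step can be realized as a single 3×3 convolution that lifts the 3-channel low-light image to 48 shallow-feature channels; the class name below is an assumption of this sketch.

import torch.nn as nn

class Embedding(nn.Module):
    # 3x3 convolution producing F_shallow = f_Conv(I_low) with 48 output channels.
    def __init__(self, in_channels=3, out_channels=48):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):        # x: (B, 3, H, W) low-light image
        return self.conv(x)      # (B, 48, H, W) shallow features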
4. The extracted shallow features are fed into the encoder of the DAMFFN, which is mainly composed of LLAB modules and downsampling operations. In order to extract semantic information from different channels of the image and to equalize the feature weights among channels, the invention designs a low-light multi-head self-attention module (LL-MSAB), as shown in FIG. 2. First, the module extracts semantic information and local features of different channels using depth-wise convolution, which increases the local contrast of the image and makes its texture clearer; second, the channel attention mechanism in the LL-MSAB module obtains a channel attention map by extracting global features and computing over them; finally, the channel attention map is used to equalize the feature weights among different channels of the image matrix, thereby improving the visibility of the image.
As shown in FIG. 2, X_input ∈ R^{H×W×C} is the input of LL-MSAB. It is fed into a 3×3 depth-wise convolution layer to extract image semantic information and implicitly model the local relations between pixels within a channel, finally outputting X ∈ R^{H×W×C}. Then X ∈ R^{H×W×C} is duplicated into 3 copies and reshaped into Q (query) ∈ R^{HW×C}, K (key) ∈ R^{HW×C} and V (value) ∈ R^{HW×C}, respectively. The above procedure is defined as:
X_Q = X_input · W_Q,  X_K = X_input · W_K,  X_V = X_input · W_V
W_Q = W_K = W_V,  X_Q = X_K = X_V = X
Q = Reshape(X_Q),  K = Reshape(X_K),  V = Reshape(X_V)
Q = K = V
where X_input denotes the input feature matrix of the LL-MSAB module, W_Q, W_K and W_V denote the weight matrices of the 3×3 depth-wise convolution layer, and Reshape(·) denotes the reshaping operation. The overall pixel values of a low-light image are low and the differences between local pixels are small, so the weights have a large influence on the pixel values, and small fluctuations in the weights of local pixels can cause blurring or over-smoothing of the image. Therefore, in the LL-MSAB module, W_Q, W_K and W_V share exactly the same data, so that X_Q, X_K and X_V are identical, and Q, K and V are identical as well.
(2) Due to limited computing power, the invention divides Q = [Q_1, ..., Q_N], K = [K_1, ..., K_N] and V = [V_1, ..., V_N] into N heads head_j, the channel dimension of each head_j being dim_h = C/N (C denotes the total number of channels of the feature matrix). LL-MSAB computes a channel attention map Atten_j for each head_j. The expression is as follows:
head_j = V_j · Atten_j
(3) The N heads head_j are spliced and passed through a linear projection. The resulting feature matrix is reshaped, and the reshaped result is finally fed into a 1×1 convolution layer to obtain the output X_output of LL-MSAB,
where Conv1(·) denotes a 1×1 convolution layer, Concat(·) denotes the splicing operation, W is a learnable parameter, and Reshape(·) denotes the reshaping operation.
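By way of illustration only, the following PyTorch sketch captures the LL-MSAB idea described above: Q, K and V are the same feature map produced by a 3×3 depth-wise convolution, attention is computed between channels per head, and a final 1×1 convolution produces the output. The exact form of Atten_j and of the learnable projection W is not reproduced in the text, so the softmax similarity used below and the folding of the projection into the final 1×1 convolution are assumptions of this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LLMSAB(nn.Module):
    def __init__(self, channels=48, heads=2):
        super().__init__()
        self.heads = heads
        # 3x3 depth-wise convolution: models local relations between pixels within a channel.
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        feat = self.dwconv(x)                                      # X (= X_Q = X_K = X_V)
        qkv = feat.reshape(b, self.heads, c // self.heads, h * w)  # Q = K = V, split into heads
        sim = F.normalize(qkv, dim=-1) @ F.normalize(qkv, dim=-1).transpose(-2, -1)
        atten = torch.softmax(sim, dim=-1)                         # channel attention map Atten_j (assumed form)
        heads = atten @ qkv                                        # head_j: attention applied to V_j
        out = heads.reshape(b, c, h, w)                            # splice the heads and reshape
        return self.project(out)                                   # 1x1 convolution -> X_output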
5. After analyzing low-light image data, the invention finds that: (1) the pixel values of a low-light image are low and, after normalization, are generally distributed near 0; (2) the differences between local pixels of a low-light image are small, so the contrast of the image is low. The characteristics of normal-illumination images are just the opposite: their pixel values are large overall, the differences between local pixels are large, and the contrast is high. Therefore, to improve the contrast of low-light images, the invention introduces a dual-branch equalization module (DBWEB), as shown in FIG. 3. First, the input is duplicated into two copies, which pass through a 1×1 convolution layer and a 3×3 depth-wise convolution layer, respectively; second, the result of the left DBWEB branch, processed by a Sigmoid activation function, is multiplied point-wise with the result of the right branch; finally, the point-wise product passes through a 1×1 convolution layer to obtain the output of DBWEB. The slope of the Sigmoid activation function near 0 is large and remains almost at the same level, so the pixel values of the whole image can be raised roughly in the same proportion, improving the overall brightness of the image; the larger slope also helps to improve the contrast between local pixels of the low-light image. The process of DBWEB is defined as:
where φ denotes the Sigmoid activation function, the two branch operators denote the 3×3 depth-wise convolution layer and the 1×1 convolution layer, ⊙ denotes element-wise multiplication, Concat(·) denotes the splicing operation, and w_d denotes a 1×1 depth-wise convolution layer.
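As an illustrative sketch only, the dual-branch structure described above can be written as follows; the branch ordering (which branch is gated by the Sigmoid) and the channel width are assumptions of this sketch.

import torch
import torch.nn as nn

class DBWEB(nn.Module):
    def __init__(self, channels=48):
        super().__init__()
        self.point_branch = nn.Conv2d(channels, channels, kernel_size=1)    # 1x1 convolution branch
        self.depth_branch = nn.Conv2d(channels, channels, kernel_size=3,
                                      padding=1, groups=channels)           # 3x3 depth-wise branch
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        a = self.point_branch(x)
        b = self.depth_branch(x)
        gated = torch.sigmoid(a) * b      # Sigmoid gate, multiplied element-wise with the other branch
        return self.out_conv(gated)       # final 1x1 convolution gives the DBWEB output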
6. After analyzing the structure of the network model and the low-light image data, it is found that in the encoder stage, after the low-light image passes through LLAB modules and downsampling, much of its detail information is lost, and the encoder structure of the model is unfavorable for the fusion of multi-scale information. The invention therefore designs a multi-scale feature compensation module (MSFCB) to ensure the quality of the final restored image; its structure is shown in FIG. 4. The MSFCB consists of 3 MSC modules, whose input feature matrices are all the output of the first LLAB module of the DAMFFN encoder. The output feature matrices of the 3 MSC modules have different sizes, and each is added to the corresponding downsampling result of the DAMFFN encoder to complete the fusion of features at different scales. The expression is as follows:
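By way of illustration only, the following sketch shows the compensation idea: three MSC branches all read the output of the first LLAB block, bring it to the three coarser encoder scales, and the results are added to the corresponding down-sampled encoder features. The internal structure of an MSC module and the channel widths per scale are not detailed in the text, so the strided convolutions and the channel tuple below are assumptions of this sketch.

import torch.nn as nn

class MSFCB(nn.Module):
    def __init__(self, channels=(48, 96, 192, 384)):
        super().__init__()
        # One placeholder MSC branch per coarser scale (stride 2, 4, 8).
        self.msc = nn.ModuleList([
            nn.Conv2d(channels[0], channels[i + 1], kernel_size=3,
                      stride=2 ** (i + 1), padding=1)
            for i in range(3)
        ])

    def forward(self, first_llab_out, encoder_feats):
        # encoder_feats: the three down-sampled encoder outputs, ordered finest to coarsest.
        return [feat + branch(first_llab_out)
                for branch, feat in zip(self.msc, encoder_feats)]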
7. The network model designed by the invention is a multi-scale low-light image enhancement network, so the invention designs a multi-scale loss function (Mult-Loss). Mult-Loss consists of a multi-scale MSE loss function and a multi-scale frequency-domain loss function. The multi-scale MSE loss function (Mult-MSE) calculates the mean square error between the output image of each decoding layer and the corresponding reference image; the multi-scale frequency-domain loss function (MSFD) reduces the difference in frequency-domain space between the output image of each decoding layer and the corresponding reference image.
The method of calculating the multiscale MSE loss function (Mult-MSE) is as follows:
where nums = 4 denotes the 4 outputs of the decoder, H denotes the height of the feature matrix, W denotes the width of the feature matrix, the remaining two terms denote the pixel values at corresponding coordinates of the enhanced image and the reference image, respectively, and Σ denotes the summation symbol.
The method for calculating the multi-scale frequency domain loss function (Mult-SFD) is as follows:
where x_i denotes the total number of pixels, f(·) denotes the fast Fourier transform, ‖·‖_1 denotes the L1 norm, and Σ denotes the summation symbol.
Finally, the total Loss function Loss of the present invention is shown below:
Loss = Loss_Mult-MSE + λ · Loss_Mult-SFD
where λ denotes the weight hyperparameter of Mult-SFD (λ = 0.1).
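As an illustrative sketch only, the total loss above can be computed as follows; resizing the reference image to each decoder scale with bilinear interpolation is an assumption of this sketch.

import torch
import torch.nn.functional as F

def multi_scale_loss(decoder_outputs, reference, lam=0.1):
    # decoder_outputs: the nums = 4 multi-scale outputs of the decoder.
    mse_total, sfd_total = 0.0, 0.0
    for out in decoder_outputs:
        ref = F.interpolate(reference, size=out.shape[-2:], mode="bilinear", align_corners=False)
        mse_total = mse_total + F.mse_loss(out, ref)    # Loss_Mult-MSE term
        # Loss_Mult-SFD term: L1 distance between the fast Fourier transforms
        sfd_total = sfd_total + torch.mean(torch.abs(torch.fft.fft2(out) - torch.fft.fft2(ref)))
    return mse_total + lam * sfd_total                  # Loss = Loss_Mult-MSE + λ·Loss_Mult-SFD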
In one embodiment:
the DAMFFN is implemented based on the Python 3.9 and Pytorch 1.21.1 environments. The invention performs data enhancement operations such as turning over and rotating (rotating an image by 90 degrees, 180 degrees or 270 degrees) on training data. During training, the batch size is 1, and the selected 1-image is randomly cut into a slice size of 256×256×3. Gradient optimization was performed using an AdamW optimizer with a motion term β1=0.9 and β2=0.999. The initial learning rate is set to 3e-4, the weight decay is set to 0.5e-4, and the learning rate is stepped down to 1e-6 using a cosine decay strategy. The invention uses peak signal to noise ratio (PSNR), structural Similarity (SSIM) and Natural Image Quality Estimator (NIQE) to estimate the performance of the model, and performs qualitative and quantitative experimental comparison on LOL, MIT-Adobe FiveK and LIME, MEF, DICM low-light image data sets with a plurality of existing low-light image enhancement algorithms. The PSNR evaluates the quality of an image by calculating a mean square error between a low-light image enhanced by a network model and a reference image (group-trunk image). The higher the PSNR value, the better the quality of the enhanced image. SSIM evaluates the similarity between the enhanced image and a reference image (group-trunk image) from three aspects of brightness, contrast, and structure. The larger the SSI M value, the more similar the enhanced image is to the reference image (group-trunk image), and the better the network model effect. The lower the NIQE value, the closer the image is to the natural image, and the higher the image quality. In the implementation of the DAMFFN, the experimental device of the present invention is configured as 16GB NVIDIA Quadro RTX5000 GPU,C =48, n1=2, n2=n3=4, n4=8.
The LOL dataset is a common real-world dataset in the field of low-light image enhancement, containing a large number of indoor and outdoor low-light scenes for a total of 500 pairs of low-light/normal-light images. As shown in Table 1 below, our method achieves the best performance on both PSNR and SSIM. The result of the invention is PSNR = 24.87 dB and SSIM = 0.856, with the PSNR value exceeding that of MIRNetv2 by 0.13 dB and the SSIM value by 0.005.
Table 1 quantitative comparison of LOL test set on PSNR and SSIM indicators (optimal outcome marker, suboptimal outcome marker #)
The MIT-Adobe FiveK dataset contains 5000 captured images, with retouched versions from 5 experts used as label images. As with other low-light image enhancement algorithms trained on this dataset, the invention adopts the color-rendering result of expert C as the ground truth, using the first 4500 pairs as the training set and the remaining 500 pairs as the test set. As shown in Table 2, by comparison with existing advanced algorithms, the invention achieves the best result at PSNR = 25.78 dB (exceeding MIRNetv2 by 0.73 dB) and a strong result at SSIM = 0.912.
TABLE 2 quantitative comparison of MIT-Adobe FiveK test set on PSNR and SSIM indicators (best results marker, suboptimal results marker #)
Referring to FIG. 6, it can be seen from several qualitative comparisons that the algorithm proposed by the invention obtains better visual results. Images enhanced by most algorithms still contain significant noise, for example KinD and MIRNetv2. Some algorithms produce results with insufficient overall brightness, such as Zero-DCE and EnlightenGAN. The Retinex-Net method increases the brightness of the whole image but at the same time adds much noise, blurring the image. In contrast, the algorithm of the invention improves the overall brightness of the image and removes noise from the enhanced image while preserving image quality. The restored image has better visibility and contrast and clear texture.
As the comparison in Table 3 shows, the invention performed 6 ablation experiments to determine the effectiveness and importance of each module. The experiments prove that DAMFFN performs best when all modules are added. As shown in Table 3, on the LOL dataset, when only LL-MSAB or only DBWEB is used, the performance of the network model is not optimal and differs considerably from the best performance. The network model performs worst when the LL-MSAB module is absent, which is sufficient to show that the LL-MSAB module improves the performance of the whole network model. With only the LL-MSAB module, the network model reaches PSNR = 23.25 dB and SSIM = 0.812. When the DBWEB module is added, the PSNR value of the network model hardly changes, but the SSIM value increases by 0.003, because the Sigmoid function in DBWEB increases the differences between pixels and the contrast of local regions, so DBWEB increases the structural similarity of the whole image. When only the MSFCB module and the LL-MSAB module are added, the performance of DAMFFN drops considerably, which further illustrates the effectiveness of the DBWEB module. After the MSFCB module is added, the PSNR value of the network model increases from 23.22 dB to 24.87 dB and the SSIM from 0.842 to 0.856, i.e., PSNR increases by 1.65 dB and SSIM by 0.014, because the MSFCB module fully compensates for the loss of low-light image detail information in the LLAB and downsampling stages and fuses deep spatial information of images at different scales. This also demonstrates that the MSFCB module greatly improves the performance of the model.
Table 3 quantitative results of ablation experiments on PSNR and SSIM metrics for components of the DAMFFN network model on the LOL test set
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and descriptions above merely illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (6)
1. A low-light image enhancement method for fusion of deep convolution attention and multi-scale features, characterized by comprising the following steps:
constructing a paired data set, wherein the data set comprises low-illumination images and normal-illumination images, and each low-illumination image corresponds to the normal-illumination image of the same scene;
taking a normal illumination image as a reference image, and extracting shallow layer characteristics of a low illumination image through an Embedding module;
inputting the shallow features into a DAMFFN encoder, the DAMFFN encoder comprising an LLAB module, a downsampling module, and an MSFCB module; the LLAB module comprises an LL-MSAB module and a DBWEB module;
extracting semantic information of the low-light image through the LL-MSAB module, and balancing feature weights among different channels so as to improve the visibility of the image;
the contrast of the low-light image is improved through the DBWEB module;
repairing the image quality and compensating the lost detail information through an MSFCB module;
in the DAMFFN decoder, reducing the difference in frequency domain space between the enhanced image and the reference image by a multi-scale frequency domain loss function;
and outputting the final enhanced low-light image.
2. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the Embedding module uses a convolution module to increase the channel count of the low-light image from 3 to 48.
3. The method for enhancing a low-light image by fusion of deep convolution attention and multi-scale features according to claim 1, wherein the extracting semantic information of a low-light image and equalizing feature weights among different channels by using an LL-MSAB module to improve the visibility of the image comprises:
shallow feature X_input is fed into a 3×3 depth-wise convolution layer to extract image semantic information and implicitly model the local relations between pixels within a channel, obtaining X, which is directly assigned to X_Q, X_K and X_V;
X_Q, X_K and X_V are reshaped into three identical feature matrices Q, K and V, respectively, as shown below:
X_Q = X_input · W_Q,  X_K = X_input · W_K,  X_V = X_input · W_V
W_Q = W_K = W_V,  X_Q = X_K = X_V = X
Q = Reshape(X_Q),  K = Reshape(X_K),  V = Reshape(X_V)
Q = K = V
X_input denotes the input feature matrix of the LL-MSAB module, W_Q, W_K and W_V denote the weight matrices of the 3×3 depth-wise convolution layer, and Reshape(·) denotes the reshaping operation;
Q = [Q_1, ..., Q_N], K = [K_1, ..., K_N] and V = [V_1, ..., V_N] are divided into N heads head_j, the channel dimension of each head_j being dim_h = C/N, where C denotes the total number of channels of the feature matrix; LL-MSAB computes a channel attention map Atten_j for each head, expressed as follows:
head_j = V_j · Atten_j
the N heads head_j are spliced and connected into a linear projection, the obtained feature matrix is reshaped, and the reshaped result is finally fed into a 1×1 convolution layer to obtain the output X_output of LL-MSAB, expressed as follows:
Conv1(·) denotes a 1×1 convolution layer, Concat(·) denotes the splicing operation, W is a learnable parameter, and Reshape(·) denotes the reshaping operation.
4. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the method for improving the contrast of the low-light image through the DBWEB module comprises the following steps:
duplicating the input image into two parts, and respectively passing through a 1x1 convolution layer and a 3x3 depth convolution layer;
performing point multiplication on the result of one branch processed by the Sigmoid activation function and the result of the other branch processed by the Sigmoid activation function;
and (3) the dot multiplication result passes through a 1x1 convolution layer to obtain the output of the DBWEB, wherein the output is expressed as follows:
5. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the MSFCB module comprises three MSC modules, and the input feature matrixes of the three MSC modules are the output of the first LLAB module of the DAMFFN encoder; the output feature matrixes of the 3 MSC modules are different in size, and the output feature matrixes of the MSC modules are respectively added with the downsampling results corresponding to the DAMFFN encoder to finish the fusion of the features with different scales.
6. A method of low-light image enhancement for deep convolution attention and multi-scale feature fusion according to claim 1, wherein: the multi-scale MSE loss function (Mult-MSE) is expressed as follows:
where nums = 4 denotes the 4 outputs of the decoder, H denotes the height of the feature matrix, W denotes the width of the feature matrix, the remaining two terms denote the pixel values at corresponding coordinates of the enhanced image and the reference image, respectively, and Σ denotes the summation symbol,
the method for calculating the multi-scale frequency domain loss function is as follows:
where x_i denotes the total number of pixels, f(·) denotes the fast Fourier transform, ‖·‖_1 denotes the L1 norm, and Σ denotes the summation symbol,
the total Loss function Loss is as follows:
Loss = Loss_Mult-MSE + λ · Loss_Mult-SFD
where λ denotes the weight hyperparameter of Mult-SFD (λ = 0.1).
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310139997.8A CN116091357A (en) | 2023-02-20 | 2023-02-20 | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features |
NL2034901A NL2034901A (en) | 2023-02-20 | 2023-05-23 | Depth-wise convolution attention and multi-scale feature fusion network for low-light image enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310139997.8A CN116091357A (en) | 2023-02-20 | 2023-02-20 | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116091357A true CN116091357A (en) | 2023-05-09 |
Family
ID=86204433
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310139997.8A Pending CN116091357A (en) | 2023-02-20 | 2023-02-20 | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116091357A (en) |
NL (1) | NL2034901A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117011194A (en) * | 2023-10-07 | 2023-11-07 | 暨南大学 | Low-light image enhancement method based on multi-scale dual-channel attention network |
CN118038025A (en) * | 2024-03-22 | 2024-05-14 | 重庆大学 | Foggy weather target detection method, device and equipment based on frequency domain and space domain |
-
2023
- 2023-02-20 CN CN202310139997.8A patent/CN116091357A/en active Pending
- 2023-05-23 NL NL2034901A patent/NL2034901A/en unknown
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117011194A (en) * | 2023-10-07 | 2023-11-07 | 暨南大学 | Low-light image enhancement method based on multi-scale dual-channel attention network |
CN117011194B (en) * | 2023-10-07 | 2024-01-30 | 暨南大学 | Low-light image enhancement method based on multi-scale dual-channel attention network |
CN118038025A (en) * | 2024-03-22 | 2024-05-14 | 重庆大学 | Foggy weather target detection method, device and equipment based on frequency domain and space domain |
Also Published As
Publication number | Publication date |
---|---|
NL2034901A (en) | 2024-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | An experimental-based review of image enhancement and image restoration methods for underwater imaging | |
Zhang et al. | Underwater image enhancement via weighted wavelet visual perception fusion | |
He et al. | Guided image filtering | |
CN116091357A (en) | Low-light image enhancement method for fusion of depth convolution attention and multi-scale features | |
CN111161360B (en) | Image defogging method of end-to-end network based on Retinex theory | |
CN114066747B (en) | Low-illumination image enhancement method based on illumination and reflection complementarity | |
Shen et al. | Convolutional neural pyramid for image processing | |
CN111861896A (en) | UUV-oriented underwater image color compensation and recovery method | |
JP7493867B1 (en) | Low-light image enhancement method based on deep Retinex | |
CN111210395A (en) | Retinex underwater image enhancement method based on gray value mapping | |
Yang et al. | Low-light image enhancement based on Retinex theory and dual-tree complex wavelet transform | |
Rahman et al. | Diverse image enhancer for complex underexposed image | |
Ma et al. | Underwater image restoration through a combination of improved dark channel prior and gray world algorithms | |
Zhang et al. | Underwater image enhancement using improved generative adversarial network | |
Tao et al. | An effective and robust underwater image enhancement method based on color correction and artificial multi-exposure fusion | |
Zhou et al. | An improved algorithm using weighted guided coefficient and union self‐adaptive image enhancement for single image haze removal | |
Gao et al. | Image Dehazing Based on Multi-scale Retinex and Guided Filtering | |
CN113066023A (en) | SAR image speckle removing method based on self-calibration convolutional neural network | |
Yuan et al. | Defogging Technology Based on Dual‐Channel Sensor Information Fusion of Near‐Infrared and Visible Light | |
GUAN et al. | A dual-tree complex wavelet transform-based model for low-illumination image enhancement | |
CN116563133A (en) | Low-illumination color image enhancement method based on simulated exposure and multi-scale fusion | |
Xie et al. | DHD-Net: A novel deep-learning-based dehazing network | |
Chen et al. | HCSAM-Net: multistage network with a hybrid of convolution and self-attention mechanism for low-light image enhancement | |
Subramani et al. | Pixel intensity optimization and detail-preserving contextual contrast enhancement for underwater images | |
Kim | Edge-preserving and adaptive transmission estimation for effective single image haze removal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |