CN114708163A - Low-illumination image enhancement model based on linear attention mechanism - Google Patents

Low-illumination image enhancement model based on linear attention mechanism

Info

Publication number
CN114708163A
Authority
CN
China
Prior art keywords
attention
linear
image enhancement
low
illumination image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210337183.0A
Other languages
Chinese (zh)
Inventor
刘晴
李玉鑑
张乐乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202210337183.0A priority Critical patent/CN114708163A/en
Publication of CN114708163A publication Critical patent/CN114708163A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a low-illumination image enhancement model based on a linear attention mechanism, belonging to the technical field of deep learning. The model is characterized in that linear self-attention is introduced so that a 3-D global attention weight can be inferred directly from the feature map. By refining the feature map, the convolution operation can establish long-range dependencies, which improves the performance of the convolutional neural network, allows richer high-level features to be captured to improve model performance, reduces the number of parameters, and lowers complexity and cost. The invention improves the model structure by adding the attention module and effectively addresses the problem of low-illumination image enhancement.

Description

Low-illumination image enhancement model based on linear attention mechanism
Technical Field
The invention relates to the field of computer vision and low-illumination image processing, in particular to a low-illumination image enhancement model based on a linear attention mechanism.
Background
It is often necessary in everyday life to capture images in low-light conditions, such as at night or in dimly lit indoor environments. Images taken in such environments often suffer from poor visibility, low contrast, heavy noise, and similar problems. While auto-exposure mechanisms (e.g., ISO, shutter, flash) may increase image brightness, they may also introduce other artifacts such as blur and over-saturation. This negatively impacts both the human visual experience and downstream visual tasks such as object detection, visual recognition, and video surveillance. Since most solutions to these tasks are designed for well-exposed images, there is a need for an effective method of improving the quality of low-light images.
With the development of low-illumination image enhancement and recognition technology, researchers in this field continue to update their methods, but current approaches still leave many gaps and problems to be improved. In low-illumination image enhancement, insufficient detail, incomplete retention of semantic information, and distortion artifacts still occur. In low-illumination image recognition, it is difficult to obtain enough recognizable information from a low-quality picture, and the task is mostly completed with two separate models, which results in a large workload and a lack of information for low-light image recognition. Low-light images degraded by environmental or technical limitations suffer from various problems such as underexposure and high ISO noise. Other methods require too many network parameters or have excessive overall complexity. Such images tend to lose features and contrast, which harms low-level perceptual quality and degrades high-level computer vision tasks that depend on accurate semantic information.
Deep-learning-based methods have shown excellent results in many image processing tasks. In computer vision, attention-based methods can focus on semantic information that is meaningful for the current task and can learn two-dimensional spatial weights from spatial information at different positions. However, deep-learning-based methods may also lack generalization ability and introduce new problems, such as high complexity and difficulty in processing high-resolution images. It is therefore necessary to develop more general algorithms that achieve better image quality.
Disclosure of Invention
The invention provides a low-illumination image enhancement model based on a linear attention mechanism, characterized in that linear self-attention is introduced so that a 3-D global attention weight can be inferred directly from the feature maps, which are then refined. By refining the feature maps, the convolution operation can establish long-range dependencies, improving the performance of the convolutional neural network, capturing richer high-level features to improve model performance, reducing the number of parameters, and lowering complexity and cost.
The technical scheme adopted by the method comprises the following steps:
Step 1: design a convolutional neural network that can be trained end to end;
Step 2: initialize the convolutional neural network of step 1 with the Kaiming network parameter initialization method;
Step 3: linear attention encodes the feature map into two-dimensional feature codes along the vertical and horizontal directions, respectively;
Step 4: construct a global representation using a self-attention mechanism;
Step 5: generate the 3-D global attention weight with a multilayer perceptron (MLP) and a sigmoid activation function;
Step 6: evaluate the obtained algorithm and output the corresponding test results.
Further, in step 2, in order to focus on the features that most affect low-illumination images, the network embeds a spatial attention module and a channel attention module, and uses residual connections and dense connections among its layers.
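As a rough illustration of steps 1 and 2, the following PyTorch sketch applies Kaiming initialization to the convolutional layers of an end-to-end network; the small backbone shown here is a placeholder for illustration only and is not the architecture of the invention.

import torch
import torch.nn as nn

class EnhanceNet(nn.Module):
    """Placeholder end-to-end enhancement backbone (illustrative only)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

def kaiming_init(model: nn.Module) -> None:
    """Step 2: Kaiming (He) initialization of all convolutional layers."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)

net = EnhanceNet()
kaiming_init(net)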
Compared with traditional low-illumination image enhancement models, the low-illumination image enhancement model based on the linear attention mechanism has the following advantages.
(1) The self-attention mechanism is integrated into the deep network model, improving the ability of deep learning to capture image details and edge contours; across varied scenes and extensive image content, the method can adaptively improve image quality.
(2) The attention mechanism provided by the invention enables the convolution operation to establish long-range dependencies by refining the feature map, thereby improving the performance of the convolutional neural network.
(3) The invention has fewer parameters, reduces cost, and improves the generality of the network.
Drawings
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a schematic diagram of the low-illumination image enhancement model network based on the linear attention mechanism according to the present invention.
Fig. 2 is a schematic diagram of a residual module.
Fig. 3 shows the output image after an original image is enhanced using the image enhancement method provided by the embodiment of the invention.
Detailed Description
The method of the present invention is described in detail with reference to the accompanying drawings and examples. It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Denote the input of the attention module as the convolution feature map x ∈ R^{C×H×W} from the previous hidden layer, and change its dimension to x ∈ R^{C×N}, where C is the number of channels and N = H×W. First, two 1×1 convolutions are applied to the input x to perform the operations f(x) = W_f·x and g(x) = W_g·x; f(x) and g(x) represent two feature spaces obtained by multiplying the image features by different weight matrices W_f and W_g. The two tensors are converted into matrix form, the transpose of f(x) is multiplied by g(x), and a softmax operation is performed on the result to obtain the attention map β:

β_{j,i} = exp(s_{ij}) / Σ_{i=1..N} exp(s_{ij}),  with  s_{ij} = f(x_i)^T g(x_j).

β_{j,i} represents the extent to which the model attends to image content region i when synthesizing region j; the more similar the feature representations of two locations are, the stronger the correlation between them. Meanwhile, to integrate global and local information, the input x is fed into a 1×1 convolution that performs the linear transformation h(x) = W_h·x, giving the feature map h(x). The attention map β and h(x) are multiplied to obtain the self-attention feature map, denoted o, and its shape is changed back to C×H×W:

o_j = Σ_{i=1..N} β_{j,i} · h(x_i).

Finally, the output of the attention layer is obtained as:

y_i = γ·o_i + x_i.

To balance neighborhood information and the correlation with long-distance features, a parameter γ initialized to 0 is introduced; its weight is updated through gradual learning, so that the network first focuses on neighborhood information and then associates it with features at other global positions. The self-attention module therefore has the ability to associate global information and establish long-distance dependencies.
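As an illustration, the self-attention computation described above can be sketched in PyTorch as follows; the channel-reduction factor and layer sizes are assumptions made for the sketch rather than values fixed by the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention block following the f/g/h formulation described above."""
    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        inner = max(in_channels // reduction, 1)
        self.f = nn.Conv2d(in_channels, inner, kernel_size=1)         # f(x) = W_f x
        self.g = nn.Conv2d(in_channels, inner, kernel_size=1)         # g(x) = W_g x
        self.h = nn.Conv2d(in_channels, in_channels, kernel_size=1)   # h(x) = W_h x
        self.gamma = nn.Parameter(torch.zeros(1))                     # gamma initialized to 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, height, width = x.shape
        n = height * width
        f = self.f(x).view(b, -1, n)                                  # B x C' x N
        g = self.g(x).view(b, -1, n)                                  # B x C' x N
        h = self.h(x).view(b, c, n)                                   # B x C  x N
        s = torch.bmm(f.transpose(1, 2), g)                           # s_ij = f(x_i)^T g(x_j)
        beta = F.softmax(s, dim=1)                                    # attention map beta, B x N x N
        o = torch.bmm(h, beta).view(b, c, height, width)              # o_j = sum_i beta_{j,i} h(x_i)
        return self.gamma * o + x                                     # y = gamma * o + x

Such a block can be inserted into a convolutional backbone after any intermediate feature map with the matching number of channels.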
Fig. 1 shows a schematic diagram of a low-illumination image enhancement model network based on a linear self-attention mechanism. The general flow of the method is as follows.
The linear self-attention (LASA) module can be seen as a stand-alone computing unit that enhances the expressive power of convolutional neural networks, and it can be integrated into any other network as a plug-and-play module.
For a given feature map F ∈ R^{C×H×W}, the LASA can directly infer a 3-D weight F_attention ∈ R^{C×H×W} with global information to refine the feature map.
The refined feature map can be calculated as: F' = F ⊙ F_attention,
where ⊙ denotes element-by-element multiplication, and C, H and W denote the number of channels, the height and the width of the feature map, respectively. For linear attention, we first encode the feature map F ∈ R^{C×H×W} into a pair of two-dimensional feature codes F_x ∈ R^{C×1×W} and F_y ∈ R^{C×H×1} along the longitudinal and transverse axes, which can be expressed as:
F_x(c, 1, w) = (1/H) Σ_{h=1..H} F(c, h, w),
F_y(c, h, 1) = (1/W) Σ_{w=1..W} F(c, h, w).
Next, we use a matrix transformation operation to reshape F_x ∈ R^{C×1×W} and F_y ∈ R^{C×H×1} into F_x ∈ R^{1×C×W} and F_y ∈ R^{1×C×H}.
We then concatenate the feature maps F_x ∈ R^{1×C×W} and F_y ∈ R^{1×C×H} along the last (spatial) dimension to obtain a new feature map F_xy ∈ R^{1×C×(H+W)}. F_xy is expanded to three times its original number of channels and then divided along the channel dimension into the partitions Q, K and V. The global relation of the feature map can then be expressed as:
F_g = Attention(Q, K, V),  the self-attention relation computed among the partitions Q, K and V.
after computing the global relationships of the feature maps, we employ a residual learning strategy to facilitate gradient flow. Finally, the attention weight is calculated as:
F_attention = σ(MLP(F_xy + F_g)),
where MLP is a multilayer perceptron and σ is a sigmoid function.
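A rough PyTorch sketch of the LASA flow described above is given below. The axis-wise average pooling used for the encodings, the 1×1 convolution that expands the channels to three times before the Q/K/V split, the scaled dot-product form of the global relation, and the broadcasting of a per-channel weight instead of a full spatially varying 3-D weight are all simplifying assumptions; only the overall flow (axis encoding, Q/K/V split, residual connection, MLP with sigmoid, element-wise refinement) follows the description.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSelfAttention(nn.Module):
    """Sketch of LASA: infer an attention weight F_attention and refine F."""
    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.expand = nn.Conv1d(channels, 3 * channels, kernel_size=1)  # -> Q, K, V (assumed 1x1 conv)
        self.mlp = nn.Sequential(                                        # MLP producing the weight
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        fx = feat.mean(dim=2)                        # F_x: B x C x W (pool along the vertical axis)
        fy = feat.mean(dim=3)                        # F_y: B x C x H (pool along the horizontal axis)
        fxy = torch.cat([fx, fy], dim=2)             # F_xy: B x C x (H + W)
        q, k, v = self.expand(fxy).chunk(3, dim=1)   # split into Q, K, V along the channel dimension
        attn = F.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)  # assumed scaled dot-product relation
        g = v @ attn.transpose(1, 2)                 # global relation F_g: B x C x (H + W)
        g = (fxy + g).mean(dim=2)                    # residual connection, then squeeze to B x C
        weight = torch.sigmoid(self.mlp(g))          # sigma(MLP(...)), per-channel weight in [0, 1]
        f_attention = weight.view(b, c, 1, 1).expand(b, c, h, w)    # broadcast to the feature-map shape
        return feat * f_attention                    # F' = F (element-wise) F_attention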
The loss function includes the following components.
The image content loss L_content is defined on the high-level features extracted by the conv5_2 layer of a pre-trained VGG-19 network:

L_content = || φ_{5_2}(Î) − φ_{5_2}(I) ||,

where Î denotes the predicted image, I the standard (reference) image, and φ_{5_2}(·) the conv5_2 feature map.
L_MS-SSIM is a multi-scale structural loss function:

L_MS-SSIM = 1 − [l_M(Î, I)]^α · Π_{m=1..M} [cs_m(Î, I)]^{β_m},

with l(Î, I) = (2·μ_Î·μ_I + c_1) / (μ_Î^2 + μ_I^2 + c_1) and cs(Î, I) = (2·σ_ÎI + c_2) / (σ_Î^2 + σ_I^2 + c_2), where M represents the number of image scales, μ_Î and μ_I are the means of the predicted image and the standard image, σ_Î and σ_I are their standard deviations, σ_ÎI is the covariance between the two images, α and β_m are weight coefficients, and c_1 and c_2 are two constants.
A feature loss is computed with the L1 distance on VGG features:

L_fea = Σ_i D(φ_i(Î), φ_i(I)),

where D(x, y) is the L1 distance and φ_i is the i-th hidden feature from the VGG model.
L_MIX is the global loss function:

L_MIX = λ_1·L_content + λ_2·L_MS-SSIM + λ_3·L_fea,

where λ_1, λ_2 and λ_3 are weight coefficients that balance the importance of each term in L_MIX.
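The mixed loss could be assembled roughly as in the following PyTorch sketch. The VGG-19 feature slice (assumed to end at torchvision index 30, i.e. conv5_2), the single-scale global SSIM term standing in for the multi-scale structural loss, the plain L1 pixel term standing in for the feature loss, and the default weights are all illustrative assumptions rather than the exact formulation of the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class MixLoss(nn.Module):
    """Sketch of L_MIX = l1*content + l2*(1 - SSIM) + l3*L1 (composition and weights are assumptions)."""
    def __init__(self, l1: float = 1.0, l2: float = 1.0, l3: float = 1.0):
        super().__init__()
        # VGG-19 features up to conv5_2 (index 30 is an assumption; verify against torchvision).
        # ImageNet normalization of the inputs is omitted for brevity.
        self.vgg = vgg19(weights="IMAGENET1K_V1").features[:31].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.l1, self.l2, self.l3 = l1, l2, l3

    @staticmethod
    def global_ssim(x: torch.Tensor, y: torch.Tensor, data_range: float = 1.0) -> torch.Tensor:
        # Single-window SSIM from the means, variances and covariance (simplified, not multi-scale).
        c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
        mx, my = x.mean(dim=(1, 2, 3)), y.mean(dim=(1, 2, 3))
        vx, vy = x.var(dim=(1, 2, 3)), y.var(dim=(1, 2, 3))
        cov = ((x - mx.view(-1, 1, 1, 1)) * (y - my.view(-1, 1, 1, 1))).mean(dim=(1, 2, 3))
        return ((2 * mx * my + c1) * (2 * cov + c2) /
                ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))).mean()

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        content = F.l1_loss(self.vgg(pred), self.vgg(target))   # content loss on conv5_2 features
        structural = 1.0 - self.global_ssim(pred, target)       # structural (SSIM-based) term
        pixel = F.l1_loss(pred, target)                          # plain L1 distance
        return self.l1 * content + self.l2 * structural + self.l3 * pixel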
To test the generalization ability of the network obtained in step 4, it is verified on a test set. The Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM) are used as evaluation indexes. PSNR is an objective standard for evaluating images and is often used as a measure of signal reconstruction quality; it measures the ratio between the peak signal energy and the average energy of the background noise, expressed in dB, with larger values indicating less distortion. Given a pair of images I and O, the PSNR is:

PSNR(I, O) = 10 · log_10( MAX_I^2 / MSE ),

where MSE is the mean square error between the two images and MAX_I is the maximum pixel value of I.
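For instance, the PSNR above can be computed directly from its definition; this NumPy sketch assumes floating-point images with a known peak value MAX_I.

import numpy as np

def psnr(i: np.ndarray, o: np.ndarray, max_i: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(MAX_I^2 / MSE)."""
    mse = np.mean((i.astype(np.float64) - o.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_i ** 2 / mse)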
PSNR evaluates image quality from the error between corresponding pixels and does not take the characteristics of human vision into account: the human eye is more sensitive to contrast differences at lower spatial frequencies and to luminance contrast differences, and its perception of a region is influenced by the surrounding neighboring regions. As a result, PSNR scores often disagree with subjective human judgement. SSIM is a full-reference image quality evaluation index that measures the similarity of images in terms of luminance, contrast and structure, and it agrees with human visual perception on the whole. SSIM is defined as follows:
SSIM(I, O) = [ (2·μ_I·μ_O + c_1)(2·σ_IO + c_2) ] / [ (μ_I^2 + μ_O^2 + c_1)(σ_I^2 + σ_O^2 + c_2) ],

where μ_I and σ_I^2 are the mean and variance of I; μ_O and σ_O^2 are the mean and variance of O; σ_IO is the covariance of I and O; c_1 = (0.01·L)^2 and c_2 = (0.03·L)^2, where 0.01 and 0.03 are fixed values and L is the range of pixel values.
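A direct NumPy transcription of the SSIM definition above, computed globally over the whole image rather than over local windows (a simplification), might look like this:

import numpy as np

def ssim_global(i: np.ndarray, o: np.ndarray, l: float = 1.0) -> float:
    """Global SSIM from the means, variances and covariance of I and O."""
    c1, c2 = (0.01 * l) ** 2, (0.03 * l) ** 2
    i = i.astype(np.float64)
    o = o.astype(np.float64)
    mu_i, mu_o = i.mean(), o.mean()
    var_i, var_o = i.var(), o.var()
    cov_io = ((i - mu_i) * (o - mu_o)).mean()
    return ((2 * mu_i * mu_o + c1) * (2 * cov_io + c2) /
            ((mu_i ** 2 + mu_o ** 2 + c1) * (var_i + var_o + c2)))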
The present invention has been described in terms of the preferred embodiment, and it is not intended to be limited to the embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A low-illumination image enhancement model based on a linear self-attention mechanism, characterized in that: the linear self-attention method can directly infer a 3-D global attention weight from the feature maps and then refine the feature maps, wherein the refined feature maps implicitly couple local and global relationships by adjusting the local feature maps with the global weight, so that the cost of training and deploying the model can be reduced; the method specifically comprises the following steps:
1) firstly, designing a convolutional neural network capable of performing end-to-end training;
2) initializing the convolutional neural network in the step 1 by a Kaiming network parameter initialization method;
3) linear attention first encodes the feature map into two-dimensional feature codes along the vertical and horizontal directions, respectively;
4) constructing a global representation using a self-attention mechanism;
5) generating a 3-D global attention weight by a multilayer perceptron (MLP) and a sigmoid activation function;
6) and evaluating the obtained algorithm and outputting a corresponding test result.
2. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the convolutional neural network is designed for end-to-end training, embeds a channel attention module and a spatial attention module, and uses residual connections and dense connections in its network connections.
3. The linear attention mechanism-based low-illuminance image enhancement model of claim 1, wherein the convolutional neural network of 1) is initialized using a Kaiming network parameter initialization method.
4. For linear attention, the feature map F ∈ R^{C×H×W} is encoded into a pair of two-dimensional feature codes F_x ∈ R^{C×1×W} and F_y ∈ R^{C×H×1}.
5. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the loss values are calculated using multi-scale structural loss.
6. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the finally trained network is tested on a test set, and evaluation indexes adopted are Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM).
CN202210337183.0A 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism Pending CN114708163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210337183.0A CN114708163A (en) 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210337183.0A CN114708163A (en) 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism

Publications (1)

Publication Number Publication Date
CN114708163A true CN114708163A (en) 2022-07-05

Family

ID=82170067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210337183.0A Pending CN114708163A (en) 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism

Country Status (1)

Country Link
CN (1) CN114708163A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950649A (en) * 2020-08-20 2020-11-17 桂林电子科技大学 Attention mechanism and capsule network-based low-illumination image classification method
CN112435191A (en) * 2020-11-25 2021-03-02 西安交通大学 Low-illumination image enhancement method based on fusion of multiple neural network structures
CN113096017A (en) * 2021-04-14 2021-07-09 南京林业大学 Image super-resolution reconstruction method based on depth coordinate attention network model
CN114170095A (en) * 2021-11-22 2022-03-11 西安理工大学 Low-illumination image enhancement method combining Transformers and CNN


Similar Documents

Publication Publication Date Title
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN112614077B (en) Unsupervised low-illumination image enhancement method based on generation countermeasure network
CN111582483B (en) Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN111915526A (en) Photographing method based on brightness attention mechanism low-illumination image enhancement algorithm
CN111292264A (en) Image high dynamic range reconstruction method based on deep learning
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN114170286B (en) Monocular depth estimation method based on unsupervised deep learning
CN111738948B (en) Underwater image enhancement method based on double U-nets
CN110458765A (en) The method for enhancing image quality of convolutional network is kept based on perception
CN111047543A (en) Image enhancement method, device and storage medium
Fan et al. Multiscale cross-connected dehazing network with scene depth fusion
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
Feng et al. Low-light image enhancement algorithm based on an atmospheric physical model
CN114708615B (en) Human body detection method based on image enhancement in low-illumination environment, electronic equipment and storage medium
CN113781375A (en) Vehicle-mounted vision enhancement method based on multi-exposure fusion
CN113393510B (en) Image processing method, intelligent terminal and storage medium
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
CN116912114A (en) Non-reference low-illumination image enhancement method based on high-order curve iteration
CN116563141A (en) Mars surface image enhancement method based on convolutional neural network
CN116309171A (en) Method and device for enhancing monitoring image of power transmission line
CN114708163A (en) Low-illumination image enhancement model based on linear attention mechanism
CN115619674A (en) Low-illumination video enhancement method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination