CN114708163A - Low-illumination image enhancement model based on linear attention mechanism
- Publication number: CN114708163A (application CN202210337183.0A)
- Authority: CN (China)
- Prior art keywords: attention, linear, image enhancement, low-illumination image
- Prior art date: 2022-04-01
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06T5/00 Image enhancement or restoration
- G06T5/90 Dynamic range modification of images or parts thereof
- G06T5/94 Dynamic range modification based on local image properties, e.g. for local contrast enhancement
- G06F18/00 Pattern recognition
- G06F18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/22 Matching criteria, e.g. proximity measures
- G06N3/00 Computing arrangements based on biological models
- G06N3/045 Combinations of networks
- G06T7/00 Image analysis
- G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
Abstract
The invention discloses a low-illumination image enhancement model based on a linear attention mechanism, belonging to the technical field of deep learning. It is characterized in that: linear-array self-attention is introduced so that a 3-D global attention weight can be inferred directly from the feature map; by refining the feature map, convolution operations can establish long-range dependencies, which improves the performance of the convolutional neural network, captures richer high-level features to improve model performance, reduces the parameter count, and lowers complexity and cost. The invention improves the model structure by adding an attention module and effectively solves the problem of low-illumination image enhancement.
Description
Technical Field
The invention relates to the field of computer vision and low-illumination image processing, in particular to a low-illumination image enhancement model based on a linear attention mechanism.
Background
In everyday life it is often necessary to capture images under low-light conditions, such as at night or in dimly lit rooms. Images taken in such environments typically suffer from poor visibility, low contrast, and heavy noise. Auto-exposure mechanisms (e.g., high ISO, long shutter, flash) can raise image brightness, but they introduce side effects such as blur and oversaturation. This degrades the human visual experience and harms downstream vision tasks such as object detection, visual recognition, and video surveillance, since most solutions to these tasks are designed for well-exposed images. An effective method for improving the quality of low-light images is therefore needed.
With the development of low-illumination image enhancement and low-illumination image recognition technology, researchers in this field keep updating the technical methods, but current approaches still leave much room for improvement. During low-illumination image enhancement, insufficient detail, loss of semantic information, and distortion artifacts still occur. In low-illumination image recognition, it is difficult to extract enough recognizable information from a low-quality picture, and most pipelines split the work across two separate models, which increases the workload and loses information. Low-light images, degraded by environmental or hardware limitations, suffer from problems such as underexposure and high-ISO noise, and many existing networks require too many parameters and too much overall complexity. The resulting loss of features and contrast damages low-level perceptual quality and hurts high-level computer vision tasks that depend on accurate semantic information.
Deep-learning-based methods have shown excellent results across many image processing tasks. In computer vision, attention-based methods can focus on the semantic information that matters for the current task, and two-dimensional spatial weights can be learned from the spatial information at different positions. However, deep-learning-based methods can lack generalization ability and may introduce new problems, such as high complexity and difficulty in processing high-resolution images. It is therefore necessary to develop more general algorithms that achieve better image quality.
Disclosure of Invention
The invention provides a low-illumination image enhancement model based on a linear attention mechanism, characterized in that: by introducing linear-array self-attention, a 3-D global attention weight can be inferred directly from the feature map and used to refine it. Through this refinement, convolution operations can establish long-range dependencies, which improves the performance of the convolutional neural network, captures richer high-level features to improve model performance, reduces the parameter count, and lowers complexity and cost.
The technical scheme adopted by the invention comprises the following steps (a minimal sketch of the pipeline is given after the list):
Step 1: design a convolutional neural network that can be trained end-to-end;
Step 2: initialize the convolutional neural network of step 1 with the Kaiming network parameter initialization method;
Step 3: linear attention encodes the feature map into two-dimensional feature codes along the vertical and horizontal directions, respectively;
Step 4: construct a global representation using a self-attention mechanism;
Step 5: generate the 3-D global attention weight with a multilayer perceptron (MLP) and a sigmoid activation function;
Step 6: evaluate the obtained algorithm and output the corresponding test results.
Further, in step 1, in order to focus on the features that matter most in low-illumination images, the network embeds a spatial attention module and a channel attention module, and uses residual connections and dense connections in the network.
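A minimal PyTorch sketch of steps 1 and 2 follows. The layer counts, channel width, and output activation are illustrative assumptions; the patent specifies only an end-to-end CNN with embedded channel and spatial attention, residual and dense connections, and Kaiming initialization (the dense connections and the attention modules themselves are omitted here for brevity; see the LASA sketch later in the description for where an attention module would slot in).

```python
import torch
import torch.nn as nn

class EnhancementNet(nn.Module):
    """Illustrative end-to-end CNN for low-light enhancement (steps 1-2).

    The exact depth and attention placement are assumptions; the patent
    only fixes end-to-end trainability, embedded channel/spatial
    attention, residual/dense connections, and Kaiming initialization.
    """
    def __init__(self, channels: int = 64):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)
        self._init_weights()

    def _init_weights(self):
        # Step 2: Kaiming (He) initialization for all conv layers.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def forward(self, x):
        f = self.head(x)
        f = f + self.body(f)              # residual connection
        return torch.sigmoid(self.tail(f))  # enhanced image in [0, 1]
```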
Compared with the traditional low-illumination image enhancement model, the low-illumination image enhancement model based on the linear attention mechanism has the following advantages.
(1) The self-attention mechanism is combined into the deep network model, improving deep learning's ability to capture image details and edge contours; the model handles diverse scenes and varied image content and can adaptively improve image quality.
(2) The attention mechanism provided by the invention lets convolution operations establish long-range dependencies by refining the feature map, thereby improving the performance of the convolutional neural network.
(3) The invention uses fewer parameters, which reduces cost and improves the generality of the network.
Drawings
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a schematic diagram of the low-illumination image enhancement model network based on a linear attention mechanism according to the present invention.
Fig. 2 is a schematic diagram of a residual module.
Fig. 3 shows the output image after an original image is enhanced using the image enhancement method provided by the embodiment of the invention.
Detailed Description
The method of the present invention is described in detail with reference to the accompanying drawings and examples. It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Denote the input to the attention module as the convolution feature map x from the previous hidden layer, reshaped to x ∈ R^(C×N), where C is the number of channels and N = H×W is the number of spatial positions. First, two 1×1 convolutions are applied to the input x to perform f(x) = W_f·x and g(x) = W_g·x, which represent two feature spaces obtained by multiplying the image features with different weight matrices W_f and W_g. The two tensors are converted into matrix form, the transpose of f(x) is multiplied by g(x), and a softmax operation is applied to the product to obtain the attention map:

β_(j,i) = exp(s_ij) / Σ_(i=1..N) exp(s_ij), with s_ij = f(x_i)^T g(x_j),

where β_(j,i) represents the extent to which image content at area i contributes to composing area j; the more similar the feature representations of two locations, the stronger the correlation between them. Meanwhile, to integrate global and local information, the input x is fed through a 1×1 convolution performing the linear transformation h(x) = W_h·x to obtain a feature map. Multiplying the attention map β with h(x) yields the self-attention feature map, denoted o = (o_1, …, o_N) and reshaped back to C×H×W:

o_j = Σ_(i=1..N) β_(j,i)·h(x_i).

Finally, the output of the attention layer is obtained as:

y_i = γ·o_i + x_i.

To balance neighborhood information against long-distance features, the parameter γ is introduced with initialization γ = 0; its weight is updated through gradual learning, so that the network first focuses on neighborhood information and only later associates features at other global positions. The self-attention module therefore has the capability of associating global information and establishing long-distance dependencies.
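The PyTorch module below is a minimal sketch of the self-attention layer defined above. The reduction of the f/g branches to C/8 channels is a common convention and an assumption here; the text fixes only the three 1×1 convolutions, the softmax attention map, and the γ-weighted residual output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention layer matching the formulation above.

    Assumes in_channels >= 8 so the reduced f/g width C//8 is nonzero.
    """
    def __init__(self, in_channels: int):
        super().__init__()
        self.f = nn.Conv2d(in_channels, in_channels // 8, 1)  # f(x) = W_f x
        self.g = nn.Conv2d(in_channels, in_channels // 8, 1)  # g(x) = W_g x
        self.h = nn.Conv2d(in_channels, in_channels, 1)       # h(x) = W_h x
        self.gamma = nn.Parameter(torch.zeros(1))             # gamma init 0

    def forward(self, x):
        b, c, hgt, wdt = x.shape
        n = hgt * wdt
        fx = self.f(x).view(b, -1, n)                      # (B, C', N)
        gx = self.g(x).view(b, -1, n)                      # (B, C', N)
        hx = self.h(x).view(b, c, n)                       # (B, C,  N)
        # s_ij = f(x_i)^T g(x_j); softmax over i gives beta_(j,i)
        beta = F.softmax(fx.transpose(1, 2) @ gx, dim=1)   # (B, N, N)
        o = (hx @ beta).view(b, c, hgt, wdt)               # o_j = sum_i beta_(j,i) h(x_i)
        return self.gamma * o + x                          # y = gamma*o + x
```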
Fig. 1 shows a schematic diagram of the low-illumination image enhancement model network based on the linear self-attention mechanism. The general flow of the method is as follows. The linear-array self-attention (LASA) module can be seen as a stand-alone computing unit that enhances the expressive power of convolutional neural networks, and it can be integrated into any other network as a plug-and-play module.
For a given feature map F ∈ R^(C×H×W), LASA can directly infer a 3-D weight F_attention ∈ R^(C×H×W) carrying global information and use it to refine the feature map. The refined feature map is computed as:

F' = F ⊙ F_attention,

where ⊙ denotes element-wise multiplication and C, H, W denote the number of channels, height, and width of the feature map, respectively. For linear attention, the feature map F ∈ R^(C×H×W) is first encoded into a pair of two-dimensional feature codes along the vertical and horizontal axes, F_x ∈ R^(C×1×W) and F_y ∈ R^(C×H×1), obtained by aggregating F over the height and width dimensions respectively (for example, by average pooling):

F_x(c, 1, w) = (1/H) Σ_(h=1..H) F(c, h, w),  F_y(c, h, 1) = (1/W) Σ_(w=1..W) F(c, h, w).

Next, a reshape operation transforms F_x ∈ R^(C×1×W) and F_y ∈ R^(C×H×1) into F_x ∈ R^(1×C×W) and F_y ∈ R^(1×C×H). These two feature codes are spliced along the spatial dimension to obtain a new feature map F_xy ∈ R^(1×C×(H+W)). F_xy is then expanded to three times its original number of channels and divided into Q, K, and V along the channel dimension. The global relation of the feature map can be computed in the standard scaled-dot-product form:

A = softmax(Q·K^T / √d)·V,

where d is the channel dimension. After computing the global relations of the feature map, a residual learning strategy is adopted to facilitate gradient flow. Finally, the attention weight is calculated as:

F_attention = σ(MLP(A + F_xy)),

where MLP is a multilayer perceptron and σ is a sigmoid function; the two directional components can then be split apart and broadcast over the spatial dimensions to restore the 3-D weight F_attention ∈ R^(C×H×W).
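A minimal PyTorch sketch of the LASA computation described above. The average-pooling aggregation, the scaled-dot-product normalization, and the final broadcast of the two directional weights are assumptions where the text is not fully explicit; names such as `LASA` and `hidden` are illustrative.

```python
import math
import torch
import torch.nn as nn

class LASA(nn.Module):
    """Linear-array self-attention sketch: directional pooling, Q/K/V
    expansion, global attention, MLP + sigmoid, and element-wise
    refinement F' = F * F_attention."""
    def __init__(self, channels: int, hidden: int = None):
        super().__init__()
        hidden = hidden or channels
        self.qkv = nn.Conv1d(channels, 3 * channels, 1)  # expand to 3x channels
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(hidden, channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        fx = x.mean(dim=2)                    # (B, C, W): aggregate over height
        fy = x.mean(dim=3)                    # (B, C, H): aggregate over width
        fxy = torch.cat([fy, fx], dim=2)      # (B, C, H+W)
        q, k, v = self.qkv(fxy).chunk(3, dim=1)
        # global relation: softmax(QK^T / sqrt(d)) V over the H+W tokens
        attn = torch.softmax(q.transpose(1, 2) @ k / math.sqrt(c), dim=-1)
        a = (attn @ v.transpose(1, 2)).transpose(1, 2)   # (B, C, H+W)
        weight = torch.sigmoid(self.mlp(a + fxy))        # residual + MLP + sigmoid
        wy, wx = weight.split([h, w], dim=2)             # directional components
        f_att = wy.unsqueeze(3) * wx.unsqueeze(2)        # broadcast to (B, C, H, W)
        return x * f_att                                 # F' = F (*) F_attention
```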
The loss function includes the following components.

The image content loss L_content is defined on the high-level features extracted by the conv5_2 layer of a pre-trained VGG-19 network, measuring the distance between these features for the predicted image and the standard (reference) image.

L_MS-SSIM is a multi-scale structural loss function. With M scales, let μ_p and μ_g denote the means of the predicted image and the standard image, σ_p and σ_g their standard deviations, and σ_pg the covariance between the two images; α and β_m are weight coefficient terms, and c_1 and c_2 are two constants. In the standard form,

L_MS-SSIM = 1 − [ (2μ_p·μ_g + c_1) / (μ_p² + μ_g² + c_1) ]^α · Π_(m=1..M) [ (2σ_pg + c_2) / (σ_p² + σ_g² + c_2) ]^(β_m).

A perceptual term L_per accumulates feature distances across layers, Σ_i D(φ_i(p), φ_i(g)), where D(x, y) is the L1 distance and φ_i is the i-th hidden feature from the VGG model.

L_MIX is the global loss function combining the above terms,

L_MIX = λ_1·L_content + λ_2·L_MS-SSIM + λ_3·L_per,

where λ_1, λ_2, λ_3 are weight coefficients balancing the importance of the terms within L_MIX.
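A sketch of the combined loss, assuming the third-party pytorch_msssim package for the multi-scale structural term and torchvision's VGG-19 for the content term. The feature index 30 for conv5_2, the plain-image L1 stand-in for the per-layer perceptual term, and the default weights λ_i = 1 are all assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models
from pytorch_msssim import ms_ssim  # third-party package, assumed available

class MixLoss(nn.Module):
    """Sketch of L_MIX = l1*L_content + l2*L_MS-SSIM + l3*L_per.

    Inputs are assumed to be in [0, 1] and large enough for the five
    MS-SSIM downsampling levels (roughly > 160 px per side).
    """
    def __init__(self, l1: float = 1.0, l2: float = 1.0, l3: float = 1.0):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
        self.conv5_2 = nn.Sequential(*list(vgg.children())[:31])  # up to conv5_2 (index 30, assumed)
        for p in self.conv5_2.parameters():
            p.requires_grad_(False)
        self.l1, self.l2, self.l3 = l1, l2, l3

    def forward(self, pred, target):
        content = nn.functional.mse_loss(self.conv5_2(pred), self.conv5_2(target))
        structural = 1.0 - ms_ssim(pred, target, data_range=1.0)
        per = nn.functional.l1_loss(pred, target)  # stand-in for the per-layer VGG L1 term
        return self.l1 * content + self.l2 * structural + self.l3 * per
```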
To test the generalization ability of the trained network, it is verified on a test set. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) are used as evaluation indexes. PSNR is an objective standard for evaluating images, often used as a measure of signal reconstruction quality; it measures the ratio between the peak signal energy and the average noise energy, in dB, with larger values indicating less distortion. Given a reference image I and an output image O, the PSNR is:

PSNR(I, O) = 10·log_10( MAX_I² / MSE(I, O) ),

where MSE is the mean square error between the two images and MAX_I is the maximum pixel value of I.
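A direct NumPy implementation of the formula above (the function name and the 8-bit default for MAX_I are illustrative):

```python
import numpy as np

def psnr(reference: np.ndarray, output: np.ndarray, max_value: float = 255.0) -> float:
    """PSNR in dB for two images of identical shape."""
    mse = np.mean((reference.astype(np.float64) - output.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images: no distortion
    return 10.0 * np.log10(max_value ** 2 / mse)
```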
PSNR evaluates image quality from the error between corresponding pixels and does not take the visual characteristics of the human eye into account: the eye is more sensitive to contrast differences at lower spatial frequencies and to luminance contrast differences, and its perception of a region is influenced by the neighboring regions around it, so the PSNR ranking often disagrees with subjective human judgment. SSIM is a full-reference image quality evaluation index that measures the similarity of images in terms of luminance, contrast, and structure, and remains broadly consistent with human visual perception. SSIM is defined as follows:

SSIM(I, O) = (2μ_I·μ_O + c_1)(2σ_IO + c_2) / ((μ_I² + μ_O² + c_1)(σ_I² + σ_O² + c_2)),

where μ_I and σ_I² are the mean and variance of I; μ_O and σ_O² are the mean and variance of O; σ_IO is the covariance of I and O; c_1 = (k_1·L)² and c_2 = (k_2·L)², with k_1 and k_2 fixed at 0.01 and 0.03, respectively; and L is the range of pixel values.
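Both indexes are also available in scikit-image, which keeps the evaluation consistent with the standard definitions above (the `channel_axis` argument assumes scikit-image 0.19 or newer, and `data_range=255` assumes 8-bit images):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(reference, output):
    """Report the two evaluation indexes on a single image pair (HxWx3, uint8)."""
    return {
        'psnr': peak_signal_noise_ratio(reference, output, data_range=255),
        'ssim': structural_similarity(reference, output, channel_axis=-1, data_range=255),
    }
```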
The present invention has been described with reference to a preferred embodiment, but it is not limited to that embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (6)
1. A low-illumination image enhancement model based on a linear self-attention mechanism, characterized in that: the linear-array self-attention method can directly infer a 3-D global attention weight from the feature map and then refine the feature map; the refined feature map implicitly couples local and global relationships by using the global weight to adjust the local feature map, which reduces the cost of training and deploying the model; the model specifically comprises the following steps:
1) designing a convolutional neural network capable of end-to-end training;
2) initializing the convolutional neural network of step 1) with the Kaiming network parameter initialization method;
3) linear attention first encoding the feature map into two-dimensional feature codes along the vertical and horizontal directions, respectively;
4) constructing a global representation using a self-attention mechanism;
5) generating the 3-D global attention weight with a multilayer perceptron (MLP) and a sigmoid activation function;
6) evaluating the obtained algorithm and outputting the corresponding test results.
2. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the convolutional neural network designed for end-to-end training embeds a channel attention module and a spatial attention module, and uses residual connections and dense connections in its network connections.
3. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the convolutional neural network of step 1) is initialized using the Kaiming network parameter initialization method.
4. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein for linear attention the feature map F ∈ R^(C×H×W) is encoded into a pair of two-dimensional feature codes F_x ∈ R^(C×1×W) and F_y ∈ R^(C×H×1) along the longitudinal and transverse axes.
5. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the loss values are calculated using multi-scale structural loss.
6. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the finally trained network is tested on a test set, and evaluation indexes adopted are Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM).
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210337183.0A | 2022-04-01 | 2022-04-01 | Low-illumination image enhancement model based on linear attention mechanism
Publications (1)

Publication Number | Publication Date
---|---
CN114708163A | 2022-07-05
Family ID: 82170067

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210337183.0A (CN114708163A, pending) | Low-illumination image enhancement model based on linear attention mechanism | 2022-04-01 | 2022-04-01

Country Status (1)

Country | Link
---|---
CN | CN114708163A (en)
Citations (4)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN111950649A * | 2020-08-20 | 2020-11-17 | 桂林电子科技大学 | Attention mechanism and capsule network-based low-illumination image classification method
CN112435191A * | 2020-11-25 | 2021-03-02 | 西安交通大学 | Low-illumination image enhancement method based on fusion of multiple neural network structures
CN113096017A * | 2021-04-14 | 2021-07-09 | 南京林业大学 | Image super-resolution reconstruction method based on depth coordinate attention network model
CN114170095A * | 2021-11-22 | 2022-03-11 | 西安理工大学 | Low-illumination image enhancement method combining Transformers and CNN
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination