CN114708163A - Low-illumination image enhancement model based on linear attention mechanism - Google Patents

Low-illumination image enhancement model based on linear attention mechanism

Info

Publication number
CN114708163A
Authority
CN
China
Prior art keywords
attention
linear
image enhancement
low
illumination image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210337183.0A
Other languages
Chinese (zh)
Inventor
刘晴
李玉鑑
张乐乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202210337183.0A priority Critical patent/CN114708163A/en
Publication of CN114708163A publication Critical patent/CN114708163A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a low-illumination image enhancement model based on a linear attention mechanism, belonging to the technical field of deep learning. The model is characterized in that linear self-attention is introduced so that a 3-D global attention weight can be inferred directly from the feature map. By refining the feature map, the convolution operation can establish long-range dependencies, which improves the performance of the convolutional neural network, allows richer high-level features to be captured to improve model performance, reduces the number of parameters, and lowers complexity and cost. The invention improves the model structure by adding the attention module and effectively addresses the problem of low-illumination image enhancement.

Description

Low-illumination image enhancement model based on linear attention mechanism
Technical Field
The invention relates to the field of computer vision and low-illumination image processing, in particular to a low-illumination image enhancement model based on a linear attention mechanism.
Background
It is often necessary in everyday life to capture images in low-light conditions, such as at night or in dimly lit indoor environments. Images taken in such environments often suffer from poor visibility, low contrast, heavy noise, and similar problems. While auto-exposure mechanisms (e.g., ISO, shutter, flash) may increase image brightness, they may also introduce other artifacts such as blur and over-saturation. This negatively impacts both the human visual experience and downstream visual tasks such as object detection, visual recognition, and video surveillance. Since most solutions to these tasks are designed for well-exposed images, there is a need for an effective method of improving the quality of low-light images.
With the development of low-illumination image enhancement and recognition technology, researchers in this field continue to update their methods, but current approaches still leave many gaps and problems to be improved. In low-illumination image enhancement, insufficient detail, incomplete retention of semantic information, and distortion artifacts still occur. In low-illumination image recognition, it is difficult to obtain enough recognizable information from a low-quality picture, and the task is mostly completed with two separate models, which results in a large workload and a lack of information for low-light image recognition. Low-light images degraded by environmental or technical limitations suffer from various problems such as underexposure and high ISO noise. Other methods require too many network parameters or have excessive overall complexity. Such images tend to lose features and contrast, which harms low-level perceptual quality and degrades high-level computer vision tasks that depend on accurate semantic information.
Deep-learning-based methods have shown excellent results in many image processing tasks. In computer vision, attention-based methods can focus on semantic information that is meaningful for the current task and can learn two-dimensional spatial weights from spatial information at different positions. However, deep-learning-based methods may also lack generalization ability and introduce new problems, such as high complexity and difficulty in processing high-resolution images. It is therefore necessary to develop more general algorithms that achieve better image quality.
Disclosure of Invention
The invention provides a low-illumination image enhancement model based on a linear attention mechanism, characterized in that linear self-attention is introduced so that a 3-D global attention weight can be inferred directly from the feature maps, which are then refined. By refining the feature maps, the convolution operation can establish long-range dependencies, improving the performance of the convolutional neural network, capturing richer high-level features to improve model performance, reducing the number of parameters, and lowering complexity and cost.
The technical scheme adopted by the method comprises the following steps:
Step 1: design a convolutional neural network that can be trained end to end;
Step 2: initialize the convolutional neural network of step 1 with the Kaiming network parameter initialization method;
Step 3: linear attention encodes the feature map into two-dimensional feature codes along the vertical and horizontal directions, respectively;
Step 4: construct a global representation using a self-attention mechanism;
Step 5: generate the 3-D global attention weight with a multilayer perceptron (MLP) and a sigmoid activation function;
Step 6: evaluate the obtained algorithm and output the corresponding test results.
Further, in step 2, in order to focus on the features that most affect low-illumination images, the network embeds a spatial attention module and a channel attention module, and uses residual connections and dense connections among its layers.
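As a rough illustration of steps 1 and 2, the following PyTorch sketch applies Kaiming initialization to the convolutional layers of an end-to-end network; the small backbone shown here is a placeholder for illustration only and is not the architecture of the invention.

import torch
import torch.nn as nn

class EnhanceNet(nn.Module):
    """Placeholder end-to-end enhancement backbone (illustrative only)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

def kaiming_init(model: nn.Module) -> None:
    """Step 2: Kaiming (He) initialization of all convolutional layers."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)

net = EnhanceNet()
kaiming_init(net)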
Compared with traditional low-illumination image enhancement models, the low-illumination image enhancement model based on the linear attention mechanism has the following advantages.
(1) The self-attention mechanism is integrated into the deep network model, improving the ability of deep learning to capture image details and edge contours; across varied scenes and extensive image content, the method can adaptively improve image quality.
(2) The attention mechanism provided by the invention enables the convolution operation to establish long-range dependencies by refining the feature map, thereby improving the performance of the convolutional neural network.
(3) The invention has fewer parameters, reduces cost, and improves the generality of the network.
Drawings
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a schematic diagram of the low-illumination image enhancement model network based on the linear attention mechanism according to the present invention.
Fig. 2 is a schematic diagram of a residual module.
Fig. 3 shows the output image after an original image is enhanced using the image enhancement method provided by the embodiment of the invention.
Detailed Description
The method of the present invention is described in detail with reference to the accompanying drawings and examples. It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Denote the input of the attention module as the convolution feature map x ∈ R^{C×H×W} from the previous hidden layer, and change its dimension to x ∈ R^{C×N}, where C is the number of channels and N = H×W. First, two 1×1 convolutions are applied to the input x to perform the operations f(x) = W_f·x and g(x) = W_g·x; f(x) and g(x) represent two feature spaces obtained by multiplying the image features by different weight matrices W_f and W_g. The two tensors are converted into matrix form, the transpose of f(x) is multiplied by g(x), and a softmax operation is performed on the result to obtain the attention map β:

β_{j,i} = exp(s_{ij}) / Σ_{i=1..N} exp(s_{ij}),  with  s_{ij} = f(x_i)^T g(x_j).

β_{j,i} represents the extent to which the model attends to image content region i when synthesizing region j; the more similar the feature representations of two locations are, the stronger the correlation between them. Meanwhile, to integrate global and local information, the input x is fed into a 1×1 convolution that performs the linear transformation h(x) = W_h·x, giving the feature map h(x). The attention map β and h(x) are multiplied to obtain the self-attention feature map, denoted o, and its shape is changed back to C×H×W:

o_j = Σ_{i=1..N} β_{j,i} · h(x_i).

Finally, the output of the attention layer is obtained as:

y_i = γ·o_i + x_i.

To balance neighborhood information and the correlation with long-distance features, a parameter γ initialized to 0 is introduced; its weight is updated through gradual learning, so that the network first focuses on neighborhood information and then associates it with features at other global positions. The self-attention module therefore has the ability to associate global information and establish long-distance dependencies.
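As an illustration, the self-attention computation described above can be sketched in PyTorch as follows; the channel-reduction factor and layer sizes are assumptions made for the sketch rather than values fixed by the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention block following the f/g/h formulation described above."""
    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        inner = max(in_channels // reduction, 1)
        self.f = nn.Conv2d(in_channels, inner, kernel_size=1)         # f(x) = W_f x
        self.g = nn.Conv2d(in_channels, inner, kernel_size=1)         # g(x) = W_g x
        self.h = nn.Conv2d(in_channels, in_channels, kernel_size=1)   # h(x) = W_h x
        self.gamma = nn.Parameter(torch.zeros(1))                     # gamma initialized to 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, height, width = x.shape
        n = height * width
        f = self.f(x).view(b, -1, n)                                  # B x C' x N
        g = self.g(x).view(b, -1, n)                                  # B x C' x N
        h = self.h(x).view(b, c, n)                                   # B x C  x N
        s = torch.bmm(f.transpose(1, 2), g)                           # s_ij = f(x_i)^T g(x_j)
        beta = F.softmax(s, dim=1)                                    # attention map beta, B x N x N
        o = torch.bmm(h, beta).view(b, c, height, width)              # o_j = sum_i beta_{j,i} h(x_i)
        return self.gamma * o + x                                     # y = gamma * o + x

Such a block can be inserted into a convolutional backbone after any intermediate feature map with the matching number of channels.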
Fig. 1 shows a schematic diagram of a low-illumination image enhancement model network based on a linear self-attention mechanism. The general flow of the method is as follows.
The linear self-attention (LASA) module can be seen as a stand-alone computing unit that enhances the expressive power of convolutional neural networks, and it can be integrated into any other network as a plug-and-play module.
For a given feature map F ∈ R^{C×H×W}, the LASA can directly infer a 3-D weight F_attention ∈ R^{C×H×W} with global information to refine the feature map.
The refined feature map can be calculated as: F' = F ⊙ F_attention,
where ⊙ denotes element-by-element multiplication, and C, H and W denote the number of channels, the height and the width of the feature map, respectively. For linear attention, we first encode the feature map F ∈ R^{C×H×W} into a pair of two-dimensional feature codes F_x ∈ R^{C×1×W} and F_y ∈ R^{C×H×1} along the longitudinal and transverse axes, which can be expressed as:
F_x(c, 1, w) = (1/H) Σ_{h=1..H} F(c, h, w),
F_y(c, h, 1) = (1/W) Σ_{w=1..W} F(c, h, w).
Next, we use a matrix transformation operation to reshape F_x ∈ R^{C×1×W} and F_y ∈ R^{C×H×1} into F_x ∈ R^{1×C×W} and F_y ∈ R^{1×C×H}.
We then concatenate the feature maps F_x ∈ R^{1×C×W} and F_y ∈ R^{1×C×H} along the last (spatial) dimension to obtain a new feature map F_xy ∈ R^{1×C×(H+W)}. F_xy is expanded to three times its original number of channels and then divided along the channel dimension into the partitions Q, K and V. The global relation of the feature map can then be expressed as:
F_g = Attention(Q, K, V),  the self-attention relation computed among the partitions Q, K and V.
after computing the global relationships of the feature maps, we employ a residual learning strategy to facilitate gradient flow. Finally, the attention weight is calculated as:
F_attention = σ(MLP(F_xy + F_g)),
where MLP is a multilayer perceptron and σ is a sigmoid function.
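A rough PyTorch sketch of the LASA flow described above is given below. The axis-wise average pooling used for the encodings, the 1×1 convolution that expands the channels to three times before the Q/K/V split, the scaled dot-product form of the global relation, and the broadcasting of a per-channel weight instead of a full spatially varying 3-D weight are all simplifying assumptions; only the overall flow (axis encoding, Q/K/V split, residual connection, MLP with sigmoid, element-wise refinement) follows the description.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSelfAttention(nn.Module):
    """Sketch of LASA: infer an attention weight F_attention and refine F."""
    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        self.expand = nn.Conv1d(channels, 3 * channels, kernel_size=1)  # -> Q, K, V (assumed 1x1 conv)
        self.mlp = nn.Sequential(                                        # MLP producing the weight
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        fx = feat.mean(dim=2)                        # F_x: B x C x W (pool along the vertical axis)
        fy = feat.mean(dim=3)                        # F_y: B x C x H (pool along the horizontal axis)
        fxy = torch.cat([fx, fy], dim=2)             # F_xy: B x C x (H + W)
        q, k, v = self.expand(fxy).chunk(3, dim=1)   # split into Q, K, V along the channel dimension
        attn = F.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)  # assumed scaled dot-product relation
        g = v @ attn.transpose(1, 2)                 # global relation F_g: B x C x (H + W)
        g = (fxy + g).mean(dim=2)                    # residual connection, then squeeze to B x C
        weight = torch.sigmoid(self.mlp(g))          # sigma(MLP(...)), per-channel weight in [0, 1]
        f_attention = weight.view(b, c, 1, 1).expand(b, c, h, w)    # broadcast to the feature-map shape
        return feat * f_attention                    # F' = F (element-wise) F_attention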
The loss function includes the following components.
The image content loss L_content is defined on the high-level features extracted by the conv5_2 layer of a pre-trained VGG-19 network:

L_content = || φ_{5_2}(Î) − φ_{5_2}(I) ||,

where Î denotes the predicted image, I the standard (reference) image, and φ_{5_2}(·) the conv5_2 feature map.
L_MS-SSIM is a multi-scale structural loss function:

L_MS-SSIM = 1 − [l_M(Î, I)]^α · Π_{m=1..M} [cs_m(Î, I)]^{β_m},

with l(Î, I) = (2·μ_Î·μ_I + c_1) / (μ_Î^2 + μ_I^2 + c_1) and cs(Î, I) = (2·σ_ÎI + c_2) / (σ_Î^2 + σ_I^2 + c_2), where M represents the number of image scales, μ_Î and μ_I are the means of the predicted image and the standard image, σ_Î and σ_I are their standard deviations, σ_ÎI is the covariance between the two images, α and β_m are weight coefficients, and c_1 and c_2 are two constants.
A feature loss is computed with the L1 distance on VGG features:

L_fea = Σ_i D(φ_i(Î), φ_i(I)),

where D(x, y) is the L1 distance and φ_i is the i-th hidden feature from the VGG model.
L_MIX is the global loss function:

L_MIX = λ_1·L_content + λ_2·L_MS-SSIM + λ_3·L_fea,

where λ_1, λ_2 and λ_3 are weight coefficients that balance the importance of each term in L_MIX.
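The mixed loss could be assembled roughly as in the following PyTorch sketch. The VGG-19 feature slice (assumed to end at torchvision index 30, i.e. conv5_2), the single-scale global SSIM term standing in for the multi-scale structural loss, the plain L1 pixel term standing in for the feature loss, and the default weights are all illustrative assumptions rather than the exact formulation of the invention.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class MixLoss(nn.Module):
    """Sketch of L_MIX = l1*content + l2*(1 - SSIM) + l3*L1 (composition and weights are assumptions)."""
    def __init__(self, l1: float = 1.0, l2: float = 1.0, l3: float = 1.0):
        super().__init__()
        # VGG-19 features up to conv5_2 (index 30 is an assumption; verify against torchvision).
        # ImageNet normalization of the inputs is omitted for brevity.
        self.vgg = vgg19(weights="IMAGENET1K_V1").features[:31].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.l1, self.l2, self.l3 = l1, l2, l3

    @staticmethod
    def global_ssim(x: torch.Tensor, y: torch.Tensor, data_range: float = 1.0) -> torch.Tensor:
        # Single-window SSIM from the means, variances and covariance (simplified, not multi-scale).
        c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
        mx, my = x.mean(dim=(1, 2, 3)), y.mean(dim=(1, 2, 3))
        vx, vy = x.var(dim=(1, 2, 3)), y.var(dim=(1, 2, 3))
        cov = ((x - mx.view(-1, 1, 1, 1)) * (y - my.view(-1, 1, 1, 1))).mean(dim=(1, 2, 3))
        return ((2 * mx * my + c1) * (2 * cov + c2) /
                ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))).mean()

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        content = F.l1_loss(self.vgg(pred), self.vgg(target))   # content loss on conv5_2 features
        structural = 1.0 - self.global_ssim(pred, target)       # structural (SSIM-based) term
        pixel = F.l1_loss(pred, target)                          # plain L1 distance
        return self.l1 * content + self.l2 * structural + self.l3 * pixel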
To test the generalization ability of the network obtained in step 4, it is verified on a test set. The Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM) are used as evaluation indexes. PSNR is an objective standard for evaluating images and is often used as a measure of signal reconstruction quality; it measures the ratio between the peak signal energy and the average energy of the background noise, expressed in dB, with larger values indicating less distortion. Given a pair of images I and O, the PSNR is:

PSNR(I, O) = 10 · log_10( MAX_I^2 / MSE ),

where MSE is the mean square error between the two images and MAX_I is the maximum pixel value of I.
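For instance, the PSNR above can be computed directly from its definition; this NumPy sketch assumes floating-point images with a known peak value MAX_I.

import numpy as np

def psnr(i: np.ndarray, o: np.ndarray, max_i: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(MAX_I^2 / MSE)."""
    mse = np.mean((i.astype(np.float64) - o.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_i ** 2 / mse)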
PSNR evaluates image quality from the error between corresponding pixels and does not take the characteristics of human vision into account: the human eye is more sensitive to contrast differences at lower spatial frequencies and to luminance contrast differences, and its perception of a region is influenced by the surrounding neighboring regions. As a result, PSNR scores often disagree with subjective human judgement. SSIM is a full-reference image quality evaluation index that measures the similarity of images in terms of luminance, contrast and structure, and it agrees with human visual perception on the whole. SSIM is defined as follows:
SSIM(I, O) = [ (2·μ_I·μ_O + c_1)(2·σ_IO + c_2) ] / [ (μ_I^2 + μ_O^2 + c_1)(σ_I^2 + σ_O^2 + c_2) ],

where μ_I and σ_I^2 are the mean and variance of I; μ_O and σ_O^2 are the mean and variance of O; σ_IO is the covariance of I and O; c_1 = (0.01·L)^2 and c_2 = (0.03·L)^2, where 0.01 and 0.03 are fixed values and L is the range of pixel values.
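A direct NumPy transcription of the SSIM definition above, computed globally over the whole image rather than over local windows (a simplification), might look like this:

import numpy as np

def ssim_global(i: np.ndarray, o: np.ndarray, l: float = 1.0) -> float:
    """Global SSIM from the means, variances and covariance of I and O."""
    c1, c2 = (0.01 * l) ** 2, (0.03 * l) ** 2
    i = i.astype(np.float64)
    o = o.astype(np.float64)
    mu_i, mu_o = i.mean(), o.mean()
    var_i, var_o = i.var(), o.var()
    cov_io = ((i - mu_i) * (o - mu_o)).mean()
    return ((2 * mu_i * mu_o + c1) * (2 * cov_io + c2) /
            ((mu_i ** 2 + mu_o ** 2 + c1) * (var_i + var_o + c2)))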
The present invention has been described in terms of the preferred embodiment, and it is not intended to be limited to the embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A low-illumination image enhancement model based on a linear self-attention mechanism, characterized in that: the linear self-attention method can directly infer a 3-D global attention weight from the feature maps and then refine the feature maps, wherein the refined feature maps implicitly couple local and global relationships by adjusting the local feature maps with the global weight, so that the cost of training and deploying the model can be reduced; the method specifically comprises the following steps:
1) firstly, designing a convolutional neural network capable of performing end-to-end training;
2) initializing the convolutional neural network in the step 1 by a Kaiming network parameter initialization method;
3) linear attention first encodes the feature map into two-dimensional feature codes along the vertical and horizontal directions, respectively;
4) constructing a global representation using a self-attention mechanism;
5) generating a 3-D global attention weight by a multilayer perceptron (MLP) and a sigmoid activation function;
6) and evaluating the obtained algorithm and outputting a corresponding test result.
2. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the convolutional neural network is designed for end-to-end training, embeds a channel attention module and a spatial attention module, and uses residual connections and dense connections in its network connections.
3. The linear attention mechanism-based low-illuminance image enhancement model of claim 1, wherein the convolutional neural network of 1) is initialized using a Kaiming network parameter initialization method.
4. For linear attention, the feature map F ∈ R^{C×H×W} is encoded into a pair of two-dimensional feature codes F_x ∈ R^{C×1×W} and F_y ∈ R^{C×H×1}.
5. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the loss values are calculated using multi-scale structural loss.
6. The linear attention mechanism-based low-illumination image enhancement model of claim 1, wherein the finally trained network is tested on a test set, and evaluation indexes adopted are Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM).
CN202210337183.0A 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism Pending CN114708163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210337183.0A CN114708163A (en) 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210337183.0A CN114708163A (en) 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism

Publications (1)

Publication Number Publication Date
CN114708163A true CN114708163A (en) 2022-07-05

Family

ID=82170067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210337183.0A Pending CN114708163A (en) 2022-04-01 2022-04-01 Low-illumination image enhancement model based on linear attention mechanism

Country Status (1)

Country Link
CN (1) CN114708163A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950649A (en) * 2020-08-20 2020-11-17 桂林电子科技大学 Attention mechanism and capsule network-based low-illumination image classification method
CN112435191A (en) * 2020-11-25 2021-03-02 西安交通大学 Low-illumination image enhancement method based on fusion of multiple neural network structures
CN113096017A (en) * 2021-04-14 2021-07-09 南京林业大学 Image super-resolution reconstruction method based on depth coordinate attention network model
CN114170095A (en) * 2021-11-22 2022-03-11 西安理工大学 Low-illumination image enhancement method combining Transformers and CNN


Similar Documents

Publication Publication Date Title
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN112614077B (en) Unsupervised low-illumination image enhancement method based on generation countermeasure network
CN111582483B (en) Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN111915526A (en) Photographing method based on brightness attention mechanism low-illumination image enhancement algorithm
CN111292264A (en) Image high dynamic range reconstruction method based on deep learning
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN114170286B (en) Monocular depth estimation method based on unsupervised deep learning
CN111738948B (en) Underwater image enhancement method based on double U-nets
CN110458765A (en) The method for enhancing image quality of convolutional network is kept based on perception
CN111047543A (en) Image enhancement method, device and storage medium
Fan et al. Multiscale cross-connected dehazing network with scene depth fusion
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
Feng et al. Low-light image enhancement algorithm based on an atmospheric physical model
CN114708615B (en) Human body detection method based on image enhancement in low-illumination environment, electronic equipment and storage medium
CN113781375A (en) Vehicle-mounted vision enhancement method based on multi-exposure fusion
CN113393510B (en) Image processing method, intelligent terminal and storage medium
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
CN116912114A (en) Non-reference low-illumination image enhancement method based on high-order curve iteration
CN116563141A (en) Mars surface image enhancement method based on convolutional neural network
CN116309171A (en) Method and device for enhancing monitoring image of power transmission line
CN114708163A (en) Low-illumination image enhancement model based on linear attention mechanism
CN115619674A (en) Low-illumination video enhancement method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination