CN116664446A - Lightweight dim light image enhancement method based on residual error dense block - Google Patents

Lightweight dim light image enhancement method based on residual error dense block

Info

Publication number
CN116664446A
Authority
CN
China
Prior art keywords
layer
convolution
input
image
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310783361.7A
Other languages
Chinese (zh)
Inventor
顾国华
徐秀钰
万敏杰
王佳节
龚晟
陈钱
韶阿俊
许运凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202310783361.7A
Publication of CN116664446A
Pending legal-status Current


Classifications

    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a lightweight dim light image enhancement method based on a residual dense block, which comprises the following steps: acquiring a paired dim light image data set; building a conditional generative adversarial network model, including designing a lightweight generator network based on residual dense blocks and a channel attention mechanism and a fully convolutional discriminator network; determining a multi-modal loss function based on global similarity loss, structural similarity loss, content similarity loss, color similarity loss and local texture loss; and training and testing the conditional generative adversarial network model. The invention can process dim light images of size 400 x 600 at 36 frames per second on an RTX2080Ti graphics card; the dim light images enhanced by the invention better match the viewing habits of the human eye and are superior on common image evaluation indexes such as peak signal-to-noise ratio and structural similarity.

Description

Lightweight dim light image enhancement method based on residual error dense block
Technical Field
The invention relates to the field of dim light image enhancement, in particular to a lightweight dim light image enhancement method based on a residual dense block.
Background
In the process of dim light imaging, the acquired dim light image is influenced by environmental factors, equipment factors, human factors and the like, and a series of problems such as low brightness, color cast, low contrast, poor visibility and the like are presented, so that the machine vision tasks such as subsequent scene understanding, target tracking and the like are not facilitated. Conventional methods to increase the ISO of the camera may make the image brighter, but may also increase noise and create color differences. The method of prolonging the exposure time to obtain a better dim light image is only suitable for shooting static scenes, the imaging quality is sensitive to camera shake, a stable platform is needed, and otherwise, blurring is inevitably generated. Therefore, the dim light image enhancement technology capable of improving the quality of the dim light image with lower shooting cost becomes one of important research directions in the current image processing field, and has important practical application value.
In recent decades, researchers in various countries have conducted extensive research on dim light image enhancement algorithms in order to solve the problems of low brightness, severe color distortion, high noise level and loss of scene texture details in dim light images. Such algorithms can generally be classified into dim light image enhancement algorithms based on traditional methods and dim light image enhancement algorithms based on deep learning. Traditional dim light image enhancement algorithms realize image enhancement according to a physical model. Ren et al. designed a sequential-decomposition Joint Enhancement and Denoising (JED) method based on the Retinex model, which enhances the dim light image by successively estimating a piecewise-smooth illumination map and a noise-suppressed reflectance map (Ren X, Li M, Cheng W H, et al. Joint enhancement and denoising method via sequential decomposition[C]// 2018 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2018: 1-5.). The method can obtain good denoising and contrast enhancement results under certain conditions, but its applicability is poor and its processing speed is low.
Dim light image enhancement algorithms based on deep learning rely on the strong data-fitting capability of neural networks and play an increasingly important role in the field of dim light image enhancement. Chen et al. proposed Retinex-Net, a deep network based on the Retinex model, including Decom-Net for decomposition and Enhance-Net for illumination adjustment (Wei C, Wang W, Yang W, et al. Deep Retinex decomposition for low-light enhancement[J]. arXiv preprint arXiv:1808.04560, 2018.). This approach visually achieves satisfactory dim light enhancement quality, but image noise is not effectively removed. Zhang et al. proposed a network model called KinDNet for denoising and color correction of dim light images (Zhang Y, Zhang J, Guo X. Kindling the darkness: A practical low-light image enhancer[C]// Proceedings of the 27th ACM International Conference on Multimedia. 2019: 1632-1640.). The method performs well in image denoising, color cast correction and other dim light image enhancement tasks, but falls short in detail preservation. Wu et al. proposed URetinex-Net, a deep unfolding network based on the Retinex model comprising three learning-based modules (Wu W, Weng J, Zhang P, et al. URetinex-Net: Retinex-based deep unfolding network for low-light image enhancement[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 5901-5910.). The method can realize noise suppression and detail preservation of the dim light image, but its processing speed needs to be improved. Xu et al. proposed an SNR-aware network model using a signal-to-noise-aware transformer and a convolution model (Xu X, Wang R, Fu C W, et al. SNR-aware low-light image enhancement[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 17714-17724.). The images enhanced by this method have superior perceptual quality, but image detail is lost.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a lightweight dim light image enhancement method based on a residual dense block, so as to overcome the shortcomings of existing dim light image enhancement algorithms in noise suppression, detail preservation, processing speed and the like.
The technical scheme for realizing the purpose of the invention is as follows: a lightweight dim light image enhancement method based on a residual error dense block comprises the following steps:
step 1, acquiring paired dark light image data sets, wherein the paired dark light image data sets consist of dark light images and corresponding normal light images;
step 2, constructing a condition generation countermeasure network model, wherein the condition generation countermeasure network model comprises a generator network and a discriminator network, the generator network is a lightweight network based on a residual error density block and a channel attention mechanism, and the discriminator network is a full convolution network;
step 3, determining a multi-mode loss function for measuring the difference between the predicted value and the true value of the model, wherein the loss function consists of global similarity loss, structural similarity loss, content similarity loss, color similarity loss and local texture loss;
step 4, performing countermeasure training on the condition generation countermeasure network model by using training images in the pair of dim light image data sets, and obtaining a loss value for optimizing a network through a multi-mode loss function until a generator network model with good prediction performance is obtained;
and 5, inputting the collected dim light image into a trained generator network model to obtain an enhanced image.
Preferably, the generator network comprises an input layer, a hidden layer and an output layer, wherein the input layer is used for inputting a dark-light image of three channels of RGB, the hidden layer is used for extracting characteristics of the input image through convolution operation, and the output layer is used for outputting a processing result; the hidden layer comprises 3 residual error dense blocks; the specific structure of the hidden layer of the generator is as follows:
convolution layer 1: taking an image of M x N x 3 input by an input layer as input, and after the image is activated by a 3*3 convolution kernel convolution with the step length of 1 and a LeakyReLU activation function, outputting a characteristic image of M x N with the channel number of 32, wherein M and N are the length and the width of the input image respectively;
convolution layer 2: taking a characteristic diagram of M x N x 32 output by the convolution layer 1 as an input, and outputting the characteristic diagram of M x N with 32 channels after the convolution of a 3*3 convolution kernel with the step length of 1 and activation of a LeakyReLU activation function;
residual dense block 1: taking the M x N x 32 feature map output by the convolution layer 2 as input, and outputting the M x N feature map with the channel number of 32 after passing through 3 dense connecting layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
residual error dense block 2: taking the M x N x 32 feature map output by the residual error dense block 1 as input, and outputting the M x N feature map with the channel number of 32 after passing through 3 dense connecting layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
residual error dense block 3: taking the M x N x 32 feature map output by the residual error dense block 2 as input, and outputting the M x N feature map with the channel number of 32 after passing through 3 dense connecting layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
connection layer 1: taking the 3 feature maps output by the residual error dense block 1, the residual error dense block 2 and the residual error dense block 3 as input, and outputting an M x N feature map with 96 channels after connection along the channel dimension;
channel attention layer 1: taking the M x N x 96 feature map output by the connection layer as input, and outputting an M x N feature map with 96 channels after the channel attention mechanism weights the channels of the feature map;
convolution layer 3: taking a characteristic diagram of M x N x 96 output by the channel attention layer 1 as input, and outputting a characteristic diagram of M x N with 32 channels after convolution of 3*3 convolution kernels with the step length of 1;
convolution layer 4: taking the M x N x 32 characteristic diagram output by the convolution layer 3 as input, and outputting the M x N characteristic diagram with the channel number of 32 after the 3*3 convolution kernel convolution with the step length of 1 and the activation of the LeakyReLU activation function are activated;
residual learning layer 1: 2 feature maps processed by the convolution layer 1 and the convolution layer 4 are input, and after pixel-by-pixel addition operation, a feature map with M x N and 32 channels is output;
convolution layer 5: taking the M x N x 32 feature map output by the residual learning layer 1 as input, and outputting an M x N feature map with 3 channels after the convolution of 3 convolution kernels of size 3*3 with the step length of 1 and activation of the sigmoid activation function.
Preferably, the discriminator includes an input layer, a hidden layer, and an output layer, where the input layer is used to input a dark light picture of RGB three channels and a picture to be discriminated of RGB three channels, the hidden layer is used to perform convolution calculation on an input image, and the output layer is used to output a discrimination result, and the discriminator hidden layer is composed of 6 convolution layers, specifically:
convolution layer 1: taking the M1 x N1 x 6 feature map obtained after the two pictures input by the input layer are connected in the channel dimension as input, and outputting an (M1/2) x (N1/2) feature map with 32 channels after the convolution of 32 convolution kernels of size 3*3 with the step length of 2 and activation of a relu activation function, wherein the two pictures input by the input layer comprise a dark light picture with the size of M1 x N1 and a picture to be identified with the size of M1 x N1, M1 and N1 are respectively the length and the width of the image, and M1 and N1 are integer multiples of 16;
convolution layer 2: taking an (M1/2) x (N1/2) x 32 characteristic diagram output by the convolution layer 1 as an input, and outputting an (M1/4) x (N1/4) characteristic diagram with 64 channels after the convolution of a 3*3 convolution kernel with the step length of 2 and activation of a relu activation function;
convolution layer 3: taking an (M1/4) x (N1/4) x 64 characteristic diagram output by the convolution layer 2 as an input, and outputting an (M1/8) x (N1/8) characteristic diagram with 128 channels after the convolution of a 3*3 convolution kernel with the step length of 2 and activation of a relu activation function;
convolution layer 4: taking the (M1/8) x (N1/8) x 128 feature map output by the convolution layer 3 as input, and outputting an (M1/16) x (N1/16) feature map with 256 channels after the convolution of 256 convolution kernels of size 3*3 with the step length of 2 and activation of a relu activation function;
convolution layer 5: taking an (M1/8) x (N1/8) x 256 feature map output by the convolution layer 4 as an input, and outputting a feature map with the number of channels being 1 (M1/16) x (N1/16) after convolution of a 3*3 convolution kernel with the step length being 1;
convolution layer 6: taking the (M1/16) x (N1/16) x 1 feature map output by the convolution layer 5 as input, and outputting the feature map (M1/16) x (N1/16) with the channel number of 1, namely an information distribution matrix after convolution by a 3*3 convolution kernel with the step length of 1.
Preferably, the loss function in step 3 is:
L(G, D) = L_cGAN(G, D) + λ_1·L_1(G) + λ_S·L_S(G) + λ_C·L_C(G) + λ_P·L_P(G)
where L_cGAN is the discriminator loss calculated by the discriminator, referred to herein as the local texture loss; λ_1, λ_S, λ_C and λ_P are hyperparameters for adjusting the relative weights; L_1 is the global similarity loss, L_S the structural similarity loss, L_C the content similarity loss, and L_P the color similarity loss; G is the generator mapping and D is the discriminator mapping.
Preferably, the local texture loss calculated by the discriminator is specifically:
L_cGAN(G, D) = E_{X,Y}[log D(Y)] + E_{X,Y}[log(1 - D(X, G(X, Z)))]
where X and Y respectively denote the dim light image to be enhanced and the corresponding normal light image, Z denotes the input random noise, E_{X,Y} denotes the mean computed pixel by pixel with X and Y as arguments, G denotes the generator mapping and D the discriminator mapping.
Preferably, the global similarity loss is specifically:
where the per-channel terms for c = {r, g, b} share the same functional mapping, ||·||_1 denotes the 1-norm operation, and the channel weights ω_r, ω_g and ω_b are computed in the same way, each from the corresponding channel quantity divided by the sum of the three.
Preferably, the structural similarity loss is specifically:
L S (G)=1-SSIM(Y,G(X,Z))
wherein SSIM represents a structural similarity calculation function, X and Y represent a dark light image to be enhanced and a corresponding normal light image, respectively, and Z represents an input random noise image.
Preferably, the content similarity loss is specifically:
L_C(G) = E_{X,Y,Z}[||Θ(Y) - Θ(G(X, Z))||_2]
where Θ(·) denotes the feature extraction function corresponding to the block5_conv2 layer of the VGG-19 pre-trained model, and E_{X,Y,Z}[·] denotes the mean of the bracketed quantity computed over the matrix variables X, Y and Z.
Preferably, the color similarity loss is specifically:
L P (G)=delta_E(Y,G(X,Z))
where delta_e represents a weighted euclidean-based color difference calculation function, X and Y represent the desired enhanced darklight image and the corresponding normal light image, respectively, and Z represents the random noise of the input.
Preferably, the specific steps of generating the countermeasure network model under the training conditions in the step 4 are as follows:
assigning values to the super parameters required in the training process;
loading the training image pair in the step 1;
initializing generator optimizer and generator network parameters;
the generator generates an enhanced image according to the input dim light image and random noise;
initializing a discriminator optimizer and discriminator network parameters;
the discriminator generates an information matrix according to the input dim light image and the image to be discriminated; when the image to be discriminated is an image enhanced by the generator, the expected output of the discriminator is an all-zeros matrix, and when the image to be discriminated is the normal light image corresponding to the dim light image, the expected output of the discriminator is an all-ones matrix;
the generator optimizer updates the generator network parameters according to the generator loss function values to minimize the generator loss functions;
the discriminator optimizer updates the discriminator network parameters according to the discriminator loss function value to maximize the discriminator loss function;
loading a new training image pair, repeating the steps by using the updated network parameters of the generator and the discriminator, and ending the cycle when the generator and the discriminator model obtained by training meet the objective function condition;
saving a model which enables the generator loss function to reach the minimum value as a final dim light image enhancement model;
the objective function of the lightweight dim light image enhancement model based on the residual error dense block is as follows:
G* = arg min_G max_D L(G, D)
where L(G, D) is the multi-modal loss function described above, G is the mapping obtained by training the generator network, and D is the mapping obtained by training the discriminator network.
Compared with the prior art, the invention is characterized in that: (1) According to the invention, a condition generation countermeasure network is taken as a main structure, a generator based on a residual error dense block structure is built to strengthen feature transmission, so that image details are better kept, a attention mechanism is introduced into the generator to realize a dynamic adjustment function of feature weights, so that the image processing efficiency is improved, and attention is limited to local areas with different scales of pictures by using a PatchGan discriminator, so that the details of the local areas of the images are sharpened. (2) The invention designs a multi-mode loss function for measuring the difference between the predicted value and the true value of the model, and the loss function consists of global similarity loss, structural similarity loss, content similarity loss, color similarity loss and local texture loss, so that the model training direction can be better guided. (3) The average processing frame number on the RTX2080Ti display card is about 36 frames (400 x 600 size pictures), and the invention has excellent performance in qualitative and quantitative experiments.
Drawings
Fig. 1 is a schematic flow chart of a method for enhancing a dark-light image based on a residual error density block.
FIG. 2 is a diagram of a network structure of a conditional generation countermeasure network generator and discriminator of the design.
Fig. 3 is a schematic diagram of the test results of the 15 sets of test images in the LOL dim light image dataset, wherein from left to right there are respectively (a) the original dim light image, (b) a linear brightness image, (c) the JED method processing result, (d) the Retinex-Net method processing result, (e) the KinDNet method processing result, (f) the URetinex-Net method processing result, (g) the SNR method processing result, (h) the processing result of the method of the invention, and (i) the normal light image.
Detailed description of the preferred embodiments
The invention provides a lightweight dim light image enhancement method based on a residual error density block, which is used for improving the quality of a dim light image, enabling the dim light image to accord with the observation habit of human eyes and facilitating the subsequent machine vision task.
In order to more stably remove noise and color cast in dim light images, the invention selects a conditional generative adversarial network model, which is conducive to improving network robustness, as the main network structure. The conditional generative adversarial network consists of a generator network and a discriminator network: the generator network completes the image enhancement tasks of brightness adjustment, color correction, noise suppression and scene texture detail restoration, while the discriminator network judges the authenticity of images so as to steer the training direction of the generator network. In order to avoid loss of feature information of the dim light image during propagation through the generator network, the invention selects the residual dense block, which strengthens feature propagation and encourages feature reuse, as the basic structure of the lightweight generator network; meanwhile, in order to select useful channel information and further improve the processing efficiency of the generator network, the invention also introduces a channel attention mechanism into the generator network. In addition, the invention uses a PatchGAN discriminator network, which can confine attention to local regions of the image at different scales, to judge the authenticity of local regions, thereby improving the generator network's ability to sharpen local detail.
The invention also designs a multi-mode loss function for measuring the difference between the predicted value and the true value of the model, which consists of global similarity loss, structural similarity loss, content similarity loss, color similarity loss and local texture loss.
The invention provides a lightweight dim light image enhancement method based on a residual error dense block, which comprises the following steps:
In step 1, in order to enable the designed conditional generative adversarial network model to complete the dim light image enhancement task, the invention uses the LOL data set produced by Ren et al., which comprises 500 pairs of real image data, to train the network model. Ren et al. acquired dim light and normal light images of the same scene by fixing the other camera parameters and changing the exposure time and ISO sensitivity. The 500 image pairs are divided by purpose into 485 pairs of training images and 15 pairs of test images; the training images are used to train the model, and the test images are used to test the trained model.
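For illustration, a paired low-light/normal-light loader of the kind step 1 requires might look like the following PyTorch sketch; the directory names and the pairing of images by identical filenames are assumptions about the on-disk layout, not something specified in the text.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF

class PairedLowLightDataset(Dataset):
    """Loads (dim light, normal light) image pairs from two parallel folders."""
    def __init__(self, root, low_dir="low", high_dir="high"):
        self.low_root = os.path.join(root, low_dir)
        self.high_root = os.path.join(root, high_dir)
        self.names = sorted(os.listdir(self.low_root))  # assumes matching filenames

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        low = Image.open(os.path.join(self.low_root, name)).convert("RGB")
        high = Image.open(os.path.join(self.high_root, name)).convert("RGB")
        # Float tensors in [0, 1], shape (3, H, W)
        return TF.to_tensor(low), TF.to_tensor(high)
```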
Step 2, constructing a condition generation countermeasure network model, wherein the condition generation countermeasure network model comprises a generator network and a discriminator network, the generator network is a lightweight network based on a residual error density block and a channel attention mechanism, and the discriminator network is a full convolution network;
and 2.1, the generator network comprises an input layer, a hidden layer and an output layer, wherein the input layer of the generator is used for inputting RGB three-channel dim light images, the hidden layer of the generator is used for carrying out feature extraction and restoration on the input images through convolution operation, and the output layer of the generator is used for outputting processing results. Taking the example that the input layer of the generator inputs a dark light image with the size of 256 x 3, the specific structure of the hidden layer of the generator is as follows:
convolution layer 1: taking 256 x 3 images input by an input layer as input, and outputting 256 x 256 feature maps with the channel number of 32 after the images are subjected to 3*3 convolution kernel convolution with the step length of 1 and activation of a LeakyReLU activation function;
convolution layer 2: taking 256×256×32 feature maps output by the convolution layer 1 as input, and after the convolution of the 3*3 convolution kernels with the step length of 1 and activation of the activation function of the LeakyReLU, outputting 256×256 feature maps with the channel number of 32;
residual dense block 1: taking 256 x 32 feature images output by the convolution layer 2 as input, and outputting 256 x 256 feature images with the channel number of 32 after passing through 3 dense connecting layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
residual error dense block 2: taking 256×256×32 feature maps output by the residual error dense block 1 as input, and outputting 256×256 feature maps with 32 channels after passing through 3 dense connection layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
residual error dense block 3: taking 256×256×32 feature maps output by the residual error dense block 2 as input, and outputting 256×256 feature maps with 32 channels after passing through 3 dense connection layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
connection layer 1: taking the 3 feature maps output by the residual error dense block 1, the residual error dense block 2 and the residual error dense block 3 as input, and outputting a 256×256 feature map with 96 channels after connection along the channel dimension;
channel attention layer 1: taking the 256×256×96 feature map output by the connection layer as input, and outputting a 256×256 feature map with 96 channels after the channel attention mechanism weights the channels of the feature map;
convolution layer 3: taking 256×256×96 feature maps output by the channel attention layer 1 as input, and outputting 256×256 feature maps with the channel number of 32 after the convolution of 3*3 convolution kernels with the step length of 1;
convolution layer 4: taking 256×256×32 feature maps output by the convolution layer 3 as input, and after the convolution of the 3*3 convolution kernel with the step length of 1 and activation of the activation function of the LeakyReLU, outputting 256×256 feature maps with the channel number of 32;
residual learning layer 1: 2 feature maps processed by the convolution layer 1 and the convolution layer 4 are input, and 256-by-256 feature maps with 32 channels are output after pixel-by-pixel addition operation;
convolution layer 5: taking the 256×256×32 feature map output by the residual learning layer 1 as input, and outputting a 256×256 feature map with 3 channels after the convolution of 3 convolution kernels of size 3*3 with the step length of 1 and activation of the sigmoid activation function;
at this time, the output of the generator output layer is 256×256×3 dark-light image enhancement map.
The network structure of the generator is shown in fig. 2. Unless otherwise stated, the convolution kernels are of size 3*3.
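For illustration only, a PyTorch sketch of the generator laid out above might look as follows. The dense-layer growth rate, the LeakyReLU slope, the squeeze-and-excitation form of the channel attention, and the omission of the random-noise input Z (which could be concatenated with the dim light image) are all assumptions; the text above only fixes the kernel sizes, strides, channel counts and activations.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed form)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class ResidualDenseBlock(nn.Module):
    """3 densely connected conv layers, local feature fusion,
    channel attention, and a local residual connection."""
    def __init__(self, channels=32, growth=32):
        super().__init__()
        self.dense1 = nn.Conv2d(channels, growth, 3, 1, 1)
        self.dense2 = nn.Conv2d(channels + growth, growth, 3, 1, 1)
        self.dense3 = nn.Conv2d(channels + 2 * growth, growth, 3, 1, 1)
        self.fuse = nn.Conv2d(channels + 3 * growth, channels, 1)  # local feature fusion
        self.ca = ChannelAttention(channels)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        d1 = self.act(self.dense1(x))
        d2 = self.act(self.dense2(torch.cat([x, d1], 1)))
        d3 = self.act(self.dense3(torch.cat([x, d1, d2], 1)))
        fused = self.fuse(torch.cat([x, d1, d2, d3], 1))
        return x + self.ca(fused)  # local residual learning

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.LeakyReLU(0.2, True))
        self.conv2 = nn.Sequential(nn.Conv2d(32, 32, 3, 1, 1), nn.LeakyReLU(0.2, True))
        self.rdb1 = ResidualDenseBlock(32)
        self.rdb2 = ResidualDenseBlock(32)
        self.rdb3 = ResidualDenseBlock(32)
        self.ca = ChannelAttention(96)
        self.conv3 = nn.Conv2d(96, 32, 3, 1, 1)
        self.conv4 = nn.Sequential(nn.Conv2d(32, 32, 3, 1, 1), nn.LeakyReLU(0.2, True))
        self.conv5 = nn.Sequential(nn.Conv2d(32, 3, 3, 1, 1), nn.Sigmoid())

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        r1 = self.rdb1(f2)
        r2 = self.rdb2(r1)
        r3 = self.rdb3(r2)
        cat = torch.cat([r1, r2, r3], dim=1)        # connection layer 1 (96 channels)
        out = self.conv4(self.conv3(self.ca(cat)))  # channel attention, conv3, conv4
        return self.conv5(out + f1)                 # residual learning layer 1 + output

# Shape check: a 256x256 RGB input yields a 256x256 RGB enhancement map.
# print(Generator()(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 3, 256, 256])
```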
Step 2.2, the discriminator network structure also comprises an input layer, a hidden layer and an output layer, wherein the discriminator input layer is used for inputting RGB three-channel dim light pictures and RGB three-channel pictures to be discriminated, the discriminator hidden layer is used for carrying out convolution calculation on input images, and the discriminator output layer is used for outputting discrimination results. Taking an example that the identifier input layer inputs a dark light picture with a size of 256×256×3 and a picture to be identified with a size of 256×256×3, the specific structure of the identifier hidden layer is as follows:
convolution layer 1: 256 x 6 feature maps obtained after two pictures input by an input layer are connected in a channel dimension are taken as input, and 128 x 128 feature maps with 32 channels are output after the convolution of 3*3 convolution kernels with the step length of 2 and activation of a relu activation function are carried out;
convolution layer 2: taking 128 x 32 characteristic diagrams output by the convolution layer 1 as input, and outputting 64 x 64 characteristic diagrams with the channel number of 64 after the convolution of the 3*3 convolution kernel with the step length of 2 and activation of a relu activation function;
convolution layer 3: taking the 64 x 64 feature map output by the convolution layer 2 as input, and outputting a 32 x 32 feature map with 128 channels after the feature map is activated by a 3*3 convolution kernel convolution with the step length of 2 and a relu activation function;
convolution layer 4: taking the 32 x 32 x 128 feature map output by the convolution layer 3 as input, and outputting a 16 x 16 feature map with 256 channels after the convolution of 256 convolution kernels of size 3*3 with the step length of 2 and activation of a relu activation function;
convolution layer 5: taking the 16×16×256 feature map output by the convolution layer 4 as input, and outputting the 16×16 feature map with the number of channels being 1 after convolution by a 3*3 convolution kernel with the step length being 1;
convolution layer 6: the 16×16×1 feature map output by the convolution layer 5 is taken as input, and after convolution by a 3*3 convolution kernel with the step length of 1, a 16×16 feature map with 1 channel is output, namely the information distribution matrix.
At this time, the output of the discriminator output layer is based on a 16×16×1 prediction information matrix of the image to be measured, and each number in the matrix represents the probability that the corresponding local area is true.
The discriminator network architecture is shown in figure 2.
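A corresponding PyTorch sketch of the fully convolutional PatchGAN-style discriminator described above is given below; the padding choice is an assumption, and no activation is added to convolution layers 5 and 6 because none is specified in the text.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Concatenates the dim light image and the image under test along the
    channel axis and reduces them to a (H/16, W/16) information matrix."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),    # conv1
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),   # conv2
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # conv3
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True), # conv4
            nn.Conv2d(256, 1, 3, stride=1, padding=1),                          # conv5
            nn.Conv2d(1, 1, 3, stride=1, padding=1),                            # conv6
        )

    def forward(self, dim_img, candidate):
        # Each entry of the output matrix scores one local region of the input.
        return self.body(torch.cat([dim_img, candidate], dim=1))

# d = PatchDiscriminator()
# score = d(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
# print(score.shape)  # torch.Size([1, 1, 16, 16])
```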
And 3, designing a multi-mode loss function for measuring the difference between the predicted value and the true value of the model, wherein the loss function consists of global similarity loss, structural similarity loss, content similarity loss, color similarity loss and local texture loss.
The loss function designed by the invention is as follows:
L(G, D) = L_cGAN(G, D) + λ_1·L_1(G) + λ_S·L_S(G) + λ_C·L_C(G) + λ_P·L_P(G)
where L_cGAN is the discriminator loss calculated by the discriminator, referred to herein as the local texture loss; λ_1, λ_S, λ_C and λ_P are hyperparameters for adjusting the relative weights, taken as 0.35, 0.30 and 0.05 respectively according to the test process; L_1 is the global similarity loss, L_S the structural similarity loss, L_C the content similarity loss, and L_P the color similarity loss; G is the generator mapping and D is the discriminator mapping.
The local texture loss calculated by the discriminator is specifically:
L_cGAN(G, D) = E_{X,Y}[log D(Y)] + E_{X,Y}[log(1 - D(X, G(X, Z)))]
where X and Y respectively denote the dim light image to be enhanced and the corresponding normal light image, Z denotes the input random noise, E_{X,Y} denotes the mean computed pixel by pixel with X and Y as arguments, G denotes the generator mapping and D the discriminator mapping.
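For illustration, the local texture loss can be realised as an equivalent binary cross-entropy over the discriminator's patch output; treating the patch scores as logits and using the non-saturating generator term are assumptions beyond the log-likelihood form given above.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake):
    """d_real, d_fake: discriminator patch outputs for (X, Y) and (X, G(X, Z)).
    Minimising this BCE is equivalent to maximising the log-likelihood objective."""
    real_term = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real_term + fake_term

def generator_adversarial_loss(d_fake):
    """Non-saturating generator term: push D(X, G(X, Z)) towards 'real'."""
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
```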
The global similarity loss is specifically:
where the per-channel terms for c = {r, g, b} share the same functional mapping, ||·||_1 denotes the 1-norm operation, and the channel weights ω_r, ω_g and ω_b are computed in the same way, each from the corresponding channel quantity divided by the sum of the three.
the structural similarity loss is specifically:
L S (G)=1-SSIM(Y,G(X,Z))
wherein SSIM represents a structural similarity calculation function.
The content similarity loss is specifically:
L_C(G) = E_{X,Y,Z}[||Θ(Y) - Θ(G(X, Z))||_2]
wherein Θ (·) represents a feature extraction function corresponding to the block5_conv2 layer in the VGG-19 pre-training model, which is used for extracting content feature information of the image to be detected.
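A sketch of the content similarity loss with torchvision's VGG-19 is shown below. The slice index used to approximate the Keras layer name block5_conv2 (conv5_2 plus its ReLU), the ImageNet input normalisation, and the weights argument (newer torchvision versions) are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class ContentLoss(nn.Module):
    def __init__(self):
        super().__init__()
        features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features
        # Roughly up to conv5_2 + ReLU (Keras "block5_conv2"); assumed slice.
        self.extractor = nn.Sequential(*list(features.children())[:32]).eval()
        for p in self.extractor.parameters():
            p.requires_grad_(False)
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, enhanced, reference):
        # Inputs assumed to lie in [0, 1].
        f_e = self.extractor((enhanced - self.mean) / self.std)
        f_r = self.extractor((reference - self.mean) / self.std)
        return (f_e - f_r).flatten(1).norm(p=2, dim=1).mean()
```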
The color similarity loss is specifically:
L P (G)=delta_E(Y,G(X,Z))
where delta_e represents a weighted euclidean based color difference calculation function.
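The remaining similarity terms and their weighted combination could be sketched as below. The SSIM term uses the third-party pytorch_msssim package, the colour term substitutes a simple weighted-Euclidean ("redmean") difference for delta_E, the channel weighting of the global similarity loss is omitted, and the default weights are illustrative placeholders (the text above gives only 0.35, 0.30 and 0.05); treat all of these as assumptions.

```python
import torch
from pytorch_msssim import ssim  # pip install pytorch-msssim

def global_similarity_loss(enhanced, reference):
    # Per-pixel 1-norm difference; the per-channel weighting is omitted for brevity.
    return (enhanced - reference).abs().mean()

def structural_similarity_loss(enhanced, reference):
    return 1.0 - ssim(enhanced, reference, data_range=1.0)

def color_similarity_loss(enhanced, reference):
    # "Redmean" weighted-Euclidean colour difference, standing in for delta_E.
    r_mean = (enhanced[:, 0] + reference[:, 0]) / 2
    dr, dg, db = (enhanced - reference).unbind(dim=1)
    return torch.sqrt((2 + r_mean) * dr**2 + 4 * dg**2 + (3 - r_mean) * db**2 + 1e-8).mean()

def generator_total_loss(adv, l1, lS, lC, lP,
                         lam1=0.35, lamS=0.30, lamC=0.05, lamP=0.05):
    # lamP is a placeholder; the fourth weight is not given in the text.
    return adv + lam1 * l1 + lamS * lS + lamC * lC + lamP * lP
```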
And 4, performing countermeasure training on the condition generation countermeasure network model by using training images in the pair of dim light image data sets, and obtaining a loss value for optimizing the network through a multi-mode loss function until a generator network model with good prediction performance is obtained. The training process is as follows:
First, the hyper-parameters required in the training process are assigned; the selection of the hyper-parameters depends on past experimental experience and the experimental process. The values of the hyper-parameters in the loss function are given in step 3. In addition, in order to improve the stability of the conditional generative adversarial network model during training, an Adam optimizer is selected to optimize the network model. Notably, to avoid the problem of mode collapse, the generator network and the discriminator network each use Adam optimizers with different initial learning rates: the initial learning rate of the generator network optimizer is 1×10^-4 and the initial learning rate of the discriminator network optimizer is 9×10^-7.
During training, 4000 rounds of training were performed using the LOL training data set in total, and the batch size was set to 4. Before loading the training image pair, the training image pair needs to be preprocessed: the training image is cut into image blocks according to 256×256 size randomly, and each image block is subjected to random rotation, overturn and other processing to perform data enhancement operation.
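A sketch of the preprocessing just described: a random 256×256 crop taken at the same position in both images of a pair, followed by random flips and 90-degree rotations; the exact augmentation set beyond "random rotation, overturn and other processing" is an assumption.

```python
import random

def random_paired_augment(low, high, patch=256):
    """low, high: (3, H, W) tensors of one training pair; H, W >= patch."""
    _, h, w = low.shape
    top = random.randint(0, h - patch)
    left = random.randint(0, w - patch)
    low = low[:, top:top + patch, left:left + patch]
    high = high[:, top:top + patch, left:left + patch]
    if random.random() < 0.5:                      # horizontal flip
        low, high = low.flip(-1), high.flip(-1)
    if random.random() < 0.5:                      # vertical flip
        low, high = low.flip(-2), high.flip(-2)
    k = random.randint(0, 3)                       # rotation by k * 90 degrees
    return low.rot90(k, dims=(-2, -1)), high.rot90(k, dims=(-2, -1))
```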
The generator network parameters and the discriminator network parameters are initialized before the start of the adversarial training; in order to ensure effective flow of information in both forward propagation and backward propagation, the generator network parameters and the discriminator network parameters are initialized with the He initialization method.
The initialized generator generates an enhanced image from the input darklight image and random noise.
The initialized discriminator generates an information matrix from the input dim light image and the image to be discriminated: when the image to be discriminated is an image enhanced by the generator, the expected output of the discriminator is an all-zeros matrix, and when the image to be discriminated is the normal light image corresponding to the dim light image, the expected output of the discriminator is an all-ones matrix.
The generator optimizer updates the generator network parameters to minimize the generator loss function, whose value is calculated by the multi-modal loss function designed by the invention from the enhanced image output by the generator and the information matrix output by the discriminator.
The discriminator optimizer updates the discriminator network parameters according to the discriminator loss function value to maximize the discriminator loss function, wherein the discriminator loss function value is calculated by the information matrix output by the discriminator according to the discriminator loss function, namely the local texture loss, in the multi-mode loss function.
A new training image pair is loaded, and the above steps are repeated with the updated generator and discriminator network parameters until the training loop ends;
the model that minimizes the generator loss function is saved as the final dim image enhancement model.
In summary, the objective function of the lightweight dim light image enhancement model based on the residual error dense block is:
G* = arg min_G max_D L(G, D)
where L(G, D) is the multi-modal loss function described above, G is the mapping obtained by training the generator network, and D is the mapping obtained by training the discriminator network.
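Putting the pieces together, a condensed training-loop sketch under the stated settings (Adam with initial learning rates 1×10^-4 and 9×10^-7, He initialization, alternating discriminator/generator updates, saving the generator with the lowest loss) might look as follows. It assumes the Generator, PatchDiscriminator, ContentLoss and loss helpers sketched earlier are in scope; the loop structure and device handling are illustrative assumptions.

```python
import torch
import torch.nn as nn

def he_init(m):
    # He initialisation for the convolution layers, as described above.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, nonlinearity="leaky_relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def train(generator, discriminator, content_loss, loader, epochs=4000, device="cuda"):
    generator.to(device).apply(he_init)
    discriminator.to(device).apply(he_init)
    content_loss.to(device)
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)      # generator optimizer
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=9e-7)  # discriminator optimizer
    best = float("inf")
    for _ in range(epochs):
        for low, high in loader:
            low, high = low.to(device), high.to(device)
            fake = generator(low)

            # Discriminator step: all-ones target for real pairs, all-zeros for generated pairs.
            opt_d.zero_grad()
            d_loss = discriminator_loss(discriminator(low, high),
                                        discriminator(low, fake.detach()))
            d_loss.backward()
            opt_d.step()

            # Generator step: adversarial term plus the similarity terms.
            opt_g.zero_grad()
            g_loss = generator_total_loss(
                generator_adversarial_loss(discriminator(low, fake)),
                global_similarity_loss(fake, high),
                structural_similarity_loss(fake, high),
                content_loss(fake, high),
                color_similarity_loss(fake, high))
            g_loss.backward()
            opt_g.step()

            if g_loss.item() < best:  # keep the generator with the lowest loss
                best = g_loss.item()
                torch.save(generator.state_dict(), "best_generator.pth")
```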
And 5, inputting the test images in the dim light image data set into a trained generator network model, and obtaining the enhanced image.
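A minimal inference sketch for step 5, assuming the Generator class and checkpoint name from the earlier sketches; the file names are placeholders.

```python
import torch
from PIL import Image
import torchvision.transforms.functional as TF

generator = Generator()
generator.load_state_dict(torch.load("best_generator.pth", map_location="cpu"))
generator.eval()

dim_img = TF.to_tensor(Image.open("dim_light_input.png").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    enhanced = generator(dim_img).squeeze(0).clamp(0, 1)
TF.to_pil_image(enhanced).save("enhanced_output.png")
```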
Examples
The experimental example compares the method provided by the invention with the traditional method JED and four deep learning methods, Retinex-Net, KinDNet, URetinex-Net and SNR, tested on the LOL dim light image data set mentioned in step 1. The JED method is a joint enhancement and denoising method based on a Retinex model, which enhances a dim light image by successively decomposing a piecewise-smooth illumination map and a noise-suppressed reflectivity map. The Retinex-Net method designs two sub-networks based on the Retinex model, Decom-Net for decomposition and Enhance-Net for illumination adjustment, for the enhancement of dim light images. The KinDNet method, also based on the Retinex model, designs two branches for processing reflectivity and illumination respectively, for denoising and color correction of dim light images. The deep unfolding network URetinex-Net realizes noise suppression and detail preservation of dim light images through three learning-based modules: an initialization module, an unfolding optimization module and an illumination adjustment module. The SNR method is based on a signal-to-noise-aware transformer and a convolution model, which improves the perceptual quality of the enhanced dim light image. The test results of each method are shown in fig. 3, which from left to right are respectively (a) the original dim light image, (b) a linear brightness image, (c) the JED method processing result, (d) the KinDNet method processing result, (e) the Retinex-Net method processing result, (f) the SNR method processing result, (g) the processing result of the method of the invention, and (h) the normal light image. The JED method can obtain good denoising and contrast enhancement results under certain conditions, but its applicability is poor, and noise remains in some of the test images. The Retinex-Net approach visually achieves satisfactory dim light enhancement quality, but image noise is not effectively removed. The KinDNet method performs well in image denoising, color cast correction and other dim light image enhancement tasks, but falls short in detail preservation. The URetinex-Net method can realize noise suppression and detail preservation of dim light images, but its color correction needs to be improved. Images enhanced by the SNR method have improved perceptual quality but lose image detail. In addition, the test experiment also evaluated the above test images with common image evaluation indexes: peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The image processing frame rate of each method was also tested. The test results are shown in Table 1. PSNR evaluates the ratio between the maximum signal value of an image and the background noise as a reference of image quality; the larger the value, the less the image distortion. SSIM evaluates the structural similarity between the processed image and the reference image; the larger the value, the better the structural similarity. The frame rate evaluates the image processing speed; the larger the frame rate, the faster the processing. Clearly, compared with the other algorithms, the method provided by the invention achieves the best enhancement effect and the highest processing speed.
Table 1: evaluation index results
In summary, compared with other dark image enhancement methods, the light-weight dark image enhancement method based on the residual error density block provided by the invention can obtain dark image enhancement images with lower noise, less color shift and higher contrast in a shorter processing time.
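The PSNR and SSIM figures of the kind reported in Table 1 can be computed with standard tooling; a scikit-image sketch with placeholder file paths follows.

```python
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

enhanced = io.imread("enhanced_output.png")
reference = io.imread("normal_light_reference.png")

psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
ssim_val = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim_val:.4f}")
```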

Claims (10)

1. The lightweight dim light image enhancement method based on the residual error dense block is characterized by comprising the following steps of:
step 1, acquiring paired dark light image data sets, wherein the paired dark light image data sets consist of dark light images and corresponding normal light images;
step 2, constructing a condition generation countermeasure network model, wherein the condition generation countermeasure network model comprises a generator network and a discriminator network, the generator network is a lightweight network based on a residual error density block and a channel attention mechanism, and the discriminator network is a full convolution network;
step 3, determining a multi-mode loss function for measuring the difference between the predicted value and the true value of the model, wherein the loss function consists of global similarity loss, structural similarity loss, content similarity loss, color similarity loss and local texture loss;
step 4, performing countermeasure training on the condition generation countermeasure network model by using training images in the pair of dim light image data sets, and obtaining a loss value for optimizing a network through a multi-mode loss function until a generator network model with good prediction performance is obtained;
and 5, inputting the collected dim light image into a trained generator network model to obtain an enhanced image.
2. The method for enhancing a lightweight dark-light image based on a residual error density block according to claim 1, wherein the generator network comprises an input layer, a hidden layer and an output layer, the input layer is used for inputting a dark-light image of three channels of RGB, the hidden layer is used for extracting characteristics of the input image through convolution operation, and the output layer is used for outputting a processing result; the hidden layer comprises 3 residual error dense blocks; the specific structure of the hidden layer of the generator is as follows:
convolution layer 1: taking an image of M x N x 3 input by an input layer as input, and after the image is activated by a 3*3 convolution kernel convolution with the step length of 1 and a LeakyReLU activation function, outputting a characteristic image of M x N with the channel number of 32, wherein M and N are the length and the width of the input image respectively;
convolution layer 2: taking a characteristic diagram of M x N x 32 output by the convolution layer 1 as an input, and outputting the characteristic diagram of M x N with 32 channels after the convolution of a 3*3 convolution kernel with the step length of 1 and activation of a LeakyReLU activation function;
residual dense block 1: taking the M x N x 32 feature map output by the convolution layer 2 as input, and outputting the M x N feature map with the channel number of 32 after passing through 3 dense connecting layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
residual error dense block 2: taking the M x N x 32 feature map output by the residual error dense block 1 as input, and outputting the M x N feature map with the channel number of 32 after passing through 3 dense connecting layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
residual error dense block 3: taking the M x N x 32 feature map output by the residual error dense block 2 as input, and outputting the M x N feature map with the channel number of 32 after passing through 3 dense connecting layers, 1 local feature fusion layer, 1 channel attention layer and 1 local residual error learning layer;
connection layer 1: inputting 3 feature graphs processed by the residual error dense block 1, the residual error dense block 2 and the residual error dense block 3, and outputting M-N feature graphs with 96 channels after being connected by a connecting layer;
channel attention layer 1: taking the M x N x 96 feature map output by the connection layer as input, and outputting an M x N feature map with 96 channels after the channel attention mechanism weights the channels of the feature map;
convolution layer 3: taking a characteristic diagram of M x N x 96 output by the channel attention layer 1 as input, and outputting a characteristic diagram of M x N with 32 channels after convolution of 3*3 convolution kernels with the step length of 1;
convolution layer 4: taking the M x N x 32 characteristic diagram output by the convolution layer 3 as input, and outputting the M x N characteristic diagram with the channel number of 32 after the 3*3 convolution kernel convolution with the step length of 1 and the activation of the LeakyReLU activation function are activated;
residual learning layer 1: 2 feature maps processed by the convolution layer 1 and the convolution layer 4 are input, and after pixel-by-pixel addition operation, a feature map with M x N and 32 channels is output;
convolution layer 5: taking the M x N x 32 feature map output by the residual learning layer 1 as input, and outputting an M x N feature map with 3 channels after the convolution of 3 convolution kernels of size 3*3 with the step length of 1 and activation of the sigmoid activation function.
3. The light weight dark image enhancement method based on residual error dense blocks according to claim 1, wherein the discriminator comprises an input layer, a hidden layer and an output layer, the input layer is used for inputting a dark light picture of three channels of RGB and a picture to be discriminated of three channels of RGB, the hidden layer is used for carrying out convolution calculation on the input image, the output layer is used for outputting a discrimination result, and the discriminator hidden layer is composed of 6 convolution layers, specifically:
convolution layer 1: taking the M1 x N1 x 6 feature map obtained after the two pictures input by the input layer are connected in the channel dimension as input, and outputting an (M1/2) x (N1/2) feature map with 32 channels after the convolution of 32 convolution kernels of size 3*3 with the step length of 2 and activation of a relu activation function, wherein the two pictures input by the input layer comprise a dark light picture with the size of M1 x N1 and a picture to be identified with the size of M1 x N1, M1 and N1 are respectively the length and the width of the image, and M1 and N1 are integer multiples of 16;
convolution layer 2: taking an (M1/2) x (N1/2) x 32 characteristic diagram output by the convolution layer 1 as an input, and outputting an (M1/4) x (N1/4) characteristic diagram with 64 channels after the convolution of a 3*3 convolution kernel with the step length of 2 and activation of a relu activation function;
convolution layer 3: taking an (M1/4) x (N1/4) x 64 characteristic diagram output by the convolution layer 2 as an input, and outputting an (M1/8) x (N1/8) characteristic diagram with 128 channels after the convolution of a 3*3 convolution kernel with the step length of 2 and activation of a relu activation function;
convolution layer 4: taking the (M1/8) x (N1/8) x 128 feature map output by the convolution layer 3 as input, and outputting an (M1/16) x (N1/16) feature map with 256 channels after the convolution of 256 convolution kernels of size 3*3 with the step length of 2 and activation of a relu activation function;
convolution layer 5: taking an (M1/8) x (N1/8) x 256 feature map output by the convolution layer 4 as an input, and outputting a feature map with the number of channels being 1 (M1/16) x (N1/16) after convolution of a 3*3 convolution kernel with the step length being 1;
convolution layer 6: taking the (M1/16) x (N1/16) x 1 feature map output by the convolution layer 5 as input, and outputting the feature map (M1/16) x (N1/16) with the channel number of 1, namely an information distribution matrix after convolution by a 3*3 convolution kernel with the step length of 1.
4. The residual-dense-block-based lightweight dim-light image enhancement method according to claim 1, wherein the loss function in step 3 is:
L(G, D) = L_cGAN(G, D) + λ_1·L_1(G) + λ_S·L_S(G) + λ_C·L_C(G) + λ_P·L_P(G)
where L_cGAN is the discriminator loss calculated by the discriminator, referred to herein as the local texture loss; λ_1, λ_S, λ_C and λ_P are hyperparameters for adjusting the relative weights; L_1 is the global similarity loss, L_S the structural similarity loss, L_C the content similarity loss, and L_P the color similarity loss; G is the generator mapping and D is the discriminator mapping.
5. The residual-dense-block-based lightweight darklight image enhancement method according to claim 4, wherein the local texture loss calculated by the discriminator is specifically:
L_cGAN(G, D) = E_{X,Y}[log D(Y)] + E_{X,Y}[log(1 - D(X, G(X, Z)))]
where X and Y respectively denote the dim light image to be enhanced and the corresponding normal light image, Z denotes the input random noise, E_{X,Y} denotes the mean computed pixel by pixel with X and Y as arguments, G denotes the generator mapping and D the discriminator mapping.
6. The method for enhancing a lightweight dim light image based on a residual error density block according to claim 4, wherein the global similarity loss is specifically:
where the per-channel terms for c = {r, g, b} share the same functional mapping, ||·||_1 denotes the 1-norm operation, and the channel weights ω_r, ω_g and ω_b are computed in the same way, each from the corresponding channel quantity divided by the sum of the three.
7. The method for enhancing a lightweight dim light image based on a residual error density block according to claim 4, wherein the structural similarity loss is specifically:
L S (G)=1-SSIM(Y,G(X,Z))
wherein SSIM represents a structural similarity calculation function, X and Y represent a dark light image to be enhanced and a corresponding normal light image, respectively, and Z represents an input random noise image.
8. The method for enhancing a lightweight dim light image based on a residual error density block according to claim 4, wherein the content similarity loss is specifically:
L_C(G) = E_{X,Y,Z}[||Θ(Y) - Θ(G(X, Z))||_2]
where Θ(·) denotes the feature extraction function corresponding to the block5_conv2 layer of the VGG-19 pre-trained model, and E_{X,Y,Z}[·] denotes the mean of the bracketed quantity computed over the matrix variables X, Y and Z.
9. The method for enhancing a lightweight darklight image based on a residual error density block according to claim 4, wherein the color similarity loss is specifically:
L P (G)=delta_E(Y,G(X,Z))
where delta_e represents a weighted euclidean-based color difference calculation function, X and Y represent the desired enhanced darklight image and the corresponding normal light image, respectively, and Z represents the random noise of the input.
10. The lightweight dim light image enhancement method based on residual dense blocks according to claim 1, wherein the specific step of generating the countermeasure network model under the training condition in step 4 is as follows:
assigning values to the super parameters required in the training process;
loading the training image pair in the step 1;
initializing generator optimizer and generator network parameters;
the generator generates an enhanced image according to the input dim light image and random noise;
initializing a discriminator optimizer and discriminator network parameters;
the discriminator generates an information matrix according to the input dim light image and the image to be discriminated; when the image to be discriminated is an image enhanced by the generator, the expected output of the discriminator is an all-zeros matrix, and when the image to be discriminated is the normal light image corresponding to the dim light image, the expected output of the discriminator is an all-ones matrix;
the generator optimizer updates the generator network parameters according to the generator loss function values to minimize the generator loss functions;
the discriminator optimizer updates the discriminator network parameters according to the discriminator loss function value to maximize the discriminator loss function;
loading a new training image pair, repeating the steps by using the updated network parameters of the generator and the discriminator, and ending the cycle when the generator and the discriminator model obtained by training meet the objective function condition;
saving a model which enables the generator loss function to reach the minimum value as a final dim light image enhancement model;
the objective function of the lightweight dim light image enhancement model based on the residual error dense block is as follows:
G* = arg min_G max_D L(G, D)
where L(G, D) is the multi-modal loss function described above, G is the mapping obtained by training the generator network, and D is the mapping obtained by training the discriminator network.
CN202310783361.7A 2023-06-28 2023-06-28 Lightweight dim light image enhancement method based on residual error dense block Pending CN116664446A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310783361.7A CN116664446A (en) 2023-06-28 2023-06-28 Lightweight dim light image enhancement method based on residual error dense block

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310783361.7A CN116664446A (en) 2023-06-28 2023-06-28 Lightweight dim light image enhancement method based on residual error dense block

Publications (1)

Publication Number Publication Date
CN116664446A true CN116664446A (en) 2023-08-29

Family

ID=87711869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310783361.7A Pending CN116664446A (en) 2023-06-28 2023-06-28 Lightweight dim light image enhancement method based on residual error dense block

Country Status (1)

Country Link
CN (1) CN116664446A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391975A (en) * 2023-12-13 2024-01-12 中国海洋大学 Efficient real-time underwater image enhancement method and model building method thereof
CN117391975B (en) * 2023-12-13 2024-02-13 中国海洋大学 Efficient real-time underwater image enhancement method and model building method thereof

Similar Documents

Publication Publication Date Title
Wang et al. An experimental-based review of image enhancement and image restoration methods for underwater imaging
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN112509001A (en) Multi-scale and multi-feature fusion feature pyramid network blind restoration method
CN111861901A (en) Edge generation image restoration method based on GAN network
Liu et al. Survey of natural image enhancement techniques: Classification, evaluation, challenges, and perspectives
CN111462012A (en) SAR image simulation method for generating countermeasure network based on conditions
Kim et al. Multiple level feature-based universal blind image quality assessment model
CN113284061B (en) Underwater image enhancement method based on gradient network
CN112150379A (en) Image defogging method and device for enhancing generation of countermeasure network based on perception discrimination
CN111047543A (en) Image enhancement method, device and storage medium
CN116664446A (en) Lightweight dim light image enhancement method based on residual error dense block
CN116563693A (en) Underwater image color restoration method based on lightweight attention mechanism
CN116739899A (en) Image super-resolution reconstruction method based on SAUGAN network
Huang et al. Underwater image enhancement based on color restoration and dual image wavelet fusion
Saleh et al. Adaptive uncertainty distribution in deep learning for unsupervised underwater image enhancement
Yu et al. Two-stage image decomposition and color regulator for low-light image enhancement
CN113379861B (en) Color low-light-level image reconstruction method based on color recovery block
Saleem et al. A non-reference evaluation of underwater image enhancement methods using a new underwater image dataset
Tan et al. Low-light image enhancement with geometrical sparse representation
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
CN116596792B (en) Inland river foggy scene recovery method, system and equipment for intelligent ship
CN112348762A (en) Single image rain removing method for generating confrontation network based on multi-scale fusion
CN116433516A (en) Low-illumination image denoising and enhancing method based on attention mechanism
CN114119428B (en) Image deblurring method and device
Li et al. LDNet: low-light image enhancement with joint lighting and denoising

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination