CN115082341A - Low-light image enhancement method based on event camera - Google Patents

Low-light image enhancement method based on event camera

Info

Publication number
CN115082341A
Authority
CN
China
Prior art keywords
image
gradient
event
branch
representing
Prior art date
Legal status
Pending
Application number
CN202210723127.0A
Other languages
Chinese (zh)
Inventor
金海燕
王乔斌
苏浩楠
肖照林
蔡磊
王彬
刘瑾
Current Assignee
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202210723127.0A
Publication of CN115082341A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/94Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a low-light image enhancement method based on an event camera. A data set containing normal-illumination images and an event stream is first selected, with the images and the event stream required to be spatially matched; the normal-illumination images are used to generate gradient images and noisy low-light images, and the event stream is preprocessed into event pseudo images with good edge information. A gradient image is then reconstructed and the low-light image is enhanced: a feature fusion module is designed, a conditional discriminator is added, the neural network built from the gradient branch and the low-light enhancement branch is trained, and the network model is saved; finally the model is tested and the enhanced image is output. The method guides low-light image enhancement in the image domain through the gradient image reconstructed from events, and can obtain a normal-light image with rich edge information.

Description

Low-light image enhancement method based on event camera
Technical Field
The invention belongs to the technical field of computer digital image processing, and particularly relates to a low-light image enhancement method based on an event camera.
Background
With the advent of the information age, photography has become an indispensable part of daily life; digital image processing has therefore developed rapidly, and the demands placed on it have steadily increased.
Under weak light there is still a demand for photographs of normal-illumination quality, and night-scene performance has become an important criterion when choosing a mobile phone. In a picture taken in weak light, a short exposure generally produces heavy noise, while prolonging the exposure time easily causes motion blur. Hardware solutions raise the cost of the imaging system; meanwhile, with the rapid development of deep learning, software-based enhancement can achieve good results.
Early low-light image enhancement mostly relied on traditional methods such as histogram equalization or gamma correction, which easily over-expose some regions while leaving others insufficiently enhanced, with inadequate overall contrast. With the marked increase in computing power, deep learning has developed rapidly and can obtain better experimental results than the traditional methods. Low-light enhancement can be performed from a single image, but under very weak light the noise severely destroys the structural information of the image and single-image enhancement cannot obtain good results. An event camera records changes in intensity; it offers high dynamic range, low latency and freedom from motion blur, so it can produce a good event stream even in weak light. From this stream an event pseudo image with a well-structured edge can be synthesized, supplementing dark-region structure that a conventional camera cannot capture, and finally a normal-light image with rich texture details is obtained.
Disclosure of Invention
The invention aims to provide a low-light image enhancement method based on an event camera, which guides the enhancement of a low-light image in an image domain through a gradient map reconstructed by an event and can obtain a normal light image with rich edge information.
The technical scheme adopted by the invention is that the low-light image enhancement method based on the event camera is implemented according to the following steps:
step 1, data set composition, selecting a data set with a normal illumination image and an event stream, requiring the normal illumination image and the event stream to be matched in space, then generating a gradient image and a noise-containing weak light image from the normal illumination image, and preprocessing the event stream to obtain an event pseudo image with good edge information;
step 2, reconstructing the event pseudo image obtained in the step 1 into a gradient image by adopting UNet gradient branches;
step 3, enhancing the low-light image obtained in the step 1 by adopting a UNet low-light image enhancement branch to obtain an enhanced image;
step 4, designing a feature fusion module, namely a channel and spatial attention module CBAM, to fuse the information contained in the gradient image of step 2 into the low-light enhancement branch of step 3;
step 5, adding a condition discriminator, wherein the condition is an event pseudo image and a gradient image, and generating a more real enhanced image;
step 6, training 300 epochs on the neural network built by the gradient branch in the step 2 and the weak light image enhancement branch in the step 3, verifying the training result and storing the model of the neural network;
and 7, testing the neural network model stored in the step 6, and outputting the enhanced image.
The present invention is also characterized in that,
the step 1 is as follows:
Step 1.1, select a data set containing normal-illumination images and an event stream, where the normal-illumination images and the event stream are required to be paired in space. An event is represented as e_j = (x_j, y_j, t_j, p_j), where x_j and y_j are the coordinates of the pixel, t_j is the timestamp, p_j is the polarity of the event, and j indexes the events. Assume the event stream spans a duration Δt, during which the event camera also returns n gray frames; the event stream is divided into n corresponding intervals, and the pixel values of the synthesized pseudo image are obtained by adding the polarity values of the events that fall at each pixel within an interval, which yields the event pseudo image;
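A minimal sketch of this accumulation, assuming the event stream is already unpacked into NumPy arrays xs, ys, ps of pixel coordinates and polarities (hypothetical variable names), could look as follows:

```python
import numpy as np

def events_to_pseudo_image(xs, ys, ps, height, width):
    """Sum event polarities per pixel to form an event pseudo image."""
    img = np.zeros((height, width), dtype=np.float32)
    # np.add.at accumulates repeated indices, so several events landing on the
    # same pixel all contribute their polarity values
    np.add.at(img, (ys, xs), ps)
    return img
```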
Step 1.2, for the reference image GT, add noise using Gaussian blind noise, i.e. the Gaussian noise is generated with a standard deviation drawn from a range rather than a fixed standard deviation; then simulate the low-light scene with gamma correction, as shown in formula (1),
V_out = (V_in)^gamma  (1)
where V_out is the corrected image, V_in is the image before correction, and gamma controls the scaling of the pixel values;
The data are then linearly normalized, as shown in formula (2),
X_norm = (X - X_min) / (X_max - X_min)  (2)
where X_norm is the normalized image, X is the pixel value at coordinates (x, y), X_min is the minimum pixel value of the image, and X_max is the maximum pixel value of the image; this yields the simulated noisy low-light image;
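As an illustration of step 1.2, a sketch of the low-light simulation is given below; the standard-deviation range and the gamma value are illustrative assumptions, not values fixed by the method:

```python
import numpy as np

def simulate_low_light(gt, sigma_range=(5.0, 50.0), gamma=3.0, rng=None):
    """Add Gaussian blind noise, apply gamma correction, then min-max normalize."""
    rng = np.random.default_rng() if rng is None else rng
    img = gt.astype(np.float32) / 255.0
    # blind noise: the standard deviation is drawn from a range rather than fixed
    sigma = rng.uniform(*sigma_range) / 255.0
    noisy = img + rng.normal(0.0, sigma, img.shape)
    # gamma correction (formula (1)): V_out = V_in ** gamma darkens the image
    dark = np.clip(noisy, 0.0, 1.0) ** gamma
    # linear normalization (formula (2))
    return (dark - dark.min()) / (dark.max() - dark.min() + 1e-8)
```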
Step 1.3, perform edge extraction on the normal-illumination image obtained in step 1.1 with the Sobel operator; the calculation for an image A is shown in formulas (3), (4) and (5):
G_x = [[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]] * A  (3)
G_y = [[-1, -2, -1], [0, 0, 0], [+1, +2, +1]] * A  (4)
G = sqrt(G_x² + G_y²)  (5)
where G_x is the first-order difference in the horizontal direction, G_y is the first-order difference in the vertical direction, * denotes the convolution operation, and G is the resulting gradient image.
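Step 1.3 corresponds to a standard Sobel filter; a sketch using SciPy's 2-D convolution (an equivalent library call such as cv2.Sobel would also work) is:

```python
import numpy as np
from scipy.signal import convolve2d

def sobel_gradient(image):
    """Compute the gradient magnitude map G from formulas (3)-(5)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float32)
    gx = convolve2d(image, kx, mode="same", boundary="symm")
    gy = convolve2d(image, ky, mode="same", boundary="symm")
    return np.sqrt(gx ** 2 + gy ** 2)
```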
The step 2 is as follows:
Step 2.1, first use a Dataset class of the deep-learning framework PyTorch and apply transform operations to the pictures in the class: convert the pictures to tensor format and then normalize them, as shown in formula (6):
O_c = (i_c - m_c) / S_c  (6)
where O_c is the output of the c-th channel, i_c is the input of the c-th channel, m_c is the mean of the c-th channel and S_c is the variance of the c-th channel; the data are then packed through a DataLoader;
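Step 2.1 is a conventional PyTorch data pipeline; the sketch below assumes the samples are already loaded as tuples of NumPy images, and the 0.5 mean/scale values are placeholders rather than the values used by the method:

```python
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class EventLowLightDataset(Dataset):
    """Hypothetical dataset returning (event pseudo image, low-light image, gradient GT, normal-light GT)."""
    def __init__(self, samples):
        self.samples = samples              # list of tuples of numpy arrays
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        tensors = [self.to_tensor(img) for img in self.samples[idx]]
        # per-channel normalization as in formula (6): (input - mean) / scale
        return tuple((t - 0.5) / 0.5 for t in tensors)

# pack the data through a DataLoader, e.g.
# loader = DataLoader(EventLowLightDataset(samples), batch_size=4, shuffle=True)
```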
Step 2.2, for the packed data, first perform feature extraction to obtain feature maps. UNet is selected as the backbone of the gradient branch, with the max-pooling operation replaced by a convolution with stride 2, followed by a batchnorm layer to adjust the data range and a ReLU activation; downsampling is performed seven times. Upsampling is then performed by deconvolution with stride 2, again followed by batchnorm and ReLU, and repeated seven times. Finally the upsampled feature map is used to reconstruct the gradient image. Padding is applied to preserve the feature-map size, and the information of the low-light image obtained in step 1.2 is introduced into the gradient branch. The convolution and deconvolution output sizes are given by formulas (7), (8) and (9), (10):
H_out = (H_in - k + 2×p) / s + 1  (7)
W_out = (W_in - k + 2×p) / s + 1  (8)
H_out = (H_in - 1)×2 - 2×p + k  (9)
W_out = (W_in - 1)×2 - 2×p + k  (10)
where H_out is the height of the output image, W_out is the width of the output image, H_in is the height of the input image, W_in is the width of the input image, p is the padding size, k is the convolution kernel size, and s is the stride;
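The size formulas (7)-(10) and the stride-2 convolution blocks of step 2.2 can be illustrated as follows; the kernel size of 4 and the channel arguments are assumptions for the sketch:

```python
import torch.nn as nn

def conv_out_size(size_in, k, p, s):
    # formulas (7)/(8): output size of a strided convolution
    return (size_in - k + 2 * p) // s + 1

def deconv_out_size(size_in, k, p):
    # formulas (9)/(10): output size of a stride-2 transposed convolution
    return (size_in - 1) * 2 - 2 * p + k

def down_block(c_in, c_out):
    """Stride-2 convolution (replacing max pooling) + BatchNorm + ReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

def up_block(c_in, c_out):
    """Stride-2 deconvolution for upsampling + BatchNorm + ReLU."""
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```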
Step 2.3, for the reconstructed gradient map output by the gradient branch and the gradient reference map, the L1 loss is used, as shown in formulas (11) and (12),
L = {l_1, l_2, ..., l_n}, l_n = |x_n - y_n|  (11)
l_(x,y) = mean(L)  (12)
where L collects the loss values of all pixel points, x and y are the pixel coordinates, l_1, l_2, ..., l_n are the per-pixel loss values, n is the number of pixels, and l_(x,y) is the loss value computed as their mean; the gradient-branch parameters are updated with this loss to obtain the updated gradient branch.
The step 3 is as follows:
Step 3.1, the packed data from step 2.1 are fed into the image enhancement branch, with UNet selected as its backbone. The downsampling operation is replaced by a convolution with stride 2 followed by batchnorm and ReLU, repeated seven times; upsampling is then completed by deconvolution with stride 2 followed by batchnorm and ReLU, also repeated seven times, giving a feature map; a final convolution and ReLU on this feature map produces the output image with 3 channels;
Step 3.2, compute the L1 loss between the output image of the image enhancement branch and the normal-light image; for judging which of the output image and the normal-light image is real, the MSE loss is adopted, as shown in formulas (13) and (14),
L = {l_1, l_2, ..., l_n}, l_n = (x_n - y_n)²  (13)
l_(x,y) = mean(L)  (14)
where L collects the loss values of all patches, l_1, l_2, ..., l_n are the values of the corresponding patches, l_(x,y) is the loss value, and x and y are the patch coordinates; the loss is computed as the mean and used to update the image enhancement branch parameters, giving the updated image enhancement branch.
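Steps 2.3 and 3.2 use standard L1 and MSE losses; a sketch of how they might be combined with the adversarial term is shown below, where the weight lambda_adv is an assumed balance parameter:

```python
import torch.nn as nn

l1_loss = nn.L1Loss()    # formulas (11)/(12): per-pixel absolute error, averaged
mse_loss = nn.MSELoss()  # formulas (13)/(14): used for the real/fake judgement

def generator_loss(pred_gradient, gt_gradient, pred_image, gt_image,
                   disc_fake, real_label, lambda_adv=0.01):
    """Reconstruction losses of both branches plus an adversarial term (weights assumed)."""
    loss_grad = l1_loss(pred_gradient, gt_gradient)   # gradient branch, step 2.3
    loss_img = l1_loss(pred_image, gt_image)          # enhancement branch, step 3.2
    loss_adv = mse_loss(disc_fake, real_label)        # fool the conditional discriminator
    return loss_grad + loss_img + lambda_adv * loss_adv
```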
The step 4 is as follows:
Step 4.1, construct the feature fusion block using the channel and spatial attention module CBAM, which operates as follows: first apply channel attention to the output feature maps of step 2 and step 3, performing max pooling and average pooling over the spatial dimensions, feeding the results into a shared multi-layer perceptron MLP, adding them, and passing them through the tanh function shown in formula (15),
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))  (15)
where tanh(x) is the output value after activation, x is the input value, and exp is the natural exponential function.
Spatial attention is then applied: max pooling and average pooling are performed over each channel, the results are concatenated, and the output passes through a sigmoid function; the resulting feature map is sent to the image enhancement branch of step 3;
Step 4.2, first concatenate the feature maps from steps 2 and 3 and convolve them to obtain a new feature map; the new feature map is followed by a CBAM module, then batchnorm and ReLU operations, and a dropout operation that randomly masks half of the neurons of the feature fusion block of step 4.1; this is repeated twice. Nine fusion modules in total are used across the gradient branch and the low-light enhancement branch, giving nine feature fusion modules.
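A rough sketch of such a fusion attention block is given below; it follows the CBAM structure described above (spatial pooling, shared MLP and a tanh gate for channel attention, then per-channel pooling and a sigmoid gate for spatial attention), with the reduction ratio and kernel size as assumptions:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style block used to fuse gradient-branch and enhancement-branch features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(          # shared multi-layer perceptron
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # channel attention: spatial max/average pooling, shared MLP, tanh gate
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.tanh(avg + mx)
        # spatial attention: per-channel max/average pooling, concatenation, sigmoid gate
        s = torch.cat([torch.amax(x, dim=1, keepdim=True),
                       torch.mean(x, dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```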
The step 5 is as follows:
A relativistic generative adversarial network (RGAN) is adopted, and a conditional discriminator is used so that added conditions control the discriminator output; the conditions are the event pseudo image from step 1, the reconstructed gradient image, and the low-light image. When computing the probability that an image is real, a PatchGAN discriminator judges the probability that each image patch is real or fake, using the sigmoid activation function shown in formula (16),
sigmoid(x) = 1 / (1 + exp(-x))  (16)
The discriminator first concatenates the feature maps and extracts features through convolution, then applies InstanceNorm and ReLU; this is repeated four times, and the probability that the output image of step 3 is a normal-illumination image is judged;
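A sketch of a conditional PatchGAN-style discriminator consistent with this description is shown below; the layer widths and kernel sizes are assumptions:

```python
import torch
import torch.nn as nn

class ConditionalPatchDiscriminator(nn.Module):
    """Judges per-patch realness of the enhanced image, conditioned on the
    event pseudo image, the reconstructed gradient image and the low-light image."""
    def __init__(self, in_channels=6, base=64):
        super().__init__()
        layers, c = [], in_channels
        for i in range(4):  # four conv + InstanceNorm + ReLU stages
            layers += [nn.Conv2d(c, base * 2 ** i, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(base * 2 ** i),
                       nn.ReLU(inplace=True)]
            c = base * 2 ** i
        layers += [nn.Conv2d(c, 1, 4, padding=1), nn.Sigmoid()]  # patch-wise probability map
        self.net = nn.Sequential(*layers)

    def forward(self, image, conditions):
        # concatenate the image with its conditions along the channel dimension
        return self.net(torch.cat([image] + list(conditions), dim=1))
```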
the step 6 is as follows:
Step 6.1, for the neural network formed by steps 2 and 3, the ADAM optimizer is selected with an initial learning rate of 0.0002; the scheduler uses a multi-step decay strategy with decay steps at 25 and 100, halving the learning rate each time, and training runs for 300 epochs in total. During training the PSNR is observed, as shown in formulas (17) and (18),
MSE = (1 / (m·n)) · Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} [I(i, j) - K(i, j)]²  (17)
PSNR = 10 · log_10(MAX_I² / MSE)  (18)
where MSE is the mean square error, m and n are the height and width of the image, i and j are the pixel coordinates, I(i, j) is the image output by the network in step 3, K(i, j) is the normal-illumination image obtained in step 1, PSNR is the peak signal-to-noise ratio, and MAX_I is the maximum pixel value of the image.
The SSIM is shown in formula (23). Formula (22) contains three parts: the luminance comparison l(x, y) in formula (19), the structure comparison s(x, y) in formula (20), and the contrast comparison c(x, y) in formula (21). μ_x and μ_y denote the means of x and y, σ_x and σ_y their standard deviations, and σ_xy their covariance; c_1, c_2, c_3 are constants that keep the denominators from being 0. Setting α = β = γ = 1 and c_3 = c_2 / 2 reduces formula (22) to formula (23):
l(x, y) = (2·μ_x·μ_y + c_1) / (μ_x² + μ_y² + c_1)  (19)
s(x, y) = (σ_xy + c_3) / (σ_x·σ_y + c_3)  (20)
c(x, y) = (2·σ_x·σ_y + c_2) / (σ_x² + σ_y² + c_2)  (21)
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ  (22)
SSIM(x, y) = ((2·μ_x·μ_y + c_1)(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2))  (23)
dynamically adjusting the hyper-parameters by observing two indexes of PSNR and SSIM: learning rate lr, balance parameter λ, and training round number epoch;
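Step 6.1 corresponds to a conventional PyTorch training setup; a condensed sketch is shown below, where model, train_loader and compute_loss stand for the network, the DataLoader of step 2.1 and the losses of steps 2.3, 3.2 and 5:

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

def train(model, train_loader, compute_loss, epochs=300, device="cuda"):
    """ADAM optimizer, lr 0.0002, multi-step decay at epochs 25 and 100 (halved each time)."""
    model.to(device)
    optimizer = Adam(model.parameters(), lr=2e-4)
    scheduler = MultiStepLR(optimizer, milestones=[25, 100], gamma=0.5)
    for epoch in range(epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = compute_loss(model, [t.to(device) for t in batch])
            loss.backward()
            optimizer.step()
        scheduler.step()
```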
Step 6.2, output the training-process loss, PSNR and SSIM reference indexes to TensorBoard through the SummaryWriter of the Python third-party library tensorboard, verify the results on the validation set, and then save the model: the neural network parameters trained in step 6.1, the current epoch number, the ADAM optimizer state and the scheduler state are stored, giving the trained network model.
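Step 6.2 can be sketched as follows; the metric values are assumed to come from the validation pass, and the checkpoint layout is an assumption:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

def log_and_save(writer, epoch, loss, psnr, ssim, model, optimizer, scheduler, path):
    """Write training metrics to TensorBoard and save a resumable checkpoint."""
    writer.add_scalar("loss", loss, epoch)
    writer.add_scalar("psnr", psnr, epoch)
    writer.add_scalar("ssim", ssim, epoch)
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict()}, path)

# writer = SummaryWriter(log_dir="runs/event_lowlight")
```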
Step 7 is specifically as follows:
Load the network model trained and saved in step 6, input the test set into it, and save the test results to obtain the enhanced images.
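A minimal sketch of the test phase, assuming the checkpoint layout of the saving sketch above and a model that takes the event pseudo image and the low-light image as inputs:

```python
import torch

def test(model, test_loader, checkpoint_path, device="cuda"):
    """Load the saved model, run it on the test set and collect the enhanced images."""
    state = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state["model"])
    model.to(device).eval()
    outputs = []
    with torch.no_grad():
        for event_img, low_light, _, _ in test_loader:
            enhanced = model(event_img.to(device), low_light.to(device))
            outputs.append(enhanced.cpu())
    return outputs
```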
The method has the following advantages: the event stream is synthesized into an event pseudo image, which is reconstructed into a gradient map; to reduce the number of parameters needed for reconstruction, part of the information of the low-light enhancement branch is introduced into the gradient branch, and the gradient information in turn guides the low-light enhancement branch. The fusion block adopts a CBAM module so that the network attends better to the information it needs to learn; the L1 loss is adopted to increase the robustness of the network; and a conditional discriminator is added to improve the realism of the generated image. Experimental results show that the method achieves a good low-light enhancement effect.
Drawings
FIG. 1 is a schematic diagram of the general structure of the low-light enhancement method based on the event camera;
FIG. 2 is an example of a data set for training, including a low light image, an event artifact, a normal light image, and a gradient image;
FIG. 3 is a schematic diagram of a network for reconstructing an event pseudo-image into a gradient image;
FIG. 4 is a schematic diagram of a network for enhancing a low-light image to a normal-light image;
FIG. 5(a) is a network schematic of a convergence module;
FIG. 5(b) is a network schematic of the channel and spatial attention module;
FIG. 6(a) is a diagram showing the change in the evaluation index psnr during training;
fig. 6(b) shows a change in the evaluation index ssim;
FIG. 6(c) shows a decrease in loss value;
FIG. 7 is a graph showing the results of the experiment according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a low-light image enhancement method based on an event camera, a flow chart is shown in figure 1, and the method is implemented according to the following steps:
step 1, data set composition, selecting a data set with a normal illumination image and an event stream, requiring the normal illumination image and the event stream to be matched in space, then generating a gradient image and a noise-containing weak light image from the normal illumination image, and preprocessing the event stream to obtain an event pseudo image with good edge information;
the step 1 is as follows:
the low-light image enhancement process based on the event camera comprises the following steps:
for an event e ═ (x) i ,y i ,t i ,p i ) Firstly, the polarities of events are summed to obtain an event pseudo image, then the event pseudo image is subjected to feature extraction and gradient image reconstruction, in order to reduce the calculation amount of parameters in the reconstruction process, a feature map of a low-light image is introduced into a Gradient Branch (GB), meanwhile, a fusion block is used for introducing gradient information into a low-light enhancement branch, and the image is completed through the low-light enhancement branchAnd (5) enhancing the image, and finally judging whether the generated image is true or not through a condition discriminator.
Firstly, the event combination is called as an event pseudo image, then a simulated weak light image is obtained by adding noise and gamma correction to a normal light image, sobel edge extraction operation is carried out on the normal light image to obtain a gradient map, and the gradient map is divided into a training set, a verification set and a test set according to the proportion of 7: 2: 1.
Step 1.1, select a data set containing normal-illumination images and an event stream, where the normal-illumination images and the event stream are required to be paired in space. An event is represented as e_j = (x_j, y_j, t_j, p_j), where x_j and y_j are the coordinates of the pixel, t_j is the timestamp, p_j is the polarity of the event, and j indexes the events. Assume the event stream spans a duration Δt, during which the event camera also returns n gray frames; the event stream is divided into n corresponding intervals, and the pixel values of the synthesized pseudo image are obtained by adding the polarity values of the events that fall at each pixel within an interval, which yields the event pseudo image;
Step 1.2, for the reference image GT, noise is added to obtain a more realistic low-light image; for the authenticity of the data set, Gaussian blind noise is used, i.e. the Gaussian noise is generated with a standard deviation drawn from a range rather than a fixed standard deviation. The low-light scene is then simulated with gamma correction, as shown in formula (1),
V_out = (V_in)^gamma  (1)
where V_out is the corrected image, V_in is the image before correction, and gamma controls the scaling of the pixel values;
The data are then linearly normalized, as shown in formula (2),
X_norm = (X - X_min) / (X_max - X_min)  (2)
where X_norm is the normalized image, X is the pixel value at coordinates (x, y), X_min is the minimum pixel value of the image, and X_max is the maximum pixel value of the image; this yields the simulated noisy low-light image;
Step 1.3, to obtain a gradient image, edge extraction is performed on the normal-illumination image obtained in step 1.1 with the Sobel operator; the calculation for an image A is shown in formulas (3), (4) and (5):
G_x = [[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]] * A  (3)
G_y = [[-1, -2, -1], [0, 0, 0], [+1, +2, +1]] * A  (4)
G = sqrt(G_x² + G_y²)  (5)
where G_x is the first-order difference in the horizontal direction, G_y is the first-order difference in the vertical direction, * denotes the convolution operation, and G is the resulting gradient image.
Step 2, reconstructing the event pseudo image obtained in the step 1 into a gradient image by adopting UNet gradient branches;
the step 2 is as follows:
Step 2.1, first use a Dataset class of the deep-learning framework PyTorch and apply transform operations to the pictures in the class: convert the pictures to tensor format and then normalize them so that the network can learn the data better, as shown in formula (6):
O_c = (i_c - m_c) / S_c  (6)
where O_c is the output of the c-th channel, i_c is the input of the c-th channel, m_c is the mean of the c-th channel and S_c is the variance of the c-th channel; the data are then packed through a DataLoader;
Step 2.2, for the packed data, first perform feature extraction to obtain feature maps. The UNet network has a good feature-extraction structure, so UNet is selected as the backbone of the gradient branch; the max-pooling operation is replaced by a convolution with stride 2, followed by a batchnorm layer to adjust the data range and a ReLU activation, and downsampling is performed seven times. Upsampling is then performed by deconvolution with stride 2, again followed by batchnorm and ReLU, and repeated seven times; finally the upsampled feature map is used to reconstruct the gradient image. To preserve the feature-map size a padding operation is performed, and to reduce the number of parameters the information of the low-light image obtained in step 1.2 is introduced into the gradient branch, as shown in convolution formulas (7), (8) and deconvolution formulas (9), (10):
H_out = (H_in - k + 2×p) / s + 1  (7)
W_out = (W_in - k + 2×p) / s + 1  (8)
H_out = (H_in - 1)×2 - 2×p + k  (9)
W_out = (W_in - 1)×2 - 2×p + k  (10)
where H_out is the height of the output image, W_out is the width of the output image, H_in is the height of the input image, W_in is the width of the input image, p is the padding size, k is the convolution kernel size, and s is the stride;
Step 2.3, for the reconstructed gradient map output by the gradient branch and the gradient reference map, the L1 loss is used, as shown in formulas (11) and (12),
L = {l_1, l_2, ..., l_n}, l_n = |x_n - y_n|  (11)
l_(x,y) = mean(L)  (12)
where L collects the loss values of all pixel points, x and y are the pixel coordinates, l_1, l_2, ..., l_n are the per-pixel loss values, n is the number of pixels, and l_(x,y) is the loss value computed as their mean; the gradient-branch parameters are updated with this loss to obtain the updated gradient branch.
Step 3, enhancing the low-light image obtained in the step 1 by adopting a UNet low-light image enhancement branch to obtain an enhanced image;
the step 3 is as follows:
Step 3.1, the packed data from step 2.1 are fed into the image enhancement branch, with UNet selected as its backbone. The downsampling operation is replaced by a convolution with stride 2 followed by batchnorm and ReLU, repeated seven times; upsampling is then completed by deconvolution with stride 2 followed by batchnorm and ReLU, also repeated seven times, giving a feature map; a final convolution and ReLU on this feature map produces the output image with 3 channels;
Step 3.2, compute the L1 loss between the output image of the image enhancement branch and the normal-light image; for judging which of the output image and the normal-light image is real, the MSE loss is adopted, as shown in formulas (13) and (14),
L = {l_1, l_2, ..., l_n}, l_n = (x_n - y_n)²  (13)
l_(x,y) = mean(L)  (14)
where L collects the loss values of all patches, l_1, l_2, ..., l_n are the values of the corresponding patches, l_(x,y) is the loss value, and x and y are the patch coordinates; the loss is computed as the mean and used to update the image enhancement branch parameters, giving the updated image enhancement branch.
Step 4, to help the gradient map better guide the enhancement of the low-light image, a feature fusion module is designed, namely a channel and spatial attention module CBAM, which fuses the information contained in the gradient image of step 2 into the low-light enhancement branch of step 3;
the step 4 is specifically as follows:
Step 4.1, construct the feature fusion block using the channel and spatial attention module CBAM, which operates as follows: first apply channel attention to the output feature maps of step 2 and step 3, performing max pooling and average pooling over the spatial dimensions, feeding the results into a shared multi-layer perceptron MLP, adding them, and passing them through the tanh function shown in formula (15),
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))  (15)
where tanh(x) is the output value after activation, x is the input value, and exp is the natural exponential function.
Spatial attention is then applied: max pooling and average pooling are performed over each channel, the results are concatenated, and the output passes through a sigmoid function; the resulting feature map is sent to the image enhancement branch of step 3;
Step 4.2, first concatenate the feature maps from steps 2 and 3 and convolve them to obtain a new feature map; the new feature map is followed by a CBAM module, then batchnorm and ReLU operations, and a dropout operation that randomly masks half of the neurons of the feature fusion block of step 4.1; this is repeated twice. Nine fusion modules in total are used across the gradient branch and the low-light enhancement branch, giving nine feature fusion modules.
Step 5, adding a condition discriminator, wherein the condition is an event pseudo image and a gradient image, and generating a more real enhanced image;
the step 5 is as follows:
To increase the realism of the generated picture, a relativistic generative adversarial network (RGAN) is adopted, and a conditional discriminator allows the output of the discriminator to be better controlled; the added conditions are the event pseudo image from step 1, the reconstructed gradient image, and the low-light image. When computing the probability that a picture is real, a single value is not used; instead a PatchGAN discriminator judges the probability that each image patch is real or fake, using the sigmoid activation function shown in formula (16),
sigmoid(x) = 1 / (1 + exp(-x))  (16)
The discriminator first concatenates the feature maps and extracts features through convolution, then applies InstanceNorm and ReLU; this is repeated four times, and the probability that the output image of step 3 is a normal-illumination image is judged;
step 6, training 300 epochs on the neural network built by the gradient branch in the step 2 and the weak light image enhancement branch in the step 3, verifying the training result and storing the model of the neural network;
the step 6 is specifically as follows:
Step 6.1, for the neural network formed by steps 2 and 3, the ADAM optimizer is selected with an initial learning rate of 0.0002; the scheduler uses a multi-step decay strategy with decay steps at 25 and 100, halving the learning rate each time, and training runs for 300 epochs in total. During training the PSNR is observed, as shown in formulas (17) and (18),
MSE = (1 / (m·n)) · Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} [I(i, j) - K(i, j)]²  (17)
PSNR = 10 · log_10(MAX_I² / MSE)  (18)
where MSE is the mean square error, m and n are the height and width of the image, i and j are the pixel coordinates, I(i, j) is the image output by the network in step 3, K(i, j) is the normal-illumination image obtained in step 1, PSNR is the peak signal-to-noise ratio, and MAX_I is the maximum pixel value of the image.
The SSIM is shown in formula (23). Formula (22) contains three parts: the luminance comparison l(x, y) in formula (19), the structure comparison s(x, y) in formula (20), and the contrast comparison c(x, y) in formula (21). μ_x and μ_y denote the means of x and y, σ_x and σ_y their standard deviations, and σ_xy their covariance; c_1, c_2, c_3 are constants that keep the denominators from being 0. Setting α = β = γ = 1 and c_3 = c_2 / 2 reduces formula (22) to formula (23):
l(x, y) = (2·μ_x·μ_y + c_1) / (μ_x² + μ_y² + c_1)  (19)
s(x, y) = (σ_xy + c_3) / (σ_x·σ_y + c_3)  (20)
c(x, y) = (2·σ_x·σ_y + c_2) / (σ_x² + σ_y² + c_2)  (21)
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ  (22)
SSIM(x, y) = ((2·μ_x·μ_y + c_1)(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2))  (23)
dynamically adjusting the hyper-parameters by observing two indexes of PSNR and SSIM: learning rate lr, balance parameter λ, and training round number epoch;
Step 6.2, to better visualize the training result, output the training-process loss, PSNR and SSIM reference indexes to TensorBoard through the SummaryWriter of the Python third-party library tensorboard and test the results on the validation set; then save the model. To make it convenient to continue training and tune the hyper-parameters, the neural network parameters trained in step 6.1, the current epoch number, the ADAM optimizer state and the scheduler state are stored, giving the trained network model.
And 7, testing the neural network model stored in the step 6, and outputting the enhanced image.
Step 7 is specifically as follows:
Load the network model trained and saved in step 6, input the test set into it, and save the test results to obtain the enhanced images.
As shown in fig. 1, an embodiment of the present invention includes:
In the event-camera-based low-light image enhancement, the event stream and the normal-light image are first extracted; the event stream is synthesized into an event pseudo image, and the normal-light image is used to generate a gradient image and a noisy low-light image. The event pseudo image is input into the gradient branch and reconstructed into a gradient image, while the low-light image is input into the low-light enhancement branch to obtain a normal-light image. To reduce the amount of parameter computation and to let the gradient image guide the enhancement of the low-light image, a module based on a channel and spatial attention mechanism is adopted; to increase the realism of the generated image, a discriminator network with added conditions is adopted. Experiments prove that this low-light image enhancement method has an excellent enhancement effect.
The dim light enhancement method based on the event camera is implemented according to the following steps:
Step 1, first the event stream is combined into event pseudo images; a simulated low-light image is then obtained by adding noise and applying gamma correction to the normal-light image, and a Sobel edge-extraction operation on the normal-light image yields the gradient image. The data are divided into a training set, a validation set and a test set in the ratio 7:2:1.
Step 1.1, an event is represented as e = (x_i, y_i, t_i, p_i). Assume the event stream spans a duration Δt, during which the event camera also returns n gray frames; the pixel values of the synthesized pseudo image are obtained by adding the polarity values of the events falling at each pixel, and the pseudo images serve as the input of the gradient-map reconstruction branch. Stacking events by fixed time intervals easily makes the events either excessively superposed or too sparse, so a fixed number N_e of events is adopted instead, which leads to better results; the amount of input to the gradient branch can be controlled by changing the value of N_e;
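A sketch of grouping by a fixed event count N_e instead of a fixed time interval (array names are hypothetical):

```python
import numpy as np

def group_events_fixed_count(xs, ys, ps, n_e, height, width):
    """Split the event stream into consecutive groups of N_e events and
    build one pseudo image per group by summing polarities."""
    images = []
    for start in range(0, len(xs) - n_e + 1, n_e):
        img = np.zeros((height, width), dtype=np.float32)
        sl = slice(start, start + n_e)
        np.add.at(img, (ys[sl], xs[sl]), ps[sl])
        images.append(img)
    return images
```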
Step 1.2, to obtain a more realistic low-light image, noise is added to the reference image (GT); to increase the authenticity of the data set, Gaussian blind noise is used, i.e. the Gaussian noise is generated with a standard deviation drawn from a range rather than a fixed standard deviation. The low-light scene is then simulated with gamma correction, formula (1), where V_out is the corrected image, V_in is the image before correction and gamma controls the scaling of the pixel values. The data are then linearly normalized, formula (2), where X is the pixel value at coordinates (x, y), X_min is the minimum pixel value of the image and X_max is the maximum pixel value, yielding the input of the low-light image enhancement branch;
V_out = (V_in)^gamma  (1)
X_norm = (X - X_min) / (X_max - X_min)  (2)
Step 1.3, to obtain a reference image for the reconstructed gradient map, edge extraction is performed on the normal-light image with the Sobel operator applied to an image A, formulas (3), (4), (5), where G_x is the first-order difference in the horizontal direction, G_y is the first-order difference in the vertical direction, * denotes the convolution operation and G is the gradient map, which serves as the reference image. The data set is illustrated in FIG. 2: the first and fifth rows show the low-light noisy images, the second and sixth rows the event-synthesized pseudo images, the third and seventh rows the gradient images extracted from the normal-light images, and the fourth and eighth rows the normal-light images.
G_x = [[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]] * A  (3)
G_y = [[-1, -2, -1], [0, 0, 0], [+1, +2, +1]] * A  (4)
G = sqrt(G_x² + G_y²)  (5)
The step 2 specifically comprises the following operations:
Step 2.1, first define a Dataset class and apply transform operations to the pictures in the class: convert the pictures to tensor format and normalize them so that the network can learn the data better, as shown in formula (6), where O_c is the output of the c-th channel, i_c is the input of the c-th channel, m_c is the mean of the c-th channel and S_c is the variance of the c-th channel; the data are then packed through a DataLoader;
O_c = (i_c - m_c) / S_c  (6)
Step 2.2, for the packed data, feature extraction is first performed to obtain feature maps. The UNet network has a good feature-extraction structure, so UNet is selected as the backbone of the gradient branch, as shown in FIG. 3; the max-pooling operation is replaced by a convolution with stride 2, followed by a batchnorm layer to adjust the data range and a ReLU activation, and downsampling is performed seven times. Upsampling is then performed by deconvolution with stride 2, again with batchnorm and ReLU, in the same way. Finally the upsampled feature map is used to reconstruct the gradient map; a padding operation is performed to preserve the feature-map size, and to reduce the number of parameters the feature map of the low-light image enhancement branch is introduced into the gradient branch. The convolution formulas (7), (8) and deconvolution formulas (9), (10) are given below, where H_in is the height of the input image, W_in its width, H_out the height of the output image, W_out its width, p the padding size, k the convolution kernel size and s the stride;
H_out = (H_in - k + 2×p) / s + 1  (7)
W_out = (W_in - k + 2×p) / s + 1  (8)
H_out = (H_in - 1)×2 - 2×p + k  (9)
W_out = (W_in - 1)×2 - 2×p + k  (10)
Step 2.3, for the reconstructed gradient map output by the gradient branch and the gradient reference map, the L1 loss is used, formulas (11) and (12), where x and y are the pixel coordinates and n is the number of pixel points; the loss is computed as the mean.
L = {l_1, l_2, ..., l_n}, l_n = |x_n - y_n|  (11)
l_(x,y) = mean(L)  (12)
The step 3 specifically comprises the following steps:
Step 3.1, the packed data from step 2.1 are fed into the image enhancement branch; likewise, the downsampling operation is replaced by a convolution with stride 2 followed by batchnorm and ReLU, repeated seven times, and the upsampling operation is completed by deconvolution with stride 2 followed by batchnorm and ReLU, also repeated seven times. Finally a convolution and ReLU are applied once to the feature map, giving an output image with 3 channels; the network structure is shown in FIG. 4;
Step 3.2, the L1 loss is computed between the output image of the image enhancement branch and the normal-light image, and for judging which of the two is real the MSE loss is adopted, formulas (13) and (14), where x_n and y_n are the values of the n-th patch, and the mean is used.
L = {l_1, l_2, ..., l_n}, l_n = (x_n - y_n)²  (13)
l_(x,y) = mean(L)  (14)
The step 4 specifically comprises the following steps:
Step 4.1, the feature fusion block adopts the channel and spatial attention module (CBAM). The CBAM first applies channel attention to the input feature map F: max pooling and average pooling are performed over the spatial dimensions, the results are fed into a shared multi-layer perceptron MLP, added, and passed through the tanh function of formula (15). Spatial attention is then applied: max pooling and average pooling are performed over each channel, the results are concatenated and passed through a sigmoid function;
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))  (15)
Step 4.2, the feature maps to be fused are first concatenated and convolved to obtain a new feature map; the fused feature map is followed by a CBAM module, then batchnorm and ReLU operations, and a dropout operation that randomly masks half of the neurons; this is repeated twice. Nine fusion modules in total are used across the gradient branch and the low-light enhancement branch; the fusion-module network structure is shown in FIG. 5(a) and the channel and spatial attention module in FIG. 5(b);
the step 5 specifically comprises the following steps:
To increase the realism of the generated picture, a relativistic generative adversarial network (RGAN) is adopted; using a conditional discriminator, the added conditions allow the output of the generator to be better controlled. The added conditions are the event pseudo image, the reconstructed gradient image and the low-light image. When computing the probability that a picture is real, a single value is not used; instead PatchGAN judges the probability that each image patch is real or fake, using the sigmoid activation function of formula (16). The discriminator network module first concatenates the feature maps and extracts features through convolution, then applies InstanceNorm and ReLU; this is repeated four times.
sigmoid(x) = 1 / (1 + exp(-x))  (16)
The step 6 specifically comprises the following steps:
Step 6.1, the ADAM optimizer is selected with an initial learning rate of 0.0002; the scheduler uses a multi-step decay strategy with decay steps at 25 and 100, halving the learning rate each time, and 300 epochs are trained in total. During training, the PSNR is observed, formulas (17) and (18), where MAX_I is the maximum pixel value of the image, together with the SSIM, formula (23), where l(x, y) is the luminance comparison, c(x, y) the contrast comparison and s(x, y) the structure comparison, μ_x and μ_y the means of x and y, σ_x and σ_y their standard deviations, σ_xy their covariance, and c_1, c_2, c_3 constants that keep the denominators from being 0. Setting α = β = γ = 1 and c_3 = c_2 / 2 reduces formula (22) to formula (23). By observing these two indexes, some hyper-parameters are adjusted dynamically, such as the learning rate lr, the balance parameter λ and the number of training epochs.
MSE = (1 / (m·n)) · Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} [I(i, j) - K(i, j)]²  (17)
PSNR = 10 · log_10(MAX_I² / MSE)  (18)
l(x, y) = (2·μ_x·μ_y + c_1) / (μ_x² + μ_y² + c_1)  (19)
s(x, y) = (σ_xy + c_3) / (σ_x·σ_y + c_3)  (20)
c(x, y) = (2·σ_x·σ_y + c_2) / (σ_x² + σ_y² + c_2)  (21)
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ  (22)
SSIM(x, y) = ((2·μ_x·μ_y + c_1)(2·σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2))  (23)
Step 6.2, to better visualize the training result, the reference indexes such as loss, PSNR and SSIM of the training process are output to TensorBoard through SummaryWriter, and the results are tested on the validation set. The training results are shown in FIG. 6: FIG. 6(a) shows the PSNR per epoch in the training phase, FIG. 6(b) the SSIM per epoch, and FIG. 6(c) the decrease of the loss. The model is then saved; to make it convenient to continue training and tune the hyper-parameters, the network parameters, the current epoch number, the optimizer and the scheduler are stored, which also prepares for the following test.
The step 7 specifically comprises the following steps:
The trained model is loaded, the test set is input into the trained network model, and the test results are saved; the PSNR and SSIM indexes of the test results are computed, giving objective values of about 25.5 for PSNR and 0.82 for SSIM. The subjective results are shown in FIG. 7: the first row is the low-light image, the second row the event pseudo image, the third row the reconstructed gradient image, the fourth row the gradient image corresponding to the normal-light image, the fifth row the result after enhancement of the low-light image, and the sixth row the normal-light image.
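The reported indexes can be computed per test image, for example with scikit-image (assuming images scaled to [0, 1] and a scikit-image version that supports channel_axis):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt):
    """PSNR and SSIM between an enhanced image and its normal-light reference."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```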
The method enhances the low-light image by guiding the enhancement with the gradient image reconstructed from the event stream. The enhanced results reach high values on the objective indexes and a good enhancement effect in the subjective experiments; most of the noise of the low-light image can be removed and a normal-light image with good detail can be reconstructed.

Claims (8)

1. The low-light image enhancement method based on the event camera is characterized by being implemented according to the following steps:
step 1, data set composition, selecting a data set with a normal illumination image and an event stream, requiring the normal illumination image and the event stream to be matched in space, then generating a gradient image and a noise-containing weak light image from the normal illumination image, and preprocessing the event stream to obtain an event pseudo image with good edge information;
step 2, reconstructing the event pseudo image obtained in the step 1 into a gradient image by adopting UNet gradient branches;
step 3, enhancing the low-light image obtained in the step 1 by adopting a UNet low-light image enhancement branch to obtain an enhanced image;
step 4, designing a feature fusion module, namely adopting a module CBAM based on a channel and space attention, and fusing information contained in the gradient image in the step 2 to the weak light enhancement branch in the step 3;
step 5, adding a condition discriminator, wherein the condition is an event pseudo image and a gradient image, and generating a more real enhanced image;
step 6, training 300 epochs on the neural network built by the gradient branch in the step 2 and the weak light image enhancement branch in the step 3, verifying the training result and storing the model of the neural network;
and 7, testing the neural network model stored in the step 6, and outputting the enhanced image.
2. The event camera-based low-light image enhancement method according to claim 1, wherein the step 1 is specifically as follows:
step 1.1, selecting a data set containing normal-illumination images and an event stream, wherein the normal-illumination images and the event stream are required to be paired in space; an event is represented as e_j = (x_j, y_j, t_j, p_j), wherein x_j and y_j represent the coordinates of the pixel, t_j represents the timestamp, p_j represents the polarity of the event, and j indexes the events; assuming the event stream spans a duration Δt during which the event camera also returns n gray frames, the event stream is divided into n corresponding intervals, i = 1, 2, …, n, and the pixel values of the synthesized pseudo image are obtained by adding the polarity values of the events that fall at each pixel within an interval, yielding the event pseudo image;
step 1.2, adding noise to the reference image GT using Gaussian blind noise, specifically generating the Gaussian noise with a standard deviation drawn from a range rather than a fixed standard deviation, and then performing low-light scene simulation with gamma correction, as shown in formula (1),
V_out = (V_in)^gamma  (1)
wherein V_out represents the corrected image, V_in represents the image before correction, and gamma represents the scaling strength of the pixel values;
and then linearly normalizing the data, as shown in formula (2),
X_norm = (X - X_min) / (X_max - X_min)  (2)
wherein X_norm represents the normalized image, X represents the pixel value at coordinates (x, y), X_min represents the minimum pixel value of the image, and X_max represents the maximum pixel value of the image, obtaining the simulated noisy low-light image;
step 1.3, performing edge extraction on the normal-illumination image obtained in step 1.1 with the Sobel operator; the calculation for an image A is shown in formulas (3), (4) and (5):
G_x = [[-1, 0, +1], [-2, 0, +2], [-1, 0, +1]] * A  (3)
G_y = [[-1, -2, -1], [0, 0, 0], [+1, +2, +1]] * A  (4)
G = sqrt(G_x² + G_y²)  (5)
wherein G_x represents the first-order difference in the horizontal direction, G_y represents the first-order difference in the vertical direction, * denotes the convolution operation, and G is the resulting gradient image.
3. The event camera-based low-light image enhancement method according to claim 2, wherein the step 2 is specifically as follows:
step 2.1, first using a Dataset class of the deep-learning framework PyTorch and applying transform operations to the pictures in the class: converting the pictures to tensor format and then normalizing them, as shown in formula (6):
O_c = (i_c - m_c) / S_c  (6)
wherein O_c denotes the output of the c-th channel, i_c the input of the c-th channel, m_c the mean of the c-th channel, and S_c the variance of the c-th channel; the data are then packed through a DataLoader;
step 2.2, for the packed data, first performing feature extraction to obtain feature maps, selecting UNet as the backbone of the gradient branch, replacing the max-pooling operation with a convolution of stride 2, then adjusting the data range with a batchnorm layer and activating with a ReLU function, downsampling seven times; then upsampling with a deconvolution of stride 2, again adjusting the data range with batchnorm and using the ReLU function, upsampling seven times in the same way; finally reconstructing the gradient map from the upsampled feature map to obtain the reconstructed gradient image, performing a padding operation, and introducing the information of the low-light image obtained in step 1.2 into the gradient branch, as shown in convolution formulas (7), (8) and deconvolution formulas (9), (10):
H_out = (H_in - k + 2×p) / s + 1  (7)
W_out = (W_in - k + 2×p) / s + 1  (8)
H_out = (H_in - 1)×2 - 2×p + k  (9)
W_out = (W_in - 1)×2 - 2×p + k  (10)
wherein H out Indicating the height, W, of the output image out Indicating the width of the input image, letter H in Denotes the height of the input picture, p denotes the padding size, k denotes the convolution kernel size, s denotes the step size, W in Representing the width of the input image;
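A minimal PyTorch sketch of one downsampling and one upsampling stage of the gradient branch of step 2.2; the channel counts and the choice k = 4, s = 2, p = 1 are illustrative and consistent with formulas (7)–(10):

```python
import torch
from torch import nn

def down_block(c_in, c_out):
    # Stride-2 convolution replaces max pooling, then batch norm and ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),   # formulas (7), (8)
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

def up_block(c_in, c_out):
    # Stride-2 deconvolution, then batch norm and ReLU.
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),  # formulas (9), (10)
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# With k=4, s=2, p=1 a 256x256 input halves on the way down and doubles again on the way up,
# matching the size formulas above.
x = torch.randn(1, 3, 256, 256)
print(down_block(3, 64)(x).shape)                     # torch.Size([1, 64, 128, 128])
print(up_block(64, 3)(down_block(3, 64)(x)).shape)    # torch.Size([1, 3, 256, 256])
```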
step 2.3, for the reconstructed gradient map output by the gradient branch and the gradient reference map, using the L1 loss, as shown in formulas (11) and (12),
L = {l_1, l_2, …, l_n},  l_n = |x_n − y_n|  (11)
l(x, y) = mean(L)  (12)
where L collects the loss values of all pixel points, x_n and y_n denote the values of the reconstructed gradient map and the gradient reference map at the n-th pixel, l_1, l_2, …, l_n denote the per-pixel loss values, n denotes the number of pixels, and l(x, y) denotes the loss value computed as the mean; the gradient-branch parameters are updated with this loss value to obtain the updated gradient branch.
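A minimal PyTorch usage sketch of the mean-reduced L1 loss of formulas (11) and (12), with dummy tensors standing in for the reconstructed gradient map and the gradient reference map:

```python
import torch
from torch import nn

l1 = nn.L1Loss(reduction="mean")                       # formulas (11) and (12)
pred_grad = torch.rand(1, 1, 64, 64, requires_grad=True)  # reconstructed gradient map (dummy)
ref_grad = torch.rand(1, 1, 64, 64)                       # gradient reference map (dummy)
loss = l1(pred_grad, ref_grad)
loss.backward()  # gradients of this loss drive the gradient-branch parameter update
```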
4. The event camera-based low-light image enhancement method according to claim 3, wherein the step 3 is specifically as follows:
step 3.1, inputting the packaged data of step 2.1 into the image enhancement branch, selecting UNet as the backbone network of the image enhancement branch, replacing the downsampling operation with a convolution of stride 2 followed by batch-norm and ReLU operations, repeated seven times; the upsampling is completed by a deconvolution of stride 2 followed by batch-norm and ReLU operations, likewise repeated seven times, to obtain a feature map; one further convolution and ReLU operation on this feature map yields an output image with 3 channels;
step 3.2, calculating the L1 loss between the output image of the image enhancement branch and the normal-light image; for judging which of the output image and the normal-light image is real, the MSE loss is adopted, as shown in formulas (13) and (14),
L = {l_1, l_2, …, l_n},  l_n = (x_n − y_n)^2  (13)
l(x, y) = mean(L)  (14)
where L collects the loss values of all patches, l_1, l_2, …, l_n denote the loss values of the corresponding patches, l(x, y) denotes the loss value, and x and y denote the horizontal and vertical coordinates of a patch; the loss value, computed as the mean, is used to update the image-enhancement-branch parameters to obtain the updated image enhancement branch.
5. The event camera-based low-light image enhancement method according to claim 4, wherein the step 4 is specifically as follows:
step 4.1, constructing a feature fusion block by adopting the channel-and-spatial-attention module CBAM, which operates as follows: first, a channel attention operation is performed on the output feature maps of step 2 and step 3, spatial max pooling and average pooling are applied respectively, the pooled results are fed into a shared multi-layer perceptron MLP, and the two outputs are added and passed through a tanh function, as shown in formula (15),
tanh(x) = (exp(x) − exp(−x)) / (exp(x) + exp(−x))  (15)
where tanh(x) denotes the activated output value, x denotes the input value, and exp denotes the natural exponential function;
then a spatial attention operation is carried out: max pooling and average pooling are performed along the channel dimension, the pooled results are concatenated and, in the same way, passed through a sigmoid function, and the output feature map is sent to the image enhancement branch of step 3;
and step 4.2, first concatenating the feature maps of steps 2 and 3, then applying a convolution to obtain a new feature map, following the new feature map with a CBAM module, then performing a batch-norm operation and a ReLU operation, and adding a dropout operation that randomly masks half of the neurons of the feature fusion block of step 4.1; these operations are repeated twice, and nine such fusion modules are used jointly in the gradient branch and the weak-light enhancement branch, giving nine feature fusion modules.
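A minimal PyTorch sketch of such a feature fusion block; the reduction ratio, kernel sizes and channel counts follow the common CBAM formulation and are illustrative assumptions, not the patent's values:

```python
import torch
from torch import nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention of step 4.1: spatial max/avg pooling -> shared MLP -> add -> tanh."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(  # shared multi-layer perceptron
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return x * torch.tanh(avg + mx)  # tanh as in formula (15)

class SpatialAttention(nn.Module):
    """Spatial attention of step 4.1: channel-wise max/avg pooling -> concat -> conv -> sigmoid."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class FusionBlock(nn.Module):
    """Fusion block of step 4.2 (sketch): concat -> conv -> CBAM -> batch norm -> ReLU -> dropout."""

    def __init__(self, c_grad, c_img, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_grad + c_img, c_out, 3, padding=1)
        self.ca = ChannelAttention(c_out)
        self.sa = SpatialAttention()
        self.post = nn.Sequential(nn.BatchNorm2d(c_out), nn.ReLU(inplace=True), nn.Dropout(p=0.5))

    def forward(self, f_grad, f_img):
        x = self.conv(torch.cat([f_grad, f_img], dim=1))  # splice the two branches' features
        return self.post(self.sa(self.ca(x)))
```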
6. The event camera-based low-light image enhancement method according to claim 5, wherein the step 5 is specifically as follows:
adopting a relativistic generative adversarial network RGAN, and adding conditions to control the output of the discriminator by means of a conditional discriminator, the added conditions being the event pseudo image, the reconstructed gradient image and the weak-light image of step 1; a patchGAN discriminator is adopted so that, when the probability of an image being real is computed, the real/fake probability is judged patch by patch, using the sigmoid activation function shown in formula (16),
δ(x) = 1 / (1 + exp(−x))  (16)
where δ(x) denotes the activated value, x denotes the input value, and exp denotes the natural exponential function; the specific operation of the discriminator is to first concatenate the feature maps and extract features through convolution operations, then perform instance-norm and ReLU operations, repeated four times, and finally judge the probability that the output image of step 3 is the normal-illumination image.
7. The event camera-based low-light image enhancement method according to claim 6, wherein the step 6 is as follows:
step 6.1, for the neural network formed by step 2 and step 3, selecting the ADAM optimizer, setting the initial learning rate to 0.0002, and setting the scheduler to a multi-step decay strategy with decay milestones at epochs 25 and 100 and a decay factor of one half, the total training lasting 300 epochs; during training, the PSNR is observed as shown in formulas (17) and (18),
MSE = (1 / (m×n)) × Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [I(i, j) − K(i, j)]^2  (17)
PSNR = 10 × log10(MAX_I^2 / MSE)  (18)
where MSE denotes the mean square error, m and n denote the length and width of the image respectively, i and j denote the coordinates of a pixel point, I(i, j) denotes the image output by the network in step 3, K(i, j) denotes the normal-illumination image obtained in step 1, PSNR denotes the peak signal-to-noise ratio, and MAX_I denotes the maximum pixel value of the image;
the SSIM is shown in formula (23); formula (22) contains three parts: the luminance comparison l(x, y) shown in formula (19), the structure comparison s(x, y) shown in formula (20), and the contrast comparison c(x, y) shown in formula (21), where μ_x and μ_y denote the means of x and y respectively, σ_x and σ_y denote the standard deviations of x and y respectively, σ_xy denotes the covariance of x and y, and c_1, c_2, c_3 denote constants that keep the denominators from being 0; setting α = β = γ = 1 and c_3 = c_2/2 reduces formula (22) to formula (23):
l(x, y) = (2×μ_x×μ_y + c_1) / (μ_x^2 + μ_y^2 + c_1)  (19)
s(x, y) = (σ_xy + c_3) / (σ_x×σ_y + c_3)  (20)
c(x, y) = (2×σ_x×σ_y + c_2) / (σ_x^2 + σ_y^2 + c_2)  (21)
SSIM(x, y) = [l(x, y)]^α × [c(x, y)]^β × [s(x, y)]^γ  (22)
SSIM(x, y) = (2×μ_x×μ_y + c_1)(2×σ_xy + c_2) / ((μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2))  (23)
by observing the two indexes PSNR and SSIM, the hyper-parameters, namely the learning rate lr, the balance parameter λ and the number of training epochs, are adjusted dynamically;
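A minimal NumPy sketch of the PSNR of formulas (17)–(18) and a single-window simplification of the SSIM of formula (23); the constants c_1 and c_2 follow common defaults and are assumptions here:

```python
import numpy as np

def psnr(img, ref, max_i=1.0):
    """PSNR of formulas (17) and (18) for float images in [0, max_i]."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_i ** 2 / mse)

def ssim_global(img, ref, max_i=1.0):
    """Single-window simplification of the SSIM of formula (23), computed over the whole
    image; the usual implementation averages SSIM over local windows."""
    c1, c2 = (0.01 * max_i) ** 2, (0.03 * max_i) ** 2   # common default constants (assumed)
    mu_x, mu_y = img.mean(), ref.mean()
    var_x, var_y = img.var(), ref.var()
    cov_xy = ((img - mu_x) * (ref - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```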
and step 6.2, outputting the reference indexes of the training process, namely the loss, PSNR and SSIM, to TensorBoard through the SummaryWriter of the Python third-party library tensorboard, checking the test results on the validation set, and then saving the model, i.e. storing the neural network parameters trained in step 6.1, the current training epoch, the ADAM optimizer and the scheduler, to obtain the trained network model.
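A minimal PyTorch sketch of the TensorBoard logging and model saving of step 6.2; the log directory, file names and checkpoint keys are illustrative assumptions:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/event_lowlight")  # assumed directory name

def log_metrics(epoch, loss, psnr_val, ssim_val):
    # Reference indexes of the training process written to TensorBoard.
    writer.add_scalar("train/loss", loss, epoch)
    writer.add_scalar("val/psnr", psnr_val, epoch)
    writer.add_scalar("val/ssim", ssim_val, epoch)

def save_checkpoint(path, model, optimizer, scheduler, epoch):
    # Store the trained parameters, current epoch, ADAM optimizer state and scheduler state.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "epoch": epoch,
    }, path)
```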
8. The event camera-based low-light image enhancement method according to claim 7, wherein the step 7 is specifically as follows:
loading the trained network model of step 6, inputting the test set into the trained network model, and then saving the test results to obtain the enhanced images.
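A minimal PyTorch sketch of the test-time procedure of step 7; the checkpoint path and keys, and the model and test_loader objects, are illustrative assumptions carried over from the earlier sketches:

```python
import torch
from torchvision.utils import save_image

# Load the checkpoint produced in step 6 (path and key names are assumed).
checkpoint = torch.load("checkpoints/event_lowlight.pth", map_location="cpu")
model.load_state_dict(checkpoint["model"])
model.eval()

with torch.no_grad():
    for idx, (low, events, grad) in enumerate(test_loader):
        enhanced = model(low, events, grad)                       # forward pass of the network
        save_image(enhanced.clamp(0, 1), f"results/enhanced_{idx:04d}.png")  # store the result
```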
CN202210723127.0A 2022-06-24 2022-06-24 Low-light image enhancement method based on event camera Pending CN115082341A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210723127.0A CN115082341A (en) 2022-06-24 2022-06-24 Low-light image enhancement method based on event camera

Publications (1)

Publication Number Publication Date
CN115082341A true CN115082341A (en) 2022-09-20

Family

ID=83254982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210723127.0A Pending CN115082341A (en) 2022-06-24 2022-06-24 Low-light image enhancement method based on event camera

Country Status (1)

Country Link
CN (1) CN115082341A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091337A (en) * 2022-11-29 2023-05-09 北京大学 Image enhancement method and device based on event signal nerve coding mode
CN116091337B (en) * 2022-11-29 2024-02-02 北京大学 Image enhancement method and device based on event signal nerve coding mode
JP7425276B1 (en) 2023-05-11 2024-01-31 浙江工商大学 Method, medium and device for augmenting reconstructed images of an event camera that fuses visible light images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination