CN111612722B - Low-illumination image processing method based on simplified Unet full-convolution neural network - Google Patents
- Publication number: CN111612722B (application CN202010455150.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T5/00 — Image data processing or generation, in general; Image enhancement or restoration
- G06N3/045 — Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods
- G06T2207/10016 — Indexing scheme for image analysis or image enhancement; Image acquisition modality; Video; Image sequence
- Y02T10/40 — Climate change mitigation technologies related to transportation; Internal combustion engine [ICE] based vehicles; Engine management systems
Abstract
The invention discloses a low-illumination image processing method based on a simplified Unet full convolution neural network, comprising the following steps: collecting dark images and corresponding clear images to generate an image data set; constructing a full convolution neural network (FCN) model for end-to-end image enhancement; training the FCN model with the generated image data set to obtain a trained FCN model; and inputting a low-light image in the original RAW format into the trained FCN model to obtain an enhanced clear image. The invention rebuilds the network by reducing the number of network layers and the number of convolution kernels, and trains the simplified Unet in a supervised manner on a large data set of real scenes. Compared with the performance of the network before simplification, the simplified Unet full convolution neural network cuts the running time by more than half and greatly reduces the training cost while keeping the difference in visual effect insignificant.
Description
Technical Field
The invention relates to the technical field of neural networks, in particular to a video image processing method in a low-illumination environment.
Background
With the rapid development of information technology, digital images play a large role in many fields such as public safety, medicine and entertainment, and people's requirements for image quality keep rising. However, owing to factors such as the capture device and the shooting environment, an original low-illumination image often cannot fully satisfy either the visual viewing needs of people or the requirements of engineering applications of image technology. Because many upper-layer image processing algorithms place certain demands on picture quality, enhancement of low-illumination images is fundamental work for image applications.
Disclosure of Invention
In view of the above, the present invention is directed to a low-illumination image processing method based on a simplified Unet full convolution neural network, so as to solve two technical problems: how to process a low-illumination, low-quality image into an image recognizable to the human eye, and how to increase the processing speed.
The invention discloses a low-illumination image processing method based on a simplified Unet full convolution neural network, which comprises the following steps of:
the method comprises the following steps: collecting a short-exposure dark image and a corresponding long-exposure bright clear image in a low-illumination environment, and generating an image data set by the collected dark image and the corresponding clear image;
step two: constructing a full convolution neural network (FCN) model for end-to-end image enhancement, wherein the full convolution neural network (FCN) model comprises an input layer, a hidden layer and an output layer, the input layer is used for inputting a graph, the convolution layer of each computing node in the hidden layer is used for performing convolution calculation and deconvolution calculation on input data, all layers of the FCN model are connected together through an activation function, and network parameters are continuously improved through a training algorithm;
step three: training the FCN model of the convolutional network by using the image data set generated in the first step to obtain a trained FCN model;
step four: and inputting the low-light image in the original RAW format into the trained FCN model to obtain an enhanced clear image.
Further, acquiring a short-exposure dark image and a corresponding bright, clear long-exposure image in step one comprises:
step 1: selecting a shooting scene, and fixing the camera so that its shooting posture remains unchanged;
step 2: setting the camera exposure time to 0.1 s, 0.04 s and 0.033 s in turn for short-exposure shots, and to 10 s for a long-exposure shot;
step 3: repeatedly selecting different shooting scenes and acquiring images according to steps 1 and 2, obtaining pairs of matched dark and bright, clear images.
Further, the input layer of the full convolution neural network FCN model in step two receives 4-channel image data;
the hidden layer of the full convolution neural network FCN model in the step two comprises:
convolutional layer 1: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 32, the convolution stride s = 1, padding = valid;
pooling layer 1: Max pooling is selected, with pooling size 2 × 2, stride s = 2, padding = same;
convolutional layer 2: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 64, stride s = 1, padding = valid;
pooling layer 2: Max pooling is selected, with pooling size 2 × 2, stride s = 2, padding = same;
convolutional layer 3: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 128, stride s = 1, padding = valid;
pooling layer 3: Max pooling is selected, with pooling size 2 × 2, stride s = 2, padding = same;
convolutional layer 4: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 256, stride s = 1, padding = valid;
pooling layer 4: Max pooling is selected, with pooling size 2 × 2, stride s = 2, padding = same;
convolutional layer 5-1: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 512, stride s = 1, padding = valid;
convolutional layer 5-2: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 512, stride s = 1, padding = valid;
deconvolution layer 6: the size of the convolution kernel is 2 × 2, doubling the rows and columns;
cascade structure: after a Crop operation, convolutional layer 4 is concatenated with convolutional layer 5-2; the high-resolution and low-resolution feature maps are fused and the spliced result is used as the input of the next convolutional layer;
convolutional layer 6: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 256, stride s = 1, padding = valid;
deconvolution layer 7: the size of the convolution kernel is 2 × 2, doubling the rows and columns;
cascade structure: after a Crop operation, convolutional layer 3 is concatenated with convolutional layer 6; the high-resolution and low-resolution feature maps are fused and the spliced result is used as the input of the next convolutional layer;
convolutional layer 7: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 128, stride s = 1, padding = valid;
deconvolution layer 8: the size of the convolution kernel is 2 × 2, doubling the rows and columns;
cascade structure: after a Crop operation, convolutional layer 2 is concatenated with convolutional layer 7; the high-resolution and low-resolution feature maps are fused and the spliced result is used as the input of the next convolutional layer;
convolutional layer 8: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 64, stride s = 1, padding = valid;
deconvolution layer 9: the size of the convolution kernel is 2 × 2, doubling the rows and columns;
cascade structure: after a Crop operation, convolutional layer 1 is concatenated with convolutional layer 8; the high-resolution and low-resolution feature maps are fused and the spliced result is used as the input of the next convolutional layer;
convolutional layer 9: the size of the convolution kernel is 3 × 3, the number of convolution kernels is 32, stride s = 1, padding = valid;
The output layer of the full convolution neural network FCN model in step two is convolutional layer 10: the convolution kernel size is 1 × 1 and the number of convolution kernels is 12, so 12 channels are output; each feature map is 1/2 the size of the RGB components, and the 1 × 1 convolution changes only the number of feature maps, not their spatial size.
Further, training the FCN model of the convolutional network in step three using the image data set generated in step one comprises:
1) Preprocessing the original image: first perform RGB pixel separation on the Raw-format image, unpacking each Raw image block into a four-channel feature map of RGBG components; the spatial resolution on each channel is halved, finally yielding four RGBG channels each 1/2 the size of the original image. The black level value is then subtracted from the feature maps and different amplification ratios are applied, so that their brightness matches that of the clear image corresponding to the long exposure;
2) Inputting the preprocessed image into the full convolution neural network FCN model; the feature map output by the model has 12 channels, and the size of each feature map is half that of the RGB components;
3) Performing a sub-pixel convolution operation on the image output by the full convolution neural network FCN model to restore the data to the normal sRGB image format.
Further, the training algorithm comprises:
1) The activation functions of the FCN model are all selected as Leaky Relu functions, and the expression of the Leaky Relu functions is as follows:
f(x) = max(0.2x, x), where x is the input value;
2) Pooling the images after the convolution operation, and selecting a maximum pooling method;
3) A loss function is constructed from the predicted image I_out and the desired image I_gt, penalizing the pixel-wise difference between them;
4) The loss function is processed using an ADAM optimizer.
5) A piecewise (segmented) learning-rate schedule is set to strengthen the learning effect: the initial learning rate is 1e-4 and training runs for 4000 epochs; at epoch 2000 the learning rate is divided by 10.
The invention has the beneficial effects that:
aiming at the problems that in the image enhancement by utilizing a Unet convolutional neural network in a low-illumination environment, the operation speed is low due to the complex network structure, the noise-modulated multi-enhancement effect is not obvious by using a traditional method and the like, the low-illumination image processing method based on the simplified Unet fully convolutional neural network provided by the invention reserves the network construction style of computer coding and decoding, rebuilds the network by simplifying the network layer number and reducing the convolutional kernel number, and trains the simplified Unet network in a supervised learning mode through a large batch of data sets based on real scenes. The final experiment result shows that compared with the network performance before simplification, the simplified Unet full convolution neural network reduces the running time by more than half and greatly reduces the training cost under the condition of keeping the visual effect difference unobvious.
Drawings
FIG. 1 is a schematic structural diagram of a deep learning image enhancement model;
FIG. 2 is a long exposure sharp image acquired;
FIG. 3 is a low-illumination image taken with a short exposure;
FIG. 4 is an image after processing a short-exposed low-illumination image using the method of the embodiment;
FIG. 5 illustrates an image after white balance adjustment using conventional methods;
FIG. 6 is a graph of the Leaky ReLU function;
fig. 7 is a diagram of an image enhancement process.
Detailed Description
The invention is further described below with reference to the figures and examples.
In this embodiment, the method for processing a low-illuminance image based on a simplified Unet full convolution neural network includes the steps of:
the method comprises the following steps: a short-exposure dark image and a corresponding long-exposure bright sharp image are acquired in a low-light environment, and an image dataset is generated from the acquired dark image and the corresponding sharp image.
Acquiring a short-exposure dark image and a corresponding long-exposure bright sharp image includes:
step 1: selecting a shooting scene, and fixing the camera so that its shooting posture remains unchanged;
step 2: setting the camera exposure time to 0.1 s, 0.04 s and 0.033 s in turn for short-exposure shots, and to 10 s for a long-exposure shot;
step 3: repeatedly selecting different shooting scenes and acquiring images according to steps 1 and 2, obtaining pairs of matched dark and bright, clear images.
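The exposure settings above determine the brightness gap the network must bridge. As a hedged illustration (the patent does not give this formula; scaling by the ratio of exposure times, as in related RAW-enhancement work, is assumed here), the amplification factor pairing each short exposure with the 10 s reference can be computed as:

```python
def amplification_ratio(short_exposure_s: float, long_exposure_s: float = 10.0) -> float:
    # Assumed helper, not stated in the patent: the brightness amplification
    # factor between a short-exposure frame and its long-exposure reference.
    return long_exposure_s / short_exposure_s

# The three short exposures used above, each paired with the 10 s long exposure:
ratios = [amplification_ratio(t) for t in (0.1, 0.04, 0.033)]
```

So the three short exposures correspond to amplification factors of roughly 100x, 250x and 300x.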
Step two: the method comprises the steps of constructing a full convolution neural network FCN model for end-to-end image enhancement, wherein the full convolution neural network FCN model comprises an input layer, a hidden layer and an output layer, the input layer is used for inputting a graph, the convolution layer of each computing node in the hidden layer is used for performing convolution calculation and deconvolution calculation on input data, all layers of the FCN model are connected together through an activation function, and network parameters are continuously improved through a training algorithm.
The input layer of the full convolution neural network FCN model accepts full-resolution input images; the size of the image it receives is not limited. The input layer of this embodiment receives 4-channel (R, G, B, G) image data.
The hidden layer of the full convolution neural network FCN model comprises:
convolutional layer 1: the size of the convolution kernel is 3 multiplied by 3, the number of the convolution kernels is 32, the convolution step length s =1, valid is selected by padding; thus, the convolutional layer can ensure that the segmentation result is complete, is obtained based on the context features without missing, and can cause the sizes of the input and the output to be inconsistent. So the feature map size will be reduced by 2 after this operation.
Pooling layer 1: Max pooling is selected, with pooling size 2 × 2, stride s = 2, padding = same. Here padding = same, which differs from the original Unet's choice of valid. The same strategy fills the edges with 0, ensuring that every value of the feature map is visited; valid instead skips, without filling, any region that cannot form a complete pooling window, which loses some information whenever the feature-map size before pooling is odd.
Convolutional layer 2: the convolution kernel size is 3 × 3, the number of convolution kernels is 64, stride s = 1, padding = valid.
Pooling layer 2: Max pooling is chosen, with pooling size 2 × 2, stride s = 2, padding = same.
Convolutional layer 3: the convolution kernel size is 3 × 3, the number of convolution kernels is 128, stride s = 1, padding = valid.
Pooling layer 3: Max pooling is chosen, with pooling size 2 × 2, stride s = 2, padding = same.
Convolutional layer 4: the convolution kernel size is 3 × 3, the number of convolution kernels is 256, stride s = 1, padding = valid.
Pooling layer 4: Max pooling is chosen, with pooling size 2 × 2, stride s = 2, padding = same.
Convolutional layer 5-1: the convolution kernel size is 3 × 3, the number of convolution kernels is 512, stride s = 1, padding = valid.
Convolutional layer 5-2: the convolution kernel size is 3 × 3, the number of convolution kernels is 512, stride s = 1, padding = valid.
Deconvolution layer 6: the convolution kernel size is 2 × 2, doubling the rows and columns.
Cascade structure: after a Crop operation, convolutional layer 4 is concatenated with convolutional layer 5-2; the high-resolution and low-resolution feature maps are fused and the spliced result is used as the input of the next convolutional layer.
Convolutional layer 6: the convolution kernel size is 3 × 3, the number of convolution kernels is 256, stride s = 1, padding = valid.
Deconvolution layer 7: the convolution kernel size is 2 × 2, doubling the rows and columns.
Cascade structure: after a Crop operation, convolutional layer 3 is concatenated with convolutional layer 6; the high-resolution and low-resolution feature maps are fused and the spliced result is used as the input of the next convolutional layer.
Convolutional layer 7: the convolution kernel size is 3 × 3, the number of convolution kernels is 128, stride s = 1, padding = valid.
Deconvolution layer 8: the convolution kernel size is 2 × 2, doubling the rows and columns.
Cascade structure: after a Crop operation, convolutional layer 2 is concatenated with convolutional layer 7; the high-resolution and low-resolution feature maps are fused and the spliced result is used as the input of the next convolutional layer.
Convolutional layer 8: the convolution kernel size is 3 × 3, the number of convolution kernels is 64, stride s = 1, padding = valid.
Deconvolution layer 9: the convolution kernel size is 2 × 2, doubling the rows and columns.
Cascade structure: after a Crop operation, convolutional layer 1 is concatenated with convolutional layer 8; the high-resolution and low-resolution feature maps are fused and the result is used as the input of the next convolutional layer.
Convolutional layer 9: the convolution kernel size is 3 × 3, the number of convolution kernels is 32, stride s = 1, padding = valid.
The output layer of the full convolution neural network FCN model in step two is convolutional layer 10: the convolution kernel size is 1 × 1 and the number of convolution kernels is 12, so 12 channels are output; each feature map is 1/2 the size of the RGB components, and the 1 × 1 convolution changes only the number of feature maps, not their spatial size.
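Since every 3 × 3 valid convolution shrinks each spatial dimension by 2, and every same-padded 2 × 2, stride-2 pooling halves it (rounding up), the encoder shapes can be traced with simple arithmetic. A sketch of this bookkeeping, using an illustrative input size of 512 (the model itself accepts arbitrary full-resolution inputs):

```python
def conv_valid(n: int, k: int = 3) -> int:
    # 'valid' convolution: no padding, so the spatial size shrinks by k - 1
    return n - (k - 1)

def pool_same(n: int, s: int = 2) -> int:
    # 'same' max pooling with stride s: output size is ceil(n / s)
    return -(-n // s)

# Trace one spatial dimension through the encoder of the simplified Unet:
# convolutional layers 1-4, each followed by a pooling layer, then the
# two 512-kernel bottleneck convolutions (layers 5-1 and 5-2).
n = 512  # illustrative input size, an assumption for this sketch
trace = []
for _ in range(4):
    n = conv_valid(n)
    trace.append(n)
    n = pool_same(n)
    trace.append(n)
n = conv_valid(n)
trace.append(n)
n = conv_valid(n)
trace.append(n)
# trace -> [510, 255, 253, 127, 125, 63, 61, 31, 29, 27]
```

The shrink-by-2 of valid convolutions is exactly why each cascade structure needs a Crop operation before concatenating encoder and decoder feature maps.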
The training algorithm comprises:
1) The activation functions of the FCN model are all selected as Leaky Relu functions, and the expression of the Leaky Relu functions is as follows:
f(x)=max(0.2x,x)
where x is the input value; a graph of this function is shown in FIG. 6. The Leaky ReLU activation function avoids the vanishing-gradient problem, since its slope is nonzero even for negative inputs.
2) The feature maps are pooled after the convolution operation, using max pooling. Extracting the dominant features of each local area greatly reduces the dimensionality of the data and correspondingly the total number of weight parameters, which lowers the computation cost and improves computational efficiency.
3) A loss function is constructed from the predicted image I_out and the desired image I_gt, penalizing the pixel-wise difference between them.
4) The loss function is optimized with the ADAM optimizer, which dynamically adjusts the learning rate of each parameter using first- and second-moment estimates of the gradient, computing a different adaptive learning rate for each parameter. Adam is chosen as the optimizer because, after bias correction, the learning rate of each iteration stays within a certain range, which keeps the parameters relatively stable, and its memory requirements are comparatively small.
5) A piecewise (segmented) learning-rate schedule is set to strengthen the learning effect: the initial learning rate is 1e-4 and training runs for 4000 epochs; at epoch 2000 the learning rate is divided by 10. Finally, a deep-learning image enhancement model with fixed, fully trained parameters is obtained.
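The training ingredients above can be sketched in a few lines: the Leaky ReLU activation f(x) = max(0.2x, x), the piecewise learning-rate schedule, and a mean-absolute (L1) pixel loss. The L1 form is an assumption in this sketch — the patent's exact loss formula is not reproduced in the text:

```python
def leaky_relu(x: float) -> float:
    # f(x) = max(0.2x, x): identity for x >= 0, slope 0.2 for x < 0,
    # so the gradient never vanishes entirely
    return max(0.2 * x, x)

def learning_rate(epoch: int, initial: float = 1e-4, drop_at: int = 2000) -> float:
    # Piecewise schedule from the text: divide the rate by 10 at epoch 2000
    # of the 4000 training epochs
    return initial if epoch < drop_at else initial / 10.0

def l1_loss(i_out, i_gt):
    # Assumed loss form (not confirmed by the patent text): mean absolute
    # difference between predicted and ground-truth pixel values
    return sum(abs(p - g) for p, g in zip(i_out, i_gt)) / len(i_out)
```

For example, leaky_relu(-5.0) returns -1.0 rather than 0, which is what keeps a gradient flowing through negative activations.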
Step three: training the FCN model of the convolutional network by using the image data set generated in step one to obtain a trained FCN model, which comprises the following steps:
1) Preprocessing the original image: first perform RGB pixel separation on the Raw-format image, unpacking each Raw image block into a four-channel feature map of RGBG components; the spatial resolution of each channel is halved, finally yielding four RGBG channels each 1/2 the size of the original image. The black level value is then subtracted from the feature maps and different amplification ratios are applied, so that their brightness matches that of the clear image corresponding to the long exposure;
2) Inputting the preprocessed image into the full convolution neural network FCN model; the feature map output by the model has 12 channels, and the size of each feature map is half that of the RGB components;
3) A sub-pixel convolution operation is performed on the image output by the full convolution neural network FCN model to restore the data to the normal sRGB image format; the processing procedure is shown in FIG. 7.
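The packing and restoration steps above can be sketched in NumPy. The RGGB Bayer layout below is an assumption (the patent only specifies a four-channel RGBG result), and depth_to_space is the sub-pixel rearrangement that turns a 12-channel half-resolution output into full-resolution 3-channel RGB:

```python
import numpy as np

def pack_raw(bayer: np.ndarray) -> np.ndarray:
    # Split a Bayer mosaic (H x W, RGGB layout assumed) into a 4-channel
    # RGBG feature map at half the spatial resolution
    h, w = bayer.shape
    return np.stack([bayer[0:h:2, 0:w:2],   # R
                     bayer[0:h:2, 1:w:2],   # G (same row as R)
                     bayer[1:h:2, 1:w:2],   # B
                     bayer[1:h:2, 0:w:2]],  # G (same row as B)
                    axis=0)

def depth_to_space(x: np.ndarray, r: int = 2) -> np.ndarray:
    # Sub-pixel (pixel-shuffle) step: (C*r*r, H, W) -> (C, H*r, W*r),
    # e.g. the 12-channel half-resolution output becomes 3-channel RGB
    c, h, w = x.shape
    out = x.reshape(c // (r * r), r, r, h, w)
    out = out.transpose(0, 3, 1, 4, 2)      # interleave the r x r sub-pixels
    return out.reshape(c // (r * r), h * r, w * r)
```

pack_raw is the space-to-depth half of the pipeline and depth_to_space its inverse-direction counterpart, so the network operates entirely at half resolution in between.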
Step four: and inputting the low-light image in the original RAW format into the trained FCN model to obtain an enhanced clear image.
The following table shows test results obtained by processing low-illumination images; the image-quality evaluation indices used in the test are Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM). The test results in the table show that, compared with the performance of the network before simplification, the simplified Unet full convolution neural network of the invention reduces the running time by more than half and greatly reduces the training cost while keeping the difference in visual effect insignificant.
Comparison of experimental results of different algorithms
|               | White balance | Algorithm of the invention | Unet         |
| PSNR/SSIM     | 17.668/0.207  | 28.956/0.695               | 29.012/0.703 |
| Training time | --            | 16 h                       | 24 h         |
| Run time      | --            | 0.04 s                     | 0.1 s        |
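For reference, PSNR — one of the two metrics in the table — can be sketched as follows (a peak value of 255 for 8-bit images is assumed):

```python
import math

def psnr(pred, gt, peak: float = 255.0) -> float:
    # Peak Signal-to-Noise Ratio in decibels: 10 * log10(peak^2 / MSE);
    # higher values mean the prediction is closer to the reference
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    return 10.0 * math.log10(peak ** 2 / mse)
```

On identical images the MSE is zero and PSNR is undefined (infinite), which is why the metric is only reported for imperfect reconstructions such as those in the table.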
Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such changes should be covered by the claims of the present invention.
Claims (4)
1. The low-illumination image processing method based on the simplified Unet full convolution neural network is characterized by comprising the following steps of:
the method comprises the following steps: collecting a short-exposure dark image and a corresponding long-exposure bright clear image in a low-illumination environment, and generating an image data set by the collected dark image and the corresponding clear image;
step two: constructing a full convolution neural network (FCN) model for end-to-end image enhancement, wherein the full convolution neural network (FCN) model comprises an input layer, a hidden layer and an output layer, the input layer is used for inputting a graph, the convolution layer of each computing node in the hidden layer is used for performing convolution calculation and deconvolution calculation on input data, all layers of the FCN model are connected together through an activation function, and network parameters are continuously improved through a training algorithm;
the input layer of the full convolution neural network FCN model in the step two is as follows: receiving 4-channel image data;
the hidden layer of the full convolution neural network FCN model in the step two comprises:
convolutional layer 1: convolution kernel size 3 × 3, 32 kernels, stride s = 1, padding set to valid;
pooling layer 1: max pooling with a 2 × 2 window, stride s = 2, padding set to same;
convolutional layer 2: convolution kernel size 3 × 3, 64 kernels, stride s = 1, padding set to valid;
pooling layer 2: max pooling with a 2 × 2 window, stride s = 2, padding set to same;
convolutional layer 3: convolution kernel size 3 × 3, 128 kernels, stride s = 1, padding set to valid;
pooling layer 3: max pooling with a 2 × 2 window, stride s = 2, padding set to same;
convolutional layer 4: convolution kernel size 3 × 3, 256 kernels, stride s = 1, padding set to valid;
pooling layer 4: max pooling with a 2 × 2 window, stride s = 2, padding set to same;
convolutional layer 5-1: convolution kernel size 3 × 3, 512 kernels, stride s = 1, padding set to valid;
convolutional layer 5-2: convolution kernel size 3 × 3, 512 kernels, stride s = 1, padding set to valid;
deconvolution layer 6: convolution kernel size 2 × 2, doubling the rows and columns of the feature map;
cascade structure: after a Crop operation on convolutional layer 4, it is cascaded with convolutional layer 5-2, fusing the high-resolution and low-resolution feature maps; the spliced result serves as the input of the next convolutional layer;
convolutional layer 6: convolution kernel size 3 × 3, 256 kernels, stride s = 1, padding set to valid;
deconvolution layer 7: convolution kernel size 2 × 2, doubling the rows and columns of the feature map;
cascade structure: after a Crop operation on convolutional layer 3, it is cascaded with convolutional layer 6, fusing the high-resolution and low-resolution feature maps; the spliced result serves as the input of the next convolutional layer;
convolutional layer 7: convolution kernel size 3 × 3, 128 kernels, stride s = 1, padding set to valid;
deconvolution layer 8: convolution kernel size 2 × 2, doubling the rows and columns of the feature map;
cascade structure: after a Crop operation on convolutional layer 2, it is cascaded with convolutional layer 7, fusing the high-resolution and low-resolution feature maps; the spliced result serves as the input of the next convolutional layer;
convolutional layer 8: convolution kernel size 3 × 3, 64 kernels, stride s = 1, padding set to valid;
deconvolution layer 9: convolution kernel size 2 × 2, doubling the rows and columns of the feature map;
cascade structure: after a Crop operation on convolutional layer 1, it is cascaded with convolutional layer 8, fusing the high-resolution and low-resolution feature maps; the spliced result serves as the input of the next convolutional layer;
convolutional layer 9: convolution kernel size 3 × 3, 32 kernels, stride s = 1, padding set to valid;
the output layer of the FCN model in step two is convolutional layer 10: convolution kernel size 1 × 1, 12 kernels, outputting 12 channels; each output feature map is an RGB component at 1/2 the original size, the 1 × 1 convolution kernels setting the number of output feature maps to 12;
step three: training the FCN model using the image data set generated in step one to obtain a trained FCN model;
step four: inputting a low-illumination image in the original RAW format into the trained FCN model to obtain an enhanced clear image.
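The encoder–decoder wiring recited in claim 1 can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the layer names, the crop-and-concatenate helper, and the use of `ceil_mode=True` to mimic the same-padded 2 × 2 pooling are choices made here; only the kernel sizes, kernel counts (32→64→128→256→512, back down to 32, then a 12-channel 1 × 1 output) and the valid-padded 3 × 3 convolutions come from the claim text.

```python
import torch
import torch.nn as nn

def center_crop(src, like):
    # Crop operation of the cascade structure: center-crop `src` to the
    # spatial size of `like` so the two feature maps can be concatenated.
    _, _, h, w = like.shape
    dh = (src.shape[2] - h) // 2
    dw = (src.shape[3] - w) // 2
    return src[:, :, dh:dh + h, dw:dw + w]

def conv3(cin, cout):
    # 3x3 kernel, stride 1, valid padding, Leaky ReLU activation (claim 4)
    return nn.Sequential(nn.Conv2d(cin, cout, 3), nn.LeakyReLU(0.2))

class SimplifiedUnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1, self.c2 = conv3(4, 32), conv3(32, 64)
        self.c3, self.c4 = conv3(64, 128), conv3(128, 256)
        self.c5 = nn.Sequential(conv3(256, 512), conv3(512, 512))  # layers 5-1, 5-2
        self.pool = nn.MaxPool2d(2, 2, ceil_mode=True)             # 2x2 max pooling, stride 2
        self.up6, self.c6 = nn.ConvTranspose2d(512, 256, 2, 2), conv3(512, 256)
        self.up7, self.c7 = nn.ConvTranspose2d(256, 128, 2, 2), conv3(256, 128)
        self.up8, self.c8 = nn.ConvTranspose2d(128, 64, 2, 2), conv3(128, 64)
        self.up9, self.c9 = nn.ConvTranspose2d(64, 32, 2, 2), conv3(64, 32)
        self.out = nn.Conv2d(32, 12, 1)                            # 1x1 kernels, 12 channels out

    def forward(self, x):
        f1 = self.c1(x)
        f2 = self.c2(self.pool(f1))
        f3 = self.c3(self.pool(f2))
        f4 = self.c4(self.pool(f3))
        f5 = self.c5(self.pool(f4))
        u = self.up6(f5)                                           # deconv doubles rows/cols
        d = self.c6(torch.cat([center_crop(f4, u), u], dim=1))     # crop + cascade
        u = self.up7(d)
        d = self.c7(torch.cat([center_crop(f3, u), u], dim=1))
        u = self.up8(d)
        d = self.c8(torch.cat([center_crop(f2, u), u], dim=1))
        u = self.up9(d)
        d = self.c9(torch.cat([center_crop(f1, u), u], dim=1))
        return self.out(d)

model = SimplifiedUnet()
y = model(torch.zeros(1, 4, 128, 128))  # 4-channel packed-RAW input
```

Because every 3 × 3 convolution is valid-padded, the output is spatially smaller than the input; the crop step absorbs the size mismatch at each skip connection.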
2. The low-illumination image processing method based on the simplified Unet fully convolutional neural network according to claim 1, characterized in that collecting the short-exposure dark images and the corresponding long-exposure bright clear images in step one comprises:
step 1: selecting a shooting scene and fixing the camera so that its shooting pose remains unchanged;
step 2: setting the camera exposure time to 0.1 s, 0.04 s and 0.033 s for separate short-exposure shots; setting the exposure time to 10 s for a long-exposure shot;
step 3: repeating steps 1 and 2 for different shooting scenes to acquire images, obtaining mutually matched dark images and bright clear images.
3. The low-illumination image processing method based on the simplified Unet fully convolutional neural network according to claim 1, characterized in that the training of the FCN model in step three using the image data set generated in step one comprises:
1) Preprocessing the original image: first performing RGBG pixel separation on the Raw-format image, expanding each Raw image block into a four-channel feature map of RGBG components with the spatial resolution of each channel halved, finally obtaining RGBG four-channel feature maps at 1/2 the size of the original image; then subtracting the black-level value from the feature maps and applying different gamma amplification ratios to obtain a brightness image matching the corresponding long-exposure clear image;
2) Inputting the preprocessed image into the FCN model; the feature map output by the model has 12 channels, each feature map being half the size of the RGB components;
3) Performing a sub-pixel convolution operation on the image output by the FCN model to restore the data to the normal sRGB image format.
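The pack-and-restore pipeline of claim 3 can be illustrated as follows. A hedged sketch: the RGGB Bayer layout, the black-level value 512, the 14-bit white level 16383 and the amplification ratio 100 are example assumptions introduced here; the claim only specifies packing into four half-resolution RGBG channels, black-level subtraction, amplification, and a final sub-pixel rearrangement of the 12-channel network output.

```python
import torch
import torch.nn.functional as F

def pack_raw(raw, black_level=512.0, white_level=16383.0, ratio=100.0):
    # Pack an (H, W) Bayer mosaic into a 4-channel RGBG tensor at half
    # resolution, then subtract the black level, normalize and amplify.
    # black_level / white_level / ratio are illustrative, not from the claim.
    r  = raw[0::2, 0::2]
    g1 = raw[0::2, 1::2]
    g2 = raw[1::2, 0::2]
    b  = raw[1::2, 1::2]
    packed = torch.stack([r, g1, g2, b], dim=0).float()
    packed = (packed - black_level) / (white_level - black_level)
    return (packed.clamp(min=0.0) * ratio).clamp(max=1.0)

packed = pack_raw(torch.full((256, 256), 1000.0))  # -> 4 channels of 128 x 128

# Sub-pixel convolution step: rearrange the FCN's 12-channel,
# half-resolution output into a 3-channel image at full resolution.
fcn_out = torch.rand(1, 12, 128, 128)              # stand-in for the network output
rgb = F.pixel_shuffle(fcn_out, 2)                  # 12 channels -> 3 channels, 2x size
```

`pixel_shuffle` is the standard depth-to-space operation: each group of 4 channels supplies the 2 × 2 sub-pixels of one output channel, which is why 12 half-resolution channels yield a full-resolution 3-channel sRGB image.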
4. The low-illumination image processing method based on the simplified Unet fully convolutional neural network according to claim 1, characterized in that the training algorithm comprises:
1) The activation functions of the FCN model are all Leaky ReLU functions, whose expression is:
f(x) = max(0.2x, x), where x is the input value;
2) Pooling the feature maps after each convolution operation, using the max pooling method;
3) Constructing the loss function from I_out and I_gt, where I_out denotes the predicted image and I_gt denotes the desired image;
4) Optimizing the loss function using the ADAM optimizer;
5) Setting a piecewise learning rate to enhance the learning effect: the initial learning rate is 1e-4 and training runs for 4000 iterations; when training reaches 2000 iterations, the learning rate is divided by 10.
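The training recipe of claim 4 maps directly onto standard PyTorch components. A minimal sketch under assumptions: the loss is taken to be L1 (mean absolute error) between I_out and I_gt, since the formula itself did not survive extraction, and the one-layer stand-in model and random tensors exist only to keep the loop self-contained.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(4, 12, 1), nn.LeakyReLU(0.2))  # stand-in for the FCN
loss_fn = nn.L1Loss()                                          # assumed |I_out - I_gt| loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)      # initial learning rate 1e-4
# Piecewise learning rate: divide by 10 when training reaches step 2000 of 4000.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2000], gamma=0.1)

for step in range(4000):
    x = torch.rand(1, 4, 16, 16)        # stand-in packed short-exposure input
    i_gt = torch.rand(1, 12, 16, 16)    # stand-in long-exposure reference
    optimizer.zero_grad()
    i_out = model(x)                    # predicted image I_out
    loss = loss_fn(i_out, i_gt)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

After the loop the scheduler has fired exactly once, so the effective learning rate ends at 1e-5, matching the divide-by-10 schedule in the claim.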
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010455150.7A CN111612722B (en) | 2020-05-26 | 2020-05-26 | Low-illumination image processing method based on simplified Unet full-convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111612722A CN111612722A (en) | 2020-09-01 |
CN111612722B true CN111612722B (en) | 2023-04-18 |
Family
ID=72196328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010455150.7A Active CN111612722B (en) | 2020-05-26 | 2020-05-26 | Low-illumination image processing method based on simplified Unet full-convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111612722B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112200226B (en) * | 2020-09-27 | 2021-11-05 | 北京达佳互联信息技术有限公司 | Image processing method based on reinforcement learning, image processing method and related device |
CN112581401B (en) * | 2020-12-25 | 2023-04-28 | 英特灵达信息技术(深圳)有限公司 | RAW picture acquisition method and device and electronic equipment |
CN113379861B (en) * | 2021-05-24 | 2023-05-09 | 南京理工大学 | Color low-light-level image reconstruction method based on color recovery block |
US11468543B1 (en) | 2021-08-27 | 2022-10-11 | Hong Kong Applied Science and Technology Research Institute Company Limited | Neural-network for raw low-light image enhancement |
CN113744167B (en) * | 2021-09-02 | 2024-04-30 | 厦门美图之家科技有限公司 | Image data conversion method and device |
CN114724022B (en) * | 2022-03-04 | 2024-05-10 | 大连海洋大学 | Method, system and medium for detecting farmed fish shoal by fusing SKNet and YOLOv5 |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0028844A2 (en) * | 1979-11-13 | 1981-05-20 | Phillips Petroleum Company | Polypropylene filament yarn and process for making same |
CN106600571A (en) * | 2016-11-07 | 2017-04-26 | 中国科学院自动化研究所 | Brain tumor automatic segmentation method through fusion of full convolutional neural network and conditional random field |
CN106981067A (en) * | 2017-04-05 | 2017-07-25 | 深圳市唯特视科技有限公司 | A kind of Texture Segmentation Methods based on full convolutional network |
CN107169974A (en) * | 2017-05-26 | 2017-09-15 | 中国科学技术大学 | It is a kind of based on the image partition method for supervising full convolutional neural networks more |
CN107273864A (en) * | 2017-06-22 | 2017-10-20 | 星际(重庆)智能装备技术研究院有限公司 | A kind of method for detecting human face based on deep learning |
CA2948499A1 (en) * | 2016-11-16 | 2018-05-16 | The Governing Council Of The University Of Toronto | System and method for classifying and segmenting microscopy images with deep multiple instance learning |
CN108492297A (en) * | 2017-12-25 | 2018-09-04 | 重庆理工大学 | The MRI brain tumors positioning for cascading convolutional network based on depth and dividing method in tumor |
CN109345476A (en) * | 2018-09-19 | 2019-02-15 | 南昌工程学院 | High spectrum image super resolution ratio reconstruction method and device based on depth residual error network |
CN109598727A (en) * | 2018-11-28 | 2019-04-09 | 北京工业大学 | A kind of CT image pulmonary parenchyma three-dimensional semantic segmentation method based on deep neural network |
CN109871798A (en) * | 2019-02-01 | 2019-06-11 | 浙江大学 | A kind of remote sensing image building extracting method based on convolutional neural networks |
CN110062173A (en) * | 2019-03-15 | 2019-07-26 | 北京旷视科技有限公司 | Image processor and image processing method, equipment, storage medium and intelligent terminal |
AU2019101133A4 (en) * | 2019-09-30 | 2019-10-31 | Bo, Yaxin MISS | Fast vehicle detection using augmented dataset based on RetinaNet |
Non-Patent Citations (2)
Title |
---|
李超波 et al., "Application of deep learning in image recognition", Journal of Nantong University (Natural Science Edition), 2018, vol. 17, no. 01, pp. 1-9. *
熊炜 et al., "Document image binarization algorithm fusing background estimation and U-Net", Application Research of Computers, 2019, vol. 37, no. 03, pp. 896-900. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111612722B (en) | Low-illumination image processing method based on simplified Unet full-convolution neural network | |
WO2021179820A1 (en) | Image processing method and apparatus, storage medium and electronic device | |
WO2021164234A1 (en) | Image processing method and image processing device | |
CN111292264A (en) | Image high dynamic range reconstruction method based on deep learning | |
CN113450290B (en) | Low-illumination image enhancement method and system based on image inpainting technology | |
CN111047543A (en) | Image enhancement method, device and storage medium | |
CN112465727A (en) | Low-illumination image enhancement method without normal illumination reference based on HSV color space and Retinex theory | |
CN111696033B (en) | Real image super-resolution model and method based on angular point guided cascade hourglass network structure learning | |
CN111105376B (en) | Single-exposure high-dynamic-range image generation method based on double-branch neural network | |
CN112348747A (en) | Image enhancement method, device and storage medium | |
CN111915513A (en) | Image denoising method based on improved adaptive neural network | |
CN109785252A (en) | Based on multiple dimensioned residual error dense network nighttime image enhancing method | |
WO2023151511A1 (en) | Model training method and apparatus, image moire removal method and apparatus, and electronic device | |
CN111696034B (en) | Image processing method and device and electronic equipment | |
CN115115516A (en) | Real-world video super-resolution algorithm based on Raw domain | |
Xu et al. | Deep video inverse tone mapping | |
CN113379861B (en) | Color low-light-level image reconstruction method based on color recovery block | |
CN114299180A (en) | Image reconstruction method, device, equipment and storage medium | |
CN117974459A (en) | Low-illumination image enhancement method integrating physical model and priori | |
CN117611467A (en) | Low-light image enhancement method capable of balancing details and brightness of different areas simultaneously | |
CN117422653A (en) | Low-light image enhancement method based on weight sharing and iterative data optimization | |
CN114897718B (en) | Low-light image enhancement method capable of balancing context information and space detail simultaneously | |
Omrani et al. | High dynamic range image reconstruction using multi-exposure Wavelet HDRCNN | |
CN111861877A (en) | Method and apparatus for video hyper-resolution | |
CN114596219B (en) | Image motion blur removing method based on condition generation countermeasure network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||