CN111161178A - Single low-light image enhancement method based on generative adversarial network - Google Patents
Single low-light image enhancement method based on generative adversarial network
- Publication number: CN111161178A
- Application number: CN201911361967.1A
- Authority: CN (China)
- Prior art keywords: low-light image, image, gray, loss function
- Legal status (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed): Pending
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T5/00—Image enhancement or restoration
        - G06T5/90—Dynamic range modification of images or parts thereof
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computing arrangements based on biological models
        - G06N3/02—Neural networks
          - G06N3/04—Architecture, e.g. interconnection topology
            - G06N3/045—Combinations of networks
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T2207/00—Indexing scheme for image analysis or image enhancement
        - G06T2207/20—Special algorithmic details
          - G06T2207/20081—Training; Learning
Abstract
The invention provides a single low-light image enhancement method based on a generative adversarial network, comprising the following steps: collecting a training data set: acquiring low-light images and normal-light images of the same scene by varying the exposure time and sensitivity of a camera, where the low-light and normal-light images of one scene form an image group; processing the data set: removing image pairs misaligned by uncontrollable factors such as camera shake and object movement; constructing a low-light image enhancement model and loss function based on a generative adversarial network; training the low-light image enhancement model; and inputting the low-light image to be enhanced into the trained model to obtain an enhanced normal-light image. Compared with the related art, this single low-light image enhancement method based on a generative adversarial network can produce vivid, sharp, high-quality images, and the image generation speed is high.
Description
Technical Field
The invention relates to the field of image enhancement, in particular to a single low-light image enhancement method based on a generative adversarial network.
Background
High-quality images play a crucial role in computer vision tasks such as object detection and scene understanding. However, images captured in the real world are often degraded: images taken under low-light conditions, for example, typically have very low contrast and brightness, which increases the difficulty of subsequent high-level tasks. During capture, insufficient lighting significantly reduces image visibility; the lost detail and low contrast not only cause an unpleasant subjective experience but also impair the performance of many computer vision systems designed for normal-light images.
During image acquisition, problems such as insufficient illumination, an overly dark environment, limited performance of photographic equipment, and improper equipment configuration have long existed, caused by the scanning system, the photoelectric conversion system, or the field environment. Over the past decades many researchers have worked on the low-light image enhancement problem, and many techniques have been developed to improve the subjective and objective quality of low-light images, such as histogram equalization and Retinex-theory-based methods. In recent years, with the development of deep neural networks, more and more image enhancement methods based on deep neural networks have been proposed. The generative adversarial network (GAN), also proposed in recent years, is well suited to image-to-image translation, style transfer, and image generation; many excellent classical GANs have been proposed so far, and the GAN is a natural fit for the low-light image enhancement problem.
However, existing GANs still suffer from non-negligible drawbacks when applied to low-light image enhancement: first, they depend excessively on domain knowledge and have high model complexity; second, the enhanced images are of low quality, with insufficiently sharp detail textures and insufficiently vivid colors.
Therefore, it is necessary to provide a new single low-light image enhancement method based on a generative adversarial network to solve the above problems.
Disclosure of Invention
In response to the above-identified deficiencies in the art and needs for improvement, the present invention provides a single low-light image enhancement method based on a generative adversarial network.
A single low-light image enhancement method based on a generative adversarial network comprises the following steps:
step S1, collecting a training data set: acquiring a low-light image and a normal-light image from the same scene by changing the exposure time and the sensitivity of a camera, wherein the low-light image and the normal-light image of the same scene form an image group;
step S2, processing of data set: removing the image groups which are not aligned due to uncontrollable factors such as camera shake, object movement and the like;
step S3, constructing a low-light image enhancement model and a loss function based on the generative adversarial network, wherein the low-light image enhancement model comprises a generator network G, a local discriminator network D_local, and a global discriminator network D_global;
step S4, training the low-light image enhancement model: randomly inputting the image pairs in the processed data set into the model and, by repeated iteration, alternately minimizing the losses of the generator network G and of the discriminator networks D_local and D_global, until the model reaches a Nash equilibrium state, at which point training is finished;
and step S5, inputting the low-light image to be enhanced into the trained low-light image enhancement model, and obtaining the enhanced normal light image.
Preferably, the step S1 specifically includes: in the same scene, two normal light images are firstly shot, which are recorded as N1 and N2, then a plurality of low light images are shot by reducing the exposure time and the sensitivity of the camera, and then the exposure time and the sensitivity of the camera are reset to shoot the two normal light images, which are recorded as N3 and N4, and N1, N2, N3, N4 and the plurality of low light images of the same scene form an image group.
Preferably, the step S2 includes the following steps:
step S23, determining whether the mean square error M of each image group exceeds a preset threshold, if so, removing the image group, and if not, retaining the image group.
Preferably, the step S4 includes the following steps:
step S41, randomly selecting an image group, randomly selecting a low-light image and a real normal-light image from it to form an image pair, and extracting the illumination intensity feature of the low-light image in the image pair;
step S42, inputting the illumination intensity characteristic and the low light image into a generator network G together to generate an enhanced normal light image;
step S43, using the generated normal-light image and the real normal-light image, together with the corresponding low-light image and illumination intensity feature, as the input of the discriminator networks D_local and D_global;
step S44, calculating the loss functions from the outputs of the discriminator networks D_local and D_global, and optimizing the parameters of the low-light image enhancement model according to the results of the loss calculations;
and step S45, repeating the steps S41-S44 until the low-light image enhancement model reaches a Nash equilibrium state.
Preferably, the generator network G is a Unet++ network with an encoder-decoder structure, comprising down-sampling, short connections, long connections, and up-sampling. The generator network G performs down-sampling several times and then up-sampling several times; every down-sampling and up-sampling uses the same convolution kernel, stride, and padding. After each down-sampling convolution the LeakyReLU function is used for activation, after each up-sampling convolution the ReLU function is used, and the last layer is activated with the Tanh function. Except for the first down-sampling layer and the last up-sampling layer, all layers use instance normalization to accelerate training. Down-sampling and up-sampling share features through the short and long connections:

x^(i,j) = Conv([x^(i,0), ..., x^(i,j-1)], Up(x^(i+1,j-1)))

wherein x^(i,j) represents the j-th feature map of the i-th layer, Up represents up-sampling, [·] represents channel concatenation, and Conv represents the convolution operation.
Preferably, the loss functions include a conditional generative adversarial loss function, a least-squares GAN loss function, an L1 loss function, a content loss function, a color loss function, and a total loss function, wherein:
the conditional generative adversarial loss function is:

L_cGAN(G, D) = E_{x, x_gray, y}[log D(x, x_gray, y)] + E_{x, x_gray}[log(1 - D(x, x_gray, G(x, x_gray)))]

wherein E represents the mathematical expectation, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, D(x, x_gray, y) is the discrimination output of the discriminator network D on the real normal-light image, G(x, x_gray) is the generated normal-light image, and D(x, x_gray, G(x, x_gray)) is the discrimination output of the discriminator network D on the generated normal-light image;
the least-squares GAN loss function is:

min_D L_LSGAN(D) = (1/2) E_{x, x_gray, y}[(D(x, x_gray, y) - 1)^2] + (1/2) E_{x, x_gray}[(D(x, x_gray, G(x, x_gray)))^2]
min_G L_LSGAN(G) = (1/2) E_{x, x_gray}[(D(x, x_gray, G(x, x_gray)) - 1)^2]

wherein E represents the mathematical expectation, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray) is the generated normal-light image, D(x, x_gray, y) is the discrimination output of the discriminator network D on the real normal-light image, D(x, x_gray, G(x, x_gray)) is the discrimination output of the discriminator network D on the generated normal-light image, L_LSGAN(D) is the loss function minimized by the discriminator network D, and L_LSGAN(G) is the loss function minimized by the generator network G;
the L1 loss function is:

L_L1 = E_{x, x_gray, y}[ ||y - G(x, x_gray)||_1 ]

wherein E represents the mathematical expectation, ||·||_1 represents the L1 norm measuring the distance between y and G(x, x_gray), x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, and G(x, x_gray) is the generated normal-light image;
the content loss function is:

L_content = E_{x, x_gray, y}[ ||Φ(y) - Φ(G(x, x_gray))||_2 ]

wherein E represents the mathematical expectation, ||·||_2 represents the L2 norm measuring the distance between Φ(y) and Φ(G(x, x_gray)), x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray) is the generated normal-light image, Φ(y) represents the feature map of the real normal-light image extracted with the VGG19 network, and Φ(G(x, x_gray)) represents the feature map of the generated normal-light image extracted with the VGG19 network;
the color loss function is:

L_color = E_{x, x_gray, y}[ ||G(x, x_gray)_blur - y_blur||_2 ]

wherein E represents the mathematical expectation, ||·||_2 represents the L2 norm measuring the distance between G(x, x_gray)_blur and y_blur, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray)_blur represents the generated normal-light image after Gaussian blur, and y_blur represents the real normal-light image after Gaussian blur;
the total loss function is:

L = L_LSGAN(G) + λ_l1 · L_L1 + λ_content · L_content + λ_color · L_color

wherein λ_l1, λ_content, and λ_color are the weighting parameters of the L1 loss, content loss, and color loss respectively, taken as λ_l1 = 100, λ_content = 10, λ_color = 100; L_LSGAN(G) represents the least-squares GAN loss function, L_L1 the L1 loss function, L_content the content loss function, and L_color the color loss function.
Preferably, in step S41 the images are preprocessed before being input into the low-light image enhancement model for training. The preprocessing comprises normalizing the low-light image and the real normal-light image, and extracting the illumination intensity feature of the low-light image for training. Specifically: the pixel matrix of the low-light image is divided by 255 so that the pixel values are normalized to [0, 1], denoted x_scale; then the formula

x_norm^channel = (x_scale^channel - mean) / std

distributes each element to [-1, 1], where channel denotes one of the three channels of the image, x_norm denotes the normalized low-light image, mean = 0.5, and std = 0.5; the real normal-light image undergoes the same normalization. Finally the illumination intensity feature x_gray is obtained from r = x_norm^R + 1, g = x_norm^G + 1, b = x_norm^B + 1, where x_norm^R, x_norm^G, and x_norm^B are the values of the red, green, and blue channels of the normalized low-light image, and x_gray represents the extracted illumination intensity feature.
Compared with the related art, the single low-light image enhancement method based on a generative adversarial network has the following beneficial effects:
1. the improved Unet++ network is used as the generator network, so that image features at every level can be learned while the model has fewer parameters, greatly improving both model training speed and image generation speed;
2. problems in the generation process are comprehensively considered, and various loss functions are applied to enable the generation network to generate vivid and clear high-quality images;
3. an attention mechanism is introduced, so that the network can sense the illumination intensity of each area of the low-light image, the illumination intensity of each area of the generated image is adaptively adjusted, and a vivid normal-light image is generated.
Drawings
FIG. 1 is a flow chart of the single low-light image enhancement method based on a generative adversarial network according to the present invention;
FIG. 2 is a schematic diagram of a generator network in the low-light image enhancement model according to the present invention;
FIG. 3 is a schematic diagram of a structure of a discriminator network in the low-light image enhancement model according to the present invention;
fig. 4 is a comparison of a low light image and an image enhanced using the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1, the present invention provides a single low-light image enhancement method based on a generative adversarial network, comprising the following steps:
step S1, collecting a training data set: acquiring a low-light image and a normal-light image from the same scene by changing the exposure time and the sensitivity of a camera, wherein the low-light image and the normal-light image of the same scene form an image group;
specifically, in the same scene, two normal-light images are first shot, recorded as N1 and N2; then several low-light images are shot by reducing the exposure time and sensitivity of the camera; then the exposure time and sensitivity are reset and two more normal-light images are shot, recorded as N3 and N4. N1, N2, N3, N4 and the several low-light images of the same scene form an image group. The more image groups in the data set the better; the scene can be varied to collect many image groups.
Step S2, processing of data set: removing the image groups which are not aligned due to uncontrollable factors such as camera shake, object movement and the like;
specifically, the step S2 includes the following steps:
step S23, determining whether the mean square error M of each image group exceeds a preset threshold, if so, removing the image group, and if not, retaining the image group. Preferably, the threshold is typically set to 0.1.
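The alignment check above can be sketched as follows. This is a minimal numpy sketch under an assumption: the text does not specify between which images of a group the mean square error M is computed, so here it is taken between normal-light shots captured before and after the low-light exposures (a large value suggests the scene moved); the helper names are hypothetical.

```python
import numpy as np

THRESHOLD = 0.1  # preset threshold from the text

def group_mse(n_before, n_after):
    """MSE between normal-light shots taken before and after the
    low-light exposures (assumed interpretation of M)."""
    a = np.asarray(n_before, dtype=np.float64)
    b = np.asarray(n_after, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def keep_group(n_before, n_after, threshold=THRESHOLD):
    """Retain the image group only if its MSE does not exceed the threshold."""
    return group_mse(n_before, n_after) <= threshold
```

With identical frames the group is kept; a frame that changed between shots pushes M above the threshold and the group is discarded.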
The processed data set was randomly divided into 2 parts, 90% of which were used as training set (Train set) and 10% of which were used as verification set (Validation set).
Step S3, constructing a low-light image enhancement model and a loss function based on the generative adversarial network, wherein the low-light image enhancement model comprises a generator network G and a discriminator network D, and the discriminator network D comprises a local discriminator network D_local and a global discriminator network D_global;
specifically, the model design is based on a generative adversarial network, comprising a generator network G consisting of a modified Unet++ network, a local discriminator network D_local, and a global discriminator network D_global.
Please refer to fig. 2, a schematic structural diagram of the generator network in the low-light image enhancement model of the present invention. The generator network G adopts a Unet++ network with an encoder-decoder structure, including down-sampling, short connections, long connections, and up-sampling. Down-sampling increases robustness to small disturbances of the input image, reduces the risk of overfitting, reduces the amount of computation, and enlarges the receptive field; up-sampling restores and decodes the abstract features to the original image size to obtain the final output; and the short and long connections let the network automatically learn and exploit the importance of image features at different depths. The invention improves the structure of the Unet++ network so that it can serve as a generator network and complete the picture-generation function.
Preferably, the generator network G performs down-sampling several times and then up-sampling several times, each with the same convolution kernel, stride, and padding. For example, each down-sampling uses a 4x4 convolution kernel with stride 2 and padding 1, so each down-sampling halves the image size and doubles the number of channels (the number of channels is set to 64 at the first down-sampling); then up-sampling is performed four times, likewise with a 4x4 kernel, stride 2, and padding 1, so each up-sampling doubles the image size and halves the number of channels (the number of channels at the last up-sampling is 3). After each down-sampling convolution the LeakyReLU function is used for activation, after each up-sampling convolution the ReLU function is used, and the last layer is activated with the Tanh function. Except for the first down-sampling layer and the last up-sampling layer, all layers use instance normalization to accelerate training, and down-sampling and up-sampling share features through the short and long connections:

x^(i,j) = Conv([x^(i,0), ..., x^(i,j-1)], Up(x^(i+1,j-1)))

wherein x^(i,j) represents the j-th feature map of the i-th layer, Up represents up-sampling, [·] represents channel concatenation, and Conv represents the convolution operation.
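The shape arithmetic stated above (a 4x4 kernel with stride 2 and padding 1 halves the spatial size on the way down and doubles it on the way up) can be checked with the standard convolution output-size formulas; this small sketch is illustrative, not part of the patent:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a convolution: floor((size + 2*pad - kernel)/stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution: (size - 1)*stride - 2*pad + kernel."""
    return (size - 1) * stride - 2 * pad + kernel
```

For a 256-pixel side, `conv_out(256)` gives 128 and `deconv_out(128)` gives 256 back, confirming the halving/doubling behavior; four down-samplings take 256 to 16.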
Please refer to fig. 3, a schematic structural diagram of the discriminator networks in the low-light image enhancement model of the present invention. A dual-scale discriminator is adopted: a local discriminator network D_local and a global discriminator network D_global. The generated normal-light image and the real normal-light image, together with the corresponding low-light image and illumination intensity feature, are fed into the local discriminator network D_local for training; at the same time these inputs are down-sampled by a factor of 2 and fed into the global discriminator network D_global for training. Although the two discriminator networks have the same structure, the global discriminator network D_global has the larger receptive field, which means it sees more global information about the image and can guide the generator network to produce globally consistent images; the local discriminator network D_local, on the other hand, captures high-frequency information such as local textures and patterns, encouraging the generator network to produce finer details. In an embodiment of the invention the discriminator structure follows the idea of PatchGAN and uses five convolutional layers: the first three use 4x4 convolution kernels with stride 2 and padding 1, the last two set the stride to 1, every layer uses LeakyReLU as the activation function, and all layers except the first and the last use instance normalization to accelerate training.
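The five-layer PatchGAN-style layout described above (three stride-2 convolutions followed by two stride-1 convolutions, all 4x4 kernels) yields a patch map rather than a single scalar; the spatial size of that map can be traced with the usual output-size formula. A small sketch, assuming padding 1 on all five layers (the text only states it for the first three):

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a convolution: floor((size + 2*pad - kernel)/stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

def patchgan_out(size):
    """Spatial size of the 5-layer discriminator output map:
    three stride-2 convolutions, then two stride-1 convolutions."""
    for stride in (2, 2, 2, 1, 1):
        size = conv_out(size, stride=stride)
    return size
```

A 256x256 input yields a 30x30 patch map: each output value judges one local patch, which is what lets D_local focus on local texture.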
The loss function is designed as follows:
the resistance loss: the generation network G receives a low-light image, inputs and outputs a generation image, hopefully cheats the discriminator network as much as possible, and the discriminator network D distinguishes the real target image and the output of the generation network as much as possible, so that a game process of the two networks is formed, and the condition generation counteracts the loss function of the networks as follows:
whereinRepresenting a mathematical expectation, x is a low light image, xgrayFor extracted illumination intensity features, y is the true normal light image, D (x, x)grayY) is the discrimination output of the discriminator network D on the real normal light image, G (x, x)gray) For the normal light image generated, D (x, x)gray,G(x,xgray) Is output for the discriminator network D to discriminate the generated normal light image.
The generator network G wants its generated images to fool the discriminator network D as much as possible, so G seeks to minimize L_cGAN(G, D); the discriminator network D wants to distinguish the real normal-light image from the generated image as well as possible, so D seeks to maximize L_cGAN(G, D). To stabilize network training, the invention adopts the least-squares GAN loss as the adversarial loss:

min_D L_LSGAN(D) = (1/2) E_{x, x_gray, y}[(D(x, x_gray, y) - 1)^2] + (1/2) E_{x, x_gray}[(D(x, x_gray, G(x, x_gray)))^2]
min_G L_LSGAN(G) = (1/2) E_{x, x_gray}[(D(x, x_gray, G(x, x_gray)) - 1)^2]

For the discriminator network D, minimizing L_LSGAN(D) (the first loss) pushes the output for the real target image as close to 1 as possible and the output for the generated image as close to 0 as possible. For the generator network G, minimizing L_LSGAN(G) (the second loss) pushes the output of network D on the generated image as close to 1 as possible. Here E represents the mathematical expectation, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray) is the generated normal-light image, D(x, x_gray, y) is the discrimination output of the discriminator network D on the real normal-light image, and D(x, x_gray, G(x, x_gray)) is the discrimination output of the discriminator network D on the generated normal-light image.
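The least-squares adversarial objectives above can be written directly from the discriminator outputs. A minimal numpy sketch (the 1/2 factors follow the common LSGAN convention and are an assumption; the patent's exact scaling is not recoverable from the text):

```python
import numpy as np

def d_loss_lsgan(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) toward 1, D(fake) toward 0."""
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def g_loss_lsgan(d_fake):
    """Least-squares generator loss: push D(fake) toward 1."""
    d_fake = np.asarray(d_fake, dtype=np.float64)
    return 0.5 * np.mean((d_fake - 1.0) ** 2)
```

When the discriminator is perfectly fooled (D(fake) = 1) the generator loss vanishes; when it is perfectly right (D(real) = 1, D(fake) = 0) the discriminator loss vanishes, matching the behavior described above.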
L1 loss: to make the generated picture as close to the real target image as possible, the L1 loss between the generated image and the real target image is taken:

L_L1 = E_{x, x_gray, y}[ ||y - G(x, x_gray)||_1 ]

wherein E represents the mathematical expectation, ||·||_1 represents the L1 norm measuring the distance between y and G(x, x_gray), x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, and G(x, x_gray) is the generated normal-light image.
Content loss: the feature maps of the generated image and of the real target image are extracted with a pre-trained VGG19 network; the deeper the feature map, the more abstract the image features. The feature map of the 7th convolutional layer is taken to calculate the content loss, and the feature-extraction function is defined as Φ(x), so the content loss function is:

L_content = E_{x, x_gray, y}[ ||Φ(y) - Φ(G(x, x_gray))||_2 ]

wherein E represents the mathematical expectation, ||·||_2 represents the L2 norm measuring the distance between Φ(y) and Φ(G(x, x_gray)), x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray) is the generated normal-light image, Φ(y) represents the feature map of the real normal-light image extracted with the VGG19 network, and Φ(G(x, x_gray)) represents the feature map of the generated normal-light image extracted with the VGG19 network.
Color loss: to make the color of the generated image as close to the real target image as possible, the method first applies Gaussian blur to both the generated image and the real target image, removing texture details and retaining only color information, and then takes the L2 loss between the two as the color loss:

L_color = E_{x, x_gray, y}[ ||G(x, x_gray)_blur - y_blur||_2 ]

wherein E represents the mathematical expectation, ||·||_2 represents the L2 norm measuring the distance between G(x, x_gray)_blur and y_blur, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray)_blur represents the generated normal-light image after Gaussian blur, and y_blur represents the real normal-light image after Gaussian blur.
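The blur-then-compare idea above can be sketched for a single channel with a separable Gaussian filter. A minimal numpy sketch under assumptions: the blur sigma and kernel radius are illustrative (the patent does not state them), and the mean squared difference is used as the L2 measure:

```python
import numpy as np

def gaussian_kernel1d(sigma=3.0, radius=6):
    """Normalized 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma=3.0):
    """Separable Gaussian blur of a 2-D array (edge-padded, same output size)."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), r, mode="edge")
    rows = np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 0, rows)

def color_loss(generated, target, sigma=3.0):
    """Mean squared difference of the blurred images: texture removed, color kept."""
    g = gaussian_blur(generated, sigma)
    t = gaussian_blur(target, sigma)
    return float(np.mean((g - t) ** 2))
```

Blurring makes the loss insensitive to fine texture while a uniform brightness shift, i.e. a color/illumination difference, still registers.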
The total loss function is:

L = L_LSGAN(G) + λ_l1 · L_L1 + λ_content · L_content + λ_color · L_color

wherein λ_l1, λ_content, and λ_color are the weighting parameters of the L1 loss, content loss, and color loss respectively, taken as λ_l1 = 100, λ_content = 10, λ_color = 100; L_LSGAN(G) represents the least-squares GAN loss function, L_L1 the L1 loss function, L_content the content loss function, and L_color the color loss function.
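The weighted combination above reduces to a single line once the four component losses are available; a small sketch with the weights stated in the text:

```python
def total_loss(loss_gan, loss_l1, loss_content, loss_color,
               lam_l1=100.0, lam_content=10.0, lam_color=100.0):
    """Weighted total generator loss with the weights given in the text
    (λ_l1 = 100, λ_content = 10, λ_color = 100)."""
    return (loss_gan
            + lam_l1 * loss_l1
            + lam_content * loss_content
            + lam_color * loss_color)
```

The large λ_l1 and λ_color weights mean pixel fidelity and color fidelity dominate the adversarial term during most of training.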
Step S4, training the low-light image enhancement model: randomly inputting the image pairs in the processed data set into the model and, by repeated iteration, alternately minimizing the losses of the generator network G and of the discriminator networks D_local and D_global, until the model reaches a Nash equilibrium state, at which point training is finished;
specifically, the step S4 includes the following steps:
step S41, randomly selecting an image group, randomly selecting a low light image and a real normal light image from the image group to form an image pair, and extracting the illumination intensity characteristics of the low light image and the low light image in the image pair;
step S42, inputting the illumination intensity characteristic and the low light image into a generator network G together to generate an enhanced normal light image;
step S43, using the generated normal-light image and the real normal-light image as the input of the discriminator networks D_local and D_global;
step S44, calculating the loss functions from the outputs of the discriminator networks D_local and D_global, and adjusting the parameters of the low-light image enhancement model according to the results of the loss calculations;
and step S45, repeating the steps S41-S44 until the low-light image enhancement model reaches a Nash equilibrium state.
The image is preprocessed before being input into the low-light image enhancement model for training, wherein the preprocessing comprises normalizing the low-light image and a real normal-light image, and simultaneously extracting the illumination intensity characteristic of the low-light image for training.
The method specifically comprises the following steps: dividing the low-light image matrix by 255 normalizes the pixel values to [0, 1]; the result is denoted x_scale. Then the formula

x_norm = (x_scale − mean) / std, applied per channel,

distributes each element to [−1, 1], where channel denotes each of the three channels of the image, x_norm represents the normalized low-light image, mean = 0.5 and std = 0.5; the real normal-light image is normalized in the same way. Finally, the illumination intensity feature x_gray is computed from r, g and b, where r = x_norm^R + 1, g = x_norm^G + 1, b = x_norm^B + 1, and x_norm^R, x_norm^G and x_norm^B are the values of the red, green and blue channel images of the low-light image after the normalization operation; x_gray represents the extracted illumination intensity feature.
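A sketch of this preprocessing in NumPy, assuming an 8-bit RGB input. Note that the patent's exact formula for x_gray appears only as an image in the source and is not reproduced here; the luminance-style weighting below is an illustrative stand-in built from the r, g and b terms the text does define:

```python
# Sketch of the preprocessing described above for an H x W x 3 uint8 image.
import numpy as np

def preprocess(img_u8):
    x_scale = img_u8.astype(np.float64) / 255.0   # pixel values in [0, 1]
    x_norm = (x_scale - 0.5) / 0.5                # each channel in [-1, 1]
    r = x_norm[..., 0] + 1.0                      # r, g, b in [0, 2]
    g = x_norm[..., 1] + 1.0
    b = x_norm[..., 2] + 1.0
    # Illustrative illumination-intensity map (assumed form, not the patent's):
    # brighter pixels -> higher value, rescaled to [0, 1].
    x_gray = (0.299 * r + 0.587 * g + 0.114 * b) / 2.0
    return x_norm, x_gray

img = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)
x_norm, x_gray = preprocess(img)
assert x_norm.min() >= -1.0 and x_norm.max() <= 1.0
assert x_gray.min() >= 0.0 and x_gray.max() <= 1.0
```

With mean = std = 0.5, the normalization is exactly the affine map [0, 1] → [−1, 1], matching the ranges the text states.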
And step S5, inputting the low-light image to be enhanced into the trained low-light image enhancement model to obtain the enhanced normal-light image.
And randomly selecting the low-light images in the verification set, and inputting the low-light images into a trained low-light image enhancement model for verification.
FIG. 4 shows a comparison of a low-light image and the image enhanced using the method of the present invention. It is evident from FIG. 4 that the image enhanced by the method of the present invention is close to the normal-light image: the generated image has high image quality, clear detail texture and vivid color, which shows that the method of the present invention can obtain high-quality images.
Compared with the related art, the single low-light image enhancement method based on the generation type countermeasure network has the following beneficial effects:
1. the improved Unet++ network is used as the generator network, so that image features at every level can be learned while the model has fewer parameters, greatly improving the model training speed and the image generation speed;
2. problems in the generation process are comprehensively considered, and various loss functions are applied to enable the generation network to generate vivid and clear high-quality images;
3. an attention mechanism is introduced, so that the network can sense the illumination intensity of each area of the low-light image, the illumination intensity of each area of the generated image is adaptively adjusted, and a vivid normal-light image is generated.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (7)
1. A single low-light image enhancement method based on a generative countermeasure network is characterized by comprising the following steps:
step S1, collecting a training data set: acquiring a low-light image and a normal-light image from the same scene by changing the exposure time and the sensitivity of a camera, wherein the low-light image and the normal-light image of the same scene form an image group;
step S2, processing the data set: removing the image groups which are not aligned due to uncontrollable factors such as camera shake and object movement;
step S3, constructing a low-light image enhancement model and loss functions based on the generative countermeasure network, wherein the low-light image enhancement model comprises a generator network G and a discriminator network D, and the discriminator network D comprises a local discriminator network D_local and a global discriminator network D_global;
Step S4, training the low-light image enhancement model: randomly inputting the image groups in the processed data set into the low-light image enhancement model for training, and iterating repeatedly, training the low-light image enhancement model by minimizing the loss function value of the generator network G or of the discriminator network D, until the low-light image enhancement model reaches a Nash equilibrium state, whereupon training is finished;
and step S5, inputting the low-light image to be enhanced into the trained low-light image enhancement model, and obtaining the enhanced normal light image.
2. The single low-light image enhancement method according to claim 1, wherein the step S1 is specifically as follows: in the same scene, two normal light images are firstly shot, which are recorded as N1 and N2, then a plurality of low light images are shot by reducing the exposure time and the sensitivity of the camera, and then the exposure time and the sensitivity of the camera are reset to shoot the two normal light images, which are recorded as N3 and N4, and N1, N2, N3, N4 and the plurality of low light images of the same scene form an image group.
3. The single low-light image enhancement method according to claim 2, wherein the step S2 includes the steps of:
step S23, determining whether the mean square error M of each image group exceeds a preset threshold, if so, removing the image group, and if not, retaining the image group.
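Claim 3 refers to a mean square error M defined in steps S21 and S22, which are not reproduced in this excerpt; a plausible reading of claim 2 is that M compares the normal-light shots taken before (N1, N2) and after (N3, N4) the low-light burst, so that camera shake or object motion between them raises M. A sketch under that assumption — the function name and threshold value are illustrative:

```python
# Alignment check for an image group: compare two normal-light frames of the
# same scene; a large mean square error suggests the scene moved.
import numpy as np

def group_is_aligned(n1, n3, threshold=25.0):
    """Keep the image group only if the scene stayed still between shots."""
    m = float(np.mean((n1.astype(np.float64) - n3.astype(np.float64)) ** 2))
    return m <= threshold

still = np.full((4, 4), 128, dtype=np.uint8)
moved = still.copy()
moved[0, 0] = 255                       # one pixel changed strongly
assert group_is_aligned(still, still)   # M = 0 -> keep the group
assert not group_is_aligned(still, moved, threshold=100.0)
```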
4. The single low-light image enhancement method according to claim 3, wherein the step S4 comprises the steps of:
step S41, randomly selecting an image group, randomly selecting a low-light image and a real normal-light image from the image group to form an image pair, and extracting the illumination intensity feature of the low-light image in the image pair;
step S42, inputting the illumination intensity characteristic and the low light image into a generator network G together to generate an enhanced normal light image;
step S43, using the generated normal-light image, the real normal-light image, the corresponding low-light image and the illumination intensity feature as the inputs of the discriminator networks D_local and D_global;
step S44, calculating the loss functions according to the outputs of the discriminator networks D_local and D_global, and optimizing the parameters of the low-light image enhancement model according to the calculation results of the loss functions;
and step S45, repeating the steps S41-S44 until the low-light image enhancement model reaches a Nash equilibrium state.
5. The single low-light image enhancement method according to claim 1, wherein the generator network G is a Unet++ network with an encoder-decoder structure comprising down-sampling, short connections, long connections and up-sampling; the generator network G performs down-sampling a plurality of times and then up-sampling a plurality of times, using the same convolution kernel, step size and padding for each down-sampling and up-sampling; in the down-sampling path each convolution is followed by a LeakyReLU activation, in the up-sampling path each convolution is followed by a ReLU activation, and the last layer is activated by a Tanh function; every layer except the first down-sampling layer and the last up-sampling layer uses instance normalization to accelerate training; and features are shared between the down-sampling and up-sampling paths through the short connections and long connections:
wherein x^(i, j) represents the j-th feature map of the i-th layer, and Conv represents the convolution operation.
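The per-layer operations this claim names — LeakyReLU in the down-sampling path, ReLU in the up-sampling path, Tanh at the output, and instance normalization (our assumed reading of "instance regularization" in the translation) — can be sketched in NumPy. The LeakyReLU slope of 0.2 is a common default, not stated in the text:

```python
# Illustrative per-layer operations of the generator described in claim 5.
import numpy as np

def leaky_relu(x, slope=0.2):
    """Down-sampling path activation: negative inputs are scaled, not cut."""
    return np.where(x > 0, x, slope * x)

def relu(x):
    """Up-sampling path activation."""
    return np.maximum(x, 0.0)

def instance_norm(feat, eps=1e-5):
    """Normalize one feature map to zero mean and unit variance."""
    return (feat - feat.mean()) / np.sqrt(feat.var() + eps)

fm = np.array([[1.0, -2.0], [3.0, -4.0]])
assert leaky_relu(fm)[0, 1] == -0.4            # 0.2 * (-2.0)
assert relu(fm)[0, 1] == 0.0
out = instance_norm(fm)
assert abs(out.mean()) < 1e-8 and abs(out.std() - 1.0) < 1e-3
assert -1.0 <= np.tanh(5.0) <= 1.0             # Tanh bounds the final output
```

The Tanh output layer matches the [−1, 1] range produced by the normalization in claim 7, so generated and real images live on the same scale.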
6. The single low-light image enhancement method according to claim 4, wherein the loss functions include a conditional generative adversarial loss function, a least-squares GAN loss function, an L1 loss function, a content loss function, a color loss function and a total loss function, wherein:
the conditional generative adversarial loss function is:

L_cGAN(G, D) = E_{x, x_gray, y}[log D(x, x_gray, y)] + E_{x, x_gray}[log(1 − D(x, x_gray, G(x, x_gray)))]

wherein E represents the mathematical expectation, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, D(x, x_gray, y) is the discrimination output of the discriminator network D for the real normal-light image, G(x, x_gray) is the generated normal-light image, and D(x, x_gray, G(x, x_gray)) is the discrimination output of the discriminator network D for the generated normal-light image;
the least-squares GAN loss function is:

min_D V(D) = E_{x, x_gray, y}[(D(x, x_gray, y) − 1)^2] + E_{x, x_gray}[D(x, x_gray, G(x, x_gray))^2]
min_G V(G) = E_{x, x_gray}[(D(x, x_gray, G(x, x_gray)) − 1)^2]

wherein E represents the mathematical expectation, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray) is the generated normal-light image, D(x, x_gray, y) is the discrimination output of the discriminator network D for the real normal-light image, D(x, x_gray, G(x, x_gray)) is the discrimination output of the discriminator network D for the generated normal-light image, min_D V(D) represents the loss function minimized by the discriminator network D, and min_G V(G) represents the loss function minimized by the generator network G;
the L1 loss function is:

L_L1 = E_{x, x_gray, y}[‖y − G(x, x_gray)‖_1]

wherein E represents the mathematical expectation, ‖·‖_1 denotes the L1 norm measuring the difference between y and G(x, x_gray), x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, and G(x, x_gray) is the generated normal-light image;
the content loss function is:

L_content = E_{x, x_gray, y}[‖Φ(y) − Φ(G(x, x_gray))‖_2]

wherein E represents the mathematical expectation, ‖·‖_2 denotes the L2 norm measuring the difference between Φ(y) and Φ(G(x, x_gray)), x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray) is the generated normal-light image, Φ(y) represents the feature map of the real normal-light image extracted using the VGG19 network, and Φ(G(x, x_gray)) represents the feature map of the generated normal-light image extracted using the VGG19 network;
the color loss function is:

L_color = E_{x, x_gray, y}[‖G(x, x_gray)_blur − y_blur‖_2]

wherein E represents the mathematical expectation, ‖·‖_2 denotes the L2 norm measuring the difference between G(x, x_gray)_blur and y_blur, x is the low-light image, x_gray is the extracted illumination intensity feature, y is the real normal-light image, G(x, x_gray)_blur denotes the generated normal-light image after Gaussian blur, and y_blur denotes the real normal-light image after Gaussian blur;
the total loss function is:

L_total = L_LSGAN + λ_l1 · L_L1 + λ_content · L_content + λ_color · L_color

wherein λ_l1, λ_content and λ_color are the weighting parameters of the L1 loss, the content loss and the color loss respectively, with λ_l1 = 100, λ_content = 10 and λ_color = 100; L_LSGAN represents the least-squares GAN loss function, L_L1 the L1 loss function, L_content the content loss function, and L_color the color loss function.
7. The method of claim 4, wherein in step S41, the image is preprocessed before being input into the low-light image enhancement model for training, the preprocessing comprising normalizing the low-light image and the real normal-light image, and extracting the illumination intensity feature of the low-light image for training, specifically: dividing the low-light image matrix by 255 normalizes the pixel values to [0, 1], the result being denoted x_scale; then the formula

x_norm = (x_scale − mean) / std, applied per channel,

distributes each element to [−1, 1], where channel denotes each of the three channels of the image, x_norm represents the normalized low-light image, mean = 0.5 and std = 0.5, the real normal-light image being normalized in the same way; finally, the illumination intensity feature x_gray is computed from r, g and b, where r = x_norm^R + 1, g = x_norm^G + 1, b = x_norm^B + 1, x_norm^R, x_norm^G and x_norm^B being the normalized values of the red, green and blue channel images of the low-light image respectively, and x_gray representing the extracted illumination intensity feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911361967.1A CN111161178A (en) | 2019-12-25 | 2019-12-25 | Single low-light image enhancement method based on generation type countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111161178A true CN111161178A (en) | 2020-05-15 |
Family
ID=70556629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911361967.1A Pending CN111161178A (en) | 2019-12-25 | 2019-12-25 | Single low-light image enhancement method based on generation type countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111161178A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640075A (en) * | 2020-05-23 | 2020-09-08 | 西北工业大学 | Underwater image occlusion removing method based on generation countermeasure network |
CN111784596A (en) * | 2020-06-12 | 2020-10-16 | 北京理工大学 | General endoscope image enhancement method and device based on generation of antagonistic neural network |
CN111798400A (en) * | 2020-07-20 | 2020-10-20 | 福州大学 | Non-reference low-illumination image enhancement method and system based on generation countermeasure network |
CN111833268A (en) * | 2020-07-10 | 2020-10-27 | 中国海洋大学 | Underwater image enhancement method for generating countermeasure network based on conditions |
CN111899193A (en) * | 2020-07-30 | 2020-11-06 | 湖北工业大学 | Criminal investigation photography system and method based on low-illumination image enhancement algorithm |
CN111915525A (en) * | 2020-08-05 | 2020-11-10 | 湖北工业大学 | Low-illumination image enhancement method based on improved depth separable generation countermeasure network |
CN112102186A (en) * | 2020-09-07 | 2020-12-18 | 河海大学 | Real-time enhancement method for underwater video image |
CN112203064A (en) * | 2020-09-30 | 2021-01-08 | 普联技术有限公司 | Method and device for constructing color mapping relationship of different illumination intensities |
CN112669242A (en) * | 2021-03-16 | 2021-04-16 | 四川大学 | Night scene restoration method based on improved image enhancement algorithm and generation countermeasure network |
CN113592752A (en) * | 2021-07-12 | 2021-11-02 | 四川大学 | Road traffic optical stain image enhancement method and device based on countermeasure network |
CN114065838A (en) * | 2021-10-22 | 2022-02-18 | 中国科学院深圳先进技术研究院 | Low-illumination obstacle detection method, system, terminal and storage medium |
CN116091341A (en) * | 2022-12-15 | 2023-05-09 | 南京信息工程大学 | Exposure difference enhancement method and device for low-light image |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704966A (en) * | 2017-10-17 | 2018-02-16 | 华南理工大学 | A kind of Energy Load forecasting system and method based on weather big data |
CN109636754A (en) * | 2018-12-11 | 2019-04-16 | 山西大学 | Based on the pole enhancement method of low-illumination image for generating confrontation network |
CN110223259A (en) * | 2019-06-14 | 2019-09-10 | 华北电力大学(保定) | A kind of road traffic fuzzy image enhancement method based on production confrontation network |
Non-Patent Citations (6)
Title |
---|
GUISIK KIM ET AL.: "Low-Lightgan: Low-Light Enhancement Via Advanced Generative Adversarial Network With Task-Driven Training", pages 2811 - 2815 * |
KARIM ARMANIOUS ET AL.: "MedGAN: Medical Image Translation using GANs", pages 1 - 16 * |
NINGBO ZHU: "Joint features classifier with genetic set for undersampled face recognition", pages 2987 * |
TALHA IQBAL ET AL.: "Generative Adversarial Network for Medical Images (MI-GAN)", pages 1 - 11 * |
YIFAN JIANG ET AL.: "EnlightenGAN: Deep Light Enhancement without Paired Supervision", pages 1 - 11 * |
HUA Miao et al.: "Eye editing based on a two-stage generative adversarial network", Chinese Journal of Stereology and Image Analysis, vol. 24, no. 4, pages 306 - 313 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200515 |