CN115035011A - Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy - Google Patents
- Publication number: CN115035011A
- Application number: CN202210644966.3A
- Authority: CN (China)
- Prior art keywords: image, illumination, low, reflectivity, enhanced
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T5/50 — Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction (G06T: Image data processing or generation, in general)
- G06N3/084 — Backpropagation, e.g. using gradient descent (G06N3/02: Neural networks; G06N3/08: Learning methods)
- G06T5/70
- G06T7/90 — Determination of colour characteristics (G06T7/00: Image analysis)
- G06T2207/20081 — Training; Learning (G06T2207/20: Special algorithmic details)
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; Image merging (G06T2207/20212: Image combination)
Abstract
The invention relates to the technical field of image processing, in particular to a low-illumination image enhancement method of a self-adaptive RetinexNet under a fusion strategy. The method comprises: inputting the V-channel image of a low-illumination image and a normal-light image into a DecomNet to obtain the illumination and reflectivity of each image; inputting the reflectivity and illumination of the low-illumination image into a RestorationNet, where the illumination guides noise reduction of the reflectivity to obtain the denoised reflectivity; inputting the reflectivity and illumination of the low-illumination image into an EnhanceNet to enhance the illumination of the low-illumination image and obtain the enhanced illumination; reconstructing the image to obtain a coarse enhanced image; and acquiring a virtual overexposed image of the low-illumination image and fusing it with the low-illumination image and the coarse enhanced image. The invention reduces the color distortion that follows image enhancement, and effectively suppresses noise while retaining the edge structure and detail information.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a low-illumination image enhancement method of a self-adaptive RetinexNet under a fusion strategy.
Background
The popularization of the internet has rapidly brought people into the information era, and people's demand for all kinds of information keeps growing. The information acquired through the human visual system accounts for approximately 75% of the total information people acquire, and images are one of the important carriers by which human eyes obtain information. As images are such an important information carrier, image processing technology has become a popular field of research, aiming to make images meet the needs of various application fields. In real life, factors such as the equipment and environment used to acquire images often yield low-quality pictures: some are caused by abnormal weather, others by equipment (such as underexposure). Such images generally suffer from overall darkness, poor contrast and lack of detail, which affects both viewing of the image content and subsequent use of the image.
In order to extract important information in a low-quality image as much as possible, it is necessary to perform image processing on such an image, and image processing techniques such as image enhancement have been developed. The content of the image is reproduced by utilizing the image enhancement technology, so that on one hand, the visual experience can be improved, and the visual appreciation requirements of people are met; on the other hand, the image enhancement is one of the preprocessing means of computer vision, the detail information in the image is reproduced, and the accuracy of production application such as detection and identification in the computer vision field, pathological characteristic information extraction in the biomedical field and the like can be greatly improved.
Image enhancement in low-illumination environments currently includes histogram equalization, wavelet-transform image enhancement, Retinex-theory enhancement and the like. By comparison, image enhancement based on Retinex theory gives a good enhancement effect on most images and has a wide application range: the Retinex method works well on night images, foggy images, low-illumination images and so on. However, it also exhibits fairly obvious color distortion and loses some edge detail information.
Disclosure of Invention
In order to improve the details and contrast of an image and enable the image to contain rich texture details and good visual effect, the invention provides a low-illumination image enhancement method of a self-adaptive RetinexNet under a fusion strategy, which specifically comprises the following steps:
acquiring an original image and a synthesized low-illumination image corresponding to the original image from historical data, taking the original image as a normal light image, and taking the synthesized low-illumination image as a low-illumination image;
inputting the V-channel image of the low-illumination image and the normal light image into a DecomNet to obtain the illumination and reflectivity of the normal light image and the illumination and reflectivity of the low-illumination image;
inputting the reflectivity and illumination of the obtained low-illumination image into a RestorationNet, and using the illumination to guide the reflectivity to reduce noise to obtain the reflectivity after noise reduction;
inputting the reflectivity of the low-illumination image and illumination into an EnhanceNet, and enhancing the illumination of the low-illumination image to obtain enhanced illumination;
reconstructing an image, namely synthesizing an RGB image from the H channel, V channel and S channel of the optimized image to obtain the coarse enhanced image;
and acquiring a virtual overexposure image of the low-illumination image, and fusing the low-illumination image, the rough enhanced image and the virtual overexposure image to obtain a final optimized enhanced image.
Further, before the image is input into the DecomNet, color channel conversion is carried out on the training set or on the real-time low-illumination image to be enhanced, converting the image from an RGB image into an HSV image.
Further, features of the images input into the DecomNet are extracted using 3 × 3 convolution kernels: features are extracted sequentially by 5 convolutional layers with ReLU and 3 × 3 kernels, where each convolutional layer extracts features and the ReLU maps the obtained features toward reflectivity and illumination. After this mapping, the features pass through one convolutional layer with a 3 × 3 kernel followed by a Sigmoid function, yielding a 4-channel image; the first 3 channels of this image are taken as the reflectivity R and the last channel as the illumination I. That is, R and I are projected from the feature space of the image by the final 3 × 3 convolutional layer, and the Sigmoid function constrains the projected values to the range [0, 1].
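The split of the final 4-channel projection into reflectivity and illumination can be sketched as follows (a minimal NumPy illustration of the output stage only, not the trained network; the array shapes are assumptions):

```python
import numpy as np

def sigmoid(x):
    # constrains the projected values to the range [0, 1]
    return 1.0 / (1.0 + np.exp(-x))

def split_decomnet_output(feat):
    """Split a (H, W, 4) projection into reflectivity R and illumination I.

    The first 3 channels are taken as the reflectivity R and the last
    channel as the illumination I, each constrained to [0, 1] by a sigmoid.
    """
    out = sigmoid(feat)
    return out[..., :3], out[..., 3:]

feat = np.random.randn(8, 8, 4)   # stand-in for the last conv layer's output
R, I = split_decomnet_output(feat)
```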
Further, enhancing the illumination of the low-illumination image, and obtaining the enhanced illumination includes:
splicing the illumination of the low-illumination image with the denoised reflectivity as the input of the EnhanceNet network;
acquiring context information in a large area of an input image through an encoder-decoder framework of an EnhanceNet network;
in the EnhanceNet network, the input image is down-sampled to different sizes by three down-sampling modules; for example, an original image of size 600 × 400 is reduced to 75 × 50 after 3 down-samplings;
the down-sampled images are respectively spliced with the context information, and the spliced result is reconstructed by up-sampling to obtain the enhanced illumination. Each splice is an element-wise summation; skip connections are introduced from each down-sampling block to its mirrored up-sampling block, followed by multi-scale splicing; the final scale is then adjusted by nearest-neighbor interpolation and connected to a channel feature map, and a final 3 × 3 convolution produces the final illumination map.
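The size bookkeeping of the three stride-2 down-samplings and the nearest-neighbor interpolation used for up-sampling can be sketched as follows (a NumPy illustration, under the assumption that each down-sampling module halves both dimensions):

```python
import numpy as np

def downsampled_size(h, w, times=3):
    # each stride-2 down-sampling module halves both dimensions
    for _ in range(times):
        h, w = h // 2, w // 2
    return h, w

def nearest_upsample(x, factor=2):
    # nearest-neighbour interpolation, as used in resize-convolution
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)
```

For the example in the text, a 600 × 400 image shrinks to 75 × 50 after three such down-samplings.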
Further, DecomNet is trained by backpropagation through its loss function. The loss function L_1 of DecomNet is composed of a reconstruction loss function L_recon, a reflection-component consistency loss function L_ir, and a structure smoothness loss function L_is, expressed as:
L_1 = L_recon + λ_ir·L_ir + λ_is·L_is;
L_recon = Σ_{i∈{low,normal}} Σ_{j∈{low,normal}} λ_ij·‖R_i∘I_j − S_j‖_1;
L_ir = ‖R_low − R_normal‖_1;
where λ_ir denotes the reflectivity consistency coefficient and λ_is the illumination smoothness coefficient; low denotes the low-light image dataset and normal the normal-light image dataset; λ_ij is the balancing coefficient of the reconstruction loss; R_i is the reflectivity when i equals low or normal; I_j is the illumination of the low-light image when j = low and of the normal-light image when j = normal; S_j likewise denotes the low-light image when j = low and the normal-light image when j = normal; R_low is the reflectivity of the low-illumination image and R_normal the reflectivity of the normal-light image; ∇ denotes taking the gradient; λ_g is the balancing coefficient of structure-aware intensity; ‖·‖_1 denotes the 1-norm (in L_recon it measures the reconstruction loss over the whole training image, and in L_ir the difference between the reflectivity maps of the training images); ‖·‖_2 denotes the 2-norm, used to compute modular length;
RestorationNet is trained by backpropagation through its loss function, expressed as:
L_res = ‖R̂ − R_h‖_2² − SSIM(R̂, R_h);
where L_res is the loss function of RestorationNet; R̂ is the reflectivity map after noise reduction; R_h is the reflectivity of the normal-light image; and SSIM(R̂, R_h) is the structural similarity measure between R̂ and R_h;
EnhanceNet is trained by backpropagation through its loss function, which is composed of the reconstruction loss function L_recon and the structure smoothness loss function L_is, expressed as:
L_2 = L_recon + λ_is·L_is;
where L_2 is the loss function of EnhanceNet.
Further, after obtaining the enhanced V-channel image, the S-channel image is adaptively adjusted; the adjustment process is expressed as:
s′(x, y) = s(x, y) + t·[v′(x, y) − v(x, y)]·λ(x, y);
where s′(x, y) is the saturation of the pixel at row x, column y of the coarse enhanced image; s(x, y) is the saturation of the pixel at row x, column y of the low-illumination image; v′(x, y) is the brightness of the pixel at row x, column y of the coarse enhanced image; v(x, y) is the brightness of the pixel at row x, column y of the low-illumination image; t is a proportionality constant; and λ(x, y) is the correlation coefficient of v(x, y) and s(x, y).
Further, the correlation coefficient λ(x, y) of v(x, y) and s(x, y) is expressed as:
λ(x, y) = [ (1/n²)·Σ_{(p,q)∈w} (v(p, q) − v̄(x, y))·(s(p, q) − s̄(x, y)) ] / √(δ_v(x, y)·δ_s(x, y));
where v(p, q) is the brightness of the pixel at position (p, q) in the neighborhood window of pixel (x, y), and s(p, q) is the saturation of the pixel at position (p, q) in that window; v̄(x, y) is the mean brightness of pixel (x, y) over the neighborhood window w, and s̄(x, y) is the mean saturation of pixel (x, y) over w; δ_v(x, y) is the brightness variance of pixel (x, y) over w, and δ_s(x, y) the saturation variance over w; and w is an n × n window centered on pixel (x, y).
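The adaptive saturation adjustment above can be sketched as follows (a direct NumPy translation; the window size n = 3 and the guard against zero variance are assumptions):

```python
import numpy as np

def adjust_saturation(v, v_enh, s, t=0.4, n=3):
    """s'(x,y) = s(x,y) + t*[v'(x,y) - v(x,y)] * lambda(x,y).

    lambda(x,y) is the local correlation coefficient of brightness and
    saturation over an n x n neighborhood window centered on (x, y).
    """
    h, w = v.shape
    r = n // 2
    s_out = s.copy()
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            wv, ws = v[y0:y1, x0:x1], s[y0:y1, x0:x1]
            cov = np.mean((wv - wv.mean()) * (ws - ws.mean()))
            denom = np.sqrt(wv.var() * ws.var())
            lam = cov / denom if denom > 1e-8 else 0.0
            s_out[y, x] = s[y, x] + t * (v_enh[y, x] - v[y, x]) * lam
    return np.clip(s_out, 0.0, 1.0)

v = np.linspace(0.0, 1.0, 16).reshape(4, 4)
s = np.full((4, 4), 0.5)
unchanged = adjust_saturation(v, v, s)   # no brightness change -> no shift
boosted = adjust_saturation(v, np.clip(v + 0.2, 0.0, 1.0), s)
```

When the brightness is unchanged the saturation stays put, which is a quick sanity check on the formula.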
Further, a virtual overexposure image is obtained using the camera response model, as represented by:
P=f(E);
where P is the image obtained by camera imaging, namely the virtual overexposed image; E is the irradiance of the low-illumination image; and f is the nonlinear response function of the camera.
Further, the process of fusing the original low-illumination image, the rough enhanced image and the virtual overexposure image to obtain a final optimized enhanced image comprises the following steps:
decomposing the original low-illumination image, the coarse enhanced image and the virtual overexposed image into image blocks, and column-vectorizing each block;
taking the maximum block signal strength over the column-vectorized images as the desired signal strength, denoted ĉ:
ĉ = max_k ‖x̃_k‖_2;
obtaining the desired block structure of the column-vectorized images, denoted ŝ and expressed as:
ŝ = ( Σ_k W(x̃_k)·s_k ) / ( Σ_k W(x̃_k) ), with s_k = x̃_k / ‖x̃_k‖_2;
where W(x̃_k) = ‖x̃_k‖_p^p is a weighting function; x̃_k = x_k − μ_k is the image block with its mean removed, x_k denotes an image block and μ_k the mean value of the block x_k; p is a weight parameter; s_k is a unit-length vector, the block structure of the image with exposure ratio k; and k is the exposure ratio;
obtaining the desired mean intensity of the block by a weighted linear fusion mechanism, denoted l̂ and expressed as:
l̂ = ( Σ_k L(μ_k, l_k)·l_k ) / ( Σ_k L(μ_k, l_k) );
where L(μ_k, l_k) is a weighting function taking as input the global mean μ_k of the image X_k and the mean intensity l_k of the current image block x_k; l_k denotes the mean intensity of pixel blocks at different exposure ratios;
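The strength/structure/intensity fusion of co-located blocks can be sketched as follows (a minimal NumPy illustration; the Gaussian well-exposedness term standing in for the weighting function L(μ_k, l_k) is an assumption):

```python
import numpy as np

def fuse_blocks(blocks, p=4):
    """Fuse K co-located, column-vectorized blocks x_k of shape (K, n).

    Desired signal strength: c_hat = max_k ||x_k - mu_k||.
    Desired structure:       s_hat ~ sum_k ||x~_k||^p * s_k, renormalized.
    Desired mean intensity:  l_hat = weighted average of the block means.
    """
    mu = blocks.mean(axis=1, keepdims=True)         # l_k, block mean intensity
    xt = blocks - mu                                # mean-removed blocks x~_k
    c = np.linalg.norm(xt, axis=1)                  # signal strengths c_k
    c_hat = c.max()
    s = xt / (c[:, None] + 1e-12)                   # unit-length structures s_k
    W = c ** p                                      # structure weights ||x~_k||^p
    s_hat = (W[:, None] * s).sum(axis=0)
    s_hat /= np.linalg.norm(s_hat) + 1e-12
    Lw = np.exp(-((mu.ravel() - 0.5) ** 2) / 0.08)  # stand-in for L(mu_k, l_k)
    l_hat = (Lw * mu.ravel()).sum() / (Lw.sum() + 1e-12)
    return c_hat * s_hat + l_hat                    # fused block

x = np.array([0.2, 0.4, 0.6, 0.8])
fused = fuse_blocks(np.stack([x, x]))   # two identical exposures
```

With identical inputs the fusion is the identity, a quick sanity check on the decomposition of a block into strength, structure and mean intensity.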
the method can well depict the edge and the texture area of the image in the process of processing the texture image noise, can reserve the low-frequency information of the image as much as possible, can distinguish the high-frequency information of the image, and is suitable for the occasions of image noise reduction with complicated texture detail characteristics. Compared with the traditional RetinexNet method, the scheme of the invention also has the following advantages:
1. and enhancing the brightness component by utilizing the mutually independent characteristic of each channel in the HSV color space model.
2. And the saturation component is adaptively adjusted along with the change of the brightness component by utilizing the correlation coefficient, so that the change of the color sensation of the image is avoided.
3. On the basis of UNet, different areas of the illumination enhancement image are combined to bear different levels of noise, and a reflectivity noise reduction model is constructed.
4. A camera response model is introduced to generate a virtual overexposed image that is complementary to the original image. On the basis of the original image, the image enhancement effect is improved, the brightness is more uniform, and the image detail information is better reserved.
Drawings
Fig. 1 is a schematic flow diagram of the low-illumination image enhancement method of the self-adaptive RetinexNet under a fusion strategy according to the present invention;
FIG. 2 is a schematic diagram of the EnhanceNet network structure adopted by the present invention;
FIG. 3 is a schematic diagram of a RestorationNet network architecture according to the present invention;
FIG. 4 is a schematic diagram of a low-illumination image enhancement method of an adaptive RetinexNet under a fusion strategy according to the present invention;
fig. 5 is a schematic diagram of the contrast of the enhanced image obtained by the method of the present invention, wherein (a) is low luminance data, (b) is (a) a coarse enhanced image obtained by the present invention, and (c) is (a) a final optimized enhanced image obtained by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a low-illumination image enhancement method of a self-adaptive RetinexNet under a fusion strategy, which specifically comprises the following steps as shown in figure 1:
acquiring an original image and a synthesized low-illumination image corresponding to the original image from historical data, taking the original image as a normal light image, and taking the synthesized low-illumination image as a low-illumination image;
inputting the V-channel image of the low-illumination image and the normal light image into a DecomNet to obtain the illumination and reflectivity of the normal light image and the illumination and reflectivity of the low-illumination image;
inputting the reflectivity and illumination of the obtained low-illumination image into a RestorationNet, and using the illumination to guide the reflectivity to reduce noise to obtain the reflectivity after noise reduction;
inputting the reflectivity and illumination of the low-illumination image into an EnhanceNet, and enhancing the illumination of the low-illumination image to obtain enhanced illumination;
reconstructing an image, namely synthesizing an RGB image by using color channels for an H channel, a V channel and an S channel of the optimized image, namely a coarse enhancement image; the reconstruction performed in this embodiment mainly includes three steps: obtaining an enhanced V-channel image by the enhanced illumination and the reflectivity after noise reduction according to a retinex theory; according to the enhanced V-channel image, the S-channel image is subjected to self-adaptive adjustment so as to keep the contrast of the image; the H channel image is not transformed, and an RGB image is synthesized through images of a V channel, an S channel and an H channel;
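The channel recombination in this step can be sketched with the standard-library colorsys conversion (a per-pixel illustration; the array shapes and the [0, 1] value range are assumptions):

```python
import colorsys
import numpy as np

def reconstruct_rgb(h, s, v_enhanced):
    """Synthesize the RGB coarse-enhanced image from the untouched H channel,
    the adjusted S channel and the enhanced V channel (all in [0, 1])."""
    out = np.empty(h.shape + (3,))
    for idx in np.ndindex(h.shape):
        out[idx] = colorsys.hsv_to_rgb(h[idx], s[idx], v_enhanced[idx])
    return out

# pure red: hue 0, full saturation, full (enhanced) brightness
rgb = reconstruct_rgb(np.zeros((2, 2)), np.ones((2, 2)), np.ones((2, 2)))
```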
and acquiring a virtual overexposure image of the low-illumination image, and fusing the low-illumination image, the rough enhanced image and the virtual overexposure image to obtain a final optimized enhanced image.
In the embodiment, the RetinexNet illumination component enhancement method for the HSV color space is provided, and the method separates a V-channel image from an RGB image, so that the distortion problem of a color image is better solved; on the basis, the design of a reflectivity recovery model based on the UNet network is provided, and the reflectivity is recovered in an auxiliary mode by utilizing illumination; and finally, introducing a camera response model and designing a fusion strategy by combining image block decomposition.
In the embodiment, by analyzing the HSV model and utilizing the mutually independent relationship among the channels, the color information of the low-illumination image is completely reserved, and the color distortion problem of the enhanced image is improved; meanwhile, the saturation is adaptively adjusted, so that color deviation is avoided; and reconstructing the image and converting the image into an RGB space to obtain a final enhancement effect. The change of the image brightness can cause the image contrast to change, so that the color deviation of the enhanced image occurs, the saturation of the image is adaptively adjusted by using a relative coefficient, and the contrast of the image is maintained:
s′(x,y)=s(x,y)+t[v′(x,y)-v(x,y)]×λ(x,y);
where v(x, y) is the brightness of the corresponding pixel in the original image, v′(x, y) is the brightness of the pixel after enhancement, s(x, y) is the saturation of the corresponding pixel in the original image, s′(x, y) is the saturation of the pixel after correction, and t is a proportionality constant (t = 0.4 in the experiments); λ(x, y) is the correlation coefficient of v(x, y) and s(x, y); n × n is the size of the neighborhood window w; v̄(x, y) and s̄(x, y) are respectively the mean brightness and mean saturation of pixel (x, y) over the neighborhood window w, and δ_v(x, y) and δ_s(x, y) are respectively the brightness and saturation variances of pixel (x, y) over w; v(p, q) and s(p, q) are the brightness and saturation of the pixel at position (p, q) in the neighborhood window, with (p, q) ∈ w denoting a pixel in that window.
The obtained V-channel image and the original normal-light image are taken as input to the DecomNet network. Features are first extracted from the input with a 3 × 3 convolution kernel; the features are then mapped into R and I by five 3 × 3 convolutional layers with ReLU; finally, a 3 × 3 convolution followed by a Sigmoid function yields a 4-channel image, of which the first 3 channels are taken as the reflection component and the last channel as the illumination component, giving the reflectivity and illumination of both the normal-light and the low-illumination image. The loss function of the decomposition network model consists of a reconstruction loss function, a reflection-component consistency loss function and a structure smoothness loss function:
L = L_recon + λ_ir·L_ir + λ_is·L_is;
where λ_ir and λ_is respectively denote the reflectivity consistency coefficient and the illumination smoothness coefficient.
The model decomposes the image into a reflection component and an illumination component, which are recombined to reconstruct the original image; the reconstruction loss function can be expressed as:
L_recon = Σ_{i∈{low,normal}} Σ_{j∈{low,normal}} λ_ij·‖R_i∘I_j − S_j‖_1;
To maintain reflectivity consistency, a shared reflectivity loss function is introduced, expressed as:
L_ir = ‖R_low − R_normal‖_1;
The original TV function is weighted by the gradient of the reflectivity map, giving the smoothness loss function L_is:
L_is = Σ_{i∈{low,normal}} ‖∇I_i ∘ exp(−λ_g·∇R_i)‖_1;
where ∇ comprises the horizontal gradient ∇_h and the vertical gradient ∇_v, and λ_g denotes the balancing coefficient of structure-aware intensity. With the weight exp(−λ_g·∇R_i), L_is relaxes the smoothness constraint where the reflectivity gradient is large, i.e. it reduces the smoothness constraint at image structures where the illumination is discontinuous, while maintaining smoothness elsewhere, so that a clearer illumination map is obtained.
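The structure-aware smoothness term can be sketched in NumPy as follows (forward differences stand in for the gradient operator; single-channel maps are assumed):

```python
import numpy as np

def smoothness_loss(I, R, lambda_g=10.0):
    """L_is = sum |grad(I)| * exp(-lambda_g * |grad(R)|), both directions.

    Large reflectance gradients (image structure) shrink the weight, so
    illumination smoothness is relaxed exactly where edges occur.
    """
    loss = 0.0
    for axis in (0, 1):                 # vertical and horizontal gradients
        gI = np.abs(np.diff(I, axis=axis))
        gR = np.abs(np.diff(R, axis=axis))
        loss += np.sum(gI * np.exp(-lambda_g * gR))
    return float(loss)

flat = smoothness_loss(np.ones((4, 4)), np.zeros((4, 4)))  # constant I -> 0
I_step = np.zeros((4, 4)); I_step[:, 2:] = 1.0
loss_on_edge = smoothness_loss(I_step, I_step)             # edge in R relaxes penalty
loss_no_edge = smoothness_loss(I_step, np.zeros((4, 4)))
```

A step in the illumination is penalized far less when the reflectance has an edge at the same location, which is the intended structure-aware behavior.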
The network adjusting stage mainly comprises two parts, namely a noise reduction operation (RestorationNet) and an enhancement network (EnhanceNet), and the noise reduction function of the low-illumination reflection image and the enhancement function of the illumination image are respectively realized.
For RestorationNet, a typical 5-layer UNet structure is followed by a convolutional layer and a Sigmoid layer. Its loss function is
L_res = ‖R̂ − R_h‖_2² − SSIM(R̂, R_h);
where R̂ denotes the recovered reflection map and SSIM(·, ·) is the structural similarity measure. The first term constrains the L_2 distance between the recovery result R̂ and the target result R_h, while the last term keeps texture detail information and the like consistent.
The RestorationNet architecture is shown in Fig. 3. The input data pass through four cascaded convolution + pooling structures; each comprises a convolution module (a 3 × 3 convolution operation (Conv) and a ReLU activation function) and a pooling module (a 3 × 3 convolution, a ReLU activation function and a 2 × 2 max-pooling operation (MaxPooling)). The four cascaded convolution + pooling structures are followed by four cascaded convolution + up-sampling structures; each comprises a convolution module (a 3 × 3 convolution and a ReLU) and an up-sampling module (a 3 × 3 convolution, a ReLU and an up-sampling operation (UpSampling)). Each up-sampling module is skip-connected to the corresponding pooling module as shown in Fig. 3, and the output of the last convolution + up-sampling structure passes through a 3 × 3 convolution and a Sigmoid to give the final output.
The illumination intensity is adjusted by the illumination enhancement network (EnhanceNet). In the enhancement network, the reflection component map and the illumination component map are concatenated and input into the network, where the convolution kernels of the convolutional layers are 3 × 3 and those of the pooling layers 2 × 2. Following the U-Net idea, the image is enlarged by nearest-neighbor interpolation for up-sampling. Feature maps of matching size are added element-wise and then fused, yielding feature maps with more completely preserved detail; finally the network is fine-tuned end-to-end with gradient descent. The whole encoder-decoder structure is used to capture the image information: the input image is successively down-sampled, producing a large number of illumination-reflection feature maps.
As shown in fig. 2, the EnhanceNet network adopted in this embodiment is an encoder-decoder architecture as a whole. The input data is first subjected to feature extraction by a convolution layer with a 3 × 3 convolution kernel and a stride of 1. The extracted features are then passed sequentially through a first down-sampling, a second down-sampling and a third down-sampling, each down-sampling being composed of a convolution layer with a stride of 2 and an activation function, i.e. a Conv + ReLU structure. After the third down-sampling, the features are processed by a convolution layer with a 3 × 3 kernel and then up-sampled; each up-sampling is likewise followed by a convolution layer with a 3 × 3 kernel. The output of the first up-sampling is skip-connected (concatenated) with the output of the second down-sampling and used as the input of the second up-sampling; similarly, the output of the convolution layer after the second up-sampling is skip-connected with the output of the first down-sampling to form the output of the second up-sampling stage, and the output of the convolution layer after the third up-sampling is skip-connected with the input of the first down-sampling to form the output of the third up-sampling stage. Each up-sampling layer uses resize-convolution, that is, a nearest-neighbour interpolation operation followed by a convolution layer with a stride of 1 and an activation function, i.e. a Conv + ReLU structure. The outputs of the first, second and third up-sampling stages are spliced, the spliced features are reduced to C channels by a 1 × 1 convolution layer, and finally the illumination map is reconstructed by a 3 × 3 convolution layer.
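The resize-convolution up-sampling described above can be sketched as follows. This is a minimal single-channel NumPy illustration, not the network implementation: the 3 × 3 averaging kernel is a hypothetical stand-in for a learned convolution kernel. Nearest-neighbour interpolation first doubles the spatial size, and a stride-1 convolution followed by ReLU then smooths the result.

```python
import numpy as np

def nearest_upsample(x, factor=2):
    """Nearest-neighbour interpolation: repeat each pixel along both axes."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def conv2d_stride1(x, kernel):
    """'Same'-padded stride-1 2-D convolution (single channel)."""
    kh, kw = kernel.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def resize_conv(x, kernel):
    """Resize-convolution up-sampling: nearest interpolation + stride-1 conv + ReLU."""
    return np.maximum(conv2d_stride1(nearest_upsample(x), kernel), 0.0)

x = np.arange(9, dtype=float).reshape(3, 3)
k = np.full((3, 3), 1.0 / 9.0)   # hypothetical averaging kernel
y = resize_conv(x, k)
print(y.shape)                    # (6, 6): spatial size doubled
```

Resize-convolution is used here instead of transposed convolution because the interpolate-then-convolve structure avoids the checkerboard artifacts transposed convolutions can introduce.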
In this way a large amount of illumination information is utilized: a local illumination distribution image is reconstructed through up-sampling to obtain an improved illumination map, while the multi-scale connections improve the adaptivity of the network model.
Similar to the decomposition model loss function, the loss function of the enhancement network model is also mainly composed of a reconstruction loss function and a structure smoothing loss function, and its expression is shown in the following formula:
L = L_recon + λ_is · L_is;
The finally obtained enhanced brightness-component (V-channel) image, the H-channel image and the S-channel image are then mapped back into RGB space to obtain a coarse enhanced image.
Under the same conditions, a better-exposed image can provide more detailed information. Thus, by constructing a camera response model, a virtual over-exposed image can be obtained as a complement to the original image. The camera response model is:
P=f(E);
wherein E is the irradiance of the image, P is the image obtained by imaging of the camera, and f is the nonlinear response function of the camera.
For the low-illumination image enhancement problem, the functional form of f can be obtained indirectly by modelling the luminance transfer function (BTF). The BTF is the mapping function between two images P_0 and P_1 of different exposures of the same scene.
The BTF can be written as P_1 = g(P_0, k) = β · P_0^γ, where g is the luminance transfer function, k is the exposure ratio, and β and γ are parameters determined by the camera parameters and the exposure ratio k. Solving the above equation gives the camera response model, namely:
g(P, k) = e^{b(1−k^a)} · P^{k^a};
which accommodates most cameras when a = −0.3293 and b = 1.1258. In order to express as much information as possible using the input image and the generated image, it is necessary to find the optimal exposure ratio k so that the synthesized image is well exposed where the original image is under-exposed.
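The camera response model above can be sketched in NumPy, assuming the parameterization g(P, k) = e^{b(1−k^a)} · P^{k^a} with a = −0.3293, b = 1.1258 and pixel values normalized to [0, 1]:

```python
import numpy as np

A, B = -0.3293, 1.1258  # camera parameters said to fit most cameras

def btf(P, k):
    """Luminance transfer function g(P, k): map image P to exposure ratio k."""
    beta = np.exp(B * (1.0 - k ** A))
    gamma = k ** A
    return np.clip(beta * np.power(P, gamma), 0.0, 1.0)

low = np.array([[0.05, 0.10], [0.20, 0.40]])  # a tiny under-exposed "image"
bright = btf(low, k=5.0)                       # virtual image at 5x exposure
print(np.all(bright >= low))                   # True: k > 1 brightens everywhere
```

At k = 1 the model reduces to the identity (β = 1, γ = 1), which is a quick sanity check for any implementation.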
The optimal exposure ratio is determined according to the principle of image entropy maximization. The image entropy is expressed as:
H(B) = −Σ_{i=0}^{N} p_i · log₂ p_i;
The image entropy maximization process is represented as:
k̂ = argmax_k H(g(B, k));
wherein H(·) represents the image entropy, N is the maximum value of the grey value of the image, and p_i represents the probability of occurrence of grey value i; H(g(B, k)) represents the image entropy of the output of the luminance transfer function g(B, k).
After the optimal exposure ratio has been obtained, a virtual over-exposure ratio Δk can be derived, and a virtual over-exposed image is then generated using the luminance transfer function (BTF). The parameter Δk is set to 0.5.
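The entropy-maximization search for the optimal exposure ratio can be sketched as follows (a minimal NumPy illustration; the 256-bin histogram and the candidate grid k ∈ [1, 10] are assumptions, and the BTF from above is repeated to keep the block self-contained):

```python
import numpy as np

A, B = -0.3293, 1.1258

def btf(P, k):
    """Luminance transfer function g(P, k)."""
    return np.clip(np.exp(B * (1.0 - k ** A)) * np.power(P, k ** A), 0.0, 1.0)

def image_entropy(img, bins=256):
    """Shannon entropy of the grey-level histogram: H = -sum(p_i * log2 p_i)."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                    # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

def optimal_exposure(v, candidates=np.linspace(1.0, 10.0, 50)):
    """Grid search for the exposure ratio k maximizing the entropy of g(v, k)."""
    entropies = [image_entropy(btf(v, k)) for k in candidates]
    return float(candidates[int(np.argmax(entropies))])

rng = np.random.default_rng(0)
v = rng.uniform(0.0, 0.2, size=(64, 64))   # synthetic under-exposed V channel
k_hat = optimal_exposure(v)
print(1.0 <= k_hat <= 10.0)                # True
```

A coarse grid search is sufficient here because the entropy curve over k is smooth for typical images; a finer search (or golden-section search) could refine k̂.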
There are three images that need to be fused: the low-illumination image, the coarse enhanced image and the virtual over-exposed image. Each image is partitioned into blocks, and each block is column-vectorized using the image block decomposition method, expressed as:
P=c·s+l;
where c is the block signal strength, s is the block structure, and l is the average strength of the block.
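The block decomposition P = c · s + l can be sketched in NumPy; the decomposition is exact, so the block is recovered by recombining the three components:

```python
import numpy as np

def decompose_block(x):
    """Decompose a vectorized image block x into signal strength c,
    unit-length structure s and mean intensity l, so that x = c*s + l."""
    l = x.mean()                                 # average intensity of the block
    d = x - l                                    # mean-removed block
    c = np.linalg.norm(d)                        # block signal strength (contrast)
    s = d / c if c > 0 else np.zeros_like(d)     # unit-length block structure
    return c, s, l

x = np.array([0.1, 0.3, 0.5, 0.7])   # a column-vectorized 2x2 block
c, s, l = decompose_block(x)
print(np.allclose(c * s + l, x))     # True: exact reconstruction
```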
Generally, the higher the contrast, the better the visibility. Considering that all input source image blocks are true captures of the scene, the block with the highest contrast among them corresponds to the best visibility. Therefore, the desired signal strength of the fused image block is determined by the maximum of the signal strengths over all source image blocks; the maximum block signal strength among the column-vectorized images is obtained and recorded as:
ĉ = max_{1≤k≤K} c_k;
wherein c_k denotes the signal strength of x_k, and {x_k} = {x_k | 1 ≤ k ≤ K} is the set of image blocks extracted at the same spatial position from a source sequence containing K multi-exposure images. Each x_k is a CN²-dimensional column vector, where C is the number of colour channels of the input image and N is the spatial size of the square block.
Unlike the signal strength, the structural vector s_k has unit length and points to a particular direction in the CN²-dimensional space. The desired structure of the fused image block should best represent the structures of all source image blocks, and is computed from the weighted expectation of the block structures of the column-vectorized images:
s̄ = Σ_{k=1}^{K} w(x_k) · s_k / Σ_{k=1}^{K} w(x_k),  ŝ = s̄ / ‖s̄‖;
A simple implementation of this relationship is as follows:
wherein w(x_k) is a weighting function that determines the contribution of each source image block to the structure of the fused image block; k indexes the exposures; and s_k is a unit-length structure vector. The contribution increases with the signal strength of the image block, using a power weighting function given by:
w(x_k) = ‖x̃_k‖^p, where x̃_k = x_k − l_k is the mean-removed block;
wherein p ≥ 0 is an exponent parameter; different choices of p produce weighting functions with different physical meanings. The larger p is, the more the image blocks with relatively greater signal strength contribute.
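The power-weighted structure fusion can be sketched as follows (a minimal NumPy illustration assuming the weighting w(x_k) = ‖x_k − l_k‖^p discussed above; p = 4 is an arbitrary choice):

```python
import numpy as np

def fused_structure(blocks, p=4.0):
    """Desired structure of the fused block: power-weighted mean of the
    source block structures, renormalized to unit length.
    `blocks` is a (K, d) array of column-vectorized source blocks."""
    s_bar = np.zeros(blocks.shape[1])
    for x in blocks:
        d = x - x.mean()                  # mean-removed block
        c = np.linalg.norm(d)             # signal strength
        if c > 0:
            s_bar += (c ** p) * (d / c)   # weight w = c**p times unit structure
    n = np.linalg.norm(s_bar)
    return s_bar / n if n > 0 else s_bar

blocks = np.array([[0.0, 0.1, 0.2, 0.3],    # low exposure
                   [0.2, 0.4, 0.6, 0.8],    # mid exposure (highest contrast)
                   [0.8, 0.9, 1.0, 1.0]])   # high exposure, partly saturated
s_hat = fused_structure(blocks)
print(np.isclose(np.linalg.norm(s_hat), 1.0))   # True: unit-length structure
```

With a large p the result approaches the structure of the highest-contrast block alone, which matches the "highest contrast, best visibility" assumption above.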
Regarding the average intensity of the local image block, the average intensity of the fused block obtained by the weighted linear fusion mechanism is recorded as l̂ and expressed as:
l̂ = Σ_{k=1}^{K} L(μ_k, l_k) · l_k / Σ_{k=1}^{K} L(μ_k, l_k);
wherein L(μ_k, l_k) is a weighting function taking the global mean value μ_k of image X_k and the average intensity l_k of the current block x_k as input; l_k represents the average intensity of a block with exposure k. L(μ_k, l_k) quantifies how well exposed x_k is within X_k, so that well-exposed content in either X_k or x_k is favoured. As a preferred embodiment, this embodiment uses a two-dimensional Gaussian distribution to specify this measure, expressed as:
L(μ_k, l_k) = exp(−(μ_k − μ_c)² / (2σ_g²) − (l_k − l_c)² / (2σ_l²));
wherein σ g And σ l Control the curve edge mu separately k And l k Distribution of dimensions, μ c And l c Is a constant of medium intensity value, where medium intensity refers to a median value between the maximum and minimum values of a parameter, for example, if the parameter has a value in the range of [0, 1%]Then the median intensity value in this parameter is 0.5, mu c And l c Is respectively based on mu k And l k Is determined.
This embodiment also provides a specific implementation example. The deep learning framework used in the example is TensorFlow 1.13 (GPU), with the NumPy computing library and the PIL image processing library installed; the software development environment of the experiments is PyCharm 2019 with Python 3.7. The implementation results are shown in FIG. 5, wherein (a) is the low-illumination image, (b) is the coarse enhanced image and (c) is the final enhanced image. It can be seen that the image processed by the method retains more detail information and exhibits less image distortion, so that the image quality is effectively improved.
The implementation process of the embodiment proposed by the present invention is shown in FIG. 4. In the training stage, the V-channel image S_low of the corresponding low-illumination image is generated from the normal image S_normal. During training, DecomNet (Decomposition) generates the reflectivity and illumination, and the network parameters are updated using the difference between the illumination I_normal of the normal image S_normal and the illumination I_low of the V-channel image S_low of the low-illumination image, as well as the difference between the reflectivity R_normal of the normal image S_normal and the reflectivity R_low of the V-channel image S_low of the low-illumination image. RestorationNet and EnhanceNet are updated in the same way. The S-channel image of the low-illumination image is adaptively adjusted according to the obtained coarse enhanced image of the V channel, and the coarse enhanced V-channel image, the adaptively adjusted S-channel image and the H-channel image are synthesized into an RGB image, i.e. the coarse enhanced image. A virtual over-exposed image is generated from the low-illumination image; the virtual over-exposed image, the low-illumination image and the coarse enhanced image are synthesized into the final enhanced image, and the loss between this image and the normal image is used to update the network parameters.
In the real-time data stage, a low-illumination image is taken as input. The trained DecomNet yields the reflectivity and illumination corresponding to the V-channel image of the low-illumination image; the reflectivity is denoised by RestorationNet, the illumination is enhanced by EnhanceNet, and the denoised reflectivity and the enhanced illumination are synthesized to obtain the enhanced V-channel image. The S-channel image of the low-illumination image is adaptively enhanced according to this coarse enhanced V channel, and the processed S and V channels are synthesized with the H channel to obtain an RGB image, i.e. the coarse enhanced image. Finally, a virtual over-exposed image of the low-illumination image is generated, and the virtual over-exposed image, the low-illumination image and the coarse enhanced image are synthesized to obtain the final enhanced image.
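The adaptive S-channel adjustment mentioned above follows s′(x, y) = s(x, y) + t · [v′(x, y) − v(x, y)] · λ(x, y). A minimal NumPy sketch (the proportionality constant t = 1.0 and the 3 × 3 window size are hypothetical choices; λ is computed as the local brightness–saturation correlation coefficient):

```python
import numpy as np

def local_corr(v, s, i, j, n=3):
    """Correlation coefficient lambda(x, y) of brightness and saturation
    over an n x n neighbourhood window centred on pixel (i, j)."""
    h = n // 2
    vw = v[max(0, i - h):i + h + 1, max(0, j - h):j + h + 1].ravel()
    sw = s[max(0, i - h):i + h + 1, max(0, j - h):j + h + 1].ravel()
    dv, ds = vw - vw.mean(), sw - sw.mean()
    denom = np.sqrt((dv ** 2).sum() * (ds ** 2).sum())
    return float((dv * ds).sum() / denom) if denom > 0 else 0.0

def adjust_saturation(s, v, v_enh, t=1.0):
    """s'(x, y) = s(x, y) + t * [v'(x, y) - v(x, y)] * lambda(x, y)."""
    out = np.empty_like(s)
    for i in range(s.shape[0]):
        for j in range(s.shape[1]):
            out[i, j] = s[i, j] + t * (v_enh[i, j] - v[i, j]) * local_corr(v, s, i, j)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(1)
s = rng.uniform(0.2, 0.8, (8, 8))     # S channel of the low-illumination image
v = rng.uniform(0.0, 0.3, (8, 8))     # dark V channel
v_enh = np.clip(v + 0.4, 0.0, 1.0)    # coarse enhanced V channel
s_new = adjust_saturation(s, v, v_enh)
print(s_new.shape)                     # (8, 8)
```

When the enhanced brightness equals the original brightness, the adjustment term vanishes and the saturation is left unchanged, which is a useful sanity check.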
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. A low-illumination image enhancement method of a self-adaptive RetinexNet under a fusion strategy is characterized by comprising the following steps:
acquiring an original image and a synthesized low-illumination image corresponding to the original image from historical data, taking the original image as a normal light image, and taking the synthesized low-illumination image as a low-illumination image;
inputting the V-channel image of the low-illumination image and the normal light image into a DecomNet to obtain the illumination and the reflectivity of the normal light image and the illumination and the reflectivity of the low-illumination image;
inputting the reflectivity and illumination of the obtained low-illumination image into a RestorationNet, and using the illumination to guide the reflectivity to reduce noise to obtain the reflectivity after noise reduction;
inputting the reflectivity and illumination of the low-illumination image into an EnhanceNet, and enhancing the illumination of the low-illumination image to obtain enhanced illumination;
reconstructing an image, namely synthesizing the H channel, V channel and S channel of the optimized image into an RGB image by colour-channel conversion, i.e. a coarse enhanced image;
and acquiring a virtual overexposure image of the low-illumination image, and fusing the low-illumination image, the rough enhanced image and the virtual overexposure image to obtain a final optimized enhanced image.
2. The method as claimed in claim 1, wherein the training set or the real-time low-illumination image to be enhanced is subjected to colour-channel conversion before being input into DecomNet, converting the image from an RGB image to an HSV image.
3. The method for enhancing a low-illumination image of a self-adaptive RetinexNet under a fusion strategy as claimed in claim 1, wherein for the image input into DecomNet, features are extracted by a convolution layer with a 3 × 3 kernel, the extracted features are sequentially mapped by 5 convolution layers with ReLU activations and 3 × 3 kernels, and then by one convolution layer with a 3 × 3 kernel and a sigmoid function, to obtain an image with 4 channels; the first 3 channels of this image are used as the reflectivity R of the image, and the last channel as the illumination I of the image.
4. The method for enhancing a low-illumination image by self-adaptive RetinexNet under a fusion strategy according to claim 1, wherein the process of enhancing the illumination of the low-illumination image to obtain the enhanced illumination comprises:
splicing the illumination of the low-illumination image and the reflectivity after noise reduction to serve as the input of the EnhanceNet network;
acquiring context information in a large area of an input image through an encoder-decoder framework of an EnhanceNet network;
in an EnhanceNet network, an input image is downsampled to different sizes by adopting three downsampling modules;
and respectively splicing the image subjected to down-sampling and the context information, and reconstructing the spliced image through up-sampling to obtain enhanced illumination.
5. The method as claimed in claim 1, wherein DecomNet is trained by back propagation through its loss function; the loss function L_1 of DecomNet is composed of a reconstruction loss function L_recon, a reflectivity consistency loss function L_ir and a structure smoothing loss function L_is, expressed as:
L_1 = L_recon + λ_ir · L_ir + λ_is · L_is;
L_recon = Σ_{i∈{low,normal}} Σ_{j∈{low,normal}} λ_ij · ||R_i ∘ I_j − S_j||_1;
L_ir = ||R_low − R_normal||_1;
L_is = Σ_{i∈{low,normal}} ||∇I_i ∘ exp(−λ_g · ∇R_i)||_1;
wherein λ_ir denotes the reflectivity consistency coefficient and λ_is the illumination smoothness coefficient; low denotes the low-illumination image dataset and normal the normal-light image dataset; λ_ij is the balance coefficient of the reconstruction loss; R_i is the reflectivity when i equals low or normal; I_j is the illumination of the low-illumination image when j is low, and of the normal-light image when j is normal; S_j represents the low-illumination image when j is low, and the normal-light image when j is normal; R_low represents the reflectivity of the low-illumination image and R_normal the reflectivity of the normal-light image; ||·||_1 denotes the 1-norm and ||·||_2 the 2-norm; ∇ denotes the gradient operation; λ_g is the coefficient balancing the strength of structure awareness;
RestorationNet is trained by back propagation through its loss function, expressed as:
wherein L_3 is the loss function of RestorationNet; R̂ is the reflectivity map after noise reduction; R_h is the reflectivity of the normal-light image; SSIM(R̂, R_h) is a structural similarity measure between R̂ and R_h;
EnhanceNet is trained by back propagation through its loss function; the loss function L_2 of EnhanceNet is composed of a reconstruction loss function L_recon and a structure smoothing loss function L_is, expressed as:
L_2 = L_recon + λ_is · L_is;
wherein L_2 is the loss function of EnhanceNet.
6. The method for enhancing the low-illumination image of the adaptive RetinexNet under the fusion strategy according to claim 1, wherein after the enhanced V-channel image is obtained, the S-channel image is adaptively adjusted, and the adjustment process is represented as follows:
s′(x,y)=s(x,y)+t[v′(x,y)-v(x,y)]×λ(x,y);
wherein s′(x, y) is the saturation of the pixel at row x, column y of the coarse enhanced image; s(x, y) is the saturation of the pixel at row x, column y of the low-illumination image; v′(x, y) is the brightness of the pixel at row x, column y of the coarse enhanced image; v(x, y) is the brightness of the pixel at row x, column y of the low-illumination image; t is a proportionality constant; λ(x, y) is the correlation coefficient of v(p, q) and s(p, q).
7. The method for enhancing a low-illumination image of an adaptive RetinexNet under a fusion strategy according to claim 1, wherein the correlation coefficient λ (x, y) of v (p, q) and s (p, q) is expressed as:
wherein v(p, q) is the brightness of the pixel at position (p, q) in the neighbourhood window of pixel (x, y), and s(p, q) is the saturation of the pixel at position (p, q) in the neighbourhood window of pixel (x, y); v̄(x, y) is the mean brightness of pixel (x, y) over the neighbourhood window w, and s̄(x, y) is the mean saturation of pixel (x, y) over the neighbourhood window w; δ_v(x, y) is the variance of the brightness of pixel (x, y) over the neighbourhood window w, and δ_s(x, y) is the variance of the saturation of pixel (x, y) over the neighbourhood window w; w is an n × n window centred on pixel (x, y).
8. The method for enhancing the low-illumination image of the adaptive RetinexNet under the fusion strategy according to claim 1, wherein the virtual overexposure image is obtained by using a camera response model, and the method is represented as follows:
P=f(E);
wherein, P is an image obtained by camera imaging, namely a virtual overexposure image; e is the irradiance of the low-illumination image; f is the camera nonlinear response function.
9. The method for enhancing the low-illumination image of the adaptive RetinexNet under the fusion strategy according to claim 1, wherein the process of fusing the original low-illumination image, the rough enhanced image and the virtual overexposed image to obtain the final optimized enhanced image comprises:
carrying out column vectorization on the original low-illumination image, the coarse enhanced image and the virtual over-exposed image by using the image block decomposition method;
the value with the maximum block signal intensity of the image after column vectorization is obtained and recorded as
obtaining the expectation of the block structures of the column-vectorized images, recorded as s̄, and calculating from it the block structure value ŝ of the column-vectorized images, expressed as:
s̄ = Σ_{k=1}^{K} w(x_k) · s_k / Σ_{k=1}^{K} w(x_k),  ŝ = s̄ / ‖s̄‖;
wherein w(x_k) is a weighting function, expressed as w(x_k) = ‖x̃_k‖^p; x̃_k = x_k − l_k is the mean-removed image block, where x_k represents an image block and l_k is the average value of image block x_k; p is a weighting parameter; s_k is a unit-length vector, namely the block structure of the image with exposure k;
obtaining the average intensity of the fused block by using a weighted linear fusion mechanism, recorded as l̂ and expressed as:
l̂ = Σ_{k=1}^{K} L(μ_k, l_k) · l_k / Σ_{k=1}^{K} L(μ_k, l_k);
wherein L(μ_k, l_k) is a weighting function taking the global mean value μ_k of image X_k and the current image block x_k as input; l_k represents the average intensity of pixel blocks of different exposures.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210644966.3A CN115035011A (en) | 2022-06-09 | 2022-06-09 | Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115035011A true CN115035011A (en) | 2022-09-09 |
Family
ID=83123144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210644966.3A Pending CN115035011A (en) | 2022-06-09 | 2022-06-09 | Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115035011A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115294126A (en) * | 2022-10-08 | 2022-11-04 | 南京诺源医疗器械有限公司 | Intelligent cancer cell identification method for pathological image |
CN116363009A (en) * | 2023-03-31 | 2023-06-30 | 哈尔滨工业大学 | Method and system for enhancing rapid light-weight low-illumination image based on supervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||