CN115205160A - No-reference low-illumination image enhancement method based on local scene perception - Google Patents
No-reference low-illumination image enhancement method based on local scene perception
- Publication number
- CN115205160A CN115205160A CN202210960432.1A CN202210960432A CN115205160A CN 115205160 A CN115205160 A CN 115205160A CN 202210960432 A CN202210960432 A CN 202210960432A CN 115205160 A CN115205160 A CN 115205160A
- Authority
- CN
- China
- Prior art keywords
- image
- enhancement
- convolution
- illumination
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000005286 illumination Methods 0.000 title claims abstract description 81
- 230000008447 perception Effects 0.000 title claims abstract description 65
- 238000000034 method Methods 0.000 title claims abstract description 45
- 230000006870 function Effects 0.000 claims abstract description 60
- 238000012549 training Methods 0.000 claims abstract description 31
- 238000007781 pre-processing Methods 0.000 claims abstract description 7
- 238000004364 calculation method Methods 0.000 claims description 20
- 238000011176 pooling Methods 0.000 claims description 19
- 238000006243 chemical reaction Methods 0.000 claims description 16
- 230000004913 activation Effects 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 10
- 239000011159 matrix material Substances 0.000 claims description 9
- 238000005070 sampling Methods 0.000 claims description 7
- 238000005457 optimization Methods 0.000 claims description 6
- 230000002776 aggregation Effects 0.000 claims description 5
- 238000004220 aggregation Methods 0.000 claims description 5
- 230000002708 enhancing effect Effects 0.000 claims description 4
- 238000013145 classification model Methods 0.000 claims description 3
- 238000012937 correction Methods 0.000 claims description 3
- 238000005520 cutting process Methods 0.000 claims description 3
- 230000007246 mechanism Effects 0.000 claims description 3
- 238000009499 grossing Methods 0.000 description 4
- 238000005096 rolling process Methods 0.000 description 4
- 238000012545 processing Methods 0.000 description 3
- 238000001514 detection method Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The invention relates to a no-reference low-illumination image enhancement method based on local scene perception, which comprises the following steps: S1, acquiring low-illumination images and preprocessing each image to obtain a training data set; S2, constructing a low-illumination image enhancement network based on local scene perception; S3, designing a no-reference loss function for training the network designed in step S2; S4, training the local-scene-perception low-illumination image enhancement network on the training data set using the no-reference loss function; and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image. The invention can effectively enhance low-illumination images.
Description
Technical Field
The invention relates to the field of image processing and computer vision, in particular to a no-reference low-illumination image enhancement method based on local scene perception.
Background
In recent years, scientific and technological progress has markedly improved society and people's daily lives. As image acquisition devices have become smaller and more capable, images and imaging systems have become closely tied to everyday life and production. Image-based processing systems are widely used in daily scenarios, allowing users to record and observe intuitively and in real time with great convenience. However, limited by the environment and the state of the acquisition device, the captured images can be hard to interpret, exhibiting motion blur, poor illumination, and similar degradations. Poor lighting conditions are especially common, and images captured under them often contain many dark regions, which hampers both human reading and machine vision processing. Designing an enhancement method for low-illumination images therefore has important theoretical and practical significance.
Low-light image enhancement aims to improve the visibility of content in images captured under insufficient lighting. Objects, scenes and textures in low-light images are poorly visible and hard to distinguish, making them difficult for the human eye to recognize or for high-level computer vision tasks to use. Two typical scenarios arise: the first is enhancing the whole image, which is common when the overall lighting is poor or the camera's exposure parameters (shutter time or white balance) are set incorrectly; the second is enhancing certain dark regions within the image, which is more common in naturally captured images.
Low-illumination image enhancement has important practical significance. It directly improves the visibility and perceptibility of poorly lit images for the human eye, and it also benefits specific computer vision tasks, such as night scenes in autonomous driving and the analysis of natural images or video content (including object detection, object tracking, and human pose estimation). Using a low-illumination image enhancement method to preprocess poorly lit images lets algorithms designed for a specific task work directly on image data captured in dark environments, greatly reducing the cost of labeling task-specific data and rebuilding models, and saving considerable manpower and material resources.
Disclosure of Invention
In view of this, the present invention provides a no-reference low-illumination image enhancement method based on local scene perception, so as to effectively enhance low-illumination images.
To achieve this purpose, the invention adopts the following technical scheme:
a no-reference low-illumination image enhancement method based on local scene perception comprises the following steps:
s1, acquiring low-illumination images, and preprocessing each image to obtain a training data set;
s2, constructing a low-illumination image enhancement network based on local scene perception;
S3, designing a no-reference loss function for training the network designed in step S2;
s4, training a low-illumination image enhancement network based on local scene perception by adopting a training data set based on a no-reference loss function;
and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image.
Further, the preprocessing specifically comprises: scaling each image in the dataset to the same size H × W;
normalizing each image I_train to obtain the normalized image X, calculated as follows:
X = I_train / I_bit_max
where I_train is an H × W image with 8-bit color depth, and I_bit_max is an H × W image in which every pixel value is 255.
Further, the low-illumination image enhancement network based on local scene perception comprises a local scene perception branch network, an enhancement branch network, an attention module, an iterative enhancement module and a denoising module.
Further, the local scene perception branch network receives the normalized image X of size H × W as input and performs a spatial 4-equal-division cropping to obtain four sub-images. The local scene perception branch network consists of 4 perception branch networks J_1, J_2, J_3, J_4 and feature conversion blocks T_1, T_2; the perception branch networks are used to extract features of the local images, and the feature conversion blocks are used to generate convolution parameters for image enhancement. The spatial 4-equal-division cropping operation is expressed as follows:
where (p, q) is the pixel position of the input normalized image X of size H × W.
Further, the perception branch networks J_1, J_2, J_3, J_4 all have the same network structure and share their trainable parameters; each perception branch network consists of 4 convolution blocks, and each convolution block consists of a convolution layer followed by an activation layer;
the feature conversion block T_1 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0; it receives the aggregated output features of the 3rd convolution blocks of the perception branch networks and outputs the convolution parameter k_1 for image enhancement. The feature conversion block T_2 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0 whose number of output channels is 25% of the number of input channels; it receives the aggregated output features of the 4th convolution blocks of the perception branch networks and outputs the convolution parameter k_2 for image enhancement, expressed as follows:
where Concat(·) denotes concatenation of features along the convolution channel dimension, and T_1 and T_2 are the feature conversion blocks.
Further, the enhancement branch network comprises 2 convolution blocks, 2 variable-parameter convolution blocks D_1, D_2, and an upsampling layer; each convolution block consists of a convolution layer followed by an activation layer, the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the variable-parameter convolution blocks D_1, D_2 each consist of a variable-parameter convolution layer followed by an activation layer. The convolution kernel used in the convolution layer of D_1 is the output k_1 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function; the convolution kernel used in the convolution layer of D_2 is the output k_2 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the input of the upsampling layer comes from the variable-parameter convolution block D_2; the upsampling operation is a bilinear interpolation performed on each feature channel, the output size of the upsampling is set to H × W, and the generated illumination feature for image enhancement is denoted f.
Further, the attention module includes an attention layer and a convolution block;
the input to the attention layer is the image-enhancement illumination feature f output by the enhancement branch network, together with the normalized image X of size H × W; the image X is converted into a grayscale image X_gray, and X_gray is passed through a Gaussian filter to obtain the smoothed grayscale image X_smooth, where the Gaussian kernel size of the filter is set to 3; the attention layer computes its output as follows:
where the multiplication is an element-wise matrix multiplication with a broadcast mechanism, which multiplies each channel of f element-wise with X_smooth;
the convolution block consists of a convolution layer followed by an activation layer and receives f_A' as input; the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, the activation layer adopts the tanh activation function, and the output f_A of the convolution block is the image-enhancement illumination coefficient output by the attention module.
Further, the iterative enhancement module takes as input the normalized image X of size H × W and the image-enhancement illumination coefficient f_A generated by the attention module; the formula for generating the preliminarily enhanced image is as follows:
the above formula is iterative: n is the number of iterations, and E_n denotes the result of iterating n times with f_A as the illumination coefficient for image enhancement; n = 4 is set by default. For convenience, the preliminarily enhanced image E_4 is denoted E;
the denoising module takes as input the preliminarily enhanced image E and the normalized image X of size H × W, and its output is the final image enhancement result.
Further, the optimization goal is to minimize a total loss function formed as the weighted sum of five terms, where L_bri denotes the brightness control loss function and λ_bri its weight; L_col denotes the color enhancement loss function and λ_col its weight; L_glo denotes the global consistency constraint loss function and λ_glo its weight; L_tv denotes the illumination smoothing loss function and λ_tv its weight; and L_noi denotes the denoising loss function and λ_noi its weight;
the image E after the preliminary enhancement is divided into N =16 × 16, that is, 256 regions of the same size, and the average brightness value in the region is recorded asE b =0.6 is a preset target luminance constant value, | | | | | luminance 1 Is an absolute value operation;
in the above formula c 1 ,c 2 Color channels representing an image, Ω = { (R, G), (R, B), (G, B) } represents a combination of different color channels, where R, G, B represent red, green, blue channels in an RGB color space, respectively;andi.e. representing c in the image E after the preliminary enhancement 1 Color channels and c 2 A color channel mean value; in a similar manner, the first and second electrodes are,i.e. representing c in the input image X 1 Color channels and c 2 A color channel mean value; | | non-woven hair 2 The operation of solving Euclidean norm; g (-) is defined as follows:
whereinRepresenting c in an arbitrary input image I 1 Color channels and c 2 Color channel mean, | | | luminance 1 Is an operation of taking the absolute value, gamma γ (. Cndot.) represents the gamma correction operation, which is calculated as follows:
whereinRepresenting c in an arbitrary input image I 1 E is an element of { R, G, B } color channel, and gamma is a preset parameter;
the global consistency constraint loss is the sum of a global maximum consistency constraint loss and a global minimum consistency constraint loss;
for the global maximum consistency constraint loss, the input image X is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; maximum pooling (MaxPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose values are the extracted maxima; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted maxima, and || · ||_1 denotes the absolute value operation;
the global minimum consistency constraint loss makes the darkest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; minimum pooling (MinPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded; similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, minimum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted minima;
the illumination smoothing loss is computed on the image-enhancement illumination coefficient f_A output by the attention module, where c denotes the corresponding color channel taken from the RGB color space, || · ||_2 denotes the Euclidean norm, the gradient operator denotes the first-order differences of f_A for channel c in the vertical and horizontal directions, and || · ||_1 denotes the absolute value operation;
for the denoising loss, Φ(·) denotes extracting the Conv4-1 layer features with a VGG-16 classification model pre-trained on ImageNet, so Φ(E) denotes the classification features extracted from the preliminarily enhanced image E and the corresponding term denotes the classification features extracted from the final image enhancement result; the gradient operator denotes the first-order differences of the final result and of E in the vertical and horizontal directions; * denotes element-wise multiplication of matrices, e is the base of the natural logarithm, and μ is a parameter that controls the perception of edge intensity.
Further, the step S4 specifically includes:
step S41: selecting a random training image X in a training data set;
step S42: inputting an image X, obtaining an image E after preliminary enhancement through a local scene perception branch network, an enhancement branch network, an attention module and an iteration enhancement module, and obtaining an image enhanced illumination coefficient f output by the attention module A And final image enhancement resultsCalculating step total loss function loss
Step S43: calculating gradients of all parameters in the local scene perception branch network, the enhancement branch network, the attention module and the iteration enhancement module by using a back propagation method, and updating the parameters by using an Adam optimization method;
Step S44: the above steps constitute one iteration of the training process; the whole training process requires a preset number of iterations, and in each iteration several images are randomly sampled as one batch for training.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention addresses the lack of attention to local scenes in existing low-illumination image enhancement methods by perceiving the local scene and enhancing the image dynamically;
2. The designed network perceives local information of the image scene and predicts the parameters used for enhancement, so that the network can dynamically balance the relationships among local scenes. The attention module in the network balances extreme-brightness regions of the image, the iterative enhancement module outputs a preliminary enhancement result, and the denoising module further improves the image quality, so that finally a high-quality, normally illuminated image can be output.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a flow chart of the local-scene-perception low-illumination image enhancement according to an embodiment of the present invention.
Fig. 3 shows the local scene perception branch network according to an embodiment of the present invention.
Fig. 4 is an enhanced branch network according to an embodiment of the present invention.
FIG. 5 is a diagram of an attention module according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to fig. 1, the present invention provides a method for enhancing a non-reference low-illumination image based on local scene perception, as shown in fig. 1, including the following steps:
s1, acquiring low-illumination images, and preprocessing each image to obtain a training data set;
s2, constructing a low-illumination image enhancement network based on local scene perception;
S3, designing a no-reference loss function for training the network designed in step S2;
s4, training a low-illumination image enhancement network based on local scene perception by adopting a training data set based on a no-reference loss function;
and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image.
In this embodiment, the preprocessing in step S1 specifically includes: scaling each image in the dataset to the same size H × W;
normalizing each image I_train to obtain the normalized image X, calculated as follows:
X = I_train / I_bit_max
where I_train is an H × W image with 8-bit color depth, and I_bit_max is an H × W image in which every pixel value is 255.
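For illustration only (not part of the original disclosure), a minimal preprocessing sketch in Python follows; the function name, the use of OpenCV for resizing, and the default H, W values are assumptions, while the resize to H × W and the division by 255 (i.e. by I_bit_max) follow the formula above.

```python
import cv2
import numpy as np

def preprocess_image(path, H=256, W=256):
    """Resize an 8-bit image to H x W and normalize it to [0, 1].

    Dividing by 255 is equivalent to dividing I_train element-wise by
    I_bit_max, the H x W image whose every pixel value is 255.
    """
    img = cv2.imread(path, cv2.IMREAD_COLOR)               # I_train, 8-bit color depth
    img = cv2.resize(img, (W, H), interpolation=cv2.INTER_LINEAR)
    x = img.astype(np.float32) / 255.0                     # X = I_train / I_bit_max
    return x
```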
In this embodiment, the low-illumination image enhancement network based on local scene perception comprises a local scene perception branch network, an enhancement branch network, an attention module, an iterative enhancement module and a denoising module.
Preferably, the local scene perception branch network receives the normalized image X of size H × W as input and performs a spatial 4-equal-division cropping to obtain four sub-images. The local scene perception branch network consists of 4 perception branch networks J_1, J_2, J_3, J_4 and feature conversion blocks T_1, T_2; the perception branch networks are used to extract features of the local images, and the feature conversion blocks are used to generate convolution parameters for image enhancement. The spatial 4-equal-division cropping operation is expressed as follows:
where (p, q) is the pixel position of the input normalized image X of size H × W.
The perception branch networks J_1, J_2, J_3, J_4 take the four cropped sub-images as their respective inputs. They all have the same network structure and share their trainable parameters. The structure is as follows: each perception branch network consists of 4 convolution blocks, and each convolution block consists of a convolution layer followed by an activation layer. The convolution layer uses a 3 × 3 kernel with stride 2 and padding 1, and the activation layer uses the ReLU activation function. The four cropped sub-images are fed into J_1, J_2, J_3, J_4 respectively, and for each perception branch network the features output by the 3rd and 4th convolution blocks are extracted; the outputs of the 3rd convolution blocks and of the 4th convolution blocks correspond to the features of the four image parts, respectively.
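A hedged sketch of the spatial 4-equal cropping and one shared-weight perception branch is shown below in PyTorch. The class names, the channel width, and the way the crops are indexed are illustrative assumptions; the 4 convolution blocks with 3 × 3 kernels, stride 2, padding 1 and ReLU, the parameter sharing among J_1–J_4, and the extraction of the 3rd- and 4th-block features follow the description above.

```python
import torch
import torch.nn as nn

def crop_four(x):
    """Split a normalized image batch (B, C, H, W) into four H/2 x W/2 sub-images."""
    _, _, H, W = x.shape
    return [x[:, :, :H // 2, :W // 2], x[:, :, :H // 2, W // 2:],
            x[:, :, H // 2:, :W // 2], x[:, :, H // 2:, W // 2:]]

class PerceptionBranch(nn.Module):
    """One perception branch J_i; applying the same instance to all four crops
    realizes the trainable-parameter sharing among J_1..J_4."""
    def __init__(self, in_ch=3, width=32):                 # 'width' is an assumed channel count
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch if i == 0 else width, width,
                                    kernel_size=3, stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(4)])

    def forward(self, crop):
        feats = []
        h = crop
        for blk in self.blocks:
            h = blk(h)
            feats.append(h)
        return feats[2], feats[3]                          # outputs of the 3rd and 4th blocks
```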
The feature conversion block T_1 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0; it receives the aggregated output features of the 3rd convolution blocks of the perception branch networks and outputs the convolution parameter k_1 for image enhancement. The feature conversion block T_2 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0 whose number of output channels is 25% of the number of input channels; it receives the aggregated output features of the 4th convolution blocks of the perception branch networks and outputs the convolution parameter k_2 for image enhancement. This is formulated as follows:
where Concat(·) denotes concatenation of features along the convolution channel dimension, and T_1 and T_2 are the feature conversion blocks.
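As a reading aid, a sketch of a feature conversion block follows; the class name and channel bookkeeping are assumptions, and a stride of 1 is used because a stride of 0 is not realizable in standard convolution layers. T_1 would be instantiated with out_ratio=1.0 on the 3rd-block features and T_2 with out_ratio=0.25 (25% of the input channels) on the 4th-block features, yielding k_1 and k_2.

```python
import torch
import torch.nn as nn

class FeatureConversion(nn.Module):
    """Feature conversion block T_1 / T_2: a 1 x 1 convolution over the features
    of the four perception branches concatenated along the channel dimension."""
    def __init__(self, in_ch, out_ratio=1.0):
        super().__init__()
        out_ch = max(1, int(in_ch * out_ratio))            # T_2: 25% of the input channels
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, padding=0)

    def forward(self, f1, f2, f3, f4):
        # Concat(.) along the channel dimension, then the 1x1 convolution -> k_1 or k_2
        return self.conv(torch.cat([f1, f2, f3, f4], dim=1))
```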
In this embodiment, as shown in Fig. 4, the enhancement branch network includes 2 convolution blocks, 2 variable-parameter convolution blocks D_1, D_2, and an upsampling layer; each convolution block consists of a convolution layer followed by an activation layer, the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the variable-parameter convolution blocks D_1, D_2 each consist of a variable-parameter convolution layer followed by an activation layer. The convolution kernel used in the convolution layer of D_1 is the output k_1 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function; the convolution kernel used in the convolution layer of D_2 is the output k_2 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the input of the upsampling layer comes from the variable-parameter convolution block D_2; the upsampling operation is a bilinear interpolation performed on each feature channel, the output size of the upsampling is set to H × W, and the generated illumination feature for image enhancement is denoted f.
In this embodiment, as shown in Fig. 5, the attention module accepts the image-enhancement illumination feature f output by the enhancement branch network, and also accepts the normalized image X of size H × W as input, where the image X is the same as the input to the enhancement branch network. The attention module consists of an attention layer and 1 convolution block.
The input to the attention layer is the image-enhancement illumination feature f output by the enhancement branch network, together with the normalized image X of size H × W; the image X is converted into a grayscale image X_gray, and X_gray is passed through a Gaussian filter to obtain the smoothed grayscale image X_smooth, where the Gaussian kernel size of the filter is set to 3; the attention layer computes its output as follows:
where the multiplication is an element-wise matrix multiplication with a broadcast mechanism, which multiplies each channel of f element-wise with X_smooth;
the convolution block consists of a convolution layer followed by an activation layer and receives f_A' as input; the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, the activation layer adopts the tanh activation function, and the output f_A of the convolution block is the image-enhancement illumination coefficient output by the attention module.
In this embodiment, the iterative enhancement module takes as input the normalized image X of size H × W and the image-enhancement illumination coefficient f_A generated by the attention module; the formula for generating the preliminarily enhanced image is as follows:
the above formula is iterative: n is the number of iterations, and E_n denotes the result of iterating n times with f_A as the illumination coefficient for image enhancement; n = 4 is set by default. For convenience, the preliminarily enhanced image E_4 is denoted E;
the input of the denoising module is an image E after preliminary enhancement and the normalized size is an H multiplied by W image X, and the output of the denoising module is a final image enhancement resultThe denoising module consists of 3 convolution blocks. And de-noising the rolling blocks in the module. And 3 rolling blocks in the denoising module, wherein each rolling block consists of a rolling layer and an active layer in sequence. The convolutional layer uses convolution with convolution kernel size of 3 × 3, step size of 1, and padding of 1, and the active layer uses the ReLU activation function. Outputting final image enhancement result with size H multiplied by W
In this embodiment, the optimization goal is to minimize a total loss function formed as the weighted sum of five terms, where L_bri denotes the brightness control loss function and λ_bri its weight; L_col denotes the color enhancement loss function and λ_col its weight; L_glo denotes the global consistency constraint loss function and λ_glo its weight; L_tv denotes the illumination smoothing loss function and λ_tv its weight; and L_noi denotes the denoising loss function and λ_noi its weight;
for the brightness control loss, the preliminarily enhanced image E is divided into N = 16 × 16 = 256 regions of the same size, and the average brightness value of each region is compared with E_b = 0.6, a preset target brightness constant, where || · ||_1 denotes the absolute value operation;
for the color enhancement loss, c_1, c_2 denote color channels of an image and Ω = {(R, G), (R, B), (G, B)} denotes the set of color-channel combinations, where R, G, B are the red, green and blue channels of the RGB color space; the corresponding terms denote the means of the c_1 and c_2 color channels in the preliminarily enhanced image E and, similarly, the means of the c_1 and c_2 color channels in the input image X; || · ||_2 denotes the Euclidean norm; G(·) is defined as follows:
where the arguments of G(·) are the means of the c_1 and c_2 color channels of an arbitrary input image I, || · ||_1 denotes the absolute value operation, and Γ_γ(·) denotes the gamma correction operation, calculated as follows:
where the argument of Γ_γ(·) is the mean of a color channel c ∈ {R, G, B} of an arbitrary input image I; the parameter is empirically set to γ = 2.2.
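The exact color-enhancement formula is given only as an equation image in the source, so the sketch below is an assumption that uses only the quantities named in the text: per-channel means, the pair set Ω, a gamma correction with γ = 2.2, and a Euclidean-norm comparison between the enhanced image E and the input X.

```python
import torch

def color_enhancement_loss(e, x, gamma=2.2):
    """Color enhancement loss over the channel pairs (R,G), (R,B), (G,B).
    The gamma-corrected channel-mean matching below is an assumed form."""
    pairs = [(0, 1), (0, 2), (1, 2)]                       # (R,G), (R,B), (G,B)
    mean_e = e.mean(dim=(2, 3))                            # per-channel means of E, shape (B, 3)
    mean_x = x.mean(dim=(2, 3))                            # per-channel means of X
    loss = 0.0
    for c1, c2 in pairs:
        d_e = mean_e[:, c1] - mean_e[:, c2]
        # gamma correction Gamma_gamma(.) applied to the input channel means (assumed form)
        g_x = (mean_x[:, c1] ** (1.0 / gamma) - mean_x[:, c2] ** (1.0 / gamma)).abs()
        loss = loss + torch.norm(d_e - g_x, p=2)
    return loss
```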
The global consistency constraint loss is the sum of a global maximum consistency constraint loss and a global minimum consistency constraint loss;
the global maximum consistency constraint loss makes the brightest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; maximum pooling (MaxPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded. Similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, maximum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded. (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted maxima, and || · ||_1 denotes the absolute value operation.
The global minimum consistency constraint loss makes the darkest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; minimum pooling (MinPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded. Similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, minimum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded. (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted minima, and || · ||_1 denotes the absolute value operation.
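The region averaging, the 8 × 8 max/min pooling to a 2 × 2 map, and the six index pairs are stated in the text, but how the pairs enter the final expression is shown only as an equation image; the pairwise-difference matching between E and X in the sketch below is therefore an assumption.

```python
import torch
import torch.nn.functional as F
from itertools import combinations

def _region_extrema(img, grid=16, pool=8, mode="max"):
    """16 x 16 region means, then 8 x 8 max/min pooling -> four extrema per image."""
    means = F.adaptive_avg_pool2d(img.mean(dim=1, keepdim=True), grid)
    pooled = F.max_pool2d(means, pool) if mode == "max" else -F.max_pool2d(-means, pool)
    return pooled.flatten(1)                               # shape (B, 4)

def global_consistency_loss(e, x):
    """Global consistency constraint loss (maximum + minimum parts), assumed form."""
    loss = 0.0
    for mode in ("max", "min"):
        v_e = _region_extrema(e, mode=mode)
        v_x = _region_extrema(x, mode=mode)
        for r1, r2 in combinations(range(4), 2):           # the six (r_1, r_2) combinations
            loss = loss + ((v_e[:, r1] - v_e[:, r2]) - (v_x[:, r1] - v_x[:, r2])).abs().mean()
    return loss
```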
The illumination smoothing loss is computed on the image-enhancement illumination coefficient f_A output by the attention module, where c denotes the corresponding color channel taken from the RGB color space, || · ||_2 denotes the Euclidean norm, the gradient operator denotes the first-order differences of f_A for channel c in the vertical and horizontal directions, and || · ||_1 denotes the absolute value operation;
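A short sketch of such an illumination smoothing (total-variation style) term follows; the exact combination of the two norms is shown only as an equation image in the source, so the mean-absolute-difference form below is an assumption.

```python
import torch

def illumination_smoothness_loss(f_a):
    """Illumination smoothing loss on the illumination coefficient f_A:
    first-order differences in the vertical and horizontal directions."""
    dv = f_a[:, :, 1:, :] - f_a[:, :, :-1, :]              # first-order difference, vertical
    dh = f_a[:, :, :, 1:] - f_a[:, :, :, :-1]              # first-order difference, horizontal
    return dv.abs().mean() + dh.abs().mean()
```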
For the denoising loss, Φ(·) denotes extracting the Conv4-1 layer features with a VGG-16 classification model pre-trained on ImageNet, so Φ(E) denotes the classification features extracted from the preliminarily enhanced image E and the corresponding term denotes the classification features extracted from the final image enhancement result; the gradient operator denotes the first-order differences of the final result and of E in the vertical and horizontal directions; * denotes element-wise multiplication of matrices, e is the base of the natural logarithm, and μ is a parameter that controls the perception of edge intensity.
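A hedged sketch of a denoising loss built from the ingredients named above follows: a VGG-16 Conv4-1 feature-consistency term plus an edge-aware gradient term in which exp(-μ|∇E|) down-weights the result's gradients away from edges. The exact feature-layer slicing, the omission of ImageNet input normalization, the value of μ, and the way the terms are combined are all assumptions; the source gives the formula only as an equation image.

```python
import torch
import torchvision

_vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
_vgg_feat = _vgg.features[:19].eval()          # up to conv4_1 (+ ReLU); slicing index is an assumption
for p in _vgg_feat.parameters():
    p.requires_grad_(False)

def _grad(img):
    dv = img[:, :, 1:, :-1] - img[:, :, :-1, :-1]          # vertical first-order difference
    dh = img[:, :, :-1, 1:] - img[:, :, :-1, :-1]          # horizontal first-order difference
    return dv, dh

def denoising_loss(result, e, mu=10.0):
    """Assumed denoising loss: VGG feature consistency + edge-aware gradient penalty."""
    feat_term = torch.norm(_vgg_feat(result) - _vgg_feat(e), p=2)
    rv, rh = _grad(result)
    ev, eh = _grad(e)
    edge_term = (rv.abs() * torch.exp(-mu * ev.abs())).mean() + \
                (rh.abs() * torch.exp(-mu * eh.abs())).mean()
    return feat_term + edge_term
```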
In this embodiment, step S4 specifically includes:
step S41: selecting a random training image X in a training data set;
step S42: inputting an image X, obtaining an image E after preliminary enhancement through a local scene perception branch network, an enhancement branch network, an attention module and an iteration enhancement module, and obtaining an image enhanced illumination coefficient f output by the attention module A And final image enhancement resultsCalculating step total loss function loss
Step S43: calculating gradients of all parameters in the local scene perception branch network, the enhancement branch network, the attention module and the iteration enhancement module by using a back propagation method, and updating the parameters by using an Adam optimization method;
and S44, the steps are one iteration of the training process, the whole training process needs preset iterations, and in each iteration process, a plurality of image pairs are randomly sampled to be used as a batch for training.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.
Claims (10)
1. A no-reference low-illumination image enhancement method based on local scene perception is characterized by comprising the following steps:
s1, acquiring low-illumination images, and preprocessing each image to obtain a training data set;
s2, constructing a low-illumination image enhancement network based on local scene perception;
S3, designing a no-reference loss function for training the network designed in step S2;
s4, training a low-illumination image enhancement network based on local scene perception by adopting a training data set based on a no-reference loss function;
and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image.
2. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein the preprocessing specifically comprises: scaling each image in the dataset to the same size H × W;
normalizing each image I_train to obtain the normalized image X, calculated as follows:
X = I_train / I_bit_max
wherein I_train is an H × W image with 8-bit color depth, and I_bit_max is an H × W image in which every pixel value is 255.
3. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein the local scene perception-based low-illumination image enhancement network comprises a local scene perception branch network, an enhancement branch network, an attention module, an iterative enhancement module and a denoising module.
4. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the local scene perception branch network receives the normalized image X of size H × W as input and performs a spatial 4-equal-division cropping to obtain four sub-images; the local scene perception branch network consists of 4 perception branch networks J_1, J_2, J_3, J_4 and feature conversion blocks T_1, T_2; the perception branch networks are used to extract features of the local images, and the feature conversion blocks are used to generate convolution parameters for image enhancement; the spatial 4-equal-division cropping operation is expressed as follows:
where (p, q) is the pixel position of the input normalized image X of size H × W.
5. The local scene perception-based no-reference low-illumination image enhancement method according to claim 4, wherein the perception branch networks J_1, J_2, J_3, J_4 all have the same network structure and share their trainable parameters; each perception branch network consists of 4 convolution blocks, and each convolution block consists of a convolution layer followed by an activation layer;
the feature conversion block T_1 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0; it receives the aggregated output features of the 3rd convolution blocks of the perception branch networks and outputs the convolution parameter k_1 for image enhancement; the feature conversion block T_2 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0 whose number of output channels is 25% of the number of input channels; it receives the aggregated output features of the 4th convolution blocks of the perception branch networks and outputs the convolution parameter k_2 for image enhancement, expressed as follows:
wherein Concat(·) denotes concatenation of features along the convolution channel dimension, and T_1 and T_2 are the feature conversion blocks.
6. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the enhancement branch network comprises 2 convolution blocks, 2 variable-parameter convolution blocks D_1, D_2, and an upsampling layer; each convolution block consists of a convolution layer followed by an activation layer, the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the variable-parameter convolution blocks D_1, D_2 each consist of a variable-parameter convolution layer followed by an activation layer, wherein the convolution kernel used in the convolution layer of D_1 is the output k_1 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function; the convolution kernel used in the convolution layer of D_2 is the output k_2 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the input of the upsampling layer comes from the variable-parameter convolution block D_2; the upsampling operation is a bilinear interpolation performed on each feature channel, the output size of the upsampling is set to H × W, and the generated illumination feature for image enhancement is denoted f.
7. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the attention module comprises an attention layer and a convolution block;
the input to the attention layer is the image-enhancement illumination feature f output by the enhancement branch network, together with the normalized image X of size H × W; the image X is converted into a grayscale image X_gray, and X_gray is passed through a Gaussian filter to obtain the smoothed grayscale image X_smooth, wherein the Gaussian kernel size of the filter is set to 3; the attention layer computes its output as follows:
wherein the multiplication is an element-wise matrix multiplication with a broadcast mechanism, which multiplies each channel of f element-wise with X_smooth;
the convolution block consists of a convolution layer followed by an activation layer and receives f_A' as input; the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, the activation layer adopts the tanh activation function, and the output f_A of the convolution block is the image-enhancement illumination coefficient output by the attention module.
8. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the iterative enhancement module takes as input the normalized image X of size H × W and the image-enhancement illumination coefficient f_A generated by the attention module; the formula for generating the preliminarily enhanced image is as follows:
the above formula is iterative: n is the number of iterations, and E_n denotes the result of iterating n times with f_A as the illumination coefficient for image enhancement; n = 4 is set by default. For convenience, the preliminarily enhanced image E_4 is denoted E;
9. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein an optimization goal is to minimize a total loss function
wherein L_bri denotes the brightness control loss function and λ_bri its weight; L_col denotes the color enhancement loss function and λ_col its weight; L_glo denotes the global consistency constraint loss function and λ_glo its weight; L_tv denotes the illumination smoothing loss function and λ_tv its weight; and L_noi denotes the denoising loss function and λ_noi its weight;
for the brightness control loss, the preliminarily enhanced image E is divided into N = 16 × 16 = 256 regions of the same size, and the average brightness value of each region is compared with E_b = 0.6, a preset target brightness constant, where || · ||_1 denotes the absolute value operation;
for the color enhancement loss, c_1, c_2 denote color channels of an image and Ω = {(R, G), (R, B), (G, B)} denotes the set of color-channel combinations, where R, G, B are the red, green and blue channels of the RGB color space; the corresponding terms denote the means of the c_1 and c_2 color channels in the preliminarily enhanced image E and, similarly, the means of the c_1 and c_2 color channels in the input image X; || · ||_2 denotes the Euclidean norm; G(·) is defined as follows:
where the arguments of G(·) are the means of the c_1 and c_2 color channels of an arbitrary input image I, || · ||_1 denotes the absolute value operation, and Γ_γ(·) denotes the gamma correction operation, calculated as follows:
where the argument of Γ_γ(·) is the mean of a color channel c ∈ {R, G, B} of an arbitrary input image I, and γ is a preset parameter;
wherein the global consistency constraint loss is the sum of a global maximum consistency constraint loss and a global minimum consistency constraint loss;
for the global maximum consistency constraint loss, the input image X is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; maximum pooling (MaxPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose values are the extracted maxima; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted maxima, and || · ||_1 denotes the absolute value operation;
the global minimum consistency constraint loss makes the darkest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; minimum pooling (MinPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded; similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, minimum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted minima;
wherein the illumination smoothing loss is computed on the image-enhancement illumination coefficient f_A output by the attention module, where c denotes the corresponding color channel taken from the RGB color space, || · ||_2 denotes the Euclidean norm, the gradient operator denotes the first-order differences of f_A for channel c in the vertical and horizontal directions, and || · ||_1 denotes the absolute value operation;
wherein, for the denoising loss, Φ(·) denotes extracting the Conv4-1 layer features with a VGG-16 classification model pre-trained on ImageNet, so Φ(E) denotes the classification features extracted from the preliminarily enhanced image E and the corresponding term denotes the classification features extracted from the final image enhancement result; the gradient operator denotes the first-order differences of the final result and of E in the vertical and horizontal directions; * denotes element-wise multiplication of matrices, e is the base of the natural logarithm, and μ is a parameter that controls the perception of edge intensity.
10. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein step S4 specifically comprises:
step S41: selecting a random training image X in a training data set;
step S42: inputting the image X; obtaining the preliminarily enhanced image E through the local scene perception branch network, the enhancement branch network, the attention module and the iterative enhancement module; obtaining the image-enhancement illumination coefficient f_A output by the attention module and the final image enhancement result; and computing the total loss function;
Step S43: calculating gradients of all parameters in the local scene perception branch network, the enhancement branch network, the attention module and the iteration enhancement module by using a back propagation method, and updating the parameters by using an Adam optimization method;
and step S44: the above steps constitute one iteration of the training process; the whole training process requires a preset number of iterations, and in each iteration several images are randomly sampled as one batch for training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210960432.1A CN115205160A (en) | 2022-08-11 | 2022-08-11 | No-reference low-illumination image enhancement method based on local scene perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210960432.1A CN115205160A (en) | 2022-08-11 | 2022-08-11 | No-reference low-illumination image enhancement method based on local scene perception |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115205160A true CN115205160A (en) | 2022-10-18 |
Family
ID=83585473
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210960432.1A Pending CN115205160A (en) | 2022-08-11 | 2022-08-11 | No-reference low-illumination image enhancement method based on local scene perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205160A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115512251A (en) * | 2022-11-04 | 2022-12-23 | 深圳市瓴鹰智能科技有限公司 | Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement |
CN115861380A (en) * | 2023-02-16 | 2023-03-28 | 深圳市瓴鹰智能科技有限公司 | End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene |
CN116844192A (en) * | 2023-07-19 | 2023-10-03 | 滁州学院 | Enhancement processing method of low-quality fingerprint image |
-
2022
- 2022-08-11 CN CN202210960432.1A patent/CN115205160A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115512251A (en) * | 2022-11-04 | 2022-12-23 | 深圳市瓴鹰智能科技有限公司 | Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement |
CN115861380A (en) * | 2023-02-16 | 2023-03-28 | 深圳市瓴鹰智能科技有限公司 | End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene |
CN116844192A (en) * | 2023-07-19 | 2023-10-03 | 滁州学院 | Enhancement processing method of low-quality fingerprint image |
CN116844192B (en) * | 2023-07-19 | 2024-04-12 | 滁州学院 | Enhancement processing method of low-quality fingerprint image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107767413B (en) | Image depth estimation method based on convolutional neural network | |
CN107123089B (en) | Remote sensing image super-resolution reconstruction method and system based on depth convolution network | |
CN115205160A (en) | No-reference low-illumination image enhancement method based on local scene perception | |
Duan et al. | Tone-mapping high dynamic range images by novel histogram adjustment | |
Panetta et al. | Tmo-net: A parameter-free tone mapping operator using generative adversarial network, and performance benchmarking on large scale hdr dataset | |
CN111105376B (en) | Single-exposure high-dynamic-range image generation method based on double-branch neural network | |
EP1745438A1 (en) | Method for determining image quality | |
CN114862698B (en) | Channel-guided real overexposure image correction method and device | |
CN111047543A (en) | Image enhancement method, device and storage medium | |
US8565513B2 (en) | Image processing method for providing depth information and image processing system using the same | |
CN115641391A (en) | Infrared image colorizing method based on dense residual error and double-flow attention | |
CN103971340A (en) | High-bit-width digital image dynamic range compression and detail enhancement method | |
CN113344773B (en) | Single picture reconstruction HDR method based on multi-level dual feedback | |
Singh et al. | Weighted least squares based detail enhanced exposure fusion | |
CN116385298A (en) | No-reference enhancement method for night image acquisition of unmanned aerial vehicle | |
CN115393225A (en) | Low-illumination image enhancement method based on multilevel feature extraction and fusion | |
CN115272072A (en) | Underwater image super-resolution method based on multi-feature image fusion | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
CN117974459A (en) | Low-illumination image enhancement method integrating physical model and priori | |
CN111369435B (en) | Color image depth up-sampling method and system based on self-adaptive stable model | |
Lee et al. | Ramp distribution-based contrast enhancement techniques and over-contrast measure | |
CN113706400A (en) | Image correction method, image correction device, microscope image correction method, and electronic apparatus | |
CN116912114A (en) | Non-reference low-illumination image enhancement method based on high-order curve iteration | |
CN116630198A (en) | Multi-scale fusion underwater image enhancement method combining self-adaptive gamma correction | |
CN116977190A (en) | Image processing method, apparatus, device, storage medium, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |