CN115205160A - No-reference low-illumination image enhancement method based on local scene perception - Google Patents
No-reference low-illumination image enhancement method based on local scene perception
- Publication number
- CN115205160A CN115205160A CN202210960432.1A CN202210960432A CN115205160A CN 115205160 A CN115205160 A CN 115205160A CN 202210960432 A CN202210960432 A CN 202210960432A CN 115205160 A CN115205160 A CN 115205160A
- Authority
- CN
- China
- Prior art keywords
- image
- enhancement
- convolution
- illumination
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000005286 illumination Methods 0.000 title claims abstract description 81
- 230000008447 perception Effects 0.000 title claims abstract description 65
- 238000000034 method Methods 0.000 title claims abstract description 45
- 230000006870 function Effects 0.000 claims abstract description 60
- 238000012549 training Methods 0.000 claims abstract description 31
- 238000007781 pre-processing Methods 0.000 claims abstract description 7
- 238000004364 calculation method Methods 0.000 claims description 20
- 238000011176 pooling Methods 0.000 claims description 19
- 238000006243 chemical reaction Methods 0.000 claims description 16
- 230000004913 activation Effects 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 10
- 239000011159 matrix material Substances 0.000 claims description 9
- 238000005070 sampling Methods 0.000 claims description 7
- 238000005457 optimization Methods 0.000 claims description 6
- 230000002776 aggregation Effects 0.000 claims description 5
- 238000004220 aggregation Methods 0.000 claims description 5
- 230000002708 enhancing effect Effects 0.000 claims description 4
- 238000013145 classification model Methods 0.000 claims description 3
- 238000012937 correction Methods 0.000 claims description 3
- 238000005520 cutting process Methods 0.000 claims description 3
- 230000007246 mechanism Effects 0.000 claims description 3
- 238000009499 grossing Methods 0.000 description 4
- 238000005096 rolling process Methods 0.000 description 4
- 238000012545 processing Methods 0.000 description 3
- 238000001514 detection method Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The invention relates to a no-reference low-illumination image enhancement method based on local scene perception, which comprises the following steps: S1, acquiring low-illumination images and preprocessing each image to obtain a training data set; S2, constructing a low-illumination image enhancement network based on local scene perception; S3, designing a no-reference loss function for training the network designed in step S2; S4, training the local-scene-perception low-illumination image enhancement network on the training data set using the no-reference loss function; and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image. The invention can effectively enhance low-illumination images.
Description
Technical Field
The invention relates to the field of image processing and computer vision, in particular to a no-reference low-illumination image enhancement method based on local scene perception.
Background
In recent years, scientific and technological progress has markedly improved society and people's daily lives. As image acquisition devices have become smaller and more capable, images and imaging systems have become closely tied to everyday life and production. Image-based processing systems are widely used in daily scenarios, allowing users to record and observe intuitively and in real time with great convenience. However, limited by the environment and the state of the acquisition device, the captured images can be hard to interpret, exhibiting motion blur, poor illumination, and similar degradations. Poor lighting conditions are especially common, and images captured under them often contain many dark regions, which hampers both human reading and machine vision processing. Designing an enhancement method for low-illumination images therefore has important theoretical and practical significance.
Low-light image enhancement aims to improve the visibility of content in images captured under insufficient lighting. Objects, scenes and textures in low-light images are poorly visible and hard to distinguish, making them difficult for the human eye to recognize or for high-level computer vision tasks to use. Two typical scenarios arise: the first is enhancing the whole image, which is common when the overall lighting is poor or the camera's exposure parameters (shutter time or white balance) are set incorrectly; the second is enhancing certain dark regions within the image, which is more common in naturally captured images.
Low-illumination image enhancement has important practical significance. It directly improves the visibility and perceptibility of poorly lit images for the human eye, and it also benefits specific computer vision tasks, such as night scenes in autonomous driving and the analysis of natural images or video content (including object detection, object tracking, and human pose estimation). Using a low-illumination image enhancement method to preprocess poorly lit images lets algorithms designed for a specific task work directly on image data captured in dark environments, greatly reducing the cost of labeling task-specific data and rebuilding models, and saving considerable manpower and material resources.
Disclosure of Invention
In view of this, the present invention provides a no-reference low-illumination image enhancement method based on local scene perception, so as to effectively enhance low-illumination images.
To achieve this purpose, the invention adopts the following technical scheme:
a no-reference low-illumination image enhancement method based on local scene perception comprises the following steps:
s1, acquiring low-illumination images, and preprocessing each image to obtain a training data set;
s2, constructing a low-illumination image enhancement network based on local scene perception;
S3, designing a no-reference loss function for training the network designed in step S2;
s4, training a low-illumination image enhancement network based on local scene perception by adopting a training data set based on a no-reference loss function;
and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image.
Further, the preprocessing specifically comprises: scaling each image in the dataset to the same size H × W;
normalizing each image I_train to obtain the normalized image X, calculated as follows:
X = I_train / I_bit_max
where I_train is an H × W image with 8-bit color depth, and I_bit_max is an H × W image in which every pixel value is 255.
Further, the low-illumination image enhancement network based on local scene perception comprises a local scene perception branch network, an enhancement branch network, an attention module, an iterative enhancement module and a denoising module.
Further, the local scene perception branch network receives the normalized image X of size H × W as input and performs a spatial 4-equal-division cropping to obtain four sub-images. The local scene perception branch network consists of 4 perception branch networks J_1, J_2, J_3, J_4 and feature conversion blocks T_1, T_2; the perception branch networks are used to extract features of the local images, and the feature conversion blocks are used to generate convolution parameters for image enhancement. The spatial 4-equal-division cropping operation is expressed as follows:
where (p, q) is the pixel position of the input normalized image X of size H × W.
Further, the perception branch networks J_1, J_2, J_3, J_4 all have the same network structure and share their trainable parameters; each perception branch network consists of 4 convolution blocks, and each convolution block consists of a convolution layer followed by an activation layer;
the feature conversion block T_1 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0; it receives the aggregated output features of the 3rd convolution blocks of the perception branch networks and outputs the convolution parameter k_1 for image enhancement. The feature conversion block T_2 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0 whose number of output channels is 25% of the number of input channels; it receives the aggregated output features of the 4th convolution blocks of the perception branch networks and outputs the convolution parameter k_2 for image enhancement, expressed as follows:
where Concat(·) denotes concatenation of features along the convolution channel dimension, and T_1 and T_2 are the feature conversion blocks.
Further, the enhancement branch network comprises 2 convolution blocks, 2 variable-parameter convolution blocks D_1, D_2, and an upsampling layer; each convolution block consists of a convolution layer followed by an activation layer, the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the variable-parameter convolution blocks D_1, D_2 each consist of a variable-parameter convolution layer followed by an activation layer. The convolution kernel used in the convolution layer of D_1 is the output k_1 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function; the convolution kernel used in the convolution layer of D_2 is the output k_2 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the input of the upsampling layer comes from the variable-parameter convolution block D_2; the upsampling operation is a bilinear interpolation performed on each feature channel, the output size of the upsampling is set to H × W, and the generated illumination feature for image enhancement is denoted f.
Further, the attention module includes an attention layer and a convolution block;
the input to the attention layer is the image-enhancement illumination feature f output by the enhancement branch network, together with the normalized image X of size H × W; the image X is converted into a grayscale image X_gray, and X_gray is passed through a Gaussian filter to obtain the smoothed grayscale image X_smooth, where the Gaussian kernel size of the filter is set to 3; the attention layer computes its output as follows:
where the multiplication is an element-wise matrix multiplication with a broadcast mechanism, which multiplies each channel of f element-wise with X_smooth;
the convolution block consists of a convolution layer followed by an activation layer and receives f_A' as input; the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, the activation layer adopts the tanh activation function, and the output f_A of the convolution block is the image-enhancement illumination coefficient output by the attention module.
Further, the iterative enhancement module takes as input the normalized image X of size H × W and the image-enhancement illumination coefficient f_A generated by the attention module; the formula for generating the preliminarily enhanced image is as follows:
the above formula is iterative: n is the number of iterations, and E_n denotes the result of iterating n times with f_A as the illumination coefficient for image enhancement; n = 4 is set by default. For convenience, the preliminarily enhanced image E_4 is denoted E;
the denoising module takes as input the preliminarily enhanced image E and the normalized image X of size H × W, and its output is the final image enhancement result.
Further, the optimization goal is to minimize a total loss function formed as the weighted sum of five terms, where L_bri denotes the brightness control loss function and λ_bri its weight; L_col denotes the color enhancement loss function and λ_col its weight; L_glo denotes the global consistency constraint loss function and λ_glo its weight; L_tv denotes the illumination smoothing loss function and λ_tv its weight; and L_noi denotes the denoising loss function and λ_noi its weight;
the image E after the preliminary enhancement is divided into N =16 × 16, that is, 256 regions of the same size, and the average brightness value in the region is recorded asE b =0.6 is a preset target luminance constant value, | | | | | luminance 1 Is an absolute value operation;
in the above formula c 1 ,c 2 Color channels representing an image, Ω = { (R, G), (R, B), (G, B) } represents a combination of different color channels, where R, G, B represent red, green, blue channels in an RGB color space, respectively;andi.e. representing c in the image E after the preliminary enhancement 1 Color channels and c 2 A color channel mean value; in a similar manner, the first and second electrodes are,i.e. representing c in the input image X 1 Color channels and c 2 A color channel mean value; | | non-woven hair 2 The operation of solving Euclidean norm; g (-) is defined as follows:
whereinRepresenting c in an arbitrary input image I 1 Color channels and c 2 Color channel mean, | | | luminance 1 Is an operation of taking the absolute value, gamma γ (. Cndot.) represents the gamma correction operation, which is calculated as follows:
whereinRepresenting c in an arbitrary input image I 1 E is an element of { R, G, B } color channel, and gamma is a preset parameter;
the global consistency constraint loss is the sum of a global maximum consistency constraint loss and a global minimum consistency constraint loss;
for the global maximum consistency constraint loss, the input image X is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; maximum pooling (MaxPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose values are the extracted maxima; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted maxima, and || · ||_1 denotes the absolute value operation;
the global minimum consistency constraint loss makes the darkest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; minimum pooling (MinPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded; similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, minimum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted minima;
the illumination smoothing loss is computed on the image-enhancement illumination coefficient f_A output by the attention module, where c denotes the corresponding color channel taken from the RGB color space, || · ||_2 denotes the Euclidean norm, the gradient operator denotes the first-order differences of f_A for channel c in the vertical and horizontal directions, and || · ||_1 denotes the absolute value operation;
for the denoising loss, Φ(·) denotes extracting the Conv4-1 layer features with a VGG-16 classification model pre-trained on ImageNet, so Φ(E) denotes the classification features extracted from the preliminarily enhanced image E and the corresponding term denotes the classification features extracted from the final image enhancement result; the gradient operator denotes the first-order differences of the final result and of E in the vertical and horizontal directions; * denotes element-wise multiplication of matrices, e is the base of the natural logarithm, and μ is a parameter that controls the perception of edge intensity.
Further, the step S4 specifically includes:
step S41: selecting a random training image X in a training data set;
step S42: inputting an image X, obtaining an image E after preliminary enhancement through a local scene perception branch network, an enhancement branch network, an attention module and an iteration enhancement module, and obtaining an image enhanced illumination coefficient f output by the attention module A And final image enhancement resultsCalculating step total loss function loss
Step S43: calculating gradients of all parameters in the local scene perception branch network, the enhancement branch network, the attention module and the iteration enhancement module by using a back propagation method, and updating the parameters by using an Adam optimization method;
Step S44: the above steps constitute one iteration of the training process; the whole training process requires a preset number of iterations, and in each iteration several images are randomly sampled as one batch for training.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention addresses the lack of attention to local scenes in existing low-illumination image enhancement methods by perceiving the local scene and enhancing the image dynamically;
2. The designed network perceives local information of the image scene and predicts the parameters used for enhancement, so that the network can dynamically balance the relationships among local scenes. The attention module in the network balances extreme-brightness regions of the image, the iterative enhancement module outputs a preliminary enhancement result, and the denoising module further improves the image quality, so that finally a high-quality, normally illuminated image can be output.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a flow chart of the local-scene-perception low-illumination image enhancement according to an embodiment of the present invention.
Fig. 3 shows the local scene perception branch network according to an embodiment of the present invention.
Fig. 4 is an enhanced branch network according to an embodiment of the present invention.
FIG. 5 is a diagram of an attention module according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to fig. 1, the present invention provides a method for enhancing a non-reference low-illumination image based on local scene perception, as shown in fig. 1, including the following steps:
s1, acquiring low-illumination images, and preprocessing each image to obtain a training data set;
s2, constructing a low-illumination image enhancement network based on local scene perception;
S3, designing a no-reference loss function for training the network designed in step S2;
s4, training a low-illumination image enhancement network based on local scene perception by adopting a training data set based on a no-reference loss function;
and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image.
In this embodiment, the preprocessing in step S1 specifically includes: scaling each image in the dataset to the same size H × W;
normalizing each image I_train to obtain the normalized image X, calculated as follows:
X = I_train / I_bit_max
where I_train is an H × W image with 8-bit color depth, and I_bit_max is an H × W image in which every pixel value is 255.
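For illustration only (not part of the original disclosure), a minimal preprocessing sketch in Python follows; the function name, the use of OpenCV for resizing, and the default H, W values are assumptions, while the resize to H × W and the division by 255 (i.e. by I_bit_max) follow the formula above.

```python
import cv2
import numpy as np

def preprocess_image(path, H=256, W=256):
    """Resize an 8-bit image to H x W and normalize it to [0, 1].

    Dividing by 255 is equivalent to dividing I_train element-wise by
    I_bit_max, the H x W image whose every pixel value is 255.
    """
    img = cv2.imread(path, cv2.IMREAD_COLOR)               # I_train, 8-bit color depth
    img = cv2.resize(img, (W, H), interpolation=cv2.INTER_LINEAR)
    x = img.astype(np.float32) / 255.0                     # X = I_train / I_bit_max
    return x
```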
In this embodiment, the low-illumination image enhancement network based on local scene perception comprises a local scene perception branch network, an enhancement branch network, an attention module, an iterative enhancement module and a denoising module.
Preferably, the local scene perception branch network receives the normalized image X of size H × W as input and performs a spatial 4-equal-division cropping to obtain four sub-images. The local scene perception branch network consists of 4 perception branch networks J_1, J_2, J_3, J_4 and feature conversion blocks T_1, T_2; the perception branch networks are used to extract features of the local images, and the feature conversion blocks are used to generate convolution parameters for image enhancement. The spatial 4-equal-division cropping operation is expressed as follows:
where (p, q) is the pixel position of the input normalized image X of size H × W.
The perception branch networks J_1, J_2, J_3, J_4 take the four cropped sub-images as their respective inputs. They all have the same network structure and share their trainable parameters. The structure is as follows: each perception branch network consists of 4 convolution blocks, and each convolution block consists of a convolution layer followed by an activation layer. The convolution layer uses a 3 × 3 kernel with stride 2 and padding 1, and the activation layer uses the ReLU activation function. The four cropped sub-images are fed into J_1, J_2, J_3, J_4 respectively, and for each perception branch network the features output by the 3rd and 4th convolution blocks are extracted; the outputs of the 3rd convolution blocks and of the 4th convolution blocks correspond to the features of the four image parts, respectively.
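A hedged sketch of the spatial 4-equal cropping and one shared-weight perception branch is shown below in PyTorch. The class names, the channel width, and the way the crops are indexed are illustrative assumptions; the 4 convolution blocks with 3 × 3 kernels, stride 2, padding 1 and ReLU, the parameter sharing among J_1–J_4, and the extraction of the 3rd- and 4th-block features follow the description above.

```python
import torch
import torch.nn as nn

def crop_four(x):
    """Split a normalized image batch (B, C, H, W) into four H/2 x W/2 sub-images."""
    _, _, H, W = x.shape
    return [x[:, :, :H // 2, :W // 2], x[:, :, :H // 2, W // 2:],
            x[:, :, H // 2:, :W // 2], x[:, :, H // 2:, W // 2:]]

class PerceptionBranch(nn.Module):
    """One perception branch J_i; applying the same instance to all four crops
    realizes the trainable-parameter sharing among J_1..J_4."""
    def __init__(self, in_ch=3, width=32):                 # 'width' is an assumed channel count
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch if i == 0 else width, width,
                                    kernel_size=3, stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(4)])

    def forward(self, crop):
        feats = []
        h = crop
        for blk in self.blocks:
            h = blk(h)
            feats.append(h)
        return feats[2], feats[3]                          # outputs of the 3rd and 4th blocks
```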
The feature conversion block T_1 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0; it receives the aggregated output features of the 3rd convolution blocks of the perception branch networks and outputs the convolution parameter k_1 for image enhancement. The feature conversion block T_2 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0 whose number of output channels is 25% of the number of input channels; it receives the aggregated output features of the 4th convolution blocks of the perception branch networks and outputs the convolution parameter k_2 for image enhancement. This is formulated as follows:
where Concat(·) denotes concatenation of features along the convolution channel dimension, and T_1 and T_2 are the feature conversion blocks.
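As a reading aid, a sketch of a feature conversion block follows; the class name and channel bookkeeping are assumptions, and a stride of 1 is used because a stride of 0 is not realizable in standard convolution layers. T_1 would be instantiated with out_ratio=1.0 on the 3rd-block features and T_2 with out_ratio=0.25 (25% of the input channels) on the 4th-block features, yielding k_1 and k_2.

```python
import torch
import torch.nn as nn

class FeatureConversion(nn.Module):
    """Feature conversion block T_1 / T_2: a 1 x 1 convolution over the features
    of the four perception branches concatenated along the channel dimension."""
    def __init__(self, in_ch, out_ratio=1.0):
        super().__init__()
        out_ch = max(1, int(in_ch * out_ratio))            # T_2: 25% of the input channels
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, padding=0)

    def forward(self, f1, f2, f3, f4):
        # Concat(.) along the channel dimension, then the 1x1 convolution -> k_1 or k_2
        return self.conv(torch.cat([f1, f2, f3, f4], dim=1))
```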
In this embodiment, as shown in Fig. 4, the enhancement branch network includes 2 convolution blocks, 2 variable-parameter convolution blocks D_1, D_2, and an upsampling layer; each convolution block consists of a convolution layer followed by an activation layer, the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the variable-parameter convolution blocks D_1, D_2 each consist of a variable-parameter convolution layer followed by an activation layer. The convolution kernel used in the convolution layer of D_1 is the output k_1 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function; the convolution kernel used in the convolution layer of D_2 is the output k_2 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the input of the upsampling layer comes from the variable-parameter convolution block D_2; the upsampling operation is a bilinear interpolation performed on each feature channel, the output size of the upsampling is set to H × W, and the generated illumination feature for image enhancement is denoted f.
In this embodiment, as shown in Fig. 5, the attention module accepts the image-enhancement illumination feature f output by the enhancement branch network, and also accepts the normalized image X of size H × W as input, where the image X is the same as the input to the enhancement branch network. The attention module consists of an attention layer and 1 convolution block.
The input to the attention layer is the image-enhancement illumination feature f output by the enhancement branch network, together with the normalized image X of size H × W; the image X is converted into a grayscale image X_gray, and X_gray is passed through a Gaussian filter to obtain the smoothed grayscale image X_smooth, where the Gaussian kernel size of the filter is set to 3; the attention layer computes its output as follows:
where the multiplication is an element-wise matrix multiplication with a broadcast mechanism, which multiplies each channel of f element-wise with X_smooth;
the convolution block consists of a convolution layer followed by an activation layer and receives f_A' as input; the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, the activation layer adopts the tanh activation function, and the output f_A of the convolution block is the image-enhancement illumination coefficient output by the attention module.
In this embodiment, the iterative enhancement module takes as input the normalized image X of size H × W and the image-enhancement illumination coefficient f_A generated by the attention module; the formula for generating the preliminarily enhanced image is as follows:
the above formula is iterative: n is the number of iterations, and E_n denotes the result of iterating n times with f_A as the illumination coefficient for image enhancement; n = 4 is set by default. For convenience, the preliminarily enhanced image E_4 is denoted E;
the input of the denoising module is an image E after preliminary enhancement and the normalized size is an H multiplied by W image X, and the output of the denoising module is a final image enhancement resultThe denoising module consists of 3 convolution blocks. And de-noising the rolling blocks in the module. And 3 rolling blocks in the denoising module, wherein each rolling block consists of a rolling layer and an active layer in sequence. The convolutional layer uses convolution with convolution kernel size of 3 × 3, step size of 1, and padding of 1, and the active layer uses the ReLU activation function. Outputting final image enhancement result with size H multiplied by W
In this embodiment, the optimization goal is to minimize a total loss function formed as the weighted sum of five terms, where L_bri denotes the brightness control loss function and λ_bri its weight; L_col denotes the color enhancement loss function and λ_col its weight; L_glo denotes the global consistency constraint loss function and λ_glo its weight; L_tv denotes the illumination smoothing loss function and λ_tv its weight; and L_noi denotes the denoising loss function and λ_noi its weight;
for the brightness control loss, the preliminarily enhanced image E is divided into N = 16 × 16 = 256 regions of the same size, and the average brightness value of each region is compared with E_b = 0.6, a preset target brightness constant, where || · ||_1 denotes the absolute value operation;
for the color enhancement loss, c_1, c_2 denote color channels of an image and Ω = {(R, G), (R, B), (G, B)} denotes the set of color-channel combinations, where R, G, B are the red, green and blue channels of the RGB color space; the corresponding terms denote the means of the c_1 and c_2 color channels in the preliminarily enhanced image E and, similarly, the means of the c_1 and c_2 color channels in the input image X; || · ||_2 denotes the Euclidean norm; G(·) is defined as follows:
where the arguments of G(·) are the means of the c_1 and c_2 color channels of an arbitrary input image I, || · ||_1 denotes the absolute value operation, and Γ_γ(·) denotes the gamma correction operation, calculated as follows:
where the argument of Γ_γ(·) is the mean of a color channel c ∈ {R, G, B} of an arbitrary input image I; the parameter is empirically set to γ = 2.2.
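The exact color-enhancement formula is given only as an equation image in the source, so the sketch below is an assumption that uses only the quantities named in the text: per-channel means, the pair set Ω, a gamma correction with γ = 2.2, and a Euclidean-norm comparison between the enhanced image E and the input X.

```python
import torch

def color_enhancement_loss(e, x, gamma=2.2):
    """Color enhancement loss over the channel pairs (R,G), (R,B), (G,B).
    The gamma-corrected channel-mean matching below is an assumed form."""
    pairs = [(0, 1), (0, 2), (1, 2)]                       # (R,G), (R,B), (G,B)
    mean_e = e.mean(dim=(2, 3))                            # per-channel means of E, shape (B, 3)
    mean_x = x.mean(dim=(2, 3))                            # per-channel means of X
    loss = 0.0
    for c1, c2 in pairs:
        d_e = mean_e[:, c1] - mean_e[:, c2]
        # gamma correction Gamma_gamma(.) applied to the input channel means (assumed form)
        g_x = (mean_x[:, c1] ** (1.0 / gamma) - mean_x[:, c2] ** (1.0 / gamma)).abs()
        loss = loss + torch.norm(d_e - g_x, p=2)
    return loss
```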
The global consistency constraint loss is the sum of a global maximum consistency constraint loss and a global minimum consistency constraint loss;
the global maximum consistency constraint loss makes the brightest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; maximum pooling (MaxPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded. Similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, maximum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded. (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted maxima, and || · ||_1 denotes the absolute value operation.
The global minimum consistency constraint loss makes the darkest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; minimum pooling (MinPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded. Similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, minimum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded. (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted minima, and || · ||_1 denotes the absolute value operation.
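The region averaging, the 8 × 8 max/min pooling to a 2 × 2 map, and the six index pairs are stated in the text, but how the pairs enter the final expression is shown only as an equation image; the pairwise-difference matching between E and X in the sketch below is therefore an assumption.

```python
import torch
import torch.nn.functional as F
from itertools import combinations

def _region_extrema(img, grid=16, pool=8, mode="max"):
    """16 x 16 region means, then 8 x 8 max/min pooling -> four extrema per image."""
    means = F.adaptive_avg_pool2d(img.mean(dim=1, keepdim=True), grid)
    pooled = F.max_pool2d(means, pool) if mode == "max" else -F.max_pool2d(-means, pool)
    return pooled.flatten(1)                               # shape (B, 4)

def global_consistency_loss(e, x):
    """Global consistency constraint loss (maximum + minimum parts), assumed form."""
    loss = 0.0
    for mode in ("max", "min"):
        v_e = _region_extrema(e, mode=mode)
        v_x = _region_extrema(x, mode=mode)
        for r1, r2 in combinations(range(4), 2):           # the six (r_1, r_2) combinations
            loss = loss + ((v_e[:, r1] - v_e[:, r2]) - (v_x[:, r1] - v_x[:, r2])).abs().mean()
    return loss
```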
The illumination smoothing loss is computed on the image-enhancement illumination coefficient f_A output by the attention module, where c denotes the corresponding color channel taken from the RGB color space, || · ||_2 denotes the Euclidean norm, the gradient operator denotes the first-order differences of f_A for channel c in the vertical and horizontal directions, and || · ||_1 denotes the absolute value operation;
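A short sketch of such an illumination smoothing (total-variation style) term follows; the exact combination of the two norms is shown only as an equation image in the source, so the mean-absolute-difference form below is an assumption.

```python
import torch

def illumination_smoothness_loss(f_a):
    """Illumination smoothing loss on the illumination coefficient f_A:
    first-order differences in the vertical and horizontal directions."""
    dv = f_a[:, :, 1:, :] - f_a[:, :, :-1, :]              # first-order difference, vertical
    dh = f_a[:, :, :, 1:] - f_a[:, :, :, :-1]              # first-order difference, horizontal
    return dv.abs().mean() + dh.abs().mean()
```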
For the denoising loss, Φ(·) denotes extracting the Conv4-1 layer features with a VGG-16 classification model pre-trained on ImageNet, so Φ(E) denotes the classification features extracted from the preliminarily enhanced image E and the corresponding term denotes the classification features extracted from the final image enhancement result; the gradient operator denotes the first-order differences of the final result and of E in the vertical and horizontal directions; * denotes element-wise multiplication of matrices, e is the base of the natural logarithm, and μ is a parameter that controls the perception of edge intensity.
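A hedged sketch of a denoising loss built from the ingredients named above follows: a VGG-16 Conv4-1 feature-consistency term plus an edge-aware gradient term in which exp(-μ|∇E|) down-weights the result's gradients away from edges. The exact feature-layer slicing, the omission of ImageNet input normalization, the value of μ, and the way the terms are combined are all assumptions; the source gives the formula only as an equation image.

```python
import torch
import torchvision

_vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
_vgg_feat = _vgg.features[:19].eval()          # up to conv4_1 (+ ReLU); slicing index is an assumption
for p in _vgg_feat.parameters():
    p.requires_grad_(False)

def _grad(img):
    dv = img[:, :, 1:, :-1] - img[:, :, :-1, :-1]          # vertical first-order difference
    dh = img[:, :, :-1, 1:] - img[:, :, :-1, :-1]          # horizontal first-order difference
    return dv, dh

def denoising_loss(result, e, mu=10.0):
    """Assumed denoising loss: VGG feature consistency + edge-aware gradient penalty."""
    feat_term = torch.norm(_vgg_feat(result) - _vgg_feat(e), p=2)
    rv, rh = _grad(result)
    ev, eh = _grad(e)
    edge_term = (rv.abs() * torch.exp(-mu * ev.abs())).mean() + \
                (rh.abs() * torch.exp(-mu * eh.abs())).mean()
    return feat_term + edge_term
```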
In this embodiment, step S4 specifically includes:
step S41: selecting a random training image X in a training data set;
step S42: inputting an image X, obtaining an image E after preliminary enhancement through a local scene perception branch network, an enhancement branch network, an attention module and an iteration enhancement module, and obtaining an image enhanced illumination coefficient f output by the attention module A And final image enhancement resultsCalculating step total loss function loss
Step S43: calculating gradients of all parameters in the local scene perception branch network, the enhancement branch network, the attention module and the iteration enhancement module by using a back propagation method, and updating the parameters by using an Adam optimization method;
and S44, the steps are one iteration of the training process, the whole training process needs preset iterations, and in each iteration process, a plurality of image pairs are randomly sampled to be used as a batch for training.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.
Claims (10)
1. A no-reference low-illumination image enhancement method based on local scene perception is characterized by comprising the following steps:
s1, acquiring low-illumination images, and preprocessing each image to obtain a training data set;
s2, constructing a low-illumination image enhancement network based on local scene perception;
S3, designing a no-reference loss function for training the network designed in step S2;
s4, training a low-illumination image enhancement network based on local scene perception by adopting a training data set based on a no-reference loss function;
and S5, passing the image to be processed through the trained local-scene-perception low-illumination image enhancement network to obtain a normal-illumination image.
2. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein the preprocessing specifically comprises: scaling each image in the dataset to the same size H × W;
normalizing each image I_train to obtain the normalized image X, calculated as follows:
X = I_train / I_bit_max
wherein I_train is an H × W image with 8-bit color depth, and I_bit_max is an H × W image in which every pixel value is 255.
3. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein the local scene perception-based low-illumination image enhancement network comprises a local scene perception branch network, an enhancement branch network, an attention module, an iterative enhancement module and a denoising module.
4. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the local scene perception branch network receives the normalized image X of size H × W as input and performs a spatial 4-equal-division cropping to obtain four sub-images; the local scene perception branch network consists of 4 perception branch networks J_1, J_2, J_3, J_4 and feature conversion blocks T_1, T_2; the perception branch networks are used to extract features of the local images, and the feature conversion blocks are used to generate convolution parameters for image enhancement; the spatial 4-equal-division cropping operation is expressed as follows:
where (p, q) is the pixel position of the input normalized image X of size H × W.
5. The local scene perception-based no-reference low-illumination image enhancement method according to claim 4, wherein the perception branch networks J_1, J_2, J_3, J_4 all have the same network structure and share their trainable parameters; each perception branch network consists of 4 convolution blocks, and each convolution block consists of a convolution layer followed by an activation layer;
the feature conversion block T_1 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0; it receives the aggregated output features of the 3rd convolution blocks of the perception branch networks and outputs the convolution parameter k_1 for image enhancement; the feature conversion block T_2 consists of a convolution layer with kernel size 1 × 1, stride 0 and padding 0 whose number of output channels is 25% of the number of input channels; it receives the aggregated output features of the 4th convolution blocks of the perception branch networks and outputs the convolution parameter k_2 for image enhancement, expressed as follows:
wherein Concat(·) denotes concatenation of features along the convolution channel dimension, and T_1 and T_2 are the feature conversion blocks.
6. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the enhancement branch network comprises 2 convolution blocks, 2 variable-parameter convolution blocks D_1, D_2, and an upsampling layer; each convolution block consists of a convolution layer followed by an activation layer, the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the variable-parameter convolution blocks D_1, D_2 each consist of a variable-parameter convolution layer followed by an activation layer, wherein the convolution kernel used in the convolution layer of D_1 is the output k_1 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function; the convolution kernel used in the convolution layer of D_2 is the output k_2 from the local scene perception branch network, with stride 1 and padding 0, and the activation layer adopts the ReLU activation function;
the input of the upsampling layer comes from the variable-parameter convolution block D_2; the upsampling operation is a bilinear interpolation performed on each feature channel, the output size of the upsampling is set to H × W, and the generated illumination feature for image enhancement is denoted f.
7. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the attention module comprises an attention layer and a convolution block;
the input to the attention layer is the image-enhancement illumination feature f output by the enhancement branch network, together with the normalized image X of size H × W; the image X is converted into a grayscale image X_gray, and X_gray is passed through a Gaussian filter to obtain the smoothed grayscale image X_smooth, wherein the Gaussian kernel size of the filter is set to 3; the attention layer computes its output as follows:
wherein the multiplication is an element-wise matrix multiplication with a broadcast mechanism, which multiplies each channel of f element-wise with X_smooth;
the convolution block consists of a convolution layer followed by an activation layer and receives f_A' as input; the convolution layer uses a 3 × 3 kernel with stride 1 and padding 0, the activation layer adopts the tanh activation function, and the output f_A of the convolution block is the image-enhancement illumination coefficient output by the attention module.
8. The local scene perception-based no-reference low-illumination image enhancement method according to claim 3, wherein the iterative enhancement module takes as input the normalized image X of size H × W and the image-enhancement illumination coefficient f_A generated by the attention module; the formula for generating the preliminarily enhanced image is as follows:
the above formula is iterative: n is the number of iterations, and E_n denotes the result of iterating n times with f_A as the illumination coefficient for image enhancement; n = 4 is set by default. For convenience, the preliminarily enhanced image E_4 is denoted E;
9. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein an optimization goal is to minimize a total loss function
wherein L_bri denotes the brightness control loss function and λ_bri its weight; L_col denotes the color enhancement loss function and λ_col its weight; L_glo denotes the global consistency constraint loss function and λ_glo its weight; L_tv denotes the illumination smoothing loss function and λ_tv its weight; and L_noi denotes the denoising loss function and λ_noi its weight;
for the brightness control loss, the preliminarily enhanced image E is divided into N = 16 × 16 = 256 regions of the same size, and the average brightness value of each region is compared with E_b = 0.6, a preset target brightness constant, where || · ||_1 denotes the absolute value operation;
for the color enhancement loss, c_1, c_2 denote color channels of an image and Ω = {(R, G), (R, B), (G, B)} denotes the set of color-channel combinations, where R, G, B are the red, green and blue channels of the RGB color space; the corresponding terms denote the means of the c_1 and c_2 color channels in the preliminarily enhanced image E and, similarly, the means of the c_1 and c_2 color channels in the input image X; || · ||_2 denotes the Euclidean norm; G(·) is defined as follows:
where the arguments of G(·) are the means of the c_1 and c_2 color channels of an arbitrary input image I, || · ||_1 denotes the absolute value operation, and Γ_γ(·) denotes the gamma correction operation, calculated as follows:
where the argument of Γ_γ(·) is the mean of a color channel c ∈ {R, G, B} of an arbitrary input image I, and γ is a preset parameter;
wherein the global consistency constraint loss is the sum of a global maximum consistency constraint loss and a global minimum consistency constraint loss;
for the global maximum consistency constraint loss, the input image X is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; maximum pooling (MaxPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose values are the extracted maxima; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted maxima, and || · ||_1 denotes the absolute value operation;
the global minimum consistency constraint loss makes the darkest regions in the 16 × 16 partitioned space of the image change as uniformly as possible, and is calculated as follows:
where the preliminarily enhanced image E is divided into 16 × 16 regions and the average pixel value of each region is computed, giving a 16 × 16 image of region means; minimum pooling (MinPooling) with pooling size 8 × 8 is then applied, yielding a 2 × 2 image whose pixel values are recorded; similarly, the input image X is divided into 16 × 16 regions, the average pixel value of each region is computed to give a 16 × 16 image of region means, minimum pooling with pooling size 8 × 8 is applied, and the result is a 2 × 2 image whose values are recorded; (r_1, r_2) ∈ {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} are the different combinations of the extracted minima;
wherein the illumination smoothing loss is computed on the image-enhancement illumination coefficient f_A output by the attention module, where c denotes the corresponding color channel taken from the RGB color space, || · ||_2 denotes the Euclidean norm, the gradient operator denotes the first-order differences of f_A for channel c in the vertical and horizontal directions, and || · ||_1 denotes the absolute value operation;
wherein, for the denoising loss, Φ(·) denotes extracting the Conv4-1 layer features with a VGG-16 classification model pre-trained on ImageNet, so Φ(E) denotes the classification features extracted from the preliminarily enhanced image E and the corresponding term denotes the classification features extracted from the final image enhancement result; the gradient operator denotes the first-order differences of the final result and of E in the vertical and horizontal directions; * denotes element-wise multiplication of matrices, e is the base of the natural logarithm, and μ is a parameter that controls the perception of edge intensity.
10. The local scene perception-based no-reference low-illumination image enhancement method according to claim 1, wherein step S4 specifically comprises:
step S41: selecting a random training image X in a training data set;
step S42: inputting the image X; obtaining the preliminarily enhanced image E through the local scene perception branch network, the enhancement branch network, the attention module and the iterative enhancement module; obtaining the image-enhancement illumination coefficient f_A output by the attention module and the final image enhancement result; and computing the total loss function;
Step S43: calculating gradients of all parameters in the local scene perception branch network, the enhancement branch network, the attention module and the iteration enhancement module by using a back propagation method, and updating the parameters by using an Adam optimization method;
and step S44: the above steps constitute one iteration of the training process; the whole training process requires a preset number of iterations, and in each iteration several images are randomly sampled as one batch for training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210960432.1A CN115205160A (en) | 2022-08-11 | 2022-08-11 | No-reference low-illumination image enhancement method based on local scene perception |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210960432.1A CN115205160A (en) | 2022-08-11 | 2022-08-11 | No-reference low-illumination image enhancement method based on local scene perception |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115205160A true CN115205160A (en) | 2022-10-18 |
Family
ID=83585473
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210960432.1A Pending CN115205160A (en) | 2022-08-11 | 2022-08-11 | No-reference low-illumination image enhancement method based on local scene perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205160A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115512251A (en) * | 2022-11-04 | 2022-12-23 | 深圳市瓴鹰智能科技有限公司 | Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement |
CN115861380A (en) * | 2023-02-16 | 2023-03-28 | 深圳市瓴鹰智能科技有限公司 | End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene |
CN116844192A (en) * | 2023-07-19 | 2023-10-03 | 滁州学院 | Enhancement processing method of low-quality fingerprint image |
-
2022
- 2022-08-11 CN CN202210960432.1A patent/CN115205160A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115512251A (en) * | 2022-11-04 | 2022-12-23 | 深圳市瓴鹰智能科技有限公司 | Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement |
CN115861380A (en) * | 2023-02-16 | 2023-03-28 | 深圳市瓴鹰智能科技有限公司 | End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene |
CN116844192A (en) * | 2023-07-19 | 2023-10-03 | 滁州学院 | Enhancement processing method of low-quality fingerprint image |
CN116844192B (en) * | 2023-07-19 | 2024-04-12 | 滁州学院 | Enhancement processing method of low-quality fingerprint image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107767413B (en) | Image depth estimation method based on convolutional neural network | |
CN107123089B (en) | Remote sensing image super-resolution reconstruction method and system based on depth convolution network | |
CN115205160A (en) | No-reference low-illumination image enhancement method based on local scene perception | |
Duan et al. | Tone-mapping high dynamic range images by novel histogram adjustment | |
Panetta et al. | Tmo-net: A parameter-free tone mapping operator using generative adversarial network, and performance benchmarking on large scale hdr dataset | |
CN111105376B (en) | Single-exposure high-dynamic-range image generation method based on double-branch neural network | |
EP1745438A1 (en) | Method for determining image quality | |
CN114862698B (en) | Channel-guided real overexposure image correction method and device | |
CN111047543A (en) | Image enhancement method, device and storage medium | |
US8565513B2 (en) | Image processing method for providing depth information and image processing system using the same | |
CN115641391A (en) | Infrared image colorizing method based on dense residual error and double-flow attention | |
CN103971340A (en) | High-bit-width digital image dynamic range compression and detail enhancement method | |
CN113344773B (en) | Single picture reconstruction HDR method based on multi-level dual feedback | |
Singh et al. | Weighted least squares based detail enhanced exposure fusion | |
CN116385298A (en) | No-reference enhancement method for night image acquisition of unmanned aerial vehicle | |
CN115393225A (en) | Low-illumination image enhancement method based on multilevel feature extraction and fusion | |
CN115272072A (en) | Underwater image super-resolution method based on multi-feature image fusion | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
CN117974459A (en) | Low-illumination image enhancement method integrating physical model and priori | |
CN111369435B (en) | Color image depth up-sampling method and system based on self-adaptive stable model | |
Lee et al. | Ramp distribution-based contrast enhancement techniques and over-contrast measure | |
CN113706400A (en) | Image correction method, image correction device, microscope image correction method, and electronic apparatus | |
CN116912114A (en) | Non-reference low-illumination image enhancement method based on high-order curve iteration | |
CN116630198A (en) | Multi-scale fusion underwater image enhancement method combining self-adaptive gamma correction | |
CN116977190A (en) | Image processing method, apparatus, device, storage medium, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |