CN116452472A - Low-illumination image enhancement method based on semantic knowledge guidance - Google Patents
Low-illumination image enhancement method based on semantic knowledge guidance
- Publication number
- CN116452472A (application CN202310277679.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- semantic
- loss
- image enhancement
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/00 — Image enhancement or restoration
- G06T5/40 — Image enhancement or restoration using histogram techniques
- G06N3/02 — Neural networks
- G06N3/04 — Architecture, e.g. interconnection topology
- G06N3/08 — Learning methods
- G06T7/10 — Segmentation; Edge detection
- G06T7/90 — Determination of colour characteristics
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a low-illumination image enhancement method based on semantic knowledge guidance, belonging to the technical field of low-illumination image enhancement. By introducing semantic information, the method addresses a problem that previous methods have ignored. Moreover, the method can be applied to any image enhancement network with an encoder-decoder structure, allowing models that lack semantically related information to learn more knowledge. By combining a semantic guidance embedding module with a semantic-guided color histogram loss and a semantic-guided adversarial loss, the invention attends to semantics-related knowledge from several different angles, improving the capability of the low-illumination image enhancement network and producing more realistic and natural enhancement results.
Description
Technical Field
The invention belongs to the technical field of low-illumination image enhancement, and particularly relates to a low-illumination image enhancement method based on semantic knowledge guidance.
Background
Due to unavoidable environmental and/or technical limitations, such as insufficient illumination and limited exposure time, images are often taken under suboptimal lighting conditions, disturbed by backlight, non-uniform illumination and low light. The aesthetic quality of such images is compromised, and the information they convey is inadequate for higher-level tasks such as object tracking, recognition and detection. Low-light enhancement (for images whose brightness is at or below a specified value) enjoys wide application in fields including visual surveillance, autonomous driving and computational photography. In particular, smartphone photography has become widespread; limited by the aperture size of the phone camera, real-time processing requirements and memory constraints, taking pictures with a smartphone camera in a dim environment is particularly challenging. In such applications, enhancing low-light images and video is a valuable research area. Conventional low-illumination image enhancement methods include histogram-equalization-based and Retinex-model-based methods, but these adapt poorly to diverse environments, generally have long run times, and their optimal parameters are difficult to obtain. In recent years, driven by progress in deep learning, low-illumination image enhancement based on deep learning has achieved remarkable success.
Current deep-learning-based low-illumination image enhancement methods fall mainly into two types: end-to-end methods and Retinex-based methods. Starting from the classic LLNet, researchers have proposed a variety of end-to-end approaches, including end-to-end parametric filter estimation networks, recurrent neural networks, multi-exposure fusion networks, deeply stacked Laplacian enhancement networks, and wavelet-transform-based enhancement networks. In contrast to enhancement learned directly in an end-to-end network, deep low-illumination enhancement methods built on Retinex theory, owing to its physical interpretability, can generally obtain better results. The first Retinex-based method, Retinex-Net, decomposes the low-light image through a network into an illumination component and a reflectance component, enhances the illumination component, and then fuses the components into a normal-light image. Later researchers proposed KinD based on Retinex-Net, adding enhancement and denoising operations on the reflectance component and improving the enhancement effect. There are also KinD++, enhancement networks based on Retinex and neural architecture search, Retinex-based deep unfolding enhancement networks, and normalizing-flow-based enhancement networks. Notably, all of these methods tend to enhance the low-light image without regard to the semantic information of its different regions; when objects in the low-light image are inherently black, such as black hair and black vehicles, these methods typically enhance those parts to gray, causing color bias.
To solve this problem, the enhancement network must be made to learn semantically relevant information. Researchers have proposed some preliminary schemes, including fusing the predictions of a semantic segmentation network into a Retinex-based network, and constraining the parameter updates of the image enhancement network with the loss function of a semantic segmentation network. Both methods combine semantic and image information through carefully designed networks and training procedures, but they neither fully exploit the information a semantic segmentation network can provide nor account for the differences between semantic information and the original image enhancement task. In the former, the gap between the semantic segmentation result and the intermediate image enhancement features is relatively large, so the original image information is inevitably damaged during fusion; in the latter, two different tasks are constrained directly through a loss, which disturbs the original optimization of the image enhancement network parameters and hence the final enhancement result. In summary, existing schemes cannot properly introduce semantic information into the image enhancement task: the interaction between semantic and image information must be carefully designed, generalization is limited, and the generated normal-light images exhibit abnormal colors and details, degrading both the visual quality of the image and the performance of subsequent image processing tasks.
Disclosure of Invention
The invention provides a low-illumination image enhancement method based on semantic knowledge guidance, which can be used for improving the image enhancement effect of a low-illumination image.
The invention adopts the technical scheme that:
a semantic knowledge-based low-light image enhancement method, the method comprising:
step 1, constructing an image enhancement processing network model;
the image enhancement processing network model comprises two branches, wherein one branch is a semantic segmentation network, the other branch is an image enhancement network, and N (N is more than or equal to 2) semantic embedding modules are arranged between the two branches;
the semantic segmentation network sequentially comprises: a first encoder, a first decoder and a pre-processing head, the first encoder is used for inputting image I l Feature extraction is performed on (low-illumination image) to obtain an input image I l Is a first initial feature map of (a);
the first decoder is used for decoding the first initial feature map in multiple scales to obtain deep feature maps in different scales, namely semantic segmentation features F i The scale number M of the semantic segmentation features is larger than N;
predicting semantic segmentation features F of different scales output by the head to the first decoder i Performing pixel-level semantic category prediction, and outputting an input image I l Semantic predictive diagram I of (1) seg (semantic segmentation result);
selecting N-scale semantic segmentation features of continuous scales from the M-scale deep feature graphs as one of two inputs of each semantic embedding module respectively, and sequentially defining the N semantic embedding modules as 1 st to N semantic embedding modules according to the scale ascending direction;
the image enhancement network includes: a second encoder and a second decoder, wherein the second encoder is used for inputting image I l Feature extraction is carried out on the low-illumination image of (2) to obtain an input image I l Is a second initial feature map of (2); the decoder comprises n+1 convolution blocks, each for upsampling its input to output a different oneScaled image enhancement features F i The output of the last convolution block is the input image I l Is a predictive enhanced image of (1)The input of the 1 st convolution block is a second initial feature map, the output of the 1 st convolution block is used as the other input of the 1 st semantic embedding module, and the output of any i (i=1, …, N) semantic embedding module is used as the input of the i+1st convolution block;
step 2, learning and training network parameters of the image enhancement processing network model based on the training sample, and stopping when a preset training ending condition is met, so as to obtain a trained image enhancement processing network model;
the loss function when training the image enhancement processing network model is set as follows:
wherein,,representing prediction enhanced image +.>And input image I l Is a label image I of (1) h The reconstruction loss between the two is equal to the reconstruction loss,representing semantic guided color histogram loss, i.e. prediction enhancement image +.>Histogram of (2) and label image I h L1 norm loss, lambda between histograms of (2) SCH Representing semantic guided color histogram loss->Weight of->Representing semantic guided fight loss, lambda SA Representing semantic guidance fight loss->Weights of (2);
and step 3, inputting an image to be enhanced that matches the input of the image enhancement processing network model into the trained model, and obtaining the enhancement result of the image to be enhanced from the output of the last convolution block of the image enhancement network.
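The training objective above combines a reconstruction term, a semantic-guided color histogram term, and a semantic-guided adversarial term. A minimal numpy sketch follows; the L1 reconstruction form and the weight values are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def training_loss(pred, label, pred_hists, label_hists, l_sa,
                  lam_sch=0.1, lam_sa=0.01):
    """Total loss L = L_rec + lam_sch*L_SCH + lam_sa*L_SA (sketch).

    L_rec: mean absolute error between predicted and label images.
    L_SCH: summed L1 distance between per-category color histograms.
    l_sa:  precomputed semantic-guided adversarial term.
    """
    l_rec = np.abs(pred - label).mean()
    l_sch = sum(np.abs(p - q).sum() for p, q in zip(pred_hists, label_hists))
    return l_rec + lam_sch * l_sch + lam_sa * l_sa

pred = np.zeros((4, 4, 3))
label = np.full((4, 4, 3), 0.5)
hists = [np.array([0.5, 0.5])]          # identical histograms -> L_SCH = 0
loss = training_loss(pred, label, hists, hists, l_sa=0.0)
```

With identical histograms and no adversarial term, only the reconstruction error contributes to the total.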
Further, in step 2, the semantic-guided adversarial loss L_SA is the sum of a global adversarial loss L_global and a local adversarial loss L_local, obtained as follows:

Based on the semantic prediction map I_seg output by the semantic segmentation network, the predicted enhanced image Î is partitioned into image blocks, each corresponding to one semantic category; P_k denotes an arbitrary k-th image block.

The local adversarial loss L_local is calculated as:

L_local = E_{x_r ∼ p_real}[log D(x_r)] + E_{x_f ∼ p_fake}[log(1 − D(x_f))],
x_f = P_t, D(P_t) = min{D(P_k)},

where G denotes the generator, i.e., the image enhancement processing network model; D denotes the discriminator and D(·) its output; x_r denotes a real image block and p_real the data distribution of real image blocks; x_f denotes a fake image block and p_fake the data distribution of fake image blocks; and E_{x_r ∼ p_real} denotes the mathematical expectation over real image blocks. That is, the image block P_t judged most fake by the discriminator is selected as x_f.

The input of the prediction head of the semantic segmentation network is recorded as the feature map I′_seg. In the channel dimension, the predicted enhanced image Î is concatenated with the feature map I′_seg as a new fake sample x′_f, and the global adversarial loss L_global is calculated as:

L_global = E_{x_r ∼ p_real}[log D(x_r)] + E_{x′_f ∼ p_fake}[log(1 − D(x′_f))],

where E_{x′_f ∼ p_fake} denotes the mathematical expectation over the new fake samples.
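A toy numpy sketch of the local adversarial term: the fake block with the lowest discriminator score is selected as x_f and the stated objective is evaluated. The mean-intensity "discriminator" is purely a stand-in assumption for illustration, not the patent's discriminator network:

```python
import numpy as np

def local_adversarial_loss(real_blocks, fake_blocks, D):
    """Local adversarial objective with the 'most fake' block selected.

    D maps a block to a realness score in (0, 1); the fake block with
    the lowest score is chosen as x_f, per the selection rule above.
    """
    scores = [D(b) for b in fake_blocks]
    t = int(np.argmin(scores))                   # index of the most fake P_t
    loss = np.mean([np.log(D(b)) for b in real_blocks]) \
        + np.log(1.0 - scores[t])
    return loss, t

# toy stand-in discriminator: mean intensity as the realness score
D = lambda block: float(block.mean())
real = [np.full((4, 4), 0.9)]
fake = [np.full((4, 4), v) for v in (0.2, 0.6, 0.8)]
loss, t = local_adversarial_loss(real, fake, D)  # the 0.2 block is chosen
```

Because the dimmest block scores lowest, it is the one the generator is pushed to improve.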
Furthermore, the semantic segmentation network is a pre-trained network and is kept fixed while learning and training the network parameters of the image enhancement processing network model; that is, only the network parameters of the image enhancement network and of the N semantic embedding modules are learned and updated.
Further, the network structure of the semantic embedding module is specifically as follows:

The input semantic segmentation features F_s and image enhancement features F_i each pass through a normalization layer and a convolution layer to obtain a semantic feature map and an image enhancement feature map of consistent dimensions; the two feature maps are flattened in the channel dimension, and the attention map between the two flattened feature maps is computed via a transposed attention mechanism, yielding the semantically related attention map A.

The image enhancement features F_i are adjusted by the semantically related attention map A to obtain the output features F_o of the semantic embedding module:

F_o = FN(W_v(F_i) × A + F_i),

where W_v denotes the weights of the value-embedding convolution layer and FN(·) denotes the output of the feedforward neural network.
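The transposed-attention fusion can be sketched in numpy as below. The 1×1 embedding convolutions are modeled as plain (C, C) matrices, and the final feedforward network FN is omitted; both are simplifying assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_embed(F_i, F_s, W_q, W_k, W_v):
    """Transposed-attention fusion of segmentation and enhancement features.

    F_i, F_s: flattened (HW, C) feature maps; W_q, W_k, W_v: (C, C)
    stand-ins for the query/key/value embedding convolutions. Returns the
    adjusted features and the (C, C) attention map A (FN omitted).
    """
    C = F_i.shape[1]
    A = softmax((F_i @ W_q).T @ (F_s @ W_k) / np.sqrt(C), axis=-1)  # (C, C)
    F_o = (F_i @ W_v) @ A + F_i        # value path adjusted by A, plus residual
    return F_o, A

rng = np.random.default_rng(0)
F_i = rng.standard_normal((16, 4))     # HW = 16, C = 4
F_s = rng.standard_normal((16, 4))
W = np.eye(4)
F_o, A = semantic_embed(F_i, F_s, W, W, W)
```

Attention over the channel dimension keeps A at C × C rather than HW × HW, which is what makes the transposed formulation cheap for large feature maps.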
Further, when calculating the semantic-guided color histogram loss L_SCH, the histogram is estimated in a differentiable manner, specifically:

Based on the semantic prediction map I_seg output by the semantic segmentation network, the predicted enhanced image Î and the label image I_h are each partitioned into image blocks, each corresponding to one semantic category.

The semantic-guided color histograms of the predicted enhanced image Î and of the label image I_h are estimated separately:

Category-edge pixel adjustment is performed on each color channel of each image block to obtain high and low anchor values for each pixel gray value of each image block.

For the same pixel gray value of the same color channel of the same semantic category, the high and low anchor values are multiplied by a preset scaling factor and used as inputs to a Sigmoid activation function; the estimated pixel count of the current gray value for the current semantic category is obtained by accumulating, over all pixels, the difference between the Sigmoid values of the high and low anchors under the scaling factor; the estimated histogram of the current color channel for the current semantic category is obtained from the estimated pixel counts of all gray values; and the semantic-guided color histogram of the predicted enhanced image Î (or of the label image I_h) is obtained from the estimated histograms of all color channels.
The technical scheme provided by the invention has at least the following beneficial effects:

By introducing semantic information, the semantic-knowledge-guided low-illumination image enhancement method addresses a problem that previous methods have ignored. Moreover, the invention can be applied to any image enhancement network with an encoder-decoder structure, so that models lacking semantically related information learn more knowledge. By combining the semantic guidance embedding module with the semantic-guided color histogram loss and the semantic-guided adversarial loss, the invention attends to semantics-related knowledge from several different angles.

The semantic guidance embedding module operates at the feature level: the multi-scale features (semantic segmentation features) extracted by the semantic segmentation network correspond to the multi-scale features in the decoder of the original low-illumination image enhancement network; the deep encoded semantic information is introduced into the image features through similarity calculation and transformed in the feature characterization space, optimizing the output. After the semantic segmentation prediction is obtained, the final enhanced image output by the image enhancement network is divided by category, a color histogram is estimated for each image block, and each is compared with the corresponding histogram of the real label image; this yields more accurate color constraints, lets the network learn color information tied to semantic categories, and preserves the color consistency of the enhancement result. The semantic-guided adversarial loss likewise operates at the output level: the semantic segmentation prediction is used again to combine semantic guidance with global and local adversarial losses. In the local adversarial loss, the most fake image block is found by comparing discriminator outputs over the image blocks, so that the generator (i.e., the image enhancement network) focuses on the fake parts. In the global adversarial loss, the segmentation result and the enhancement result are concatenated before being input to the discriminator, so that the discriminator gives a global judgment with reference to the semantic information; together with the local adversarial loss, this constrains the discriminator and generator, improving the capability of the low-illumination image enhancement network and yielding more realistic and natural enhancement results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a processing procedure of a low-illumination image enhancement method based on semantic knowledge guidance according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a network structure of a semantic-guided semantic embedding module according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
The embodiment of the invention addresses the color deviation and abnormal details that appear in enhanced images for lack of guidance from semantic information. The aim of the embodiment is to determine which semantic information can be utilized by image enhancement, and how this semantic information positively affects the low-illumination image enhancement task. Inside a semantic segmentation network there are many multi-scale features output by the intermediate layers (semantic segmentation features), which have different receptive fields and different characterization capabilities; with them, the intermediate-layer features of the image enhancement network can be optimized in the characterization space. Secondly, the prediction result of the semantic segmentation network can also serve as prior information to guide the image enhancement network in learning semantics-related mappings.
Because the intermediate-layer features of the semantic segmentation network and those of the image enhancement network differ to a certain extent, directly concatenating or multiplying them would degrade the features and harm the image enhancement effect. The method therefore proposes a semantic guidance embedding module that reasonably embeds the semantic features into the image enhancement features by establishing cross-modal interaction information. Secondly, for color optimization, the embodiment of the invention maintains the color consistency of the output image through a color histogram constraint; but as a global statistical feature, the histogram cannot guarantee local consistency, so its capacity to preserve color is limited. The invention therefore proposes the semantic-guided color histogram loss: each region is divided with the help of the semantic segmentation result, and the histogram and its loss are computed per region, constraining the output color characteristics at the semantic level. Finally, current loss functions neither represent the visual quality of the image well nor capture its internal structure, leading to visually poor results. To further improve output quality, researchers have raised image quality through global and local adversarial training, but random selection of local image blocks cannot fully exploit the capability of the local adversarial loss.
Therefore, the method of the invention proposes a semantic-guided adversarial loss: the image blocks corresponding to different categories are obtained from the semantic segmentation result, and the most fake image block is found and used as the local image block for parameter updates, improving the capability of the local loss and the quality of the final output image.
As a possible implementation, the specific process of the semantic-knowledge-guided low-illumination image enhancement method provided by the embodiment of the invention is as follows:

First, the low-illumination image I_l is input into the image enhancement network and the semantic segmentation network; after multi-layer feature interaction, the image enhancement result Î and the semantic segmentation result I_seg are output. The color histogram loss and the adversarial loss are realized under the guidance of the semantic segmentation result to constrain the training of the image enhancement network, as shown in fig. 1.
In the present method, the low-light image enhancement problem under semantic guidance can be described as:

M = F_segment(I_l; θ_s),

where F_segment is a pre-trained semantic segmentation network, M is the semantic prior information obtained from it, I_l is the input low-illumination image, and θ_s are the parameters of the semantic segmentation network. Owing to pre-training on a large-scale dataset, the semantic segmentation network can provide rich and varied semantic prior information, and is referred to in this embodiment as the semantic knowledge base. After the semantic prior information is obtained, it is input into the image enhancement network together with the low-illumination image:

Î = F_enhance(I_l, M; θ_e),

where F_enhance is the low-illumination image enhancement network, θ_e are the parameters of the image enhancement network, and Î is the output normal-light image, i.e., the predicted enhanced image. In this embodiment, only the parameters of the image enhancement network are updated during training, while the semantic segmentation network is fixed:

θ_e* = arg min_{θ_e} L(Î, I_h),

where I_h is the normal-light label image corresponding to I_l, used as the label constraining the image enhancement network.
In order to solve the influence of the difference between semantic segmentation and image enhancement on feature fusion, reasonable interaction is established between a semantic segmentation network and an image enhancement network through the constructed semantic guidance embedding module. In this embodiment, HRNet (High-Resolution Net) is selected as the semantic knowledge base to provide semantic priori information. In HRNet, multi-scale intermediate layer features, output features, and prediction results are used as semantic information in image enhancement tasks. For better explanation, the number of semantic guidance embedding modules in the present embodiment is set to three as shown in fig. 1.
FIG. 2 shows the network structure of each semantic guidance semantic embedding module. The inputs of the module are the corresponding semantic segmentation features and image enhancement features; after entering the module, the features are preprocessed by a convolution layer and layer normalization, transforming both to a consistent dimension, denoted H × W × C. The features are then flattened in the channel dimension, yielding two HW × C feature maps. Based on the transposed attention mechanism, which saves computational resources, the attention map between the two feature maps is computed; the information contained in the semantic segmentation features is thereby fused into the image enhancement features, and the optimized features are output, realizing the feature interaction. The resulting semantically related attention map A is:

A = Softmax(W_q(F_i)^T × W_k(F_s) / √C),

where W_k and W_q are the key-embedding and query-embedding convolution layers, F_i and F_s are the image enhancement features and semantic segmentation features, C is the number of channels, and Softmax is the activation function. The resulting semantically related attention map A ∈ R^{C×C} represents the correlation between F_i and F_s; A is then used to adjust F_i as follows:
F o =FN(W v (F i )×A+F i ),
wherein W is v For value embedding convolution layer, FN is feedforward neural network, F o And guiding the output characteristics of the semantic embedding module for the optimized characteristics, namely the semantics. Therefore, the invention realizes the optimization of the image enhancement features through the semantic segmentation features, so that the image enhancement features pay attention to the semantic related information in the characterization subspace.
That is, the two inputs each pass sequentially through a normalization layer and a convolution layer, the matrix dot product is performed to obtain the attention map, the result then passes through the value-embedding convolution layer, and W_v(F_i) × A is finally added to the image enhancement feature F_i to obtain the output features of the semantic-guided semantic embedding module.
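As a rough illustration of the module described above, the following minimal NumPy sketch models the query/key/value embedding convolutions as plain channel-mixing matrices and omits the feed-forward network FN and layer normalization; the √C scaling and all variable names are assumptions for illustration, not the patented implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_embedding(F_i, F_s, W_q, W_k, W_v):
    """Fuse semantic features F_s into image features F_i.

    F_i, F_s : (HW, C) flattened feature maps.
    W_q, W_k, W_v : (C, C) channel-mixing matrices standing in for the
    query/key/value embedding convolutions (an assumption of this sketch).
    """
    C = F_i.shape[1]
    Q = F_i @ W_q                 # query from image enhancement features
    K = F_s @ W_k                 # key from semantic segmentation features
    # Transposed attention: a C x C map, much cheaper than HW x HW.
    A = softmax(Q.T @ K / np.sqrt(C), axis=-1)
    V = F_i @ W_v                 # value embedding of the image features
    return V @ A + F_i            # residual connection; FN omitted

H, W, C = 8, 8, 4
rng = np.random.default_rng(0)
F_i = rng.standard_normal((H * W, C))
F_s = rng.standard_normal((H * W, C))
Ws = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
F_o = semantic_embedding(F_i, F_s, *Ws)
print(F_o.shape)  # (64, 4)
```

The C × C attention is what makes the transposed variant attractive here: its cost grows with the channel count rather than with the image resolution.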
Color histograms carry important image statistics and are well suited to preserving the color consistency of an image. To optimize colors, the learned color histogram could be combined with the image content via an affinity-matrix method; however, the histogram describes global statistics that differ greatly from local content, and fusing them directly can harm the recovery of detailed textures. Moreover, computing a global histogram ignores the color features of each category, limiting the color optimization capability. Therefore, the embodiment of the invention proposes a semantically guided color histogram loss that realizes local color adjustment, improving the color preservation capability of the image enhancement framework.
First, the embodiment of the present invention uses the semantic segmentation result to partition the image into image blocks, each of which contains content of only one category. The image block generation process is as follows:

P_c = I_out ⊙ I_seg^c,
P = {P_0, P_1, …, P_class},
where the symbol ⊙ represents the matrix dot product, I_out represents the output enhancement result (i.e., the predicted enhanced image), I_seg^c represents the c-th class prediction result output by the semantic segmentation network, P_c represents the c-th category image block, and P represents the group of image blocks. Image blocks of each category are thus obtained.
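The per-category partitioning can be sketched as follows; the hard argmax-to-mask step and the array shapes used are assumptions for illustration rather than the patent's exact procedure:

```python
import numpy as np

def class_tiles(I_out, seg_logits):
    """Split I_out (H, W, 3) into per-class image blocks.

    seg_logits: (H, W, num_classes) semantic segmentation scores.
    Each block P_c = I_out * mask_c keeps only the pixels of class c.
    """
    labels = seg_logits.argmax(axis=-1)            # (H, W) hard labels
    num_classes = seg_logits.shape[-1]
    tiles = []
    for c in range(num_classes):
        mask = (labels == c)[..., None].astype(I_out.dtype)
        tiles.append(I_out * mask)                 # element-wise product
    return tiles

rng = np.random.default_rng(1)
img = rng.random((4, 4, 3))
logits = rng.random((4, 4, 3))
P = class_tiles(img, logits)
# The tiles sum back to the full image since the masks partition the pixels.
print(np.allclose(sum(P), img))  # True
```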
Since the color histogram is a discrete statistical feature, the embodiment of the present invention estimates the histogram in a differentiable manner so that it can be used for model training. Considering errors in the semantic segmentation result, pixels at category edges are ignored during the calculation, reducing the influence of segmentation errors on training; each image block in the group P is adjusted on the basis of this category-edge-pixel adjustment, yielding an adjusted image block group P'. For convenience of explaining the histogram estimation process, the R channel of the adjusted c-th-class image block, P'_c(R), is taken as an example:
where x_j represents the j-th pixel of P'_c(R) and i ∈ [0, 255] represents a pixel gray value. The high anchor value and the low anchor value are computed as features of the current pixel for use in the subsequent calculation, as follows:
where H_c represents the differentiable histogram estimate of P'_c(R) and the estimate of the number of pixels with gray value i is obtained with the scaling factor α, set to 400 in this embodiment. The two anchor values are scaled by α and passed through a Sigmoid activation function, and the difference between the two outputs serves as the contribution of pixel x_j to the pixel-count estimate: the closer x_j is to i, the larger the difference; when x_j exactly equals i, the difference is 1, i.e., one full pixel is contributed. Finally, the l_1 loss is used as the final constraint on the estimated color histogram; the semantically guided color histogram loss is therefore as follows:

L_SCH = Σ_c ||H_c(Î) − H_c(I_h)||_1,
where Î and I_h represent the output image and the real label image, respectively, and H_c(·) represents the histogram estimation process.
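A minimal sketch of such a differentiable histogram follows; since the exact anchor definition is not reproduced in this text, the sketch assumes anchors α(x_j − i ± 0.5), which satisfies the stated property that a pixel with x_j = i contributes approximately one count:

```python
import numpy as np

def sigmoid(x):
    # clip to avoid overflow in exp, since alpha is large
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60.0, 60.0)))

def soft_histogram(pixels, alpha=400.0, bins=256):
    """Differentiable estimate of a gray-value histogram.

    pixels: 1-D array of values in [0, 255].
    For each bin i, every pixel x contributes
    sigmoid(alpha * (x - i + 0.5)) - sigmoid(alpha * (x - i - 0.5)),
    which approaches 1 when x == i and 0 far from i.
    """
    i = np.arange(bins, dtype=np.float64)       # bin gray values
    x = pixels.astype(np.float64)[:, None]      # (N, 1)
    high = sigmoid(alpha * (x - i + 0.5))       # assumed high anchor
    low = sigmoid(alpha * (x - i - 0.5))        # assumed low anchor
    return (high - low).sum(axis=0)             # (bins,)

def histogram_l1(pred, label):
    """l1 constraint between two estimated histograms."""
    return np.abs(soft_histogram(pred) - soft_histogram(label)).sum()

px = np.array([0, 10, 10, 255])
h = soft_histogram(px)
print(round(h[10], 3))  # 2.0: two pixels with gray value 10
```

Because the estimate is built from Sigmoids rather than hard bin counts, gradients can flow from the l1 histogram constraint back to the pixel values during training.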
In the image completion task, global and local discriminators are used to obtain more realistic completion results. In the low-illumination image enhancement task, semantic information is introduced to guide the discriminator to focus on regions of interest. To achieve this goal, the embodiment of the present invention introduces the predicted semantic map I_seg and the image block group P' into the calculation of the global and local adversarial losses, proposing a semantically guided adversarial loss.
For the local adversarial loss, the aforementioned image block group P' is first taken as the set of candidate false image blocks. The candidate false image blocks are then input to the discriminator to obtain the discrimination results (the probability that each candidate block comes from a label image); the image block with the smallest output is regarded as the most false part, and the gradient obtained from this output is selected to update the parameters of the discriminator and the generator, so that the discriminator reasonably uses the semantic prior information to find false target regions. Real image blocks are obtained from the dataset by random cropping, so the local adversarial loss can be described as:
x_f = P_t, D(P_t) = min(D(P_0), …, D(P_class))
where MSE(·) represents the mean square error, P_t represents the selected candidate false image block, x_r represents a real image block, and x_f represents the false image block.
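The selection of the "most false" block can be sketched as follows; treating the discriminator outputs as given scalars and using least-squares (MSE) targets of 1 for real and 0 for false blocks is an assumption consistent with the MSE mentioned above, not the patent's exact formulation:

```python
import numpy as np

def mse(a, b):
    return float(np.mean((np.asarray(a, dtype=float) - b) ** 2))

def local_adversarial_loss(d_scores_fake, d_score_real):
    """Least-squares-style local adversarial loss.

    d_scores_fake: discriminator outputs for the candidate false tiles
    P_0..P_class; only the lowest-scoring (most false) tile P_t is used,
    so gradients flow through that tile alone.
    d_score_real: discriminator output for a randomly cropped real tile.
    """
    t = int(np.argmin(d_scores_fake))        # index of the most false tile
    d_fake = d_scores_fake[t]
    loss_d = mse([d_score_real], 1.0) + mse([d_fake], 0.0)
    loss_g = mse([d_fake], 1.0)              # generator wants P_t judged real
    return t, loss_d, loss_g

scores = [0.9, 0.2, 0.7]                     # D outputs for three class tiles
t, loss_d, loss_g = local_adversarial_loss(scores, d_score_real=0.8)
print(t)  # 1 -> the tile the discriminator finds least realistic
```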
For the global adversarial loss, the embodiment of the present invention adopts a simple design to realize semantic guidance. I_out (the output of the (N+1)-th convolution block) and I'_seg are concatenated in the channel dimension as the new x_f, where I'_seg denotes the output features before the last Softmax activation of the semantic segmentation network, i.e., the output features of the prediction head of the semantic segmentation network. Real images are still sampled randomly, so the final global adversarial loss can be described as:
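Forming the new false sample by channel-wise concatenation can be sketched as below, with assumed shapes for the RGB output and the pre-Softmax segmentation features:

```python
import numpy as np

# Assumed shapes: an 8x8 enhanced RGB output and 5-class pre-Softmax
# segmentation features; both names and sizes are illustrative only.
I_out = np.random.default_rng(2).random((8, 8, 3))
I_seg_prime = np.random.default_rng(3).random((8, 8, 5))

# New false input for the global discriminator: concatenate along channels,
# so the discriminator sees the enhanced image together with semantic cues.
x_f = np.concatenate([I_out, I_seg_prime], axis=-1)
print(x_f.shape)  # (8, 8, 8)
```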
that is, the semantic guidance fight loss of an embodiment of the present invention can be described as:
meanwhile, the embodiment of the invention also defines the original loss function of the enhanced network as(prediction enhanced image->And input image I l Is a label image I of (1) h Reconstruction loss between), in general, the loss function (reconstruction loss) of the enhanced network may be a first order differential loss, a mean square error loss, or a structural similarity loss, etc.
In summary, in the embodiment of the present invention, the loss function for semantic-knowledge-guided low-illumination image enhancement can be described as:

L = L_rec + λ_SCH · L_SCH + λ_SA · L_SA,
where λ_SCH and λ_SA are the weights balancing the loss functions, set to empirical values.
The image enhancement processing network model is trained based on the total loss; when the loss value converges and remains stable, training is stopped, yielding the trained image enhancement processing network model. The enhancement result of an image to be enhanced (a low-illumination image) is then obtained from the output of the trained image enhancement processing network model.
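The overall objective can be sketched as a weighted sum; the λ values below are placeholders rather than the patent's tuned weights:

```python
def total_loss(l_rec, l_sch, l_sa, lam_sch=0.1, lam_sa=0.01):
    """Overall objective: reconstruction plus weighted semantic-guided terms.

    lam_sch, lam_sa are empirical balancing weights (placeholder values).
    """
    return l_rec + lam_sch * l_sch + lam_sa * l_sa

print(round(total_loss(1.0, 2.0, 3.0), 2))  # 1.23
```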
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.
Claims (5)
1. The low-illumination image enhancement method based on semantic knowledge guidance is characterized by comprising the following steps:

step 1, constructing an image enhancement processing network model;

the image enhancement processing network model comprises two branches, one branch being a semantic segmentation network and the other an image enhancement network, with N semantic embedding modules arranged between the two branches, where N ≥ 2;

the semantic segmentation network sequentially comprises a first encoder, a first decoder and a prediction head; the first encoder is used for extracting features from an input image I_l to obtain a first initial feature map of the input image I_l, the input image I_l being a low-illumination image;

the first decoder is used for decoding the first initial feature map at multiple scales to obtain deep feature maps of different scales, namely the semantic segmentation features F_s, the number of scales M of the semantic segmentation features being greater than N;

the prediction head performs pixel-level semantic category prediction on the semantic segmentation features F_s of different scales output by the first decoder, and outputs the semantic prediction map I_seg of the input image I_l;

semantic segmentation features of N consecutive scales are selected from the M-scale deep feature maps as one of the two inputs of each semantic embedding module, and the N semantic embedding modules are sequentially defined as the 1st to N-th semantic embedding modules in the direction of ascending scale;

the image enhancement network comprises a second encoder and a second decoder; the second encoder is used for extracting features from the low-illumination input image I_l to obtain a second initial feature map of the input image I_l; the second decoder comprises N+1 convolution blocks, each convolution block up-samples its input, the convolution blocks output image enhancement features F_i of different scales, and the output of the last convolution block is the predicted enhanced image Î of the input image I_l; the input of the 1st convolution block is the second initial feature map, the output of the 1st convolution block serves as the other input of the 1st semantic embedding module, and the output of any i-th semantic embedding module serves as the input of the (i+1)-th convolution block, where i = 1, …, N;

step 2, learning and training the network parameters of the image enhancement processing network model based on training samples, and stopping when a preset training end condition is met to obtain the trained image enhancement processing network model;

the loss function used when training the image enhancement processing network model is set as:

L = L_rec + λ_SCH · L_SCH + λ_SA · L_SA,

where L_rec represents the reconstruction loss between the predicted enhanced image Î and the label image I_h of the input image I_l; L_SCH represents the semantically guided color histogram loss, i.e., the l1-norm loss between the histogram of the predicted enhanced image Î and the histogram of the label image I_h, and λ_SCH represents the weight of the semantically guided color histogram loss L_SCH; L_SA represents the semantically guided adversarial loss and λ_SA represents the weight of the semantically guided adversarial loss L_SA;

step 3, inputting an image to be enhanced, matching the input of the image enhancement processing network model, into the trained image enhancement processing network model, and obtaining the enhancement result of the image to be enhanced based on the output of the last convolution block of the image enhancement network.
2. The method of claim 1, wherein in step 2 the semantically guided adversarial loss L_SA is the sum of a global adversarial loss L_SA^global and a local adversarial loss L_SA^local, both obtained by introducing a discriminator during training; the local adversarial loss L_SA^local is obtained as follows:

partitioning the predicted enhanced image Î based on the semantic prediction map I_seg output by the semantic segmentation network, each image block corresponding to one semantic category, with P_k denoting an arbitrary k-th image block;

calculating the local adversarial loss L_SA^local with

x_f = P_t, D(P_t) = min{D(P_k)}

where G represents the generator, i.e., the image enhancement processing network model; D represents the discriminator and D(·) represents the output of the discriminator; x_r represents a real image block and p_real represents the data distribution of real image blocks; x_f represents the false image block and p_fake represents the data distribution of false image blocks; and E_{x_r ∼ p_real} represents the mathematical expectation over real image blocks;

recording the input of the prediction head of the semantic segmentation network as the feature map I'_seg; concatenating the predicted enhanced image Î and the feature map I'_seg in the channel dimension as the new false image block x'_f, and calculating the global adversarial loss L_SA^global,

where E_{x'_f ∼ p_fake} represents the mathematical expectation over the new false image blocks.
3. The method of claim 1, wherein the semantic segmentation network is a pre-trained network that remains unchanged during learning training of network parameters of the image enhancement processing network model.
4. The method according to claim 1, wherein the network structure of the semantic embedding module is specifically as follows:

the input semantic segmentation features F_s and image enhancement features F_i each pass through a normalization layer and a convolution layer to obtain a semantic feature map and an image enhancement feature map of consistent dimensions; the semantic feature map and the image enhancement feature map are respectively flattened in the channel dimension, and the attention map between the two flattened feature maps is computed through a transposed attention mechanism to obtain the semantically related attention map A;

the image enhancement features F_i are adjusted by the semantically related attention map A to obtain the output features F_o of the semantic embedding module:

F_o = FN(W_v(F_i) × A + F_i),

where W_v represents the weights of the value-embedding convolution layer and FN(·) represents the output of the feed-forward neural network.
5. The method of claim 1, wherein, when calculating the semantically guided color histogram loss L_SCH, the histogram is estimated in a differentiable manner, specifically comprising:

partitioning the predicted enhanced image Î and the label image I_h respectively based on the semantic prediction map I_seg output by the semantic segmentation network, each image block corresponding to one semantic category;

estimating the semantically guided color histograms of the predicted enhanced image Î and the label image I_h respectively:

performing category-edge-pixel adjustment on each color channel of each image block, and obtaining the high and low anchor values of the pixel gray value for each pixel of each image block;

for the same pixel gray value of the same color channel of the same semantic category, multiplying the high and low anchor values by a preset scaling factor as the input of a Sigmoid activation function; obtaining the pixel-count estimate of the current pixel gray value of the current semantic category from the value, accumulated over all pixel points, of the difference between the Sigmoid activation outputs of the scaled high and low anchor values; obtaining the estimated histogram of the current color channel of the current semantic category from the pixel-count estimates of all pixel gray values; and obtaining the semantically guided color histogram of the predicted enhanced image Î or the label image I_h from the estimated histograms of all color channels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310277679.8A CN116452472A (en) | 2023-03-21 | 2023-03-21 | Low-illumination image enhancement method based on semantic knowledge guidance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452472A true CN116452472A (en) | 2023-07-18 |
Family
ID=87119327
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310277679.8A Pending CN116452472A (en) | 2023-03-21 | 2023-03-21 | Low-illumination image enhancement method based on semantic knowledge guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116452472A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117853348A (en) * | 2024-03-07 | 2024-04-09 | 中国石油大学(华东) | Underwater image enhancement method based on semantic perception |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |