CN117152019A - Low-illumination image enhancement method and system based on double-branch feature processing - Google Patents

Low-illumination image enhancement method and system based on double-branch feature processing

Info

Publication number
CN117152019A
CN117152019A CN202311190311.4A
Authority
CN
China
Prior art keywords
feature
enhancement
result
branch
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311190311.4A
Other languages
Chinese (zh)
Inventor
张朝晖
赵雅欣
吴桐
计姗姗
周丙寅
王铮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Normal University
Original Assignee
Hebei Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Normal University filed Critical Hebei Normal University
Priority to CN202311190311.4A priority Critical patent/CN117152019A/en
Publication of CN117152019A publication Critical patent/CN117152019A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a low-illumination image enhancement method and system based on double-branch feature processing, relating to the technical field of deep-learning image processing. The method comprises the following steps: S1, acquiring a low-illumination color image; S2, performing double-branch feature encoding on the low-illumination color image to obtain a local feature encoding result and a global feature encoding result; S3, performing feature fusion on the local feature encoding result and the global feature encoding result; S4, performing feature enhancement on the feature fusion result; and S5, performing feature decoding on the feature enhancement result to obtain an enhanced output image. By adopting the technical scheme of the invention, effective enhancement of a low-illumination color input image can be achieved within a reasonable computational cost range.

Description

Low-illumination image enhancement method and system based on double-branch feature processing
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a low-illumination image enhancement method and system based on double-branch feature processing.
Background
Due to unavoidable limitations of the external imaging environment, such as backlight and weak light, the quality of color digital images is seriously degraded, resulting in a series of problems such as low visibility, low contrast, and obvious noise interference in the imaging result; a color image with such degraded visual effect is a low-illumination image. Low-illumination image enhancement can not only effectively improve the visual quality and interpretability of an image, but also help improve the performance of high-level visual tasks such as target tracking, target detection, target recognition, and semantic segmentation.
Conventional low-illumination image enhancement methods are represented by histogram equalization, which can be classified into global histogram equalization and local histogram equalization according to the image regions considered in the calculation. Global histogram equalization maps the global brightness or color histogram of the whole image onto a uniform, simple mathematical distribution and improves contrast by changing the shape of the global histogram. However, because it does not consider the local brightness (or color) distributions of different regions in the image, its enhancement effect is unsatisfactory for input images with relatively complex content, exhibiting problems such as overexposure of some regions and blurring of local details. Local histogram equalization spatially divides the original image into several sub-blocks according to a certain strategy and equalizes each sub-block to enhance the local information of the image; however, it generally cannot effectively suppress the latent noise in a low-illumination image and may even amplify it.
Compared with conventional methods, current research mainly focuses on deep-learning-based methods. Deep learning models have strong representation-learning capability for image data, obtaining not only low-level features that describe image spatial details but also high-level features that describe semantic information, so low-illumination image enhancement methods based on deep learning show superior performance.
In terms of feature extraction, CNN-based methods are good at extracting local structural features of an image and can effectively fuse multi-scale information; however, to obtain more abstract, globally meaningful information about the input image, the network depth or the local receptive field of the convolution operation must often be increased, which inevitably leads to higher computational cost. Transformer-based methods are good at capturing long-range dependencies among different components or sub-blocks of an image through global self-attention encoding; however, since Transformer self-attention is usually computed over the sequence of image sub-blocks formed after spatial partitioning of the image, the learning and inference of a deep learning model based on a pure Transformer architecture consume enormous memory and computational cost.
With respect to network architectures for image restoration tasks, the U-shaped network is one of the most common; it introduces skip connections on an encoder-decoder basis, wherein: the encoder is composed of a series of encoding modules that perform bottom-up, low-level-to-high-level feature encoding of the input image; the decoder is composed of a series of decoding modules that perform top-down, progressive decoding of the feature maps on the basis of the feature encoding until a predicted output image is generated at the network output. To effectively restore the low-level information of the image during decoding, the encoding result of each encoding module at the encoder end is output laterally through a skip connection and combined with the high-level features of the corresponding decoding module. The encoding/decoding modules of a typical U-shaped network mainly use convolution operations and thus focus on convolution-based local feature processing, making it difficult to effectively exploit the global information of the image; although Uformer builds the encoding and decoding modules of a U-shaped network with pure Transformers, the learning and inference of such a model require enormous computational cost.
Disclosure of Invention
The technical problem that the invention aims to solve is to provide a low-illumination image enhancement method and system based on double-branch feature processing, so as to realize effective enhancement of a low-illumination color image within a reasonable computational cost range.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a low-illumination image enhancement method based on double-branch feature processing comprises the following steps:
s1, acquiring a low-illumination color image;
s2, performing double-branch feature coding on the low-illumination color image to obtain a local feature coding result and a global feature coding result;
s3, carrying out feature fusion on the local feature coding result and the global feature coding result;
s4, carrying out feature enhancement on the feature fusion result;
and S5, performing feature decoding on the feature enhancement result to obtain an enhanced output image.
Preferably, in step S2, a CNN feature encoding branch and a Transformer feature encoding branch are used to perform local feature encoding and global feature encoding of the low-illumination color image, respectively, layer by layer from bottom to top.
Preferably, in step S3, dual-branch feature fusion based on a spatial attention mechanism and a channel attention mechanism is performed on the local feature encoding result and the global feature encoding result, so as to obtain a fusion result in which the advantages of spatial local context and global semantic information are complementary.
Preferably, in step S4, spatial domain local feature enhancement and frequency domain global feature enhancement are performed on the feature fusion result, and the output results of the two enhancement modes are selectively fused, so as to generate a final feature enhancement result.
Preferably, in step S5, the feature enhancement result is decoded from top to bottom by using an enhanced Transformer, so as to obtain an enhanced output image.
The invention also provides a low-illumination image enhancement system based on double-branch feature processing, which comprises:
an image acquisition device for acquiring a low-illuminance color image;
the feature coding device is used for carrying out double-branch feature coding on the low-illumination color image to obtain a local feature coding result and a global feature coding result;
the feature fusion device is used for carrying out feature fusion on the local feature coding result and the global feature coding result;
the feature enhancement device is used for enhancing the features of the feature fusion result;
and the feature decoding device is used for carrying out feature decoding on the feature enhancement result to obtain an enhanced output image.
Preferably, the feature encoding device adopts a CNN feature encoding branch and a Transformer feature encoding branch to perform local feature encoding and global feature encoding of the low-illumination color image, respectively, layer by layer from bottom to top.
Preferably, the feature fusion device performs dual-branch feature fusion based on a spatial attention mechanism and a channel attention mechanism on the local feature encoding result and the global feature encoding result, so as to obtain a fusion result in which the advantages of spatial local context and global semantic information are complementary.
Preferably, the feature enhancement device performs spatial-domain local feature enhancement and frequency-domain global feature enhancement on the feature fusion result and selectively fuses the output results of the two enhancement modes, so as to generate the final feature enhancement result.
Preferably, the feature decoding device performs feature decoding on the feature enhancement result from top to bottom by using an enhanced Transformer, so as to obtain an enhanced output image.
The invention comprehensively utilizes the local and global features, low-level and high-level features, and spatial-domain and frequency-domain information of the low-illumination input image, limits the computational cost to a reasonable range, effectively improves the visual perception and interpretability of the input image, makes content hidden in dark regions of the image clearly visible, and finally obtains a high-quality output result.
Drawings
FIG. 1 is a diagram of a conventional U-shaped network architecture;
FIG. 2 is a schematic diagram of a network model of a low-illumination image enhancement method based on dual-branch feature processing constructed in an embodiment of the present invention;
fig. 3 is a schematic implementation flow diagram of a low-illumination image enhancement method based on dual-branch feature processing according to an embodiment of the present invention;
FIG. 4 is a frequency domain Fourier residual sub-block of the present invention;
fig. 5 is a diagram showing low-luminance image enhancement results based on the dual-branch feature processing in the embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1:
The traditional U-shaped network structure can effectively integrate multi-scale features while utilizing both low-level and high-level feature information, which is beneficial for enhancing low-illumination images. However, the interaction between different kinds of feature information also affects the quality of low-illumination image enhancement, and it is crucial to exploit the advantages of both local and global feature information while achieving a satisfactory enhancement result. The low-illumination image enhancement method based on double-branch feature processing in this embodiment of the invention is designed to address this problem.
Fig. 1 is a conventional U-shaped network structure, the overall design of which is based on an encoder-decoder structure, and an intermediate "jumper connection portion" is used to prevent low-level feature information loss in the decoding process, but the path of single-branch feature coding based on "convolution" in the encoder cannot be used to combine local feature coding and global feature coding, so that the enhancement result obtained by top-down decoding at the decoder end is often not satisfactory.
Referring to the basic structural characteristics of the U-shaped network shown in fig. 1, the embodiment of the present invention constructs the network structure of the low-illumination image enhancement method based on dual-branch feature processing shown in fig. 2, in which: the left part of the network is the encoder, where each feature encoding layer is a dual-branch feature encoding module consisting of a CNN feature encoding branch and a Transformer feature encoding branch, realizing bottom-up local and global feature encoding of the input image; a dual-branch feature fusion module performs feature fusion on the global and local feature encoding results of each encoding module, and the fusion result is output laterally to provide input for the skip-connection part located in the middle region of the network; the dual-branch feature enhancement module at the skip-connection part receives the lateral output of the dual-branch feature fusion module and performs feature enhancement in a combined space-frequency manner, providing richer feature supplements for the decoder on the right side of the network; the decoder builds each decoding layer from a feature decoding module based on an enhanced Transformer and progressively decodes the feature enhancement results from top to bottom, finally producing a high-quality, visually enhanced output image at the output end.
As shown in fig. 3, the low-illumination image enhancement method based on the dual-branch feature processing includes the following steps:
s1, constructing a pairing data set and inputting a low-illumination color image
The paired dataset is composed of a series of image pairs, each consisting of a low-illumination color image X_low captured from a real scene and the normal-illumination image X_norm of the same scene. The LOL dataset, composed of 500 image pairs, is selected as the paired dataset, with 485 image pairs forming the training set and 15 image pairs forming the test set; the images of all pairs are uniformly scaled to 256×256. Each low-illumination image X_low is an input image, and the corresponding matched normal-illumination image X_norm serves as the ground-truth image Y.
S2, performing double-branch feature coding on the low-illumination color image through a double-branch feature coding module to obtain a local feature coding result and a global feature coding result
On the left side of the network structure shown in fig. 2 is an encoder composed of 4 dual-branch feature encoding modules organized in layers, each dual-branch feature encoding module consisting of a CNN-based feature encoding branch and a Transformer-based feature encoding branch. The specific operation of feature encoding using the dual-branch feature encoding module is as follows:
s201, adopting a feature coding branch based on CNN to perform local feature coding on an input feature diagram of the double-branch coding module. The characteristic coding branch firstly sends the received multi-channel input characteristic diagram into the 1 st residual block participating in dense connection, each residual block participating in dense connection carries out local characteristic coding on the received multi-channel characteristic diagram, the coding result is combined with the coding result of other residual blocks participating in dense connection in the front of the residual block in a mode of adding pixel values at corresponding channels and corresponding positions, the combined result is used as the input characteristic diagram of the subsequent residual block, finally the multi-channel input characteristic diagram of the current characteristic coding branch and the output characteristic diagrams of all the residual blocks participating in dense connection in the branch are fused by means of channel connection and cross-channel fusion of 1X 1 convolution, and the fusion result is connected with the multi-channel input characteristic diagram of the current characteristic coding branch in a residual way to be used as the local characteristic coding result of the current CNN-based characteristic coding branch relative to the input characteristic diagram.
S202, a Transformer-based feature encoding branch is adopted to perform global feature encoding on the input feature map of the encoding module. The Transformer module used by this branch performs attention encoding in a hierarchical manner: a recursive quadtree realizes the hierarchical spatial partition of the feature map, with the layer containing the leaf nodes of the quadtree as the bottom layer. Feature encoding then proceeds bottom-up, layer by layer: multi-head self-attention encoding is first performed on the fine sub-blocks within the image region of each leaf node; each higher layer then takes the self-attention results of its child nodes at the lower layer, forms them into a token sequence, and performs multi-head self-attention at that higher layer. This progressive multi-head self-attention encoding, from low layers to high layers and from local to global, effectively captures the global information of the input feature map while reducing computational complexity.
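The quadtree scheme of S202 is only summarized above; as a rough, much-simplified sketch of the bottom-up, local-to-global idea, the following two-level PyTorch module applies self-attention inside non-overlapping windows and then across per-window summary tokens. The two-level flattening and all names are assumptions; the actual module recurses over a full quadtree.

```python
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    """Level 1: MHSA within each window; level 2: MHSA across window summaries."""
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                   # x: (B, C, H, W)
        B, C, H, W = x.shape
        w = self.window
        # split into non-overlapping windows -> one token sequence per window
        tokens = x.unfold(2, w, w).unfold(3, w, w)          # (B, C, H/w, W/w, w, w)
        tokens = tokens.permute(0, 2, 3, 4, 5, 1).reshape(B * (H//w) * (W//w), w * w, C)
        local, _ = self.local_attn(tokens, tokens, tokens)  # level 1: within-window MHSA
        # summarize each window into one token, attend across windows (level 2)
        summary = local.mean(dim=1).reshape(B, (H//w) * (W//w), C)
        ctx, _ = self.global_attn(summary, summary, summary)
        # broadcast the global context back onto the window tokens
        local = local.reshape(B, (H//w) * (W//w), w * w, C) + ctx.unsqueeze(2)
        # fold windows back into a feature map
        local = local.reshape(B, H//w, W//w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return local.reshape(B, C, H, W)
```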
S203, the different dual-branch feature encoding modules are organized into an encoder that performs local feature encoding and global feature encoding on the low-illumination input image layer by layer from bottom to top. Each dual-branch feature encoding module constitutes one feature encoding layer of the encoder, and each layer generates two groups of multi-channel feature maps of the same size, namely the local feature encoding result and the global feature encoding result. The encoder in the network structure shown in fig. 2 contains 4 feature encoding layers: the dual-branch feature encoding module of layer 1 directly receives the low-illumination input image of size H×W×3; the output feature map of the layer-i dual-branch feature encoding module (i = 1, 2, 3) is spatially downsampled and sent as input to the layer-(i+1) module; and the output feature maps of each feature encoding branch in layers 1 through 4 have sizes H×W×64, (H/4)×(W/4)×128, (H/8)×(W/8)×256, and (H/16)×(W/16)×512, respectively.
S3, carrying out feature fusion on the local feature coding result and the global feature coding result through a double-branch feature fusion module
A dual-branch feature fusion module is arranged behind each dual-branch feature encoding module of the network structure shown in fig. 2, and the specific operation of performing feature fusion on the feature encoding result output by the dual-branch feature encoding module by using the dual-branch feature fusion module is as follows:
s301, using a spatial attention mechanism to initially enhance the spatial local feature of the local feature coding result from the dual-branch feature coding module;
s302, using a channel attention mechanism to initially enhance the global feature of the channel on the global feature coding result from the dual-branch feature coding module;
s303, multiplying the local feature coding result of the dual-branch feature coding module by the pixel value of the corresponding channel and the corresponding position of the global feature coding result, connecting the local feature coding result with the channel based on the feature map after the initial enhancement of the two attentions, and finally obtaining the output result of the current dual-branch feature fusion module by means of 3X 3 space convolution to realize the complementary advantages of the space local context and the global semantic information;
s304, the double-branch feature fusion modules are organized according to layers, and comprise 4 layers, wherein the double-branch feature fusion modules positioned on the ith layer (i=1, 2,3, 4) take the output of the double-branch feature coding modules on the same layer as the input, and the sizes of output feature graphs generated by the double-branch feature fusion modules from the 1 st layer to the 4 th layer are respectively as follows: h X W X64, (H/4) X (W/4) X128, (H/8) X (W/8) X256, (H/16) X (W/16) X512.
S4, performing spatial-domain local feature enhancement and frequency-domain global feature enhancement on the output feature map of the dual-branch feature fusion module through the space-frequency-combined dual-branch feature enhancement module, and selectively fusing the two enhancement results
In the network structure shown in fig. 2, the dual-branch feature enhancement module located at the skip-connection part takes the output feature map generated by the dual-branch feature fusion module as input and further enhances its features, so as to provide richer feature supplements for feature decoding at the decoder. The specific operations are as follows:
s401, performing spatial domain local feature enhancement on a multi-channel input feature map based on feature enhancement branches of spatial domain convolution residual sub-blocks, firstly performing spatial local feature extraction on the input feature map by adopting 3X 3 convolution connected in series, and then adding pixel values of corresponding channels and corresponding spatial positions of the local feature extraction result and the multi-channel input feature map by means of residual connection to obtain a spatial domain local feature enhancement result about the multi-channel input feature map;
s402, performing frequency domain global feature enhancement on a multi-channel input feature map based on feature enhancement branches of the frequency domain Fourier residual sub-block, wherein FIG. 4 is a specific frequency domain Fourier residual sub-block proposed by us, the sub-block firstly maps the multi-channel input feature map to a frequency domain from a space domain to a frequency domain by means of forward Fast Fourier Transform (FFT) to obtain a Fourier spectrum map related to each channel feature map, then adopts a 1X 1 convolution mode to realize cross-channel fusion of frequency components at the same frequency point in different channel Fourier spectrum maps in the frequency domain, further realizes output of a frequency domain processing result by Inverse Fast Fourier Transform (IFFT) of the channel-by-channel Fourier spectrum map on the fused result, and finally performs residual connection on the processing result and the input feature map of the Fourier residual sub-block to obtain a frequency domain global feature enhancement result related to the multi-channel input feature map;
s403, selectively fusing a spatial domain local feature enhancement result and a frequency domain global feature enhancement result by adopting a selective fusion branch to realize the advantage complementation of the two enhancement results, firstly synchronously receiving the result of the spatial domain local feature enhancement and the result of the spatial domain local feature enhancement, respectively carrying out channel-by-channel global pooling to obtain the global feature of each channel feature map, then carrying out SoftMax calculation of the corresponding channel on the global feature of the two enhancement results to generate selection vectors reflecting the selective attention to different degrees of the two enhancement results, finally applying the selection vector of each channel to the corresponding channel feature map of the two enhancement results, carrying out selective fusion by means of weighting and summing pixel values by pixel value by corresponding spatial positions of the corresponding channels to realize the advantage complementation of the two enhancement results, finally obtaining the dual-branch feature enhancement result based on air-frequency combination, and ensuring the richness of feature information.
S404, the feature fusion results generated by the dual-branch feature fusion modules of each layer are sent to the dual-branch feature enhancement modules of the same layer for feature enhancement; the dual-branch feature enhancement modules enhance the feature maps output by the layer-1 to layer-4 dual-branch feature fusion modules, and the generated enhanced feature maps have sizes H×W×64, (H/4)×(W/4)×128, (H/8)×(W/8)×256, and (H/16)×(W/16)×512, respectively.
Step S5, decoding the feature enhancement results step by step from top to bottom through the feature decoding modules to obtain an enhanced high-quality output image, and comparing the output image with the ground-truth image to obtain the prediction loss
On the right side of the network structure as shown in fig. 2 is a decoder composed of 4 feature decoding modules organized in layers, each of which constitutes a feature decoding layer. The specific operation of the feature decoding module is as follows:
s501, each feature decoding module is composed of an enhanced transducer, the enhanced transducer is generated by introducing 3×3 spatial convolution in a feedforward layer (LeFF) of ViT (Vision Transformer), and the capability of capturing local context is improved while capturing long-range dependency of different parts of an input feature map, so that the restoration of more image details in the decoding process is facilitated;
s502, different feature decoding modules form a decoder according to tissue, each feature decoding module forms one decoding layer of the decoder, 4 decoding layers are added, and an output feature map generated by each decoding layer is consistent with an input feature map of the decoding layer in size;
s503, the feature decoding module decodes step by step from top to bottom to obtain an enhanced high-quality output image, wherein the processing procedure of step by step decoding is as follows: the highest decoding layer of the decoder is the 4 th layer, firstly, the layer characteristic decoding module directly takes a multi-channel characteristic image generated by the double-branch characteristic enhancement module at the same layer as the input, decodes the multi-channel characteristic image to obtain an output characteristic image consistent with the multi-channel characteristic image, and the output of the characteristic decoding module at the ith layer (i has the values of 4, 3 and 2 in sequence) is subjected to spatial upsampling and then carries out corresponding channel and corresponding spatial position with the characteristic image laterally output by the double-branch characteristic enhancement module at the (i-1) th layerAdding pixel values, sending the obtained combined result as input to a subsequent i-1 layer characteristic decoding module for decoding, and finally compressing and adjusting the output of the decoder 1 layer characteristic decoding module into 3 channels through 1X 1 convolution channels to obtain the low-illumination image enhancement network related to the low-illumination input image X low Is of the enhanced outcome of (2)
S504, the output image $\hat{Y}$ is compared with the ground-truth image Y to obtain the prediction loss of the low-illumination image enhancement network for the low-illumination input image X_low. The prediction loss is a mixed loss consisting of a Charbonnier loss, an SSIM loss, a VGG perceptual loss, and a PSNR loss, computed as follows:

(1) The Charbonnier loss of the output image $\hat{Y}$ relative to the ground-truth image Y is

$$L_{Char}(\hat{Y}, Y) = \sqrt{\lVert \hat{Y} - Y \rVert^2 + \epsilon^2}$$

and measures the pixel-level content difference between $\hat{Y}$ and Y, where Y denotes the normal-illumination ground-truth image corresponding to the input image X_low, and $\epsilon^2$ is a small positive constant term, taken as the empirical value $10^{-6}$ in the experiments.

(2) The SSIM loss of $\hat{Y}$ relative to Y is

$$L_{SSIM}(\hat{Y}, Y) = 1 - SSIM(\hat{Y}, Y)$$

where $SSIM(\hat{Y}, Y)$ measures the structural similarity (Structure Similarity, SSIM) between the two images $\hat{Y}$ and Y, so $L_{SSIM}$ measures the structural distortion of $\hat{Y}$ relative to Y.

(3) The VGG perceptual loss of $\hat{Y}$ relative to Y is

$$L_{VGG}(\hat{Y}, Y) = \sum_{k} \lVert \phi_k(\hat{Y}) - \phi_k(Y) \rVert_1$$

where $\phi_k(Y)$ and $\phi_k(\hat{Y})$ denote the multi-channel feature maps of the ground-truth image Y and the output image $\hat{Y}$ obtained at the k-th stage VGG module of the VGG16 feature-extraction backbone; $L_{VGG}$ measures the difference between $\hat{Y}$ and Y in terms of the semantic perception of image content.

(4) The PSNR (peak signal-to-noise ratio) loss of $\hat{Y}$ relative to Y is computed as

$$L_{PSNR}(\hat{Y}, Y) = -10 \log_{10} \frac{MAX_Y^2}{MSE(\hat{Y}, Y)}$$

where $MAX_Y$ is the maximum pixel value of the ground-truth image Y in the luminance channel ($MAX_Y = 255$ is usually taken for a 256-level luminance channel), and $MSE(\hat{Y}, Y)$ is the pixel-level mean squared error between the corresponding luminance channels of the two images.

(5) The Charbonnier loss, SSIM loss, VGG perceptual loss, and PSNR loss of $\hat{Y}$ relative to Y jointly constitute the final mixed loss:

$$L = \alpha_1 L_{Char} + \alpha_2 L_{SSIM} + \alpha_3 L_{VGG} + \alpha_4 L_{PSNR}$$

The mixed loss is used to guide the learning of the low-illumination image enhancement model, where the mixing coefficients take the empirical values $\alpha_1 = 1$, $\alpha_2 = 1$, $\alpha_3 = 0.001$, $\alpha_4 = 1$.
In order to demonstrate the effectiveness of the proposed method, a low-illumination image enhancement model corresponding to the method is first trained on the LOL training set and then tested on the LOL test set; the test results are compared with the mainstream Retinex-Net image enhancement method, and performance is evaluated using PSNR and SSIM, two evaluation indexes commonly used in the field of image processing.
In the specific experiments, a PyTorch 1.8 deep learning framework is used with one NVIDIA RTX 2080Ti GPU (11 GB of memory). The input image size is set to 256×256, a mini-batch gradient descent method is adopted, and the size of each mini-batch is uniformly set to 8. Model learning uses an AdamW optimizer with parameters β₁ = 0.9, β₂ = 0.999 and weight-decay coefficient λ = 0.02; the total number of training epochs is set to 200, the initial learning rate is set to 2×10⁻⁴, and a cosine decay strategy slowly adjusts the learning rate until it finally falls to 1×10⁻⁶.
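Under the stated hyper-parameters, a training-loop sketch might look as follows; model, train_loader and ssim_fn are assumed to be defined elsewhere, and MixedLoss refers to the loss sketched above.

```python
import torch

# hyper-parameters taken from the text; everything else is illustrative
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4,
                              betas=(0.9, 0.999), weight_decay=0.02)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=200, eta_min=1e-6)   # cosine decay to 1e-6 over 200 epochs
criterion = MixedLoss()

for epoch in range(200):
    for x_low, y_true in train_loader:    # mini-batches of size 8, 256x256 pairs
        optimizer.zero_grad()
        y_pred = model(x_low)
        loss = criterion(y_pred, y_true, ssim_fn)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Test-result performance comparisons on the LOL test set are shown in Table 1: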
TABLE 1

Method         Average PSNR    Average SSIM
Retinex-Net    16.75           0.58
Ours           19.98           0.73
In order to examine the effectiveness of the proposed frequency-domain Fourier residual sub-block within the dual-branch feature enhancement module, an ablation experiment is evaluated on the LOL test set; the quantitative results are given in Table 2. As Table 2 shows, using the space-frequency-combined dual-branch feature enhancement module effectively improves both the PSNR and SSIM values of the low-illumination image enhancement results, which helps obtain a better image enhancement effect.
TABLE 2

Frequency-domain Fourier residual sub-block used in the feature enhancement module    Average PSNR    Average SSIM
No (spatial-domain local feature enhancement only)                                     18.88           0.71
Yes (space-frequency-combined dual-branch feature enhancement)                         19.98           0.73
Fig. 5 shows partial experimental results obtained on the LOL test set by the method of the present invention and by the Retinex-Net method; the results show that the low-illumination image enhancement method provided by the invention obtains enhancement results with a more remarkable visual effect.
Compared with the prior art, the invention has the following beneficial effects:
1. In the embodiment of the invention, a dual-branch feature encoding module is constructed at the encoder end of the U-shaped network structure, and a CNN-based local feature encoding branch and a Transformer-based global feature encoding branch perform feature encoding from bottom to top, which facilitates capturing richer local and global features of the input image at different levels during encoding.
2. The embodiment of the invention provides a double-branch feature fusion module, which adopts a spatial attention enhancement mechanism and a channel attention enhancement mechanism to fuse a local feature coding result and a global feature coding result output by a double-branch feature coding module, so that the fusion result can effectively combine spatial local context information and global semantic information of an input image.
3. The embodiment of the invention provides a new frequency-domain Fourier residual sub-block, which participates in forming the space-frequency-combined feature enhancement module; at the skip-connection part of the U-shaped network, this module performs information enhancement in both the spatial domain and the frequency domain on the output feature map of the dual-branch feature fusion module, generates the final feature enhancement result by selectively fusing the outputs of the two enhancement modes, and provides rich local and global information supplements for the top-down feature decoding of the decoding modules at the decoder end.
4. The embodiment of the invention adopts enhanced Transformers to form the feature decoding modules at the decoder end, which decode the feature enhancement results step by step from top to bottom, capturing the long-range dependencies among different parts of the input feature map while extracting its local information; this facilitates the restoration of more image details during decoding, and finally a high-quality enhanced image is obtained at the output end of the low-illumination image enhancement network.
The embodiment of the invention has the beneficial effects that the brightness and the contrast ratio can be effectively improved aiming at the low-illumination image, and a high-quality enhancement result is obtained.
Example 2:
the embodiment of the invention also provides a low-illumination image enhancement system based on the double-branch feature processing, which comprises the following steps: an image acquisition device, a feature encoding device, a feature fusion device, a feature enhancement device and a feature decoding device.
The image acquisition device is used for constructing a paired dataset of image pairs (X_low, X_norm), each consisting of a low-illumination color image X_low and the normal-illumination image X_norm of the same scene; the paired dataset is divided into a training set for low-illumination image enhancement model learning and a test set for model evaluation, and the low-illumination color image X_low is the input image of the low-illumination image enhancement system;
the feature coding device consists of 4 feature coding layers, each feature coding layer consists of a CNN feature coding branch and a transform feature coding branch, the CNN feature coding branch and the transform feature coding branch respectively perform local feature coding and global feature coding on an input feature diagram of the feature coding layer where the CNN feature coding branch is positioned, a low-illumination color image is firstly sent to the lowest coding layer (layer number i=1) of the feature coding device to perform feature coding, and a coding result generated by the ith feature coding layer (i=1, 2, 3) is sent to the (i+1) th feature coding layer as an input feature diagram after being subjected to spatial downsampling so as to perform further feature coding, thereby realizing layer-by-layer feature coding of the low-illumination color input image from bottom to top;
the feature fusion device adopts a spatial attention and channel attention mechanism to fuse a local feature coding result and a global feature coding result output by the feature coding device so that the fusion result can effectively combine spatial local context information and global semantic information of an input image, wherein an ith (i=1, 2,3, 4) feature fusion device is used for fusing feature coding results generated by an ith feature coding layer in the feature coding device;
the feature enhancement device is used for firstly carrying out spatial domain local feature enhancement based on spatial domain convolution residual sub-block feature enhancement branches and frequency domain global feature enhancement based on frequency domain Fourier residual sub-block feature enhancement branches respectively aiming at an output feature map of the feature fusion device, generating a final feature enhancement result by means of selective fusion of output results of two enhancement modes, and providing rich local and global information supplement for feature decoding of the feature decoding device, wherein the ith (i=1, 2,3, 4) feature enhancement device is used for carrying out feature enhancement on the feature fusion result generated by the ith feature fusion device;
the feature decoding device is composed of 4 feature decoding layers, each feature decoding layer is used for carrying out feature decoding on a feature enhancement result from top to bottom, each feature decoding layer comprises an enhancement type converter, the feature decoding layer (layer number i=4 of the layer) positioned at the top of the feature decoding device firstly receives and decodes an output feature image of the 4 th feature enhancement device, the output result of the i (i is sequentially 4, 3 and 2) feature decoding layer is subjected to spatial up-sampling, then corresponds to a channel with pixel values at corresponding positions to be added with the feature image output by the i-1 feature enhancement device, then is input into the i-1 feature decoding layer to be decoded, finally, the output generated by the 1 st feature decoding layer of the feature decoding device is compressed and adjusted to be 3 channels through a channel with 1×1 convolution, namely the enhancement result finally output by the low-illumination image enhancement system, and the output image is compared with a true image, so that the prediction loss can be obtained.
The above embodiments merely illustrate preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art without departing from the spirit of the present invention all fall within the scope of the present invention as defined by the appended claims.

Claims (10)

1. A low-illumination image enhancement method based on double-branch feature processing, characterized by comprising the following steps:
s1, acquiring a low-illumination color image;
s2, performing double-branch feature coding on the low-illumination color image to obtain a local feature coding result and a global feature coding result;
s3, carrying out feature fusion on the local feature coding result and the global feature coding result;
s4, carrying out feature enhancement on the feature fusion result;
and S5, performing feature decoding on the feature enhancement result to obtain an enhanced output image.
2. The low-illumination image enhancement method based on dual-branch feature processing according to claim 1, wherein in step S2, a CNN feature encoding branch and a Transformer feature encoding branch are adopted to perform the local feature encoding and the global feature encoding of the low-illumination color image, respectively, layer by layer from bottom to top.
3. The low-illumination image enhancement method based on dual-branch feature processing according to claim 2, wherein in step S3, dual-branch feature fusion based on a spatial attention mechanism and a channel attention mechanism is performed on the local feature encoding result and the global feature encoding result, so as to obtain a fusion result in which the advantages of spatial local context and global semantic information are complementary.
4. The method for enhancing a low-illuminance image based on dual-branch feature processing according to claim 3, wherein in step S4, spatial domain local feature enhancement and frequency domain global feature enhancement are performed on the feature fusion result, and the output results of the two enhancement modes are selectively fused to generate a final feature enhancement result.
5. The method of claim 4, wherein in step S5, the feature enhancement result is feature-decoded from top to bottom by using an enhanced Transformer, so as to obtain an enhanced output image.
6. A low-illumination image enhancement system based on dual-branch feature processing, characterized by comprising:
an image acquisition device for acquiring a low-illuminance color image;
the feature coding device is used for carrying out double-branch feature coding on the low-illumination color image to obtain a local feature coding result and a global feature coding result;
the feature fusion device is used for carrying out feature fusion on the local feature coding result and the global feature coding result;
the feature enhancement device is used for enhancing the features of the feature fusion result;
and the feature decoding device is used for carrying out feature decoding on the feature enhancement result to obtain an enhanced output image.
7. The system of claim 6, wherein the feature encoding device adopts a CNN feature encoding branch and a Transformer feature encoding branch to perform the local feature encoding and the global feature encoding of the low-illumination color image, respectively, layer by layer from bottom to top.
8. The low-illumination image enhancement system based on dual-branch feature processing according to claim 7, wherein the feature fusion device performs dual-branch feature fusion based on a spatial attention mechanism and a channel attention mechanism on the local feature encoding result and the global feature encoding result, so as to obtain a fusion result in which the advantages of spatial local context and global semantic information are complementary.
9. The system of claim 8, wherein the feature enhancement device performs spatial domain local feature enhancement and frequency domain global feature enhancement on the feature fusion result and selectively fuses the output results of the two enhancement modes to generate a final feature enhancement result.
10. The low-illumination image enhancement system based on dual-branch feature processing according to claim 9, wherein the feature decoding device uses an enhanced Transformer to perform feature decoding on the feature enhancement result from top to bottom, so as to obtain the enhanced output image.
CN202311190311.4A 2023-09-15 2023-09-15 Low-illumination image enhancement method and system based on double-branch feature processing Pending CN117152019A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311190311.4A CN117152019A (en) 2023-09-15 2023-09-15 Low-illumination image enhancement method and system based on double-branch feature processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311190311.4A CN117152019A (en) 2023-09-15 2023-09-15 Low-illumination image enhancement method and system based on double-branch feature processing

Publications (1)

Publication Number Publication Date
CN117152019A true CN117152019A (en) 2023-12-01

Family

ID=88900631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311190311.4A Pending CN117152019A (en) 2023-09-15 2023-09-15 Low-illumination image enhancement method and system based on double-branch feature processing

Country Status (1)

Country Link
CN (1) CN117152019A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972134A (en) * 2022-05-11 2022-08-30 重庆理工大学 Low-light image enhancement method for extracting and fusing local and global features
CN115187483A (en) * 2022-07-11 2022-10-14 桂林电子科技大学 Low-illumination image enhancement method based on U-Net
CN115880177A (en) * 2022-12-12 2023-03-31 福州大学 Full-resolution low-illumination image enhancement method for aggregating context and enhancing details
CN116309107A (en) * 2022-12-30 2023-06-23 合肥学院 Underwater image enhancement method based on Transformer and generated type countermeasure network
CN115797931A (en) * 2023-02-13 2023-03-14 山东锋士信息技术有限公司 Remote sensing image semantic segmentation method based on double-branch feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Yaxin: "Low-illumination image enhancement combining CNN-Transformer dual-stream feature extraction with convolutional dictionary learning", Wanfang degree-thesis database (Master's thesis, Computer Technology, degree year 2023), 1 September 2023 (2023-09-01), pages 1 - 75 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113939845A (en) * 2019-05-31 2022-01-14 Oppo广东移动通信有限公司 Method, system and computer readable medium for improving image color quality

Similar Documents

Publication Publication Date Title
CN112288658B (en) Underwater image enhancement method based on multi-residual joint learning
CN109447907B (en) Single image enhancement method based on full convolution neural network
CN110599409B (en) Convolutional neural network image denoising method based on multi-scale convolutional groups and parallel
CN110210608B (en) Low-illumination image enhancement method based on attention mechanism and multi-level feature fusion
CN110889895B (en) Face video super-resolution reconstruction method fusing single-frame reconstruction network
CN114677304B (en) Image deblurring algorithm based on knowledge distillation and deep neural network
CN112991227B (en) Weak light image enhancement method and device based on U-net + + network
CN112017116B (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
CN111709900A (en) High dynamic range image reconstruction method based on global feature guidance
CN113284051A (en) Face super-resolution method based on frequency decomposition multi-attention machine system
CN117152019A (en) Low-illumination image enhancement method and system based on double-branch feature processing
CN111127331A (en) Image denoising method based on pixel-level global noise estimation coding and decoding network
CN112270646B (en) Super-resolution enhancement method based on residual dense jump network
CN115170915A (en) Infrared and visible light image fusion method based on end-to-end attention network
CN117710216B (en) Image super-resolution reconstruction method based on variation self-encoder
CN117237190B (en) Lightweight image super-resolution reconstruction system and method for edge mobile equipment
CN114596233A (en) Attention-guiding and multi-scale feature fusion-based low-illumination image enhancement method
CN111524060B (en) System, method, storage medium and device for blurring portrait background in real time
CN113379606B (en) Face super-resolution method based on pre-training generation model
CN117611467A (en) Low-light image enhancement method capable of balancing details and brightness of different areas simultaneously
CN116596788A (en) Multi-stage underwater image enhancement method based on transducer
Wang et al. Prior‐guided multiscale network for single‐image dehazing
CN116309171A (en) Method and device for enhancing monitoring image of power transmission line
CN116433516A (en) Low-illumination image denoising and enhancing method based on attention mechanism
CN114897718A (en) Low-light image enhancement method capable of simultaneously balancing context information and spatial details

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination