CN114862707A - Multi-scale feature recovery image enhancement method and device and storage medium - Google Patents

Multi-scale feature recovery image enhancement method and device and storage medium

Info

Publication number
CN114862707A
CN114862707A
Authority
CN
China
Prior art keywords
image
layer
convolution
channel
enhanced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210442217.2A
Other languages
Chinese (zh)
Other versions
CN114862707B (en)
Inventor
Zhu Dong (朱冬)
Yang Yi (杨易)
Song Wen (宋雯)
Tang Guomei (唐国梅)
Yang Yan (杨颜)
Zhong Yuanhong (仲元红)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Qiteng Technology Co ltd
Original Assignee
Chongqing Qiteng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Qiteng Technology Co ltd filed Critical Chongqing Qiteng Technology Co ltd
Priority to CN202210442217.2A priority Critical patent/CN114862707B/en
Publication of CN114862707A publication Critical patent/CN114862707A/en
Application granted granted Critical
Publication of CN114862707B publication Critical patent/CN114862707B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T 5/70 — Image enhancement or restoration: Denoising; Smoothing
    • G06N 3/045 — Neural networks; Architecture: Combinations of networks
    • G06N 3/048 — Neural networks; Architecture: Activation functions
    • G06N 3/08 — Neural networks: Learning methods
    • G06T 5/77 — Image enhancement or restoration: Retouching; Inpainting; Scratch removal
    • G06T 2207/20081 — Special algorithmic details: Training; Learning
    • G06T 2207/20084 — Special algorithmic details: Artificial neural networks [ANN]
    • Y02T 10/40 — Climate change mitigation technologies related to transportation: Engine management systems


Abstract

The invention provides a multi-scale feature recovery image enhancement method, device, and storage medium. The method comprises the following steps: acquiring an image to be enhanced; and inputting the image to be enhanced into a multi-scale feature recovery image enhancement model to obtain an enhanced image. A Y-channel enhancement module of the model processes the Y-channel image to obtain a Y-channel enhanced image; a synthesis module combines the Y-channel enhanced image with the U-channel and V-channel images and converts the composite image to RGB form; and a U-net multi-scale feature recovery module processes the RGB composite image to obtain the enhanced image. The Y-channel enhancement module raises the brightness of the low-light image and recovers the principal features hidden in the dark, while the U-net multi-scale feature recovery module attends to detail features at different positions in the image, further mining and recovering effective feature information and benefiting subsequent computer vision tasks.

Description

Multi-scale feature recovery image enhancement method and device and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a multi-scale feature recovery image enhancement method and device and a storage medium.
Background
Bright images taken under evenly distributed illumination have distinct features and contain rich detail. When ambient light is insufficient or exposure is abnormal, people capture poorly lit pictures, often called low-light pictures. These dark pictures not only impair human viewing but also severely hinder subsequent computer vision tasks. With the rapid development of computer technology, more and more people use electronic devices to store and process digital images, which makes computer vision tasks increasingly important. Low-light image enhancement is usually a key link in preprocessing and is significant for improving the efficiency and performance of digital image processing.
Conventional low-light image enhancement generally relies on histogram equalization (HE), Retinex-model methods, dehazing-model methods, and the like. HE has the disadvantage that merging and stretching gray levels may cause color shifts and loss of image detail. Retinex-based methods can harm the robustness of the image and may even cause significant color distortion in particular scenes. Dehazing-based low-light enhancement lacks the support of a physical model, the brightness and contrast of its final enhanced images are mediocre, and its enhancement quality needs improvement. Furthermore, most existing methods focus only on the viewer's visual enjoyment of the final result and ignore key detail information, such as the image features required by subsequent computer vision tasks. Features matter to an image: they strengthen its robustness and improve the efficiency of subsequent target segmentation and image recognition.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and provides a multi-scale feature recovery image enhancement method, device, and storage medium.
In order to achieve the above object of the present invention, according to a first aspect of the present invention, there is provided a multi-scale feature recovery image enhancement method including: acquiring an image to be enhanced; inputting the image to be enhanced into a multi-scale feature recovery image enhancement model for processing to obtain an enhanced image; the multi-scale feature recovery image enhancement model comprises a decomposition module, a Y channel enhancement module, a synthesis module and a U-net multi-scale feature recovery module; the decomposition module decomposes an image to be enhanced into a Y-channel image, a U-channel image and a V-channel image; the Y-channel enhancement module processes the Y-channel image to obtain a Y-channel enhanced image; the synthesis module is used for synthesizing the Y-channel enhanced image, the U-channel image and the V-channel image and converting the synthesized image into an RGB form; and the U-net multi-scale feature recovery module processes the composite image in the RGB form to obtain an enhanced image.
The beneficial effects of this scheme are as follows: the image to be enhanced is preferably, but not limited to, a low-light image of low brightness. The multi-scale feature recovery image enhancement model is end-to-end. In YUV space the Y, U, and V channels are mutually independent, the Y channel carries luminance, and the difference between a low-light image and a normally lit image lies mainly in brightness; the Y-channel enhancement module therefore brightens the low-light image and recovers the principal features hidden in the dark, while the U-net multi-scale feature recovery module attends to detail features at different positions in the image, further mining and recovering effective feature information. The model thus balances the naturalness of global enhancement with the accuracy of local feature recovery, benefiting subsequent computer vision tasks.
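As an illustration of the decomposition and synthesis steps, the sketch below converts between RGB and YUV. The patent does not fix a particular YUV variant, so the BT.601 coefficients and the function names here are assumptions.

```python
import numpy as np

def rgb_to_yuv(rgb):
    """Decomposition step: rgb is a float array in [0, 1] of shape (H, W, 3);
    returns the Y, U, and V channel images (BT.601 coefficients assumed)."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    yuv = rgb @ m.T
    return yuv[..., 0], yuv[..., 1], yuv[..., 2]

def yuv_to_rgb(y, u, v):
    """Synthesis step: recombine the enhanced Y channel with the original
    U and V channels and convert the composite image back to RGB form."""
    yuv = np.stack([y, u, v], axis=-1)
    m_inv = np.array([[1.0,  0.000,  1.140],
                      [1.0, -0.395, -0.581],
                      [1.0,  2.032,  0.000]])
    return np.clip(yuv @ m_inv.T, 0.0, 1.0)
```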
In a preferred embodiment of the present invention, after the image to be enhanced is acquired, a step of performing denoising processing on the image to be enhanced is further included.
The beneficial effects of this scheme are as follows: noise in the image to be enhanced is removed, reducing the interference and uncertainty that noise brings to subsequent processing and transmission.
In a preferred embodiment of the present invention, the Y-channel enhancement module includes a first convolution layer, a first active layer, a second convolution layer, a second active layer, a third convolution layer, a third active layer, a fourth convolution layer, and a fourth active layer sequentially connected to each other.
The beneficial effects of this scheme are as follows: Y-channel image enhancement is performed by a four-layer convolutional network with a simple structure.
In a preferred embodiment of the present invention, the convolution kernel size of the first convolution layer is 9 × 9, the activation function of the first active layer is LRelu, the convolution kernel size of the second convolution layer is 1 × 1, the activation function of the second active layer is Relu, the convolution kernel sizes of the third convolution layer and the fourth convolution layer are 5 × 5, and both the activation functions of the third active layer and the fourth active layer are Relu.
In a preferred embodiment of the present invention, the U-net multi-scale feature recovery module includes a first downsampling module, a second downsampling module, a third downsampling module, and a fourth downsampling module which are sequentially connected, a fourth upsampling module, a third upsampling module, a second upsampling module, and a first upsampling module which are sequentially connected, and a first convolution splicing link, a second convolution splicing link, and a third convolution splicing link; the first downsampling module comprises a first convolution layer and a first max-pooling layer which are sequentially connected, the second downsampling module comprises a second convolution layer and a second max-pooling layer which are sequentially connected, the third downsampling module comprises a third convolution layer and a third max-pooling layer which are sequentially connected, and the fourth downsampling module comprises a fourth convolution layer; the fourth upsampling module comprises a fourth deconvolution layer whose input end is connected with the output end of the fourth convolution layer, the third upsampling module comprises a third convolution merge layer and a third deconvolution layer which are sequentially connected, the second upsampling module comprises a second convolution merge layer and a second deconvolution layer which are sequentially connected, and the first upsampling module comprises a first convolution merge layer and a first deconvolution layer which are sequentially connected; the first convolution splicing link connects the first convolution layer and the first convolution merge layer; the second convolution splicing link connects the second convolution layer and the second convolution merge layer; and the third convolution splicing link connects the third convolution layer and the third convolution merge layer.
The beneficial effects of this scheme are as follows: the U-net multi-scale feature recovery module covers both high-level and low-level receptive fields, strengthening the semantic representation of the image while enriching its spatial feature details.
In a preferred embodiment of the present invention, the training process of the multi-scale feature recovery image enhancement model is as follows: acquire normal-illumination images, reduce their brightness to obtain corresponding low-light images, and construct a training set with the low-light images as training samples; construct the network structure of the multi-scale feature recovery image enhancement model; and train the network structure with the training set, using a VGG-16 network during training to compute the perceptual loss between the normal-illumination image corresponding to each training sample and the enhanced image output by the network, taking the perceptual loss as part of the loss function, and continuously adjusting the network parameters of the model according to the loss value.
The beneficial effects of this scheme are as follows: generating low-light images from normal-illumination images as training samples lets the network learn the mapping between normal and low-light images well; during training, the VGG-16 network measures the perceptual loss between the normal-illumination image corresponding to each training sample and the enhanced image output by the network, and adjusting the network parameters to reduce this loss leads the model to better enhancement results.
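A minimal sketch of the training-pair synthesis described above, darkening a normally lit image to create its low-light counterpart. The patent does not specify the darkening recipe, so the random gamma-plus-scaling model and its parameter ranges below are assumptions.

```python
import numpy as np

def make_low_light(normal_img, rng=None):
    """Create a synthetic low-light training sample from a normal-illumination
    image (uint8, shape HxWx3). Gamma and scale ranges are illustrative."""
    rng = rng or np.random.default_rng()
    gamma = rng.uniform(2.0, 5.0)   # stronger gamma darkens shadows more
    scale = rng.uniform(0.3, 0.8)   # global brightness reduction
    img = normal_img.astype(np.float32) / 255.0
    low = scale * np.power(img, gamma)
    return (np.clip(low, 0.0, 1.0) * 255.0).astype(np.uint8)
```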
In a preferred embodiment of the invention, the loss function $L_{total}$ is:

$$L_{total} = \frac{1}{N}\sum_{i=1}^{N}\left(Y'_{di} - Y_i\right)^2 + \frac{1}{N}\sum_{i=1}^{N}\left(I'_i - I_i\right)^2 + L_{PL}$$

where $i$ denotes the pixel index, $1 \leq i \leq N$, and $N$ denotes the number of pixels of the image to be enhanced; $Y'_{di}$ denotes the value of the $i$-th pixel of the Y-channel enhanced image; $Y_i$ denotes the value of the $i$-th pixel of the Y-channel image of the normal-illumination image corresponding to the training sample; $I'_i$ denotes the value of the $i$-th pixel of the enhanced image output by the network for the training sample; $I_i$ denotes the value of the $i$-th pixel of the normal-illumination image corresponding to the training sample; and $L_{PL}$ denotes the perceptual loss.
The beneficial effects of this scheme are as follows: the pixel difference computed by the first term of the loss function guarantees the degree of brightness recovery, and the pixel difference computed by the second term guarantees the naturalness of the overall color. The third term, the perceptual loss, uses a VGG-16 network to compare the output enhanced image with the corresponding normal-illumination image over feature information at different scales, improving the clarity of the output enhanced image. This addresses a problem of the prior art: comparing only the first two minimum-mean-square-error terms at the pixel level can lose high-order image information at the feature level, blurring the restored or reconstructed result image and hindering subsequent computer vision tasks.
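A sketch of $L_{total}$ in PyTorch, following the reconstruction above; the mean-squared-error form of the first two terms follows the text's reference to minimum mean square error, and `perceptual_loss` stands in for the VGG-16 term detailed later.

```python
import torch.nn.functional as F

def total_loss(y_enhanced, y_target, out_rgb, target_rgb, perceptual_loss):
    l_y = F.mse_loss(y_enhanced, y_target)       # term 1: Y-channel brightness recovery
    l_rgb = F.mse_loss(out_rgb, target_rgb)      # term 2: overall color naturalness
    l_pl = perceptual_loss(out_rgb, target_rgb)  # term 3: VGG-16 perceptual loss
    return l_y + l_rgb + l_pl
```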
In a preferred embodiment of the invention, the perceptual loss $L_{PL}$ is:

$$L_{PL} = \sum_{j=1}^{4} \lambda_j \Phi_j(I', I), \qquad \Phi_j(I', I) = \frac{1}{C_j H_j W_j}\left\|\phi_j(I') - \phi_j(I)\right\|_2^2$$

where $I'$ denotes the enhanced image output by the multi-scale feature recovery image enhancement model for the training sample; $I$ denotes the normal-illumination image corresponding to the training sample; $\lambda_1$ through $\lambda_4$ denote the first through fourth weights; $\Phi_1(I', I)$ through $\Phi_4(I', I)$ denote the differences between the features of image $I'$ and the features of image $I$ extracted by convolutional layers conv1_2, conv2_2, conv3_2, and conv4_2 of the VGG-16 network, respectively; $j$ denotes the layer index of the VGG-16 network, $j = 1, 2, 3, 4$; $\phi_j(I')$ denotes the features of image $I'$ extracted at the $j$-th selected layer of the VGG-16 network; $\phi_j(I)$ denotes the features of image $I$ extracted at the $j$-th selected layer; $\|\cdot\|_2$ denotes the L2 norm; and $C_j$, $H_j$, and $W_j$ denote the channel count, height, and width of the features extracted at the $j$-th selected layer.
The beneficial effects of this scheme are as follows: the perceptual loss makes full use of the convolutional features of the neural network and preserves feature information such as color and edges across the multi-scale space of the image, so that the final enhanced result image better matches human image cognition at the visual and conceptual levels.
To achieve the above object, according to a second aspect of the present invention, there is provided a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the multi-scale feature restoration image enhancement method according to the first aspect of the present invention.
In order to achieve the above object of the present invention, according to a third aspect of the present invention, there is provided a multi-scale feature recovery image enhancement device including: the image acquisition module is used for acquiring an image to be enhanced; the multi-scale feature recovery image enhancement model module is used for processing an image to be enhanced to obtain an enhanced image; the multi-scale feature recovery image enhancement model module comprises a decomposition module, a Y channel enhancement module, a synthesis module and a U-net multi-scale feature recovery module; the decomposition module decomposes an image to be enhanced into a Y-channel image, a U-channel image and a V-channel image; the Y-channel enhancement module processes the Y-channel image to obtain a Y-channel enhanced image; the synthesis module is used for synthesizing the Y-channel enhanced image, the U-channel image and the V-channel image and converting the synthesized image into an RGB form; and the U-net multi-scale feature recovery module processes the composite image in the RGB form to obtain an enhanced image.
Drawings
FIG. 1 is a schematic flow chart of the overall method for enhancing the multi-scale feature recovery image according to a preferred embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a multi-scale feature recovery image enhancement model in a preferred embodiment of the present invention;
FIG. 3 is a schematic flow chart of an image enhancement algorithm according to a preferred embodiment of the present invention;
FIG. 4 is a comparison graph of the enhancement effect of a real low-light image in an application scene according to the present invention;
FIG. 5 is a comparison of enhancement results for target detection in an application scenario of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The invention discloses a multi-scale feature recovery image enhancement method which, in a preferred embodiment, comprises the following steps, as shown in the general flow chart of FIG. 1:
step S1, an image to be enhanced is acquired. The image to be enhanced is preferably but not limited to dark pictures shot in real life due to lack of ambient light or abnormal exposure caused by light source protrusion, and the like, and the pictures are called low-light images or low-light pictures, which not only affect the visual perception of human beings, but also greatly hinder subsequent computer vision tasks. The image to be enhanced may also be a picture that requires more detailed feature recovery although the lighting is normal.
In this embodiment, preferably, after the image to be enhanced is acquired, the method further includes denoising the image to be enhanced. Extensive experiments show that the noise in real low-light images is Gaussian, and the noise points usually deviate somewhat from the true pixel values, so they interfere with both human visual perception and computer vision tasks. Adding a denoising module before network training therefore reduces the data-processing burden, facilitates the image enhancement experiments, and helps the network learn the genuine, effective features in the image. Bilinear filtering and median filtering may be chosen for denoising, but these algorithms rely only on local information of the image, which weakens edges and smooths away the original feature information. It is further preferred to denoise with the Non-Local Means (NL-M) algorithm. NL-M fully exploits the redundant information ubiquitous in images: it finds neighborhoods structurally similar to that of the current pixel within a specified range and replaces the pixel with a weighted-average estimate over the pixels of those regions. Because the algorithm searches and computes over the whole image in units of image blocks, it trades a larger amount of computation for a more precise estimate, and so preserves the detail-feature information of the image to the greatest extent while denoising.
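For reference, OpenCV ships a non-local-means implementation for color images; the call below is a sketch, with the input path, filter strengths, and window sizes chosen for illustration rather than taken from the patent.

```python
import cv2

img = cv2.imread("low_light.png")  # hypothetical input path
denoised = cv2.fastNlMeansDenoisingColored(
    img, None,
    h=10, hColor=10,        # luminance / color filter strength
    templateWindowSize=7,   # size of the compared image blocks
    searchWindowSize=21)    # range searched for similar neighborhoods
```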
Step S2: input the image to be enhanced into the multi-scale feature recovery image enhancement model to obtain the enhanced image. As shown in FIG. 2, the model comprises a decomposition module, a Y-channel enhancement module, a synthesis module, and a U-net multi-scale feature recovery module. The decomposition module decomposes the image to be enhanced into a Y-channel image, a U-channel image, and a V-channel image; the Y-channel enhancement module processes the Y-channel image to obtain a Y-channel enhanced image; the synthesis module combines the Y-channel enhanced image with the U-channel and V-channel images and converts the composite image to RGB form; and the U-net multi-scale feature recovery module processes the RGB composite image to obtain the enhanced image. Further preferably, the model also comprises an output layer whose input end is connected with the output end of the U-net multi-scale feature recovery module, through which the enhanced image is output; the output layer is preferably, but not limited to, a convolutional layer with a 3 × 3 kernel. As FIG. 3 shows, the network structure of the model is first a tight stack of convolution layers and activation layers, followed by a U-net-based multi-scale feature recovery network; together they learn the luminance mapping from low light to normal illumination and finally produce the enhancement result image.
In this embodiment, as shown in fig. 3, the Y-channel enhancement module preferably includes a first convolution layer, a first active layer, a second convolution layer, a second active layer, a third convolution layer, a third active layer, a fourth convolution layer, and a fourth active layer connected in sequence. Further preferably, the convolution kernel size of the first convolution layer is 9 × 9, the activation function of the first active layer is LRelu, and the number of feature maps output by the first active layer is 64; the convolution kernel size of the second convolution layer is 1 multiplied by 1, the activation function of the second activation layer is Relu, and the number of feature maps output by the second activation layer is 32; the convolution kernel sizes of the third convolution layer and the fourth convolution layer are 5 × 5, and the activation function of the third activation layer and the activation function of the fourth activation layer are both Relu. The number of feature maps output by the third active layer is 16. The number of feature maps output by the fourth active layer is 1. This part completes the preliminary brightness enhancement of the low-light image on the Y channel, which can reduce the training difficulty of the subsequent U-net network.
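A minimal PyTorch sketch of this four-layer Y-channel enhancement module. The kernel sizes, activations, and feature-map counts follow the text; the paddings (chosen so the Y map keeps its spatial size) and the LeakyReLU slope are assumptions.

```python
import torch.nn as nn

class YChannelEnhance(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.LeakyReLU(0.2),  # 9x9 -> 64 maps, LRelu
            nn.Conv2d(64, 32, kernel_size=1),            nn.ReLU(),         # 1x1 -> 32 maps
            nn.Conv2d(32, 16, kernel_size=5, padding=2), nn.ReLU(),         # 5x5 -> 16 maps
            nn.Conv2d(16, 1,  kernel_size=5, padding=2), nn.ReLU(),         # 5x5 -> 1 map
        )

    def forward(self, y):  # y: (B, 1, H, W) luminance channel
        return self.net(y)
```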
In this embodiment, preferably, as shown in FIG. 3, the U-net multi-scale feature recovery module includes a first downsampling module, a second downsampling module, a third downsampling module, and a fourth downsampling module connected in sequence, a fourth upsampling module, a third upsampling module, a second upsampling module, and a first upsampling module connected in sequence, and a first convolution splicing link, a second convolution splicing link, and a third convolution splicing link connecting the two halves; the first downsampling module comprises a first convolution layer and a first max-pooling layer connected in sequence, the second downsampling module comprises a second convolution layer and a second max-pooling layer connected in sequence, the third downsampling module comprises a third convolution layer and a third max-pooling layer connected in sequence, and the fourth downsampling module comprises a fourth convolution layer; the fourth upsampling module comprises a fourth deconvolution layer whose input end is connected with the output end of the fourth convolution layer, the third upsampling module comprises a third convolution merge layer and a third deconvolution layer connected in sequence, the second upsampling module comprises a second convolution merge layer and a second deconvolution layer connected in sequence, and the first upsampling module comprises a first convolution merge layer and a first deconvolution layer connected in sequence; the first convolution splicing link connects the first convolution layer and the first convolution merge layer; the second convolution splicing link connects the second convolution layer and the second convolution merge layer; and the third convolution splicing link connects the third convolution layer and the third convolution merge layer.
In this embodiment, the convolution kernels of the first, second, third, and fourth convolution layers are 3 × 3, and the pooling windows of the first, second, and third max-pooling layers are 2 × 2. A convolution layer with a 3 × 3 kernel followed by a max-pooling layer with a 2 × 2 window completes one downsampling operation, yielding a feature-map output at a specific scale. Downsampling is then performed twice more in the same way, with identically sized kernels and max-pooling layers, so that image features at different scales are retained. Finally, one more convolution layer with a 3 × 3 kernel completes the feature-extraction part. Over the whole process, the number of feature maps output after the first downsampling is 64, rising in turn to 128, 256, and 512, for a total of four feature scales counting the original image.
In this embodiment, the second half of the U structure is the feature-mapping and fusion part. Its specific structure is a deconvolution layer followed by a splice-and-fuse step, forming one feature mapping; it is worth noting that the kernel size of a deconvolution layer is not fixed the way a downsampling layer's is, but is closely tied to the input and output shapes of that layer. Feature mapping is then performed twice more in the same way, with the kernel of each layer determined by the same strategy, and each mapping is spliced through the corresponding convolution splicing link with the multi-scale feature map obtained in the feature-extraction stage. Finally, a single convolution layer (the output layer) with a 3 × 3 kernel completes the output of the enhancement result image. Over the whole process, the number of feature maps obtained after the first fusion-splicing step is 256, falling in turn to 128 and 64, and the 3 feature maps output by the final convolution represent the three channels of the image, as in the sketch below.
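A minimal PyTorch sketch of the U-net multi-scale feature recovery module described above. The encoder widths (64/128/256/512), decoder widths (256/128/64), and the final 3 × 3 output convolution follow the text; the stride-2 transposed-convolution settings and the in-block activations are assumptions, and spatial sizes are assumed divisible by 8.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # 3x3 convolution + ReLU; the activation choice is an assumption
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class UNetRestore(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = conv_block(3, 64)
        self.down2 = conv_block(64, 128)
        self.down3 = conv_block(128, 256)
        self.pool = nn.MaxPool2d(2)          # 2x2 max-pooling window
        self.bottom = conv_block(256, 512)   # fourth downsampling module (conv only)
        self.up4 = nn.ConvTranspose2d(512, 256, 2, stride=2)  # fourth deconvolution
        self.dec3 = conv_block(512, 256)     # third convolution merge layer
        self.up3 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)     # second convolution merge layer
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)      # first convolution merge layer
        self.out = nn.Conv2d(64, 3, 3, padding=1)  # final 3x3 output layer

    def forward(self, x):
        d1 = self.down1(x)               # 64-channel features (splice link 1)
        d2 = self.down2(self.pool(d1))   # 128-channel features (splice link 2)
        d3 = self.down3(self.pool(d2))   # 256-channel features (splice link 3)
        b = self.bottom(self.pool(d3))   # 512-channel bottleneck
        u3 = self.dec3(torch.cat([self.up4(b), d3], dim=1))   # fuse -> 256 maps
        u2 = self.dec2(torch.cat([self.up3(u3), d2], dim=1))  # fuse -> 128 maps
        u1 = self.dec1(torch.cat([self.up2(u2), d1], dim=1))  # fuse -> 64 maps
        return self.out(u1)              # 3 maps: the image's three channels
```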
In this embodiment, preferably, the training process of the multi-scale feature recovery image enhancement model is as follows:
step A, acquiring a normal illumination image, reducing the brightness of the normal illumination image to obtain a low-light image corresponding to the normal illumination image, and constructing a training set by taking the low-light image as a training sample;
b, constructing a network structure of the multi-scale feature recovery image enhancement model, specifically comprising a Y-channel enhancement module network and a U-net multi-scale feature recovery module network;
and step C, training the network structure of the multi-scale feature recovery image enhancement model by using the training set, solving the perception loss of the normal illumination image corresponding to the training sample and the enhanced image output by the network by using the VGG-16 network in the training, taking the perception loss as a part of a loss function, and continuously adjusting the network structure parameters of the multi-scale feature recovery image enhancement model according to the loss function result in the training.
In this embodiment, it is further preferred that the loss function $L_{total}$ is:

$$L_{total} = \frac{1}{N}\sum_{i=1}^{N}\left(Y'_{di} - Y_i\right)^2 + \frac{1}{N}\sum_{i=1}^{N}\left(I'_i - I_i\right)^2 + L_{PL}$$

where $i$ denotes the pixel index, $1 \leq i \leq N$, and $N$ denotes the number of pixels of the image to be enhanced; $Y'_{di}$ denotes the value of the $i$-th pixel of the Y-channel enhanced image; $Y_i$ denotes the value of the $i$-th pixel of the Y-channel image of the normal-illumination image corresponding to the training sample; $I'_i$ denotes the value of the $i$-th pixel of the enhanced image output by the network for the training sample; $I_i$ denotes the value of the $i$-th pixel of the normal-illumination image corresponding to the training sample; and $L_{PL}$ denotes the perceptual loss.
In this embodiment, the first term of the loss function $L_{total}$ computes the pixel difference between the Y channel enhanced within the network and the Y channel of the normal-illumination image, guaranteeing the degree of brightness recovery. The second term computes, in RGB color space, the pixel difference between the final output enhanced image and the normal-illumination image corresponding to the training sample, guaranteeing the naturalness of the overall color. A traditional MSE loss function compares differences only at the pixel level and may lose high-order image information at the feature level, so the restored or reconstructed result image becomes blurred, which hinders subsequent computer vision tasks. The perceptual loss combines the simple, effective idea of a per-pixel loss with the high-quality features of a pre-trained convolutional network: with the help of the VGG-16 network, the feature information of the result image and of the target image can be compared at different scales, improving the clarity of the final result image and remedying the image enhancement problems caused by using the MSE loss function alone.
In this embodiment, it is further preferred that the perceptual loss $L_{PL}$ is:

$$L_{PL} = \sum_{j=1}^{4} \lambda_j \Phi_j(I', I), \qquad \Phi_j(I', I) = \frac{1}{C_j H_j W_j}\left\|\phi_j(I') - \phi_j(I)\right\|_2^2$$

The formula represents the normalized Euclidean distance between the feature maps obtained by passing the normal image corresponding to the training sample and the corresponding output enhanced image through specific convolutional layers of the VGG-16 network; through training, the network continuously adjusts its parameters to reduce this gap and so obtain a better enhancement result. Here $I'$ denotes the enhanced image output by the multi-scale feature recovery image enhancement model for the training sample; $I$ denotes the normal-illumination image corresponding to the training sample; $\lambda_1$ through $\lambda_4$ denote the first through fourth weights; $\Phi_1(I', I)$ through $\Phi_4(I', I)$ denote the differences between the features of image $I'$ and the features of image $I$ extracted by convolutional layers conv1_2, conv2_2, conv3_2, and conv4_2 of the VGG-16 network, respectively; $j$ denotes the layer index of the VGG-16 network, $j = 1, 2, 3, 4$; $\phi_j(I')$ denotes the features of image $I'$ extracted at the $j$-th selected layer of the VGG-16 network; $\phi_j(I)$ denotes the features of image $I$ extracted at the $j$-th selected layer; $\|\cdot\|_2$ denotes the L2 norm; and $C_j$, $H_j$, and $W_j$ denote the channel count, height, and width of the features extracted at the $j$-th selected layer.
In this embodiment, $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the scale-feature weights, taking the values 2.6, 4.8, 3.7, and 5.6 respectively. The perceptual loss makes full use of the convolutional features of the neural network and preserves feature information such as color and edges across the multi-scale space of the image, so that the final enhanced result image better matches human image cognition at the visual and conceptual levels. A sketch of this loss follows.
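A PyTorch sketch of the perceptual loss with the weights above. The torchvision slice indices that map onto conv1_2, conv2_2, conv3_2, and conv4_2 (each taken after its ReLU) are an assumption about the layer numbering of `vgg16().features`, and the weights API requires torchvision ≥ 0.13.

```python
import torch
import torch.nn as nn
from torchvision import models

class VGG16PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad = False  # the VGG-16 feature extractor stays frozen
        # blocks ending after the ReLU of conv1_2, conv2_2, conv3_2, conv4_2
        self.blocks = nn.ModuleList([vgg[:4], vgg[4:9], vgg[9:14], vgg[14:21]])
        self.weights = [2.6, 4.8, 3.7, 5.6]  # lambda_1 .. lambda_4 from the text

    def forward(self, enhanced, target):
        loss, x, y = 0.0, enhanced, target
        for w, block in zip(self.weights, self.blocks):
            x, y = block(x), block(y)
            b, c, h, wd = x.shape
            # normalized squared L2 distance between the feature maps
            loss = loss + w * torch.sum((x - y) ** 2) / (b * c * h * wd)
        return loss
```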
In one application scenario of the multi-scale feature recovery image enhancement method provided by the invention, experimental verification was carried out on a large number of real, unlabeled dark images collected from real life.
1. Enhancement verification
FIG. 4 compares the output of the multi-scale feature recovery image enhancement method provided by the invention with other existing low-light enhancement algorithms. The existing LIME and DeHaze methods produce line noise at the boundaries of image objects, and GP and BIMEF are slightly inferior in color recovery. OCTM over-enhances some high-brightness areas of the image, and RetinexNet still suffers from color distortion, which reduces the readability of the image. KinD and the method of the present invention score well in all aspects, such as natural color and relatively low noise, and so can give the viewer a pleasing experience. The enhancement method provided by the invention correctly adjusts the brightness of the image while limiting distortion and better recovers the image's features in the spatial domain.
2. Target detection experiment
A subsequent target-detection machine vision task was added to judge whether the image enhancement method provided by the invention serves the purpose of prediction: the enhancement results of LIME, GP, KinD, BIMEF, and the proposed method were tested with Yolov3 on this downstream computer vision task. As shown in FIG. 5, the image enhancement algorithm provided by the invention yields the best detection results. In the example on the left of FIG. 5, Yolov3 misidentifies the upper-left warning-board area of the images enhanced by the other algorithms as objects such as household appliances; only the image enhanced by the proposed method supplies the feature information of the sign. Meanwhile, although the GP algorithm recovers the pedestrians in the upper-right corner, only the image enhanced by the proposed method gives the feature extent correctly, letting Yolov3 report the presence of two pedestrians. Moreover, the image enhanced by the proposed method also lets Yolov3 distinguish the feature information of the indicator light. In the example on the right of FIG. 5, Yolov3 misidentifies the black pan in the LIME result image as an inkstone and detects the white cup in the BIMEF and KinD enhanced images as a jug, which means that those enhancement algorithms, although they yield naturally colored images of normal brightness, are not strong enough at recovering detail features. In addition, the image enhanced by the proposed method is the only one that lets Yolov3 recognize the knife and fork in the lower-right corner. The image enhancement method provided by the invention therefore attends more closely to both high-order and low-order feature information in the image, making the finally recovered detail features more prominent and the target-detection results obtained by Yolov3 closer to the real situation.
The invention also discloses a computer readable storage medium, and in a preferred embodiment, at least one instruction, at least one program, a code set or an instruction set is stored in the storage medium, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to realize the multi-scale feature recovery image enhancement method provided by the invention.
The invention also discloses a multi-scale feature recovery image enhancement device, which comprises the following components in a preferred embodiment: the image acquisition module is used for acquiring an image to be enhanced; the multi-scale feature recovery image enhancement model module is used for processing an image to be enhanced to obtain an enhanced image; the multi-scale feature recovery image enhancement model module comprises a decomposition module, a Y channel enhancement module, a synthesis module and a U-net multi-scale feature recovery module; the decomposition module decomposes the image to be enhanced into a Y-channel image, a U-channel image and a V-channel image; the Y-channel enhancement module processes the Y-channel image to obtain a Y-channel enhanced image; the synthesis module is used for synthesizing the Y-channel enhanced image, the U-channel image and the V-channel image and converting the synthesized image into an RGB form; and the U-net multi-scale feature recovery module processes the composite image in the RGB form to obtain an enhanced image.
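For completeness, a minimal sketch of how the device's modules might be composed end to end, reusing the YChannelEnhance and UNetRestore modules sketched earlier; the tensor-based BT.601 conversion and the returned (enhanced image, Y-channel result) pair are assumptions consistent with those sketches.

```python
import torch
import torch.nn as nn

# BT.601 RGB->YUV matrix (same assumption as the earlier conversion sketch)
RGB2YUV = torch.tensor([[ 0.299,  0.587,  0.114],
                        [-0.147, -0.289,  0.436],
                        [ 0.615, -0.515, -0.100]])

class MultiScaleFeatureRecoveryEnhancer(nn.Module):
    """Decomposition -> Y-channel enhancement -> synthesis -> U-net recovery."""
    def __init__(self):
        super().__init__()
        self.y_enhance = YChannelEnhance()  # module sketched earlier
        self.unet = UNetRestore()           # module sketched earlier

    def forward(self, rgb):  # rgb: (B, 3, H, W) in [0, 1]
        m = RGB2YUV.to(rgb.device)
        yuv = torch.einsum("ij,bjhw->bihw", m, rgb)   # decomposition module
        y, u, v = yuv[:, 0:1], yuv[:, 1:2], yuv[:, 2:3]
        y_enh = self.y_enhance(y)                     # Y-channel enhancement
        yuv_enh = torch.cat([y_enh, u, v], dim=1)     # synthesis module
        rgb_enh = torch.einsum("ij,bjhw->bihw", torch.linalg.inv(m), yuv_enh)
        enhanced = self.unet(rgb_enh.clamp(0.0, 1.0)) # U-net feature recovery
        return enhanced, y_enh
```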
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A multi-scale feature recovery image enhancement method is characterized by comprising the following steps:
acquiring an image to be enhanced;
inputting the image to be enhanced into a multi-scale feature recovery image enhancement model for processing to obtain an enhanced image;
the multi-scale feature recovery image enhancement model comprises a decomposition module, a Y channel enhancement module, a synthesis module and a U-net multi-scale feature recovery module; the decomposition module decomposes an image to be enhanced into a Y-channel image, a U-channel image and a V-channel image; the Y-channel enhancement module processes the Y-channel image to obtain a Y-channel enhanced image; the synthesis module is used for synthesizing the Y-channel enhanced image, the U-channel image and the V-channel image and converting the synthesized image into an RGB form; and the U-net multi-scale feature recovery module processes the composite image in the RGB form to obtain an enhanced image.
2. The method for enhancing a multi-scale feature recovery image as recited in claim 1, further comprising a step of denoising the image to be enhanced after the image to be enhanced is acquired.
3. The method of claim 1 or 2, wherein the Y-channel enhancement module comprises a first convolution layer, a first active layer, a second convolution layer, a second active layer, a third convolution layer, a third active layer, a fourth convolution layer, and a fourth active layer connected in sequence.
4. The method of claim 3, wherein the convolution kernel size of the first convolution layer is 9 x 9, the activation function of the first activation layer is LRelu, the convolution kernel size of the second convolution layer is 1 x 1, the activation function of the second activation layer is Relu, the convolution kernel sizes of the third convolution layer and the fourth convolution layer are 5 x 5, and both the activation functions of the third activation layer and the fourth activation layer are Relu.
5. The multi-scale feature recovery image enhancement method according to claim 1, 2 or 4, wherein the U-net multi-scale feature recovery module comprises a first downsampling module, a second downsampling module, a third downsampling module, and a fourth downsampling module which are connected in sequence, a fourth upsampling module, a third upsampling module, a second upsampling module, and a first upsampling module which are connected in sequence, and a first convolution splicing link, a second convolution splicing link, and a third convolution splicing link;
the first downsampling module comprises a first convolution layer and a first max-pooling layer which are sequentially connected, the second downsampling module comprises a second convolution layer and a second max-pooling layer which are sequentially connected, the third downsampling module comprises a third convolution layer and a third max-pooling layer which are sequentially connected, and the fourth downsampling module comprises a fourth convolution layer;
the fourth upsampling module comprises a fourth deconvolution layer whose input end is connected with the output end of the fourth convolution layer, the third upsampling module comprises a third convolution merge layer and a third deconvolution layer which are sequentially connected, the second upsampling module comprises a second convolution merge layer and a second deconvolution layer which are sequentially connected, and the first upsampling module comprises a first convolution merge layer and a first deconvolution layer which are sequentially connected;
the first convolution splicing link connects the first convolution layer and the first convolution merge layer;
the second convolution splicing link connects the second convolution layer and the second convolution merge layer;
and the third convolution splicing link connects the third convolution layer and the third convolution merge layer.
6. The method for enhancing multi-scale feature recovery images according to claim 1, 2 or 4, wherein the training process of the multi-scale feature recovery image enhancement model is as follows:
acquiring a normal illumination image, reducing the brightness of the normal illumination image to obtain a low-light image corresponding to the normal illumination image, and constructing a training set by taking the low-light image as a training sample;
constructing a network structure of a multi-scale feature recovery image enhancement model;
and training the network structure of the multi-scale feature recovery image enhancement model by using a training set, solving the perception loss of a normal illumination image corresponding to a training sample and an enhanced image output by a network by using a VGG-16 network in the training, taking the perception loss as a part of a loss function, and continuously adjusting the network structure parameters of the multi-scale feature recovery image enhancement model according to the loss function result in the training.
7. The multi-scale feature recovery image enhancement method of claim 6, wherein the loss function $L_{total}$ is:

$$L_{total} = \frac{1}{N}\sum_{i=1}^{N}\left(Y'_{di} - Y_i\right)^2 + \frac{1}{N}\sum_{i=1}^{N}\left(I'_i - I_i\right)^2 + L_{PL}$$

wherein $i$ denotes the pixel index, $1 \leq i \leq N$, and $N$ denotes the number of pixels of the image to be enhanced; $Y'_{di}$ denotes the value of the $i$-th pixel of the Y-channel enhanced image; $Y_i$ denotes the value of the $i$-th pixel of the Y-channel image of the normal-illumination image corresponding to the training sample; $I'_i$ denotes the value of the $i$-th pixel of the enhanced image output by the network for the training sample; $I_i$ denotes the value of the $i$-th pixel of the normal-illumination image corresponding to the training sample; and $L_{PL}$ denotes the perceptual loss.
8. The multi-scale feature recovery image enhancement method of claim 6 or 7, wherein the perceptual loss $L_{PL}$ is:

$$L_{PL} = \sum_{j=1}^{4} \lambda_j \Phi_j(I', I), \qquad \Phi_j(I', I) = \frac{1}{C_j H_j W_j}\left\|\phi_j(I') - \phi_j(I)\right\|_2^2$$

wherein $I'$ denotes the enhanced image output by the multi-scale feature recovery image enhancement model for the training sample; $I$ denotes the normal-illumination image corresponding to the training sample; $\lambda_1$ through $\lambda_4$ denote the first through fourth weights; $\Phi_1(I', I)$ through $\Phi_4(I', I)$ denote the differences between the features of image $I'$ and the features of image $I$ extracted by convolutional layers conv1_2, conv2_2, conv3_2, and conv4_2 of the VGG-16 network, respectively; $j$ denotes the layer index of the VGG-16 network, $j = 1, 2, 3, 4$; $\phi_j(I')$ denotes the features of image $I'$ extracted at the $j$-th selected layer of the VGG-16 network; $\phi_j(I)$ denotes the features of image $I$ extracted at the $j$-th selected layer; $\|\cdot\|_2$ denotes the L2 norm; and $C_j$, $H_j$, and $W_j$ denote the channel count, height, and width of the features extracted at the $j$-th selected layer.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the multi-scale feature restoration image enhancement method of any of claims 1 to 8.
10. A multi-scale feature restoration image enhancement apparatus, comprising:
the image acquisition module is used for acquiring an image to be enhanced;
the multi-scale feature recovery image enhancement model module is used for processing an image to be enhanced to obtain an enhanced image;
the multi-scale feature recovery image enhancement model module comprises a decomposition module, a Y channel enhancement module, a synthesis module and a U-net multi-scale feature recovery module; the decomposition module decomposes an image to be enhanced into a Y-channel image, a U-channel image and a V-channel image; the Y-channel enhancement module processes the Y-channel image to obtain a Y-channel enhanced image; the synthesis module is used for synthesizing the Y-channel enhanced image, the U-channel image and the V-channel image and converting the synthesized image into an RGB form; and the U-net multi-scale feature recovery module processes the composite image in the RGB form to obtain an enhanced image.
CN202210442217.2A 2022-04-25 2022-04-25 Multi-scale feature restoration image enhancement method, device and storage medium Active CN114862707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210442217.2A CN114862707B (en) 2022-04-25 2022-04-25 Multi-scale feature restoration image enhancement method, device and storage medium

Publications (2)

Publication Number Publication Date
CN114862707A (en) 2022-08-05
CN114862707B (en) 2024-10-01

Family

ID=82633746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210442217.2A Active CN114862707B (en) 2022-04-25 2022-04-25 Multi-scale feature restoration image enhancement method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114862707B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210183030A1 (en) * 2018-09-06 2021-06-17 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for improving quality of low-light images
KR20200132682A (en) * 2019-05-16 2020-11-25 삼성전자주식회사 Image optimization method, apparatus, device and storage medium
CN113674159A (en) * 2020-05-15 2021-11-19 北京三星通信技术研究有限公司 Image processing method and device, electronic equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU Jin; LI Li; JIN Weiqi; LI Shuo; WANG Xia; BAI Xiaofeng: "Natural-sense colorization and enhancement method for low-light night-vision imaging", Acta Photonica Sinica, no. 04, 10 February 2018 (2018-02-10) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152025A (en) * 2023-10-30 2023-12-01 硕橙(厦门)科技有限公司 Method, device and equipment for enhancing over-bright image based on composite scale features
CN117152025B (en) * 2023-10-30 2024-03-01 硕橙(厦门)科技有限公司 Method, device and equipment for enhancing over-bright image based on composite scale features
CN118155105A (en) * 2024-05-13 2024-06-07 齐鲁空天信息研究院 Unmanned aerial vehicle mountain area rescue method, unmanned aerial vehicle mountain area rescue system, unmanned aerial vehicle mountain area rescue medium and electronic equipment

Also Published As

Publication number Publication date
CN114862707B (en) 2024-10-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: 401122 No. 21-1, building 7, No. 2, Huizhu Road, Yubei District, Chongqing
Applicant after: Seven Teng Robot Co.,Ltd.
Address before: 401122 No. 21-1, building 7, No. 2, Huizhu Road, Yubei District, Chongqing
Applicant before: Chongqing QiTeng Technology Co.,Ltd.
Country or region before: China
GR01 Patent grant