CN113096029A - High dynamic range image generation method based on multi-branch codec neural network - Google Patents
- Publication number: CN113096029A (application CN202110246503.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- branch
- neural network
- value
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/90
- G06F18/253 — Fusion techniques of extracted features (pattern recognition)
- G06N3/045 — Combinations of networks (neural network architecture)
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06N5/046 — Forward inferencing; production systems
- G06T2207/20208 — High dynamic range [HDR] image processing
Abstract
The invention discloses a high dynamic range image generation method based on a multi-branch codec neural network, which comprises the following steps: S1: collecting and cleaning HDR images; S2: preprocessing the cleaned HDR images to obtain LDR images; S3: taking the LDR images as the input of a multi-branch codec neural network model and training it until convergence; S4: testing an input image with the trained multi-branch codec neural network model to generate a high dynamic range image. Given a single-frame low dynamic range image captured in a real scene, the multi-branch codec neural network outputs a high dynamic range image of high imaging quality.
Description
Technical Field
The invention belongs to the technical field of high dynamic image generation, and particularly relates to a high dynamic range image generation method based on a multi-branch codec neural network.
Background
In recent years, High Dynamic Range (HDR) imaging techniques have been widely studied and applied in both academia and industry. HDR imaging mainly involves the acquisition, encoding, and display of high dynamic range images. For acquisition, the most common method captures multiple Low Dynamic Range (LDR) frames of the same scene at different exposures and merges them into an HDR image. However, generating a high-quality HDR image from multiple LDR frames requires aligning the backgrounds of the differently exposed frames and handling foreground motion, both of which increase algorithm complexity and degrade reconstruction quality. Moreover, a large number of real-world images exist only as single frames, since most photographs are captured with a single exposure. As camera performance improves, a single-exposure image carries enough information to reconstruct a high dynamic range image. Methods for generating HDR images from single-frame LDR images have therefore attracted increasing attention from researchers.
The conventional approach to single-exposure HDR imaging stretches the dynamic range of the luminance channel of a single frame in a specific way to obtain an HDR image. The main algorithms fall into two categories: methods based on the Camera Response Function (CRF) and methods based on an Inverse Tone Mapping operator (ITM). The first estimates a camera response function from the input image and applies it to the pixel values of the original irradiance field to obtain the target HDR image. The second is the mainstream approach to single-frame HDR generation: by applying a piecewise mapping function or a specific inverse tone mapping operator to the differently exposed regions of the image, it expands the dynamic range of the original LDR image and enhances detail in its poorly imaged regions, giving the generated HDR image a better visual appearance.
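As a minimal illustration of the second category, a global inverse tone mapping operator can be sketched as a gamma expansion that undoes the display nonlinearity and rescales to a target peak luminance. The gamma value and peak luminance below are illustrative assumptions, not parameters from the invention:

```python
import numpy as np

def gamma_expand_itm(ldr_u8, gamma=2.2, peak=1000.0):
    """Toy inverse tone mapping: normalize an 8-bit LDR image, undo the
    display gamma, and rescale to a target peak luminance (in nits).
    Real ITM operators treat differently exposed regions separately."""
    ldr = np.clip(ldr_u8.astype(np.float64) / 255.0, 0.0, 1.0)
    return (ldr ** gamma) * peak

hdr = gamma_expand_itm(np.array([0, 128, 255], dtype=np.uint8))
# mid-gray expands to roughly a fifth of peak; white maps to the full peak
```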
With the development of deep learning, methods that generate HDR images with deep convolutional neural networks have appeared in recent years. A deep network can replace the various complex algorithms of traditional methods to realize the nonlinear mapping from LDR to HDR images, and can also remedy their shortcomings, such as limited generalization and algorithms that are complex and hard to implement in hardware. For single-exposure HDR generation, a convolutional neural network extracts and combines low-level features into abstract high-level representations and has strong fitting capability. Using deep learning to enhance, restore, and estimate the detail of the poorly exposed regions of a single frame can largely reproduce the original scene information of the LDR image; compared with traditional HDR acquisition methods, a trained deep network also offers lower computational complexity and better real-time performance.
Disclosure of Invention
The invention aims to solve the problem of high dynamic image generation and provides a high dynamic range image generation method based on a multi-branch codec neural network.
The technical scheme of the invention is as follows: a high dynamic range image generation method based on a multi-branch codec neural network comprises the following steps:
s1: collecting and cleaning an HDR image;
s2: preprocessing the cleaned HDR image to obtain an LDR image;
s3: taking the LDR image as the input of a multi-branch codec neural network model, and training until convergence;
s4: and testing the input test image by using the trained multi-branch codec neural network model to generate a high dynamic range image.
The invention has the beneficial effects that: the invention provides a high dynamic range image generation method based on a neural network. Given a single-frame low dynamic range image captured in a real scene, the neural network with the multi-branch codec structure outputs a high dynamic range image of high imaging quality.
Further, in step S1, the HDR images are generated using a multi-exposure method; the cleaning comprises: removing, manually or by script, images in which the proportion of zero-valued (dead) pixels exceeds a threshold, as well as images with corrupted file data.
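Such a cleaning rule can be sketched as a short script that flags images whose fraction of zero-valued pixels exceeds the threshold; the function name is illustrative, and the 70% default is taken from the embodiment described later:

```python
import numpy as np

def is_bad_hdr(img, threshold=0.70):
    """Flag an HDR image for removal when the fraction of zero-valued
    (dead) pixels exceeds the threshold; corrupted files would be
    caught separately when decoding fails."""
    return float(np.mean(img == 0)) > threshold
```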
Further, step S2 includes the following sub-steps:
s21: randomly cropping the cleaned HDR image, and adjusting the size of the HDR image to 256 × 256;
s22: and sequentially carrying out random tone, saturation adjustment, random histogram cutting and tone mapping of random parameters on the HDR image after size adjustment to obtain an LDR image.
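The two sub-steps above can be sketched as one preprocessing function; the jitter ranges, clipping percentile, and gamma range below are illustrative assumptions, since the text fixes only the 256 × 256 size and the 3%-5% clipping proportion:

```python
import numpy as np

rng = np.random.default_rng(0)

def hdr_to_ldr(hdr, crop=256):
    """S21: random crop to a fixed 256x256 size. S22: random tone /
    saturation jitter, histogram clipping of the brightest pixels, and
    tone mapping with a random gamma, quantized to 8 bits."""
    h, w, _ = hdr.shape
    y = rng.integers(0, h - crop + 1)
    x = rng.integers(0, w - crop + 1)
    patch = hdr[y:y + crop, x:x + crop].astype(np.float64)

    patch *= rng.uniform(0.9, 1.1)                           # random tone/saturation jitter
    ceiling = np.percentile(patch, rng.uniform(95.0, 97.0))  # clip top 3-5% of values
    patch = np.clip(patch, 0.0, ceiling) / max(ceiling, 1e-8)

    gamma = rng.uniform(1.8, 2.4)                            # tone mapping, random gamma
    return (patch ** (1.0 / gamma) * 255.0).astype(np.uint8)
```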
The beneficial effects of the further scheme are as follows: high-quality HDR images have high resolution, so to reduce the computational burden the original image is randomly cropped and then resized to a small fixed-size image (the model in the invention uses 256 × 256); the resulting image then undergoes random hue and saturation adjustment, random histogram clipping, and tone mapping with random parameters to obtain an LDR image. These operations must be controlled so that the generated LDR image is of moderate quality: if it is too poor, the network cannot converge normally; if it is too good, the benefit of the network is not apparent.
Further, in step S2, the random histogram clipping targets the top 3%-5% of pixels with the highest values in the RGB channels of the whole image.
The beneficial effects of the further scheme are as follows: clipping the top 3%-5% of pixel values in the RGB channels of the whole image produces a data degradation the network can learn effectively, yielding a measurable quality improvement; it also suppresses image brightness so that low-exposure values are confined to a narrow range, where the information loss introduced by subsequent 8-bit quantization lets the network learn the quantization error.
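The clipping and the quantization loss it induces can be illustrated as follows; the 4% fraction is one value inside the stated 3%-5% range:

```python
import numpy as np

def clip_and_quantize(img, clip_fraction=0.04):
    """Clip the brightest `clip_fraction` of values over all channels,
    normalize by the new ceiling, and quantize to 8 bits. After
    normalization, nearby low-exposure values collapse into the same
    8-bit bin -- the quantization error the network learns to undo."""
    ceiling = np.quantile(img.reshape(-1), 1.0 - clip_fraction)
    clipped = np.clip(img, 0.0, ceiling) / ceiling
    return np.round(clipped * 255.0).astype(np.uint8)
```

For example, two distinct dark radiance values such as 0.00100 and 0.00101 quantize to the same 8-bit value once a bright outlier sets the ceiling.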
Further, in step S3, the multi-branch codec neural network model includes an encoder and a decoder;
the encoder comprises a detail information processing network, an intermediate frequency information processing network and a global information processing network; the detail information processing network is used for extracting the detail information of the LDR image; the intermediate frequency information processing network is used for extracting intermediate frequency characteristic information of the LDR image; the global information processing network is used for extracting global characteristic information of the LDR image;
the decoder comprises an information fusion network; the information fusion network is used for fusing the cascaded detail information, the intermediate frequency characteristic information and the global characteristic information.
The beneficial effects of the further scheme are as follows: in the present invention, the encoder network finally obtains the feature information of 64 channels size 1 × 1, and uses the copy operation to obtain the feature information of 64 channels 256 × 256 size for final decoding. And the LDR image is input into a network, and after calculation through each branch of the encoder, the characteristic information output by each part of the encoder is fused.
Further, in the detail information processing network, the number of channels is 64 and 128 respectively, the step size is 1, the filling value of the convolution kernel is 1, and the shape is 3 x 3;
in the intermediate frequency information processing network, the number of channels is 64, the step length is 1, the filling value of a convolution kernel is 2, the shape is 3 x 3, and the sparse convolution coefficient is 2;
in the global information processing network, the number of channels is 64, and the step length is 1; the 1 st to 6 th sets of convolution kernels have a fill value of 1, a shape of 3 x 3, and the seventh set of convolution kernels has a fill value of 0, a shape of 4 x 4.
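Under the stated kernel shapes, paddings, dilation, and channel counts, the model can be sketched in PyTorch. Branch depths are not fully specified in the text, and the stride-2 downsampling in the global branch is an assumption, chosen so that six 3 × 3 convolutions followed by a 4 × 4 convolution reduce a 256 × 256 input to 1 × 1:

```python
import torch
import torch.nn as nn

class MultiBranchHDRNet(nn.Module):
    """Sketch of the multi-branch codec; branch depths and the
    global-branch stride are assumptions, other hyperparameters
    follow the text."""
    def __init__(self):
        super().__init__()
        # detail branch: 3x3 convs, stride 1, padding 1, 64 then 128 channels
        self.detail = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.SELU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.SELU(),
        )
        # mid-frequency branch: dilated 3x3 convs (dilation 2, padding 2)
        self.mid = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=2, dilation=2), nn.SELU(),
            nn.Conv2d(64, 64, 3, padding=2, dilation=2), nn.SELU(),
        )
        # global branch: six 3x3 convs (assumed stride 2: 256 -> 4),
        # then one 4x4 conv with padding 0 down to 1x1
        layers, in_ch = [], 3
        for _ in range(6):
            layers += [nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.SELU()]
            in_ch = 64
        layers += [nn.Conv2d(64, 64, 4, padding=0), nn.SELU()]
        self.global_branch = nn.Sequential(*layers)
        # decoder: 1x1 convs with 64 and 3 channels, SELU then sigmoid
        self.fuse = nn.Sequential(
            nn.Conv2d(128 + 64 + 64, 64, 1), nn.SELU(),
            nn.Conv2d(64, 3, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        d, m = self.detail(x), self.mid(x)
        g = self.global_branch(x)                      # (N, 64, 1, 1)
        g = g.expand(-1, -1, x.shape[2], x.shape[3])   # "copy" to full size
        return self.fuse(torch.cat([d, m, g], dim=1))
```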
The beneficial effects of the further scheme are as follows: in the invention, each square in the detail information processing network represents a group of convolution kernels with 64 and 128 channels respectively, stride 1, padding 1, and shape 3 × 3; this branch focuses on pixel-level learning and performs the extraction of detail information.

Each square in the intermediate frequency information processing network represents a group of convolution kernels with 64 channels, stride 1, padding 2, and shape 3 × 3; dilated convolution with coefficient 2 enlarges the receptive field, so this branch extracts the intermediate frequency feature information of the input LDR image.

Each square in the global information processing network represents a group of convolution kernels with 64 channels and stride 1; the first six groups have padding 1 and shape 3 × 3, and the last group has padding 0 and shape 4 × 4, so this branch extracts the global feature information of the input LDR image.

This branch finally produces 64-channel feature information of size 1 × 1, which is replicated by a copy operation into 64-channel feature information of size 256 × 256 for the final decoding.
Further, in the information fusion network, the number of channels is 64 and 3, the shape is 1 × 1, the step size is 1, and the convolution kernel filling value is 0.
Furthermore, in the multi-branch codec neural network model, the activation functions of the detail information processing network, the intermediate frequency information processing network and the global information processing network are Selu; the activation function of the first convolution module in the information fusion network is Selu, and the activation function of the second convolution module is sigmoid.
Further, in step S3, the loss function used to train the multi-branch codec neural network model is calculated by the following formula:

where I_i = 0.299 X_r + 0.587 X_g + 0.114 X_b; M_i denotes the final mask; M_i^bright denotes the bright-area mask; M_i^dark denotes the dark-area mask; t_a denotes the threshold for judging whether an image region is over-exposed; t_b denotes the threshold for judging whether an image region is under-exposed; I_i denotes the image luminance; X_r, X_g, and X_b denote the red, green, and blue channel values of pixel X; w denotes the width of the image; h denotes the height of the image; Ŷ_{i,c} denotes the network output value of the i-th pixel in channel c; ε denotes a small constant; Y_{i,c} denotes the Ground Truth value of the i-th pixel in channel c; α_L denotes the weight of the hue loss in the loss function; H̃_i denotes the i-th pixel value of a color channel of the network output image; H_i denotes the i-th pixel value of the color channel of the Ground Truth image; and log(·) denotes the logarithm operation.
Drawings
FIG. 1 is a flow chart of a high dynamic range image generation method;
FIG. 2 is a training image contrast map;
FIG. 3 is a diagram of a network architecture;
FIG. 4 is a graph of network model test results;
FIG. 5 is a diagram of a forward inference process;
FIG. 6 is a comparative outdoor 1 chart;
FIG. 7 is a comparative view of the outdoor unit 2;
FIG. 8 is a comparison of outdoor evening;
FIG. 9 is a test image contrast map;
FIG. 10 is a graph of HDR-VDP2 visualization results.
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a high dynamic range image generation method based on a multi-branch codec neural network, which comprises the following steps:
s1: collecting and cleaning an HDR image;
s2: preprocessing the cleaned HDR image to obtain an LDR image;
s3: taking the LDR image as the input of a multi-branch codec neural network model, and training until convergence;
s4: and testing the input test image by using the trained multi-branch codec neural network model to generate a high dynamic range image.
In the embodiment of the present invention, as shown in fig. 1, in step S1, an HDR image is generated using a multiple exposure method; the cleaning method comprises the following steps: and eliminating bad pixels with the pixel value of 0 in the area exceeding the threshold value and damaged data of the image file by utilizing manual work or scripts.
In an embodiment of the invention, the threshold is 70%.
In the embodiment of the present invention, as shown in fig. 1, step S2 includes the following sub-steps:
s21: randomly cropping the cleaned HDR image, and adjusting the size of the HDR image to 256 × 256;
s22: and sequentially carrying out random tone, saturation adjustment, random histogram cutting and tone mapping of random parameters on the HDR image after size adjustment to obtain an LDR image.
The beneficial effects of the further scheme are as follows: high-quality HDR images have high resolution, so to reduce the computational burden the original image is randomly cropped and then resized to a small fixed-size image (the model in the invention uses 256 × 256); the resulting image then undergoes random hue and saturation adjustment, random histogram clipping, and tone mapping with random parameters to obtain an LDR image. These operations must be controlled so that the generated LDR image is of moderate quality: if it is too poor, the network cannot converge normally; if it is too good, the benefit of the network is not apparent.
The pair of training images obtained by pre-processing is shown in fig. 2, where the left two are LDR images for input and the right two are the corresponding tone-mapped HDR images. It is clear that HDR images have better detail.
In the embodiment of the present invention, as shown in fig. 1, in step S2, the random histogram clipping targets the top 3%-5% of pixels with the highest values in the RGB channels of the whole image.
In the invention, clipping the top 3%-5% of pixel values in the RGB channels of the whole image produces a data degradation the network can learn effectively, yielding a measurable quality improvement; it also suppresses image brightness so that low-exposure values are confined to a narrow range, where the information loss introduced by subsequent 8-bit quantization lets the network learn the quantization error.
In the embodiment of the present invention, as shown in fig. 3, in step S3, the multi-branch codec neural network model includes an encoder and a decoder;
the encoder comprises a detail information processing network, an intermediate frequency information processing network and a global information processing network; the detail information processing network is used for extracting the detail information of the LDR image; the intermediate frequency information processing network is used for extracting intermediate frequency characteristic information of the LDR image; the global information processing network is used for extracting global characteristic information of the LDR image;
the decoder comprises an information fusion network; the information fusion network is used for fusing the cascaded detail information, the intermediate frequency characteristic information and the global characteristic information.
In the present invention, the encoder network finally produces 64-channel feature information of size 1 × 1, which is replicated by a copy operation into 64-channel feature information of size 256 × 256 for the final decoding. The LDR image is fed into the network, and after passing through each encoder branch, the feature information output by the branches is fused.
In the embodiment of the present invention, as shown in fig. 3, in the detail information processing network, the number of channels is 64 and 128, respectively, the step size is 1, the convolution kernel filling value is 1, and the shape is 3 × 3;
in the intermediate frequency information processing network, the number of channels is 64, the step length is 1, the filling value of a convolution kernel is 2, the shape is 3 x 3, and the sparse convolution coefficient is 2;
in the global information processing network, the number of channels is 64, and the step length is 1; the 1 st to 6 th sets of convolution kernels have a fill value of 1, a shape of 3 x 3, and the seventh set of convolution kernels has a fill value of 0, a shape of 4 x 4.
In the invention, each square in the detail information processing network represents a group of convolution kernels with 64 and 128 channels respectively, stride 1, padding 1, and shape 3 × 3; this branch focuses on pixel-level learning and performs the extraction of detail information.

Each square in the intermediate frequency information processing network represents a group of convolution kernels with 64 channels, stride 1, padding 2, and shape 3 × 3; dilated convolution with coefficient 2 enlarges the receptive field, so this branch extracts the intermediate frequency feature information of the input LDR image.

Each square in the global information processing network represents a group of convolution kernels with 64 channels and stride 1; the first six groups have padding 1 and shape 3 × 3, and the last group has padding 0 and shape 4 × 4, so this branch extracts the global feature information of the input LDR image.

This branch finally produces 64-channel feature information of size 1 × 1, which is replicated by a copy operation into 64-channel feature information of size 256 × 256 for the final decoding.
In the embodiment of the present invention, as shown in fig. 3, in the information fusion network, the number of channels is 64 and 3, the shape is 1 × 1, the step size is 1, and the convolution kernel padding value is 0.
In the embodiment of the present invention, as shown in fig. 3, in the multi-branch codec neural network model, the activation functions of the detail information processing network, the intermediate frequency information processing network, and the global information processing network are Selu; the activation function of the first convolution module in the information fusion network is Selu, and the activation function of the second convolution module is sigmoid.
In the embodiment of the present invention, as shown in FIG. 1, in step S3, the loss function used to train the multi-branch codec neural network model is calculated by the following formula:

where I_i = 0.299 X_r + 0.587 X_g + 0.114 X_b; M_i denotes the final mask; M_i^bright denotes the bright-area mask; M_i^dark denotes the dark-area mask; t_a denotes the threshold for judging whether an image region is over-exposed; t_b denotes the threshold for judging whether an image region is under-exposed; I_i denotes the image luminance; X_r, X_g, and X_b denote the red, green, and blue channel values of pixel X; w denotes the width of the image; h denotes the height of the image; Ŷ_{i,c} denotes the network output value of the i-th pixel in channel c; ε denotes a small constant; Y_{i,c} denotes the Ground Truth value of the i-th pixel in channel c; α_L denotes the weight of the hue loss in the loss function; H̃_i denotes the i-th pixel value of a color channel of the network output image; H_i denotes the i-th pixel value of the color channel of the Ground Truth image; and log(·) denotes the logarithm operation.
In the embodiment of the invention, after the network model training is converged, the network model is tested. The input test image and the output result graph are shown in fig. 4.
As shown in fig. 5, to better understand the present invention, the complete forward inference process of the network is summarized as follows:
1. Compute a mask on the LDR image according to the thresholds;
2. Convert the LDR image to the HDR domain (using a gamma transformation) and feed it into the network;
3. Multiply the network output element-wise by the computed mask; this extracts information from the corresponding regions of the image;
4. Multiply the HDR-domain input element-wise by (1 - mask), making full use of the normally exposed regions of the original image;
5. Sum the outputs of steps 3 and 4 to obtain the final output.
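These forward-inference steps can be sketched as a blending function, with the mask taken from the thresholding step and an assumed gamma of 2.2 for the HDR-domain conversion:

```python
import numpy as np

def blend_inference(ldr, net_output, mask, gamma=2.2):
    """Gamma-expand the LDR input to the HDR domain, then keep the
    network prediction in poorly exposed regions (mask) and the
    expanded input in normally exposed regions (1 - mask)."""
    hdr_input = ldr ** gamma
    return mask * net_output + (1.0 - mask) * hdr_input
```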
To verify the effect of the present invention, the above results were compared with "LDR to HDR image mapping with iterative preprocessing" proposed by Y. Huo et al. in 2013 (denoted Huo) and "ExpandNet: A Deep Convolutional Neural Network for High Dynamic Range Expansion from Low Dynamic Range Content" proposed by D. Marnerides et al. in 2018 (denoted ExpandNet). The HDR images generated by the three algorithms were tone mapped to give a subjective comparison; objective comparisons are given using the HDR-VDP2 index.
The subjective results are as follows: in figs. 6, 7, and 8, the evaluation inputs are outdoor 1, outdoor 2, and outdoor evening, respectively. The test images include both over-exposed and under-exposed cases. In each group, the upper left is the LDR image fed into the network, the upper right is the Huo result, the lower left is the ExpandNet result, and the lower right is the result of the present method. By comparison, the present invention produces more natural colors and richer details in poorly exposed areas, and effectively suppresses noise in low-exposure regions.
The objective results are as follows: using HDR-VDP2 as the objective metric, the test images shown in fig. 9 are outdoor 1, outdoor 2, and outdoor evening, in clockwise order from the first picture in the first row; they are the same test pictures used in the subjective comparison. The corresponding visualization results (fig. 10) and the Q-value comparison table (table 1) are given. In fig. 10, the rows from first to last correspond to the inputs outdoor 1, outdoor 2, and outdoor evening; the first column shows the results of Huo, the second column the results of ExpandNet, and the third column the results of the present invention. Table 1 lists the Q value of the HDR-VDP2 evaluation index: a larger value indicates a lower perceptual difference between the generated HDR image and the real HDR image, and the present invention achieves a higher score than the other two methods. In the HDR-VDP2 visualizations of fig. 10, blue pixels mark regions with a small perceptual difference between the original and target images, red pixels mark regions with a large perceptual difference, and green pixels indicate a perceptual difference between those of red and blue. Compared with the other two methods, the visualizations of the present invention contain more blue and fewer red areas, indicating a higher-quality generated image; the method thus shows advantages both in the visualization results and in the specific Q values.
TABLE 1
The working principle and process of the invention are as follows: the invention estimates, by deep learning, the detail information lost in the high-exposure and low-exposure regions of a single-frame LDR image to obtain an HDR image. In an ordinary LDR image, details in bright or dark areas are often lost because the dynamic range of the camera is insufficient or because strong illumination in the scene creates excessive contrast. Moreover, when a camera records the illumination of a natural scene, hardware limitations nonlinearly compress the highest and lowest brightness values, so the captured image data cannot faithfully reflect the scene information; the key to reconstructing an HDR image from a single-frame LDR image is therefore to restore the nonlinearly compressed, detail-poor regions at high and low exposures. The invention uses deep learning to estimate the lost information of the high- and low-exposure regions and to linearize them, reconstructing an HDR image of better quality.
The beneficial effects of the invention are as follows. The invention provides a high dynamic range image generation method based on a neural network: a single-frame low dynamic range image captured in a real scene is passed through a neural network with a multi-branch codec structure, which outputs a high dynamic range image of high imaging quality.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the invention is not limited to the specifically described embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and such changes and combinations remain within the scope of the invention.
Claims (9)
1. A high dynamic range image generation method based on a multi-branch codec neural network is characterized by comprising the following steps:
S1: collecting and cleaning HDR images;
S2: preprocessing the cleaned HDR images to obtain LDR images;
S3: taking the LDR images as the input of a multi-branch codec neural network model and training it until convergence;
S4: inputting a test image into the trained multi-branch codec neural network model to generate a high dynamic range image.
2. The method for generating a high dynamic range image based on a multi-branch codec neural network as claimed in claim 1, wherein in step S1 the HDR images are generated by a multi-exposure method, and the cleaning comprises: removing, manually or by script, images in which the area of bad pixels (pixel value 0) exceeds a threshold, as well as damaged image files.
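A minimal sketch of the cleaning rule in this claim, assuming "bad pixels" means zero-valued pixels and that an image is discarded when their share of the area exceeds a threshold (the 5% default and the helper name are illustrative assumptions, not values from the patent):

```python
import numpy as np

def is_bad_hdr(image: np.ndarray, zero_fraction_threshold: float = 0.05) -> bool:
    """Flag an HDR image for removal when the fraction of dead pixels
    (all channels zero) exceeds the threshold.  The 5% default is an
    assumption; the patent only states that a threshold is used."""
    zero_fraction = np.mean(np.all(image == 0, axis=-1))
    return zero_fraction > zero_fraction_threshold

clean = np.random.rand(8, 8, 3) + 0.01   # strictly positive -> no dead pixels
damaged = clean.copy()
damaged[:4, :, :] = 0.0                  # half of the pixel area is dead
```

Here `is_bad_hdr(clean)` is `False` while `is_bad_hdr(damaged)` is `True`, since half the area is dead.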
3. The multi-branch codec neural network-based high dynamic range image generation method according to claim 1, wherein the step S2 includes the following sub-steps:
S21: randomly cropping the cleaned HDR image and resizing it to 256 × 256;
S22: sequentially applying random hue and saturation adjustment, random histogram clipping, and tone mapping with random parameters to the resized HDR image to obtain an LDR image.
4. The method for generating a high dynamic range image based on a multi-branch codec neural network as claimed in claim 3, wherein in step S2 the objects of the random clipping and the random histogram clipping are the top 3%-5% of pixels with the highest values in the RGB channels of the whole image.
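A hedged sketch of the clipping-and-tone-mapping step from claims 3-4: the brightest roughly 4% of RGB values are clipped, and a simple gamma curve stands in for the patent's unspecified "tone mapping with random parameters" (function name, clip default, and gamma value are illustrative):

```python
import numpy as np

def hdr_to_ldr(hdr: np.ndarray, clip_percent: float = 4.0, gamma: float = 2.2) -> np.ndarray:
    """Clip the brightest `clip_percent` of RGB values, normalise, then
    tone-map with a gamma curve.  The gamma operator is a stand-in; the
    patent's exact tone-mapping operator is not specified."""
    ceiling = np.percentile(hdr, 100.0 - clip_percent)       # ~96th percentile of all RGB values
    clipped = np.minimum(hdr, ceiling) / max(ceiling, 1e-8)  # clip and normalise to [0, 1]
    return np.power(clipped, 1.0 / gamma)                    # simple gamma tone mapping

rng = np.random.default_rng(0)
hdr = rng.exponential(scale=1.0, size=(16, 16, 3))           # long-tailed, HDR-like values
ldr = hdr_to_ldr(hdr)
```

The output stays in [0, 1] with the original spatial shape, as an LDR training input should.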
5. The multi-branch codec neural network-based high dynamic range image generation method according to claim 1, wherein in the step S3, the multi-branch codec neural network model includes an encoder and a decoder;
the encoder comprises a detail information processing network, an intermediate frequency information processing network and a global information processing network; the detail information processing network is used for extracting the detail information of the LDR image; the intermediate frequency information processing network is used for extracting intermediate frequency characteristic information of the LDR image; the global information processing network is used for extracting global characteristic information of the LDR image;
the decoder comprises an information fusion network; the information fusion network is used for fusing the concatenated detail information, intermediate-frequency feature information, and global feature information.
6. The method according to claim 5, wherein in the detail information processing network, the numbers of channels are 64 and 128, the stride is 1, and the convolution kernels have padding 1 and shape 3 × 3;
in the intermediate frequency information processing network, the number of channels is 64, the stride is 1, the convolution kernels have padding 2 and shape 3 × 3, and the dilation rate of the sparse (dilated) convolution is 2;
in the global information processing network, the number of channels is 64 and the stride is 1; the 1st to 6th groups of convolution kernels have padding 1 and shape 3 × 3, and the seventh group has padding 0 and shape 4 × 4.
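As a check on the parameters above, the standard convolution output-size formula o = ⌊(i + 2p − d(k − 1) − 1)/s⌋ + 1 confirms that the stated padding values keep a 256 × 256 input at full resolution in the detail and intermediate-frequency branches (a small bookkeeping sketch, not code from the patent):

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Standard convolution output-size formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Detail branch: 3x3 kernel, stride 1, padding 1 -> resolution preserved.
assert conv_out(256, 3, 1, 1) == 256
# Intermediate-frequency branch: 3x3 kernel, padding 2, dilation 2 -> also preserved.
assert conv_out(256, 3, 1, 2, 2) == 256
# Fusion network: 1x1 kernel, padding 0 -> also preserved.
assert conv_out(256, 1, 1, 0) == 256
```

The seventh group's 4 × 4 kernel with padding 0 reduces the spatial size, consistent with the global branch condensing the image into global features.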
7. The method according to claim 5, wherein in the information fusion network the numbers of channels are 64 and 3, the convolution kernels have shape 1 × 1, the stride is 1, and the padding is 0.
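A minimal numpy sketch of the fusion step, assuming the three branch outputs are concatenated along the channel axis and mixed by a 1 × 1 convolution, which acts as a per-pixel linear map (random weights stand in for learned ones; all names are illustrative):

```python
import numpy as np

def fuse_branches(detail, midfreq, global_feat, out_channels=64, seed=0):
    """Concatenate three (H, W, C) feature maps along channels and apply a
    1x1 convolution.  A 1x1 conv is a matrix multiply over the channel axis."""
    stacked = np.concatenate([detail, midfreq, global_feat], axis=-1)
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((stacked.shape[-1], out_channels)) * 0.01
    return stacked @ weights  # (H, W, 3C) @ (3C, out) -> (H, W, out)

h, w = 16, 16
fused = fuse_branches(np.ones((h, w, 64)), np.ones((h, w, 64)), np.ones((h, w, 64)))
```

With padding 0 and a 1 × 1 kernel the spatial resolution is untouched; only the channel count changes.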
8. The method according to claim 5, wherein in the multi-branch codec neural network model, the activation functions of the detail information processing network, the intermediate frequency information processing network, and the global information processing network are SELU; the activation function of the first convolution module in the information fusion network is SELU, and that of the second convolution module is sigmoid.
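The two activation functions named in this claim can be sketched directly; the λ and α constants below are the standard SELU parameters, not values stated in the patent:

```python
import numpy as np

# Standard SELU constants (Klambauer et al.); assumed, not given in the patent.
SELU_LAMBDA, SELU_ALPHA = 1.0507009873554805, 1.6732632423543772

def selu(x):
    """SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise."""
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def sigmoid(x):
    """Sigmoid squashes the final fusion output into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))
```

Using sigmoid on the last convolution module bounds the network output to (0, 1), matching an image-valued prediction.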
9. The method for generating a high dynamic range image based on a multi-branch codec neural network as claimed in claim 1, wherein in step S3 the loss function used to train the multi-branch codec neural network model is calculated as follows:
wherein Ii = 0.299Xr + 0.587Xg + 0.114Xb represents the brightness of the i-th pixel; Mi represents the final mask, obtained from a bright-area mask and a dark-area mask; ta represents the threshold for judging whether an image region is over-exposed, and tb the threshold for judging whether an image region is under-exposed; Xr, Xg and Xb represent the red, green and blue channel values of pixel X; w and h represent the width and height of the image; the network output value of the i-th pixel in channel c is compared against Yi,c, the Ground Truth value of the i-th pixel in channel c; ε represents a minimum value; αL represents the weight of the hue loss in the loss function; Hi represents the i-th pixel value of the color channel of the Ground Truth image, compared against the corresponding pixel value of the network output image; and log(·) represents the logarithm operation.
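The luminance formula from this claim, together with an illustrative hard bright/dark mask. The patent's exact mask formula and its ta, tb values are not recoverable from the translation, so the thresholds and the hard-threshold form below are assumptions:

```python
import numpy as np

def luminance(rgb: np.ndarray) -> np.ndarray:
    """I = 0.299*R + 0.587*G + 0.114*B, as defined in claim 9."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def exposure_masks(rgb: np.ndarray, ta: float = 0.95, tb: float = 0.05):
    """Hard bright/dark masks: a pixel is 'bright' when its luminance exceeds
    ta and 'dark' when it falls below tb.  ta/tb values are assumed."""
    lum = luminance(rgb)
    return lum > ta, lum < tb

white = np.ones((2, 2, 3))
black = np.zeros((2, 2, 3))
bright_w, dark_w = exposure_masks(white)
bright_b, dark_b = exposure_masks(black)
```

A pure-white patch is flagged entirely bright and a pure-black patch entirely dark, matching the intended roles of the two masks.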
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110246503.7A CN113096029A (en) | 2021-03-05 | 2021-03-05 | High dynamic range image generation method based on multi-branch codec neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113096029A true CN113096029A (en) | 2021-07-09 |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103905738A (en) * | 2012-12-31 | 2014-07-02 | 博世汽车部件(苏州)有限公司 | High-dynamic-range image generation system and method |
CN110300989A (en) * | 2017-05-15 | 2019-10-01 | 谷歌有限责任公司 | Configurable and programmable image processor unit |
CN111105376A (en) * | 2019-12-19 | 2020-05-05 | 电子科技大学 | Single-exposure high-dynamic-range image generation method based on double-branch neural network |
CN111372006A (en) * | 2020-03-03 | 2020-07-03 | 山东大学 | High dynamic range imaging method and system for mobile terminal |
Non-Patent Citations (1)
Title |
---|
D. Marnerides et al., "ExpandNet: A Deep Convolutional Neural Network for High Dynamic Range Expansion from Low Dynamic Range Content", arXiv:1803.02266 |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113973175A (en) * | 2021-08-27 | 2022-01-25 | 天津大学 | Rapid HDR video reconstruction method |
CN114332755A (en) * | 2021-12-06 | 2022-04-12 | 南京瀚元科技有限公司 | Power generation incinerator monitoring method based on binocular three-dimensional modeling |
CN114359083A (en) * | 2021-12-24 | 2022-04-15 | 北京航空航天大学 | High-dynamic thermal infrared image self-adaptive preprocessing method for interference environment |
CN114693548A (en) * | 2022-03-08 | 2022-07-01 | 电子科技大学 | Dark channel defogging method based on bright area detection |
CN114693548B (en) * | 2022-03-08 | 2023-04-18 | 电子科技大学 | Dark channel defogging method based on bright area detection |
CN114998141A (en) * | 2022-06-07 | 2022-09-02 | 西北工业大学 | Space environment high dynamic range imaging method based on multi-branch network |
CN114998141B (en) * | 2022-06-07 | 2024-03-12 | 西北工业大学 | Space environment high dynamic range imaging method based on multi-branch network |
CN116912602A (en) * | 2023-09-11 | 2023-10-20 | 荣耀终端有限公司 | Training method of image processing model, image processing method and electronic equipment |
CN116912602B (en) * | 2023-09-11 | 2023-12-15 | 荣耀终端有限公司 | Training method of image processing model, image processing method and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210709 |