CN111372006B - High dynamic range imaging method and system for mobile terminal

Info

Publication number: CN111372006B
Application number: CN202010139138.5A
Authority: CN
Inventors: 吴国滔; 张伟; 宋然; 张倩; 顾建军
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2020-03-03
Filing date: 2020-03-03
Publication date: 2021-05-07
Anticipated expiration: 2040-03-03
Also published as: CN111372006A

Abstract

The invention provides a mobile-oriented high dynamic range imaging method and system. A high dynamic range image is processed to generate a corresponding low dynamic range image, and the low dynamic range image is input into a neural network model based on an encoder-decoder structure. The loss between the output of the neural network model and the corresponding high dynamic range image is computed in the logarithmic domain, and the neural network model is optimized using gradient descent. The optimized neural network model is ported to the mobile terminal and treated as a high-order mapping function that completes end-to-end mapping from the low dynamic range image to the high dynamic range image. The practicality of single-frame high dynamic range imaging is thereby effectively improved.

Description

High dynamic range imaging method and system for mobile terminal

Technical Field

The disclosure belongs to the technical field of imaging, and relates to a high dynamic range imaging method and system for a mobile terminal.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

High dynamic range imaging is an imaging technique that can capture the true luminance information of a recorded scene. Dynamic range refers to the ratio of the maximum luminance to the minimum luminance in a scene, and real scenes generally have a wide dynamic range. For example, the dynamic range of a natural environment can reach 10⁸:1, whereas the dynamic range that the smartphone, the most common shooting device in daily life, can capture is only 10²:1. As a result, the captured image cannot record the full dynamic range of the real scene, so the bright and dark areas of the scene appear overexposed and underexposed respectively, which greatly reduces the visibility of the image and the room for post-processing.
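To make the size of this gap concrete, the two ratios quoted above can be converted into photographic stops (powers of two). This is only an illustrative calculation; the helper name `ratio_to_stops` is ours and does not appear in the disclosure.

```python
import math

def ratio_to_stops(ratio: float) -> float:
    """Convert a luminance ratio (max / min) into stops, i.e. log2 of the ratio."""
    return math.log2(ratio)

scene_stops = ratio_to_stops(1e8)           # natural scene: about 26.6 stops
phone_stops = ratio_to_stops(1e2)           # smartphone capture: about 6.6 stops
missing_stops = scene_stops - phone_stops   # about 19.9 stops cannot be recorded
```

In other words, a single smartphone exposure covers roughly a quarter of the stops present in such a scene.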

At present, research on high dynamic range imaging algorithms at home and abroad mainly covers multi-frame synthesis algorithms and single-frame reconstruction algorithms. A single-frame reconstruction algorithm uses a mapping function to expand the dynamic range of a single image. It only requires shooting a single frame, and has the advantages of being less prone to artifacts and convenient to use. The mapping function is clearly the key factor in the performance of a single-frame high dynamic range imaging algorithm, so finding a better-performing mapping function has become a research hotspot.

Deep neural networks have strong function-fitting capability and can learn complex mappings from complex image inputs through supervised training, which makes them an effective means of improving the performance of single-frame reconstruction algorithms. However, conventional deep neural networks tend to be large, computationally demanding, and difficult to deploy on mobile devices with limited computational resources.

Disclosure of Invention

In order to solve the above problems, the present disclosure provides a mobile-oriented high dynamic range imaging method and system that reduce the scale of the network so that the network model can run on a mobile device, and that effectively improve the imaging quality and practicality.

According to some embodiments, the following technical scheme is adopted in the disclosure:

A mobile-oriented high dynamic range imaging method comprises the following steps:

processing a high dynamic range (HDR) image to generate a corresponding low dynamic range (LDR) image; inputting the low dynamic range image into a neural network model based on an encoder-decoder structure; computing the loss between the output of the neural network model and the corresponding high dynamic range image in the logarithmic domain; optimizing the neural network model using gradient descent; and porting the optimized neural network model to a mobile terminal, where it is treated as a high-order mapping function that completes the end-to-end mapping from the low dynamic range image to the high dynamic range image.

As an alternative embodiment, in the process of processing the high dynamic range image, the high dynamic range image is acquired, a clustering algorithm is run on a set of camera response curves to select several of the most representative response curves that approximately satisfy the functional relation y = x², and these response curves are applied to the high dynamic range image while saturated pixel values are clipped, to obtain the low dynamic range image.

As an alternative embodiment, the high dynamic range image is acquired by downloading a publicly available high dynamic range image, by shooting with a camera, or by synthesis from bracketed exposures.

As an alternative embodiment, before the camera response curve is applied, a scaling factor is determined and the high dynamic range image is divided by it so that part of the pixel values saturate; one of the obtained camera response curves is then randomly selected and applied to the scaled high dynamic range image, and the saturated pixels in the result are clipped to obtain the corresponding low dynamic range image.

As an alternative embodiment, the neural network model comprises an encoder and a decoder. The input image enters the encoder, which extracts features of different levels, from low-dimensional to high-dimensional, as the network depth increases. The features of each level serve as the input to the next encoder layer and are also connected to the same level of the decoder through a skip connection to preserve the position information of the features, thereby avoiding image blurring.

As a further embodiment, the encoder is configured to implement feature extraction and downsampling. In feature extraction, the input passes through a 3x3 DW Conv with stride 1 to extract features, the features are then mixed across channels by a 1x1 Conv, and finally a residual connection is applied, which alleviates vanishing gradients and strengthens reuse of the learned features. In downsampling, the feature depth is first increased by a 1x1 Conv, the receptive field is then enlarged by a 3x3 DW Conv with stride 2, which skips some pixels, and finally the feature depth is reduced by a 1x1 Conv and the result is output directly to the next layer and to the corresponding part of the decoder.

As an alternative embodiment, the decoder is configured to upsample the input feature map by nearest-neighbor interpolation, perform feature extraction with a DW Conv with stride 1, concatenate the result with the upsampled result along the depth dimension, and finally use a 1x1 Conv to learn to combine the resulting feature map of depth 2 × n into a feature map of depth n, thereby avoiding artifacts at the edges of the image.

As an alternative embodiment, the neural network model only performs dynamic range extension on the overexposed part of the low dynamic range image in the process from the low dynamic range image to the high dynamic range image.

As an alternative embodiment, the neural network model is optimized using a gradient descent method with losses in the logarithmic domain.

A mobile-end oriented high dynamic range imaging system, comprising:

a low dynamic range image acquisition module, configured to acquire a near right-exposed low dynamic range image and output it to the neural network forward inference module;

a neural network forward inference module, which provides an interface between the input data and the input required by the neural network, as well as an interface for neural network forward inference, and which outputs its result to the image display module;

and an image display module, configured to tone-map the output so that it can be displayed.

A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute a mobile-end oriented high dynamic range imaging method.

A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium stores instructions adapted to be loaded by a processor and to perform a method of mobile-end oriented high dynamic range imaging.

Compared with the prior art, the beneficial effect of this disclosure is:

The neural network model comprises an encoder and a decoder: the encoder extracts features of different scales from the input image, and the decoder reconstructs the image from the obtained features. Skip connections are used between encoder and decoder parts with matching feature sizes, which improves training speed and accuracy. In addition, both the encoder and the decoder are made lightweight, so that the neural network can run on mobile devices while achieving results comparable to those of a large neural network, effectively improving the practicality of single-frame high dynamic range imaging.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.

FIG. 1 is a flow chart of the present embodiment;

FIG. 2 is a general structure of a lightweight neural network model;

FIGS. 3(a) and 3(b) are encoder units of a network model;

FIG. 4 shows a decoder unit of the network model.

Detailed Description

The present disclosure is further described with reference to the following drawings and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

One or more embodiments provide a single-frame high dynamic range imaging method for mobile terminals based on a lightweight neural network. A locally overexposed low dynamic range (LDR) image is generated from a captured high dynamic range (HDR) image through a known camera response function (CRF), and the LDR-HDR image pair is used as the input and the ground truth of the neural network, respectively, to train it. After training, when a right-exposed (locally overexposed) LDR image is input, an HDR image with extended dynamic range and restored overexposed regions is obtained.

Referring to FIG. 1, the single-frame high dynamic range imaging method includes the following steps:

S101, collecting HDR images, running a clustering algorithm on a set of camera response curves to select the five most representative response curves, and applying the response curves to the HDR images while clipping saturated pixel values to obtain LDR images.

Specifically, public HDR images are searched for and downloaded from the internet; they can also be captured by a camera with wide exposure latitude or obtained by exposure synthesis. Such images are typically HDR files stored as 32-bit floating-point numbers. Typical camera response curves are then selected by running a k-means clustering algorithm on a public camera response curve data set, and the five most representative curves that approximately satisfy the functional relation y = x² are retained. To ensure the effectiveness of the method and speed up training convergence, the LDR image generated from the HDR image should not contain excessively large overexposed regions; therefore, before the camera response curve is applied, a scaling coefficient is determined and the HDR image is divided by this coefficient so that some pixel values saturate.
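The curve-selection step can be sketched as plain k-means (Lloyd's algorithm) over sampled response curves, keeping the real curve nearest to each centroid. This is a minimal sketch under stated assumptions: the function name, the synthetic power-law curve set standing in for the public data set, and the choice to return the nearest real curve are all ours; the additional filtering for curves close to the stated functional relation is not shown.

```python
import numpy as np

def select_representative_curves(curves, k=5, iters=50, seed=0):
    """Cluster sampled response curves with k-means and return, for each
    cluster, the real curve that lies closest to the cluster centroid."""
    curves = np.asarray(curves, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct curves of the set.
    centers = curves[rng.choice(len(curves), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every curve to every centroid.
        dist = ((curves[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = curves[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centers[j] = members.mean(axis=0)
    dist = ((curves[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return curves[dist.argmin(axis=0)]

# Synthetic stand-in for a public response-curve data set (assumption).
x = np.linspace(0.0, 1.0, 64)
exponents = np.random.default_rng(1).uniform(0.3, 3.0, size=200)
curve_set = x[None, :] ** exponents[:, None]
representatives = select_representative_curves(curve_set, k=5)
```

Returning real curves rather than centroids keeps every selected curve a physically measured response, which is one reasonable reading of "most representative".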

In this embodiment, 3372 HDR images of different resolutions are collected. For each image, the number of pixels at each pixel value is counted through an image histogram; when the cumulative count reaches 85% to 90% of the total number of pixels in the image, the current pixel value is recorded as the scaling coefficient. Dividing the HDR image by this coefficient makes the pixel values between 0 and 1 account for 85% to 90% of all pixel values in the image. One of the camera response curves obtained above is then randomly selected and applied to the scaled HDR image, and the saturated pixels in the result are clipped to obtain the corresponding LDR image.
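The scaling and clipping procedure above can be sketched as follows. This is a hedged illustration, not the embodiment's code: `np.quantile` is used in place of an explicit cumulative histogram, the function and parameter names are ours, and the power curve passed in is only a placeholder for one of the selected response curves.

```python
import numpy as np

def hdr_to_ldr(hdr, response_curve, keep_fraction=0.9):
    """Scale an HDR image so that `keep_fraction` of its values fall in [0, 1],
    apply a camera response curve, and clip the saturated values."""
    hdr = np.asarray(hdr, dtype=np.float64)
    # Value below which `keep_fraction` of all pixel values lie
    # (equivalent in spirit to reading it off a cumulative histogram).
    scale = np.quantile(hdr, keep_fraction)
    scaled = np.clip(hdr / scale, 0.0, None)
    ldr = np.clip(response_curve(scaled), 0.0, 1.0)
    return ldr, scale

# Synthetic HDR data and a placeholder response curve (both assumptions).
rng = np.random.default_rng(0)
hdr_image = rng.lognormal(mean=0.0, sigma=2.0, size=(32, 32, 3))
ldr_image, scale = hdr_to_ldr(hdr_image, lambda v: v ** 2.0, keep_fraction=0.9)
unsaturated_fraction = (ldr_image < 1.0).mean()
```

With `keep_fraction=0.9`, about 10% of the values in `ldr_image` end up saturated at 1, which is the locally overexposed input the network is trained to repair.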

S102, designing and training a lightweight neural network, wherein the network adopts an encoder-decoder structure and can be regarded as a high-order mapping function from LDR to HDR.

Referring to FIG. 2, the lightweight neural network includes two parts, an encoder and a decoder. The input image enters the encoder; as the network depth increases, the encoder extracts features of different levels of the input image, from low-dimensional to high-dimensional. The features of each level have two uses: they serve as the input to the next encoder layer, and they are connected through skip connections to the same level of the decoder to preserve the position information of the features and avoid image blur.

The encoder part mainly comprises two modules: one responsible for feature extraction at a fixed feature map size (see FIG. 3(a)) and one responsible for downsampling (see FIG. 3(b)). In the feature extraction module, the input passes through a 3x3 depthwise (channel-by-channel) convolution (DW Conv) with stride 1 to extract features, the features are then mixed across channels by a 1x1 Conv, and finally a residual connection is applied, which alleviates vanishing gradients and strengthens reuse of the learned features. In the downsampling module, the feature depth is first increased by a 1x1 Conv, and the receptive field is then enlarged by a 3x3 DW Conv with stride 2, which skips some pixels; compared with using a stride of 1 together with pooling, the computational cost of this approach is only 1/4. Finally, the feature depth is reduced by a 1x1 Conv and the result is output directly to the next layer and to the corresponding part of the decoder.
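The two encoder modules can be sketched in NumPy to show the data flow and tensor shapes. This is a minimal sketch, not the patented implementation: the function names, the weight layouts, and the placement of the ReLU activations are assumptions, since the text above does not state where activations are applied.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dw_conv3x3(x, w, stride=1):
    """3x3 depthwise convolution with zero padding. x: (H, W, C), w: (3, 3, C)."""
    h, wd, c = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out_h, out_w = (h - 1) // stride + 1, (wd - 1) // stride + 1
    out = np.zeros((out_h, out_w, c))
    for i in range(3):
        for j in range(3):
            # Each kernel tap multiplies a strided view of the padded input.
            out += xp[i:i + stride * out_h:stride, j:j + stride * out_w:stride] * w[i, j]
    return out

def pointwise_conv(x, w):
    """1x1 convolution that mixes channels. x: (H, W, Cin), w: (Cin, Cout)."""
    return x @ w

def feature_block(x, w_dw, w_pw):
    """Feature extraction: DW Conv (stride 1) -> 1x1 Conv -> residual add."""
    y = relu(dw_conv3x3(x, w_dw, stride=1))   # activation placement is assumed
    return x + pointwise_conv(y, w_pw)

def downsample_block(x, w_expand, w_dw, w_reduce):
    """Downsampling: 1x1 Conv (expand) -> DW Conv (stride 2) -> 1x1 Conv (reduce)."""
    y = relu(pointwise_conv(x, w_expand))
    y = relu(dw_conv3x3(y, w_dw, stride=2))
    return pointwise_conv(y, w_reduce)
```

With stride 2 the depthwise convolution is evaluated at only H/2 × W/2 output positions instead of H × W, which is where the factor of 1/4 in computation comes from.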

In this embodiment, the encoder is a modified MobileNetV2 network. Compared with the original network: the encoder removes all batch normalization layers, avoiding the negative effect of feature re-normalization on the network; the encoder uses the ReLU activation function instead of ReLU6, allowing a larger feature range and improving the quality of the generated images; the encoder uses DW Conv instead of ordinary Conv in the first convolutional layer to further reduce computation, and adds one convolutional layer at full resolution to better preserve the high-frequency information of the image and avoid blurred edges in the generated image; the encoder removes the fully connected layer of MobileNetV2 and replaces it with a newly created convolutional layer responsible for the feature transition between the encoder and the decoder.
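The effect of swapping ReLU6 for ReLU can be seen on a handful of values. The sample feature values below are made up for illustration only.

```python
import numpy as np

def relu(x):
    """ReLU: negative values become 0, positive values pass through unchanged."""
    return np.maximum(x, 0.0)

def relu6(x):
    """ReLU6: like ReLU, but activations are additionally capped at 6."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

features = np.array([-2.0, 0.5, 6.0, 40.0, 1000.0])
relu_out = relu(features)    # [0, 0.5, 6, 40, 1000]
relu6_out = relu6(features)  # [0, 0.5, 6, 6, 6]
```

Because HDR targets span several orders of magnitude, capping intermediate activations at 6 would limit the feature range that the text above says ReLU preserves.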

The decoder part is composed of a single module (see FIG. 4). The input feature map is first upsampled by nearest-neighbor interpolation, feature extraction is then performed by a DW Conv with stride 1, and the result is concatenated with the upsampled result along the depth dimension. Finally, a 1x1 Conv is used to learn to combine the resulting feature map of depth 2 × n into a feature map of depth n, thereby avoiding artifacts at the edges of the image.

In this embodiment, the conventional approach of interpolation followed by convolution requires a special padding operation in the convolution, specifically reflection padding (also called mirror padding), to avoid artifacts at the edges of the image. On mobile devices, however, reflection padding takes longer than conventional zero padding; intuitively, reflection padding must first read the values inside the edge of the feature map and then fill them in outside the edge, whereas zero padding only needs to fill zeros directly outside the edge. To solve the edge artifact problem while using the faster zero padding, this embodiment proposes the decoder module shown in FIG. 4. Its main idea is to use a 1x1 Conv to learn to combine the artifact-free feature map obtained after upsampling with the feature map containing artifacts produced by the 3x3 DW Conv, thereby solving the edge artifact problem.
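The decoder module described above can be sketched in NumPy as follows. The function names, the weight layouts, and the concatenation order (upsampled map first, convolved map second) are assumptions made for illustration; activations are omitted.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbor upsampling of an (H, W, C) feature map."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def dw_conv3x3_zero_pad(x, w):
    """3x3 depthwise convolution, stride 1, zero padding. w: (3, 3, C)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + wd] * w[i, j]
    return out

def decoder_block(x, w_dw, w_pw):
    """Upsample -> DW Conv -> concatenate with the upsampled map -> 1x1 Conv.
    x: (H, W, n), w_dw: (3, 3, n), w_pw: (2n, n)."""
    up = upsample_nearest(x)                       # artifact-free branch, depth n
    feat = dw_conv3x3_zero_pad(up, w_dw)           # zero padding disturbs the border
    merged = np.concatenate([up, feat], axis=-1)   # depth 2n
    return merged @ w_pw                           # 1x1 Conv back to depth n

# Zero padding alone darkens borders: averaging a constant image of ones
# gives 4/9 in the corners instead of 1.
ones = np.ones((4, 4, 1))
blurred = dw_conv3x3_zero_pad(ones, np.full((3, 3, 1), 1.0 / 9.0))
corner_value, interior_value = blurred[0, 0, 0], blurred[1, 1, 0]
```

Because the 1x1 Conv sees both branches, it can learn to lean on the upsampled branch near the border, where the zero-padded convolution output is unreliable.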

In this embodiment, to reduce the amount of computation, the lightweight neural network uses only 3x3 DW Conv and ordinary convolutions with a kernel size of 1. The input of the network is the LDR image from S101, and the output is the HDR image. Since mapping directly from LDR to HDR is a typical ill-posed problem, the network would be hard to converge; the network is therefore responsible only for extending the dynamic range of the overexposed parts of the LDR image, which is achieved with a mask whose mathematical expression is:

In the formula, mask_i represents the mask of the i-th input image, alpha_i represents the largest pixel value among the RGB channels at each position of the i-th input image, and thd is a threshold: when the input image is normalized to [0, 1], a pixel whose value is greater than 1 - thd is considered overexposed. In this embodiment, thd is set to 0.05. The final processing result is expressed as follows:

final = (1 - mask_i) * LDR + mask_i * f(LDR)

where f(LDR) represents the output of the lightweight neural network for the input LDR image; this output is blended with the non-overexposed part of the original LDR image to obtain the final result.
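The blending step can be sketched as below. The exact mask expression is not reproduced in the text above, so the soft ramp used here, mask = clip((alpha - (1 - thd)) / thd, 0, 1), is an assumption chosen only to be consistent with the stated definitions of alpha and thd; a hard binary threshold would also fit them. The function names are ours.

```python
import numpy as np

def overexposure_mask(ldr, thd=0.05):
    """Mask that is 0 for well-exposed pixels and rises to 1 for saturated ones.
    ASSUMED form: linear ramp between 1 - thd and 1 on the per-pixel RGB maximum."""
    alpha = ldr.max(axis=-1, keepdims=True)   # largest value among the RGB channels
    return np.clip((alpha - (1.0 - thd)) / thd, 0.0, 1.0)

def blend_prediction(ldr, prediction, thd=0.05):
    """final = (1 - mask) * LDR + mask * f(LDR)."""
    mask = overexposure_mask(ldr, thd)
    return (1.0 - mask) * ldr + mask * prediction

# One well-exposed pixel and one saturated pixel; the "prediction" is a dummy.
ldr = np.array([[[0.5, 0.2, 0.1], [1.0, 0.3, 0.3]]])
prediction = np.full_like(ldr, 7.0)
final = blend_prediction(ldr, prediction)
```

In this example the first pixel is copied from the LDR input unchanged, while the saturated second pixel is taken entirely from the network output.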

S103, computing the loss in the logarithmic domain and optimizing the neural network by gradient descent.

Specifically, depending on the ambient lighting conditions, the pixel values of an HDR image range from zero to hundreds of thousands. If a loss such as the commonly used L2 loss is computed directly between the network output and the HDR image, the loss value fluctuates greatly and the network oscillates during backpropagation, making convergence difficult. To solve this problem, the loss is computed in the logarithmic domain.

In this embodiment, the mathematical expression of the method is:

loss = mask * |log(predict + ε) - log(gt + ε)|

In the formula, mask is as described in S102, predict is the output of the neural network when the LDR image is input, gt is the HDR image corresponding to the input LDR image, and ε is a very small value that avoids mathematical errors when a pixel value is zero.
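The loss expression can be written directly in NumPy. This is a sketch under assumptions: the formula above is per pixel, so averaging over all pixels to obtain a scalar, the default value of `eps`, and the function name are our choices.

```python
import numpy as np

def log_domain_loss(predict, gt, mask, eps=1e-6):
    """Masked L1 distance between prediction and ground truth in the log domain.
    The per-pixel terms are averaged into one scalar (assumed reduction)."""
    per_pixel = mask * np.abs(np.log(predict + eps) - np.log(gt + eps))
    return per_pixel.mean()

# Ground-truth values spanning several orders of magnitude, including zero.
gt = np.array([0.0, 1.0, 10.0, 1e5])
mask = np.ones_like(gt)
loss_exact = log_domain_loss(gt, gt, mask)           # identical images
loss_double = log_domain_loss(2.0 * gt, gt, mask)    # prediction is 2x too bright
l2_double = np.mean((2.0 * gt - gt) ** 2)            # linear-domain L2 for comparison
```

For the same 2x brightness error, the linear L2 value is dominated by the single 1e5 pixel, whereas each non-zero pixel contributes about log 2 to the log-domain loss regardless of its magnitude, which is the stabilising effect described above.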

S104, deploying the neural network on the mobile device, inputting a near right-exposed image, completing end-to-end LDR-to-HDR reconstruction, and finally performing tone mapping on the output result, which is stored and displayed on the device.

Specifically, after training is completed, the network weights and topology graph are obtained. To deploy them on a mobile device, the weights and topology graph must be converted into a format that the mobile device can parse. Because the exposure latitude of a mobile phone's CMOS sensor is limited, exposing an image to the right improves visibility in dark areas, while bright areas may be overexposed; the overexposed parts are repaired by the neural network, and the final HDR image is obtained through the mask.

In this embodiment, the final network weights are loaded on the desktop side, and the TFLiteConverter.from_session tool in TensorFlow is used to convert directly from the session to the format required by the mobile device, avoiding tedious operations such as freezing the graph. To fit user habits, this embodiment uses the native camera app for shooting, which allows the user to control exposure time and gain so that the image is exposed to the right. The mobile single-frame high dynamic range imaging method of this embodiment is integrated into an app; in use, the image is loaded in the app, processed by the method of this embodiment, and then stored and displayed on the device.
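The tone-mapping operator used before display in S104 is not specified in the text. As one possible stand-in, the sketch below uses a simple global Reinhard-style curve x / (1 + x) followed by gamma encoding; the operator, the gamma value, and the function name are all assumptions rather than the embodiment's method.

```python
import numpy as np

def tone_map(hdr, gamma=2.2):
    """Compress HDR values into displayable 8-bit values.
    Swapped-in operator: global x / (1 + x) curve plus gamma encoding."""
    hdr = np.maximum(np.asarray(hdr, dtype=np.float64), 0.0)
    compressed = hdr / (1.0 + hdr)           # maps [0, inf) into [0, 1)
    encoded = compressed ** (1.0 / gamma)    # display gamma
    return np.round(encoded * 255.0).astype(np.uint8)

# A ramp of HDR values from black to very bright.
ramp = np.linspace(0.0, 100.0, 50)
display_values = tone_map(ramp)
```

Any monotonic operator that maps the unbounded HDR range into the display range could be substituted here without changing the rest of the pipeline.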

The resulting mobile-end high dynamic range imaging system, comprising:

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims

1. A mobile-oriented high dynamic range imaging method, characterized in that the method comprises the following steps:

processing a high dynamic range (HDR) image to generate a corresponding low dynamic range (LDR) image, and inputting the low dynamic range image into a neural network model based on an encoder-decoder structure, wherein a lightweight neural network is designed and trained, the network adopts the encoder-decoder structure and is regarded as a high-order mapping function from LDR to HDR, the loss between the output of the neural network model and the corresponding high dynamic range image is computed in the logarithmic domain, and the neural network model is optimized using gradient descent, the specific method comprising:

the method is realized through a mask, and the mathematical expression of the mask is as follows:

where mask_i represents the mask of the i-th input image, alpha_i represents the largest pixel value among the RGB channels at each position of the i-th input image, and thd is a threshold: when the input image is normalized to [0, 1], a pixel whose value is greater than 1 - thd is considered overexposed; thd is set to 0.05, and the final processing result is expressed as follows:

final = (1 - mask_i) * LDR + mask_i * f(LDR)

where f(LDR) represents the output of the lightweight neural network for the input LDR image, which is blended with the non-overexposed part of the original LDR image to obtain the final result;

loss = mask * |log(predict + ε) - log(gt + ε)|

in the formula, predict is the output of the neural network when the LDR image is input, gt is the HDR image corresponding to the input LDR image, and ε is a very small value that avoids mathematical errors when a pixel value is zero;

and porting the optimized neural network model to a mobile terminal, where the neural network model is regarded as a high-order mapping function that completes the end-to-end mapping from a low dynamic range image to a high dynamic range image.

2. The mobile-end-oriented high dynamic range imaging method according to claim 1, wherein: in the process of processing the high dynamic range image, the high dynamic range image is acquired, a clustering algorithm is run on a set of camera response curves to select several of the most representative response curves that approximately satisfy the functional relation y = x², and these response curves are applied to the high dynamic range image while saturated pixel values are clipped, to obtain the low dynamic range image.

3. The mobile-end-oriented high dynamic range imaging method according to claim 1, wherein: the high dynamic range image is acquired by downloading a publicly available high dynamic range image, by shooting with a camera, or by synthesis from bracketed exposures.

4. The mobile-end-oriented high dynamic range imaging method according to claim 1, wherein: before the camera response curve is applied, a scaling coefficient is determined and the high dynamic range image is divided by it so that part of the pixel values saturate; one of the obtained camera response curves is then randomly selected and applied to the scaled high dynamic range image, and the saturated pixels in the result are clipped to obtain the corresponding low dynamic range image.

5. The mobile-end-oriented high dynamic range imaging method according to claim 1, wherein: the neural network model comprises an encoder and a decoder; an input image enters the encoder, which extracts features of different levels of the input image, from low-dimensional to high-dimensional, as the network depth increases; the features of each level serve as the input to the next encoder layer and are connected to the same level of the decoder through skip connections to preserve the position information of the features, thereby avoiding image blurring.

6. The mobile-end-oriented high dynamic range imaging method according to claim 5, wherein: the encoder is configured to implement feature extraction and downsampling; in feature extraction, the input passes through a 3x3 DW Conv with stride 1 to extract features, the features are then mixed across channels by a 1x1 Conv, and finally a residual connection is applied to alleviate vanishing gradients and strengthen reuse of the learned features; in downsampling, the feature depth is first increased by a 1x1 Conv, the receptive field is then enlarged by a 3x3 DW Conv with stride 2, which skips some pixels, and finally the feature depth is reduced by a 1x1 Conv and the result is output directly to the next layer and to the corresponding part of the decoder;

or,

the decoder is configured to upsample the input feature map by nearest-neighbor interpolation, perform feature extraction with a DW Conv with stride 1, concatenate the result with the upsampled result along the depth dimension, and finally use a 1x1 Conv to learn to combine the resulting feature map of depth 2 × n into a feature map of depth n, so as to avoid artifacts at the edges of the image.

7. The mobile-end-oriented high dynamic range imaging method according to claim 1, wherein: and in the process from the low dynamic range image to the high dynamic range image, the neural network model only expands the dynamic range of the overexposed part of the low dynamic range image.

8. A mobile-end-oriented high dynamic range imaging method as claimed in claim 1, utilizing a mobile-end-oriented high dynamic range imaging system, characterized by: the method comprises the following steps:

9. A computer-readable storage medium characterized by: stored with instructions adapted to be loaded by a processor of a terminal device and to perform a mobile-end oriented high dynamic range imaging method according to any one of claims 1-7.

10. A terminal device is characterized in that: the system comprises a processor and a computer readable storage medium, wherein the processor is used for realizing instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform a mobile-end oriented high dynamic range imaging method according to any one of claims 1-7.