CN113132655A - HDR video synthesis method based on deep learning - Google Patents

HDR video synthesis method based on deep learning

Info

Publication number
CN113132655A
Authority
CN
China
Prior art keywords
layer
image
hdr
format
hdr video
Prior art date
Legal status
Pending
Application number
CN202110252970.0A
Other languages
Chinese (zh)
Inventor
侯向辉
蔡泽永
李昕虎
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110252970.0A
Publication of CN113132655A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44012 Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The invention relates to an HDR video synthesis method based on deep learning. The method combines an encoder-decoder structure with a U-Net structure, uses a ResNet in the middle of the network for further optimization, abstracts image features at multiple levels, and finally obtains an HDR image. The method is of great significance for generating high-quality HDR video from multi-exposure LDR video recorded by an ordinary camera.

Description

HDR video synthesis method based on deep learning
Technical Field
The invention relates to the technical field of video synthesis, and in particular to an HDR video synthesis method based on deep learning.
Background
Most digital image and video material currently available captures only a small portion of the visual information visible to the human eye and is not of sufficient quality for reproduction by next-generation display devices. The limiting factors are the limited color gamut and the limited dynamic range (contrast) captured by cameras and stored by most image and video formats. Conventional imaging with a low contrast range and limited gamut (LDR imaging) is restricted to three 8-bit integer color channels and does not provide the precision required by recent developments in image capture, processing, storage and display technology.
High Dynamic Range (HDR) techniques have emerged to increase the usable luminance range of images. After the dynamic range of a picture is increased, much information that would otherwise be impossible to display, because it is too dark or too bright, can be shown in the image. Compared with HDR picture processing, HDR video processing is more difficult: unlike the processing of a single HDR picture, HDR video must also achieve consistency between frames on top of single-picture processing. Because of this higher difficulty, HDR video processing has received less attention than HDR picture processing. However, HDR video processing technology has many advantages, including reducing the cost and hardware burden of current mobile phones and improving video quality, so its development is a necessary, if laborious, task.
To solve the HDR video processing problem better and faster, researchers in China and abroad have developed a number of algorithms, but the results are not very satisfactory. The first HDR video reconstruction algorithm, designed specifically for alternating-exposure sequences, uses optical flow to align adjacent frames with the reference frame and then merges the adjacent frames with a weighting strategy to avoid ghosting; however, when there is large-scale motion in the image, this method usually introduces optical-flow artifacts into the final result. Mangiat and Gibson improved the above method using block-based motion estimation and an optimization step, but their method still shows artifacts in occluded regions under large-scale motion. Kalantari et al. proposed a patch-based optimization that synthesizes the missing exposures for each frame, but it takes a long time to generate an HDR frame and may produce ghosting artifacts or unstable, unnatural motion. Finally, some recent methods formulate HDR video generation as a maximum a posteriori estimation problem, but this approach is time consuming, taking about 2 hours to generate a single frame at a resolution of 1280 × 720; in addition, its results are noisy, with ghosting and discoloration appearing in complex cases.
Disclosure of Invention
The present invention combines an encoder-decoder structure with a U-Net structure (the U-Net architecture is a network designed specifically for image segmentation), further optimizes the middle of the network with a ResNet (a network composed of residual blocks; a residual block is essentially a group of several consecutive layers with a shortcut connection added from its beginning to its end), abstracts image features at multiple levels, and finally obtains an HDR image. The method is of great significance for generating high-quality HDR video from multi-exposure LDR video recorded by an ordinary camera.
The invention achieves this aim through the following technical solution: an HDR video synthesis method based on deep learning, comprising the following steps:
(1) selecting video-frame data sets from the LiU HDRv repository, using the Astronauts, bridge and bridge_2 sequences as the training and testing data sets; converting the images to LDR (low dynamic range) format using Luminance HDR 2.5.1;
(2) segmenting the data-set pictures into many small patches to increase the amount of training data; converting them to tfrecord and storing them in the save_train folder; extracting three consecutive LDR frames from the data set each time, applying cropping, normalization, flipping and deformation operations so that the images better match pictures taken in real life, and storing them into tfrecord;
(3) reading the tfrecord to obtain three consecutive LDR frames I1, I2, I3 and feeding them into a neural network for training; the neural network takes the HDR image (Ir) of the middle frame as the reference frame and computes the final loss function; through neural-network learning, an HDR frame with comprehensive information is obtained, thereby enabling the synthesis of HDR video.
Preferably, the HDR video synthesizing method based on deep learning further includes:
(4) evaluating the synthesis quality of the HDR video using the peak signal-to-noise ratio (PSNR) as the evaluation criterion; the PSNR ranges over [0, 100], and the higher the PSNR value, the better the quality of the generated picture, i.e. the stronger the neural network's ability to reconstruct the HDR picture.
Preferably, step (2) is specifically: converting each small patch from BGR to RGB-format LDR; randomly rotating the image by 90, 180 or 270 degrees and randomly mirror-flipping it; cropping the image to 256 × 256, packing 3 LDR images and the corresponding HDR image with a batch size of 20, and converting them into tfrecord for storage; tfrecord is a format for storing a series of binary records.
Preferably, the neural network design in step (3) is specifically:
(I) the coding layer adopts a network structure of three convolution levels; each level is a convolution layer plus a pooling layer, abstracting features at three levels; the convolution layer uses a kernel of height and width 5, a stride of 2, and VALID padding; so that each convolution halves the height and width of the tensor, mirror (reflection) padding is added on the top, bottom, left and right sides of the tensor, making the height and width of the resulting matrix exactly half of the original; the number of convolution kernels is 64 in the first level, 128 in the second level and 256 in the third level; the pooling layer adopts a linear rectification function (ReLU);
(II) the merging layer: after passing through their respective encoders, the three images become three tensors of the same format; the three tensors are fused along the third dimension and passed through one convolution level whose kernel parameters are the same as the encoder's but whose number of kernels is 512; the resulting tensor is then input into a residual network (ResNet); the residual network has nine blocks; each residual block consists of two consecutive convolution layers and a normalization layer, and each residual block keeps its input and output formats identical;
(III) the decoding layer: the decoder uses a four-level network structure corresponding to the merging layer and the coding layer; each level is first merged along the third dimension with the same-depth feature taken before the residual network, and then a deconvolution layer and a normalization layer are applied; the height and width of each level's output tensor are twice those of its input, and the number of channels is half that of the input, except for the last level, whose number of channels is 3.
Preferably, the loss function in step (3) is a non-negative real-valued function, and the loss function specifically adopted is L2 loss; the formula is as follows:
L2 = |f(x) - Y|²
L2' = 2·f'(x)·(f(x) - Y)
where Y is the true value and f(x) is the predicted value.
Preferably, the step (3) is specifically as follows:
(3.1) reading the tfrecord and feeding it into the neural network; the three encoders read three different inputs, transform them through three convolution layers and one normalization layer, and pass them to the merging layer; after further transformation by the ResNet, the data are input to the decoding layer; the tensor of the HDR image is finally obtained through a decoding layer composed of deconvolution and merging operations;
(3.2) using an Adam optimizer, computing the L2 loss between the generated image and the reference frame, and back-propagating to update the weights;
(3.3) tone mapping the output, using a global tone mapping technique to map the HDR image into an LDR image and storing it to a file in png format; after the final tensor is obtained, the HDR image is converted into an LDR image because it needs to be visualized; the numerical range of the output is converted to 0-255 using the tone mapping output; the tone mapping function is:
T(x_{i,j}) = log(1 + μ·x_{i,j}) / log(1 + μ)
where μ is equal to 5000 and x_{i,j} is the value of the original tensor; the logarithm is natural (base e).
Preferably, the peak signal-to-noise ratio in step (4) is an objective criterion for evaluating images: it is the logarithm, in dB, of (2^n - 1)^2 (the square of the maximum signal value, where n is the number of bits per sample) relative to the mean square error between the original image and the processed image; the specific calculation formula is as follows:
MSE = (1 / (H·W)) · Σ_{i=0}^{H-1} Σ_{j=0}^{W-1} [X(i, j) - Y(i, j)]²
PSNR = 10 · log_10(MAX_I² / MSE)
where MSE represents the mean square error between the current image X and the reference image Y, H and W are the height and width of the images, respectively, and MAX_I is the maximum value of the image pixel color; if each sample is represented by 8 bits, this value is 255.
The beneficial effects of the invention are: the method implements an HDR video synthesis network based on an encoder-decoder structure to generate high-quality HDR video, and is of great significance for generating high-quality HDR video from multi-exposure LDR video.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the overall architecture of the network of the present invention;
FIG. 3 is a schematic diagram of an overview of the coding layer of the present invention;
FIG. 4 is a schematic diagram of the convolution hierarchy of the present invention;
FIG. 5 is a schematic diagram of the overall architecture of the merging layer of the present invention;
FIG. 6 is a detailed design diagram of the residual block of the present invention;
FIG. 7 is a block diagram of the decoding layer architecture of the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
Example: as shown in fig. 1 and fig. 2, a method for synthesizing HDR video based on deep learning specifically includes the following steps:
(1) Select video-frame data sets from the LiU HDRv repository, using the Astronauts, bridge and bridge_2 sequences as the training and testing data sets; Luminance HDR 2.5.1 is used to convert the images to LDR format.
(2) Segment the data-set pictures into about thirty thousand small patches to increase the amount of training data, convert them to tfrecord, and store them in the save_train folder. Then extract three consecutive LDR frames from the data set each time, apply cropping, normalization, flipping, deformation and similar operations so that the images better match pictures taken in real life, and store them into tfrecord. This step increases the robustness of the procedure and prevents overfitting. Specifically: first convert the images from BGR to RGB-format LDR; then randomly rotate each image by 90, 180 or 270 degrees and randomly mirror-flip it to reduce the network's dependence on image orientation; then crop it to 256 × 256. Finally pack the 3 LDR images and the corresponding HDR image with a batch size of 20 and convert them into tfrecord for storage. Tfrecord is a format for storing a series of binary records; binary data occupies less disk space and takes less time to copy, which speeds up reading of the file information. Therefore tfrecord is used in this system to store the training data; since the original files are BGR images, they are first converted into RGB format.
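For illustration only, a minimal Python sketch of this preprocessing step is given below; the function names, the per-example packing (batching at read time rather than in the file), and the output path are assumptions and are not taken from the patent's code.

```python
# Illustrative sketch: augment three BGR LDR frames and the reference HDR frame
# (rotation, mirroring, 256x256 crop) and pack them into a tfrecord example.
import cv2
import numpy as np
import tensorflow as tf

def augment(ldr_frames, hdr_ref, rng=np.random):
    """ldr_frames: three HxWx3 BGR uint8 frames; hdr_ref: HxWx3 float32 HDR frame."""
    imgs = [cv2.cvtColor(f, cv2.COLOR_BGR2RGB) for f in ldr_frames] + [hdr_ref]
    k = rng.randint(4)                           # rotate by 0/90/180/270 degrees
    imgs = [np.rot90(im, k) for im in imgs]
    if rng.rand() < 0.5:                         # random mirror flip
        imgs = [np.fliplr(im) for im in imgs]
    y = rng.randint(imgs[0].shape[0] - 256 + 1)  # random 256x256 crop
    x = rng.randint(imgs[0].shape[1] - 256 + 1)
    imgs = [im[y:y + 256, x:x + 256] for im in imgs]
    return imgs[:3], imgs[3]

def to_example(ldrs, hdr):
    feat = {f"ldr_{i}": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[l.tobytes()]))
            for i, l in enumerate(ldrs)}
    feat["hdr"] = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[hdr.astype(np.float32).tobytes()]))
    return tf.train.Example(features=tf.train.Features(feature=feat))

# with tf.io.TFRecordWriter("save_train/train.tfrecord") as writer:
#     writer.write(to_example(*augment(ldr_frames, hdr_ref)).SerializeToString())
```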
(3) Read the tfrecord to obtain three consecutive LDR frames I1, I2, I3 and feed them into the network for training. The neural network takes the HDR image (Ir) of the middle frame as the reference frame and computes the final loss function. After the neural network has been trained, an HDR frame with comprehensive information can be obtained. The loss function is the core part of the empirical risk function and an important part of the structural risk function; it measures the degree of inconsistency between the model's predicted value f(x) and the true value Y, i.e. the difference between the prediction and the ground truth. It is a non-negative real-valued function; the smaller the loss, the more robust the model and the closer the training results are to the real situation. The loss function used by the present invention is the L2 loss, given by:
L2 = |f(x) - Y|²
L2' = 2·f'(x)·(f(x) - Y)
the step (3) specifically comprises the following steps:
(3.1) Read the tfrecord and input it into the network; the three encoders read three different inputs, transform them through three convolution layers and one normalization layer, and pass them to the merging layer; after further transformation by the ResNet, the data are input to the decoding layer; the tensor of the HDR image is finally obtained after a decoding layer composed of deconvolution and merging operations.
(3.2) Use an Adam optimizer, compute the L2 loss between the generated image and the reference frame, and back-propagate to update the weights. Adam is used to optimize the model; it is a well-established optimizer that combines the strengths of Adagrad (good at handling sparse gradients) and RMSprop (good at handling non-stationary objectives), has low memory requirements, and shortens the time required for convergence. A sketch of this training step is given below.
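The sketch assumes a Keras model `hdr_net` that maps the three LDR frames to one HDR frame; the model name and the learning rate are illustrative assumptions, not values from the patent.

```python
# Hedged sketch of one training step: L2 loss against the reference HDR frame,
# gradients applied with Adam.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)   # learning rate is an assumption

@tf.function
def train_step(hdr_net, ldr1, ldr2, ldr3, hdr_ref):
    with tf.GradientTape() as tape:
        hdr_pred = hdr_net([ldr1, ldr2, ldr3], training=True)
        loss = tf.reduce_mean(tf.square(hdr_pred - hdr_ref))   # L2 loss
    grads = tape.gradient(loss, hdr_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, hdr_net.trainable_variables))
    return loss
```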
(3.3) Tone-map the output: map the HDR image to an LDR image using a global tone mapping technique and store it into a file in png format. After the final tensor is obtained, the HDR image is converted to an LDR image since the result needs to be visualized; the tone mapping output converts the numerical range of the output to 0-255.
The tone mapping function is:
T(x_{i,j}) = log(1 + μ·x_{i,j}) / log(1 + μ)
where μ is equal to 5000 and x_{i,j} is the value of the original tensor; the logarithm is natural (base e).
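A sketch of how this mu-law style mapping could be applied and saved as a PNG follows; the normalization step before compression and the function and file names are assumptions.

```python
# Global tone mapping sketch: mu-law compression with mu = 5000 and natural log,
# scaled to 0-255 and written out with OpenCV.
import cv2
import numpy as np

def tonemap_mu_law(hdr, mu=5000.0):
    hdr = np.clip(hdr, 0.0, None) / max(float(hdr.max()), 1e-8)  # normalize to [0, 1] (assumed step)
    ldr = np.log(1.0 + mu * hdr) / np.log(1.0 + mu)              # mu-law compression
    return (ldr * 255.0).astype(np.uint8)                        # scale to 0-255

# cv2.imwrite("result.png", cv2.cvtColor(tonemap_mu_law(hdr_pred), cv2.COLOR_RGB2BGR))
```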
The neural network design specifically comprises the following steps:
(i) The coding layer adopts a network structure of three convolution levels, as shown in fig. 3. Each level is a convolution layer plus a pooling layer, abstracting features at three levels. The convolution layer uses a kernel of height and width 5, a stride of 2, and VALID padding. So that each convolution halves the height and width of the tensor, mirror (reflection) padding is added on the top, bottom, left and right sides of the tensor, making the height and width of the resulting matrix exactly half of the original. The number of convolution kernels is 64 in the first level, 128 in the second and 256 in the third. The pooling layer uses a linear rectification function (ReLU). The convolution level is shown in fig. 4.
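A hedged TensorFlow/Keras sketch of one such convolution level and the three-level encoder; the exact split of the reflection padding (1 pixel top/left, 2 pixels bottom/right) is an assumption chosen so that the 5x5, stride-2, VALID convolution halves the spatial size exactly.

```python
# Encoder sketch: reflection padding + 5x5/stride-2 VALID convolution + ReLU per level.
import tensorflow as tf
from tensorflow.keras import layers

def encoder_level(x, filters):
    x = tf.pad(x, [[0, 0], [1, 2], [1, 2], [0, 0]], mode="REFLECT")  # mirror padding
    x = layers.Conv2D(filters, kernel_size=5, strides=2, padding="valid")(x)
    return layers.ReLU()(x)

ldr_in = tf.keras.Input(shape=(256, 256, 3))
f1 = encoder_level(ldr_in, 64)     # 128x128x64
f2 = encoder_level(f1, 128)        # 64x64x128
f3 = encoder_level(f2, 256)        # 32x32x256
encoder = tf.keras.Model(ldr_in, [f1, f2, f3])
```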
(ii) The merging layer: after passing through their respective encoders, the three images become three tensors of the same format. The algorithm fuses the three tensors along the third dimension and passes them through one convolution level whose kernel parameters are identical to the encoder's but whose number of kernels is 512. The resulting tensor is then input into the residual network (ResNet). The residual network has nine blocks; each residual block is composed of two consecutive convolution layers and a normalization layer, and each residual block keeps its input and output formats identical. The overall architecture of the merging layer is shown in fig. 5, and the detailed design of the residual block is shown in fig. 6.
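A hedged sketch of the merging layer and one residual block; the 3x3 kernel size inside the residual blocks and the use of batch normalization are assumptions, since the text only specifies two convolution layers plus a normalization layer per block.

```python
# Merging-layer sketch: channel concatenation, one 512-kernel convolution level,
# then nine shape-preserving residual blocks.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=512):
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.Add()([x, y])                 # shortcut keeps input/output format identical

def merge_layer(e1, e2, e3):
    x = layers.Concatenate(axis=-1)([e1, e2, e3])               # fuse along the third dimension
    x = tf.pad(x, [[0, 0], [1, 2], [1, 2], [0, 0]], mode="REFLECT")
    x = layers.Conv2D(512, 5, strides=2, padding="valid")(x)    # encoder kernel parameters, 512 kernels
    x = layers.ReLU()(x)
    for _ in range(9):                                          # nine residual blocks
        x = residual_block(x, 512)
    return x
```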
(iii) The decoding layer uses a four-level network structure corresponding to the merging layer and the encoding layer. Each level is first merged along the third dimension with the same-depth feature taken before the residual network, and then a deconvolution layer and a normalization layer are applied. The height and width of each level's output tensor are twice those of its input, and the number of channels is half that of the input, except for the last level, whose number of channels is 3, so that the tensor format at the final output is identical to that at the input. The overall architecture of the decoding layer is shown in fig. 7.
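A hedged sketch of the four-level decoder; which pre-ResNet features serve as the skip connections at each level (and from which of the three encoders) is an assumption made for illustration.

```python
# Decoder sketch: per level, concatenate a same-resolution skip feature on the
# channel axis, then a stride-2 transposed convolution (with normalization and
# ReLU except at the last level, which outputs 3 channels).
import tensorflow as tf
from tensorflow.keras import layers

def decoder_level(x, skip, filters, last=False):
    if skip is not None:
        x = layers.Concatenate(axis=-1)([x, skip])      # merge on the third dimension
    x = layers.Conv2DTranspose(filters, kernel_size=5, strides=2, padding="same")(x)
    if not last:
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def decoder(res_out, merged, f3, f2, f1):
    x = decoder_level(res_out, merged, 256)    # 16x16 -> 32x32
    x = decoder_level(x, f3, 128)              # 32x32 -> 64x64
    x = decoder_level(x, f2, 64)               # 64x64 -> 128x128
    return decoder_level(x, f1, 3, last=True)  # 128x128 -> 256x256x3
```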
(4) In this system, the peak signal-to-noise ratio (PSNR) is used as the evaluation criterion. The PSNR ranges over [0, 100], and the higher the PSNR value, the better the quality of the generated picture, i.e. the stronger the neural network's ability to reconstruct the HDR picture. Specifically, PSNR (Peak Signal-to-Noise Ratio) is an objective criterion for evaluating images. In general, after image compression the output image differs from the original image to some extent, and the PSNR value is commonly used to measure whether a processing procedure is satisfactory. It is the logarithm, in dB, of (2^n - 1)^2 (the square of the maximum signal value, where n is the number of bits per sample) relative to the mean square error between the original image and the processed image. The specific calculation formula is as follows:
MSE = (1 / (H·W)) · Σ_{i=0}^{H-1} Σ_{j=0}^{W-1} [X(i, j) - Y(i, j)]²
PSNR = 10 · log_10(MAX_I² / MSE)
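For reference, a direct NumPy sketch of these two formulas, assuming 8-bit images so that MAX_I = 255:

```python
# PSNR sketch: mean square error over the image followed by 10*log10(MAX_I^2 / MSE).
import numpy as np

def psnr(x, y, max_i=255.0):
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)  # MSE over H x W
    return 10.0 * np.log10(max_i ** 2 / mse)                           # PSNR in dB
```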
in summary, the present invention is significant for generating high-quality HDR video from multi-exposure LDR video.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. An HDR video synthesis method based on deep learning, characterized by comprising the following steps:
(1) selecting video-frame data sets from the LiU HDRv repository, using the Astronauts, bridge and bridge_2 sequences as the training and testing data sets; converting the images to LDR (low dynamic range) format using Luminance HDR 2.5.1;
(2) segmenting the data-set pictures into many small patches to increase the amount of training data; converting them to tfrecord and storing them in the save_train folder; extracting three consecutive LDR frames from the data set each time, applying cropping, normalization, flipping and deformation operations so that the images better match pictures taken in real life, and storing them into tfrecord;
(3) reading the tfrecord to obtain three consecutive LDR frames I1, I2, I3 and feeding them into a neural network for training; the neural network takes the HDR image (Ir) of the middle frame as the reference frame and computes the final loss function; through neural-network learning, an HDR frame with comprehensive information is obtained, thereby enabling the synthesis of HDR video.
2. The deep-learning-based HDR video synthesis method of claim 1, characterized in that the method further comprises:
(4) evaluating the synthesis quality of the HDR video using the peak signal-to-noise ratio (PSNR) as the evaluation criterion; the PSNR ranges over [0, 100], and the higher the PSNR value, the better the quality of the generated picture, i.e. the stronger the neural network's ability to reconstruct the HDR picture.
3. The deep-learning-based HDR video synthesis method of claim 1, characterized in that step (2) is specifically: converting each small patch from BGR to RGB-format LDR; randomly rotating the image by 90, 180 or 270 degrees and randomly mirror-flipping it; cropping the image to 256 × 256, packing 3 LDR images and the corresponding HDR image with a batch size of 20, and converting them into tfrecord for storage; tfrecord is a format for storing a series of binary records.
4. The deep-learning-based HDR video synthesis method of claim 1, characterized in that the neural network design in step (3) is specifically:
(I) the coding layer adopts a network structure of three convolution levels; each level is a convolution layer plus a pooling layer, abstracting features at three levels; the convolution layer uses a kernel of height and width 5, a stride of 2, and VALID padding; so that each convolution halves the height and width of the tensor, mirror (reflection) padding is added on the top, bottom, left and right sides of the tensor, making the height and width of the resulting matrix exactly half of the original; the number of convolution kernels is 64 in the first level, 128 in the second level and 256 in the third level; the pooling layer adopts a linear rectification function (ReLU);
(II) the merging layer: after passing through their respective encoders, the three images become three tensors of the same format; the three tensors are fused along the third dimension and passed through one convolution level whose kernel parameters are the same as the encoder's but whose number of kernels is 512; the resulting tensor is then input into a residual network (ResNet); the residual network has nine blocks; each residual block consists of two consecutive convolution layers and a normalization layer, and each residual block keeps its input and output formats identical;
(III) the decoding layer: the decoder uses a four-level network structure corresponding to the merging layer and the coding layer; each level is first merged along the third dimension with the same-depth feature taken before the residual network, and then a deconvolution layer and a normalization layer are applied; the height and width of each level's output tensor are twice those of its input, and the number of channels is half that of the input, except for the last level, whose number of channels is 3.
5. The HDR video synthesis method based on deep learning as claimed in claim 4, wherein: the loss function in the step (3) is a non-negative real value function, and the specifically adopted loss function is L2 loss; the formula is as follows:
L2 = |f(x) - Y|²
L2' = 2·f'(x)·(f(x) - Y)
where Y is the true value and f(x) is the predicted value.
6. The deep-learning-based HDR video synthesis method of claim 5, characterized in that step (3) is specifically:
(3.1) reading the tfrecord and feeding it into the neural network; the three encoders read three different inputs, transform them through three convolution layers and one normalization layer, and pass them to the merging layer; after further transformation by the ResNet, the data are input to the decoding layer; the tensor of the HDR image is finally obtained through a decoding layer composed of deconvolution and merging operations;
(3.2) using an Adam optimizer, computing the L2 loss between the generated image and the reference frame, and back-propagating to update the weights;
(3.3) tone mapping the output, using a global tone mapping technique to map the HDR image into an LDR image and storing it to a file in png format; after the final tensor is obtained, the HDR image is converted into an LDR image because it needs to be visualized; the numerical range of the output is converted to 0-255 using the tone mapping output; the tone mapping function is:
T(x_{i,j}) = log(1 + μ·x_{i,j}) / log(1 + μ)
where μ is equal to 5000 and x_{i,j} is the value of the original tensor; the logarithm is natural (base e).
7. The deep-learning-based HDR video synthesis method of claim 2, characterized in that the peak signal-to-noise ratio in step (4) is an objective criterion for evaluating images: it is the logarithm, in dB, of (2^n - 1)^2 (the square of the maximum signal value, where n is the number of bits per sample) relative to the mean square error between the original image and the processed image; the specific calculation formula is as follows:
MSE = (1 / (H·W)) · Σ_{i=0}^{H-1} Σ_{j=0}^{W-1} [X(i, j) - Y(i, j)]²
PSNR = 10 · log_10(MAX_I² / MSE)
where MSE represents the mean square error between the current image X and the reference image Y, H and W are the height and width of the images, respectively, and MAX_I is the maximum value of the image pixel color; if each sample is represented by 8 bits, this value is 255.
CN202110252970.0A 2021-03-09 2021-03-09 HDR video synthesis method based on deep learning Pending CN113132655A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110252970.0A CN113132655A (en) 2021-03-09 2021-03-09 HDR video synthesis method based on deep learning


Publications (1)

Publication Number Publication Date
CN113132655A true CN113132655A (en) 2021-07-16

Family

ID=76772801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110252970.0A Pending CN113132655A (en) 2021-03-09 2021-03-09 HDR video synthesis method based on deep learning

Country Status (1)

Country Link
CN (1) CN113132655A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2632162A1 (en) * 2012-02-27 2013-08-28 Thomson Licensing Method and device for encoding an HDR video image, method and device for decoding an HDR video image
CN111242883A (en) * 2020-01-10 2020-06-05 西安电子科技大学 Dynamic scene HDR reconstruction method based on deep learning
CN111709896A (en) * 2020-06-18 2020-09-25 三星电子(中国)研发中心 Method and equipment for mapping LDR video into HDR video
CN111835983A (en) * 2020-07-23 2020-10-27 福州大学 Multi-exposure-image high-dynamic-range imaging method and system based on generation countermeasure network


Similar Documents

Publication Publication Date Title
CN109064507B (en) Multi-motion-stream deep convolution network model method for video prediction
CN111192200A (en) Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111028308B (en) Steganography and reading method for information in image
CN111709896B (en) Method and equipment for mapping LDR video into HDR video
CN110717868B (en) Video high dynamic range inverse tone mapping model construction and mapping method and device
CN112927202A (en) Method and system for detecting Deepfake video with combination of multiple time domains and multiple characteristics
CN110675321A (en) Super-resolution image reconstruction method based on progressive depth residual error network
CN112288632B (en) Single image super-resolution method and system based on simplified ESRGAN
CN111105376B (en) Single-exposure high-dynamic-range image generation method based on double-branch neural network
CN113096029A (en) High dynamic range image generation method based on multi-branch codec neural network
CN110225260B (en) Three-dimensional high dynamic range imaging method based on generation countermeasure network
CN112381716B (en) Image enhancement method based on generation type countermeasure network
CN114170286B (en) Monocular depth estimation method based on unsupervised deep learning
JP2011522496A (en) Image coding method by texture synthesis
CN115170915A (en) Infrared and visible light image fusion method based on end-to-end attention network
CN116757955A (en) Multi-fusion comparison network based on full-dimensional dynamic convolution
CN115984117A (en) Variational self-coding image super-resolution method and system based on channel attention
CN115222592A (en) Underwater image enhancement method based on super-resolution network and U-Net network and training method of network model
CN113132655A (en) HDR video synthesis method based on deep learning
CN115880158A (en) Blind image super-resolution reconstruction method and system based on variational self-coding
CN115018733A (en) High dynamic range imaging and ghost image removing method based on generation countermeasure network
CN114820354A (en) Traditional image compression and enhancement method based on reversible tone mapping network
CN115587934A (en) Image super-resolution reconstruction and defogging method and system based on loss classification and double-branch network
CN115841523A (en) Double-branch HDR video reconstruction algorithm based on Raw domain
CN114663315A (en) Image bit enhancement method and device for generating countermeasure network based on semantic fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210716)