CN115115516B - Real world video super-resolution construction method based on Raw domain - Google Patents
- Publication number
- CN115115516B (application CN202210733861.5A)
- Authority
- CN
- China
- Prior art keywords: frame, raw, format, resolution, branch
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/35—Determination of transform parameters for the alignment of images, i.e. image registration using statistical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N9/00—Details of colour television systems
- H04N9/64—Circuits for processing colour signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention discloses a method for constructing real-world video super-resolution based on the Raw domain, and relates to the technical field of video signal processing. The method comprises the following steps: S1, establishing a real-world Raw video super-resolution data set; S2, designing a real-world Raw video super-resolution algorithm based on S1; S3, training a model; S4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results. The invention constructs the first real-world VSR data set with three magnification factors in the Raw and sRGB domains, providing a benchmark data set for training and evaluating real Raw VSR methods; by utilizing the proposed joint alignment-interaction module and the temporal and channel fusion modules, the invention raises real LR video super-resolution performance to a new height.
Description
Technical Field
The invention belongs to the technical field of video signal processing, and relates to a method for constructing real-world video super-resolution based on a Raw domain.
Background
Capturing video with a short focus lens can expand the viewing angle by sacrificing resolution, while capturing with a long focus lens can increase resolution by sacrificing viewing angle; video super-resolution (VSR) is an efficient way to acquire wide-angle and high-resolution (HR) video; video super-resolution reconstructs high-resolution video from low-resolution (LR) inputs by exploring spatial and temporal correlations of the input sequences; in recent years, the development of video super-resolution has been shifted from traditional model-driven to deep learning-based approaches.
The performance of these deep-learning-based SR methods depends largely on the training dataset; considering that synthetic LR-HR datasets, such as DIV2K and REDS, cannot represent the degradation model between truly captured LR and HR images, many real SR datasets have been constructed to improve real-world SR performance; however, most of these datasets contain only static LR-HR images, such as RealSR and ImagePair. Recently, researchers proposed the first real-world VSR dataset, captured with the iPhone 11 Pro Max multi-camera system; however, parallax between the LR and HR cameras increases the difficulty of alignment, and due to the limited focal lengths of cell-phone cameras, the dataset contains only 2× LR-HR sequence pairs.
On the other hand, there is a trend of using Raw images for real-scene image (video) restoration tasks such as low-light enhancement, denoising, deblurring, and super-resolution; the main reason is that a Raw image has a relatively wide bit depth (12 or 14 bits), i.e., it contains the most primitive information, and its intensity is linear with illumination; however, little effort has been spent on exploring Raw video super-resolution; researchers have synthesized LR Raw frames by downsampling captured HR Raw frames, proposing a Raw video super-resolution dataset; nevertheless, there is still a gap between synthesized LR Raw frames and truly captured ones, which prevents SR models trained on synthetic data from generalizing well to real scenes.
Disclosure of Invention
The invention solves the following technical problems:
(I) The invention aims to establish a real-world Raw video super-resolution data set and, on this basis, provides a video super-resolution algorithm adapted to Raw data.
(II) In order to achieve the above purpose, the invention adopts the following technical scheme:
the method for constructing the real-world video super-resolution based on the Raw domain comprises the following steps:
s1, establishing a real world Raw video super-resolution data set: the data set establishment process mainly comprises the following 3 steps:
s101, hardware design: the incident light is split by a beam splitter into a reflected beam and a transmitted beam with a brightness ratio of 1:1; LR-HR frame pairs of different scales are captured using DSLR cameras with zoom lenses; a 3D model box is designed and printed to fix the beam splitter; the DSLR cameras and the beam-splitter box are placed on an optical plate, and a tripod is fixed below them;
s102, data acquisition: acquiring a Raw video in an MLV format, and then processing the Raw video in the MLV format by using MlRawViewer software to obtain corresponding sRGB frames and Raw frames in a DNG format;
s103, data processing: a coarse-to-fine alignment strategy is used to generate aligned LR-HR frames, including sRGB frame pairs and Raw frame pairs;
S2, based on the frames obtained after the S1 data processing, taking the LR Raw frames and the HR sRGB frames as training pairs and designing a real-world Raw video super-resolution algorithm;
s3, training a model: constructing a model based on an algorithm designed in the step S2, training the model by utilizing a deep learning framework Pytorch platform, iterating 300k times on the whole data set, then reducing the learning rate to 0.00001, and continuing iterating until the loss converges to obtain a final model;
s4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results.
Preferably, the step of performing data processing on the sRGB frame using the alignment policy in S103 is as follows:
s1031, firstly, estimating a homography matrix H between up-sampling LR and HR frames by using SIFT key points selected by a RANSAC algorithm;
s1032, then, warping the HR frame with H to coarsely crop out the corresponding region in the LR frame that matches the HR frame;
s1033, performing pixel-by-pixel alignment on the matching area by using a traditional optical flow estimation method DeepFlow;
s1034, finally, clipping the center region to eliminate alignment artifacts around the boundary, generating the aligned LR-HR frame pairs in the sRGB domain.
Preferably, the Raw frames should undergo the same alignment strategy as the sRGB frames; however, directly applying the global and local alignment would destroy the Bayer pattern of the Raw input; when applying the alignment strategy to the Raw frames, the original frames are therefore first repacked into the RGGB sub-format, whose size is half that of the sRGB frames, so the homography matrix H computed from the sRGB frames needs to be changed by rescaling its translation parameters by a factor of 0.5; the DeepFlow result is processed in the same manner, and in this way the aligned Raw frame pairs are generated.
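The translation rescaling described above can be sketched as follows. This is a minimal NumPy illustration, not code from the patent; the function name is hypothetical, and the conjugation by the coordinate-scaling matrix is an assumption consistent with "readjusting the translation parameter at a rate of 0.5".

```python
import numpy as np

def rescale_homography(H, scale=0.5):
    """Adapt a homography estimated on full-resolution sRGB frames to
    half-resolution RGGB-packed Raw frames. Conjugating H by the
    coordinate scaling S = diag(scale, scale, 1) multiplies the
    translation terms by `scale` while leaving the linear part intact."""
    S = np.diag([scale, scale, 1.0])
    return S @ H @ np.linalg.inv(S)
```

For a pure translation, e.g. a shift of (10, 6) pixels in the sRGB frame, the rescaled matrix shifts by (5, 3) pixels in the half-resolution Raw sub-frames, as expected.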
Preferably, the real world Raw video super-resolution algorithm flow described in S2 mainly includes the following steps:
s201, dual-branch strategy and feature extraction: to fully utilize the information of the Raw data, the LR consecutive frames are fed into the two branches of the network in different forms; the Bayer format branch directly takes the Raw consecutive frames themselves as input; the sub-frame format branch packs the frames into the recombined RGGB sub-format to form a new sequence as input, whose spatial size is half and whose channel number is 4 times that of the Bayer format input; the Bayer format branch keeps the original order of the Raw pixels, which benefits spatial reconstruction; although the sub-frame format branch cannot preserve the original pixel order, it can exploit far-neighbor correlations to generate details; the two inputs then respectively pass through a feature extraction module composed of five residual blocks;
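The RGGB sub-frame packing used by the second branch can be sketched as below. This is a minimal NumPy illustration under the assumption of an RGGB phase at the top-left pixel; the function name is hypothetical.

```python
import numpy as np

def pack_rggb(bayer):
    """Pack an H×W Bayer (RGGB) mosaic into a 4-channel (H/2)×(W/2)
    sub-frame tensor with channels R, G1, G2, B. This is the
    "sub-frame format" input; the untouched mosaic itself is the
    "Bayer format" input."""
    r  = bayer[0::2, 0::2]
    g1 = bayer[0::2, 1::2]
    g2 = bayer[1::2, 0::2]
    b  = bayer[1::2, 1::2]
    return np.stack([r, g1, g2, b], axis=0)
```

Note that the packing halves the spatial size and quadruples the channel count, matching the relation between the two branch inputs stated above.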
s202, joint alignment: due to the temporal offsets between adjacent frames, the adjacent frames need to be warped to the center frame; alignment is performed with a multi-level cascaded alignment strategy, i.e., the alignment offsets are computed from the sub-frame format branch and then directly propagated to the Bayer format branch for alignment, so that the two branches are aligned jointly; the features of the adjacent frame and the center frame in the sub-frame format branch are downsampled L-1 times by strided convolutions to form an L-level pyramid; pyramid features in the Bayer format branch are constructed in the same way; the offset at level l is computed from the aggregated features at level l and the 2× upsampled offset of level l+1: o_S^l = f([F_S^l, (o_S^{l+1})^{↑2}]), where f is implemented by convolution layers and [·,·] denotes concatenation;
since the input of the sub-frame format branch is in fact a downsampled version of the Bayer format branch, the offset values of the Bayer format branch should be twice those of the sub-frame format branch; thus, the offset o_S^l in the sub-frame format branch can be upsampled by a factor of 2 and its values multiplied by 2 to obtain the level-l offset of the Bayer format branch: o_B^l = 2(o_S^l)^{↑2};
given the offsets, the aligned features of the two branches can be expressed as F̃_S^l = g(Dconv(F_S^l, o_S^l)) and F̃_B^l = g(Dconv(F_B^l, o_B^l)), where g represents a mapping function implemented by several convolution layers and Dconv represents deformable convolution; the Dconv of the two branches share the same weights at each level; after the L levels are aligned, the offset computed between the aligned adjacent features and the center-frame features is further used to refine them, generating the final alignment results of the adjacent features in both branches;
S203, an interaction module: the Bayer format branch features are downsampled by 3 x 3 convolution (stride=2) and the LeakyRelu layer, and these downsampled features are aggregated with features in the subframe format branches; similarly, sub-frame format branching features are up-sampled by a Pixelshuffle and then aggregated with features in the Bayer format branching;
s204, temporal fusion: a non-local temporal attention module aggregates long-range features to enhance the feature representation along the temporal dimension; the features are then fused together using temporal-spatial attention (TSA) based fusion;
s205, channel fusion: channel fusion is used to combine the features of the two branches, because the Bayer format and sub-frame format features may contribute differently to the final SR reconstruction; a selective kernel convolution (SKF) is adopted to fuse the two branch features through a channel-wise weighted average;
s206, reconstruction and upsampling: the fused features are fed into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with a PixelShuffle layer, and a convolution layer then generates the three-channel output; meanwhile, the module uses two long skip connections: one takes the LR Bayer format input, processes it with a convolution layer first, and upsamples it to a three-channel output through PixelShuffle; the other takes the LR sub-frame format input, which, since its spatial size is half that of the original input, is upsampled twice; the three outputs are added to generate the final HR result;
S207, color correction and loss function: the actually captured LR and HR data differ in color and brightness, and directly applying a pixel loss between the output and HR may lead the network to optimize color and brightness correction while neglecting the SR task; to solve this problem, color correction is applied before the loss calculation, i.e., a channel-wise color correction is applied to the R, G and B channels separately instead of computing a 3×3 color correction matrix that corrects them simultaneously: Î_c = α_c · I_c, c ∈ {R, G, B},
where α_c is the scaling factor of channel c, obtained by minimizing the least-squares loss between the corresponding pixels of the downsampled output and the downsampled version of HR.
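The channel-wise scaling factor admits a closed-form least-squares solution; the following NumPy sketch (hypothetical helper names, NumPy in place of the patent's PyTorch) illustrates the correction step of S207:

```python
import numpy as np

def channel_scale(out_c, ref_c):
    """Closed-form least-squares scaling factor alpha_c minimizing
    sum((alpha_c * out_c - ref_c) ** 2) over corresponding pixels."""
    out_c = out_c.ravel().astype(np.float64)
    ref_c = ref_c.ravel().astype(np.float64)
    return float(out_c @ ref_c / (out_c @ out_c))

def color_correct(out_rgb, ref_rgb):
    """Apply channel-wise color correction to a (3, H, W) output
    before the loss, one scaling factor per channel."""
    return np.stack([channel_scale(o, r) * o
                     for o, r in zip(out_rgb, ref_rgb)], axis=0)
```

With this correction, a per-channel gain mismatch between the output and the reference is removed before the loss, so the network is not penalized for global color/brightness differences.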
S208, optimizing the network using the Charbonnier loss between the corrected output and HR.
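The Charbonnier loss of S208 is a smooth L1 variant commonly used in super-resolution; a minimal sketch follows (the eps value is a typical choice, an assumption not stated in the patent):

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier loss: mean of sqrt(diff^2 + eps^2), a
    differentiable approximation of the L1 loss."""
    diff = pred - target
    return float(np.mean(np.sqrt(diff * diff + eps * eps)))
```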
The beneficial effects of the invention include the following three points:
(1) The invention constructs the first real-world VSR data set with three magnification factors in the Raw and sRGB domains, which provides a benchmark data set for the training and evaluation of real Raw VSR methods.
(2) Based on the real-world Raw video super-resolution data set obtained in S1, the invention provides a Real-RawVSR method that processes the Raw data input in two branches, one for the Bayer format input and the other for the sub-frame format input; by utilizing the proposed joint alignment, interaction, and temporal and channel fusion modules, the complementary information of the two branches is well explored, and real LR video super-resolution performance is raised to a new height.
(3) Experiments based on the invention show that the proposed method is superior to the current mainstream Raw and sRGB VSR methods; through the research and exploration of the invention, it is hoped that more research on Raw-domain video super-resolution methods will be inspired.
Drawings
FIG. 1 is a hardware platform and a data processing flow chart in a method for constructing real world video super-resolution based on a Raw domain, which is adopted by the invention;
FIG. 2 is a flowchart of an algorithm in a method for constructing real world video super-resolution based on a Raw domain, which is adopted by the invention;
FIG. 3 is a block diagram of a joint alignment module in a method for constructing real world video super-resolution based on a Raw domain, which is adopted by the invention;
FIG. 4 is a table showing the comparison of the results of the algorithm used in the present invention with other video super-resolution algorithms on the test set.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
referring to fig. 1, the method for constructing the super-resolution of the real world video based on the Raw domain includes the following steps:
s1, establishing a real world Raw video super-resolution data set: the data set establishment process mainly comprises the following 3 steps:
s101, hardware design: in order to capture LR-HR frame pairs of different scales, DSLR cameras with 18-135 mm zoom lenses are used instead of cell-phone cameras; to avoid the influence of natural light from other directions, a 3D model box is designed and printed to fix the beam splitter, so that the two cameras receive natural light from the same viewpoint; the size of the beam splitter is 150×150×1 mm³, sufficient to cover the camera lens; the cameras and the beam-splitter box are placed on an optical plate, and a tripod is fixed below them to improve stability;
s102, data acquisition: the invention uses two Canon 60D cameras upgraded with the third-party software Magic Lantern to acquire Raw video in the Magic Lantern Video (MLV) format; to keep the cameras synchronized, an infrared remote controller sends signals to both cameras to trigger shooting simultaneously; during shooting, the ISO of both cameras is kept in the range of 100 to 1600 to avoid noise, and the exposure time ranges from 1/400 s to 1/31 s to capture both slow and fast motion; all other settings are left at their default values to simulate real capture scenarios; the MLV video is then processed with MlRawViewer software to obtain the corresponding sRGB frames and Raw frames in DNG format; for each scene, a short video of 6 seconds is captured at a frame rate of 25 FPS, i.e., each video contains approximately 150 frames in both Raw and sRGB formats;
s103, data processing: due to lens aberrations, misalignment still exists between the LR-HR pairs; the invention therefore utilizes a coarse-to-fine alignment strategy to generate aligned LR-HR frames, including sRGB frame pairs and Raw frame pairs;
The step of performing data processing on the sRGB frame using the alignment policy in S103 is as follows:
s1031, firstly, estimating a homography matrix H between up-sampling LR and HR frames by using SIFT key points selected by a RANSAC algorithm;
s1032, then, warping the HR frame with H to coarsely crop out the corresponding region in the LR frame that matches the HR frame;
s1033, performing pixel-by-pixel alignment on the matching area by using a traditional optical flow estimation method DeepFlow;
s1034, finally, clipping the center region to eliminate alignment artifacts around the boundary, generating the aligned LR-HR frame pairs in the sRGB domain;
the Raw frames should undergo the same alignment strategy as the sRGB frames; however, directly applying the global and local alignment would destroy the Bayer pattern of the Raw input; when applying the alignment strategy to the Raw frames, the original frames are therefore first repacked into the RGGB sub-format, whose size is half that of the sRGB frames, so the homography matrix H computed from the sRGB frames needs to be changed by rescaling its translation parameters by a factor of 0.5; the DeepFlow result is processed in the same manner, and in this way the aligned Raw frame pairs are generated;
S2, based on the frames obtained after the S1 data processing, taking the LR Raw frames and the HR sRGB frames as training pairs and designing a real-world Raw video super-resolution algorithm;
s3, training a model: the number of consecutive input frames for training is 5; the optimizer used is Adam, with the initial learning rate set to 0.0001; a model is constructed based on the algorithm designed in S2 and trained on the deep learning framework PyTorch, iterating 300k times over the whole data set; the learning rate is then reduced to 0.00001 and iteration continues until the loss converges, yielding the final model;
s4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results;
the invention constructs the first real-world VSR data set with three magnification factors in the Raw and sRGB domains, which provides a benchmark data set for the training and evaluation of real Raw VSR methods.
Example 2:
referring to fig. 2-4, the difference based on embodiment 1 is that:
the real world Raw video super-resolution algorithm flow described in S2 mainly comprises the following steps:
s201, dual-branch strategy and feature extraction: to fully utilize the information of the Raw data, the LR consecutive frames are fed into the two branches of the network in different forms; the Bayer format branch directly takes the Raw consecutive frames themselves as input; the sub-frame format branch packs the frames into the recombined RGGB sub-format to form a new sequence as input, whose spatial size is half and whose channel number is 4 times that of the Bayer format input; the Bayer format branch keeps the original order of the Raw pixels, which benefits spatial reconstruction; although the sub-frame format branch cannot preserve the original pixel order, it can exploit far-neighbor correlations to generate details; the two inputs then respectively pass through a feature extraction module composed of five residual blocks;
s202, joint alignment: due to the temporal offsets between adjacent frames, the adjacent frames need to be warped to the center frame; alignment is performed with a multi-level cascaded alignment strategy, i.e., the alignment offsets are computed from the sub-frame format branch and then directly propagated to the Bayer format branch for alignment, so that the two branches are aligned jointly; the features of the adjacent frame and the center frame in the sub-frame format branch are downsampled L-1 times by strided convolutions to form an L-level pyramid; pyramid features in the Bayer format branch are constructed in the same way; the offset at level l is computed from the aggregated features at level l and the 2× upsampled offset of level l+1: o_S^l = f([F_S^l, (o_S^{l+1})^{↑2}]), where f is implemented by convolution layers and [·,·] denotes concatenation;
since the input of the sub-frame format branch is in fact a downsampled version of the Bayer format branch, the offset values of the Bayer format branch should be twice those of the sub-frame format branch; thus, the offset o_S^l in the sub-frame format branch can be upsampled by a factor of 2 and its values multiplied by 2 to obtain the level-l offset of the Bayer format branch: o_B^l = 2(o_S^l)^{↑2};
given the offsets, the aligned features of the two branches can be expressed as F̃_S^l = g(Dconv(F_S^l, o_S^l)) and F̃_B^l = g(Dconv(F_B^l, o_B^l)), where g represents a mapping function implemented by several convolution layers and Dconv represents deformable convolution; the Dconv of the two branches share the same weights at each level; after the L levels are aligned, the offset computed between the aligned adjacent features and the center-frame features is further used to refine them, generating the final alignment results of the adjacent features in both branches;
S203, an interaction module: the Bayer format branch features are downsampled by 3 x 3 convolution (stride=2) and the LeakyRelu layer, and these downsampled features are aggregated with features in the subframe format branches; similarly, sub-frame format branching features are up-sampled by a Pixelshuffle and then aggregated with features in the Bayer format branching;
s204, temporal fusion: a non-local temporal attention module aggregates long-range features to enhance the feature representation along the temporal dimension; the features are then fused together using temporal-spatial attention (TSA) based fusion;
s205, channel fusion: channel fusion is used to combine the features of the two branches, because the Bayer format and sub-frame format features may contribute differently to the final SR reconstruction; a selective kernel convolution (SKF) is adopted to fuse the two branch features through a channel-wise weighted average;
s206, reconstruction and upsampling: the fused features are fed into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with a PixelShuffle layer, and a convolution layer then generates the three-channel output; meanwhile, the module uses two long skip connections: one takes the LR Bayer format input, processes it with a convolution layer first, and upsamples it to a three-channel output through PixelShuffle; the other takes the LR sub-frame format input, which, since its spatial size is half that of the original input, is upsampled twice; the three outputs are added to generate the final HR result;
S207, color correction and loss function: the actually captured LR and HR data differ in color and brightness, and directly applying a pixel loss between the output and HR may lead the network to optimize color and brightness correction while neglecting the SR task; to solve this problem, color correction is applied before the loss calculation, i.e., a channel-wise color correction is applied to the R, G and B channels separately instead of computing a 3×3 color correction matrix that corrects them simultaneously: Î_c = α_c · I_c, c ∈ {R, G, B},
where α_c is the scaling factor of channel c, obtained by minimizing the least-squares loss between the corresponding pixels of the downsampled output and the downsampled version of HR.
S208, optimizing the network using the Charbonnier loss between the corrected output and HR;
based on the data set, the invention provides a Real-RawVSR method that processes the Raw data input in two branches, one for the Bayer format input and the other for the sub-frame format input; by utilizing the proposed joint alignment, interaction, and temporal-channel fusion modules, the complementary information of the two branches is well explored; experiments show that the proposed method is superior to the current mainstream Raw and sRGB VSR methods; through the research and exploration of the invention, it is hoped that more research on Raw-domain video super-resolution methods will be inspired. The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical solution and the inventive concept thereof, within the scope disclosed by the present invention, shall fall within the scope of the present invention.
Claims (4)
1. The method for constructing the real-world video super-resolution based on the Raw domain is characterized by comprising the following steps of: the method comprises the following steps:
s1, establishing a real world Raw video super-resolution data set: the data set establishment process mainly comprises the following 3 steps:
s101, hardware design: the incident light is split by a beam splitter into a reflected beam and a transmitted beam with a brightness ratio of 1:1; LR-HR frame pairs of different scales are captured using DSLR cameras with zoom lenses; a 3D model box is designed and printed to fix the beam splitter; the DSLR cameras and the beam-splitter box are placed on an optical plate, and a tripod is fixed below them;
s102, data acquisition: acquiring a Raw video in an MLV format, and then processing the Raw video in the MLV format by using MlRawViewer software to obtain corresponding sRGB frames and Raw frames in a DNG format;
s103, data processing: a coarse-to-fine alignment strategy is used to generate aligned LR-HR frames, including sRGB frame pairs and Raw frame pairs;
S2, based on the frames obtained after the S1 data processing, taking the LR Raw frames and the HR sRGB frames as training pairs and designing a real-world Raw video super-resolution algorithm;
s3, training a model: constructing a model based on an algorithm designed in the step S2, training the model by utilizing a deep learning framework Pytorch platform, iterating 300k times on the whole data set, then reducing the learning rate to 0.00001, and continuing iterating until the loss converges to obtain a final model;
S4, inputting the low-resolution Raw video sequences in the test set into the model obtained in S3 to obtain the corresponding super-resolution output results.
2. The method for constructing real-world video super-resolution based on the Raw domain according to claim 1, characterized in that the data processing of the sRGB frames using the alignment strategy in S103 comprises the following steps:
S1031, first, estimating the homography matrix H between the up-sampled LR frame and the HR frame using SIFT key points selected by the RANSAC algorithm;
S1032, then warping the HR frame with H to coarsely crop out the region of the HR frame that matches the LR frame;
S1033, performing pixel-wise alignment on the matched regions using the traditional optical-flow estimation method DeepFlow.
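Step S1031 estimates a homography H from matched key points. The patent uses SIFT key points filtered by RANSAC; the direct-linear-transform (DLT) least-squares core that such estimators ultimately solve can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation; `estimate_homography`, `apply_h`, and the synthetic points are hypothetical names and data.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: solve for the 3x3 H mapping src -> dst.

    src, dst: (N, 2) arrays of matched key points, N >= 4.
    In practice the matches come from SIFT and are filtered by
    RANSAC before this least-squares solve.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)          # h = right null vector of A
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                   # normalize so H[2, 2] = 1

def apply_h(H, pts):
    """Apply a homography to (N, 2) points."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

With exact correspondences the SVD recovers H up to scale; RANSAC's role in the patent is to supply an outlier-free set of such correspondences.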
3. The method for constructing real-world video super-resolution based on the Raw domain according to claim 2, characterized in that the alignment strategy is also applied to the data processing of the Raw frames: first, the original frame is reorganized into the RGGB sub-format, whose size is half that of the sRGB frame; therefore, the homography matrix H calculated from the sRGB frames must be adapted by rescaling its translation parameters by a factor of 0.5; the DeepFlow optical flow is processed in the same way, and the aligned Raw frame pairs are thereby generated.
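Claim 3's readjustment of H for the half-size RGGB sub-frames can be written as a change of coordinates x → 0.5·x on both sides of the homography; when the perspective terms are negligible this reduces exactly to multiplying the translation entries by 0.5, as the claim states. A minimal NumPy sketch (the helper name `rescale_homography` is hypothetical):

```python
import numpy as np

def rescale_homography(H, s=0.5):
    """Adapt a homography estimated on full-resolution sRGB frames
    to half-resolution RGGB sub-frames (coordinates scaled by s).

    Conjugation by S = diag(s, s, 1) scales the translation entries
    H[0,2], H[1,2] by s and the perspective entries H[2,0], H[2,1]
    by 1/s; the rotation/scale block is unchanged.
    """
    S = np.diag([s, s, 1.0])
    return S @ H @ np.linalg.inv(S)
```

Consistency check: a point mapped in half-resolution coordinates must land at exactly half of the full-resolution result.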
4. The method for constructing real-world video super-resolution based on the Raw domain according to claim 1, characterized in that the real-world Raw video super-resolution algorithm in S2 mainly comprises the following steps:
S201, dual-branch strategy and feature extraction: to fully utilize the information of the Raw data, the LR consecutive frames are fed into two branches of the network in different forms; the Bayer-format branch directly takes the Raw consecutive frames themselves as input; the sub-frame-format branch takes as input a new sequence formed from the recombined RGGB sub-format, whose spatial size is half that of the Bayer input and whose channel number is 4 times as large; the Bayer-format branch keeps the original order of the raw pixels, which is beneficial to spatial reconstruction; although the sub-frame-format branch does not preserve the original pixel order, it can exploit far-neighbor correlations to generate fine details; the two inputs then each pass through a feature extraction module composed of five residual blocks;
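The sub-frame-format input of S201 regroups the Bayer mosaic into four half-resolution planes. A minimal NumPy sketch of this repacking, assuming an RGGB Bayer pattern as in the patent (`pack_rggb` is a hypothetical name):

```python
import numpy as np

def pack_rggb(raw):
    """Repack a (H, W) RGGB Bayer mosaic into a (4, H/2, W/2) tensor:
    half the spatial size, 4x the channels; same-color pixels that
    were two apart in the mosaic become neighbors in one plane."""
    return np.stack([raw[0::2, 0::2],   # R
                     raw[0::2, 1::2],   # G on the R rows
                     raw[1::2, 0::2],   # G on the B rows
                     raw[1::2, 1::2]],  # B
                    axis=0)
```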
S202, joint alignment: due to the temporal offset between adjacent frames, the adjacent frames need to be warped to the center frame; the alignment is based on a multi-level cascaded alignment strategy, i.e., the alignment offsets are calculated in the sub-frame-format branch and the calculated offsets are then directly copied to the Bayer-format branch for alignment, so that the two branches are aligned jointly; the features in the sub-frame-format branch are down-sampled L-1 times by convolution to form an L-level pyramid, and the pyramid features in the Bayer-format branch are constructed in the same way; the offset of the l-th level is calculated from the aggregated features of the l-th level and the up-sampled offset of the (l+1)-th level;
since the input of the sub-frame-format branch is in fact a down-sampled version of that of the Bayer-format branch, the offset values of the Bayer-format branch should be twice those of the sub-frame-format branch; thus, the offset in the sub-frame-format branch is up-sampled by a factor of 2 and its values are magnified by 2 to obtain the offset of the Bayer-format branch at the l-th level;
given the offsets, the aligned features of the two branches are obtained by applying a deformable convolution Dconv to the features with the offsets, followed by a mapping function g implemented by several convolution layers; the Dconv of the two branches share the same weights at each level;
after the L levels of alignment, the offset calculated between the aligned neighboring features and the center-frame features is further used to refine the aligned neighboring features, generating the final alignment results of the adjacent features in both branches;
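In S202 the offsets estimated in the sub-frame branch are reused by the Bayer branch (and across pyramid levels) after a 2× spatial up-sampling together with a 2× magnification of their values. A minimal NumPy sketch of that lift, using nearest-neighbour up-sampling for simplicity (`lift_offset` is a hypothetical name):

```python
import numpy as np

def lift_offset(offset):
    """offset: (C, h, w) deformable-conv offsets estimated at the
    lower resolution; returns (C, 2h, 2w) offsets whose values are
    doubled, because a displacement of one pixel at low resolution
    corresponds to two pixels at the higher resolution."""
    up = np.repeat(np.repeat(offset, 2, axis=-2), 2, axis=-1)
    return 2.0 * up
```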
S203, interaction module: the Bayer-format branch features are down-sampled by a 3×3 convolution and a LeakyReLU layer, and these down-sampled features are aggregated with the features in the sub-frame-format branch; similarly, the sub-frame-format branch features are up-sampled by a PixelShuffle and then aggregated with the features in the Bayer-format branch;
S204, temporal fusion: remote features are aggregated with a non-local temporal attention module to enhance the feature representation along the temporal dimension; the features are then fused together using a temporal-spatial-attention-based fusion;
S205, channel fusion: the features of the two branches are combined by channel fusion, because the two branches may contribute differently to the final SR reconstruction; selective kernel convolution is adopted to fuse the two branch features through a channel-wise weighted average;
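The selective-kernel-style channel fusion of S205 can be sketched as a per-channel softmax over the two branches, driven by globally pooled statistics. A minimal NumPy illustration, assuming the simplest form of selective kernel attention; the fully connected weights `W1`, `W2` and the name `sk_fuse` are hypothetical, and the real module would learn them end to end:

```python
import numpy as np

def sk_fuse(f_bayer, f_sub, W1, W2):
    """f_bayer, f_sub: (C, H, W) branch features.
    W1, W2: (C, C) per-branch fully connected weights (learned in
    the real network). Returns the channel-weighted average."""
    s = (f_bayer + f_sub).mean(axis=(1, 2))      # global average pooling -> (C,)
    z1, z2 = W1 @ s, W2 @ s                      # per-branch channel logits
    e1, e2 = np.exp(z1), np.exp(z2)
    a1 = e1 / (e1 + e2)                          # softmax across the two branches
    a2 = 1.0 - a1                                # weights sum to 1 per channel
    return a1[:, None, None] * f_bayer + a2[:, None, None] * f_sub
```

With identical weights for both branches the attention is uniform and the fusion reduces to a plain average; training breaks this symmetry so that each channel leans toward whichever branch helps reconstruction more.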
S206, reconstruction and up-sampling: the fused features are input into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the features are up-sampled with a PixelShuffle, and a convolution layer then generates a three-channel output; meanwhile, the module also uses two long skip connections: one for the LR Bayer-format input, which is first processed by a convolution layer and then up-sampled to a three-channel output through a PixelShuffle; the other for the LR sub-frame-format input, which is up-sampled twice since its spatial size is half that of the original input; the three outputs are added to generate the final HR result;
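The PixelShuffle up-sampling used in S206 rearranges r² channel groups into an r× larger spatial grid. A minimal NumPy re-implementation for reference (in PyTorch this is `torch.nn.PixelShuffle`; the stand-alone `pixel_shuffle` below is a hypothetical equivalent):

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """x: (C*r*r, H, W) -> (C, H*r, W*r), matching the channel
    ordering of torch.nn.PixelShuffle: output[c, h*r+i, w*r+j]
    comes from input[c*r*r + i*r + j, h, w]."""
    cr2, H, W = x.shape
    C = cr2 // (r * r)
    return (x.reshape(C, r, r, H, W)
             .transpose(0, 3, 1, 4, 2)
             .reshape(C, H * r, W * r))
```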
S207, color correction and loss function: the actually captured LR and HR data differ in color and brightness, and directly exploiting the pixel loss between the output and HR may cause the network to optimize color and brightness correction without attending to the SR task; to solve this problem, color correction is applied before the loss calculation, i.e., a channel-wise color correction is applied to each RGB channel separately instead of calculating a 3×3 color correction matrix to correct them simultaneously; the scaling factor α_c of channel c is obtained by minimizing the least-squares loss between the corresponding pixels of the scaled output and the down-sampled version of HR;
S208, optimizing the network using the Charbonnier loss between the corrected output and HR.
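S207-S208 can be sketched as a closed-form per-channel least-squares scale followed by the Charbonnier loss. A minimal NumPy illustration (function names are hypothetical): minimizing sum((α_c·out_c − hr_c)²) over α_c gives α_c = ⟨out_c, hr_c⟩ / ⟨out_c, out_c⟩.

```python
import numpy as np

def color_correct(out, hr):
    """Per-channel scaling alpha_c minimizing
    sum((alpha_c * out_c - hr_c)^2); out, hr: (3, H, W)."""
    alpha = (out * hr).sum(axis=(1, 2)) / (out * out).sum(axis=(1, 2))
    return alpha[:, None, None] * out, alpha

def charbonnier(x, y, eps=1e-3):
    """Smooth L1-like loss used to optimize the network."""
    return np.sqrt((x - y) ** 2 + eps ** 2).mean()
```

When HR is an exact per-channel rescaling of the output, the correction recovers it perfectly and the Charbonnier loss bottoms out at eps.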
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210733861.5A CN115115516B (en) | 2022-06-27 | 2022-06-27 | Real world video super-resolution construction method based on Raw domain |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115115516A CN115115516A (en) | 2022-09-27 |
CN115115516B true CN115115516B (en) | 2023-05-12 |
Family
ID=83329552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210733861.5A Active CN115115516B (en) | 2022-06-27 | 2022-06-27 | Real world video super-resolution construction method based on Raw domain |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115115516B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116051380B (en) * | 2023-01-13 | 2023-08-22 | 深圳大学 | Video super-resolution processing method and electronic equipment |
CN116596779B (en) * | 2023-04-24 | 2023-12-01 | 天津大学 | Transform-based Raw video denoising method |
CN117745539A (en) * | 2023-11-13 | 2024-03-22 | 天津大学 | Real double-shot reference superdivision method based on Kernel-Free matching |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700392A (en) * | 2020-12-01 | 2021-04-23 | 华南理工大学 | Video super-resolution processing method, device and storage medium |
CN113538249A (en) * | 2021-09-03 | 2021-10-22 | 中国矿业大学 | Image super-resolution reconstruction method and device for video monitoring high-definition presentation |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010122934A (en) * | 2008-11-20 | 2010-06-03 | Sony Corp | Image processing apparatus, image processing method, and program |
US9384386B2 (en) * | 2014-08-29 | 2016-07-05 | Motorola Solutions, Inc. | Methods and systems for increasing facial recognition working range through adaptive super-resolution |
US10783611B2 (en) * | 2018-01-02 | 2020-09-22 | Google Llc | Frame-recurrent video super-resolution |
US11790489B2 (en) * | 2020-04-07 | 2023-10-17 | Samsung Electronics Co., Ltd. | Systems and method of training networks for real-world super resolution with unknown degradations |
CN111583112A (en) * | 2020-04-29 | 2020-08-25 | 华南理工大学 | Method, system, device and storage medium for video super-resolution |
CN113240581A (en) * | 2021-04-09 | 2021-08-10 | 辽宁工程技术大学 | Real world image super-resolution method for unknown fuzzy kernel |
CN112991183B (en) * | 2021-04-09 | 2023-06-20 | 华南理工大学 | Video super-resolution method based on multi-frame attention mechanism progressive fusion |
CN113469884A (en) * | 2021-07-15 | 2021-10-01 | 长视科技股份有限公司 | Video super-resolution method, system, equipment and storage medium based on data simulation |
- 2022-06-27: application CN202210733861.5A granted as CN115115516B (active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700392A (en) * | 2020-12-01 | 2021-04-23 | 华南理工大学 | Video super-resolution processing method, device and storage medium |
CN113538249A (en) * | 2021-09-03 | 2021-10-22 | 中国矿业大学 | Image super-resolution reconstruction method and device for video monitoring high-definition presentation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115115516B (en) | Real world video super-resolution construction method based on Raw domain | |
Kalantari et al. | Deep HDR video from sequences with alternating exposures | |
CN102693538B (en) | Generate the global alignment method and apparatus of high dynamic range images | |
JP2019514116A (en) | Efficient canvas view generation from intermediate views | |
CN108875900B (en) | Video image processing method and device, neural network training method and storage medium | |
CN113454680A (en) | Image processor | |
KR20130013288A (en) | High dynamic range image creation apparatus of removaling ghost blur by using multi exposure fusion and method of the same | |
Liu et al. | Exploit camera raw data for video super-resolution via hidden markov model inference | |
CN113228094A (en) | Image processor | |
CN111986084A (en) | Multi-camera low-illumination image quality enhancement method based on multi-task fusion | |
CN114972061B (en) | Method and system for denoising and enhancing dim light video | |
CN108024054A (en) | Image processing method, device and equipment | |
US11849264B2 (en) | Apparatus and method for white balance editing | |
CN112508812A (en) | Image color cast correction method, model training method, device and equipment | |
Yue et al. | Real-rawvsr: Real-world raw video super-resolution with a benchmark dataset | |
CN112750092A (en) | Training data acquisition method, image quality enhancement model and method and electronic equipment | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
CN113379609A (en) | Image processing method, storage medium and terminal equipment | |
Shen et al. | Spatial temporal video enhancement using alternating exposures | |
CN117768774A (en) | Image processor, image processing method, photographing device and electronic device | |
WO2023110880A1 (en) | Image processing methods and systems for low-light image enhancement using machine learning models | |
CN111161189A (en) | Single image re-enhancement method based on detail compensation network | |
CN116245968A (en) | Method for generating HDR image based on LDR image of transducer | |
CN111968039A (en) | Day and night universal image processing method, device and equipment based on silicon sensor camera | |
CN112991174A (en) | Method and system for improving resolution of single-frame infrared image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||