CN115115516B - Real world video super-resolution construction method based on Raw domain - Google Patents

Real world video super-resolution construction method based on Raw domain

Info

Publication number
CN115115516B
CN115115516B (application CN202210733861.5A)
Authority
CN
China
Prior art keywords
frame
raw
format
resolution
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210733861.5A
Other languages
Chinese (zh)
Other versions
CN115115516A (en)
Inventor
岳焕景 (Huanjing Yue)
张芝铭 (Zhiming Zhang)
杨敬钰 (Jingyu Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202210733861.5A
Publication of CN115115516A
Application granted
Publication of CN115115516B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/35 Determination of transform parameters for the alignment of images, i.e. image registration using statistical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/64 Circuits for processing colour signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Television Systems (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a method for constructing real-world video super-resolution based on the Raw domain, and relates to the technical field of video signal processing. The method comprises the following steps: S1, establishing a real-world Raw video super-resolution data set; S2, designing a real-world Raw video super-resolution algorithm based on S1; S3, training a model; S4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results. The invention constructs the first real-world VSR data set with three magnification factors in the Raw and sRGB domains, providing a benchmark data set for the training and evaluation of real Raw VSR methods; by utilizing the proposed joint alignment and interaction modules and the temporal and channel fusion modules, the invention raises real LR video super-resolution performance to a new height.

Description

Real world video super-resolution construction method based on Raw domain
Technical Field
The invention belongs to the technical field of video signal processing, and relates to a method for constructing real-world video super-resolution based on a Raw domain.
Background
Capturing video with a short-focus lens expands the viewing angle at the cost of resolution, while capturing with a long-focus lens increases resolution at the cost of viewing angle; video super-resolution (VSR) is an efficient way to acquire both wide-angle and high-resolution (HR) video; video super-resolution reconstructs high-resolution video from low-resolution (LR) inputs by exploring the spatial and temporal correlations of the input sequences; in recent years, video super-resolution has shifted from traditional model-driven approaches to deep-learning-based ones.
The performance of these deep-learning-based SR methods depends largely on the training data set; considering that synthetic LR-HR data sets, such as DIV2K and REDS, cannot represent the degradation model between truly captured LR and HR images, many real SR data sets have been constructed to improve real-world SR performance; however, most of these data sets target static LR-HR images, such as RealSR and ImagePairs. Recently, researchers proposed the first real-world VSR data set, captured with the iPhone 11 Pro Max multi-camera system; however, the parallax between the LR and HR cameras increases the difficulty of alignment, and due to the limited focal lengths of cell-phone cameras, the data set contains only 2x LR-HR sequence pairs.
On the other hand, there is a trend of using Raw images for real-scene image (video) restoration, such as low-light enhancement, denoising, deblurring, and super-resolution; the main reason is that a Raw image has a relatively wide bit depth (12 or 14 bits), i.e., it contains the most primitive information, and its intensity is linear with illumination; however, little effort has been spent on exploring Raw video super-resolution; researchers have synthesized LR Raw frames by downsampling captured HR Raw frames, proposing a Raw video super-resolution data set; nevertheless, there is still a gap between synthesized LR Raw frames and truly captured ones, which prevents SR models trained on synthesized data from generalizing well to real scenes.
Disclosure of Invention
The invention solves the technical problems:
(I) The invention aims to establish a real-world Raw video super-resolution data set and, on this basis, to provide a video super-resolution algorithm adapted to Raw data.
(II) In order to achieve the above purpose, the invention adopts the following technical scheme:
The method for constructing real-world video super-resolution based on the Raw domain comprises the following steps:
S1, establishing a real-world Raw video super-resolution data set: the data set establishment process mainly comprises the following 3 steps:
S101, hardware design: the incident light is split by a beam splitter into a reflected beam and a transmitted beam with a brightness ratio of 1:1; LR-HR frame pairs of different scales are captured using DSLR cameras with zoom lenses; a 3D model box is designed and printed to fix the beam splitter; the DSLR cameras and the beam-splitter box are placed on an optical plate, with tripods fixed below them;
S102, data acquisition: acquiring Raw video in the MLV format, and then processing the MLV Raw video with the MlRawViewer software to obtain corresponding sRGB frames and Raw frames in the DNG format;
S103, data processing: using a coarse-to-fine alignment strategy to generate aligned LR-HR frames, including sRGB frame pairs $\{I_{sRGB}^{LR}, I_{sRGB}^{HR}\}$ and Raw frame pairs $\{I_{Raw}^{LR}, I_{Raw}^{HR}\}$;
S2, based on the frames after the S1 data processing, taking the LR Raw frame and HR sRGB frame pair $\{I_{Raw}^{LR}, I_{sRGB}^{HR}\}$ as the training pair, designing a real-world Raw video super-resolution algorithm;
S3, training a model: constructing a model based on the algorithm designed in S2, training it on the deep learning framework PyTorch, iterating 300k times over the whole data set, then reducing the learning rate to 0.00001 and continuing to iterate until the loss converges, yielding the final model;
S4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results.
Preferably, the steps of performing data processing on the sRGB frames using the alignment strategy in S103 are as follows:
S1031, firstly, estimating a homography matrix H between the up-sampled LR and HR frames using SIFT keypoints selected by the RANSAC algorithm;
S1032, then warping the HR frame $I_{sRGB}^{HR}$ to coarsely crop out the corresponding region in the LR frame that matches the HR frame;
S1033, performing pixel-by-pixel alignment on the matched regions using the traditional optical flow estimation method DeepFlow;
S1034, finally, cropping the center region to eliminate alignment artifacts around the boundary, generating aligned LR-HR frames in the sRGB domain, denoted by $\{I_{sRGB}^{LR}, I_{sRGB}^{HR}\}$ (a sketch of this pipeline is given after these steps).
Preferably, the Raw frames should pass through the same alignment strategy as the sRGB frames; however, directly applying the global and local alignment would destroy the Bayer pattern of the Raw input; when performing data processing on the Raw frames with the alignment strategy, the original frame is therefore first reorganized into the RGGB sub-format, whose size is half that of the sRGB frame; consequently, the H matrix calculated from the sRGB frames needs to be modified by rescaling its translation parameters by a factor of 0.5; the DeepFlow flow field is rescaled in the same manner, and in this way the Raw frame pairs $\{I_{Raw}^{LR}, I_{Raw}^{HR}\}$ are generated.
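A hedged sketch of this Raw-side processing, assuming an RGGB mosaic; function names are illustrative, and the comment notes the general conjugation rule for completeness.

```python
import numpy as np

def pack_rggb(bayer):
    """H x W Bayer (RGGB) mosaic -> H/2 x W/2 x 4 sub-format."""
    return np.stack([bayer[0::2, 0::2],    # R
                     bayer[0::2, 1::2],    # G1
                     bayer[1::2, 0::2],    # G2
                     bayer[1::2, 1::2]],   # B
                    axis=-1)

def rescale_homography(H, rate=0.5):
    """Readjust the translation parameters of a 3x3 homography by `rate`.
    (For a general homography one would conjugate by S = diag(rate, rate, 1),
    which also rescales the perspective row; the text above only mentions
    the translation terms.)"""
    H2 = H.copy()
    H2[0, 2] *= rate
    H2[1, 2] *= rate
    return H2
```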
Preferably, the real world Raw video super-resolution algorithm flow described in S2 mainly includes the following steps:
S201, dual-branch strategy and feature extraction: to fully utilize the information of the Raw data, the input LR consecutive frames $\{I_{Raw,i}^{LR}\}$ are fed into two branches of the network in different forms; the Bayer-format branch directly takes the Raw consecutive frames themselves as input, while the sub-frame-format branch recombines them into the RGGB sub-format to form a new sequence as input; the input of the Bayer-format branch is denoted $X^{b}$, and the input of the sub-frame branch is denoted $X^{s}$, whose spatial size is half that of $X^{b}$ and whose channel number is 4 times larger; the Bayer-format branch keeps the original order of the Raw pixels, which benefits spatial reconstruction; although the sub-frame-format branch cannot preserve the original pixel order, it can exploit far-neighbor correlations to generate details; the two inputs then pass through separate feature extraction modules, each composed of five residual blocks;
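A minimal PyTorch sketch of the two branch inputs and the five-residual-block feature extractors follows; the 64-channel width is an assumption.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.LeakyReLU(0.1, inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)          # residual connection

class FeatureExtractor(nn.Module):
    def __init__(self, in_ch, ch=64, n_blocks=5):
        super().__init__()
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])

    def forward(self, x):
        return self.blocks(self.head(x))

# Bayer-format branch: the mosaic itself, 1 channel at full size (H x W);
# sub-frame branch: the packed RGGB tensor, 4 channels at half size.
extract_bayer = FeatureExtractor(in_ch=1)
extract_subframe = FeatureExtractor(in_ch=4)
```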
S202, joint alignment: due to the temporal offsets between adjacent frames, the adjacent frames need to be warped toward the center frame; the alignment follows a multi-level cascade alignment strategy, i.e., the alignment offsets are calculated in the sub-frame-format branch, and the calculated offsets are then directly copied to the Bayer-format branch for its alignment, so that the two branches are aligned jointly; the features $F_{t+i}^{s}$ and $F_{t}^{s}$ in the sub-frame-format branch are downsampled L-1 times by convolution to form an L-level pyramid; the pyramid features in the Bayer-format branch are constructed in the same way; the offset of the l-th level is calculated from the aggregated features of the l-th level and the upsampled offset of the (l+1)-th level:

$\Delta p_{t+i}^{s,l} = g([F_{t+i}^{s,l}, F_{t}^{s,l}], (\Delta p_{t+i}^{s,l+1})^{\uparrow 2})$

since the input of the sub-frame-format branch is actually a downsampled version of the Bayer-format branch, the offset values of the Bayer-format branch should be twice those of the sub-frame-format branch; thus, the offset $\Delta p_{t+i}^{s,l}$ of the sub-frame-format branch is upsampled by a factor of 2 and its values are magnified by a factor of 2 to obtain the offset of the Bayer-format branch at the l-th level:

$\Delta p_{t+i}^{b,l} = 2(\Delta p_{t+i}^{s,l})^{\uparrow 2}$

given the offsets, the aligned features of the two branches can be expressed as:

$\tilde{F}_{t+i}^{s,l} = g(\mathrm{Dconv}(F_{t+i}^{s,l}, \Delta p_{t+i}^{s,l}), (\tilde{F}_{t+i}^{s,l+1})^{\uparrow 2})$

$\tilde{F}_{t+i}^{b,l} = g(\mathrm{Dconv}(F_{t+i}^{b,l}, \Delta p_{t+i}^{b,l}), (\tilde{F}_{t+i}^{b,l+1})^{\uparrow 2})$
where g represents the mapping function implemented by several convolution layers and Dconv represents the deformable convolution; the Dconv of the two branches share the same weights at the respective level;
after the L levels are aligned, the offset $\Delta p_{t+i}^{s,0}$ calculated between the aligned feature $\tilde{F}_{t+i}^{s,1}$ and the center feature $F_{t}^{s,1}$ is further used to refine $\tilde{F}_{t+i}^{s,1}$ and $\tilde{F}_{t+i}^{b,1}$, generating the final alignment results $\hat{F}_{t+i}^{s}$ and $\hat{F}_{t+i}^{b}$ for the adjacent features in the two branches;
S203, an interaction module: the Bayer format branch features are downsampled by 3 x 3 convolution (stride=2) and the LeakyRelu layer, and these downsampled features are aggregated with features in the subframe format branches; similarly, sub-frame format branching features are up-sampled by a Pixelshuffle and then aggregated with features in the Bayer format branching;
S204, temporal fusion: remote features are aggregated with a non-local temporal attention module to enhance the feature representation along the temporal dimension; the features are then fused together using temporal-spatial attention (TSA) based fusion;
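The temporal-attention part can be sketched as follows, in the spirit of TSA: each frame's features are weighted by embedded similarity to the center frame before aggregation; the non-local module and the spatial-attention half are omitted, and the embedding layers are assumptions.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, ch=64, n_frames=5):
        super().__init__()
        self.emb_ctr = nn.Conv2d(ch, ch, 3, padding=1)
        self.emb_nbr = nn.Conv2d(ch, ch, 3, padding=1)
        self.fuse = nn.Conv2d(n_frames * ch, ch, 1)

    def forward(self, feats):                     # feats: B x T x C x H x W
        t = feats.shape[1]
        ctr = self.emb_ctr(feats[:, t // 2])
        weighted = []
        for i in range(t):
            nbr = self.emb_nbr(feats[:, i])
            sim = torch.sigmoid((ctr * nbr).sum(dim=1, keepdim=True))
            weighted.append(feats[:, i] * sim)    # temporal weighting
        return self.fuse(torch.cat(weighted, dim=1))
```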
S205, channel fusion: channel fusion is used to combine the features of the two branches, because the temporally fused features $F^{b}$ and $F^{s}$ of the two branches may contribute differently to the final SR reconstruction; the two branch features are fused via a channel-wise weighted average using selective kernel fusion (SKF);
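A hedged sketch of the selective-kernel style channel fusion: per-channel softmax weights decide each branch's contribution; the reduction ratio of 4 is an assumption.

```python
import torch
import torch.nn as nn

class SKFusion(nn.Module):
    def __init__(self, ch=64, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                 nn.Conv2d(ch, ch // reduction, 1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(ch // reduction, 2 * ch, 1))

    def forward(self, f_b, f_s):                   # both: B x C x H x W
        b, c = f_b.shape[:2]
        attn = self.mlp(f_b + f_s).view(b, 2, c, 1, 1)
        w = torch.softmax(attn, dim=1)             # weights sum to 1
        return w[:, 0] * f_b + w[:, 1] * f_s       # channel-weighted average
```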
S206, reconstruction and upsampling: the fused feature $F^{fuse}$ is input into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with a PixelShuffle layer, and a convolution layer then generates the three-channel output; meanwhile, the module uses two long skip connections: one takes the Bayer-format LR input, processes it with a convolution layer, and upsamples it to a three-channel output through PixelShuffle; the other takes the LR sub-frame-format input and, because its spatial size is half that of the original input, upsamples it by an extra factor of 2; the three outputs are added to generate the final HR result $I^{SR}$;
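A sketch of this reconstruction and upsampling head with the two long skip connections follows, assuming a 4x magnification factor and 64 feature channels.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.LeakyReLU(0.1, inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class Reconstruct(nn.Module):
    def __init__(self, ch=64, scale=4, n_blocks=10):
        super().__init__()
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.up = nn.Sequential(nn.Conv2d(ch, 3 * scale ** 2, 3, padding=1),
                                nn.PixelShuffle(scale))
        # Long skip 1: Bayer-format LR -> conv -> PixelShuffle, 3 channels.
        self.skip_bayer = nn.Sequential(
            nn.Conv2d(1, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))
        # Long skip 2: sub-frame LR is half-size, so upsample by an extra 2x.
        self.skip_sub = nn.Sequential(
            nn.Conv2d(4, 3 * (2 * scale) ** 2, 3, padding=1),
            nn.PixelShuffle(2 * scale))

    def forward(self, fused, bayer_lr, sub_lr):
        out = self.up(self.body(fused))            # SR from fused features
        return out + self.skip_bayer(bayer_lr) + self.skip_sub(sub_lr)
```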
S207, color correction and loss function: actual shooting data
Figure GDA0004169932550000072
And->
Figure GDA0004169932550000073
There is a difference in color and brightness, and directly exploiting the pixel loss between output and HR may result in the network optimizing color and brightness corrections without concern for the task of SR; to solve this problem, color correction is used before the loss calculation, i.e. channel-based colors are used for the RGB channels, respectivelyColor correction, instead of computing a 3×3 color correction matrix to correct them simultaneously: />
Figure GDA0004169932550000074
Wherein alpha is c Is the scaling factor of channel c by minimizing
Figure GDA0004169932550000075
And HR downsampled version ∈ ->
Figure GDA0004169932550000076
The least squares loss between corresponding pixels.
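This per-channel correction has a closed-form least-squares solution, sketched below; the 4x average-pool downsampling is an assumption.

```python
import torch
import torch.nn.functional as F

def color_correct(sr, hr, down=4, eps=1e-8):
    """sr, hr: B x 3 x H x W; returns alpha_c * sr, one alpha per channel."""
    sr_d, hr_d = F.avg_pool2d(sr, down), F.avg_pool2d(hr, down)
    # argmin_a ||a * sr_d - hr_d||^2  =>  a = <sr_d, hr_d> / <sr_d, sr_d>
    num = (sr_d * hr_d).sum(dim=(2, 3), keepdim=True)
    den = (sr_d * sr_d).sum(dim=(2, 3), keepdim=True).clamp_min(eps)
    return (num / den) * sr
```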
S208, optimizing the network using the Charbonnier loss between the corrected output and HR.
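The Charbonnier loss itself is standard; a minimal sketch, where epsilon is an assumed typical value:

```python
import torch

def charbonnier_loss(pred, target, eps=1e-6):
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()
```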
The beneficial effects of the invention include the following three points:
(1) The invention constructs the first real-world VSR data set with three magnification factors in the Raw and sRGB domains, which provides a benchmark data set for the training and evaluation of real Raw VSR methods.
(2) Based on the real-world Raw video super-resolution data set obtained in S1, the invention provides a Real-RawVSR method that processes the Raw data input in two branches, one for the Bayer-format input and the other for the sub-frame-format input; by utilizing the proposed joint alignment, interaction, and temporal and channel fusion modules, the complementary information of the two branches is well explored, and real LR video super-resolution performance is raised to a new height.
(3) Experiments show that the proposed method outperforms the current mainstream Raw and sRGB VSR methods; it is hoped that the research and exploration in this invention will inspire more studies on Raw-domain video super-resolution methods.
Drawings
FIG. 1 shows the hardware platform and the data processing flow of the Raw-domain based real-world video super-resolution construction method adopted by the invention;
FIG. 2 is the algorithm flowchart of the Raw-domain based real-world video super-resolution construction method adopted by the invention;
FIG. 3 is a structural diagram of the joint alignment module in the Raw-domain based real-world video super-resolution construction method adopted by the invention;
FIG. 4 is a table comparing the results of the algorithm adopted by the invention with other video super-resolution algorithms on the test set.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
Referring to FIG. 1, the method for constructing real-world video super-resolution based on the Raw domain comprises the following steps:
S1, establishing a real-world Raw video super-resolution data set: the data set establishment process mainly comprises the following 3 steps:
S101, hardware design: in order to capture LR-HR frame pairs of different scales, DSLR cameras with 18-135 mm zoom lenses are used instead of cell-phone cameras; to avoid the influence of natural light from other directions, the beam splitter is fixed by a designed and 3D-printed model box, so that the two cameras receive natural light from the same viewpoint; the size of the beam splitter is 150×150×1 mm³, sufficient to cover the camera lenses; the cameras and the beam-splitter box are placed on an optical plate, and tripods are fixed below them to improve stability;
S102, data acquisition: the invention uses two Canon 60D cameras upgraded with the third-party software Magic Lantern to acquire Raw video in the Magic Lantern Video (MLV) format; to keep the cameras synchronized, an infrared remote controller sends signals to both cameras to trigger shooting simultaneously; during shooting, the ISO of the two cameras is kept in the range of 100 to 1600 to avoid severe noise, and the exposure time ranges from 1/400 s to 1/31 s to capture both slow and fast motion; all other settings are left at their default values to simulate real capture scenarios; the MLV video is then processed with the MlRawViewer software to obtain corresponding sRGB frames and Raw frames in the DNG format; for each scene, a 6-second short video is captured at a frame rate of 25 FPS, i.e., each video contains approximately 150 frames in both Raw and sRGB formats;
S103, data processing: due to the presence of lens aberrations, misalignment still exists between the LR-HR pairs; the invention therefore utilizes a coarse-to-fine alignment strategy to generate aligned LR-HR frames, including sRGB frame pairs $\{I_{sRGB}^{LR}, I_{sRGB}^{HR}\}$ and Raw frame pairs $\{I_{Raw}^{LR}, I_{Raw}^{HR}\}$;
The steps of performing data processing on the sRGB frames using the alignment strategy in S103 are as follows:
S1031, firstly, estimating a homography matrix H between the up-sampled LR and HR frames using SIFT keypoints selected by the RANSAC algorithm;
S1032, then warping the HR frame $I_{sRGB}^{HR}$ to coarsely crop out the corresponding region in the LR frame that matches the HR frame;
S1033, performing pixel-by-pixel alignment on the matched regions using the traditional optical flow estimation method DeepFlow;
S1034, finally, cropping the center region to eliminate alignment artifacts around the boundary, generating aligned LR-HR frames in the sRGB domain, denoted by $\{I_{sRGB}^{LR}, I_{sRGB}^{HR}\}$;
the Raw frames should pass through the same alignment strategy as the sRGB frames; however, directly applying the global and local alignment would destroy the Bayer pattern of the Raw input; when performing data processing on the Raw frames with the alignment strategy, the original frame is therefore first reorganized into the RGGB sub-format, whose size is half that of the sRGB frame; consequently, the H matrix calculated from the sRGB frames needs to be modified by rescaling its translation parameters by a factor of 0.5; the DeepFlow flow field is rescaled in the same manner, and in this way the Raw frame pairs $\{I_{Raw}^{LR}, I_{Raw}^{HR}\}$ are generated;
S2, based on the frames after the S1 data processing, taking the LR Raw frame and HR sRGB frame pair $\{I_{Raw}^{LR}, I_{sRGB}^{HR}\}$ as the training pair, designing a real-world Raw video super-resolution algorithm;
S3, training a model: the number of consecutive frames input for training is 5; the optimizer used is Adam, with the initial learning rate set to 0.0001; a model is constructed based on the algorithm designed in S2 and trained on the deep learning framework PyTorch, iterating 300k times over the whole data set; the learning rate is then reduced to 0.00001 and iteration continues until the loss converges, yielding the final model (a training-loop sketch follows);
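A hedged sketch of this training schedule: Adam at 1e-4, reduced to 1e-5 after 300k iterations, with 5-frame clips; RealRawVSRNet and loader are hypothetical placeholders, and color_correct / charbonnier_loss refer to the sketches above, not released code.

```python
import torch

model = RealRawVSRNet()                                # hypothetical model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step, (lr_raw_seq, hr_srgb) in enumerate(loader):  # 5-frame clips
    if step == 300_000:
        for g in opt.param_groups:
            g['lr'] = 1e-5                             # decay after 300k
    sr = model(lr_raw_seq)
    loss = charbonnier_loss(color_correct(sr, hr_srgb), hr_srgb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```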
S4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results;
The invention constructs the first real-world VSR data set with three magnification factors in the Raw and sRGB domains, providing a benchmark data set for the training and evaluation of real Raw VSR methods.
Example 2:
Referring to FIGS. 2-4, the differences from Embodiment 1 are as follows:
The real-world Raw video super-resolution algorithm flow described in S2 mainly comprises the following steps:
S201, dual-branch strategy and feature extraction: to fully utilize the information of the Raw data, the input LR consecutive frames $\{I_{Raw,i}^{LR}\}$ are fed into two branches of the network in different forms; the Bayer-format branch directly takes the Raw consecutive frames themselves as input, while the sub-frame-format branch recombines them into the RGGB sub-format to form a new sequence as input; the input of the Bayer-format branch is denoted $X^{b}$, and the input of the sub-frame branch is denoted $X^{s}$, whose spatial size is half that of $X^{b}$ and whose channel number is 4 times larger; the Bayer-format branch keeps the original order of the Raw pixels, which benefits spatial reconstruction; although the sub-frame-format branch cannot preserve the original pixel order, it can exploit far-neighbor correlations to generate details; the two inputs then pass through separate feature extraction modules, each composed of five residual blocks;
S202, joint alignment: due to the temporal offsets between adjacent frames, the adjacent frames need to be warped toward the center frame; the alignment follows a multi-level cascade alignment strategy, i.e., the alignment offsets are calculated in the sub-frame-format branch, and the calculated offsets are then directly copied to the Bayer-format branch for its alignment, so that the two branches are aligned jointly; the features $F_{t+i}^{s}$ and $F_{t}^{s}$ in the sub-frame-format branch are downsampled L-1 times by convolution to form an L-level pyramid; the pyramid features in the Bayer-format branch are constructed in the same way; the offset of the l-th level is calculated from the aggregated features of the l-th level and the upsampled offset of the (l+1)-th level:

$\Delta p_{t+i}^{s,l} = g([F_{t+i}^{s,l}, F_{t}^{s,l}], (\Delta p_{t+i}^{s,l+1})^{\uparrow 2})$

since the input of the sub-frame-format branch is actually a downsampled version of the Bayer-format branch, the offset values of the Bayer-format branch should be twice those of the sub-frame-format branch; thus, the offset $\Delta p_{t+i}^{s,l}$ of the sub-frame-format branch is upsampled by a factor of 2 and its values are magnified by a factor of 2 to obtain the offset of the Bayer-format branch at the l-th level:

$\Delta p_{t+i}^{b,l} = 2(\Delta p_{t+i}^{s,l})^{\uparrow 2}$

given the offsets, the aligned features of the two branches can be expressed as:

$\tilde{F}_{t+i}^{s,l} = g(\mathrm{Dconv}(F_{t+i}^{s,l}, \Delta p_{t+i}^{s,l}), (\tilde{F}_{t+i}^{s,l+1})^{\uparrow 2})$

$\tilde{F}_{t+i}^{b,l} = g(\mathrm{Dconv}(F_{t+i}^{b,l}, \Delta p_{t+i}^{b,l}), (\tilde{F}_{t+i}^{b,l+1})^{\uparrow 2})$
where g represents the mapping function implemented by several convolution layers and Dconv represents the deformable convolution; the Dconv of the two branches share the same weights at the respective level;
after the L levels are aligned, the offset $\Delta p_{t+i}^{s,0}$ calculated between the aligned feature $\tilde{F}_{t+i}^{s,1}$ and the center feature $F_{t}^{s,1}$ is further used to refine $\tilde{F}_{t+i}^{s,1}$ and $\tilde{F}_{t+i}^{b,1}$, generating the final alignment results $\hat{F}_{t+i}^{s}$ and $\hat{F}_{t+i}^{b}$ for the adjacent features in the two branches;
S203, an interaction module: the Bayer format branch features are downsampled by 3 x 3 convolution (stride=2) and the LeakyRelu layer, and these downsampled features are aggregated with features in the subframe format branches; similarly, sub-frame format branching features are up-sampled by a Pixelshuffle and then aggregated with features in the Bayer format branching;
S204, temporal fusion: remote features are aggregated with a non-local temporal attention module to enhance the feature representation along the temporal dimension; the features are then fused together using temporal-spatial attention (TSA) based fusion;
S205, channel fusion: channel fusion is used to combine the features of the two branches, because the temporally fused features $F^{b}$ and $F^{s}$ of the two branches may contribute differently to the final SR reconstruction; the two branch features are fused via a channel-wise weighted average using selective kernel fusion (SKF);
S206, reconstruction and upsampling: the fused feature $F^{fuse}$ is input into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with a PixelShuffle layer, and a convolution layer then generates the three-channel output; meanwhile, the module uses two long skip connections: one takes the Bayer-format LR input, processes it with a convolution layer, and upsamples it to a three-channel output through PixelShuffle; the other takes the LR sub-frame-format input and, because its spatial size is half that of the original input, upsamples it by an extra factor of 2; the three outputs are added to generate the final HR result $I^{SR}$;
S207, color correction and loss function: actual shooting data
Figure GDA0004169932550000143
And->
Figure GDA0004169932550000144
There is a difference in color and brightness, and directly exploiting the pixel loss between output and HR may result in the network optimizing color and brightness corrections without concern for the task of SR; to solve this problem, color correction is used before the loss calculation, i.e., channel-based color correction is used for RGB channels, respectively, instead of calculating a 3×3 color correction matrix to correct them simultaneously:
Figure GDA0004169932550000145
wherein alpha is c Is the scaling factor of channel c by minimizing
Figure GDA0004169932550000146
And HR downsampled version ∈ ->
Figure GDA0004169932550000147
The least squares loss between corresponding pixels.
S208, optimizing the network using the Charbonnier loss between the corrected output and HR;
Based on the data set, the invention provides a Real-RawVSR method, which processes the Raw data input in two branches, one for the Bayer-format input and the other for the sub-frame-format input; by utilizing the proposed joint alignment, interaction, and temporal and channel fusion modules, the complementary information of the two branches is well explored; experiments show that the proposed method outperforms the current mainstream Raw and sRGB VSR methods; through the research and exploration of this invention, it is hoped that more studies on Raw-domain video super-resolution methods will be inspired. The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical solution and the inventive concept thereof within the scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (4)

1. A method for constructing real-world video super-resolution based on the Raw domain, characterized by comprising the following steps:
S1, establishing a real-world Raw video super-resolution data set: the data set establishment process mainly comprises the following 3 steps:
S101, hardware design: the incident light is split by a beam splitter into a reflected beam and a transmitted beam with a brightness ratio of 1:1; LR-HR frame pairs of different scales are captured using DSLR cameras with zoom lenses; a 3D model box is designed and printed to fix the beam splitter; the DSLR cameras and the beam-splitter box are placed on an optical plate, with tripods fixed below them;
S102, data acquisition: acquiring Raw video in the MLV format, and then processing the MLV Raw video with the MlRawViewer software to obtain corresponding sRGB frames and Raw frames in the DNG format;
S103, data processing: using a coarse-to-fine alignment strategy to generate aligned LR-HR frames, including sRGB frame pairs $\{I_{sRGB}^{LR}, I_{sRGB}^{HR}\}$ and Raw frame pairs $\{I_{Raw}^{LR}, I_{Raw}^{HR}\}$;
S2, based on the frames after the S1 data processing, taking the LR Raw frame and HR sRGB frame pair $\{I_{Raw}^{LR}, I_{sRGB}^{HR}\}$ as the training pair, designing a real-world Raw video super-resolution algorithm;
S3, training a model: constructing a model based on the algorithm designed in S2, training it on the deep learning framework PyTorch, iterating 300k times over the whole data set, then reducing the learning rate to 0.00001 and continuing to iterate until the loss converges, yielding the final model;
S4, inputting the low-resolution Raw video sequences in the test set into the model obtained in S3 to obtain the corresponding super-resolution output results.
2. The method for constructing real-world video super-resolution based on the Raw domain according to claim 1, characterized in that the steps of performing data processing on the sRGB frames using the alignment strategy in S103 are as follows:
S1031, firstly, estimating a homography matrix H between the up-sampled LR and HR frames using SIFT keypoints selected by the RANSAC algorithm;
S1032, then warping the HR frame $I_{sRGB}^{HR}$ to coarsely crop out the corresponding region in the LR frame that matches the HR frame;
S1033, performing pixel-by-pixel alignment on the matched regions using the traditional optical flow estimation method DeepFlow;
S1034, finally, cropping the center region to eliminate alignment artifacts around the boundary, generating aligned LR-HR frames in the sRGB domain, denoted by $\{I_{sRGB}^{LR}, I_{sRGB}^{HR}\}$.
3. The method for constructing real-world video super-resolution based on the Raw domain according to claim 2, characterized in that: when performing data processing on the Raw frames with the alignment strategy, the original frame is first reorganized into the RGGB sub-format, whose size is half that of the sRGB frame; consequently, the H matrix calculated from the sRGB frames needs to be modified by rescaling its translation parameters by a factor of 0.5; the DeepFlow flow field is rescaled in the same manner, and in this way the Raw frame pairs $\{I_{Raw}^{LR}, I_{Raw}^{HR}\}$ are generated.
4. The method for constructing real-world video super-resolution based on the Raw domain according to claim 1, characterized in that the real-world Raw video super-resolution algorithm flow described in S2 mainly comprises the following steps:
S201, dual-branch strategy and feature extraction: to fully utilize the information of the Raw data, the input LR consecutive frames $\{I_{Raw,i}^{LR}\}$ are fed into two branches of the network in different forms; the Bayer-format branch directly takes the Raw consecutive frames themselves as input, while the sub-frame-format branch recombines them into the RGGB sub-format to form a new sequence as input; the input of the Bayer-format branch is denoted $X^{b}$, and the input of the sub-frame branch is denoted $X^{s}$, whose spatial size is half that of $X^{b}$ and whose channel number is 4 times larger; the Bayer-format branch keeps the original order of the Raw pixels, which benefits spatial reconstruction; although the sub-frame-format branch does not preserve the original pixel order, it can exploit far-neighbor correlations to generate details; the two inputs then pass through separate feature extraction modules, each composed of five residual blocks;
S202, joint alignment: due to the temporal offsets between adjacent frames, the adjacent frames need to be warped toward the center frame; the alignment follows a multi-level cascade alignment strategy, i.e., the alignment offsets are calculated in the sub-frame-format branch, and the calculated offsets are then directly copied to the Bayer-format branch for its alignment, so that the two branches are aligned jointly; the features $F_{t+i}^{s}$ and $F_{t}^{s}$ in the sub-frame-format branch are downsampled L-1 times by convolution to form an L-level pyramid; the pyramid features in the Bayer-format branch are constructed in the same way; the offset of the l-th level is calculated from the aggregated features of the l-th level and the upsampled offset of the (l+1)-th level:

$\Delta p_{t+i}^{s,l} = g([F_{t+i}^{s,l}, F_{t}^{s,l}], (\Delta p_{t+i}^{s,l+1})^{\uparrow 2})$

since the input of the sub-frame-format branch is actually a downsampled version of the Bayer-format branch, the offset values of the Bayer-format branch should be twice those of the sub-frame-format branch; thus, the offset $\Delta p_{t+i}^{s,l}$ of the sub-frame-format branch is upsampled by a factor of 2 and its values are magnified by a factor of 2 to obtain the offset of the Bayer-format branch at the l-th level:

$\Delta p_{t+i}^{b,l} = 2(\Delta p_{t+i}^{s,l})^{\uparrow 2}$

given the offsets, the aligned features of the two branches can be expressed as:

$\tilde{F}_{t+i}^{s,l} = g(\mathrm{Dconv}(F_{t+i}^{s,l}, \Delta p_{t+i}^{s,l}), (\tilde{F}_{t+i}^{s,l+1})^{\uparrow 2})$

$\tilde{F}_{t+i}^{b,l} = g(\mathrm{Dconv}(F_{t+i}^{b,l}, \Delta p_{t+i}^{b,l}), (\tilde{F}_{t+i}^{b,l+1})^{\uparrow 2})$
where g represents the mapping function implemented by several convolution layers and Dconv represents the deformable convolution; the Dconv of the two branches share the same weights at the respective level;
after the L levels are aligned, the offset $\Delta p_{t+i}^{s,0}$ calculated between the aligned feature $\tilde{F}_{t+i}^{s,1}$ and the center feature $F_{t}^{s,1}$ is further used to refine $\tilde{F}_{t+i}^{s,1}$ and $\tilde{F}_{t+i}^{b,1}$, generating the final alignment results $\hat{F}_{t+i}^{s}$ and $\hat{F}_{t+i}^{b}$ for the adjacent features in the two branches;
S203, an interaction module: the Bayer format branch features are downsampled by a 3 x 3 convolution and LeakyRelu layer, and these downsampled features are aggregated with features in the subframe format branches; similarly, sub-frame format branching features are up-sampled by a Pixelshuffle and then aggregated with features in the Bayer format branching;
S204, temporal fusion: remote features are aggregated with a non-local temporal attention module to enhance the feature representation along the temporal dimension; the features are then fused together using temporal-spatial attention based fusion;
S205, channel fusion: channel fusion is used to combine the features of the two branches, because the temporally fused features $F^{b}$ and $F^{s}$ of the two branches may contribute differently to the final SR reconstruction; the two branch features are fused via a channel-wise weighted average using selective kernel fusion;
S206, reconstruction and upsampling: the fused feature $F^{fuse}$ is input into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with a PixelShuffle layer, and a convolution layer then generates the three-channel output; meanwhile, the module uses two long skip connections: one takes the Bayer-format LR input, processes it with a convolution layer, and upsamples it to a three-channel output through PixelShuffle; the other takes the LR sub-frame-format input and, because its spatial size is half that of the original input, upsamples it by an extra factor of 2; the three outputs are added to generate the final HR result $I^{SR}$;
S207, color correction and loss function: actual shooting data
Figure FDA0004169932530000055
And->
Figure FDA0004169932530000056
There is a difference in color and brightness, and directly exploiting the pixel loss between output and HR may result in the network optimizing color and brightness corrections without concern for the task of SR; to solve this problem, color correction is used before the loss calculation, i.e., channel-based color correction is used for RGB channels, respectively, instead of calculating a 3×3 color correction matrix to correct them simultaneously:
Figure FDA0004169932530000061
wherein alpha is c Is the scaling factor of channel c by minimizing
Figure FDA0004169932530000062
And HR downsampled version ∈ ->
Figure FDA0004169932530000063
A least squares loss between corresponding pixels;
s208, optimizing the network using the chaseonnier loss between the corrected output and HR.
CN202210733861.5A 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain Active CN115115516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210733861.5A CN115115516B (en) 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210733861.5A CN115115516B (en) 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain

Publications (2)

Publication Number Publication Date
CN115115516A CN115115516A (en) 2022-09-27
CN115115516B true CN115115516B (en) 2023-05-12

Family

ID=83329552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210733861.5A Active CN115115516B (en) 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain

Country Status (1)

Country Link
CN (1) CN115115516B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051380B (en) * 2023-01-13 2023-08-22 Shenzhen University Video super-resolution processing method and electronic equipment
CN116596779B (en) * 2023-04-24 2023-12-01 Tianjin University Transformer-based Raw video denoising method
CN117745539A (en) * 2023-11-13 2024-03-22 Tianjin University Real dual-camera reference super-resolution method based on Kernel-Free matching

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700392A (en) * 2020-12-01 2021-04-23 South China University of Technology Video super-resolution processing method, device and storage medium
CN113538249A (en) * 2021-09-03 2021-10-22 China University of Mining and Technology Image super-resolution reconstruction method and device for video monitoring high-definition presentation

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010122934A (en) * 2008-11-20 2010-06-03 Sony Corp Image processing apparatus, image processing method, and program
US9384386B2 (en) * 2014-08-29 2016-07-05 Motorola Solutions, Inc. Methods and systems for increasing facial recognition working rang through adaptive super-resolution
US10783611B2 (en) * 2018-01-02 2020-09-22 Google Llc Frame-recurrent video super-resolution
US11790489B2 (en) * 2020-04-07 2023-10-17 Samsung Electronics Co., Ltd. Systems and method of training networks for real-world super resolution with unknown degradations
CN111583112A (en) * 2020-04-29 2020-08-25 South China University of Technology Method, system, device and storage medium for video super-resolution
CN113240581A (en) * 2021-04-09 2021-08-10 Liaoning Technical University Real-world image super-resolution method for unknown blur kernels
CN112991183B (en) * 2021-04-09 2023-06-20 South China University of Technology Video super-resolution method based on multi-frame attention mechanism progressive fusion
CN113469884A (en) * 2021-07-15 2021-10-01 Changshi Technology Co., Ltd. Video super-resolution method, system, equipment and storage medium based on data simulation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700392A (en) * 2020-12-01 2021-04-23 South China University of Technology Video super-resolution processing method, device and storage medium
CN113538249A (en) * 2021-09-03 2021-10-22 China University of Mining and Technology Image super-resolution reconstruction method and device for video monitoring high-definition presentation

Also Published As

Publication number Publication date
CN115115516A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN115115516B (en) Real world video super-resolution construction method based on Raw domain
Kalantari et al. Deep HDR video from sequences with alternating exposures
CN102693538B (en) Generate the global alignment method and apparatus of high dynamic range images
JP2019514116A (en) Efficient canvas view generation from intermediate views
CN108875900B (en) Video image processing method and device, neural network training method and storage medium
CN113454680A (en) Image processor
KR20130013288A (en) High dynamic range image creation apparatus of removaling ghost blur by using multi exposure fusion and method of the same
Liu et al. Exploit camera raw data for video super-resolution via hidden markov model inference
CN113228094A (en) Image processor
CN111986084A (en) Multi-camera low-illumination image quality enhancement method based on multi-task fusion
CN114972061B (en) Method and system for denoising and enhancing dim light video
CN108024054A (en) Image processing method, device and equipment
US11849264B2 (en) Apparatus and method for white balance editing
CN112508812A (en) Image color cast correction method, model training method, device and equipment
Yue et al. Real-RawVSR: Real-world Raw video super-resolution with a benchmark dataset
CN112750092A (en) Training data acquisition method, image quality enhancement model and method and electronic equipment
CN114862707A (en) Multi-scale feature recovery image enhancement method and device and storage medium
CN113379609A (en) Image processing method, storage medium and terminal equipment
Shen et al. Spatial temporal video enhancement using alternating exposures
CN117768774A (en) Image processor, image processing method, photographing device and electronic device
WO2023110880A1 (en) Image processing methods and systems for low-light image enhancement using machine learning models
CN111161189A (en) Single image re-enhancement method based on detail compensation network
CN116245968A (en) Method for generating HDR image based on LDR image of transducer
CN111968039A (en) Day and night universal image processing method, device and equipment based on silicon sensor camera
CN112991174A (en) Method and system for improving resolution of single-frame infrared image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant