CN115115516A - Real-world video super-resolution algorithm based on Raw domain - Google Patents

Real-world video super-resolution algorithm based on Raw domain

Info

Publication number
CN115115516A
Authority
CN
China
Prior art keywords
frame
raw
branch
format
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210733861.5A
Other languages
Chinese (zh)
Other versions
CN115115516B (en)
Inventor
Huanjing Yue
Zhiming Zhang
Jingyu Yang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202210733861.5A priority Critical patent/CN115115516B/en
Publication of CN115115516A publication Critical patent/CN115115516A/en
Application granted granted Critical
Publication of CN115115516B publication Critical patent/CN115115516B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/35 Determination of transform parameters for the alignment of images, i.e. image registration using statistical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 9/00 Details of colour television systems
    • H04N 9/64 Circuits for processing colour signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a real-world video super-resolution algorithm based on the Raw domain, and relates to the technical field of video signal processing. The algorithm comprises the following steps: S1, establishing a real-world Raw video super-resolution dataset; S2, designing a real-world Raw video super-resolution algorithm based on S1; S3, training a model; S4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results. The invention constructs the first real-world VSR dataset with three magnification factors in the Raw and sRGB domains, providing a benchmark dataset for training and evaluating real Raw VSR methods; by means of the proposed joint alignment and interaction modules and the temporal and channel fusion modules, the invention raises real LR video super-resolution performance to a new height.

Description

Real-world video super-resolution algorithm based on Raw domain
Technical Field
The invention belongs to the technical field of video signal processing, and relates to a real-world video super-resolution algorithm based on a Raw domain.
Background
Capturing video with a short-focus lens enlarges the viewing angle at the cost of resolution, while capturing with a long-focus lens improves resolution at the cost of viewing angle; video super-resolution (VSR) is an efficient way to acquire wide-angle, high-resolution (HR) video; video super-resolution reconstructs HR video from low-resolution (LR) input by exploring the spatial and temporal correlation of the input sequence; in recent years, video super-resolution has shifted from traditional model-driven approaches to deep-learning-based ones.
The performance of these deep-learning-based SR methods depends to a large extent on the training dataset. Since synthetic LR-HR datasets, such as DIV2K and REDS, cannot represent the degradation model between really captured LR and HR images, many real SR datasets have been constructed to improve real-world SR performance; however, most of these datasets contain static LR-HR images, such as RealSR and ImagePairs. Recently, researchers proposed the first real-world VSR dataset, captured with a multi-camera system using an iPhone 11 Pro Max; however, parallax between the LR and HR cameras increases the difficulty of alignment, and since the phone cameras have limited focal lengths, the dataset contains only 2x LR-HR sequence pairs.
On the other hand, using Raw images for real-scene image (video) restoration, such as low-light enhancement, denoising, deblurring, and super-resolution, has become a trend; the main reason is that Raw images have a wide bit depth (12 or 14 bits), i.e., they contain the most primitive information, and their intensity is linear with the illumination; however, work exploring Raw video super-resolution is still rare; researchers have proposed a Raw video super-resolution dataset by synthesizing LR Raw frames via downsampling from captured HR Raw frames; nevertheless, there is still a gap between synthesized LR Raw frames and really captured frames, so SR models trained on synthetic data do not generalize well to real scenes.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
(1) the invention aims to establish a real-world Raw video super-resolution dataset and, on this basis, to provide a video super-resolution algorithm adapted to Raw data.
In order to achieve the purpose, the invention adopts the following technical scheme:
the real-world video super-resolution algorithm based on the Raw domain comprises the following steps:
S1, establishing a real-world Raw video super-resolution dataset: the process of establishing the dataset mainly comprises the following 3 steps:
S101, hardware design: a beam splitter divides the incident light into two beams, a reflected beam and a transmitted beam with a 1:1 brightness ratio; LR-HR frame pairs of different scales are captured using DSLR cameras with zoom lenses; a 3D model box is designed and printed to fix the beam splitter; the DSLR cameras and the beam-splitter box are placed on an optical plate, and a tripod is fixed below them;
S102, data acquisition: acquiring Raw videos in the MLV format, then processing the MLV-format Raw videos with MlRawViewer software to obtain the corresponding sRGB frames and Raw frames in DNG format;
S103, data processing: utilizing a coarse-to-fine alignment strategy to generate aligned LR-HR frames, including sRGB frame pairs (I_LR^sRGB, I_HR^sRGB) and Raw frame pairs (I_LR^Raw, I_HR^Raw);
S2, based on the frames processed in S1, designing a real-world Raw video super-resolution algorithm with the LR Raw frame and HR sRGB frame pair (I_LR^Raw, I_HR^sRGB) as the training pair;
S3, training a model: building a model based on the algorithm designed in S2, training the model on the deep learning framework PyTorch, iterating 300k times on the whole dataset, then reducing the learning rate to 0.00001 and continuing to iterate until the loss converges, obtaining the final model;
and S4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results.
Preferably, the steps of processing the sRGB frames with the alignment strategy in S103 are as follows (a minimal sketch of this pipeline is given after the list):
S1031, first, a homography matrix H between the upsampled LR frame and the HR frame is estimated using SIFT keypoints selected by the RANSAC algorithm;
S1032, then, the HR frame is warped with H, the warped frame being denoted Ĩ_HR^sRGB, to roughly crop out the corresponding region in the LR frame that matches the HR frame;
s1033, performing pixel-by-pixel alignment on the matching area by utilizing a traditional optical flow estimation method DeepFlow;
S1034, finally, the central area is cropped to eliminate alignment artifacts around the boundary, generating aligned LR-HR frames in the sRGB domain, denoted (I_LR^sRGB, I_HR^sRGB).
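As an illustration only, the coarse global-alignment step (S1031-S1032) can be sketched with OpenCV; the function below, its ratio-test threshold, and the RANSAC parameters are assumptions for demonstration rather than the exact implementation:

```python
import cv2
import numpy as np

def coarse_align(lr_up, hr):
    """Estimate a homography between the upsampled LR frame and the HR frame
    from RANSAC-filtered SIFT matches, then warp the HR frame onto the LR grid.
    A minimal sketch; inputs are assumed to be 3-channel BGR arrays."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(cv2.cvtColor(lr_up, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = sift.detectAndCompute(cv2.cvtColor(hr, cv2.COLOR_BGR2GRAY), None)
    matches = cv2.BFMatcher().knnMatch(des2, des1, k=2)
    # Lowe ratio test to keep distinctive matches (0.75 is an assumed threshold).
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # RANSAC rejects outliers
    h, w = lr_up.shape[:2]
    return cv2.warpPerspective(hr, H, (w, h)), H  # HR warped onto the LR view
```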
Preferably, the Raw frames should undergo the same alignment strategy as the sRGB frames; however, directly applying the global and local alignment would destroy the Bayer pattern of the Raw input; when processing the Raw frames with the alignment strategy, the original frames are therefore first re-packed into the RGGB sub-format, whose size is half that of the sRGB frames, so the H matrix computed from the sRGB frames must be adapted by rescaling its translation parameters by a factor of 0.5; the DeepFlow result is processed in the same manner, and in this way the Raw frame pairs (I_LR^Raw, I_HR^Raw) are generated; a sketch of both adjustments follows.
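A minimal sketch of the two Raw-domain adjustments, assuming an RGGB Bayer layout; the helper names are hypothetical:

```python
import numpy as np

def pack_rggb(bayer):
    """Re-pack an H x W Bayer mosaic (RGGB layout assumed) into a
    4-channel (H/2, W/2, 4) sub-frame: R, G1, G2, B planes."""
    return np.stack([bayer[0::2, 0::2],   # R
                     bayer[0::2, 1::2],   # G1
                     bayer[1::2, 0::2],   # G2
                     bayer[1::2, 1::2]],  # B
                    axis=-1)

def rescale_homography(H, scale=0.5):
    """Adapt a homography estimated on the sRGB grid to the half-size
    sub-frame grid via S H S^-1, which rescales the translation terms."""
    S = np.diag([scale, scale, 1.0])
    return S @ H @ np.linalg.inv(S)
```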
Preferably, the real-world Raw video super-resolution algorithm flow described in S2 mainly includes the following steps:
S201, dual-branch strategy and feature extraction: in order to fully utilize the information of the Raw data, the input LR consecutive frames X_LR^Raw are fed into the two branches of the network in different forms; the Bayer-format branch directly takes the Raw consecutive frames as input; the sub-frame-format branch re-packs them into the RGGB sub-format to form a new sequence as input; the input of the Bayer-format branch is denoted X^Bayer and the input of the sub-frame branch is denoted X^Sub, whose channel number is 4 times that of X^Bayer; the Bayer-format branch keeps the original order of the raw pixels, which benefits spatial reconstruction; although the sub-frame-format branch cannot preserve the original pixel order, it can exploit far-neighbor correlation to generate detail; the two inputs then respectively pass through a feature extraction module composed of five residual blocks; a tensor-shape sketch is given below;
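The dual-branch inputs and the five-residual-block feature extractor might look as follows in PyTorch; the channel width (64) and the exact block layout are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain residual block: conv-ReLU-conv with an identity skip."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class FeatureExtractor(nn.Module):
    """Head conv + five residual blocks, as described in S201."""
    def __init__(self, in_ch, ch=64):
        super().__init__()
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(5)])

    def forward(self, x):
        return self.blocks(self.head(x))

# Per frame: the Bayer branch sees the 1-channel mosaic (H x W), while the
# sub-frame branch sees the packed 4-channel RGGB map at half size (H/2 x W/2).
bayer = torch.randn(5, 1, 128, 128)      # 5 consecutive LR Raw frames
sub = torch.randn(5, 4, 64, 64)          # the same frames, RGGB-packed
feat_bayer = FeatureExtractor(1)(bayer)  # (5, 64, 128, 128)
feat_sub = FeatureExtractor(4)(sub)      # (5, 64, 64, 64)
```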
S202, joint alignment: because of the temporal misalignment between adjacent frames, the adjacent frames need to be warped to the central frame; alignment is performed on the basis of a multi-level cascade alignment strategy, that is, the alignment offsets are calculated from the sub-frame-format branch, and the calculated offsets are then directly copied to the Bayer-format branch for alignment, so that the two branches are aligned jointly; the features F_Sub(t+i) and F_Sub(t) in the sub-frame-format branch undergo convolution and downsampling L-1 times to form an L-level pyramid; the pyramid features in the Bayer-format branch are constructed in the same way; the offset of the l-th level is calculated from the aggregated features of the l-th level and the 2x-upsampled offset of the (l+1)-th level:
Δp_Sub^l = f([F_Sub^l(t+i), F_Sub^l(t)], (Δp_Sub^(l+1))↑2)
since the input of the sub-frame-format branch is actually a downsampled version of that of the Bayer-format branch, the offset values of the Bayer-format branch should be twice those of the sub-frame-format branch; thus, the offset Δp_Sub^l in the sub-frame-format branch is upsampled by a factor of 2 and its values are amplified by a factor of 2 to obtain the offset Δp_Bayer^l of the l-th level of the Bayer-format branch:
Δp_Bayer^l = 2 · (Δp_Sub^l)↑2
given the offsets, the aligned features of the two branches can be expressed as:
F̃_Sub^l(t+i) = g(Dconv(F_Sub^l(t+i), Δp_Sub^l), (F̃_Sub^(l+1)(t+i))↑2)
F̃_Bayer^l(t+i) = g(Dconv(F_Bayer^l(t+i), Δp_Bayer^l), (F̃_Bayer^(l+1)(t+i))↑2)
where g denotes a mapping function implemented by several convolution layers and Dconv denotes the deformable convolution; the Dconv layers of the two branches share the same weights at the corresponding level;
after the L levels of alignment, the offset Δp_Sub calculated between F̃_Sub(t+i) and F_Sub(t) is further used to refine F̃_Sub(t+i) and F̃_Bayer(t+i), generating the final alignment results F̂_Sub(t+i) and F̂_Bayer(t+i) for the adjacent features in both branches; a sketch of the cross-branch offset propagation follows;
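One level of the joint alignment can be sketched with torchvision's deformable convolution as below; the pyramid depth, the aggregation function f (a single convolution here), and the channel width are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class JointAlignLevel(nn.Module):
    """One pyramid level of S202: offsets are predicted in the sub-frame
    branch, then 2x-upsampled and doubled for the Bayer branch."""
    def __init__(self, ch=64, k=3):
        super().__init__()
        self.offset_pred = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)
        self.dconv = DeformConv2d(ch, ch, k, padding=k // 2)  # shared across branches

    def forward(self, f_sub_nbr, f_sub_ref, f_bayer_nbr):
        # Offset from the sub-frame branch (the aggregation f is one conv here).
        off_sub = self.offset_pred(torch.cat([f_sub_nbr, f_sub_ref], dim=1))
        # Copy to the Bayer branch: upsample 2x spatially, double the values.
        off_bayer = 2.0 * F.interpolate(off_sub, scale_factor=2,
                                        mode='bilinear', align_corners=False)
        aligned_sub = self.dconv(f_sub_nbr, off_sub)
        aligned_bayer = self.dconv(f_bayer_nbr, off_bayer)  # same Dconv weights
        return aligned_sub, aligned_bayer
```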
S203, interaction module: the Bayer-format branch features are downsampled by a 3 × 3 convolution (stride 2) and a LeakyReLU layer, and these downsampled features are aggregated with the features in the sub-frame-format branch; similarly, the sub-frame-format branch features are upsampled by PixelShuffle and then aggregated with the features in the Bayer-format branch; see the sketch below;
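A plausible reading of the interaction module in PyTorch; aggregating by concatenation plus a 1 × 1 convolution is an assumption:

```python
import torch
import torch.nn as nn

class Interaction(nn.Module):
    """Bidirectional feature exchange between the Bayer branch (full
    resolution) and the sub-frame branch (half resolution), per S203."""
    def __init__(self, ch=64):
        super().__init__()
        self.down = nn.Sequential(  # Bayer -> sub-frame resolution
            nn.Conv2d(ch, ch, 3, stride=2, padding=1),
            nn.LeakyReLU(0.1, inplace=True))
        self.up = nn.Sequential(    # sub-frame -> Bayer resolution
            nn.Conv2d(ch, 4 * ch, 3, padding=1),
            nn.PixelShuffle(2))     # (4*ch, H/2, W/2) -> (ch, H, W)
        self.agg_bayer = nn.Conv2d(2 * ch, ch, 1)  # assumed: concat + 1x1 conv
        self.agg_sub = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, f_bayer, f_sub):
        new_bayer = self.agg_bayer(torch.cat([f_bayer, self.up(f_sub)], dim=1))
        new_sub = self.agg_sub(torch.cat([f_sub, self.down(f_bayer)], dim=1))
        return new_bayer, new_sub
```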
S204, temporal fusion: a non-local temporal attention module is used to aggregate long-range features and enhance the feature representation along the temporal dimension; the features are then fused together using spatio-temporal attention (TSA) based fusion; see the sketch below;
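The temporal-attention weighting can be sketched as below, in the spirit of TSA fusion; the embedding convolutions and the dot-product similarity are assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Weights each neighboring frame's features by similarity to the
    reference frame before fusing across time (cf. S204); simplified."""
    def __init__(self, ch=64, n_frames=5):
        super().__init__()
        self.emb_ref = nn.Conv2d(ch, ch, 3, padding=1)
        self.emb_nbr = nn.Conv2d(ch, ch, 3, padding=1)
        self.fuse = nn.Conv2d(n_frames * ch, ch, 1)

    def forward(self, feats):                 # feats: (B, T, C, H, W)
        b, t, c, h, w = feats.shape
        ref = self.emb_ref(feats[:, t // 2])  # center-frame embedding
        weighted = []
        for i in range(t):
            nbr = self.emb_nbr(feats[:, i])
            sim = torch.sigmoid((nbr * ref).sum(dim=1, keepdim=True))  # (B,1,H,W)
            weighted.append(feats[:, i] * sim)
        return self.fuse(torch.cat(weighted, dim=1))  # (B, C, H, W)
```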
S205, channel fusion: the features F̂_Bayer and F̂_Sub in the two branches are merged together using channel fusion, since they may contribute differently to the final SR reconstruction; the two branch features are fused by selective kernel fusion (SKF) via a channel-wise weighted average; a sketch follows;
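An SKNet-style fusion sketch; it assumes both branch features have already been brought to the same resolution, and the reduction ratio is arbitrary:

```python
import torch
import torch.nn as nn

class SKFusion(nn.Module):
    """Selective-kernel style fusion (S205): per-channel softmax weights
    decide how much each branch contributes to the fused feature."""
    def __init__(self, ch=64, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, 2 * ch, 1))  # one weight set per branch

    def forward(self, f_bayer, f_sub):
        u = f_bayer + f_sub                           # joint descriptor
        w = self.mlp(u).view(u.size(0), 2, -1, 1, 1)  # (B, 2, C, 1, 1)
        w = torch.softmax(w, dim=1)                   # weights sum to 1 per channel
        return w[:, 0] * f_bayer + w[:, 1] * f_sub
```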
S206, reconstruction and upsampling: the fused feature F_fused is input into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with PixelShuffle, and a convolutional layer then generates the three-channel output; the module also utilizes two long skip connections: one is for the LR Bayer-format input, which is first processed by a convolutional layer and then upsampled by PixelShuffle to a three-channel output; the other is for the LR sub-frame-format input, which is upsampled twice as much because its spatial size is half that of the original input; the three outputs are added to generate the final HR result I_SR; a sketch of this head follows;
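A sketch of the reconstruction and upsampling head with the two long skips; the 4x scale and channel widths are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):  # same shape as in the feature-extraction sketch
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class UpsampleHead(nn.Module):
    """Reconstruction (10 ResBlocks) + PixelShuffle upsampling with two long
    skips from the LR Bayer and sub-frame inputs (S206); 4x scale assumed."""
    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.recon = nn.Sequential(*[ResBlock(ch) for _ in range(10)])
        self.up_main = nn.Sequential(
            nn.Conv2d(ch, 3 * scale ** 2, 3, padding=1), nn.PixelShuffle(scale))
        self.skip_bayer = nn.Sequential(  # LR Bayer mosaic -> HR RGB
            nn.Conv2d(1, 3 * scale ** 2, 3, padding=1), nn.PixelShuffle(scale))
        self.skip_sub = nn.Sequential(    # half-size RGGB input: 2x more upsampling
            nn.Conv2d(4, 3 * (2 * scale) ** 2, 3, padding=1), nn.PixelShuffle(2 * scale))

    def forward(self, fused, bayer_lr, sub_lr):
        out = self.up_main(self.recon(fused))
        return out + self.skip_bayer(bayer_lr) + self.skip_sub(sub_lr)
```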
S207, color correction and loss function: of actual photographic data
Figure BDA0003714834410000072
And
Figure BDA0003714834410000073
there are differences in color and brightness, exploiting pixel loss directly between the output and HR may lead to a network optimizing color and brightness correction without paying attention to the task of SR; to solve this problem, instead of computing a 3 × 3 color correction matrix to correct them simultaneously, color correction is used before the loss computation, i.e. channel-based color correction is used separately for the RGB channels:
Figure BDA0003714834410000074
wherein alpha is c Is the scaling factor of channel c, which is obtained by minimizing
Figure BDA0003714834410000075
And versions of HR downsampling
Figure BDA0003714834410000076
Calculated corresponding to the least squares penalty between pixels.
S208, optimizing the network by using the Charbonier loss between the corrected output and the HR.
The beneficial effects of the invention comprise the following three points:
(1) The invention constructs the first real-world VSR dataset with three magnification factors in the Raw and sRGB domains, providing a benchmark dataset for training and evaluating real Raw VSR methods.
(2) The invention provides the Real-RawVSR method based on the real-world Raw video super-resolution dataset obtained in S1, in which the input Raw data is processed in two branches, one for Bayer-format input and the other for sub-frame-format input; by utilizing the proposed joint alignment, interaction, and temporal and channel fusion modules, the complementary information of the two branches is well explored, and real LR video super-resolution performance is raised to a new height.
(3) Experiments carried out based on the invention show that the proposed method is superior to current mainstream Raw and sRGB VSR methods; through the research and exploration of the invention, research on Raw-domain video super-resolution methods can be expected to be inspired.
Drawings
FIG. 1 is a hardware platform and data processing flow chart in a real-world video super-resolution algorithm based on a Raw domain adopted by the invention;
FIG. 2 is a flow chart of an algorithm in a real-world video super-resolution algorithm based on a Raw domain adopted by the invention;
FIG. 3 is a structure diagram of the joint alignment module in the real-world video super-resolution algorithm based on the Raw domain adopted by the present invention;
FIG. 4 is a table comparing the result indexes of the algorithm used in the present invention and other video super-resolution algorithms in the test set.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
referring to fig. 1, the real-world video super-resolution algorithm based on the Raw domain includes the following steps:
S1, establishing a real-world Raw video super-resolution dataset: the process of establishing the dataset mainly comprises the following 3 steps:
S101, hardware design: to capture LR-HR frame pairs of different scales, DSLR cameras with 18-135 mm zoom lenses are used instead of cell-phone cameras; in order to avoid the influence of natural light from other directions, a 3D model box is designed and printed to fix the beam splitter, so that the two cameras receive natural light from the same viewpoint; the size of the beam splitter is 150 × 150 × 1 (mm³), sufficient to cover the camera lenses; in the invention, the cameras and the beam-splitter box are placed on an optical plate, and a tripod is fixed below the optical plate to improve its stability;
S102, data acquisition: two Canon 60D cameras upgraded with the third-party software Magic Lantern are used to acquire Raw videos in the Magic Lantern Video (MLV) format; to keep the cameras synchronized, an infrared remote controller sends signals to the two cameras to control shooting simultaneously; during shooting, the ISO of the two cameras is kept in the range of 100 to 1600 to avoid noise, and the exposure time ranges from 1/400 s to 1/31 s to capture both slow and fast motion; all other settings are kept at default values to simulate a real capture scene; the MLV videos are then processed with MlRawViewer software to obtain the corresponding sRGB frames and Raw frames in DNG format; for each scene, a 6-second short video is captured at a frame rate of 25 FPS, i.e., each video contains about 150 frames in both Raw and sRGB formats;
S103, data processing: misalignment still exists between the LR-HR pairs due to lens distortion; the invention therefore utilizes a coarse-to-fine alignment strategy to generate aligned LR-HR frames, including sRGB frame pairs (I_LR^sRGB, I_HR^sRGB) and Raw frame pairs (I_LR^Raw, I_HR^Raw);
The steps of processing the sRGB frames with the alignment strategy in S103 are as follows:
S1031, first, a homography matrix H between the upsampled LR frame and the HR frame is estimated using SIFT keypoints selected by the RANSAC algorithm;
S1032, then, the HR frame is warped with H, the warped frame being denoted Ĩ_HR^sRGB, to roughly crop out the corresponding region in the LR frame that matches the HR frame;
s1033, performing pixel-by-pixel alignment on the matching area by utilizing a traditional optical flow estimation method DeepFlow;
S1034, finally, the central area is cropped to eliminate alignment artifacts around the boundary, generating aligned LR-HR frames in the sRGB domain, denoted (I_LR^sRGB, I_HR^sRGB);
the Raw frames should undergo the same alignment strategy as the sRGB frames; however, directly applying the global and local alignment would destroy the Bayer pattern of the Raw input; when processing the Raw frames with the alignment strategy, the original frames are therefore first re-packed into the RGGB sub-format, whose size is half that of the sRGB frames, so the H matrix computed from the sRGB frames must be adapted by rescaling its translation parameters by a factor of 0.5; the DeepFlow result is processed in the same manner, and in this way the Raw frame pairs (I_LR^Raw, I_HR^Raw) are generated;
S2, based on the frames processed in S1, designing a real-world Raw video super-resolution algorithm with the LR Raw frame and HR sRGB frame pair (I_LR^Raw, I_HR^sRGB) as the training pair;
S3, training a model: the number of consecutive input frames for training is 5, the optimizer used is the Adam optimizer, and the initial learning rate is set to 0.0001; a model is built based on the algorithm designed in S2 and trained on the deep learning framework PyTorch, iterating 300k times on the whole dataset, then reducing the learning rate to 0.00001 and continuing to iterate until the loss converges, obtaining the final model; a sketch of this training configuration follows;
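A skeleton of this training configuration, under stated assumptions (build_model, train_set, charbonnier, and color_correct are placeholders for the pieces described in S1, S2, S207, and S208; the batch size and stopping point are arbitrary):

```python
import torch
from torch.utils.data import DataLoader

# build_model, train_set, charbonnier, and color_correct are placeholders
# for the network of S2, the dataset of S1, and the loss pieces of S207/S208.
model = build_model()
loader = DataLoader(train_set, batch_size=8, shuffle=True, num_workers=4)
optim = torch.optim.Adam(model.parameters(), lr=1e-4)   # initial LR per S3

step, max_step = 0, 600_000                             # stopping point assumed
while step < max_step:
    for lr_raw, hr_srgb in loader:      # 5-frame LR Raw clip, HR sRGB target
        sr = model(lr_raw)
        loss = charbonnier(color_correct(sr, hr_srgb), hr_srgb)
        optim.zero_grad()
        loss.backward()
        optim.step()
        step += 1
        if step == 300_000:             # decay LR to 1e-5 after 300k iterations
            for g in optim.param_groups:
                g['lr'] = 1e-5
        if step >= max_step:
            break
```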
S4, inputting the low-resolution Raw video sequences in the test set into the model to obtain the corresponding super-resolution output results;
the invention constructs the first real-world VSR dataset with three magnification factors in the Raw and sRGB domains, providing a benchmark dataset for training and evaluating real Raw VSR methods.
Example 2:
Referring to fig. 2-4, the differences from embodiment 1 are as follows:
The real-world Raw video super-resolution algorithm flow in S2 mainly comprises the following steps:
S201, dual-branch strategy and feature extraction: in order to fully utilize the information of the Raw data, the input LR consecutive frames X_LR^Raw are fed into the two branches of the network in different forms; the Bayer-format branch directly takes the Raw consecutive frames as input; the sub-frame-format branch re-packs them into the RGGB sub-format to form a new sequence as input; the input of the Bayer-format branch is denoted X^Bayer and the input of the sub-frame branch is denoted X^Sub, whose channel number is 4 times that of X^Bayer; the Bayer-format branch keeps the original order of the raw pixels, which benefits spatial reconstruction; although the sub-frame-format branch cannot preserve the original pixel order, it can exploit far-neighbor correlation to generate detail; the two inputs then respectively pass through a feature extraction module composed of five residual blocks;
S202, joint alignment: because of the temporal misalignment between adjacent frames, the adjacent frames need to be warped to the central frame; alignment is performed on the basis of a multi-level cascade alignment strategy, that is, the alignment offsets are calculated from the sub-frame-format branch, and the calculated offsets are then directly copied to the Bayer-format branch for alignment, so that the two branches are aligned jointly; the features F_Sub(t+i) and F_Sub(t) in the sub-frame-format branch undergo convolution and downsampling L-1 times to form an L-level pyramid; the pyramid features in the Bayer-format branch are constructed in the same way; the offset of the l-th level is calculated from the aggregated features of the l-th level and the 2x-upsampled offset of the (l+1)-th level:
Δp_Sub^l = f([F_Sub^l(t+i), F_Sub^l(t)], (Δp_Sub^(l+1))↑2)
since the input of the sub-frame-format branch is actually a downsampled version of that of the Bayer-format branch, the offset values of the Bayer-format branch should be twice those of the sub-frame-format branch; thus, the offset Δp_Sub^l in the sub-frame-format branch is upsampled by a factor of 2 and its values are amplified by a factor of 2 to obtain the offset Δp_Bayer^l of the l-th level of the Bayer-format branch:
Δp_Bayer^l = 2 · (Δp_Sub^l)↑2
given the offsets, the aligned features of the two branches can be expressed as:
F̃_Sub^l(t+i) = g(Dconv(F_Sub^l(t+i), Δp_Sub^l), (F̃_Sub^(l+1)(t+i))↑2)
F̃_Bayer^l(t+i) = g(Dconv(F_Bayer^l(t+i), Δp_Bayer^l), (F̃_Bayer^(l+1)(t+i))↑2)
where g denotes a mapping function implemented by several convolution layers and Dconv denotes the deformable convolution; the Dconv layers of the two branches share the same weights at the corresponding level;
after the L levels of alignment, the offset Δp_Sub calculated between F̃_Sub(t+i) and F_Sub(t) is further used to refine F̃_Sub(t+i) and F̃_Bayer(t+i), generating the final alignment results F̂_Sub(t+i) and F̂_Bayer(t+i) for the adjacent features in both branches;
S203, interaction module: the Bayer-format branch features are downsampled by a 3 × 3 convolution (stride 2) and a LeakyReLU layer, and these downsampled features are aggregated with the features in the sub-frame-format branch; similarly, the sub-frame-format branch features are upsampled by PixelShuffle and then aggregated with the features in the Bayer-format branch;
S204, temporal fusion: a non-local temporal attention module is used to aggregate long-range features and enhance the feature representation along the temporal dimension; the features are then fused together using spatio-temporal attention (TSA) based fusion;
S205, channel fusion: the features F̂_Bayer and F̂_Sub in the two branches are merged together using channel fusion, since they may contribute differently to the final SR reconstruction; the two branch features are fused by selective kernel fusion (SKF) via a channel-wise weighted average;
S206, reconstruction and upsampling: the fused feature F_fused is input into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with PixelShuffle, and a convolutional layer then generates the three-channel output; the module also utilizes two long skip connections: one is for the LR Bayer-format input, which is first processed by a convolutional layer and then upsampled by PixelShuffle to a three-channel output; the other is for the LR sub-frame-format input, which is upsampled twice as much because its spatial size is half that of the original input; the three outputs are added to generate the final HR result I_SR;
S207, color correction and loss function: of actual shot data
Figure BDA0003714834410000143
And
Figure BDA0003714834410000144
there are differences in color and brightness, exploiting pixel loss directly between the output and HR may lead to a network optimizing color and brightness correction without paying attention to the task of SR; to solve this problem, the loss is calculatedInstead of computing a 3 x 3 color correction matrix to correct them simultaneously, color correction was previously used, i.e. channel-based color correction was used separately for the RGB channels:
Figure BDA0003714834410000145
wherein alpha is c Is the scaling factor of channel c by minimizing
Figure BDA0003714834410000146
And versions of HR downsampling
Figure BDA0003714834410000147
Calculated corresponding to the least squares penalty between pixels.
S208, optimizing the network by using the Charbonier loss between the corrected output and the HR;
based on the dataset, the invention provides the Real-RawVSR method, in which the input Raw data is processed in two branches, one for Bayer-format input and the other for sub-frame-format input; by utilizing the proposed joint alignment, interaction, and temporal and channel fusion modules, the complementary information of the two branches is well explored; experiments show that the proposed method is superior to current mainstream Raw and sRGB VSR methods; through the research and exploration of the invention, research on Raw-domain video super-resolution methods can be expected to be inspired.
The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent replacement or change made by a person skilled in the art according to the technical solution and inventive concept of the present invention, within the technical scope disclosed by the present invention, shall be covered by the protection scope of the present invention.

Claims (4)

1. A real-world video super-resolution algorithm based on the Raw domain, characterized by comprising the following steps:
S1, establishing a real-world Raw video super-resolution dataset: the process of establishing the dataset mainly comprises the following 3 steps:
S101, hardware design: a beam splitter divides the incident light into two beams, a reflected beam and a transmitted beam with a 1:1 brightness ratio; LR-HR frame pairs of different scales are captured using DSLR cameras with zoom lenses; a 3D model box is designed and printed to fix the beam splitter; the DSLR cameras and the beam-splitter box are placed on an optical plate, and a tripod is fixed below them;
S102, data acquisition: acquiring Raw videos in the MLV format, then processing the MLV-format Raw videos with MlRawViewer software to obtain the corresponding sRGB frames and Raw frames in DNG format;
S103, data processing: utilizing a coarse-to-fine alignment strategy to generate aligned LR-HR frames, including sRGB frame pairs (I_LR^sRGB, I_HR^sRGB) and Raw frame pairs (I_LR^Raw, I_HR^Raw);
S2, based on the frames processed in S1, designing a real-world Raw video super-resolution algorithm with the LR Raw frame and HR sRGB frame pair (I_LR^Raw, I_HR^sRGB) as the training pair;
S3, training a model: building a model based on the algorithm designed in S2, training the model on the deep learning framework PyTorch, iterating 300k times on the whole dataset, then reducing the learning rate to 0.00001 and continuing to iterate until the loss converges, obtaining the final model;
and S4, inputting the low-resolution Raw video sequences in the test set into the model obtained in S3 to obtain the corresponding super-resolution output results.
2. The real-world video super-resolution algorithm based on the Raw domain according to claim 1, characterized in that the steps of processing the sRGB frames with the alignment strategy in S103 are as follows:
S1031, first, a homography matrix H between the upsampled LR frame and the HR frame is estimated using SIFT keypoints selected by the RANSAC algorithm;
S1032, then, the HR frame is warped with H, the warped frame being denoted Ĩ_HR^sRGB, to roughly crop out the corresponding region in the LR frame that matches the HR frame;
s1033, performing pixel-by-pixel alignment on the matching area by utilizing a traditional optical flow estimation method DeepFlow;
S1034, finally, the central area is cropped to eliminate alignment artifacts around the boundary, generating aligned LR-HR frames in the sRGB domain, denoted (I_LR^sRGB, I_HR^sRGB).
3. The real-world video super-resolution algorithm based on the Raw domain according to claim 2, characterized in that: when processing the Raw frames with the alignment strategy, the original frames are first re-packed into the RGGB sub-format, whose size is half that of the sRGB frames, so the H matrix computed from the sRGB frames must be adapted by rescaling its translation parameters by a factor of 0.5; the DeepFlow result is processed in the same manner, and in this way the Raw frame pairs (I_LR^Raw, I_HR^Raw) are generated.
4. The real-world video super-resolution algorithm based on the Raw domain according to claim 1, characterized in that the real-world Raw video super-resolution algorithm flow in S2 mainly comprises the following steps:
S201, dual-branch strategy and feature extraction: in order to fully utilize the information of the Raw data, the input LR consecutive frames X_LR^Raw are fed into the two branches of the network in different forms; the Bayer-format branch directly takes the Raw consecutive frames as input; the sub-frame-format branch re-packs them into the RGGB sub-format to form a new sequence as input; the input of the Bayer-format branch is denoted X^Bayer and the input of the sub-frame branch is denoted X^Sub, whose channel number is 4 times that of X^Bayer; the Bayer-format branch keeps the original order of the raw pixels, which benefits spatial reconstruction; although the sub-frame-format branch cannot preserve the original pixel order, it can exploit far-neighbor correlation to generate detail; the two inputs then respectively pass through a feature extraction module composed of five residual blocks;
S202, joint alignment: because of the temporal misalignment between adjacent frames, the adjacent frames need to be warped to the central frame; alignment is performed on the basis of a multi-level cascade alignment strategy, that is, the alignment offsets are calculated from the sub-frame-format branch, and the calculated offsets are then directly copied to the Bayer-format branch for alignment, so that the two branches are aligned jointly; the features F_Sub(t+i) and F_Sub(t) in the sub-frame-format branch undergo convolution and downsampling L-1 times to form an L-level pyramid; the pyramid features in the Bayer-format branch are constructed in the same way; the offset of the l-th level is calculated from the aggregated features of the l-th level and the 2x-upsampled offset of the (l+1)-th level:
Δp_Sub^l = f([F_Sub^l(t+i), F_Sub^l(t)], (Δp_Sub^(l+1))↑2)
since the input of the sub-frame-format branch is actually a downsampled version of that of the Bayer-format branch, the offset values of the Bayer-format branch should be twice those of the sub-frame-format branch; thus, the offset Δp_Sub^l in the sub-frame-format branch is upsampled by a factor of 2 and its values are amplified by a factor of 2 to obtain the offset Δp_Bayer^l of the l-th level of the Bayer-format branch:
Δp_Bayer^l = 2 · (Δp_Sub^l)↑2
given the offsets, the aligned features of the two branches can be expressed as:
F̃_Sub^l(t+i) = g(Dconv(F_Sub^l(t+i), Δp_Sub^l), (F̃_Sub^(l+1)(t+i))↑2)
F̃_Bayer^l(t+i) = g(Dconv(F_Bayer^l(t+i), Δp_Bayer^l), (F̃_Bayer^(l+1)(t+i))↑2)
where g denotes a mapping function implemented by several convolution layers and Dconv denotes the deformable convolution; the Dconv layers of the two branches share the same weights at the corresponding level;
after the L levels of alignment, the offset Δp_Sub calculated between F̃_Sub(t+i) and F_Sub(t) is further used to refine F̃_Sub(t+i) and F̃_Bayer(t+i), generating the final alignment results F̂_Sub(t+i) and F̂_Bayer(t+i) for the adjacent features in both branches;
S203, interaction module: the Bayer-format branch features are downsampled by a 3 × 3 convolution (stride 2) and a LeakyReLU layer, and these downsampled features are aggregated with the features in the sub-frame-format branch; similarly, the sub-frame-format branch features are upsampled by PixelShuffle and then aggregated with the features in the Bayer-format branch;
S204, temporal fusion: a non-local temporal attention module is used to aggregate long-range features and enhance the feature representation along the temporal dimension; the features are then fused together using spatio-temporal attention (TSA) based fusion;
S205, channel fusion: the features F̂_Bayer and F̂_Sub in the two branches are merged together using channel fusion, since they may contribute differently to the final SR reconstruction; the two branch features are fused by selective kernel fusion (SKF) via a channel-wise weighted average;
S206, reconstruction and upsampling: the fused feature F_fused is input into a reconstruction module implemented by 10 ResNet blocks for SR reconstruction; after reconstruction, the result is upsampled with PixelShuffle, and a convolutional layer then generates the three-channel output; the module also utilizes two long skip connections: one is for the LR Bayer-format input, which is first processed by a convolutional layer and then upsampled by PixelShuffle to a three-channel output; the other is for the LR sub-frame-format input, which is upsampled twice as much because its spatial size is half that of the original input; the three outputs are added to generate the final HR result I_SR;
S207, color correction and loss function: of actual photographic data
Figure FDA0003714834400000055
And
Figure FDA0003714834400000056
there are differences in color and brightness, exploiting pixel loss directly between the output and HR may lead to a network optimizing color and brightness correction without paying attention to the task of SR; to solve this problem, instead of computing a 3 × 3 color correction matrix to correct them simultaneously, color correction is used before the loss computation, i.e. channel-based color correction is used separately for the RGB channels:
Figure FDA0003714834400000061
wherein alpha is c Is the scaling factor of channel c by minimizing
Figure FDA0003714834400000062
And HR down-sampled versions
Figure FDA0003714834400000063
Calculated corresponding to the least squares penalty between pixels.
S208, optimizing the network by using the Charbonier loss between the corrected output and the HR.
CN202210733861.5A 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain Active CN115115516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210733861.5A CN115115516B (en) 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210733861.5A CN115115516B (en) 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain

Publications (2)

Publication Number Publication Date
CN115115516A true CN115115516A (en) 2022-09-27
CN115115516B CN115115516B (en) 2023-05-12

Family

ID=83329552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210733861.5A Active CN115115516B (en) 2022-06-27 2022-06-27 Real world video super-resolution construction method based on Raw domain

Country Status (1)

Country Link
CN (1) CN115115516B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100123792A1 (en) * 2008-11-20 2010-05-20 Takefumi Nagumo Image processing device, image processing method and program
US20160063316A1 (en) * 2014-08-29 2016-03-03 Motorola Solutions, Inc. Methods and systems for increasing facial recognition working rang through adaptive super-resolution
US20190206026A1 (en) * 2018-01-02 2019-07-04 Google Llc Frame-Recurrent Video Super-Resolution
US20210312591A1 (en) * 2020-04-07 2021-10-07 Samsung Electronics Co., Ltd. Systems and method of training networks for real-world super resolution with unknown degradations
CN111583112A (en) * 2020-04-29 2020-08-25 华南理工大学 Method, system, device and storage medium for video super-resolution
CN112700392A (en) * 2020-12-01 2021-04-23 华南理工大学 Video super-resolution processing method, device and storage medium
CN112991183A (en) * 2021-04-09 2021-06-18 华南理工大学 Video super-resolution method based on multi-frame attention mechanism progressive fusion
CN113240581A (en) * 2021-04-09 2021-08-10 辽宁工程技术大学 Real world image super-resolution method for unknown fuzzy kernel
CN113469884A (en) * 2021-07-15 2021-10-01 长视科技股份有限公司 Video super-resolution method, system, equipment and storage medium based on data simulation
CN113538249A (en) * 2021-09-03 2021-10-22 中国矿业大学 Image super-resolution reconstruction method and device for video monitoring high-definition presentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
X. YANG ET AL: "Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme", 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) *
ZHAN KEYU ET AL: "A video super-resolution method with multi-scale 3D convolution", Journal of Xidian University *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051380A (en) * 2023-01-13 2023-05-02 深圳大学 Video super-resolution processing method and electronic equipment
CN116051380B (en) * 2023-01-13 2023-08-22 深圳大学 Video super-resolution processing method and electronic equipment
CN116596779A (en) * 2023-04-24 2023-08-15 天津大学 Transform-based Raw video denoising method
CN116596779B (en) * 2023-04-24 2023-12-01 天津大学 Transform-based Raw video denoising method

Also Published As

Publication number Publication date
CN115115516B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
Jiang et al. Learning to see moving objects in the dark
US11037278B2 (en) Systems and methods for transforming raw sensor data captured in low-light conditions to well-exposed images using neural network architectures
CN107123089B (en) Remote sensing image super-resolution reconstruction method and system based on depth convolution network
Kalantari et al. Deep HDR video from sequences with alternating exposures
CN115115516B (en) Real world video super-resolution construction method based on Raw domain
CN112085659B (en) Panorama splicing and fusing method and system based on dome camera and storage medium
KR20130013288A (en) High dynamic range image creation apparatus of removaling ghost blur by using multi exposure fusion and method of the same
CN113228094A (en) Image processor
CN113850367B (en) Network model training method, image processing method and related equipment thereof
CN111986084A (en) Multi-camera low-illumination image quality enhancement method based on multi-task fusion
Zamir et al. Learning digital camera pipeline for extreme low-light imaging
CN112508812A (en) Image color cast correction method, model training method, device and equipment
Yue et al. Real-rawvsr: Real-world raw video super-resolution with a benchmark dataset
Zhao et al. End-to-end denoising of dark burst images using recurrent fully convolutional networks
CN111986106A (en) High dynamic image reconstruction method based on neural network
Shen et al. Spatial temporal video enhancement using alternating exposures
CN112750092A (en) Training data acquisition method, image quality enhancement model and method and electronic equipment
CN113379609A (en) Image processing method, storage medium and terminal equipment
WO2023110880A1 (en) Image processing methods and systems for low-light image enhancement using machine learning models
Guo et al. Low-light color imaging via cross-camera synthesis
Ye et al. LFIENet: Light field image enhancement network by fusing exposures of LF-DSLR image pairs
CN114862735A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114862707A (en) Multi-scale feature recovery image enhancement method and device and storage medium
CN112991174A (en) Method and system for improving resolution of single-frame infrared image
Guo et al. Low-light color imaging via dual camera acquisition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant