CN111583112A - Method, system, device and storage medium for video super-resolution - Google Patents
Method, system, device and storage medium for video super-resolution
- Publication number
- CN111583112A (application CN202010353851.XA)
- Authority
- CN
- China
- Prior art keywords
- resolution
- frame
- resolution video
- video
- video frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Multimedia (AREA)
- Television Systems (AREA)
Abstract
The invention discloses a video super-resolution generation method, system, apparatus and storage medium. The method comprises: obtaining a low-resolution video frame to be processed, processing the low-resolution video frame through a video super-resolution model, and outputting a high-resolution video. Training the model comprises collecting training samples, each containing high-resolution video frame samples and low-resolution video frame samples, and establishing the video super-resolution model from the collected training samples based on a preset loss function and the high-resolution video frame samples. Through the selected video super-resolution model, the method realizes motion compensation and feature enhancement between low-resolution video frames and restores their high-frequency information, so that the output high-resolution video contains more image detail and higher definition, while avoiding the interference that optical-flow errors cause on the restoration of the final video frames in optical-flow-based video super-resolution methods. The method can be widely applied in the technical field of image processing.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, a system, an apparatus, and a storage medium for video super-resolution.
Background
In recent years, with the growing demand for image and video quality, how to improve that quality has become an increasingly important issue. Video super-resolution aims to repair a low-resolution video so that it contains more detail information and its definition is improved. Video super-resolution technology has important practical significance. For example, in the field of video surveillance, when the resolution of a camera is limited or the camera is too far from the target being shot, the captured surveillance video suffers from low resolution and targets that are difficult to distinguish, which makes it hard to mine the required information from the video. Video super-resolution technology can restore such video to a certain extent and improve the quality of the surveillance video. In video entertainment, with the rapid development of high-resolution display devices, corresponding ultra-high-resolution film sources are in short supply, and network transmission of ultra-high-resolution video is also difficult. Video super-resolution technology can make up for the missing film sources and improve the visual experience of audiences; moreover, a low-resolution video can be transmitted and then restored by super-resolution after transmission is complete, which greatly reduces transmission cost and improves transmission efficiency.
Current video super-resolution methods can be divided into two major categories: super-resolution methods based on single-frame images and super-resolution methods based on multi-frame images. Using a single-frame image super-resolution method to complete the video super-resolution task ignores the motion correlation between video frames and cannot exploit the temporal information contained in multiple frames to obtain a higher-fidelity result, so it is a suboptimal option. As an extension of single-frame image super-resolution algorithms, multi-frame methods make better use of inter-frame complementary information and improve the quality of the super-resolution result.
In recent years, with the development of deep learning and convolutional neural networks, a video super-resolution technology based on multi-frame images has made a great breakthrough. However, in the case of complex motion or large-scale motion, how to maintain high-precision video super-resolution is still a difficult problem, and the performance of the algorithm still needs to be improved. At present, many video super-resolution algorithms based on convolutional neural network perform motion estimation on video frames through optical flow, and explicitly perform motion compensation processing so as to extract valuable information from aligned video frames. Due to the introduction of an additional optical flow estimation network, an end-to-end architecture cannot be realized, and meanwhile, optical flow errors can interfere with the recovery of a final video frame, so that an optimal super-resolution result cannot be generated. Therefore, a more accurate and efficient video super-resolution method is needed to further improve the recovery capability of the video super-resolution network, so that the video super-resolution network can cope with video super-resolution tasks in various complex scenes.
Disclosure of Invention
In order to solve the above technical problems, it is an object of the present invention to provide a method, system, apparatus and storage medium for generating video super-resolution.
The first technical scheme adopted by the invention is as follows:
the method for generating the video super-resolution comprises the following steps:
acquiring a low-resolution video frame to be processed;
processing the low-resolution video frame through a video super-resolution model, and outputting a high-resolution video;
the video super-resolution model training process comprises the following steps:
acquiring training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
and establishing a video super-resolution model based on a preset loss function and the high-resolution video frame sample according to the acquired training sample.
Optionally, the step of acquiring a training sample, where the training sample includes a high resolution video frame sample and a low resolution video frame sample, specifically includes the following steps:
collecting a high-resolution video sample, obtaining a high-resolution video frame sample by adopting a threshold shot segmentation algorithm, and backing up the high-resolution video frame sample;
adopting an image scaling algorithm to carry out down-sampling on the high-resolution video frame sample to generate a low-resolution video frame sample;
and acquiring a high-resolution video frame sample and a low-resolution video frame sample to establish a training sample.
Optionally, the step of establishing a video super-resolution model based on the preset loss function and the high-resolution video frame sample according to the acquired training sample specifically includes the following steps:
acquiring a set number of low-resolution video frame samples, and setting a reference frame and an adjacent frame;
extracting features of the reference frame and the adjacent frame based on a residual error network, and generating reference frame features and adjacent frame features;
aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent features;
establishing the correlation degree of the adjacent frame and the reference frame by adopting a preset function and a relation matrix, carrying out first series connection on the aligned adjacent frame characteristics and the aligned reference frame characteristics, and outputting characteristic data fused with high-frequency information;
transmitting the feature data fused with the high-frequency information into the reference frame features by adopting residual error dense connection, and reconstructing a high-resolution video frame;
and reversely converging the reconstructed high-resolution video frame and the backed-up high-resolution video frame sample based on a preset loss function, and establishing a video super-resolution model.
Optionally, the deformable convolution network is provided with 5 variable convolution layers, a multi-level feature fusion structure formed by 8 cavity convolutions, and 2 convolution kernels, and the step of aligning the adjacent frames by combining the deformable convolution network, the reference frame features, and the adjacent frame features specifically includes the following steps:
before each variable convolution layer is input, performing second concatenation on the adjacent frame features and the reference frame features in the channel dimension;
after the series connection of adjacent frame features is compressed by each convolution kernel and is superposed by cavity convolution, the offset and the adjustment coefficient of the convolution kernel are output;
and each variable convolution layer carries out self-adaptive sampling on adjacent features according to the convolution kernel offset and the adjusting coefficient, and outputs the adjacent frame features after motion compensation.
Optionally, the step of establishing a correlation between the adjacent frame and the reference frame by using a preset function and a relationship matrix, performing a first concatenation on the aligned adjacent frame feature and the aligned reference frame feature, and outputting feature data fused with high-frequency information specifically includes the following steps:
determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by using the relation matrix;
determining the correlation degree of the adjacent frame and the reference frame by adopting a preset function according to the mapping relation;
aligning the regions with the unaligned adjacent frame features in a jumping connection mode according to the correlation;
and performing first series connection on the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information.
Optionally, the step of reconstructing the high-resolution video frame by using residual dense connection to transmit the feature data fused with the high-frequency information into the reference frame feature specifically includes the following steps:
globally jumping and accessing the reference frame characteristics based on the characteristic data which is connected and fused with high-frequency information by dense connection and residual error;
and rearranging the spatial dimension of the pixels of the reference frame by adopting a preset sub-pixel sampling layer and a convolution kernel to establish a high-resolution video frame.
Optionally, the step of processing the low-resolution video frame through the video super-resolution model and outputting the high-resolution video specifically includes the following steps:
inputting the low-resolution video frame into a video super-resolution model in a sliding window mode, and outputting a high-resolution video frame;
searching adjacent frames nearest to the video frame of the starting end or the tail end of the video frame sequence, complementing the number of the adjacent frames, and outputting a high-resolution starting end or tail end video frame;
and recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence to output the high-resolution video.
The second technical scheme adopted by the invention is as follows:
a video super-resolution generation system, comprising:
the acquisition module is used for acquiring a low-resolution video frame to be processed;
the output module is used for processing the low-resolution video frame through a video super-resolution model and outputting a high-resolution video;
the training module comprises:
the sampling submodule is used for acquiring a training sample, and the training sample contains a high-resolution video frame sample and a low-resolution video frame sample;
and the model establishing submodule is used for establishing a video super-resolution model based on a preset loss function and a high-resolution video frame sample according to the collected training sample.
Optionally, the acquisition sub-module comprises:
the acquisition unit is used for acquiring a high-resolution video sample, and obtaining and backing up the high-resolution video frame sample by adopting a threshold shot segmentation algorithm;
the sampling unit is used for carrying out downsampling on the high-resolution video frame sample by adopting an image scaling algorithm to generate a low-resolution video frame sample;
and the sample establishing unit is used for acquiring the high-resolution video frame sample and the low-resolution video frame sample to establish a training sample.
Optionally, the model building submodule includes:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a set number of low-resolution video frame samples and setting a reference frame and an adjacent frame;
the generating unit is used for extracting features of the reference frame and the adjacent frame based on a residual error network and generating reference frame features and adjacent frame features;
the alignment unit is used for carrying out alignment processing on the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent features;
the first output unit is used for establishing the correlation degree of the adjacent frame and the reference frame by adopting a preset function and a relation matrix, performing first series connection on the aligned adjacent frame characteristics and the aligned reference frame characteristics, and outputting characteristic data fused with high-frequency information;
the reconstruction unit is used for transmitting the feature data fused with the high-frequency information into the reference frame features by adopting residual error dense connection and reconstructing a high-resolution video frame;
and the model establishing unit is used for reversely converging the reconstructed high-resolution video frame and the backed-up high-resolution video frame sample based on a preset loss function and establishing a video super-resolution model.
Optionally, the deformable convolutional network is provided with 5 variable convolutional layers, a multi-level feature fusion structure formed by 8 cavity convolutions, and 2 convolution kernels, and the alignment unit includes:
a second concatenation subunit, configured to perform a second concatenation of the adjacent frame feature and the reference frame feature in the channel dimension before inputting each of the variable convolutional layers;
the first output subunit is used for outputting convolution kernel offset and an adjusting coefficient after the characteristics of the adjacent frames after series connection are compressed by each convolution kernel and are superposed by cavity convolution;
and the second output subunit is used for performing self-adaptive sampling on the adjacent features by each variable convolution layer according to the convolution kernel offset and the adjusting coefficient and outputting the adjacent frame features after motion compensation.
Optionally, the first output unit includes:
the first determining subunit is used for determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by adopting the relation matrix;
the second determining subunit is configured to determine, according to the mapping relationship, a correlation degree between the adjacent frame and the reference frame by using a preset function;
the alignment subunit is used for performing alignment processing on the regions with the unaligned adjacent frame features in a jumping connection mode according to the correlation degree;
and the third output subunit is used for performing first series connection on the aligned adjacent frame features and the reference frame features and outputting feature data fused with high-frequency information.
Optionally, the reconstruction unit comprises:
the access subunit is used for globally jumping and accessing the reference frame characteristics based on the characteristic data which is formed by fusing the dense connection and the residual connection and has high-frequency information;
and the rearrangement subunit is used for rearranging the spatial dimension of the pixels of the reference frame by adopting a preset sub-pixel sampling layer and a convolution kernel to establish a high-resolution video frame.
Optionally, the output module includes:
the second output unit is used for inputting the low-resolution video frame into the video super-resolution model in a sliding window mode and outputting the high-resolution video frame;
a third output unit, configured to search for an adjacent frame that is closest to a start end or a tail end video frame of the sequence of video frames, complement the number of the adjacent frames, and output a high resolution start end or tail end video frame;
and the fourth output unit is used for recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence and outputting the high-resolution video.
The third technical scheme adopted by the invention is as follows:
an apparatus comprising a memory and a processor, the memory being configured to store at least one program and the processor being configured to load the at least one program to perform the method described above.
The fourth technical scheme adopted by the invention is as follows:
a storage medium having stored therein a processor-executable program for performing the method as described above when executed by a processor.
The invention has the following beneficial effects: by processing the acquired low-resolution video frames with a video super-resolution model trained on samples containing both high-resolution and low-resolution video frame samples and established with a preset loss function, the low-resolution video frames can be accurately and efficiently restored to high-resolution video frames, and the interference that optical-flow errors cause on the recovery of the final video frame in optical-flow-based video super-resolution methods is avoided.
Drawings
FIG. 1 is a flow chart illustrating steps of a method for generating super-resolution video provided by the present invention;
FIG. 2 is a block diagram of a system for generating super-resolution video provided by the present invention;
FIG. 3 is a schematic flow chart of a video super-resolution model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating the operation of the deformable convolution layer in a deforming operation in accordance with an embodiment of the present invention;
FIG. 5 is a diagram illustrating a multi-level feature fusion structure in a deformable convolutional network according to an embodiment of the present invention;
FIG. 6 is a structural diagram illustrating the correlation between adjacent frames and reference frames according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the residual dense connections in the reconstruction operation according to an embodiment of the present invention;
FIG. 8 is a comparison graph of the visualization results of the prior optimal solution and the solution of the present application using the Vid4 data set;
FIG. 9 is a comparison graph of the visualization results of the prior optimal solution and the solution of the present application using the SPMCS data set;
FIG. 10 is a graph comparing visualization results of the prior optimal solution using the Vimeo-90K-T data set with the solution of the present application.
Detailed Description
Example 1
As shown in fig. 1, the present embodiment provides a method for generating a video super-resolution, which includes the following steps:
s1, acquiring a low-resolution video frame to be processed, wherein the video frame comprises a complex motion scene;
s2, processing the low-resolution video frame through a video super-resolution model, and outputting a high-resolution video;
the video super-resolution model training process comprises the following steps:
s3, collecting training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
and S4, establishing a video super-resolution model based on the preset loss function and the high-resolution video frame sample according to the collected training sample.
Optionally, the step S2 includes:
s21, inputting the low-resolution video frame into a video super-resolution model in a sliding window mode, and outputting a high-resolution video frame;
s22, searching the adjacent frame nearest to the video frame at the start end or the tail end of the video frame sequence, complementing the number of the adjacent frames, and outputting the video frame at the start end or the tail end of high resolution;
and S23, recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence, and outputting the high-resolution video.
Optionally, the step S3 includes:
s31, collecting a high-resolution video sample, obtaining the high-resolution video frame sample by adopting a threshold shot segmentation algorithm and backing up the high-resolution video frame sample;
s32, adopting an image scaling algorithm to carry out down-sampling on the high-resolution video frame sample to generate a low-resolution video frame sample;
and S33, acquiring the high-resolution video frame sample and the low-resolution video frame sample to establish a training sample.
Optionally, the step S4 includes:
s41, acquiring a set number of low-resolution video frame samples, and setting a reference frame and an adjacent frame;
s42, extracting features of the reference frame and the adjacent frame based on a residual error network, and generating reference frame features and adjacent frame features;
s43, aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent features;
s44, establishing the correlation degree of the adjacent frame and the reference frame by adopting a preset function and a relation matrix, carrying out first series connection on the aligned adjacent frame characteristics and the reference frame characteristics, and outputting characteristic data fused with high-frequency information;
s45, transmitting the feature data fused with the high-frequency information into the reference frame features by adopting residual error dense connection, and reconstructing a high-resolution video frame;
and S46, reversely converging the reconstructed high-resolution video frame and the backed-up high-resolution video frame sample based on a preset loss function, and establishing a video super-resolution model.
Optionally, the deformable convolutional network is provided with 5 variable convolutional layers, a multi-level feature fusion structure formed by 8 cavity convolutions, and 2 convolutional kernels, and the step S43 includes:
s431, before inputting each variable convolution layer, performing second series connection on the adjacent frame features and the reference frame features in the channel dimension;
s432, after compressing each convolution kernel and performing cavity convolution and superposition on the adjacent frame features after series connection, outputting convolution kernel offset and an adjusting coefficient;
and S433, each variable convolution layer carries out self-adaptive sampling on adjacent features according to the convolution kernel offset and the adjusting coefficient, and outputs the adjacent frame features after motion compensation.
Optionally, the step S44 includes:
s441, determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by using the relation matrix;
s442, determining the correlation degree between the adjacent frame and the reference frame by adopting a preset function according to the mapping relation;
s443, aligning the regions with the unaligned adjacent frame features by adopting a jump connection mode according to the correlation;
and S444, carrying out first series connection on the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information.
Optionally, the step S45 includes:
s451, globally jumping access reference frame features of feature data fused with high-frequency information based on dense connection and residual connection;
and S452, rearranging the spatial dimension of the pixels of the reference frame by adopting a preset sub-pixel sampling layer and a convolution kernel to establish a high-resolution video frame.
Example 2
As shown in fig. 2, the present embodiment provides a system for generating a video super-resolution, the system including:
the acquisition module is used for acquiring a low-resolution video frame to be processed;
the output module is used for processing the low-resolution video frame through a video super-resolution model and outputting a high-resolution video;
the training module comprises:
the sampling submodule is used for acquiring a training sample, and the training sample contains a high-resolution video frame sample and a low-resolution video frame sample;
and the model establishing submodule is used for establishing a video super-resolution model based on a preset loss function and a high-resolution video frame sample according to the collected training sample.
Optionally, the acquisition sub-module comprises:
the acquisition unit is used for acquiring a high-resolution video sample, and obtaining and backing up the high-resolution video frame sample by adopting a threshold shot segmentation algorithm;
the sampling unit is used for carrying out downsampling on the high-resolution video frame sample by adopting an image scaling algorithm to generate a low-resolution video frame sample;
and the sample establishing unit is used for acquiring the high-resolution video frame sample and the low-resolution video frame sample to establish a training sample.
Optionally, the model building submodule includes:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a set number of low-resolution video frame samples and setting a reference frame and an adjacent frame;
the generating unit is used for extracting features of the reference frame and the adjacent frame based on a residual error network and generating reference frame features and adjacent frame features;
the alignment unit is used for carrying out alignment processing on the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent features;
the first output unit is used for establishing the correlation degree of the adjacent frame and the reference frame by adopting a preset function and a relation matrix, performing first series connection on the aligned adjacent frame characteristics and the aligned reference frame characteristics, and outputting characteristic data fused with high-frequency information;
the reconstruction unit is used for transmitting the feature data fused with the high-frequency information into the reference frame features by adopting residual error dense connection and reconstructing a high-resolution video frame;
and the model establishing unit is used for reversely converging the reconstructed high-resolution video frame and the backed-up high-resolution video frame sample based on a preset loss function and establishing a video super-resolution model.
Optionally, the deformable convolutional network is provided with 5 variable convolutional layers, a multi-level feature fusion structure formed by 8 cavity convolutions, and 2 convolution kernels, and the alignment unit includes:
a second concatenation subunit, configured to perform a second concatenation of the adjacent frame feature and the reference frame feature in the channel dimension before inputting each of the variable convolutional layers;
the first output subunit is used for outputting convolution kernel offset and an adjusting coefficient after the characteristics of the adjacent frames after series connection are compressed by each convolution kernel and are superposed by cavity convolution;
and the second output subunit is used for performing self-adaptive sampling on the adjacent features by each variable convolution layer according to the convolution kernel offset and the adjusting coefficient and outputting the adjacent frame features after motion compensation.
Optionally, the first output unit includes:
the first determining subunit is used for determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by adopting the relation matrix;
the second determining subunit is configured to determine, according to the mapping relationship, a correlation degree between the adjacent frame and the reference frame by using a preset function;
the alignment subunit is used for performing alignment processing on the regions with the unaligned adjacent frame features in a jumping connection mode according to the correlation degree;
and the third output subunit is used for performing first series connection on the aligned adjacent frame features and the reference frame features and outputting feature data fused with high-frequency information.
Optionally, the reconstruction unit comprises:
the access subunit is used for globally jumping and accessing the reference frame characteristics based on the characteristic data which is formed by fusing the dense connection and the residual connection and has high-frequency information;
and the rearrangement subunit is used for rearranging the spatial dimension of the pixels of the reference frame by adopting a preset sub-pixel sampling layer and a convolution kernel to establish a high-resolution video frame.
Optionally, the output module includes:
the second output unit is used for inputting the low-resolution video frame into the video super-resolution model in a sliding window mode and outputting the high-resolution video frame;
a third output unit, configured to search for an adjacent frame that is closest to a start end or a tail end video frame of the sequence of video frames, complement the number of the adjacent frames, and output a high resolution start end or tail end video frame;
and the fourth output unit is used for recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence and outputting the high-resolution video.
Example 3
The present embodiments provide an apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one program causes the at least one processor to implement the steps of a method for generating video super-resolution as described in embodiment 1 above.
Example 4
A storage medium having stored therein a program executable by a processor, the program being executed by the processor for performing the steps of a method for generating video super-resolution as described in embodiment 1.
Example 5
Referring to figs. 3 to 10, this embodiment provides a method for generating video super-resolution, which specifically includes the following steps:
A. acquiring training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
B. establishing a video super-resolution model according to the collected training samples;
C. acquiring a low-resolution video frame to be processed;
D. and processing the low-resolution video frame to be processed through the video super-resolution model, and outputting a high-resolution video.
Wherein, the specific implementation scheme of the step A is as follows:
a1, acquiring a public large-scale video data set Vimeo-90K as a training data set. The data set comprises a plurality of video frames with different motion scale ranges, so that the trained video resolution model has better generalization capability. The data set consisted of 64612 training samples, each sample containing 7 consecutive video frames of the same scene, of size 448 × 256.
A2, backing up the high-resolution video frame samples, and then down-sampling each high-resolution video frame sample by a factor of 4 with bicubic interpolation using an image scaling algorithm such as the imresize function of MATLAB, obtaining corresponding low-resolution video frame samples of size 112 × 64; each backed-up high-resolution video frame sample and the low-resolution video frame sample generated from it by sampling form a pair of training samples. Horizontal or vertical flipping, 90-degree rotation and random cropping of image blocks are adopted as data enhancement.
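For illustration only, the following Python sketch shows one way such a training pair could be assembled. It uses PIL's bicubic resize in place of MATLAB's imresize (an approximation, since the two implementations differ slightly), the 200 × 200 high-resolution crop size is inferred from the ×4 scale, and all function and variable names are placeholders rather than part of the patent.

```python
import random
import numpy as np
from PIL import Image

def make_training_pair(hr_frames, scale=4, lr_patch=50):
    """Build one (LR, HR) training pair from 7 consecutive HR frames.

    hr_frames: list of 7 HxWx3 uint8 arrays (e.g. 256x448 Vimeo-90K frames).
    PIL bicubic resizing stands in for MATLAB's imresize; augmentation
    follows the patent: horizontal/vertical flips, 90-degree rotation and
    random cropping of co-located image blocks.
    """
    h, w, _ = hr_frames[0].shape
    lr_frames = [np.asarray(Image.fromarray(f).resize((w // scale, h // scale),
                                                      Image.BICUBIC))
                 for f in hr_frames]

    # random co-located crop: 50x50 in LR corresponds to 200x200 in HR
    y = random.randint(0, h // scale - lr_patch)
    x = random.randint(0, w // scale - lr_patch)
    lr = np.stack([f[y:y + lr_patch, x:x + lr_patch] for f in lr_frames])
    hr = hr_frames[len(hr_frames) // 2][y * scale:(y + lr_patch) * scale,
                                        x * scale:(x + lr_patch) * scale]

    # data augmentation applied identically to the LR clip and the HR target
    if random.random() < 0.5:                      # horizontal flip
        lr, hr = lr[:, :, ::-1], hr[:, ::-1]
    if random.random() < 0.5:                      # vertical flip
        lr, hr = lr[:, ::-1], hr[::-1]
    if random.random() < 0.5:                      # 90-degree rotation
        lr, hr = np.rot90(lr, axes=(1, 2)), np.rot90(hr)
    return np.ascontiguousarray(lr), np.ascontiguousarray(hr)
```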
The specific embodiment of the step B is as follows:
b1, selecting 7 consecutive low-resolution video frame samples and randomly cropping 50 × 50 image blocks at the same position as input, wherein the middle frame serves as the reference frame to be restored, denoted I_t, and the other frames serve as adjacent frames that aid the recovery, denoted I_i, with i ∈ [t−3, t+3] and i ≠ t.
B2, using a residual network to carry out shallow feature extraction on the 7 low-resolution video frame samples composed of the reference frame I_t and the adjacent frames I_i.
It can be understood that the features produced by the feature extraction module H_fea contain 64 channels and have the same size as the input picture. The residual network consists of 5 cascaded residual blocks, and each residual block contains two 3 × 3 convolutional layers, a ReLU activation function and a skip connection.
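As an illustrative sketch only (the class names, the handling of the 7-frame batch and the defaults are assumptions, not part of the patent), the feature extraction module H_fea described above could be written in PyTorch as:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions, a ReLU and a skip connection (64 channels)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class FeatureExtractor(nn.Module):
    """H_fea: lifts each 3-channel LR frame to a 64-channel feature map of
    the same spatial size, then refines it with 5 cascaded residual blocks."""
    def __init__(self, in_channels=3, channels=64, num_blocks=5):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])

    def forward(self, frames):              # frames: (B, 7, 3, H, W)
        b, n, c, h, w = frames.shape
        feats = self.body(self.head(frames.view(b * n, c, h, w)))
        return feats.view(b, n, -1, h, w)   # (B, 7, 64, H, W)
```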
B3, among the features F_T of all video frames, the obtained reference frame feature is denoted F_t and the adjacent frame features are denoted F_i. A deformable convolutional network is used to align each adjacent frame feature F_i to the reference frame feature F_t, and the aligned adjacent frame feature is denoted F'_i.
The part consisting of the deformable convolutional network can be understood as the alignment module H_align. Referring to fig. 4 and fig. 5, the alignment module (deformable convolutional network) comprises 5 cascaded deformable convolutional layers, 2 3 × 3 convolutional layers, and a multi-level feature fusion structure formed by 8 3 × 3 cavity (dilated) convolutions with dilation rates of 1 to 8 respectively. The reference frame feature F_t and an adjacent frame feature F_i are first concatenated in the channel dimension, and the number of channels is compressed back to 64 through a 3 × 3 convolutional layer. The 8 3 × 3 cavity convolutions, each outputting 32 feature channels, are then used to effectively expand the receptive field; their results are superposed and summed one by one to obtain convolution results superposing 8 receptive fields from small to large, which are concatenated and compressed to 64 channels with a 1 × 1 convolution. Finally, a 3 × 3 convolutional layer generates the two parameters required by the deformable convolution kernels: the convolution kernel offsets ΔP_i and the adjustment coefficients ΔM_i.
This process can be expressed as:
ΔP_i, ΔM_i = f([F_i, F_t])
Through the spatial pyramid structure formed by the cavity convolutions, this feature fusion effectively enlarges the receptive field, and superposing convolution results with different dilation rates makes the gathered information richer, which greatly helps to capture the pixel-level motion relation between the adjacent frame features and the reference frame features and to generate more accurate deformable convolution parameters. After the deformable convolutional layer obtains the convolution kernel offsets ΔP_i and the adjustment coefficients ΔM_i, it can adaptively sample the adjacent frame feature F_i, realizing implicit motion compensation.
Taking F_{i,b-1} and F_{i,b} as the input and output of one of the deformable convolutional layers, the deformable convolution operation can be expressed as follows: F_{i,b}(p) = Σ_k ω_k · F_{i,b-1}(p + p_k + Δp_{i,k}) · Δm_{i,k}, k = 1, …, K
where p_k denotes the k-th sampling position of the convolution kernel and ω_k the corresponding kernel weight; for a 3 × 3 convolution kernel, K = 9 and p_k ∈ {(−1, −1), (−1, 0), …, (1, 1)}. The deformable convolution adds the extra kernel offsets Δp_{i,k} so that the sampling positions can be adjusted according to each center point p, while the adjustment coefficients Δm_{i,k} allow the corresponding kernel weights to change dynamically, where ΔP_i = {Δp_{i,k}} and ΔM_i = {Δm_{i,k}}. The whole sampling process is adaptive and can be trained end to end, thus realizing an excellent motion compensation effect.
After passing through the 5 deformable convolutional layers, the adjacent frame feature F_i undergoes a coarse-to-fine alignment process in which the alignment precision is gradually improved. The alignment module can be written as: F'_i = H_align(F_i, F_t)
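A simplified PyTorch sketch of one such alignment stage is given below, assuming torchvision's modulated deformable convolution (torchvision.ops.DeformConv2d). The exact wiring of the dilated-convolution pyramid and the sigmoid applied to the adjustment coefficients are interpretations of the description above, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignmentStage(nn.Module):
    """One deformable-convolution alignment stage of H_align.

    [F_i, F_t] are concatenated, compressed to 64 channels, passed through a
    pyramid of 3x3 dilated (cavity) convolutions with dilation 1..8, and
    mapped to the kernel offsets dP_i and adjustment coefficients dM_i that
    drive a modulated deformable convolution over F_i.
    """
    def __init__(self, channels=64, num_dilations=8):
        super().__init__()
        self.compress = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.pyramid = nn.ModuleList(
            nn.Conv2d(channels, 32, 3, padding=d, dilation=d)
            for d in range(1, num_dilations + 1))
        self.fuse = nn.Conv2d(32 * num_dilations, channels, 1)
        # 3x3 kernel -> 9 sampling points: 18 offset + 9 modulation channels
        self.offset_mask = nn.Conv2d(channels, 27, 3, padding=1)
        self.dcn = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, f_i, f_t):
        x = self.compress(torch.cat([f_i, f_t], dim=1))
        # cumulative sums of the dilated responses give receptive fields from
        # small to large; they are concatenated and compressed by a 1x1 conv
        acc, levels = None, []
        for conv in self.pyramid:
            acc = conv(x) if acc is None else acc + conv(x)
            levels.append(acc)
        params = self.offset_mask(self.fuse(torch.cat(levels, dim=1)))
        offset, mask = params.split([18, 9], dim=1)
        # sigmoid keeps the modulation in [0, 1] (DCNv2 convention; assumption)
        return self.dcn(f_i, offset, torch.sigmoid(mask))
```

In the full alignment module, 5 such stages would be cascaded so that F_i is aligned to F_t from coarse to fine.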
b4, respectively processing the adjacent frame features and the reference frame features by adopting 1 × 1 convolution, performing matrix multiplication after dimensional transformation, and obtaining the correlation degree between the adjacent frame and the reference frame by using a softmax function, namely the correlation degree between a certain pixel point in the adjacent frame features and all pixel points in the reference frame features; and performing first series connection on the aligned adjacent frame characteristics and the reference frame characteristics, and outputting characteristic data fused with high-frequency information.
This part can be understood as the attention module H_nl, i.e. the regions of the adjacent frames that the alignment module failed to align in step B3 are emphasized again. The attention module is designed based on a non-local mechanism, and the calculation in the module can be expressed as:
x'_p = W_z · softmax((W_u x_p)^T W_v y_q) (W_g y_q) + x_p
where x_p and y_q respectively represent one pixel of the input aligned adjacent frame feature F'_i and one pixel of the reference frame feature F_t, x'_p represents the corresponding output pixel of the adjacent frame feature, W_u x_p, W_v y_q and W_g y_q represent the data obtained by transforming the input adjacent frame features and reference frame features with three 1 × 1 convolutions, and W_z denotes a further 1 × 1 convolution applied to the feature data obtained from the correlation calculation.
All attended adjacent frame features F'_i and the reference frame feature F_t are concatenated in the channel dimension, the number of channels is compressed with one 3 × 3 convolutional layer, and the feature data F_fusion fused with high-frequency information is output, where F_fusion can be expressed as: F_fusion = Conv_3×3([F'_{t−3}, …, F'_{t−1}, F_t, F'_{t+1}, …, F'_{t+3}])
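The attention module H_nl and the fusion step can be sketched in PyTorch as follows; this is a direct transcription of the formula above, with the inner channel size and all names chosen for illustration rather than taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalAttention(nn.Module):
    """H_nl: x'_p = W_z softmax((W_u x_p)^T W_v y_q)(W_g y_q) + x_p, computed
    for every pixel p of the aligned adjacent-frame feature x against every
    pixel q of the reference-frame feature y."""
    def __init__(self, channels=64, inner=32):
        super().__init__()
        self.w_u = nn.Conv2d(channels, inner, 1)
        self.w_v = nn.Conv2d(channels, inner, 1)
        self.w_g = nn.Conv2d(channels, inner, 1)
        self.w_z = nn.Conv2d(inner, channels, 1)

    def forward(self, x, y):                          # x: aligned F'_i, y: F_t
        b, c, h, w = x.shape
        q = self.w_u(x).flatten(2).transpose(1, 2)    # (B, HW, inner)
        k = self.w_v(y).flatten(2)                    # (B, inner, HW)
        g = self.w_g(y).flatten(2).transpose(1, 2)    # (B, HW, inner)
        relation = F.softmax(q @ k, dim=-1)           # relation matrix (B, HW, HW)
        out = (relation @ g).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.w_z(out)                      # skip connection keeps x_p

def fuse_features(attended_feats, f_t, conv3x3):
    """F_fusion: concatenate the attended adjacent-frame features with F_t
    along the channel dimension and compress them with one 3x3 convolution."""
    return conv3x3(torch.cat(attended_feats + [f_t], dim=1))
```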
b5, using residual dense connections to transmit the feature data F_fusion fused with high-frequency information into the reference frame feature F_t and reconstruct the high-resolution video frame I_t^SR.
The residual dense connections in this section can be understood as the reconstruction module. The reconstruction module (the residual dense connection part) comprises 23 cascaded residual dense blocks H_RRDBs and a global skip connection. As shown in fig. 7, each residual dense block H_RRDBs is built from densely connected blocks; each densely connected block consists of 5 convolutional layers, the channel growth within each densely connected block is set to 32, and the output of each convolutional layer is passed through several skip connections to the subsequent convolutional layers in the block as additional input. The residual dense block combines the advantages of dense connections and residual connections, and effectively extracts the high-frequency information contained in the features by exploiting multi-level features. The global skip connection feeds in the reference frame feature F_t. At the end of the network, a 3 × 3 convolutional layer is used to expand the number of channels to 64 × 16, a sub-pixel upsampling layer rearranges the pixels from the channel dimension to the spatial dimension to obtain a 4-times enlarged feature with 64 channels, and a final 3 × 3 convolutional layer outputs the 3-channel high-resolution reference frame I_t^SR. This operation is denoted H_rec. The reconstruction process can be expressed as follows: I_t^SR = H_rec(H_RRDBs(F_fusion) + F_t)
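A condensed PyTorch sketch of the reconstruction module is given below. The internal structure of each residual dense block is simplified to a single 5-layer densely connected block (the recovered text does not state how many densely connected blocks each H_RRDBs contains), and the LeakyReLU slope is an assumption.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """5 convolutional layers with dense skip connections, channel growth 32,
    followed by a local residual connection."""
    def __init__(self, channels=64, growth=32):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth if i < 4 else channels, 3, padding=1)
            for i in range(5))
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))
            feats.append(self.lrelu(out) if i < 4 else out)
        return x + feats[-1]

class Reconstructor(nn.Module):
    """Cascaded residual dense blocks, a global skip from F_t, and sub-pixel
    upsampling (x4) to the 3-channel high-resolution frame."""
    def __init__(self, channels=64, num_rrdb=23, scale=4):
        super().__init__()
        self.rrdbs = nn.Sequential(*[DenseBlock(channels) for _ in range(num_rrdb)])
        self.expand = nn.Conv2d(channels, channels * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, f_fusion, f_t):
        x = self.rrdbs(f_fusion) + f_t          # global skip connection from F_t
        x = self.shuffle(self.expand(x))        # 64*16 channels -> 64 channels, 4x size
        return self.to_rgb(x)
```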
b6, using the loss function to reversely converge the reconstructed high-resolution video frame I_t^SR with the backed-up high-resolution video frame sample I_t^HR, and establishing the video super-resolution model.
The L_1 loss function is formulated as follows: L_1 = (1 / (W · H · C)) Σ | I_t^SR − I_t^HR |, where W, H and C respectively represent the width, height and number of channels of the high-resolution video frame and I_t^HR is the backed-up high-resolution video frame sample. A learning rate is set, the gradient is propagated backwards by minimizing the loss function error, the network parameters are updated, and iteration continues until the network is trained to convergence.
In the backward convergence training, the batch size is set to 8 and the initial learning rate to 10^-4. In the iterative training process, according to the convergence condition of the network, the learning rate is halved for the first time after 70 epochs, accelerating the training of the video super-resolution model, and is then halved every 20 epochs; the optimizer parameters are set to β_1 = 0.9, β_2 = 0.999 and ε = 10^-8. Using the L_1 loss function, the error between the high-resolution video frames I_t^SR generated by the video super-resolution model and the original high-resolution video frames I_t^HR is calculated, and the network parameters are updated by back-propagating the gradient that minimizes this error. The network is trained to convergence within 120 epochs. The L_2 loss function is then used to continue training for another 10 epochs, fine-tuning the network parameters to further improve the performance of the video super-resolution model.
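The training schedule described above could be driven by a loop of the following form; the optimizer is assumed to be Adam (consistent with the β_1, β_2 and ε values quoted), and the model and data-loader names are placeholders.

```python
import torch
import torch.nn as nn

def train(model, loader, device="cuda"):
    """Sketch of the schedule above: L1 loss for 120 epochs with the learning
    rate halved after epoch 70 and then every 20 epochs, followed by 10 epochs
    of L2 fine-tuning. Batch size 8 is assumed to be set in `loader`; the
    choice of Adam is an assumption inferred from the beta/epsilon values."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)

    def run_epochs(num_epochs, criterion, start_epoch=0):
        for epoch in range(start_epoch, start_epoch + num_epochs):
            if epoch == 70 or (epoch > 70 and (epoch - 70) % 20 == 0):
                for group in opt.param_groups:
                    group["lr"] *= 0.5                     # halve the learning rate
            for lr_frames, hr_frame in loader:             # (B,7,3,h,w), (B,3,4h,4w)
                sr = model(lr_frames.to(device))
                loss = criterion(sr, hr_frame.to(device))
                opt.zero_grad()
                loss.backward()                            # back-propagate the gradient
                opt.step()

    run_epochs(120, nn.L1Loss())                           # main training with L1 loss
    run_epochs(10, nn.MSELoss(), start_epoch=120)          # fine-tune with L2 loss
```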
The scheme of the step C is specifically as follows:
respectively acquiring a Vid4 data set, an SPMCS data set and a Vimeo-90K-T data set, wherein the data sets comprise various videos of large-scale motion scenes and complex motion scenes, and video frames under the same shot are segmented and extracted in advance according to the scenes.
The scheme of the step D is specifically as follows:
The video frames to be recovered from the Vid4, SPMCS and Vimeo-90K-T data sets are extracted after shot splitting with a frame stride of 1, and 7 consecutive low-resolution video frames are fed into the trained video super-resolution model each time. The scheme of step B is applied to the input Vid4, SPMCS and Vimeo-90K-T video frames through the video super-resolution model in a sliding-window manner. For video frames at the start or the end of a video frame sequence, the adjacent frames nearest to the start or end frame are found to complement the required number of adjacent frames before the window is input to the model in the same sliding-window manner. The output high-resolution video frames are then recombined according to the video frame sequence to output the high-resolution video, as shown in figs. 8 to 10 and tables 1 to 3. Tables 1 to 3 compare the existing optimal schemes with the scheme of the present application on the Vid4, SPMCS and Vimeo-90K-T data sets in terms of the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indexes; figs. 8 to 10 are the corresponding comparisons of visualization results between the existing optimal schemes and the scheme of the present application on the Vid4, SPMCS and Vimeo-90K-T data sets, respectively.
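The sliding-window inference with boundary complementing described in step D can be sketched as follows; clamping frame indices at the sequence boundaries is one plausible reading of complementing the window with the nearest adjacent frames, and all names are illustrative.

```python
import torch

def super_resolve_sequence(model, lr_frames, window=7, device="cuda"):
    """Run the trained model over a full LR sequence with a sliding window.

    lr_frames: list of (3, h, w) tensors in temporal order. Near the start or
    end of the sequence the indices are clamped, so the nearest available
    adjacent frames fill out the 7-frame window (one reading of the
    "complementing" rule described above).
    """
    half = window // 2
    n = len(lr_frames)
    model.eval()
    outputs = []
    with torch.no_grad():
        for t in range(n):
            idx = [min(max(i, 0), n - 1) for i in range(t - half, t + half + 1)]
            clip = torch.stack([lr_frames[i] for i in idx]).unsqueeze(0).to(device)
            outputs.append(model(clip).squeeze(0).cpu())   # (3, 4h, 4w)
    # frames are produced in sequence order, so concatenating them
    # reassembles the high-resolution video
    return outputs
```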
TABLE 1
TABLE 2
TABLE 3
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method for generating video super-resolution is characterized by comprising the following steps:
acquiring a low-resolution video frame to be processed;
processing the low-resolution video frame through a video super-resolution model, and outputting a high-resolution video;
the video super-resolution model training process comprises the following steps:
acquiring training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
and establishing a video super-resolution model based on a preset loss function and the high-resolution video frame sample according to the acquired training sample.
2. The method for generating video super-resolution according to claim 1, wherein the step of collecting training samples, the training samples containing high-resolution video frame samples and low-resolution video frame samples, specifically comprises the following steps:
collecting a high-resolution video sample, obtaining a high-resolution video frame sample by adopting a threshold shot segmentation algorithm, and backing up the high-resolution video frame sample;
adopting an image scaling algorithm to carry out down-sampling on the high-resolution video frame sample to generate a low-resolution video frame sample;
and acquiring a high-resolution video frame sample and a low-resolution video frame sample to establish a training sample.
3. The method for generating video super-resolution according to claim 2, wherein the step of establishing a video super-resolution model based on the preset loss function and the high-resolution video frame sample according to the collected training samples specifically comprises the following steps:
acquiring a set number of low-resolution video frame samples, and setting a reference frame and an adjacent frame;
extracting features of the reference frame and the adjacent frame based on a residual error network, and generating reference frame features and adjacent frame features;
aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent features;
establishing the correlation degree of the adjacent frame and the reference frame by adopting a preset function and a relation matrix, carrying out first series connection on the aligned adjacent frame characteristics and the aligned reference frame characteristics, and outputting characteristic data fused with high-frequency information;
transmitting the feature data fused with the high-frequency information into the reference frame features by adopting residual error dense connection, and reconstructing a high-resolution video frame;
and reversely converging the reconstructed high-resolution video frame and the backed-up high-resolution video frame sample based on a preset loss function, and establishing a video super-resolution model.
4. The method for generating video super-resolution according to claim 3, wherein the deformable convolution network has 5 variable convolution layers, a multi-level feature fusion structure composed of 8 cavity convolutions and 2 convolution kernels, and the step of aligning the adjacent frames by combining the deformable convolution network, the reference frame feature and the adjacent frame features specifically comprises the following steps:
before each variable convolution layer is input, performing second concatenation on the adjacent frame features and the reference frame features in the channel dimension;
after the series connection of adjacent frame features is compressed by each convolution kernel and is superposed by cavity convolution, the offset and the adjustment coefficient of the convolution kernel are output;
and each variable convolution layer carries out self-adaptive sampling on adjacent features according to the convolution kernel offset and the adjusting coefficient, and outputs the adjacent frame features after motion compensation.
5. The method for generating video super-resolution according to claim 4, wherein the step of establishing the correlation between the adjacent frame and the reference frame by using a preset function and a relation matrix, performing the first concatenation on the aligned adjacent frame feature and the reference frame feature, and outputting the feature data fused with the high-frequency information specifically comprises the steps of:
determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by using the relation matrix;
determining the correlation degree of the adjacent frame and the reference frame by adopting a preset function according to the mapping relation;
aligning the regions with the unaligned adjacent frame features in a jumping connection mode according to the correlation;
and performing first series connection on the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information.
6. The method for generating super-resolution video of claim 5, wherein the step of reconstructing the high-resolution video frame by using residual dense connection to transmit the feature data fused with the high-frequency information into the reference frame feature comprises the following steps:
globally jumping and accessing the reference frame characteristics based on the characteristic data which is connected and fused with high-frequency information by dense connection and residual error;
and rearranging the spatial dimension of the pixels of the reference frame by adopting a preset sub-pixel sampling layer and a convolution kernel to establish a high-resolution video frame.
7. The method for generating super-resolution video of claim 1, wherein the step of processing the low-resolution video frame by the super-resolution video model and outputting the high-resolution video comprises the following steps:
inputting the low-resolution video frame into a video super-resolution model in a sliding window mode, and outputting a high-resolution video frame;
searching adjacent frames nearest to the video frame of the starting end or the tail end of the video frame sequence, complementing the number of the adjacent frames, and outputting a high-resolution starting end or tail end video frame;
and recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence to output the high-resolution video.
8. A system for generating super-resolution video, comprising:
an acquisition module for acquiring the low-resolution video frames to be processed;
an output module for processing the low-resolution video frames through a video super-resolution model and outputting a high-resolution video;
and a training module, which comprises:
a sampling submodule for collecting training samples, each training sample containing a high-resolution video frame sample and a low-resolution video frame sample;
and a model building submodule for building the video super-resolution model from the collected training samples based on a preset loss function and the high-resolution video frame samples.
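A minimal sketch of the training module in claim 8; the Adam optimizer and the L1 loss standing in for the unspecified "preset loss function" are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class TrainingModule:
    """Training-module sketch: a sampling step is assumed to deliver paired
    (low-resolution, high-resolution) frame samples; Adam and an L1 loss stand
    in for the unspecified optimizer and 'preset loss function'."""

    def __init__(self, sr_model: nn.Module, lr=1e-4):
        self.sr_model = sr_model                      # video super-resolution model
        self.optimizer = torch.optim.Adam(sr_model.parameters(), lr=lr)
        self.loss_fn = nn.L1Loss()                    # assumed preset loss function

    def build_step(self, lr_sample, hr_sample):
        """One model-building step on a collected training sample."""
        self.optimizer.zero_grad()
        loss = self.loss_fn(self.sr_model(lr_sample), hr_sample)
        loss.backward()
        self.optimizer.step()
        return loss.item()
```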
9. An apparatus, comprising a memory for storing at least one program and a processor for loading the at least one program to perform the method according to any one of claims 1 to 7.
10. A storage medium storing a program executable by a processor, wherein the program, when executed by the processor, performs the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010353851.XA CN111583112A (en) | 2020-04-29 | 2020-04-29 | Method, system, device and storage medium for video super-resolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111583112A true CN111583112A (en) | 2020-08-25 |
Family ID: 72121524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010353851.XA Pending CN111583112A (en) | 2020-04-29 | 2020-04-29 | Method, system, device and storage medium for video super-resolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111583112A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017219263A1 (en) * | 2016-06-22 | 2017-12-28 | 中国科学院自动化研究所 | Image super-resolution enhancement method based on bidirectional recursion convolution neural network |
CN109118431A (en) * | 2018-09-05 | 2019-01-01 | 武汉大学 | A kind of video super-resolution method for reconstructing based on more memories and losses by mixture |
CN110120011A (en) * | 2019-05-07 | 2019-08-13 | 电子科技大学 | A kind of video super resolution based on convolutional neural networks and mixed-resolution |
Non-Patent Citations (1)
Title |
---|
WANG HUA ET AL.: "Deformable Non-Local Network for Video Super-Resolution" * |
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112102166B (en) * | 2020-08-26 | 2023-12-01 | 上海交通大学 | Combined super-resolution, color gamut expansion and inverse tone mapping method and equipment |
CN112102166A (en) * | 2020-08-26 | 2020-12-18 | 上海交通大学 | Method and device for combining super-resolution, color gamut expansion and inverse tone mapping |
CN113766250A (en) * | 2020-09-29 | 2021-12-07 | 四川大学 | Compressed image quality improving method based on sampling reconstruction and feature enhancement |
CN112365403A (en) * | 2020-11-20 | 2021-02-12 | 山东大学 | Video super-resolution recovery method based on deep learning and adjacent frames |
CN112365403B (en) * | 2020-11-20 | 2022-12-27 | 山东大学 | Video super-resolution recovery method based on deep learning and adjacent frames |
CN112330543A (en) * | 2020-12-01 | 2021-02-05 | 上海网达软件股份有限公司 | Video super-resolution method and system based on self-supervision learning |
CN112700392A (en) * | 2020-12-01 | 2021-04-23 | 华南理工大学 | Video super-resolution processing method, device and storage medium |
CN112750094A (en) * | 2020-12-30 | 2021-05-04 | 合肥工业大学 | Video processing method and system |
CN112750094B (en) * | 2020-12-30 | 2022-12-09 | 合肥工业大学 | Video processing method and system |
CN112669216B (en) * | 2021-01-05 | 2022-04-22 | 华南理工大学 | Super-resolution reconstruction network of parallel cavity new structure based on federal learning |
CN112669216A (en) * | 2021-01-05 | 2021-04-16 | 华南理工大学 | Super-resolution reconstruction network of parallel cavity new structure based on federal learning |
CN112766340B (en) * | 2021-01-11 | 2024-06-04 | 中山大学 | Depth capsule network image classification method and system based on self-adaptive spatial mode |
CN112766340A (en) * | 2021-01-11 | 2021-05-07 | 中山大学 | Depth capsule network image classification method and system based on adaptive spatial mode |
CN112785667A (en) * | 2021-01-25 | 2021-05-11 | 北京有竹居网络技术有限公司 | Video generation method, device, medium and electronic equipment |
CN113038055B (en) * | 2021-01-27 | 2023-06-23 | 维沃移动通信有限公司 | Image processing method and device and electronic equipment |
CN113038055A (en) * | 2021-01-27 | 2021-06-25 | 维沃移动通信有限公司 | Image processing method and device and electronic equipment |
WO2022166245A1 (en) * | 2021-02-08 | 2022-08-11 | 南京邮电大学 | Super-resolution reconstruction method for video frame |
US11995796B2 (en) * | 2021-02-08 | 2024-05-28 | Nanjing University Of Posts And Telecommunications | Method of reconstruction of super-resolution of video frame |
US20220261959A1 (en) * | 2021-02-08 | 2022-08-18 | Nanjing University Of Posts And Telecommunications | Method of reconstruction of super-resolution of video frame |
CN113033616B (en) * | 2021-03-02 | 2022-12-02 | 北京大学 | High-quality video reconstruction method, device, equipment and storage medium |
CN113033616A (en) * | 2021-03-02 | 2021-06-25 | 北京大学 | High-quality video reconstruction method, device, equipment and storage medium |
CN113205456A (en) * | 2021-04-30 | 2021-08-03 | 东北大学 | Super-resolution reconstruction method for real-time video session service |
CN113205456B (en) * | 2021-04-30 | 2023-09-22 | 东北大学 | Super-resolution reconstruction method for real-time video session service |
CN113066013A (en) * | 2021-05-18 | 2021-07-02 | 广东奥普特科技股份有限公司 | Method, system, device and storage medium for generating visual image enhancement |
WO2022242029A1 (en) * | 2021-05-18 | 2022-11-24 | 广东奥普特科技股份有限公司 | Generation method, system and apparatus capable of visual resolution enhancement, and storage medium |
CN113139907B (en) * | 2021-05-18 | 2023-02-14 | 广东奥普特科技股份有限公司 | Generation method, system, device and storage medium for visual resolution enhancement |
CN113139907A (en) * | 2021-05-18 | 2021-07-20 | 广东奥普特科技股份有限公司 | Generation method, system, device and storage medium for visual resolution enhancement |
WO2022241995A1 (en) * | 2021-05-18 | 2022-11-24 | 广东奥普特科技股份有限公司 | Visual image enhancement generation method and system, device, and storage medium |
CN113066014A (en) * | 2021-05-19 | 2021-07-02 | 云南电网有限责任公司电力科学研究院 | Image super-resolution method and device |
CN113066014B (en) * | 2021-05-19 | 2022-09-02 | 云南电网有限责任公司电力科学研究院 | Image super-resolution method and device |
CN113487481A (en) * | 2021-07-02 | 2021-10-08 | 河北工业大学 | Circular video super-resolution method based on information construction and multi-density residual block |
CN113610706A (en) * | 2021-07-19 | 2021-11-05 | 河南大学 | Fuzzy monitoring image super-resolution reconstruction method based on convolutional neural network |
CN113724136A (en) * | 2021-09-06 | 2021-11-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Video restoration method, device and medium |
CN113902620A (en) * | 2021-10-25 | 2022-01-07 | 浙江大学 | Video super-resolution system and method based on deformable convolution network |
CN113902623A (en) * | 2021-11-22 | 2022-01-07 | 天津大学 | Method for super-resolution of arbitrary-magnification video by introducing scale information |
CN114429602A (en) * | 2022-01-04 | 2022-05-03 | 北京三快在线科技有限公司 | Semantic segmentation method and device, electronic equipment and storage medium |
CN114862688A (en) * | 2022-03-14 | 2022-08-05 | 杭州群核信息技术有限公司 | Video frame insertion method, device and system based on deep learning |
CN114862688B (en) * | 2022-03-14 | 2024-08-16 | 杭州群核信息技术有限公司 | Video frame inserting method, device and system based on deep learning |
WO2023185284A1 (en) * | 2022-03-31 | 2023-10-05 | 网银在线(北京)科技有限公司 | Video processing method and apparatuses |
CN114494023A (en) * | 2022-04-06 | 2022-05-13 | 电子科技大学 | Video super-resolution implementation method based on motion compensation and sparse enhancement |
CN115115516A (en) * | 2022-06-27 | 2022-09-27 | 天津大学 | Real-world video super-resolution algorithm based on Raw domain |
CN115396710A (en) * | 2022-08-09 | 2022-11-25 | 深圳乐播科技有限公司 | Method for H5 or small program to project short video and related device |
CN115035230A (en) * | 2022-08-12 | 2022-09-09 | 阿里巴巴(中国)有限公司 | Video rendering processing method, device and equipment and storage medium |
CN116128735A (en) * | 2023-04-17 | 2023-05-16 | 中国工程物理研究院电子工程研究所 | Multispectral image demosaicing structure and method based on densely connected residual error network |
CN116797462A (en) * | 2023-08-18 | 2023-09-22 | 深圳市优森美科技开发有限公司 | Real-time video super-resolution reconstruction method based on deep learning |
CN116797462B (en) * | 2023-08-18 | 2023-10-24 | 深圳市优森美科技开发有限公司 | Real-time video super-resolution reconstruction method based on deep learning |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111583112A (en) | Method, system, device and storage medium for video super-resolution | |
CN111311490B (en) | Video super-resolution reconstruction method based on multi-frame fusion optical flow | |
CN111028150B (en) | Rapid space-time residual attention video super-resolution reconstruction method | |
CN115222601A (en) | Image super-resolution reconstruction model and method based on residual mixed attention network | |
US20190124346A1 (en) | Real time end-to-end learning system for a high frame rate video compressive sensing network | |
CN114677304B (en) | Image deblurring algorithm based on knowledge distillation and deep neural network | |
CN110610467B (en) | Multi-frame video compression noise removing method based on deep learning | |
CN112699844A (en) | Image super-resolution method based on multi-scale residual error level dense connection network | |
CN113747242A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
CN115689917A (en) | Efficient space-time super-resolution video compression restoration method based on deep learning | |
CN114757828A (en) | Transformer-based video space-time super-resolution method | |
CN113850718A (en) | Video synchronization space-time super-resolution method based on inter-frame feature alignment | |
CN114926336A (en) | Video super-resolution reconstruction method and device, computer equipment and storage medium | |
CN114372918A (en) | Super-resolution image reconstruction method and system based on pixel level attention mechanism | |
CN111860363A (en) | Video image processing method and device, electronic equipment and storage medium | |
Yue et al. | A global appearance and local coding distortion based fusion framework for CNN based filtering in video coding | |
CN116883265A (en) | Image deblurring method based on enhanced feature fusion mechanism | |
Chandramouli et al. | A generative model for generic light field reconstruction | |
CN111833245A (en) | Super-resolution reconstruction method based on multi-scene video frame supplementing algorithm | |
CN113393382B (en) | Binocular picture super-resolution reconstruction method based on multi-dimensional parallax prior | |
Hu et al. | Store and fetch immediately: Everything is all you need for space-time video super-resolution | |
CN115797178B (en) | Video super-resolution method based on 3D convolution | |
Pang et al. | Video super-resolution using a hierarchical recurrent multireceptive-field integration network | |
CN115065796A (en) | Method and device for generating video intermediate frame | |
Wang et al. | Bi-RSTU: Bidirectional recurrent upsampling network for space-time video super-resolution |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200825 |