CN111583112A - Method, system, device and storage medium for video super-resolution

Method, system, device and storage medium for video super-resolution

Info

Publication number
CN111583112A
Authority
CN
China
Prior art keywords
resolution
frame
resolution video
video
video frame
Prior art date
Legal status
Pending
Application number
CN202010353851.XA
Other languages
Chinese (zh)
Inventor
王�华
金龙存
彭新一
刘闯闯
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010353851.XA priority Critical patent/CN111583112A/en
Publication of CN111583112A publication Critical patent/CN111583112A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence


Abstract

The invention discloses a video super-resolution method, system, apparatus and storage medium. The method comprises: acquiring low-resolution video frames to be processed; processing the low-resolution video frames through a video super-resolution model; and outputting a high-resolution video. The model is trained by collecting training samples, each containing a high-resolution video frame sample and a low-resolution video frame sample, and establishing the video super-resolution model from the collected training samples based on a preset loss function and the high-resolution video frame samples. Through the selected video super-resolution model, the method performs motion compensation and feature enhancement between low-resolution video frames and restores their high-frequency information, so that the output high-resolution video contains more image detail and higher definition, while avoiding the interference that optical flow errors cause in optical-flow-based video super-resolution methods when restoring the final video frames. The method can be widely applied in the technical field of image processing.

Description

Method, system, device and storage medium for video super-resolution
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, a system, an apparatus, and a storage medium for video super-resolution.
Background
In recent years, with growing demand for image and video quality, improving that quality has become an increasingly important problem. Video super-resolution aims to restore a low-resolution video so that it contains more detail information and its definition improves. The technology has important practical significance. In video surveillance, for example, the camera resolution is limited or the camera is too far from the target, so the captured surveillance video suffers from low resolution and indistinguishable targets, making it difficult to mine the required information from the video. Video super-resolution can restore such video to a certain extent and improve the quality of the surveillance footage. In video entertainment, with the rapid development of high-resolution display devices, matching ultra-high-resolution film sources are in short supply, and network transmission of ultra-high-resolution video is also difficult. Video super-resolution can make up for the missing film sources and improve the viewing experience; moreover, low-resolution video can be restored by super-resolution after transmission is complete, greatly saving transmission cost and improving transmission efficiency.
Current video super-resolution methods fall into two major categories: super-resolution based on single-frame images and super-resolution based on multi-frame images. Completing the video super-resolution task with a single-frame method ignores the motion correlation between video frames and cannot exploit the temporal information in multiple frames to obtain a higher-fidelity result, so it is a suboptimal option. As an extension of single-frame super-resolution algorithms, multi-frame methods can better utilize complementary information between frames and improve the quality of the super-resolution result.
In recent years, with the development of deep learning and convolutional neural networks, video super-resolution based on multi-frame images has made great breakthroughs. However, maintaining high-precision super-resolution under complex or large-scale motion remains difficult, and algorithm performance still needs improvement. Many current convolutional-neural-network video super-resolution algorithms estimate motion between video frames through optical flow and perform motion compensation explicitly, so as to extract valuable information from the aligned frames. Because an additional optical flow estimation network is introduced, an end-to-end architecture cannot be realized, and optical flow errors interfere with the recovery of the final video frame, so an optimal super-resolution result cannot be generated. A more accurate and efficient video super-resolution method is therefore needed to further improve the recovery capability of video super-resolution networks, so that they can cope with video super-resolution tasks in various complex scenes.
Disclosure of Invention
In order to solve the above technical problems, it is an object of the present invention to provide a method, system, apparatus and storage medium for generating video super-resolution.
The first technical scheme adopted by the invention is as follows:
the method for generating the video super-resolution comprises the following steps:
acquiring a low-resolution video frame to be processed;
processing the low-resolution video frame through a video super-resolution model, and outputting a high-resolution video;
the video super-resolution model training process comprises the following steps:
acquiring training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
and establishing a video super-resolution model based on a preset loss function and the high-resolution video frame sample according to the acquired training sample.
Optionally, the step of acquiring a training sample, where the training sample includes a high resolution video frame sample and a low resolution video frame sample, specifically includes the following steps:
collecting high-resolution video samples, obtaining high-resolution video frame samples by adopting a threshold-based shot segmentation algorithm, and backing up the high-resolution video frame samples;
adopting an image scaling algorithm to carry out down-sampling on the high-resolution video frame sample to generate a low-resolution video frame sample;
and acquiring a high-resolution video frame sample and a low-resolution video frame sample to establish a training sample.
Optionally, the step of establishing a video super-resolution model based on the preset loss function and the high-resolution video frame sample according to the acquired training sample specifically includes the following steps:
acquiring a set number of low-resolution video frame samples, and setting a reference frame and an adjacent frame;
extracting features from the reference frame and the adjacent frames based on a residual network, and generating reference frame features and adjacent frame features;
aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent frame features;
establishing the correlation degree between the adjacent frames and the reference frame by adopting a preset function and a relation matrix, performing a first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information;
transmitting the feature data fused with high-frequency information into the reference frame features by adopting residual dense connections, and reconstructing a high-resolution video frame;
and converging the reconstructed high-resolution video frame toward the backed-up high-resolution video frame sample through back-propagation based on a preset loss function, and establishing a video super-resolution model.
Optionally, the deformable convolution network is provided with 5 deformable convolution layers, a multi-level feature fusion structure formed by 8 dilated convolutions, and 2 convolution kernels, and the step of aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent frame features specifically includes the following steps:
before input to each deformable convolution layer, performing a second concatenation of the adjacent frame features and the reference frame features in the channel dimension;
after the concatenated adjacent frame features are compressed by each convolution kernel and superposed by the dilated convolutions, outputting convolution kernel offsets and adjustment coefficients;
and each deformable convolution layer adaptively samples the adjacent frame features according to the convolution kernel offsets and adjustment coefficients, and outputs motion-compensated adjacent frame features.
Optionally, the step of establishing the correlation degree between the adjacent frames and the reference frame by adopting a preset function and a relation matrix, performing a first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information specifically includes the following steps:
determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by using the relation matrix;
determining the correlation degree between the adjacent frame and the reference frame by adopting a preset function according to the mapping relation;
aligning the regions where the adjacent frame features remain unaligned in a skip-connection manner according to the correlation degree;
and performing the first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information.
Optionally, the step of transmitting the feature data fused with high-frequency information into the reference frame features by adopting residual dense connections and reconstructing a high-resolution video frame specifically includes the following steps:
transmitting the reference frame features via a global skip connection into the feature data fused with high-frequency information through dense and residual connections;
and rearranging the pixels of the reference frame features from the channel dimension to the spatial dimension with a preset sub-pixel upsampling layer and a convolution kernel to establish a high-resolution video frame.
Optionally, the step of processing the low-resolution video frame through the video super-resolution model and outputting the high-resolution video specifically includes the following steps:
inputting the low-resolution video frame into a video super-resolution model in a sliding window mode, and outputting a high-resolution video frame;
searching adjacent frames nearest to the video frame of the starting end or the tail end of the video frame sequence, complementing the number of the adjacent frames, and outputting a high-resolution starting end or tail end video frame;
and recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence to output the high-resolution video.
The second technical scheme adopted by the invention is as follows:
a video super-resolution generation system, comprising:
the acquisition module is used for acquiring a low-resolution video frame to be processed;
the output module is used for processing the low-resolution video frame through a video super-resolution model and outputting a high-resolution video;
the training module comprises:
the sampling submodule is used for acquiring a training sample, and the training sample contains a high-resolution video frame sample and a low-resolution video frame sample;
and the model establishing submodule is used for establishing a video super-resolution model based on a preset loss function and a high-resolution video frame sample according to the collected training sample.
Optionally, the acquisition sub-module comprises:
the acquisition unit is used for acquiring a high-resolution video sample, and obtaining and backing up the high-resolution video frame sample by adopting a threshold lens segmentation algorithm;
the sampling unit is used for carrying out downsampling on the high-resolution video frame sample by adopting an image scaling algorithm to generate a low-resolution video frame sample;
and the sample establishing unit is used for acquiring the high-resolution video frame sample and the low-resolution video frame sample to establish a training sample.
Optionally, the model building submodule includes:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a set number of low-resolution video frame samples and setting a reference frame and an adjacent frame;
the generating unit is used for extracting features from the reference frame and the adjacent frames based on a residual network and generating reference frame features and adjacent frame features;
the alignment unit is used for aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent frame features;
the first output unit is used for establishing the correlation degree between the adjacent frames and the reference frame by adopting a preset function and a relation matrix, performing a first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information;
the reconstruction unit is used for transmitting the feature data fused with high-frequency information into the reference frame features by adopting residual dense connections and reconstructing a high-resolution video frame;
and the model establishing unit is used for converging the reconstructed high-resolution video frame toward the backed-up high-resolution video frame sample through back-propagation based on a preset loss function and establishing a video super-resolution model.
Optionally, the deformable convolution network is provided with 5 deformable convolution layers, a multi-level feature fusion structure formed by 8 dilated convolutions, and 2 convolution kernels, and the alignment unit includes:
a second concatenation subunit, configured to perform a second concatenation of the adjacent frame features and the reference frame features in the channel dimension before input to each deformable convolution layer;
the first output subunit is used for outputting convolution kernel offsets and adjustment coefficients after the concatenated adjacent frame features are compressed by each convolution kernel and superposed by the dilated convolutions;
and the second output subunit is used for adaptively sampling the adjacent frame features in each deformable convolution layer according to the convolution kernel offsets and adjustment coefficients and outputting motion-compensated adjacent frame features.
Optionally, the first output unit includes:
the first determining subunit is used for determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by adopting the relation matrix;
the second determining subunit is configured to determine, according to the mapping relationship, a correlation degree between the adjacent frame and the reference frame by using a preset function;
the alignment subunit is used for aligning the regions where the adjacent frame features remain unaligned in a skip-connection manner according to the correlation degree;
and the third output subunit is used for performing the first concatenation of the aligned adjacent frame features and the reference frame features and outputting feature data fused with high-frequency information.
Optionally, the reconstruction unit comprises:
the access subunit is used for transmitting the reference frame features via a global skip connection into the feature data fused with high-frequency information through dense and residual connections;
and the rearrangement subunit is used for rearranging the pixels of the reference frame features from the channel dimension to the spatial dimension with a preset sub-pixel upsampling layer and a convolution kernel to establish a high-resolution video frame.
Optionally, the output module includes:
the second output unit is used for inputting the low-resolution video frame into the video super-resolution model in a sliding window mode and outputting the high-resolution video frame;
a third output unit, configured to search for an adjacent frame that is closest to a start end or a tail end video frame of the sequence of video frames, complement the number of the adjacent frames, and output a high resolution start end or tail end video frame;
and the fourth output unit is used for recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence and outputting the high-resolution video.
The third technical scheme adopted by the invention is as follows:
an apparatus comprising a processor and a memory, wherein the memory is configured to store at least one program, and the processor is configured to load the at least one program to perform the method described above.
The fourth technical scheme adopted by the invention is as follows:
a storage medium having stored therein a processor-executable program for performing the method as described above when executed by a processor.
The invention has the beneficial effects that: by processing the acquired low-resolution video frames with a video super-resolution model established from training samples containing high-resolution and low-resolution video frame samples together with a preset loss function, low-resolution video frames are restored to high-resolution video frames accurately and efficiently, and the interference that optical flow errors cause in optical-flow-based video super-resolution methods when recovering the final video frames is avoided.
Drawings
FIG. 1 is a flow chart illustrating steps of a method for generating super-resolution video provided by the present invention;
FIG. 2 is a block diagram of a system for generating super-resolution video provided by the present invention;
FIG. 3 is a schematic flow chart of a video super-resolution model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating the operation of the deformable convolution layer in a deforming operation in accordance with an embodiment of the present invention;
FIG. 5 is a diagram illustrating a multi-level feature fusion structure in a deformable convolutional network according to an embodiment of the present invention;
FIG. 6 is a structural diagram illustrating the correlation between adjacent frames and reference frames according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the residual dense connections in the reconstruction operation according to an embodiment of the present invention;
FIG. 8 is a graph comparing visualization results of the existing optimal scheme and the scheme of the present application on the Vid4 data set;
FIG. 9 is a comparison graph of the visualization results of the prior optimal solution and the solution of the present application using the SPMCS data set;
FIG. 10 is a graph comparing visualization results of the prior optimal solution using the Vimeo-90K-T data set with the solution of the present application.
Detailed Description
Example 1
As shown in fig. 1, the present embodiment provides a method for generating a video super-resolution, which includes the following steps:
s1, acquiring a low-resolution video frame to be processed, wherein the video frame comprises a complex motion scene;
s2, processing the low-resolution video frame through a video super-resolution model, and outputting a high-resolution video;
the video super-resolution model training process comprises the following steps:
s3, collecting training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
and S4, establishing a video super-resolution model based on the preset loss function and the high-resolution video frame sample according to the collected training sample.
Optionally, the step S2 includes:
s21, inputting the low-resolution video frame into a video super-resolution model in a sliding window mode, and outputting a high-resolution video frame;
s22, searching the adjacent frame nearest to the video frame at the start end or the tail end of the video frame sequence, complementing the number of the adjacent frames, and outputting the video frame at the start end or the tail end of high resolution;
and S23, recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence, and outputting the high-resolution video.
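As an illustration of steps S21 to S23, the following is a minimal sliding-window inference sketch in PyTorch. It assumes a model that maps a window of 7 low-resolution frames to one high-resolution frame; the boundary handling replicates the nearest available frames, which is one plausible reading of "complementing the number of the adjacent frames".

```python
import torch

def super_resolve_sequence(model, frames, window=7):
    """frames: (T, C, H, W) low-resolution tensor; returns (T, C, 4H, 4W)."""
    half = window // 2
    T = frames.shape[0]
    outputs = []
    for t in range(T):
        # Clamp neighbour indices so start/end frames still get 7 inputs
        # (indices outside the sequence reuse the nearest valid frame).
        idx = [min(max(i, 0), T - 1) for i in range(t - half, t + half + 1)]
        window_frames = frames[idx].unsqueeze(0)  # (1, 7, C, H, W)
        with torch.no_grad():
            sr = model(window_frames)             # assumed model signature
        outputs.append(sr.squeeze(0))
    return torch.stack(outputs, dim=0)            # recombined HR sequence
```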
Optionally, the step S3 includes:
s31, collecting a high-resolution video sample, obtaining the high-resolution video frame sample by adopting a threshold shot segmentation algorithm and backing up the high-resolution video frame sample;
s32, adopting an image scaling algorithm to carry out down-sampling on the high-resolution video frame sample to generate a low-resolution video frame sample;
and S33, acquiring the high-resolution video frame sample and the low-resolution video frame sample to establish a training sample.
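The threshold-based shot segmentation of step S31 can be realized in several ways; the patent does not fix one. Below is a minimal sketch using colour-histogram differences, where the Bhattacharyya threshold of 0.4 is an illustrative assumption.

```python
import cv2

def split_into_shots(video_path, threshold=0.4):
    """Split a video into shots at frames whose histogram differs sharply
    from the previous frame; returns a list of shots (lists of frames)."""
    cap = cv2.VideoCapture(video_path)
    shots, current, prev_hist = [], [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Large Bhattacharyya distance suggests a shot boundary.
            if cv2.compareHist(prev_hist, hist,
                               cv2.HISTCMP_BHATTACHARYYA) > threshold:
                shots.append(current)
                current = []
        current.append(frame)
        prev_hist = hist
    if current:
        shots.append(current)
    cap.release()
    return shots
```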
Optionally, the step S4 includes:
s41, acquiring a set number of low-resolution video frame samples, and setting a reference frame and an adjacent frame;
S42, extracting features from the reference frame and the adjacent frames based on a residual network, and generating reference frame features and adjacent frame features;
S43, aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent frame features;
S44, establishing the correlation degree between the adjacent frames and the reference frame by adopting a preset function and a relation matrix, performing a first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information;
S45, transmitting the feature data fused with high-frequency information into the reference frame features by adopting residual dense connections, and reconstructing a high-resolution video frame;
and S46, converging the reconstructed high-resolution video frame toward the backed-up high-resolution video frame sample through back-propagation based on a preset loss function, and establishing the video super-resolution model.
Optionally, the deformable convolution network is provided with 5 deformable convolution layers, a multi-level feature fusion structure formed by 8 dilated convolutions, and 2 convolution kernels, and the step S43 includes:
S431, before input to each deformable convolution layer, performing a second concatenation of the adjacent frame features and the reference frame features in the channel dimension;
S432, after the concatenated adjacent frame features are compressed by each convolution kernel and superposed by the dilated convolutions, outputting convolution kernel offsets and adjustment coefficients;
and S433, each deformable convolution layer adaptively samples the adjacent frame features according to the convolution kernel offsets and adjustment coefficients, and outputs motion-compensated adjacent frame features.
Optionally, the step S44 includes:
s441, determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by using the relation matrix;
s442, determining the correlation degree between the adjacent frame and the reference frame by adopting a preset function according to the mapping relation;
S443, aligning the regions where the adjacent frame features remain unaligned in a skip-connection manner according to the correlation degree;
and S444, performing the first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information.
Optionally, the step S45 includes:
S451, transmitting the reference frame features via a global skip connection into the feature data fused with high-frequency information through dense and residual connections;
and S452, rearranging the pixels of the reference frame features from the channel dimension to the spatial dimension with a preset sub-pixel upsampling layer and a convolution kernel to establish a high-resolution video frame.
Example 2
As shown in fig. 2, the present embodiment provides a system for generating a video super-resolution, the system including:
the acquisition module is used for acquiring a low-resolution video frame to be processed;
the output module is used for processing the low-resolution video frame through a video super-resolution model and outputting a high-resolution video;
the training module comprises:
the sampling submodule is used for acquiring a training sample, and the training sample contains a high-resolution video frame sample and a low-resolution video frame sample;
and the model establishing submodule is used for establishing a video super-resolution model based on a preset loss function and a high-resolution video frame sample according to the collected training sample.
Optionally, the acquisition sub-module comprises:
the acquisition unit is used for acquiring high-resolution video samples, and obtaining and backing up the high-resolution video frame samples by adopting a threshold-based shot segmentation algorithm;
the sampling unit is used for carrying out downsampling on the high-resolution video frame sample by adopting an image scaling algorithm to generate a low-resolution video frame sample;
and the sample establishing unit is used for acquiring the high-resolution video frame sample and the low-resolution video frame sample to establish a training sample.
Optionally, the model building submodule includes:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a set number of low-resolution video frame samples and setting a reference frame and an adjacent frame;
the generating unit is used for extracting features from the reference frame and the adjacent frames based on a residual network and generating reference frame features and adjacent frame features;
the alignment unit is used for aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent frame features;
the first output unit is used for establishing the correlation degree between the adjacent frames and the reference frame by adopting a preset function and a relation matrix, performing a first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information;
the reconstruction unit is used for transmitting the feature data fused with high-frequency information into the reference frame features by adopting residual dense connections and reconstructing a high-resolution video frame;
and the model establishing unit is used for converging the reconstructed high-resolution video frame toward the backed-up high-resolution video frame sample through back-propagation based on a preset loss function and establishing the video super-resolution model.
Optionally, the deformable convolution network is provided with 5 deformable convolution layers, a multi-level feature fusion structure formed by 8 dilated convolutions, and 2 convolution kernels, and the alignment unit includes:
a second concatenation subunit, configured to perform a second concatenation of the adjacent frame features and the reference frame features in the channel dimension before input to each deformable convolution layer;
the first output subunit is used for outputting convolution kernel offsets and adjustment coefficients after the concatenated adjacent frame features are compressed by each convolution kernel and superposed by the dilated convolutions;
and the second output subunit is used for adaptively sampling the adjacent frame features in each deformable convolution layer according to the convolution kernel offsets and adjustment coefficients and outputting motion-compensated adjacent frame features.
Optionally, the first output unit includes:
the first determining subunit is used for determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by adopting the relation matrix;
the second determining subunit is configured to determine, according to the mapping relationship, a correlation degree between the adjacent frame and the reference frame by using a preset function;
the alignment subunit is used for aligning the regions where the adjacent frame features remain unaligned in a skip-connection manner according to the correlation degree;
and the third output subunit is used for performing the first concatenation of the aligned adjacent frame features and the reference frame features and outputting feature data fused with high-frequency information.
Optionally, the reconstruction unit comprises:
the access subunit is used for transmitting the reference frame features via a global skip connection into the feature data fused with high-frequency information through dense and residual connections;
and the rearrangement subunit is used for rearranging the spatial dimension of the pixels of the reference frame by adopting a preset sub-pixel sampling layer and a convolution kernel to establish a high-resolution video frame.
Optionally, the output module includes:
the second output unit is used for inputting the low-resolution video frame into the video super-resolution model in a sliding window mode and outputting the high-resolution video frame;
a third output unit, configured to search for an adjacent frame that is closest to a start end or a tail end video frame of the sequence of video frames, complement the number of the adjacent frames, and output a high resolution start end or tail end video frame;
and the fourth output unit is used for recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence and outputting the high-resolution video.
Example 3
The present embodiments provide an apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, the at least one program causes the at least one processor to implement the steps of a method for generating video super-resolution as described in embodiment 1 above.
Example 4
A storage medium having stored therein a program executable by a processor, the program being executed by the processor for performing the steps of a method for generating video super-resolution as described in embodiment 1.
Example 5
Referring to fig. 3 to 10, a flow chart of a method for generating video super-resolution specifically includes the following steps:
A. acquiring training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
B. establishing a video super-resolution model according to the collected training samples;
C. acquiring a low-resolution video frame to be processed;
D. and processing the low-resolution video frame to be processed through the video super-resolution model, and outputting a high-resolution video.
Wherein, the specific implementation scheme of the step A is as follows:
a1, acquiring a public large-scale video data set Vimeo-90K as a training data set. The data set comprises a plurality of video frames with different motion scale ranges, so that the trained video resolution model has better generalization capability. The data set consisted of 64612 training samples, each sample containing 7 consecutive video frames of the same scene, of size 448 × 256.
A2, backing up the high-resolution video frame samples, then performing 4x bicubic down-sampling using an image scaling algorithm such as MATLAB's imresize function to obtain the corresponding low-resolution video frame samples of size 112 x 64; each backed-up high-resolution sample and its down-sampled low-resolution counterpart form a training pair. Horizontal or vertical flipping, 90-degree rotation, and random cropping of image blocks are adopted for data augmentation.
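A minimal sketch of this training-pair construction, using PIL's bicubic filter as a stand-in for MATLAB's imresize (the two differ slightly in anti-aliasing), with the flip/rotation augmentations applied jointly to all frames of a sample:

```python
import random
from PIL import Image

def make_lr(hr_image, scale=4):
    """4x bicubic down-sampling: a 448x256 HR frame becomes 112x64."""
    w, h = hr_image.size
    return hr_image.resize((w // scale, h // scale), Image.BICUBIC)

def augment(frames):
    """frames: list of PIL images of one sample, transformed identically."""
    if random.random() < 0.5:
        frames = [f.transpose(Image.FLIP_LEFT_RIGHT) for f in frames]
    if random.random() < 0.5:
        frames = [f.transpose(Image.FLIP_TOP_BOTTOM) for f in frames]
    if random.random() < 0.5:
        frames = [f.transpose(Image.ROTATE_90) for f in frames]
    return frames  # random 50x50 cropping is applied later, in step B1
```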
The specific embodiment of the step B is as follows:
B1, 7 consecutive low-resolution video frame samples are selected, and 50 x 50 image blocks are randomly cropped at the same position as input. The middle frame serves as the reference frame to be restored, denoted I_t^LR, and the other frames serve as adjacent frames that aid recovery, denoted I_i^LR, where i ∈ [t-3, t+3] and i ≠ t.
B2, a residual network is used to perform shallow feature extraction on the 7 low-resolution video frame samples {I_i^LR} composed of the reference frame I_t^LR and the adjacent frames I_i^LR.

It can be understood that the features produced by the feature extraction module H_fea contain 64 channels and have the same spatial size as the input frames; the residual network consists of 5 cascaded residual blocks, each containing two 3 x 3 convolutional layers, a ReLU activation function, and a skip connection. This step can be expressed as:

F_T = H_fea({I_i^LR})
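A minimal PyTorch sketch of the feature extraction module H_fea described above: 5 cascaded residual blocks, each with two 3 x 3 convolutions, a ReLU and a skip connection, producing 64-channel features at input resolution. The initial head convolution mapping RGB to 64 channels is an assumption, since the patent does not state how the first 64 channels are formed.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Two 3x3 convolutions with ReLU in between, plus a skip connection.
        return x + self.conv2(self.relu(self.conv1(x)))

class FeatureExtractor(nn.Module):
    """H_fea: shallow 64-channel features, same spatial size as the input."""
    def __init__(self, in_channels=3, channels=64, num_blocks=5):
        super().__init__()
        self.head = nn.Conv2d(in_channels, channels, 3, padding=1)  # assumed
        self.body = nn.Sequential(*[ResidualBlock(channels)
                                    for _ in range(num_blocks)])

    def forward(self, x):  # x: (B, 3, H, W) low-resolution frame
        return self.body(self.head(x))
```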
B3, among the features F_T of all video frames, the reference frame feature is denoted F_t and the adjacent frame features are denoted F_i. A deformable convolution network is used to align each adjacent frame feature F_i to the reference frame feature F_t; the aligned adjacent frame feature is denoted F_i^a.
The part composed of the deformable convolution network can be understood as the alignment module H_align. Referring to fig. 4 and 5, the alignment module comprises 5 cascaded deformable convolution layers, 2 3 x 3 convolutional layers, and a multi-level feature fusion structure formed by 8 3 x 3 dilated convolutions with dilation rates of 1 to 8 respectively. The reference frame feature F_t and the adjacent frame feature F_i are first concatenated in the channel dimension, and the number of channels is compressed back to 64 through a 3 x 3 convolutional layer; the 8 3 x 3 dilated convolutions are then used to effectively expand the receptive field, each outputting 32 feature channels. The dilated convolution results are accumulated one by one, yielding superposed results over 8 receptive fields from small to large; after concatenation, a 1 x 1 convolution compresses the number of channels to 64, and a 3 x 3 convolutional layer then generates the two parameters needed by the deformable convolution kernel: the convolution kernel offsets ΔP_i and the adjustment coefficients ΔM_i.
This process can be expressed as:
ΔP_i, ΔM_i = f([F_i, F_t])
The feature fusion effectively enlarges the receptive field through the spatial pyramid structure formed by the dilated convolutions; superposing convolution results with different dilation rates makes the captured information richer, which greatly helps capture the pixel-level motion relation between the adjacent frame features and the reference frame features and generate more accurate deformable convolution parameters. After the deformable convolution layer obtains the convolution kernel offsets ΔP_i and adjustment coefficients ΔM_i, it can adaptively sample the adjacent frame feature F_i, realizing implicit motion compensation.
Taking F_{i,b-1} and F_{i,b} as the input and output of one of the deformable convolution layers, the deformable convolution operation can be expressed as follows:

F_{i,b}(p) = Σ_{k=1}^{K} ω_k · F_{i,b-1}(p + p_k + Δp_{i,k}) · Δm_{i,k}

where p_k denotes the k-th of the K sampling locations of the convolution kernel and ω_k the corresponding kernel weight; for a 3 x 3 convolution kernel, K = 9 and p_k ∈ {(-1, -1), (-1, 0), …, (1, 1)}. The deformable convolution adds the kernel offset Δp_{i,k}, so that the sampling positions adjust according to each center point p, and the adjustment coefficient Δm_{i,k}, so that the corresponding kernel weights can also change dynamically, where ΔP_i = {Δp_{i,k}} and ΔM_i = {Δm_{i,k}}. The whole sampling process is adaptive and accepts end-to-end training, achieving an excellent motion compensation effect.
After passing through the 5 deformable convolution layers, the adjacent frame feature F_i has undergone a coarse-to-fine alignment process in which the alignment precision improves progressively. The alignment module is formulated as:

F_i^a = H_align(F_i, F_t)
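A minimal sketch of one such modulated deformable convolution layer using torchvision's deform_conv2d. For brevity the offset/mask predictor is a single 3 x 3 convolution over the concatenated features rather than the full dilated-convolution pyramid of fig. 5, so this is a simplified stand-in for the structure described above. Cascading five such layers, each fed the previous output together with F_t, reproduces the coarse-to-fine alignment F_i^a = H_align(F_i, F_t).

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableAlignLayer(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        k = kernel_size
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        # Predicts 2*k*k offsets (ΔP_i) and k*k modulation scalars (ΔM_i).
        self.pred = nn.Conv2d(2 * channels, 3 * k * k, 3, padding=1)
        self.k = k

    def forward(self, f_nbr, f_ref):
        out = self.pred(torch.cat([f_nbr, f_ref], dim=1))  # channel concat
        o1, o2, mask = torch.chunk(out, 3, dim=1)
        offset = torch.cat([o1, o2], dim=1)   # ΔP_i: per-sample offsets
        mask = torch.sigmoid(mask)            # ΔM_i: adjustment coefficients
        return deform_conv2d(f_nbr, offset, self.weight,
                             padding=self.k // 2, mask=mask)
```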
B4, the adjacent frame features and the reference frame features are processed with separate 1 x 1 convolutions and, after dimensional transformation, multiplied as matrices; a softmax function then yields the correlation degree between the adjacent frame and the reference frame, i.e. the correlation of each pixel in the adjacent frame features with all pixels in the reference frame features. The aligned adjacent frame features and the reference frame features then undergo the first concatenation, outputting feature data fused with high-frequency information.
This part can be understood as the attention module H_nl, which re-emphasizes the regions of the adjacent frames that the alignment module failed to align in step B3. The attention module is designed on a non-local mechanism, and its computation can be expressed as:

x′_p = W_z softmax((W_u x_p)^T W_v y_q)(W_g y_q) + x_p
where x_p and y_q denote one pixel of the input adjacent frame feature F_i^a and of the reference frame feature F_t respectively, x′_p denotes the corresponding output pixel on the adjacent frame feature, W_u x_p, W_v y_q and W_g y_q denote the data obtained by transforming the input adjacent frame features and reference frame features through three 1 x 1 convolutions, and W_z denotes a further 1 x 1 convolution applied to the feature data obtained from the correlation computation.

The output of the attention module is denoted F_i^nl, and the process can be expressed as:

F_i^nl = H_nl(F_i^a, F_t)
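A minimal PyTorch sketch of the attention module H_nl defined by the formula above: 1 x 1 convolutions embed the aligned adjacent feature (x) and the reference feature (y), a softmax over all reference positions forms the relation matrix, and the re-weighted reference values are added back residually. The 32-channel embedding width is an assumption.

```python
import torch
import torch.nn as nn

class NonLocalAttention(nn.Module):
    def __init__(self, channels=64, inter_channels=32):
        super().__init__()
        self.w_u = nn.Conv2d(channels, inter_channels, 1)  # embeds x_p
        self.w_v = nn.Conv2d(channels, inter_channels, 1)  # embeds y_q
        self.w_g = nn.Conv2d(channels, inter_channels, 1)  # value transform
        self.w_z = nn.Conv2d(inter_channels, channels, 1)  # output transform

    def forward(self, x, y):
        b, _, h, w = x.shape
        q = self.w_u(x).flatten(2).transpose(1, 2)  # (B, HW, C')
        k = self.w_v(y).flatten(2)                  # (B, C', HW)
        v = self.w_g(y).flatten(2).transpose(1, 2)  # (B, HW, C')
        attn = torch.softmax(q @ k, dim=-1)         # relation matrix (HW x HW)
        out = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return self.w_z(out) + x                    # residual: + x_p
```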
All adjacent frame features F_i^nl and the reference frame feature F_t are concatenated in the channel dimension, the number of channels is compressed with one 3 x 3 convolutional layer, and the feature data F_fusion fused with high-frequency information is output, where F_fusion can be expressed as:

F_fusion = Conv([F_{t-3}^nl, …, F_t, …, F_{t+3}^nl])
B5, using residual dense connections, the feature data F_fusion fused with high-frequency information is merged with the reference frame feature F_t to reconstruct the high-resolution video frame I_t^SR.
The residual dense connections in this part can be understood as the reconstruction module. The reconstruction module (the residual-dense-connection part) comprises 23 cascaded residual dense blocks H_RRDBs and a global skip connection. As shown in fig. 7, each residual dense block H_RRDBs is built from dense connection blocks, each consisting of 5 convolutional layers with the channel growth rate inside each dense block set to 32; the output of every convolutional layer is passed to the subsequent convolutional layers in the dense block as additional input through multiple skip connections. The residual dense block combines the advantages of dense connections and residual connections, and effectively extracts the high-frequency information contained in the features by exploiting multi-level features. The global skip connection feeds the reference frame feature F_t into the output of the residual dense blocks. At the end of the network, a 3 x 3 convolutional layer expands the number of channels to 64 x 16, a sub-pixel upsampling layer rearranges pixels from the channel dimension to the spatial dimension to obtain a 4x enlarged feature with 64 channels, and a final 3 x 3 convolutional layer outputs the 3-channel high-resolution reference frame I_t^SR. This operation is denoted H_rec. The reconstruction process can be expressed as follows:

I_t^SR = H_rec(H_RRDBs(F_fusion) + F_t)
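A minimal sketch of the upsampling tail of H_rec described above: the global skip adds the reference frame feature F_t, a 3 x 3 convolution expands 64 channels to 64 x 16, PixelShuffle(4) rearranges channels into a 4x larger spatial grid, and a final 3 x 3 convolution emits the 3-channel HR frame. The 23 preceding RRDBs are omitted for brevity.

```python
import torch
import torch.nn as nn

class ReconstructionTail(nn.Module):
    def __init__(self, channels=64, scale=4, out_channels=3):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # channel dim -> spatial dims
        self.to_rgb = nn.Conv2d(channels, out_channels, 3, padding=1)

    def forward(self, rrdb_out, f_ref):
        x = rrdb_out + f_ref                   # global skip connection (F_t)
        # 64 -> 1024 channels, pixel shuffle to (64, 4H, 4W), then to RGB.
        return self.to_rgb(self.shuffle(self.expand(x)))
```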
B6, using a loss function, the reconstructed high-resolution video frame I_t^SR is converged toward the backed-up high-resolution video frame sample I_t^HR through back-propagation, establishing the video super-resolution model.
The loss function L_1 is formulated as:

L_1 = (1 / (W · H · C)) Σ |I_t^SR − I_t^HR|

where W, H and C represent the width, height and number of channels of the high-resolution video frame respectively. A learning rate is set, the gradient is back-propagated by minimizing the loss error, network parameters are updated, and iteration continues until the network converges.
In the convergence training, the batch size is set to 8 and the initial learning rate to 10⁻⁴. During iterative training, according to the convergence of the network, the learning rate is halved for the first time after 70 epochs to accelerate training of the video super-resolution model, and halved again every 20 epochs thereafter. The optimizer parameters are β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸ (values characteristic of the Adam optimizer). Using the L₁ loss function, the error between the high-resolution video frame I_t^SR generated by the model and the original high-resolution video frame I_t^HR is calculated, and the network parameters are updated by back-propagation to minimize this error. The network is trained to convergence over 120 epochs. The loss function L₂ is then used for a further 10 epochs to fine-tune the network parameters and further improve the performance of the model.
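A minimal training-loop sketch matching the schedule above, assuming the Adam optimizer implied by the β₁/β₂/ε values (batch size 8, learning rate 10⁻⁴ halved after epoch 70 and every 20 epochs thereafter, L₁ loss); the dataset is assumed to yield (7-frame LR window, HR reference frame) pairs, and the L₂ fine-tuning stage is omitted.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, dataset, epochs=120, device="cuda"):
    model = model.to(device)
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optim = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)
    criterion = nn.L1Loss()
    for epoch in range(epochs):
        # Halve the learning rate at epoch 70, then every 20 epochs after.
        if epoch == 70 or (epoch > 70 and (epoch - 70) % 20 == 0):
            for group in optim.param_groups:
                group["lr"] *= 0.5
        for lr_frames, hr_frame in loader:
            sr = model(lr_frames.to(device))
            loss = criterion(sr, hr_frame.to(device))  # L1 error
            optim.zero_grad()
            loss.backward()   # back-propagate, minimising the loss
            optim.step()
```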
The scheme of the step C is specifically as follows:
respectively acquiring a Vid4 data set, an SPMCS data set and a Vimeo-90K-T data set, wherein the data sets comprise various videos of large-scale motion scenes and complex motion scenes, and video frames under the same shot are segmented and extracted in advance according to the scenes.
The scheme of the step D is specifically as follows:
The video frames of the Vid4, SPMCS and Vimeo-90K-T data sets to be restored are extracted after shot splitting, with a frame stride of 1, and 7 consecutive low-resolution video frames are input into the trained video super-resolution model at a time. The scheme of step B is applied to the input frames of each data set by the model in a sliding-window manner. For video frames at the start or tail of a frame sequence, the adjacent frames nearest to the start-end or tail-end frame are found to complement the required number of adjacent frames, and the windows are input into the model in the same sliding-window manner. The output high-resolution video frames are recombined according to the video frame sequence to produce the high-resolution video, as shown in figures 8 to 10 and tables 1 to 3. Tables 1 to 3 compare the existing optimal schemes and the scheme of the present application on the Vid4, SPMCS and Vimeo-90K-T data sets in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM); figures 8 to 10 show the corresponding visual comparisons on the Vid4, SPMCS and Vimeo-90K-T data sets respectively.
TABLE 1: PSNR/SSIM comparison between the existing optimal scheme and the scheme of the present application on the Vid4 data set (reproduced as an image in the original; numeric values not recoverable)

TABLE 2: PSNR/SSIM comparison on the SPMCS data set (table image; values not recoverable)

TABLE 3: PSNR/SSIM comparison on the Vimeo-90K-T data set (table image; values not recoverable)
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for generating video super-resolution is characterized by comprising the following steps:
acquiring a low-resolution video frame to be processed;
processing the low-resolution video frame through a video super-resolution model, and outputting a high-resolution video;
the video super-resolution model training process comprises the following steps:
acquiring training samples, wherein the training samples comprise high-resolution video frame samples and low-resolution video frame samples;
and establishing a video super-resolution model based on a preset loss function and the high-resolution video frame sample according to the acquired training sample.
2. The method for generating video super-resolution according to claim 1, wherein the step of collecting training samples, the training samples containing high-resolution video frame samples and low-resolution video frame samples, specifically comprises the following steps:
collecting high-resolution video samples, obtaining high-resolution video frame samples by adopting a threshold-based shot segmentation algorithm, and backing up the high-resolution video frame samples;
adopting an image scaling algorithm to carry out down-sampling on the high-resolution video frame sample to generate a low-resolution video frame sample;
and acquiring a high-resolution video frame sample and a low-resolution video frame sample to establish a training sample.
3. The method for generating video super-resolution according to claim 2, wherein the step of establishing a video super-resolution model based on the preset loss function and the high-resolution video frame sample according to the collected training samples specifically comprises the following steps:
acquiring a set number of low-resolution video frame samples, and setting a reference frame and an adjacent frame;
extracting features from the reference frame and the adjacent frames based on a residual network, and generating reference frame features and adjacent frame features;
aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent frame features;
establishing the correlation degree between the adjacent frames and the reference frame by adopting a preset function and a relation matrix, performing a first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information;
transmitting the feature data fused with high-frequency information into the reference frame features by adopting residual dense connections, and reconstructing a high-resolution video frame;
and converging the reconstructed high-resolution video frame toward the backed-up high-resolution video frame sample through back-propagation based on a preset loss function, and establishing a video super-resolution model.
4. The method for generating video super-resolution according to claim 3, wherein the deformable convolution network has 5 deformable convolution layers, a multi-level feature fusion structure composed of 8 dilated convolutions, and 2 convolution kernels, and the step of aligning the adjacent frames by combining the deformable convolution network, the reference frame features and the adjacent frame features specifically comprises the following steps:
before input to each deformable convolution layer, performing a second concatenation of the adjacent frame features and the reference frame features in the channel dimension;
after the concatenated adjacent frame features are compressed by each convolution kernel and superposed by the dilated convolutions, outputting convolution kernel offsets and adjustment coefficients;
and each deformable convolution layer adaptively samples the adjacent frame features according to the convolution kernel offsets and adjustment coefficients, and outputs motion-compensated adjacent frame features.
5. The method for generating video super-resolution according to claim 4, wherein the step of establishing the correlation between the adjacent frame and the reference frame by using a preset function and a relation matrix, performing the first concatenation on the aligned adjacent frame feature and the reference frame feature, and outputting the feature data fused with the high-frequency information specifically comprises the steps of:
determining the mapping relation between any pixel point in the adjacent frame and all pixel points of the reference frame by using the relation matrix;
determining the correlation degree of the adjacent frame and the reference frame by adopting a preset function according to the mapping relation;
aligning the regions where the adjacent frame features remain unaligned in a skip-connection manner according to the correlation degree;
and performing the first concatenation of the aligned adjacent frame features and the reference frame features, and outputting feature data fused with high-frequency information.
6. The method for generating super-resolution video of claim 5, wherein the step of reconstructing the high-resolution video frame by using residual dense connection to transmit the feature data fused with the high-frequency information into the reference frame feature comprises the following steps:
transmitting the reference frame features via a global skip connection into the feature data fused with high-frequency information through dense and residual connections;
and rearranging the pixels of the reference frame features from the channel dimension to the spatial dimension with a preset sub-pixel upsampling layer and a convolution kernel to establish a high-resolution video frame.
7. The method for generating super-resolution video of claim 1, wherein the step of processing the low-resolution video frame by the super-resolution video model and outputting the high-resolution video comprises the following steps:
inputting the low-resolution video frame into a video super-resolution model in a sliding window mode, and outputting a high-resolution video frame;
searching adjacent frames nearest to the video frame of the starting end or the tail end of the video frame sequence, complementing the number of the adjacent frames, and outputting a high-resolution starting end or tail end video frame;
and recombining the output high-resolution video frame and/or the high-resolution start end or tail end video frame based on the video frame sequence to output the high-resolution video.
8. A system for generating super-resolution video, comprising:
an acquisition module, configured to acquire low-resolution video frames to be processed;
an output module, configured to process the low-resolution video frames through a video super-resolution model and output a high-resolution video;
and a training module, comprising:
a sampling submodule, configured to collect training samples, each training sample containing a high-resolution video frame sample and a low-resolution video frame sample;
and a model establishing submodule, configured to establish the video super-resolution model from the collected training samples, based on a preset loss function and the high-resolution video frame samples.
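For illustration only: one possible training step for the model establishing submodule, assuming PyTorch. The Charbonnier loss stands in for the "preset loss function", which the claim leaves unspecified.

```python
# Minimal training-step sketch (illustrative only).
import torch

def charbonnier(pred, target, eps=1e-6):
    # A smooth L1-style loss often used for super-resolution training.
    return torch.sqrt((pred - target) ** 2 + eps).mean()

def train_step(model, optimizer, lr_window, hr_frame):
    """lr_window: (B, T, C, h, w) low-res samples; hr_frame: (B, C, H, W)."""
    optimizer.zero_grad()
    sr_frame = model(lr_window)             # reconstructed high-res frame
    loss = charbonnier(sr_frame, hr_frame)  # compare with the HR sample
    loss.backward()                         # backpropagate the error
    optimizer.step()
    return loss.item()
```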
9. An apparatus comprising a memory for storing at least one program and a processor for loading the at least one program to perform the method of any one of claims 1-7.
10. A storage medium storing a processor-executable program, wherein the program, when executed by a processor, performs the method of any one of claims 1-7.
CN202010353851.XA 2020-04-29 2020-04-29 Method, system, device and storage medium for video super-resolution Pending CN111583112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010353851.XA CN111583112A (en) 2020-04-29 2020-04-29 Method, system, device and storage medium for video super-resolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010353851.XA CN111583112A (en) 2020-04-29 2020-04-29 Method, system, device and storage medium for video super-resolution

Publications (1)

Publication Number Publication Date
CN111583112A true CN111583112A (en) 2020-08-25

Family

ID=72121524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010353851.XA Pending CN111583112A (en) 2020-04-29 2020-04-29 Method, system, device and storage medium for video super-resolution

Country Status (1)

Country Link
CN (1) CN111583112A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017219263A1 (en) * 2016-06-22 2017-12-28 中国科学院自动化研究所 Image super-resolution enhancement method based on bidirectional recursion convolution neural network
CN109118431A (en) * 2018-09-05 2019-01-01 武汉大学 A kind of video super-resolution method for reconstructing based on more memories and losses by mixture
CN110120011A (en) * 2019-05-07 2019-08-13 电子科技大学 A kind of video super resolution based on convolutional neural networks and mixed-resolution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG HUA ET AL.: "Deformable Non-Local Network for Video Super-Resolution" *

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102166B (en) * 2020-08-26 2023-12-01 上海交通大学 Combined super-resolution, color gamut expansion and inverse tone mapping method and equipment
CN112102166A (en) * 2020-08-26 2020-12-18 上海交通大学 Method and device for combining super-resolution, color gamut expansion and inverse tone mapping
CN113766250A (en) * 2020-09-29 2021-12-07 四川大学 Compressed image quality improving method based on sampling reconstruction and feature enhancement
CN112365403A (en) * 2020-11-20 2021-02-12 山东大学 Video super-resolution recovery method based on deep learning and adjacent frames
CN112365403B (en) * 2020-11-20 2022-12-27 山东大学 Video super-resolution recovery method based on deep learning and adjacent frames
CN112330543A (en) * 2020-12-01 2021-02-05 上海网达软件股份有限公司 Video super-resolution method and system based on self-supervised learning
CN112700392A (en) * 2020-12-01 2021-04-23 华南理工大学 Video super-resolution processing method, device and storage medium
CN112750094A (en) * 2020-12-30 2021-05-04 合肥工业大学 Video processing method and system
CN112750094B (en) * 2020-12-30 2022-12-09 合肥工业大学 Video processing method and system
CN112669216B (en) * 2021-01-05 2022-04-22 华南理工大学 Super-resolution reconstruction network with a new parallel dilated-convolution structure based on federated learning
CN112669216A (en) * 2021-01-05 2021-04-16 华南理工大学 Super-resolution reconstruction network with a new parallel dilated-convolution structure based on federated learning
CN112766340B (en) * 2021-01-11 2024-06-04 中山大学 Deep capsule network image classification method and system based on an adaptive spatial pattern
CN112766340A (en) * 2021-01-11 2021-05-07 中山大学 Deep capsule network image classification method and system based on an adaptive spatial pattern
CN112785667A (en) * 2021-01-25 2021-05-11 北京有竹居网络技术有限公司 Video generation method, device, medium and electronic equipment
CN113038055B (en) * 2021-01-27 2023-06-23 维沃移动通信有限公司 Image processing method and device and electronic equipment
CN113038055A (en) * 2021-01-27 2021-06-25 维沃移动通信有限公司 Image processing method and device and electronic equipment
WO2022166245A1 (en) * 2021-02-08 2022-08-11 南京邮电大学 Super-resolution reconstruction method for video frame
US11995796B2 (en) * 2021-02-08 2024-05-28 Nanjing University Of Posts And Telecommunications Method of reconstruction of super-resolution of video frame
US20220261959A1 (en) * 2021-02-08 2022-08-18 Nanjing University Of Posts And Telecommunications Method of reconstruction of super-resolution of video frame
CN113033616B (en) * 2021-03-02 2022-12-02 北京大学 High-quality video reconstruction method, device, equipment and storage medium
CN113033616A (en) * 2021-03-02 2021-06-25 北京大学 High-quality video reconstruction method, device, equipment and storage medium
CN113205456A (en) * 2021-04-30 2021-08-03 东北大学 Super-resolution reconstruction method for real-time video session service
CN113205456B (en) * 2021-04-30 2023-09-22 东北大学 Super-resolution reconstruction method for real-time video session service
CN113066013A (en) * 2021-05-18 2021-07-02 广东奥普特科技股份有限公司 Method, system, device and storage medium for generating visual image enhancement
WO2022242029A1 (en) * 2021-05-18 2022-11-24 广东奥普特科技股份有限公司 Generation method, system and apparatus capable of visual resolution enhancement, and storage medium
CN113139907B (en) * 2021-05-18 2023-02-14 广东奥普特科技股份有限公司 Generation method, system, device and storage medium for visual resolution enhancement
CN113139907A (en) * 2021-05-18 2021-07-20 广东奥普特科技股份有限公司 Generation method, system, device and storage medium for visual resolution enhancement
WO2022241995A1 (en) * 2021-05-18 2022-11-24 广东奥普特科技股份有限公司 Visual image enhancement generation method and system, device, and storage medium
CN113066014A (en) * 2021-05-19 2021-07-02 云南电网有限责任公司电力科学研究院 Image super-resolution method and device
CN113066014B (en) * 2021-05-19 2022-09-02 云南电网有限责任公司电力科学研究院 Image super-resolution method and device
CN113487481A (en) * 2021-07-02 2021-10-08 河北工业大学 Recurrent video super-resolution method based on information construction and multi-dense residual blocks
CN113610706A (en) * 2021-07-19 2021-11-05 河南大学 Super-resolution reconstruction method for blurred surveillance images based on a convolutional neural network
CN113724136A (en) * 2021-09-06 2021-11-30 腾讯音乐娱乐科技(深圳)有限公司 Video restoration method, device and medium
CN113902620A (en) * 2021-10-25 2022-01-07 浙江大学 Video super-resolution system and method based on deformable convolution network
CN113902623A (en) * 2021-11-22 2022-01-07 天津大学 Method for super-resolution of arbitrary-magnification video by introducing scale information
CN114429602A (en) * 2022-01-04 2022-05-03 北京三快在线科技有限公司 Semantic segmentation method and device, electronic equipment and storage medium
CN114862688A (en) * 2022-03-14 2022-08-05 杭州群核信息技术有限公司 Video frame insertion method, device and system based on deep learning
CN114862688B (en) * 2022-03-14 2024-08-16 杭州群核信息技术有限公司 Video frame inserting method, device and system based on deep learning
WO2023185284A1 (en) * 2022-03-31 2023-10-05 网银在线(北京)科技有限公司 Video processing method and apparatuses
CN114494023A (en) * 2022-04-06 2022-05-13 电子科技大学 Video super-resolution implementation method based on motion compensation and sparse enhancement
CN115115516A (en) * 2022-06-27 2022-09-27 天津大学 Real-world video super-resolution algorithm based on Raw domain
CN115396710A (en) * 2022-08-09 2022-11-25 深圳乐播科技有限公司 Method for casting a short video from an H5 page or mini-program, and related device
CN115035230A (en) * 2022-08-12 2022-09-09 阿里巴巴(中国)有限公司 Video rendering processing method, device and equipment and storage medium
CN116128735A (en) * 2023-04-17 2023-05-16 中国工程物理研究院电子工程研究所 Multispectral image demosaicing structure and method based on densely connected residual error network
CN116797462A (en) * 2023-08-18 2023-09-22 深圳市优森美科技开发有限公司 Real-time video super-resolution reconstruction method based on deep learning
CN116797462B (en) * 2023-08-18 2023-10-24 深圳市优森美科技开发有限公司 Real-time video super-resolution reconstruction method based on deep learning

Similar Documents

Publication Publication Date Title
CN111583112A (en) Method, system, device and storage medium for video super-resolution
CN111311490B (en) Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN115222601A (en) Image super-resolution reconstruction model and method based on residual mixed attention network
US20190124346A1 (en) Real time end-to-end learning system for a high frame rate video compressive sensing network
CN114677304B (en) Image deblurring algorithm based on knowledge distillation and deep neural network
CN110610467B (en) Multi-frame video compression noise removing method based on deep learning
CN112699844A (en) Image super-resolution method based on multi-scale residual error level dense connection network
CN113747242A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115689917A (en) Efficient space-time super-resolution video compression restoration method based on deep learning
CN114757828A (en) Transformer-based video space-time super-resolution method
CN113850718A (en) Video synchronization space-time super-resolution method based on inter-frame feature alignment
CN114926336A (en) Video super-resolution reconstruction method and device, computer equipment and storage medium
CN114372918A (en) Super-resolution image reconstruction method and system based on pixel level attention mechanism
CN111860363A (en) Video image processing method and device, electronic equipment and storage medium
Yue et al. A global appearance and local coding distortion based fusion framework for CNN based filtering in video coding
CN116883265A (en) Image deblurring method based on enhanced feature fusion mechanism
Chandramouli et al. A generative model for generic light field reconstruction
CN111833245A (en) Super-resolution reconstruction method based on multi-scene video frame supplementing algorithm
CN113393382B (en) Binocular picture super-resolution reconstruction method based on multi-dimensional parallax prior
Hu et al. Store and fetch immediately: Everything is all you need for space-time video super-resolution
CN115797178B (en) Video super-resolution method based on 3D convolution
Pang et al. Video super-resolution using a hierarchical recurrent multireceptive-field integration network
CN115065796A (en) Method and device for generating video intermediate frame
Wang et al. Bi-RSTU: Bidirectional recurrent upsampling network for space-time video super-resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825