CN113610707A - Video super-resolution method based on time attention and cyclic feedback network - Google Patents

Video super-resolution method based on time attention and cyclic feedback network

Info

Publication number
CN113610707A
CN113610707A
Authority
CN
China
Prior art keywords
super
resolution
video
target frame
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110838280.3A
Other languages
Chinese (zh)
Other versions
CN113610707B (en)
Inventor
张庆武
朱鉴
蔡金峰
陈炳丰
蔡瑞初
郝志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202110838280.3A priority Critical patent/CN113610707B/en
Publication of CN113610707A publication Critical patent/CN113610707A/en
Application granted granted Critical
Publication of CN113610707B publication Critical patent/CN113610707B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a video super-resolution method based on temporal attention and a cyclic feedback network. The method applies three observations to the video super-resolution task: adjacent frames at different distances from the target frame contribute different amounts of visual information to the reconstruction result; the human visual system relies on feedback mechanisms; and humans learn new knowledge through cyclic feedback and guidance. A temporal attention module learns an attention map of the video sequence along the time axis, which effectively distinguishes the contributions of adjacent frames at different temporal distances to the final reconstruction. After the video sequence is rearranged, a cyclic feedback module performs repeated feedback super-resolution, finally yielding a super-resolution network model that preferentially learns the information contributing most to super-resolution reconstruction and has a strong capability for learning high-level features, thereby improving the video super-resolution result.

Description

Video super-resolution method based on time attention and cyclic feedback network
Technical Field
The invention relates to the technical field of video processing, and in particular to a video super-resolution method based on temporal attention and a cyclic feedback network.
Background
Video super-resolution generates a high-resolution video from a low-resolution one. As a classic computer vision problem it has been studied for decades; it is of theoretical interest and is urgently needed in practice. In video surveillance, banks, stations, airports and residential areas are equipped with many cameras, and video super-resolution improves video quality and makes it easier to observe details of people and objects. In traffic management, a camera observes a large scene, so details of fast-moving vehicles and pedestrians cannot be captured; video super-resolution reconstruction can reproduce a violation or an accident in greater detail and help identify a license plate or a face within the large scene. In criminal investigation, low-resolution footage obtained at a crime scene (for example from cameras in banks or on streets) can be enhanced with video super-resolution. In sports, many fast-moving objects must be captured (tennis, table tennis, and so on), and super-resolution reconstruction helps observe the details of these dynamic events more clearly. With the development of the related theory and technology, video super-resolution has become one of the hot research topics in computer vision.
Compared with single-image super-resolution, the video super-resolution task adds temporal information. According to how that temporal information is used, deep-learning-based video super-resolution techniques can be broadly divided into (1) methods based on multi-frame concatenation; (2) methods based on 3D convolution; (3) methods based on recurrent structures.
Methods based on multi-frame concatenation can be viewed as extending single-frame super-resolution to multi-frame input. To exploit temporal information well, such a method must align the adjacent frames to the target frame; alignment is performed either by optical flow or by deformable convolution. The EDVR network proposed by Wang et al. [1] (Wang X, Chan K C K, Yu K, et al. EDVR: Video Restoration with Enhanced Deformable Convolutional Networks [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2019) uses deformable-convolution alignment. The RBPN network [2] (Haris M, Shakhnarovich G, Ukita N. Recurrent Back-Projection Network for Video Super-Resolution [C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019) uses optical-flow alignment and exploits adjacent-frame information by combining the ideas of SISR and MISR; because alignment is performed at the pixel level, it often introduces excess noise, which degrades the accuracy of the final reconstruction. Multi-frame concatenation exploits the complementarity of multi-frame information well, but it merely concatenates and fuses features and does not truly model the motion between frames.
Methods based on 3D convolution exploit the fact that 3D convolutions can learn temporal information to process the timing information in a video; Caballero et al. first observed that 3D convolution can be viewed as a slow inter-frame information fusion process. Huang et al. proposed the BRCN model [3] (Huang Y, Wang W, Wang L. Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution. MIT Press, 2015), which combines the idea of 3D convolution with an RNN, but their network is still shallow and the information it can learn is limited. FSTRN [4] (Li S, He F, Du B, et al. Fast Spatio-Temporal Residual Network for Video Super-Resolution, 2019), proposed by Li et al., therefore adopts a deep 3D convolutional network with skip connections, in which separable 3D convolutions reduce the computational cost of 3D convolution.
Methods based on recurrent structures fuse the temporal information in a video through RNNs, LSTMs and the like. The first of these methods was a bidirectional RNN, whose network capacity was small and which lacked a subsequent inter-frame alignment step. Guo et al. improved the bidirectional RNN with a motion-compensation module and a convolutional LSTM layer. Recent advances in video super-resolution (VSR) have demonstrated the strength of deep learning, which achieves better reconstruction performance. However, existing deep-learning-based video SR methods essentially fuse the input multi-frame temporal information gradually and produce the final result after a single reconstruction. Existing methods (1) do not fully exploit, in their use of temporal information, the fact that adjacent frames at different distances from the target frame contribute different amounts of visual information to the reconstruction; and (2) do not exploit the feedback mechanisms common in the human visual system, nor the cyclic feedback and guidance that characterize how humans learn new knowledge.
Disclosure of Invention
The invention aims to overcome at least one of the above technical problems by providing a video super-resolution method based on temporal attention and a cyclic feedback network, which constructs a model that preferentially learns the information contributing most to super-resolution reconstruction and has a strong capability for learning high-level features, effectively improving the video super-resolution result.
In order to solve the technical problems, the technical scheme of the invention is as follows:
A video super-resolution method based on temporal attention and a cyclic feedback network comprises the following steps:
S1: constructing a super-resolution network model comprising a temporal attention module and a cyclic feedback module;
S2: acquiring a public video super-resolution training data set from the network and preprocessing it to obtain training low-resolution (LR) video sequences;
S3: determining the target frame to be super-resolved, and upsampling it to obtain a preliminary super-resolution result of the target frame that lacks details;
S4: inputting the LR video sequence and the preliminary super-resolution result into the super-resolution network model, extracting feature maps of the LR video sequence, and aligning the feature maps to the target frame with deformable convolution to obtain an aligned LR feature map sequence;
S5: inputting the aligned LR feature map sequence into the temporal attention module to obtain the LR feature map sequence after attention along the time dimension;
S6: after reordering the LR feature map sequence, inputting it into the cyclic feedback module for cyclic feedback super-resolution, and obtaining the target frame's sequence of cyclic feedback super-resolution results;
S7: setting a loss function over the cyclic feedback super-resolution result sequence, training the super-resolution network model, and obtaining the trained super-resolution network model;
S8: performing super-resolution reconstruction on the video to be super-resolved with the trained super-resolution network model.
Wherein the video super-resolution training dataset is obtained from the existing public high-resolution dataset Vimeo-90k.
In step S2, the preprocessing specifically comprises:
S21: cropping, for every training video, the same number of original video frames at the same position;
S22: downsampling the original video frames to obtain LR video frames;
S23: converting all LR video frames into tensors and normalizing them;
S24: applying random data augmentation to the normalized LR video frames.
In step S22, the original video frames are downsampled with Gaussian-blur-kernel downsampling.
In step S3, bicubic interpolation upsampling is applied to the target frame to be super-resolved, yielding a preliminary super-resolution result of the target frame that lacks details.
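For illustration, the two operations just described, Gaussian-blur-kernel degradation (step S22) and bicubic pre-upsampling of the target frame (step S3), can be sketched as follows; the x4 scale matches the embodiment below, while the kernel size, sigma and function names are assumptions of this example only:

    import torch
    import torch.nn.functional as F
    import torchvision.transforms.functional as TF

    def degrade_to_lr(hr_frames: torch.Tensor, scale: int = 4) -> torch.Tensor:
        """Gaussian-blur an HR clip (T, C, H, W), then subsample it by `scale`."""
        blurred = TF.gaussian_blur(hr_frames, kernel_size=13, sigma=1.6)  # assumed values
        return blurred[..., ::scale, ::scale]  # decimate rows and columns

    def preliminary_sr(target_lr: torch.Tensor, scale: int = 4) -> torch.Tensor:
        """Bicubic up-sampling of the LR target frame (C, H, W): a coarse, detail-free SR."""
        up = F.interpolate(target_lr.unsqueeze(0), scale_factor=scale,
                           mode="bicubic", align_corners=False)
        return up.squeeze(0).clamp(0.0, 1.0)  # frames are normalized to [0, 1]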
The super-resolution network model further comprises a multi-scale feature extraction module. In step S4, the LR video sequence is input into the multi-scale feature extraction module, and feature maps at k sizes are obtained for each video frame, where k is a positive integer.
Specifically, the alignment to the target frame with deformable convolution uses the PCD feature alignment module at the front end of the EDVR model: the extracted feature map at each size is input into the feature alignment module, and deformable-convolution alignment is performed stage by stage from the smallest size upward, yielding a feature map sequence $(F_1, \dots, F_c, \dots, F_n)$ aligned to the target frame, where $n$ denotes the number of frames of the input LR video sequence, $F_n$ denotes the LR feature map of the $n$-th video frame, and $F_c$ denotes the LR feature map of the target frame.
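A minimal single-level sketch of deformable-convolution alignment is given below; the actual PCD module of EDVR is pyramidal and cascaded, so the layer layout here is an illustrative simplification, with offsets predicted from the concatenated neighbor and target features:

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class DeformAlign(nn.Module):
        """Align a neighbor frame's feature map to the target frame (single scale)."""
        def __init__(self, channels: int = 64, kernel: int = 3):
            super().__init__()
            # Two offset values (dx, dy) per kernel sampling location.
            self.offset_conv = nn.Conv2d(2 * channels, 2 * kernel * kernel,
                                         kernel_size=3, padding=1)
            self.deform = DeformConv2d(channels, channels,
                                       kernel_size=kernel, padding=kernel // 2)

        def forward(self, neighbor: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
            offsets = self.offset_conv(torch.cat([neighbor, target], dim=1))
            return self.deform(neighbor, offsets)  # sample neighbor at learned offsets

In the multi-scale version, offsets estimated at a small size are upsampled and refined at the next larger size, which is what the stage-by-stage alignment above refers to.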
Wherein, in step S5, the temporal attention module consists of a BN layer and a convolutional layer; the specific procedure is as follows:
The aligned LR feature map sequence $(F_1, \dots, F_c, \dots, F_n)$ is input into the temporal attention module, and after the BN layer and the convolutional layer a single-channel feature map $(F_1^{a}, \dots, F_c^{a}, \dots, F_n^{a})$ is obtained for each frame; these maps are then concatenated. Weights are computed along the time dimension with a softmax function, yielding attention weight maps $(M_1, \dots, M_c, \dots, M_n)$ in which the $n$ weights at each spatial position sum to 1. Finally, the weight maps are multiplied with the aligned LR feature maps to obtain the attended LR feature map sequence $(F_1^{at}, \dots, F_c^{at}, \dots, F_n^{at})$, namely:

$F_i^{at} = M_i \odot F_i, \quad i \in \{1, \dots, n\}.$
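The attention computation transcribes almost directly into code; the 3x3 kernel follows the embodiment below, while the channel width of 64 is an assumption of this example:

    import torch
    import torch.nn as nn

    class TemporalAttention(nn.Module):
        """Softmax attention over the time axis of an aligned clip (B, T, C, H, W)."""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.bn = nn.BatchNorm2d(channels)
            self.to_score = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            b, t, c, h, w = feats.shape
            scores = self.to_score(self.bn(feats.reshape(b * t, c, h, w)))  # (B*T, 1, H, W)
            attn = scores.reshape(b, t, 1, h, w).softmax(dim=1)  # weights sum to 1 over T
            return feats * attn  # broadcast over the channel dimension

Because the softmax is taken along the time dimension, the T weights at every spatial position sum to 1, so frames that contribute more to reconstructing the target frame receive proportionally larger weights.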
In step S6, the cyclic feedback super-resolution process specifically comprises:
S61: after the LR feature map sequence is reordered, inputting the first feature map of the reordered sequence into the cyclic feedback module for the first cyclic feedback super-resolution pass to obtain a super-resolution feature map;
S62: reconstructing the super-resolution feature map obtained in this pass to obtain this pass's super-resolution residual information of the target frame, and adding it to the preliminary super-resolution result of the target frame to obtain this pass's super-resolution result of the target frame;
S63: following the order of the LR feature map sequence, inputting the corresponding feature map and the super-resolution feature map output by the previous pass into the cyclic feedback module for the next cyclic feedback super-resolution pass until the cycles are finished, the per-pass super-resolution results of the target frame forming the target frame's cyclic feedback super-resolution result sequence.
Wherein, the step S61 specifically comprises:
The LR feature map sequence $(F_1^{at}, \dots, F_c^{at}, \dots, F_n^{at})$ is reordered from near to far by distance from the target frame, and the feature map of the target frame is reused in the middle and at the end of the sequence to guide the residual information extraction of the cyclic feedback super-resolution module; renumbering the result gives the sequence

$(\tilde{F}_1^{at}, \tilde{F}_2^{at}, \dots, \tilde{F}_{n+2}^{at}).$

The reordered LR feature map sequence $(\tilde{F}_1^{at}, \dots, \tilde{F}_{n+2}^{at})$ is input into the cyclic feedback module, which performs n+2 cyclic feedback super-resolution passes in the order of the feature maps; the input of each pass is the LR video frame feature map corresponding to that pass together with the feature map output at the end of the previous pass, and the output is this pass's super-resolution feature map, namely:

$F_{out}^{n} = f_{FB}(\tilde{F}_n^{at}, F_{out}^{n-1}),$

where $F_{out}^{n}$ denotes the super-resolution feature map of the target frame output by the $n$-th feedback pass, $f_{FB}(\cdot)$ denotes the cyclic feedback super-resolution module, and $F_{out}^{n-1}$ denotes the super-resolution feature map of the target frame output by the $(n-1)$-th pass. In the first pass there is no previous output, so the target frame's own feature map is used, namely:

$F_{out}^{1} = f_{FB}(\tilde{F}_1^{at}, F_c^{at}).$
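A sketch of the reordering and the feedback recurrence, written for the 5-frame embodiment with the middle frame as target; the internals of $f_{FB}$ are not fixed by the text, so the cell below is an assumed minimal residual fusion block:

    import torch
    import torch.nn as nn

    def reorder(feats: list, c: int) -> list:
        """Near-to-far reorder around target index c, reusing the target map in the
        middle and at the end; one possible 5-frame order: [F3, F2, F4, F3, F1, F5, F3]."""
        return [feats[c], feats[c - 1], feats[c + 1], feats[c],
                feats[c - 2], feats[c + 2], feats[c]]

    class FeedbackCell(nn.Module):
        """f_FB: fuse the current pass's feature map with the previous pass's output."""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1))

        def forward(self, f_t: torch.Tensor, prev_out: torch.Tensor) -> torch.Tensor:
            return prev_out + self.fuse(torch.cat([f_t, prev_out], dim=1))

    # cell = FeedbackCell()
    # hidden, outs = feats[c], []          # the first pass reuses the target frame's map
    # for f in reorder(feats, c=2):        # 7 feedback passes
    #     hidden = cell(f, hidden)
    #     outs.append(hidden)              # one SR feature map per pass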
In step S62, the super-resolution feature map obtained by each feedback pass is reconstructed to obtain that pass's reconstructed super-resolution residual information of the target frame, which is added to the preliminary super-resolution result of the target frame to obtain that pass's super-resolution result of the target frame; specifically:
The target frame's super-resolution feature map $F_{out}^{n}$ for this pass is input into the super-resolution reconstruction module for reconstruction, which yields the target frame's reconstructed residual information for this pass, namely:

$I_{res}^{n} = f_{RB}(F_{out}^{n}),$

where $I_{res}^{n}$ denotes the super-resolution reconstructed residual information of the target frame at the $n$-th pass and $f_{RB}(\cdot)$ denotes the reconstruction module. The target frame's super-resolution reconstructed residual information is added, pixel-wise at corresponding positions, to the preliminary super-resolution result of the target frame obtained in step S3, yielding this pass's super-resolved video frame of the target frame, namely:

$I_{SR}^{n} = I_{res}^{n} + f_{up}(I_C),$

where $I_{SR}^{n}$ denotes the super-resolved video frame of the target frame at the $n$-th pass, $f_{up}(\cdot)$ denotes the upsampling operation, and $I_C$ denotes the target frame.
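A sketch of this reconstruction step; the text fixes only the input and output roles of $f_{RB}$, so the sub-pixel (PixelShuffle) layout below is an assumption of this example:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ReconstructionModule(nn.Module):
        """f_RB: map a super-resolution feature map to an HR residual image (x4)."""
        def __init__(self, channels: int = 64, scale: int = 4):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels * scale * scale, 3, padding=1),
                nn.PixelShuffle(scale),                # (C*s*s, H, W) -> (C, s*H, s*W)
                nn.Conv2d(channels, 3, 3, padding=1))  # 3-channel residual image

        def forward(self, f_out: torch.Tensor) -> torch.Tensor:
            return self.body(f_out)

    # residual = ReconstructionModule()(hidden)                           # I_res^n
    # sr = residual + F.interpolate(target_lr, scale_factor=4,
    #                               mode="bicubic", align_corners=False)  # I_SR^n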
In step S7, the loss function is an L2-norm loss, specifically:

$L = \sum_{n=1}^{N} W_n \left\| I_{SR}^{n} - I_{HR} \right\|_2^2,$

where $N$ is the number of feedback passes, $W_n$ denotes the weight, within the total loss, of the loss computed on the target frame's super-resolution result $I_{SR}^{n}$ from the $n$-th pass, and $I_{HR}$ denotes the ground truth of the target frame.
The video super-resolution training data set is then used to iteratively train the constructed super-resolution network model, finally yielding the trained super-resolution network model.
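This weighted multi-pass L2 loss transcribes directly; whether the squared error is summed or averaged over pixels is not specified, so mean reduction is assumed here:

    import torch

    def feedback_loss(sr_frames: list, hr_frame: torch.Tensor, weights: list) -> torch.Tensor:
        """Weighted sum of per-pass L2 losses against the target frame's ground truth."""
        loss = sr_frames[0].new_zeros(())
        for w, sr in zip(weights, sr_frames):
            loss = loss + w * torch.mean((sr - hr_frame) ** 2)
        return loss

    # loss = feedback_loss(outs_sr, hr, weights=[1.0] * 7)  # the embodiment sets W_n = 1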
In this scheme, the target frame to be super-resolved in the video sequence is first upsampled with bicubic interpolation, yielding a preliminary super-resolution result of the target frame that lacks details. The LR video sequence, obtained by degrading the video frames of the training data set with a Gaussian blur kernel, is then input into the video super-resolution network model for feature map extraction and feature map alignment, producing the aligned LR feature maps of the video frame sequence. The resulting LR feature map sequence passes through the temporal attention module, which learns an attention map of the video sequence along the time axis and thereby distinguishes the contributions of adjacent frames at different temporal distances to the final reconstruction.
Next, the temporally attended LR feature maps of the video frame sequence are rearranged from near to far by distance from the target frame, and the target frame's feature map is reused in the middle and at the end of the sequence to guide, via cyclic feedback, the feature learning of the distant frames. The rearranged LR feature map sequence then undergoes pass-by-pass cyclic feedback super-resolution, yielding super-resolution feature maps carrying higher-level features. The target frame's super-resolution feature map is reconstructed to obtain the target frame's reconstructed super-resolution residual information, which is added to the target frame's preliminary super-resolution frame to obtain the target frame's super-resolved video frame. The cycle continues until all feature maps have been input into the cyclic feedback module, producing the target frame's sequence of super-resolved frames and completing the super-resolution. Setting a loss function and training the video super-resolution network model yields the trained model, which is then used for super-resolution reconstruction of the video to be super-resolved. The method effectively improves the video super-resolution result, and the detail of the reconstructed video frames improves markedly.
The video super-resolution method based on temporal attention and a cyclic feedback network applies, to the video super-resolution task, the observation that adjacent frames at different distances from the target frame contribute different amounts of visual information to the reconstruction, together with the feedback mechanism of the human visual system and the cyclic feedback and guidance that characterize how humans learn new knowledge. The resulting model preferentially learns the information contributing most to super-resolution reconstruction and has a strong capability for learning high-level features, improving the video super-resolution result.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a video super-resolution method based on time attention and a loop feedback network, which adopts a time attention module to learn an attention diagram of a video sequence on a time axis, and can effectively distinguish the contribution of adjacent frames with different time degrees to the final reconstruction effect; after the video sequences are rearranged, the cyclic feedback module carries out cyclic feedback super-division to finally obtain a super-resolution network model, the video super-resolution effect is obviously improved, and the detail reconstruction effect of the reconstructed video frames is better and obvious.
Drawings
FIG. 1 is a flow chart of the method according to an embodiment of the present invention;
FIG. 2 is an unrolled view of the cyclic feedback module according to an embodiment of the present invention;
FIG. 3 is a data flow diagram of the system according to an embodiment of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, a video super-resolution method based on temporal attention and a cyclic feedback network comprises the following steps:
S1: constructing a super-resolution network model comprising a temporal attention module and a cyclic feedback module;
S2: acquiring a public video super-resolution training data set from the network and preprocessing it to obtain training low-resolution (LR) video sequences;
In this embodiment, videos from the existing public high-resolution dataset Vimeo-90k are selected as training video data, and the video data are preprocessed.
S3: determining the target frame to be super-resolved, and upsampling it to obtain a preliminary super-resolution result of the target frame that lacks details;
In this embodiment, each training sample contains 5 frames; the middle frame is selected as the target frame to be super-resolved, and bicubic interpolation upsampling is applied to it to obtain the preliminary super-resolved video frame.
S4: inputting the LR video sequence and the preliminary super-resolution result into the super-resolution network model, extracting feature maps of the LR video sequence, and aligning the feature maps to the target frame with deformable convolution to obtain an aligned LR feature map sequence;
In this embodiment, as shown in fig. 2, the video super-resolution network model comprises a multi-scale feature extraction module, a deformable convolution alignment module, a temporal attention module, a cyclic feedback module and a feature super-resolution module. The normalized 5-frame video sequence is input into the multi-scale feature extraction module, which consists of 5 basic residual blocks and obtains a multi-scale feature group by convolutional downsampling; the deformable convolution alignment module is specifically the PCD feature alignment module at the front end of the existing EDVR model.
The normalized 5 LR video frames are input into the multi-scale feature extraction module, and feature maps at 3 sizes, from large to small, are obtained for each frame;
The feature map at each size is input into the feature alignment module, which performs deformable-convolution alignment from the smallest size upward and fuses the feature maps of different sizes, yielding the aligned feature maps of the 5-frame video sequence.
S5: inputting the aligned 5-frame LR feature map sequence into the temporal attention module to obtain the LR feature map sequence after attention along the time dimension, the temporal attention module consisting of a BN layer and a 3x3 convolutional layer;
The aligned feature map sequence $(F_1, F_2, F_3, F_4, F_5)$ first passes through the BN layer and then through a 3x3 convolution to compute the single-channel feature maps $(F_1^{a}, F_2^{a}, F_3^{a}, F_4^{a}, F_5^{a})$. These are concatenated and weighted along the time dimension with a softmax function, giving attention weight maps $(M_1, M_2, M_3, M_4, M_5)$ whose 5 values at each spatial position sum to 1 along the time axis; finally the weight maps are multiplied with the aligned feature map sequence to obtain the attended LR feature map sequence $(F_1^{at}, F_2^{at}, F_3^{at}, F_4^{at}, F_5^{at})$, namely:

$F_i^{at} = M_i \odot F_i, \quad i \in \{1, \dots, 5\}.$
S6: after reordering the LR feature map sequence, inputting it into the cyclic feedback module for cyclic feedback super-resolution, and obtaining the target frame's sequence of cyclic feedback super-resolution results;
S7: setting a loss function over the cyclic feedback super-resolution result sequence, training the super-resolution network model, and obtaining the trained super-resolution network model;
S8: performing super-resolution reconstruction on the video to be super-resolved with the trained super-resolution network model.
More specifically, in step S2, the preprocessing specifically comprises:
S21: cropping, from every training video, original video frames of width 448 and height 256 at the same position;
S22: downsampling the original video frames by a factor of 4 with Gaussian-blur-kernel downsampling, yielding LR video frames of width 112 and height 64;
S23: converting all LR video frames into tensors and normalizing them;
S24: applying random data augmentation to the normalized LR video frames, the augmentation comprising flipping and mirroring.
More specifically, in step S6, the cyclic feedback super-resolution process specifically comprises:
S61: after the LR feature map sequence is reordered, inputting the first feature map of the reordered sequence into the cyclic feedback module for the first cyclic feedback super-resolution pass to obtain a super-resolution feature map;
S62: reconstructing the super-resolution feature map obtained in this pass to obtain this pass's super-resolution residual information of the target frame, and adding it to the preliminary super-resolution result of the target frame to obtain this pass's super-resolution result of the target frame;
S63: following the order of the LR feature map sequence, inputting the corresponding feature map and the super-resolution feature map output by the previous pass into the cyclic feedback module for the next cyclic feedback super-resolution pass until the cycles are finished, the per-pass super-resolution results of the target frame forming the target frame's cyclic feedback super-resolution result sequence.
More specifically, the step S61 specifically comprises:
The LR feature map sequence $(F_1^{at}, F_2^{at}, F_3^{at}, F_4^{at}, F_5^{at})$ is reordered from near to far by distance from the target frame, and the target frame's feature map is reused in the middle and at the end of the sequence to guide the residual information extraction of the cyclic feedback super-resolution module, giving a sequence of the form

$(F_3^{at}, F_2^{at}, F_4^{at}, F_3^{at}, F_1^{at}, F_5^{at}, F_3^{at});$

after renumbering the subscripts this becomes $(\tilde{F}_1^{at}, \tilde{F}_2^{at}, \tilde{F}_3^{at}, \tilde{F}_4^{at}, \tilde{F}_5^{at}, \tilde{F}_6^{at}, \tilde{F}_7^{at})$.
As shown in fig. 3, in this embodiment the 7 LR feature maps $(\tilde{F}_1^{at}, \dots, \tilde{F}_7^{at})$ are input into the feedback module, which performs cyclic feedback super-resolution in the order of the feature maps; the input of each pass is the LR video frame feature map corresponding to that pass together with the target frame's super-resolution feature map output at the end of the previous pass, and the output is this pass's super-resolution feature map.
Pass 1, n = 1:

$F_{out}^{1} = f_{FB}(\tilde{F}_1^{at}, F_{out}^{0}),$

where $F_{out}^{1}$ denotes the super-resolution feature map of the target frame output by the 1st feedback pass, $f_{FB}(\cdot)$ denotes the cyclic feedback super-resolution module, and $F_{out}^{0}$ denotes the super-resolution feature map of the notional 0th pass; in the first pass the target frame's own feature map is used, namely:

$F_{out}^{1} = f_{FB}(\tilde{F}_1^{at}, F_3^{at}).$

The target frame's super-resolution feature map $F_{out}^{1}$ for this pass is input into the super-resolution reconstruction module for reconstruction, which yields the target frame's reconstructed residual information for this pass, namely:

$I_{res}^{1} = f_{RB}(F_{out}^{1}),$

where $I_{res}^{1}$ denotes the super-resolution reconstructed residual information of the target frame at the 1st pass and $f_{RB}(\cdot)$ denotes the reconstruction module.
The target frame's super-resolution reconstructed residual information is added, pixel-wise at corresponding positions, to the preliminary super-resolution result of the target frame obtained in step S3, yielding this pass's super-resolved video frame of the target frame, namely:

$I_{SR}^{1} = I_{res}^{1} + f_{up}(I_C),$

where $I_{SR}^{1}$ denotes the super-resolved video frame of the target frame at the 1st pass, $f_{up}(\cdot)$ denotes the upsampling operation, and $I_C$ denotes the target frame.
The LR feature map sequence $(\tilde{F}_1^{at}, \dots, \tilde{F}_7^{at})$ is then fed into the cyclic feedback module pass by pass until the cycle ends, giving the target frame's sequence of 7 cyclic feedback super-resolution results.
Pass 2, n = 2:

$F_{out}^{2} = f_{FB}(\tilde{F}_2^{at}, F_{out}^{1}),$
$I_{res}^{2} = f_{RB}(F_{out}^{2}),$
$I_{SR}^{2} = I_{res}^{2} + f_{up}(I_C),$

where $F_{out}^{2}$ denotes the super-resolution feature map of the target frame output by the 2nd feedback pass, $I_{res}^{2}$ denotes the super-resolution reconstructed residual information of the target frame at the 2nd pass, and $I_{SR}^{2}$ denotes the super-resolved video frame of the target frame at the 2nd pass; the passes in between proceed in the same way.
Pass 7, n = 7:

$F_{out}^{7} = f_{FB}(\tilde{F}_7^{at}, F_{out}^{6}),$
$I_{res}^{7} = f_{RB}(F_{out}^{7}),$
$I_{SR}^{7} = I_{res}^{7} + f_{up}(I_C),$

where $F_{out}^{7}$ denotes the super-resolution feature map of the target frame output by the 7th feedback pass, $I_{res}^{7}$ denotes the super-resolution reconstructed residual information of the target frame at the 7th pass, and $I_{SR}^{7}$ denotes the super-resolved video frame of the target frame at the 7th pass.
The per-pass super-resolved video frames of the target frame form the target frame's final super-resolved video frame sequence $(I_{SR}^{1}, I_{SR}^{2}, \dots, I_{SR}^{7})$.
More specifically, in step S7, the loss function is an L2-norm loss, specifically:

$L = \sum_{n=1}^{7} W_n \left\| I_{SR}^{n} - I_{HR} \right\|_2^2,$

where $W_n$ denotes the weight, within the total loss, of the loss computed on the target frame's super-resolution result $I_{SR}^{n}$ from the $n$-th pass, with $n$ running up to 7, and $I_{HR}$ denotes the ground truth of the target frame; in this embodiment all $W_n$ are set to 1.
The video super-resolution training data set is then used to iteratively train the constructed super-resolution network model, finally yielding the trained super-resolution network model.
In this embodiment, all of the target frame's super-resolved video frames $(I_{SR}^{1}, \dots, I_{SR}^{7})$ obtained from the 7 feedback passes are used to compute the loss function, and the super-resolved video frame output by the last pass, $I_{SR}^{7}$, is taken as the super-resolution result of the target frame $I_C$.
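Putting the sketches above together, one training step for a 5-frame clip could look as follows; all module objects (extract, align, attend, cell, recon) and the reorder and feedback_loss helpers are the illustrative, hypothetical ones defined earlier in this description, not the patented implementation itself:

    import torch
    import torch.nn.functional as F

    # Assumed shapes: lr_clip (B, 5, 3, 64, 112), hr_target (B, 3, 256, 448), x4 scale.
    # extract: any per-frame feature extractor producing (B, 64, H, W).
    def training_step(extract, align, attend, cell, recon, lr_clip, hr_target, optimizer):
        c = 2                                          # the middle frame is the target
        feats = [extract(lr_clip[:, t]) for t in range(5)]
        feats = [align(f, feats[c]) for f in feats]    # deformable alignment to target
        feats = list(attend(torch.stack(feats, dim=1)).unbind(dim=1))  # temporal attention

        up = F.interpolate(lr_clip[:, c], scale_factor=4, mode="bicubic",
                           align_corners=False)        # preliminary SR of the target
        hidden, sr_outs = feats[c], []
        for f in reorder(feats, c):                    # 7 feedback passes
            hidden = cell(f, hidden)
            sr_outs.append(recon(hidden) + up)         # residual + preliminary SR

        loss = feedback_loss(sr_outs, hr_target, [1.0] * 7)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return sr_outs[-1], loss                       # the last pass is the final output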
In a specific implementation, performing super-resolution reconstruction on the video to be super-resolved with the method of this embodiment effectively improves the video super-resolution result with relatively few parameters; the detail of the reconstructed video frames is excellent, providing strong support for technical fields such as satellite imagery, video surveillance, medical imaging and military applications.
It should be understood that the above embodiments of the present invention are merely examples for clearly illustrating the invention and do not limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A video super-resolution method based on temporal attention and a cyclic feedback network, characterized by comprising the following steps:
S1: constructing a super-resolution network model comprising a temporal attention module and a cyclic feedback module;
S2: acquiring a public video super-resolution training data set from the network and preprocessing it to obtain training LR video sequences;
S3: determining the target frame to be super-resolved, and upsampling it to obtain a preliminary super-resolution result of the target frame that lacks details;
S4: inputting the LR video sequence and the preliminary super-resolution result into the super-resolution network model, extracting feature maps of the LR video sequence, and aligning the feature maps to the target frame with deformable convolution to obtain an aligned LR feature map sequence;
S5: inputting the aligned LR feature map sequence into the temporal attention module to obtain the LR feature map sequence after attention along the time dimension;
S6: after reordering the LR feature map sequence, inputting it into the cyclic feedback module for cyclic feedback super-resolution, and obtaining the target frame's sequence of cyclic feedback super-resolution results;
S7: setting a loss function over the cyclic feedback super-resolution result sequence, training the super-resolution network model, and obtaining the trained super-resolution network model;
S8: performing super-resolution reconstruction on the video to be super-resolved with the trained super-resolution network model.
2. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 1, wherein in step S2 the preprocessing specifically comprises:
S21: cropping, for every training video, the same number of original video frames at the same position;
S22: downsampling the original video frames to obtain LR video frames;
S23: converting all LR video frames into tensors and normalizing them;
S24: applying random data augmentation to the normalized LR video frames.
3. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 2, wherein in step S22 the original video frames are downsampled with Gaussian-blur-kernel downsampling.
4. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 1, wherein in step S3 bicubic interpolation upsampling is applied to the target frame to be super-resolved, yielding a preliminary super-resolution result of the target frame that lacks details.
5. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 1, wherein the super-resolution network model further comprises a multi-scale feature extraction module; in step S4, the LR video sequence is input into the multi-scale feature extraction module, and feature maps at k sizes are obtained for each video frame, where k is a positive integer;
specifically, the alignment to the target frame with deformable convolution uses the PCD feature alignment module at the front end of the EDVR model: the extracted feature map at each size is input into the feature alignment module, and deformable-convolution alignment is performed stage by stage from the smallest size upward, yielding a feature map sequence $(F_1, \dots, F_c, \dots, F_n)$ aligned to the target frame, where $n$ denotes the number of frames of the input LR video sequence, $F_n$ denotes the LR feature map of the $n$-th video frame, and $F_c$ denotes the LR feature map of the target frame.
6. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 5, wherein in step S5 the temporal attention module consists of a BN layer and a convolutional layer; the specific procedure is as follows:
the aligned LR feature map sequence $(F_1, \dots, F_c, \dots, F_n)$ is input into the temporal attention module, and after the BN layer and the convolutional layer a single-channel feature map $(F_1^{a}, \dots, F_c^{a}, \dots, F_n^{a})$ is obtained for each frame and the maps are concatenated; weights are then computed along the time dimension with a softmax function, yielding attention weight maps $(M_1, \dots, M_c, \dots, M_n)$ in which the $n$ weights at each position sum to 1; finally the weight maps are multiplied with the aligned LR feature maps to obtain the attended LR feature map sequence $(F_1^{at}, \dots, F_c^{at}, \dots, F_n^{at})$, namely:

$F_i^{at} = M_i \odot F_i, \quad i \in \{1, \dots, n\}.$
7. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 6, wherein in step S6 the cyclic feedback super-resolution process specifically comprises:
S61: after the LR feature map sequence is reordered, inputting the first feature map of the reordered sequence into the cyclic feedback module for the first cyclic feedback super-resolution pass to obtain a super-resolution feature map;
S62: reconstructing the super-resolution feature map obtained in this pass to obtain this pass's super-resolution residual information of the target frame, and adding it to the preliminary super-resolution result of the target frame to obtain this pass's super-resolution result of the target frame;
S63: following the order of the LR feature map sequence, inputting the corresponding feature map and the super-resolution feature map output by the previous pass into the cyclic feedback module for the next cyclic feedback super-resolution pass until the cycles are finished, the per-pass super-resolution results of the target frame forming the target frame's cyclic feedback super-resolution result sequence.
8. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 7, wherein the step S61 specifically comprises:
the LR feature map sequence $(F_1^{at}, \dots, F_c^{at}, \dots, F_n^{at})$ is reordered from near to far by distance from the target frame, and the feature map of the target frame is reused in the middle and at the end of the sequence to guide the residual information extraction of the cyclic feedback super-resolution module; renumbering the result gives the sequence

$(\tilde{F}_1^{at}, \tilde{F}_2^{at}, \dots, \tilde{F}_{n+2}^{at}).$

The reordered LR feature map sequence $(\tilde{F}_1^{at}, \dots, \tilde{F}_{n+2}^{at})$ is input into the cyclic feedback module, which performs n+2 cyclic feedback super-resolution passes in the order of the feature maps; the input of each pass is the LR video frame feature map corresponding to that pass together with the feature map output at the end of the previous pass, and the output is this pass's super-resolution feature map, namely:

$F_{out}^{n} = f_{FB}(\tilde{F}_n^{at}, F_{out}^{n-1}),$

where $F_{out}^{n}$ denotes the super-resolution feature map of the target frame output by the $n$-th feedback pass, $f_{FB}(\cdot)$ denotes the cyclic feedback super-resolution module, and $F_{out}^{n-1}$ denotes the super-resolution feature map of the target frame output by the $(n-1)$-th pass; in the first pass the target frame's own feature map is used, namely:

$F_{out}^{1} = f_{FB}(\tilde{F}_1^{at}, F_c^{at}).$
9. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 8, wherein in step S62 the super-resolution feature map obtained by each feedback pass is reconstructed to obtain that pass's reconstructed super-resolution residual information of the target frame, which is added to the preliminary super-resolution result of the target frame to obtain that pass's super-resolution result of the target frame; specifically:
the target frame's super-resolution feature map $F_{out}^{n}$ for this pass is input into the super-resolution reconstruction module for reconstruction, which yields the target frame's reconstructed residual information for this pass, namely:

$I_{res}^{n} = f_{RB}(F_{out}^{n}),$

where $I_{res}^{n}$ denotes the super-resolution reconstructed residual information of the target frame at the $n$-th pass and $f_{RB}(\cdot)$ denotes the reconstruction module; the target frame's super-resolution reconstructed residual information is added, pixel-wise at corresponding positions, to the preliminary super-resolution result of the target frame obtained in step S3, yielding this pass's super-resolved video frame of the target frame, namely:

$I_{SR}^{n} = I_{res}^{n} + f_{up}(I_C),$

where $I_{SR}^{n}$ denotes the super-resolved video frame of the target frame at the $n$-th pass, $f_{up}(\cdot)$ denotes the upsampling operation, and $I_C$ denotes the target frame.
10. The video super-resolution method based on temporal attention and a cyclic feedback network of claim 9, wherein in step S7 the loss function is an L2-norm loss, specifically:

$L = \sum_{n=1}^{N} W_n \left\| I_{SR}^{n} - I_{HR} \right\|_2^2,$

where $N$ is the number of feedback passes, $W_n$ denotes the weight, within the total loss, of the loss computed on the target frame's super-resolution result $I_{SR}^{n}$ from the $n$-th pass, and $I_{HR}$ denotes the ground truth of the target frame;
the video super-resolution training data set is then used to iteratively train the constructed super-resolution network model, finally yielding the trained super-resolution network model.
CN202110838280.3A 2021-07-23 2021-07-23 Video super-resolution method based on time attention and cyclic feedback network Active CN113610707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110838280.3A CN113610707B (en) 2021-07-23 2021-07-23 Video super-resolution method based on time attention and cyclic feedback network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110838280.3A CN113610707B (en) 2021-07-23 2021-07-23 Video super-resolution method based on time attention and cyclic feedback network

Publications (2)

Publication Number Publication Date
CN113610707A (en) 2021-11-05
CN113610707B (en) 2024-02-09

Family

ID=78305300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838280.3A Active CN113610707B (en) 2021-07-23 2021-07-23 Video super-resolution method based on time attention and cyclic feedback network

Country Status (1)

Country Link
CN (1) CN113610707B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114598874A (en) * 2022-01-20 2022-06-07 中国科学院自动化研究所 Video quantization coding and decoding method, device, equipment and storage medium
CN114612305A (en) * 2022-03-14 2022-06-10 中国科学技术大学 Event-driven video super-resolution method based on stereogram modeling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260560A (en) * 2020-02-18 2020-06-09 中山大学 Multi-frame video super-resolution method fused with attention mechanism
WO2020220926A1 (en) * 2019-04-28 2020-11-05 北京灵汐科技有限公司 Multimedia data identification method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020220926A1 (en) * 2019-04-28 2020-11-05 北京灵汐科技有限公司 Multimedia data identification method and device
CN111260560A (en) * 2020-02-18 2020-06-09 中山大学 Multi-frame video super-resolution method fused with attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINTAO WANG et al.: "EDVR: Video Restoration with Enhanced Deformable Convolutional Networks", Retrieved from the Internet <URL:arxiv> *
ZHEN LI: "Feedback Network for Image Super-Resolution", Retrieved from the Internet <URL:arxiv> *
TAO Zhuang; LIAO Xiaodong; SHEN Jianghong: "Image super-resolution reconstruction algorithm based on a dual-path feedback network", Computer Systems & Applications, no. 04 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114598874A (en) * 2022-01-20 2022-06-07 中国科学院自动化研究所 Video quantization coding and decoding method, device, equipment and storage medium
CN114598874B (en) * 2022-01-20 2022-12-06 中国科学院自动化研究所 Video quantization coding and decoding method, device, equipment and storage medium
CN114612305A (en) * 2022-03-14 2022-06-10 中国科学技术大学 Event-driven video super-resolution method based on stereogram modeling
CN114612305B (en) * 2022-03-14 2024-04-02 中国科学技术大学 Event-driven video super-resolution method based on stereogram modeling

Also Published As

Publication number Publication date
CN113610707B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN111179167B (en) Image super-resolution method based on multi-stage attention enhancement network
CN109360171A (en) A kind of real-time deblurring method of video image neural network based
US11727541B2 (en) Video super resolution method
Zhang et al. NTIRE 2023 challenge on image super-resolution (x4): Methods and results
CN112365403B (en) Video super-resolution recovery method based on deep learning and adjacent frames
CN113610707A (en) Video super-resolution method based on time attention and cyclic feedback network
CN113538243B (en) Super-resolution image reconstruction method based on multi-parallax attention module combination
CN112001843A (en) Infrared image super-resolution reconstruction method based on deep learning
CN103971354A (en) Method for reconstructing low-resolution infrared image into high-resolution infrared image
CN114757862B (en) Image enhancement progressive fusion method for infrared light field device
CN113409190B (en) Video super-resolution method based on multi-frame grouping and feedback network
Song et al. Dual perceptual loss for single image super-resolution using esrgan
CN114119694A (en) Improved U-Net based self-supervision monocular depth estimation algorithm
CN113379606A (en) Face super-resolution method based on pre-training generation model
CN112991167A (en) Aerial image super-resolution reconstruction method based on layered feature fusion network
Xing et al. A small object detection solution by using super-resolution recovery
CN116188778A (en) Double-sided semantic segmentation method based on super resolution
Pang et al. Video super-resolution using a hierarchical recurrent multireceptive-field integration network
CN114881849A (en) Depth image super-resolution reconstruction method combining monocular depth estimation
CN113538456B (en) Image soft segmentation and background replacement system based on GAN network
Song et al. Transformer-Based Video Deinterlacing Method
Su et al. Image Denoising Algorithm Based on Multi-Scale Fusion and Adaptive Attention Mechanism
Han et al. Dual discriminators generative adversarial networks for unsupervised infrared super-resolution
Fkih et al. Super-Resolution of UAVs Thermal Images Guided by Visible Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant