CN117114992A - Real-time super-resolution method and device - Google Patents

Real-time super-resolution method and device

Info

Publication number
CN117114992A
Authority
CN
China
Prior art keywords
resolution
current frame
low
data
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311082020.3A
Other languages
Chinese (zh)
Other versions
CN117114992B (en)
Inventor
王锐
霍宇驰
钟智华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangguangyun Hangzhou Technology Co ltd
Zhejiang University ZJU
Original Assignee
Guangguangyun Hangzhou Technology Co ltd
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangguangyun Hangzhou Technology Co ltd, Zhejiang University ZJU filed Critical Guangguangyun Hangzhou Technology Co ltd
Priority to CN202311082020.3A priority Critical patent/CN117114992B/en
Publication of CN117114992A publication Critical patent/CN117114992A/en
Application granted granted Critical
Publication of CN117114992B publication Critical patent/CN117114992B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a real-time super-resolution method and device, comprising: an input module for inputting high-resolution data and current frame low-resolution data containing illumination information, where the high-resolution data is auxiliary information that can be obtained during the rendering process; a prediction module for predicting, through a prediction network, a high-resolution rendering result containing illumination information corresponding to the current frame low-resolution data from the input high-resolution data and current frame low-resolution data; and an output module for outputting the high-resolution rendering result of the current frame. The method and device make efficient use of high-resolution information while preserving network execution speed, and improve super-resolution imaging quality.

Description

Real-time super-resolution method and device
Technical Field
The invention belongs to the technical field of graphics applications, and in particular relates to a real-time super-resolution method and device.
Background
With the development of real-time graphics software and hardware, the demand for overall GPU computing power has grown exponentially, far outpacing the rate at which GPU hardware can improve. Higher output-device resolutions and refresh rates, together with the market's rising expectations for the realism of real-time rendering, have greatly increased the amount of computation a real-time graphics application must perform, to a degree that modern GPU hardware struggles to sustain. Algorithms that let graphics applications render at a low resolution and then restore the result to a high-resolution output are therefore becoming a necessary choice for modern real-time graphics applications.
In real-time graphics applications, rendering resolution is a sampling-rate problem: rendering directly at the native resolution already suffers from aliasing caused by undersampling. According to the Nyquist sampling theorem, the sampling frequency must exceed twice the ideal low-pass cutoff frequency for the original signal to be recovered. Modern real-time scenes are rich in fine detail, so one sample per pixel is far from sufficient to recover the high-frequency content, and rendering at a lower resolution means using less than one sample per output pixel, which makes aliasing even worse. One way to compensate for the insufficient sampling rate is to exploit the temporal information of a real-time application: the samples of historical frames are reprojected to the current frame position and blended with the current-frame samples in some manner, making up for the current frame's low sampling rate and producing the final output. However, because both the scene content and the viewpoint usually change at every moment, directly reusing historical samples introduces artifacts such as ghosting or flickering. Heuristic methods can suppress these artifacts, but they cause a loss of detail and reduce the quality of the final output. Combined with the artificial-intelligence techniques that have advanced rapidly in recent years, exploiting temporal information through a neural network has become a highly promising option. Existing neural networks, however, concentrate on the quality of the final result, whereas for real-time applications the running time is the key factor affecting real-time performance; how to balance running time against result quality is therefore essential.
Most existing methods use only low-resolution samples as input, which makes it difficult to restore detail in high-frequency regions; without additional high-resolution information they essentially output only a blurred result in those regions. Other methods that do take additional high-resolution information as input sacrifice network performance when using it, greatly increasing the network running time.
Patent application No. 202111513198.X discloses a real-time super-resolution reconstruction method based on historical feature fusion: nonlinear feature extraction and feature fusion are performed on the current frame and historical frames to obtain a fused nonlinear feature map, and the super-resolution reconstruction of the current frame is obtained from the fused nonlinear feature map and the linear feature map of the current frame. Like conventional methods, it uses only the frame information of the current and historical frames; for the extremely difficult 4x super-resolution problem, the sampling rate of the low-resolution samples is too low for the method to perform the super-resolution task well.
Patent application No. 2022105841173 discloses a real-time super-resolution method using additional rendering information, which helps a convolutional neural network better predict high-resolution results by adding high-resolution G-Buffer information, and reduces running time by caching historical-frame feature values, improving real-time performance. However, the whole network must process high-resolution data directly; because of its early-upsampling network design, the running time is hard to optimize, and real-time performance cannot be guaranteed for tasks with very high target resolutions.
Disclosure of Invention
To overcome the above technical problems in the prior art, the invention provides a real-time super-resolution method and device that realize real-time high-resolution rendering from the current frame low-resolution data.
The embodiment of the invention provides a real-time super-resolution device, which comprises:
the input module is used for inputting high-resolution data and current frame low-resolution data containing illumination information, wherein the high-resolution data is auxiliary information which can be obtained in the rendering process;
the prediction module is used for predicting, through a prediction network, a high-resolution rendering result which corresponds to the current frame low-resolution data and contains illumination information, according to the input high-resolution data and the current frame low-resolution data;
the output module is used for outputting the high-resolution rendering result of the current frame.
In one embodiment, the prediction module adopts a bi-directional shuffling strategy to fuse data of different resolutions, specifically: the input high-resolution data is shuffle-downsampled, the downsampled result and the current frame low-resolution data are input into the prediction network together for prediction, and the prediction result is combined with the current frame low-resolution data and then shuffle-upsampled to obtain a shuffle upsampling result;
and the output module is used for fusing the shuffle upsampling result with the high-resolution data to obtain and output the high-resolution rendering result of the current frame.
In one embodiment, the apparatus further comprises:
the preprocessing module of the low-resolution data is used for encoding the current frame low-resolution data to obtain a current frame elementary feature map and a current frame hidden space feature map, reprojecting the cached historical frame hidden space feature map, combining the reprojection result with the historical frame attention weight calculated based on the historical frame data, and inputting the combined result and the current frame hidden space feature map into the prediction network;
in the prediction module, the prediction network predicts from the input combined result, the current frame hidden space feature map and the shuffle downsampling result of the high-resolution data, and the combination of the prediction result and the current frame elementary feature map is shuffle-upsampled to obtain the shuffle upsampling result.
In one embodiment, the prediction network includes a plurality of residual groups, and the input data is predicted by the plurality of residual groups to obtain a prediction result.
In one embodiment, the prediction module further includes another convolution network layer, configured to perform a convolution operation on the shuffle upsampling result; the shuffle upsampling result after the convolution operation is fused with the high-resolution data to obtain the high-resolution rendering result.
In one embodiment, the current frame low resolution data containing illumination information includes a low resolution incident illumination estimation map or a low resolution rendering result;
preferably, the current frame low resolution data further includes auxiliary information in a geometry buffer.
In one embodiment, the preprocessing module of the low resolution data includes an encoder, a feature reuse unit, an attention network, and a reprojection operation;
the encoder encodes the current frame low-resolution data, wherein the shallow encoding layers output the current frame elementary feature map and the encoder finally outputs the current frame hidden space feature map, which is input to the prediction network and cached in the feature reuse unit;
the characteristic reuse unit is used for caching the historical frame hidden space characteristic map;
the reprojection operation is used for extracting, from the feature reuse unit, the current frame hidden space feature map and its adjacent hidden space feature map as two historical frame hidden space feature maps, and reprojecting the two historical frame hidden space feature maps into the current screen space to obtain a reprojection result;
the attention network is used for calculating a historical frame attention weight based on the historical frame data; the historical frame attention weight is multiplied with the reprojection result, and the weighted result is input to the prediction network.
In one embodiment, the historical frame data includes at least one of a low resolution current frame depth map, a low resolution neighboring historical frame depth map neighboring the current frame, and a neighboring historical frame to current frame motion vector.
In one embodiment, the preprocessing module of the low-resolution data further includes a convolution network layer, configured to perform a convolution operation on the hidden space feature map of the current frame, and input the feature map after the convolution operation to the prediction network.
In one embodiment, each module needs to be trained to optimize its network parameters before being applied; during training, the sample data is cut into a plurality of slices with overlapping regions, and each module is trained in a supervised manner, using these slices, against the target high-resolution results.
To achieve the above object, an embodiment further provides a real-time super-resolution method, including the steps of:
utilizing the input module to input high-resolution data and current frame low-resolution data containing illumination information, wherein the high-resolution data is auxiliary information that can be obtained during the rendering process;
utilizing the prediction module to predict, through a prediction network, a high-resolution rendering result which corresponds to the current frame low-resolution data and contains illumination information, according to the input high-resolution data and the current frame low-resolution data;
and utilizing the output module to output the high-resolution rendering result of the current frame.
In one embodiment, the method further comprises the steps of:
utilizing the prediction module to shuffle-downsample the input high-resolution data, inputting the downsampled result and the current frame low-resolution data into the prediction network together for prediction, and combining the prediction result with the current frame low-resolution data and then shuffle-upsampling to obtain a shuffle upsampling result;
and utilizing the output module to fuse the shuffle upsampling result with the high-resolution data to obtain and output the high-resolution rendering result of the current frame.
Preferably, the method further comprises:
utilizing the preprocessing module of the low-resolution data to encode the current frame low-resolution data to obtain a current frame elementary feature map and a current frame hidden space feature map, reprojecting the cached historical frame hidden space feature map, combining the reprojection result with the historical frame attention weight calculated based on the historical frame data, and inputting the combined result and the current frame hidden space feature map into the prediction network;
utilizing the prediction network in the prediction module to predict from the input combined result, the current frame hidden space feature map and the shuffle downsampling result of the high-resolution data, combining the prediction result with the current frame elementary feature map and then shuffle-upsampling to obtain a shuffle upsampling result;
and utilizing the output module to fuse the shuffle upsampling result with the high-resolution data to obtain and output the high-resolution rendering result of the current frame.
Compared with the prior art, the beneficial effects of the invention include at least the following:
By introducing high-resolution data that can be generated quickly as an input guide, real-time super-resolution imaging is performed on the current frame low-resolution data containing illumination information under the guidance of the high-resolution data to obtain the final output high-resolution rendering result; the high-resolution information is used efficiently while the network execution speed is preserved, and the super-resolution imaging quality is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic structural diagram of a real-time super-resolution device according to an embodiment;
FIG. 2 is another schematic structural diagram of the real-time super-resolution device according to the embodiment;
FIG. 3 is a schematic structural diagram of a real-time super-resolution device according to an embodiment;
FIG. 4 is a schematic diagram of a detailed structure of a real-time super-resolution device according to an embodiment;
FIG. 5 is a schematic diagram of the structure of a residual group provided by the embodiment;
FIG. 6 is a schematic diagram of an encoder structure of a residual group provided by an embodiment;
FIG. 7 is a schematic diagram of the structure of an attention network provided by an embodiment;
FIG. 8 is a schematic diagram of the bi-directional shuffling strategy provided by the embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
The real-time super-resolution device and method provided by the embodiments of the invention introduce high-resolution data that can be generated quickly as an input guide; guided by this high-resolution data, the current frame low-resolution data containing illumination information is super-resolved in real time, and the high-resolution information is used efficiently while the network execution speed is preserved. The device is designed for the rendering pipelines of modern graphics applications; whether a pipeline uses rasterization or ray tracing, it can attach the device without modifying its underlying rendering logic.
FIG. 1 is a schematic structural diagram of a real-time super-resolution device according to an embodiment; as shown in fig. 1, the real-time super-resolution rendering acceleration device provided by the embodiment includes an input module, a prediction module, and an output module.
In one embodiment, the input module is configured to input high-resolution data and current frame low-resolution data containing illumination information; the prediction module is used for predicting, through a prediction network, a high-resolution rendering result which corresponds to the current frame low-resolution data and contains illumination information from the high-resolution data and the current frame low-resolution data; and the output module is used for outputting the high-resolution rendering result of the current frame.
The input high-resolution data is auxiliary information that can be obtained during the rendering process, usually geometry-buffer contents that can be generated quickly; in an embodiment, the high-resolution data may be bidirectional reflectance distribution function (BRDF) components, albedo (reflectivity), and normal vector information. The current frame low-resolution data containing illumination information comprises a low-resolution incident illumination estimate map or a low-resolution rendering result. The low-resolution rendering result is the final output of the rendering pipeline and matches the output high-resolution rendering result in data content, but it is far cheaper to produce than a high-resolution rendering result, since the rendering pipeline only needs to run at low resolution. In an embodiment, the current frame low-resolution data further includes auxiliary information in the geometry buffer, such as a normal vector map and a depth map.
The main function of the prediction network is to combine the input high-resolution data, which serves as a guide, with the current frame low-resolution data and predict the high-resolution rendering result of the current frame; the prediction network can be built from any combination of U-Net, DenseNet, and ResNet.
In one embodiment, as shown in FIG. 2, the prediction module uses a bi-directional shuffling strategy to fuse data of different resolutions. Specifically, the input high-resolution data is shuffle-downsampled, the downsampled result and the current frame low-resolution data are fed into the prediction network together for prediction, and the prediction result is combined with the current frame low-resolution data and then shuffle-upsampled to obtain a shuffle upsampling result; the output module then fuses the shuffle upsampling result with the high-resolution data to obtain and output the high-resolution rendering result of the current frame. More specifically, the prediction module further comprises another convolutional network layer: the shuffle upsampling result is passed through this convolutional layer, and the convolved result is fused with the high-resolution data to obtain the high-resolution rendering result.
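As an illustration of how such a bi-directional shuffling strategy could be realized, the following minimal PyTorch sketch uses pixel_unshuffle for the shuffle downsampling and pixel_shuffle for the shuffle upsampling; the scale factor, channel widths and the small stand-in for the prediction network are assumptions made for the example, not the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiDirectionalShuffle(nn.Module):
    """Illustrative sketch: fuse high-resolution guidance with low-resolution data
    by shuffle-downsampling the guidance, predicting at low resolution, and
    shuffle-upsampling the combined result (scale factor s)."""
    def __init__(self, hr_channels, lr_channels, scale=4):
        super().__init__()
        self.scale = scale
        # after pixel_unshuffle the HR guidance has hr_channels * s^2 channels
        in_ch = hr_channels * scale * scale + lr_channels
        self.predict = nn.Sequential(            # stand-in for the prediction network
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, lr_channels, 3, padding=1))
        # combined (prediction + LR data) is mapped to s^2 * 3 channels before shuffle-up
        self.to_hr = nn.Conv2d(2 * lr_channels, 3 * scale * scale, 3, padding=1)

    def forward(self, hr_guidance, lr_frame):
        hr_down = F.pixel_unshuffle(hr_guidance, self.scale)          # shuffle downsampling
        pred = self.predict(torch.cat([hr_down, lr_frame], dim=1))    # predict at low resolution
        combined = torch.cat([pred, lr_frame], dim=1)                 # combine with LR data
        hr_feat = F.pixel_shuffle(self.to_hr(combined), self.scale)   # shuffle upsampling
        return hr_feat                                                # fused with HR data downstream
```

In this arrangement every convolution runs at the low resolution, which is what keeps the running time low while still exposing the high-resolution guidance to the network.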
In another embodiment, as shown in FIG. 3, the real-time super-resolution device further includes a preprocessing module for the low-resolution data. It encodes the current frame low-resolution data to obtain a current frame elementary feature map and a current frame hidden space feature map, reprojects the cached historical frame hidden space feature maps, combines the reprojection result with the historical frame attention weight calculated from the historical frame data, and inputs the combined result together with the current frame hidden space feature map to the prediction network. When the preprocessing module is introduced, the role of the prediction module also changes: the prediction network predicts from the input combined result, the current frame hidden space feature map, and the shuffle downsampling result of the high-resolution data, and the combination of the prediction result and the current frame elementary feature map is shuffle-upsampled to obtain the shuffle upsampling result.
As shown in FIGS. 4-7, the preprocessing module of the low-resolution data includes an encoder, a feature reuse unit, an attention network, a reprojection operation, and a convolutional network layer. The encoder encodes the current frame low-resolution data: its shallow layers output the current frame elementary feature map, and its final output is the current frame hidden space feature map. The current frame hidden space feature map is cached in the feature reuse unit and, after a convolution operation by the convolutional network layer, is input to the prediction network.
The feature reuse unit caches historical frame hidden space feature maps; a hidden space feature map stored there serves as a historical frame hidden space feature map for later frames.
The reprojection operation extracts, from the feature reuse unit, the current frame hidden space feature map and its adjacent hidden space feature map as two historical frame hidden space feature maps, and reprojects them into the current screen space to obtain a reprojection result.
The attention network calculates a historical frame attention weight A_k from the historical frame data; the attention weight A_k is multiplied with the reprojection result, and the weighted result is input to the prediction network. The historical frame data comprises a low-resolution current frame depth map, a low-resolution depth map of the historical frame adjacent to the current frame, and the motion vector from the adjacent historical frame to the current frame.
In an embodiment, the principle of shuffle upsampling and shuffle downsampling in the bi-directional shuffling strategy is shown in FIG. 8. The prediction network adopts a plurality of residual groups; as shown in FIG. 5, each residual group comprises a convolution layer (Conv), a ReLU activation, another convolution layer (Conv), and a residual skip connection, and the input data is processed through the residual groups to obtain the prediction result.
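Following the FIG. 5 description (Conv, ReLU, Conv and a residual skip connection), a residual group could be sketched as below; the channel count and the number of stacked groups are assumptions for the example.

```python
import torch.nn as nn

class ResidualGroup(nn.Module):
    """Conv -> ReLU -> Conv with a residual skip connection, as described for FIG. 5."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)   # residual skip connection

# the prediction network stacks several such groups (count assumed here)
prediction_backbone = nn.Sequential(*[ResidualGroup(64) for _ in range(4)])
```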
In the real-time super-resolution device provided by the embodiment, the bi-directional shuffling strategy and the structure of the prediction network jointly compress input data of different resolutions into a low-resolution feature representation, so that high-resolution information is used efficiently while the network execution speed is preserved; the rendering equation is rewritten in a reverse-modulation form so that the high-resolution data serves as the high-resolution basis of the output and can be exploited in a more compact form; and the attention network extracts the valid regions of historical frames so that they can be applied effectively.
In one embodiment, the real-time super-resolution device is incorporated into the rendering pipeline of a modern graphics application. Before the pipeline starts, the bidirectional reflectance distribution function component F_β(ω_o) is precomputed as the high-resolution data, according to:
F_β(ω_o) = ∫_Ω f_r(ω_i, ω_o) cos θ_i dω_i
where ω_i and ω_o are the incident and outgoing solid angles, f_r(ω_i, ω_o) is the bidirectional reflectance distribution function, and cos θ_i is the Lambert term. Based on F_β(ω_o), the incident illumination estimate is computed as:
Ẽ(ω_o) = ∫_Ω L(ω_i) f_r(ω_i, ω_o) cos θ_i dω_i / F_β(ω_o)
where L(ω_i) is the incident illumination term.
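Read together, the two formulas amount to a demodulation and re-modulation scheme: the low-resolution incident illumination estimate is the rendered radiance divided per pixel by F_β, and the network's high-resolution output is later multiplied back by the high-resolution F_β. A minimal sketch under that reading (the tensor layout and the epsilon guard against division by zero are assumptions for the example):

```python
import torch

def demodulate(radiance_lr, f_beta_lr, eps=1e-4):
    """Low-resolution incident illumination estimate: rendered radiance / F_beta (per pixel)."""
    return radiance_lr / (f_beta_lr + eps)

def remodulate(hr_feature, f_beta_hr):
    """Final high-resolution rendering result: network output * high-resolution F_beta."""
    return hr_feature * f_beta_hr
```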
At run time, the low-resolution incident illumination estimate map, normal vector map, and depth map of the current frame are rendered first and fed to the encoder, which outputs a current frame elementary feature map after only the first convolution layer and a current frame hidden space feature map after all seven convolution layers. Part of the current frame hidden space feature map is cached in the feature reuse unit so that it can be retrieved and used in future frames.
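A sketch of such an encoder is shown below, assuming seven 3x3 convolution layers, with the elementary feature map taken after the first layer and the hidden space feature map after the last; the input channel count and feature width are assumptions for the example.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Seven-layer convolutional encoder; the first layer yields the elementary
    feature map, the last layer the hidden space feature map."""
    def __init__(self, in_ch=7, width=32):   # e.g. illumination(3) + normal(3) + depth(1), assumed
        super().__init__()
        self.first = nn.Sequential(nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True))
        self.rest = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(6)])

    def forward(self, x):
        elementary = self.first(x)        # current frame elementary feature map
        hidden = self.rest(elementary)    # current frame hidden space feature map
        return elementary, hidden
```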
Next, two cached historical frame hidden space feature maps are taken out of the feature reuse unit and reprojected into the current frame screen space. The corresponding historical frame depth map, the current frame depth map, and the motion vectors from the historical frame to the current frame are fed into the attention network to obtain a historical frame attention map; the attention map is combined with the reprojection result, and the combination is input, together with the current frame hidden space feature map, to the prediction network.
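The reprojection and attention weighting could look roughly like the following sketch; the grid_sample-based warp, the normalized motion-vector convention, and the small sigmoid attention head are assumptions made for the example rather than the patent's exact operators.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reproject(history_feat, motion_vectors):
    """Warp a cached historical frame feature map into the current screen space.
    motion_vectors: (B, 2, H, W) offsets from the current frame back to the history frame,
    in normalized [-1, 1] coordinates (an assumed convention)."""
    b, _, h, w = history_feat.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2).to(history_feat)
    grid = base + motion_vectors.permute(0, 2, 3, 1)
    return F.grid_sample(history_feat, grid, align_corners=True)

class AttentionNet(nn.Module):
    """Per-pixel attention weight from current depth, history depth and motion vectors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 1 + 2, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, depth_cur, depth_hist, motion_vectors):
        return self.net(torch.cat([depth_cur, depth_hist, motion_vectors], dim=1))

# weighted history features fed to the prediction network:
#   weighted = attention(depth_cur, depth_hist, mv) * reproject(hist_feat, mv)
```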
Next, the graphics application renders the high-resolution data, which is shuffle-downsampled to the same resolution as the low-resolution images and fed into the prediction network together with them. The output of the prediction network is restored to a high-resolution feature result by shuffle upsampling. The high-resolution feature result is passed through a single-layer adjustment convolution network and then multiplied with the high-resolution data to obtain and output the final high-resolution rendering result.
The effects of the invention are illustrated with specific experiments. The hardware platform used in the experiments is an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00 GHz with 128 GB of memory and a Tesla V100S-PCIE-32GB GPU. The software platform is Ubuntu 18.04 (gcc 7.5.0-3ubuntu1~18.04), the conda 4.9.2 environment management system, Python 3.7, and the PyTorch GPU deep learning framework.
It should be noted that the computation time of the convolution layers in the encoder, the attention network, and the prediction network depends on the size of the input feature maps; training directly on full images would take too long. Therefore, during training the data is cut into a number of slices with overlapping regions, which both reduces the time of a single forward and backward pass and increases the number of training samples, improving the efficiency of network learning. Full-size images are used in the validation and test sets to accurately verify the effectiveness of the method.
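The slicing could be implemented along the lines of the sketch below; the patch size matches the low-resolution slice size mentioned later (128), while the overlap value and the handling of image borders are assumptions for the example.

```python
def cut_into_slices(image, patch=128, overlap=32):
    """Cut a (C, H, W) tensor into overlapping patches for training.
    Edge remainders smaller than a full patch are ignored for brevity."""
    stride = patch - overlap
    _, h, w = image.shape
    slices = []
    for top in range(0, max(h - patch, 0) + 1, stride):
        for left in range(0, max(w - patch, 0) + 1, stride):
            slices.append(image[:, top:top + patch, left:left + patch])
    return slices
```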
In addition, the experimental data used in the embodiments were produced with the UE4 open-source game engine. The scenes were built from the free UE4 scenes ZenGarden, Infiltrator, Kite, and VR Showdown, rendered at a target resolution of 3840×2160; the low-resolution input data was then rendered at 960×540. For the low-resolution rendering, multi-sample anti-aliasing (MSAA) is turned off and an additional offset is added to the texture mapping bias (mipmap bias) of texture sampling according to the following formula:
b' = b + log2(R_r / R_n)
where b' is the modified texture mapping bias, b is the original texture mapping bias computed from the object-to-camera distance, R_r is the resolution of the low-resolution input data, and R_n is the resolution of the high-resolution output result.
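As a small numerical check of this formula, with a low-resolution render at 960×540 and a 3840×2160 target, the added offset is log2(960/3840) = -2, i.e. texture sampling is biased two mip levels sharper. A one-line helper (hypothetical, for illustration only):

```python
import math

def adjusted_mipmap_bias(b, render_res, native_res):
    """b' = b + log2(R_r / R_n); negative when rendering at low resolution, which sharpens texture sampling."""
    return b + math.log2(render_res / native_res)

print(adjusted_mipmap_bias(0.0, 960, 3840))   # -> -2.0
```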
Finally, the precomputed BRDF component values are output at the native resolution with all anti-aliasing effects turned off, and serve as the additional high-resolution input of the method. The training data are cut into slices of 512×512 at high resolution and 128×128 at low resolution, with overlapping regions between slices, to improve the efficiency of network learning.
Specifically, the experimental steps of the real-time super-resolution method are as follows (a minimal sketch of the training loop is given after the list):
(1) The training samples are first cut into a number of small slices to form the training set.
(2) The parameters of the network are initialized randomly.
(3) A batch of slice training data is read.
(4) The read data is input into the network for training.
(5) Steps (3)-(4) are repeated until all data in the training set have been used.
(6) The current training result and the optimizer parameters are saved.
(7) A batch of full-image validation data is read.
(8) The read data is input into the network to compute a result image.
(9) The validation loss is computed from the result image; if the current validation loss is better than the previous best, the current result is recorded; if no better result appears after a number of further training rounds, the network is considered to have converged to a locally optimal result and the result at that point is saved.
(10) Steps (3)-(9) are repeated until the set epoch threshold is reached.
(11) All data in the test set are input into the network to compute the test-set results.
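Steps (1)-(11) could be organized roughly as in the following sketch, assuming a PyTorch DataLoader over the slices, an L1 loss, and Adam; the loss, optimizer, patience, and file names are illustrative assumptions rather than values given by the patent.

```python
import torch

def train(model, train_loader, val_loader, epochs=100, patience=10, lr=1e-4, device="cuda"):
    """Supervised training against target high-resolution results, with checkpointing
    on the best validation loss and simple early stopping (treated as convergence)."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.L1Loss()
    best_val, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for inputs, target_hr in train_loader:            # steps (3)-(5): slice batches
            inputs, target_hr = inputs.to(device), target_hr.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), target_hr)
            loss.backward()
            optimizer.step()
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, "last.pt")   # step (6)
        model.eval()
        val_loss, n = 0.0, 0
        with torch.no_grad():
            for inputs, target_hr in val_loader:          # steps (7)-(8): full-size images
                inputs, target_hr = inputs.to(device), target_hr.to(device)
                val_loss += criterion(model(inputs), target_hr).item()
                n += 1
        val_loss /= max(n, 1)
        if val_loss < best_val:                           # step (9)
            best_val, stale = val_loss, 0
            torch.save(model.state_dict(), "best.pt")
        else:
            stale += 1
            if stale >= patience:
                break
```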
The method was compared with existing methods on super-resolution imaging; the quality comparison is shown in Table 1 and the speed comparison in Table 2, from which the following observations can be made.
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are two commonly used image quality metrics. NSRR is the method described in the paper "Neural Supersampling for Real-time Rendering"; like the invention, it is a real-time super-resolution algorithm. EDSR and RCAN are the methods introduced in the papers "Enhanced Deep Residual Networks for Single Image Super-Resolution" and "Image Super-Resolution Using Very Deep Residual Channel Attention Networks", respectively; both target non-real-time single-frame super-resolution and serve as quality references for the method. FSR, XeSS, and TAAU are the real-time super-resolution solutions provided by AMD, Intel, and Unreal Engine, respectively.
TABLE 1
TABLE 2
Target resolution    720p       1080p      2k         4k
The invention        6.57 ms    8.44 ms    15.09 ms   34.94 ms
NSRR                 13.53 ms   26.29 ms   64.02 ms   149.20 ms
The foregoing detailed description of the preferred embodiments and the advantages of the invention is merely illustrative of the presently preferred embodiments of the invention; it is not intended to limit the invention, and any modifications, additions, substitutions, and equivalents made within the spirit and principles of the invention are intended to fall within the scope of the invention.

Claims (10)

1. A real-time super-resolution device, comprising:
the input module is used for inputting high-resolution data and current frame low-resolution data containing illumination information, wherein the high-resolution data is auxiliary information which can be obtained in the rendering process;
the prediction module is used for predicting, through a prediction network, a high-resolution rendering result which corresponds to the current frame low-resolution data and contains illumination information, according to the input high-resolution data and the current frame low-resolution data;
and the output module is used for outputting the high-resolution rendering result of the current frame.
2. The real-time super-resolution device according to claim 1, wherein the prediction module adopts a bi-directional shuffling strategy to fuse data of different resolutions, specifically: the input high-resolution data is shuffle-downsampled, the downsampled result and the current frame low-resolution data are input into the prediction network together for prediction, and the prediction result is combined with the current frame low-resolution data and then shuffle-upsampled to obtain a shuffle upsampling result;
and the output module is used for fusing the shuffle upsampling result with the high-resolution data to obtain and output the high-resolution rendering result of the current frame.
3. The real-time super-resolution device according to claim 1, further comprising:
the preprocessing module of the low-resolution data is used for encoding the current frame low-resolution data to obtain a current frame elementary feature map and a current frame hidden space feature map, reprojecting the cached historical frame hidden space feature map, combining the reprojection result with the historical frame attention weight calculated based on the historical frame data, and inputting the combined result and the current frame hidden space feature map into the prediction network;
in the prediction module, the prediction network predicts from the input combined result, the current frame hidden space feature map and the shuffle downsampling result of the high-resolution data, and the combination of the prediction result and the current frame elementary feature map is shuffle-upsampled to obtain the shuffle upsampling result.
4. A real-time super-resolution device according to claim 2 or 3, wherein the prediction network comprises a plurality of residual groups, and the input data is predicted by the plurality of residual groups to obtain a prediction result;
the prediction module further comprises another convolution network layer, used for performing a convolution operation on the shuffle upsampling result; the shuffle upsampling result after the convolution operation is fused with the high-resolution data to obtain the high-resolution rendering result.
5. The real-time super-resolution device according to claim 1, wherein the current frame low-resolution data containing illumination information includes a low-resolution incident illumination estimate map or a low-resolution rendering result;
preferably, the current frame low resolution data further includes auxiliary information in a geometry buffer.
6. The real-time super-resolution device according to claim 2, wherein the preprocessing module of the low-resolution data comprises an encoder, a feature reuse unit, an attention network, and a re-projection operation;
the encoder encodes the current frame low-resolution data, wherein the shallow encoding layers output the current frame elementary feature map and the encoder finally outputs the current frame hidden space feature map, which is input to the prediction network and cached in the feature reuse unit;
the characteristic reuse unit is used for caching the historical frame hidden space characteristic map;
the reprojection operation is used for extracting a current frame hidden space feature map and adjacent hidden space feature maps thereof from the feature reuse unit as two-frame historical frame hidden space feature maps, and reprojecting the two-frame historical frame hidden space feature maps to a current screen space to obtain a reprojection result;
the attention network is used for calculating a historical frame attention weight based on the historical frame data; the historical frame attention weight is multiplied with the reprojection result, and the weighted result is input to the prediction network.
7. The real-time super-resolution apparatus according to claim 1, wherein the history frame data includes at least one of a low-resolution current frame depth map, a low-resolution neighboring history frame depth map neighboring the current frame, and a neighboring history frame to current frame motion vector.
8. The real-time super-resolution device according to claim 2, wherein the preprocessing module of the low-resolution data further comprises a convolution network layer, configured to perform a convolution operation on the current frame hidden space feature map; the feature map after the convolution operation is input to the prediction network.
9. The real-time super-resolution device according to claim 1, wherein each module needs to be trained to optimize its network parameters before being applied; during training, the sample data is cut into a plurality of slices with overlapping regions, and each module is trained in a supervised manner, using these slices, against the target high-resolution results.
10. A real-time super-resolution method, comprising the steps of:
utilizing the input module to input high-resolution data and current frame low-resolution data containing illumination information, wherein the high-resolution data is auxiliary information that can be obtained during the rendering process;
utilizing the prediction module to predict, through a prediction network, a high-resolution rendering result which corresponds to the current frame low-resolution data and contains illumination information, according to the input high-resolution data and the current frame low-resolution data;
and utilizing the output module to output the high-resolution rendering result of the current frame.
CN202311082020.3A 2023-08-26 2023-08-26 Real-time super-resolution method and device Active CN117114992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311082020.3A CN117114992B (en) 2023-08-26 2023-08-26 Real-time super-resolution method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311082020.3A CN117114992B (en) 2023-08-26 2023-08-26 Real-time super-resolution method and device

Publications (2)

Publication Number Publication Date
CN117114992A true CN117114992A (en) 2023-11-24
CN117114992B (en) 2024-07-16

Family

ID=88810579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311082020.3A Active CN117114992B (en) 2023-08-26 2023-08-26 Real-time super-resolution method and device

Country Status (1)

Country Link
CN (1) CN117114992B (en)

Citations (5)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050169545A1 (en) * 2004-01-29 2005-08-04 Ratakonda Krishna C. System and method for the dynamic resolution change for video encoding
US20180137603A1 (en) * 2016-11-07 2018-05-17 Umbo Cv Inc. Method and system for providing high resolution image through super-resolution reconstruction
CN115731141A (en) * 2021-08-27 2023-03-03 中国科学院国家空间科学中心 Space-based remote sensing image space-time fusion method for dynamic monitoring of maneuvering target
CN114820327A (en) * 2022-05-27 2022-07-29 浙大城市学院 Real-time super-resolution method using extra rendering information based on convolutional neural network
CN116523743A (en) * 2023-03-28 2023-08-01 浙江大学 Game super-resolution method based on cyclic neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIHUA ZHONG et al.: "Neural Super-Resolution in Real-Time Rendering Using Auxiliary Feature Enhancement", Journal of Database Management, 15 June 2023, pages 1-13 *

Also Published As

Publication number Publication date
CN117114992B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
CN113284054B (en) Image enhancement method and image enhancement device
CN111105352B (en) Super-resolution image reconstruction method, system, computer equipment and storage medium
US11113790B2 (en) Adding greater realism to a computer-generated image by smoothing jagged edges
DE102019130889A1 (en) ESTIMATE THE DEPTH OF A VIDEO DATA STREAM TAKEN BY A MONOCULAR RGB CAMERA
CN112789631A (en) System and method for generating and transmitting image sequences based on sampled color information
DE102018110369A1 (en) IMPROVEMENT OF AUTONOMOUS MACHINES BY CLOUD, ERROR CORRECTION AND PREDICTION
US11496773B2 (en) Using residual video data resulting from a compression of original video data to improve a decompression of the original video data
US11436793B1 (en) Systems and methods for graphics rendering based on machine learning
DE102022113244A1 (en) Joint shape and appearance optimization through topology scanning
CN114820327A (en) Real-time super-resolution method using extra rendering information based on convolutional neural network
CN116757955A (en) Multi-fusion comparison network based on full-dimensional dynamic convolution
CN116778165A (en) Remote sensing image disaster detection method based on multi-scale self-adaptive semantic segmentation
Gao et al. Augmented weighted bidirectional feature pyramid network for marine object detection
CN112396657A (en) Neural network-based depth pose estimation method and device and terminal equipment
CN117575915A (en) Image super-resolution reconstruction method, terminal equipment and storage medium
Liu et al. Facial image inpainting using multi-level generative network
CN117391995B (en) Progressive face image restoration method, system, equipment and storage medium
Ren et al. A lightweight object detection network in low-light conditions based on depthwise separable pyramid network and attention mechanism on embedded platforms
CN117114992B (en) Real-time super-resolution method and device
DE102021114013A1 (en) TECHNIQUES FOR EFFICIENT SCANNING OF AN IMAGE
CN115953524B (en) Data processing method, device, computer equipment and storage medium
US20240257405A1 (en) Compression of texture sets using a non-linear function and quantization
CN116071478B (en) Training method of image reconstruction model and virtual scene rendering method
CN118333861B (en) Remote sensing image reconstruction method, system, device and medium
CN117252787B (en) Image re-illumination method, model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant