CN115564655A - Video super-resolution reconstruction method, system and medium based on deep learning - Google Patents
- Publication number
- CN115564655A (application CN202211392882.1A)
- Authority
- CN
- China
- Prior art keywords
- video
- module
- super
- resolution
- ith
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
The invention relates to a video super-resolution reconstruction method, system and medium based on deep learning, in the technical field of video processing. The method comprises the following steps: inputting each frame image of a video to be processed into a super-resolution model to obtain a super-resolution image corresponding to each frame image of the video to be processed; and obtaining a super-resolution video corresponding to the video to be processed according to the super-resolution image corresponding to each frame image. The super-resolution model is obtained by training a BasicVSR model with the video to be trained as input, the super-resolution video corresponding to the video to be trained as output, and minimization of a frequency loss function as the training objective; the forward and backward branches of the BasicVSR model each include a GDFN module. The invention can improve the quality of high-resolution video images.
Description
Technical Field
The invention relates to the technical field of video processing, in particular to a method, a system and a medium for reconstructing video super-resolution based on deep learning.
Background
Resolution is a set of performance parameters that measures how much detail an image contains, including temporal resolution, spatial resolution and gray-level resolution, and it reflects the ability of an imaging system to capture the detail of a scene. Compared with low-resolution images, high-resolution images typically offer greater pixel density, richer texture detail and higher fidelity. In practice, however, an ideal high-resolution image with sharp edges and no blocking or blurring cannot be obtained directly, owing to constraints such as the acquisition equipment and environment, the network transmission medium and bandwidth, and the video degradation model itself. The most direct way to improve image resolution is to improve the optical hardware of the acquisition system, but because the manufacturing process is hard to improve substantially and manufacturing costs are very high, solving the problem of low image resolution purely in hardware is usually too expensive.
Video super-resolution reconstruction refers to restoring a given low-resolution video into a corresponding high-resolution video through a specific algorithm. Compared with single-image super-resolution, video super-resolution can exploit information from adjacent frames to achieve a better result. Traditional super-resolution algorithms, such as interpolation, blur the edges of the high-resolution video frames and give poor results.
Disclosure of Invention
The invention aims to provide a video super-resolution reconstruction method, a system and a medium based on deep learning, which can improve the quality of a high-resolution video image.
In order to achieve the purpose, the invention provides the following scheme:
a video super-resolution reconstruction method based on deep learning comprises the following steps:
constructing a super-resolution model; the super-resolution model is obtained by training a BasicVSR model with the image corresponding to each frame of a video to be trained as input, the super-resolution image corresponding to each frame of the video to be trained as output, and minimization of a frequency loss function as the training objective; the forward branch and the backward branch of the BasicVSR model each include a GDFN module;
acquiring a video to be processed;
inputting each frame image of the video to be processed into the super-resolution model to obtain a super-resolution image corresponding to each frame image of the video to be processed;
and obtaining a super-resolution video corresponding to the video to be processed according to the super-resolution image corresponding to each frame of image of the video to be processed.
Optionally, the BasicVSR model includes a forward branch, a backward branch and an up-sampling branch; the output ends of the forward branch and the backward branch are connected with the input end of the up-sampling branch.
Optionally, the forward branch includes N forward propagation modules; the backward branch includes N backward propagation modules; the up-sampling branch includes N up-sampling modules; N is a positive integer greater than 1;
the first input end of the ith forward propagation module is connected with the first output end of the (i-1) th forward propagation module; a second input end of the ith forward propagation module is used for inputting an ith frame image and an (i-1) th frame image of the video to be processed; a first output end of the ith forward propagation module is connected with a first input end of the (i + 1) th forward propagation module; a second output end of the ith forward propagation module is connected with a first input end of the ith up-sampling module;
the first input end of the ith backward propagation module is connected with the first output end of the (i+1)th backward propagation module; the second input end of the ith backward propagation module is used for inputting the ith frame image and the (i+1)th frame image of the video to be processed; the first output end of the ith backward propagation module is connected with the first input end of the (i-1)th backward propagation module; and the second output end of the ith backward propagation module is connected with the second input end of the ith up-sampling module.
Optionally, the forward propagation module and the backward propagation module each include an optical flow estimation module, a spatial warping module and a depth residual block, and the optical flow estimation module, the spatial warping module, the GDFN module and the depth residual block are sequentially connected.
Optionally, the frequency loss function is specifically:

L_freq = (‖F(Î) - F(I)‖^2 + ε^2)^α

wherein L_freq represents the frequency loss function, Î represents the image generated by inputting the video to be trained into the BasicVSR model, I represents the super-resolution image corresponding to the video to be trained, ε represents a first constant, α represents a second constant, F(Î) represents the fast Fourier transform of Î, and F(I) represents the fast Fourier transform of I.
A video super-resolution reconstruction system based on deep learning comprises:
the construction module is used for constructing a super-resolution model; the super-resolution model is obtained by training a BasicVSR model with the image corresponding to each frame of a video to be trained as input, the super-resolution image corresponding to each frame of the video to be trained as output, and minimization of a frequency loss function as the training objective; the forward branch and the backward branch of the BasicVSR model both include GDFN modules;
the acquisition module is used for acquiring a video to be processed;
the super-resolution image determining module is used for inputting each frame image of the video to be processed into the super-resolution model to obtain a super-resolution image corresponding to each frame image of the video to be processed;
and the super-resolution video determination module is used for obtaining a super-resolution video corresponding to the video to be processed according to the super-resolution image corresponding to each frame image of the video to be processed.
Optionally, the BasicVSR model includes a forward branch, a backward branch and an up-sampling branch; the output ends of the forward branch and the backward branch are connected with the input end of the up-sampling branch.
Optionally, the forward branch includes N forward propagation modules; the backward branch includes N backward propagation modules; the up-sampling branch includes N up-sampling modules; N is a positive integer greater than 1;
the first input end of the ith forward propagation module is connected with the first output end of the (i-1) th forward propagation module; a second input end of the ith forward propagation module is used for inputting an ith frame image and an (i-1) th frame image of the video to be processed; a first output end of the ith forward propagation module is connected with a first input end of the (i + 1) th forward propagation module; a second output end of the ith forward propagation module is connected with a first input end of the ith up-sampling module;
the first input end of the ith backward propagation module is connected with the first output end of the (i+1)th backward propagation module; the second input end of the ith backward propagation module is used for inputting the ith frame image and the (i+1)th frame image of the video to be processed; the first output end of the ith backward propagation module is connected with the first input end of the (i-1)th backward propagation module; and the second output end of the ith backward propagation module is connected with the second input end of the ith up-sampling module.
Optionally, the forward propagation module and the backward propagation module each include an optical flow estimation module, a spatial warping module and a depth residual block, and the optical flow estimation module, the spatial warping module, the GDFN module and the depth residual block are sequentially connected.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the deep-learning-based video super-resolution reconstruction method described above.
According to the specific embodiments provided herein, the invention discloses the following technical effects: the invention uses the GDFN module to achieve a better feature fusion effect and can thereby improve the quality of high-resolution video images.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a video super-resolution reconstruction method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a detailed architecture diagram of the BasicVSR model;
FIG. 3 is a detailed block diagram of a forward propagation module;
FIG. 4 is a detailed block diagram of the back propagation module;
FIG. 5 is a detailed block diagram of a GDFN module;
fig. 6 is a detailed block diagram of the video super-resolution system.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
With the rise of deep learning, video super-resolution technology based on deep learning has been developing rapidly. The invention provides a video super-resolution reconstruction method based on deep learning. The super-resolution model of the invention uses a recurrent network architecture to pass information between video frames, uses a GDFN module to improve the effect of feature fusion, and adds a frequency loss function to optimize the network, so that the super-resolution model combines good performance, a low parameter count and high computational efficiency.
The embodiment of the invention provides a video super-resolution reconstruction method based on deep learning, which comprises the following steps:
step 101: constructing a hyper-resolution model; the super-resolution model is obtained by training a BasicVSR model by taking an image corresponding to each frame of a video to be trained as input, taking a super-resolution image corresponding to each frame of the video to be trained as output and taking the minimum frequency loss function as a target; the forward and backward branches of the basicsvsr model each include a GDFN module.
Step 102: and acquiring a video to be processed.
Step 103: and inputting each frame image of the video to be processed into the super-resolution model to obtain a super-resolution image corresponding to each frame image of the video to be processed.
Step 104: and obtaining a super-resolution video corresponding to the video to be processed according to the super-resolution image corresponding to each frame of image of the video to be processed.
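Steps 101 to 104 amount to a simple per-frame loop at inference time. The sketch below illustrates the shape of that loop with a stand-in upscaler in place of the trained model; the function name, the stand-in model and the 4x scale factor are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def super_resolve_video(frames, model):
    """Steps 101-104 as a loop: feed each frame of the video to be processed
    into the (here hypothetical) super-resolution model, then assemble the
    per-frame outputs back into a video."""
    return np.stack([model(f) for f in frames])

# Stand-in "model": 4x nearest-neighbour upscaling in place of the trained network.
toy_model = lambda f: np.repeat(np.repeat(f, 4, axis=0), 4, axis=1)
video = [np.zeros((8, 8)) for _ in range(5)]
sr_video = super_resolve_video(video, toy_model)  # 5 frames of 32x32 output
```

In the patent's actual model the frames are not independent: the bidirectional branches described below pass features between them.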
In practical application, the BasicVSR model comprises a forward branch, a backward branch and an up-sampling branch; and the output ends of the forward branch and the backward branch are connected with the input end of the up-sampling branch.
In practical applications, as shown in fig. 2, the forward branch includes N forward propagation modules; the backward branch includes N backward propagation modules; the up-sampling branch includes N up-sampling modules; N is a positive integer greater than 1.
The first input end of the ith forward propagation module is connected with the first output end of the (i-1)th forward propagation module and is used for inputting the forward propagation feature h_(i-1)^f of the (i-1)th frame image output by the (i-1)th forward propagation module. The second input end of the ith forward propagation module is used for inputting the ith frame image x_i and the (i-1)th frame image x_(i-1) of the video to be processed. The first output end of the ith forward propagation module is connected with the first input end of the (i+1)th forward propagation module and is used for outputting the forward propagation feature h_i^f of the ith frame image. The second output end of the ith forward propagation module is connected with the first input end of the ith up-sampling module, to which it also outputs h_i^f.

The first input end of the ith backward propagation module is connected with the first output end of the (i+1)th backward propagation module and is used for inputting the backward propagation feature h_(i+1)^b of the (i+1)th frame image output by the (i+1)th backward propagation module. The second input end of the ith backward propagation module is used for inputting the ith frame image x_i and the (i+1)th frame image x_(i+1) of the video to be processed. The first output end of the ith backward propagation module is connected with the first input end of the (i-1)th backward propagation module and is used for outputting the backward propagation feature h_i^b of the ith frame image. The second output end of the ith backward propagation module is connected with the second input end of the ith up-sampling module, to which it outputs h_i^b. The output end of the ith up-sampling module outputs the super-resolution image hr_i corresponding to the ith frame image.
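The wiring above describes a recurrent chain: each propagation module consumes the previous module's feature and the current frame, and hands a new feature both to the next module and to its up-sampling module. A minimal sketch of one such branch, with a toy averaging update standing in for the warp, GDFN fusion and residual computation of the actual modules:

```python
import numpy as np

def propagate(frames, direction="forward"):
    """Toy version of one propagation branch: the ith module receives the
    previous module's feature on its first input and the current frame on its
    second input, and emits a new feature on both of its outputs."""
    n = len(frames)
    feats = [None] * n
    h = np.zeros_like(frames[0])            # initial propagation feature
    order = range(n) if direction == "forward" else range(n - 1, -1, -1)
    for i in order:
        # stands in for: optical-flow warp, GDFN fusion with frames[i], residual blocks
        h = 0.5 * (h + frames[i])
        feats[i] = h                        # second output, consumed by the ith up-sampling module
    return feats

frames = [np.full((2, 2), float(v)) for v in (1, 2, 3)]
fwd = propagate(frames, "forward")    # visits i = 0, 1, 2
bwd = propagate(frames, "backward")   # visits i = 2, 1, 0
```

The only structural point the sketch makes is the direction of the recurrence; the real update inside each module is described next.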
In practical applications, as shown in fig. 3 and 4, the forward propagation module and the backward propagation module each include an optical flow estimation module, a spatial warping module, and a depth residual block, and the optical flow estimation module, the spatial warping module, the GDFN module, and the depth residual block are connected in sequence.
Taking the ith forward propagation module as an example, the workflow of forward propagation is as follows: first, the optical flow estimation module computes the forward optical flow s_i^f between x_(i-1) and x_i; s_i^f is then used to spatially warp h_(i-1)^f, giving the feature of the (i-1)th frame aligned with the ith frame; the GDFN module fuses this aligned feature with x_i to obtain a fused feature, which is fed into the depth residual block to obtain h_i^f.
Taking the ith backward propagation module as an example, the workflow of backward propagation is as follows: first, the optical flow estimation module computes the backward optical flow s_i^b between x_(i+1) and x_i; s_i^b is then used to spatially warp h_(i+1)^b, giving the feature of the (i+1)th frame aligned with the ith frame; the GDFN module fuses this aligned feature with x_i to obtain a fused feature, which is fed into the depth residual block to obtain h_i^b.
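The spatial-warp alignment step in both workflows is backward warping of a feature map by the estimated optical flow. A minimal single-channel bilinear-warping sketch (an illustration of the operation, not the patent's implementation) is:

```python
import numpy as np

def flow_warp(img, flow):
    """Backward-warp a single-channel image by a per-pixel flow field with
    bilinear sampling: output[y, x] samples img at (y + flow_y, x + flow_x)."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)   # sampling coordinates, clipped
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = sx - x0, sy - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

img = np.arange(9, dtype=float).reshape(3, 3)
shift = np.zeros((3, 3, 2))
shift[..., 0] = 1.0                  # a constant flow of +1 pixel in x
warped = flow_warp(img, shift)       # columns shift left by one (border clamped)
```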
Then h_i^f and h_i^b are fused to obtain the final feature map, which is up-sampled by the pixel-shuffle technique and passed through the reconstruction network to obtain the final high-resolution video.
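The pixel-shuffle up-sampling mentioned above rearranges channels into spatial positions. A small sketch of the operation, laid out the same way as the standard pixel shuffle used in sub-pixel convolution:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r), as used by
    the up-sampling branch (same layout as torch.nn.functional.pixel_shuffle)."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

Each output pixel block of size r by r is filled from r*r consecutive input channels, so resolution grows without interpolation.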
In practical application, the specific structure of the GDFN module is shown in fig. 5. The feature fusion module GDFN uses depth-wise convolution to encode information from spatially adjacent pixel positions, which helps the network learn to fuse features effectively. After a normalization operation (Norm), the input feature is split into two parts along the channel dimension; each part is passed through a 1x1 convolution and a 3x3 convolution, one branch is activated by a GELU activation function and multiplied element-wise with the other branch, and after a 1x1 convolution restores the channel count, the result is added to the original input to obtain the final output.
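The gating at the heart of this description (splitting the feature along channels, applying GELU to one branch, and taking the element-wise product with the other) can be sketched as follows; the surrounding convolutions, normalization and residual connection are omitted:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def gdfn_gate(x):
    """Gating core of the GDFN block: split the (C, H, W) feature along the
    channel axis, pass one half through GELU, and multiply element-wise with
    the other half. The 1x1/3x3 convolutions and residual add are omitted."""
    c = x.shape[0] // 2
    return gelu(x[:c]) * x[c:]
```

The gate lets one branch decide, per position, how much of the other branch's signal passes through, which is what gives the fusion its selectivity.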
In practical applications, the invention uses the frequency loss function to help recover more image detail. With commonly used loss functions, the model tends to over-smooth the video frames in order to reduce the loss value, and the details lost in this way correspond to the high-frequency components of the signal. The frequency loss function therefore reduces the difference in frequency space, yielding clearer and sharper video. The frequency loss function is specifically:
L_freq = (‖F(Î) - F(I)‖^2 + ε^2)^α

wherein L_freq represents the frequency loss function, Î represents the image generated by inputting the video to be trained into the BasicVSR model, I represents the super-resolution image corresponding to the video to be trained, ε represents a first constant, α represents a second constant, F(Î) represents the fast Fourier transform of Î, and F(I) represents the fast Fourier transform of I.
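Because the original equation is only partially recoverable from the text, the sketch below shows one plausible Charbonnier-style reading of the frequency loss, with `eps` and `alpha` standing in for the first and second constants; it is an illustration consistent with the symbols defined above, not the patent's definitive formula:

```python
import numpy as np

def frequency_loss(pred, target, eps=1e-3, alpha=0.5):
    """One possible reading of the frequency loss: take the 2D FFT of both
    images and apply a Charbonnier-style penalty (constants eps and alpha)
    to the spectral difference."""
    diff = np.fft.fft2(pred) - np.fft.fft2(target)
    return np.mean((np.abs(diff) ** 2 + eps ** 2) ** alpha)
```

Identical images give the floor value set by eps, and any mismatch (including a pure brightness shift, which lands in the DC bin) raises the loss.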
The embodiment of the invention also provides a video super-resolution reconstruction system based on deep learning aiming at the method, which comprises the following steps:
the construction module is used for constructing a super-resolution model; the super-resolution model is obtained by training a BasicVSR model with the image corresponding to each frame of a video to be trained as input, the super-resolution image corresponding to each frame of the video to be trained as output, and minimization of a frequency loss function as the training objective; the forward and backward branches of the BasicVSR model each include a GDFN module.
And the acquisition module is used for acquiring the video to be processed.
And the super-resolution image determining module is used for inputting each frame image of the video to be processed into the super-resolution model to obtain a super-resolution image corresponding to each frame image of the video to be processed.
And the super-resolution video determination module is used for obtaining a super-resolution video corresponding to the video to be processed according to the super-resolution image corresponding to each frame image of the video to be processed.
In practical application, the BasicVSR model comprises a forward branch, a backward branch and an up-sampling branch; and the output ends of the forward branch and the backward branch are connected with the input end of the up-sampling branch.
In practical application, the forward branch includes N forward propagation modules; the backward branch includes N backward propagation modules; the up-sampling branch includes N up-sampling modules; N is a positive integer greater than 1.
The first input end of the ith forward propagation module is connected with the first output end of the (i-1) th forward propagation module; a second input end of the ith forward propagation module is used for inputting an ith frame image and an (i-1) th frame image of the video to be processed; a first output end of the ith forward propagation module is connected with a first input end of the (i + 1) th forward propagation module; and the second output end of the ith forward propagation module is connected with the first input end of the ith up-sampling module.
The first input end of the ith backward propagation module is connected with the first output end of the (i+1)th backward propagation module; the second input end of the ith backward propagation module is used for inputting the ith frame image and the (i+1)th frame image of the video to be processed; the first output end of the ith backward propagation module is connected with the first input end of the (i-1)th backward propagation module; and the second output end of the ith backward propagation module is connected with the second input end of the ith up-sampling module.
In practical applications, the forward propagation module and the backward propagation module each include an optical flow estimation module, a spatial warping module, and a depth residual block, and the optical flow estimation module, the spatial warping module, the GDFN module, and the depth residual block are connected in sequence.
The embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the method for reconstructing super-resolution video based on deep learning according to the above embodiment is implemented.
The embodiment of the invention also provides a video super-resolution system, described as follows:
in order to better show the hyper-parting performance of the model, the model is converted by using open neural network exchange (ONNX), so that the video hyper-parting task can be carried out in an environment without installing a model dependency library. The remove system interface is built using pyqt. As shown in fig. 6, in the figure, select is a Video selection button, click select selects a Video file to be processed, originalVideo is an original Video player, modified Video is a super-score Video player, model a, model b, and model c are super-score algorithm selection buttons, and press a button will process and play the Video using the corresponding super-score algorithm, so as to obtain the corresponding Video.
The method is an improvement on the BasicVSR model. Compared with the existing BasicVSR, it achieves a better feature fusion effect by using the GDFN module, and it uses a frequency loss function to reduce the loss of high-frequency components in the super-resolution result.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (10)
1. A video super-resolution reconstruction method based on deep learning is characterized by comprising the following steps:
constructing a super-resolution model; the super-resolution model is obtained by training a BasicVSR model with the image corresponding to each frame of a video to be trained as input, the super-resolution image corresponding to each frame of the video to be trained as output, and minimization of a frequency loss function as the training objective; the forward and backward branches of the BasicVSR model each include a GDFN module;
acquiring a video to be processed;
inputting each frame image of the video to be processed into the super-resolution model to obtain a super-resolution image corresponding to each frame image of the video to be processed;
and obtaining a super-resolution video corresponding to the video to be processed according to the super-resolution image corresponding to each frame of image of the video to be processed.
2. The method for super-resolution reconstruction of videos based on deep learning of claim 1, wherein the BasicVSR model comprises a forward branch, a backward branch and an up-sampling branch; and the output ends of the forward branch and the backward branch are connected with the input end of the up-sampling branch.
3. The method for reconstructing super-resolution video based on deep learning of claim 2, wherein the forward branch comprises N forward propagation modules; the backward branch comprises N backward propagation modules; the up-sampling branch comprises N up-sampling modules; N is a positive integer greater than 1;
the first input end of the ith forward propagation module is connected with the first output end of the (i-1) th forward propagation module; a second input end of the ith forward propagation module is used for inputting an ith frame image and an (i-1) th frame image of the video to be processed; a first output end of the ith forward propagation module is connected with a first input end of the (i + 1) th forward propagation module; a second output end of the ith forward propagation module is connected with a first input end of the ith up-sampling module;
the first input end of the ith backward propagation module is connected with the first output end of the (i + 1) th backward propagation module; a second input end of the ith backward propagation module is used for inputting the ith frame image and the (i + 1) th frame image of the video to be processed; a first output end of the ith backward propagation module is connected with a first input end of the (i-1) th backward propagation module; and the second output end of the ith backward propagation module is connected with the second input end of the ith up-sampling module.
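The wiring of claims 2 and 3 can be sketched as a forward pass and a backward pass whose per-index features both feed an upsampling step (a toy sketch; `prop` and `upsample` are hypothetical stand-ins for one propagation module and one upsampling module, and the backward branch is assumed, as in BasicVSR, to read the (i+1)-th neighbouring frame):

```python
import numpy as np

def bidirectional_propagate(frames, prop, upsample):
    """Toy wiring of forward/backward branches feeding the upsampling branch.

    prop(hidden, cur, ref) stands in for one propagation module: it receives
    the hidden feature from the neighbouring module plus the current and
    reference frames. upsample(fwd, bwd) stands in for one upsampling module.
    """
    n = len(frames)
    zeros = np.zeros_like(frames[0])
    fwd = [None] * n
    h = zeros
    for i in range(n):                        # i-th module: frames i and i-1
        h = prop(h, frames[i], frames[i - 1] if i > 0 else zeros)
        fwd[i] = h
    bwd = [None] * n
    h = zeros
    for i in range(n - 1, -1, -1):            # i-th module: frames i and i+1
        h = prop(h, frames[i], frames[i + 1] if i < n - 1 else zeros)
        bwd[i] = h
    # i-th upsampling module fuses the i-th forward and backward features.
    return [upsample(f, b) for f, b in zip(fwd, bwd)]

outs = bidirectional_propagate(
    [np.full((4, 4), float(t)) for t in range(3)],
    prop=lambda h, cur, ref: h + cur,         # toy propagation: accumulate
    upsample=lambda f, b: f + b,              # toy fusion of the two branches
)
```

Only the connection pattern is claimed; the real modules perform optical flow estimation, warping, GDFN filtering, and residual refinement rather than the toy arithmetic above.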
4. The method as claimed in claim 3, wherein the forward propagation module and the backward propagation module each include an optical flow estimation module, a spatial warping module, and a depth residual block, and the optical flow estimation module, the spatial warping module, the GDFN module, and the depth residual block are connected in sequence.
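Inside each propagation module of claim 4, the optical-flow step aligns the reference frame's features before the GDFN and residual blocks. The warping step alone can be sketched like this (an integer-offset toy; real spatial warping modules use sub-pixel bilinear sampling, and the flow field here is hypothetical):

```python
import numpy as np

def warp(feat, flow):
    """Backward-warp a feature map by an integer optical flow field (toy).

    feat: H x W array; flow: H x W x 2 integer (dy, dx) offsets giving, for
    each output pixel, where to sample in the source feature map. Offsets
    are clipped at the border so every sample stays inside the map.
    """
    h, w = feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + flow[..., 0], 0, h - 1)
    src_x = np.clip(xs + flow[..., 1], 0, w - 1)
    return feat[src_y, src_x]

f = np.arange(9.0).reshape(3, 3)
shift = np.zeros((3, 3, 2), dtype=int)
shift[..., 1] = 1            # every pixel samples from one column to the right
warped = warp(f, shift)
```

In the claimed module this warped feature would then pass through the GDFN module and the depth residual block in sequence.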
5. The method for reconstructing video super-resolution based on deep learning of claim 1, wherein the frequency loss function is specifically:
L_freq = (|F(Î) − F(I)|² + ε²)^α
wherein L_freq represents the frequency loss function, Î represents the image generated by inputting the video to be trained into the BasicVSR model, I represents the super-resolution image corresponding to the video to be trained, ε represents a first constant, α represents a second constant, F(Î) represents the fast Fourier transform of Î, and F(I) represents the fast Fourier transform of I.
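A loss of this shape, i.e. a robust Charbonnier-type penalty applied to the FFT difference, can be sketched as follows (the exact functional form, and the values of the constants ε and α, are assumptions for illustration; the claim only names the quantities involved):

```python
import numpy as np

def frequency_loss(pred, target, eps=1e-3, alpha=0.5):
    """Charbonnier-style loss in the frequency domain (sketch of claim 5).

    pred, target: H x W arrays. Both images are mapped to the frequency
    domain with a 2-D FFT, and the magnitude of the complex spectrum
    difference is penalised with the robust (x^2 + eps^2)^alpha form,
    where eps and alpha play the roles of the claim's first and second
    constants.
    """
    diff = np.fft.fft2(pred) - np.fft.fft2(target)  # complex spectrum diff
    return np.mean((np.abs(diff) ** 2 + eps ** 2) ** alpha)

x = np.random.rand(16, 16)
```

Penalising the spectrum rather than raw pixels pushes the model to recover high-frequency detail that per-pixel losses tend to blur away.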
6. A video super-resolution reconstruction system based on deep learning is characterized by comprising:
the construction module is used for constructing a super-resolution model; the super-resolution model is obtained by training a BasicVSR model with the image corresponding to each frame of a video to be trained as input, the super-resolution image corresponding to each frame of the video to be trained as output, and minimization of a frequency loss function as the training objective; the forward branch and the backward branch of the BasicVSR model each include a GDFN module;
the acquisition module is used for acquiring a video to be processed;
the super-resolution image determining module is used for inputting each frame image of the video to be processed into the super-resolution model to obtain a super-resolution image corresponding to each frame image of the video to be processed;
the super-resolution video determination module is used for obtaining a super-resolution video corresponding to the video to be processed according to the super-resolution image corresponding to each frame image of the video to be processed.
7. The deep learning-based video super-resolution reconstruction system of claim 6, wherein the BasicVSR model comprises a forward branch, a backward branch and an upsampling branch; the output ends of the forward branch and the backward branch are connected with the input end of the up-sampling branch.
8. The deep learning-based video super-resolution reconstruction system according to claim 7, wherein the forward branch comprises N forward propagation modules; the backward branch comprises N backward propagation modules; the upsampling branch comprises N upsampling modules; N is a positive integer greater than 1;
the first input end of the ith forward propagation module is connected with the first output end of the (i-1) th forward propagation module; a second input end of the ith forward propagation module is used for inputting an ith frame image and an (i-1) th frame image of the video to be processed; a first output end of the ith forward propagation module is connected with a first input end of the (i + 1) th forward propagation module; a second output end of the ith forward propagation module is connected with a first input end of the ith up-sampling module;
the first input end of the ith backward propagation module is connected with the first output end of the (i + 1) th backward propagation module; a second input end of the ith backward propagation module is used for inputting the ith frame image and the (i + 1) th frame image of the video to be processed; a first output end of the ith backward propagation module is connected with a first input end of the (i-1) th backward propagation module; and the second output end of the ith backward propagation module is connected with the second input end of the ith up-sampling module.
9. The deep learning-based video super-resolution reconstruction system of claim 8, wherein the forward propagation module and the backward propagation module each comprise an optical flow estimation module, a spatial warping module and a depth residual block, and the optical flow estimation module, the spatial warping module, the GDFN module and the depth residual block are connected in sequence.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the deep learning-based video super-resolution reconstruction method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211392882.1A CN115564655A (en) | 2022-11-08 | 2022-11-08 | Video super-resolution reconstruction method, system and medium based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115564655A true CN115564655A (en) | 2023-01-03 |
Family
ID=84769542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211392882.1A Pending CN115564655A (en) | 2022-11-08 | 2022-11-08 | Video super-resolution reconstruction method, system and medium based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115564655A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118037549A (en) * | 2024-04-11 | 2024-05-14 | 华南理工大学 | Video enhancement method and system based on video content understanding |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zeng et al. | Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time | |
CN109903228B (en) | Image super-resolution reconstruction method based on convolutional neural network | |
Kim et al. | Deep sr-itm: Joint learning of super-resolution and inverse tone-mapping for 4k uhd hdr applications | |
CN108022212B (en) | High-resolution picture generation method, generation device and storage medium | |
CN111028150B (en) | Rapid space-time residual attention video super-resolution reconstruction method | |
CN111898701B (en) | Model training, frame image generation and frame insertion methods, devices, equipment and media | |
KR101137753B1 (en) | Methods for fast and memory efficient implementation of transforms | |
AU2011216119B2 (en) | Generic platform video image stabilization | |
CN110222758B (en) | Image processing method, device, equipment and storage medium | |
CN111260560B (en) | Multi-frame video super-resolution method fused with attention mechanism | |
US7418130B2 (en) | Edge-sensitive denoising and color interpolation of digital images | |
US9462220B2 (en) | Auto-regressive edge-directed interpolation with backward projection constraint | |
CN112581361B (en) | Training method of image style migration model, image style migration method and device | |
CN113096013B (en) | Blind image super-resolution reconstruction method and system based on imaging modeling and knowledge distillation | |
CN111784570A (en) | Video image super-resolution reconstruction method and device | |
CN110717868A (en) | Video high dynamic range inverse tone mapping model construction and mapping method and device | |
CN116681584A (en) | Multistage diffusion image super-resolution algorithm | |
KR20200132682A (en) | Image optimization method, apparatus, device and storage medium | |
CN115564655A (en) | Video super-resolution reconstruction method, system and medium based on deep learning | |
CN117333398A (en) | Multi-scale image denoising method and device based on self-supervision | |
CN114926336A (en) | Video super-resolution reconstruction method and device, computer equipment and storage medium | |
CN116895037A (en) | Frame insertion method and system based on edge information and multi-scale cross fusion network | |
CN116208812A (en) | Video frame inserting method and system based on stereo event and intensity camera | |
CN115841523A (en) | Double-branch HDR video reconstruction algorithm based on Raw domain | |
CN115170402A (en) | Frame insertion method and system based on cyclic residual convolution and over-parameterized convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||