CN113344780A - Fully-known video super-resolution network, and video super-resolution reconstruction method and system - Google Patents
- Publication number
- CN113344780A (application CN202110549356.0A)
- Authority
- CN
- China
- Prior art keywords
- resolution
- network
- super
- fully
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T3/4076—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
Abstract
The invention discloses a fully-known video super-resolution network, a video super-resolution reconstruction method and a system. The fully-known video super-resolution network consists of two sub-networks: a precursor network and a successor network. First, a plurality of videos are selected as training samples; images are cropped from the same position in each video frame as high-resolution learning targets and downsampled by a factor of r to obtain low-resolution images used as the network input. Then, according to the type of the fully-known video super-resolution framework, the low-resolution frames are input into the precursor network in the forward direction (local fully-known type) or the backward direction (global fully-known type) to generate the hidden states and high-resolution structure information corresponding to all low-resolution frames. Next, the low-resolution frames and the hidden states obtained in the previous step are input into the successor network in sequence to further generate hidden states and high-resolution detail information. Finally, the high-resolution structure information and the detail information are added to obtain the finally reconstructed high-resolution video frames.
Description
Technical Field
The invention belongs to the technical field of digital image processing, and relates to a fully-known video super-resolution network, a video super-resolution reconstruction method and a system.
Background
In recent years, with the development of science and technology, video has become an increasingly important information carrier in people's lives. However, owing to hardware limitations and transmission bandwidth, high-resolution video has not yet been widely popularized. Video super-resolution technology can reconstruct a corresponding high-resolution video from a low-resolution video, and is widely applied in fields such as video surveillance, satellite remote sensing and video conferencing.
Most existing internationally leading video super-resolution methods focus on designing ever more complex network structures to better fit the mapping from the low-resolution space to the high-resolution space, while neglecting the design of the video super-resolution framework. Yet the framework is the foundation of a video super-resolution algorithm: for the same network model, a poor framework cannot fully exploit the model's potential, whereas a good framework can bring out its full performance.
Existing video super-resolution network frameworks can be summarized into three types: iterative network frameworks, recurrent network frameworks, and hybrid network frameworks. The iterative framework treats only low-resolution video frames as processing objects: it generates the high-resolution central frame from a given central frame and its surrounding video frames (usually 1 to 3 frames before and after), and processes the entire video sequence iteratively in a sliding-window fashion. The recurrent framework uses the past and current low-resolution frames, together with past super-resolution results, as information sources, while disregarding future low-resolution frames. The hybrid framework integrates the information sources of both, but still does not fully cover the information embedded in the video sequence.
Disclosure of Invention
In order to solve the technical problems, the invention provides a fully-known video super-resolution network, a video super-resolution reconstruction method and a system. The core idea is to take the past, present and future low-resolution video frames and the intermediate results (hidden states) generated in the super-resolution process as information sources to fully mine the time-space domain related information contained in the video sequence.
The invention provides a fully-known video super-resolution network which consists of two sub-networks, namely a precursor network and a successor network;
according to the processing directions of a precursor network and a successor network, the fully-known video super-resolution network is divided into a local fully-known video super-resolution network and a global fully-known video super-resolution network;
the processing directions of a precursor network and a successor network of the local fully-aware video super-resolution network are the same and are both forward; the processing process of the precursor network of the local fully-aware video super-resolution network comprises the following steps:
wherein the content of the first and second substances,a low resolution video frame representing a previous time instant, a current time instant and a next time instant,is a hidden state at the previous moment, NetpA representation of a precursor network is shown,is in a hidden state at the current moment,super-resolution video frame structure information generated for a precursor network at the current moment;
the processing directions of a precursor network and a successor network of the global fully-aware video super-resolution network are opposite, wherein the direction of the precursor network is backward, and the direction of the successor network is forward; the processing process of the pioneer network of the global fully-aware video super-resolution network comprises the following steps:
the subsequent network processing process comprises:
whereinIs a hidden state generated by the precursor network at the current time and the next time, andit is the last hidden state, Net, of the subsequent network itselfsThe representation of the subsequent network is shown,is a hidden state at the current time, andit is the super-resolution video frame detail information generated by the subsequent network at the current moment.
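The precursor/successor recurrences above can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: net_p and net_s are hypothetical toy stand-ins for the two sub-networks (the invention deliberately leaves their internal structure open), frames are plain NumPy arrays, and the spatial upscaling by the factor r is omitted for brevity.

```python
import numpy as np

def net_p(x_prev, x_cur, x_next, h):
    # Hypothetical precursor network: any model with this input/output
    # signature fits the framework; a toy linear update stands in here.
    h_new = 0.5 * (h + x_cur)
    sr_struct = x_cur + 0.25 * (x_prev + x_next)  # HR structure information
    return h_new, sr_struct

def net_s(x_prev, x_cur, x_next, h, hp_cur, hp_next):
    # Hypothetical successor network: also consumes the precursor hidden
    # states of the current and next instants.
    h_new = (h + hp_cur + hp_next) / 3.0
    sr_detail = x_cur - h_new                     # HR detail information
    return h_new, sr_detail

def omniscient_vsr(frames, mode="local"):
    """Run the precursor pass, then the successor pass, over a sequence.

    mode="local":  precursor forward, successor forward.
    mode="global": precursor backward, successor forward.
    """
    T = len(frames)
    at = lambda t: frames[min(max(t, 0), T - 1)]  # replicate border frames

    # Precursor pass: hidden states + structure information for all frames.
    order = range(T) if mode == "local" else range(T - 1, -1, -1)
    h = np.zeros_like(frames[0])
    hp, struct = [None] * T, [None] * T
    for t in order:
        h, struct[t] = net_p(at(t - 1), at(t), at(t + 1), h)
        hp[t] = h

    # Successor pass (always forward): inherits precursor hidden states,
    # then structure and detail are added to form the final output.
    h = np.zeros_like(frames[0])
    out = []
    for t in range(T):
        h, detail = net_s(at(t - 1), at(t), at(t + 1),
                          h, hp[t], hp[min(t + 1, T - 1)])
        out.append(struct[t] + detail)
    return out
```

Note that because the precursor pass finishes before the successor pass starts, the successor can freely use the precursor hidden state of the *next* instant, which is what distinguishes this framework from purely causal recurrent designs.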
The method adopts the technical scheme that: a video super-resolution reconstruction method comprises the following steps:
step 1: selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images, which serve as the input of the fully-known video super-resolution network;
step 2: if the fully-known video super-resolution network is a local fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the forward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
and step 3: inputting the low-resolution frames and the hidden states obtained in step 2 into the successor network in sequence to further generate hidden states and high-resolution detail information;
and step 4: adding the high-resolution structure information and the detail information generated in steps 2 and 3 to obtain the finally reconstructed high-resolution video frames.
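Step 1 of the method can be illustrated as follows. This is a hedged sketch under stated assumptions: the patent does not specify the downsampling kernel, so r×r average pooling stands in as one plausible choice, and the hypothetical crop parameters (top, left, size) simply fix the same region across all frames, as the text requires.

```python
import numpy as np

def make_training_pair(frames, top, left, size, r):
    """Crop the same region from every frame (HR targets) and
    downsample each crop by a factor of r (LR network inputs)."""
    hr, lr = [], []
    for f in frames:
        patch = f[top:top + size, left:left + size]
        # r x r average pooling as one illustrative downsampling choice.
        h, w = patch.shape[0] // r, patch.shape[1] // r
        small = patch[:h * r, :w * r].reshape(h, r, w, r).mean(axis=(1, 3))
        hr.append(patch)
        lr.append(small)
    return hr, lr
```

In practice a bicubic kernel is the common choice for building super-resolution training pairs; any fixed degradation works with the framework, since only the r-fold size relationship between targets and inputs matters here.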
The technical scheme adopted by the system of the invention is as follows: a video super-resolution reconstruction system, comprising the following modules:
the module 1 is used for selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images as the input of the fully-known video super-resolution network;
the module 2 is used for inputting the low-resolution frames into the precursor network in the forward direction if the fully-known video super-resolution network is a local fully-known video super-resolution network, generating the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
a module 3, configured to input the low-resolution frames and the hidden states obtained in module 2 into the successor network in sequence, further generating hidden states and high-resolution detail information;
and a module 4, configured to add the high-resolution structure information and the detail information generated in modules 2 and 3 to obtain the finally reconstructed high-resolution video frames.
The invention first uses the precursor network to perform a coarse pass over the video frames, generating the hidden states and high-resolution structure information corresponding to the low-resolution video frames. The successor network then inherits the hidden states generated by the precursor network and further generates the hidden states and high-resolution detail information corresponding to each low-resolution video frame. Finally, the high-resolution structure information and the high-resolution detail information are added to obtain the finally reconstructed high-resolution video frames.
Drawings
Fig. 1 is a fully-known video super-resolution network framework diagram according to an embodiment of the present invention.
FIG. 2 is a flow chart of a method according to an embodiment of the present invention.
Detailed Description
To facilitate understanding and implementation of the present invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described herein are merely illustrative and explanatory of the invention and are not restrictive of it.
Referring to fig. 1, the fully-known video super-resolution network provided by the present invention is composed of two sub-networks: a precursor network and a successor network;
according to the processing directions of a precursor network and a successor network, the fully-known video super-resolution network is divided into a local fully-known video super-resolution network and a global fully-known video super-resolution network;
the processing directions of a precursor network and a successor network of the local fully-known video super-resolution network are the same and are both forward; the processing process of the precursor network of the local fully-aware video super-resolution network comprises the following steps:
wherein the content of the first and second substances,a low resolution video frame representing a previous time instant, a current time instant and a next time instant,is a hidden state at the previous moment, NetpA representation of a precursor network is shown,is in a hidden state at the current moment,the current time is firstDriving super-resolution video frame structure information generated by a network;
the processing directions of a precursor network and a successor network of the global fully-known video super-resolution network are opposite, wherein the direction of the precursor network is backward, and the direction of the successor network is forward; the processing process of the pioneer network of the global fully-aware video super-resolution network comprises the following steps:
the subsequent network processing procedure is as follows:
whereinIs a hidden state generated by the precursor network at the current time and the next time, andit is the last hidden state, Net, of the subsequent network itselfsThe representation of the subsequent network is shown,is a hidden state at the current time, andit is the super-resolution video frame detail information generated by the subsequent network at the current moment.
It should be noted that the invention designs a fully-known video super-resolution network comprising two sub-networks, a precursor network and a successor network, but does not prescribe their specific internal structures: a network of any structure can serve as the precursor or successor network and be integrated into the fully-known video super-resolution network designed by the invention, as long as its input and output forms satisfy formulas (1), (2) and (3).
Referring to fig. 2, the method for reconstructing super-resolution video provided by the present invention includes the following steps:
step 1: selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images, which serve as the input of the fully-known video super-resolution network;
step 2: if the fully-known video super-resolution network is a local fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the forward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
the invention adopts a pioneer network firstly, and the pioneer network NetpThe information source of processing consecutive video frames in forward or backward order includes two aspects: low resolution video frames at previous, current and next momentsAnd the hidden state generated in the process of exceeding the mark at the last momentOrBased on these two types of information, the precursor network cyclically generates all the low resolution video framesCorresponding hidden stateAnd high resolution video frame structure information
and step 3: inputting the low-resolution frames and the hidden states obtained in step 2 into the successor network in sequence to further generate hidden states and high-resolution detail information;
The invention first adopts the precursor network Net_p, which processes consecutive video frames in forward or backward order. Its information sources include two aspects: the low-resolution video frames x_{t-1}, x_t and x_{t+1} at the previous, current and next instants, and the hidden state h_{t-1}^p (or h_{t+1}^p) generated during the super-resolution process at the adjacent instant. Based on these two types of information, the precursor network recurrently generates, for all low-resolution video frames x_t, the corresponding hidden states h_t^p and the high-resolution video frame structure information SR_t^p.
Because the precursor network has generated the hidden states h_t^p corresponding to all low-resolution video frames, the successor network can inherit this hidden-state information, so its information sources include three aspects: the low-resolution video frames x_{t-1}, x_t and x_{t+1} at the previous, current and next instants; the hidden state h_{t-1}^s generated by the successor network itself at the previous instant; and the hidden states h_t^p and h_{t+1}^p inherited from the precursor network at the current and next instants. Based on this information, the successor network further generates, in a refined manner, the hidden states h_t^s and the high-resolution video frame detail information SR_t^s corresponding to all low-resolution video frames. In summary, the successor network can use the low-resolution frames and hidden states of the previous, current and next instants, and can thus fully exploit the time-space domain information contained in the video sequence.
and step 4: adding the high-resolution structure information and the detail information generated in steps 2 and 3 to obtain the finally reconstructed high-resolution video frames.
The precursor network first processes the low-resolution video frames coarsely to generate the high-resolution structure information SR_t^p; the successor network then processes the low-resolution video frames in a refined manner to generate the high-resolution detail information SR_t^s, which contains the high-frequency details. Adding the two therefore yields the finally reconstructed high-resolution video frame SR_t.
The final super-resolution video frame output is:

SR_t = SR_t^p + SR_t^s    (4)

where SR_t is the final super-resolution video frame output, SR_t^s represents the high-resolution video frame detail information generated by the successor network, and SR_t^p represents the high-resolution video frame structure information generated by the precursor network.
In this embodiment, a loss function is further constructed to constrain the precursor network and the overall framework respectively, thereby optimizing the performance of the network model.
A variant of the L1 loss function is used to constrain the finally reconstructed high-resolution video frame SR_t to be close to the real high-resolution video frame HR_t, while also constraining the high-resolution video frame structure information SR_t^p to be close to the real high-resolution video frame; the relative weight is adjusted by a parameter α to balance the precursor network and the successor network.
The loss function constructed in this embodiment is:

L = Σ_{t=1}^{T} √(‖HR_t − SR_t‖² + ε²) + α · Σ_{t=1}^{T} √(‖HR_t − SR_t^p‖² + ε²)    (5)

where HR_t represents the real high-resolution video frame, SR_t represents the finally generated super-resolution video frame, and SR_t^p represents the high-resolution video frame structure information generated by the precursor network; T is the number of frames, ε is a small constant, typically set to 10^-3, and α is a weight adjusting the relative contribution of the precursor network.
The method can make full use of the intra-frame spatial correlation and the inter-frame temporal correlation contained in a video sequence, generating high-fidelity video while maintaining a high processing speed.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (5)
1. A fully-known video super-resolution network, characterized in that it consists of two sub-networks: a precursor network and a successor network;
according to the processing directions of a precursor network and a successor network, the fully-known video super-resolution network is divided into a local fully-known video super-resolution network and a global fully-known video super-resolution network;
the processing directions of a precursor network and a successor network of the local fully-aware video super-resolution network are the same and are both forward; the processing process of the precursor network of the local fully-aware video super-resolution network comprises the following steps:
wherein the content of the first and second substances,a low resolution video frame representing a previous time instant, a current time instant and a next time instant,is a hidden state at the previous moment, NetpA representation of a precursor network is shown,is in a hidden state at the current moment,super-resolution video frame structure information generated for a precursor network at the current moment;
the processing directions of a precursor network and a successor network of the global fully-aware video super-resolution network are opposite, wherein the direction of the precursor network is backward, and the direction of the successor network is forward; the processing process of the pioneer network of the global fully-aware video super-resolution network comprises the following steps:
the subsequent network processing process comprises:
whereinIs a hidden state generated by the precursor network at the current time and the next time, andit is the last hidden state, Net, of the subsequent network itselfsThe representation of the subsequent network is shown,is a hidden state at the current time, andit is the super-resolution video frame detail information generated by the subsequent network at the current moment.
2. A video super-resolution reconstruction method is characterized by comprising the following steps:
step 1: selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images, which serve as the input of the fully-known video super-resolution network;
step 2: if the fully-known video super-resolution network is a local fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the forward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
and step 3: inputting the low-resolution frames and the hidden states obtained in step 2 into the successor network in sequence to further generate hidden states and high-resolution detail information;
and step 4: adding the high-resolution structure information and the detail information generated in steps 2 and 3 to obtain the finally reconstructed high-resolution video frames.
3. The video super-resolution reconstruction method of claim 2, wherein in step 4 the super-resolution video frames generated by the precursor network and the successor network are added to obtain the final super-resolution video frame, the final super-resolution video frame output being:

SR_t = SR_t^p + SR_t^s

where SR_t^p is the high-resolution structure information generated by the precursor network and SR_t^s is the high-resolution detail information generated by the successor network.
4. The video super-resolution reconstruction method according to claim 2 or 3, characterized in that: a loss function is constructed to constrain the precursor network and the fully-known video super-resolution network respectively, so as to optimize the performance of the fully-known video super-resolution network;
the constructed loss function is:

L = Σ_{t=1}^{T} √(‖HR_t − SR_t‖² + ε²) + α · Σ_{t=1}^{T} √(‖HR_t − SR_t^p‖² + ε²)

where HR_t represents the real high-resolution video frame, SR_t represents the finally generated super-resolution video frame, and SR_t^p represents the high-resolution video frame structure information generated by the precursor network; T is the number of frames, ε is a small constant, and α is a weight adjusting the relative contribution of the precursor network.
5. The video super-resolution reconstruction system is characterized by comprising the following modules:
the module 1 is used for selecting a plurality of video data as training samples, cropping images from the same position in each video frame as high-resolution learning targets, and downsampling them by a factor of r to obtain low-resolution images as the input of the fully-known video super-resolution network;
the module 2 is used for inputting the low-resolution frames into the precursor network in the forward direction if the fully-known video super-resolution network is a local fully-known video super-resolution network, generating the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
if the fully-known video super-resolution network is a global fully-known video super-resolution network, inputting the low-resolution frames into the precursor network in the backward direction to generate the hidden states and high-resolution structure information corresponding to all the low-resolution frames;
a module 3, configured to input the low-resolution frames and the hidden states obtained in module 2 into the successor network in sequence, further generating hidden states and high-resolution detail information;
and a module 4, configured to add the high-resolution structure information and the detail information generated in modules 2 and 3 to obtain the finally reconstructed high-resolution video frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110549356.0A CN113344780A (en) | 2021-05-20 | 2021-05-20 | Fully-known video super-resolution network, and video super-resolution reconstruction method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110549356.0A CN113344780A (en) | 2021-05-20 | 2021-05-20 | Fully-known video super-resolution network, and video super-resolution reconstruction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113344780A (en) | 2021-09-03
Family
ID=77469699
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110549356.0A Pending CN113344780A (en) | 2021-05-20 | 2021-05-20 | Fully-known video super-resolution network, and video super-resolution reconstruction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113344780A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180085002A1 (en) * | 2013-03-11 | 2018-03-29 | Carestream Dental Technology Topco Limited | Method and System For Three-Dimensional Imaging |
CN109102462A (en) * | 2018-08-01 | 2018-12-28 | 中国计量大学 | A kind of video super-resolution method for reconstructing based on deep learning |
CN110706155A (en) * | 2019-09-12 | 2020-01-17 | 武汉大学 | Video super-resolution reconstruction method |
- 2021-05-20 CN CN202110549356.0A patent/CN113344780A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180085002A1 (en) * | 2013-03-11 | 2018-03-29 | Carestream Dental Technology Topco Limited | Method and System For Three-Dimensional Imaging |
CN109102462A (en) * | 2018-08-01 | 2018-12-28 | 中国计量大学 | A kind of video super-resolution method for reconstructing based on deep learning |
CN110706155A (en) * | 2019-09-12 | 2020-01-17 | 武汉大学 | Video super-resolution reconstruction method |
Non-Patent Citations (1)
Title |
---|
PENG YI et al.: "Omniscient Video Super-Resolution", arXiv:2103.15683v1 [eess.IV] * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109636721B (en) | Video super-resolution method based on countermeasure learning and attention mechanism | |
CN111242883A (en) | Dynamic scene HDR reconstruction method based on deep learning | |
CN110942424A (en) | Composite network single image super-resolution reconstruction method based on deep learning | |
CN115115516B (en) | Real world video super-resolution construction method based on Raw domain | |
CN114245007A (en) | High frame rate video synthesis method, device, equipment and storage medium | |
CN112836652A (en) | Multi-stage human body posture estimation method based on event camera | |
CN116091288A (en) | Diffusion model-based image steganography method | |
Yang et al. | Image super-resolution reconstruction based on improved Dirac residual network | |
CN110634101A (en) | Unsupervised image-to-image conversion method based on random reconstruction | |
CN113344780A (en) | Fully-known video super-resolution network, and video super-resolution reconstruction method and system | |
CN112215140A (en) | 3-dimensional signal processing method based on space-time countermeasure | |
CN113610707A (en) | Video super-resolution method based on time attention and cyclic feedback network | |
CN112950498A (en) | Image defogging method based on countermeasure network and multi-scale dense feature fusion | |
CN113379606A (en) | Face super-resolution method based on pre-training generation model | |
CN116091337B (en) | Image enhancement method and device based on event signal nerve coding mode | |
CN104182931B (en) | Super resolution method and device | |
Sun et al. | ESinGAN: Enhanced single-image GAN using pixel attention mechanism for image super-resolution | |
CN116385265B (en) | Training method and device for image super-resolution network | |
CN116958192A (en) | Event camera image reconstruction method based on diffusion model | |
CN113538225A (en) | Model training method, image conversion method, device, equipment and storage medium | |
US20230186608A1 (en) | Method, device, and computer program product for video processing | |
CN106131567B (en) | Ultraviolet aurora up-conversion method of video frame rate based on Lattice Boltzmann | |
Hengl et al. | SAGA vs GRASS: a comparative analysis of the two open source desktop GIS for the automated analysis of elevation data | |
Zhou et al. | Mixed Attention Densely Residual Network for Single Image Super-Resolution. | |
Qin et al. | Remote sensing image super-resolution using multi-scale convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210903 ||