CN112950470B - Video super-resolution reconstruction method and system based on time domain feature fusion


Info

Publication number: CN112950470B
Application number: CN202110217175.8A
Authority: CN (China)
Prior art keywords: sequence, feature, resolution, super
Other languages: Chinese (zh)
Other versions: CN112950470A
Inventors: Jun Xu (徐君), Gang Xu (许刚), Ming-Ming Cheng (程明明)
Assignee (current and original): Nankai University
Filing/priority date: 2021-02-26
Legal status: Active (granted)

Classifications

    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G06F 18/253: Fusion techniques of extracted features
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06T 2207/10016: Video; image sequence


Abstract

The invention provides a video super-resolution reconstruction method and system based on time domain feature fusion. The method acquires the image sequence of a video and extracts image sequence features to obtain an initial feature sequence; performs local time domain feature fusion on the features in the initial feature sequence to obtain a local feature sequence, fusing each non-boundary feature in the initial feature sequence with its two nearest neighboring features, and fusing each of the two boundary features with a copy of itself and its one nearest neighboring feature; inputs the local feature sequence into a bidirectional deformable convolutional long short-term memory network, supplementing each feature in the local feature sequence with features from the global range to obtain a global feature sequence; and extracts super-resolution features from the global feature sequence, adds them correspondingly to the initial feature sequence, extracts high-resolution upsampling features from the feature-added sequence, and obtains the final high-resolution reconstructed image sequence through a convolutional neural network.

Description

Video super-resolution reconstruction method and system based on time domain feature fusion
Technical Field
The invention belongs to the field of video super-resolution reconstruction, and particularly relates to a video super-resolution reconstruction method and system based on time domain feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Due to the rapid development of liquid crystal display (LCD) and light-emitting diode (LED) technology in recent years, displays on the market today can play ultra-high-definition television video at 4K UHD (3840 × 2160) or 8K (7680 × 4320) resolution. However, currently available video typically uses a full-high-definition resolution of 2K FHD (1920 × 1080). To play full-high-definition video on an ultra-high-definition television, the spatial resolution of the video must be raised to that required by the ultra-high-definition broadcasting standard. Video super-resolution reconstruction techniques have therefore been proposed to process low-resolution video into high-resolution video, alleviating the current shortage of high-resolution video resources. The technology has been widely applied in multimedia devices such as televisions and mobile phones; for example, mobile phone manufacturers use it to improve the definition of phone imaging.
Traditional video super-resolution methods generally adopt interpolation-based motion compensation or frequency-domain discrete Fourier transform techniques to improve video resolution. However, both techniques apply only to image reconstruction under translational motion and cannot handle more complex motion scenes.
Owing to the rapid development of deep learning, super-resolution reconstruction methods based on convolutional neural networks can reconstruct images accurately and robustly by learning pattern information from massive data. Although the industry can improve the spatial resolution of a video frame by super-resolving each single-frame image, an unstable "flicker" phenomenon follows, caused by fluctuations in reconstruction quality across frames. Therefore, video super-resolution reconstruction methods that improve the reconstruction of video images through an information fusion mechanism in the time domain have been developed; for example, EDVR, proposed by Xintao Wang et al., and the recurrent back-projection network (RBPN), proposed by Muhammad Haris et al., both improve the reconstruction effect through local feature fusion.
However, the inventors find that these previous methods fuse features only within a local time window and ignore global features; this defect causes problems such as loss of image detail and thus degrades the super-resolution reconstruction effect.
Disclosure of Invention
To solve at least one technical problem in the background art, the invention provides a video super-resolution reconstruction method and system based on time domain feature fusion, which effectively screen local features and effectively fuse global features, so that local and global features complement and integrate with each other organically, improving the super-resolution reconstruction effect of the image.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a video super-resolution reconstruction method based on time domain feature fusion.
A video super-resolution reconstruction method based on time domain feature fusion comprises the following steps:
acquiring an image sequence of a video, and extracting image sequence characteristics to obtain an initial characteristic sequence;
performing local time domain feature fusion on the features in the initial feature sequence to obtain a local feature sequence, wherein each non-boundary feature in the initial feature sequence is fused with its two nearest neighboring features, and each of the two boundary features is fused with a copy of itself and its one nearest neighboring feature;
inputting the local feature sequence into a bidirectional deformable convolutional long short-term memory network, and supplementing each feature in the local feature sequence with features from the global range to obtain a global feature sequence;
extracting super-resolution features from the global feature sequence, adding them correspondingly to the initial feature sequence, extracting high-resolution upsampling features from the feature-added sequence, and finally obtaining the final high-resolution reconstructed image sequence through a convolutional neural network.
The invention provides a video super-resolution reconstruction system based on time domain feature fusion.
A video super-resolution reconstruction system based on time domain feature fusion comprises:
the initial feature extraction module is used for acquiring an image sequence of a video and extracting image sequence features to obtain an initial feature sequence;
the local feature fusion module is used for performing local time domain feature fusion on the features in the initial feature sequence to obtain a local feature sequence, wherein each non-boundary feature in the initial feature sequence is fused with its two nearest neighboring features, and each of the two boundary features is fused with a copy of itself and its one nearest neighboring feature;
the global feature fusion module is used for inputting the local feature sequence into a bidirectional deformable convolutional long short-term memory network and supplementing each feature in the local feature sequence with features from the global range to obtain a global feature sequence;
and the super-resolution reconstruction module is used for extracting super-resolution features from the global feature sequence, adding them correspondingly to the initial feature sequence, extracting high-resolution upsampling features from the feature-added sequence, and finally obtaining the final high-resolution reconstructed image sequence through a convolutional neural network.
A third aspect of the invention provides a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for super-resolution reconstruction of video based on temporal feature fusion as set forth above.
A fourth aspect of the invention provides a computer apparatus.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the video super-resolution reconstruction method based on temporal feature fusion as described above.
Compared with the prior art, the invention has the beneficial effects that:
the invention establishes a competitive feature fusion mechanism for the features of the local time domain, thereby screening out the features which have larger influence on the final super-resolution reconstruction effect, and then, the effective feature information which runs through the whole video sequence is transmitted and supplemented in the global range through the global time domain features, thereby effectively improving the reconstruction effect of each frame.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a flow chart of a video super-resolution reconstruction method based on time domain feature fusion according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a local time domain feature fusion process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the process of global temporal feature fusion according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Example one
To solve the problem that prior convolutional-neural-network-based video super-resolution reconstruction techniques cannot effectively utilize local and global information in a video, which leads to poor video reconstruction quality, this embodiment provides a video super-resolution reconstruction method based on time domain feature fusion that screens out effective features in the local time domain and then uses complementary features from the global time domain to improve the super-resolution reconstruction of each frame in the video sequence.
Referring to fig. 1, the present embodiment provides a video super-resolution reconstruction method based on temporal feature fusion, which specifically includes the following steps:
s101: and acquiring an image sequence of the video, and extracting image sequence characteristics to obtain an initial characteristic sequence.
In one implementation, given the image sequence {I_1, I_2, …, I_n} of a video at a set resolution, which consists of n frames, first-stage feature extraction is performed on each frame through a 3 × 3 convolution and a Leaky ReLU activation function, and the features in the extracted feature sequence are then input one by one into 5 residual blocks for further feature extraction, yielding the initial feature sequence {F_1, F_2, …, F_n}.
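A minimal PyTorch sketch of this first-stage extraction follows. The 64-channel feature width, the 0.1 Leaky ReLU slope, and the plain conv-act-conv residual block design are illustrative assumptions, since the text does not fix these hyper-parameters.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Plain residual block: conv -> Leaky ReLU -> conv with an identity skip."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class InitialFeatureExtractor(nn.Module):
    """3x3 conv + Leaky ReLU followed by 5 residual blocks (per S101)."""
    def __init__(self, in_channels=3, channels=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, channels, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
        )
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(5)])

    def forward(self, frames):
        # frames: (n, 3, H, W) -- the n video frames, processed one by one
        return self.blocks(self.head(frames))

# Usage: extract the initial feature sequence {F_1, ..., F_n} from a toy 7-frame clip
frames = torch.randn(7, 3, 64, 64)
feats = InitialFeatureExtractor()(frames)   # (7, 64, 64, 64)
```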
S102: performing local time domain feature fusion on features in the initial feature sequence to obtain a local feature sequence; fusing non-boundary features in the initial feature sequence with two features which are most adjacent to the non-boundary features; for the boundary features in the initial feature sequence, fusing two boundary features and one feature which is most adjacent to the two boundary features;
in specific implementation, for the initial feature sequence, firstly, the features identical to the two critical features are respectively supplemented at the two ends of the initial feature sequence, and the sequence after the features are supplemented at the two ends is
Figure BDA0002954250120000061
Sequence of
Figure BDA0002954250120000062
Every three characteristics in the time sequence window are formed, a central frame positioned in the middle and two adjacent frames positioned around are respectively spliced on channel dimension, then two groups of '3 x 3 convolution + Leaky ReLU activation functions' are carried out to obtain two deviation characteristics, and then the two deviation characteristics are utilizedThe method comprises the steps of performing variable sampling on two adjacent frames respectively through variable convolution, splicing the two adjacent frames subjected to variable sampling and a central frame in a channel dimension, and screening effective information in the characteristics of the two adjacent frames through four groups of 1 × 1 convolution and a Leaky ReLU activation function and supplementing the characteristics of an intermediate frame. The above-mentioned treatment is implemented frame by frame so as to obtain the local characteristic sequence formed from effective characteristics screened under the local time domain
Figure BDA0002954250120000063
See figure 2 for a detailed description.
Specifically, every three consecutive features in the padded sequence {F_1, F_1, F_2, …, F_n, F_n} constitute a temporal window, giving the windows (F_1, F_1, F_2), (F_1, F_2, F_3), …, (F_{n-1}, F_n, F_n).
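A sketch of the per-window fusion follows, using torchvision's DeformConv2d as the deformable convolution. The 64-channel width, the 1 × 1 fusion widths, and sharing one deformable sampler for both neighbors are assumptions not fixed by the text; the structure (two offset branches, deformable sampling of the neighbors, concatenation with the center, four 1 × 1 conv + Leaky ReLU groups, replicate padding at the sequence ends) follows the description above.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class LocalFusion(nn.Module):
    """Fuse one temporal window (prev, center, next) into one local feature (per S102)."""
    def __init__(self, c=64):
        super().__init__()
        # Two groups of "3x3 conv + Leaky ReLU", one offset branch per neighbor.
        # A 3x3 deformable kernel needs 2 * 3 * 3 = 18 offset channels (x/y per tap).
        self.offset_prev = nn.Sequential(nn.Conv2d(2 * c, 18, 3, padding=1),
                                         nn.LeakyReLU(0.1, inplace=True))
        self.offset_next = nn.Sequential(nn.Conv2d(2 * c, 18, 3, padding=1),
                                         nn.LeakyReLU(0.1, inplace=True))
        # One deformable sampler shared by both neighbors (an assumption).
        self.deform = DeformConv2d(c, c, kernel_size=3, padding=1)
        # Four groups of "1x1 conv + Leaky ReLU" screening the neighbor information.
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * c, 2 * c, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(2 * c, c, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(c, c, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(c, c, 1), nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, prev, center, nxt):
        off_p = self.offset_prev(torch.cat([center, prev], dim=1))
        off_n = self.offset_next(torch.cat([center, nxt], dim=1))
        aligned_p = self.deform(prev, off_p)   # deformably sample the previous frame
        aligned_n = self.deform(nxt, off_n)    # deformably sample the next frame
        return self.fuse(torch.cat([aligned_p, center, aligned_n], dim=1))

def local_fusion_sequence(feats, fusion):
    """Replicate-pad both ends, then fuse every (F_{t-1}, F_t, F_{t+1}) window."""
    padded = [feats[0]] + list(feats) + [feats[-1]]
    return torch.stack([
        fusion(padded[t - 1][None], padded[t][None], padded[t + 1][None])[0]
        for t in range(1, len(padded) - 1)
    ])

# Usage on a toy 7-frame initial feature sequence {F_1, ..., F_7}:
feats = torch.randn(7, 64, 32, 32)
local_feats = local_fusion_sequence(feats, LocalFusion())   # (7, 64, 32, 32)
```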
S103: inputting the local feature sequence into a bidirectional deformable convolutional long short-term memory network, and supplementing each feature in the local feature sequence with features from the global range to obtain a global feature sequence.
The local feature sequence {L_1, L_2, …, L_n} obtained in step S102 is input into a bidirectional deformable convolutional long short-term memory network (BDConvLSTM), which further supplements each feature in the sequence with features from the global range, producing the global feature sequence {G_1, G_2, …, G_n}. See FIG. 3 for details.
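The bidirectional recurrence can be sketched with a standard ConvLSTM cell scanned forward and backward over the sequence, as below. The patent's BDConvLSTM additionally performs deformable alignment inside the recurrence; that part is omitted here, so this is a simplified sketch under stated assumptions (64 channels, a 1 × 1 merge of the two directions).

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: one conv computes the four gates from [x, h]."""
    def __init__(self, c=64):
        super().__init__()
        self.gates = nn.Conv2d(2 * c, 4 * c, 3, padding=1)

    def forward(self, x, h, cell):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        cell = torch.sigmoid(f) * cell + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(cell)
        return h, cell

class BidirectionalConvLSTM(nn.Module):
    """Forward and backward recurrent scans over the local feature sequence (per S103)."""
    def __init__(self, c=64):
        super().__init__()
        self.fwd, self.bwd = ConvLSTMCell(c), ConvLSTMCell(c)
        self.merge = nn.Conv2d(2 * c, c, 1)   # fuse the two directions

    def forward(self, seq):   # seq: list of (1, c, H, W) local features
        z = torch.zeros_like(seq[0])
        h, cell, fwd_out = z, z, []
        for x in seq:                      # forward pass over time
            h, cell = self.fwd(x, h, cell)
            fwd_out.append(h)
        h, cell, bwd_out = z, z, []
        for x in reversed(seq):            # backward pass over time
            h, cell = self.bwd(x, h, cell)
            bwd_out.append(h)
        bwd_out.reverse()
        return [self.merge(torch.cat([f, b], dim=1)) for f, b in zip(fwd_out, bwd_out)]

# Usage: turn the local features into a global feature sequence {G_1, ..., G_n}
seq = list(torch.randn(7, 1, 64, 32, 32))   # 7 features of shape (1, 64, 32, 32)
global_feats = BidirectionalConvLSTM()(seq)
```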
S104: extracting super-resolution features from the global feature sequence, adding them correspondingly to the initial feature sequence, extracting high-resolution upsampling features from the feature-added sequence, and finally obtaining the final high-resolution reconstructed image sequence through a convolutional neural network.
The features in the global feature sequence obtained in step S103 are input one by one into 40 residual blocks for super-resolution feature extraction to obtain high-resolution features, which are added to the corresponding initial features obtained in step S101. The summed features pass through 2 groups of "3 × 3 convolution + 2× Pixel Shuffle upsampling + Leaky ReLU activation function" to obtain the high-resolution upsampling features, and a final group of "3 × 3 convolution + Leaky ReLU activation function + 3 × 3 convolution" produces the final high-resolution reconstructed image sequence {Î_1, Î_2, …, Î_n}.
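A sketch of this reconstruction head follows: 40 residual blocks, a skip-addition of the initial features, two "3 × 3 conv + 2× PixelShuffle + Leaky ReLU" stages (4× total upscaling), and the final conv stack. The 64-channel width and the overall 4× scale factor are assumptions inferred from, not fixed by, the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Same conv-act-conv residual block as in the S101 sketch."""
    def __init__(self, c=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(c, c, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class ReconstructionHead(nn.Module):
    """40 residual blocks, skip-add of initial features, two 2x PixelShuffle stages,
    and a final conv stack producing the RGB frame (per S104)."""
    def __init__(self, c=64):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(c) for _ in range(40)])
        self.upsample = nn.Sequential(
            nn.Conv2d(c, 4 * c, 3, padding=1), nn.PixelShuffle(2),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(c, 4 * c, 3, padding=1), nn.PixelShuffle(2),
            nn.LeakyReLU(0.1, inplace=True),
        )
        self.tail = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(c, 3, 3, padding=1),
        )

    def forward(self, global_feat, init_feat):
        x = self.blocks(global_feat) + init_feat   # element-wise skip from S101 features
        return self.tail(self.upsample(x))          # 4x-upscaled reconstructed frame

# Usage: reconstruct one high-resolution frame from (G_t, F_t)
g = torch.randn(1, 64, 32, 32)   # one global feature G_t
f = torch.randn(1, 64, 32, 32)   # the matching initial feature F_t
frame = ReconstructionHead()(g, f)   # (1, 3, 128, 128)
```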
Example two
The embodiment provides a video super-resolution reconstruction system based on time domain feature fusion, which specifically comprises the following modules:
the initial feature extraction module is used for acquiring an image sequence of a video and extracting image sequence features to obtain an initial feature sequence;
the local feature fusion module is used for performing local time domain feature fusion on the features in the initial feature sequence to obtain a local feature sequence, wherein each non-boundary feature in the initial feature sequence is fused with its two nearest neighboring features, and each of the two boundary features is fused with a copy of itself and its one nearest neighboring feature;
the global feature fusion module is used for inputting the local feature sequence into a bidirectional deformable convolutional long short-term memory network and supplementing each feature in the local feature sequence with features from the global range to obtain a global feature sequence;
and the super-resolution reconstruction module is used for extracting super-resolution features from the global feature sequence, adding them correspondingly to the initial feature sequence, extracting high-resolution upsampling features from the feature-added sequence, and finally obtaining the final high-resolution reconstructed image sequence through a convolutional neural network.
It should be noted here that, each module in the video super-resolution reconstruction system based on temporal domain feature fusion of the present embodiment corresponds to each step in the video super-resolution reconstruction method based on temporal domain feature fusion in the first embodiment one to one, and the specific implementation process is the same, and will not be described here again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the video super-resolution reconstruction method based on temporal feature fusion as described in the first embodiment above.
Example four
The embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the steps in the video super-resolution reconstruction method based on temporal domain feature fusion as described in the first embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A video super-resolution reconstruction method based on time domain feature fusion is characterized by comprising the following steps:
acquiring an image sequence of a video, and extracting image sequence characteristics to obtain an initial characteristic sequence;
performing local time domain feature fusion on the features in the initial feature sequence to obtain a local feature sequence, wherein each non-boundary feature in the initial feature sequence is fused with its two nearest neighboring features, and each of the two boundary features is fused with a copy of itself and its one nearest neighboring feature;
in the process of local time domain feature fusion, features identical to the two boundary features are respectively supplemented at the two ends of the initial feature sequence, and for each temporal window formed by three consecutive features, the central frame located in the middle is concatenated along the channel dimension with each of the two neighboring frames around it;
the concatenated features pass through two groups of "3 × 3 convolution + Leaky ReLU activation function" to obtain two offset features; the two offset features are then used to deformably sample the two neighboring frames respectively through deformable convolution; the two deformably sampled neighboring frames are concatenated with the central frame along the channel dimension, and the concatenated features pass through four groups of "1 × 1 convolution + Leaky ReLU activation function", thereby screening the effective information in the two neighboring-frame features and supplementing the intermediate frame feature; the above processing is performed frame by frame to obtain a local feature sequence composed of the effective features screened in the local time domain; the local feature sequence is input into a bidirectional deformable convolutional long short-term memory network, and each feature in the local feature sequence is supplemented with features from the global range to obtain a global feature sequence;
extracting super-resolution features from the global feature sequence, adding them correspondingly to the initial feature sequence, extracting high-resolution upsampling features from the feature-added sequence, and finally obtaining the final high-resolution reconstructed image sequence through a convolutional neural network.
2. The video super-resolution reconstruction method based on temporal domain feature fusion of claim 1, wherein the process of extracting image sequence features is as follows:
and performing first feature extraction on each frame of image through a convolution of 3 multiplied by 3 and a Leaky ReLU activation function, and then inputting the features in the extracted feature sequence one by one into a set number of residual blocks for further feature extraction to obtain an initial feature sequence.
3. The video super-resolution reconstruction method based on temporal feature fusion of claim 1, wherein the process of extracting the super-resolution feature of the global feature sequence comprises:
and inputting the features in the global feature sequence into a set number of residual blocks one by one to perform super-resolution feature extraction to obtain high-resolution features.
4. The video super-resolution reconstruction method based on temporal domain feature fusion of claim 1, wherein the process of extracting the high resolution up-sampling features of the sequence after feature addition is as follows:
and (3) after the features are added, the sequence is subjected to 2 groups of '3 × 3 convolution +2 multiplying power Pixel Shuffle upsampling + Leaky ReLU activation function' to obtain the high-resolution upsampling features.
5. The video super-resolution reconstruction method based on time domain feature fusion of claim 1, wherein the convolutional neural network is a group of "3 × 3 convolution + Leaky ReLU activation function + 3 × 3 convolution".
6. A video super-resolution reconstruction system based on time domain feature fusion is characterized by comprising:
the initial feature extraction module is used for acquiring an image sequence of a video and extracting the features of the image sequence to obtain an initial feature sequence;
the local feature fusion module is used for performing local time domain feature fusion on the features in the initial feature sequence to obtain a local feature sequence, wherein each non-boundary feature in the initial feature sequence is fused with its two nearest neighboring features, and each of the two boundary features is fused with a copy of itself and its one nearest neighboring feature;
in the process of local time domain feature fusion, features identical to the two boundary features are respectively supplemented at the two ends of the initial feature sequence, and for each temporal window formed by three consecutive features, the central frame located in the middle is concatenated along the channel dimension with each of the two neighboring frames around it;
the concatenated features pass through two groups of "3 × 3 convolution + Leaky ReLU activation function" to obtain two offset features; the two offset features are then used to deformably sample the two neighboring frames respectively through deformable convolution; the two deformably sampled neighboring frames are concatenated with the central frame along the channel dimension, and the concatenated features pass through four groups of "1 × 1 convolution + Leaky ReLU activation function", thereby screening the effective information in the two neighboring-frame features and supplementing the intermediate frame feature; the above processing is performed frame by frame to obtain a local feature sequence composed of the effective features screened in the local time domain;
the global feature fusion module is used for inputting the local feature sequence into a bidirectional deformable convolutional long short-term memory network and supplementing each feature in the local feature sequence with features from the global range to obtain a global feature sequence;
and the super-resolution reconstruction module is used for extracting super-resolution features from the global feature sequence, adding them correspondingly to the initial feature sequence, extracting high-resolution upsampling features from the feature-added sequence, and finally obtaining the final high-resolution reconstructed image sequence through a convolutional neural network.
7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for reconstructing super-resolution video based on temporal domain feature fusion according to any one of claims 1 to 5.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the video super-resolution reconstruction method based on temporal domain feature fusion according to any one of claims 1-5 when executing the program.
CN202110217175.8A (priority date: 2021-02-26; filing date: 2021-02-26): Video super-resolution reconstruction method and system based on time domain feature fusion. Granted as CN112950470B (Active).

Priority Applications (1)

Application Number: CN202110217175.8A; Priority Date: 2021-02-26; Filing Date: 2021-02-26
Title: Video super-resolution reconstruction method and system based on time domain feature fusion

Publications (2)

CN112950470A (en), published 2021-06-11
CN112950470B (en), published 2022-07-15

Family

ID=76246438

Family Applications (1)

CN202110217175.8A (Active; priority date: 2021-02-26; filing date: 2021-02-26): Video super-resolution reconstruction method and system based on time domain feature fusion

Country Status (1)

CN: CN112950470B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113949863A (en) * 2021-10-21 2022-01-18 上海复达兴智能技术有限公司 Experience quality evaluation method, system and equipment for real-time audio and video communication
CN116452741B (en) * 2023-04-20 2024-03-01 北京百度网讯科技有限公司 Object reconstruction method, object reconstruction model training method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860147A (en) * 2020-06-11 2020-10-30 北京市威富安防科技有限公司 Pedestrian re-identification model optimization processing method and device and computer equipment
CN112070676A (en) * 2020-09-10 2020-12-11 东北大学秦皇岛分校 Image super-resolution reconstruction method of two-channel multi-sensing convolutional neural network

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013131929A1 (en) * 2012-03-05 2013-09-12 Thomson Licensing Method and apparatus for performing super-resolution
US10339633B2 (en) * 2015-11-04 2019-07-02 Peking University Shenzhen Graduate School Method and device for super-resolution image reconstruction based on dictionary matching
TWI624804B (en) * 2016-11-07 2018-05-21 盾心科技股份有限公司 A method and system for providing high resolution image through super-resolution reconstrucion
CN109949255B (en) * 2017-12-20 2023-07-28 华为技术有限公司 Image reconstruction method and device
CN109903221B (en) * 2018-04-04 2023-08-22 华为技术有限公司 Image super-division method and device
CN110969577B (en) * 2019-11-29 2022-03-11 北京交通大学 Video super-resolution reconstruction method based on deep double attention network
CN111524068B (en) * 2020-04-14 2023-06-02 长安大学 Variable-length input super-resolution video reconstruction method based on deep learning
CN111681166B (en) * 2020-06-02 2023-04-18 重庆理工大学 Image super-resolution reconstruction method of stacked attention mechanism coding and decoding unit
CN112102163B (en) * 2020-08-07 2024-04-19 南京航空航天大学 Continuous multi-frame image super-resolution reconstruction method based on multi-scale motion compensation framework and recursive learning
CN112215755B (en) * 2020-10-28 2023-06-23 南京信息工程大学 Image super-resolution reconstruction method based on back projection attention network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860147A (en) * 2020-06-11 2020-10-30 北京市威富安防科技有限公司 Pedestrian re-identification model optimization processing method and device and computer equipment
CN112070676A (en) * 2020-09-10 2020-12-11 东北大学秦皇岛分校 Image super-resolution reconstruction method of two-channel multi-sensing convolutional neural network

Also Published As

Publication number Publication date
CN112950470A (en) 2021-06-11


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant