CN111083479A - Video frame prediction method and device and terminal equipment - Google Patents

Video frame prediction method and device and terminal equipment

Info

Publication number
CN111083479A
CN111083479A (application CN201911420604.0A)
Authority
CN
China
Prior art keywords
prediction
optical flow
motion compensation
frame
separable convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911420604.0A
Other languages
Chinese (zh)
Inventor
李东阳 (Li Dongyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Tucodec Information Technology Co., Ltd.
Original Assignee
Hefei Tucodec Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Tucodec Information Technology Co., Ltd.
Priority application: CN201911420604.0A
Publication: CN111083479A
Legal status: Pending

Classifications

    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (H: ELECTRICITY; H04: ELECTRIC COMMUNICATION TECHNIQUE; H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION)
    • H04N19/137: Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139: Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/31: Hierarchical techniques, e.g. scalability, in the temporal domain
    • H04N19/503: Predictive coding involving temporal prediction
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

The invention is applicable to the technical field of video compression and provides a video frame prediction method, a video frame prediction apparatus and a terminal device, wherein the method comprises the following steps: calculating optical flow information between a current frame and a reference frame; inputting the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer; warping the reference frame according to the reconstructed optical flow to obtain a warped prediction map; applying the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps; and inputting the N separable-convolution prediction maps into a fusion network for fusion to obtain a predicted frame of the current frame. By combining the characteristics of dilated convolution and separable convolution, the invention provides a dilated separable convolution that enlarges the receptive field of the separable convolution while keeping its complexity unchanged, effectively improving the efficiency of video prediction.

Description

Video frame prediction method and device and terminal equipment
Technical Field
The invention belongs to the technical field of video compression, and particularly relates to a video frame prediction method, a video frame prediction apparatus and a terminal device.
Background
In the video compression process, efficient prediction needs to exploit the temporal correlation of the video. The separable convolution in the presently disclosed techniques is a relatively efficient method; however, owing to complexity constraints, the separable convolution kernel used to enlarge the receptive field cannot be enlarged without limit, which restricts the efficiency of video prediction.
Therefore, a new technical solution is needed to solve the above technical problems.
Disclosure of Invention
In view of this, embodiments of the present invention provide a video frame prediction method, apparatus and terminal device, so as to solve the problem in the prior art that video prediction efficiency is not high.
A first aspect of an embodiment of the present invention provides a video frame prediction method, including:
calculating optical flow information between a current frame and a reference frame;
inputting the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer;
warping the reference frame according to the reconstructed optical flow to obtain a warped prediction map;
applying the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps;
inputting the N separable-convolution prediction maps into a fusion network for fusion to obtain a predicted frame of the current frame.
A second aspect of the embodiments of the present invention provides a video frame prediction apparatus, including:
an optical flow calculation module, configured to calculate optical flow information between a current frame and a reference frame;
a motion compensation module, configured to input the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer;
a warp module, configured to warp the reference frame according to the reconstructed optical flow to obtain a warped prediction map;
a separable convolution module, configured to apply the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps;
and a fusion module, configured to input the N separable-convolution prediction maps into a fusion network for fusion to obtain a predicted frame of the current frame.
A third aspect of embodiments of the present invention provides a video frame prediction terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as provided in the first aspect above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
By combining the characteristics of dilated convolution and separable convolution, the invention provides a dilated separable convolution that enlarges the receptive field of the separable convolution while keeping its complexity unchanged, effectively improving the efficiency of video prediction.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a video frame prediction method according to an embodiment of the present invention;
FIG. 2 is a diagram of an apparatus for predicting video frames according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a video frame prediction terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Fig. 1 shows the implementation flow of a video frame prediction method according to an embodiment of the present invention. The method may be executed by a terminal device, and the details are as follows:
in step S101, optical flow information between the current frame and the reference frame is calculated.
Optionally, a spatial position mapping relationship between pixels of the current frame image and pixels of the reference frame image is calculated to obtain optical flow information.
Specifically, optical flow exploits the temporal change of pixels in an image sequence and the correlation between adjacent frames to establish the correspondence between two adjacent frames, from which the motion information of objects between those frames is calculated: the current frame and the reference frame are input into a preset optical flow network to obtain the optical flow information. Further, the optical flow network may adopt one of two network structures: FlowNetS (FlowNetSimple) and FlowNetC (FlowNetCorr). FlowNetS stacks the two input images directly along the channel dimension, and its network structure consists of convolutional layers only; FlowNetC first extracts features from each of the two input images and then computes the correlation of those features, i.e., a convolution operation is performed on the features of the two images in the spatial dimension.
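For illustration only, the following Python (PyTorch) sketch shows the FlowNetS-style input arrangement described above; the layer widths and the TinyFlowNetS name are assumptions made for the example, not the optical flow network of this application:

    import torch
    import torch.nn as nn

    class TinyFlowNetS(nn.Module):
        # FlowNetS-style arrangement: the two frames are stacked along the
        # channel dimension (3 + 3 = 6 channels) and convolutional layers
        # alone regress a 2-channel optical flow field (dx, dy) per pixel.
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 2, 3, padding=1),
            )

        def forward(self, cur, ref):
            return self.body(torch.cat([cur, ref], dim=1))

    cur, ref = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    print(TinyFlowNetS()(cur, ref).shape)  # torch.Size([1, 2, 64, 64])

A FlowNetC-style variant would instead run a shared feature extractor over each frame separately and spatially correlate the two feature maps before the regression layers.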
Step S102, inputting the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer.
Optionally, the optical flow information, the reference frame and the current frame are input into the motion compensation network to obtain motion compensation feature information. The motion compensation network comprises an up-sampling layer, a down-sampling layer, an encoding network and a decoding network. Further, the optical flow information, the reference frame and the current frame are input into the motion compensation network, and a down-sampling operation and a convolution operation are performed in the down-sampling layer to obtain the motion compensation feature information.
Further, the motion compensation feature information is entropy-encoded and entropy-decoded and then input into the motion compensation network to obtain the reconstructed optical flow and the N separable convolution kernels. Optionally, entropy encoding is performed on the motion compensation feature information to obtain a compressed bit stream, and the compressed bit stream is stored. Further, the stored compressed bit stream is entropy-decoded and input into the motion compensation network to obtain the reconstructed optical flow and the N separable convolution kernels. The coding may be an entropy coding scheme such as Shannon coding, Huffman coding or arithmetic coding, which is not limited here.
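As a minimal sketch of the encode/store/decode round trip just described (the rounding-based quantization and the raw byte dump are assumptions for the example; any lossless entropy coder such as Huffman or arithmetic coding would take the placeholder's role):

    import numpy as np
    import torch

    def encode_features(feat):
        # Quantize the motion compensation features to integer symbols; a
        # lossless entropy coder would turn these into a compressed bit
        # stream (here a raw byte dump stands in as a placeholder).
        symbols = torch.round(feat).to(torch.int32)
        return symbols.numpy().tobytes(), tuple(symbols.shape)

    def decode_features(bitstream, shape):
        # Entropy decoding is lossless, so the decoder recovers exactly the
        # symbols that were encoded; feeding them back into the motion
        # compensation network yields the reconstructed optical flow and
        # the N separable convolution kernels.
        symbols = np.frombuffer(bitstream, dtype=np.int32).reshape(shape)
        return torch.from_numpy(symbols.copy()).float()

    bits, shape = encode_features(torch.randn(1, 64, 8, 8))
    recovered = decode_features(bits, shape)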
Step S103, warping the reference frame according to the reconstructed optical flow to obtain a warped prediction map.
Optionally, according to the reconstructed optical flow, the warped prediction map is obtained by warping (spatially transforming) the reference frame to the specified positions.
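A minimal backward-warping sketch, assuming the reconstructed flow holds per-pixel (dx, dy) displacements in pixel units:

    import torch
    import torch.nn.functional as F

    def warp(ref, flow):
        # ref: (B, C, H, W) reference frame; flow: (B, 2, H, W) reconstructed
        # optical flow. Each output pixel bilinearly samples the reference
        # frame at its own coordinates displaced by the flow.
        b, _, h, w = ref.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).float()      # (H, W, 2) pixel coords
        grid = grid + flow.permute(0, 2, 3, 1)            # add (dx, dy) per pixel
        # normalize to the [-1, 1] range expected by grid_sample
        grid = torch.stack((2 * grid[..., 0] / (w - 1) - 1,
                            2 * grid[..., 1] / (h - 1) - 1), dim=-1)
        return F.grid_sample(ref, grid, align_corners=True)

    ref = torch.rand(1, 3, 64, 64)
    warped = warp(ref, torch.zeros(1, 2, 64, 64))  # zero flow returns ref unchanged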
Step S104, applying the N separable convolution kernels to the warped prediction map according to the N preset dilation rates to obtain N separable-convolution prediction maps.
Optionally, the N separable convolution kernels each perform a separable convolution operation on the warped prediction map according to the N preset dilation rates, yielding N separable-convolution prediction maps. The dilation rates are positive integers and correspond one-to-one to the N separable convolution kernels: for each dilation rate, the corresponding separable convolution kernel performs a separable convolution operation on the warped prediction map to obtain the corresponding prediction map, so the N dilation rates produce N separable-convolution prediction maps.
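The following sketch shows one way such a dilated separable convolution can be realized, assuming (as in adaptive separable convolution approaches) that each separable convolution kernel consists of per-pixel vertical and horizontal 1-D kernels of K taps; the tensor layout and the softmax-normalized random kernels are assumptions for the example:

    import torch
    import torch.nn.functional as F

    def dilated_separable_conv(img, kv, kh, dilation):
        # img: (B, C, H, W) warped prediction map.
        # kv, kh: (B, K, H, W) per-pixel vertical/horizontal 1-D kernels,
        # K odd. The dilation rate spreads the K taps apart, so the
        # receptive field grows while the number of taps (and hence the
        # complexity) stays fixed.
        b, c, h, w = img.shape
        k = kv.shape[1]
        pad = dilation * (k // 2)
        patches = F.unfold(img, k, dilation=dilation, padding=pad)  # (B, C*K*K, H*W)
        patches = patches.view(b, c, k, k, h, w)
        # a separable 2-D kernel is the outer product of the two 1-D kernels
        weight = kv.view(b, 1, k, 1, h, w) * kh.view(b, 1, 1, k, h, w)
        return (patches * weight).sum(dim=(2, 3))                   # (B, C, H, W)

    img = torch.rand(1, 3, 32, 32)
    kv = torch.softmax(torch.rand(1, 5, 32, 32), dim=1)
    kh = torch.softmax(torch.rand(1, 5, 32, 32), dim=1)
    preds = [dilated_separable_conv(img, kv, kh, d) for d in (1, 2, 4)]

With 5-tap kernels, for instance, dilation rates of 1, 2 and 4 give receptive fields of 5, 9 and 17 pixels respectively at identical per-pixel cost.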
Step S105, inputting the N separable-convolution prediction maps into a fusion network for fusion to obtain the predicted frame of the current frame.
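A minimal fusion-network sketch; the two-layer CNN and its widths are assumptions for the example, the point being only that the N prediction maps are concatenated along the channel dimension and regressed to the predicted frame:

    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        def __init__(self, n_maps=3, channels=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(n_maps * channels, 32, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(32, channels, 3, padding=1),
            )

        def forward(self, preds):
            # preds: list of N (B, C, H, W) separable-convolution prediction
            # maps; the output is the predicted frame of the current frame.
            return self.net(torch.cat(preds, dim=1))

    preds = [torch.rand(1, 3, 32, 32) for _ in range(3)]
    print(FusionNet(n_maps=3)(preds).shape)  # torch.Size([1, 3, 32, 32])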
In this embodiment, a dilated separable convolution is provided by combining the characteristics of dilated convolution and separable convolution; it enlarges the receptive field of the separable convolution while keeping the complexity unchanged, effectively improving the efficiency of video prediction.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Example two
Fig. 2 is a block diagram illustrating the structure of a video frame prediction apparatus according to an embodiment of the present invention; for convenience of description, only the parts related to this embodiment are shown. The video frame prediction apparatus 2 includes: an optical flow calculation module 21, a motion compensation module 22, a warp module 23, a separable convolution module 24 and a fusion module 25.
The optical flow calculation module 21 is configured to calculate optical flow information between the current frame and the reference frame;
the motion compensation module 22 is configured to input the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer;
the warp module 23 is configured to warp the reference frame according to the reconstructed optical flow to obtain a warped prediction map;
the separable convolution module 24 is configured to apply the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps;
and the fusion module 25 is configured to input the N separable-convolution prediction maps into a fusion network for fusion to obtain a predicted frame of the current frame.
Optionally, the optical flow calculation module 21 comprises:
an optical flow calculating unit, configured to calculate the spatial position mapping relation between pixels of the current frame image and pixels of the reference frame image to obtain the optical flow information.
Optionally, the motion compensation module 22 comprises:
an input unit, configured to input the optical flow information, the reference frame and the current frame into the motion compensation network to obtain motion compensation feature information;
and an output unit, configured to entropy-encode and entropy-decode the motion compensation feature information and then input it into the motion compensation network to obtain the reconstructed optical flow and the N separable convolution kernels.
Optionally, the separable convolution module 24 comprises:
a separable convolution unit, configured to make the N separable convolution kernels perform separable convolution operations on the warped prediction map according to the N preset dilation rates to obtain the N separable-convolution prediction maps.
Example three
Fig. 3 is a schematic diagram of a video frame prediction terminal device according to an embodiment of the present invention. As shown in fig. 3, the video frame prediction terminal device 3 of this embodiment includes: a processor 30, a memory 31 and a computer program 32, such as a video frame prediction program, stored in the memory 31 and executable on the processor 30. When executing the computer program 32, the processor 30 implements the steps in the video frame prediction method embodiments described above, such as steps S101 to S105 shown in fig. 1; alternatively, when executing the computer program 32, the processor 30 implements the functions of the modules/units in the above apparatus embodiments, such as the functions of the modules 21 to 25 shown in fig. 2.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution process of the computer program 32 in the video frame prediction terminal device 3. For example, the computer program 32 may be divided into an optical flow calculation module, a motion compensation module, a warp module, a separation convolution module, and a fusion module, and each module has the following specific functions:
the optical flow calculation module is configured to calculate optical flow information between the current frame and the reference frame;
the motion compensation module is configured to input the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer;
the warp module is configured to warp the reference frame according to the reconstructed optical flow to obtain a warped prediction map;
the separable convolution module is configured to apply the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps;
and the fusion module is configured to input the N separable-convolution prediction maps into a fusion network for fusion to obtain a predicted frame of the current frame.
The video frame prediction terminal device 3 may be a desktop computer, a notebook, a palm computer, a cloud server or another computing device. The video frame prediction terminal device may include, but is not limited to, the processor 30 and the memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the video frame prediction terminal device 3 and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine some components, or have different components; for example, the video frame prediction terminal device may further include input-output devices, network access devices, a bus, etc.
The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 31 may be an internal storage unit of the video frame prediction terminal device 3, such as a hard disk or a memory of the video frame prediction terminal device 3. The memory 31 may also be an external storage device of the video frame prediction terminal device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), or the like, provided on the video frame prediction terminal device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the video frame prediction terminal device 3. The memory 31 is used for storing the computer program and other programs and data required by the video frame prediction terminal device. The memory 31 may also be used to temporarily store data that has been output or is to be output.
In summary, a dilated separable convolution is provided by combining the characteristics of dilated convolution and separable convolution; it enlarges the receptive field of the separable convolution while keeping the complexity unchanged, effectively improving the efficiency of video prediction.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be suitably increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A video frame prediction method, comprising:
calculating optical flow information between a current frame and a reference frame;
inputting the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer;
warping the reference frame according to the reconstructed optical flow to obtain a warped prediction map;
applying the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps;
inputting the N separable-convolution prediction maps into a fusion network for fusion to obtain a predicted frame of the current frame.
2. The video frame prediction method of claim 1, wherein said calculating optical flow information between the current frame and the reference frame comprises:
and calculating the spatial position mapping relation between the pixels of the current frame image and the pixels of the reference frame image to obtain optical flow information.
3. The method of claim 1, wherein said inputting the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels comprises:
inputting the optical flow information, the reference frame and the current frame into the motion compensation network to obtain motion compensation feature information;
entropy-encoding and entropy-decoding the motion compensation feature information and inputting it into the motion compensation network to obtain the reconstructed optical flow and the N separable convolution kernels.
4. The method of claim 1, wherein said applying the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps comprises:
performing, by the N separable convolution kernels respectively and according to the N preset dilation rates, separable convolution operations on the warped prediction map to obtain the N separable-convolution prediction maps.
5. A video frame prediction apparatus, comprising:
an optical flow calculation module, configured to calculate optical flow information between a current frame and a reference frame;
a motion compensation module, configured to input the optical flow information, the reference frame and the current frame into a motion compensation network to obtain a reconstructed optical flow and N separable convolution kernels, where N is a positive integer;
a warp module, configured to warp the reference frame according to the reconstructed optical flow to obtain a warped prediction map;
a separable convolution module, configured to apply the N separable convolution kernels to the warped prediction map according to N preset dilation rates to obtain N separable-convolution prediction maps;
and a fusion module, configured to input the N separable-convolution prediction maps into a fusion network for fusion to obtain a predicted frame of the current frame.
6. The video frame prediction apparatus of claim 5, wherein the optical flow computation module comprises:
and the optical flow calculating unit is used for calculating the spatial position mapping relation between the pixels of the current frame image and the pixels of the reference frame image to obtain optical flow information.
7. The video frame prediction apparatus of claim 5, wherein the motion compensation module comprises:
an input unit, configured to input the optical flow information, the reference frame and the current frame into the motion compensation network to obtain motion compensation feature information;
and an output unit, configured to entropy-encode and entropy-decode the motion compensation feature information and then input it into the motion compensation network to obtain the reconstructed optical flow and the N separable convolution kernels.
8. The video frame prediction apparatus of claim 5, wherein the separable convolution module comprises:
a separable convolution unit, configured to make the N separable convolution kernels perform separable convolution operations on the warped prediction map according to the N preset dilation rates to obtain the N separable-convolution prediction maps.
9. A video frame prediction terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201911420604.0A · Filing date: 2019-12-31 · Priority date: 2019-12-31 · Video frame prediction method and device and terminal equipment · Status: Pending · CN111083479A (en)

Priority Applications (1)

Application Number: CN201911420604.0A · Priority Date: 2019-12-31 · Filing Date: 2019-12-31 · Title: Video frame prediction method and device and terminal equipment

Publications (1)

Publication Number: CN111083479A · Publication Date: 2020-04-28

Family

ID=70321202

Family Applications (1)

Application Number: CN201911420604.0A · Title: Video frame prediction method and device and terminal equipment · Priority Date: 2019-12-31 · Filing Date: 2019-12-31

Country Status (1)

Country Link
CN (1) CN111083479A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190228264A1 (en) * 2017-03-08 2019-07-25 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training neural network model used for image processing, and storage medium
CN106973293A (en) * 2017-04-21 2017-07-21 中国科学技术大学 The light field image coding method predicted based on parallax
CN107105278A (en) * 2017-04-21 2017-08-29 中国科学技术大学 The coding and decoding video framework that motion vector is automatically generated
CN109118431A (en) * 2018-09-05 2019-01-01 武汉大学 A kind of video super-resolution method for reconstructing based on more memories and losses by mixture

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053375A (en) * 2020-08-26 2020-12-08 上海眼控科技股份有限公司 Method and equipment for predicting prediction based on improved network convolution model
CN113436302A (en) * 2021-06-08 2021-09-24 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Face animation synthesis method and system
CN113436302B (en) * 2021-06-08 2024-02-13 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Face animation synthesis method and system

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 2020-04-28)