CN117376605A - Frame inserting method, sending card and playing system based on deep learning - Google Patents

Frame inserting method, sending card and playing system based on deep learning

Info

Publication number
CN117376605A
Authority
CN
China
Prior art keywords
frame
optical flow
motion information
inter
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311353735.8A
Other languages
Chinese (zh)
Inventor
胥斌
刘凯
邵杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Film Equipment Co ltd
Original Assignee
China Film Equipment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Film Equipment Co ltd
Priority to CN202311353735.8A
Publication of CN117376605A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/234381 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440281 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N 7/0127 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure relate to a frame interpolation method, a sending card and a playing system based on deep learning. The method comprises the following steps: acquiring a front frame and a rear frame of an inserted frame; acquiring inter-frame motion information of the front frame and the rear frame; extracting motion features of the front frame and the rear frame; using the inter-frame motion information and the motion features together as the input of an optical flow estimation network; outputting optical flow information by a deep learning method with the inter-frame motion information as a constraint condition; and generating the inserted frame from the optical flow information and based on the front frame or the rear frame. By passing the inter-frame motion information and the motion features from video coding to the deep learning algorithm, the existing inter-frame motion information constrains the optical flow estimation stage, which reduces the computational complexity of optical flow estimation, improves computational efficiency, and at the same time yields a more accurate optical flow estimate and thus a higher-quality inserted frame.

Description

Frame inserting method, sending card and playing system based on deep learning
Technical Field
The embodiments of the disclosure relate to the technical field of image processing equipment, and in particular to a frame interpolation method, a sending card and a playing system based on deep learning.
Background
As the frame rate of a video increases, judder is reduced, fluency improves, and the viewing experience for the human eye is better. Video frame interpolation is therefore an important direction in the display field. Simple interpolation algorithms use frame repetition or frame blending; they introduce no additional information and improve the visual effect only to a limited extent. Optical-flow-based methods use motion estimation and motion compensation (MEMC) to estimate the motion of objects and compensate along the motion direction; they can generate intermediate frames close to reality and give a good visual effect. In recent years, deep-learning-based frame interpolation algorithms have achieved good results and can be divided into three main categories: kernel-based, optical-flow-based and phase-based methods. Among these, optical-flow-based methods achieve the best results and are currently the mainstream, but they have high computational complexity and are difficult to apply in practice.
Video coding protocols such as H.264, H.265 and AV1 all introduce motion estimation: motion information reduces the amount of frame data that must be retained, and part of the frames are represented by a starting frame plus inter-frame information, thereby achieving video compression. Motion information is also needed in optical-flow-based deep learning algorithms; if the motion information already present in video coding is reused, the optical flow computation load of the deep learning algorithm can be reduced, the accuracy of its optical flow calculation improved, and the computational efficiency increased.
In the related art, deep learning frame interpolation algorithms based on optical flow estimation outperform traditional interpolation algorithms. Existing schemes mainly comprise feature extraction, optical flow estimation, and affine transformation and reconstruction of images: the image features are mapped to a high-dimensional space, an inter-frame optical flow map is estimated from the extracted features, the image is affine-transformed using the generated optical flow information, and the transformed image is reprocessed to reconstruct the intermediate frame.
Regarding the above scheme, the accuracy of optical flow estimation greatly affects the quality of the reconstructed intermediate frame; this stage involves a large amount of computation and high computational complexity, and ghosting occurs when the optical flow estimate is inaccurate.
Accordingly, there is a need to improve one or more problems in the related art as described above.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiments of the disclosure aim to provide a frame interpolation method, a sending card and a playing system based on deep learning, so as to at least solve the problems of high computational complexity and inaccurate optical flow estimation.
The invention adopts the following technical scheme:
in a first aspect, the present invention provides a frame inserting method based on deep learning, including:
acquiring a front frame and a rear frame of an inserted frame;
acquiring inter-frame motion information of the front frame and the rear frame;
extracting motion characteristics of the front frame and the rear frame;
using the inter-frame motion information and the motion features together as the input of an optical flow estimation network;
outputting optical flow information by a deep learning method with the inter-frame motion information as a constraint condition;
generating the inserted frame from the optical flow information and based on the preceding frame or the following frame.
Optionally, the step of acquiring the preceding frame and the following frame of the inserted frame includes:
acquiring the n adjacent front frames and the n adjacent rear frames of the inserted frame, wherein n is a positive integer greater than or equal to 1.
The beneficial effect of this technical scheme is that the more frames are used, the more accurate the generated optical flow map is.
Optionally, the step of extracting motion features of the preceding frame and the following frame includes:
and extracting motion characteristics of the front frame and the rear frame, and acquiring high-dimensional characteristics containing semantics.
The technical scheme has the beneficial effects that the needed semantic information is acquired, so that the image optical flow diagram can be estimated conveniently.
Optionally, the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
the inter-frame motion information is used as input of an optical flow estimation network, so that the inter-frame motion information obtained by each training is used as constraint condition when the motion characteristic is trained each time.
The technical scheme has the beneficial effects that the inter-frame motion information obtained by each training is used as a constraint condition, so that the training efficiency is further improved.
Optionally, the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
the inter-frame motion information is used as input to an optical flow estimation network to sum the inter-frame motion information obtained in each training with the estimated motion information obtained in each training of the motion features.
The technical scheme has the beneficial effects that the inter-frame motion information obtained through each training supplements the estimated motion information, so that the training efficiency is further improved.
Optionally, the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
and calculating a difference value between the inter-frame motion information obtained by each training and the estimated motion information, and generating a constraint model for the next training based on the difference value.
The technical scheme has the beneficial effects that the difference value of each training is used as a constraint condition, so that the training efficiency is further improved.
Optionally, the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
and summing the estimated motion information obtained based on the constraint model with the difference value, and taking the summed value as estimated motion information to be trained next time.
The technical scheme has the beneficial effects that the obtained estimated motion information is added with the lost part, so that the training efficiency is further improved.
Optionally, the step of generating the interpolated frame based on the preceding frame or the following frame by the optical flow information includes:
and generating the insertion frame after affine transformation and reconstruction of the front frame or the rear frame through the optical flow information.
In a second aspect, the present invention provides a sending card, comprising:
a processor that generates interpolated video image information according to the deep-learning-based frame interpolation method of any of the above embodiments.
In a third aspect, the present invention provides a playing system, comprising:
the upper computer is used for sending video image data;
the sending card is used for receiving the video image data, converting the video image data into video image information and transmitting the video image information;
the receiving card is used for receiving the video image information and controlling the display device to display;
wherein the sending card is the sending card in the above embodiment.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
In the embodiments of the disclosure, the inter-frame motion information and the motion features in video coding are passed to the deep learning algorithm, so that the existing inter-frame motion information constrains the optical flow estimation stage, which reduces the computational complexity of optical flow estimation, improves computational efficiency, and at the same time yields a more accurate optical flow estimate and thus a higher-quality inserted frame.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 illustrates a flow diagram of a frame interpolation method in an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a logical schematic of a frame interpolation method in an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a logical schematic of deep learning in an exemplary embodiment of the present disclosure;
fig. 4 is a schematic diagram showing a configuration of a transmitting card in an exemplary embodiment of the present disclosure;
fig. 5 illustrates a schematic diagram of a playback system in an exemplary embodiment of the present disclosure;
fig. 6 illustrates a schematic diagram of a storage medium in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In this example embodiment, a frame inserting method based on deep learning is provided first. Referring to fig. 1, the method comprises the following steps:
step S101: the preceding and following frames of the inserted frame are acquired.
Step S102: and acquiring inter-frame motion information of the front frame and the rear frame.
Step S103: and extracting motion characteristics of the front frame and the rear frame.
Step S104: the inter-frame motion information and the motion features are used together as inputs to an optical flow estimation network.
Step S105: the optical flow information is output based on the deep learning method with the motion information as a constraint condition.
Step S106: the interpolated frame is generated from the optical flow information and based on the previous or subsequent frame.
It is to be understood that video image data refers to a sequence of successive images; apart from the order in which the images appear, the images themselves carry no structural information. A frame is the smallest visual unit of a video and is a static image. Temporally successive frames are combined to form a dynamic video. A frame can be described using image description methods, and therefore frames can be retrieved using similar-image retrieval methods.
It is also understood that, in inter coding, the relative displacement between the current coding block and the best matching block in its reference picture is represented by a motion vector (Motion Vector, MV). Each divided block has corresponding motion information that must be transmitted to the decoding side. If the MV of each block were encoded and transmitted independently, especially for small block sizes, a considerable number of bits would be consumed. To reduce the number of bits used to encode motion information, H.264/AVC exploits the spatial correlation between neighboring macroblocks: it predicts the motion information of the current block from the motion information of neighboring coded blocks and then encodes the prediction difference, which effectively reduces the number of bits representing motion information. On this basis, in MV coding of the current macroblock, H.264/AVC first predicts the MV of the current macroblock from the MVs of neighboring coded blocks, and then encodes the difference between this prediction (the motion vector prediction, MVP) and the actual estimated MV, denoted the motion vector difference (MVD), thereby effectively reducing the number of coded bits for the MV.
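As an illustration of this prediction-plus-difference idea, the following is a minimal sketch in Python. The median predictor over three neighboring blocks mirrors the spirit of H.264/AVC MV prediction but is not a codec-conformant implementation, and the numerical values are made up for the example.

```python
# Minimal sketch (not codec-conformant): a median motion-vector predictor (MVP)
# over neighboring blocks and the motion-vector difference (MVD) that would be
# encoded instead of the full MV. Values are illustrative only.
import numpy as np

def predict_mv(mv_left, mv_top, mv_top_right):
    """Component-wise median of neighboring MVs, used as the MVP."""
    return np.median(np.array([mv_left, mv_top, mv_top_right]), axis=0)

mv_current = np.array([5.0, -2.0])              # MV found by motion search
mvp = predict_mv([4, -2], [6, -1], [5, -3])     # predicted from coded neighbors
mvd = mv_current - mvp                          # only this difference is coded
print(mvp, mvd)                                 # [ 5. -2.] [0. 0.]
```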
It should also be understood that the optical flow estimation network based on the deep learning method includes multi-scale feature extraction, which can be performed directly with a convolutional neural network. Different scales can be considered to have different receptive fields, which is equivalent to the pyramid used in conventional methods. After the optical flow estimation at the final scale is completed, an optical flow refinement step is also required.
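For instance, such a multi-scale feature extractor can be written directly as a small stack of strided convolutions. The following sketch is only illustrative: the layer widths and strides are assumptions, not values taken from this disclosure.

```python
# Minimal sketch of multi-scale (pyramid-like) feature extraction with a CNN.
# Each stride-2 stage halves the resolution and enlarges the receptive field.
import torch
import torch.nn as nn

class FeaturePyramid(nn.Module):
    def __init__(self, in_ch=3, widths=(16, 32, 64)):
        super().__init__()
        stages, ch = [], in_ch
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(ch, w, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(w, w, 3, stride=1, padding=1), nn.ReLU(inplace=True)))
            ch = w
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)          # features at 1/2, 1/4, 1/8 resolution
        return feats

for f in FeaturePyramid()(torch.randn(1, 3, 256, 256)):
    print(f.shape)                   # (1,16,128,128), (1,32,64,64), (1,64,32,32)
```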
It is also understood that the optical flow estimation part estimates the inter-frame optical flow map from the extracted features, and the flow is divided into a forward optical flow and a backward optical flow; the two can be estimated from each other to generate a more accurate flow. The estimated optical flow can be scaled to different resolutions: a smaller optical flow map is called a coarse flow and a larger one a fine flow. The coarse flow captures larger motion, while the fine flow captures more precise motion. The accuracy of the optical flow greatly affects the quality of the reconstructed image.
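The relationship between coarse and fine flow maps can be made concrete with a small helper: when a flow map is resampled to a new resolution, its displacement values must be rescaled by the same factor. This is a generic sketch, not the refinement network of this disclosure.

```python
# Minimal sketch: moving an optical flow map between coarse and fine scales.
# Resizing changes the pixel grid, so the displacements must be rescaled too.
import torch
import torch.nn.functional as F

def resize_flow(flow, size):
    """flow: (N, 2, H, W) displacements in pixels; size: (new_H, new_W)."""
    _, _, h, w = flow.shape
    new_h, new_w = size
    flow = F.interpolate(flow, size=size, mode="bilinear", align_corners=False)
    return torch.stack([flow[:, 0] * (new_w / w),    # horizontal component
                        flow[:, 1] * (new_h / h)],   # vertical component
                       dim=1)

coarse = torch.randn(1, 2, 32, 32)        # coarse flow: large motion, low detail
fine = resize_flow(coarse, (256, 256))    # upsampled and rescaled fine flow
print(fine.shape)                          # torch.Size([1, 2, 256, 256])
```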
It should also be understood that the insertion frame may be an intermediate frame between the previous frame and the next frame, or may be an insertion frame at any position between the previous frame and the next frame.
According to the frame interpolation method, the inter-frame motion information and the motion features in video coding are passed to the deep learning algorithm, so that the existing inter-frame motion information constrains the optical flow estimation stage, which reduces its computational complexity, improves computational efficiency, and at the same time yields a more accurate optical flow estimate and thus a higher-quality inserted frame.
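To make the data flow of steps S101 to S106 concrete, the following is a minimal, non-authoritative sketch. The layer sizes are arbitrary, and warp_and_fuse is a hypothetical callable standing in for the affine transformation and reconstruction stage; none of these names come from the disclosure itself.

```python
# Minimal sketch of the S101-S106 data flow. Layer widths are illustrative and
# warp_and_fuse is a hypothetical placeholder for the reconstruction stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyInterpolator(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature_net = nn.Conv2d(6, 16, 3, padding=1)     # S103: motion features
        self.flow_net = nn.Conv2d(16 + 2, 2, 3, padding=1)    # S104/S105: flow head

    def forward(self, prev_frame, next_frame, inter_frame_mv, warp_and_fuse):
        # S101/S102: frames and coded motion vectors come from the decoder.
        feats = F.relu(self.feature_net(torch.cat([prev_frame, next_frame], 1)))
        # S104: the coded inter-frame motion info enters the flow network as input,
        # S105: so the estimated flow is guided and constrained by it.
        flow = self.flow_net(torch.cat([feats, inter_frame_mv], 1))
        # S106: warp the front (or rear) frame with the flow and reconstruct.
        return warp_and_fuse(prev_frame, next_frame, flow)
```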
Next, the above-described method in the present exemplary embodiment will be described in more detail with reference to fig. 1 to 3.
Referring to fig. 2, optionally, in step S101, the following steps are included:
step S201: and acquiring a front frame and a rear frame of n adjacent frames of the inserted frame, wherein n is a positive integer greater than or equal to 1.
It is to be understood that extraction can be performed on different image scales to obtain optical flow information of different scales; the method can also be extracted on a single frame and a plurality of frames, wherein the single frame refers to the front frame and the rear frame of an intermediate frame, the plurality of frames refer to the front frame and the rear frame of more than one frame of the intermediate frame, and the more the frames are adopted, the more accurate the generated light flow diagram is, and the more accurate the motion estimation of an object is.
Referring to fig. 2, optionally, in step S103, the following steps are included:
step S301: and extracting motion characteristics of the front frame and the rear frame, and acquiring high-dimensional characteristics containing semantics.
It is to be understood that the feature extraction part mainly performs high-dimensional mapping on the features of the image to acquire the needed semantic information, so that the image optical flow diagram can be estimated conveniently.
Referring to fig. 2, optionally, in step S106, the following steps are included:
step S401: the previous frame or the following frame is generated into an inserted frame after affine transformation and reconstruction of the image through optical flow information.
It should be understood that the affine transformation and reconstruction part of the image is to perform affine transformation on the image by using the generated optical flow information, and reprocess the transformed image to reconstruct the intermediate frame. After affine transformation, the front and rear frames are weighted and fused according to a certain rule, and the inserted frame is obtained.
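As an illustration, backward warping with the estimated flow followed by a simple weighted fusion can be sketched as below. The mid-point flow scaling (plus or minus 0.5) and the equal blending weights are simplifying assumptions for the example, not rules stated by the disclosure.

```python
# Minimal sketch of warping the front/rear frames with the estimated flow and
# fusing them into the inserted frame. The 0.5 factors and equal weights are
# simplifying assumptions for a mid-point insertion.
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """img: (N, C, H, W); flow: (N, 2, H, W) displacement in pixels (x, y)."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), 0).float().unsqueeze(0)   # sampling positions
    coords = base + flow
    # normalize to [-1, 1]; grid_sample expects (N, H, W, 2) in (x, y) order
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)
    return F.grid_sample(img, grid, align_corners=True)

def fuse_midpoint(prev_frame, next_frame, flow_prev_to_next):
    warped_prev = backward_warp(prev_frame, -0.5 * flow_prev_to_next)
    warped_next = backward_warp(next_frame, 0.5 * flow_prev_to_next)
    return 0.5 * warped_prev + 0.5 * warped_next           # weighted fusion
```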
Referring to fig. 2, optionally, in step S105, the following steps are included:
step S501: the inter-frame motion information is used as input of an optical flow estimation network, so that the inter-frame motion information obtained in each training is used as constraint condition in each training of motion characteristics.
It should be understood that the motion information in the video coding and the decoded frames are transmitted to the deep learning algorithm together, so that the network can restrict the information of the optical flow estimation part by using the existing motion estimation in the video coding, and reduce the computational complexity of the optical flow estimation part.
Referring to fig. 2, optionally, in step S105, the following steps are included:
step S601: the inter-frame motion information is used as input to the optical flow estimation network to sum the inter-frame motion information obtained in each training with the estimated motion information obtained in each training motion feature.
It should be understood that, first, the encoded video file is decoded to obtain the required front and rear frames and their motion estimation information. Feature extraction is then performed on the images to obtain high-dimensional features containing semantics. The inter-frame motion information and the extracted features are input into the optical flow part together, and the motion information supplements and constrains the output of this part to obtain a better optical flow map. Finally, the target intermediate frame is obtained through the affine transformation and reconstruction part of the image.
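One concrete way to read "supplementing and constraining the output with the motion information" is to expand the block-level coded motion vectors into a dense flow prior and let the network predict only a residual on top of it. The 16x16 block size and the residual formulation below are assumptions made for the sketch, not details given by the disclosure.

```python
# Minimal sketch (assumed formulation): block-level coded MVs are upsampled to a
# dense flow prior, and the network output is treated as a residual on top of
# it, so the coded motion info both supplements and constrains the estimate.
import torch
import torch.nn.functional as F

def dense_prior_from_block_mvs(block_mvs, image_size):
    """block_mvs: (N, 2, H/16, W/16) motion vectors per 16x16 block (assumed)."""
    return F.interpolate(block_mvs, size=image_size, mode="nearest")

def constrained_flow(flow_residual, block_mvs):
    prior = dense_prior_from_block_mvs(block_mvs, flow_residual.shape[-2:])
    return prior + flow_residual      # estimated flow = coded prior + residual

block_mvs = torch.zeros(1, 2, 16, 16)             # e.g. decoded from the bitstream
flow_residual = torch.randn(1, 2, 256, 256) * 0.1
print(constrained_flow(flow_residual, block_mvs).shape)   # (1, 2, 256, 256)
```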
Referring to fig. 3, optionally, in step S105, the following steps are included:
step S701: and calculating the difference value between the inter-frame motion information and the estimated motion information obtained by each training, and generating a constraint model for the next training based on the difference value.
It is to be understood that the feature map output by the feature extraction network and the inter-frame motion information enter the optical flow estimation network together, and the network outputs an estimated optical flow map and estimated motion information. The dotted lines denote the training process, and the solid lines denote both inference and training. During training, a loss is computed between the motion information estimated by the network and the inter-frame motion information, thereby constraining the model; at the same time, the inter-frame motion information and the high-dimensional feature map are both provided as inputs, so that the network can generate a more accurate optical flow estimate more efficiently.
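A minimal training-loss sketch for this constraint is given below: the motion information estimated by the network is penalized against the coded inter-frame motion information alongside a standard frame reconstruction loss. The L1 form, the reconstruction term and the weight 0.1 are assumptions, not values from the disclosure.

```python
# Minimal sketch of a constrained training objective: reconstruction loss plus
# a penalty tying the network's estimated motion to the coded inter-frame motion.
import torch.nn.functional as F

def training_loss(pred_frame, gt_frame, est_motion, coded_motion, weight=0.1):
    recon = F.l1_loss(pred_frame, gt_frame)            # frame reconstruction loss
    constraint = F.l1_loss(est_motion, coded_motion)   # motion-constraint loss
    return recon + weight * constraint
```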
Referring to fig. 3, optionally, in step S105, the following steps are included:
step S801: and summing the estimated motion information obtained based on the constraint model with the difference value, and taking the summed value as estimated motion information to be trained next time.
It is to be understood that, during training, a loss is computed between the motion information estimated by the network and the inter-frame motion information, so that the model is constrained; the estimated motion information obtained is then summed with the difference (the missing portion), thereby further improving the efficiency of training.
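One possible reading of this correction step, written as a sketch (the exact update rule is an assumption): the difference computed in the previous pass is added back to the estimate produced under the constraint model, and the sum serves as the estimated motion information for the next pass.

```python
# Minimal sketch (assumed update rule) for steps S701/S801: add the previously
# computed difference back onto the constrained estimate for the next pass.
def next_motion_estimate(est_from_constrained_model, coded_motion, prev_estimate):
    difference = coded_motion - prev_estimate          # S701: difference from last pass
    return est_from_constrained_model + difference     # S801: summed estimate
```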
Further, in the present exemplary embodiment, as shown in fig. 4, there is also provided a sending card, comprising:
a processor that generates interpolated video image information according to the deep-learning-based frame interpolation method of any of the above embodiments.
Further, in this exemplary embodiment, referring to fig. 5, a playing system is also provided, comprising:
the upper computer is used for sending video image data;
the sending card is used for receiving the video image data, converting the video image data into video image information and transmitting the video image information;
the receiving card is used for receiving the video image information and controlling the display device to display;
wherein the sending card is the sending card in the above embodiment.
It should also be appreciated that the specific implementation of the sending card and the playing system has been described in detail in the embodiments of the deep-learning-based frame interpolation method, and will not be described again here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied. The components shown as modules or units may or may not be physical units, may be located in one place, or may be distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed scheme. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by, for example, a processor, can implement the steps of the deep learning based frame insertion method in any of the above embodiments. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the control method section of this specification, when said program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above-described method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method for inserting frames based on deep learning, comprising:
acquiring a front frame and a rear frame of an inserted frame;
acquiring inter-frame motion information of the front frame and the rear frame;
extracting motion characteristics of the front frame and the rear frame;
using the inter-frame motion information and the motion characteristics together as the input of an optical flow estimation network;
outputting optical flow information by a deep learning method with the inter-frame motion information as a constraint condition;
generating the interpolated frame from the optical flow information and based on the preceding frame or the following frame.
2. The frame inserting method according to claim 1, wherein the step of acquiring the preceding frame and the following frame of the inserted frame comprises:
acquiring the n adjacent front frames and the n adjacent rear frames of the inserted frame, wherein n is a positive integer greater than or equal to 1.
3. The frame insertion method according to claim 1, wherein the step of performing motion feature extraction on the front frame and the rear frame comprises:
and extracting motion characteristics of the front frame and the rear frame, and acquiring high-dimensional characteristics containing semantics.
4. The frame interpolation method according to claim 1, wherein the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
the inter-frame motion information is used as input of an optical flow estimation network, so that the inter-frame motion information obtained by each training is used as constraint condition when the motion characteristic is trained each time.
5. The frame interpolation method according to claim 4, wherein the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
the inter-frame motion information is used as input to an optical flow estimation network to sum the inter-frame motion information obtained in each training with the estimated motion information obtained in each training of the motion features.
6. The frame interpolation method according to claim 1, wherein the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
and calculating a difference value between the inter-frame motion information obtained by each training and the estimated motion information, and generating a constraint model for the next training based on the difference value.
7. The frame interpolation method according to claim 6, wherein the step of outputting optical flow information based on the deep learning method using the inter-frame motion information as a constraint condition includes:
and summing the estimated motion information obtained based on the constraint model with the difference value, and taking the summed value as estimated motion information to be trained next time.
8. The frame interpolation method according to any one of claims 1 to 7, characterized in that the step of generating the interpolated frame from the optical flow information and based on the preceding frame or the following frame, comprises:
and generating the insertion frame after affine transformation and reconstruction of the front frame or the rear frame through the optical flow information.
9. A sending card, comprising:
a processor that generates interpolated video image information according to the deep-learning-based frame interpolation method of any one of claims 1 to 8.
10. A playback system, comprising:
the upper computer is used for sending video image data;
the sending card is used for receiving the video image data, converting the video image data into video image information and transmitting the video image information;
the receiving card is used for receiving the video image information and controlling the display device to display;
wherein the sending card is the sending card of claim 9.
CN202311353735.8A 2023-10-18 2023-10-18 Frame inserting method, sending card and playing system based on deep learning Pending CN117376605A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311353735.8A CN117376605A (en) 2023-10-18 2023-10-18 Frame inserting method, sending card and playing system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311353735.8A CN117376605A (en) 2023-10-18 2023-10-18 Frame inserting method, sending card and playing system based on deep learning

Publications (1)

Publication Number Publication Date
CN117376605A 2024-01-09

Family

ID=89407247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311353735.8A Pending CN117376605A (en) 2023-10-18 2023-10-18 Frame inserting method, sending card and playing system based on deep learning

Country Status (1)

Country Link
CN (1) CN117376605A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination