CN117857729A - Video frame inserting method and device based on heterogeneous computation, processor and sending card - Google Patents


Publication number
CN117857729A
Authority
CN
China
Prior art keywords
image data
video
frame
space
deployment file
Prior art date
Legal status
Pending
Application number
CN202410057544.5A
Other languages
Chinese (zh)
Inventor
胥斌
刘凯
邵杰
Current Assignee
China Film Technology Beijing Co ltd
Original Assignee
China Film Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by China Film Technology Beijing Co ltd
Priority to CN202410057544.5A
Publication of CN117857729A

Landscapes

  • Television Systems (AREA)

Abstract

The embodiments of the disclosure relate to a video frame insertion method and device based on heterogeneous computation, a sending card and a display control system. The method comprises the following steps: splitting and compiling a preset video frame insertion processing algorithm to obtain a first deployment file and a second deployment file; deploying the first deployment file into a first computing unit and the second deployment file into a second computing unit; receiving one or more video streams, each video stream comprising multiple frames of image data; sequentially computing an interpolated frame between every two adjacent image data frames through the first computing unit and the second computing unit; and outputting the image data frames and the interpolated frames alternately in time order. The method not only realizes real-time frequency-doubling processing of the video but also increases the available computation bandwidth.

Description

Video frame inserting method and device based on heterogeneous computation, processor and sending card
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to a video frame inserting method, a device, a processor, a sending card and a display control system based on heterogeneous computation.
Background
The frame rate of a video is an important factor affecting human visual perception; when the frame rate falls below 24Hz, the human eye perceives obvious discontinuity. In cinema scenarios, the dominant video source is currently 4K@24Hz. With the development of film and television technology, audience requirements for visual quality during film viewing keep rising, and high-frame-rate video sources such as 48Hz and 60Hz have appeared.
In the related art, high-frame-rate video sources come from two approaches: shooting at a high frame rate on site, or producing a high frame rate in post-production. On-site shooting places very high demands on the camera and on storage capacity, and post-production rendering is billed per frame, so the cost is excessive. For existing films, post-production remastering requires content rights that are difficult to obtain, and the remastering cycle is long. Accordingly, there is a need to improve one or more of the problems in the related art described above.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a video frame insertion method, apparatus, processor, sending card and display control system based on heterogeneous computation, thereby overcoming, at least to some extent, one or more of the problems caused by the limitations and disadvantages of the related art.
In a first aspect, the present disclosure provides a video frame insertion method based on heterogeneous computation, the method comprising:
splitting and compiling a preset video frame inserting processing algorithm to obtain a first deployment file and a second deployment file;
deploying the first deployment file into a first computing unit, and deploying the second deployment file into a second computing unit;
receiving one or more video streams, the video streams comprising multi-frame image data;
sequentially calculating an insertion frame between every two adjacent image data frames through a first calculating unit and a second calculating unit;
and outputting the image data frames and the inserted frames in turn according to the time sequence.
Optionally, the first computing unit is an FPGA, and the second computing unit is a GPU.
Optionally, the step of splitting and compiling the preset video frame inserting processing algorithm to obtain the first deployment file and the second deployment file includes:
splitting and compiling a deep learning network frame insertion algorithm, wherein the obtained first deployment file comprises preprocessing, at least one intermediate process, at least one first network and post-processing, and the obtained second deployment file comprises at least one second network.
Optionally, the step of sequentially calculating the insertion frames between every two adjacent image data frames by using the first calculating unit and the second calculating unit includes:
storing the image data through preprocessing in the first computing unit, and computing an inserted frame between every two adjacent image data frames through the first network and the intermediate processing;
in the second calculation unit, performing neural network image quality optimization on the inserted frame through the second network;
in the first calculation unit, filtering processing is performed on the interpolated frame after the neural network image quality optimization by the post-processing.
Optionally, the step of storing the image data by preprocessing in the first computing unit includes:
the image data is stored, by the preprocessing and in time order, in ping-pong fashion across a first space, a second space, a third space, a fourth space and a fifth space of a memory in the first computing unit, wherein each memory space stores only one frame of the image data at a time.
Optionally, the step of calculating an interpolated frame between every two adjacent frames of image data by the first network and the intermediate processing includes:
calculating motion information and balance information between every two adjacent image data frames through the first network, and writing the motion information and the balance information into a fourth space or a fifth space of the memory;
and reading the motion information and the equalization information in the fourth space or the fifth space through the intermediate processing to calculate an inserted frame between every two adjacent image data frames.
Optionally, the step of performing, in the first computing unit, filtering processing on the inserted frame after the neural network image quality optimization through the post-processing includes:
and filtering the inserted frame with the optimized image quality of the neural network, and storing the filtered inserted frame into a sixth space of the memory.
In a second aspect, the present disclosure provides a video framing apparatus based on heterogeneous computation, the apparatus comprising:
the splitting module is used for splitting and compiling a preset video frame inserting processing algorithm to obtain a first deployment file and a second deployment file;
the deployment module is used for deploying the first deployment file into a first computing unit and deploying the second deployment file into a second computing unit;
an image data receiving module for receiving one or more video streams, the video streams comprising a preset number of image data;
the frame inserting module is used for sequentially calculating the inserted frames between every two adjacent image data frames through the first calculating unit and the second calculating unit;
and the image data output module is used for outputting the image data frames and the insertion frames in turn according to the time sequence.
In a third aspect, the present disclosure provides a video processor that generates video image information after interpolation according to the heterogeneous computation-based video interpolation method described in any one of the above.
In a fourth aspect, the present disclosure provides a transmitting card, including the video processor described above.
In a fifth aspect, the present disclosure provides a display control system comprising:
video source transmitting means for transmitting video image data;
the transmitting card is used for receiving the video image data, performing data conversion and transmitting the video image information;
the receiving card is used for receiving the video image information and driving a display device to display;
wherein the sending card is the sending card provided in the fourth aspect above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
in the embodiment of the disclosure, the video frame inserting processing algorithm is split, a heterogeneous processing architecture combined by the first computing unit and the second computing unit is adopted, and the proper deployment files are respectively put into the corresponding computing units and used for sequentially computing the inserted frames between every two adjacent image data frames, so that real-time frequency doubling processing of the video is realized, and the computing bandwidth is increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 illustrates a schematic diagram of steps of a video interpolation method based on heterogeneous computation in an exemplary embodiment of the present disclosure;
fig. 2 illustrates a split schematic of a deep learning network based MEMC algorithm in an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a system architecture diagram of a heterogeneous computation based video framing method in an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating a schematic structure of a video framing apparatus based on heterogeneous computation in an exemplary embodiment of the present disclosure;
fig. 5 illustrates a schematic diagram of a configuration of a transmitting card in an exemplary embodiment of the present disclosure;
fig. 6 illustrates a schematic structure of a display control system in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In this exemplary embodiment, a video frame inserting method based on heterogeneous computation is provided first, and referring to fig. 1, the method may include the following steps S101 to S105:
step S101: splitting and compiling a preset video frame inserting processing algorithm to obtain a first deployment file and a second deployment file.
Step S102: and deploying the first deployment file into a first computing unit, and deploying the second deployment file into a second computing unit.
Step S103: one or more video streams are received, the video streams comprising multiple frames of image data.
Step S104: and sequentially calculating the insertion frames between every two adjacent image data frames through the first calculating unit and the second calculating unit.
Step S105: and outputting the image data frames and the inserted frames in turn according to the time sequence.
According to the method, the video frame inserting processing algorithm is split, the heterogeneous processing architecture combined by the first computing unit and the second computing unit is adopted, and the proper deployment files are respectively put into the corresponding computing units and used for sequentially computing the inserted frames between every two adjacent image data frames, so that real-time frequency multiplication processing of the video is realized, and the computing bandwidth is increased.
Next, each step of the above-described method in the present exemplary embodiment will be described in more detail with reference to fig. 1 to 3.
In step S101, a preset video frame insertion processing algorithm is split and compiled, based on a development environment, into a plurality of deployment files that can run on a plurality of computing units. Specifically, the compilation is performed on a PC in the development environment rather than on the terminal-side device, which effectively avoids the memory-resource limitations of the terminal-side device.
It is understood that the computing units may include CPU, GPU, DSP, FPGA and ASIC computing units.
In one embodiment, a heterogeneous processing architecture combining two computing units, namely an FPGA and a GPU, is selected.
FPGAs are a further development of programmable devices such as PALs, GALs and CPLDs; they offer greater parallelism and are customizable and reconfigurable.
Specifically, greater parallelism is achieved through both concurrency and pipelining. Concurrency refers to replicating computing resources so that multiple modules can compute simultaneously and independently; for example, multiple additions, multiplications and any other designable logic can execute at the same time. Pipelining divides a task into segments, with execution proceeding segment by segment so that successive inputs overlap in time.
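The segment-by-segment overlap of a pipeline can be sketched in software with chained generators, where each stage consumes results from the previous stage as they arrive. This is only an illustrative analogue; a real FPGA pipeline consists of register stages in hardware, and the stage operations here are made up for the example.

```python
# Illustrative software analogue of a hardware pipeline: each stage is a
# generator that processes items from the previous stage one by one.
def stage(items, op):
    for item in items:
        yield op(item)

frames = range(4)  # pretend frame indices flowing into the pipeline
pipeline = stage(stage(frames, lambda f: f * 10),  # pipeline segment 1
                 lambda f: f + 1)                  # pipeline segment 2
results = list(pipeline)  # each frame passes through both segments in order
```

Because the stages are generators, segment 1 can already be working on the next frame while segment 2 finishes the current one, which is the essence of pipelined execution.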
Logic in an FPGA is implemented through lookup tables (LUTs), and customizability means that a user can implement their own logic circuits within the limits of the available resources. Typically, tasks run faster on hardware circuits than in software; for example, comparing the upper 32 bits of a 64-bit value with its lower 32 bits requires several instructions on a CPU (two bitwise-AND instructions, a shift instruction, a compare instruction and a write-back instruction), but only a single comparator on an FPGA.
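The instruction-count comparison can be mirrored in software: even in this small sketch, extracting and comparing the two halves of a 64-bit word takes a shift, two masks and a compare, whereas the FPGA performs the same function with one 32-bit comparator.

```python
def high_vs_low(x: int) -> bool:
    """Compare the upper 32 bits of a 64-bit value with its lower 32 bits.
    In software this costs a shift, two masks and a compare; on an FPGA
    the same function is a single 32-bit comparator."""
    high = (x >> 32) & 0xFFFFFFFF  # shift + mask to extract the upper half
    low = x & 0xFFFFFFFF           # mask to extract the lower half
    return high > low              # the comparison itself

# Example: upper half 7, lower half 5 -> upper half is larger
v = (7 << 32) | 5
```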
Reconfigurability means that the logic inside the FPGA can be changed as requirements change, which reduces development cost. Meanwhile, multiplexing FPGA resources saves more server space than using multiple fixed ASIC modules.
A GPU is a microprocessor dedicated to graphics and image processing on personal computers, workstations, game consoles and some mobile devices (e.g., tablets and smartphones). GPUs offer high-density computation, efficient parallelism and an ultra-long graphics pipeline.
GPUs typically have a memory bit width of 128 bits or 256 bits, and therefore GPUs have good performance in computationally intensive applications.
The efficient parallelism is mainly realized by parallel computation of a plurality of GPU drawing pipelines, and the pipelines can run under the centralized control of a single control component or independently.
The design of the GPU ultra-long graphics pipeline aims at maximizing throughput, so that the GPU is used as a data stream parallel processor and has obvious advantages in the aspect of large-scale data stream parallel processing.
In this embodiment, the video frame interpolation processing algorithm is a motion estimation and motion compensation (Motion Estimate and Motion Compensation, MEMC) algorithm based on a deep learning network; that is, the prediction of object motion information in MEMC is implemented with a deep learning network, the original frame information and the motion information are then fused to obtain a predicted frame, and finally the predicted frame is inserted between the original frames to obtain a frequency-doubled video.
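As a toy illustration of the fuse step (not the patent's network, and with 1-D "frames" and a single integer motion value standing in for real 2-D data and per-block motion fields), a predicted mid frame can be formed by shifting pixels halfway along the estimated motion and blending with the co-located pixels of the next frame:

```python
def interpolate(frame0, frame1, motion):
    """Toy motion-compensated interpolation on 1-D frames.
    `motion` is the estimated displacement (in pixels) from frame0 to
    frame1; the mid frame samples frame0 shifted forward by half that
    displacement and frame1 shifted back by half, then averages them."""
    n = len(frame0)
    half = motion // 2
    mid = []
    for i in range(n):
        a = frame0[min(max(i - half, 0), n - 1)]  # frame0 moved forward, clamped
        b = frame1[min(max(i + half, 0), n - 1)]  # frame1 moved back, clamped
        mid.append((a + b) // 2)                  # fuse the two predictions
    return mid

f0 = [1, 2, 3, 4, 5, 6]
f1 = [1, 1, 1, 2, 3, 4]  # f0 shifted right by 2 pixels (motion = 2)
mid = interpolate(f0, f1, 2)  # lands roughly halfway between f0 and f1
```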
Specifically, as shown in fig. 2, the MEMC algorithm based on the deep learning network can be split and compiled into preprocessing, the deep learning network and post-processing. The deep learning network is divided into networks and intermediate processes; in a multi-level network, it may comprise network 1, intermediate process 1, network 2, intermediate process 2, ..., and intermediate process n.
In step S102, after the above algorithm is split and compiled, a first deployment file suitable for FPGA implementation is placed on the FPGA to implement, and a second deployment file suitable for GPU processing is placed on the GPU to implement.
The deep learning network can achieve a higher computation speed on the FPGA; for example, a dedicated deep-learning hardware resource, the artificial intelligence engine (Artificial Intelligence engine, AIE), is integrated into the FPGA, enabling the network inference capability to reach 100T. Meanwhile, the preprocessing and post-processing can be implemented as pipelines using the FPGA's logic resources. The FPGA thus both accelerates the deep learning network model and efficiently implements the pipelined parts of the algorithm with its logic resources.
The embedded GPU facilitates developers' implementation of special functions and offers good computing capability.
Frequency-doubling processing at 4K resolution requires a multi-level network, and the computation load is huge. The characteristics of the FPGA and the GPU are therefore combined to realize real-time video frequency doubling.
It is understood that the first deployment file may include the preprocessing, at least one intermediate process, at least one first network and the post-processing, and the second deployment file includes at least one second network. Taking a 2-level network as an example, the deep learning network comprises network 1, an intermediate process and network 2. The first deployment file then contains the preprocessing, the intermediate process, network 1 and the post-processing, and the second deployment file contains network 2. Multi-level networks are similar to the 2-level case and are not described in detail here.
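The 2-level split can be pictured as a simple stage-to-device mapping. The stage names below are illustrative, and the real deployment files are compiled artifacts produced by vendor toolchains rather than Python data:

```python
# Hypothetical mapping of a 2-level deep-learning MEMC pipeline onto the
# two compute units, mirroring the split described above.
PIPELINE = [
    ("preprocess",   "fpga"),  # first deployment file
    ("network1",     "fpga"),
    ("intermediate", "fpga"),
    ("network2",     "gpu"),   # second deployment file
    ("postprocess",  "fpga"),
]

def deployment_file(device):
    """Collect the stages compiled into the file for one compute unit."""
    return [stage for stage, target in PIPELINE if target == device]

first_file = deployment_file("fpga")
second_file = deployment_file("gpu")
```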
In step S103, the video stream may be a 3D video stream or a 2D video stream, which the present disclosure does not limit. The 3D video stream contains image data for stereoscopic (three-dimensional) display, and the 2D video stream contains two-dimensional image data.
A specific video stream interface may be a High Definition Multimedia Interface (HDMI). The HDMI interface is compact, supports 1080P resolution and can transmit uncompressed audio and video signals. The video stream interface may be provided in the FPGA.
In step S104, after the image data is received through the video stream interface in the FPGA, the interpolated frames between every two adjacent image data frames are computed sequentially using the deployment files placed on the FPGA and the GPU in the preceding steps.
The process of computing the interpolated frame is described in more detail below with reference to one system architecture shown in fig. 3.
Specifically, the image data may be stored by preprocessing in a first computing unit, such as an FPGA, and the interpolated frames between every two adjacent frames of the image data may be computed by a first network and intermediate processing. And in a second computing unit such as a GPU, performing neural network image quality optimization on the inserted frame through a second network, transmitting the image data subjected to the neural network image quality optimization to an FPGA, and continuing filtering.
It is understood that a memory for storing image data is also provided in the first computing unit. In a specific example, the memory may be a double data rate synchronous dynamic random access memory (DDR). Compared with traditional single-data-rate memory, DDR performs two read/write operations per clock cycle, one on the rising edge and one on the falling edge, and can therefore achieve a higher data transfer rate.
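The doubling can be checked with simple arithmetic: with two transfers per clock cycle, peak bandwidth is 2 × clock rate × bus width. The figures in the example are assumptions for illustration, not values from the patent:

```python
def ddr_peak_bandwidth(clock_hz: float, bus_bits: int) -> float:
    """Peak DDR bandwidth in bytes/s: two transfers per clock cycle
    (one on the rising edge, one on the falling edge), each bus_bits wide."""
    return 2 * clock_hz * (bus_bits / 8)

# e.g. a hypothetical 64-bit DDR interface clocked at 800 MHz:
bw = ddr_peak_bandwidth(800e6, 64)  # 12.8 GB/s peak
```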
In the disclosure, the memory space of the DDR in the first computing unit is partitioned to realize ping-pong buffering of the image data; this buffering mechanism improves the throughput of the device and ultimately helps avoid bottlenecks.
In one embodiment, the DDR in the first computing unit has at least five memory spaces for storing raw input image data, and each space stores only one frame of image data at a time. For example, in time order, the preprocessing writes the first frame of image data to the first space, the second frame to the second space, the third frame to the third space, the fourth frame to the fourth space and the fifth frame to the fifth space. By the time the sixth frame is to be written, the first frame in the first space has already been output, so the sixth frame can be written to the first space, and so on.
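Under the assumption of exactly five slots with one frame each, the write target simply cycles modulo 5, so frame 6 reuses the first space once frame 1 has been output:

```python
NUM_SLOTS = 5  # five memory spaces for raw input frames, as described above

def slot_for_frame(frame_idx: int) -> int:
    """Return the 1-based space number that receives a 1-based frame index.
    Frames cycle through the five spaces in ping-pong fashion."""
    return (frame_idx - 1) % NUM_SLOTS + 1

writes = [slot_for_frame(i) for i in range(1, 8)]  # frames 1..7
```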
Of course, in other embodiments, the DDR may have more than five memory spaces for storing the original input image data, and the principle is similar to that of five, and will not be described here again.
After the DDR in the first computing unit has buffered a sufficient amount of image data, the interpolated frames between every two adjacent image data frames are computed by the first network and the intermediate processing in the first computing unit.
Specifically, motion information and equalization information between every two adjacent image data frames are calculated through a first network, and the motion information and the equalization information are written into a fourth space or a fifth space of the DDR.
It will be understood that once the DDR has buffered the 1st and 2nd frame images, the first network starts computing their motion information and equalization information; at that moment the 4th frame has not yet been received, so this motion and equalization information can be stored in the fourth space. By the time the 4th frame is to be written, the motion and equalization information of the 1st and 2nd frames has already been sent onward for the subsequent computation. The fourth space therefore lets the storage of motion and equalization information proceed without conflicting with the writing of original images, so the two can happen concurrently. The fifth space stores motion and equalization information in the same way, on the same principle, and is not described again here.
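The alternation between the fourth and fifth spaces can be sketched the same way: consecutive frame pairs ping-pong between the two spaces, so a write of new motion/equalization information never lands in a space whose contents are still in use. The indexing below is an illustrative simplification, not the patent's actual addressing scheme:

```python
def motion_info_space(pair_idx: int) -> int:
    """Return the space (4 or 5) buffering motion/equalization info for a
    1-based frame pair (pair 1 = frames 1 & 2, pair 2 = frames 2 & 3, ...)."""
    return 4 if pair_idx % 2 == 1 else 5

spaces = [motion_info_space(p) for p in range(1, 5)]  # pairs 1..4
```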
Then, the intermediate processing in the first computing unit reads the motion information and equalization information from the fourth or fifth space to compute the interpolated frame between every two adjacent image data frames. The intermediate processing is pipelined, and each computed interpolated frame can be transmitted directly from a PCIE interface provided on the first computing unit to a PCIE interface provided on the second computing unit, thereby delivering it to the second computing unit.
And after receiving the inserted frame, the second computing unit performs neural network image quality optimization processing through a second network deployed on the second computing unit. After the processing is finished, the processed image data is retransmitted back to the first computing unit through the PCIE interface.
The first calculation unit performs filtering processing on the inserted frame subjected to the neural network image quality optimization through post-processing to further improve the image quality, and stores the processed image data in a sixth space of the memory.
In step S105, in the first computing unit, the image data frames and the interpolated frames are output alternately in time order through the video output interface. For example, frame 1, interpolated frame 1.5, frame 2, interpolated frame 2.5, frame 3, interpolated frame 3.5 and so on are output in chronological order.
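The output interleaving amounts to alternating original and interpolated frames, doubling the frame count. In this minimal sketch frames are represented only by their timestamps:

```python
def interleave(originals):
    """Output original frames and the interpolated frame between each
    adjacent pair, in time order: 1, 1.5, 2, 2.5, 3, ..."""
    out = []
    for t, t_next in zip(originals, originals[1:]):
        out.append(t)                  # original frame
        out.append((t + t_next) / 2)   # timestamp of the interpolated frame
    out.append(originals[-1])          # last original frame closes the stream
    return out

sequence = interleave([1, 2, 3, 4])
```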
Specifically, the video output interface and the video input interface may be the same interface, for example, all are HDMI interfaces. Of course, in other embodiments, other interfaces that support both video input and video output are possible, which the present disclosure is not limited to.
It should be noted that although the steps of the methods of the present disclosure are depicted in a particular order in the figures, this does not require or imply that the steps must be performed in that order. In addition to varying the order of the steps, it is also possible to combine several steps into one, and/or to split one step into several, etc.
Referring to fig. 4, the disclosure further provides a video frame inserting device based on heterogeneous computation, where the device includes a first computing unit, a second computing unit, a splitting module, a deployment module, an image data receiving module, a frame inserting module, and an image data output module. The splitting module is used for splitting and compiling a preset video frame inserting processing algorithm to obtain a first deployment file and a second deployment file; the deployment module is used for deploying the first deployment file into a first computing unit and deploying the second deployment file into a second computing unit; an image data receiving module for receiving one or more video streams, the video streams comprising a preset number of image data; the frame inserting module is used for sequentially calculating the inserted frames between every two adjacent image data frames through the first calculating unit and the second calculating unit; and the image data output module is used for outputting the image data frames and the insertion frames in turn according to the time sequence.
In a specific example, the first computing unit is an FPGA, and the second computing unit is a GPU.
In a specific example, the splitting module is further configured to split and compile a deep learning network frame insertion algorithm, and the obtained first deployment file includes: the method comprises the steps of preprocessing, at least one intermediate process, at least one first network and post-processing, wherein the obtained second deployment file comprises at least one second network.
In a specific example, the frame inserting module includes a frame inserting calculating unit, a first optimizing unit, and a second optimizing unit. The first computing unit is used for storing the image data through preprocessing and computing an inserted frame between every two adjacent image data frames through the first network and the intermediate processing; the first optimizing unit is used for optimizing the neural network image quality of the inserted frame through the second network in the second calculating unit; and the second optimizing unit is used for performing filtering processing on the inserted frame subjected to the neural network image quality optimization through the post-processing in the first calculating unit.
In a specific example, the above-mentioned interpolation frame calculation unit further includes a preprocessing subunit for storing the image data by preprocessing, in accordance with a time sequence, the first space, the second space, the third space, the fourth space, and the fifth space of the memory in the first calculation unit, wherein each memory space stores only one frame of the image data at a time.
In a specific example, the frame inserting calculation unit further includes a first network subunit and an intermediate processing subunit, where the first network subunit is configured to calculate motion information and equalization information between two adjacent image data frames, and write the motion information and equalization information into a fourth space or a fifth space of the memory; and the intermediate processing subunit is used for reading the motion information and the equalization information in the fourth space or the fifth space and calculating an inserted frame between every two adjacent image data frames.
In summary, the video frame insertion device based on heterogeneous computation provided by the present disclosure, through the modules described above, not only realizes real-time frequency-doubling processing of video but also increases the computation bandwidth.
The specific manner in which the various modules or units perform operations in the above-described heterogeneous computing-based video framing apparatus has been described in detail in connection with embodiments of heterogeneous computing-based video framing methods, and will not be described in detail herein.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied. The components shown as modules or units may or may not be physical units, may be located in one place, or may be distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the disclosed solution. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
Further, in this exemplary embodiment, there is also provided a video processor that generates interpolated video image information according to the heterogeneous-computation-based video frame inserting method of any of the above embodiments.
Further, in the present exemplary embodiment, referring to fig. 5, there is also provided a transmission card including the video processor described above.
Further, in the present exemplary embodiment, referring to fig. 6, a display control system is also provided, comprising:
video source transmitting means for transmitting video image data;
the transmitting card is used for receiving the video image data, converting the video image data into video image information and transmitting the video image information;
the receiving card is used for receiving the video image information and driving the display device to display;
wherein the transmitting card is the transmitting card in the above embodiment.
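As a loose data-flow sketch of the three components above — the generator names and the averaging interpolator are illustrative assumptions, not the patent's hardware:

```python
def video_source(frames):
    """Video source transmitting means: yields raw video image data."""
    yield from frames

def sending_card(stream, interpolate):
    """Receives video image data, doubles the frame rate by inserting a frame
    between every two adjacent frames, and emits video image information."""
    prev = None
    for frame in stream:
        if prev is not None:
            yield interpolate(prev, frame)  # the inserted frame
        yield frame
        prev = frame

def receiving_card(stream, display):
    """Receives video image information and drives the display device."""
    for frame in stream:
        display(frame)
```

Chaining the three stages mirrors the system topology: the sending card sits between the source and the receiving card and is the only stage that alters the frame sequence.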
It should also be appreciated that the specific manner in which the video processor, the transmitting card, and the display control system operate has been described in detail in the embodiments of the heterogeneous-computation-based video frame inserting method, and will not be repeated here.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (11)

1. A heterogeneous computation-based video frame inserting method, comprising:
splitting and compiling a preset video frame inserting processing algorithm to obtain a first deployment file and a second deployment file;
deploying the first deployment file into a first computing unit, and deploying the second deployment file into a second computing unit;
receiving one or more video streams, each video stream comprising multiple frames of image data;
sequentially calculating an inserted frame between every two adjacent image data frames through the first computing unit and the second computing unit; and
outputting the image data frames and the inserted frames in turn in temporal order.
2. The heterogeneous computing-based video interpolation method of claim 1, wherein the first computing unit is an FPGA and the second computing unit is a GPU.
3. The heterogeneous computation-based video frame inserting method according to claim 2, wherein the step of splitting and compiling a preset video frame inserting processing algorithm to obtain a first deployment file and a second deployment file comprises the steps of:
splitting and compiling a deep learning network frame inserting algorithm, wherein the obtained first deployment file comprises: preprocessing, at least one intermediate process, at least one first network, and post-processing; and the obtained second deployment file comprises at least one second network.
4. The heterogeneous computation-based video frame inserting method according to claim 3, wherein the step of sequentially calculating an inserted frame between every two adjacent image data frames through the first computing unit and the second computing unit comprises:
in the first computing unit, storing the image data through the preprocessing, and calculating an inserted frame between every two adjacent image data frames through the first network and the intermediate processing;
in the second computing unit, performing neural network image quality optimization on the inserted frame through the second network;
in the first computing unit, performing filtering processing on the inserted frame after the neural network image quality optimization through the post-processing.
5. The heterogeneous computation-based video interpolation method according to claim 4, wherein the storing of the image data by preprocessing in the first computing unit comprises:
the image data is stored in ping-pong in a first space, a second space, a third space, a fourth space and a fifth space of a memory in the first computing unit by preprocessing according to a time sequence, wherein each memory space stores only one frame of the image data at a time.
6. The heterogeneous computation-based video interpolation method according to claim 5, wherein the step of computing an interpolation frame between every two adjacent frames of the image data through the first network and the intermediate process comprises:
calculating motion information and equalization information between every two adjacent image data frames through the first network, and writing the motion information and the equalization information into a fourth space or a fifth space of the memory;
and reading the motion information and the equalization information in the fourth space or the fifth space through the intermediate processing to calculate an inserted frame between every two adjacent image data frames.
7. The heterogeneous computation-based video frame inserting method according to claim 6, wherein the step of performing filtering processing on the inserted frame after the neural network image quality optimization through the post-processing in the first computing unit comprises:
filtering the inserted frame after the neural network image quality optimization, and storing the filtered inserted frame into a sixth space of the memory.
8. A heterogeneous computation-based video frame inserting apparatus, comprising:
the splitting module is used for splitting and compiling a preset video frame inserting processing algorithm to obtain a first deployment file and a second deployment file;
the deployment module is used for deploying the first deployment file into a first computing unit and deploying the second deployment file into a second computing unit;
an image data receiving module for receiving one or more video streams, each video stream comprising a preset number of frames of image data;
the frame inserting module is used for sequentially calculating the inserted frames between every two adjacent image data frames through the first calculating unit and the second calculating unit;
and the image data output module is used for outputting the image data frames and the inserted frames in turn in temporal order.
9. A video processor, characterized in that the video processor generates video image information after interpolation according to the heterogeneous computation based video interpolation method according to any one of claims 1 to 7.
10. A transmission card comprising the video processor of claim 9.
11. A display control system, comprising:
video source transmitting means for transmitting video image data;
the transmitting card is used for receiving the video image data, converting the video image data into video image information, and transmitting the video image information;
the receiving card is used for receiving the video image information and driving a display device to display;
wherein the transmitting card is the transmitting card of claim 10.
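To illustrate the storage scheme of claims 5 to 7, the following is a rough Python sketch: incoming frames rotate through five fixed slots, each holding one frame at a time, adjacent pairs feed an interpolator, and each filtered inserted frame (standing in for the sixth space of claim 7) is emitted in temporal order. The class, the slot count handling, and the callback parameters are illustrative assumptions, not the patent's FPGA memory layout.

```python
from collections import deque

class PingPongBuffer:
    """Rotate incoming frames through five fixed slots (one frame per slot),
    loosely mirroring the first through fifth memory spaces of claims 5-6."""

    def __init__(self, num_slots=5):
        self.slots = [None] * num_slots
        self.next_slot = 0
        self.order = deque(maxlen=2)  # slot indices of the two newest frames

    def push(self, frame):
        self.slots[self.next_slot] = frame  # overwrite: one frame per slot
        self.order.append(self.next_slot)
        self.next_slot = (self.next_slot + 1) % len(self.slots)

    def adjacent_pair(self):
        """Return the two most recent frames, or None until two have arrived."""
        if len(self.order) < 2:
            return None
        a, b = self.order
        return self.slots[a], self.slots[b]

def process_stream(frames, interpolate, post_filter):
    """Emit original frames and filtered inserted frames in temporal order."""
    buf = PingPongBuffer()
    out = []
    for frame in frames:
        buf.push(frame)
        pair = buf.adjacent_pair()
        if pair is not None:
            out.append(post_filter(interpolate(*pair)))  # the inserted frame
        out.append(frame)
    return out
```

Because each slot is overwritten in rotation, memory stays bounded regardless of stream length, which is the point of the ping-pong arrangement on a fixed-size FPGA memory.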
CN202410057544.5A 2024-01-15 2024-01-15 Video frame inserting method and device based on heterogeneous computation, processor and sending card Pending CN117857729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410057544.5A CN117857729A (en) 2024-01-15 2024-01-15 Video frame inserting method and device based on heterogeneous computation, processor and sending card


Publications (1)

Publication Number Publication Date
CN117857729A true CN117857729A (en) 2024-04-09

Family

ID=90545740



Similar Documents

Publication Publication Date Title
CN105744358A (en) Video play processing method and device
CN112235604B (en) Rendering method and device, computer readable storage medium and electronic device
US8856212B1 (en) Web-based configurable pipeline for media processing
CN104767956A (en) Video processing with multiple graphical processing units
JP2004088244A (en) Image processing apparatus, image processing method, image frame data storage medium, and computer program
JP7507875B2 (en) 3D video processing method, device, readable storage medium and electronic device
CN101882302A (en) Motion blur image restoration system based on multi-core
CN104333760A (en) Three-dimensional image coding method, three-dimensional image decoding method and related devices
JP2008544306A (en) Signal processing system for synthesizing holograms
CN113015007B (en) Video frame inserting method and device and electronic equipment
JP6815330B2 (en) Systems and methods for handling video data
CN114710702A (en) Video playing method and device
CN117857729A (en) Video frame inserting method and device based on heterogeneous computation, processor and sending card
CN103019639A (en) Multiprocessor spliced synchronous display system
CN101256668B (en) Method for implementing video filtering to working balanced multiple nucleus
CN115393490A (en) Image rendering method and device, storage medium and electronic equipment
CN116156218A (en) Method and device for determining video frame inserting model, and method and device for video frame inserting
CN115866164A (en) Method, system and application for superimposing character information on infrared real-time image based on FPGA
CN117376604A (en) Deep learning video frame inserting method and device based on FPGA
CN112292848B (en) Video source expansion method, device and system and video source expander
EP1353510A2 (en) Image processing apparatus and image processing method
WO2022119612A1 (en) Set up and distribution of immersive media to heterogenous client end-points
CN1222039A (en) Digital information source decoder decoded by video
CN105812923A (en) Play processing method and device based on video on demand
Dong et al. Real-time UHD video super-resolution and transcoding on heterogeneous hardware

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination