WO2014079303A1 - Method, device and system for synthesizing multi-screen video

Info

Publication number: WO2014079303A1
Application number: PCT/CN2013/086014
Authority: WO
Inventors: 贾少华; 桂志渊; 刘克华
Original assignee: 中兴通讯股份有限公司
Priority date: 2012-11-23
Filing date: 2013-10-25
Publication date: 2014-05-30
Also published as: CN103841359A

Abstract

Disclosed are a method, device and system for synthesizing a multi-screen video. The method comprises: a video processing field programmable gate array (FPGA) receiving, through a high-speed serial bus, multiple videos and respective corresponding addresses sent by a decoding module, the addresses of the videos being determined by the decoding module according to a multi-screen arrangement requirement; zooming the multiple received videos, so that the size of each video is the same as that of a corresponding screen of multiple screens; buffering the zoomed videos, and modifying the address corresponding to each of the buffered videos; storing each of the zoomed videos in a corresponding memory space according to the modified address. The present invention can conserve system resources, and improve the data transmission speed and image quality.

Description

Video multi-picture synthesis method, device and system

The present invention relates to video conferencing technology, and in particular, to a video multi-picture synthesis method, apparatus and system.

Background technique

An HD conference TV terminal usually adopts the hardware architecture shown in FIG. 1. It works as follows. The network communication module 110 receives the network packets transmitted by the remote conference TV terminal and sends them to the main control processor 109 for unpacking, which yields the compressed video stream of the far end. The compressed video data is then transmitted to the decoding module 105 through the system bus 108. After decompressing the video data, the decoding module 105 obtains data in the original RAW format, which its video interface (Video Port, VP) 106 packages into standard BT.1120 format video data and sends to the video processing field programmable gate array (FPGA) 107. At the same time, the local video is sent to the video switching matrix 103 through the video input interface module 101, and the switching matrix 103 also sends this video data to the video processing FPGA 107 according to the system configuration. The video processing FPGA 107 performs video scaling and multi-picture synthesis on the far-end and local video according to the system configuration, and the result is output for display from the video output interface module 102 through the video switching matrix 103. The encoding module 104 obtains the locally input video from the video processing FPGA 107, compresses and encodes the original image to reduce the image bit rate, and transmits the compressed code stream through the system bus 108 to the main control processor 109 for network packaging; the packets are then transmitted to the far end through the network communication module 110. In this way, peer-to-peer communication between two conference TV terminals is completed.

At present, a parallel VP interface is used for data transmission between the encoding and decoding modules and the video processing FPGA. The VP interface is a 16-bit data bus with limited bandwidth; it can transmit at most one channel of 1080P60 video. As HD conference TV terminals take on the function of a built-in Multipoint Control Unit (MCU), the amount of data to be transmitted between the encoding and decoding modules and the video processing FPGA increases greatly, and the parallel VP interface can no longer meet the demand. When multiple channels of high-resolution, high-frame-rate decoded video need to be transmitted, the decoding module has to scale the channels down to reduce the bandwidth of the data stream before transmitting them to the video processing FPGA through the VP interface. The video processing FPGA then has to perform a second scaling and picture extraction before multi-picture synthesis, which increases system complexity, wastes system resources and reduces image quality. In addition, the parallel VP interface occupies a lot of printed circuit board (PCB) wiring space, and when the video clock frequency is high, especially for 1080P60 video, the bus timing is difficult to control.

Summary of the invention

In view of this, the main objective of the embodiments of the present invention is to provide a video multi-picture synthesis method, apparatus and system, which can save system resources and improve data transmission speed and image quality.

To achieve the above objective, the technical solution of the embodiment of the present invention is implemented as follows:

An embodiment of the present invention provides a video multi-picture synthesis method, where the method includes: a video processing FPGA receives multiple channels of video and their corresponding addresses sent by a decoding module through a high-speed serial bus, where the address of each channel of video is determined by the decoding module according to the requirements of the multi-screen layout;

The received multi-channel video is scaled, and the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen;

Cache each scaled video, and correct the address corresponding to each cached video;

Store each scaled video in the corresponding memory space according to the corrected address.

Preferably, before the scaling of the received multi-channel video, the method further includes:

The data sent by the decoding module through the high-speed serial bus is deserialized, the valid data is parsed out, and the valid data is converted to parallel form to obtain parallel data.

Preferably, the scaling of the received multi-channel video is:

The received multi-channel video is scaled with a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm or a polyphase interpolation algorithm, selected according to the image quality requirement.

Preferably, before each video is stored in the corresponding memory space according to the corrected address, the method further includes:

Selecting a video to be stored in the memory space from the cached multi-channel video by a round-robin mechanism;

Correspondingly, the video is stored in the corresponding memory space, as follows:

The selected videos are sequentially stored in the corresponding memory space.

An embodiment of the present invention provides a video processing FPGA, where the video processing FPGA includes: a high-speed serial bus controller configured to receive multiple channels of video and their corresponding addresses sent by the decoding module through a high-speed serial bus, where the address of each channel of video is determined by the decoding module according to the requirements of the multi-screen layout;

The scaling module is configured to scale the multi-channel video received by the high-speed serial bus controller, and the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen;

The frame buffer module is configured to cache the scaled videos, and respectively correct the addresses corresponding to the cached video channels;

The memory controller is configured to store the scaled video channels into the corresponding memory space according to the corrected address of the frame buffer module.

Preferably, the high-speed serial bus controller is further configured to perform deserialization processing on the data sent by the decoding module through the high-speed serial bus, parse the valid data, and perform parallel processing on the valid data to obtain parallel data.

Preferably, the scaling module is specifically configured to select a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm or a polyphase interpolation algorithm according to the image quality requirement, and to scale the received multi-channel video.

Preferably, the video processing FPGA further includes: an arbitration module;

The arbitration module is configured to sequentially select a video to be stored in the memory space from the multi-channel video buffered by the frame buffer module by using a round-robin mechanism;

Correspondingly, the memory controller is configured to sequentially store the video selected by the arbitration module into a corresponding memory space.

Preferably, the frame buffer module is composed of a one-hot state machine, and each state corresponds to one frame of data.

An embodiment of the present invention provides a video multi-picture synthesis system, where the system includes: a decoding module and a video processing FPGA, where

The decoding module is configured to determine, according to the requirements of the multi-screen layout, the address corresponding to each channel of the multi-channel video it has decoded, and to send the decoded multi-channel video and the determined addresses to the video processing FPGA through the high-speed serial bus;

The video processing FPGA is configured to receive the multi-channel video and the corresponding address sent by the decoding module through the high-speed serial bus, and the address of each video is determined by the decoding module according to the requirement of the multi-screen layout;

The video processing FPGA is further configured to scale the received multi-channel video so that the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen; to cache each scaled video and correct the address corresponding to each cached video; and to store each scaled video in the corresponding memory space according to the corrected address.

As can be seen from the above, in the technical solution of the embodiments of the present invention, a video processing field programmable gate array (FPGA) receives multiple channels of video and their corresponding addresses sent by the decoding module through a high-speed serial bus, where the address of each channel is determined by the decoding module according to the requirements of the multi-screen layout; the received multi-channel video is scaled so that the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen; each scaled video is cached and the address corresponding to each cached video is corrected; and each scaled video is stored in the corresponding memory space according to the corrected address. Because the data is transferred over the high-speed serial bus, system resources are saved and data transmission speed and image quality are improved.

DRAWINGS

FIG. 1 is a schematic diagram of the hardware architecture of an existing HD conference television terminal;

FIG. 2 is a schematic flowchart of a first embodiment of a video multi-picture synthesis method according to the present invention;

FIG. 3 is a schematic structural diagram of an embodiment of a video processing FPGA according to the present invention;

FIG. 4 is a schematic structural diagram of an embodiment of a decoding module according to the present invention;

FIG. 5 is a schematic structural diagram of an embodiment of a video multi-picture synthesis system according to the present invention;

FIG. 6 is a schematic flowchart of a second embodiment of a video multi-picture synthesis method according to the present invention;

FIG. 7 is a schematic diagram of a three-way sub-picture synthesis structure according to an embodiment of the present invention.

A first embodiment of the video multi-picture synthesis method provided by the present invention is shown in FIG. 2. The method includes:

Step 201: The video processing FPGA receives the multi-channel video and the corresponding address sent by the decoding module through the high-speed serial bus, and the address of each video is determined by the decoding module according to the requirement of the multi-screen layout;

Step 202: The received multi-channel video is scaled, and the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen;

Step 203: Cache each scaled video, and correct the address corresponding to each cached video;

Step 204: Store each scaled video in the corresponding memory space according to the corrected address.

Preferably, before the scaling of the received multi-channel video, the method further includes: the data sent by the decoding module through the high-speed serial bus is deserialized, the valid data is parsed out, and the valid data is converted to parallel form to obtain parallel data.

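Preferably, the scaling of the received multi-channel video is:

The received multi-channel video is scaled with a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm or a polyphase interpolation algorithm, selected according to the image quality requirement.

Preferably, before each video is stored in the corresponding memory space according to the corrected address, the method further includes:

Selecting, by a round-robin mechanism, the video to be stored in the memory space from the cached multi-channel video;

Correspondingly, the video is stored in the corresponding memory space as follows:

The selected videos are sequentially stored in the corresponding memory space.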

An embodiment of the video processing FPGA provided by the present invention is shown in FIG. 3. The video processing FPGA includes:

The high-speed serial bus controller 301 is configured to receive the multi-channel video and the corresponding address sent by the decoding module through the high-speed serial bus, and the address of each video is determined by the decoding module according to the requirement of the multi-screen layout;

The scaling module 302 is configured to scale the multi-channel video received by the high-speed serial bus controller 301, and the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen;

The frame buffer module 303 is configured to cache each of the scaled videos, and respectively correct the addresses corresponding to the cached videos;

The memory controller 304 is configured to store the scaled videos in the corresponding memory space according to the corrected address of the frame buffer module 303.

Preferably, the high-speed serial bus controller 301 is further configured to deserialize the data sent by the decoding module through the high-speed serial bus, parse out the valid data, and convert the valid data to parallel form to obtain parallel data.

Preferably, the scaling module 302 is specifically configured to select a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm or a polyphase interpolation algorithm according to the image quality requirement, and to scale the received multi-channel video.

Preferably, the video processing FPGA further includes: an arbitration module 305;

The arbitration module 305 is configured to sequentially select a video to be stored in the memory space from the multi-channel video buffered by the frame buffer module 303 by using a round-robin mechanism;

Here, since the memory can perform only one read or write operation at a time while several sources issue read or write requests to it, the read and write requests must be arbitrated to determine which request is currently granted. The arbitration module 305 uses a one-hot encoded state machine in which each state represents one request, and it arbitrates with a polling (round-robin) mechanism so that requests are served fairly and promptly.

Correspondingly, the memory controller 304 is specifically configured to sequentially store the video selected by the arbitration module 305 into a corresponding memory space.
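The round-robin arbitration described above can be illustrated with a short software model. This is only a sketch under stated assumptions: the function name `round_robin_grant`, the bitmask representation of requests and the choice of requester 0 as the starting point are hypothetical and are not taken from the patent, which implements the arbiter as a hardware state machine.

```python
def round_robin_grant(requests: int, last_grant: int, n: int) -> int:
    """Return a one-hot grant mask for the next pending requester.

    requests   -- bitmask of pending requests (bit i set = requester i waiting)
    last_grant -- one-hot mask of the previous grant, or 0 if none yet
    n          -- number of requesters
    """
    # Start searching just after the requester that was served last,
    # so that no requester can be starved by a lower-numbered one.
    start = last_grant.bit_length() % n if last_grant else 0
    for offset in range(n):
        i = (start + offset) % n
        if requests & (1 << i):
            return 1 << i          # one-hot: exactly one bit is set
    return 0                       # nobody is requesting


# Requesters 0, 1 and 3 are waiting; grants rotate among them.
grant = 0
order = []
for _ in range(4):
    grant = round_robin_grant(0b1011, grant, 4)
    order.append(grant)
```

Each returned mask has exactly one bit set, mirroring the one-hot state encoding mentioned in the text; the rotation of the starting point is what gives every requester a turn.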

Preferably, the frame buffer module 303 is composed of a one-hot state machine, and each state corresponds to one frame of data.

Here, since the video formats may differ and the codec formats may therefore differ as well, three frames of buffer space are allocated when the video is buffered, and the state of each of the three frames is marked. Assume that the first frame is currently empty. When the high-speed serial bus controller 301 writes video data to the frame buffer module 303 and the frame buffer module 303 finds that the first frame is empty, it jumps to the first-frame state. After the first frame has been written, it is marked as full, indicating that the frame has been filled with video data and can be read. The frame buffer module 303 then checks whether the next frame is empty. If it is, the module jumps to the second frame and starts writing there when the decoding module next transmits data. If the second frame is full, meaning that it still holds data and is being read by the frame read module 306, the frame buffer module 303 remains in the first-frame state, and when the decoding module writes video data through the high-speed serial bus controller 301, the frame buffer module 303 overwrites the data of the original first frame. In this way the frame buffer module 303 cooperates with the frame read module 306 to perform frame rate conversion by frame dropping.
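The write-side behaviour of the three-frame buffer can be modelled in software as follows. This is a minimal sketch, not the patent's hardware design: the class name `FrameWriter`, its `release` method (standing in for the reader marking a frame empty) and the `dropped` counter are assumptions introduced for illustration.

```python
class FrameWriter:
    """Software model of a three-slot frame buffer that drops frames."""

    def __init__(self, slots: int = 3):
        self.full = [False] * slots   # per-slot state: full or empty
        self.data = [None] * slots
        self.cur = None               # slot written most recently
        self.dropped = 0              # frames lost by overwriting

    def write(self, frame) -> int:
        """Store one incoming frame and return the slot that was used."""
        if self.cur is None:
            target = 0
        else:
            nxt = (self.cur + 1) % len(self.full)
            # Advance only if the next slot is empty; otherwise stay put
            # and overwrite the current slot (this is the frame drop).
            target = nxt if not self.full[nxt] else self.cur
        if self.full[target]:
            self.dropped += 1
        self.data[target] = frame
        self.full[target] = True
        self.cur = target
        return target

    def release(self, slot: int) -> None:
        """Hypothetical hook: the reader marks a slot empty after reading."""
        self.full[slot] = False


w = FrameWriter()
slots_used = [w.write(f) for f in "ABCD"]   # D overwrites C: all slots full
w.release(0)                                # reader frees slot 0
slot_e = w.write("E")                       # writer can advance again
```

When the writer is faster than the reader, the newest frame replaces the most recently written one, so the reader never sees a half-written frame and the input frame rate is reduced by dropping.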

The frame read module 306 is configured to read the synthesized video multi-picture from the memory. Its basic structure is the same as that of the frame buffer module 303, and it is likewise built from a one-hot state machine. When the encoding module reads the video multi-picture at the corresponding address through the high-speed serial bus controller 301, the frame read module 306, under the control of the state machine, selects one of its three buffered frames for reading. Only frames marked as full can be read, and a frame is marked as empty after it has been read. If the first frame has just been read and the encoding module sends another read command, the frame read module 306 checks whether the next frame is full. If it is full, a new frame has just been written and can be read, so the frame read module 306 jumps to the state of the next frame and reads its data. If it is empty, the next frame is not yet ready, so the frame read module 306 stays in the current state and reads the frame just read once more, thereby performing frame rate conversion by frame repetition.
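The read-side rule (advance if the next frame is full, otherwise repeat the current one) can be sketched in the same style. The class name `FrameReader`, the `fill` helper that stands in for the write side, and the `repeated` counter are hypothetical names chosen for this illustration.

```python
class FrameReader:
    """Software model of a three-slot frame reader that repeats frames."""

    def __init__(self, slots: int = 3):
        self.full = [False] * slots
        self.data = [None] * slots
        self.cur = None        # slot read most recently
        self.repeated = 0      # number of repeated reads

    def fill(self, slot: int, frame) -> None:
        """Hypothetical hook: the write side deposits a complete frame."""
        self.data[slot] = frame
        self.full[slot] = True

    def read(self):
        """Serve one read command from the encoder."""
        nxt = 0 if self.cur is None else (self.cur + 1) % len(self.full)
        if self.full[nxt]:
            # A new frame is ready: move on and mark it empty once read.
            self.cur = nxt
            self.full[nxt] = False
        elif self.cur is None:
            return None        # nothing has ever been written
        else:
            self.repeated += 1 # next frame not ready: repeat the last one
        return self.data[self.cur]


r = FrameReader()
first = r.read()               # no frame available yet
r.fill(0, "A")
seq = [r.read(), r.read()]     # second read repeats frame A
r.fill(1, "B")
seq.append(r.read())
```

When the encoder reads faster than frames arrive, the same frame is delivered again, which raises the output frame rate by repetition.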

An embodiment of the decoding module provided by the present invention is shown in FIG. 4. The decoding module includes:

The address determining unit 401 is configured to determine, according to the requirements of the multi-screen layout, the address corresponding to each channel of the decoded multi-channel video;

The transmitting unit 402 is configured to send the decoded multi-channel video and the determined address of each channel to the video processing FPGA through the high-speed serial bus.

The embodiment of the present invention provides a video multi-screen synthesis system. As shown in FIG. 5, the system includes: a decoding module 501 and a video processing FPGA 502, where

The decoding module 501 is configured to determine, according to the requirements of the multi-screen layout, the address corresponding to each channel of the multi-channel video it has decoded, and to send the decoded multi-channel video and the determined addresses to the video processing FPGA 502 through the high-speed serial bus;

The video processing FPGA 502 is configured to receive the multi-channel video and the corresponding addresses sent by the decoding module 501 through the high-speed serial bus, where the address of each video is determined by the decoding module 501 according to the requirements of the multi-screen layout;

The video processing FPGA 502 is further configured to scale the received multi-channel video so that the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen; to cache each scaled video and correct the address corresponding to each cached video; and to store each scaled video in the corresponding memory space according to the corrected address.

A second embodiment of the video multi-picture synthesis method provided by the present invention is described below with reference to the drawings. In this embodiment, the encoding and decoding modules use a TMS320TCI6608 digital signal processor (DSP). The TMS320TCI6608 is a multi-core fixed-point/floating-point DSP with a clock frequency of up to 1.25 GHz that can encode or decode two channels of 1080P60 video simultaneously and supports the RapidIO high-speed serial bus. The video processing FPGA is an EP4S110GXF1120, which embeds 32 serial transceivers and can implement high-speed serial protocols such as PCIe and RapidIO. In this embodiment, a RapidIO interconnect carries video data between the DSP and the video processing FPGA, and the high-speed serial bus controller is a RapidIO controller.

The RapidIO interface of the video processing FPGA supports up to 3.125 Gbps per lane in a 4x configuration, so the total bandwidth is 12.5 Gbps and, after protocol overhead is deducted, the effective bandwidth is 10 Gbps. One channel of 1080P60 video has a valid data bandwidth of about 2 Gbps, so the link is sufficient to transmit five channels of raw 1080P60 valid data. The video processing FPGA is connected to four external double data rate 3 (DDR3) synchronous dynamic random access memory devices; each DDR3 device is 16 bits wide with a capacity of 2 Gbit and a data rate of 800 Mbps, so the total memory bandwidth is 51.2 Gbps.

It is assumed that a three-picture composite in a 品-shaped layout is to be produced and that the codec video format is 1080P30.
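The bandwidth figures quoted above can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the text does not state explicitly: that the protocol overhead is the 8b/10b line coding (which turns 12.5 Gbps into 10 Gbps), and that the valid video data is 16 bits per pixel over the 1920 × 1080 active area. All constant names are hypothetical.

```python
# Serial link: 4 lanes at 3.125 Gbps each.
LANES = 4
LANE_RATE_GBPS = 3.125
raw_link_gbps = LANES * LANE_RATE_GBPS            # total line rate

# Assumption: the overhead is 8b/10b coding (8 payload bits per 10 line bits).
effective_link_gbps = raw_link_gbps * 8 / 10

# Assumption: 16 bits per active pixel, 1920x1080 pixels, 60 frames/s.
BITS_PER_PIXEL = 16
video_1080p60_gbps = 1920 * 1080 * 60 * BITS_PER_PIXEL / 1e9

# Whole channels of 1080P60 video that fit in the effective bandwidth.
channels = int(effective_link_gbps // video_1080p60_gbps)

# Memory: 4 DDR3 devices, 16 data pins each, 800 Mbps per pin.
ddr3_total_gbps = 4 * 16 * 800e6 / 1e9

print(raw_link_gbps, effective_link_gbps,
      round(video_1080p60_gbps, 3), channels, ddr3_total_gbps)
```

Under these assumptions the numbers agree with the text: 12.5 Gbps raw, 10 Gbps effective, roughly 2 Gbps per 1080P60 channel, five channels on the link, and 51.2 Gbps of memory bandwidth.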

Step 601: The video processing FPGA receives the multi-channel video and the corresponding address sent by the decoding module through the high-speed serial bus, and the address of each video is determined by the decoding module according to the requirement of the multi-screen layout;

Here, the high-speed serial bus effectively saves PCB wiring space, and its bandwidth is much larger than that of the VP interface. The high-speed serial bus interface of a mid-range video processing FPGA can achieve a transmission rate of 4 × 3.125 Gbps = 12.5 Gbps and can transmit four channels of 1080P60 valid video, whereas the VP interface can transmit only one channel of 1080P60 valid video.

Step 602: Deserialize the data sent by the decoding module through the high-speed serial bus, parse out the valid data, and convert the valid data to parallel form to obtain parallel data.

Step 603: The received multi-channel video is scaled, and the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen;

Specifically, the received multi-channel video is scaled with a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm or a polyphase interpolation algorithm, selected according to the image quality requirement. If the image quality requirement is low, the nearest-neighbor interpolation algorithm is selected; if it is high, the bilinear interpolation algorithm is selected; if it is very high, the polyphase interpolation algorithm is selected;

Here, to balance performance and complexity, the bilinear algorithm can be used. Although the bilinear algorithm produces a certain ringing effect in the image, its image quality meets the requirements of the conference television application scenario. Moreover, the bilinear algorithm needs only 4 pixels of the original image to generate one pixel of the target image, so its computational load and complexity are relatively small. In this example, the output is 1080P30 video and the multi-picture is a three-picture composite in a 品-shaped layout, so each sub-picture has half the rows and half the columns of the original image, and the scaling ratio is therefore 1/2.
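To make the "4 source pixels per target pixel" property concrete, here is a small software sketch of bilinear scaling on a single-component image stored as a list of rows. It is an illustration only: the function name `bilinear_resize` and the pixel-centre coordinate mapping are assumptions, and the patent's FPGA implementation is not described at this level of detail.

```python
def bilinear_resize(src, out_h, out_w):
    """Resize a 2-D list of sample values with bilinear interpolation."""
    in_h, in_w = len(src), len(src[0])
    out = []
    for y in range(out_h):
        # Map the centre of the target pixel back into source coordinates.
        fy = max((y + 0.5) * in_h / out_h - 0.5, 0.0)
        y0 = min(int(fy), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = max((x + 0.5) * in_w / out_w - 0.5, 0.0)
            x0 = min(int(fx), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Blend the four neighbours: first along x, then along y.
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# 4x4 test image scaled by 1/2 in both directions.
image = [[0, 2, 4, 6],
         [2, 4, 6, 8],
         [8, 10, 12, 14],
         [10, 12, 14, 16]]
half = bilinear_resize(image, 2, 2)
```

With this coordinate mapping, a scaling ratio of exactly 1/2 places every target pixel midway between four source pixels, so each output value is the average of a 2 × 2 block.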

Step 604: Cache the scaled videos, and correct the addresses corresponding to the cached videos.

Here, since the number of pixels changes after scaling, the addresses also need to be remapped. In this example the scaling ratio is 1/2, so the row and column address ranges are each only half those of the original image. The storage address of each pixel is recalculated from the size of the scaled sub-picture and its starting position in the multi-picture, so that the three sub-pictures are stored accurately at the locations of remote 1, remote 2 and remote 3 in FIG. 7.

Because the rate at which the DSP transmits video and the rate of the DDR3 memory are inconsistent, the video is cached using the ping-pong buffer method in order to improve the storage efficiency of the DDR3 memory. That is, two line buffers of random access memory (RAM) are provided inside the video processing FPGA. When the video transmitted by the decoding DSP has filled the first line RAM, writing continues in the second line RAM; at the same time a write request signal is issued to the arbitration module and, once the arbitration module grants the request, the data stored in the first line RAM is written to the DDR3 memory. When the second line RAM is full, writing switches back to the first line RAM. This ping-pong mode of operation improves the storage efficiency of the DDR3 memory, and the multi-picture synthesis is complete once all three sub-pictures have been stored at their corresponding positions.
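The address remapping described above amounts to offsetting each sub-picture pixel by the sub-picture's starting position inside the composite frame. The sketch below is an illustration under assumptions: FIG. 7 is not reproduced here, so the starting coordinates in `LAYOUT` (one sub-picture centred on top, two below) are hypothetical, as are the 2 bytes per pixel, the linear row-major frame layout and the function name `remap_address`.

```python
FRAME_W, FRAME_H = 1920, 1080      # composite multi-picture size
SUB_W, SUB_H = 960, 540            # each sub-picture after 1/2 scaling
BYTES_PER_PIXEL = 2                # assumption: 16 bits per pixel

# Hypothetical starting positions (x0, y0) of the three sub-pictures.
LAYOUT = {
    "remote1": (480, 0),           # top, centred
    "remote2": (0, 540),           # bottom left
    "remote3": (960, 540),         # bottom right
}


def remap_address(sub: str, row: int, col: int, base: int = 0) -> int:
    """Byte address in the composite frame of pixel (row, col) of `sub`."""
    if not (0 <= row < SUB_H and 0 <= col < SUB_W):
        raise ValueError("pixel lies outside the scaled sub-picture")
    x0, y0 = LAYOUT[sub]
    # Row-major addressing: whole composite lines first, then the column.
    return base + ((y0 + row) * FRAME_W + (x0 + col)) * BYTES_PER_PIXEL


first_r1 = remap_address("remote1", 0, 0)
first_r2 = remap_address("remote2", 0, 0)
last_r3 = remap_address("remote3", SUB_H - 1, SUB_W - 1)
```

Because each sub-picture writes only into its own rectangle of the composite frame, storing all three sub-pictures at their remapped addresses produces the combined picture without a separate compositing pass.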

After one frame of the three sub-pictures has been stored, the process jumps to the next frame and performs the same operation. If the encoding DSP now needs a new frame of the multi-picture for encoding, it issues a read command through the RapidIO controller; once the frame read module determines that a complete multi-picture frame has been stored, it reads that frame's data. The frame read module also uses the ping-pong buffer method internally to match the speeds of the DDR3 memory and the RapidIO controller and to improve the read efficiency of the DDR3 memory. When one line RAM has been filled with data, the data is sent to the encoding DSP through the RapidIO controller. After the data has been read, the module waits for a new read command from the DSP.
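The ping-pong line buffering used on both the write path and the read path can be modelled as two line RAMs that alternate roles. This is a sequential software sketch only; in hardware the burst to memory runs concurrently with the filling of the other RAM. The class name `PingPongLineBuffer` and the `flushed` list, which stands in for the DDR3 memory, are hypothetical.

```python
class PingPongLineBuffer:
    """Two line RAMs: one is filled while the other is drained."""

    def __init__(self, line_len: int):
        self.line_len = line_len
        self.rams = [[], []]   # the two line buffers
        self.wr = 0            # index of the RAM currently being filled
        self.flushed = []      # stands in for lines written to DDR3

    def push(self, pixel) -> None:
        """Accept one incoming pixel from the serial link."""
        self.rams[self.wr].append(pixel)
        if len(self.rams[self.wr]) == self.line_len:
            full = self.wr
            self.wr ^= 1       # incoming data switches to the other RAM
            self._burst_to_memory(full)

    def _burst_to_memory(self, idx: int) -> None:
        # In hardware this happens after the arbiter grants the write
        # request, while the other RAM continues to receive pixels.
        self.flushed.append(list(self.rams[idx]))
        self.rams[idx].clear()


buf = PingPongLineBuffer(line_len=4)
for p in range(10):
    buf.push(p)
```

Writing whole lines to memory in bursts, rather than pixel by pixel as they arrive, is what decouples the link rate from the memory rate in this scheme.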

Step 605: Select, by using a round-robin mechanism, the video to be stored in the memory space from the cached multi-channel video.

Step 606: Store the selected videos into the corresponding memory space according to the corrected address.

The above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention.

Industrial applicability

The present invention provides a video multi-picture synthesis method, apparatus and system, wherein the method comprises: a video processing field programmable gate array receives multiple channels of video and their corresponding addresses sent by a decoding module through a high-speed serial bus, where the address of each video is determined by the decoding module according to the requirements of the multi-screen layout; the received multi-channel video is scaled so that the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-screen; each scaled video is cached and the address corresponding to each cached video is corrected; and each scaled video is stored in the corresponding memory space according to the corrected address. The invention can save system resources and improve data transmission speed and image quality.

Claims

1. A video multi-picture synthesis method, the method includes:

The video processing field programmable gate array (FPGA) receives multiple channels of video and their corresponding addresses from the decoding module through a high-speed serial bus. The address of each channel of video is determined by the decoding module in accordance with the requirements of the multi-screen layout;

The received multi-channel video is scaled, and the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-picture;

Cache the scaled videos of each channel, and modify the addresses corresponding to the cached videos respectively;

Store each scaled video into the corresponding memory space according to the corrected address.

2. The method according to claim 1, wherein before scaling the received multi-channel video, the method further includes:

The data sent from the decoding module through the high-speed serial bus is deserialized, the valid data is parsed out, and the valid data is converted to parallel form to obtain parallel data.

3. The method according to claim 1, wherein the scaling of the received multi-channel videos is:

According to the requirements for image quality, select the nearest-neighbor interpolation algorithm, the bilinear interpolation algorithm or the polyphase interpolation algorithm to scale the received multi-channel video.

4. The method according to claim 1, wherein before storing each video into the corresponding memory space according to the corrected address, the method further includes:

The videos to be stored in the memory space are sequentially selected from the cached multi-channel videos through a round-robin mechanism;

Correspondingly, the above-mentioned method of storing each channel of video into the corresponding memory space is:

Save the selected videos into the corresponding memory space in sequence.

5. A video processing field programmable gate array (FPGA), the video processing FPGA includes:

The high-speed serial bus controller is configured to receive multiple channels of video and their corresponding addresses from the decoding module through the high-speed serial bus. The addresses of each channel of video are determined by the decoding module in accordance with the requirements of the multi-screen layout;

A scaling module configured to scale the multi-channel video received by the high-speed serial bus controller, and the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-picture;

The frame cache module is configured to cache each channel of video after scaling, and correct the address corresponding to each channel of cached video respectively;

The memory controller is configured to store each scaled video into the corresponding memory space according to the corrected address of the frame buffer module.

6. The video processing FPGA according to claim 5, wherein the high-speed serial bus controller is further configured to deserialize the data sent from the decoding module through the high-speed serial bus, parse out the valid data, and convert the valid data to parallel form to obtain parallel data.

7. The video processing FPGA according to claim 5, wherein the scaling module is configured to select a nearest-neighbor interpolation algorithm, a bilinear interpolation algorithm or a polyphase interpolation algorithm according to the requirements for image quality, and to scale the received multi-channel video.

8. The video processing FPGA according to claim 5, wherein the video processing FPGA further includes: an arbitration module;

The arbitration module is configured to sequentially select videos to be stored in the memory space from the multi-channel videos cached by the frame buffer module through a round-robin mechanism;

Correspondingly, the memory controller is configured to store the videos selected by the arbitration module into the corresponding memory space in sequence.

9. The video processing FPGA according to claim 8, wherein the frame buffer module is composed of a one-hot state machine, and each state corresponds to one frame of data.

10. A decoding module, the decoding module includes:

The address determination unit is configured to determine the respective addresses of the decoded multi-channel videos according to the requirements of the multi-screen layout;

The sending unit is configured to send the decoded multi-channel video and the corresponding addresses of the determined multi-channel video to the video processing field programmable gate array (FPGA) through the high-speed serial bus.

11. A video multi-picture synthesis system, the system includes: a decoding module and a video processing field programmable gate array (FPGA), where,

The decoding module is configured to determine, in accordance with the requirements of the multi-screen layout, the address corresponding to each channel of the multi-channel video it has decoded, and to send the decoded multi-channel video and the determined addresses to the video processing FPGA through the high-speed serial bus;

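The video processing FPGA is configured to receive multiple channels of video and their corresponding addresses from the decoding module through the high-speed serial bus, where the address of each channel of video is determined by the decoding module in accordance with the requirements of the multi-screen layout; to scale the received multi-channel video so that the size of each scaled video is the same as the size of the corresponding sub-picture in the multi-picture; to cache each scaled video and correct the address corresponding to each cached video; and to store each scaled video into the corresponding memory space according to the corrected address.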