CN108881927B - Video data synthesis method and device

Video data synthesis method and device

Info

Publication number
CN108881927B
CN108881927B
Authority
CN
China
Prior art keywords
video
data
image data
spliced
yuv image
Prior art date
Legal status
Active
Application number
CN201711239750.4A
Other languages
Chinese (zh)
Other versions
CN108881927A (en)
Inventor
高�浩
牛永会
亓娜
王艳辉
Current Assignee
Hainan Shilian Communication Technology Co.,Ltd.
Original Assignee
Visionvera Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Visionvera Information Technology Co Ltd filed Critical Visionvera Information Technology Co Ltd
Priority to CN201711239750.4A priority Critical patent/CN108881927B/en
Publication of CN108881927A publication Critical patent/CN108881927A/en
Application granted granted Critical
Publication of CN108881927B publication Critical patent/CN108881927B/en

Classifications

    • H04N 21/21805 Source of audio or video content, e.g. local disk arrays, enabling multiple viewpoints, e.g. using a plurality of cameras
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • H04L 65/75 Media network packet handling
    • H04N 21/231 Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/234336 Reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
    • H04N 21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/440236 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An embodiment of the invention provides a video data synthesis method for a video network comprising video networking video capture terminals and a video networking server. The method comprises the following steps: the video networking server receives multiple video data streams respectively collected by multiple video networking video capture terminals and counts the number of video data streams; the video networking server packs the multiple video data streams into video data packets, decodes the video data packets into a plurality of YUV image data, obtains the timestamp of each piece of YUV image data, and synthesizes the YUV image data having the same timestamp into a multi-grid spliced image in a preset splicing mode, each grid of the multi-grid spliced image displaying its corresponding YUV image data; the continuous multi-grid spliced images are then packaged into spliced video data that presents the multiple video data streams simultaneously. The aim of splicing multiple video networking video streams into one complete video file is thus achieved.

Description

Video data synthesis method and device
Technical Field
The present invention relates to the field of video networking technologies, and in particular, to a video data synthesis method and a video data synthesis apparatus.
Background
With the rapid development of network technology, bidirectional communication such as video conferencing and video teaching has become widely popular in users' daily life, work, and study, and plays a positive role in these applications. When users interact through video networking video or audio, multiple video devices generate multiple video data streams. If the files saved and downloaded from these devices are used at the same time, a user still cannot watch the multiple videos simultaneously; to review several videos at once, multiple separate video files must be saved, which is inconvenient for the user.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a video data composition method and a corresponding video data composition apparatus that overcome or at least partially solve the above problems.
In order to solve the above problems, an embodiment of the present invention discloses a video data synthesis method applied to a video network, wherein the video network comprises a video networking video capture terminal and a video networking server, and the method comprises:
the video networking server receives a plurality of video data streams respectively collected by a plurality of video networking video collecting terminals, and counts the number of the video data streams;
the video networking server respectively packages the multiple video data streams into video data packets by adopting a video networking protocol; the video networking protocol comprises a video networking video data decoding format and a video networking video data file transmission format;
the video networking server decodes the video data packets into a plurality of YUV image data which accord with the video networking video data decoding format and the video networking video data file transmission format, and acquires the time stamp of the YUV image data;
the video networking server synthesizes the YUV image data which have the same timestamp and correspond to the multiple channels of video stream data into a multi-grid spliced image in a preset splicing mode; each grid in the multi-grid spliced image respectively displays the corresponding YUV image data; the preset splicing mode comprises up-down splicing or left-right splicing;
packaging the continuous multi-grid spliced images to generate spliced video data which simultaneously display the multi-channel video stream data; the spliced video data conforms to preset target video data parameters.
The embodiment of the invention also discloses a video data synthesis device applied to a video network, wherein the video network comprises a video networking video capture terminal and a video networking server, and the device comprises:
the video stream receiving module is used for receiving a plurality of video data streams respectively collected by a plurality of video network video collecting terminals by the video network server and counting the number of the video data streams;
the packing module is used for the video networking server to pack the multiple video data streams into video data packets respectively by adopting a video networking protocol; the video networking protocol comprises a video networking video data decoding format and a video networking video data file transmission format;
the decoding module is used for the video networking server to decode the video data packets into a plurality of YUV image data which accord with the video networking video data decoding format and the video networking video data file transmission format, and to acquire the time stamp of the YUV image data;
the splicing module is used for the video networking server to synthesize the YUV image data which have the same timestamp and correspond to the multiple channels of video stream data into a multi-grid spliced image in a preset splicing mode; each grid in the multi-grid spliced image respectively displays the corresponding YUV image data; the preset splicing mode comprises up-down splicing or left-right splicing;
the packaging module is used for packaging the continuous multi-grid spliced images to generate spliced video data which simultaneously display the multi-path video stream data; the spliced video data conforms to preset target video data parameters.
The embodiment of the invention has the following advantages:
the embodiment of the invention applies the characteristics of video networking, the video networking server receives multiple video data streams respectively collected by multiple video networking video collection terminals and counts the number of the video data streams, the video networking server adopts a video networking protocol to respectively pack the multiple video data streams into video data packets, the video networking server decodes the multiple video data packets into multiple YUV image data which conform to a video networking video data decoding format and a video networking video data file transmission format and obtains the time stamps of the YUV image data, the video networking server synthesizes the YUV image data which have the same time stamps and correspond to the multiple video stream data into a multi-grid spliced image in a preset splicing mode, each grid in the multi-grid spliced image respectively displays the corresponding YUV image data, the preset splicing mode comprises up-down splicing or left-right splicing, and packaging the continuous multi-grid spliced images to generate spliced video data for simultaneously displaying the multi-channel video stream data, wherein the spliced video data conforms to preset target video data parameters. The problem that a plurality of video files need to be stored due to the fact that a plurality of devices simultaneously use the stored and downloaded files and cannot simultaneously play a plurality of videos is solved.
Drawings
FIG. 1 is a schematic networking diagram of a video network of the present invention;
FIG. 2 is a schematic diagram of a hardware architecture of a node server according to the present invention;
fig. 3 is a schematic diagram of a hardware structure of an access switch of the present invention;
fig. 4 is a schematic diagram of a hardware structure of an ethernet protocol conversion gateway according to the present invention;
FIG. 5 is a flow chart of the steps of an embodiment of a video data compositing method of the present invention;
FIG. 6 is an exemplary illustration of a stitched image of the present invention;
FIG. 7 is a diagram illustrating an exemplary operation of a video data composition according to the present invention;
fig. 8 is a block diagram of a video data synthesizing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Splicing multiple video images: the image contents of a plurality of videos displayed at the same time are spliced into one image, which is then synthesized into a video; that is, the synthesized video simultaneously shows the contents of the plurality of videos.
The video network is an important milestone in network development. It is a real-time network that can realize real-time transmission of high-definition video, pushing numerous internet applications toward high-definition, face-to-face video.
The video network adopts real-time high-definition video switching technology and can integrate dozens of required services, such as video, voice, pictures, text, communication, and data, on a single network platform: high-definition video conferencing, video surveillance, intelligent monitoring analysis, emergency command, digital broadcast television, time-shifted television, network teaching, live broadcast, VOD on demand, television mail, Personal Video Recorder (PVR), intranet (self-office) channels, intelligent video broadcast control, information distribution, and so on, realizing high-definition video playback through a television or a computer.
To better understand the embodiments of the present invention, the video network is described below:
some of the technologies applied in the video networking are as follows:
network Technology (Network Technology)
Network technology innovation in the video network improves on traditional Ethernet to face the potentially enormous video traffic on the network. Unlike pure network Packet Switching or network Circuit Switching, the video network uses packet switching to meet streaming requirements. The video networking technology has the flexibility, simplicity, and low price of packet switching together with the quality and security guarantees of circuit switching, realizing seamless connection of switched virtual circuits and data formats across the whole network.
Switching Technology (Switching Technology)
The video network adopts the two advantages of Ethernet, asynchrony and packet switching, and eliminates Ethernet's defects on the premise of full compatibility. It provides end-to-end seamless connection across the whole network, connects directly to user terminals, and directly carries IP data packets; user data requires no format conversion anywhere in the network. The video network is a higher-level form of Ethernet: a real-time exchange platform that can realize the network-wide, large-scale, real-time transmission of high-definition video that the existing Internet cannot, pushing numerous network video applications toward high definition and unification.
Server Technology (Server Technology)
The server technology of the video network and unified video platform differs from traditional server technology: its streaming media transmission is built on a connection-oriented basis, its data processing capability is independent of traffic and communication time, and a single network layer can contain both signaling and data transmission. For voice and video services, streaming media processing on the video network and unified video platform is much simpler than data processing, and efficiency is improved more than a hundredfold over a traditional server.
Storage Technology (Storage Technology)
The ultra-high-speed storage technology of the unified video platform adopts the most advanced real-time operating system in order to handle media content of very large capacity and very large traffic. Program information in a server instruction is mapped to specific hard disk space, so media content no longer passes through the server but is sent directly and instantly to the user terminal, with a typical user waiting time of less than 0.2 seconds. Optimized sector distribution greatly reduces the mechanical seek movement of the hard disk head; resource consumption is only 20% of that of an IP Internet system of the same grade, yet concurrent throughput 3 times that of a traditional hard disk array is generated, improving overall efficiency by more than 10 times.
Network Security Technology (Network Security Technology)
The structural design of the video network eliminates, by structure, the network security problems that trouble the Internet, through mechanisms such as independent permission control for each service and complete isolation of devices and user data. It generally needs no antivirus programs or firewalls, avoids attacks by hackers and viruses, and provides users with a structurally worry-free secure network.
Service Innovation Technology (Service Innovation Technology)
The unified video platform integrates services and transmission: whether for a single user, a private-network user, or a network aggregate, only one automatic connection is needed. User terminals, set-top boxes, or PCs connect directly to the unified video platform to obtain a rich variety of multimedia video services. The unified video platform adopts a menu-style configuration table instead of traditional, complex application programming, so complex applications can be realized with very little code, enabling unlimited new service innovation.
Networking of the video network is as follows:
the video network is a centralized control network structure, and the network can be a tree network, a star network, a ring network and the like, but on the basis of the centralized control node, the whole network is controlled by the centralized control node in the network.
As shown in fig. 1, the video network is divided into an access network and a metropolitan network.
The devices of the access network part can be mainly classified into 3 types: node server, access switch, terminal (including various set-top boxes, coding boards, memories, etc.). The node server is connected to an access switch, which may be connected to a plurality of terminals and may be connected to an ethernet network.
The node server is a node which plays a centralized control function in the access network and can control the access switch and the terminal. The node server can be directly connected with the access switch or directly connected with the terminal.
Similarly, devices of the metropolitan network portion may also be classified into 3 types: a metropolitan area server, a node switch and a node server. The metro server is connected to a node switch, which may be connected to a plurality of node servers.
The node server here is the node server of the access network part; that is, the node server belongs both to the access network part and to the metropolitan area network part.
The metropolitan area server is a node which plays a centralized control function in the metropolitan area network and can control a node switch and a node server. The metropolitan area server can be directly connected with the node switch or directly connected with the node server.
Therefore, the whole video network is a network structure with layered centralized control, and the network controlled by the node server and the metropolitan area server can be in various structures such as tree, star and ring.
The access network part can form a unified video platform (the part in the dotted circle), and a plurality of unified video platforms can form a video network; each unified video platform may be interconnected via metropolitan area and wide area video networking.
1. Video networking device classification
1.1 devices in the video network of the embodiment of the present invention can be mainly classified into 3 types: servers, switches (including ethernet gateways), terminals (including various set-top boxes, code boards, memories, etc.). The video network as a whole can be divided into a metropolitan area network (or national network, global network, etc.) and an access network.
1.2 wherein the devices of the access network part can be mainly classified into 3 types: node servers, access switches (including ethernet gateways), terminals (including various set-top boxes, code boards, memories, etc.).
The specific hardware structure of each access network device is as follows:
a node server:
As shown in fig. 2, the node server mainly includes a network interface module 201, a switching engine module 202, a CPU module 203, and a disk array module 204;
Packets from the network interface module 201, the CPU module 203, and the disk array module 204 all enter the switching engine module 202; the switching engine module 202 looks up the address table 205 for each incoming packet, thereby obtaining its direction information, and stores the packet in the queue of the corresponding packet buffer 206 based on that direction information; if the queue of the packet buffer 206 is nearly full, the packet is discarded. The switching engine module 202 polls all packet buffer queues, forwarding if the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero. The disk array module 204 mainly implements control over the hard disk, including initialization, read-write, and other operations; the CPU module 203 is mainly responsible for protocol processing with the access switches and terminals (not shown in the figure), for configuring the address table 205 (including a downlink protocol packet address table, an uplink protocol packet address table, and a data packet address table), and for configuring the disk array module 204.
The access switch:
As shown in fig. 3, the access switch mainly includes a network interface module (a downlink network interface module 301 and an uplink network interface module 302), a switching engine module 303, and a CPU module 304;
wherein, the packet (uplink data) coming from the downlink network interface module 301 enters the packet detection module 305; the packet detection module 305 detects whether the Destination Address (DA), the Source Address (SA), the packet type, and the packet length of the packet meet the requirements, and if so, allocates a corresponding stream identifier (stream-id) and passes the packet to the switching engine module 303, otherwise discards the packet; the packet (downstream data) coming from the upstream network interface module 302 enters the switching engine module 303; the data packet coming from the CPU module 304 enters the switching engine module 303; the switching engine module 303 looks up the address table 306 for each incoming packet, thereby obtaining the direction information of the packet; if the packet entering the switching engine module 303 is going from the downstream network interface to the upstream network interface, the packet is stored in the queue of the corresponding packet buffer 307 in association with its stream-id; if that queue is nearly full, the packet is discarded; if the packet entering the switching engine module 303 is not going from the downlink network interface to the uplink network interface, it is stored in the queue of the corresponding packet buffer 307 according to its direction information; if that queue is nearly full, the packet is discarded.
The switching engine module 303 polls all packet buffer queues, which in this embodiment of the present invention is divided into two cases:
if the queue is from the downlink network interface to the uplink network interface, the following conditions must be met for forwarding: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero; 3) a token generated by the code rate control module is obtained;
if the queue is not from the downlink network interface to the uplink network interface, the following conditions are met for forwarding: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero.
The code rate control module 308 is configured by the CPU module 304, and generates tokens at programmable intervals for all packet buffer queues going from downstream network interfaces to upstream network interfaces, to control the rate of upstream forwarding.
The CPU module 304 is mainly responsible for protocol processing with the node server, configuration of the address table 306, and configuration of the code rate control module 308.
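The queue polling and forwarding decision described above can be sketched as follows. This is an illustrative sketch only; all type and field names are hypothetical, since the patent specifies the forwarding conditions but not an implementation:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     to_uplink;    /* queue carries downlink-to-uplink traffic */
    uint32_t packet_count; /* queue packet counter */
    uint32_t tokens;       /* tokens granted by the code rate control module */
} PacketQueue;

typedef struct {
    bool send_buffer_full; /* state of the egress port's send buffer */
} Port;

/* Returns true when the queue may forward one packet this polling round. */
static bool may_forward(const PacketQueue *q, const Port *egress)
{
    if (egress->send_buffer_full || q->packet_count == 0)
        return false;
    /* Only downlink-to-uplink queues are rate-limited by tokens. */
    if (q->to_uplink)
        return q->tokens > 0;
    return true;
}
```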
Ethernet protocol conversion gateway
As shown in fig. 4, the apparatus mainly includes a network interface module (a downlink network interface module 401 and an uplink network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409, and a MAC deleting module 410.
Wherein, the data packet coming from the downlink network interface module 401 enters the packet detection module 405; the packet detection module 405 detects whether the Ethernet MAC DA, the Ethernet MAC SA, the Ethernet length or frame type, the video network destination address DA, the video network source address SA, the video network packet type, and the packet length of the packet meet the requirements, and if so, allocates a corresponding stream identifier (stream-id); the MAC deletion module 410 then strips the MAC DA, MAC SA, and length or frame type (2 bytes), and the packet enters the corresponding receiving buffer; otherwise, the packet is discarded;
the downlink network interface module 401 detects the sending buffer of the port, and if there is a packet, obtains the ethernet MAC DA of the corresponding terminal according to the destination address DA of the packet, adds the ethernet MAC DA of the terminal, the MACSA of the ethernet coordination gateway, and the ethernet length or frame type, and sends the packet.
The other modules in the Ethernet protocol conversion gateway function similarly to those in the access switch.
A terminal:
the system mainly comprises a network interface module, a service processing module and a CPU module; for example, the set-top box mainly comprises a network interface module, a video and audio coding and decoding engine module and a CPU module; the coding board mainly comprises a network interface module, a video and audio coding engine module and a CPU module; the memory mainly comprises a network interface module, a CPU module and a disk array module.
1.3 The devices of the metropolitan area network part can be mainly classified into 3 types: node server, node switch, and metropolitan area server. The node switch mainly comprises a network interface module, a switching engine module and a CPU module; the metropolitan area server mainly comprises a network interface module, a switching engine module and a CPU module.
2. Video networking packet definition
2.1 Access network packet definition
The data packet of the access network mainly comprises the following parts: Destination Address (DA), Source Address (SA), reserved bytes, payload (PDU), and CRC, laid out as shown in the following table:

DA | SA | Reserved | Payload | CRC
wherein:
the Destination Address (DA) is composed of 8 bytes (byte), the first byte represents the type of the data packet (such as various protocol packets, multicast data packets, unicast data packets, etc.), there are 256 possibilities at most, the second byte to the sixth byte are metropolitan area network addresses, and the seventh byte and the eighth byte are access network addresses;
the Source Address (SA) is also composed of 8 bytes (byte), defined as the same as the Destination Address (DA);
the reserved byte consists of 2 bytes;
the payload part has different lengths according to the type of datagram: it is 64 bytes if the datagram is one of the various protocol packets, and 32 + 1024 = 1056 bytes if it is a unicast packet; of course, the length is not limited to these 2 types;
the CRC consists of 4 bytes and is calculated in accordance with the standard ethernet CRC algorithm.
2.2 metropolitan area network packet definition
The topology of a metropolitan area network is a graph, and there may be 2 or even more connections between two devices; that is, there may be more than 2 connections between a node switch and a node server, or between a node switch and a node switch. However, the metro network address of each metro network device is unique, so in order to accurately describe the connection relationship between metro network devices, a parameter is introduced in the embodiment of the present invention: a label, to uniquely describe a metropolitan area network device.
In this specification, the definition of the label is similar to that of the label of MPLS (Multi-Protocol Label Switching). Assuming there are two connections between device A and device B, a packet from device A to device B has 2 possible labels, and a packet from device B to device A likewise has 2. Labels are classified into incoming labels and outgoing labels: assuming the label of a packet entering device A (the incoming label) is 0x0000, the label of the packet when it leaves device A (the outgoing label) may become 0x0001. The network access process of the metro network is under centralized control; that is, both address allocation and label allocation of the metro network are dominated by the metro server, and the node switches and node servers execute passively. This differs from label allocation in MPLS, where labels result from mutual negotiation between the switch and the server.
As shown in the following table, the data packet of the metro network mainly includes the following parts:
DA | SA | Reserved | Label | Payload | CRC
Namely Destination Address (DA), Source Address (SA), Reserved bytes, Label, payload (PDU), and CRC. The format of the label may be defined as follows: the label is 32 bits, with the upper 16 bits reserved and only the lower 16 bits used, and its position is between the reserved bytes and the payload of the packet.
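A corresponding sketch of the metro-network packet header, again with illustrative names; it differs from the access-network layout only by the label field:

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  da[8];       /* Destination Address */
    uint8_t  sa[8];       /* Source Address */
    uint8_t  reserved[2]; /* 2 reserved bytes */
    uint32_t label;       /* upper 16 bits reserved, lower 16 bits in use;
                             sits between the reserved bytes and the payload */
    /* payload and 4-byte CRC follow */
} MetroNetHeader;
#pragma pack(pop)

/* Example: a packet may enter device A with incoming label 0x0000 and
 * leave with outgoing label 0x0001, both assigned by the metro server. */
static inline uint16_t label_value(uint32_t label)
{
    return (uint16_t)(label & 0xFFFF);
}
```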
Based on the characteristics of the video network, one of the core concepts of the embodiment of the invention is proposed. Following the video networking protocol, when the multimedia data stream of a video networking multimedia terminal is received, it is packaged into data packets containing a plurality of multimedia data frames according to the video networking protocol, where the video networking protocol comprises a video networking multimedia data decoding format and a video networking multimedia data file transmission format; the decoding time stamp (DTS) and the display time stamp (PTS) in each data packet are extracted using the video networking multimedia data decoding format; and the data packets are packaged into a multimedia file in the video networking transmission format according to the DTS and PTS, thereby realizing the purpose of utilizing the video networking protocol.
Referring to fig. 5, a flowchart illustrating the steps of an embodiment of a video data synthesis method according to the present invention is shown. The method may be applied to a video network comprising a video networking video capture terminal and a video networking server, and may specifically include the following steps:
step 501, the video networking server receives multiple video data streams respectively collected by multiple video networking video collecting terminals, and counts the number of the video data streams.
In the embodiment of the present invention, the terminal, one of the main devices of the video network described above, may include not only various set-top boxes, encoding boards, and memories, but also any multimedia terminal following the video networking transmission protocol, for example a video capture device that captures video data while connected to the video network. The server in the video network responsible for processing multimedia data therefore receives the video data collected by multiple devices, and the number of devices usually corresponds to the number of video data streams; that is, when several video capture devices are turned on simultaneously, the video networking server receives that many video data streams.
Step 502, the video networking server packs the multiple video data streams into video data packets respectively by adopting a video networking protocol; the video networking protocol comprises a video networking video data decoding format and a video networking video data file transmission format.
In the embodiment of the invention, the received video data is packed into a plurality of video packets containing video frame information according to the video networking real-time transmission protocol; the video frame information of each video packet comprises file offset information, file size information, decoding time stamp (DTS), and presentation time stamp (PTS) information.
Further, information identifying the type and length of the media file is added to each video data packet to form the header of each video packet; according to the video networking transmission protocol, the data packet of the access network further comprises the following parts: Destination Address (DA), Source Address (SA), reserved bytes, payload (PDU), CRC.
When the video data packet processed as above is transmitted, it is identified and received by the video networking node server.
Step 503, the video network server decodes the video data packets into a plurality of YUV image data conforming to the video network video data decoding format and the video network video data file transmission format, and obtains a timestamp of the YUV image data.
In the embodiment of the present invention, the sampled video data stream is usually decoded by a preset decoder into original image data in a preset format, typically YUV image data. YUV image data decoded from a video data stream retains the stream's original information, for example its resolution, encoding format, timestamp, and sampling frequency; video data sampled by a video networking video capture device conforms to the video networking video data decoding format and the video networking video data file transmission format. The video networking server then obtains the timestamp of each piece of decoded YUV image data from the original information carried in the video data stream.
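As a hedged illustration of this step, the following sketch uses FFmpeg's libavcodec as the preset decoder; the patent names no decoder, so that choice and the callback shape are assumptions. Each decoded AVFrame carries the YUV planes together with the original stream information such as the timestamp:

```c
#include <libavcodec/avcodec.h>

/* Feed one video packet to the decoder and hand every resulting YUV
 * frame to a callback. frame->data[0..2] hold the Y, U, and V planes,
 * and frame->pts carries the timestamp recovered from the stream's
 * original information. Error handling is reduced to the minimum. */
static int decode_packet(AVCodecContext *dec, const AVPacket *pkt,
                         void (*on_frame)(const AVFrame *))
{
    if (avcodec_send_packet(dec, pkt) < 0)
        return -1;

    AVFrame *frame = av_frame_alloc();
    while (avcodec_receive_frame(dec, frame) == 0) {
        on_frame(frame);
        av_frame_unref(frame);
    }
    av_frame_free(&frame);
    return 0;
}
```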
Step 504, the video networking server synthesizes the YUV image data with the same timestamp and corresponding to the multiple paths of video stream data into a multi-grid spliced image in a preset splicing mode; each grid in the multi-grid spliced image respectively displays the corresponding YUV image data; the preset splicing mode comprises up-down splicing or left-right splicing.
In the embodiment of the invention, once the video networking server has decoded the received multiple video data streams into YUV image data and acquired the timestamps, the YUV image data of the multiple video streams that share the same timestamp are spliced into one integral image according to the preset splicing mode. For example, the video networking server receives four video data streams and decodes them into corresponding YUV image data; if YUV images 1, 2, 3, and 4 correspond to the four video data streams and have the same timestamp, images 1, 2, 3, and 4 are spliced up-down or left-right into one complete image, which may have four grids, respectively displaying images 1, 2, 3, and 4. Of course, the complete image may also be set to 6 grids; in that case, when only four video data streams are received, the remaining two grids show preset image data, such as a company logo image.
Step 505, packaging the continuous multi-grid spliced images to generate spliced video data which simultaneously shows the multi-channel video stream data; the spliced video data conforms to preset target video data parameters.
In the embodiment of the invention, the spliced continuous YUV images are finally encoded into an H.264 stream in timestamp order and packaged into a video file of a preset format, such as an FLV video file. That the spliced video data conforms to the preset target video data parameters means that if the preset spliced image size is 1280 × 720, the image obtained by splicing the YUV images decoded from the multiple video data streams is still 1280 × 720.
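A minimal sketch of this step under the assumption that FFmpeg's libavcodec/libavformat perform the H.264 encoding and FLV packaging; the patent names no library, and the fixed 25 fps timebase and omitted error handling are simplifications:

```c
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

/* Encode n stitched YUV420 frames (with pts already set in 1/25 s
 * units, i.e. in timestamp order) to H.264 and mux them into FLV. */
static void write_flv(const char *path, AVFrame **frames, int n, int w, int h)
{
    AVFormatContext *fmt = NULL;
    avformat_alloc_output_context2(&fmt, NULL, "flv", path);

    const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVCodecContext *enc = avcodec_alloc_context3(codec);
    enc->width = w;                    /* preset target size, e.g. 1280x720 */
    enc->height = h;
    enc->pix_fmt = AV_PIX_FMT_YUV420P; /* the stitched YUV canvas */
    enc->time_base = (AVRational){1, 25};
    if (fmt->oformat->flags & AVFMT_GLOBALHEADER)
        enc->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    avcodec_open2(enc, codec, NULL);

    AVStream *st = avformat_new_stream(fmt, NULL);
    avcodec_parameters_from_context(st->codecpar, enc);
    st->time_base = enc->time_base;

    avio_open(&fmt->pb, path, AVIO_FLAG_WRITE);
    avformat_write_header(fmt, NULL);

    AVPacket *pkt = av_packet_alloc();
    for (int i = 0; i <= n; i++) {
        /* A NULL frame after the last real frame flushes the encoder. */
        avcodec_send_frame(enc, i < n ? frames[i] : NULL);
        while (avcodec_receive_packet(enc, pkt) == 0) {
            av_packet_rescale_ts(pkt, enc->time_base, st->time_base);
            pkt->stream_index = st->index;
            av_interleaved_write_frame(fmt, pkt);
        }
    }
    av_write_trailer(fmt);

    av_packet_free(&pkt);
    avcodec_free_context(&enc);
    avio_closep(&fmt->pb);
    avformat_free_context(fmt);
}
```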
Preferably, before step 504, the method further comprises: substeps A11-A13;
substep A11, calculating the size of the spliced image by the video network server according to the ratio between the preset target video data parameter and the number of paths of the video stream;
in the embodiment of the present invention, if the size of the target video data image is 1280 × 720, the spliced three-way video data stream should be decoded into YUV image data according to the standard of 1280/3 × 720/3. The target size of each YUV image data is 1280/3 × 240, which is the size of the stitched image.
And a substep A12, wherein the video networking server judges whether the YUV image data corresponding to the multiple video data streams conform to the size of a spliced image.
In the embodiment of the invention, after the size of the spliced image is calculated, the video network server can obtain the original size of the YUV image data decoded corresponding to each video stream by reading the original data of each video stream, and can judge whether the size of each YUV image accords with the size of the spliced image.
Sub-step a13, if the YUV image data corresponding to the multiple video data streams conform to the size of the stitched image, step 504 is performed.
In the embodiment of the present invention, if the YUV image data conforms to the spliced image size, step 504 is performed. For example, the packets of the multiple video streams are decoded into YUV images, and relevant parameters including image width, height, and image format are read and compared with the target parameters. If the current image to be merged is 1280 × 720 and there are currently two video channels placed left and right, the target size per channel is 640 × 360; if there are three channels arranged side by side, the target size per channel is 1280/3 × 240. This preserves the aspect ratio and avoids compressing or stretching the video image. If the image parameters meet the target parameters, step 504 is performed directly.
Preferably, the method further comprises the following steps:
and a substep A14, if the YUV image data corresponding to any one of the video streams does not conform to the size of the spliced image, compressing the YUV image data by the video networking server, and entering the step that the video networking server judges whether the YUV image data corresponding to the multiple video data streams conform to the size of the spliced image.
In the embodiment of the invention, if the YUV image data corresponding to any video data stream does not conform to the spliced image size, the resolution of the YUV image is converted to conform to it, and sub-step A12 is executed again to judge whether the converted YUV image now conforms. For example, when the image size of a received video data stream is 1280 × 720 and the preset spliced image size is 640 × 360, the image must be downscaled with a suitable scaling algorithm.
Preferably, the sub-step A12 specifically includes: sub-steps a1-a2;
and a sub-step a1 of obtaining original size information of the YUV image data.
Sub-step a2, if the original size information of the YUV image data matches the size of the spliced image, the YUV image data conforms to the size of the spliced image; otherwise, the YUV image data does not conform to the size of the spliced image.
In the embodiment of the invention, a packet of each video stream is decoded into YUV images, and relevant parameters including image width, height, and image format are read and compared with the calculated spliced image size (if the current image to be merged is 1280 × 720 and the current two video channels are placed left and right, the spliced image size is 640 × 360). If the original YUV image size is 640 × 360, the YUV image data conforms to the spliced image size; otherwise it does not.
Preferably, the sub-step A14 specifically includes: sub-step b1;
and a substep b1, if the YUV image data corresponding to any one of the video streams does not conform to the size of the spliced image, the video network server compresses the resolution of the YUV image according to the size of the spliced image.
In the embodiment of the invention, if the YUV image data corresponding to any path of video data stream does not conform to the spliced image size, the resolution of the YUV image is converted to conform to it. For example, when the image size of the received video data stream is 1280 × 720 and the preset spliced image size is 640 × 360, the image must be downscaled to 640 × 360 with a suitable scaling algorithm.
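A sketch of the resolution conversion in sub-step b1, using nearest-neighbour sampling on a planar YUV420 image. The patent leaves the scaling algorithm open, so this choice, like the function names, is only illustrative; a production system might instead use a filtered scaler such as libswscale:

```c
#include <stdint.h>

static void scale_plane(const uint8_t *src, int sw, int sh,
                        uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++)
            dst[y * dw + x] = src[(y * sh / dh) * sw + (x * sw / dw)];
}

/* YUV420 planar: the U and V planes have half the luma width and height. */
static void scale_yuv420(const uint8_t *src, int sw, int sh,
                         uint8_t *dst, int dw, int dh)
{
    const uint8_t *su = src + sw * sh, *sv = su + (sw / 2) * (sh / 2);
    uint8_t *du = dst + dw * dh, *dv = du + (dw / 2) * (dh / 2);

    scale_plane(src, sw, sh, dst, dw, dh);               /* Y plane */
    scale_plane(su, sw / 2, sh / 2, du, dw / 2, dh / 2); /* U plane */
    scale_plane(sv, sw / 2, sh / 2, dv, dw / 2, dh / 2); /* V plane */
}
```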
Preferably, step 504 specifically includes: substeps B11-B12;
sub-step B11, the video network server obtains a display time stamp PTS of each of the YUV image data.
In the embodiment of the present invention, since the multimedia data collected by a video networking multimedia device is generated according to the video networking transmission protocol, the information of each video frame carries its decoding time stamp (DTS) and display time stamp (PTS), which are available as soon as the video networking data frame is received.
In practical application, after the display time stamps (PTS) of a video data stream are obtained, the decoding order of the video packets can be derived from the PTS, and the PTS differences between the YUV images of the simultaneously received video streams can be computed pairwise in that order. For example, if the YUV images corresponding to four video streams are 1, 2, 3, and 4, the time stamp difference between images 1 and 2 might be 0.2 seconds, between images 2 and 3, 0.3 seconds, and between images 3 and 4, 0.3 seconds.
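The pairwise PTS comparison can be sketched as follows, assuming PTS values expressed in seconds and a caller-supplied threshold (0.5 s in the running example); the names are illustrative:

```c
#include <math.h>
#include <stdbool.h>

/* Returns true when all adjacent PTS gaps stay within the threshold,
 * i.e. the n images can be treated as sharing the same timestamp. */
static bool same_moment(const double *pts, int n, double threshold)
{
    for (int i = 1; i < n; i++)
        if (fabs(pts[i] - pts[i - 1]) > threshold)
            return false; /* this image must be dropped or replaced */
    return true;
}

/* Example from the text: gaps of 0.2 s, 0.3 s, and 0.3 s all stay under
 * the 0.5 s threshold, so the four images are stitched together. */
```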
Sub-step B12, if the difference between the display time stamps PTS of the YUV image data does not exceed the preset threshold, the video networking server synthesizes the YUV image data corresponding to the multi-channel video stream data into a multi-grid spliced image in a preset splicing mode; each grid in the multi-grid spliced image shows one piece of the YUV image data.
In the embodiment of the present invention, if the preset threshold for the time stamp difference is 0.5 seconds and the differences between YUV images 1, 2, 3, and 4 do not exceed it, the four YUV images are considered to have the same timestamp within the same time period, so the four YUV images are spliced in the preset splicing mode. For example, a blank YUV image of the preset parameter size is prepared, and then the data of each small YUV image of grid size is copied into it pixel by pixel, forming one large YUV image.
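A sketch of the pixel-copy stitching just described: each conforming YUV420 sub-image is copied row by row into its grid position on the blank canvas. Function and parameter names are assumptions for illustration:

```c
#include <stdint.h>
#include <string.h>

static void blit_plane(uint8_t *canvas, int cw, const uint8_t *cell,
                       int w, int h, int ox, int oy)
{
    for (int y = 0; y < h; y++)
        memcpy(canvas + (oy + y) * cw + ox, cell + y * w, w);
}

/* Copies one YUV420 cell of size w x h to offset (ox, oy) on a YUV420
 * canvas of size cw x ch; offsets must be even for 4:2:0 chroma. */
static void blit_yuv420(uint8_t *canvas, int cw, int ch,
                        const uint8_t *cell, int w, int h, int ox, int oy)
{
    uint8_t *cu = canvas + cw * ch, *cv = cu + (cw / 2) * (ch / 2);
    const uint8_t *u = cell + w * h, *v = u + (w / 2) * (h / 2);

    blit_plane(canvas, cw, cell, w, h, ox, oy);              /* Y plane */
    blit_plane(cu, cw / 2, u, w / 2, h / 2, ox / 2, oy / 2); /* U plane */
    blit_plane(cv, cw / 2, v, w / 2, h / 2, ox / 2, oy / 2); /* V plane */
}
```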
Preferably, the method further comprises the following steps:
sub-step B13, if the difference between the display timestamps PTS of the YUV image data exceeds a preset threshold, the video network server discards the YUV image data and replaces the discarded YUV image data with preset image data.
In the embodiment of the invention, if the PTS difference of one or more video data streams is large and exceeds the preset threshold, the PTS is revised according to the configuration, or the current packet is discarded, or the last frame of that video stream is used instead, or a fixed background, i.e., preset image data, is used instead. If one or more video data streams disconnect, the data of the other streams are queued to wait; if no data arrive within the 2-second wait, the lost stream's data are replaced with a specific background image, and if the stream reconnects, its data are used again.
Sub-step B14, synthesizing the remaining YUV image data and the preset image data into a multi-grid spliced image in a preset splicing mode; each grid in the multi-grid spliced image displays the corresponding YUV image data or the preset image data.
In the embodiment of the invention, specifically, after YUV images with qualified parameters are obtained, they must be spliced; that is, several small YUV images are stitched into one large YUV image with the target parameters. A large blank YUV image with the target parameters is prepared first, and then each piece of data is copied from the small YUV images pixel by pixel, forming a large multi-grid YUV image. The stitched image example shown in fig. 6 is a four-grid stitched image, indicating four video data streams, each grid showing the YUV image of one video data stream; three of the streams there have a disconnection problem or their YUV images were discarded, so those three grids are replaced by a fixed background, i.e., they show the corresponding preset image data. The stitching mode may be up-down or left-right.
Specifically, the flowchart of fig. 7 describes how, after the video networking server receives the video data streams, each stream is recognized and decoded into YUV image data, the corresponding YUV images are spliced into multi-grid stitched images, and the continuous stitched images are packaged to generate stitched video data displaying the multiple video streams in multiple grids. This is also a general description of steps 501 to 505.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 8, a block diagram of a video data synthesis apparatus according to an embodiment of the present invention is shown, where the apparatus may be applied to a video network, where the video network includes a video capture terminal and a video server, and the apparatus may specifically include the following modules:
the video stream receiving module 601 is used for the video networking server to receive multiple video data streams respectively collected by a plurality of video networking video collecting terminals and count the number of the video data streams;
a packing module 602, configured to pack, by the video networking server, the multiple video data streams into video data packets respectively by using a video networking protocol; the video networking protocol comprises a video networking video data decoding format and a video networking video data file transmission format;
a decoding module 603, configured to decode, by the video networking server, the multiple video data packets into multiple YUV image data conforming to the video networking video data decoding format and video networking video data file transmission format, and obtain a timestamp of the YUV image data;
a splicing module 604, configured for the video networking server to synthesize the YUV image data of the multiple paths of video stream data having the same timestamp into a multi-grid spliced image in a preset splicing manner; each grid in the multi-grid spliced image respectively displays the corresponding YUV image data; the preset splicing mode comprises up-down splicing or left-right splicing;
a packaging module 605, configured to package the continuous multi-grid stitched images to generate stitched video data that simultaneously shows the multiple paths of video stream data; the spliced video data conforms to preset target video data parameters.
Preferably, the method further comprises the following steps:
the spliced image size calculation module is used for calculating the size of a spliced image by the video network server according to the ratio of the preset target video data parameter to the number of paths of the video stream;
the judgment module is used for judging whether the YUV image data corresponding to the multiple video data streams conform to the size of a spliced image or not by the video networking server;
and the standard-conforming processing module is used for executing the splicing module if the YUV image data corresponding to the plurality of video data streams conforms to the size of the spliced image.
And the non-standard processing module is used for compressing the YUV image data by the video network server and executing the judging module if the YUV image data corresponding to any one path of the video stream does not conform to the size of the spliced image.
Preferably, the judgment module includes:
an original size information obtaining submodule, configured to obtain original size information of the YUV image data;
and the judging submodule is used for judging that the YUV image data accords with the size of the spliced image if the original size information of the YUV image data is matched with the size of the spliced image, or else, judging that the YUV image data does not accord with the size of the spliced image.
Preferably, the non-compliant processing module includes:
and the compression submodule is used for compressing the resolution of the YUV image by the video networking server according to the size of the spliced image if the YUV image data corresponding to any one path of the video stream does not accord with the size of the spliced image.
Preferably, the splicing module 604 includes:
a timestamp obtaining submodule, configured for the video networking server to acquire the display timestamp PTS of each piece of YUV image data;
a splicing submodule, configured for the video networking server to synthesize the YUV image data of the corresponding multiple paths of video stream data into a multi-grid spliced image in the preset splicing manner if the differences between the display timestamps PTS of the YUV image data do not exceed a preset threshold; each grid in the multi-grid spliced image shows one piece of the YUV image data.
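The threshold test applied by the splicing submodule could look like the sketch below; the 40 ms default is an assumed value (one frame period at 25 fps), and the real PTS units depend on the stream's time base, neither of which the patent pins down:

```python
from itertools import combinations

def same_timestamp(pts_values, threshold=40):
    """True when every pairwise PTS difference stays within the threshold.

    pts_values holds one display timestamp per path, taken in the decoding
    order of the corresponding video packets.
    """
    return all(abs(a - b) <= threshold for a, b in combinations(pts_values, 2))
```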
Preferably, the apparatus further includes:
a replacement submodule, configured for the video networking server to discard the YUV image data whose display timestamp PTS difference exceeds the preset threshold, and to replace the discarded YUV image data with preset image data;
a mixed splicing submodule, configured to synthesize the remaining YUV image data and the preset image data into a multi-grid spliced image in the preset splicing manner; each grid in the multi-grid spliced image displays the corresponding YUV image data or the preset image data.
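The preset image data used for replacement is left open by the patent; the sketch below assumes a neutral grey YUV420 frame:

```python
import numpy as np

def preset_frame(w, h):
    """A flat grey YUV420 frame that fills the grid of a discarded path."""
    y = np.full((h, w), 128, dtype=np.uint8)            # mid-grey luma
    u = np.full((h // 2, w // 2), 128, dtype=np.uint8)  # neutral chroma
    v = np.full((h // 2, w // 2), 128, dtype=np.uint8)
    return y, u, v
```

Passing preset_frame(...) to the stitch() sketch above in place of the discarded path keeps the grid layout stable, so the positions of the remaining streams never shift.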
In the embodiment of the present invention, in a video networking environment, the video networking server receives multiple paths of video data streams respectively collected by a plurality of video networking video capture terminals and counts the number of paths of the video data streams; the video networking server packs the multiple video data streams into video data packets respectively by using a video networking protocol, where the video networking protocol includes a video networking video data decoding format and a video networking video data file transmission format; the video networking server decodes the video data packets into multiple pieces of YUV image data conforming to the video networking video data decoding format and the video networking video data file transmission format, and obtains the timestamps of the YUV image data; the video networking server synthesizes the YUV image data that have the same timestamp and correspond to the multiple paths of video stream data into a multi-grid spliced image in a preset splicing manner, where each grid in the multi-grid spliced image displays the corresponding YUV image data, and the preset splicing manner includes top-bottom splicing or left-right splicing; and the continuous multi-grid spliced images are packaged to generate spliced video data that shows the multiple paths of video stream data simultaneously, the spliced video data conforming to the preset target video data parameters. The method and the apparatus thus synthesize multiple paths of video networking video data into one complete video file, solving the problem that several separate files would otherwise have to be stored and downloaded because a single stored or downloaded file cannot play multiple videos at once, and greatly improving the convenience of viewing the multiple paths of video stream data at the same time.
As the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The video data synthesis method and apparatus provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific implementation and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A video data synthesis method, applied to a video network, wherein the video network comprises a video networking video acquisition terminal and a video networking server, and the method comprises the following steps:
the video networking server receives a plurality of video data streams respectively collected by a plurality of video networking video collecting terminals, and counts the number of the video data streams;
the video networking server respectively packages the multiple video data streams into video data packets by adopting a video networking protocol; the video networking protocol comprises a video networking video data decoding format and a video networking video data file transmission format; the video data packet is a data packet containing video frame information, and the video frame information comprises file offset information, file size information, a decoding timestamp and a display timestamp PTS;
the video networking server decodes the video data packets into a plurality of YUV image data which accord with the video networking video data decoding format and the video networking video data file transmission format, and acquires the time stamp of the YUV image data;
the video networking server synthesizes the YUV image data that have the same timestamp and correspond to the multiple paths of video stream data into a multi-grid spliced image in a preset splicing manner; each grid in the multi-grid spliced image respectively displays the corresponding YUV image data; the preset splicing manner comprises top-bottom splicing or left-right splicing; the same timestamp means that the differences between the display timestamps PTS do not exceed a preset threshold; each difference is a PTS time difference calculated pairwise between the YUV image data corresponding to the multiple paths of video streams received simultaneously, in the decoding time order of the video packets obtained according to the PTS;
packaging the continuous multi-grid spliced images to generate spliced video data which simultaneously display the multi-channel video stream data; the spliced video data conforms to preset target video data parameters.
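Claim 1 attaches per-frame bookkeeping to each video data packet. As an editorial sketch only, that record could be modelled as the data structure below; the field names are illustrative, since the claim lists the information but defines no wire format:

```python
from dataclasses import dataclass

@dataclass
class VideoFrameInfo:
    file_offset: int  # byte offset of the encoded frame within the file
    file_size: int    # size of the encoded frame in bytes
    dts: int          # decoding timestamp
    pts: int          # display timestamp (PTS)
```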
2. The method according to claim 1, wherein before the step of synthesizing the YUV image data of the corresponding multiple video stream data with the same time stamp into a multi-grid stitched image by the video network server in a preset stitching manner, the method further comprises:
the video network server calculates the size of a spliced image according to the ratio of preset target video data parameters to the number of paths of the video stream; the preset target video data parameter is the size of a target video data image;
the video networking server judges whether the YUV image data corresponding to the multiple video data streams conform to the size of a spliced image;
and if the YUV image data corresponding to the multiple video data streams conform to the size of the spliced image, synthesizing the YUV image data corresponding to the multiple video data streams with the same timestamp into a multi-grid spliced image in a preset splicing mode by the video networking server.
3. The method of claim 2, further comprising:
if the YUV image data corresponding to any path of the video streams does not conform to the spliced image size, the video networking server compresses the YUV image data, and the step of the video networking server judging whether the YUV image data corresponding to the multiple paths of video data streams conform to the spliced image size is performed again.
4. The method according to claim 2, wherein the step of the video networking server determining whether the YUV image data corresponding to the multiple video data streams conform to the size of the stitched image comprises:
acquiring original size information of the YUV image data;
and if the original size information of the YUV image data is matched with the size of the spliced image, the YUV image data conforms to the size of the spliced image, otherwise, the YUV image data does not conform to the size of the spliced image.
5. The method of claim 3, wherein if the YUV image data corresponding to any of the video streams does not conform to the stitched image size, the step of compressing the YUV image data by the video networking server comprises:
and if the YUV image data corresponding to any one path of the video stream does not accord with the size of the spliced image, the video network server compresses the resolution of the YUV image according to the size of the spliced image.
6. The method according to claim 1, wherein the step of the video networking server synthesizing the YUV image data of the corresponding multiple paths of video stream data with the same timestamp into a multi-grid spliced image in a preset splicing manner comprises:
the video network server acquires a display time stamp PTS of each YUV image data;
if the differences between the display timestamps PTS of the YUV image data do not exceed a preset threshold, the video networking server synthesizes the YUV image data corresponding to the multiple paths of video stream data into a multi-grid spliced image in a preset splicing manner; the multi-grid spliced image is used for displaying the YUV image data.
7. The method of claim 6, further comprising:
if the difference between the display timestamps PTS of the YUV image data exceeds the preset threshold, the video networking server discards the YUV image data and replaces the discarded YUV image data with preset image data;
the remaining YUV image data and the preset image data are synthesized into a multi-grid spliced image in the preset splicing manner; and each grid in the multi-grid spliced image displays the corresponding YUV image data or the preset image data.
8. A video data synthesis apparatus, wherein the apparatus is applied to a video network, the video network comprises a video networking video acquisition terminal and a video networking server, and the apparatus comprises:
a video stream receiving module, configured for the video networking server to receive multiple paths of video data streams respectively collected by a plurality of video networking video collecting terminals and to count the number of paths of the video data streams;
a packing module, configured for the video networking server to pack the multiple video data streams into video data packets respectively by using a video networking protocol; the video networking protocol comprises a video networking video data decoding format and a video networking video data file transmission format; the video data packet is a data packet containing video frame information, and the video frame information comprises file offset information, file size information, a decoding timestamp and a display timestamp PTS;
a decoding module, configured for the video networking server to decode the multiple video data packets into a plurality of YUV image data conforming to the video networking video data decoding format and the video networking video data file transmission format, and to acquire the timestamps of the YUV image data;
a splicing module, configured for the video networking server to synthesize the YUV image data that have the same timestamp and correspond to the multiple paths of video stream data into a multi-grid spliced image in a preset splicing manner; each grid in the multi-grid spliced image respectively displays the corresponding YUV image data; the preset splicing manner comprises top-bottom splicing or left-right splicing; the same timestamp means that the differences between the display timestamps PTS do not exceed a preset threshold; each difference is a PTS time difference calculated pairwise between the YUV image data corresponding to the multiple paths of video streams received simultaneously, in the decoding time order of the video packets obtained according to the PTS;
a packaging module, configured to package the continuous multi-grid spliced images to generate spliced video data that shows the multiple paths of video stream data simultaneously; the spliced video data conforms to preset target video data parameters.
9. The apparatus of claim 8, further comprising:
a spliced image size calculation module, configured for the video networking server to calculate the spliced image size according to the ratio of the preset target video data parameter to the number of paths of the video streams; the preset target video data parameter is the size of the target video data image;
a judgment module, configured for the video networking server to judge whether the YUV image data corresponding to the multiple video data streams conform to the spliced image size;
a compliant processing module, configured to call the splicing module if the YUV image data corresponding to the multiple video data streams conform to the spliced image size.
10. The apparatus of claim 9, further comprising:
a non-compliant processing module, configured for the video networking server to compress the YUV image data and to call the judgment module if the YUV image data corresponding to any path of the video streams does not conform to the spliced image size.
11. The apparatus of claim 9, wherein the determining module comprises:
an original size information obtaining submodule, configured to obtain original size information of the YUV image data;
a judging submodule, configured to determine that the YUV image data conforms to the spliced image size if the original size information of the YUV image data matches the spliced image size, and that it does not conform otherwise.
12. The apparatus of claim 10, wherein the non-compliant processing module comprises:
a compression submodule, configured for the video networking server to compress the resolution of the YUV image according to the spliced image size if the YUV image data corresponding to any path of the video streams does not conform to the spliced image size.
13. The apparatus of claim 8, wherein the splicing module comprises:
a timestamp obtaining submodule, configured for the video networking server to acquire the display timestamp PTS of each piece of YUV image data;
a splicing submodule, configured for the video networking server to synthesize the YUV image data of the corresponding multiple paths of video stream data into a multi-grid spliced image in a preset splicing manner if the differences between the display timestamps PTS of the YUV image data do not exceed a preset threshold; the multi-grid spliced image is used for displaying the YUV image data.
14. The apparatus of claim 8, further comprising:
a replacement submodule, configured for the video networking server to discard the YUV image data if the difference between the display timestamps PTS of the YUV image data exceeds the preset threshold, and to replace the discarded YUV image data with preset image data;
a mixed splicing submodule, configured to synthesize the remaining YUV image data and the preset image data into a multi-grid spliced image in the preset splicing manner; each grid in the multi-grid spliced image displays the corresponding YUV image data or the preset image data.
CN201711239750.4A 2017-11-30 2017-11-30 Video data synthesis method and device Active CN108881927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711239750.4A CN108881927B (en) 2017-11-30 2017-11-30 Video data synthesis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711239750.4A CN108881927B (en) 2017-11-30 2017-11-30 Video data synthesis method and device

Publications (2)

Publication Number Publication Date
CN108881927A CN108881927A (en) 2018-11-23
CN108881927B true CN108881927B (en) 2020-06-26

Family

ID=64325888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711239750.4A Active CN108881927B (en) 2017-11-30 2017-11-30 Video data synthesis method and device

Country Status (1)

Country Link
CN (1) CN108881927B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111263208B (en) * 2018-11-30 2022-11-04 杭州海康威视数字技术股份有限公司 Picture synthesis method and device, electronic equipment and storage medium
CN109729373B (en) * 2018-12-27 2020-12-08 广州华多网络科技有限公司 Streaming media data mixing method and device, storage medium and computer equipment
CN111124333A (en) * 2019-12-05 2020-05-08 视联动力信息技术股份有限公司 Method, device, equipment and storage medium for synchronizing display contents of electronic whiteboard
CN111107299A (en) * 2019-12-05 2020-05-05 视联动力信息技术股份有限公司 Method and device for synthesizing multi-channel video
CN111246125B (en) * 2020-01-17 2022-11-01 广州盈可视电子科技有限公司 Multi-channel video stream synthesis method and device
CN113905171B (en) * 2020-07-06 2024-04-26 瑞昱半导体股份有限公司 Multi-path image processing device and method
CN112584084B (en) * 2020-12-08 2023-09-05 广州虎牙科技有限公司 Video playing method and device, computer equipment and storage medium
CN113315940A (en) * 2021-03-23 2021-08-27 海南视联通信技术有限公司 Video call method, device and computer readable storage medium
CN113949827B (en) * 2021-09-30 2023-04-07 安徽尚趣玩网络科技有限公司 Video content fusion method and device
CN114257844B (en) * 2021-12-21 2023-01-06 威创集团股份有限公司 Multi-video synchronous playing method, device, equipment and readable storage medium
CN114302169B (en) * 2021-12-24 2023-03-07 威创集团股份有限公司 Picture synchronous recording method, device, system and computer storage medium
CN114882612A (en) * 2022-05-27 2022-08-09 六安智梭无人车科技有限公司 Unmanned vehicle monitoring system and method
CN115334235B (en) * 2022-07-01 2024-06-04 西安诺瓦星云科技股份有限公司 Video processing method, device, terminal equipment and storage medium
CN116112620A (en) * 2023-01-17 2023-05-12 山东鲁软数字科技有限公司 Processing method and system for improving video stream multipath merging stability

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101668160A (en) * 2009-09-10 2010-03-10 深圳华为通信技术有限公司 Video image data processing method, device, video conference system and terminal
CN104702909A (en) * 2014-04-17 2015-06-10 杭州海康威视数字技术股份有限公司 Video data processing method and video data processing device
CN105430424A (en) * 2015-11-26 2016-03-23 广州华多网络科技有限公司 Video live broadcast method, device and system
CN105430537A (en) * 2015-11-27 2016-03-23 刘军 Method and server for synthesis of multiple paths of data, and music teaching system
CN106060582A (en) * 2016-05-24 2016-10-26 广州华多网络科技有限公司 Video transmission system, video transmission method and video transmission apparatus
CN106210764A (en) * 2015-04-29 2016-12-07 北京视联动力国际信息技术有限公司 The conjunction screen processing method of a kind of multichannel media stream and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150253974A1 (en) * 2014-03-07 2015-09-10 Sony Corporation Control of large screen display using wireless portable computer interfacing with display controller
CN106875952B (en) * 2016-12-23 2021-02-26 伟乐视讯科技股份有限公司 Multi-channel audio soft coding mechanism based on FPGA embedded system

Also Published As

Publication number Publication date
CN108881927A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
CN108881927B (en) Video data synthesis method and device
CN108877820B (en) Audio data mixing method and device
CN109194982B (en) Method and device for transmitting large file stream
CN108881815B (en) Video data transmission method and device
CN111107299A (en) Method and device for synthesizing multi-channel video
CN110166433B (en) Method and system for acquiring video data
CN108881948B (en) Method and system for video inspection network polling monitoring video
CN108881958B (en) Multimedia data stream packaging method and device
CN111147859A (en) Video processing method and device
CN108965930B (en) Video data processing method and device
CN111124333A (en) Method, device, equipment and storage medium for synchronizing display contents of electronic whiteboard
CN108574816B (en) Video networking terminal and communication method and device based on video networking terminal
CN108632679B (en) A kind of method that multi-medium data transmits and a kind of view networked terminals
CN110139124B (en) Processing method and device for monitoring recording
CN110769179B (en) Audio and video data stream processing method and system
CN109905616B (en) Method and device for switching video pictures
CN108965783B (en) Video data processing method and video network recording and playing terminal
CN110769297A (en) Audio and video data processing method and system
CN111212255B (en) Monitoring resource obtaining method and device and computer readable storage medium
CN110086773B (en) Audio and video data processing method and system
CN111447396A (en) Audio and video transmission method and device, electronic equipment and storage medium
CN110830762B (en) Audio and video data processing method and system
CN111629277A (en) Video data transmission method, device and computer readable storage medium
CN111246237A (en) Panoramic video live broadcast method and device
CN110149306B (en) Media data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100000 Dongcheng District, Beijing, Qinglong Hutong 1, 1103 house of Ge Hua building.

Applicant after: Video Link Power Information Technology Co., Ltd.

Address before: 100000 Beijing Dongcheng District Qinglong Hutong 1 Song Hua Building A1103-1113

Applicant before: BEIJING VISIONVERA INTERNATIONAL INFORMATION TECHNOLOGY CO., LTD.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201230

Address after: 571924 building C07, Zone C, Hainan Ecological Software Park, hi tech Industrial Demonstration Zone, old town, Haikou City, Hainan Province

Patentee after: Hainan Shilian Communication Technology Co.,Ltd.

Address before: 100000 Dongcheng District, Beijing, Qinglong Hutong 1, 1103 house of Ge Hua building.

Patentee before: VISIONVERA INFORMATION TECHNOLOGY Co.,Ltd.