CN116980657A - Video data transmission processing method, device and equipment

Info

Publication number: CN116980657A
Application number: CN202311234721.4A
Authority: CN
Inventors: 朱云; 李元骅; 陈志磊; 李涛
Original assignee: Beijing Shudun Information Technology Co ltd
Current assignee: Beijing Shudun Information Technology Co ltd
Priority date: 2023-09-25
Filing date: 2023-09-25
Publication date: 2023-10-31
Anticipated expiration: 2043-09-25
Also published as: CN116980657B

Abstract

The invention provides a video data transmission processing method, device and equipment. The method comprises the following steps: acquiring extension data of a video stream, the extension data comprising a plurality of attribute identification tags of a camera; tracking and matching the received video stream to obtain a matched video stream; respectively encapsulating the plurality of attribute identification tags in the extension data into a plurality of real-time transport protocol packets of the matched video stream for tag calibration to obtain a target video stream with tags; and transmitting the target video stream. The scheme of the invention realizes tracking and identification of the video stream while using less transmission bandwidth, which helps improve transmission efficiency.

Description

Video data transmission processing method, device and equipment

Technical Field

The present invention relates to the field of video transmission processing technologies, and in particular, to a method, an apparatus, and a device for video data transmission processing.

Background

In applications for transmitting real-time video streams, real-time transport protocol (RTP) is typically used to deliver audio and video data. Real-time transport protocol RTP provides a reliable way to segment, transport and reconstruct audio-video data. However, for some internet of things application scenarios, it may not be sufficient to transmit only audio and video data, and the user also needs to add custom extension data to the RTP video stream, where the extension data may include different attribute identification tags for tracking and identifying the video stream. However, adding extension data increases the bandwidth and cost of transmission, affecting transmission performance.

Disclosure of Invention

The invention aims to solve the technical problem of increasing transmission bandwidth and cost by adding extension data to a video stream.

In order to solve the technical problems, the technical scheme of the invention is as follows:

a video data transmission processing method, comprising:

acquiring extension data of a video stream; the extension data comprises a plurality of attribute identification tags of a camera;

tracking and matching the received video stream to obtain a matched video stream;

respectively encapsulating a plurality of attribute identification tags in the extension data into a plurality of real-time transmission protocol packets of the matched video stream to perform tag calibration to obtain a target video stream with tags;

and sending the target video stream.

Optionally, tracking and matching the received video stream to obtain a matched video stream, including:

acquiring key information of a received video stream;

assembling the key information to obtain a key value structure;

obtaining a connection tracking table according to the key value structure;

and searching the message of the video stream in the connection tracking table, and if the message is found, determining that the video stream is a matched video stream.

Optionally, encapsulating the plurality of attribute identification tags in the extension data in the plurality of real-time transport protocol packets of the matched video stream respectively for tag calibration to obtain a target video stream with tags, including:

converting each video frame of the matched video stream to obtain a plurality of real-time transmission protocol packets;

and respectively packaging the attribute identification tags into the real-time transmission protocol packets according to a preset rule to obtain a target video stream with the tags.

Optionally, respectively encapsulating the attribute identification tags in the real-time transport protocol packets according to a preset rule to obtain a target video stream with tags, including:

converting a plurality of attribute identification tags in the extension data according to a preset extension format to obtain extension data with a calibration format;

sorting the plurality of pieces of extension data with the calibration format to obtain a plurality of ordered tags with the extension format;

sequentially encapsulating the plurality of ordered tags in the plurality of real-time transport protocol packets, respectively, to obtain a target video stream with tags; each real-time transport protocol packet includes at least one of the ordered tags.

Optionally, converting the plurality of attribute identification tags in the extension data according to a preset extension format to obtain extension data with a calibration format, including:

and converting the plurality of attribute identification tags in the extension data according to a double-byte extension format to obtain the extension data with a calibration format.

Optionally, the sequentially encapsulating the plurality of ordered labels in the plurality of real-time transport protocol packets respectively to obtain a target video stream with labels includes:

determining a target data space required by a plurality of the ordered tags;

when the target data space is larger than the current available data space, expanding the current available data space by using a preset function to obtain an expanded data space;

and sequentially and respectively packaging the plurality of ordered labels in the extended data space in the plurality of real-time transmission protocol packets to obtain a target video stream with the labels.

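Optionally, encapsulating the plurality of attribute identification tags in the extension data in the plurality of real-time transport protocol packets of the matched video stream respectively for tag calibration to obtain a target video stream with tags includes:

respectively encapsulating the plurality of attribute identification tags in the extension data into a plurality of different real-time transport protocol packets of the matched video stream for tag calibration to obtain a target video stream with tags.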

According to another aspect of the present invention, there is provided a video data transmission processing apparatus comprising:

the acquisition module is used for acquiring the extension data of the video stream; the extension data comprises a plurality of attribute identification tags of a camera;

the processing module is used for carrying out tracking matching on the received video stream to obtain a matched video stream; and respectively packaging a plurality of attribute identification tags in the extension data into a plurality of real-time transmission protocol packets of the matched video stream for tag calibration to obtain a target video stream with tags, and transmitting the target video stream.

According to another aspect of the present invention, there is provided a video streaming apparatus comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method of any of the above claims.

According to another aspect of the invention, there is provided a computer readable storage medium storing instructions that when run on a computer cause the computer to perform the method of any one of the above.

The scheme of the invention at least comprises the following beneficial effects:

According to the scheme, the received video stream is first tracked and matched to determine the video stream to be processed, and the plurality of attribute identification tags in the extension data are then respectively encapsulated in a plurality of real-time transport protocol packets of the matched video stream for tag calibration. Extension data is thereby added to the video stream so that the video stream can be tracked and identified, while less transmission bandwidth is used and cost is reduced.

Drawings

FIG. 1 is a flow chart of a video data transmission processing method in an embodiment of the invention;

FIG. 2 is a flowchart of a method for obtaining a connection tracking table according to a video stream according to an embodiment of the present invention;

FIG. 3 is a flow chart of determining matching video streams according to a connection tracking table in an embodiment of the invention;

FIG. 4 is a flow chart of adding an attribute identification tag to a real-time transport protocol packet in accordance with an embodiment of the present invention;

FIG. 5 is a schematic diagram of a structure of a double-byte extension header according to an embodiment of the present invention;

FIG. 6 is a flow chart of expanding a currently available data space in an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a video data transmission processing apparatus in an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

As shown in fig. 1, an embodiment of the present invention proposes a video data transmission processing method, including:

step 11, obtaining extension data of a video stream; the extension data comprises a plurality of attribute identification tags of a camera;

step 12, tracking and matching the received video stream to obtain a matched video stream;

step 13, respectively packaging a plurality of attribute identification tags in the extension data into a plurality of real-time transmission protocol packets of the matched video stream for tag calibration to obtain a target video stream with tags;

and step 14, transmitting the target video stream.

The video data transmission processing method provided by the embodiment of the invention first tracks and matches the received video stream to determine the video stream to be processed, and then respectively encapsulates the plurality of attribute identification tags in the extension data into a plurality of real-time transport protocol packets of the matched video stream for tag calibration. Extension data is thereby added to the video stream so that the video stream can be tracked and identified, while less transmission bandwidth is used and cost is reduced.

In this embodiment of the present invention, the attribute identification tags of the camera included in the extension data may include at least one of: asset attributes of the camera (device MAC, vendor name, device model, software version, etc.), environment attributes of the gateway through which the camera is accessed (location, ambient temperature, ambient humidity, etc.), and security attributes of the camera (for example, whether security vulnerabilities exist). The tags support user customization.

In an alternative embodiment of the present invention, step 12 of tracking and matching the received video stream to obtain a matched video stream includes:

step 121, acquiring key information of a received video stream;

step 122, assembling the key information to obtain a key value structure;

step 123, obtaining a connection tracking table according to the key value structure;

step 124, searching the message of the video stream in the connection tracking table, and if the message is found, determining that the video stream is a matching video stream.

Fig. 2 shows the process of obtaining a connection tracking table according to a video stream, and fig. 3 shows the process of determining, according to the connection tracking table, whether a video stream is a matching video stream (i.e., a video stream requiring tag calibration). The transmission of a video stream is usually based on UDP (User Datagram Protocol) or TCP (Transmission Control Protocol). In this embodiment, the specified video stream (i.e., the matching video stream requiring tag calibration) is matched by using the connection tracking module of the netfilter framework (a packet processing framework) in the kernel of Linux (an operating system).

Specifically, when the kernel module is initialized, key information of the video streams to be matched, such as source IP, destination IP, protocol, and ports, is assembled into a corresponding key value structure and added to the connection tracking table. Then, when traffic passes through a hook point (a point in the execution flow where additional operations can be inserted to intercept, replace, or modify the objects being processed), the hook function registered by the kernel module is entered. The hook function looks up whether the message of the current video stream is in a previously saved connection tracking table entry; if it is found, the message belongs to video traffic that needs to be processed (i.e., the matching video stream).
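The matching logic described above can be illustrated with a minimal user-space sketch in Python. It is not kernel code and does not use the netfilter API; the names FlowKey, ConnTrackTable and hook, as well as the addresses and ports, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Set


class FlowKey(NamedTuple):
    """Hypothetical key value structure built from a stream's key information."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


class ConnTrackTable:
    """User-space stand-in for the connection tracking table (not the kernel API)."""

    def __init__(self) -> None:
        self._entries: Set[FlowKey] = set()

    def add(self, key: FlowKey) -> None:
        # Done once at initialization for every stream that should be tagged.
        self._entries.add(key)

    def contains(self, key: FlowKey) -> bool:
        return key in self._entries


def hook(table: ConnTrackTable, message: dict) -> bool:
    """Return True when the message belongs to a matched video stream."""
    key = FlowKey(message["src_ip"], message["dst_ip"], message["protocol"],
                  message["src_port"], message["dst_port"])
    return table.contains(key)


table = ConnTrackTable()
table.add(FlowKey("192.168.1.10", "10.0.0.5", "udp", 5004, 6000))

camera_message = {"src_ip": "192.168.1.10", "dst_ip": "10.0.0.5",
                  "protocol": "udp", "src_port": 5004, "dst_port": 6000}
other_message = dict(camera_message, src_port=5006)
```

In this sketch only camera_message is reported as matching; other_message differs in one field of the key and is left untouched, which mirrors the idea that unmatched traffic is forwarded without tag calibration.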

Because the video stream of the camera is forwarded directly through the gateway kernel routing table, a user-space program cannot obtain the video data, and even if it could process the data it could not send the data on afterwards. Therefore, the received video stream is first matched to confirm that it is a video stream requiring tag calibration, and tag calibration is then performed on it, which improves the accuracy of the video data transmission processing method and helps improve working efficiency.

In an alternative embodiment of the present invention, step 13 may include:

step 131, converting each video frame of the matched video stream to obtain a plurality of real-time transport protocol packets;

step 132, respectively encapsulating the attribute identification tags in the real-time transport protocol packets according to a preset rule to obtain a target video stream with tags.

In this embodiment, there are a plurality of attribute identification tags. If all of them were added directly to every real-time transport protocol packet of the video stream, the bandwidth of video transmission would increase and transmission performance would suffer. Therefore, in this embodiment, each video frame of the matched video stream is first converted into a plurality of real-time transport protocol packets, and at least one attribute identification tag is encapsulated in each packet, so that the plurality of attribute identification tags are distributed across the plurality of packets. It is only necessary to ensure that all tags of each video frame are transmitted within the RTP packets of that frame. This avoids an oversized packet caused by a single large tag and reduces the influence on video stream transmission. In an implementation, if the video stream is received as a plurality of separate RTP packets, at least one attribute identification tag may be encapsulated directly in each RTP packet.

In an alternative embodiment of the present invention, step 132 may include:

step 1321, converting a plurality of attribute identification tags in the extension data according to a preset extension format to obtain extension data with a calibration format;

step 1322, sorting the plurality of pieces of extension data with the calibration format to obtain a plurality of ordered tags with the extension format;

step 1323, sequentially encapsulating the plurality of ordered tags in the plurality of real-time transport protocol packets, respectively, to obtain a target video stream with tags; each real-time transport protocol packet includes at least one of the ordered tags.

In implementation, before the extension data is inserted, each attribute identification tag in the extension data is converted into extension data with a calibration format according to a preset extension format, so that each tag conforms to the specification of real-time transport protocol packets, which improves the efficiency of adding extension data to a video stream. As shown in fig. 4, the plurality of attribute identification tags are then encapsulated in the extension headers of different real-time transport protocol packets according to a rule, such as a round-robin allocation rule. Round-robin allocation is used because the number of real-time transport protocol packets cannot be determined in advance. For example, given tags with IDs 1, 2, 3 and 4, the first real-time transport protocol packet carries the tag with ID 1, the second carries the tag with ID 2, and so on; after the tag with ID 4 has been placed, the next RTP packet carries the tag with ID 1 again, and the cycle repeats until all attribute identification tags have been encapsulated in real-time transport protocol packets, with at least one attribute identification tag in each packet.
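The round-robin allocation in the example above can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: the function name allocate_round_robin is hypothetical, and the handling of a frame that has fewer packets than tags (a packet then carries more than one tag) is an assumption made so that every tag is still sent within the frame.

```python
from typing import List


def allocate_round_robin(tag_ids: List[int], packet_count: int) -> List[List[int]]:
    """Distribute tag IDs over the packets of one frame in cyclic order.

    Every packet receives at least one tag and every tag is placed at
    least once. Returns one list of tag IDs per packet.
    """
    if not tag_ids or packet_count <= 0:
        raise ValueError("need at least one tag and one packet")
    allocation: List[List[int]] = [[] for _ in range(packet_count)]
    # Walk enough slots to cover both all packets and all tags.
    for slot in range(max(len(tag_ids), packet_count)):
        allocation[slot % packet_count].append(tag_ids[slot % len(tag_ids)])
    return allocation


six_packets = allocate_round_robin([1, 2, 3, 4], 6)
three_packets = allocate_round_robin([1, 2, 3, 4], 3)
```

With four tags and six packets, the fifth packet wraps around to ID 1, as in the example in the text. With only three packets, the first packet carries IDs 1 and 4.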

In an alternative embodiment of the present invention, step 1321 may include:

As shown in fig. 5, which is a schematic structural diagram of the double-byte extension header, the double-byte extension header starts with 0x10, 0x00, followed by a two-byte length field, followed by extension elements each in the form of a 1-byte ID + a 1-byte ID length + the data of that ID, repeated in sequence. Using a double-byte extension header meets the need to add extension data to a video stream. Here, the ID is a serial number, such as 1, 2, and so on.
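The layout just described can be made concrete with a short Python sketch that serializes tags into a double-byte extension header. Two details are not stated in the text and are taken from the RTP specifications (RFC 3550 and RFC 8285): the two-byte length field counts 32-bit words after the four-byte header, and the element list is zero-padded to a 32-bit boundary. The function name and the sample tag value are hypothetical.

```python
import struct
from typing import List, Tuple

TWO_BYTE_PROFILE = 0x1000  # the bytes 0x10, 0x00 that open the header


def encode_two_byte_extension(tags: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (ID, data) pairs as a double-byte RTP header extension."""
    body = bytearray()
    for tag_id, data in tags:
        if not 1 <= tag_id <= 255:
            raise ValueError("ID must fit in one byte and be non-zero")
        if len(data) > 255:
            raise ValueError("tag data longer than 255 bytes")
        # 1-byte ID + 1-byte ID length + the data of that ID
        body += bytes([tag_id, len(data)]) + data
    # Pad with zero bytes to a multiple of four (assumption from the RFCs).
    body += b"\x00" * (-len(body) % 4)
    return struct.pack("!HH", TWO_BYTE_PROFILE, len(body) // 4) + bytes(body)


extension = encode_two_byte_extension([(1, b"AA:BB")])
```

For the single five-byte tag in the example, the element occupies seven bytes, one padding byte is added, and the length field is 2, giving a twelve-byte extension in total.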

In an alternative embodiment of the present invention, step 1323 may include:

step 13231, determining a target data space required by a plurality of the ordered labels;

step 13232, when the target data space is greater than the current available data space, expanding the current available data space by using a preset function to obtain an expanded data space;

step 13233, sequentially encapsulating the plurality of ordered tags in the expanded data space into the plurality of real-time transport protocol packets, respectively, to obtain a target video stream with tags.

As shown in fig. 6, before extension data is added to a real-time transport protocol packet of the video stream, it is first determined whether the currently available data space is sufficient to store the extension data. The size of the currently available data space can be obtained through the skb_tailroom() function. If the space is insufficient, it must be expanded; the system function pskb_expand_head() can be called for this, and an expanded data space is obtained. Then, an IP header pointer is obtained by calling the ip_hdr() function, an RTP (real-time transport protocol) header pointer is defined, the IP header pointer offset by the length of the IP header is assigned to the RTP header pointer, the extension data is inserted after the RTP header, and the original data is moved using the memmove() function. After this is completed, the IPv4 total length header field and the UDP length field are adjusted (because extension data has been added). In this way, RTP header extension can be performed on the matched video traffic, and extension data can be inserted without affecting the original video stream message.
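The check, expand and insert sequence can be mimicked in user space with the Python sketch below. PacketBuffer is a hypothetical toy class standing in for the kernel buffer; its tailroom and expand methods only imitate the role that the text assigns to skb_tailroom() and pskb_expand_head(), and the slice copy plays the part of memmove().

```python
class PacketBuffer:
    """Toy stand-in for a kernel packet buffer with spare room at the tail."""

    def __init__(self, data: bytes, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.buf[:len(data)] = data
        self.length = len(data)

    def tailroom(self) -> int:
        # Bytes still free behind the packet data.
        return len(self.buf) - self.length

    def expand(self, extra: int) -> None:
        # Grow the buffer when the free space is too small.
        self.buf.extend(bytes(extra))

    def insert(self, offset: int, payload: bytes) -> None:
        need = len(payload)
        room = self.tailroom()
        if need > room:
            self.expand(need - room)
        # Shift the original bytes towards the tail, then write the payload.
        self.buf[offset + need:self.length + need] = self.buf[offset:self.length]
        self.buf[offset:offset + need] = payload
        self.length += need

    def data(self) -> bytes:
        return bytes(self.buf[:self.length])


packet = PacketBuffer(b"HEADpayload", capacity=12)
packet.insert(4, b"EXT!")  # only 1 byte of tailroom, so the buffer grows first
```

Here the four bytes "EXT!" are placed directly after the four-byte stand-in header, and the original payload follows unchanged.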

In an alternative embodiment of the present invention, step 13 may include, on the basis of any of the above embodiments:

In practice, since each video frame of a video stream usually requires multiple real-time transport protocol packets for transmission, different attribute identification tags are encapsulated in different real-time transport protocol packets rather than all tags being encapsulated in one packet. It is only necessary to ensure that all attribute identification tags of each video frame are transmitted within the real-time transport protocol packets of that frame. This avoids an oversized packet caused by a single large tag. Specifically, the total tag size required by a frame can be calculated (the size of the data extension header plus the serial numbers of all attribute identification tags and their tag data), and the tags are then distributed into the extension headers of different real-time transport protocol packets according to a certain rule, for example a round-robin allocation rule. The receiving end only needs to collect the attribute identification tag information from all real-time transport protocol packets in a frame and reassemble it according to the serial numbers of the attribute identification tags to restore the complete extension data.
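The size calculation mentioned above can be sketched in Python under the assumption that the double-byte extension format is used: a four-byte extension header, two bytes of ID and ID length per tag, the tag data, and zero padding to a 32-bit boundary (the padding rule comes from the RTP specifications, not from the text). The function name and the sample tag values are hypothetical.

```python
from typing import List, Tuple


def extension_size(tags: List[Tuple[int, bytes]]) -> int:
    """Bytes added to one RTP packet that carries the given tags."""
    body = sum(2 + len(data) for _, data in tags)  # ID + ID length + data
    return 4 + body + (-body % 4)                  # header + padded body


tags = [
    (1, b"AA:BB:CC:DD:EE:FF"),  # device MAC
    (2, b"VendorX"),            # vendor name
    (3, b"Model-1"),            # device model
    (4, b"v1.2.3"),             # software version
]

all_in_one_packet = extension_size(tags)
largest_single_tag = max(extension_size([tag]) for tag in tags)
```

With these sample values, carrying all four tags in one packet adds 52 bytes to that packet, whereas carrying one tag per packet adds at most 24 bytes to any packet. Distribution repeats the four-byte header in each packet, so its benefit in this sketch is a smaller per-packet growth.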

A specific embodiment of the video data transmission processing method of the embodiment of the invention is as follows:

step 21, obtaining the extension data to be added to the RTP video stream (a video stream using the real-time transport protocol) to be transmitted. The extension data may include different attribute identification tags, such as asset attributes of the camera (e.g. device MAC, vendor name, device model, software version, etc.), environment attributes of the access gateway of the camera (e.g. location, ambient temperature, ambient humidity, etc.), and security attributes of the current camera (e.g. whether security vulnerabilities exist), and supports user customization (the user configures this on the access security gateway, where all attribute options are listed and the user selects those to be encapsulated in the video stream). By using the customized extension data, the RTP video stream can be tracked and identified, so that the requirements of specific application scenarios are met.

In step 22, the video stream to be processed is matched. The transmission of a video stream is usually performed over UDP or TCP. Because the video stream of the camera is forwarded directly through the gateway kernel routing table, a user-space program cannot obtain the video data, and even if it could process the data it could not send the data on afterwards. The video stream to be processed can be matched as follows: the kernel provides a forwarding hook point, so a kernel processing module can be written that uses the registration function nf_register_net_hooks() to register a tag processing function at the hook point; during video forwarding the function checks the data, and if the data belongs to a stream that requires tagging, the tagging is performed. In this embodiment, the connection tracking module in the Linux kernel netfilter framework is used to match the specified video traffic. In detail, as shown in fig. 2, when the kernel module is initialized, the key information of the video stream to be matched, such as source IP, destination IP, protocol, and port, is assembled into a corresponding key value structure and added to the connection tracking table. Then, when traffic passes through the hook point, the hook function registered by the kernel module is entered. As shown in fig. 3, the hook function looks up whether the current message is in a previously saved connection tracking table entry; if it matches, the message belongs to the video traffic that needs to be processed.

Step 23, after the matched video traffic is obtained, the message of the RTP video stream needs to be expanded so that the customized extension data can be inserted. Referring to fig. 6, it is first determined whether the available space of the sk_buff packet is sufficient to store the extension data. The available space of the sk_buff can be obtained through the skb_tailroom() function. If the space is insufficient, the sk_buff needs to be expanded; the system function pskb_expand_head() can be called for this, and enough available space is obtained after expansion. Then, an IP header pointer is obtained by calling the ip_hdr() function, an RTP header pointer is defined, the IP header pointer offset by the length of the IP header is assigned to the RTP header pointer, the extension header data is inserted after the RTP header, and the original data is moved using the memmove() function. After this is completed, the IPv4 total length header field and the UDP length field are adjusted (because extension data has been added). In this way, RTP header extension can be performed on the matched video traffic, and custom extension data can be inserted without affecting the original message.
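The final adjustment of the IPv4 total length and UDP length fields can be sketched in Python on a raw packet. Two points go beyond the text and are assumptions of this sketch: the IPv4 header checksum is recomputed because it covers the total length field, and the UDP checksum is set to zero, which IPv4 permits as meaning that no checksum was computed. The function names and sample addresses are hypothetical.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones-complement checksum over an IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def grow_lengths(packet: bytes, added: int) -> bytes:
    """Update the length fields after `added` bytes were inserted."""
    pkt = bytearray(packet)
    ihl = (pkt[0] & 0x0F) * 4                       # IPv4 header length in bytes
    total_len = struct.unpack_from("!H", pkt, 2)[0] + added
    struct.pack_into("!H", pkt, 2, total_len)       # IPv4 total length
    struct.pack_into("!H", pkt, 10, 0)              # clear the old checksum
    struct.pack_into("!H", pkt, 10, ipv4_checksum(bytes(pkt[:ihl])))
    udp_len = struct.unpack_from("!H", pkt, ihl + 4)[0] + added
    struct.pack_into("!H", pkt, ihl + 4, udp_len)   # UDP length
    struct.pack_into("!H", pkt, ihl + 6, 0)         # UDP checksum disabled
    return bytes(pkt)


# A 40-byte datagram (20 IP + 8 UDP + 12 payload) into which 12 extension
# bytes were already inserted, while its length fields still hold old values.
ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 17, 0,
                 bytes([192, 168, 1, 10]), bytes([10, 0, 0, 5]))
udp = struct.pack("!HHHH", 5004, 6000, 20, 0)
adjusted = grow_lengths(ip + udp + bytes(24), added=12)
```

After the call, the total length field equals the real size of the packet, and the header checksum verifies.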

According to the RFC specifications (a series of documents defining the standards and protocols of the Internet), RTP header extensions are divided into single-byte extension headers and double-byte extension headers. A single-byte extension header must start with 0xBE, 0xDE, followed by a two-byte length field, followed by extension elements each in the form of a 4-bit ID + a 4-bit ID length + the data of that ID, repeated in sequence. A double-byte extension header must start with 0x10, 0x00, followed by a two-byte length field, followed by extension elements each in the form of a 1-byte ID + a 1-byte ID length + the data of that ID, repeated in sequence. Because the ID and the ID length of the single-byte format occupy only 4 bits each, at most 15 IDs are supported and the content of each ID is at most 15 bytes, which falls far short of the requirements of custom video tag content; therefore, in this embodiment all attribute identification tags in the extension data are converted into the standard double-byte extension format. Because information such as the push mode, server, and port has to be configured on the camera, the user already has this information (the user configures it on the local management page of the camera); the user therefore only needs to access the management page of the video tag processing gateway and enter the same information in the video configuration column. This reduces errors in video stream matching as much as possible.

In step 24, since each video frame typically requires multiple RTP packets for transmission, different tag information is encapsulated in different RTP packets instead of all tags being encapsulated in one packet. It is only necessary to ensure that all tags of each video frame are transmitted within the RTP packets of that frame. This avoids the problems of an oversized packet caused by a single large tag, a large total number of tags, and high bandwidth occupation. Specifically, the total tag size required by one frame can be calculated (the data extension header plus all the IDs and the data of the IDs), and the tags are then distributed into the extension headers of different RTP packets according to a certain rule. As shown in fig. 4, for example, tags with IDs 1, 2, 3 and 4 are allocated in a round-robin manner (the tag with ID 1 is encapsulated in the first RTP packet, the tag with ID 2 in the second RTP packet, and so on in sequence). The receiving end only needs to collect the tag information from all RTP packets in one frame and reassemble it in ascending order of ID to restore the complete tag data.
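The receiving side described above can be sketched in Python as follows. The sketch assumes that the tags of each RTP packet have already been parsed into (ID, data) pairs; the function name reassemble and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def reassemble(frame_packets: List[List[Tuple[int, bytes]]]) -> List[Tuple[int, bytes]]:
    """Collect the tags from all packets of one frame and order them by ID."""
    collected: Dict[int, bytes] = {}
    for packet_tags in frame_packets:
        for tag_id, data in packet_tags:
            # Round-robin allocation may repeat a tag; keep a single copy.
            collected[tag_id] = data
    return sorted(collected.items())


frame = [
    [(1, b"mac")], [(2, b"vendor")], [(3, b"model")], [(4, b"version")],
    [(1, b"mac")],  # fifth packet wraps around to ID 1
]
restored = reassemble(frame)
restored_reordered = reassemble(list(reversed(frame)))
```

Because the result is ordered by ID, the restored tag data is the same even if the packets of the frame arrive in a different order.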

In the above embodiment of the present invention, by encapsulating different tag information into different RTP packets, it is only necessary to ensure that all tags of each frame of video can complete transmission in multiple RTP packets within the frame. Therefore, the oversized package caused by a single large tag can be avoided, the transmission delay of the video stream with the extension data is reduced, and the transmission efficiency is improved.

As shown in fig. 7, an embodiment of the present invention provides a video data transmission processing apparatus 100, including:

an acquisition module 101, configured to acquire extension data of a video stream; the extension data comprises a plurality of attribute identification tags of a camera;

the processing module 102 is configured to track and match the received video stream to obtain a matched video stream; and respectively packaging a plurality of attribute identification tags in the extension data into a plurality of real-time transmission protocol packets of the matched video stream for tag calibration to obtain a target video stream with tags, and transmitting the target video stream.

In an alternative embodiment, tracking and matching the received video stream to obtain a matched video stream includes:

acquiring key information of a received video stream;

assembling the key information to obtain a key value structure;

obtaining a connection tracking table according to the key value structure;

In an optional embodiment, encapsulating the plurality of attribute identification tags in the extension data in a plurality of real-time transport protocol packets of the matched video stream respectively to perform tag calibration to obtain a target video stream with a tag, including:

In an alternative embodiment, a plurality of attribute identification tags are respectively encapsulated in a plurality of real-time transport protocol packets according to a preset rule to obtain a target video stream with tags, which includes:

In an alternative embodiment, converting the plurality of attribute identification tags in the extension data according to a preset extension format to obtain extension data with a calibration format, including:

In an optional embodiment, the sequentially encapsulating the plurality of ordered labels in the plurality of real-time transport protocol packets respectively to obtain a target video stream with labels includes:

determining a target data space required by a plurality of the ordered tags;

The video data transmission processing device provided by the embodiment of the invention first tracks and matches the received video stream to determine the video stream to be processed, and then respectively encapsulates the plurality of attribute identification tags in the extension data into a plurality of real-time transport protocol packets of the matched video stream for tag calibration. Extension data is thereby added to the video stream so that the video stream can be tracked and identified, while less transmission bandwidth is used and cost is reduced.

It should be noted that, the device is a device corresponding to the video data transmission processing method, and all implementation manners in the method embodiment are applicable to the device embodiment, so that the same technical effects can be achieved. In this embodiment, details are not described again.

The embodiment of the invention also provides video streaming transmission equipment, which comprises: a processor, a memory storing a computer program which, when executed by the processor, performs a method as in any of the above embodiments. All the implementation manners in the method embodiment are applicable to the embodiment of the equipment, and the same technical effect can be achieved. In this embodiment, details are not described again.

Embodiments of the present invention also provide a computer-readable storage medium having stored thereon instructions which, when run on a computer, cause the computer to perform a method according to any of the above embodiments. All the implementation manners in the above method embodiments are applicable to the embodiments of the computer readable storage medium, and the same technical effects can be achieved. In this embodiment, details are not described again.

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims

1. A video data transmission processing method, characterized by comprising:

and sending the target video stream.

2. The video data transmission processing method according to claim 1, wherein tracking and matching are performed on the received video stream to obtain a matched video stream, comprising:

acquiring key information of a received video stream;

assembling the key information to obtain a key value structure;

obtaining a connection tracking table according to the key value structure;

3. The method for processing video data according to claim 1, wherein the step of encapsulating the plurality of attribute identification tags in the extension data in the plurality of real-time transport protocol packets of the matched video stream respectively to perform tag calibration to obtain the target video stream with the tag comprises:

4. The video data transmission processing method according to claim 3, wherein the step of respectively encapsulating the plurality of attribute identification tags in the plurality of real-time transport protocol packets according to a preset rule to obtain the target video stream with the tag comprises the steps of:

5. The method for video data transmission processing according to claim 4, wherein converting the plurality of attribute identification tags in the extended data according to a preset extended format to obtain the extended data with a calibrated format, comprises:

6. The method for processing video data according to claim 4, wherein sequentially encapsulating the plurality of ordered labels in the plurality of real-time transport protocol packets, respectively, to obtain the target video stream with the labels, comprises:

determining a target data space required by a plurality of the ordered tags;

7. The method for processing video data according to any one of claims 1 to 6, wherein encapsulating the plurality of attribute identification tags in the extension data in the plurality of real-time transport protocol packets of the matched video stream respectively performs tag calibration to obtain a target video stream with tags, and includes:

8. A video data transmission processing apparatus, comprising:

9. A video streaming apparatus, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method of any one of claims 1 to 7.

10. A computer readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.