WO2007045140A1

WO2007045140A1 - A real-time method for transporting multimedia data

Info

Publication number: WO2007045140A1
Application number: PCT/CN2006/001845
Authority: WO
Inventors: Bin Song; Zhong Luo
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2005-10-17
Filing date: 2006-07-25
Publication date: 2007-04-26
Also published as: CN1863314A; CN100407726C

Abstract

A method for real-time transport of H.264 multimedia video data is disclosed. It enables the RTP protocol to carry H.264 multimedia video data efficiently and strengthens the quality-of-service assurance mechanism while remaining compatible with existing RTP apparatus and transport methods. A modified RTP protocol (MRTP) for carrying H.264 data is disclosed: all header information bytes of the H.264 NALUs are combined into the header information of the RTP packet itself in a way that does not affect the operation of the existing RTP protocol and apparatus, so that the attributes of the H.264 NALU payload are indicated directly in the MRTP header information. An identification method for distinguishing existing RTP from MRTP is also provided, based on modifying relevant fields of the existing RTP header information, such as the M and F fields. This allows existing network media apparatus to support both RTP and MRTP, and improves the compatibility of MRTP and the flexibility of its application.

Description

Real-time method for multimedia data transmission

Technical field

The present invention relates to the field of multimedia communication technologies, and in particular to a method for real-time transmission of multimedia data.

Background technique

With the rapid development of the Internet and mobile communication networks, streaming media technology is being used ever more widely, from movie playback to distance learning and online news sites. Currently, there are two ways to deliver video and audio over the Internet: downloading and streaming. Streaming transmits the video/audio signal continuously; the rest of the content continues to download in the background while the stream is playing. Streaming has two methods: progressive streaming and real-time streaming. Real-time streaming delivers data in real time and is especially suitable for live events. Real-time streaming must match the connection bandwidth, which means that when network speed drops, image quality is degraded to reduce the required transmission bandwidth. "Real time" means that the delivery of data in an application must keep a precise time relationship with the generation of the data.

Especially with the advent of third-generation (3G) mobile communication systems and the rapid development of Internet Protocol (IP)-based networks, video communication is gradually becoming one of the main communication services. Two-way or multi-party video communication services, such as video telephony, video conferencing, and mobile terminal multimedia services, impose strict requirements on the transmission of multimedia data streams and on quality of service. Not only must network transmission offer better real-time performance, but video compression coding must also be more efficient.

In response to the current demands of media communication, the ITU Telecommunication Standardization Sector (ITU-T) officially released the H.264 standard in 2003, following the earlier video compression standards H.261, H.263, and H.263+. H.264 is a highly efficient compression coding standard jointly developed by the ITU-T and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) to meet the requirements of a new phase of network media delivery and communication. It also forms Part 10 of the MPEG-4 standard.

The purpose of the H.264 standard is to improve video coding efficiency and its suitability for networks. In fact, owing to its superiority, the H.264 video compression coding standard has gradually become the mainstream standard in multimedia communication. A large number of H.264 real-time multimedia communication products (such as video conferencing systems, videophones, and 3G mobile communication terminals) and network streaming products have been introduced. Support for H.264 has become a key factor in the competitiveness of products in this market segment. It can be predicted that, with the official promulgation and widespread use of H.264, multimedia communication based on IP networks and 3G wireless networks will inevitably enter a new stage of rapid development.

As mentioned above, multimedia communication requires not only highly efficient media compression coding but also real-time network transmission. At present, multimedia streaming is essentially based on the Real-time Transport Protocol (RTP) and its companion Real-time Transport Control Protocol (RTCP). RTP is a transport protocol for multimedia data streams over the Internet, published by the Internet Engineering Task Force (IETF). RTP is defined to work in one-to-one or one-to-many transmission, with the goal of providing timing information and stream synchronization. RTP typically runs over the User Datagram Protocol (UDP), but it can also work over other protocols such as the Transmission Control Protocol (TCP) or Asynchronous Transfer Mode (ATM).

RTP itself only provides for the transmission of real-time data; it does not provide a reliable mechanism for delivering packets in order, nor does it provide flow control or congestion control. It relies on RTCP for these services. RTCP manages transmission quality by exchanging control information between the current application processes. During an RTP session, each participant periodically transmits RTCP packets containing statistics such as the number of packets sent and the number of packets lost. The server can use this information to change the transmission rate dynamically, or even change the payload type. Used together with RTCP, RTP optimizes transmission efficiency through effective feedback and minimal overhead, which makes it well suited to delivering real-time data over a network.

H.264 multimedia data is likewise transmitted over IP networks on top of UDP and its upper-layer protocol, RTP. RTP itself is structurally applicable to different media data types, but for each high-level protocol or media compression coding standard used in multimedia communication (e.g., H.261, H.263, MPEG-1/-2/-4, MP3), the IETF issues a specification of the RTP payload format that defines an encapsulation and packetization method optimized for that particular protocol. The corresponding IETF standard for H.264 is RFC 3984, "RTP Payload Format for H.264 Video". This is currently the main, widely used standard for transmitting H.264 video streams over IP networks. In the field of video communication, the products of the major manufacturers are based on RFC 3984, and it is currently the only H.264/RTP transmission method.

In fact, the key difference between H.264 and other video compression coding protocols is that H.264 defines a new layer called the NAL (Network Abstraction Layer). Through a standard interface, this layer opens up the capabilities of the underlying network, shields the upper layer from the differences between underlying networks, and abstracts a service capability layer. H.264 defines the NAL in order to increase the separation and independence between its video coding layer (VCL, Video Coding Layer) and the specific network transport protocol layer below it, which brings greater application flexibility. Earlier ITU-T video compression coding protocols such as H.261 and H.263/H.263+/H.263++ had no such layer. How to design a more efficient scheme in which the NAL and the RTP bearer cooperate, so that RTP exploits the advantages of H.264 better, is a practical question worth studying.

The method of carrying H.264 NAL layer data over RTP proposed by the RFC 3984 specification is currently the only technical solution. Based on the RTP protocol (RFC 3550), the scheme encapsulates the NAL layer data in the RTP payload. The NAL layer sits between the VCL and RTP and divides the video stream into a series of network abstraction layer units (NALUs, NAL Units) according to defined rules and structures. The encapsulation format of NALUs in the RTP payload is defined in RFC 3984. The RTP frame format and the existing NALU encapsulation method are briefly introduced below.

RTP was designed mainly for real-time multimedia conferencing, continuous data storage, interactive distributed simulation, and control and measurement applications. RTP is typically carried over UDP to take advantage of its multiplexing and checksum functions. If the underlying network provides multicast distribution, RTP supports multicast delivery. The functions provided by RTP include payload type identification, sequence numbering, timestamping, and transmission monitoring.

When carrying H.264 video, RTP packs the H.264 NALUs into an RTP packet stream. The encapsulation of NALUs is mainly defined in RFC 3984, and the description of the H.264 layering given here is based on it.

The encapsulation format of NAL data in RTP is shown in Figure 1.

FIG. 1 shows the encapsulation structure of NALUs in the RTP payload. The first byte of each NALU is the NALU header information, followed by the data content of the NALU; multiple NALUs are placed end to end in the payload of the RTP packet. At the end there is optional RTP padding, which is specified in the RTP packet format and is used to make the length of the RTP packet meet certain requirements (such as a fixed length). The optional RTP padding data is generally zero-filled.

The NALU header information is the first byte (also called an octet) of the NALU and has three fields. Their names and meanings are as follows:

The F field is defined as the forbidden bit (forbidden_zero_bit). It occupies 1 bit and is used to identify syntax errors and the like; it is set to 1 if there is a syntax violation. When the network detects a bit error in the unit, it can set the bit to 1 so that the receiver drops the unit. This is mainly used to adapt to different kinds of network environments (such as combined wired and wireless environments);

The NRI field is defined as the NAL reference identifier (nal_ref_idc). It occupies 2 bits and indicates the importance of the NALU data. A value of 00 indicates that the content of the NALU is not used to reconstruct reference pictures for inter prediction. A value other than 00 indicates that the current NALU carries important data, such as a slice belonging to a reference frame, a sequence parameter set (SPS, Sequence Parameter Set), or a picture parameter set (PPS, Picture Parameter Set). The larger the value, the more important the current NALU;

The Type field is defined as the NALU type (nal_unit_type). It occupies 5 bits, so there can be 32 NALU types. The correspondence between values and specific types is given in Table 1.

Table 1 Correspondence between Type field values and NALU content types

Type value    Type of NALU content
0             Not specified
1             Coded slice of a non-IDR picture
2             Coded slice data partition A
3             Coded slice data partition B
4             Coded slice data partition C
5             Coded slice of an IDR picture
6             SEI (supplemental enhancement information)
7             SPS (sequence parameter set)
8             PPS (picture parameter set)
9             Access unit delimiter
10            End of sequence
11            End of stream
12            Filler data
13-23         Reserved
24-31         Not specified

It can be seen that the one byte of NALU header information mainly conveys the validity and importance level of the NALU, and that the importance of the data carried by RTP can be determined from this information.
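As an illustration only, the three header fields described above can be extracted from the first NALU byte with a few bit operations. The following is a minimal Python sketch; the names `NaluHeader` and `parse_nalu_header` are hypothetical and are not part of the disclosure.

```python
from typing import NamedTuple


class NaluHeader(NamedTuple):
    f: int         # forbidden bit (forbidden_zero_bit), 1 bit
    nri: int       # NAL reference identifier (nal_ref_idc), 2 bits
    nal_type: int  # NALU type (nal_unit_type), 5 bits


def parse_nalu_header(octet: int) -> NaluHeader:
    """Split the one-byte NALU header into its F, NRI and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the NALU header is a single byte")
    return NaluHeader(
        f=octet >> 7,           # most significant bit
        nri=(octet >> 5) & 0x3, # next two bits
        nal_type=octet & 0x1F,  # low five bits
    )


# 0x67 is 0b0_11_00111: F=0, NRI=3, Type=7 (SPS in Table 1).
print(parse_nalu_header(0x67))
```

A device that only needs the importance of a NALU can stop after reading `nri`, without decoding any of the NALU data that follows the header byte.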

In summary, the prior-art scheme works as follows. First, the H.264 video bit stream is segmented according to certain rules to form a NALU stream in the NAL layer; this step actually belongs to the H.264 implementation (for example, one frame of an image can be taken as a NALU, or one slice can be taken as a NALU). The NALU stream is then packed into an RTP packet stream according to the encapsulation strategy of the application. In an RTP data packet, the NALU data area follows the header information. If an RTP data packet encapsulates multiple H.264 NALUs, these NALUs are arranged end to end, each NALU occupies a contiguous run of bits, and the first byte of each NALU is its header information byte. At the end of the RTP packet there may be optional padding bits as needed. During transmission, the underlying RTP protocol does not process any NALU-specific information.

In practical applications, the above solution has the following problems. In this scheme, the NALU header information of H.264 is not reflected in the header information of the RTP packet, yet the NALU header information byte contains a lot of important information. For example, NRI indicates whether the corresponding NALU contains H.264 non-reference frames, reference frames, or other important image data such as parameter sets.

Because the RTP protocol itself provides no QoS (Quality of Service) mechanism and no information about the priority of the data it carries, RTP itself has no fault tolerance (error resilience) against errors such as packet loss on IP and wireless networks.

If some information of the H.264 NALU can be reflected in the header information of the RTP packet, the information about the H.264 NALU attribute can be obtained by directly scanning the RTP header information. Based on this information, it is possible for the network device to handle the RTP packets differently, so as to ensure the priority of important data in the transmission process.

In addition, there is room for improvement in efficiency. If an RTP packet encapsulates multiple H.264 NALUs of the same type, the header bytes of these NALUs are exactly the same. If there are N NALUs in an RTP packet, the N NALU header information bytes can be replaced by a single byte without any loss of information, and efficiency improves because N-1 redundant bytes are eliminated.
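The saving described above can be checked with simple arithmetic. The following Python sketch is illustrative only: the function name `total_nalu_bytes` and the body sizes in the example are hypothetical, not taken from the disclosure.

```python
from typing import Sequence


def total_nalu_bytes(body_sizes: Sequence[int], shared_header: bool) -> int:
    """Bytes needed for the NALU bodies plus their one-byte headers.

    With shared_header=False every NALU keeps its own header byte
    (the prior-art layout); with shared_header=True the identical
    header bytes are replaced by a single byte.
    """
    n = len(body_sizes)
    header_bytes = min(n, 1) if shared_header else n
    return sum(body_sizes) + header_bytes


# Four same-type NALUs with hypothetical body sizes (in bytes).
bodies = [200, 180, 220, 150]
saved = total_nalu_bytes(bodies, False) - total_nalu_bytes(bodies, True)
print(saved)  # 3, i.e. N-1 for N=4
```

The saving is independent of the body sizes, so its relative effect is largest when a packet carries many small NALUs.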

In the prior-art solution, the NALU header information is encapsulated entirely within the payload, so the RTP protocol cannot directly learn the attributes, level, or importance of the payload, and a QoS mechanism based on them cannot be implemented. Moreover, this encapsulation format makes the NALU header information occupy payload resources: each NALU has its own header information, and in many cases the headers of multiple NALUs of the same type in one RTP packet are identical, which wastes RTP transmission bandwidth.

Summary of the invention

In view of this, the main purpose of the present invention is to provide a method for real-time transmission of H.264 multimedia data, so that the RTP protocol can carry multimedia video data efficiently and a quality-of-service assurance mechanism is added while remaining compatible with existing RTP protocol devices and transmission methods.

According to the H.264-based multimedia data transmission method provided by the present invention, the multimedia data is divided into a network abstraction layer unit stream in a network abstraction layer, and the network abstraction layer unit includes header information, and the method includes:

The sender encapsulates at least one network abstraction layer unit with the same header information in the same improved real-time transport protocol packet according to the improved real-time transport protocol encapsulation format, and sets an identifier in the improved real-time transport protocol header information to distinguish the packet from a real-time transport protocol packet;

The receiving party determines whether the packet is an improved real-time transport protocol packet according to the identifier, and if so, processes the packet according to the improved real-time transport protocol encapsulation format, and acquires the carried network abstraction layer unit;

In the improved real-time transport protocol encapsulation format, the header information shared by the carried network abstraction layer units is included in the header information of the improved real-time transport protocol packet, and the carried network abstraction layer units are placed in the payload of the improved real-time transport protocol packet after their header information has been removed.

The network abstraction layer unit header information includes:

a forbidden bit field, configured to indicate whether the network abstraction layer unit is in error;

a network abstraction layer reference identifier field, configured to indicate the importance of the network abstraction layer unit;

a type field, configured to indicate the type of the network abstraction layer unit.

In the improved real-time transport protocol encapsulation format, the network abstraction layer reference identification field and type field are populated in a payload type field of the improved real-time transport protocol header information.

The improved real-time transport protocol identifier is carried in the version information field of the improved real-time transport protocol header information.

In the improved real-time transport protocol encapsulation format, the forbidden bit field is populated in the marker field of the improved real-time transport protocol header information;

The receiver determines, according to the marker field of the improved real-time transport protocol packet, whether the network abstraction layer units carried by the packet are in error.

The receiving party includes a communication terminal and a network intermediate device.

Alternatively, the improved real-time transport protocol identifier is carried in the marker field of the improved real-time transport protocol header information.

The sender first determines whether the forbidden bit field in the header information of each of the network abstraction layer units is set, and accordingly divides the network abstraction layer units into normal network abstraction layer units and erroneous network abstraction layer units;

the sender then encapsulates the normal network abstraction layer units into improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format and sets the improved real-time transport protocol identifier; in this encapsulation format, the forbidden bit field in the network abstraction layer unit header information is ignored;

the sender encapsulates the erroneous network abstraction layer units into real-time transport protocol packets according to the real-time transport protocol encapsulation format;

the receiver first sorts the received packets into improved real-time transport protocol packets and real-time transport protocol packets according to the improved real-time transport protocol identifier in the header information of each received packet, and then processes the improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format and the real-time transport protocol packets according to the real-time transport protocol encapsulation format.
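The sender-side split and the receiver-side classification just described can be sketched as follows. This is a minimal Python illustration under stated assumptions: the identifier is taken to be the marker bit (the top bit of header byte 1), and a value of 1 is assumed to mean "improved packet"; the polarity is not specified in this passage. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

RTP_FIXED_HEADER_LEN = 12  # bytes in the fixed RTP header


def split_by_forbidden_bit(nalus: Iterable[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Sender side: separate normal NALUs (F=0) from erroneous ones (F=1)."""
    normal: List[bytes] = []
    erroneous: List[bytes] = []
    for nalu in nalus:
        if not nalu:
            raise ValueError("empty NALU")
        # The forbidden bit is the most significant bit of the header byte.
        (erroneous if nalu[0] & 0x80 else normal).append(nalu)
    return normal, erroneous


def classify_packet(packet: bytes) -> str:
    """Receiver side: decide which encapsulation format applies.

    Assumption: marker bit set to 1 identifies an improved (MRTP) packet.
    """
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("shorter than the 12-byte fixed header")
    return "MRTP" if packet[1] & 0x80 else "RTP"


normal, erroneous = split_by_forbidden_bit(
    [bytes([0x65, 1]), bytes([0xE5, 2]), bytes([0x41, 3])]
)
print(len(normal), len(erroneous))  # 2 1
```

Normal units would then go through the improved encapsulation and erroneous units through the conventional one, so that a receiver can dispatch on `classify_packet` alone.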

In the improved real-time transport protocol encapsulation format, when fewer than 16 network abstraction layer unit types are in use, only the lower 4 bits of the type field are used, and the highest bit of the type field is reserved as an extension bit.

The multimedia transmission device learns the relevant information about the carried network abstraction layer units from the improved real-time transport protocol header information and implements a quality-of-service policy for real-time transmission of the multimedia data accordingly.

By comparison, the main difference from the prior art is that the technical solution of the present invention provides an improved RTP protocol (MRTP, Modified RTP) for carrying NALU data, in which the header information bytes of all NALUs in the same RTP packet are combined into the packet's own header information. The combination is done in a way that does not affect the operation of existing RTP protocols and devices, and the attributes of the NALU payload are reflected directly in the MRTP header information, so that the efficiency with which MRTP encapsulates NALUs is greatly improved. In addition, the fact that MRTP exposes the attributes of the payload NALUs provides the basis for implementing QoS mechanisms;

In addition, by modifying related fields in the existing RTP header information, such as the M and F fields, a method of distinguishing existing RTP from MRTP is given, which makes it possible for existing network media equipment to support both RTP and MRTP and improves the compatibility of MRTP. A scheme for transmitting NALUs with syntax errors or other errors separately over conventional RTP, and a scheme for handling interleaved MRTP and RTP data packets, are also given.

The header information of the improved MRTP protocol of the present invention carries the H.264 NALU header information, so that the importance of the NALUs carried by an MRTP data packet can be determined by a fast scan of the MRTP header information, without decoding the NALUs. Corresponding measures can then be taken to implement QoS policies and the like, further improving service quality;

At the same time, stripping the identical H.264 NALU header information bytes out of the MRTP data packet reduces redundancy and improves transmission efficiency, thereby improving the video transmission quality of multimedia video communication over IP networks and better meeting users' requirements.

In addition, distinguishing MRTP from RTP achieves compatibility with the prior art. The solution for handling interleaved MRTP and RTP packets and the separate transmission scheme for erroneous NALUs improve the robustness of the new MRTP method.

Description of the drawings

FIG. 1 is a schematic diagram of the format in which NALU data is encapsulated in an RTP packet payload in the prior art;

FIG. 2 is a schematic diagram of the header information structure of an RTP packet;

FIG. 3 is a schematic diagram of the header information structure of an MRTP data packet according to an embodiment of the present invention.

Detailed description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings.

The present invention aims to provide a multimedia data transmission scheme in which the H.264 NALU header information is reflected in the RTP header information. The basic principle is to use one or more bytes or bits of the current RTP header information to represent the NALU header information; RTP provides these bytes or bits as extension space to be defined in combination with the specific media protocol carried. The improved RTP protocol does not affect interoperability with devices that support the original RTP protocol: if some terminals in a communication adopt the improved RTP protocol according to the present invention and the other terminals use the unmodified RTP protocol, the terminals can still communicate fully. By setting a flag bit in the MRTP header information to distinguish MRTP packets from traditional RTP data packets, a terminal can also apply different processing to the different cases and so handle interleaved transmission of MRTP and RTP. This includes a solution for NALUs with syntax errors, which are transmitted using traditional RTP.

The improvement of existing RTP by MRTP mainly involves redefining the payload type (PT) field and the version information (V) field in the RTP packet header information. The scheme has two potential benefits: it provides a QoS mechanism for H.264 data transmission, and it improves the efficiency with which RTP encapsulates H.264 NALUs.

To facilitate understanding of the technical solution of the present invention, the format of the RTP packet is briefly introduced here. The basic RTP header information occupies 12 bytes (in the minimum case), and the header information of the IP and UDP protocols occupies 20 bytes and 8 bytes respectively. Since an RTP packet is encapsulated in a UDP packet, which is in turn encapsulated in an IP packet, the header information occupies 12+8+20=40 bytes in total. The detailed structure of the RTP packet header information is shown in Figure 2.

From front to back, the RTP header information shown in Figure 2 is as follows: the first byte (byte 0) contains fields describing the header information structure itself; the second byte (byte 1) contains the payload type; the third and fourth bytes (bytes 2 and 3) are the sequence number (Sequence Number); the 5th to 8th bytes are the timestamp; the 9th to 12th bytes are the synchronization source identifier (SSRC ID, Synchronization Source Identifier); and last comes the list of contributing source identifiers (CSRC IDs, Contributing Source Identifiers), whose number is variable. Note that, in this description, the first byte is the byte labeled 0, and so on.

The first 12 bytes appear in every RTP packet, while other data in the header information, such as the contributing source identifiers, is present only when inserted by a mixer. CSRC is therefore generally used when media is mixed. For example, in a multi-party conference the audio needs to be mixed, and video can provide multi-picture functions in the same way. The synchronization source identifier SSRC is in effect the identifier of the carried media stream.

The specific meanings and full names of the above fields are described as follows:

The V field is the version (Version) information and occupies 2 bits. The version currently in use is 2, so V=2 is set. V=1 indicates an earlier RTP version, and V=0 indicates the original predecessor of RTP, which was used in the voice over IP (VoIP) communication system on the early Mbone network and later evolved into RTP. V=3 has not been defined and can therefore be used by the present invention;

The P field is the padding flag (Padding) and occupies 1 bit. If P is set, the packet contains one or more padding bytes at the end, and the padding is not part of the payload;

The X field is the extension flag (Extension) and occupies 1 bit. If X is set, the fixed RTP header must be followed by a variable-length header extension (placed after the CSRC list, if there is one). The extension is mainly intended for application environments in which the header information fields are not sufficient. The header extension includes a 16-bit length field that counts the number of 32-bit words in the extension, and the first 16 bits of the header extension are left open for distinguishing identifiers or parameters. The format of these bits is defined by the specific profile specification; details are given in Section 5.3.1 of RFC 3550 and are not repeated here;

The CC field is the number of contributing sources (CSRC Count), which is 4 bits, indicating the number of CSRC identifiers at the end of the header information. The receiver can determine the length of the CSRC IDs list following the header information according to the CC field.

The M field is the marker bit (Marker) and occupies 1 bit. The interpretation of the marker bit is defined by a specific profile. It allows important events in the packet stream to be marked. A profile may define additional marker bits or specify that there is no marker bit. A profile here refers to the settings of a specific application environment, which are agreed by the communicating parties and are not restricted by the protocol;

The PT (Payload Type) field indicates the payload type and occupies 7 bits. It identifies the format of the RTP payload and determines its interpretation by the application. The marker bit and the payload type together occupy one byte, and this byte may be redefined by a profile to suit different needs. A profile can be defined for a specific application; in effect it is a static mapping (that is, one agreed by the communicating parties) that associates the different PT values with different media formats. Dynamic negotiation can of course also be used to define the relationship between PT values and media formats through signaling outside RTP. Within an RTP session, the RTP source can change the PT.

The next field is the 16-bit sequence number. The sequence number is incremented by one for each RTP data packet sent, so the receiver can use it to detect packet loss and restore the packet order. The initial value of the sequence number in a communication can be chosen at random without affecting communication.

The timestamp occupies 32 bits and reflects the sampling time of the first byte of data in the RTP packet. The sampling time must be derived from a monotonically increasing clock, and the receiver uses it to adjust media playback timing or to synchronize. The synchronization source identifier (SSRC ID) occupies 32 bits. Its value can be chosen at random, but it must be unique within the same RTP session so that it uniquely identifies a media source. If a source changes its source transport address, a new SSRC identifier must be selected.

The contributing source (CSRC) list can contain 0 to 15 items as required, each occupying 32 bits. The length of the list, i.e. the number of CSRC IDs, is given exactly by the 4 bits of the CC field. The CSRC identifier used to identify a media source is in fact identical to the SSRC identifier of the corresponding contributing source; whether the identifier is carried as an SSRC or as a CSRC depends only on the role it plays for a given receiver. In multiparty communication, the CSRC IDs are inserted by the mixer.
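To make the field layout above concrete, the following Python sketch decodes the fixed 12-byte RTP header and the optional CSRC list using the standard `struct` module. It follows the field widths given in this description (V 2 bits, P 1 bit, X 1 bit, CC 4 bits, M 1 bit, PT 7 bits); the names `RtpHeader` and `parse_rtp_header` are hypothetical, and header extensions and padding are deliberately not handled.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_list: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and the CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc  # each CSRC identifier occupies 32 bits
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=(b0 >> 5) & 1,
        extension=(b0 >> 4) & 1,
        csrc_count=cc,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_list=csrcs,
    )


# Byte 0 = 0x80 gives V=2 with P, X and CC all zero; byte 1 = 0x60 gives M=0, PT=96.
example = struct.pack("!BBHII", 0x80, 0x60, 1, 3000, 0x12345678)
print(parse_rtp_header(example))
```

Because M and PT share byte 1, a scheme that redefines these fields only changes how that single byte is interpreted; the remaining ten bytes of the fixed header keep their usual meaning.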

In the present invention, NALU data is encapsulated in a modified RTP (MRTP, Modified RTP) format, which is described below by way of specific embodiments. The description of MRTP covers only its differences from existing RTP. The most basic difference between MRTP and RTP is that, during MRTP encapsulation, the header information of NALUs that share the same header information is integrated into the header information of the MRTP packet.

The NALU header information structure has been described above. The NALU header information includes:

a 1-bit F field indicating whether the NALU is in error;

a 2-bit NRI field indicating the importance of the NALU;

a 5-bit Type field indicating the type of the NALU.

The present invention is described in detail below, taking H.264-based multimedia data transmission as an example.

[First Embodiment of the Invention]

In this embodiment, the execution steps of both the transmitting and receiving parties are as follows:

The sender encapsulates multiple NALUs with the same header information in the same MRTP packet according to the MRTP encapsulation format, and sets an MRTP identifier in the MRTP header information to distinguish the packet from RTP packets. In the technical solution of the present invention, only H.264 NALUs of the same type, that is, NALUs with the same header information, are stored in the same MRTP data packet.

According to practical engineering experience, neighboring NALUs in an H.264 bitstream generally have the same type, so NALUs of the same type can be accumulated until a certain number is reached and then encapsulated into an MRTP packet. If the number of NALUs of the same type does not reach that number, RTP padding may be used. Alternatively, if there are many NALUs of different types, RTP encapsulation may be used instead; the receiver can tell the two kinds of packet apart by the MRTP identifier and process each accordingly.

The receiver determines from the MRTP identifier that the packet is an MRTP packet and processes it according to the MRTP encapsulation format to obtain the carried NALUs. Because the receiver can distinguish MRTP packets from RTP packets by the MRTP identifier, a terminal using the MRTP protocol does not interfere with normal communication over the existing RTP protocol and is backward compatible.

The MRTP encapsulation format mentioned above integrates the header information shared by the carried NALUs into the header information of the MRTP packet, removes the header information from each carried NALU, and fills the remainder into the payload of the MRTP packet. This is the main difference between MRTP and RTP. As mentioned earlier, this facilitates function expansion and saves bandwidth.

[Second Embodiment of the Invention]

The focus here is on integrating the NALU header into the MRTP header and identifying the MRTP packet.

Based on the first embodiment, in the MRTP encapsulation format, the NRI field and the Type field in the NALU header information are filled into the PT field of the MRTP header information. As described above, the PT field occupies the last 7 bits of the second byte of the MRTP header information. The format of this MRTP header is shown in Figure 3, where the differences from RTP are shown in bold; some other parts of the figure are explained later.

In this embodiment, the V field in the MRTP header is used as the MRTP identifier: for an MRTP packet the V field takes the value 3 (binary 11). The V field, that is, the version information field, occupies the first 2 bits of the first byte of the MRTP header information. Moreover, the F field in the NALU header information is filled into the M field of the MRTP header information, which is the first bit of the second byte. The receiver determines from the M field of the MRTP packet whether the NALUs carried by the packet are in error, so the forbidden-bit function of the F bit is preserved.

It can be seen that in the second embodiment of the present invention the version V of MRTP is set to 3, which amounts to a new version of RTP, whereas the current RTP version V has the value 2. The difference in version tells the receiver of the data packet that the protocol in use is MRTP, the improved version of RTP, so that subsequent processing follows the processing flow of the improved protocol. An alternative described later expresses the difference between MRTP and RTP without modifying V.

In this embodiment, the NALU header information byte (8 bits) replaces the 1-bit marker (M) field and the 7-bit PT field of the original RTP header information, 8 bits in total. The specific replacement order can be as follows:

the F bit replaces the M bit;

the 2 NRI bits replace the highest 2 bits of the 7 PT bits;

the 5 Type bits replace the lowest 5 bits of the 7 PT bits.
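The replacement order above can be sketched as follows for the second embodiment, where V = 3 marks the packet as MRTP. The function names and the default values for the P, X and CC fields are assumptions made for illustration. One consequence of this bit ordering is that the second MRTP header byte ends up bit-for-bit equal to the original NALU header byte.

```python
MRTP_VERSION = 3  # second embodiment: V = 3 (binary 11) identifies MRTP

def mrtp_second_byte(nalu_header):
    """Build byte 2 of the MRTP header from a NALU header byte."""
    f = (nalu_header >> 7) & 0x01
    nri = (nalu_header >> 5) & 0x03
    nalu_type = nalu_header & 0x1F
    m = f                           # the F bit replaces the M bit
    pt = (nri << 5) | nalu_type     # NRI -> top 2 PT bits, Type -> low 5 PT bits
    return (m << 7) | pt

def mrtp_first_two_bytes(nalu_header, padding=0, extension=0, cc=0):
    """Return the first two MRTP header bytes (P, X, CC are illustrative inputs)."""
    byte1 = (MRTP_VERSION << 6) | (padding << 5) | (extension << 4) | (cc & 0x0F)
    return bytes([byte1, mrtp_second_byte(nalu_header)])
```

Under these assumptions, `mrtp_first_two_bytes(0x65)` gives the bytes 0xC0 0x65.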

In fact, such a replacement is justified. As mentioned earlier, the 7 PT bits are free to use. The purpose of the M field is specified in RTP (RFC 3550) as follows: a specific profile can specify that the M bit is not used and is instead merged into PT, so that PT can have up to 8 bits and distinguish 256 different types. Therefore, replacing the M bit with the F bit is fully RTP-compliant and does not affect interworking between MRTP and the original RTP.

Clearly, the MRTP encapsulation format of the present invention has three advantages. First, the overhead is small: especially when one packet carries multiple NALUs, the number of transmitted bits is noticeably reduced. Second, the relative importance of the NALUs can be determined without decoding the H.264 NALU data in the packet. Third, without decoding the H.264 NALU data in the packet, it can be determined whether the NALUs it carries can still be decoded correctly or have been damaged by bit loss.

To further explain the technical solution of the second embodiment, the MRTP encapsulation and decapsulation process is described below. After the above processing, the H.264 NALUs in the same MRTP data packet are all of the same type, that is, their header information bytes are identical, so the original header information byte can be stripped from each NALU when it is encapsulated into the MRTP data packet. If there are N NALUs, N bytes are saved. During decapsulation, the N NALUs are extracted from the MRTP packet and restored to their original form: the 7 PT bits in the MRTP header information are copied to the lowest 7 bits of a byte H (8 bits), and the highest bit of H is set to 0 as the F bit. The byte H is then prepended to each extracted NALU, which restores each NALU. Of course, if the F bit carried in the MRTP header is 1, the NALUs in the MRTP packet are in error and can be discarded directly, which saves processing time.
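The stripping and restoring of header bytes described above can be sketched as follows. The text does not specify how the stripped NALU bodies are delimited inside the MRTP payload, so this sketch works on Python lists of NALUs and leaves the on-the-wire framing out. The function names `strip_headers` and `restore_nalus` are hypothetical.

```python
def strip_headers(nalus):
    """Sender side: remove the shared header byte from each NALU.

    Returns (pt, m_bit, bodies): the 7 bits destined for PT, the F bit
    destined for M, and the NALU bodies without their header byte.
    """
    if not nalus:
        raise ValueError("at least one NALU is required")
    headers = {n[0] for n in nalus}
    if len(headers) != 1:
        raise ValueError("all NALUs in one MRTP packet must share one header byte")
    header = headers.pop()
    return header & 0x7F, (header >> 7) & 0x01, [n[1:] for n in nalus]

def restore_nalus(pt, m_bit, bodies):
    """Receiver side: rebuild byte H from PT and prepend it to each body."""
    if m_bit == 1:
        return []                  # F = 1: NALUs are in error, discard directly
    h = pt & 0x7F                  # lowest 7 bits from PT, highest bit (F) set to 0
    return [bytes([h]) + body for body in bodies]
```

With N NALUs in one packet, the bodies returned by `strip_headers` are N bytes shorter in total than the original NALUs, which is the saving stated in the text.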

[Third Embodiment of the Invention]

A second solution is given on the basis of the first embodiment. It is similar to the second embodiment in that the NRI and Type fields in the NALU header are filled into the 7 bits of the PT field of the MRTP header. The difference is that the V field is no longer used to identify MRTP and keeps the value V = 2; the M field is used to identify MRTP instead. This leaves the F field with nowhere to be filled. In this embodiment, NALUs are therefore handled separately according to their F value: an erroneous NALU with F set is transmitted using the original RTP, and a normal NALU is transmitted using MRTP with the F bit ignored. The details are as follows.

In the third embodiment, the M field, which is the first bit of the second byte of the MRTP header information, is set to 1 to identify the MRTP packet. The H.264 protocol specifies that the F bit is 1 if there is a syntax violation or an error. When the network recognizes a bit error in the unit, it can set F to 1 so that the receiver drops the unit. This is mainly used to adapt to different kinds of network environments, such as combined wired and wireless environments. The usage principle is as follows: generally, when the sender and receiver perform H.264 encoding and decoding of the video, the encoder does not perform a "write" operation on the bit, and the decoder performs a "read" operation on it. If F = 1 is found, the receiver discards the NALU during decoding. In current industry practice, the "write" operation on the F bit is mainly performed on the gateway between two different networks, for example when transcoding (MPEG-4 to H.264, H.263 to H.264, etc.).

Therefore, in the third embodiment of the present invention, the F bit is ignored and is not used for the purpose defined in the original H.264 specification. The M field, which would otherwise be filled with the F bit, is thus freed and can be used to identify the MRTP packet. In this way, the version information does not need to be modified, and MRTP keeps the original version value V = 2. This also conserves RTP version number resources.

In practical applications, however, there may be low-probability cases that require the F bit, for example when a NALU has a syntax error. The third embodiment handles this case as follows. In the MRTP encapsulation format, the F field in the NALU header information is ignored. On the sending side, an erroneous NALU whose F field is set is still encapsulated in an RTP packet, and only normal NALUs are encapsulated in MRTP packets. On the receiving side, the receiver determines whether a packet is an MRTP packet or an RTP packet and processes it according to the corresponding encapsulation format. That is, in the special cases where the F bit is used, it is used for the purpose defined in the original H.264 specification, namely to indicate a possible H.264 NALU syntax error: if an intermediate device such as a gateway finds, while encoding video according to the H.264 protocol, that a NALU has a syntax error, that NALU is packaged separately.

The alternating MRTP and RTP processing described above can be summarized as follows:

The sender first determines whether the F field in the header information of each NALU is set, and accordingly divides the NALUs into normal NALUs and erroneous NALUs;

The sender then encapsulates the normal NALUs into MRTP packets according to the MRTP encapsulation format and sets the MRTP identifier, and encapsulates the erroneous NALUs into RTP packets according to the RTP encapsulation format;

The receiver first determines whether the header information of a received packet carries the MRTP identifier, and accordingly classifies the packet as an MRTP packet or an RTP packet;

The receiver then processes MRTP packets according to the MRTP encapsulation format and RTP packets according to the RTP encapsulation format.
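The four steps above can be sketched for the third embodiment as follows. The sketch assumes that M = 1 marks an MRTP packet and that ordinary RTP packets in the same session carry M = 0; the second assumption is not stated explicitly in the text. The function names are hypothetical, and packet construction itself is omitted.

```python
def split_by_f_bit(nalus):
    """Sender side: divide NALUs into (normal, erroneous) by the F bit."""
    normal, erroneous = [], []
    for nalu in nalus:
        if nalu[0] & 0x80:          # F bit set: send with ordinary RTP
            erroneous.append(nalu)
        else:                       # F bit clear: eligible for MRTP
            normal.append(nalu)
    return normal, erroneous

def classify_packet(packet):
    """Receiver side: return "MRTP" if the M bit of byte 2 is 1, else "RTP"."""
    if len(packet) < 2:
        raise ValueError("packet too short to contain the M bit")
    return "MRTP" if packet[1] & 0x80 else "RTP"
```

The receiver would then hand each packet to the decapsulation routine that matches the returned label.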

It can be seen that, in the third embodiment of the present invention, the gateway applies MRTP encapsulation to normal NALUs as described above, grouping H.264 NALUs of the same type according to a rule determined by the specific application (mainly how many NALUs of the same type are encapsulated in each MRTP data packet). Once a NALU is found to have a syntax error, regular RTP encapsulation is used for that NALU. In this case the regular RTP packet may contain only one H.264 NALU.

The premise of the above method is that an isolated NALU with a syntax error appears only occasionally in the continuous H.264 NALU stream. In that case the erroneous NALU is taken out separately and encapsulated in RTP. On the receiving side, if an MRTP packet is received, the H.264 NALUs are decapsulated according to the MRTP rule; if an RTP packet is received, the H.264 NALU is decapsulated according to the RTP rule.

When an H.264 encoding error occurs in the intermediate device and several consecutive H.264 NALUs have syntax errors, for example M consecutive erroneous NALUs, these M NALUs can still be encapsulated with traditional RTP. In addition, even if the erroneous NALUs in the NALU stream are not consecutive, they can be accumulated until they fill one RTP packet and then packed with RTP. This saves bandwidth without affecting the receiver, because the receiver can tell which NALUs are missing from the sequence numbers.

It can be seen that although this scheme uses traditional RTP for some transmissions, it does not diminish the benefits of MRTP. Normal NALUs are encapsulated in MRTP and enjoy its benefits, such as QoS mechanisms that may be applied on the basis of the header information. Erroneous NALUs are generally discarded at the receiver anyway, so it is of little consequence that they do not obtain the benefits of MRTP.

[Fourth Embodiment of the Invention]

From the NALU types and corresponding Type field values given in Table 1 above, it can be seen that fewer than 16 types currently exist, so the 5 Type bits can be reduced to 4 without affecting existing H.264 transmission. Therefore, in the fourth embodiment of the present invention, when there are fewer than 16 NALU types, the MRTP encapsulation format uses only the lower 4 bits of the Type field, and the highest bit of Type serves as an extension reserved bit, called the C field, as shown in Figure 3. The C bit is left for later use in function expansion. With the C bit reserved, the NALU types given in Table 1 are modified accordingly: there are 16 values in total, of which values 0-12 are the same as in Table 1 and values 13-15 are reserved.

Of course, although H.264 currently has only 13 NALU types, H.264 will continue to develop and more NALU types may be defined. If the number of NALU types grows beyond 16 in the future, the lowest 4 bits of the 7 PT bits plus the C bit must again be used together as the type indication.
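The PT layout of the fourth embodiment can be sketched as follows: 2 NRI bits, then the reserved C bit, then a 4-bit type. The function names `make_pt` and `split_pt` are hypothetical, and the meaning of a non-zero C bit is left open, as in the text.

```python
def make_pt(nri, nalu_type, c=0):
    """Pack NRI (2 bits), the reserved C bit and a 4-bit type into 7 PT bits."""
    if not 0 <= nri <= 3:
        raise ValueError("NRI must fit in 2 bits")
    if not 0 <= nalu_type <= 15:
        raise ValueError("type must fit in 4 bits while the C bit is reserved")
    if c not in (0, 1):
        raise ValueError("C is a single bit")
    return (nri << 5) | (c << 4) | nalu_type

def split_pt(pt):
    """Unpack 7 PT bits into (NRI, C, 4-bit type)."""
    return (pt >> 5) & 0x03, (pt >> 4) & 0x01, pt & 0x0F
```

While C stays 0, the resulting PT value is identical to the one produced with a 5-bit Type field, so existing type values 0-12 are unaffected.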

With the MRTP of the present invention, a multimedia transmission device can learn the relevant information of the carried NALUs directly from the MRTP header information and implement a QoS policy for real-time transmission of H.264 multimedia data accordingly. This is not possible with existing RTP: the RTP layer is not concerned with NALU-layer information and cannot know the header information of each NALU in the payload, so such a QoS policy cannot be implemented.
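As one possible illustration of a header-based QoS policy, the sketch below lets an intermediate device keep the most important packets when it can forward only a limited number. The policy itself (keep the highest NRI values, preserve arrival order) is an assumption made for illustration and is not specified in the text; the function names are hypothetical. Only the second header byte is inspected, and the payload is never decoded.

```python
def nri_of(packet):
    """Read NRI from the top 2 bits of PT in byte 2 of the MRTP header."""
    return (packet[1] >> 5) & 0x03

def keep_most_important(packets, budget):
    """Keep at most `budget` packets, preferring higher NRI (hypothetical policy)."""
    # sorted() is stable, so packets with equal NRI stay in arrival order.
    ranked = sorted(range(len(packets)), key=lambda i: -nri_of(packets[i]))
    kept = sorted(ranked[:budget])  # restore arrival order among survivors
    return [packets[i] for i in kept]
```

A real device would combine such a rule with its queueing and scheduling logic; the point here is only that the decision needs nothing beyond the MRTP header.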

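While the invention has been illustrated and described with reference to certain preferred embodiments, those of ordinary skill in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention.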

Claims

An H.264-based multimedia data transmission method, wherein the multimedia data is divided into a stream of network abstraction layer units at the network abstraction layer, each network abstraction layer unit including header information, the method comprising:

The receiving party determines, according to the identifier, whether the packet is an improved real-time transport protocol packet, and if so, processes the packet according to the improved real-time transport protocol encapsulation format, and obtains the carried network abstraction layer unit;

The multimedia data transmission method according to claim 1, wherein the network abstraction layer unit header information comprises:

The multimedia data transmission method according to claim 2, wherein in the improved real-time transport protocol encapsulation format, the network abstraction layer reference identifier field and the type field are filled into the payload type field of the improved real-time transport protocol header information.

The multimedia data transmission method according to claim 3, wherein the improved real-time transmission protocol identifier is a version information field of the improved real-time transmission protocol header information.

The method for transmitting real-time multimedia data according to claim 4, wherein in the improved real-time transport protocol encapsulation format, the forbidden bit field is filled into the marker field of the improved real-time transport protocol header information;

The receiver determines, according to the marker field of the improved real-time transport protocol packet, whether the network abstraction layer unit it carries is in error. The receiving party includes a communication terminal and a network intermediate device.

The multimedia data transmission method according to claim 3, wherein the improved real-time transport protocol identifier is the marker field of the improved real-time transport protocol header information.

The multimedia data transmission method according to claim 6, wherein the sender first determines whether the forbidden bit field in the header information of the at least one network abstraction layer unit is set, and accordingly divides the network abstraction layer units into normal network abstraction layer units and erroneous network abstraction layer units;

The multimedia data transmission method according to claim 6, wherein the receiving party first determines whether the header information of the received packet carries the improved real-time transport protocol identifier, and accordingly divides the received packets into improved real-time transport protocol packets and real-time transport protocol packets; the receiving party processes the improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format, and processes the real-time transport protocol packets according to the real-time transport protocol encapsulation format.

The multimedia data transmission method according to any one of claims 3 to 8, wherein in the improved real-time transport protocol encapsulation format, when there are fewer than 16 types of network abstraction layer unit, only the lower 4 bits of the type field are used to indicate the type, and the highest bit of the type field is used as an extension reserved bit.

The multimedia data transmission method according to any one of claims 3 to 8, wherein a multimedia transmission device learns the relevant information of the carried network abstraction layer units from the improved real-time transport protocol header information, and implements a quality of service policy for real-time delivery of the multimedia data accordingly.