WO2007045140A1 - A real-time method for transporting multimedia data - Google Patents

A real-time method for transporting multimedia data

Publication number
WO2007045140A1
Authority
WO
WIPO (PCT)
Prior art keywords
real
header information
transport protocol
rtp
time transport
Prior art date
Application number
PCT/CN2006/001845
Other languages
French (fr)
Chinese (zh)
Inventor
Bin Song
Zhong Luo
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2007045140A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643 Communication protocols
    • H04N21/6437 Real-time Transport Protocol [RTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8451 Structuring of content, e.g. decomposing content into time segments using Advanced Video Coding [AVC]

Definitions

  • The present invention relates to the field of multimedia communication technologies, and in particular to a method for real-time transmission of multimedia data.
  • Background art
  • Real-time streaming is delivery in real time, which is especially suited to live events. A real-time stream must match the connection bandwidth, so when network speed drops, image quality is degraded to reduce the transmission bandwidth required.
  • The concept of "real time" means that the delivery of data in an application must keep a precise time relationship with the generation of that data.
  • Video communication is gradually becoming one of the main communication services.
  • Two-way or multi-party video communication services, such as video telephony, video conferencing, and mobile terminal multimedia services, impose strict requirements on the transmission of multimedia data streams and on quality of service. Network transmission must offer better real-time performance, and video compression coding must likewise be more efficient.
  • In view of the current demand for media communication, the ITU Telecommunication Standardization Sector (ITU-T) officially released the H.264 standard in 2003, following earlier video compression standards such as H.261, H.263, and H.263+.
  • H.264 is a highly efficient compression coding standard developed jointly by the ITU-T and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) to meet the media delivery and communication requirements of a new phase of networking. It also forms Part 10 of the MPEG-4 standard.
  • The purpose of the H.264 standard is to improve both video coding efficiency and suitability for networks.
  • The H.264 video compression coding standard has gradually become the mainstream standard in multimedia communication.
  • A large number of H.264 real-time multimedia communication products, such as conference television, videophones, and 3G mobile communication terminals, as well as network streaming products, have been introduced.
  • Whether a product supports H.264 has become a key factor in its competitiveness in this market segment. It can be predicted that, with the official promulgation and widespread use of H.264, multimedia communication over IP networks and over 3G and later wireless networks will enter a new stage of rapid development.
  • Multimedia communication requires not only efficient media compression coding but also real-time network transmission.
  • Multimedia streaming is, for the most part, based on the Real-time Transport Protocol (RTP) and its companion Real-time Transport Control Protocol (RTCP).
  • RTP is a transport protocol for multimedia data streams over the Internet, published by the Internet Engineering Task Force (IETF).
  • RTP is defined to work in one-to-one or one-to-many transmission, with the goal of providing timing information and stream synchronization.
  • RTP typically runs over the User Datagram Protocol (UDP), but it can also work over other protocols such as the Transmission Control Protocol (TCP) or Asynchronous Transfer Mode (ATM).
  • RTP itself only provides for the transmission of real-time data. It does not provide a reliable mechanism for delivering packets in sequence, nor does it provide flow control or congestion control; it relies on RTCP for these services.
  • RTCP is responsible for managing transmission quality and for exchanging control information between the current application processes.
  • During an RTP session, each participant periodically transmits RTCP packets containing statistics such as the number of packets sent and the number of packets lost. The server can use this information to change the transmission rate dynamically, or even to change the payload type. Used together, RTP and RTCP optimize transmission efficiency through effective feedback and minimal overhead, which makes them well suited to delivering real-time data over a network.
  • H.264 multimedia data is likewise transmitted over IP networks on top of UDP and the RTP layer above it.
  • The RTP protocol itself is structurally applicable to different media data types. For each higher-layer protocol or media compression coding standard used in multimedia communication (e.g. H.261, H.263, MPEG-1/-2/-4, MP3), the IETF formulates a specification of the RTP payload format for that protocol, describing in detail an RTP encapsulation and packetization method optimized for it.
  • The corresponding IETF standard for H.264 is RFC 3984, "RTP Payload Format for H.264 Video". It is currently the main and widely used standard for transmitting H.264 video streams over IP networks. In the field of video communication, the products of the major manufacturers are based on RFC 3984, and it is currently the only H.264/RTP transmission method.
  • H.264 defines a new layer called the Network Abstraction Layer (NAL), which presents a standard interface to the layers above it. The interface exposes the service capabilities of the underlying network while hiding the differences between networks, abstracting them into a service capability layer.
  • VCL: Video Coding Layer
  • By defining the NAL as a new layer, H.264 gains greater application flexibility; the earlier ITU-T video compression coding protocols, such as H.261 and H.263/H.263+/H.263++, had no such layer.
  • How to carry H.264 better over RTP is therefore a practical question worthy of study.
  • The existing scheme carries NAL-layer data in the RTP payload, based on the RTP protocol (RFC 3550).
  • The NAL layer is located between the VCL and RTP, and divides the video stream into a series of network abstraction layer units (NALUs, NAL Units) according to defined rules and structures.
  • The encapsulation format of the RTP payload for NALUs is defined in RFC 3984. The RTP frame format and the existing NALU encapsulation method are briefly introduced below.
  • RTP is intended for applications such as real-time multimedia conferencing, continuous data storage, interactive distributed simulation, and control and measurement.
  • RTP is typically carried over UDP to take advantage of its multiplexing and checksum functions. If the underlying network provides multicast distribution, RTP supports delivery to multiple destinations.
  • The functions provided by RTP include payload type identification, sequence numbering, timestamping, and transmission monitoring.
  • RTP packs the NALUs of H.264 into an RTP packet stream.
  • The encapsulation of NALUs is defined mainly in RFC 3984, and the description of H.264 over RTP given here is based on it.
  • FIG. 1 shows the encapsulation structure of NALUs in the RTP payload.
  • Each NALU begins with one byte of NALU header information, followed by the data content of the NALU; multiple NALUs are placed end to end to fill the payload of the RTP packet.
  • At the end there is optional RTP padding, as specified in the RTP packet format, which brings the length of the RTP packet to a required value (such as a fixed length). The optional RTP padding data is generally zero-filled.
  • The NALU header information is the first byte (octet) of the NALU and has three fields, whose names and meanings are as follows:
  • The F field is the forbidden bit (forbidden_zero_bit). It is 1 bit, is used to flag syntax errors and the like, and is set to 1 if there is a syntax violation.
  • The NRI field is the NAL reference identifier (nal_ref_idc). It is 2 bits and indicates the importance of the NALU data. A value of 00 indicates that the content of the NALU is not used to reconstruct reference pictures for inter prediction. A value other than 00 indicates that the current NALU is important data, such as a slice belonging to a reference frame, a sequence parameter set (SPS), or a picture parameter set (PPS). The larger the value, the more important the current NALU.
  • The Type field is the NALU type (nal_unit_type). It is 5 bits, allowing 32 NALU types. The correspondence between values and specific types is given in Table 1.
  • The one byte of NALU header information thus mainly conveys the validity and importance level of the NALU, from which the importance of the data carried by the RTP packet can be determined.
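  • The three fields described above can be read out of the header byte with simple shifts and masks. The following Python sketch assumes the usual bit order (F in the highest bit, then the 2-bit reference identifier, then the 5-bit type); the class and function names are illustrative and do not come from the source.

```python
from typing import NamedTuple


class NaluHeader(NamedTuple):
    forbidden_zero_bit: int  # F field, 1 bit
    nal_ref_idc: int         # NAL reference identifier, 2 bits
    nal_unit_type: int       # Type field, 5 bits


def parse_nalu_header(octet: int) -> NaluHeader:
    """Split a one-byte NALU header into its three fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NALU header must be a single byte")
    return NaluHeader(
        forbidden_zero_bit=(octet >> 7) & 0x1,
        nal_ref_idc=(octet >> 5) & 0x3,
        nal_unit_type=octet & 0x1F,
    )
```

  • For example, `parse_nalu_header(0x67)` yields F = 0, a reference identifier of 3, and a type of 7.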
  • The prior art scheme can be summarized as follows. First, the H.264 video bit stream is segmented according to certain rules to form a NALU stream in the NAL layer. This step belongs to the H.264 implementation; for example, a frame of image can be taken as a NALU, or a slice can be taken as a NALU. The NALU stream is then packed into an RTP packet stream according to the encapsulation strategy of the application.
  • In an RTP data packet, the header information is followed by the NALU data area. If an RTP data packet encapsulates multiple H.264 NALUs, these NALUs are arranged end to end, each NALU occupies a contiguous run of bits, and the first byte of each NALU is its header information byte. At the end of the RTP packet there may be optional padding bits as needed.
  • The underlying RTP protocol does not process the specific information of the NALU.
  • The NALU header information of H.264 is not reflected in the header information of the RTP packet.
  • The NALU header information byte nevertheless contains important information.
  • For example, NRI indicates whether the corresponding NALU contains an H.264 non-reference frame, a reference frame, or other important image data such as a parameter set.
  • The RTP protocol itself does not provide any quality of service (QoS) mechanism and carries no information about the priority of the data it bears. RTP itself also has no error resilience against faults such as packet loss on IP and wireless networks.
  • If information about the attributes of the H.264 NALUs could be obtained by directly scanning the RTP header information, network devices could handle RTP packets differently on that basis, giving priority to important data during transmission.
  • If an RTP packet encapsulates multiple H.264 NALUs of the same type, the header bytes of these NALUs are exactly the same. If there are N such NALUs in an RTP packet, the N NALU header information bytes can be replaced by one byte without any loss of information, and efficiency improves because N-1 redundant bytes are eliminated.
  • In the prior art, the header information of the NALU is encapsulated entirely in the payload, so the RTP protocol cannot directly learn the attributes, level, importance, and so on of the payload, and no QoS mechanism based on them can be implemented.
  • Such an encapsulation format also causes NALU header information to consume payload resources. Every NALU has header information, and in many cases the header information of multiple NALUs of the same type in one RTP packet is identical, which wastes RTP transmission bandwidth.
  • The main purpose of the present invention is to provide a real-time transmission method for H.264 multimedia data, so that the RTP protocol can carry multimedia video data efficiently, and so that a quality-of-service assurance mechanism is added while remaining compatible with existing RTP protocol devices and transmission methods.
  • In the method, the multimedia data is divided in a network abstraction layer into a stream of network abstraction layer units, each of which includes header information, and the method includes:
  • The sender encapsulates at least one network abstraction layer unit with the same header information in the same improved real-time transport protocol packet according to the improved real-time transport protocol encapsulation format, and sets an identifier in the improved real-time transport protocol header information to distinguish the packet from a real-time transport protocol packet;
  • The receiver determines from the identifier whether a packet is an improved real-time transport protocol packet and, if so, processes the packet according to the improved real-time transport protocol encapsulation format and obtains the network abstraction layer units it carries;
  • In the improved real-time transport protocol encapsulation format, the header information shared by the network abstraction layer units carried is included in the header information of the improved real-time transport protocol packet, and the network abstraction layer units are placed in the payload of the improved real-time transport protocol packet after their header information has been removed.
  • The network abstraction layer unit header information includes:
  • a forbidden bit field, indicating whether the network abstraction layer unit is in error;
  • a network abstraction layer reference identifier field, indicating the importance of the network abstraction layer unit;
  • a type field, indicating the type of the network abstraction layer unit.
  • The network abstraction layer reference identifier field and the type field are placed in the payload type field of the improved real-time transport protocol header information.
  • The improved real-time transport protocol identifier is the version information field of the improved real-time transport protocol header information; the version information field is set in the improved real-time transport protocol header information.
  • The forbidden bit field is placed in the marker field of the improved real-time transport protocol header information;
  • The receiver judges from the marker field of the improved real-time transport protocol packet whether the network abstraction layer units carried by the packet are in error.
  • The receiver includes a communication terminal and a network intermediate device.
  • Alternatively, the improved real-time transport protocol identifier is in the marker field of the improved real-time transport protocol header information.
  • The sender first determines whether the forbidden bit field in the header information of each network abstraction layer unit is set, and accordingly divides the network abstraction layer units into normal network abstraction layer units and erroneous network abstraction layer units;
  • The receiver first separates the received packets into improved real-time transport protocol packets and real-time transport protocol packets according to the improved real-time transport protocol identifier in the header information of each packet; it then processes the improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format, and the real-time transport protocol packets according to the real-time transport protocol encapsulation format.
  • If there are fewer than 16 types of network abstraction layer unit, only the lower 4 bits of the type field are used, and the highest bit of the type field is reserved as an extension bit.
  • The multimedia transmission device learns the relevant information about the network abstraction layer units carried from the improved real-time transport protocol header information, and implements a quality-of-service policy for real-time transmission of the multimedia data accordingly.
  • The technical solution of the present invention provides an improved RTP protocol (MRTP, Modified RTP) for carrying NALU data, in which the header information bytes of all NALUs in the same RTP packet are combined into the header information of that packet.
  • The combination method does not disturb the operation of existing RTP protocols and devices, and the attributes of the NALU payload are reflected directly in the MRTP header information, so the efficiency with which MRTP encapsulates NALUs is greatly improved.
  • Because MRTP reflects the attributes of the payload NALUs in its header information, it provides a basis for implementing QoS mechanisms;
  • A method of distinguishing existing RTP from MRTP is given, which makes it possible for existing network media equipment to support both RTP and MRTP and improves the compatibility of MRTP. For NALUs with syntax errors or other errors, a scheme of transmitting each such NALU individually over conventional RTP is given, together with a scheme for processing MRTP and RTP data packets alternately.
  • The header information of the MRTP protocol of the present invention carries the NALU header information of H.264, so the importance of the NALUs carried by an MRTP data packet can be determined by a fast scan of the MRTP header information, without decoding the NALUs. Corresponding measures, such as QoS policies, can then be taken to further improve service quality;
  • Redundancy is reduced and transmission efficiency improved, which improves the video transmission quality of multimedia video communication over IP networks and better meets the requirements of users.
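  • As a rough illustration of how an intermediate device might act on such a header scan, the following Python sketch reads the importance bits from the second header byte and forwards only sufficiently important packets. It assumes the embodiment's layout, in which the 2-bit reference identifier occupies the highest 2 bits of the 7-bit PT field; the function names and the threshold policy are hypothetical and are not specified by the source.

```python
from typing import List


def mrtp_importance(packet: bytes) -> int:
    """Return the 2-bit importance value from the second header byte.

    Assumed layout of byte 1: F/M (1 bit), importance (2 bits), type (5 bits).
    """
    if len(packet) < 2:
        raise ValueError("packet too short to contain the first two header bytes")
    return (packet[1] >> 5) & 0x3


def select_for_forwarding(packets: List[bytes], min_importance: int) -> List[bytes]:
    """Hypothetical congestion policy: keep packets at or above a threshold."""
    return [p for p in packets if mrtp_importance(p) >= min_importance]
```

  • The payload is never inspected; only the second byte of each packet is read, which is the point of carrying the NALU attributes in the header.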
  • FIG. 1 is a schematic diagram of the format in which NALU data is encapsulated in an RTP packet payload in the prior art;
  • FIG. 2 is a schematic diagram of the header information structure of an RTP packet;
  • FIG. 3 is a schematic diagram of the header information structure of an MRTP data packet according to an embodiment of the present invention.
  • Detailed description
  • The present invention aims to provide a multimedia data transmission scheme that reflects the H.264 NALU header information in the RTP header information.
  • The basic principle is to use certain bytes or bits of the current RTP header information to represent the NALU header information. In RTP, these bytes or bits are intended precisely as extension space to be defined in combination with the specific media protocol carried.
  • The improved RTP protocol does not affect interoperability with devices that support the original RTP protocol. That is, if some terminals in a communication adopt the improved RTP protocol of the present invention and the others adopt the unmodified RTP protocol, the terminals can still communicate fully.
  • A terminal can also take different processing measures in different situations, implementing a scheme in which MRTP and RTP transmission alternate. This includes the handling of NALUs with syntax errors, which are transmitted using traditional RTP.
  • The changes MRTP makes to existing RTP mainly involve redefining the payload type (PT) field and the version (V) field in the RTP packet header information.
  • The scheme has two potential benefits: it provides a QoS mechanism for H.264 data transmission, and it improves the efficiency with which RTP encapsulates H.264 NALUs.
  • First, the format of the RTP packet is briefly introduced.
  • The basic RTP header information occupies 12 bytes (the minimum case), and the header information of the IP protocol and the UDP protocol occupies 20 bytes and 8 bytes respectively. The RTP packet is encapsulated in a UDP packet, which is in turn encapsulated in an IP packet.
  • The detailed structure of the header information of the RTP packet is shown in Figure 2.
  • From front to back, the RTP header information shown in Figure 2 is:
  • the first byte (byte 0) holds fields describing the header information structure itself;
  • the second byte (byte 1) holds the payload type definition, together with the marker bit;
  • the third and fourth bytes (bytes 2 and 3) are the sequence number (Sequence Number);
  • the 5th to 8th bytes are the timestamp (Timestamp);
  • the 9th to 12th bytes are the synchronization source identifier (SSRC ID, Synchronization Source Identifier);
  • these may be followed by a list of contributing source identifiers (CSRC IDs, Contributing Source Identifiers).
  • The first 12 bytes appear in every type of RTP packet, while the other data in the header information, such as the contributing source identifiers, is present only when inserted by a mixer. CSRC is therefore generally used when media is mixed; for example, in a multi-party conference the audio needs to be mixed, and video can provide multi-picture display in the same way.
  • The synchronization source identifier SSRC is in effect the identifier of the media stream carried.
  • The P field is the padding flag (Padding), which is 1 bit. If P is set, the packet contains one or more padding bytes at the end, and the padding is not part of the payload;
  • The X field is the extension flag (Extension), which is 1 bit. If X is set, the fixed RTP header must be followed by a variable-length header extension (if there is a CSRC list, the header extension follows it). It is mainly used to extend the header information in certain application environments.
  • The header extension includes a 16-bit length field that counts the number of 32-bit words in the extension. The first 16 bits of the header extension are left open for distinguishing identifiers or parameters; their format is defined by the specific profile specification. The extension is described in detail in Section 5.3.1 of RFC 3550 and is not reproduced here;
  • The CC field is the CSRC count (CSRC Count), which is 4 bits and gives the number of CSRC identifiers that follow the fixed header information. The receiver can determine the length of the CSRC ID list from the CC field.
  • The M field is the marker bit (Marker), which is 1 bit. The interpretation of the marker bit is defined by a specific profile; it is intended to allow important events in the packet stream to be marked. A profile can define additional marker bits or specify that there is no marker bit. A profile here refers to the settings of a specific application environment, which are agreed between the communicating parties and are not fixed by the protocol;
  • The PT field is the payload type (Payload Type), 7 bits in total. It identifies the format of the RTP payload and determines its interpretation by the application. The marker bit and the payload type together occupy one byte, and this byte may be redefined by a specific profile to suit different needs.
  • A profile can be defined for a specific application. In effect, it is a static set (that is, agreed in advance by the communicating parties) of associations between PT values and media formats.
  • Dynamic negotiation can also be used, defining the relationship between PT values and media formats through signaling outside RTP.
  • The RTP source can change the PT.
  • The sequence number is 16 bits. It is incremented by one for each RTP data packet sent, so the receiver can use it to detect packet loss and to restore the packet order.
  • The initial value of the sequence number in a communication can be chosen at random without affecting communication.
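  • Because the sequence number is only 16 bits and starts at a random value, a loss check has to allow for wraparound from 65535 to 0. The following Python sketch shows one way to count the packets missing between two consecutively received packets; it assumes in-order arrival (reordering and duplicates are not handled), and the function name is illustrative.

```python
SEQ_MODULUS = 1 << 16  # the sequence number field is 16 bits wide


def packets_lost(prev_seq: int, curr_seq: int) -> int:
    """Count packets missing between two packets received one after the other.

    Assumes curr_seq was sent after prev_seq; the modulo handles wraparound.
    """
    return ((curr_seq - prev_seq) % SEQ_MODULUS) - 1
```

  • A result of 0 means the two packets were adjacent in the sender's stream, including the case where the counter wrapped between them.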
  • The timestamp occupies 32 bits and reflects the sampling time of the first byte in the RTP packet.
  • The sampling time must be derived from a monotonically increasing clock; the receiver uses it to adjust media playback timing or to synchronize.
  • The synchronization source identifier (SSRC ID) occupies 32 bits. Its value can be chosen at random, but it must be unique within an RTP session so that it uniquely identifies a media source. If a source changes its source transport address, it must select a new SSRC identifier.
  • The contributing source (CSRC) list can contain 0 to 15 items as required, each occupying 32 bits.
  • The length of the list, that is, the number of CSRC IDs, is given exactly by the 4 bits of the CC field.
  • The CSRC identifier that identifies a media source is identical to the SSRC identifier of that contributing source; whether the identifier appears as SSRC or as CSRC depends only on the role the source plays for a given receiver.
  • The CSRC IDs are inserted by the mixer.
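  • The header fields described above can be read from a raw packet as in the following Python sketch, which follows the RFC 3550 bit layout (a 2-bit version field precedes P, X, and CC in the first byte). The class and function names are illustrative, and header extensions and padding are not parsed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc_list: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header plus the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc  # each CSRC identifier occupies 32 bits
    if len(packet) < end:
        raise ValueError("packet too short for the announced CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=(b0 >> 5) & 0x1,
        extension=(b0 >> 4) & 0x1,
        csrc_count=cc,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc_list=csrcs,
    )
```

  • The header length in bytes is 12 plus 4 times the CC value, so a packet with no contributing sources has exactly the 12-byte minimum header.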
  • In the present invention, NALU data is encapsulated in a modified RTP (MRTP, Modified RTP) format, described below through specific embodiments. The description of MRTP covers only its differences from existing RTP. The most basic difference is that, during MRTP encapsulation, the header information shared by NALUs with the same header information is integrated into the MRTP header information.
  • The NALU header information structure was described above. The NALU header information includes: a 1-bit F field indicating whether the NALU is in error;
  • a 2-bit NRI field indicating the importance of the NALU;
  • a 5-bit Type field indicating the type of the NALU.
  • The sender encapsulates multiple NALUs with the same header information in the same MRTP packet according to the MRTP encapsulation format, and sets an MRTP identifier in the MRTP header information to distinguish the packet from RTP packets.
  • Only H.264 NALUs of the same type, that is, NALUs with the same header information, are stored in the same MRTP data packet.
  • NALUs of the same type can be accumulated until a certain number is reached and then encapsulated into an MRTP packet.
  • Where a packet needs to reach a given length, RTP padding may be used.
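  • One way the sender-side grouping could look is sketched below in Python. It collects consecutive NALUs that share the same header byte into one group and strips that byte from each member; grouping only consecutive runs (to preserve stream order) is an assumption of this sketch, as is the function name, and no limit on group size is applied.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def group_by_header(nalus: Iterable[bytes]) -> List[Tuple[int, List[bytes]]]:
    """Group consecutive NALUs with identical header bytes.

    Returns (header_byte, [NALU bodies with the header byte removed]) pairs,
    one pair per candidate packet, in the original stream order.
    """
    items = list(nalus)
    if any(len(n) == 0 for n in items):
        raise ValueError("every NALU must contain at least its header byte")
    groups = []
    for header, run in groupby(items, key=lambda n: n[0]):
        groups.append((header, [n[1:] for n in run]))
    return groups
```

  • Each resulting group carries its shared header byte once, alongside the bodies that would go into the payload.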
  • The receiver determines from the MRTP identifier whether a packet is an MRTP packet and, if so, processes it according to the MRTP encapsulation format to obtain the NALUs carried.
  • Because the receiver can use the MRTP identifier to tell MRTP packets apart from RTP packets, a terminal using the MRTP protocol does not disturb normal communication over the existing RTP protocol and remains backward compatible.
  • In the MRTP encapsulation format described above, the header information shared by the NALUs carried is integrated into the header information of the MRTP packet, and the NALUs are placed in the payload of the MRTP packet with their header information removed. This is the main difference between MRTP and RTP. As mentioned earlier, it facilitates functional extension and saves bandwidth.
  • The focus here is on how the NALU header is integrated into the MRTP header and how the MRTP packet is identified.
  • The NRI field and the Type field of the NALU header information are placed in the PT field of the MRTP header information.
  • The PT field occupies the last 7 bits of the second byte of the MRTP header information.
  • The format of this MRTP header is shown in Figure 3, where the differences from RTP are shown in bold; some parts of the figure are explained later.
  • The V field in the MRTP header serves as the MRTP identifier. For an MRTP packet the V field is set to 3 (binary 11); the V field occupies the first 2 bits of the first byte of the MRTP header information.
  • The receiver judges from the M field of the MRTP packet whether the NALUs carried by the MRTP packet are in error, so the function of the F forbidden bit is preserved as well.
  • The current version V of MRTP is set to 3, which amounts to a new version of RTP; the current RTP version V has the value 2.
  • When the receiver reads V = 3, it knows that the packet uses the improved RTP protocol, MRTP, and carries out subsequent processing according to the MRTP processing flow.
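  • The version check can be sketched in a few lines of Python. The values 2 and 3 come from the text; the function name and the string labels it returns are illustrative.

```python
RTP_VERSION = 2   # version value of existing RTP
MRTP_VERSION = 3  # version value the text assigns to MRTP packets


def classify_packet(packet: bytes) -> str:
    """Classify a packet as 'rtp', 'mrtp', or 'unknown' from its V field."""
    if not packet:
        raise ValueError("empty packet")
    version = packet[0] >> 6  # V is the top 2 bits of the first byte
    if version == MRTP_VERSION:
        return "mrtp"
    if version == RTP_VERSION:
        return "rtp"
    return "unknown"
```

  • A receiver that supports both formats would dispatch on this result before interpreting the second header byte, whose meaning differs between the two.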
  • An alternative described later distinguishes MRTP from RTP without modifying V.
  • The NALU header information byte (8 bits) replaces the marker (M) field (1 bit) and the PT field (7 bits) of the original RTP header information, 8 bits in total.
  • The replacement can be ordered as follows:
  • the F bit replaces the M bit; the 2 NRI bits replace the highest 2 of the 7 PT bits;
  • the 5 Type bits replace the lowest 5 of the 7 PT bits.
  • The 7 PT bits are free to be used in this way.
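  • The replacement order above can be made concrete with a short Python sketch that builds the first two bytes of an MRTP header from a NALU header byte. The version value 3 is taken from the text; the function name and its parameters are illustrative, and the remaining header bytes (sequence number, timestamp, SSRC) are left out.

```python
def fold_nalu_header(nalu_header: int, padding: int = 0,
                     extension: int = 0, csrc_count: int = 0) -> bytes:
    """Build bytes 0 and 1 of an MRTP header from a NALU header byte."""
    if not 0 <= nalu_header <= 0xFF:
        raise ValueError("NALU header must be a single byte")
    if padding not in (0, 1) or extension not in (0, 1):
        raise ValueError("padding and extension are single-bit flags")
    if not 0 <= csrc_count <= 15:
        raise ValueError("CSRC count is a 4-bit field")
    f = (nalu_header >> 7) & 0x1      # forbidden bit, goes into the M position
    nri = (nalu_header >> 5) & 0x3    # goes into the highest 2 bits of PT
    typ = nalu_header & 0x1F          # goes into the lowest 5 bits of PT
    byte0 = (3 << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (f << 7) | (nri << 5) | typ
    return bytes([byte0, byte1])
```

  • With this ordering the second header byte ends up bit-for-bit identical to the NALU header byte, for example `fold_nalu_header(0x67)` gives the bytes `C0 67`.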
  • RTP (RFC 3550) specifies the use of the M field as follows:
  • A specific profile can specify that the M bit is not used and is instead merged into PT, so that PT can have up to 8 bits and distinguish 256 different types. Replacing the M bit with the F bit is therefore fully RTP-compliant and does not affect interworking between MRTP and the original RTP.
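The bit mapping described above can be sketched in a few lines. The following Python is an illustrative sketch only, not the patent's implementation: the function names `pack_first_two_bytes` and `unpack_first_two_bytes` are hypothetical, and the layout of the first byte (V, P, X, CC) is taken from RFC 3550 rather than from this text. It assumes the first embodiment, in which V is 3 and the F bit occupies the M position.

```python
MRTP_VERSION = 3  # value of V that the text assigns to MRTP (standard RTP uses 2)


def pack_first_two_bytes(nalu_header: int, padding: int = 0,
                         extension: int = 0, csrc_count: int = 0) -> bytes:
    """Build the first two header bytes of an MRTP packet from a NALU header byte."""
    f = (nalu_header >> 7) & 0x01     # forbidden bit, goes into the M position
    nri = (nalu_header >> 5) & 0x03   # NRI, goes into the top 2 bits of PT
    nal_type = nalu_header & 0x1F     # Type, goes into the low 5 bits of PT
    # First byte: V (2 bits), P (1 bit), X (1 bit), CC (4 bits), as in RFC 3550.
    byte0 = ((MRTP_VERSION << 6) | ((padding & 1) << 5)
             | ((extension & 1) << 4) | (csrc_count & 0x0F))
    # Second byte: M position (1 bit) followed by PT (7 bits).
    byte1 = (f << 7) | (nri << 5) | nal_type
    return bytes([byte0, byte1])


def unpack_first_two_bytes(header: bytes) -> dict:
    """Read V, F, NRI and Type back from the first two header bytes."""
    byte0, byte1 = header[0], header[1]
    return {
        "version": byte0 >> 6,
        "f": byte1 >> 7,
        "nri": (byte1 >> 5) & 0x03,
        "type": byte1 & 0x1F,
    }
```

Because F, NRI, and Type keep their relative order, the second header byte ends up bit-for-bit equal to the original NALU header byte.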
  • The MRTP encapsulation format of the present invention has three clear advantages. First, the overhead is small; especially when one RTP packet carries multiple NALUs, the number of transmitted bits is clearly reduced. Second, the relative importance of the NALUs can be determined without decoding the H.264 NALU data in the RTP packet. Third, without decoding the H.264 NALU data in the RTP packet, it is possible to identify whether the packet can be decoded correctly or has been damaged, for example by bit loss.
  • the following describes the process of MRTP encapsulation and decapsulation.
  • The H.264 NALUs in one MRTP packet all have the same type, that is, their header bytes are identical, so when they are encapsulated into the MRTP packet the original header byte can be stripped from each of them. If there are N NALUs, N bytes are saved.
  • When decapsulating, the NALUs are extracted from the MRTP packet and restored to their original form: the N NALUs are extracted from the MRTP packet they are in, the 7 bits of PT in the MRTP header are copied to the lowest 7 bits of a byte H (8 bits), and the highest bit of H is set to 0 as the F bit. The resulting byte H is then prepended to each extracted NALU, which restores each NALU.
  • If the F field in the MRTP header is 1, the NALUs in the MRTP packet are in error and the packet can be discarded directly, which saves processing time.
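The encapsulation and decapsulation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say how NALU boundaries are marked inside the MRTP payload, so the sketch assumes a 2-byte big-endian length prefix before each NALU body. The function names `mrtp_encapsulate` and `mrtp_decapsulate` are hypothetical.

```python
import struct


def mrtp_encapsulate(nalus: list) -> tuple:
    """Strip the shared header byte from each NALU and build the payload.

    Returns (shared_header_byte, payload). The 2-byte length prefix per
    NALU is an assumption of this sketch, not part of the described format.
    """
    if not nalus:
        raise ValueError("need at least one NALU")
    shared = nalus[0][0]
    payload = bytearray()
    for nalu in nalus:
        if nalu[0] != shared:
            raise ValueError("all NALUs in one MRTP packet must share a header byte")
        body = nalu[1:]  # NALU without its header byte
        payload += struct.pack("!H", len(body)) + body
    return shared, bytes(payload)


def mrtp_decapsulate(pt: int, payload: bytes) -> list:
    """Rebuild each NALU by prepending the header byte H derived from PT."""
    h = pt & 0x7F  # low 7 bits come from PT; the top bit (F) is set to 0
    nalus = []
    offset = 0
    while offset < len(payload):
        (length,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        nalus.append(bytes([h]) + payload[offset:offset + length])
        offset += length
    return nalus
```

In this sketch the saving of N header bytes is offset by the assumed length prefixes; a real format would need some boundary mechanism, and its cost depends on that choice.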
  • A second solution is given on the basis of the first embodiment. As in the second embodiment, the NRI and Type fields of the NALU header are filled into the 7 bits of the PT field of the MRTP header.
  • The two kinds of NALU, distinguished by whether F is set, are treated separately. An erroneous NALU with F set is transmitted with the original RTP; a normal NALU is transmitted with MRTP, and the F bit is ignored. The specific details are as follows.
  • The M field, set to 1, identifies the MRTP packet; it is the first bit of the 2nd byte of the MRTP header.
  • For the F bit, the H.264 protocol specifies that it is set to 1 if there is a syntax violation or an error.
  • When the network recognizes a bit error in the unit, it can set the bit to 1 so that the receiver drops the unit. This is mainly used to adapt to different kinds of network environments, such as combined wired and wireless environments.
  • The usage principle is as follows: in general, when the sender and receiver encode and decode video with H.264, the encoder does not perform a "write" operation on the bit, and the decoder performs a "read" operation on it.
  • The receiver will discard the NALU during the decoding process.
  • The "write" operation on the F bit is mainly performed at a gateway between two different networks, for example in the case of transcoding (MPEG-4 to H.264, H.263 to H.264, and so on).
  • The F bit is ignored and is not used for the purpose of the original H.264 definition.
  • The third embodiment of the present invention handles this case as follows: in the MRTP encapsulation format, the F field of the NALU header is ignored; on the sender side, however, an erroneous NALU whose F field is set is still encapsulated in an RTP packet, and only normal NALUs are encapsulated with MRTP; on the receiving side, the receiver determines whether a packet is an MRTP packet or an RTP packet and processes it according to the corresponding encapsulation format.
  • When the F bit is used in some special cases for the purpose of the original H.264 definition, that is, to indicate a possible H.264 NALU syntax error, an intermediate device such as a gateway that finds a syntax error in a NALU while encoding video according to the H.264 protocol packages that NALU separately.
  • the sender first determines whether the F field in the header information of at least one NALU is valid, and accordingly divides it into a normal NALU and an error NALU;
  • the receiver first determines whether the header information of the received packet is an MRTP identifier, and divides it into an MRTP packet and an RTP packet;
  • the MRTP packet is then processed according to the MRTP encapsulation format, and the RTP packet is processed according to the RTP packet encapsulation format.
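The sender-side split and receiver-side dispatch of this embodiment can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, and the sketch takes the text at its word that the M bit (first bit of the second header byte) is reserved as the MRTP identifier, so it does not model profiles that use M for other purposes.

```python
def split_by_f_bit(nalus: list) -> tuple:
    """Sender side: separate normal NALUs from erroneous ones (F bit set)."""
    normal, erroneous = [], []
    for nalu in nalus:
        if nalu[0] & 0x80:      # F is the highest bit of the NALU header byte
            erroneous.append(nalu)
        else:
            normal.append(nalu)
    return normal, erroneous


def is_mrtp(packet: bytes) -> bool:
    """Receiver side: M = 1 in the second header byte marks an MRTP packet."""
    return bool(packet[1] & 0x80)


def dispatch(packet: bytes) -> str:
    """Choose the decapsulation rule for a received packet."""
    return "mrtp" if is_mrtp(packet) else "rtp"
```

Normal NALUs would then go through MRTP encapsulation with M set to 1, and erroneous ones through regular RTP encapsulation.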
  • For normal NALUs, the gateway follows the foregoing method and performs MRTP encapsulation on H.264 NALUs of the same type according to a rule determined by the specific application, which mainly specifies how many NALUs of the same type are encapsulated in each MRTP packet.
  • a regular RTP encapsulation is required for the NALU.
  • the regular RTP packet may contain only one H.264 NALU.
  • The above method assumes that isolated NALUs with syntax errors appear only occasionally in a continuous H.264 NALU stream. Each erroneous NALU is then taken out separately and encapsulated in RTP.
  • If an MRTP packet is received, the H.264 NALUs are decapsulated according to the MRTP rule; if an RTP packet is received, the H.264 NALU is decapsulated according to the RTP rule.
  • the M NALUs can still be encapsulated by the traditional RTP.
  • The erroneous NALUs can be accumulated until they fill one RTP packet and then packed with RTP, which saves bandwidth without affecting the receiver, because the receiver can tell from the sequence numbers which NALUs are missing.
  • In the fourth embodiment of the present invention, in the MRTP encapsulation format, when there are fewer than 16 NALU types, only the lower 4 bits of the Type field are used, and the highest bit of Type serves as a reserved extension bit called the C field, as shown in Figure 3. The C bit is left for later use in function expansion. With bit C reserved, the NALU types given in Table 1 are modified accordingly: there are 16 values in total, of which 0-12 are the same as in Table 1 and 13-15 are reserved.
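The PT layout of this embodiment (2 NRI bits, 1 reserved C bit, 4 Type bits) can be sketched as follows. The function names `pack_pt_with_c` and `unpack_pt_with_c` are hypothetical, and treating C as 0 by default is an assumption, since the text only says the bit is reserved.

```python
def pack_pt_with_c(nri: int, nal_type: int, c: int = 0) -> int:
    """Pack NRI (2 bits), C (1 bit) and a 4-bit Type into the 7-bit PT field."""
    if not 0 <= nri <= 3:
        raise ValueError("NRI must fit in 2 bits")
    if not 0 <= nal_type <= 15:
        raise ValueError("Type must fit in 4 bits in this embodiment")
    return (nri << 5) | ((c & 1) << 4) | nal_type


def unpack_pt_with_c(pt: int) -> tuple:
    """Return (nri, c, nal_type) from a 7-bit PT value."""
    return (pt >> 5) & 0x03, (pt >> 4) & 0x01, pt & 0x0F
```

With C left at 0, the PT value is identical to the 7 low bits of the original NALU header for types 0-15, so a receiver that ignores C sees the same type numbering.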
  • A multimedia transmission device can learn the relevant information about the carried NALUs directly from the MRTP header and apply a QoS policy for real-time transmission of H.264 multimedia data accordingly.
  • The NALU-layer information is not examined, and the header information of each NALU in the payload cannot be known, so a QoS policy cannot be implemented.
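As an illustration of how a network device might act on the MRTP header alone, the sketch below makes a forwarding decision from the second header byte. The policy itself (drop on F, drop NRI 0 under congestion, prioritize NRI of 2 or more) is purely hypothetical and is not specified in the text; the layout assumed is that of the first embodiment, with F in the M position and NRI in the top 2 bits of PT.

```python
def qos_decision(header: bytes, congested: bool) -> str:
    """Decide how to treat a packet using only its first two header bytes."""
    byte1 = header[1]
    f = byte1 >> 7               # F bit carried in the M position
    nri = (byte1 >> 5) & 0x03    # importance of the carried NALUs
    if f:
        return "drop"            # carried NALUs are flagged as erroneous
    if congested and nri == 0:
        return "drop"            # non-reference data is shed first (assumed policy)
    return "forward-high" if nri >= 2 else "forward-normal"
```

The point of the sketch is that no payload parsing is needed: everything the policy uses sits in the fixed part of the header.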

Abstract

A method for real-time transmission of H.264 multimedia video data is disclosed. It enables the RTP protocol to carry H.264 multimedia video data efficiently and strengthens the quality-of-service guarantee mechanism while remaining compatible with existing devices and transmission methods that use RTP. A modified RTP protocol (MRTP) for carrying H.264 data is disclosed that combines the header bytes of all H.264 NALUs into the header of the RTP packet itself, in a way that does not affect the operation of the existing RTP protocol and devices and that indicates the attributes of the H.264 NALU payload directly in the MRTP header. A method for distinguishing existing RTP from MRTP is also provided: relevant fields of the existing RTP header, such as the M and F fields, are modified so that existing network media devices can support both RTP and MRTP, which improves the compatibility of MRTP and the flexibility of its application.

Description

Real-time method for multimedia data transmission
Technical field
The present invention relates to the field of multimedia communication technologies, and in particular to a method for real-time transmission of multimedia data.
Background
With the rapid development of the Internet and mobile communication networks, streaming media technology is used ever more widely, from online broadcasting and movie playback to distance learning and online news sites. Video and audio are currently delivered over the network in two main ways: download and streaming. Streaming transmits the video/audio signal continuously; while the media plays on the client, the remainder continues to download in the background. Streaming takes two forms: progressive streaming and real-time streaming. Real-time streaming delivers data in real time and is particularly suited to live events. It must match the connection bandwidth, which means that image quality degrades when network speed drops so that less transmission bandwidth is needed. "Real time" means that, in an application, the delivery of data must keep a precise time relationship with the generation of that data.
In particular, with the advent of third-generation (3G) mobile communication systems and the rapid spread of networks based on the Internet Protocol (IP), video communication is gradually becoming one of the main communication services. Two-party and multi-party video communication services, such as video telephony, video conferencing, and mobile terminal multimedia services, place stringent requirements on the transmission of multimedia data streams and on quality of service. They require not only better real-time network transmission but, equivalently, more efficient video compression coding.
Given these demands, the ITU Telecommunication Standardization Sector (ITU-T), having developed video compression standards such as H.261, H.263, and H.263+, officially released the H.264 standard in 2003. H.264 is a high-efficiency compression coding standard developed jointly by ITU-T and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) to meet the network media transmission and communication needs of the new stage. It is also the main content of Part 10 of the MPEG-4 standard.
The H.264 standard was developed to improve video coding efficiency and network adaptability more effectively. Because of these advantages, the H.264 video compression coding standard has quickly become the mainstream standard in multimedia communication. Many real-time multimedia communication products that use H.264 (such as video conferencing systems, videophones, and 3G mobile terminals) and network streaming products have come to market. Support for H.264 has become a key factor in product competitiveness in this market. It can be expected that, with the formal publication and widespread use of H.264, multimedia communication over IP networks and 3G and post-3G wireless networks will enter a new stage of rapid development.
As noted above, multimedia communication requires not only efficient media compression coding but also real-time network transmission. Multimedia streams are today transmitted almost exclusively with the Real-time Transport Protocol (RTP) and its control protocol, the Real-time Transport Control Protocol (RTCP). RTP is a transport protocol for multimedia data streams on the Internet, published by the Internet Engineering Task Force (IETF). RTP is defined to work in one-to-one or one-to-many transmission, and its purpose is to provide timing information and stream synchronization. RTP typically runs over the User Datagram Protocol (UDP), but it can also run over other protocols such as the Transmission Control Protocol (TCP) or Asynchronous Transfer Mode (ATM).
RTP itself only carries real-time data. It provides neither a reliable mechanism for in-order packet delivery nor flow control or congestion control; it relies on RTCP for these services. RTCP manages transmission quality and exchanges control information between the participating application processes. During an RTP session, each participant periodically sends RTCP packets containing statistics such as the number of packets sent and the number of packets lost, so a server can use this information to change the transmission rate dynamically, or even the payload type. Used together, RTP and RTCP optimize transmission efficiency through effective feedback with minimal overhead, which makes them particularly suitable for real-time data on the network.
H.264 multimedia data transmitted over IP networks is likewise carried over UDP and, above it, RTP. RTP is structurally applicable to different media data types, but for each higher-layer protocol or media compression coding standard used in multimedia communication (such as H.261, H.263, MPEG-1/-2/-4, and MP3), the IETF publishes a specification of the RTP payload packetization method that defines in detail how to encapsulate that protocol's data in RTP and is optimized for it. For H.264 the corresponding IETF standard is RFC 3984, "RTP Payload Format for H.264 Video". It is currently the main standard for carrying H.264 video streams over IP networks and is widely used. In video communication, the products of all major vendors are based on RFC 3984, which is also the only existing H.264/RTP transport method.
In fact, the key difference between H.264 and earlier video compression coding protocols is that H.264 defines a new layer, the Network Abstraction Layer (NAL). This layer exposes the capabilities of the underlying service through a standard interface and abstracts away the differences between underlying networks. H.264 defines the NAL to increase the separation and independence between its Video Coding Layer (VCL) and the specific network transport protocol layer below, which gives greater application flexibility. No such layer exists in earlier ITU-T video compression coding protocols such as H.261 and H.263/H.263+/H.263++. How to design a more efficient scheme for NAL and RTP to work together, one that exploits the advantages of H.264 so that RTP carries H.264 better and more practically, is therefore worth studying.
The method of carrying H.264 NAL data over RTP defined in RFC 3984 is currently the only technical solution. Building on the RTP protocol (RFC 3550), it encapsulates NAL data in the RTP payload. The NAL sits between the VCL and RTP and specifies that the video bitstream is divided, according to defined rules and structures, into a series of NAL units (NALUs). RFC 3984 defines the format for encapsulating NALUs in the RTP payload. The RTP frame format and the existing NALU encapsulation method are briefly described below.
RTP was designed mainly for real-time multimedia conferencing, continuous data storage, interactive distributed simulation, and control and measurement applications. RTP is usually carried over UDP in order to use UDP's multiplexing and checksum functions. If the underlying network provides multicast distribution, RTP supports transmission to multiple destinations. The functions RTP provides include payload type identification, sequence numbering, timestamping, and delivery monitoring.
When carrying H.264 video, RTP packetizes H.264 NALUs into an RTP packet stream. RFC 3984 mainly defines the NALU and, on that basis, the format for encapsulating H.264 NAL data in RTP, as shown in Figure 1.
Figure 1 shows how NALUs are encapsulated in the RTP payload. The first byte of each NALU is the NALU header, followed by the NALU data. Multiple NALUs are placed end to end in the payload of the RTP packet, and optional RTP padding may follow at the end. Padding is part of the RTP packet format and lets the RTP packet length meet a particular requirement (such as a fixed length); the optional padding data is generally filled with zeros.
The NALU header is the first byte, also called an octet. It has three fields, whose names and meanings are as follows:
The F field is the forbidden bit (forbidden_zero_bit) and occupies 1 bit. It flags conditions such as syntax errors: it is set to 1 if there is a syntax violation, and when the network detects a bit error in the unit it can set the bit to 1 so that the receiver discards the unit. It is mainly used to adapt to different kinds of network environments (such as combined wired and wireless environments);
The NRI field is the NAL reference identifier (nal_ref_idc) and occupies 2 bits. It indicates the importance of the NALU data. A value of 00 indicates that the content of the NALU is not used to reconstruct reference pictures for inter prediction; a nonzero value indicates that the NALU carries important data such as a slice of a reference frame, a sequence parameter set (SPS), or a picture parameter set (PPS). The larger the value, the more important the NALU;
The Type field is the NALU type (nal_unit_type) and occupies 5 bits, allowing 32 NALU types. Table 1 gives the correspondence between its values and the types in detail.
Table 1: Type field values in the NALU header and the corresponding NALU content types
Type value    NALU content type
0             Unspecified
1             Coded slice of a non-IDR picture
2             Coded slice data partition A
3             Coded slice data partition B
4             Coded slice data partition C
5             Coded slice of an IDR picture
6             SEI (supplemental enhancement information)
7             SPS (sequence parameter set)
8             PPS (picture parameter set)
9             Access unit delimiter
10            End of sequence
11            End of stream
12            Filler data
13-23         Reserved
24-31         Unspecified
It can be seen that the one-byte NALU header mainly conveys the validity and importance level of the NALU; from this information the importance of the data carried by RTP can be determined.
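The three header fields and the Table 1 mapping can be read from a single byte, as the following sketch shows. The helper names `parse_nalu_header` and `describe_nal_type` are hypothetical and used only for illustration; the bit positions (F highest, then NRI, then Type) follow the field order described above.

```python
# Names for the Type values that Table 1 assigns a specific meaning.
NAL_TYPE_NAMES = {
    1: "coded slice of a non-IDR picture",
    2: "coded slice data partition A",
    3: "coded slice data partition B",
    4: "coded slice data partition C",
    5: "coded slice of an IDR picture",
    6: "SEI (supplemental enhancement information)",
    7: "SPS (sequence parameter set)",
    8: "PPS (picture parameter set)",
    9: "access unit delimiter",
    10: "end of sequence",
    11: "end of stream",
    12: "filler data",
}


def describe_nal_type(nal_type: int) -> str:
    """Map a 5-bit Type value to its Table 1 description."""
    if nal_type in NAL_TYPE_NAMES:
        return NAL_TYPE_NAMES[nal_type]
    if 13 <= nal_type <= 23:
        return "reserved"
    return "unspecified"  # 0 and 24-31


def parse_nalu_header(header_byte: int) -> dict:
    """Split the NALU header byte into F (1 bit), NRI (2 bits), Type (5 bits)."""
    nal_type = header_byte & 0x1F
    return {
        "f": (header_byte >> 7) & 0x01,
        "nri": (header_byte >> 5) & 0x03,
        "type": nal_type,
        "type_name": describe_nal_type(nal_type),
    }
```

For example, the header byte 0x67 (binary 0110 0111) parses to F = 0, NRI = 3, Type = 7, which Table 1 lists as a sequence parameter set.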
To summarize, the prior-art scheme works as follows. First, at the NAL, the H.264 video bitstream is divided according to certain rules into a NALU stream. This step belongs to the H.264 implementation; for example, one picture frame or one slice may form a NALU. The NALU stream is then packetized into an RTP packet stream according to an application-specific packetization strategy. In an RTP packet, the NALU data area follows the header. If one RTP packet encapsulates multiple H.264 NALUs, they are arranged end to end, each occupying a contiguous run of bits, and the first byte of each NALU is its NALU header byte. At the end of the RTP packet there may be optional padding bits as needed. During transmission, the underlying RTP protocol does not process the specific information of the NALUs.
In practice, this scheme has the following problem: the H.264 NALU header information is not reflected in the header of the RTP packet, even though the NALU header byte contains important information. For example, NRI indicates whether the NALU carries data of an H.264 non-reference frame, or data of a reference frame or other important picture data such as parameter sets.
Because RTP itself provides no quality of service (QoS) mechanism and no information about the importance or priority of the data it carries, RTP on IP and wireless networks has no tolerance of errors such as packet loss, that is, no error resilience.
If the header of the RTP packet reflected some of the H.264 NALU information, the attributes of the H.264 NALUs could be obtained simply by scanning the RTP header. With this information, network devices could treat RTP packets differently and so give important data priority during transmission.
The scheme also leaves room for improvement in efficiency. If one RTP packet encapsulates multiple H.264 NALUs, these NALUs generally have the same type, so their header bytes are identical. If an RTP packet contains N NALUs, the N NALU header bytes can be replaced by a single byte without losing any information, which improves efficiency by removing N-1 redundant bytes.
In the prior-art scheme, the NALU header information is encapsulated entirely in the payload, so RTP cannot directly learn the attributes, level, or importance of the payload and cannot implement a QoS mechanism based on them. In addition, this encapsulation format makes NALU headers consume payload space: every NALU carries its own header, and since in many cases the headers of multiple NALUs of the same type in one RTP packet are identical, RTP transmission bandwidth is wasted.
Summary of the invention
In view of this, the main object of the present invention is to provide a method for real-time transmission of H.264 multimedia data that enables RTP to carry multimedia video data efficiently and adds a quality-of-service guarantee mechanism while remaining compatible with existing devices and transmission methods that use RTP.
The present invention provides an H.264-based multimedia data transmission method in which the multimedia data is divided at the network abstraction layer into a stream of network abstraction layer units (NALUs), each containing header information. The method comprises:
the sender, following the modified real-time transport protocol (MRTP) encapsulation format, encapsulates at least one NALU with identical header information in one MRTP packet and sets an identifier in the header of that packet to distinguish it from a real-time transport protocol (RTP) packet;
the receiver determines from the identifier whether the packet is an MRTP packet and, if so, processes the packet according to the MRTP encapsulation format and obtains the NALUs it carries;
in the MRTP encapsulation format, the header information shared by the carried NALUs is included in the header of the MRTP packet, and the header information of each carried NALU is removed before the NALU is filled into the payload of the MRTP packet.
The NALU header information comprises:
a forbidden bit field, which indicates whether the NALU is in error;
a NAL reference identifier field, which indicates the importance of the NALU;
a type field, which indicates the type of the NALU.
In the MRTP encapsulation format, the NAL reference identifier field and the type field are filled into the payload type field of the MRTP header.
The MRTP identifier is the version field of the MRTP header; this version field is set in the MRTP header.
In the MRTP encapsulation format, the forbidden bit field is filled into the marker field of the MRTP header;
the receiver determines from the marker field of the MRTP packet whether the NALUs it carries are in error.
The receiver includes communication terminals and intermediate network devices.
Alternatively, the MRTP identifier is carried in the marker field of the MRTP header.
The sender first determines whether the forbidden bit field in the header information of at least one network abstraction layer unit is set, and accordingly classifies the units as normal network abstraction layer units or erroneous network abstraction layer units;
it then encapsulates the normal network abstraction layer units into improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format and sets the improved real-time transport protocol identifier; in the improved real-time transport protocol encapsulation format, the forbidden bit field in the network abstraction layer unit header information is ignored;
and it encapsulates the erroneous network abstraction layer units into real-time transport protocol packets according to the real-time transport protocol encapsulation format;
The receiver first classifies each received packet as an improved real-time transport protocol packet or a real-time transport protocol packet according to whether its header information contains the improved real-time transport protocol identifier;
it then processes improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format, and processes real-time transport protocol packets according to the real-time transport protocol encapsulation format.
In the improved real-time transport protocol encapsulation format, when there are fewer than 16 types of network abstraction layer unit, only the lower 4 bits of the type field are used to represent the type, and the most significant bit of the type field serves as a reserved extension bit.
A multimedia transmission device obtains information about the carried network abstraction layer units from the improved real-time transport protocol header information and applies a quality-of-service policy for real-time transmission of the multimedia data accordingly.
A comparison shows that the main difference from the prior art is as follows. The technical solution of the present invention provides a modified RTP protocol (MRTP, Modified RTP) for carrying NALU data. It merges the header byte shared by all NALUs in the same RTP packet into the packet's own header, in a way that does not disturb the operation of existing RTP protocols and devices, and it exposes the attributes of the NALU payload directly in the MRTP header. On the one hand, this substantially improves the efficiency with which MRTP encapsulates and carries NALUs; on the other hand, exposing the NALU attributes through MRTP provides the basis for implementing QoS mechanisms.
In addition, by modifying relevant fields of the existing RTP header, such as the M and V fields, a method is given for distinguishing existing RTP from MRTP. This makes it possible for existing network media devices to support RTP and MRTP simultaneously and improves the compatibility of MRTP. A corresponding method is also given for transmitting NALUs that contain syntactical errors or bit errors separately over conventional RTP, together with a scheme for alternately processing MRTP and RTP packets.
The header of the MRTP protocol proposed by the present invention carries the H.264 NALU header information. The importance of the NALUs carried by an MRTP packet can therefore be determined by a quick scan of the MRTP header, without decoding the NALUs, so that appropriate measures such as QoS policies can be applied and the quality of service further improved.
At the same time, stripping the header byte from the same-type H.264 NALUs in an MRTP packet reduces redundancy and improves transmission efficiency, which improves the video transmission quality of multimedia video communication over IP networks and better meets user requirements.
In addition, distinguishing MRTP from RTP provides compatibility with the prior art. A solution for alternately processing MRTP and RTP packets and a scheme for separately transmitting erroneous NALUs are given, which improve the robustness of the new MRTP method.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the prior-art encapsulation format of NALU data in an RTP packet payload;
FIG. 2 is a schematic diagram of the header structure of an RTP packet;
FIG. 3 is a schematic diagram of the header structure of an MRTP packet according to an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
The present invention aims to provide a multimedia data transmission scheme in which the H.264 NALU header information is reflected in the RTP header. The basic principle is to use one or more bytes or bits of the current RTP header to carry the NALU header information; RTP defines these bytes or bits precisely to leave room for extension when it is combined with the specific media protocol being carried. The modified RTP protocol does not affect interoperability with devices that support the original RTP protocol: within one communication, some terminals may use the RTP protocol modified according to the present invention while others use the unmodified RTP protocol, and these terminals can still communicate normally. Flag bits set in the MRTP header distinguish MRTP packets from conventional RTP packets, and a terminal can handle each case differently, which yields a scheme for alternately transmitting and processing MRTP and RTP packets. This includes the handling of NALUs with syntax errors, which are transmitted over conventional RTP.
The MRTP modifications to existing RTP mainly involve redefining the payload type (PT) field and the version (V) field of the RTP packet header. The scheme has two potential benefits: it provides a certain QoS mechanism for the transmission of H.264 data, and it improves the efficiency with which RTP encapsulates H.264 NALUs.
To aid understanding of the technical solution, the RTP packet format is briefly introduced here.
The basic RTP header occupies 12 bytes (the minimum case), while the IP and UDP headers occupy 20 bytes and 8 bytes respectively. Since an RTP packet is encapsulated in a UDP packet, which is in turn encapsulated in an IP packet, the total header overhead is 12 + 8 + 20 = 40 bytes. The detailed structure of the RTP header is shown in FIG. 2.
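As a quick check of this arithmetic, the following minimal Python sketch adds up the header bytes per packet. The constant and function names are illustrative assumptions rather than part of the original text; an IPv4 header without options is assumed, and each optional CSRC entry is counted as 4 bytes because CSRC identifiers are 32 bits wide.

```python
IPV4_HEADER = 20       # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8         # bytes
RTP_FIXED_HEADER = 12  # bytes, RTP header with an empty CSRC list


def header_overhead(csrc_count: int = 0) -> int:
    """Total IP + UDP + RTP header bytes carried by one packet."""
    if not 0 <= csrc_count <= 15:
        raise ValueError("the CSRC count is a 4-bit field (0..15)")
    return IPV4_HEADER + UDP_HEADER + RTP_FIXED_HEADER + 4 * csrc_count
```

With no CSRC entries this gives the 40 bytes stated above; every CSRC entry added by a mixer costs a further 4 bytes.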
From front to back, the RTP header shown in FIG. 2 consists of: the first byte (byte 0), which holds fields describing the header structure itself; the second byte (byte 1), which defines the payload type; the third and fourth bytes (bytes 2 and 3), which hold the sequence number (Sequence Number); the fifth to eighth bytes, which hold the timestamp; the ninth to twelfth bytes, which hold the synchronization source identifier (SSRC ID, Synchronization Source Identifier); and finally a list of contributing source identifiers (CSRC IDs, Contributing Source Identifiers), whose number is variable. Note that in this description the first byte is labeled byte 0, and so on.
The first 12 bytes are present in every RTP packet, whereas the remaining header data, such as the contributing source identifiers, is present only when inserted by a mixer. CSRCs are therefore generally used where media is mixed: in a multi-party conference, for example, the audio must be mixed, and video can use the same approach to provide a multi-picture function. The synchronization source identifier SSRC is in effect the identifier of the media stream being carried.
The meaning and full name of each of the above fields are as follows:
The V field carries the version (Version) and occupies 2 bits. The version currently in use is 2, so V = 2. Of the other values, V = 1 denotes an earlier RTP version, and V = 0 denotes the original predecessor of RTP, used in the voice-over-IP (VoIP) communication systems on the early Mbone network, which later evolved into RTP. V = 3 is not yet defined and is therefore available to the present invention;
The P field is the padding flag (Padding) and occupies 1 bit. If P is set, the packet contains one or more padding bytes at the end that are not part of the payload;
The X field is the extension flag (Extension) and occupies 1 bit. If X is set, the RTP header must be followed by a variable-length header extension (placed after the CSRC list, if one is present). It is reserved mainly for application environments in which the header fields are insufficient. The header extension contains a 16-bit length field that counts the number of 32-bit words in the extension. The first 16 bits of the header extension are left open for distinguishing identifiers or parameters, and their format is defined by the specific profile specification. The header extension format is described in detail in Section 5.3.1 of RFC 3550 and, for brevity, is not reproduced here;
The CC field is the CSRC count (CSRC Count) and occupies 4 bits. It gives the number of CSRC identifiers at the end of the header, from which the receiver can determine the length of the CSRC ID list that follows the fixed header;
The M field is the marker bit (Marker) and occupies 1 bit. Its interpretation is defined by a specific profile (Profile), and it allows significant events in the packet stream to be marked. A profile may define additional marker bits or specify that there is no marker bit. A profile here means the settings for a specific application environment, agreed between the communicating parties and not fixed by the protocol;
The PT (Payload Type) field indicates the payload type and occupies 7 bits. It identifies the format of the RTP payload and determines its interpretation by the application. The marker bit and the payload type together form one byte that carries profile-specific information, and a specific profile may redefine this byte to suit different requirements. A specific application can define a so-called profile, which is simply a set of static correspondences (that is, agreed in advance by the communicating parties) mapping PT values to media formats. The relationship between PT values and media formats can also be negotiated dynamically through signaling outside RTP. Within an RTP session, an RTP source may change the PT.
The next field is the sequence number, which occupies 16 bits. It is incremented by one for each RTP packet sent, so that the receiver can use it to detect packet loss and restore packet order. The initial sequence number of a session may be chosen randomly without affecting communication.
The timestamp occupies 32 bits and reflects the sampling instant of the first byte in the RTP packet. The sampling instant must be derived from a clock that increases monotonically and linearly; the receiver uses it to adjust media playout time or to perform synchronization.
The synchronization source SSRC ID occupies 32 bits. Its value may be chosen randomly, but it must be unique within an RTP session so that it uniquely identifies a media source. If a source changes its source transport address, it must choose a new SSRC identifier.
The contributing source (CSRC) list contains 0 to 15 items as required, each occupying 32 bits. The length of the list, that is, the number of CSRC IDs, is given by the 4 bits of the CC field. In fact, the CSRC identifier that identifies a media source is the same as the SSRC identifier of the corresponding contributing source; it is merely set as an SSRC or a CSRC depending on its role at different receivers. In multi-party communication, the CSRC IDs are inserted by the mixer.
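The field layout described above can be illustrated with a short Python sketch that unpacks the fixed 12-byte header and the CSRC list. It is only an illustration: the names `RtpHeader` and `parse_rtp_header` are hypothetical, and header extensions and padding are detected but not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int          # V, 2 bits
    padding: bool         # P, 1 bit
    extension: bool       # X, 1 bit
    csrc_count: int       # CC, 4 bits
    marker: bool          # M, 1 bit
    payload_type: int     # PT, 7 bits
    sequence_number: int  # 16 bits
    timestamp: int        # 32 bits
    ssrc: int             # 32 bits
    csrc: Tuple[int, ...]  # 0..15 entries of 32 bits each


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and the CSRC list (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes long")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet is too short for its CSRC list")
    csrc = struct.unpack(f"!{cc}I", packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

Byte 0 holds V, P, X, and CC, and byte 1 holds M and PT, which is why the two fields modified by MRTP can be read from the first two bytes alone.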
This document encapsulates NALU data in a modified RTP (MRTP, Modified RTP) format, described below through specific embodiments. The descriptions of MRTP cover only the points on which it differs from existing RTP. The most basic difference is that, during MRTP encapsulation, the header information of NALUs that share the same header is merged into the MRTP header.
The NALU header structure was mentioned earlier. The NALU header contains, in order:
an F field of 1 bit, indicating whether the NALU is in error;
an NRI field of 2 bits, indicating the importance of the NALU;
a Type field of 5 bits, indicating the type of the NALU.
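The three fields listed above occupy a single byte, F being the most significant bit. A minimal Python sketch of splitting and rebuilding that byte follows; the names `NaluHeader`, `parse_nalu_header`, and `build_nalu_header` are assumptions made for illustration.

```python
from typing import NamedTuple


class NaluHeader(NamedTuple):
    f: int         # forbidden bit, 1 bit
    nri: int       # importance, 2 bits
    nal_type: int  # NALU type, 5 bits


def parse_nalu_header(byte: int) -> NaluHeader:
    """Split the one-byte NALU header into F, NRI, and Type."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("the NALU header is a single byte")
    return NaluHeader(f=byte >> 7, nri=(byte >> 5) & 0x03, nal_type=byte & 0x1F)


def build_nalu_header(f: int, nri: int, nal_type: int) -> int:
    """Inverse of parse_nalu_header."""
    if f not in (0, 1) or not 0 <= nri <= 3 or not 0 <= nal_type <= 31:
        raise ValueError("field value out of range")
    return (f << 7) | (nri << 5) | nal_type
```

For example, the header byte 0x65 decodes to F = 0, NRI = 3, Type = 5.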
The present invention is described in detail below, taking the transmission of H.264-based multimedia data as an example.
[First Embodiment of the Invention]
In this embodiment, the sender and the receiver perform the following steps:
The sender encapsulates multiple NALUs with identical header information in the same MRTP packet according to the MRTP encapsulation format, and sets the MRTP identifier in the MRTP header to distinguish the packet from an RTP packet. In the technical solution of the present invention, a single MRTP packet holds only H.264 NALUs of one type, that is, NALUs with the same header information.
Practical engineering experience shows that adjacent parts of an H.264 bitstream generally have the same NALU type. NALUs of the same type can therefore be accumulated until a certain number is reached and then encapsulated in an MRTP packet. If the number of NALUs of the same type does not reach that number, RTP padding can be used. Alternatively, if NALUs of many different types are present, RTP encapsulation can be used; the receiver can then tell the packets apart by the MRTP identifier and process each accordingly.
The receiver determines from the MRTP identifier that the packet is an MRTP packet, processes it according to the MRTP encapsulation format, and extracts the NALUs it carries. The MRTP identifier lets the receiver distinguish MRTP packets from RTP packets, so that terminals using the MRTP protocol do not disturb normal communication over the existing RTP protocol and backward compatibility is preserved.
In the MRTP encapsulation format mentioned above, the header information shared by the carried NALUs is merged into the header of the MRTP packet, and the carried NALUs are placed in the payload of the MRTP packet with their headers removed. This is the main difference between MRTP and RTP; as stated earlier, it facilitates functional extension and saves bandwidth.
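The accumulation of adjacent same-type NALUs can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name `group_by_header` and the parameter `max_per_packet` are hypothetical, every NALU is assumed to be a non-empty byte string whose first byte is its header, and the padding and RTP fallback options mentioned above are not modeled.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def group_by_header(
    nalus: Iterable[bytes], max_per_packet: int = 4
) -> Iterator[Tuple[int, List[bytes]]]:
    """Yield (header_byte, [nalu_body, ...]) for runs of adjacent NALUs
    that share the same header byte, at most max_per_packet per group.
    Each body is the NALU with its one-byte header stripped."""
    for header, run in groupby(nalus, key=lambda nalu: nalu[0]):
        bodies = [nalu[1:] for nalu in run]
        for start in range(0, len(bodies), max_per_packet):
            yield header, bodies[start:start + max_per_packet]
```

Each yielded group corresponds to the contents of one candidate MRTP packet: one shared header byte plus the header-less NALU bodies. Only adjacent NALUs are grouped, so the order of the stream is preserved.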
[Second Embodiment of the Invention]
This embodiment focuses on merging the NALU header into the MRTP header and on identifying MRTP packets.
Building on the first embodiment, in the MRTP encapsulation format the NRI and Type fields of the NALU header are placed in the PT field of the MRTP header. As described earlier, the PT field occupies the last 7 bits of the second byte of the MRTP header. FIG. 3 shows the format of this MRTP header; the parts that differ from RTP are shown in bold, and some details of the figure are explained later.
In this embodiment, the V field of the MRTP header serves as the MRTP identifier: for an MRTP packet, the V field is set to 3 (binary 11). The V field, that is, the version information field, occupies the first 2 bits of the first byte of the MRTP header. In addition, the F field of the NALU header is placed in the M field of the MRTP header, which is the first bit of the second byte. The receiver determines from the M field of the MRTP packet whether the carried NALUs are in error, which preserves the forbidden-bit function of the F field.
Thus, in the second embodiment of the present invention, the MRTP version V takes the value 3, which amounts to a new version of RTP, whereas the current RTP version V is 2. The version difference tells the receiver of the packet that the protocol is the modified version, MRTP, so that subsequent processing follows the flow for the modified RTP protocol. An alternative that indicates the difference between MRTP and RTP without modifying V is described later.
In this embodiment, the NALU header byte (8 bits) replaces the 1-bit marker (M) field and the 7-bit PT field of the original RTP header, 8 bits in total. The replacement order may be, for example:
the F bit replaces the M bit;
the 2 NRI bits replace the 2 most significant bits of the 7 PT bits;
the 5 Type bits replace the 5 least significant bits of the 7 PT bits.
This replacement scheme is in fact reasonable. As described earlier, the 7 PT bits can be used freely. RTP (RFC 3550) specifies the use of the M field as follows: a specific profile (Profile) may specify that the M bit is not used and is instead merged into the PT, so that the PT can have up to 8 bits and distinguish 256 different types. Replacing the M bit with the F bit therefore complies fully with the RTP specification and does not affect interworking between MRTP and the original RTP.
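The replacement order listed above, together with the V = 3 identifier of this embodiment, can be sketched in Python as follows. The function name `mrtp_first_two_bytes` and its parameters are assumptions made for illustration; the remaining header fields (sequence number, timestamp, SSRC) are unchanged from RTP and are omitted.

```python
MRTP_VERSION = 3  # V = 3 identifies MRTP in this embodiment


def mrtp_first_two_bytes(
    nalu_header: int,
    padding: bool = False,
    extension: bool = False,
    csrc_count: int = 0,
) -> bytes:
    """Build bytes 0 and 1 of the MRTP header from the shared NALU header."""
    if not 0 <= nalu_header <= 0xFF:
        raise ValueError("the NALU header is a single byte")
    if not 0 <= csrc_count <= 15:
        raise ValueError("CC is a 4-bit field")
    f = nalu_header >> 7              # F bit
    nri = (nalu_header >> 5) & 0x03   # 2 NRI bits
    nal_type = nalu_header & 0x1F     # 5 Type bits
    byte0 = (MRTP_VERSION << 6) | (int(padding) << 5) | (int(extension) << 4) | csrc_count
    # M <- F, upper 2 bits of PT <- NRI, lower 5 bits of PT <- Type
    byte1 = (f << 7) | (nri << 5) | nal_type
    return bytes([byte0, byte1])
```

With this particular order, byte 1 of the MRTP header is bit-for-bit identical to the NALU header byte, so a receiver can recover F, NRI, and Type with the same masks it would apply to a NALU.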
The MRTP encapsulation format of the present invention clearly has three advantages. First, the overhead is low: the number of transmitted bits is noticeably reduced, especially when one RTP packet carries multiple NALUs. Second, the relative importance of the H.264 NALUs in an RTP packet can be determined without decoding the NALU data. Third, whether the loss of other bits will prevent the RTP packet from being decoded correctly can be identified without decoding the H.264 NALU data in the packet.
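To illustrate the second and third advantages, the following Python sketch shows how an intermediate device might classify a packet from its first two bytes only. It assumes the second-embodiment layout (V = 3 marks MRTP, the M bit carries F, and the PT bits carry NRI and Type); the function name `classify` and the returned labels are hypothetical, and the policy of treating NRI = 0 as droppable is an assumption, not something prescribed by the text.

```python
def classify(packet: bytes) -> str:
    """Pick a handling class for a packet without touching its payload."""
    if len(packet) < 12:
        raise ValueError("an RTP or MRTP header is at least 12 bytes long")
    if packet[0] >> 6 != 3:
        return "rtp"        # conventional RTP: handle as before
    if packet[1] & 0x80:
        return "discard"    # F carried in M: the NALUs are in error
    nri = (packet[1] >> 5) & 0x03
    # Assumed policy: a higher NRI means the NALUs matter more.
    return "protect" if nri > 0 else "droppable"
```

Only two header bytes are inspected, so the check costs the same regardless of how many NALUs the packet carries.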
To explain the technical solution of the second embodiment in further detail, the MRTP encapsulation and decapsulation processes are described below. After the above processing, the H.264 NALUs in one MRTP packet are all of exactly the same type, that is, their header bytes are identical, so the original header byte can be stripped from each NALU when it is encapsulated in the MRTP packet; with N NALUs, this saves N bytes. Decapsulation extracts the NALUs from the MRTP packet and restores them to their original form: the N NALUs are extracted from the MRTP packet that carries them, the 7 PT bits of the MRTP header are copied into the 7 least significant bits of a byte H (8 bits), and the most significant bit of H, the F bit, is set to 0. The resulting byte H is then prepended to each extracted NALU, which restores every NALU. If the F field in the MRTP header is 1, the NALUs in that MRTP packet are in error and can simply be discarded, which saves processing time.
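The decapsulation step just described can be sketched in Python as follows. The text does not specify how the header-less NALUs are delimited inside the MRTP payload, so this sketch takes the already-separated NALU bodies as input; the function name `restore_nalus` is likewise an assumption made for illustration.

```python
from typing import List, Sequence


def restore_nalus(mrtp_byte1: int, bodies: Sequence[bytes]) -> List[bytes]:
    """Rebuild the original NALUs from byte 1 of the MRTP header
    (M | PT) and the header-less NALU bodies carried in the payload."""
    if not 0 <= mrtp_byte1 <= 0xFF:
        raise ValueError("byte 1 of the header is a single byte")
    if mrtp_byte1 & 0x80:
        # F (carried in M) is 1: the NALUs are in error, discard them all.
        return []
    h = mrtp_byte1 & 0x7F  # 7 PT bits -> low 7 bits of H, F bit = 0
    return [bytes([h]) + body for body in bodies]
```

For N carried NALUs, the sender saved N header bytes and the receiver restores them from one shared byte.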
[Third Embodiment of the Invention]
A second solution is given on the basis of the first embodiment. Like the second embodiment, it places the NRI and Type fields of the NALU header in the 7 bits of the PT field of the MRTP header. Unlike the second embodiment, it no longer uses the V field to identify MRTP: V remains 2, and the M field identifies MRTP instead. This leaves nowhere to carry the F field, so this embodiment treats NALUs differently depending on whether F is set: erroneous NALUs with F set are still transmitted over the original RTP, while normal NALUs are transmitted over MRTP with the F bit ignored. The details are as follows.
In the third embodiment, the M field is set to 1 to identify an MRTP packet; the M field is the first bit of the second byte of the MRTP header. As for the F bit, the H.264 protocol specifies that it is 1 if there is a syntax violation or an error. When the network detects a bit error in the unit, it can set the bit to 1 so that the receiver discards the unit. The bit mainly serves to accommodate different kinds of network environment, such as combined wired and wireless environments. The principle of use is as follows: in general, when the sender and receiver of a communication perform H.264 encoding and decoding of the video, they do not "write" this bit; the decoder "reads" it, and if it finds F = 1, the receiver discards the NALU during decoding. In current industry practice, the F bit is "written" mainly at gateways between two different networks, for example when transcoding (MPEG-4 to H.264, H.263 to H.264, and so on).
Therefore, in the third embodiment of the present invention, the F bit is ignored and is not used for the purpose originally defined by H.264. The M field, which previously carried the F bit, is thus freed for other uses, such as carrying additional information in future extensions, and is used here to identify MRTP packets. The version information V = 2 then needs no modification, and MRTP keeps the original version value of 2. This also conserves the only remaining unused RTP version value.
In practice, however, low-probability cases can arise in which the F bit is needed, for example when a NALU has a syntax error. The third embodiment handles such cases as follows. In the MRTP encapsulation format, the F field of the NALU header is ignored; at the sender, however, erroneous NALUs whose F field is set are still encapsulated in RTP packets, and only normal NALUs are encapsulated in MRTP packets. The receiver determines whether a packet is an MRTP or an RTP packet and processes it according to the corresponding encapsulation format. In other words, in the special cases where the F bit must serve the purpose originally defined by H.264, namely to indicate a possible H.264 NALU syntax error, an intermediate device such as a gateway that finds a syntax error in a NALU while encoding video according to the H.264 protocol must encapsulate that NALU separately.
The flow of the alternate MRTP and RTP processing described above is summarized as follows:
the sender first determines whether the F field in the header of at least one NALU is set, and accordingly classifies the NALUs as normal or erroneous;
it then encapsulates the normal NALUs into MRTP packets according to the MRTP encapsulation format and sets the MRTP identifier, and encapsulates the erroneous NALUs into RTP packets according to the RTP encapsulation format;
the receiver first checks whether the MRTP identifier is set in the header of each received packet, and accordingly classifies the packets as MRTP packets or RTP packets;
it then processes the MRTP packets according to the MRTP encapsulation format and the RTP packets according to the RTP encapsulation format.
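The flow summarized above can be sketched in Python for the third embodiment, where M = 1 identifies MRTP and V stays at 2. The function names are hypothetical, and the sketch assumes, as the text does, that the conventional RTP packets of the session leave the M bit clear; a profile that uses M for its own purposes would need a different identifier.

```python
def choose_encapsulation(nalu: bytes) -> str:
    """Sender side: erroneous NALUs (F = 1) go over RTP, the rest over MRTP."""
    if not nalu:
        raise ValueError("a NALU has at least a header byte")
    return "RTP" if nalu[0] & 0x80 else "MRTP"


def mrtp_byte1(nalu_header: int) -> int:
    """Byte 1 of the MRTP header: M = 1 marks MRTP, F is ignored,
    and NRI and Type fill the 7 PT bits."""
    if not 0 <= nalu_header <= 0xFF:
        raise ValueError("the NALU header is a single byte")
    return 0x80 | (nalu_header & 0x7F)


def packet_kind(packet: bytes) -> str:
    """Receiver side: classify a received packet by its M bit."""
    if len(packet) < 12:
        raise ValueError("an RTP or MRTP header is at least 12 bytes long")
    return "MRTP" if packet[1] & 0x80 else "RTP"
```

A receiver would call `packet_kind` first and then hand the packet to the MRTP or RTP decapsulation routine accordingly.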
As can be seen, in the third embodiment of the present invention the gateway handles normal NALUs as described above: H.264 NALUs of the same type are MRTP-encapsulated according to a rule that is determined by the specific application and that mainly specifies how many NALUs of the same type are encapsulated in each MRTP packet. As soon as a NALU is found to contain a syntax error, that NALU is encapsulated with conventional RTP instead. In that case the conventional RTP packet may contain only a single H.264 NALU.
The above method presupposes that isolated NALUs with syntax errors appear only occasionally in a continuous H.264 NALU stream; each erroneous NALU is then taken out on its own and encapsulated with RTP. At the receiver, a received MRTP packet is decapsulated into H.264 NALUs according to the MRTP rules, and a received RTP packet is decapsulated into H.264 NALUs according to the RTP rules.
When an H.264 encoding error in an intermediate device produces several consecutive NALUs with syntax errors, for example M consecutive erroneous NALUs, these M NALUs can still be encapsulated with conventional RTP. Moreover, even if erroneous NALUs keep appearing in the NALU stream, they can be accumulated until they fill the length of one RTP packet and only then be packed with RTP. This saves bandwidth without affecting the receiver, because the receiver can tell from the sequence numbers which NALUs are missing.
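The accumulation of erroneous NALUs until one RTP packet is filled can be sketched as a simple size-bounded batching loop. The function name and the `max_payload` parameter are hypothetical, and the sketch only groups the NALUs; how several NALUs are laid out inside one RTP payload is not specified here and is left out.

```python
from typing import Iterable, Iterator, List


def batch_erroneous_nalus(nalus: Iterable[bytes],
                          max_payload: int) -> Iterator[List[bytes]]:
    """Group erroneous NALUs so that each group fits one RTP payload.

    A NALU larger than max_payload is emitted alone in its own group.
    """
    batch: List[bytes] = []
    size = 0
    for nalu in nalus:
        # Close the current group if adding this NALU would overflow it.
        if batch and size + len(nalu) > max_payload:
            yield batch
            batch, size = [], 0
        batch.append(nalu)
        size += len(nalu)
    if batch:
        # Flush whatever is left when the input ends.
        yield batch
```

In a live stream the final flush would more plausibly be driven by a timer than by the end of input, so that a partly filled group is not held back indefinitely; that choice is an assumption of this sketch.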
It can be seen that although this scheme transmits some NALUs with conventional RTP, it does not diminish the benefits of MRTP. Every normal NALU can still be encapsulated with MRTP and enjoys all of its advantages, such as QoS mechanisms that may be applied on the basis of the header information. Erroneous NALUs are generally discarded by the receiver anyway, so it is of no consequence that they cannot obtain the benefits of MRTP.
From the NALU types and the corresponding Type field values given in Table 1 above, it can be seen that fewer than 16 types are currently defined. The 5 bits of the Type field can therefore be reduced to 4 without affecting existing H.264 transmission. Accordingly, in the fourth embodiment of the present invention, when the total number of NALU types is less than 16, the MRTP encapsulation format uses only the lower 4 bits of the Type field to indicate the type, and the highest bit of the Type field becomes a reserved extension bit, called the C field, as shown in Figure 3. The C bit is left for later use in further functional extensions. Once bit C is reserved, the NALU types given in Table 1 are modified accordingly: there are 16 values in total, of which values 0-12 are the same as in Table 1 and values 13-15 are reserved.
Of course, although H.264 currently has only 13 NALU types, H.264 will continue to evolve and more NALU types may be defined. If the number of NALU types grows beyond 16 in the future, the lowest 4 bits of the 7-bit PT field together with the C bit will again be needed as the type indication.
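The bit arithmetic of the fourth embodiment can be illustrated with a small pack/unpack pair. This is a sketch under an assumed layout: the 7-bit PT field is taken to hold NRI in its top 2 bits, the C bit next, and the 4-bit type in its lowest 4 bits. The text fixes only the positions of C and the 4 type bits; the position of NRI and the function names are assumptions.

```python
from typing import Tuple


def pack_pt(nri: int, nalu_type: int, c: int = 0) -> int:
    """Build the 7-bit PT value as NRI (2 bits) | C (1 bit) | type (4 bits)."""
    if not 0 <= nri <= 3:
        raise ValueError("NRI must fit in 2 bits")
    if not 0 <= nalu_type <= 15:
        raise ValueError("type must fit in 4 bits")
    if c not in (0, 1):
        raise ValueError("C must be 0 or 1")
    return (nri << 5) | (c << 4) | nalu_type


def unpack_pt(pt: int) -> Tuple[int, int, int]:
    """Recover (NRI, C, type) from a 7-bit PT value."""
    return (pt >> 5) & 0x3, (pt >> 4) & 0x1, pt & 0xF
```

With C fixed at 0, the packed value equals the low 7 bits of the original NALU header byte for types 0-12, which is why reserving the bit does not disturb existing type values.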
With the MRTP of the present invention, a multimedia transmission device can learn the relevant information about the NALUs carried in a packet directly from the MRTP header information, and can apply a QoS policy for the real-time transmission of H.264 multimedia data on that basis. This cannot be achieved with existing RTP: the RTP layer is not concerned with NALU-layer information, so it cannot obtain the header information of each NALU in the payload and therefore cannot apply such a QoS policy.
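As a rough illustration of such a header-based QoS decision, the sketch below reads NRI from the second header byte of a packet and drops non-reference data under congestion. The policy itself, the function names, and the assumption that NRI occupies the top 2 bits of the 7-bit PT field are all hypothetical; only the fact that the second RTP header byte holds the marker bit followed by the 7-bit PT is standard.

```python
def mrtp_nri(packet: bytes) -> int:
    """Read NRI from the header alone, without parsing the payload.

    Byte 1 is M (1 bit) followed by PT (7 bits); NRI is assumed to sit
    in the top 2 bits of PT.
    """
    return (packet[1] >> 5) & 0x3


def should_forward(packet: bytes, congested: bool) -> bool:
    """Hypothetical policy: under congestion, drop packets whose NALUs
    have NRI == 0 (not used for reference) and forward the rest."""
    if not congested:
        return True
    return mrtp_nri(packet) > 0
```

The point of the sketch is that the decision touches only the fixed header bytes, which is what lets an intermediate device apply it without understanding the payload.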
Although the present invention has been illustrated and described with reference to certain preferred embodiments, those of ordinary skill in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention.

Claims

1. An H.264-based multimedia data transmission method, wherein the multimedia data is divided at the network abstraction layer into a stream of network abstraction layer units and each network abstraction layer unit contains header information, characterized in that the method comprises:
a sender, according to an improved real-time transport protocol encapsulation format, encapsulating at least one network abstraction layer unit having the same header information in the same improved real-time transport protocol packet, and setting an identifier in the header information of the improved real-time transport protocol packet to distinguish it from a real-time transport protocol packet;
a receiver determining, according to the identifier, whether the packet is an improved real-time transport protocol packet and, if so, processing the packet according to the improved real-time transport protocol encapsulation format and obtaining the network abstraction layer units it carries;
wherein, in the improved real-time transport protocol encapsulation format, the header information shared by the carried network abstraction layer units is included in the header information of the improved real-time transport protocol packet, and the carried network abstraction layer units are placed in the payload of the improved real-time transport protocol packet after their header information has been removed.
2. The multimedia data transmission method according to claim 1, characterized in that the header information of the network abstraction layer unit comprises:
a forbidden bit field, used to indicate whether the network abstraction layer unit is erroneous;
a network abstraction layer reference identifier field, used to indicate the importance of the network abstraction layer unit;
a type field, used to indicate the type of the network abstraction layer unit.
3. The multimedia data transmission method according to claim 2, characterized in that, in the improved real-time transport protocol encapsulation format, the network abstraction layer reference identifier field and the type field are placed in the payload type field of the header information of the improved real-time transport protocol packet.
4. The multimedia data transmission method according to claim 3, characterized in that the improved real-time transport protocol identifier is the version information field of the header information of the improved real-time transport protocol packet.
5. The real-time multimedia data transmission method according to claim 4, characterized in that, in the improved real-time transport protocol encapsulation format, the forbidden bit field is placed in the marker field of the header information of the improved real-time transport protocol packet;
the receiver determines, according to the marker field of the improved real-time transport protocol packet, whether the network abstraction layer units it carries are erroneous, wherein the receiver includes communication terminals and intermediate network devices.
6. The multimedia data transmission method according to claim 3, characterized in that the improved real-time transport protocol identifier is carried in the marker field of the header information of the improved real-time transport protocol packet.
7. The multimedia data transmission method according to claim 6, characterized in that:
the sender first determines whether the forbidden bit field in the header information of at least one network abstraction layer unit is set, and on that basis classifies the units as normal network abstraction layer units or erroneous network abstraction layer units;
the sender then encapsulates the normal network abstraction layer units into improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format and sets the improved real-time transport protocol identifier, wherein the forbidden bit field in the header information of the network abstraction layer unit is ignored in the improved real-time transport protocol encapsulation format;
the sender encapsulates the erroneous network abstraction layer units into real-time transport protocol packets according to the real-time transport protocol encapsulation format.
8. The multimedia data transmission method according to claim 6, characterized in that:
the receiver first classifies received packets as improved real-time transport protocol packets or real-time transport protocol packets according to whether the header information of each received packet contains the improved real-time transport protocol identifier;
the receiver processes the improved real-time transport protocol packets according to the improved real-time transport protocol encapsulation format, and processes the real-time transport protocol packets according to the real-time transport protocol encapsulation format.
9. The multimedia data transmission method according to any one of claims 3 to 8, characterized in that, in the improved real-time transport protocol encapsulation format, when the number of network abstraction layer unit types is less than 16, only the lower 4 bits of the type field are used to indicate the type, and the highest bit of the type field serves as a reserved extension bit.
10. The multimedia data transmission method according to any one of claims 3 to 8, characterized in that a multimedia transmission device learns the relevant information about the network abstraction layer units carried in a packet from the improved real-time transport protocol header information, and applies a quality of service policy for the real-time transmission of the multimedia data on that basis.
PCT/CN2006/001845 2005-10-17 2006-07-25 A real-time method for transporting multimedia data WO2007045140A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200510113942.1 2005-10-17
CN2005101139421A CN100407726C (en) 2005-10-17 2005-10-17 Method for real-time transmitting H.264 multimedia data

Publications (1)

Publication Number Publication Date
WO2007045140A1 true WO2007045140A1 (en) 2007-04-26

Family

ID=37390618

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2006/001845 WO2007045140A1 (en) 2005-10-17 2006-07-25 A real-time method for transporting multimedia data

Country Status (2)

Country Link
CN (1) CN100407726C (en)
WO (1) WO2007045140A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112839242A (en) * 2020-12-31 2021-05-25 四川长虹网络科技有限责任公司 Method for packaging audio/video media file
CN114979092A (en) * 2022-05-13 2022-08-30 深圳智慧林网络科技有限公司 Data transmission method, device, equipment and medium based on RTP

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1972166B (en) * 2006-11-30 2011-05-25 中兴通讯股份有限公司 An audio stream transport method of mobile multimedia broadcast system
CN101355571B (en) * 2007-07-26 2011-09-28 中国移动通信集团公司 Method, apparatus and system for processing multimedia information
CN101488947B (en) * 2008-01-16 2012-07-25 联想(北京)有限公司 Data transmission method and device
CN101873549B (en) * 2010-05-26 2013-08-21 姜红志 Point-to-point transmission method for mobile video information adopting real-time flow transport protocol
CN103002353B (en) * 2011-09-16 2015-09-02 杭州海康威视数字技术股份有限公司 The method that multimedia file is encapsulated and device
KR101947000B1 (en) 2012-07-17 2019-02-13 삼성전자주식회사 Apparatus and method for delivering transport characteristics of multimedia data in broadcast system
CN103313045A (en) * 2013-05-31 2013-09-18 哈尔滨工业大学 H.264 video sub-packaging method of dispatching desk of wideband multimedia trunking system
EP2961176B1 (en) * 2014-06-23 2017-01-11 Harman Becker Automotive Systems GmbH Correcting errors in a digital media transport stream
CN105407351B (en) * 2014-09-15 2019-03-12 杭州海康威视数字技术股份有限公司 A kind of method and apparatus for rebuilding coding mode from Realtime Transport Protocol data packet
CN110719495A (en) * 2018-07-13 2020-01-21 视联动力信息技术股份有限公司 Video data processing method and system
CN112073822B (en) * 2019-06-10 2022-10-18 成都鼎桥通信技术有限公司 Media change method and system in broadband trunking communication
CN114449200B (en) * 2020-10-30 2023-06-06 华为技术有限公司 Audio and video call method and device and terminal equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1402492A (en) * 2002-09-29 2003-03-12 清华大学 Method for implementing stream medium transmission based on real time transmission protocol and transmission control protocol
EP1494425A1 (en) * 2003-07-03 2005-01-05 Microsoft Corporation RTP Payload Format
US20050007263A1 (en) * 2003-07-09 2005-01-13 Minhua Zhou Video coding

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112839242A (en) * 2020-12-31 2021-05-25 四川长虹网络科技有限责任公司 Method for packaging audio/video media file
CN114979092A (en) * 2022-05-13 2022-08-30 深圳智慧林网络科技有限公司 Data transmission method, device, equipment and medium based on RTP
CN114979092B (en) * 2022-05-13 2024-04-02 深圳智慧林网络科技有限公司 RTP-based data transmission method, device, equipment and medium

Also Published As

Publication number Publication date
CN1863314A (en) 2006-11-15
CN100407726C (en) 2008-07-30

Similar Documents

Publication Publication Date Title
WO2007045140A1 (en) A real-time method for transporting multimedia data
US10728591B2 (en) Method of configuring and transmitting an MMT transport packet
US10939149B2 (en) Apparatus and method for transmitting/receiving processes of a broadcast signal
Schierl et al. System layer integration of high efficiency video coding
CN101517553B (en) Methods and apparatus for packetization of content for transmission over a network
Wenger et al. RTP payload format for H. 264 video
US20200029130A1 (en) Method and apparatus for configuring content in a broadcast system
KR101951650B1 (en) Method of transferring media contents over single port or multiple port and apparatus for performing the same
TWI432035B (en) Backward-compatible aggregation of pictures in scalable video coding
EP1936868B1 (en) A method for monitoring quality of service in multimedia communication
KR20190085899A (en) Interface apparatus and method for transmitting and receiving media data
KR20190045117A (en) Method of delivering media data based on packet with header minimizing delivery overhead
WO2007045141A1 (en) A method for supporting multimedia data transmission with error resilience
KR20140002026A (en) Ip broadcast streaming services distribution using file delivery methods
US20200021867A1 (en) Broadcast signal transmitting and receiving method and device
CN109194982A (en) A kind of method and apparatus for transmitting big file stream
Park et al. Delivery of ATSC 3.0 services with MPEG media transport standard considering redistribution in MPEG-2 TS format
KR20190018142A (en) Method configuring and transmitting mmt transport packet
Basso et al. Transport of MPEG—4 over IP/RTP
Wenger et al. RFC 3984: RTP payload format for H. 264 video
Standard Transport of high bit rate media signals over IP networks (HBRMT)
Pourmohammadi et al. Streaming MPEG-4 over IP and Broadcast Networks: DMIF based architectures
KR20130040148A (en) Method configuring and transmitting mmt payload
Wang et al. RFC 6184: RTP Payload Format for H. 264 Video
WO2014061925A1 (en) Method for adaptively transmitting fec parity data using cross-layer optimization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06761575

Country of ref document: EP

Kind code of ref document: A1