WO2007051425A1 - A multimedia communication method and the terminal thereof - Google Patents

A multimedia communication method and the terminal thereof

Info

Publication number
WO2007051425A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
fault
transmission
real
Prior art date
Application number
PCT/CN2006/002961
Other languages
French (fr)
Chinese (zh)
Inventor
Zhong Luo
Bin Song
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2007051425A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0023 Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
    • H04L1/0025 Transmission of mode-switching indication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0023 Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
    • H04L1/0028 Formatting
    • H04L1/0029 Reduction of the amount of signalling, e.g. retention of useful signalling or differential signalling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0078 Avoidance of errors by organising the transmitted data in a format specifically designed to deal with errors, e.g. location
    • H04L1/0079 Formats for control data

Definitions

  • Multimedia communication method and terminal thereof
  • The present invention relates to the field of multimedia communication technologies, in particular to multimedia communication technology that supports fault tolerance and resilience, and more specifically to a multimedia communication method and a terminal thereof.
  • Background
  • Video communication is gradually becoming the main form of communication.
  • 3G third-generation mobile communication systems
  • IP Internet Protocol
  • Two-way or multi-party video communication services, such as video telephony, video conferencing, and mobile terminal multimedia services, impose strict requirements on the transmission of multimedia data streams and on quality of service. Network transmission must offer better real-time performance, and video data compression coding must also be more efficient.
  • The purpose of the H.264 standard is to improve video coding efficiency and its adaptability to the network more effectively.
  • The H.264 video compression coding standard has quickly become the mainstream standard in multimedia communication.
  • A large number of H.264 real-time multimedia communication products, such as conference TV, videophones, 3G mobile communication terminals, and network streaming media products, have been released. With the official promulgation and widespread use of H.264, multimedia communication based on IP networks and on 3G and post-3G wireless networks will inevitably enter a new stage of rapid development.
  • The H.264 standard uses a layered model that defines a video coding layer (VCL, Video Coding Layer) and a network abstraction layer (NAL, Network Abstraction Layer).
  • VCL Video Coding Layer
  • H.264 introduces an encoding mechanism oriented to IP packets, which benefits packet transmission in the network and supports streaming media transmission of video. It has strong error resilience and is especially suitable for wireless video transmission with a high packet loss rate and serious interference.
  • All H.264 data to be transmitted, including image data and other messages, is encapsulated into packets of a uniform format for transmission, namely network abstraction layer units (NALU, NAL Unit).
  • NALU network abstraction layer unit
  • Each NALU is a variable-length byte string with a defined syntax, consisting of one byte of header information, which indicates the data type, and an integer number of payload bytes.
  • A NAL unit can carry a coded slice, a data partition of one of several types, or a sequence or picture parameter set. To enhance data reliability, each frame is divided into several slices.
  • Each slice is carried by one NALU and is composed of several smaller macroblocks, the macroblock being the smallest processing unit.
  • The slices at corresponding positions in preceding and succeeding frames are related to each other, while slices at different positions are independent of each other, so that bit errors do not spread between slices.
  • H.264 data includes texture data of non-reference frames, sequence parameters, picture parameters, Supplemental Enhancement Information (SEI), texture data of reference frames, and the like.
  • SEI Supplemental Enhancement Information
  • SEI message is a general term for messages that assist in the decoding, display, and other aspects of H.264 video.
  • The prior art defines various types of SEI messages and also keeps reserved SEI messages, leaving room for extension in possible future applications.
  • SEI messages are not required for reconstructing the luminance and chrominance images during the decoding process.
  • A decoder conforming to the H.264 standard is not required to process SEI in any way.
  • H.264 provides a variety of mechanisms for message extension, including SEI.
  • Supplemental Enhancement Information (SEI) is defined in H.264, and its data representation area is independent of the coded video data. Its usage is given in the description of the NAL in the H.264 specification.
  • The basic unit of the H.264 bitstream is the NALU.
  • A NALU can carry various H.264 data types, such as video sequence parameters, picture parameters, slice data (that is, the actual image data), and SEI message data.
  • SEI is used to deliver various messages and supports message extension. The SEI field can therefore carry messages customized for a specific purpose without affecting the compatibility of H.264-based video communication systems.
  • A NALU carrying SEI messages is called an SEI NALU.
  • An SEI NALU contains one or more SEI messages.
  • Each SEI message contains variables, mainly the payload type (payloadType) and payload size (payloadSize), which indicate the type and size of the message payload.
  • payloadType payload type
  • payloadSize payload size
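  • A minimal sketch of how payloadType and payloadSize might be read from the bytes of an SEI RBSP. It assumes the byte-wise coding of the H.264 SEI message syntax, in which each value is a run of 0xFF bytes (each adding 255) followed by one final byte, and it assumes that emulation prevention bytes and the RBSP trailing bits have already been removed. The function name parse_sei_messages is hypothetical and is not part of the patent text.

```python
def parse_sei_messages(rbsp: bytes) -> list:
    """Split SEI RBSP bytes into (payload_type, payload_bytes) tuples.

    Assumption: `rbsp` holds only SEI messages (no trailing bits, no
    emulation prevention bytes) and is not truncated.
    """
    messages = []
    pos = 0
    while pos < len(rbsp):
        # payloadType: every 0xFF byte adds 255, the last byte adds its value.
        payload_type = 0
        while rbsp[pos] == 0xFF:
            payload_type += 255
            pos += 1
        payload_type += rbsp[pos]
        pos += 1

        # payloadSize uses the same coding.
        payload_size = 0
        while rbsp[pos] == 0xFF:
            payload_size += 255
            pos += 1
        payload_size += rbsp[pos]
        pos += 1

        messages.append((payload_type, rbsp[pos:pos + payload_size]))
        pos += payload_size
    return messages


if __name__ == "__main__":
    data = bytes([5, 3, 0xAA, 0xBB, 0xCC, 0xFF, 0x01, 2, 0x11, 0x22])
    for ptype, payload in parse_sei_messages(data):
        print(ptype, payload.hex())
```

  • A decoder that does not recognize a payload type can skip exactly payloadSize bytes and continue, which is what lets custom messages coexist with standard ones.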
  • The syntax and semantics of some commonly used H.264 SEI messages are defined in H.264 Annex D.8 and D.9.
  • The payload contained in a NALU is called the Raw Byte Sequence Payload (RBSP), and SEI is one type of RBSP.
  • The data representation area of the SEI is referred to as the SEI field for short.
  • Each SEI field contains one or more SEI messages, each of which consists of SEI header information and an SEI payload.
  • The SEI header information includes two fields: one identifies the type of payload in the SEI message and the other indicates the size of the payload. Users can define their own payload types. An H.264 decoder that does not support parsing such user-defined information automatically discards the data in the SEI field, so including useful custom information in the SEI field does not affect the compatibility of H.264-based video communication systems. As described above, multimedia communication requires not only efficient media compression coding but also real-time network transmission.
  • RTP Real-time Transport Protocol
  • RTCP Real-time Transport Control Protocol
  • RTP/RTCP is a transport protocol for multimedia data streams over the Internet, published by the Internet Engineering Task Force (IETF).
  • IETF Internet Engineering Task Force
  • RTP is defined to work in one-to-one or one-to-many transmission, with the goal of providing timing information and stream synchronization.
  • RTP typically runs over the User Datagram Protocol (UDP), but it can also work over other protocols such as the Transmission Control Protocol (TCP) or Asynchronous Transfer Mode (ATM).
  • UDP User Datagram Protocol
  • ATM Asynchronous Transfer Mode
  • RTP itself only provides for the transmission of real-time data. It does not provide a reliable delivery mechanism, flow control, or congestion control, and relies on RTCP for these services.
  • RTCP is responsible for managing transmission quality and exchanging control information between the current application processes.
  • Each participant periodically transmits RTCP packets, which contain statistics such as the number of packets sent and the number of packets lost. The server can use this information to change the transmission rate dynamically, or even to change the payload type.
  • RTP and RTCP work together to optimize transmission efficiency with effective feedback and minimal overhead, which makes them suitable for real-time data transmission over the network.
  • H.264 multimedia data is transmitted over IP networks on top of UDP and its upper-layer protocol RTP.
  • RTP itself is structurally applicable to different media data types, but for each higher-level protocol or media compression coding standard in multimedia communication (e.g. H.261, H.263, MPEG-1/-2/-4, MP3), the IETF develops a specification of the RTP payload packetization method for that protocol, detailing an RTP encapsulation method optimized for it.
  • The corresponding IETF standard for H.264 is RFC 3984: RTP Payload Format for H.264 Video.
  • This standard is currently the main standard for H.264 video stream transmission over IP networks and is widely used. In the field of video communication, the products of major manufacturers are based on RFC 3984, and it is currently the only H.264/RTP transmission method.
  • H.264 defines a new layer, called the Network Abstraction Layer (NAL). Through a standard interface, it exposes the capabilities of the underlying network while shielding the differences between underlying networks, abstracting them into a service capability layer.
  • To increase the separation between its video coding layer (VCL, Video Coding Layer) and the specific network transport protocol layer beneath it, and so bring greater application flexibility, H.264 defines the new NAL layer. Earlier ITU-T video compression coding protocols such as H.261 and H.263/H.263+/H.263++ have no such layer.
  • How to design a more efficient scheme in which the NAL and the RTP bearer cooperate, so that RTP carries H.264 better and the advantages of H.264 are exploited, is a practical question worth studying.
  • the method of RTP carrying the NAL layer data of H.264 proposed by the RFC3984 specification is the current mainstream transmission method.
  • the scheme encapsulates the NAL layer data in the RTP payload for carrying on the basis of the RTP protocol (RFC 3550).
  • the NAL layer is located between the VCL and the RTP, and specifies that the video bitstream is divided into a series of NAL data units (NALU, NAL Units) according to defined rules and structures.
  • the RTP payload format for the NALU is defined in RFC3984. The following is a brief introduction to the frame format of the RTP and the encapsulation method of the NALU in the prior art.
  • RTP is typically carried over UDP to take advantage of its multiplexing and checksum capabilities. If the underlying network provides multicast distribution, RTP supports transfer to multiple destinations. Features provided by RTP include payload type identification, sequence numbering, timestamping, and delivery monitoring.
  • The detailed structure of the header information of the RTP packet is shown in Figure 1.
  • Reading from front to back, the RTP header information shown in Figure 1 is as follows: the first byte holds fields describing the header structure itself, the second byte defines the payload type, the third and fourth bytes are the packet sequence number (Sequence Number), bytes 5 to 8 are the timestamp, bytes 9 to 12 are the synchronization source identifier (SSRC ID), and finally there is a list of contributing source identifiers (CSRC IDs, Contributing Source Identifiers) whose length is not fixed.
  • The first 12 bytes appear in every RTP packet, while other data in the header, such as the contributing source identifiers, is present only when inserted by a mixer.
  • the V field is the version (Version) information, which is 2 bits.
  • VOIP voice IP
  • the P field is a padding flag (Padding), which is 1 bit. If P is set, it indicates that the packet contains one or more padding bytes (Padding) at the end, and the padding does not belong to a part of the payload;
  • the X field is an extension identification bit (Extension), which occupies 1 bit.
  • Extension extension identification bit
  • The format of the header extension is described in detail in Section 5.3.1 of RFC 3550.
  • the CC field is the number of contributing sources (CSRC Count), which is 4 bits, indicating the number of CSRC identifiers at the end of the header information.
  • the receiver can determine the length of the CSRC IDs list following the header information according to the CC field.
  • The M field is a marker bit (Marker), which occupies 1 bit.
  • The interpretation of the marker bit is defined by a specific profile. It allows significant events in the packet stream to be marked; its meaning is agreed by the communicating parties and is not constrained by the protocol.
  • The PT field is the payload type (PT, Payload Type), 7 bits in total. It identifies the format of the RTP payload and determines its interpretation by the application. The relationship between PT values and media formats can also be defined dynamically through signaling outside RTP.
  • the RTP source can change the PT.
  • the next field is the sequence number of 16 bits, which the receiver can use to detect packet loss and recover the packet sequence.
  • the time stamp occupies 32 bits, which reflects the sampling time of the first byte in the RTP packet, and the receiver adjusts the media playback time or synchronizes according to it.
  • The synchronization source identifier (SSRC ID) is 32 bits. Its value can be chosen at random and uniquely identifies a media source. If a source changes its source transport address, a new SSRC identifier must be selected.
  • The contributing source (CSRC) list identifies the sources contributing to the payload of the packet; each entry is the SSRC identifier of a contributing source.
  • the CSRC ID is inserted by the mixer.
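  • The header layout described above can be sketched as a small parser. This is an illustrative sketch that follows the field widths given in the text (V 2 bits, P 1 bit, X 1 bit, CC 4 bits, M 1 bit, PT 7 bits, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC, then CC 32-bit CSRC entries). The function name parse_rtp_header and the dictionary keys are hypothetical, and header extensions are not decoded.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header plus the CSRC list."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")

    # Network byte order: two single bytes, a 16-bit and two 32-bit fields.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])

    csrc_count = b0 & 0x0F
    header_length = 12 + 4 * csrc_count
    if len(packet) < header_length:
        raise ValueError("packet is shorter than its CSRC list")
    csrc_list = list(struct.unpack(f"!{csrc_count}I", packet[12:header_length]))

    return {
        "version": b0 >> 6,               # V, 2 bits
        "padding": bool(b0 & 0x20),       # P, 1 bit
        "extension": bool(b0 & 0x10),     # X, 1 bit
        "csrc_count": csrc_count,         # CC, 4 bits
        "marker": bool(b1 & 0x80),        # M, 1 bit
        "payload_type": b1 & 0x7F,        # PT, 7 bits
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc_list": csrc_list,
        "header_length": header_length,   # offset of extension or payload
    }


if __name__ == "__main__":
    pkt = (bytes([0x80, 0xE0, 0x12, 0x34])
           + (1000).to_bytes(4, "big")
           + (0xDEADBEEF).to_bytes(4, "big")
           + b"payload")
    print(parse_rtp_header(pkt))
```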
  • RTP packetizes the NAL units of H.264 into an RTP packet stream.
  • The RTP encapsulation of the NALU is mainly defined in RFC 3984, and the description here is based on it.
  • the RTP encapsulation format of this NALU is shown in Figure 2.
  • Figure 2 shows the encapsulation structure of a NALU in the RTP payload, including NALU header information, NALU data content, and multiple NALUs that are filled end-to-end into the payload of the RTP packet.
  • NALU header information is the first byte, and there are three fields. The meaning and full name are respectively described as follows:
  • The F field is defined as the forbidden bit (forbidden_zero_bit), which is 1 bit. It is used to flag syntax errors and is set to 1 if there is a syntax violation.
  • forbidden_zero_bit forbidden bit
  • The NRI field is defined as the NAL reference identifier (nal_ref_idc), which is 2 bits and indicates the importance of the NALU data.
  • A value of 00 indicates that the content of the NALU is not used to reconstruct reference pictures for inter prediction. A value other than 00 indicates that the current NALU is important data, such as a slice belonging to a reference frame, a sequence parameter set (SPS, Sequence Parameter Set), or a picture parameter set (PPS, Picture Parameter Set). The larger the value, the more important the current NALU;
  • The Type field is defined as the NALU type (nal_unit_type), which is 5 bits in total, so there can be 32 NALU types. The correspondence between values and specific types is given in Table 1.
  • the information given in one byte of the NALU header information mainly contains the validity and importance level of the NALU. Based on this information, the importance of the data carried by the RTP can be determined.
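  • As a brief illustration of the one-byte NALU header described above, the three fields can be extracted with shifts and masks. The function name parse_nalu_header is a hypothetical helper; the bit widths (F 1 bit, NRI 2 bits, Type 5 bits) are those stated in the text.

```python
def parse_nalu_header(first_byte: int) -> dict:
    """Split the first byte of a NALU into its F, NRI, and Type fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    return {
        "forbidden_zero_bit": first_byte >> 7,      # F, 1 bit
        "nal_ref_idc": (first_byte >> 5) & 0x03,    # NRI, 2 bits
        "nal_unit_type": first_byte & 0x1F,         # Type, 5 bits
    }


if __name__ == "__main__":
    # 0x67 = 0b0_11_00111: F = 0, NRI = 3, Type = 7
    print(parse_nalu_header(0x67))
```

  • A sender could use nal_ref_idc from this header as a first indication of how important the carried data is, without inspecting the payload.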
  • RTCP Quality of service
  • QoS quality of service
  • RTCP control protocol
  • RTCP is mainly used for control and reporting of the RTP protocol.
  • the underlying protocol provides multiplexing of data and control packets (eg, using separate UDP port numbers, etc.).
  • The types and structure of RTCP packets are described below.
  • The following RTCP packet types are defined to carry a variety of control information: sender report (SR, Sender Report), carrying transmission and reception statistics from active senders; receiver report (RR, Receiver Report), carrying reception statistics from participants that are not active senders; source description items (SDES, Source Description), which include the CNAME; participant end (exit) identifier (BYE); and application-specific functions (APP, Application-specific function).
  • SR Sender Report
  • RR Receiver Report
  • SDES Source Description
  • BYE participant end (exit) identifier
  • APP Application-specific function
  • The V field is version information;
  • The P field is a padding flag bit (Padding);
  • The RC field is the reception report count (RC, Reception Report Count), indicating the number of reception report blocks contained in the packet;
  • RC Reception Report Count
  • The PT field is the packet type (PT, Packet Type);
  • The SSRC of sender field is the synchronization source identifier (SSRC) of the originator of the SR packet, where a synchronization source uniquely identifies a media data source, such as the source of the video;
  • SSRC synchronization source identifier
  • The NTP timestamp field is a Network Time Protocol (NTP) timestamp that indicates wall-clock time (absolute date and time) and is used in conjunction with the RTP timestamp;
  • NTP Network Time Protocol
  • The RTP timestamp field is an RTP timestamp, that is, a timestamp generated by the RTP protocol;
  • The sender's packet count field indicates the total number of RTP packets transmitted by the sender from the start of transmission until the SR packet is generated;
  • The sender's byte count field indicates the total number of payload bytes (not including headers or padding) transmitted by the sender in RTP packets from the start of transmission until the SR packet is generated. This field can be used to estimate the average payload data rate;
  • The following fields contain zero or more reception report blocks, each of which conveys statistics on the RTP packets received from a single synchronization source, including the fraction lost and the cumulative number of packets lost;
  • The extended highest sequence number received and the interarrival jitter both reflect the network transmission status;
  • The last SR (LSR, Last SR) field is 32 bits: the middle 32 bits of the NTP timestamp of the most recent SR packet received from the source;
  • The delay since last SR (DLSR, Delay since Last SR) field is 32 bits and is the delay between receiving the last SR packet from the source and sending this reception report. It is used to calculate key parameters of the QoS report.
  • The Receiver Report (RR) packet format differs from the Sender Report (SR) format as follows:
  • the value of the Packet Type field is 201; there is no sender information portion.
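  • One QoS parameter that can be derived from the LSR and DLSR fields is the round-trip time seen by the sender. The sketch below follows the calculation described in RFC 3550 (arrival time minus LSR minus DLSR, all in units of 1/65536 second); the function names ntp_middle32 and round_trip_seconds are hypothetical, and the patent text itself does not spell out this formula.

```python
def ntp_middle32(ntp_seconds: int, ntp_fraction: int) -> int:
    """Return the middle 32 bits of a 64-bit NTP timestamp.

    The result keeps the low 16 bits of the seconds part and the high
    16 bits of the fraction part, which is the format of the LSR field.
    """
    return ((ntp_seconds & 0xFFFF) << 16) | ((ntp_fraction >> 16) & 0xFFFF)


def round_trip_seconds(arrival_mid32: int, lsr: int, dlsr: int) -> float:
    """Estimate round-trip time from a reception report block.

    arrival_mid32: middle 32 bits of the NTP time at which the report arrived
    lsr:           LSR field of the report block
    dlsr:          DLSR field of the report block
    """
    # Modulo 2**32 arithmetic tolerates wrap-around of the 16-bit seconds part.
    units = (arrival_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return units / 65536.0


if __name__ == "__main__":
    # Values taken from the worked example in RFC 3550, section 6.4.1.
    print(round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000))
```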
  • The functions of RTCP are as follows:
  • RTCP carries a persistent transport-level identifier for each RTP source, called the canonical name (CNAME, Canonical Name).
  • CNAME Canonical Name
  • The SSRC identifier may change when a conflict is discovered or a program is restarted, so the receiver needs the CNAME to keep track of each participant;
  • QoS reports are transmitted using the RTCP protocol: QoS information is reported with the report content specified by RTCP, and QoS monitoring for carried media such as H.264 is implemented on this basis.
  • While RTCP provides a QoS reporting mechanism, its periodic reporting creates additional network bandwidth overhead of up to 5%. If the network is congested and the transmission QoS drops, the extra traffic generated by RTCP makes the situation worse.
  • H.264 is the main video protocol for multimedia communication in the future.
  • the network of future multimedia communication applications is mainly IP-based packet switching networks and wireless networks.
  • The IP network implements "best effort" transmission and does not guarantee the QoS of the transmitted video data, and the problem is more prominent for an H.264 bitstream that has been efficiently compressed. Under best-effort delivery, packet loss, delay, and delay jitter all affect the quality of the restored video in real-time video communication.
  • Error resilience is the ability of a delivery mechanism to prevent errors from occurring, or to correct them to some degree after they have occurred. In a multimedia communication environment, it is critical that the video delivery mechanism be error resilient.
  • FEC Forward Error Correction
  • ARQ Automatic Retransmission Request
  • JSCC Joint Source-Channel Coding
  • Interleaving and elimination of bit error spread.
  • the use of multiple error correction coding to encode the data to be protected essentially forms data redundancy, thereby increasing the ability to resist errors.
  • The main error affecting packets on the network is packet loss, which is called an erasure error in error correction coding theory.
  • Error correction codes for erasure errors form a large class called erasure codes.
  • An erasure code divides the data stream into segments (units) of the same size, also called data nodes.
  • Check nodes (parity nodes) are computed from the data nodes.
  • The check nodes may in turn be operated on to generate second-layer check nodes, and so on.
  • Third-layer and fourth-layer check nodes can be generated in the same way, up to the Nth layer.
  • The number of nodes in each layer decreases according to a fixed rule relative to the previous layer, forming a layered multi-node structure. It can be pictured as a pyramid rotated 90 degrees to the right: the leftmost layer is the data node layer, followed to the right by the first layer of check nodes, the second layer of check nodes, ..., and the Nth layer of check nodes.
  • The erasure code has a very important property: the time complexity of processing is linear in the number n of data nodes, so it is called linear-time.
  • Many other erasure codes, such as the well-known Reed-Solomon code, require much higher time complexity, on the order of n*log^2(n)*log(log n). A linear-time erasure code is therefore much better suited to real-time communication.
  • Tornado code Tornado erasure code
  • Multiple layers of check nodes are generated layer by layer from the data nodes. Both the check nodes and the data nodes are sent by the sender to the receiver through the network. If some nodes are lost in transit, then, because upper-layer nodes participate in generating lower-layer nodes, the information of an upper-layer node is already contained in the next layer and the layers below it, so a lost node can be fully recovered from a sufficient number of nodes in those lower layers. Let the number of data nodes be n, and the number of check nodes generated is 1.
  • Figure 4 shows the relationship between the data nodes of a typical Tornado code and the check nodes of each layer. The lines between nodes in the figure are called edges, and the node at the left end of an edge participates in computing the node at its right end. There is a many-to-many logical relationship between two adjacent layers of nodes.
  • The higher the code rate, and thus the lower the redundancy rate, the higher the efficiency of the erasure code.
  • The structure and performance of the Tornado code are mainly determined by three factors: (a) the number of data nodes and the rule by which layers shrink, generally by a constant ratio; (b) the calculation method for generating the next layer of nodes; (c) the association between two adjacent layers of nodes.
  • The number of data nodes is set to $n$
  • The number of check nodes is set to $m$
  • The scaling ratio is set to $p$
  • The number of check node layers is $i$
  • The first $i-1$ layers contain $np, np^2, \ldots, np^{i-1}$ check nodes respectively, and the last ($i$-th) layer contains $np^i/(1-p)$ check nodes; summing over the layers gives the total number of check nodes
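  • The layer sizes can be summed in closed form. The derivation below assumes the reading that the first $i-1$ check layers contain $np^j$ nodes for $j = 1, \ldots, i-1$ and that the final layer contains $np^i/(1-p)$ nodes, with $0 < p < 1$; the source sentence is truncated, so this is offered as a worked check rather than as the patent's own formula.

```latex
\begin{aligned}
m &= \sum_{j=1}^{i-1} n p^{j} + \frac{n p^{i}}{1-p}
   = \frac{n\,(p - p^{i})}{1-p} + \frac{n p^{i}}{1-p}
   = \frac{n p}{1-p}, \\[4pt]
n + m &= \frac{n}{1-p}, \qquad
\frac{n}{n+m} = 1 - p .
\end{aligned}
```

  • Under these assumptions the total does not depend on the number of layers $i$. For example, $n = 1000$ and $p = 1/2$ give $m = 1000$ check nodes, 2000 nodes in total, and a code rate of $1/2$.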
  • The most commonly used calculation in Tornado code generation is the exclusive OR (XOR) operation, because XOR offers very convenient recovery properties.
  • For two bit sequences A and B of equal length, bitwise XOR yields a bit sequence C of the same length with the following properties: A XOR C = B and B XOR C = A. The same holds for the XOR of multiple sequences, which has a corresponding recovery method. After the XOR operation, the data nodes and check nodes are thus linked to one another, and if any one node is lost it can be restored from all the remaining nodes. Since the final layer of check nodes has a different scaling ratio, it is generally computed using a conventional error correction coding strategy, such as a Reed-Solomon code.
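  • The XOR recovery property can be demonstrated with a few lines of code. This is a minimal sketch of a single check node computed over a group of equal-length data nodes, not a full Tornado encoder; the function names xor_bytes, make_check_node, and recover_missing are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("nodes must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_check_node(nodes: list) -> bytes:
    """Compute one check node as the XOR of all the given nodes."""
    return reduce(xor_bytes, nodes)


def recover_missing(received: list, check: bytes) -> bytes:
    """Rebuild the single missing node from the check node and the rest."""
    return reduce(xor_bytes, received, check)


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    check = make_check_node(data)
    # Pretend the second node was lost in transit.
    restored = recover_missing([data[0], data[2]], check)
    print(restored == data[1])
```

  • Recovery works only when exactly one node of the group is missing, which is why the layered structure and the final conventional code are needed to cover heavier loss.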
  • Adjacent layers of the Tornado code have an association relationship, that is, which nodes of the previous layer are used to compute each node of the next layer.
  • A bipartite graph is formed between two adjacent layers of nodes, and the association between nodes of the two layers is determined by the association between the left and right nodes of the bipartite graph.
  • The parameters n, m, i, p, etc. are determined by the required protection capability and other requirements, such as a reasonable data node size and the maximum acceptable network delay. Given the node degree vectors, the bipartite graphs are generated at random, and Tornado encoding can then be performed.
  • In fact, the range of erasure codes is very wide, and Tornado codes are only one kind. Others include Reed-Solomon (RS) codes and Low-Density Parity-Check (LDPC) codes.
  • RS Reed-Solomon
  • LDPC Low-Density Parity-Check codes
  • An important performance indicator of an erasure code is its error correction capability (or protection capability). Under packet loss, this is directly reflected in the maximum number of lost packets that can be tolerated for a given number of packets, or in the percentage of packets that can still be correctly recovered when losses exceed this maximum.
  • error correction capability or protection capability
  • the higher the protection the higher the redundancy rate under the same conditions.
  • the protection capability is not only applicable to erasure codes, but on a larger scale, all FEC codes can be measured by protection capabilities.
  • some data are relatively important, such as structural parameters of video sequences, structural parameters of images, header information, etc.
  • Other data are relatively less important, such as image content data.
  • a more robust code is used for relatively important data, and a weaker code is used for relatively unimportant data. This balances protection and efficiency.
  • UEP Unequal Protection
  • QoS guarantee for video communication services is easily realized by unequal protection.
  • the idea of unequal protection is to protect data with different importance (relative) in multimedia data with different protection/protection strength protection mechanisms.
  • Different protection mechanisms can refer to large or small classes. For example, large classes differ in principle, and small classes differ only in structure or parameters.
  • Hierarchical protection is to divide the protection mechanism into multiple levels according to the protection ability.
  • Hierarchical protection is actually an adaptive strategy. Combining unequal protection with hierarchical protection forms more complex and powerful protection strategies.
  • Existing methods for handling packet loss errors can be roughly divided into two categories: (a) Active error prevention: take protective measures in advance, such as introducing a redundancy mechanism, to ensure as far as possible that data packets are not lost or that the receiving end can recover from a small amount of loss. (b) Error compensation: take compensation measures once an error has occurred, for example when network conditions deteriorate severely, the packet loss rate is very high, and active error prevention is no longer effective.
  • Error compensation methods are divided into two types: error concealment and error spread elimination.
  • Error concealment focuses on compensating for the immediate impact of an error, while error spread elimination removes the subsequent influence of the error as it spreads in space and time.
  • Error concealment can itself lead to the spread of bit errors.
  • The decoded picture buffer contents of the encoder and the decoder no longer match, which causes bit errors to spread in the time domain.
  • the existing H.264/RTP transport architecture and the RTCP-based QoS reporting method use RTP to directly encapsulate the NALU for transmission, and use the RTCP SR/RR report to monitor QoS information.
  • the related technical details have been introduced.
  • Tornado code used in the prior art is a relatively complicated solution.
  • the data transmission protection method based on H.261/H.263/H.263+/H.263++/H.264 video compression coding is implemented by using Tornado code.
  • The existing error elimination methods are standalone error concealment methods or standalone error diffusion elimination methods. Error concealment methods include time-domain concealment, spatial-domain concealment, and joint space-time concealment.
  • Error diffusion elimination methods include intra-frame coding, identification, adaptive intra block refresh, and the like.
  • The time-domain concealment method uses information from adjacent frames on the time axis to estimate the missing data.
  • The following methods can be used: (1) simply replace the missing data with the data at the same position in the adjacent frame; (2) take motion into account and perform motion prediction based on the adjacent frame data. More sophisticated concealment strategies also exist, but they require a very large amount of computation.
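  • The two time-domain options can be illustrated with a minimal sketch. It assumes frames are stored as lists of pixel rows and that lost regions are identified by block coordinates; the function name `conceal_temporal`, the block size, and the `motion` dictionary are hypothetical and are not taken from the source.

```python
def conceal_temporal(prev_frame, cur_frame, lost_blocks, block=16, motion=None):
    """Fill lost blocks of cur_frame with data from prev_frame.

    prev_frame, cur_frame: lists of rows (lists of pixel values), same size.
    lost_blocks: iterable of (bx, by) block coordinates that were lost.
    motion: optional dict mapping (bx, by) to an estimated (dx, dy) motion
            vector; without it the co-located block is copied (zero motion).
    Returns a new frame; the inputs are left unmodified.
    """
    height, width = len(cur_frame), len(cur_frame[0])
    out = [row[:] for row in cur_frame]
    vectors = motion or {}
    for bx, by in lost_blocks:
        dx, dy = vectors.get((bx, by), (0, 0))
        for y in range(by * block, min((by + 1) * block, height)):
            for x in range(bx * block, min((bx + 1) * block, width)):
                # Clamp the reference position to the frame boundaries.
                ref_y = min(max(y + dy, 0), height - 1)
                ref_x = min(max(x + dx, 0), width - 1)
                out[y][x] = prev_frame[ref_y][ref_x]
    return out
```

  • With `motion` omitted this corresponds to simple replacement from the same position; supplying a motion vector per lost block corresponds to the motion-compensated variant. How the vectors are estimated is left open here.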
  • The spatial-domain concealment method uses the spatially adjacent regions of the lost data region for error concealment. This method is computationally intensive.
  • The joint space-time concealment method combines spatial-domain and time-domain error concealment, that is, it uses spatial data and temporal data together for concealment.
  • The error diffusion elimination method based on intra-frame coding uses the forward dependence of motion vectors to track errors accurately and applies intra-frame coding to the macroblocks affected by bit errors, which effectively prevents bit error diffusion.
  • multi-level protection and unequal protection are not realized because there is no convenient solution for providing network condition monitoring and description of the relative importance of data.
  • The Tornado code scheme in the prior art is too complicated and inefficient; when it is applied to the protection of video data the delay is large, so it cannot meet the performance requirements of real-time communication.
  • The prior art protects the video communication stream with a fixed erasure code strategy and cannot adapt to changes in network conditions; the substitution mechanism adopted by the error concealment method may cause error diffusion; and the error diffusion elimination method requires a complicated mechanism or an additional feedback channel, which consumes system resources and network bandwidth.
  • the header information of the NALU is completely encapsulated in the payload, so that the RTP protocol cannot directly know the attributes, levels, importance, and the like of the payload, and thus the QoS mechanism based on this cannot be implemented.
  • Such an encapsulation format also causes the NALU header information to occupy payload resources, because every NALU carries header information. In many cases the header information of multiple NALUs of the same type in one RTP packet is identical, so RTP transmission bandwidth is wasted.
  • the H.264/RTP multimedia communication framework uses a generic coordination control protocol RTCP to transmit QoS reports for QoS monitoring.
  • RTCP itself is not necessarily the most suitable protocol for specific video communication applications such as H.264. Because it opens a separate out-of-band logical channel to transmit QoS reports, it affects network conditions and can lead to conflicts.
  • The key point is that the prior art does not implement a fault-tolerant elastic protection strategy at the transport layer and therefore cannot guarantee the reliability and communication quality of multimedia transmission.
  • a primary object of the present invention is to provide a multimedia communication method and terminal thereof that improve transmission reliability and communication quality.
  • the embodiment of the invention provides a multimedia communication method, including:
  • the transmitting end selects an encoding mode according to the fault-tolerant elastic protection policy, encodes the multimedia data, encapsulates the encoded multimedia data with the real-time transport protocol, and sends it to the receiving end; the receiving end receives the multimedia data and, if the received multimedia data has a transmission error, recovers or partially recovers the erroneous multimedia data.
  • the receiving end measures the communication quality, generates a quality of service report, and sends it back to the transmitting end;
  • the transmitting end adjusts the fault-tolerant elastic protection policy according to the quality of service report. Preferably, the method further comprises:
  • the receiving end implements an error concealment strategy according to the transmission error information of the multimedia data in which the transmission error occurred;
  • the receiving end feeds back the transmission error information to the sending end;
  • the transmitting end implements an error diffusion elimination strategy according to the transmission error information.
  • the real-time transport protocol header information carries code-related information
  • the receiving end recovers or partially recovers the multimedia data according to the code-related information.
  • the transmitting end obtains the location of the lost slice from the transmission error information and performs segmented successive intra-frame coding on the lost slice to implement the error diffusion elimination strategy.
  • a multimedia communication terminal has a basic function module for implementing multimedia communication, and includes a codec module for implementing a multimedia codec function, and further includes:
  • a fault-tolerant elastic transmission control protocol module, configured to receive the multimedia data encoded by the codec module, apply fault-tolerant elastic protection to the multimedia data, and send the protected data to the network side for transmission.
  • The fault-tolerant elastic transmission control protocol module is further configured to receive multimedia data from the network side, perform error correction on it, and pass it to the codec module for decoding.
  • the terminal further comprises:
  • a protection method and policy negotiation module, configured to negotiate the fault-tolerant elastic protection policy between the two communicating parties and determine a protection policy set from which the fault-tolerant elastic transmission control protocol module selects;
  • the terminal further comprises:
  • an error concealment module for implementing the error concealment function;
  • the codec module is used to implement encoding and decoding according to the H.264 standard and also implements the error diffusion elimination function;
  • a network condition analysis and calculation module for analyzing and calculating the network condition and providing the resulting information to the error concealment module and the codec module.
  • the terminal further comprises:
  • a supplemental enhancement information (SEI) message extension processing module, configured to implement the quality of service report and network status report functions and send the reports to the network condition analysis and calculation module.
  • The technical solution of the present invention adopts a fault-tolerant elastic real-time transport protocol (ERRTP). On the basis of the existing RTP, it provides a transport layer encapsulation format that can carry information related to the fault-tolerant elastic coding scheme, so that the corresponding coding scheme information is transmitted together with the multimedia data in ERRTP, thereby integrating the fault-tolerant elastic mechanism into the transport layer;
  • A dedicated ERRTP encapsulation method and header information transformation scheme is given for the H.264 NALU structure: the header information bytes of all NALUs in the same ERRTP packet are combined into the ERRTP header information, using a combination that does not affect the operation of existing RTP protocols and devices and that directly reflects the attributes of the NALU payload in ERRTP. On the one hand the bearer efficiency is greatly improved, and on the other hand a basis is provided for implementing the QoS mechanism;
  • The communication quality is measured by the receiving end and fed back to the transmitting end. The extended message mechanism of the higher-layer media protocol H.264 itself is used directly to carry the QoS report information, which avoids the use of additional channels and realizes an "in-band" QoS reporting mechanism;
  • Unequal protection is combined with the mixed and alternating use of multiple fault-tolerant elastic schemes: the transmitting end selects protection strategies of different levels according to the QoS report and the related network transmission status messages fed back by the receiving end, and, based on the data importance level reflected in the ERRTP header information, it can also apply an appropriate protection strategy to each level of data;
  • An efficient Tornado code scheme is also provided. While ensuring that the data transmission protection capability is not significantly degraded, an erasure code with only one layer of check nodes is used, which reduces the amount of calculation needed to generate the check node layers and shortens the data transmission delay, so that the ratio of protection performance to cost is improved;
  • the above various multimedia communication related enhancement technologies are integrated on the multimedia communication system, and various technologies and protocol architectures are modularized, and various technologies work in coordination with each other to further enhance the reliability of multimedia communication.
  • the transmission structure saves the network transmission bandwidth; the realization of the unequal protection achieves the balance between protection capability and transmission efficiency, facilitates the realization of QoS guarantee for multimedia transmission, further improves the quality of service, reduces redundancy, and improves transmission efficiency. Achieving compatibility with the prior art has improved the robustness of the new method of ERRTP;
  • FIG. 1 is a schematic diagram of the header information structure of an RTP data packet;
  • FIG. 2 is a schematic diagram of the encapsulation format of NALU data in an RTP packet payload;
  • FIG. 3 is a schematic diagram of the format of a QoS report data packet based on the RTCP protocol;
  • FIG. 4 is a schematic diagram of the principle of the Tornado erasure code;
  • FIG. 5 is a schematic diagram of the module structure of a multimedia communication terminal supporting fault-tolerant elasticity according to a first embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a multimedia communication protocol stack according to a first embodiment of the present invention;
  • FIG. 7 is a schematic diagram of the header information structure of an ERRTP data packet according to second and third embodiments of the present invention;
  • FIG. 8 is a schematic diagram of an SEI encapsulation format for carrying a QoS report according to a fourth embodiment of the present invention;
  • FIG. 9 is a diagram showing the principle of error diffusion elimination based on segmented successive intra coding according to a sixth embodiment of the present invention;
  • FIG. 10 is a schematic diagram of the structure of the erasure code of the present invention.
  DETAILED DESCRIPTION OF THE INVENTION
  • the present invention integrates various enhancement techniques on a multimedia communication system, combining the respective advantages of various enhancement techniques to improve system performance, transmission reliability, and communication quality.
  • The enhancements include the fault-tolerant elastic real-time transport protocol (ERRTP, Error Resilience Real-time Transport Protocol), which integrates FEC into the RTP protocol.
  • The invention brings these enhancement technologies together in one multimedia communication system to realize fault-tolerant elastic H.264 video communication. The system includes a main control module, a user interface, a network communication module, I/O and underlying driver modules, various service modules, a communication process control module, application protocol modules, and so on. To implement the enhancement technologies it also includes a protection method and policy negotiation module, an FEC module, an ERRTP module, an RTCP module, an H.264 NAL module, an H.264 encoder module, an H.264 decoder module, an audio codec module, an error concealment module, an SEI message extension processing module, and a network condition analysis and calculation module.
  • a plurality of enhancement technologies are implemented and modularized in a multimedia communication system, mainly referring to a multimedia communication terminal.
  • The device is described in terms of the functional modules of the terminal; a complete diagram of the terminal's internal module structure is shown in FIG. 5. It should be noted that the functional modules mentioned here are all defined functionally, and the specific implementation may be software, hardware, firmware, or a combination of software and hardware.
  • a complete multimedia communication terminal must first contain the following modules:
  • Main control module: responsible for the control of the entire terminal system;
  • User interface module: responsible for user input and output interaction. The user operates the terminal through interface control elements such as menus and buttons, and the module displays feedback such as the current system status, parameters, and network status;
  • Network communication module: responsible for communication with the network, providing TCP, UDP, IP and lower-layer communication protocol stacks such as Ethernet, PPP, ATM, etc.;
  • I/O and underlying driver modules: responsible for driving hardware devices, such as video and audio capture devices and display/playback devices, and for video and audio data input and output;
  • Service modules: implement the various specific services, such as videophone, multi-party conference, video mail, instant messaging, video chat, etc.;
  • Communication process control module: performs control during a specific communication process, for example, in a multi-party conference, applying for the chair, releasing the chair, requesting the floor, controlling the broadcast of a particular site, browsing sites, etc.;
  • Application protocol module: may be a specific application protocol such as the H.323 system (including H.225.0, RAS, H.245, H.235, H.460, etc.) or SIP (Session Initiation Protocol).
  • Protection method and policy negotiation module: responsible for negotiating the protection methods between the communicating parties, determining the allowed set, and then negotiating, on the basis of the allowed set, a strategy for the mixed and alternating use of the protection methods. The negotiation is carried out through the application protocol module. This module controls the FEC module, which in turn implements the different FEC protection modes and functions such as unequal protection and adaptive hierarchical protection;
  • FEC module: supports a variety of FEC protection methods, which can be subclasses of multiple major classes; assume that a total of T different methods are supported. According to the results of the negotiation (from the protection method and policy negotiation module), it protects H.264 video data and audio data (audio is outside the scope of this patent). The module stores the generation rules and parameters of the various FEC subclasses, so it contains an internal database for this data. This module enables the mixed and alternating application of different protection methods;
  • ERRTP module: implements the ERRTP protocol. The ERRTP encapsulation format and the corresponding encapsulation and decapsulation steps for H.264 are described in detail in the following embodiments;
  • RTCP module: implements the normal RTCP function. Although the present invention provides a reporting mechanism based on the H.264 SEI message extension that can report the main RTCP information, the use of RTCP is not excluded, and the two reporting mechanisms can coexist. This is mainly for compatibility and interoperability, since the peer terminal may not support the SEI message extension reporting mechanism;
  • H.264 NAL module: implements the function of the H.264 network abstraction layer;
  • H.264 encoder module: in addition to the normal H.264 encoder function, also implements the error diffusion elimination function of the present invention, using information from the network condition analysis and calculation module;
  • H.264 decoder module: implements the normal H.264 decoder function;
  • Audio codec module: implements the audio codec function; the supported protocol can be ITU-T
  • Error concealment module: implements the error concealment function provided by the present invention, based on information from the network condition analysis and calculation module and the H.264 encoder module;
  • SEI message extension processing module: implements the QoS and network status report functions based on the SEI message extension of the present invention. At the transmitting end it collects data to form RTCP SR and RR reports and sends them out encapsulated in SEI extended messages; at the receiving end it extracts the RTCP SR and RR reports from the SEI extended messages and passes the data to the network condition analysis and calculation module for analysis and calculation;
  • Network condition analysis and calculation module: analyzes the data from the SEI message extension processing module to obtain network status data such as packet loss rate, jitter, delay, and instantaneous end-to-end bandwidth, then uses this data to control the H.264 encoder module and the error concealment module, and also sends it to the user interface module, which can display it to the user.
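  • As an illustration of the kind of calculation the network condition analysis and calculation module might perform on SR/RR-style data, the sketch below derives a packet loss rate from sequence numbers and updates a running jitter estimate. The jitter update follows the well-known RTP interarrival jitter estimator (gain 1/16); whether the terminal described here uses exactly these formulas is an assumption, and the function names are hypothetical. Sequence number wrap-around is not handled.

```python
def loss_rate(first_seq, highest_seq, received):
    """Fraction of packets lost over an interval.

    first_seq / highest_seq: first and highest sequence numbers seen.
    received: number of packets actually received in that interval.
    """
    expected = highest_seq - first_seq + 1
    if expected <= 0:
        return 0.0
    lost = max(expected - received, 0)  # duplicates must not yield negative loss
    return lost / expected


def update_jitter(jitter, transit_prev, transit_cur):
    """One step of the running jitter estimate.

    transit_* are (arrival time - sender timestamp) for two consecutive
    packets, in the same units; the new estimate moves 1/16 of the way
    toward the latest absolute transit-time difference.
    """
    d = abs(transit_cur - transit_prev)
    return jitter + (d - jitter) / 16.0
```

  • The resulting figures could then be handed to the encoder and the error concealment module as described above; the hand-off itself is outside this sketch.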
  • Fig. 6 is a block diagram showing the structure of a multimedia communication protocol stack in accordance with a first embodiment of the present invention.
  • The H.264/ERRTP multimedia delivery architecture of the present invention differs from the traditional H.264/RTP architecture mainly in that:
  • An "SEI extended reporting layer" is added between the H.264 VCL layer and the NAL layer. This layer implements QoS monitoring and network transmission status reporting based on SEI extended messages.
  • An "FEC layer" is added between the H.264 NAL layer and the ERRTP/RTCP layer. This layer implements node partitioning, encoding, and encapsulation for the H.264 NALU data stream.
  • The first embodiment of the present invention gives the basic module structure and protocol stack composition using a typical H.264 service as an example. For other multimedia communication protocols or applications that appear in the future, it is only necessary to implement the relevant technical details for the specific application on the basis of the principle of the present invention; the object of the invention is achieved without affecting its essence and scope.
  • The present invention proposes an improved RTP protocol that supports fault-tolerant elasticity. It aims to integrate the fault-tolerant elastic mechanism into the transport layer protocol, which simplifies the transmission structure and reduces complexity, and also makes the fault-tolerant elastic mechanism more flexible and transmission more reliable. Because of its fault tolerance, the present invention calls this improved RTP protocol the fault-tolerant elastic real-time transport protocol (ERRTP or ER2TP, Error Resilience Real-time Transport Protocol).
  • The main difference between ERRTP and RTP is that the ERRTP packet header information extension can carry information about the fault-tolerant elastic coding scheme, such as FEC type, protection capability, and coding parameters.
  • On this basis, the present invention conveniently realizes unequal protection. First, protection measures with different protection capabilities are made available; the transmitting end then collects information such as the network status and the importance of the multimedia data and uses these factors to select an appropriate protection measure, achieving unequal protection and a balance between protection capability and transmission efficiency. Since the FEC-related information is carried in every ERRTP data packet, the transmitting end only needs to fill the information of the selected scheme into the ERRTP header information, and the receiving end can recover the data or correct errors accordingly.
  • the specific implementation method based on erasure code protection is given, including the steps of dividing, generating, encapsulating and decapsulating data nodes and check nodes.
  • A series of NALUs is divided equally into several data nodes, and the Tornado code is then used to generate the check nodes. All of these nodes are distributed over several ERRTP packets, and the receiver performs the inverse process.
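  • The generation and recovery of check nodes can be sketched for the simplest case of a single layer of check nodes. The sketch assumes each check node is the XOR of a subset of equal-length data nodes and that the receiver recovers losses by repeatedly solving any check that has exactly one missing neighbour. The `graph` argument (which data nodes feed which check node) stands in for the negotiated generation parameters; its concrete shape and all function names here are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_checks(data_nodes, graph):
    """Generate one check node per entry of graph.

    graph[j] lists the indices of the data nodes XORed into check node j.
    """
    size = len(data_nodes[0])
    checks = []
    for members in graph:
        acc = bytes(size)  # all-zero block
        for i in members:
            acc = xor_bytes(acc, data_nodes[i])
        checks.append(acc)
    return checks


def recover(data_nodes, checks, graph):
    """Recover lost data nodes (given as None) where the graph allows it.

    Any received check with exactly one missing data neighbour yields that
    neighbour; the loop repeats until no further progress is possible.
    Nodes that cannot be recovered remain None (partial recovery).
    """
    data = list(data_nodes)
    progress = True
    while progress:
        progress = False
        for j, members in enumerate(graph):
            if checks[j] is None:
                continue  # this check node was lost as well
            missing = [i for i in members if data[i] is None]
            if len(missing) == 1:
                acc = checks[j]
                for i in members:
                    if data[i] is not None:
                        acc = xor_bytes(acc, data[i])
                data[missing[0]] = acc
                progress = True
    return data
```

  • For instance, with four data nodes and `graph = [[0, 1], [1, 2], [2, 3]]`, losing data nodes 1 and 2 is still recoverable: check 0 restores node 1, after which check 1 restores node 2.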
  • the transmitting and receiving parties implement unequal protection based on ERRTP.
  • the main steps are as follows:
  • the transmitting end selects a fault-tolerant elastic coding scheme to perform erasure coding on the multimedia data.
  • ERRTP encapsulates the encoded multimedia data, and carries information related to the fault-tolerant elastic coding scheme in the ERRTP header information, and then sends the information to the receiving end;
  • the receiving end decapsulates the received ERRTP packet, and extracts the information about the fault-tolerant elastic coding scheme from the ERRTP header information, and then selects the fault-tolerant elastic coding scheme for fault-tolerant elastic decoding according to the information of the fault-tolerant elastic coding scheme to obtain the multimedia data.
  • the unequal protection is reflected in the fact that the transmitting end selects the fault tolerant elastic coding scheme according to the current network transmission status and/or the quality of service level of the multimedia data to be transmitted.
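  • A minimal sketch of such a selection rule follows. It assumes the protection schemes are numbered from weakest (0) to strongest, that the network status is summarized by a packet loss rate, and that data importance is a small non-negative integer; the thresholds and the function name `select_protection_level` are hypothetical and would in practice come from negotiation and tuning.

```python
import bisect

# Hypothetical loss-rate thresholds separating protection levels 0..3.
LOSS_THRESHOLDS = [0.01, 0.05, 0.10]


def select_protection_level(loss_rate, importance=0, num_levels=4):
    """Pick a protection level from network status and data importance.

    loss_rate: measured packet loss rate in [0, 1].
    importance: 0 for least important data; higher values push the choice
                toward stronger protection.
    Returns an index in [0, num_levels - 1]; larger means stronger protection.
    """
    base = bisect.bisect_right(LOSS_THRESHOLDS, loss_rate)
    return min(base + importance, num_levels - 1)
```

  • Raising the level when the loss rate crosses a threshold reflects adaptation to the network state, while the importance offset lets more important data receive stronger protection under the same network conditions.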
  • FIG. 7 is a diagram showing the structure of an ERRTP header information according to a first embodiment of the present invention.
  • The header information extension is followed by fields that carry the information about the fault-tolerant elastic coding scheme: the fault-tolerant elastic coding type field, the fault-tolerant elastic coding subtype (parameter) field, the packet length field, and the number-of-packets field.
  • The fault-tolerant elastic coding type field indicates the erasure code type used by the fault-tolerant elastic coding scheme. It is also referred to as the FEC Type field, since it indicates the FEC coding type. It is 4 bits long and can represent 16 different FEC types, which is sufficient in practical applications.
  • the types defined here are actually large types, and will continue to be subdivided into various schemes, called subtypes.
  • the large types in actual applications are, for example, 0010 for Tornado code and 0011 for RS code.
  • This field identifies 16 different types of FEC codes.
  • The two parties need to agree on a look-up table (LUT), FECTypeLUT, that gives the correspondence between FEC encoding types and encoding type codes.
  • The fault-tolerant elastic coding subtype field indicates the parameter settings of the fault-tolerant elastic coding scheme. For each type of FEC coding, the various parameter settings must also be fixed before the coding can actually be carried out, and this field serves to identify those parameters. Since the resources in the ERRTP header information are limited, it is impossible to list the specific parameters and rules of every FEC coding scheme there, so the first embodiment of the present invention uses the concept of subtypes to indicate the alternative parameter setting schemes.
  • This field is also called the FEC encoding subtype field, FEC Subtype, and occupies 9 bits. It represents the subtypes into which each of the major types defined in the FECTypeLUT is further subdivided.
  • MTU Maximum Transport Unit
  • The number-of-packets field, also called the Packet Number field, occupies 8 bits and indicates the number of data nodes carried by the ERRTP packet. For example, after several NALUs have been protected by the forward error correction code and encapsulated in multiple ERRTP packets, it gives the number of data nodes carried in each ERRTP packet.
  • the decoding end or the network node can verify the received data packet according to the FEC code type and the check type of the data packet given by the field, and recover the lost data packet.
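  • The field layout can be illustrated by packing the four fields into one 32-bit word. The widths of FEC Type (4 bits), FEC Subtype (9 bits), and Packet Number (8 bits) come from the description; the 11-bit width of the packet length field and the field order are assumptions made here so that the fields total 4 bytes, and the function names are hypothetical.

```python
import struct


def pack_fec_fields(fec_type, fec_subtype, packet_length, packet_number):
    """Pack the FEC-related fields into 4 bytes, network byte order.

    Assumed layout, most significant bit first:
    FEC Type (4) | FEC Subtype (9) | packet length (11, assumed) | Packet Number (8)
    """
    if not (0 <= fec_type < 16 and 0 <= fec_subtype < 512
            and 0 <= packet_length < 2048 and 0 <= packet_number < 256):
        raise ValueError("field value out of range")
    word = (fec_type << 28) | (fec_subtype << 19) | (packet_length << 8) | packet_number
    return struct.pack("!I", word)


def unpack_fec_fields(raw):
    """Inverse of pack_fec_fields; returns (type, subtype, length, number)."""
    (word,) = struct.unpack("!I", raw[:4])
    return (word >> 28) & 0xF, (word >> 19) & 0x1FF, (word >> 8) & 0x7FF, word & 0xFF
```

  • A receiver would call `unpack_fec_fields` on the extension bytes and use the first two values to look up the decoding module and its parameters.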
  • The FEC Subtype field mentioned above has a total of 9 bits for encoding the alternative parameter setting schemes. The technical details of how this coding indication is performed in the first embodiment of the present invention are given below.
  • the sending and receiving parties need to negotiate to determine the field indicating the relationship correspondence table.
  • the sender and the receiver negotiate to determine: for various types of FEC codes, the correspondence between the value of the FEC Subtype and the related parameter setting scheme of the FEC code indicated, and various alternatives. Specific parameter settings.
  • the sender and the receiver both establish a correspondence table according to the negotiation result, and are configured to query the corresponding FEC coding type or FEC codec processing module according to the FEC Type and FEC Subtype fields;
  • the transmitting end calls the corresponding erasure coding processing module to perform erasure coding
  • the receiving end calls the corresponding erasure decoding processing module to perform erasure decoding.
  • The so-called generation rule is the rule or algorithm by which the data nodes are processed at the transmitting end to generate each check node. The reverse is done at the receiving end: if packet loss occurs during transmission, that is, some nodes are lost, the lost nodes can be recovered or partially recovered according to the generation rule. The generation rule is therefore very important information, on the basis of which both communicating parties operate the FEC mechanism.
  • Each of the FEC types listed in the FECTypeLUT has a different generation rule. Within each major class, such as the Tornado code, the generation rule of a subclass is the class generation rule combined with specific generation parameters. So for each subclass the generation rule is combined with the generation parameters.
  • The generation parameters include the following data: the total number of data nodes, the total number of check nodes, the number of check node layers, the ratio by which the number of nodes shrinks between successive layers, and the incidence matrices describing the node associations between successive layers. If there are L layers of check nodes, there are L such incidence matrices, or an equivalent parametric mathematical representation of the bipartite graph relating the nodes of each two successive layers. When the major generation rule is the same, the generation parameters largely determine the protection strength of the subtype.
  • For the Tornado code, among the generation parameters given above, the total number of data nodes and the total number of check nodes largely determine the protection capability (strictly speaking, all the generation parameters are needed to determine it fully). According to a few main parameters that have the greatest effect (the representative generation parameters), the subclasses under a major class can be arranged in order of protection strength from weak to strong (ascending order), creating a LUT called FECSubTypeLUT.
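  • One way to build such an ordered table is sketched below. It assumes that the ratio of check nodes to data nodes is an adequate proxy for protection strength, which the description suggests only as an approximation; the class `GenParams` and the function `build_subtype_lut` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenParams:
    """Representative generation parameters of one FEC subtype."""
    data_nodes: int
    check_nodes: int
    check_layers: int = 1


def build_subtype_lut(candidates, max_entries=512):
    """Order subtypes from weakest to strongest and assign subtype codes.

    Strength is approximated by check_nodes / data_nodes (an assumption).
    max_entries reflects the 9-bit FEC Subtype field (2**9 = 512 codes).
    Returns a dict mapping subtype code -> GenParams.
    """
    ordered = sorted(candidates, key=lambda p: p.check_nodes / p.data_nodes)
    if len(ordered) > max_entries:
        raise ValueError("too many subtypes for the subtype field")
    return dict(enumerate(ordered))
```

  • With the table ordered this way, a larger subtype code always means at least as much redundancy, so moving to stronger protection is a matter of increasing the code.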
  • How many subtypes each major type supports is determined by the specific application, the capabilities of the communication terminals (CPU processing speed, memory, program complexity, etc.), and the requirements. If the communication environment changes a lot and network performance fluctuates widely, more subtypes generally need to be supported; otherwise fewer suffice. The communicating parties can agree on this through the capability negotiation process before communication begins. Negotiation can be carried out through the current mainstream multimedia communication framework protocols such as H.323 or the Session Initiation Protocol (SIP).
  • Each major type corresponds to an FEC processing module at the transmitting end, which is responsible for generating the check nodes, and to an FEC processing module at the receiving end, which is responsible for recovering the nodes.
  • both parties of the communication decide which FEC processing module to call and which generation parameters to read based on the information of the two information fields FEC Type and FEC Subtype.
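  • This lookup can be sketched as two negotiated tables. The type codes 0010 (Tornado) and 0011 (RS) are taken from the description; the module names, the subtype entries, and the parameter values are placeholders standing in for whatever the two parties actually negotiate.

```python
# FEC Type code -> processing module name (codes from the description).
FEC_TYPE_LUT = {0b0010: "tornado", 0b0011: "rs"}

# Module name -> {FEC Subtype code -> generation parameters}; values are
# illustrative placeholders, not taken from the source.
FEC_SUBTYPE_LUT = {
    "tornado": {
        0: {"data_nodes": 20, "check_nodes": 2},
        1: {"data_nodes": 20, "check_nodes": 4},
    },
    "rs": {
        0: {"data_nodes": 20, "check_nodes": 4},
    },
}


def resolve(fec_type, fec_subtype):
    """Map the two header fields to (module name, generation parameters)."""
    try:
        module = FEC_TYPE_LUT[fec_type]
        params = FEC_SUBTYPE_LUT[module][fec_subtype]
    except KeyError:
        raise ValueError(
            f"unknown FEC type/subtype: {fec_type:#06b}/{fec_subtype}"
        ) from None
    return module, params
```

  • Both ends must hold identical tables for the lookup to be meaningful, which is why the tables are fixed during negotiation before communication starts.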
  • The second embodiment of the present invention gives the specific steps for FEC encoding and decoding of an H.264 NALU data stream with ERRTP, as follows.
  • The sender merges multiple (say S) H.264 NALUs into one group for unified coding and transmission.
  • The S NALUs are re-divided into equal-length blocks, the number of blocks being M. These M blocks are the data nodes.
  • That is, S H.264 NALUs are grouped together and concatenated end to end to form one large block, which is then divided equally into M data blocks, each K bytes long.
  • If the total number of bytes TB of the large block is not divisible by M, a rounding operation is performed so that the length of each data block is Ceiling(TB/M) bytes, where Ceiling denotes rounding up: Ceiling(x) is the smallest integer not less than x, for any real number x. Zero padding is then applied to some data blocks so that their length equals Ceiling(TB/M) bytes.
  • FEC encoding is performed on the M data nodes to obtain N check nodes.
  • To generate the N check blocks from the M data blocks, the method described above is used: the FEC Type and FEC Subtype information determines which FEC processing module is called to generate the check blocks.
  • the sender encapsulates all data nodes and check node packets in the ERRTP packet for transmission.
  • the fields should be set as follows:
  • FEC Type = 0010, indicating the use of the Tornado code;
  • Packet Number = (M+N)/P, which represents the number of data nodes carried in one ERRTP payload, where P is the number of ERRTP packets in a group.
  • after receiving the ERRTP packets, the receiver decapsulates the data nodes and the check nodes.
  • the receiving end works in groups of P packets and starts decoding and recovery each time a group of P packets has been received. The number of packets in a group is agreed between the two parties.
  • the receiving end performs fault-tolerant elastic decoding on the data node according to the check node.
  • the processing module decodes and recovers all or part of the lost data. Finally, once the complete data nodes are obtained, they are merged again into the large block, which is divided into the S NALUs in the same way as at the sender.
  • the above example uses the ERRTP-based anti-data packet loss algorithm, which can greatly improve the anti-data packet loss capability of the video code stream when the codeword is less than 17%.
  • only 4 bytes have been added to the RTP payload header structure, so transmission efficiency is essentially unaffected while a significant practical benefit is achieved.
  • Another key technical point of the present invention, already mentioned above, is the implementation of unequal protection. It is embodied mainly in two aspects: one is to select the appropriate coding scheme or parameters, that is, the aforementioned FEC coding type and subtype, according to the importance level of the multimedia data; the other is to select them according to the network conditions at different times. These two aspects are called the hybrid and the alternate use of the various FEC coding schemes. Hybrid refers to using multiple FEC subtypes at the same time, mainly to protect data of different importance. Alternation refers to using different FEC subtypes at different times (under different network conditions).
  • the header byte reflects the importance of the data, so the sender can evaluate the QoS level according to the NRI field or Type field in the NALU header information and then select the fault-tolerant elastic coding scheme, that is, determine the FEC Type field and the FEC Subtype field.
  • general network transmission has a corresponding network condition monitoring mechanism. Through these mechanisms the transmitting end obtains the transmission reports fed back by the receiving end, evaluates the network transmission status, and then selects the fault-tolerant elastic coding scheme, that is, determines the FEC Type and FEC Subtype fields.
  • the H.264 code stream is transmitted or stored based on the NALU, which consists of NAL header information and NAL payload.
  • different NALU types have different effects on decoding and restoring images. For example, an NRI value of 0 means that the NALU holds a slice or slice data partition of a non-reference image, whose loss does not affect subsequent decoding; a non-zero NRI means that the NALU holds a sequence/picture parameter set or a slice or slice data partition of a reference image, whose loss seriously affects subsequent decoding.
  • the data of H.264 can be classified into two types according to the value of NRI or Nal_unit_type: one is relatively important image data (for example, Nal_ref_idc equal to 1); the other is secondary image data (for example, Nal_ref_idc equal to 0).
  • the important image data is protected by the FEC1 code, with high redundancy and strong packet-loss resistance; the secondary image data can be protected by the FEC2 code, with less redundancy and weaker packet-loss resistance.
  • FEC1 and FEC2 are just general notations representing any two subtypes. The two subtypes can belong to the same major type or to different major types.
  • the above method can be extended to a more general case in which the data is divided into more classes according to the value of Nal_unit_type, for example five classes: the most important data, the second most important data, data of ordinary importance, less important data, and the least important data. It can also be divided into 7 or more classes and protected with the same number of FEC subtypes, each class of data corresponding to a different subtype. As long as their protection ability ranges from weak to strong, these subtypes do not need to belong to the same major type.
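  • The two-class case above can be sketched as follows. This is only an illustration: the labels "FEC1" and "FEC2" stand for whatever stronger and weaker subtypes an application chooses, and the function names are assumptions. The bit positions follow the NALU header layout of 1 F bit, 2 NRI bits, and 5 Type bits.

```python
def nal_ref_idc(nalu_header_byte):
    """Return the 2-bit NRI field (bits 6..5 of the NALU header byte)."""
    return (nalu_header_byte >> 5) & 0b11


def choose_fec(nalu_header_byte):
    """Pick a protection class from the NALU header alone.

    Hypothetical labels: "FEC1" is the stronger, more redundant code,
    "FEC2" the weaker, cheaper one.
    """
    return "FEC1" if nal_ref_idc(nalu_header_byte) != 0 else "FEC2"
```

  • For example, a header byte of 0x65 (NRI = 3) would be mapped to the stronger code, while 0x01 (NRI = 0) would be mapped to the weaker one.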
  • for image information that still cannot be restored after protection by the FEC code with the strongest protection ability, techniques such as error concealment and prevention of error propagation are adopted.
  • Another situation in which unequal protection is also within the scope of the present invention is the ability to select FECs of different protection capabilities depending on the real-time conditions of the network.
  • the ERRTP header information is then used to inform both parties of the communication so that they can correctly decode the data and recover the lost data.
  • for image information that still cannot be recovered after protection by the strongest FEC code, error concealment and error-propagation prevention techniques are adopted. Network conditions can be sensed through the various existing QoS monitoring methods.
  • the data importance level and the network status level are in ascending order.
  • the subscript of FEC is two-dimensional: the fault-tolerant elastic mechanism FEC(i, j) in the table, with i lying between 0 and U and j lying between 0 and V, may be any one of the above T FEC schemes.
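  • A two-dimensional selection of this kind can be sketched as a table lookup. The table contents below are purely hypothetical: rows are assumed to be data importance levels, columns network status levels, and each entry an (FEC Type, FEC Subtype) pair, with the Tornado type code 0010 taken from the field settings given earlier and the subtype numbers invented for illustration.

```python
TORNADO = 0b0010  # FEC Type value for the Tornado code, as given in the field settings

# Hypothetical table: FEC_TABLE[i][j] for importance level i and network level j.
# Subtype numbers are placeholders; a larger number is assumed to mean more redundancy.
FEC_TABLE = [
    [(TORNADO, 0), (TORNADO, 1), (TORNADO, 2)],
    [(TORNADO, 1), (TORNADO, 2), (TORNADO, 3)],
    [(TORNADO, 2), (TORNADO, 3), (TORNADO, 4)],
]


def select_fec(importance_level, network_level):
    """Return the (FEC Type, FEC Subtype) pair for the given pair of levels."""
    return FEC_TABLE[importance_level][network_level]
```

  • The selected pair would then be written into the FEC Type and FEC Subtype fields so that the receiver calls the matching processing module.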
  • an improved Tornado erasure code is specifically employed.
  • the improved Tornado erasure code generates only one layer of the check node for a group of data nodes.
  • the coding delay is greatly reduced to meet the requirements of real-time communication.
  • the use of FEC code packet protection introduces a delay, the size of which is related to the size of the image data packet.
  • the S NALUs are grouped into one group, and one NALU contains the stream data of one slice. If each image frame is coded as a single slice, the encoding end incurs a delay of S frames, and the decoding end likewise incurs a delay of S frames.
  • the relationship between NALU and the number of data nodes is as follows:
  • the delay, counted in units of the duration T of one image frame, is basically determined by the value of S, and the number of data nodes (DataNode) in turn greatly affects the value of S. Therefore, on the premise of ensuring the packet-loss resistance of video communication, the delay introduced by FEC is minimized, which further ensures the QoS of real-time video communication.
  • the present invention employs an improved Tornado code protection algorithm in the case where the DataNode is limited.
  • the improved Tornado method does not use encoding over multi-level bipartite graphs, but only encoding with a single layer of check nodes.
  • the improved coding method greatly improves the flexibility of the algorithm.
  • the number of data nodes and check nodes can be set arbitrarily, and the complexity of the codec algorithm is also reduced, so it can be used for packet-loss resistance in real-time video communication.
  • In addition, the packet-loss resistance of the improved Tornado code is basically not reduced when the number of data nodes is limited. The specific principles and detailed steps of the improved Tornado coding method will be described in detail later.
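  • To make the idea of a single layer of check nodes concrete, the sketch below uses a generic XOR-based erasure code over one bipartite graph, with iterative (peeling) recovery. It is an assumption-laden illustration, not the improved Tornado construction itself, whose details are given elsewhere; the hand-picked graph and all function names are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_nodes, graph):
    """graph[c] lists the data-node indices XORed together into check node c."""
    checks = []
    for neighbours in graph:
        acc = bytes(len(data_nodes[0]))          # all-zero block
        for i in neighbours:
            acc = xor_bytes(acc, data_nodes[i])
        checks.append(acc)
    return checks


def decode(received, checks, graph):
    """Recover lost data nodes (marked None) using the surviving check nodes.

    A check node with exactly one missing neighbour restores that neighbour;
    the loop repeats until no further node can be restored.
    """
    data = list(received)
    progress = True
    while progress:
        progress = False
        for check, neighbours in zip(checks, graph):
            if check is None:
                continue                         # this check node was lost too
            missing = [i for i in neighbours if data[i] is None]
            if len(missing) == 1:
                acc = check
                for i in neighbours:
                    if data[i] is not None:
                        acc = xor_bytes(acc, data[i])
                data[missing[0]] = acc
                progress = True
    return data


# Illustrative use: M = 4 data nodes, N = 3 check nodes, hand-picked graph.
data = [b"aa", b"bb", b"cc", b"dd"]
graph = [[0, 1], [1, 2], [2, 3]]
checks = encode(data, graph)
recovered = decode([data[0], None, None, data[3]], checks, graph)
```

  • Nodes that cannot be restored stay as `None`, which is where error concealment would take over.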
  • ERRTP processes the same type of NALU and integrates the header information into the ERRTP header information.
  • the most basic difference from RTP is that in the ERRTP encapsulation process, the header information of the NALU packet with the same header information is integrated into the header information of ERRTP.
  • NALU header information structure has already been mentioned.
  • the NALU header information includes: a 1-bit F field, which is used to indicate whether the NALU is in error;
  • a 2-bit NRI field indicating the importance of the NALU;
  • a 5-bit Type field indicating the type of the NALU.
  • the execution steps of both the sender and the receiver are as follows.
  • the sender encapsulates multiple NALU data nodes or check nodes with the same header information in the same ERRTP packet in the ERRTP encapsulation format.
  • one approach is to accumulate NALUs of the same type until a given number is reached and then package them into one ERRTP packet; if the number of same-type NALUs falls short, RTP padding can be used, which wastes some bandwidth, though insignificantly. Another approach is to fall back to plain RTP encapsulation when there are many NALUs of different types.
  • the receiving end can identify according to the ERRTP identifier and perform corresponding processing.
  • the header information shared by the carried NALUs is integrated into the header information of the ERRTP packet; the carried NALUs have their header information removed and are then partitioned, encoded, and encapsulated according to the aforementioned process and placed into the payload of the ERRTP packet. How, then, is the NALU header integrated into the ERRTP header? Two solutions are given below.
  • the NRI field and the Type field in the NALU header information are filled into the PT field of the ERRTP header information, as described above; the PT field occupies the lower 7 bits of the second byte of the ERRTP header information.
  • the format of such an ERRTP header has been given in Figure 7, where the difference from RTP has been indicated in bold, and some places in the other figures are explained later.
  • the V field in the ERRTP header is used as the ERRTP identifier, as mentioned above; the F field in the NALU header information is filled into the M field of the ERRTP header information, the M field being the first bit of the second byte of the ERRTP header information.
  • at the receiving end, the M field of the ERRTP packet is used to judge whether the NALU carried by the ERRTP packet is in error, which realizes the forbidden-bit function of the F field. The scheme thus uses the difference in version to tell the receiver that the packet follows ERRTP rather than RTP, so the subsequent processing must follow the processing flow of the ERRTP protocol.
  • the NALU header information byte (8 bits) is replaced by the 1-bit M field of the original RTP header information together with the 7-bit PT field, 8 bits in total.
  • the specific replacement order can be as follows:
  • the 2 NRI bits replace the highest 2 bits of the 7 PT bits;
  • the 5 Type bits replace the lowest 5 bits of the 7 PT bits.
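  • The replacement order above can be written out as a short bit-packing sketch. The function name `pack_m_pt` is an assumption; the bit layout follows the text (F into M, NRI into the top 2 PT bits, Type into the low 5 PT bits).

```python
def pack_m_pt(nalu_header_byte):
    """Map a NALU header byte onto the M bit and 7-bit PT field of the header.

    Returns (m, pt, second_byte), where second_byte is the M bit followed by PT.
    """
    f = (nalu_header_byte >> 7) & 0x1        # forbidden bit F
    nri = (nalu_header_byte >> 5) & 0x3      # 2-bit NRI
    nal_type = nalu_header_byte & 0x1F       # 5-bit Type
    m = f                                    # F is carried in the M field
    pt = (nri << 5) | nal_type               # NRI in top 2 bits, Type in low 5 bits
    return m, pt, (m << 7) | pt
```

  • Because the field order is the same on both sides, the resulting second header byte has the same bit pattern as the original NALU header byte in this scheme.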
  • the coding-related information in the ERRTP header, such as FEC Type, FEC Subtype, and Packet Number, is used to identify the coding mode used and the multimedia data packets.
  • the receiving end restores or partially restores the multimedia data according to the encoding related information.
  • the PT 7 bits are inherently free to use, as mentioned earlier.
  • the purpose of the M field is specified in RTP (RFC 3550) as follows: a specific profile can specify that the M bit is not used and is instead merged into the PT, so that the PT can have up to 8 bits and distinguish 256 different types. Therefore, replacing the M bit with the F bit is fully RTP-compliant and does not cause interworking problems between ERRTP and traditional RTP.
  • the encapsulation format of the ERRTP of the present invention has three obvious advantages. First, the overhead is small; especially when there are multiple NALUs in one RTP packet, the number of transmitted bits is clearly reduced. Second, the relative importance of the NALUs can be determined without decoding the H.264 NALU data in the RTP packet. Third, without decoding the H.264 NALU data in the RTP packets, it can be identified whether the RTP packets can be decoded correctly despite the loss of other bits.
  • the 7 bits of the PT in the ERRTP header information are copied to the lowest 7 bits of a byte H (8 bits), and the highest bit of H is set to 0 as the F bit.
  • the generated H bytes are then appended to the top of each extracted NALU, thus restoring each NALU.
  • if the F field carried in the ERRTP header is 1, the NALUs in the ERRTP packet are in error, so the packet can be discarded directly, which saves processing time.
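  • The receiver-side restoration just described might look like the following sketch. It assumes the payloads have already been decoded and split back into headerless NALU bodies; the function name `restore_nalus` is hypothetical.

```python
def restore_nalus(second_header_byte, nalu_bodies):
    """Rebuild full NALUs from the shared header bits and the headerless bodies.

    second_header_byte holds the M bit (carrying F) followed by the 7-bit PT.
    """
    if second_header_byte >> 7:              # F carried in M is 1: NALUs are in error
        return []                            # discard the whole packet
    h = second_header_byte & 0x7F            # low 7 bits from PT, top bit (F) set to 0
    return [bytes([h]) + body for body in nalu_bodies]
```

  • Every NALU in the packet receives the same header byte H, which is what allows the header to be sent only once per packet.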
  • the second solution is given below, which is the same as the first one, that is, the NRI and Type fields in the NALU header are also filled into the 7 bits of the PT field of the ERRTP header.
  • the M field is used to identify ERRTP. One problem that comes with this is that the F field has no place to be filled.
  • an erroneous NALU is still transmitted with the original RTP, while a normal NALU is transmitted with ERRTP and the F bit is ignored. The specific details are as follows.
  • the M field is set to 1 to identify the ERRTP packet; it is the first bit of the 2nd byte of the ERRTP header information.
  • for the F bit, the H.264 protocol specifies: it is 1 if there is a syntax violation or an error.
  • when the network recognizes a bit error in this unit, it can set the bit to 1 so that the receiver drops the unit. This is mainly used to adapt to different kinds of network environments, such as combined wired and wireless environments.
  • the specific usage principle is: generally, when the transmitting end and the receiving end of the communication perform H.264 encoding and decoding on the video, the encoding end does not perform a "write" operation on the bit, and the decoding end performs a "read" operation on the bit.
  • if the bit is set, the receiving end will discard the NALU during the decoding process.
  • the "write" operation on the F bit is performed mainly at a gateway between two different networks, for example in the case of transcoding (MPEG-4 to H.264, H.263 to H.264, etc.).
  • the present invention ignores the F bit, which does not conflict with the original H.264 definition.
  • the M field originally used to fill the F bits can be reserved, and the future extension carries more information, which is used to identify the ERRTP packet.
  • the present invention handles this case as follows: in the ERRTP encapsulation format, the F field in the NALU header information is ignored; at the transmitting end, an erroneous NALU, whose F field is set, is still encapsulated in an RTP packet, and only normal NALUs are encapsulated in ERRTP; at the receiving end, it is judged whether a packet is an ERRTP or an RTP packet, and the packet is processed according to the corresponding encapsulation format.
  • in the special cases where the F bit is used, it serves the purpose of the original H.264 definition, that is, to indicate a possible H.264 NALU syntax error. If an intermediate device such as a gateway, while encoding video according to the H.264 protocol, finds that a certain NALU has a syntax error, that NALU is encapsulated separately.
  • the sender first determines whether the F field in the header information of at least one NALU is valid, and accordingly divides it into a normal NALU and an error NALU;
  • the normal NALU is encapsulated into an ERRTP packet, and the ERRTP identifier is set; the error NALU is encapsulated into an RTP packet according to the RTP encapsulation format;
  • the receiving end first determines whether the header information of the received packet is set to the ERRTP identifier, and divides it into an ERRTP packet and an RTP packet;
  • the ERRTP packet is then processed according to the ERRTP encapsulation format, and the RTP packet is processed according to the RTP packet encapsulation format.
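  • The four steps of this second solution can be sketched as two small decisions, one at each end. The names are illustrative assumptions; the only facts used are that the F bit is the top bit of the NALU header byte and that, in this solution, an M bit of 1 marks an ERRTP packet.

```python
def split_by_f_bit(nalus):
    """Sender side: separate normal NALUs (F = 0) from error NALUs (F = 1)."""
    normal = [n for n in nalus if (n[0] >> 7) == 0]
    error = [n for n in nalus if (n[0] >> 7) == 1]
    return normal, error      # normal -> ERRTP packets, error -> plain RTP packets


def packet_format(second_header_byte):
    """Receiver side: the M bit (top bit of the 2nd header byte) marks ERRTP."""
    return "ERRTP" if second_header_byte >> 7 else "RTP"
```

  • The receiver would call `packet_format` on each incoming packet and hand it to the matching decapsulation routine.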
  • for normal NALUs, the gateway performs ERRTP encapsulation of same-type H.264 NALUs as described above, according to certain rules (determined by the specific application, and mainly stipulating how many NALUs of the same type are encapsulated in each ERRTP packet);
  • for an erroneous NALU, regular RTP encapsulation is required.
  • the regular RTP packet may contain only one H.264 NALU.
  • the biggest advantage of integrating the NALU header information into the ERRTP header information is that a multimedia transmission device can learn the relevant information of the NALUs it carries directly from the ERRTP header information and implement QoS policies for real-time delivery of H.264 multimedia data accordingly.
  • this is not possible in existing RTP, because the RTP layer is not concerned with NALU-layer information and cannot know the header information of each NALU in the payload, so the QoS policy cannot be implemented.
  • on the basis of ERRTP, in order to achieve feedback from the receiving end, an enhancement is added in which the SEI carries the QoS report.
  • RTCP provides a QoS reporting mechanism, but it is in fact a general reporting method that can be used to report QoS as well as other information. For specific video communication applications, reporting with RTCP is not necessarily the most appropriate.
  • H.264 can be considered to carry the reported content. Based on this starting point, the present invention directly uses H.264 to carry QoS report information, which avoids the use of additional channels and implements an "in-band" reporting mechanism.
  • Another basis for transmitting QoS reports through the higher-layer H.264 protocol is that in current video communication applications, the adaptation measures for network transmission are taken mainly by terminals rather than by network intermediate devices such as routers, switches, or gateways. Therefore, the encapsulation of the QoS report does not depend on the underlying protocol.
  • the terminal can understand the QoS report information carried in the H.264 to implement QoS monitoring, so it can be independent of the underlying RTCP and other protocols.
  • using the "in-band" reporting mechanism of H.264 does not exclude the RTCP reporting mechanism.
  • the two mechanisms can be used alternatively or can coexist, and the use of H.264 can reduce the reporting traffic of RTCP.
  • H.264 packets can take multiple protection measures, and an H.264 packet bearing the QoS report can be considered important data and given high-strength protection according to the principle of Unequal Protection (UEP). This ensures the correct arrival of the report data and improves the reliability of the QoS monitoring.
  • the H.264-based extended message mechanism to carry QoS reports is roughly divided into the following three basic steps.
  • each multimedia communication terminal statistically generates a QoS report of H.264 multimedia communication.
  • the content of these reports may be the same as the SR and RR report contents of RTCP, or may differ, but the information described, such as the quality of service of the H.264 media communication and the network status, is the same in kind;
  • the terminal carries the QoS report by using the H.264 extended message and sends it to other communication terminals.
  • the H.264 extended message mechanism has been mentioned above; the present invention basically uses the SEI message.
  • Later extensions of H.264 can also use other extended message payloads;
  • while sending its own QoS report, the terminal also receives the QoS reports sent by other terminals. Each terminal then executes its QoS policy according to these QoS reports.
  • the present invention uses the SEI message to carry the QoS report.
  • the main content of the RTCP SR and RR reports can be used directly as the payload of the H.264 SEI message, so that these reports are carried by the extended SEI message.
  • a specific SEI extended message is defined specifically for carrying QoS reports.
  • the invention stores the SR and RR report messages similar to RTCP in the SEI domain, which not only ensures the transmission efficiency, but also effectively feeds back the channel state and the decoded information, and facilitates the interactive anti-data packet loss between the encoding end and the decoding end.
  • the specific structure is shown in Figure 8, except that the header information is arranged according to the SEI message structure, and other QoS report contents are drawn from the format of the SR and RR reports of RTCP.
  • the first byte (byte 0) is the payload type field (SEI Type), which is used to indicate that the payload is a corresponding QoS report;
  • the second and third bytes are the payload length field (SEI Packet-Length), which is used to indicate the corresponding QoS report length, the same as the length field in the RTCP QoS report;
  • the 4th byte and later are the payload of the SEI message, that is, used to fill the corresponding QoS report.
  • the QoS report is also divided into the sender report and the receiver report.
  • the payload type field distinguishes the two, that is, they use different SEI Type values.
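  • The three-part layout (type byte, two length bytes, report payload) can be sketched with Python's `struct` module. Several details are assumptions made only for illustration: the two SEI Type values are invented, the length is taken to count payload bytes in network byte order, and the report body is treated as opaque bytes. The text ties the length field to the RTCP length field, so a real implementation would have to settle the exact unit.

```python
import struct

SEI_TYPE_SENDER_REPORT = 0x80     # hypothetical SEI Type value for a sender report
SEI_TYPE_RECEIVER_REPORT = 0x81   # hypothetical SEI Type value for a receiver report


def build_qos_sei(sei_type, report):
    """Byte 0: SEI Type; bytes 1-2: length; byte 3 onward: the QoS report."""
    return struct.pack("!BH", sei_type, len(report)) + report


def parse_qos_sei(message):
    """Inverse of build_qos_sei: return (sei_type, report bytes)."""
    sei_type, length = struct.unpack("!BH", message[:3])
    return sei_type, message[3:3 + length]
```

  • A receiver would first branch on the returned type to decide whether the body is a sender report or a receiver report.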
  • the specific content of the QoS report can be the same as the RTCP SR and RR reports, as shown in Figure 2:
  • the version information field (V);
  • the padding field (P), which is 1 bit, is used to indicate whether there is padding content, the same as RTCP;
  • the reception report count field, which is 5 bits, is used to indicate the number of reception report blocks contained in the QoS report;
  • the sender SSRC field which is 32 bits, is used to identify the sender of the quality of service report
  • sender information block for describing the information about the sender of the report
  • a plurality of receiving report blocks are included for describing multimedia statistical information from different sources, each block containing the identifier of the source and related statistical indicators of the multimedia stream, and the meanings of various indicators have been described in the previous RTCP;
  • the content of the QoS report given in Figure 8 is basically the same as that of RTCP.
  • because the RR and the SR are written into the SEI domain, the RTCP information can be transmitted without a dedicated logical channel, which saves part of the bandwidth overhead.
  • the essence of the present invention lies in the in-band bearer with the SEI message.
  • as for the statistical generation of the QoS report, any method that achieves the purpose of QoS monitoring can be used without affecting the essence and scope of the present invention.
  • various QoS policies can be executed on this basis. For example, the cumulative packet loss field of RTCP can be used to feed back decoding information in two-way video communication (where each terminal has both an encoder and a decoder), which facilitates interactive protection against packet loss.
  • the rate control algorithm can further ensure that the encoding end rate is nearly constant according to the information in the arrival delay jitter field; the sender byte count field can estimate the average rate of the payload, so that the sending end can reset the encoder parameters according to the network state. This includes adjusting the target frame rate, restoring the image quality, and the resolution of the original image.
  • after the H.264 "in-band" reporting mode is adopted, H.264 data packets can take various protection measures, and the H.264 data packets carrying QoS reports can be regarded as important data to which, according to the principle of unequal protection, high-strength protection measures are applied. This ensures the correct arrival of the report data.
  • the SEI carrying the QoS report is itself carried by a NALU and, as described above, the NALU header information indicates the importance of the content, so the communication terminal can set the Nal_ref_idc field of the NALU according to the reliability requirement of the QoS report transmission.
  • the Nal_ref_idc field can be set to 1, 2, 3, and so on; in the fault-tolerant elastic coding, protection measures of different strength are taken according to the level of this field.
  • the communication terminal can also dynamically adjust the transmission period of the SEI-based QoS report according to the current network state and the higher-layer application requirements.
  • the interval for writing RTCP information to the SEI domain (that is, the reporting period) is the same as the recommended RTCP transmission interval in RFC3550.
  • the possible reporting period may not be exactly the same as that specified in RFC 3550, but may be adjusted.
  • the reporting period is determined by the needs of the specific application. For example, an important use of reporting data is to dynamically estimate network performance: packet loss rate, latency, jitter, and more. If these data need to be detected frequently, the reporting period should be short, otherwise the reporting period can be long.
  • the SEI message can carry not only the QoS report of the H.264 video but also, mixed in, the QoS reports of multiple media streams, such as an audio stream; it is only necessary to append the corresponding reception report blocks of those media streams after the QoS report, with the SSRC identifying the source of each report block's content added to the SR report.
  • the communication terminal may also select the existing RTCP transmission, or may transmit using either or both of the H.264 extended message and RTCP.
  • the present invention provides an adaptively protected video transmission method that estimates the current communication status and adjusts the protection policy accordingly. First, according to the factors that affect the performance of the protection method, different parameter configurations are given and a multi-level protection strategy with different protection capabilities is set up, from which an efficient and reliable level is selected under different communication conditions. Second, the receiving end collects statistics on the network status and communication quality and sends them back to the sender. Finally, the sender uses the returned communication quality statistics to select the most appropriate protection policy level.
  • the key to the scheme is also the method of collecting communication quality statistics and the channel for sending the statistics back.
  • the information of the packet loss rate and its location can be counted by using the sequence number loss of the H.264 NALU, and the extended SEI message structure of the payload part of the NALU is defined to carry the statistical information, and the statistical data is transmitted from the receiving end to the transmitting end.
  • although the feedback mechanism differs from the SR/RR format of the QoS report, those skilled in the art will understand that the fundamental principle of the two methods is the same and only the content carried by the SEI differs, so the following description does not specifically distinguish the SEI-borne network packet loss rate scheme from the QoS reporting scheme proposed above.
  • Tornado erasure codes require parameters to be set, such as the number of data nodes, the number of check nodes, the scaling ratio, the number of check node layers, and the bipartite graphs used to calculate the check nodes.
  • the transmitting end divides the video stream data into data nodes, and then generates a check node according to the Tornado encoding method, and sends it to the receiving end together; the receiving end performs error correction according to the Tornado decoding method to obtain video stream data.
  • this embodiment presets a series of protection strategies with different levels of protection strength, used to protect the video stream data at different communication quality levels. Different levels of protection policy can adapt to changes in network communication quality: they meet the protection requirements when the channel degrades, and they allow the protection strength to be reduced when the channel improves, thereby reducing system overhead and saving processing and bandwidth resources.
  • in order to give different levels of protection strategy, Tornado erasure codes with different parameters need to be set. As stated above, the parameters that affect the protection performance of a Tornado erasure code are mainly the number of data nodes, the number of check nodes, and the random distribution of node degrees on the two sides of the bipartite graph. For simplicity, Tornado codes of different strength generally use the same kind of bipartite graph, and protection strategies of different strength are obtained by using different numbers of data nodes and check nodes. According to the Tornado erasure code principle, different numbers of data nodes and check nodes determine Tornado erasure codes of different code rates or redundancy rates, and hence different protection strengths and system overheads.
  • the receiving end receives the data and performs Tornado erasure code decoding to obtain the video stream data, and performs statistics according to the data loss situation, and obtains statistical information to represent the communication quality.
  • the sender needs to adjust the protection policy according to the communication quality, so statistics on the transmission must be collected.
  • the receiver collects the transmission statistics according to the sequence numbers of the NALUs of the H.264 video stream data.
  • each terminal of the communication system has both an encoder and a decoder.
  • the NALU is sequence numbered, that is, the NALU sent by all the senders has a uniform sequence number. Therefore, the receiver can determine whether there is a NALU loss according to the sequence number of the received NALU. If the NALU sequence number is discontinuous, it indicates that there is a NALU loss.
  • the missing NALU sequence numbers are the sequence numbers of the lost NALUs, and their count is the number of lost NALUs.
  • the receiving end can also send the packet loss information directly to the sending end, and the sending end performs statistics. Using the NALU sequence number for statistics not only ensures that the statistics are accurate, but also directly uses the existing data information without additional bearer overhead.
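  • The sequence-number check described above can be sketched as a simple gap search. This is a minimal illustration under stated assumptions: sequence numbers are plain increasing integers with no wrap-around, losses before the first or after the last received NALU are not visible, and the function names are hypothetical.

```python
def find_lost_nalus(received_seq_numbers):
    """Return the sequence numbers missing between the received ones."""
    seen = sorted(set(received_seq_numbers))
    lost = []
    for prev, cur in zip(seen, seen[1:]):
        lost.extend(range(prev + 1, cur))    # every number skipped in a gap was lost
    return lost


def loss_rate(received_seq_numbers):
    """Fraction of NALUs lost within the observed sequence-number span."""
    lost = find_lost_nalus(received_seq_numbers)
    expected = len(set(received_seq_numbers)) + len(lost)
    return len(lost) / expected if expected else 0.0
```

  • The list of lost numbers gives both the count and the positions of the losses, which is the information fed back to the sender.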
  • the receiving end sends the statistical information and other data loss information back to the sending end through the extended SEI message. After the transmission status has been counted at the receiving end, the statistics need to be sent back to the sending end.
  • the extended SEI message structure is specifically defined to carry the transmission status statistics sent back from the receiving end. After completing the statistics, the receiving end writes the information into the specifically defined extended SEI message body, then writes it into the SEI field of the encoded code stream that the terminal sends back, and thereby returns it to the transmitting end. After receiving the SEI message, the sender can read the statistics directly or derive the ALSR from them, thereby establishing a true perception of the network's packet loss rate.
  • the SEI message is also carried by the basic unit NALU of the H.264 code stream.
  • Each SEI field contains one or more SEI messages, and the SEI message is composed of SEI header information and SEI payload.
  • the SEI header information includes two codewords: payload type and payload size.
  • the length of the payload type is not fixed. For example, when the type is between 0 and 255, it is represented by one byte; when the type is between 256 and 511, it is represented by two bytes, 0xFF00 to 0xFFFE; and so on, so that any number of payload types can be defined. In the existing H.264 standard, types 0 to 18 have been defined as specific information such as buffering period and picture timing. It can be seen that the SEI domain defined in H.264 can store enough user-defined information as required.
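  • The variable-length coding of the payload type can be sketched as below. The sketch follows the 0xFF-extension scheme of the H.264 SEI message syntax as commonly described, in which each leading 0xFF byte adds 255 and a final byte below 0xFF ends the value; under that reading the range boundaries fall at multiples of 255, slightly different from the round figures quoted above. Function names are illustrative.

```python
def encode_payload_type(payload_type):
    """Encode an SEI payload type as a run of 0xFF bytes plus one final byte."""
    out = bytearray()
    while payload_type >= 255:
        out.append(0xFF)                     # each 0xFF byte stands for 255
        payload_type -= 255
    out.append(payload_type)                 # final byte is always below 0xFF
    return bytes(out)


def decode_payload_type(data):
    """Return (payload_type, number of bytes consumed)."""
    value = 0
    pos = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    return value + data[pos], pos + 1
```

  • The payload size codeword in the SEI header uses the same style of coding, so the same two helpers could serve for both fields.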
  • an extended SEI message for carrying statistical information is defined in the reserved SEI payload type.
  • The sender adjusts the Tornado erasure code according to the statistics sent back, that is, it selects the level of protection strategy best suited to the current transmission conditions.
  • The transmitting end also presets a series of decision thresholds corresponding to the different protection levels, one threshold for entering each level, and then selects the level according to the threshold interval into which the ALSR falls.
  • Different protection strategy series are used for data of different importance. Because critical and non-critical data have different protection requirements, two separate protection strategy series are set up, one for critical data and one for non-critical data, to improve the fit further. In this way the two kinds of data can be processed independently, each with a protection strategy of the strength its requirements call for, which improves system efficiency.
  • Let $n$ be the number of data nodes and $l$ the number of check nodes. The Tornado code protection scheme determined by the parameters $n$, $l$ is denoted $TN(n+l, n)$.
  • The corresponding protection scheme series for key data is $TN_K(n_0+l_0, n_0), \ldots, TN_K(n_{L-1}+l_{L-1}, n_{L-1})$.
  • $TN_K(n_0+l_0, n_0)$ is used to protect the key data, and $TN_{NK}(n_0+l_0, n_0)$ is used to protect the non-critical data.
  • When the ALSR lies between the threshold $G_{L-1}$ and 1, $TN_K(n_{L-1}+l_{L-1}, n_{L-1})$ is used to protect the key data, and the corresponding $TN_{NK}$ scheme is used to protect the non-critical data.
  • The sender retransmits data according to the lost-data information sent back by the receiver.
  • When the receiving end counts the lost NALUs, it obtains the positioning information of the image frame corresponding to each lost NALU; this information includes the sequence number of the frame and the position within the frame.
  • the receiving end sends the positioning information back to the sending end, and the sending end can locate the corresponding video stream data and resend it.
  • Video stream data that is delayed too long has lost its value, but in some service scenarios or under certain mechanisms, for example when the buffer is large, data with a certain delay still has value.
  • Such data can be used to avoid interruption of video stream playback. The retransmission mechanism is therefore valuable for improving the reliability and quality of service of video stream communication.
  • The basic idea of the scheme is to identify the lost data, such as the position of the lost slice, from the NALU sequence number statistics at the receiving end.
  • On the one hand, an efficient algorithm simply substitutes for the lost data to conceal the error; on the other hand, the error information is fed back to the sender.
  • The extended SEI message of H.264 establishes an error information feedback channel from the receiving end to the transmitting end. Once the sender learns of the error, it immediately applies segmented successive intra-frame coding to the affected slice to prevent the error from spreading.
  • The transmitting end encodes the video data to obtain a video stream, encapsulates the stream into NALUs, and transmits them to the receiving end in packets.
  • the receiving end receives the message and decodes it. At this time, the receiving end needs to determine whether the video stream data is lost, so as to perform subsequent error elimination operations.
  • The error elimination process is roughly divided into three major steps: error concealment, feedback, and error spread elimination.
  • The receiving end judges whether data has been lost from gaps in the NALU sequence numbers, and collects the information about the lost data, that is, the error information.
  • NALU is the basic unit of H.264 video stream data transfer, and each NALU has a unique serial number. Therefore, the receiving end knows which NALUs are lost according to whether the NALU sequence number is interrupted. It is thus possible to implement an error concealment strategy for lost data.
  • the NALU serial number is used for statistics, which not only ensures the accuracy of the statistical information, but also directly uses the existing data information, and does not require additional bearer overhead.
  • The receiving end learns the sequence number from the header information of each received NALU and detects errors from discontinuities in the sequence numbers.
  • From the NALU preceding a lost one, the receiving end can tell which video data the lost NALU should have carried, and thus locate the data lost through the error. For example, if the NALU preceding the lost NALU carried the first slice of the Nth frame, then by the order of transmission the lost NALU should have carried the next slice of that frame.
  • The receiving end needs to resynchronize with the video information. Because the H.264 video stream is transmitted continuously, the receiving end must be synchronized with the data stream to receive it correctly, and once the stream is interrupted the receiving end must resynchronize. The decoder resynchronizes by finding the header of the next NALU after the interruption. After that, the receiving end performs error concealment: the lost NALU is discarded, so the entire slice it carried is lost.
  • The error concealment strategy is to replace the lost data with data adjacent to it in the time domain or the spatial domain. For example, the lost slice is concealed with the image data of the slice at the same position in the previous frame.
  • After obtaining the error information, the receiving end feeds it back to the transmitting end.
  • Feeding back the error information requires a feedback channel.
  • The first embodiment of the present invention uses the existing H.264 communication mechanism, defining an extended SEI message that carries the error information, to establish the feedback channel, so that the sender can use the error information to prevent the error from spreading. Combining the error information feedback mechanism with the error spread elimination strategy at the transmitting end avoids the error spread that the receiving end's error concealment would otherwise cause.
  • The extended SEI message of H.264 provides an information feedback mechanism from the receiving end to the transmitting end, so that the sending end learns in time which NALUs were lost and can promptly eliminate error spread, preventing later errors caused by the lost data.
  • The advantage of establishing the information feedback mechanism within the H.264 system is that it saves network bandwidth, saves system processing resources, and does not affect interoperability.
  • the SEI message is also carried by the basic unit NALU of the H.264 code stream.
  • Each SEI field contains one or more SEI messages, and the SEI message is composed of the SEI header information and the SEI payload.
  • the SEI header information includes two codewords: payload type and payload size.
  • The length of the payload type field is not fixed. For example, a type between 0 and 255 is represented by one byte.
  • When the type is between 256 and 511, it is represented by two bytes, 0xFF00 to 0xFFFE, and so on, so that the user can define any number of payload types.
  • Types 0 to 18 have already been defined for specific information such as buffering period and picture timing. The SEI field defined in H.264 can therefore hold as much user-defined information as required.
  • The transmitting end starts error spread elimination according to the error information fed back.
  • Error spread elimination that makes use of the fed-back error information performs better than existing error spread elimination without feedback.
  • The sender can take targeted precautions for the lost slice, such as avoiding the use of the lost slice as a reference in later encoding, which minimizes the receiver's dependence on that slice when decoding.
  • Error spread is likewise confined within the same slice.
  • A strategy of segmented successive intra-frame coding is adopted: after an error occurs, the affected slice region of each subsequent frame is split so that, for example, P macroblocks are divided off into a new slice, and the new slice is then intra-coded to remove its reference to, or dependence on, the previously lost slice.
  • The H.264 real-time video transmission system uses a rate control scheme to limit the fluctuation in the amount of data per frame, so that the amount of data per frame is evened out and the stability of video transmission is improved. The amount of data intra-coded in any one frame, that is, the number of macroblocks, therefore cannot be too large; otherwise it would exceed the H.264 rate control range.
  • Figure 9 shows the principle of error spread elimination for segmented successive intra coding.
  • The error information is detected and fed back to the transmitting end; that is, the frame containing the lost slice and the slice's position within that frame are sent back to the transmitting end through the extended SEI message.
  • The sender extracts the position of the lost slice from the SEI message. For example, each frame in FIG. 9 is divided into three slices, Slice #0, Slice #1, and Slice #2, and Slice #1 of the nth frame is lost in transmission, so segmented successive intra-frame coding is required.
  • The encoder divides P macroblocks, counted from the starting position in macroblock scanning order, into a new Slice #3; the remaining macroblocks still form Slice #1, so there are now four slices, of which the new Slice #3 is to be intra-coded.
  • Slice #3, newly split off in the previous step, is intra-coded and then transmitted as Slice #3; the other slices are still encoded as usual.
  • The number of macroblocks P divided off each time should be as large as possible, to reduce the number of divisions, reduce the processing delay, and narrow the range of influence, while still staying within the aforementioned H.264 rate control range.
  • The number of macroblocks divided off can differ from one division to the next, but the final division must leave every macroblock of the lost slice processed.
  • For example, one frame of video stream data is composed of 240 macroblocks, initially divided into slices of 80 macroblocks each: macroblocks 1-80 are Slice #0, macroblocks 81-160 are Slice #1, and macroblocks 161-240 are Slice #2.
  • The appropriate segment size P is determined to be 12 macroblocks.
  • In the (n+1)th frame, the first 12 macroblocks are intra-coded to form Slice #3.
  • In the next frame, Slice #3 can use conventional predictive coding, and the next 12 macroblocks are intra-coded to form Slice #4; this continues until the (n+7)th frame, in which the last remaining 8 macroblocks are intra-coded to form Slice #9, completing the segmented successive intra-frame coding procedure for error spread elimination.
  • The seventh embodiment of the present invention sets up an erasure code that has only one check node layer, and protects data transmission based on that erasure code.
  • This erasure code scheme has only one check node layer: the intermediate check node layers of the Tornado code are removed, and likewise the inherent requirement that the last check node layer of the Tornado code be generated by a Reed-Solomon code is removed. Thus, as shown in FIG. 10, the erasure code of the present invention has only one data node layer and one check node layer. It can be described as a structurally simplified Tornado code, that is, an improved Tornado code.
  • the data node size L1 of the improved Tornado code of the present invention, the number n of data nodes in the data node layer, and the number L of check nodes in the check node layer can be determined according to actual needs.
  • The data node size L1, the number n of data nodes in the data node layer, and the number L of check nodes in the check node layer are determined according to factors such as the data transmission rate, the data type (for example audio or video data), the required data protection capability, and the maximum network delay that can be tolerated.
  • In the prior-art Tornado code, the number of nodes is scaled by a fixed ratio between each pair of adjacent layers, with a further fixed ratio between the last layer and the mth layer, and the total number of nodes, Total N, follows from these ratios.
  • Because the improved Tornado code of the present invention has no intermediate check node layers, the implicit condition that the number of nodes in each layer be an integer no longer applies to it.
  • the number of check nodes of the check node layer of the improved Tornado code of the present invention L For: L ⁇ n, the equal-ratio scaling factor of the number of nodes of the data node layer and the check node layer of the improved Tornado code of the present invention can be arbitrarily set, given the number n of data nodes, L Can be flexibly set.
  • The present invention integrates the above six enhancement techniques, modularizes the entire H.264/ERRTP transmission architecture, and combines the modules in a single protocol stack, so that the techniques not only deliver their individual advantages but also reinforce one another, yielding better reliability and quality of service.
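
As a rough illustration of the single-check-layer erasure code described for the seventh embodiment, the following Python sketch XORs subsets of data nodes into check nodes and recovers erased data nodes by repeatedly solving any check node that has exactly one missing neighbour. The bipartite `graph`, the XOR combination rule, and the function names `encode` and `decode` are assumptions made for illustration; the text does not specify how data nodes are connected to check nodes.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized nodes."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_nodes, graph):
    """Build one check node per entry of `graph`.

    graph[j] lists the indices of the data nodes XORed into check node j
    (a hypothetical connection pattern, chosen by the caller).
    """
    size = len(data_nodes[0])
    checks = []
    for neighbours in graph:
        acc = bytes(size)  # all-zero node of the right size
        for i in neighbours:
            acc = xor_bytes(acc, data_nodes[i])
        checks.append(acc)
    return checks


def decode(received, checks, graph):
    """Recover erased data nodes (None entries) where possible.

    A check node whose neighbours contain exactly one erased data node
    determines that node; the loop repeats until no more can be solved.
    Entries that cannot be recovered stay None.
    """
    data = list(received)
    progress = True
    while progress:
        progress = False
        for j, neighbours in enumerate(graph):
            if checks[j] is None:  # this check node was itself lost
                continue
            missing = [i for i in neighbours if data[i] is None]
            if len(missing) != 1:
                continue
            acc = checks[j]
            for i in neighbours:
                if data[i] is not None:
                    acc = xor_bytes(acc, data[i])
            data[missing[0]] = acc
            progress = True
    return data
```

With n = 4 data nodes and L = 2 check nodes connected as `[[0, 1], [1, 2, 3]]`, losing data nodes 0 and 2 is recoverable, whereas with `[[0, 1], [2, 3]]` losing nodes 0 and 1 together is not. How many check nodes to use, and how to connect them, is exactly the kind of choice the text leaves to the requirements of the application.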

Abstract

A multimedia communication method and a terminal thereof improve transmission reliability and communication quality. Building on the existing Real-time Transport Protocol (RTP), an Error Resilience Real-time Transport Protocol (ERRTP) provides a transport-layer encapsulation format that carries information about the error resilience coding scheme, so that multimedia data transmitted over ERRTP is marked with its corresponding error resilience coding scheme and the error resilience mechanism is integrated into the transport layer. A dedicated ERRTP encapsulation method and header adaptation scheme are given for the H.264 Network Abstraction Layer Unit (NALU) structure: the header bytes of all NALUs in the same ERRTP packet can be merged into the ERRTP header, so that the important NALU information is reflected in the ERRTP header and transmission efficiency is improved. The extension message mechanism of H.264 itself is used to carry QoS report information, implementing an "in-band" QoS report mechanism in the higher-layer protocol and avoiding the overhead of an additional channel.

Description

Multimedia communication method and terminal thereof
This application claims priority to Chinese Patent Application No. 200510110013.5, entitled "Multimedia Communication Method and Terminal Thereof", filed with the Chinese Patent Office on November 3, 2005.
Technical field
The present invention relates to the field of multimedia communication technologies, in particular to multimedia communication technology supporting error resilience, and specifically to a multimedia communication method and a terminal thereof.
Background
With the rapid development of the Internet and mobile communication networks, streaming media technology is used more and more widely, from online broadcasting and movie playback to distance learning and online news sites. Video and audio are currently delivered over the network in two main ways: download and streaming. Streaming transmits the video/audio signal continuously; while the streamed media plays on the client, the remainder continues to download in the background. Streaming comes in two forms: progressive streaming and real-time streaming. Real-time streaming delivers content in real time and is especially suitable for live events. It must match the connection bandwidth, which means that image quality degrades when network speed drops, so as to reduce the demand for transmission bandwidth. The concept of "real-time" refers to … In particular, with the emergence of third-generation mobile communication systems (3G, 3rd Generation) and the rapid development of networks generally based on the Internet Protocol (IP), video communication is gradually becoming one of the main communication services. Two-party or multi-party video communication services, such as video telephony, video conferencing, and mobile terminal multimedia services, place stringent requirements on the transmission of multimedia data streams and on quality of service. They require not only better real-time performance from network transmission but, equivalently, higher efficiency from video data compression coding.
In view of the current demands of media communication, the ITU Telecommunication Standardization Sector (ITU-T), having developed video compression standards such as H.261, H.263, and H.263+, officially released the H.264 standard in 2003. It is a high-efficiency compression coding standard, jointly developed by ITU-T and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO), that is adapted to the network media transmission and communication needs of the new stage. It is also the main content of Part 10 of the MPEG-4 standard.
The purpose of the H.264 standard is to improve video coding efficiency and network adaptability more effectively. The H.264 video compression coding standard has quickly become the mainstream standard in multimedia communication. A large number of real-time multimedia communication products using H.264 (such as video conferencing systems, videophones, and 3G mobile communication terminals) and network streaming media products have appeared one after another. With the official release and widespread use of H.264, multimedia communication over IP networks and over 3G and beyond-3G wireless networks is bound to enter a new stage of rapid development.
The message structure and transmission mechanism of the H.264 standard are briefly introduced below. The H.264 standard adopts a layered model, defining a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The latter is designed specifically for network transmission; it can adapt to video transmission over different networks and further improves "network friendliness". H.264 introduces an IP-packet-oriented coding mechanism, which facilitates packet transmission in the network and supports streaming of video over the network; it has strong error resilience and is particularly suited to wireless video transmission with high packet loss rates and severe interference. All H.264 data to be transmitted, including image data and other messages, is encapsulated into packets of a uniform format, namely Network Abstraction Layer Units (NALU, NAL Unit). Each NALU is a variable-length byte string with a defined syntax, consisting of a one-byte header, which indicates the data type, followed by an integer number of payload bytes. A NAL unit can carry a coded slice, a data partition of one of several types, or a sequence or picture parameter set. To enhance data reliability, each picture frame is divided into several slices, each carried by one NALU. A slice in turn consists of several smaller macroblocks, which are the smallest processing units. In general, slices at corresponding positions in successive frames are correlated with each other, while slices at different positions are independent of each other, which prevents errors from spreading between slices.
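
To make the one-byte NALU header concrete, here is a minimal Python sketch that splits the header byte into its three bit fields. The field names and widths (`forbidden_zero_bit`, `nal_ref_idc`, `nal_unit_type`) follow the NAL unit syntax of the H.264 specification rather than anything stated in this text, and the helper name `parse_nalu_header` is hypothetical.

```python
def parse_nalu_header(first_byte: int) -> dict:
    """Split the one-byte H.264 NAL unit header into its bit fields.

    Layout (most significant bit first):
      1 bit  forbidden_zero_bit
      2 bits nal_ref_idc    (importance of the unit as a reference)
      5 bits nal_unit_type  (what kind of data the payload carries)
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("header must be a single byte")
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }
```

For instance, a header byte of `0x65` yields `nal_unit_type` 5 (a coded slice of an IDR picture) with `nal_ref_idc` 3, while `0x06` yields `nal_unit_type` 6, the type used for SEI.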
H.264 data includes texture data of non-reference frames, sequence parameters, picture parameters, Supplemental Enhancement Information (SEI), texture data of reference frames, and so on. "SEI message" is the general term for messages that play an auxiliary role in the decoding, display, and other aspects of H.264 video. The prior art defines various types of SEI messages and also reserves SEI message types, leaving room for possible future applications. According to H.264, SEI messages are not required for reconstructing the luma and chroma pictures in the decoding process, and a decoder conforming to the H.264 standard is not required to process SEI at all. In other words, not every terminal that meets the basic requirements of H.264 can process SEI messages, but sending SEI to a terminal that cannot process it has no effect: the terminal simply ignores the SEI messages it cannot handle. Following the SEI syntax rules, users can use the reserved messages to carry custom messages and thereby extend functionality.
SEI messages are introduced first. H.264 provides several mechanisms for message extension, one of which is SEI. H.264 defines Supplemental Enhancement Information (SEI), whose data representation area is independent of the coded video data; its usage is given in the description of the NAL in the H.264 specification. The basic unit of the H.264 bitstream is the NALU, which can carry various H.264 data types, such as sequence parameters, picture parameters, slice data (that is, the actual picture data), and SEI message data. SEI is used to deliver various messages and supports message extension. The SEI field can therefore be used to carry messages customized for a specific purpose without affecting the compatibility of H.264-based video communication systems. A NALU carrying SEI messages is called an SEI NALU. An SEI NALU contains one or more SEI messages. Each SEI message contains several variables, chiefly the payload type (payloadType) and payload size (payloadSize), which indicate the type and size of the message payload. The syntax and semantics of some commonly used H.264 SEI messages are defined in H.264 Annex D.8 and D.9. The payload contained in a NALU is called the Raw Byte Sequence Payload (RBSP), and SEI is one type of RBSP.
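
The payloadType and payloadSize variables can be illustrated with a short Python sketch. It assumes the variable-length convention of the H.264 SEI message syntax, in which each leading `0xFF` byte adds 255 and a final byte below `0xFF` adds the remainder; the function names are hypothetical, and the sketch deliberately leaves out the NALU header, emulation prevention bytes, and RBSP trailing bits.

```python
def encode_sei_value(value: int) -> bytes:
    """Encode payloadType or payloadSize as 0xFF escape bytes plus a last byte."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while value >= 255:
        out.append(0xFF)  # each escape byte stands for 255
        value -= 255
    out.append(value)     # final byte is always below 0xFF
    return bytes(out)


def decode_sei_value(buf: bytes, pos: int = 0):
    """Decode one value starting at `pos`; return (value, next position)."""
    value = 0
    while buf[pos] == 0xFF:
        value += 255
        pos += 1
    value += buf[pos]
    return value, pos + 1


def build_sei_message(payload_type: int, payload: bytes) -> bytes:
    """Header (type, size) followed by the payload bytes."""
    return (encode_sei_value(payload_type)
            + encode_sei_value(len(payload))
            + payload)


def parse_sei_message(buf: bytes):
    """Inverse of build_sei_message; returns (payload_type, payload)."""
    payload_type, pos = decode_sei_value(buf, 0)
    payload_size, pos = decode_sei_value(buf, pos)
    return payload_type, buf[pos:pos + payload_size]
```

A user-defined message would pick a payload type outside the range already assigned by the standard; the value 200 used in a call such as `build_sei_message(200, b"loss-report")` is purely a placeholder.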
The data representation area of SEI is called the SEI field for short. Each SEI field contains one or more SEI messages, and each SEI message consists of SEI header information and an SEI payload. The SEI header information includes two fields: one identifies the type of the payload in the SEI message, and the other gives the size of the payload. Users can define any number of payload types. An H.264 decoder that does not support parsing such user-defined information automatically discards the data in the SEI field. Recording useful custom information in the SEI field therefore does not affect the compatibility of H.264-based video communication systems.
As described above, multimedia communication requires not only efficient media compression coding but also real-time network transmission. At present, multimedia streams are basically transmitted using the Real-time Transport Protocol (RTP) and its control protocol, the Real-time Transport Control Protocol (RTCP). RTP is a transport protocol for multimedia data streams on the Internet, published by the Internet Engineering Task Force (IETF). RTP is defined to work in one-to-one or one-to-many transmission scenarios, and its purpose is to provide timing information and stream synchronization. RTP is typically run over the User Datagram Protocol (UDP), but it can also work over other protocols such as the Transmission Control Protocol (TCP) or Asynchronous Transfer Mode (ATM).
RTP itself only provides for the transport of real-time data; it does not provide reliable delivery, flow control, or congestion control, relying instead on RTCP for these services. RTCP is responsible for managing transmission quality and exchanging control information between the current application processes. During an RTP session, each participant periodically sends RTCP packets containing statistics such as the number of packets sent and the number of packets lost, so the server can use this information to change the transmission rate dynamically, or even change the payload type. Used together, RTP and RTCP optimize transmission efficiency with effective feedback and minimal overhead, and are therefore well suited to transmitting real-time data over the network.
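
The kind of loss statistic that such reports carry can be sketched in a few lines of Python. This is a simplified illustration, not the full procedure of the RTP specification: it assumes packets arrive in order without duplicates, so any backward jump in the 16-bit sequence number is treated as a wraparound. The function name `loss_statistics` is hypothetical.

```python
def loss_statistics(seq_numbers):
    """Estimate packet loss from received 16-bit RTP sequence numbers.

    Assumes in-order arrival with no duplicates (a simplification).
    Returns (expected, lost, fraction_lost).
    """
    if not seq_numbers:
        return 0, 0, 0.0
    first = seq_numbers[0]
    cycles = 0          # counts completed 16-bit wraparounds, times 65536
    prev = first
    for seq in seq_numbers[1:]:
        if seq < prev:  # sequence number wrapped past 65535
            cycles += 1 << 16
        prev = seq
    highest = cycles + prev
    expected = highest - first + 1
    lost = expected - len(seq_numbers)
    return expected, lost, lost / expected
```

A sender receiving such a figure could lower its rate or strengthen its protection when the loss fraction rises; how it reacts is a policy decision outside this sketch.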
H.264 multimedia data is transmitted over IP networks on top of UDP and, above it, RTP. RTP itself is structurally applicable to different media data types, but for each higher-layer protocol or media compression coding standard used in multimedia communication (such as H.261, H.263, MPEG-1/-2/-4, and MP3), the IETF publishes a specification of the RTP payload packetization method for that protocol, which details how RTP packets are encapsulated and is optimized for that specific protocol. Likewise, there is a corresponding IETF standard for H.264: RFC 3984, "RTP Payload Format for H.264 Video". This standard is currently the main standard for transmitting H.264 video streams over IP networks and is very widely used. In the field of video communication, the products of all mainstream vendors are based on RFC 3984, which is also currently the only H.264/RTP transmission method.
In fact, the key difference between H.264 and earlier video compression coding protocols is that H.264 defines a new layer, called the Network Abstraction Layer (NAL): an abstract service capability layer that exposes underlying service capabilities through a standard interface and hides the differences between underlying networks. To increase the separation and independence between its Video Coding Layer (VCL) and the specific network transport protocol layers below, and thereby gain greater application flexibility, H.264 defines the NAL as a new layer that did not exist in earlier ITU-T video compression coding protocols such as H.261 and H.263/H.263+/H.263++. However, how to design a more efficient and better scheme for the NAL and RTP to work together, one that exploits the advantages of H.264 so that RTP carries H.264 with better performance and is practical, is worth studying.
The method proposed in RFC 3984 for carrying H.264 NAL-layer data over RTP is the current mainstream transmission method. Building on the RTP protocol (RFC 3550), it encapsulates NAL-layer data in the RTP payload. The NAL layer sits between the VCL and RTP, and specifies that the video bitstream be divided, according to defined rules and structures, into a series of NAL units (NALU, NAL Units). RFC 3984 defines the format in which the RTP payload encapsulates NALUs. The RTP packet format and the prior-art NALU encapsulation method are briefly introduced below in turn.
RTP is usually carried over UDP in order to use UDP's multiplexing and checksum functions. If the underlying network provides multipoint distribution, RTP supports delivery to multiple destinations. The functions RTP provides include payload type identification, sequence numbering, timestamping, and delivery monitoring.
The RTP packet format is as follows. The basic RTP header occupies 12 bytes (the minimum case), while the IP and UDP headers occupy 20 bytes and 8 bytes respectively. When an RTP packet is encapsulated in a UDP packet and then in an IP packet, the total header size is therefore 12 + 8 + 20 = 40 bytes. The detailed structure of the RTP header is shown in Figure 1.
From front to back, the RTP header in Figure 1 consists of: byte 1, which holds fields describing the header structure itself; byte 2, which defines the payload type; bytes 3 and 4, the sequence number; bytes 5 to 8, the timestamp; bytes 9 to 12, the synchronization source identifier (SSRC ID); and finally the list of contributing source identifiers (CSRC IDs), whose number varies.
The first 12 bytes appear in every type of RTP packet, whereas the remaining header data, such as the contributing source identifiers, is present only when inserted by a mixer.
The meaning and full name of each field are as follows:
The V field is the version (Version), 2 bits. The version currently in use is 2, so V=2. V=1 denotes an earlier RTP version, and V=0 denotes the original predecessor of RTP, used in the voice-over-IP (VoIP) communication systems of the early Mbone network, which later evolved into RTP. V=3 is not yet defined.
The P field is the padding flag (Padding), 1 bit. If P is set, the packet ends with one or more padding bytes, which are not part of the payload;
The X field is the extension flag (Extension), 1 bit. If X is set, the fixed header is followed by a header extension, whose format is described in detail in Section 5.3.1 of RFC 3550;
The CC field is the CSRC count (CSRC Count), 4 bits. It gives the number of CSRC identifiers at the end of the header, so the receiver can use it to determine the length of the CSRC ID list that follows the fixed header;
The M field is the marker bit (Marker), 1 bit. Its interpretation is defined by a specific profile. It allows significant events in the packet stream to be marked, as agreed between the communicating parties rather than fixed by the protocol;
The PT field is the payload type (PT, Payload Type), 7 bits. It identifies the format of the RTP payload and determines how the application interprets it. The mapping between PT values and media formats can also be negotiated dynamically through signalling outside RTP. An RTP source may change the PT during an RTP session.
The next field is the sequence number, 16 bits, which the receiver can use to detect packet loss and to restore packet order.
The timestamp occupies 32 bits. It reflects the sampling instant of the first byte in the RTP packet, and the receiver uses it to adjust media playout time or to synchronize.
The synchronization source identifier (SSRC ID) occupies 32 bits. Its value can be chosen randomly and uniquely identifies a media source. If a source changes its source transport address, it must choose a new SSRC identifier.
The last field is the contributing source (CSRC) list. In multiparty communication, the CSRC IDs are inserted by the mixer.
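The field layout described above can be sketched as a small parser. This is a minimal illustration, not part of the original method: the function name `parse_rtp_header` and the dictionary keys are assumptions chosen for readability, and the sketch does not skip over a header extension when the X bit is set.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Byte 1: V/P/X/CC, byte 2: M/PT, then sequence number, timestamp, SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc  # each CSRC identifier is 32 bits
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    return {
        "version": b0 >> 6,               # V, 2 bits
        "padding": bool(b0 & 0x20),       # P, 1 bit
        "extension": bool(b0 & 0x10),     # X, 1 bit
        "csrc_count": cc,                 # CC, 4 bits
        "marker": bool(b1 & 0x80),        # M, 1 bit
        "payload_type": b1 & 0x7F,        # PT, 7 bits
        "sequence_number": seq,           # 16 bits
        "timestamp": timestamp,           # 32 bits
        "ssrc": ssrc,                     # 32 bits
        "csrc_list": list(struct.unpack("!%dI" % cc, packet[12:end])),
        "header_length": end,             # ignores any header extension
    }
```

With no CSRC entries the header is exactly 12 bytes, which matches the 12 + 8 + 20 = 40 byte overhead quoted earlier for RTP over UDP over IP.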
When carrying H.264 video, RTP packs H.264 NALUs into an RTP packet stream. RFC 3984 mainly defines the NALU and, on that basis, the format for encapsulating H.264 NAL-layer data in RTP. This RTP encapsulation format for NALUs is shown in Figure 2.
Figure 2 shows how a NALU is encapsulated in the RTP payload. Each NALU consists of NALU header information and NALU data content, and multiple NALUs are placed end to end in the payload of an RTP packet.
The NALU header is the first byte and has three fields, whose meanings and full names are as follows:
The F field is the forbidden bit (forbidden_zero_bit), 1 bit. It flags syntax errors and similar conditions and is set to 1 on a syntax violation. When the network detects a bit error in the unit, it may set the bit to 1 so that the receiver discards the unit. This is mainly intended for heterogeneous network environments (for example, combined wired and wireless networks);
The NRI field is the NAL reference identifier (nal_ref_idc), 2 bits. It indicates the importance of the NALU data. A value of 00 means that the content of the NALU is not used to reconstruct reference pictures for inter prediction. A non-zero value means that the NALU is a slice of a reference frame or other important data such as a sequence parameter set (SPS) or picture parameter set (PPS). The larger the value, the more important the NALU;
The Type field is the NALU type (nal_unit_type), 5 bits, allowing 32 NALU types. The correspondence between values and types is given in Table 1.
Table 1  Correspondence between Type field values in the NALU header and NALU content types

Type value    NALU content type
0             Unspecified
1             Coded slice of a non-IDR picture
2             Coded slice data partition A
3             Coded slice data partition B
4             Coded slice data partition C
5             Coded slice of an IDR picture
6             SEI (supplemental enhancement information)
7             SPS (sequence parameter set)
8             PPS (picture parameter set)
9             Access unit delimiter
10            End of sequence
11            End of stream
12            Filler data
13-23         Reserved
24-31         Unspecified

As can be seen, the one-byte NALU header mainly conveys the validity and importance level of the NALU. From this information the importance of the data carried by RTP can be determined.
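The one-byte NALU header and Table 1 can be illustrated with a short decoder. This is only a sketch: the names `parse_nalu_header`, `nal_type_name` and `NAL_UNIT_TYPES` are assumptions made for this example, and the type descriptions simply restate Table 1.

```python
# Type descriptions restated from Table 1; values 13-23 and 0/24-31 are handled below.
NAL_UNIT_TYPES = {
    1: "coded slice of a non-IDR picture",
    2: "coded slice data partition A",
    3: "coded slice data partition B",
    4: "coded slice data partition C",
    5: "coded slice of an IDR picture",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
    10: "end of sequence",
    11: "end of stream",
    12: "filler data",
}


def nal_type_name(nal_unit_type: int) -> str:
    """Map a 5-bit Type value to its Table 1 description."""
    if nal_unit_type in NAL_UNIT_TYPES:
        return NAL_UNIT_TYPES[nal_unit_type]
    if 13 <= nal_unit_type <= 23:
        return "reserved"
    return "unspecified"  # 0 and 24-31


def parse_nalu_header(first_byte: int) -> dict:
    """Split the first NALU byte into F (1 bit), NRI (2 bits), Type (5 bits)."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        "type_name": nal_type_name(nal_unit_type),
    }
```

For example, the byte 0x67 decodes to F=0, NRI=3, Type=7, that is, a sequence parameter set marked with the highest importance.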
In the current H.264/RTP multimedia communication framework, quality of service (QoS) monitoring, and the congestion control and flow control built on it, are carried out mainly through the companion control protocol RTCP. RTCP is mainly used for control and reporting for RTP. It periodically sends control packets to all participants in a two-party or multiparty session, and these reports use the same distribution mechanism as RTP data packets. The underlying protocol provides multiplexing of data and control packets (for example, each uses a separate UDP port number). RFC 3550 recommends that the bandwidth added for RTCP be 5% of the media bandwidth.
The types and structure of RTCP packets are introduced below. RTCP defines the following packet types to carry various control information: sender report (SR, Sender Report), carrying transmission and reception statistics from participants that are active senders; receiver report (RR, Receiver Report), carrying reception statistics from participants that are not active senders; source description items (SDES, Source Description), including the CNAME; end of participation (BYE); and application-specific functions (APP, Application-specific function).
The packet structure of RTCP sender and receiver reports is shown in Figure 3. By content it can be divided into three sections: header, sender information, and report blocks. These are followed by profile-specific extensions (a profile is a set of specific rules drawn up for the needs of a particular application scenario). The fields shown in Figure 3 are briefly described as follows:
The V field is the version;
The P field is the padding flag bit (Padding);
The RC field is the reception report count (RC, Reception Report Count), the number of reception report blocks contained in the packet;
The PT field is the packet type (PT, Packet Type);
The length (Length) field;
The sender SSRC is the synchronization source identifier (SSRC, Synchronization Source Identifier) of the originator of this SR packet. A synchronization source uniquely identifies a media data source, such as the source of one video stream;
The NTP timestamp field is the Network Time Protocol (NTP) timestamp. It indicates wallclock time (absolute date and time) and is used together with the RTP timestamp;
The RTP timestamp field is the timestamp generated by RTP;
The sender's packet count field gives the total number of RTP packets transmitted by the sender from the start of transmission until this SR packet was generated;
The sender's octet count field gives the total number of payload bytes (excluding headers and padding) transmitted by the sender in RTP packets from the start of transmission until this SR packet was generated. It can be used to estimate the average payload rate;
The fields that follow contain zero or more reception report blocks. Each block conveys statistics on the RTP packets received from a single synchronization source, including the fraction lost and the cumulative number of packets lost;
Next come the extended highest sequence number received and the interarrival jitter, both of which reflect network transmission conditions;
Last SR (LSR, Last SR), 32 bits, is the timestamp of the most recent SR from that source, taken as the middle 32 bits of the NTP timestamp of that SR;
Delay since last SR (DLSR, Delay since Last SR), 32 bits, is the delay between receiving the last SR from that source and sending this reception report block. It is a key parameter for computing the QoS report.
The receiver report (RR) packet format differs from the sender report (SR) in that the packet type field has the value 201 and there is no sender information section.
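The three sections of a sender report can be sketched as follows. This is an illustrative parser under stated assumptions: the function name `parse_sender_report` and the dictionary keys are invented for this example, the field widths follow RFC 3550, profile-specific extensions are ignored, and the cumulative loss count is read as an unsigned 24-bit value for brevity.

```python
import struct

SR_PACKET_TYPE = 200  # an RR uses 201 and has no sender information section


def parse_sender_report(packet: bytes) -> dict:
    """Parse an RTCP sender report: header, sender information, report blocks."""
    if len(packet) < 28:
        raise ValueError("an SR needs at least 28 bytes (header + sender info)")
    (b0, pt, length, ssrc, ntp_msw, ntp_lsw,
     rtp_ts, packet_count, octet_count) = struct.unpack("!BBHIIIIII", packet[:28])
    if pt != SR_PACKET_TYPE:
        raise ValueError("not a sender report")
    rc = b0 & 0x1F  # RC, 5 bits: number of reception report blocks
    if len(packet) < 28 + 24 * rc:
        raise ValueError("packet truncated inside the report blocks")
    blocks = []
    for k in range(rc):
        src, loss, ext_seq, jitter, lsr, dlsr = struct.unpack_from(
            "!IIIIII", packet, 28 + 24 * k)
        blocks.append({
            "ssrc": src,
            "fraction_lost": loss >> 24,         # 8-bit fixed-point fraction
            "cumulative_lost": loss & 0xFFFFFF,  # read as unsigned here
            "extended_highest_seq": ext_seq,
            "jitter": jitter,
            "lsr": lsr,
            "dlsr_seconds": dlsr / 65536.0,      # DLSR is in units of 1/65536 s
        })
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "report_count": rc,
        "length_words_minus_one": length,
        "sender_ssrc": ssrc,
        # Middle 32 bits of the 64-bit NTP timestamp, as echoed back in LSR.
        "ntp_middle_32": ((ntp_msw & 0xFFFF) << 16) | (ntp_lsw >> 16),
        "rtp_timestamp": rtp_ts,
        "sender_packet_count": packet_count,
        "sender_octet_count": octet_count,
        "blocks": blocks,
    }
```

The `ntp_middle_32` value is what a receiver would later copy into the LSR field of its own report block.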
According to the RTP/RTCP protocol standard, RTCP has the following functions:
Its basic function is to provide a feedback reporting mechanism for the transmission quality of real-time multimedia data, realised by sending sender reports (SR) and receiver reports (RR) over RTCP. RTCP also carries a persistent transport-level identifier for each RTP source, called the canonical name (CNAME, Canonical Name). Because the SSRC identifier may change when a conflict is discovered or a program restarts, receivers need the CNAME to keep track of each participant;
To allow RTP to scale with the number of participants, the rate of RTCP packets must be controlled;
Conveying as little control information as possible.
As can be seen, QoS reports are conveyed using RTCP, with the QoS information reported according to the report contents the protocol specifies, and on this basis QoS monitoring of carried media such as H.264 is achieved.
However, while RTCP provides a QoS reporting mechanism, its periodic reporting incurs additional network bandwidth overhead of up to 5%. If the network becomes congested and transmission QoS falls, the extra traffic generated by RTCP makes the situation worse.
H.264 is the main video protocol for future multimedia communication, and the networks for future multimedia communication applications are mainly IP-based packet-switched networks and wireless networks. IP networks provide best-effort delivery and cannot guarantee QoS for transported video data. The problem is especially acute for highly compressed H.264 bitstreams. Best-effort delivery over IP cannot guarantee QoS for real-time video communication: packet loss, delay, and delay jitter all affect the quality of the recovered video.
Error resilience is the ability of a transmission mechanism to prevent errors from occurring, or to correct them to some extent after they occur. In a multimedia communication environment, whether a video transmission mechanism has error resilience is critical.
There are many error-resilience mechanisms, such as forward error correction (FEC, Forward Error Correction), automatic repeat request (ARQ, Automatic Repeat reQuest), error concealment, joint source-channel coding (JSCC, Joint Source-Channel Coding), interleaving, and elimination of error propagation. Encoding the data to be protected with error-correcting codes essentially creates data redundancy and thereby increases resistance to errors. The main error affecting packets on a network is packet loss, which error-correction coding theory calls an erasure error. Error-correcting codes for erasure errors form a broad class called erasure codes. An erasure code splits the data stream sequentially into units of equal size, also called data nodes; for convenience, suppose there are n data nodes. Check nodes (parity nodes) are then computed from these data nodes according to mathematical rules. To strengthen protection, a second layer of check nodes can be computed from the first, and so on for a third layer, a fourth layer, up to an Nth layer.
In general, when multiple layers of check nodes are involved, the number of nodes in each layer decreases relative to the previous layer according to some rule, giving a multi-layer node structure that tapers layer by layer. It can be pictured as a pyramid rotated 90 degrees to the right: the leftmost layer holds the data nodes, followed to the right by the first layer of check nodes, the second layer, and so on up to the Nth layer.
Erasure codes of this kind have a very important property: the time complexity of processing is linear in the number of data nodes n, which is called the linear-time property. Many other erasure codes, such as the well-known Reed-Solomon code, have much higher time complexity, on the order of n*log2n*log(logn). Linear-time erasure codes are therefore much better suited to real-time communication.
The Tornado erasure code (hereinafter Tornado code) has a simple structure, efficient computation, and strong protection capability, and is now fairly widely used.
In a Tornado code, multiple layers of check nodes are generated layer by layer from the data nodes. Both check nodes and data nodes are sent by the sender to the receiver over the network. If some nodes are lost during transmission, they can be fully recovered from a sufficient number of nodes in later layers, because each node takes part in generating the nodes of the next layer, so its information is already contained in the next and subsequent layers. Figure 4 shows the relationship between the data nodes and the check-node layers of a typical Tornado code. The lines between nodes in the figure are called edges. An edge indicates that its left-hand node takes part in computing its right-hand node, so adjacent layers are related many-to-many. Let the number of data nodes be n and the total number of check nodes be m. The code rate of the erasure code is defined as r = n/(n+m) and the redundancy rate as 1-r = m/(n+m). Other things being equal (protection capability, delay incurred, and so on), the higher the code rate, and hence the lower the redundancy rate, the more efficient the erasure code. The structure and performance of a Tornado code are determined mainly by three factors: (a) the number of data nodes and the rule by which the layers shrink, usually a constant ratio; (b) the method of computing the next layer of nodes; (c) the association between nodes in adjacent layers.
The following relations can be derived among the parameters of a Tornado code. Let the number of data nodes be $n$, the number of check nodes $m$, the shrink ratio $p$, and the number of check-node layers $i$. The first $i-1$ check-node layers then contain $np, np^2, \ldots, np^{i-1}$ nodes respectively, and the last ($i$-th) layer is given $np^i/(1-p)$ nodes, so that the total number of nodes is

$$n + m = n + np + np^2 + \cdots + np^{i-1} + \frac{np^i}{1-p} = \frac{n}{1-p},$$

and hence $m = np/(1-p)$, which is the implicit relation between the shrink ratio and the number of check nodes. Because the layer sizes $np, np^2, \ldots$ and $np^i/(1-p)$ must all be integers, the feasible values of $n$ can be computed for given $i$ and $p$. For example, with $i=4$ and $p=1/2$ it can be deduced that $n$ must be a multiple of 16.
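The layer sizes and the relation m = np/(1-p) can be checked numerically. The sketch below is an illustration only: the function name `tornado_layer_sizes` is an assumption, and exact rational arithmetic (`fractions.Fraction`) is used so that the integrality requirement on each layer can be tested without rounding error.

```python
from fractions import Fraction


def tornado_layer_sizes(n: int, p: Fraction, layers: int) -> list:
    """Return the check-node counts per layer: np, ..., np^(i-1), np^i/(1-p)."""
    if layers < 1 or not 0 < p < 1:
        raise ValueError("need at least one layer and a ratio with 0 < p < 1")
    sizes = [n * p ** k for k in range(1, layers)]   # first i-1 layers
    sizes.append(n * p ** layers / (1 - p))          # last layer
    if any(s.denominator != 1 for s in sizes):
        raise ValueError("n does not give integer layer sizes for this p and i")
    return [int(s) for s in sizes]


# Worked example with n = 16, p = 1/2, i = 4.
n, p = 16, Fraction(1, 2)
sizes = tornado_layer_sizes(n, p, 4)
m = sum(sizes)                    # total number of check nodes
code_rate = Fraction(n, n + m)    # r = n/(n+m)
```

Since n + m = n/(1-p), the code rate of this construction reduces to r = 1 - p, so the shrink ratio alone fixes the redundancy.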
The computation most commonly used in generating a Tornado code is exclusive OR (XOR), because XOR offers a very convenient recovery property. For two bit sequences A and B of equal length (formula image imgf000014_0001), bitwise XOR gives a bit sequence C of the same length with the following properties: XORing A with C gives B, and XORing B with C gives A. Corresponding recovery methods exist for XOR over more than two sequences. Thus, after the XOR operation the data nodes or check nodes are linked to one another, and if any one node is lost it can be recovered from all the remaining nodes. Because the last layer of check nodes uses a different shrink ratio, it is generally computed with a conventional error-correcting code such as a Reed-Solomon code.
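The XOR recovery property can be shown in a few lines. This is a minimal sketch: the helper name `xor_blocks` is an assumption, and nodes are modelled as equal-length `bytes` objects.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bitwise XOR of any number of equal-length byte sequences."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


a = b"\x0f\xf0\xaa"
b = b"\x33\x55\x01"
c = xor_blocks(a, b)                    # C = A xor B

# With several data nodes, one check node recovers any single lost node.
d1, d2, d3 = b"abc", b"xyz", b"123"
check = xor_blocks(d1, d2, d3)
recovered_d2 = xor_blocks(check, d1, d3)
```

Here `xor_blocks(a, c)` returns `b`, `xor_blocks(b, c)` returns `a`, and `recovered_d2` equals `d2`, matching the properties stated above.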
Adjacent layers of a Tornado code are associated, that is, each node in a layer is computed from particular nodes in the previous layer. In graph-theoretic terms, the nodes of two adjacent layers form a bipartite graph, and the associations between its left and right nodes determine the association between the two layers. In current Tornado code schemes, the parameters n, m, i, p and so on are determined from the required protection capability and other requirements, such as a reasonable data node size and the maximum acceptable network delay. A random distribution of node degree vectors is then specified, and Tornado encoding can be performed. When decoding at the receiver, for the bipartite graph at each level, if a right node has been received correctly and exactly one of the left nodes associated with it is lost, the lost node can be recovered from that right node and all the left nodes that were not lost, which achieves error correction.
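The decoding rule for one bipartite layer can be sketched as an iterative "peeling" loop. Everything here is an illustrative assumption rather than the prior-art implementation: the names `encode_layer` and `peel_layer`, the representation of a lost node as `None`, and the small hand-written edge list used in place of a randomly drawn degree distribution.

```python
def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def encode_layer(left, edges):
    """Each right node is the XOR of the left nodes listed in edges[j]."""
    right = []
    for neighbours in edges:
        acc = bytes(len(left[0]))  # all-zero block
        for k in neighbours:
            acc = _xor(acc, left[k])
        right.append(acc)
    return right


def peel_layer(left, right, edges):
    """Recover lost left nodes (None) from received right nodes.

    A right node helps only if it was received and exactly one of its
    left neighbours is missing; repeat until no further progress.
    """
    left = list(left)
    progress = True
    while progress:
        progress = False
        for j, check in enumerate(right):
            if check is None:
                continue
            missing = [k for k in edges[j] if left[k] is None]
            if len(missing) != 1:
                continue
            acc = check
            for k in edges[j]:
                if left[k] is not None:
                    acc = _xor(acc, left[k])
            left[missing[0]] = acc
            progress = True
    return left


data = [b"AA", b"BB", b"CC", b"DD"]
edges = [[0, 1], [1, 2], [2, 3]]   # hypothetical bipartite graph
checks = encode_layer(data, edges)
```

Recovering one node can unlock another: if nodes 1 and 2 are both lost, the first check node restores node 1, after which the second check node has only one missing neighbour and restores node 2.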
In fact, erasure codes cover a wide range. The Tornado code is just one typical example; others include Reed-Solomon (RS) codes and low-density parity-check (LDPC) codes.
An important performance measure of an erasure code is its error-correction capability (also called protection capability). It is expressed directly as the maximum number of lost packets that can still be fully corrected (for a given total number of packets), or, when losses exceed that maximum, as the percentage of packets that can be correctly recovered. In general, other things being equal, higher protection capability means a higher redundancy rate.
Protection capability applies not only to erasure codes; more broadly, all FEC codes can be measured by it. In video data, some data is relatively important, such as the structural parameters of the video sequence, the structural parameters of pictures, and header information, while other data is relatively unimportant, such as picture content data. When FEC is used for protection, relatively important data is encoded with a stronger code and relatively unimportant data with a weaker one, which balances protection capability against efficiency. Applying FEC with different protection capabilities according to the relative importance of the data is called unequal protection (UEP, Unequal Protection). Unequal protection makes it easier to guarantee the QoS of a video communication service.
The idea of unequal protection is to protect data of different (relative) importance within the multimedia data using protection mechanisms of different protection capability or strength. The mechanisms may differ at a coarse or a fine level: coarse classes differ in principle, whereas fine classes differ only in structure or parameters. Graded protection divides protection mechanisms into several levels by protection capability and is in effect an adaptive strategy. Combining unequal protection with graded protection yields more sophisticated and more powerful protection strategies.
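One way to picture unequal protection is to let the importance bits of the NALU header select the redundancy rate. The sketch below is purely hypothetical: the mapping `REDUNDANCY_BY_NRI`, its numeric values, and the function name `check_nodes_for` are assumptions made for illustration and are not taken from the prior art described here. It only combines the NRI field with the relation 1-r = m/(n+m) given earlier.

```python
import math
from fractions import Fraction

# Hypothetical protection levels: redundancy rate (1 - r) per nal_ref_idc value.
REDUNDANCY_BY_NRI = {
    0: Fraction(1, 10),  # not used for reference: weakest protection
    1: Fraction(1, 4),
    2: Fraction(1, 4),
    3: Fraction(1, 2),   # SPS/PPS and similar: strongest protection
}


def check_nodes_for(nalu_first_byte: int, data_nodes: int) -> int:
    """Number of check nodes m for n data nodes at the NALU's protection level.

    From 1 - r = m / (n + m) it follows that m = n * (1 - r) / r.
    The result is rounded up so the target redundancy is at least met.
    """
    nri = (nalu_first_byte >> 5) & 0x03
    redundancy = REDUNDANCY_BY_NRI[nri]
    return math.ceil(data_nodes * redundancy / (1 - redundancy))
```

Under these assumed levels, 16 data nodes of a parameter set (first byte 0x67) would get 16 check nodes, while 16 data nodes of an SEI message (first byte 0x06) would get only 2.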
In real H.264 video communication, erasure errors caused by packet loss and the like degrade picture quality very seriously and may even crash the decoder system. In H.264-based video communication, in addition to error-resilience protection strategies, effective techniques against erasure errors such as packet loss must be adopted and combined with several video error-resilience methods to guarantee the quality of the recovered picture.
Existing techniques against packet loss errors fall broadly into two classes: (a) active error prevention, in which protective measures are taken in advance, such as introducing redundancy, so that as far as possible packets are not lost or the receiver can recover a small amount of lost data; (b) error compensation, in which compensating measures are taken once errors have occurred. For example, when network conditions deteriorate badly, the packet loss rate is very high and active prevention becomes ineffective, so the errors that have already occurred must be compensated.
Depending on emphasis, error-compensation methods are divided into error concealment and elimination of error propagation. Error concealment focuses on compensating the immediate effect of an error, while elimination of error propagation removes the subsequent effects caused by the error spreading in space and time. Error concealment can itself give rise to error propagation: it leaves the contents of the reconstructed picture buffers at the encoder and the decoder mismatched, so the error propagates in the time domain.
The existing H.264/RTP transport architecture and RTCP-based QoS reporting method encapsulate NALUs directly in RTP for transmission and monitor QoS information with RTCP SR/RR reports; the technical details were described above.
In addition, the Tornado code used in the prior art is a relatively complex scheme. Tornado codes are used to implement data transmission protection for video compressed with H.261/H.263/H.263+/H.263++/H.264.
In addition, existing error-elimination methods are all standalone error concealment methods or standalone error-propagation elimination methods. Error concealment methods include temporal concealment, spatial concealment, and joint spatio-temporal concealment. Error-propagation elimination methods include intra coding, identification, adaptive intra block refresh, and so on.
Temporal concealment estimates the lost data from the information in frames adjacent on the time axis. The estimate can be made in the following ways: simply substitute the data at the same position in an adjacent frame for the lost data; or take motion into account and perform motion prediction from the adjacent-frame data. More sophisticated concealment strategies exist, but their computational cost is very high.
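The simplest variant described above, substituting the co-located data of an adjacent frame, can be sketched as follows. This is a minimal illustration only: frames are assumed to be lists of rows of pixel values, and the function name, the block size and the `(block_row, block_col)` addressing are hypothetical choices, not taken from the patent.

```python
def conceal_temporal(prev_frame, cur_frame, lost_blocks, block=16):
    """Zero-motion temporal concealment (illustrative sketch).

    prev_frame, cur_frame: lists of rows of pixel values, same size.
    lost_blocks: iterable of (block_row, block_col) pairs that were lost.
    Returns a copy of cur_frame in which every lost block is replaced
    by the block at the same position in prev_frame.
    """
    out = [row[:] for row in cur_frame]  # do not modify the input frame
    height = len(out)
    for by, bx in lost_blocks:
        for y in range(by * block, min((by + 1) * block, height)):
            width = len(out[y])
            for x in range(bx * block, min((bx + 1) * block, width)):
                out[y][x] = prev_frame[y][x]
    return out
```

A motion-compensated variant would copy from a displaced position in the previous frame instead of the co-located one, at the cost of estimating the displacement.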
Spatial concealment uses the regions spatially adjacent to the lost data region to conceal the error. This method is computationally expensive.
Joint spatio-temporal concealment either applies spatial and temporal concealment in combination or fuses spatial and temporal data to conceal the error jointly.
Intra-coding-based error-propagation elimination tracks errors accurately through the forward dependencies of the motion vectors and intra-codes the macroblocks affected by the errors, which effectively prevents error propagation.
Moreover, because the prior art offers neither a convenient scheme for monitoring network conditions nor a description of the relative importance of data, it implements neither multi-level protection nor unequal protection.
In practice, the prior-art Tornado code scheme is too complex and inefficient; when applied to protecting video data it introduces long delays and cannot meet the performance requirements of real-time communication.
There is also no mechanism for reporting network conditions, so the communicating parties cannot choose a suitable protection mechanism according to those conditions. As a result neither unequal protection nor adaptive hierarchical protection can be used effectively, and multimedia communication does not reach the required reliability.
Moreover, error concealment and error-propagation elimination have not been well unified; they sometimes conflict and cancel each other out.
The prior art lacks a joint working mechanism between the H.264 NAL and the RTP protocol: how to provide protection on the basis of the H.264 NAL and the corresponding RTP encapsulation is undefined. Consequently, good methods such as high-efficiency Tornado coding and other protective measures, as well as unequal protection and adaptive hierarchical protection, cannot yet be applied to H.264 video data.
The prior art does not use the H.264 message extension mechanism to report network conditions and QoS information. Without such a mechanism, many good techniques lack the precondition they need in order to be applied.
The existing techniques are fragmented and poorly integrated, so they do not reinforce one another. Many of them also remain at the stage of academic discussion and have not been defined or developed at the communication-protocol level, which hinders practical use. Integrating these techniques must respect the constraints of real-time communication: the techniques chosen must perform well without being computationally too complex.
The prior art protects the video stream with a fixed erasure-code strategy and cannot adapt to changing network conditions; the substitution used by error-concealment methods causes error propagation; and error-propagation elimination methods all need complex mechanisms or an additional feedback channel, which consumes system processing resources and network bandwidth.
In the prior-art scheme the NALU header information is encapsulated entirely inside the payload, so the RTP protocol cannot directly learn the attributes, level or importance of the payload and cannot implement a QoS mechanism based on them. This encapsulation format also makes the NALU headers consume payload space: every NALU carries its own header, and in many cases several NALUs of the same type in one RTP packet have identical headers, which wastes RTP transmission bandwidth.
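To make the redundancy concrete, the sketch below parses the one-byte H.264 NAL unit header (forbidden_zero_bit, nal_ref_idc, nal_unit_type) and counts how many header bytes in a group of NALUs merely repeat a header already present. The function names and the idea of passing the NALUs of one packet as a list of byte strings are assumptions made for illustration; the sketch does not reproduce any actual RTP payload format.

```python
from collections import Counter

def parse_nal_header(byte):
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x01,
        "nal_ref_idc": (byte >> 5) & 0x03,   # relative importance, 0..3
        "nal_unit_type": byte & 0x1F,        # e.g. 1 = non-IDR slice, 5 = IDR slice
    }

def redundant_header_bytes(nalus):
    """Count header bytes that duplicate a header already seen.

    nalus: list of bytes objects, each starting with its NAL header byte
    (hypothetical representation of the NALUs carried in one packet).
    """
    counts = Counter(nalu[0] for nalu in nalus)
    return sum(n - 1 for n in counts.values())
```

For example, three non-IDR slices that share the header byte 0x41 carry two header bytes that add no information once the first has been read.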
The H.264/RTP multimedia communication framework uses a general-purpose companion control protocol, RTCP, to carry QoS reports for QoS monitoring. RTCP is not necessarily the best fit for a specific video application such as H.264, however: because it opens a separate out-of-band logical channel to carry the QoS reports, it affects network conditions itself, which creates a contradiction.
Most importantly, the prior art does not implement an error-resilience protection strategy at the transport layer and therefore cannot ensure the reliability and communication quality of multimedia transmission.
Summary of the Invention
The main object of the present invention is to provide a multimedia communication method and a terminal therefor that improve transmission reliability and communication quality.
An embodiment of the present invention provides a multimedia communication method, comprising:
a sending end selecting an encoding mode according to an error-resilience protection strategy, encoding multimedia data, and sending the encoded multimedia data, encapsulated by a real-time transport protocol, to a receiving end;
the receiving end receiving the multimedia data and, if a transmission error has occurred in the received multimedia data, recovering or partially recovering the erroneous multimedia data.
More suitably, the method further comprises:
the receiving end collecting statistics on communication quality, generating a quality-of-service report, and sending it back to the sending end;
the sending end adjusting the error-resilience protection strategy according to the quality-of-service report.
More suitably, the method further comprises:
the receiving end collecting transmission-error information from the erroneous multimedia data and applying an error-concealment strategy;
the receiving end feeding the transmission-error information back to the sending end;
the sending end applying an error-propagation elimination strategy according to the transmission-error information.
Preferably, encoding-related information is carried in the header of the real-time transport protocol, and the receiving end recovers or partially recovers the multimedia data according to that encoding-related information.
Preferably, the sending end obtains the location of a lost slice from the transmission-error information and implements the error-propagation elimination strategy by applying segmented successive intra coding to the lost slice.
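One way to read "segmented successive intra coding" is that the region covered by the lost slice is split into segments and one segment is intra-coded in each following frame, so that the refresh cost is spread over time. The sketch below illustrates only that reading; the function name, the macroblock-address range and the number of segments are hypothetical and are not specified in this section.

```python
def intra_refresh_schedule(first_mb, last_mb, num_segments):
    """Plan a segment-by-segment intra refresh of a lost slice (sketch).

    first_mb, last_mb: inclusive macroblock address range of the lost slice.
    num_segments: number of frames over which the refresh is spread.
    Returns a list of (frame_offset, seg_first_mb, seg_last_mb); the segment
    with frame_offset i is intra-coded in the i-th frame encoded after the
    transmission-error feedback arrives.
    """
    total = last_mb - first_mb + 1
    base, extra = divmod(total, num_segments)
    schedule = []
    start = first_mb
    for i in range(num_segments):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        if size == 0:
            continue  # more segments requested than macroblocks available
        schedule.append((i, start, start + size - 1))
        start += size
    return schedule
```

Spreading the intra-coded macroblocks over several frames avoids the bit-rate spike that a single fully intra-coded region would cause.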
An embodiment of the present invention further provides a multimedia communication terminal having basic functional modules for multimedia communication, including a codec module that implements multimedia encoding and decoding, and further comprising:
an error-resilience real-time transport control protocol module, configured to receive the multimedia data encoded by the codec module, apply error-resilience protection to it, and send the protected data to the network side for transmission; the module is further configured to receive multimedia data from the network side, correct errors in it, and pass it to the codec module for decoding.
More suitably, the terminal further comprises:
a protection method and policy negotiation module, configured to negotiate the error-resilience protection strategy between the two communicating parties and determine a set of protection strategies from which the error-resilience real-time transport control protocol module selects;
a forward error correction module, configured to implement at least one forward error correction protection method and maintain its parameters, wherein the protection method and policy negotiation module controls the forward error correction module to realize unequal protection and adaptive hierarchical protection, and the error-resilience real-time transport control protocol module realizes error-resilience protection and error correction by calling the forward error correction module.
More suitably, the terminal further comprises:
an error concealment module, configured to implement error concealment;
the codec module is configured to implement encoding and decoding according to the H.264 codec standard and also to implement error-propagation elimination;
and a network condition analysis and calculation module, configured to analyze and calculate network conditions and provide information to the error concealment module and the codec module.
More suitably, the terminal further comprises:
a supplemental enhancement information (SEI) message extension processing module, configured to implement quality-of-service reporting and network condition reporting and send the reports to the network condition analysis and calculation module.
Compared with the prior art, the technical solution of the present invention adopts an Error Resilience Real-time Transport Protocol (ERRTP) that extends the existing RTP with a transport-layer encapsulation format able to carry information about the error-resilience coding scheme, so that multimedia data sent over ERRTP is labelled with its coding-scheme information and the error-resilience mechanism is integrated into the transport layer;
for the H.264 NALU structure, a dedicated ERRTP encapsulation method and a modified protocol header are provided: the header bytes of all NALUs in one ERRTP packet are merged into the ERRTP header in a way that does not affect the operation of the existing ERRTP protocol and equipment while exposing the attributes of the NALU payload directly in the ERRTP header, which both raises carrying efficiency considerably and provides the basis for implementing a QoS mechanism;
based on the H.264 message extension mechanism, the receiving end collects communication-quality statistics and feeds them back to the sending end, carrying the QoS report directly in the extension messages of the higher-layer media protocol H.264 itself; this avoids an additional channel and yields an "in-band" QoS reporting mechanism;
the sending end can also choose among the available error-resilience coding schemes according to factors such as current network conditions and the importance level of the multimedia data, achieving unequal protection and balancing protection capability against transmission efficiency;
on the basis of the feedback from the receiving end to the sending end, unequal protection and the alternating, mixed use of several error-resilience schemes are realized: the sending end selects protection strategies of different levels according to the QoS report and related network-transmission-status messages fed back by the receiving end, and can also choose a suitable protection strategy for each class of data according to the data importance level reflected in the ERRTP header;
for the H.264 NALU data stream, a scheme combining error concealment and error-propagation elimination is provided that draws on the advantages of both, achieving error-propagation elimination through an error-information feedback mechanism and segmented successive intra coding;
an efficient Tornado code scheme is also provided: by using an erasure code with only one layer of check nodes, it reduces the computation needed to generate the check-node layer and the data transmission delay without significantly lowering the protection capability, improving the ratio of protection performance to cost;
finally, the above enhancement techniques for multimedia communication are integrated into one multimedia communication system, with the techniques and protocol architecture implemented as modules that cooperate and further strengthen the reliability of multimedia communication.
[...] transmission structure, which saves network transmission bandwidth; the realization of unequal protection balances protection capability against transmission efficiency, facilitates QoS guarantees for multimedia transmission, further improves quality of service, reduces redundancy, improves transmission efficiency and remains compatible with the prior art, all of which improve the robustness of the new ERRTP method;
QoS reporting based on the H.264 message extension mechanism implements QoS monitoring in-band, lowers bandwidth overhead and implementation complexity, and makes the current reporting of H.264 video network transmission quality more effective and efficient, thereby improving that transmission quality;
the unequal and multi-level protection strategies adapt to network transmission needs more flexibly, accurately and promptly, improve protection capability, system efficiency and reliability, keep the statistics accurate and save system resources;
combining error concealment with error-propagation elimination avoids the error propagation caused by concealment and, at low complexity, achieves the desired error-elimination effect, improves video transmission quality, saves overhead, simplifies the mechanism and preserves system compatibility;
the improved Tornado erasure-code scheme improves the cost-effectiveness of data-transmission protection, improves data transmission efficiency and promotes the adoption of new technologies such as H.264;
integrating multiple enhancement techniques in one multimedia communication system jointly improves multimedia communication quality and can greatly improve the performance of, and user satisfaction with, H.264-based multimedia communication products such as video conferencing and videophones on IP networks.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the header structure of an RTP packet;
Fig. 2 is a schematic diagram of the format in which an RTP packet payload encapsulates NALU data;
Fig. 3 is a schematic diagram of the format of an RTCP-based QoS report packet;
Fig. 4 is a schematic diagram of the principle of the Tornado erasure code;
Fig. 5 is a schematic diagram of the module structure of a multimedia communication terminal supporting error resilience according to the first embodiment of the present invention;
Fig. 6 is a schematic diagram of the structure of the multimedia communication protocol stack according to the first embodiment of the present invention;
Fig. 7 is a schematic diagram of the header structure of an ERRTP packet according to the second and third embodiments of the present invention;
Fig. 8 is a schematic diagram of the SEI encapsulation format carrying a QoS report according to the fourth embodiment of the present invention;
Fig. 9 is a schematic diagram of the principle of error-propagation elimination based on segmented successive intra coding according to the sixth embodiment of the present invention;
Fig. 10 is a schematic diagram of the structure of the erasure code of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
The present invention implements several enhancement techniques together in one multimedia communication system, combining their respective advantages to improve system performance, transmission reliability and communication quality. The techniques comprise: the Error Resilience Real-time Transport Protocol (ERRTP), which integrates FEC into the RTP protocol; merging NALU header information into the RTP header; feedback of QoS reports and network conditions carried in SEI extension messages; feedback-based multi-level and unequal protection; combined error concealment and error-propagation elimination; and an improved Tornado coding scheme.
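The feedback-based multi-level and unequal protection named above can be pictured as a small decision rule at the sending end: the reported loss rate selects a base protection level, and the importance of the data shifts that level up or down. The sketch below is only an illustration under stated assumptions: the level names, the loss-rate thresholds and the use of `nal_ref_idc` as the importance indicator are hypothetical and are not values given by the patent.

```python
# Hypothetical protection levels, weakest to strongest.
PROTECTION_LEVELS = ["none", "light", "medium", "strong"]

def select_protection(loss_rate, nal_ref_idc):
    """Pick a protection level from reported loss and data importance.

    loss_rate: packet loss rate in [0, 1] taken from the receiver's report.
    nal_ref_idc: 0..3, importance of the NALU (0 = not used for reference).
    The thresholds below are illustrative assumptions only.
    """
    if loss_rate < 0.01:
        level = 0
    elif loss_rate < 0.05:
        level = 1
    else:
        level = 2
    if nal_ref_idc >= 2:        # important data: one level stronger
        level = min(level + 1, len(PROTECTION_LEVELS) - 1)
    elif nal_ref_idc == 0:      # disposable data: one level weaker
        level = max(level - 1, 0)
    return PROTECTION_LEVELS[level]
```

In a real terminal the set of levels would come from the negotiation between the two parties rather than from a fixed table.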
After modularizing the enhancement techniques, the invention combines them in one multimedia communication system to realize error-resilient H.264 video communication. The system includes the usual main control module, user interface, network communication module, I/O and low-level driver module, service modules, communication process control module and application protocol module, and additionally the modules that implement the enhancement techniques: the protection method and policy negotiation module, FEC module, ERRTP module, RTCP module, H.264 NAL module, H.264 encoder module, H.264 decoder module, audio codec module, error concealment module, SEI message extension processing module and network condition analysis and calculation module.
In the first embodiment of the present invention, multiple enhancement techniques are implemented, modularized and integrated in one multimedia communication system, chiefly the multimedia communication terminal. The implementation of the device is described first in terms of the terminal's functional modules; Fig. 5 shows the complete internal module structure of a terminal. The functional modules are defined by function; each may be implemented in software, hardware, firmware, or a mixture of software and hardware. A complete multimedia communication terminal must first contain the following modules:
Main control module: controls the entire terminal system;
User interface module: handles user input and output; the user operates the terminal through interface controls such as menus and buttons, and the module displays feedback such as the current system state, parameters and network conditions;
Network communication module: communicates with the network and provides TCP, UDP, IP and lower-layer protocol stacks such as Ethernet, PPP and ATM;
I/O and low-level driver module: drives the hardware devices, such as the video and audio capture devices and the display and playback devices, and handles the input and output of video and audio data;
Service modules: implement the specific services, such as videophone, multi-party conferencing, video mail, instant messaging and video chat;
Communication process control module: controls a specific communication process, for example, in a multi-party conference, requesting the chair, releasing the chair, requesting the floor, broadcasting a given site and browsing sites;
Application protocol module: may be the H.323 suite (including H.225.0, RAS, H.245, H.235, H.460 and so on beneath it), SIP or another specific application protocol. In general this is a collective name for a family of protocols, called a "protocol umbrella";
In addition, the enhancement techniques are implemented in the following modules:
Protection method and policy negotiation module: negotiates protection methods between the two communicating parties, determines the allowed set, and then negotiates, on the basis of that set, a strategy for mixing and alternating a group of protection methods. The negotiation is carried out by communicating through the application protocol module. This module controls the FEC module, which implements the different FEC protection modes, unequal protection, adaptive hierarchical protection and similar functions;
FEC module: supports multiple FEC protection methods, which as subclasses may belong to several broad classes; assume that T different methods are supported in total. According to the negotiation result (from the protection method and policy negotiation module), it protects H.264 video data and audio data (the latter is outside the scope of this patent). The module stores the generation rules and parameters of each FEC subclass internally and therefore contains an internal database for them. It can mix and alternate different protection methods;
ERRTP module: implements the ERRTP protocol; the ERRTP encapsulation format and the corresponding encapsulation and decapsulation steps for H.264 are described in detail in the embodiments below;
RTCP module: implements normal RTCP functions. Although the present invention provides a reporting mechanism based on H.264 SEI message extension that can report the main RTCP information, the use of RTCP is not excluded and the two reporting mechanisms can coexist. This is mainly for compatibility and interoperability, since the peer terminal may not support the SEI message extension reporting mechanism;
H.264 NAL module: implements the functions of the H.264 network abstraction layer;
H.264 encoder module: in addition to normal H.264 encoder functions, implements the error-propagation elimination of the present invention, based on information from the network condition analysis and calculation module;
H.264 decoder module: implements normal H.264 decoder functions;
Audio codec module: implements audio encoding and decoding; the supported standards may include ITU-T G.711, G.722, G.723.1, G.728 and G.722.2 (3GPP AMR), and MP3 and AAC from MPEG;
Error concealment module: implements the error concealment provided by the present invention, based on information from the network condition analysis and calculation module and the H.264 encoder module;
SEI message extension processing module: implements the QoS and network condition reporting of the present invention based on SEI message extension. At the sending end it collects data to form RTCP SR and RR reports and sends them encapsulated in SEI extension messages; at the receiving end it extracts the RTCP SR and RR reports from the SEI extension messages and passes the data to the network condition analysis and calculation module for analysis and calculation;
Network condition analysis and calculation module: analyzes the data from the SEI message extension processing module to obtain network condition data such as packet loss rate, jitter, delay and instantaneous end-to-end bandwidth (throughput). It uses these data to control the H.264 encoder module and the error concealment module, and also sends them to the user interface module for display to the user.
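As an illustration of the kind of calculation the network condition analysis and calculation module might perform on SR/RR data, the sketch below converts the 8-bit "fraction lost" field of an RTCP receiver report into a loss rate and applies the interarrival-jitter update defined for RTP in RFC 3550. The function names are hypothetical, and how the fields are extracted from the SEI extension message is outside the sketch.

```python
def loss_rate_from_fraction_lost(fraction_lost):
    """Convert the RTCP RR 'fraction lost' field to a loss rate.

    The field is an 8-bit fixed-point number with the binary point at the
    left edge, so the loss rate is fraction_lost / 256.
    """
    if not 0 <= fraction_lost <= 255:
        raise ValueError("fraction_lost must fit in 8 bits")
    return fraction_lost / 256.0

def update_jitter(jitter, transit_prev, transit_cur):
    """One step of the RFC 3550 interarrival jitter estimator.

    transit_* are (arrival time - RTP timestamp) for two consecutive
    packets, in the same timestamp units; the estimator is
    J = J + (|D| - J) / 16 with D the change in transit time.
    """
    d = abs(transit_cur - transit_prev)
    return jitter + (d - jitter) / 16.0
```

The resulting loss rate and jitter are the sort of values that could then drive the encoder and the error concealment module.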
With the module composition of the terminal understood as a whole, the terminal is now described in terms of protocol stack layers. A communication terminal implements protocols at several layers, which together form a protocol stack. The protocol stack of the terminal of the present invention is partly the same as that of an ordinary multimedia communication terminal and partly different, with some new layers added. Fig. 6 shows the structure of the multimedia communication protocol stack according to the first embodiment of the present invention.
The H.264/ERRTP multimedia transport architecture of the present invention differs from the conventional H.264/RTP architecture mainly in the following respects:
The RTP/RTCP layer of the ordinary terminal protocol stack is replaced by an ERRTP/RTCP layer; with ERRTP, which supports error resilience, the error-resilience protection mechanism is implemented within the transport layer;
A "protection mechanism and policy negotiation layer" is added to the application protocol layer; this layer is mainly used by the two communicating parties to negotiate the protection levels and the associated protection schemes when implementing multi-level protection and unequal protection;
An "SEI extension report layer" is added between the H.264 VCL layer and the NAL layer; this layer makes it convenient for the sender and receiver to implement QoS monitoring and network transmission status feedback based on SEI extension messages;
An "FEC layer" is added between the H.264 NAL layer and the ERRTP/RTCP layer; this layer performs node partitioning, encoding and encapsulation of the H.264 NALU data stream.
Those skilled in the art will understand that the first embodiment of the present invention described above takes a typical H.264 service as an example to present the basic modular structure and protocol stack composition. For other protocols, or for multimedia communication protocols or applications that appear in the future, it is only necessary to implement the relevant technical details for the specific application on the basis of the principles of the present invention in order to achieve the object of the invention, without affecting the spirit and scope of the present invention.
With the overall system architecture given, the implementation details of each enhancement technique are described in turn below.
In view of the many problems in the prior art, the present invention proposes an improved RTP protocol that supports error resilience. It integrates the error-resilience mechanism into the transport layer protocol, which not only simplifies the transport structure and reduces complexity, but also makes the error-resilience mechanism more flexible and the transmission more reliable. Because it is error resilient, the present invention calls this improved RTP protocol the Error Resilience Real-time Transport Protocol (ERRTP or ER2TP). The main difference between ERRTP and RTP is that the header extension of an ERRTP packet can carry information about the error-resilient coding scheme, such as the FEC type, the protection capability and the coding parameters.
On the basis of ERRTP, the present invention implements unequal protection conveniently. Several protection measures with different protection capabilities are first made available. After collecting information such as the network condition and the importance of the multimedia data, the transmitting end selects a suitable protection measure according to these factors, thereby achieving unequal protection and balancing protection capability against transmission efficiency. Because every ERRTP packet carries the information about the FEC it uses, the transmitting end only needs to fill the information of the selected scheme into the ERRTP header, and the receiving end can then recover or correct the data accordingly.
Finally, for H.264 NALU data transmission, a specific implementation method based on erasure code protection is given, comprising the steps of partitioning, generating, encapsulating and decapsulating data nodes and check nodes. A run of consecutive NALUs is divided together into a number of equal-length data nodes, check nodes are then generated with a Tornado code, and all of these nodes are distributed over several ERRTP packets for transmission; the receiving end performs the inverse process.
[Second embodiment of the present invention]
On the basis of the first embodiment, the sender and receiver implement unequal protection based on ERRTP. The main steps are as follows:
The transmitting end selects an error-resilient coding scheme and performs erasure coding on the multimedia data, encapsulates the encoded multimedia data with ERRTP, carries the information about the error-resilient coding scheme in the ERRTP header, and then sends the packets to the receiving end;
The receiving end decapsulates the received ERRTP packets, extracts the information about the error-resilient coding scheme from the ERRTP header, and then, according to that information, selects the error-resilient coding scheme and performs error-resilient decoding to obtain the multimedia data.
Here, unequal protection lies in the fact that the transmitting end selects the error-resilient coding scheme according to the current network transmission condition and/or the quality-of-service level of the multimedia data to be sent.
The specific structure of ERRTP is introduced first, by way of an embodiment of the ERRTP header structure. Figure 7 is a schematic diagram of the ERRTP header structure according to the first embodiment of the present invention. As the figure shows, the version field V takes the value 3 to indicate the ERRTP protocol, distinguishing it from the conventional RTP protocol (V = 2). The header extension, at the end of the header, carries the information fields relating to the error-resilient coding scheme. In this example they are: the error-resilient coding type field, the error-resilient coding subtype field, the data packet length field and the data packet number field.
The error-resilient coding type field indicates the type of erasure code used by the error-resilient coding scheme. It is also called the FEC Type field, since it indicates the FEC coding type. It occupies 4 bits and can therefore represent 16 different FEC types, which is sufficient in practice. The types defined here are major types; each is further subdivided into different schemes, called subtypes. Examples of major types in practice are 0010 for the Tornado code and 0011 for the RS code. Since the field identifies up to 16 major FEC code types, the two communicating parties must agree in advance on a look-up table (LUT) that maps FEC coding types to their type codes; this table is called FECTypeLUT.
The error-resilient coding subtype field indicates the parameter settings of the error-resilient coding scheme. Each type of FEC code can only be put into practice once its various parameters have been fixed, and this field serves to specify them. Because space in the ERRTP header is limited, the specific parameters and rules of every FEC coding scheme cannot be listed one by one; the first embodiment of the present invention therefore uses the concept of a subtype to indicate one of several candidate parameter settings. This field is also called the FEC coding subtype field, FEC Subtype, and occupies 9 bits. It identifies the subtypes into which each major type defined in FECTypeLUT is further subdivided.
The data packet length field indicates the length of a data node after the error-resilient coding scheme has erasure-coded the multimedia data. It is called the Data Length field and occupies 11 bits. Each data packet should be shorter than the maximum transmission unit (MTU) of the network; at present the MTU of a wired channel is 1500 = 0x5DC bytes and that of a wireless channel is 100 bytes, so 11 bits are sufficient to hold the packet length.
The data packet number field indicates the number of data nodes carried by the ERRTP packet. It is also called the Packet Number field and occupies 8 bits. For example, when several NALUs have been protected with a forward error correction code and the result is packetized into multiple ERRTP packets, this field gives the number of data nodes carried in each ERRTP packet.
With these fields, the decoding end or a network node can check the received data packets according to the FEC code type and the check type of the packets given by the fields, and recover lost data packets.
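The four fields described above add up to 4 + 9 + 11 + 8 = 32 bits. The following is a minimal sketch of how they might be packed into and parsed from a 4-byte header extension. The field order, the network byte order and the function names `pack_fec_extension` and `unpack_fec_extension` are assumptions made for illustration; the actual layout is defined by Figure 7, which is not reproduced here.

```python
import struct

# Field widths taken from the description: FEC Type, FEC Subtype,
# Data Length, Packet Number. The order below is an assumption.
FIELD_BITS = (4, 9, 11, 8)


def pack_fec_extension(fec_type, fec_subtype, data_length, packet_number):
    """Pack the four fields into one 32-bit word in network byte order."""
    values = (fec_type, fec_subtype, data_length, packet_number)
    for value, bits in zip(values, FIELD_BITS):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    word = (fec_type << 28) | (fec_subtype << 19) | (data_length << 8) | packet_number
    return struct.pack("!I", word)


def unpack_fec_extension(raw):
    """Inverse of pack_fec_extension: return the four fields as a tuple."""
    (word,) = struct.unpack("!I", raw)
    return (word >> 28) & 0xF, (word >> 19) & 0x1FF, (word >> 8) & 0x7FF, word & 0xFF


# Example: Tornado code (0010), subtype 000000010, 1500-byte nodes, 6 nodes per packet.
raw = pack_fec_extension(0b0010, 0b000000010, 1500, 6)
fields = unpack_fec_extension(raw)
```

Because the Data Length field is 11 bits wide, the largest length this sketch accepts is 2047 bytes, which covers the 1500-byte wired MTU mentioned in the text.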
Note that the 9 bits of the FEC Subtype field mentioned above are used to encode which of the candidate parameter settings is in use. The technical details of this encoding in the first embodiment of the present invention are given below.
First, the sender and receiver negotiate the table that defines what the field values indicate. Before transmission starts, the transmitting end and the receiving end agree, for each major FEC code type, on the correspondence between the values of FEC Subtype and the parameter settings of that FEC code that they indicate, as well as on the specific parameters of each candidate setting.
Then, both the transmitting end and the receiving end build a correspondence table from the negotiation result, which is used to look up the FEC coding type or the FEC codec processing module corresponding to the FEC Type and FEC Subtype fields;
During transmission and reception, the transmitting end calls the corresponding erasure encoding module to perform erasure encoding, and the receiving end calls the corresponding erasure decoding module to perform erasure decoding.
In practice, the subtype information indicates two things:
A. the generation rule of the FEC code;
B. the protection strength (protection capability).
The generation rule is the rule, or algorithm, by which the transmitting end processes the data nodes to generate the check nodes. The receiving end does the reverse: if packets are lost in transit, that is, if some nodes are lost, the lost nodes can be recovered, fully or partially, according to the generation rule. The generation rule is therefore essential information, on the basis of which the two communicating parties can operate the FEC mechanism. Each of the FEC types listed in FECTypeLUT has a different generation rule; within each type, for example the Tornado code, the generation rule of a subtype must additionally be combined with specific generation parameters. For each subtype, therefore, the generation rule is combined with the generation parameters.
For the Tornado code, for example, the generation parameters comprise the following: the total number of data nodes; the total number of check nodes; the number of check-node layers; the ratio by which the number of nodes shrinks from one layer to the next; and the incidence matrices describing the association between nodes of successive layers (with L layers of check nodes there are L such matrices) or, equivalently, a parametric mathematical representation of the bipartite graphs describing the association between nodes of successive layers.
In general, given the same overall generation rule, the generation parameters largely determine the protection strength of a subtype. For the Tornado code, among the generation parameters listed above, the total number of data nodes and the total number of check nodes determine the protection capability to a large extent (strictly speaking, all of the generation parameters are needed to determine it completely). In the present invention, for each major FEC type, the main parameters that have the greatest influence on protection capability are selected as representative generation parameters. Using the representative generation parameters, the subtypes under a major type can be arranged in ascending order of protection capability, from weak to strong. A LUT built in this way is called FECSubTypeLUT.
How many subtypes are supported under each major type can be decided by the specific application, by the communication capabilities of the two parties (CPU speed, memory, program complexity and similar factors) and by need. If the communication environment changes greatly and network performance fluctuates over a wide range, more subtypes generally need to be supported; otherwise fewer will do. This can be agreed between the two communicating parties through a capability negotiation process before communication begins. The negotiation can be carried out with one of the current mainstream multimedia communication framework protocols, such as H.323 or the Session Initiation Protocol (SIP).
Suppose that, for the subtypes under a given major type, S subtypes need to be distinguished ($S \le 2^9 - 1$) and that there are k representative generation parameters, denoted $p_1, p_2, \ldots, p_k$. Table 2 then gives an example of the correspondence; in the table, the superscript indicates the FEC major type and the subscript indicates the specific parameter.
Figure imgf000030_0001
Figure imgf000030_0002
For example, for the Tornado code the correspondence can be set as follows: 000000010 - (24, 20) (total number of data nodes = 20, total number of check nodes = 4); 000000011 - (30, 20); 111111111 - other.
For a given FEC coding subtype, a set of generation rules together with the corresponding generation parameters defines exactly one coding scheme; that is, it uniquely determines how the check nodes are generated from the data nodes and how lost nodes are recovered. A database can be built to store the generation parameters of every major type and subtype, while the generation rules themselves are implemented as hardware or software modules. Each major type therefore corresponds to one FEC processing module at the transmitting end, responsible for generating check nodes, and likewise to one FEC processing module at the receiving end, responsible for recovering nodes. The module for each major type must, however, read the generation parameters of the specific subtype from the generation parameter database before it can do its processing. Both communicating parties thus decide which FEC processing module to call and which generation parameters to read on the basis of the two information fields FEC Type and FEC Subtype.
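The two-step lookup described above (FEC Type selects the processing module, FEC Type plus FEC Subtype selects the generation parameters) can be sketched as follows. This is only an illustration: the names `GenerationParams`, `PARAM_DB`, `FEC_MODULES`, `tornado_module` and `resolve` are hypothetical, and the module body is a stand-in that merely reports how many check nodes are required rather than performing Tornado encoding. The two table entries come from the Tornado example in the text.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class GenerationParams(NamedTuple):
    total_nodes: int  # data nodes plus check nodes
    data_nodes: int


# Hypothetical generation parameter database keyed by (FEC Type, FEC Subtype).
PARAM_DB: Dict[Tuple[int, int], GenerationParams] = {
    (0b0010, 0b000000010): GenerationParams(24, 20),
    (0b0010, 0b000000011): GenerationParams(30, 20),
}


def tornado_module(data_nodes: List[bytes], params: GenerationParams) -> int:
    """Stand-in for a real encoder: returns the number of check nodes to generate."""
    if len(data_nodes) != params.data_nodes:
        raise ValueError("unexpected number of data nodes")
    return params.total_nodes - params.data_nodes


# One processing module per major type (only the Tornado code is registered here).
FEC_MODULES: Dict[int, Callable[[List[bytes], GenerationParams], int]] = {
    0b0010: tornado_module,
}


def resolve(fec_type: int, fec_subtype: int):
    """Return (processing module, generation parameters) for the two header fields."""
    try:
        return FEC_MODULES[fec_type], PARAM_DB[(fec_type, fec_subtype)]
    except KeyError as exc:
        raise LookupError(
            f"unsupported FEC scheme: type={fec_type:04b}, subtype={fec_subtype:09b}"
        ) from exc


module, params = resolve(0b0010, 0b000000010)
check_count = module([b"\x00"] * 20, params)
```

Keeping the parameters in a table separate from the modules mirrors the text: adding a subtype only requires a new table entry, while adding a major type requires a new module.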
With the development of multimedia communication technology, the H.264 video coding standard has gradually become the mainstream media coding format. On the basis of the first embodiment, the second embodiment of the present invention therefore gives the specific steps for FEC encoding and decoding of an H.264 NALU data stream with ERRTP. The procedure is as follows.
The transmitting end merges multiple (say S) H.264 NALUs into one group that is encoded and transmitted together. The S NALUs are first re-divided into equal-length blocks, say M of them; these M blocks are the data nodes.
In this step, S H.264 NALUs are taken as one group and concatenated end to end to form one large block, which is then divided equally into M data blocks, each K bytes long. If the total number of bytes in the large block (denoted TB) is not divisible by M, rounding is applied so that each data block is Ceiling(TB/M) bytes long, where Ceiling(x) is the smallest integer not less than x and x is any real number. Zero padding may then be needed at the end of some data blocks to bring their length up to Ceiling(TB/M) bytes.
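The partitioning step can be sketched as below. The function name `split_into_data_nodes` is hypothetical, and the sketch makes one simplifying assumption: all zero padding is placed at the end of the concatenated block, so only the trailing data node or nodes are padded.

```python
import math
from typing import List, Tuple


def split_into_data_nodes(nalus: List[bytes], m: int) -> Tuple[List[bytes], int]:
    """Concatenate the NALUs and cut the result into m equal-length data nodes.

    Returns the data nodes and the node length K = Ceiling(TB / m).
    """
    if m <= 0:
        raise ValueError("m must be positive")
    big_block = b"".join(nalus)           # the S NALUs joined end to end
    tb = len(big_block)                   # total byte count TB
    k = math.ceil(tb / m)                 # node length in bytes
    padded = big_block.ljust(k * m, b"\x00")  # zero padding up to K * m bytes
    return [padded[i * k:(i + 1) * k] for i in range(m)], k


# Example: S = 3 NALUs with 10 bytes in total, split into M = 4 data nodes.
nodes, k = split_into_data_nodes([b"abcde", b"fgh", b"ij"], 4)
```

To undo the split, the receiver needs to know where each NALU ends inside the large block; how that is signalled is not covered by this sketch.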
Next, the M data nodes are FEC-encoded to obtain N check nodes. The N check blocks are generated from the M data blocks with the FEC code, using the method described earlier: the FEC Type and FEC Subtype information determines which FEC processing module is called to generate the check blocks.
The transmitting end then packetizes all data nodes and check nodes into ERRTP packets and sends them. In this example the fields should be set as follows:
Type field FEC Type = 0010, indicating that the Tornado code is used;
The subtype field is chosen by the transmitting end according to the actual situation, for example FEC Subtype = 000000010, indicating that the Tornado(24, 20) code is used, with 20 data nodes, 4 check nodes and a channel coding redundancy of 16.7%; this erasure code can fully recover lost data packets when the packet loss rate is less than or equal to 3%;
Data packet length Data Length = K bytes;
Data packet number Packet Number = (M+N)/P, the number of data nodes carried in one ERRTP payload.
After receiving these ERRTP packets, the receiving end decapsulates them to obtain the data nodes and check nodes. The receiving end works in periods of P data packets: each time a group of P data packets has been received, it performs one round of decoding and recovery. The number of packets in a group is negotiated by the two parties.
The receiving end performs error-resilient decoding of the data nodes using the check nodes. Each time, after receiving data packet P+1, it checks whether any of the previously received P data packets were lost. If so, it uses the method described earlier: the FEC Type and FEC Subtype information determines which FEC processing module is called to decode and to recover, fully or partially, the lost data.
Finally, once the complete set of data nodes has been obtained, they are merged back into one large block, which is then split, in a manner matching that of the transmitting end, into the S NALUs.
In practice it was found that the ERRTP-based packet loss protection algorithm of the above example greatly improves the resistance of the video bitstream to packet loss while adding less than 17% in codewords. Compared with the RTP payload header structure, only 4 bytes are added, so transmission efficiency is essentially unaffected, and a significant practical benefit is obtained.
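As a quick check of these two figures, and assuming that redundancy is measured as the share of check nodes among all transmitted nodes (the text does not spell out the definition), the Tornado(24, 20) example and the four header extension fields give:

```latex
\[
\frac{N}{M+N} = \frac{4}{24} \approx 16.7\% < 17\%,
\qquad
4 + 9 + 11 + 8 = 32\ \text{bits} = 4\ \text{bytes}.
\]
```

Here M = 20 is the number of data nodes and N = 4 the number of check nodes, and the four summands are the widths of the FEC Type, FEC Subtype, Data Length and Packet Number fields.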
As mentioned earlier, another key technical point of the present invention is the implementation of unequal protection. It has two aspects. The first is to select a suitable coding scheme or parameters, that is, the FEC coding type and subtype described above, according to the importance level of the multimedia data; the second is to make the selection according to the network condition at different times. These two aspects correspond, respectively, to hybrid and alternating use of the FEC coding schemes. Hybrid use means using several FEC subtypes simultaneously, mainly to protect data of different importance; alternation means using different FEC subtypes at different times (under different network conditions).
For an H.264 NALU data stream, as mentioned earlier, the header byte reflects the importance of the data. The transmitting end can therefore evaluate the QoS level from the NRI field or the Type field in the NALU header and select the error-resilient coding scheme accordingly, that is, determine the FEC Type and FEC Subtype fields. As for the network condition, network transport generally has a corresponding monitoring mechanism; through it the transmitting end can obtain the transmission reports fed back by the receiving end, evaluate the network transmission condition from them, and select the error-resilient coding scheme accordingly, that is, determine the FEC Type and FEC Subtype fields.
An H.264 bitstream is transmitted or stored in units of NALUs, each consisting of a NAL header and a NAL payload. Different types of H.264 NALU affect the decoded and reconstructed image differently. For example, NRI equal to 0 indicates that the NALU holds a slice or a slice data partition of a non-reference picture, which does not affect subsequent decoding; a non-zero NRI indicates that the NALU holds a sequence or picture parameter set, or a slice or slice data partition of a reference picture, which seriously affects subsequent decoding.
Therefore, when applying packet protection to an H.264 bitstream, the H.264 data can be divided into two classes according to the value of NRI or Nal_unit_type: relatively important image data (for example, Nal_ref_idc equal to 1) and secondary image data (for example, Nal_ref_idc equal to 0). The important image data is then protected with an FEC1 code that has higher redundancy and stronger resistance to packet loss, while the secondary image data can be protected with an FEC2 code that has lower redundancy and weaker resistance to packet loss.
This unequal protection algorithm ensures that the important information is recovered correctly even under heavy packet loss, while image information that the FEC2 code still fails to recover is handled with techniques such as error concealment and prevention of error propagation. FEC1 and FEC2 are merely generic labels here, standing for any two subtypes. The two subtypes may belong to the same major type or to different major types.
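The two-class split can be sketched as follows. In the H.264 NAL unit header, the first byte holds a 1-bit forbidden_zero_bit, the 2-bit nal_ref_idc (NRI) and the 5-bit nal_unit_type. The function names and the rule "non-zero NRI maps to FEC1" are illustrative assumptions; "FEC1" and "FEC2" are the generic labels used in the text, not concrete codes.

```python
from typing import Tuple


def parse_nal_header(first_byte: int) -> Tuple[int, int]:
    """Return (nal_ref_idc, nal_unit_type) from the first byte of a NALU."""
    nal_ref_idc = (first_byte >> 5) & 0x3   # 2-bit importance indicator (NRI)
    nal_unit_type = first_byte & 0x1F       # 5-bit NAL unit type
    return nal_ref_idc, nal_unit_type


def choose_protection(nalu: bytes) -> str:
    """Pick the stronger code for NALUs that later decoding depends on."""
    if not nalu:
        raise ValueError("empty NALU")
    nal_ref_idc, _ = parse_nal_header(nalu[0])
    return "FEC1" if nal_ref_idc != 0 else "FEC2"


# 0x67: NRI = 3, type 7 (sequence parameter set); 0x01: NRI = 0, type 1 (non-reference slice).
strong = choose_protection(bytes([0x67, 0x42]))
weak = choose_protection(bytes([0x01, 0x9A]))
```

A finer split into more than two classes would replace the single comparison with a table keyed by nal_ref_idc or nal_unit_type.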
Clearly, the above method can be generalized by dividing the data into more classes according to the value of Nal_unit_type, for example five classes: most important, second most important, moderately important, less important and least important data. The data can also be divided into seven or more classes. The same number of FEC subtypes can then be used for protection, each class of data corresponding to a different subtype. It is only necessary that the protection capabilities range from weak to strong; the subtypes need not belong to the same major type. Image information that still cannot be recovered after protection with the strongest FEC code is handled with techniques such as error concealment and prevention of error propagation.
Another form of unequal protection also falls within the scope of the present invention: FECs with different protection capabilities can be selected according to the real-time condition of the network. The two communicating parties are then informed through the ERRTP header, so that they can decode the data correctly and recover lost data. The degree to which network transmission performance is currently degraded can be divided into several levels, for example five: most severe, second most severe, moderately severe, less severe and least severe. It can also be divided into seven or more levels. The same number of FEC subtypes can then be used for protection, each level corresponding to a different subtype. It is only necessary that the protection capabilities range from weak to strong; the subtypes need not belong to the same major type. Image information that still cannot be recovered after protection with the strongest FEC code is handled with techniques such as error concealment and prevention of error propagation. The network condition can be sensed with any of the existing QoS monitoring methods.
More complex application schemes also fall within the scope of the present invention. Suppose a total of T FEC schemes (different types and subtypes) are available, that is, supported by the terminals of both communicating parties, and that the choice of FEC depends on both the importance of the data and the condition of the network. A two-dimensional LUT can then be used, as shown in Table 3:
Table 3: Two-dimensional LUT for hybrid and alternating use of multiple FEC mechanisms
Figure imgf000034_0001
In the table above, both the data importance levels and the network condition levels are arranged in ascending order. The FEC entries carry a two-dimensional subscript: the error-resilience mechanism FEC(i, j) in the table, with 0 < i < U and 0 < j < V, may be any one of the T FEC schemes mentioned above.
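A two-dimensional LUT of this kind can be sketched as below. Everything concrete in the sketch is an assumption: three hypothetical schemes labelled "weak", "medium" and "strong" stand in for the T available FEC schemes, the table has U = V = 3 levels, indices are 0-based, and a higher network level is taken to mean a more degraded network. The contents of Table 3 itself are not reproduced here.

```python
# Rows: data importance level (ascending). Columns: network condition level
# (ascending, assumed here to mean increasingly degraded). Each entry names
# one of the T = 3 hypothetical schemes.
FEC_LUT = (
    ("weak",   "weak",   "medium"),
    ("weak",   "medium", "strong"),
    ("medium", "strong", "strong"),
)


def select_fec(importance_level: int, network_level: int) -> str:
    """Look up the FEC scheme for one data class under one network condition."""
    if not 0 <= importance_level < len(FEC_LUT):
        raise IndexError("importance level out of range")
    row = FEC_LUT[importance_level]
    if not 0 <= network_level < len(row):
        raise IndexError("network level out of range")
    return row[network_level]


# Hybrid use: different data classes at the same network level.
same_time = [select_fec(i, 1) for i in range(3)]
# Alternation: the same data class as the network level changes over time.
over_time = [select_fec(2, j) for j in range(3)]
```

Reading along a column corresponds to hybrid use and reading along a row to alternation, so both forms of unequal protection come from the same table.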
It should be noted that the embodiments above are described using FEC erasure codes, and Tornado codes in particular, as the example. Other similar error-resilience mechanisms, in particular FEC coding schemes other than Tornado codes, are equally applicable, and this does not affect the spirit and scope of the invention.
In another embodiment of the invention, an improved Tornado erasure code is specifically used. This improved Tornado erasure code generates only one layer of check nodes for a group of data nodes, which greatly reduces coding delay and meets the requirements of real-time communication.
In real-time video communication, protecting data packets with an FEC code introduces delay, and the size of the delay is related to the size of the image data packets. S NALUs are grouped together, where each NALU contains the bitstream data of one slice. If each frame is coded as a single slice, the encoder incurs a delay of S frames, and the decoder likewise incurs a delay of S frames. The relationship between the NALUs and the number of data nodes is given by:
$$\sum_{i=0}^{S} \mathrm{NalSize}_i = \mathrm{PackSize} \times \mathrm{DataNode} \tag{1}$$
In this formula, the sum of the lengths of the S NALUs equals the number of data nodes multiplied by the packet size of each node. Equation (1) shows that when the value of S is constrained, the product PackSize × DataNode is also constrained. In addition, PackSize cannot be too small if transmission over the IP network is to remain efficient, so the value of DataNode is constrained. In real-time video communication over an IP network, the delay of one frame is calculated as:
$$T_{total} = T_{fec} + T_{codec} + T_{trans} \tag{2}$$
where $T_{fec}$ is the delay introduced by adding FEC protection, and $T_{codec}$ and $T_{trans}$ are the H.264 codec processing delay and the network transmission delay, respectively. Given the rapid development of digital signal processing technology and IP networks, $T_{codec}$ and $T_{trans}$ can both be assumed to meet the real-time requirement:
$$T_{codec} \le T_{th}, \quad T_{trans} \le T_{th}, \quad \text{where } T_{th} = 1 / F_{target}$$
Here $F_{target}$ is the target decoding frame rate (for example 10 Hz or 30 Hz). If each frame is coded as a single slice, equation (2) becomes:
$$T_{total} \le S \cdot T_{th} + 2 \cdot T_{th} = (S + 2) \cdot T_{th}$$
The last two expressions show that the delay $T_{total}$ of one frame is essentially determined by the value of S, and DataNode in turn strongly influences S. Therefore, while still guaranteeing the packet-loss resilience of the video communication, the delay introduced by FEC should be minimized to further guarantee the QoS of real-time video communication.
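As a rough worked example of the bound above, assuming one slice per frame and that the frame delay is at most (S + 2) frame periods, where one frame period is the reciprocal of the target frame rate, the figure can be computed as follows. The function name is hypothetical.

```python
def max_frame_delay(s: int, target_fps: float) -> float:
    """Upper bound, in seconds, on one frame's delay: (S + 2) / F_target."""
    if s < 0:
        raise ValueError("S must be non-negative")
    if target_fps <= 0:
        raise ValueError("target frame rate must be positive")
    frame_period = 1.0 / target_fps  # threshold T_th = 1 / F_target
    return (s + 2) * frame_period
```

With S = 3 and a target of 30 Hz, the bound is 5/30 of a second, about 167 ms; at 10 Hz the same S gives 500 ms, which shows why S has to stay small for interactive use.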
When DataNode is constrained, the invention uses an improved Tornado code protection algorithm. The improved Tornado method does not use multi-level bipartite-graph encoding; it uses only a single layer of check nodes. Compared with the original Tornado encoding, the improved method greatly increases the flexibility of the algorithm, since the numbers of data nodes and check nodes can be set arbitrarily, and it also reduces the complexity of the encoding and decoding algorithms, so it can be used for packet-loss resilience in real-time video communication. Moreover, when the number of data nodes is constrained, the packet-loss resilience of the improved Tornado code shows essentially no degradation. The principle and detailed steps of the improved Tornado coding method are described in detail later.
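The single-layer idea can be illustrated generically. The sketch below is not the improved Tornado construction itself, whose details the document gives later. It assumes each check node is the XOR of a hand-picked subset of equal-length data packets and that decoding uses simple iterative peeling. All names and the example graph are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: List[bytes], graph: List[List[int]]) -> List[bytes]:
    """One layer of check nodes: check k is the XOR of data[i] for i in graph[k]."""
    return [reduce(xor_bytes, (data[i] for i in nbrs)) for nbrs in graph]


def decode(received: List[Optional[bytes]],
           checks: List[Optional[bytes]],
           graph: List[List[int]]) -> List[Optional[bytes]]:
    """Peeling decoder: lost packets are None; a check with exactly one
    missing neighbour recovers that neighbour. Repeats until no progress."""
    data = list(received)
    progress = True
    while progress:
        progress = False
        for chk, nbrs in zip(checks, graph):
            if chk is None:
                continue
            missing = [i for i in nbrs if data[i] is None]
            if len(missing) == 1:
                acc = chk
                for i in nbrs:
                    if data[i] is not None:
                        acc = xor_bytes(acc, data[i])
                data[missing[0]] = acc
                progress = True
    return data
```

Because there is only one layer, the numbers of data and check nodes are set simply by the length of `data` and of `graph`, which is the flexibility the paragraph above refers to. Packets that no surviving check can isolate stay `None`.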
[Third Embodiment of the Invention]
Note that the ERRTP encapsulation of NALUs in the preceding example does not say how the NALU header is handled. Therefore, building on the second embodiment, ERRTP processes NALUs of the same kind together and merges their header information into the ERRTP header. The most basic difference from RTP is that, during ERRTP encapsulation, the header information of NALUs that share the same header is merged into the ERRTP header.
The NALU header structure was described earlier. To restate it, the NALU header contains, in order:
an F field of 1 bit, indicating whether the NALU is in error;
an NRI field of 2 bits, indicating the importance of the NALU;
a Type field of 5 bits, indicating the type of the NALU.
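The three fields can be read from the one-byte NALU header with plain bit operations. The following is a minimal sketch; the function name and the dictionary keys are hypothetical.

```python
def parse_nalu_header(header_byte: int) -> dict:
    """Split the 8-bit NALU header into F (1 bit), NRI (2 bits), Type (5 bits)."""
    if not 0 <= header_byte <= 0xFF:
        raise ValueError("header must fit in one byte")
    return {
        "F": header_byte >> 7,            # error indication bit
        "NRI": (header_byte >> 5) & 0x3,  # importance
        "Type": header_byte & 0x1F,       # NALU type
    }
```

For instance, the header byte 0x65 decomposes into F = 0, NRI = 3, Type = 5.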
The steps performed by the sender and receiver are as follows. The sender encapsulates, in ERRTP format, multiple NALU data nodes or check nodes that share the same header information into one ERRTP packet. Practical engineering experience shows that this assumption can generally be satisfied, because adjacent portions of an H.264 bitstream have the same NALU type. When it cannot be satisfied, there are several ways to handle the situation. First, NALUs of the same type can be accumulated until a certain number is reached and then encapsulated into ERRTP. Second, if the number of NALUs of the same type does not reach that number, RTP padding can be used; this wastes a little bandwidth, but the amount is negligible. Third, if there are very many NALUs of different types, ordinary RTP encapsulation can be used, since the receiver can distinguish the two by the ERRTP identifier and process each accordingly.
As mentioned above, in the ERRTP encapsulation format the header information common to the carried NALUs is merged into the header of the ERRTP packet. The carried NALUs are stripped of their headers, then partitioned, encoded, and encapsulated according to the procedure described earlier, and placed in the payload of the ERRTP packet. How, then, is the NALU header merged into the ERRTP header? Two schemes are given below.
In the ERRTP encapsulation format, the NRI and Type fields of the NALU header are placed in the PT field of the ERRTP header. As described earlier, the PT field occupies the last 7 bits of the second byte of the ERRTP header. Figure 7 shows the format of such an ERRTP header; the parts that differ from RTP are shown in bold, and some parts of the figure are explained later.
In addition, the V field of the ERRTP header serves as the ERRTP identifier, as mentioned earlier. The F field of the NALU header is placed in the M field of the ERRTP header, which is the first bit of the second byte of the ERRTP header. The receiver uses the M field of an ERRTP packet to determine whether the NALUs it carries are in error, which preserves the forbidden-bit function of the F field. In this scheme the version value tells the receiver of the packet that the protocol is ERRTP, so that subsequent processing follows the ERRTP processing flow.
In this scheme, the NALU header byte (8 bits) replaces the 1-bit marker (M) field and the 7-bit PT field of the original RTP header, 8 bits in total. The replacement order can be, for example:
the F bit replaces the M bit;
the 2 NRI bits replace the highest 2 bits of the 7-bit PT;
the 5 Type bits replace the lowest 5 bits of the 7-bit PT.
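With this replacement order, the second byte of the ERRTP header ends up bit-for-bit equal to the NALU header byte. The sketch below makes the mapping explicit; the function names are hypothetical and only the M and PT fields are modelled, not the rest of the header.

```python
from typing import Tuple


def nalu_header_to_m_pt(nalu_header: int) -> Tuple[int, int]:
    """Map a NALU header byte to the (M, PT) fields: F goes to M,
    NRI to the top 2 bits of PT, Type to the low 5 bits of PT."""
    if not 0 <= nalu_header <= 0xFF:
        raise ValueError("header must fit in one byte")
    f = nalu_header >> 7
    nri = (nalu_header >> 5) & 0x3
    nal_type = nalu_header & 0x1F
    return f, (nri << 5) | nal_type


def second_header_byte(m: int, pt: int) -> int:
    """Assemble the second header byte: M is the first bit, PT the last 7."""
    return ((m & 0x1) << 7) | (pt & 0x7F)
```

Keeping the fields in their original bit positions means a receiver can rebuild the NALU header without any table lookup.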
In Figure 7, coding-related fields in the ERRTP header, such as FEC Type, FEC Subtype, and Packet Number, identify the coding scheme used and the multimedia data packet, so that the receiver can recover, or partially recover, the multimedia data from this information.
This replacement scheme is in fact well founded. As mentioned earlier, the 7 PT bits are free for the application to assign. For the M field, RTP (RFC 3550) specifies that a particular profile may choose not to use the M bit and instead merge it into PT, so that PT can have up to 8 bits and distinguish 256 types. Replacing the M bit with the F bit is therefore fully consistent with RTP and does not cause interworking problems between ERRTP and conventional RTP.
The ERRTP encapsulation format of the invention clearly has three advantages. First, the overhead is low, and the saving in transmitted bits is especially noticeable when one RTP packet carries multiple NALUs. Second, the relative importance of the NALUs can be determined without decoding the H.264 NALU data in the RTP packet. Third, whether the loss of other bits will prevent the RTP packet from being decoded correctly can be determined without decoding the H.264 NALU data in the RTP packet.
To describe the technical details further, the ERRTP encapsulation and decapsulation process is given below. After the processing above, the H.264 NALUs in one ERRTP packet all have the same type, that is, their header bytes are identical. When they are partitioned, encoded, and encapsulated into the ERRTP packet, the original header byte can therefore be stripped from each, which saves N bytes for N NALUs. Decapsulation extracts and decodes the N NALUs from their ERRTP packet and restores them to their original form. The 7 PT bits in the ERRTP header are copied into the lowest 7 bits of a byte H (8 bits), and the highest bit of H, which is the F bit, is set to 0. The resulting byte H is then prepended to each extracted NALU, which restores each NALU. If the F field carried in the ERRTP header is 1, the NALUs in that ERRTP packet are in error and can simply be discarded, which also saves processing time.
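The decapsulation step can be sketched as follows, assuming the NALU bodies have already been extracted and FEC-decoded from the ERRTP payload. The function name and argument layout are hypothetical.

```python
from typing import List


def restore_nalus(m: int, pt: int, bodies: List[bytes]) -> List[bytes]:
    """Rebuild complete NALUs from header-stripped bodies.

    m is the M field (carrying F in the first scheme) and pt the 7-bit PT
    field of the ERRTP header. If the error bit is set, the packet's NALUs
    are discarded and an empty list is returned.
    """
    if m == 1:
        return []
    header = pt & 0x7F  # low 7 bits from PT, highest bit (F) left at 0
    return [bytes([header]) + body for body in bodies]
```

Each restored NALU is one byte longer than its body, which is exactly the byte that encapsulation saved per NALU.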
The second scheme is given next. It shares one point with the first: the NRI and Type fields of the NALU header are likewise placed in the 7 bits of the PT field of the ERRTP header. It differs in two respects. First, the M field is used to identify ERRTP, which leaves no place for the F field. Second, in this embodiment NALUs are therefore treated differently depending on whether F is set: erroneous NALUs with F set are still sent using ordinary RTP, while normal NALUs are sent using ERRTP with the F bit ignored. The details are as follows.
The M field, which is the first bit of the second byte of the ERRTP header, is set to 1 to identify an ERRTP packet. For the F bit, the H.264 standard specifies that it is 1 if there is a syntax violation or error. When the network detects a bit error in the unit, it can set the bit to 1 so that the receiver discards the unit. This mainly serves heterogeneous network environments, such as combined wired and wireless networks. The usage principle is as follows: in general, the sending and receiving ends do not write this bit when encoding and decoding H.264 video, and the decoder only reads it. If F = 1, the receiver discards the NALU during decoding. In current industry practice, the F bit is mainly written at gateways between two different networks, for example during transcoding (MPEG-4 to H.264, H.263 to H.264, and so on).
The invention therefore ignores the F bit and does not use it for the purpose originally defined by H.264. The M field that would otherwise hold the F bit is thereby freed and can carry additional information for future extensions; here it is used to identify ERRTP packets. The benefit is that the version field need not be changed: ERRTP keeps the original value V = 2. This also conserves the limited RTP version number space.
In practice, however, there may be rare cases in which the F bit is needed, for example when a NALU has a syntax error. The invention handles such cases as follows. In the ERRTP encapsulation format, the F field of the NALU header is ignored. At the sender, however, an erroneous NALU whose F field is set is still encapsulated in an RTP packet, and only normal NALUs are encapsulated in ERRTP. The receiver determines whether a packet is an ERRTP packet or an RTP packet and processes it according to the corresponding format. In other words, in the special cases where the F bit must serve its original H.264 purpose of flagging a possible NALU syntax error, an intermediate device such as a gateway that finds a syntax error in some NALU while encoding video according to H.264 must encapsulate that NALU separately.
The alternating ERRTP and RTP processing described above can be summarized as follows:
the sender first checks whether the F field in the header of each NALU is set, and classifies the NALUs as normal or erroneous accordingly;
it then encapsulates normal NALUs into ERRTP packets in the ERRTP format and sets the ERRTP identifier, and encapsulates erroneous NALUs into RTP packets in the RTP format;
the receiver first checks whether the header of each received packet carries the ERRTP identifier, and classifies the packets as ERRTP or RTP accordingly;
it then processes ERRTP packets according to the ERRTP format and RTP packets according to the RTP format.
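The classification steps at both ends can be sketched as below. The sketch assumes the second scheme, in which M = 1 (the first bit of the second header byte) marks an ERRTP packet, and it assumes every NALU is a non-empty byte string. The function names are hypothetical and the actual packet building is omitted.

```python
from typing import List, Tuple


def classify_nalus(nalus: List[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Sender side: split NALUs into (normal, erroneous) by the F bit,
    which is the highest bit of the first (header) byte."""
    normal: List[bytes] = []
    erroneous: List[bytes] = []
    for nalu in nalus:
        if nalu[0] & 0x80:
            erroneous.append(nalu)  # to be sent in ordinary RTP
        else:
            normal.append(nalu)     # to be sent in ERRTP
    return normal, erroneous


def is_errtp(second_header_byte: int) -> bool:
    """Receiver side: under the assumed scheme, M = 1 identifies ERRTP."""
    return bool(second_header_byte & 0x80)
```

A receiver would branch on `is_errtp` before deciding whether to run the ERRTP decapsulation or the ordinary RTP path.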
Thus, for normal NALUs the gateway applies the method described earlier: H.264 NALUs of the same type are encapsulated in ERRTP according to some rule (set by the specific application, and mainly governing how many NALUs of the same kind go into each ERRTP packet). As soon as a NALU with a syntax error is found, that NALU is encapsulated with conventional RTP. Such a conventional RTP packet may then contain only one H.264 NALU.
One final point: Table 1, given earlier, lists the NALU types and the corresponding Type field values, and the existing types number fewer than 16. The 5 Type bits can therefore be reduced to 4 without affecting existing H.264 transmission. Accordingly, in the ERRTP encapsulation format, when the NALU types number fewer than 16, only the low 4 bits of the Type field are used to represent the type, and the highest bit of Type becomes a reserved extension bit called the C field. The C bit is reserved for later functional extensions. With the C bit reserved, the NALU types in Table 1 are modified as follows: there are 16 values in total, of which 0 to 12 are the same as in Table 1 and 13 to 15 are reserved.
Although H.264 currently has only 13 NALU types, the standard will continue to evolve and may define more. If the number of NALU types grows beyond 16, the lowest 4 bits of the 7 PT bits together with the C bit will again be needed to indicate the type.
It should be noted that the greatest benefit of merging the NALU header into the ERRTP header is that multimedia transmission equipment can learn about the carried NALUs directly from the ERRTP header and apply a QoS policy for real-time H.264 multimedia transmission accordingly. Existing RTP cannot do this: the RTP layer is not concerned with NALU-layer information and therefore cannot know the header of each NALU in the payload, so such a QoS policy cannot be implemented.
On top of ERRTP, feedback from the receiver is provided by an enhancement in which SEI carries the QoS reports. As described earlier, RTCP provides the QoS reporting mechanism, but it is really a general-purpose reporting method that can report QoS or other information. For a specific video communication application, RTCP is not necessarily the most suitable means of reporting. If both the sender and the receiver of the QoS information can communicate using a higher-layer protocol such as H.264, the report can be carried by H.264 instead. Starting from this idea, the invention uses H.264 directly to carry QoS report information, which avoids the need for an additional channel and yields an "in-band" reporting mechanism.
Another reason for carrying QoS reports in the higher-layer H.264 protocol is that, in current video communication applications, adaptation to network transmission is implemented mainly in the terminals rather than in intermediate network devices such as routers, switches, or gateways. Encapsulating and extracting QoS reports therefore does not depend on lower-layer protocols: QoS monitoring works as long as the terminals can extract and interpret the QoS report information carried in H.264, independently of lower-layer protocols such as RTCP. Adopting the H.264 "in-band" reporting mechanism does not exclude the RTCP reporting mechanism. Either mechanism can be chosen, or both can coexist, and using H.264 can reduce RTCP report traffic. Furthermore, with H.264 "in-band" reporting, the H.264 packets can be protected in several ways. H.264 packets that carry QoS reports can be treated as important data and, following the principle of unequal protection (UEP, Unequal Protection), given strong protection. This helps ensure that the report data arrives correctly and improves the reliability of QoS monitoring.
[Fourth Embodiment of the Invention]
Building on the third embodiment, carrying QoS reports with the H.264 extension message mechanism consists of three basic steps:
First, each multimedia communication terminal gathers statistics and generates QoS reports for the H.264 multimedia communication. The content of these reports may be the same as that of RTCP SR and RR reports, or it may differ, but it describes the same kind of information about the service quality of the H.264 media communication and the state of the network;
Next, the terminal carries these QoS reports in H.264 extension messages and sends them to the other communication terminals. The H.264 extension message mechanism was mentioned earlier; SEI is a typical example, and the invention essentially uses SEI messages, although other extension messages may be used as H.264 evolves;
While sending QoS reports, the terminal also receives QoS reports from other terminals, and each terminal applies its QoS policy on the basis of these reports.
The invention carries QoS reports in SEI messages. Taking the existing RTCP QoS reports as an example, the main content of the RTCP SR and RR reports can be used directly as the payload of an H.264 SEI message, so that an extended SEI message carries this information.
On this basis, the fourth embodiment of the invention defines a specific SEI extension message dedicated to carrying QoS reports. H.264 specifies that SEI information is stored in one class of NALU which, as stated earlier, has Type = 6. The invention stores RTCP-like SR and RR report messages in the SEI field, which both maintains transmission efficiency and effectively feeds back channel state and decoding information, facilitating interactive packet-loss resilience between the encoder and the decoder. The structure is shown in Figure 8: the header is arranged according to the SEI message structure, and the rest of the QoS report follows the format of the RTCP SR and RR reports.
The header of an SEI message that carries a QoS report contains the following fields:
The first byte (byte 0) is the payload type field (SEI Type), which indicates that the payload is the corresponding QoS report. In this embodiment, SEI Type = 200 indicates that the SEI field holds an RTCP-like sender report (SR), and SEI Type = 201 indicates a receiver report (RR);
The second and third bytes (bytes 1 and 2) are the payload length field (SEI Packet-Length), which gives the length of the corresponding QoS report and uses the same definition as the length field in RTCP QoS reports;
The fourth byte onward is the payload of the SEI message, which holds the corresponding QoS report.
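The three-byte header followed by the report can be sketched with Python's `struct` module. Several details are assumptions and not taken from the source: network byte order for the length field, a length value counted RTCP-style as 32-bit words minus one over the report only, and all function and constant names.

```python
import struct
from typing import Tuple

SEI_TYPE_SR = 200  # RTCP-like sender report
SEI_TYPE_RR = 201  # RTCP-like receiver report


def pack_qos_sei(sei_type: int, report: bytes) -> bytes:
    """Build byte 0 (SEI Type), bytes 1-2 (length), then the report payload."""
    if sei_type not in (SEI_TYPE_SR, SEI_TYPE_RR):
        raise ValueError("unknown SEI Type")
    if len(report) == 0 or len(report) % 4 != 0:
        raise ValueError("report must be a non-empty multiple of 4 bytes")
    length = len(report) // 4 - 1  # assumed: 32-bit words minus one
    return struct.pack("!BH", sei_type, length) + report


def unpack_qos_sei(message: bytes) -> Tuple[int, bytes]:
    """Return (SEI Type, report payload) from a message built as above."""
    sei_type, length = struct.unpack("!BH", message[:3])
    return sei_type, message[3:3 + (length + 1) * 4]
```

In a real stream this message would still have to be wrapped in a NALU of Type 6, which the sketch does not do.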
QoS reports are likewise divided into sender reports and receiver reports, distinguished by the payload type field, that is, by the value of SEI Type. The content of a QoS report can be the same as that of the RTCP SR and RR reports, for example as shown in Figure 2:
Version field (V), 2 bits; in this example it takes the binary value 11, that is V = 3, to distinguish it from earlier versions;
Padding field (P), 1 bit, indicating whether padding is present, as in RTCP;
Reception report count field (RC), 5 bits, giving the number of reception report blocks in the QoS report;
Sender SSRC field, 32 bits, identifying the sender of the QoS report;
For a sender report, a sender information block follows, describing the sender of the report;
Multiple reception report blocks follow, describing statistics for the media from different sources; each block contains the identifier of a source and the statistics for its media stream, whose meanings were described earlier for RTCP;
Finally, profile-specific extensions provide reserved, profile-specific functional extensions.
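The V, P, and RC fields together fill one byte. The sketch below packs and unpacks that byte, assuming the same bit order as in RTCP (V in the top 2 bits, then P, then RC in the low 5 bits); the exact layout in Figure 8 is not reproduced here, and the function names are hypothetical.

```python
from typing import Tuple


def pack_vprc(version: int, padding: bool, rc: int) -> int:
    """Pack V (2 bits), P (1 bit), RC (5 bits) into one byte."""
    if not 0 <= version <= 3:
        raise ValueError("version needs 2 bits")
    if not 0 <= rc <= 31:
        raise ValueError("report count needs 5 bits")
    return (version << 6) | (int(padding) << 5) | rc


def unpack_vprc(byte: int) -> Tuple[int, bool, int]:
    """Inverse of pack_vprc."""
    return byte >> 6, bool((byte >> 5) & 1), byte & 0x1F
```

With V = 3 as in this example, a report with no padding and one reception report block starts with the byte 0xC1.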
As can be seen, the QoS report content shown in Figure 8 is essentially the same as in RTCP. Once the basic RTCP content, RR and SR, is written into the SEI field, no dedicated logical channel is needed to carry RTCP information, which saves some bandwidth overhead. The essence of the invention is in-band carriage using SEI messages; how the QoS report statistics are generated does not affect the spirit and scope of the invention, as long as the purpose of QoS monitoring is achieved.
Once QoS reporting is in place, various QoS policies can be built on it. For example, the RTCP cumulative packet loss field can be used in two-way video communication (where each terminal has both an encoder and a decoder) to feed back decoding information, facilitating interactive packet-loss resilience.
In addition, the QoS report contains fields such as interarrival jitter and sender octet count, both of which can be used to sense the state of the network. The rate control algorithm can use the information in the interarrival jitter field to help keep the encoder's output rate close to constant. The sender octet count field can be used to estimate the average payload rate, so that the sender can reconfigure the encoder parameters according to the network state, including the target frame rate, the reconstructed image quality, and the resolution of the source image.
To address the insufficient reliability of RTCP transmission, once the H.264 "in-band" reporting mode is adopted, the H.264 data packets can be given various kinds of protection. An H.264 packet that carries a QoS report can be treated as important data and, following the principle of unequal protection, given strong protection, which ensures that the report data arrives correctly. For example, the SEI that carries the QoS report is itself carried by a NALU and, as described above, the NALU header contains information that indicates the importance of its content. The communication terminal can therefore set the nal_ref_idc field of that NALU according to the reliability required for the QoS report, for example to 1, 2 or 3; error-resilient coding then applies protection of different strength depending on the value of this field.
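The following sketch shows how a terminal might pack the one-byte NALU header with a chosen nal_ref_idc, following the proposal in the text. The header layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) and the SEI type value 6 are taken from H.264; the helper names are hypothetical.

```python
SEI_NAL_UNIT_TYPE = 6  # nal_unit_type used for SEI in H.264

def nal_header_byte(nal_ref_idc, nal_unit_type=SEI_NAL_UNIT_TYPE):
    """Pack forbidden_zero_bit (0), nal_ref_idc and nal_unit_type into one byte."""
    if not 0 <= nal_ref_idc <= 3:
        raise ValueError("nal_ref_idc is a 2-bit field")
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("nal_unit_type is a 5-bit field")
    return (nal_ref_idc << 5) | nal_unit_type

def parse_nal_header_byte(value):
    """Return (forbidden_zero_bit, nal_ref_idc, nal_unit_type)."""
    return (value >> 7) & 0x1, (value >> 5) & 0x3, value & 0x1F
```

A protection module could read nal_ref_idc back with `parse_nal_header_byte` and choose a stronger code for higher values.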
The communication terminal can also dynamically adjust the sending period of the SEI-based QoS report according to the current network state and the requirements of higher-layer applications. By default, the interval at which RTCP information is written into the SEI field (the reporting period) is the same as the RTCP transmission interval recommended in RFC 3550. Depending on the needs of a particular application (a particular protection method, for example), the reporting period need not be exactly as specified in RFC 3550 and can be adjusted. For example, an important use of the report data is to estimate network performance dynamically: packet loss rate, delay, jitter and so on. If these quantities must be measured frequently, the reporting period should be short; otherwise it can be long. When network conditions are good, reporting can be stopped.

In addition, SEI messages can carry not only the QoS report of the H.264 video but also the QoS reports of several media streams together; it is only necessary to append the reception report block of each media stream to the QoS report. For an audio stream, for example, it suffices to add to the SR report the report block for the SSRC of that source.

As mentioned earlier, besides using SEI for in-band monitoring, the communication terminal can also choose the existing RTCP transmission, and can use either or both of the H.264 extended message and RTCP to carry the QoS report.
With SEI used to feed back QoS reports on network conditions from the receiving end, adaptive adjustment of the protection strategy, including multi-level protection and unequal protection, is readily implemented on this basis.
[Fifth Embodiment of the Invention]
To address the inability of the prior art to adapt to network communication conditions, the present invention provides an adaptively protected video transmission method that gathers statistics on the current communication conditions and adjusts the protection strategy accordingly. First, taking into account how the parameters affect the performance of the protection method, different parameter configurations are defined to form a multi-level set of protection strategies with different protection capabilities, from which one can be selected for efficient and reliable protection under different communication conditions. Second, the receiving end gathers statistics on the network condition and communication quality and sends them back to the sending end. Finally, the sending end uses the returned communication-quality statistics to select the most suitable protection level.
The key to the scheme also lies in the method of measuring communication quality and the channel used to return the statistics. Gaps in the H.264 NALU sequence numbers can be used to compute the packet loss rate and the location of the losses. An extended SEI message structure, defined in the payload part of the NALU, carries these statistics from the receiving end to the sending end. Although this feedback differs from the SR/RR format of the QoS report, those skilled in the art will appreciate that the underlying principle of the two approaches is the same and only the content carried by the SEI differs. The following description therefore no longer distinguishes between the scheme in which the SEI carries the network packet loss rate and the QoS report scheme.
Take the Tornado erasure code as an example: the video stream data is protected using the Tornado encoding and decoding methods described above. The parameters to be set for a Tornado erasure code are the number of data nodes, the number of check nodes, the shrinking ratio, the number of check node layers and the bipartite graph used at each level to compute the check nodes. During video stream communication, the sending end splits the video stream data into data nodes, generates check nodes according to the Tornado encoding method and sends them together to the receiving end; the receiving end performs erasure correction according to the Tornado decoding method and recovers the video stream data.
Because the bandwidth and other properties of a real IP network change frequently and are unstable, a fixed protection strategy leads to problems such as low efficiency or a high error rate. This embodiment therefore predefines a series of protection strategies of graded strength, each used to protect the video stream data at a different level of communication quality. Graded protection strategies can follow changes in network communication quality: they satisfy the protection strength required when the channel degrades, and they allow the protection strength to be lowered when the signal improves, which reduces system overhead and saves processing and bandwidth resources.
To define protection strategies of different levels, Tornado erasure codes with different parameters must be configured. As noted above, the parameters that mainly affect the protection performance of a Tornado erasure code are the number of data nodes, the number of check nodes and the random distribution of the node degree vectors on the two sides of the bipartite graph. Tornado codes of different capability generally do not share a common bipartite graph, so, for simplicity, protection strategies of different strength are defined by using different numbers of data nodes and check nodes. By the principle of Tornado erasure codes, different numbers of data nodes and check nodes determine codes with different code rates, or redundancy rates, and hence different protection strength and system overhead.
The receiving end receives the data and performs Tornado erasure decoding to obtain the video stream data. At the same time it gathers statistics on the data losses; the resulting statistics characterize the communication quality.
The sending end has to adjust the protection strategy according to the communication quality, so the transmission must be measured. The receiving end does this using the sequence numbers of the NALUs in the H.264 video stream data. In H.264-based two-way video communication, every terminal of the communication system has both an encoder and a decoder. NALUs are sequentially numbered, that is, all NALUs sent by a sending end share one numbering sequence, so the receiving end can tell from the sequence numbers of the received NALUs whether any NALU has been lost. A gap in the NALU sequence numbers indicates NALU loss: the missing sequence numbers are those of the lost NALUs, and their count is the number of lost NALUs. Accumulating over a period of time gives the total number of NALUs lost in that period; normalizing by the total number of NALUs in the same period gives the accumulated lost slice rate (ALSR). The receiving end may also send the loss information directly back to the sending end and let the sending end compute the statistics. Using NALU sequence numbers keeps the statistics accurate and relies on information that is already present, so no additional bearer overhead is needed.
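The counting described above can be sketched as follows. This is a minimal illustration, assuming the sequence numbers in the observation window are plain integers that do not wrap around and that the first and last NALU of the window were received; the function names are hypothetical.

```python
def lost_sequence_numbers(received):
    """Return the sequence numbers missing between the first and last received NALU."""
    seen = sorted(set(received))
    lost = []
    for prev, curr in zip(seen, seen[1:]):
        # Every number strictly between two received neighbours was lost.
        lost.extend(range(prev + 1, curr))
    return lost

def accumulated_lost_slice_rate(received):
    """ALSR: lost NALUs divided by all NALUs expected in the window."""
    seen = set(received)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return (expected - len(seen)) / expected
```

For the received sequence 10, 11, 13, 16 the lost numbers are 12, 14 and 15, so three of the seven expected NALUs are missing.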
The receiving end sends the statistics and other data-loss information back to the sending end in an extended SEI message. Once the receiving end has computed the transmission statistics, they must be returned to the sending end; this embodiment defines an extended SEI message structure dedicated to carrying the transmission statistics returned by the receiving end. After completing the statistics, the receiving end writes them into the specially defined extended SEI message body, places that message in the SEI field of the coded bitstream that this terminal sends in the reverse direction, and sends it back to the sending end. On receiving the SEI message, the sending end either reads the statistics directly or computes the ALSR itself, which gives the sending end a true picture of the network packet loss rate.
As described above, SEI messages are also carried by the NALU, the basic unit of the H.264 bitstream. Each SEI field contains one or more SEI messages, and each SEI message consists of an SEI header and an SEI payload. The SEI header contains two code words: the payload type and the payload size. The length of the payload type is variable: a type between 0 and 255 is represented by one byte, a type between 256 and 511 by the two bytes 0xFF00 to 0xFFFE, and so on, so that users can define any number of payload types. In the existing H.264 standard, types 0 to 18 are already defined for specific information such as the buffering period and picture timing. The SEI field defined in H.264 can therefore hold as much user-defined information as required. In the first embodiment of the present invention, an extended SEI message for carrying the statistics is defined among the reserved SEI payload types.
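The variable-length header can be illustrated with the sketch below. It assumes the H.264 escape convention in which each leading 0xFF byte adds 255 and a final byte below 0xFF ends the value. Under that convention the one-byte form covers 0 to 254 and the two-byte forms 0xFF00 to 0xFFFE cover 255 to 509, which differs slightly from the ranges quoted in the text. The function names are hypothetical, and emulation prevention and trailing bits are left out.

```python
def encode_sei_value(value):
    """Encode a payload type or payload size as 0xFF bytes plus one final byte."""
    if value < 0:
        raise ValueError("value must be non-negative")
    full, last = divmod(value, 255)
    return b"\xff" * full + bytes([last])

def decode_sei_value(data, pos=0):
    """Decode one value starting at data[pos]; return (value, next position)."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    return value + data[pos], pos + 1

def build_sei_message(payload_type, payload):
    """Concatenate the SEI header (type, size) and the payload bytes."""
    return encode_sei_value(payload_type) + encode_sei_value(len(payload)) + payload
```

A receiver would call `decode_sei_value` twice, first for the type and then for the size, and read that many payload bytes.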
Finally, the sending end adjusts the Tornado erasure code according to the returned statistics and uses a protection strategy better suited to the current transmission conditions, that is, it selects the protection level that fits. The sending end predefines a series of decision thresholds corresponding to the protection levels, one threshold for entering each level, and selects the level according to the thresholds between which the ALSR falls. The resulting mechanism of measurement, feedback and adjustment adapts accurately and promptly to the network transmission requirements and improves the protection capability.
Different series of protection strategies are used for data of different importance. Because key data and non-key data require different protection strength, two separate series of protection strategies are defined to improve adaptability further, one for protecting key data and one for non-key data. Data with the two kinds of requirements can then be handled independently, each with a protection strategy of suitable strength, which improves system efficiency.
For example, Tornado codes of different levels serve as the series of protection schemes. The protection capability of each level is characterized by the parameters $n$ and $l$, where $n$ is the number of data nodes and $l$ is the number of check nodes, and $\mathrm{TN}(n+l,\,n)$ denotes the Tornado code protection scheme determined by $n$ and $l$. The series of protection schemes for key data is $\mathrm{TN}_K(n_0+l_0,\,n_0)$, $\mathrm{TN}_K(n_1+l_1,\,n_1)$, …, $\mathrm{TN}_K(n_{L-1}+l_{L-1},\,n_{L-1})$; likewise, the series for non-key data is $\mathrm{TN}_{NK}(n_0+l_0,\,n_0)$, $\mathrm{TN}_{NK}(n_1+l_1,\,n_1)$, …, $\mathrm{TN}_{NK}(n_{L-1}+l_{L-1},\,n_{L-1})$. A series of thresholds $0 < G_1, G_2, \dots, G_{L-1} < 1$, taken in increasing order, is used to select the protection level. When adjusting the protection strategy, the sending end compares the ALSR with the thresholds $G_1, G_2, \dots, G_{L-1}$ and proceeds as follows:

If $0 < \mathrm{ALSR} < G_1$, key data is protected with $\mathrm{TN}_K(n_0+l_0,\,n_0)$ and non-key data with $\mathrm{TN}_{NK}(n_0+l_0,\,n_0)$;

If $G_i < \mathrm{ALSR} < G_{i+1}$, $i = 1, 2, \dots, L-2$, key data is protected with $\mathrm{TN}_K(n_i+l_i,\,n_i)$ and non-key data with $\mathrm{TN}_{NK}(n_i+l_i,\,n_i)$;

If $G_{L-1} < \mathrm{ALSR} < 1$, key data is protected with $\mathrm{TN}_K(n_{L-1}+l_{L-1},\,n_{L-1})$ and non-key data with $\mathrm{TN}_{NK}(n_{L-1}+l_{L-1},\,n_{L-1})$.
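The selection rule can be sketched as a threshold lookup. The threshold values and the (n, l) pairs below are hypothetical, chosen only to make the example concrete. Assigning an ALSR that equals a threshold to the higher level is also an assumption, since the rule in the text uses strict inequalities.

```python
from bisect import bisect_right

# Hypothetical (n, l) pairs for levels 0..3: data nodes and check nodes.
KEY_SCHEMES = [(100, 20), (100, 40), (100, 60), (100, 80)]
NON_KEY_SCHEMES = [(100, 10), (100, 20), (100, 30), (100, 40)]
THRESHOLDS = [0.01, 0.05, 0.10]  # hypothetical G_1, G_2, G_3

def select_protection_level(alsr, thresholds):
    """Return the index i of the level whose threshold interval contains alsr."""
    if not 0.0 <= alsr <= 1.0:
        raise ValueError("ALSR must lie in [0, 1]")
    if list(thresholds) != sorted(thresholds) or any(not 0 < g < 1 for g in thresholds):
        raise ValueError("thresholds must be increasing and inside (0, 1)")
    # The number of thresholds at or below alsr is the level index.
    return bisect_right(thresholds, alsr)

def choose_schemes(alsr, thresholds=THRESHOLDS):
    """Return the (key, non-key) scheme parameters for the measured ALSR."""
    level = select_protection_level(alsr, thresholds)
    return KEY_SCHEMES[level], NON_KEY_SCHEMES[level]
```

With these values an ALSR of 0.03 selects level 1 for both series, and any ALSR of 0.10 or more selects the strongest level.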
In addition, the sending end retransmits lost data according to the loss information returned by the receiving end. When the receiving end gathers statistics on the lost NALUs, it also obtains the location of the image data that each lost NALU carried: the number of the frame and the position within the frame. The receiving end sends this location information back to the sending end, which can then locate the corresponding video stream data and retransmit it. In real-time video communication, video stream data that is delayed too long has lost its value, but under some service requirements or mechanisms data with a certain delay is still valuable. In video communication with a large buffer, for example, delayed video stream data that still falls within the buffer can be used to avoid interruption of playback. The retransmission mechanism is therefore of significant value for improving the reliability and quality of service of video stream communication.
[Sixth Embodiment of the Invention]
Besides the error-resilient protection strategy, and building on the fifth embodiment, this embodiment addresses error concealment and the elimination of error propagation. It combines an error concealment strategy at the receiving end with an error-propagation elimination strategy at the sending end, so as to minimize the loss of video quality caused by errors while preventing the errors from propagating. For error concealment, a simple substitution scheme compensates for the loss with as little complexity as possible. For the elimination of error propagation, an error-information feedback mechanism is set up over an existing H.264 channel, and intra coding is applied according to the feedback. This stops the propagation without adding network load, makes the video bitstream robust against errors, and also avoids the error propagation that error concealment would otherwise cause.
The basic idea of the scheme is as follows. By keeping statistics on the NALU sequence numbers, the receiving end discovers information about the lost data, such as the position of the lost Slice. On the one hand, it uses an efficient algorithm to substitute simple replacements for the lost data and so conceal the loss; on the other hand, it feeds the error information back to the sending end. The extended SEI message of H.264 provides the feedback channel for error information from the receiving end to the sending end. As soon as the sending end learns of the error, it applies intra coding segment by segment over successive frames, refreshing the damaged Slice one segment at a time to prevent the error from propagating.
In H.264 video communication, the sending end encodes the video data to be sent into a video bitstream, encapsulates it in NALUs and transmits them to the receiving end in packets. The receiving end receives and decodes the packets; at this point it must determine whether any video stream data has been lost so that the subsequent error-elimination operations can be carried out. The error-elimination procedure consists of three main steps: concealment, feedback and elimination of propagation.
First, the receiving end determines from gaps in the NALU sequence numbers whether data has been lost, and collects information on the lost data, that is, the error information. As noted above, the NALU is the basic unit of H.264 video stream data transmission, and every NALU has a unique, consecutive sequence number. From gaps in the received NALU sequence numbers the receiving end therefore learns which NALUs were lost, and can then apply the error concealment strategy to the lost data. Using NALU sequence numbers keeps the statistics accurate and relies on information that is already present, so no additional bearer overhead is needed.
Specifically, the receiving end obtains the sequence number from the header information of each received NALU and detects an error from a discontinuity in the sequence numbers. From the preceding NALU it infers which video data the missing NALU should have carried, and so locates the data lost through the error. For example, if the NALU preceding the lost one carried the first Slice of frame N, then by the transmission order the lost NALU carried the next Slice of the same frame.
Next, the receiving end must resynchronize to the video information. During continuous transmission of the H.264 bitstream the receiving end has to be synchronized with the data stream in order to receive it correctly; once the stream is interrupted, the receiving end must resynchronize, and the decoder does so by finding the header of the next NALU after the interruption. For this process, the receiving end also needs …

After that, the receiving end performs error concealment. A NALU with lost data is discarded as a whole, so the entire Slice carried by that NALU is lost. The error concealment strategy is simple substitution: the lost data is replaced by data that is adjacent in the time domain or the spatial domain, for example by the reconstructed image data of the Slice at the corresponding position in the frame preceding the one in which the data was lost.
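Temporal substitution of this kind can be sketched in a few lines. The sketch assumes, purely for illustration, that each decoded frame is held as a dictionary from Slice number to reconstructed Slice data; the function name and this representation are hypothetical.

```python
def conceal_lost_slices(current, previous, slice_ids):
    """Fill Slices missing from `current` with the co-located Slices of `previous`.

    current, previous: dict mapping Slice number -> reconstructed Slice data.
    slice_ids: the Slice numbers a complete frame should contain.
    Returns (concealed frame, list of Slice numbers that were substituted).
    """
    concealed = dict(current)
    substituted = []
    for sid in slice_ids:
        if sid not in concealed and sid in previous:
            # Temporal concealment: reuse the same Slice position of the previous frame.
            concealed[sid] = previous[sid]
            substituted.append(sid)
    return concealed, substituted
```

The list of substituted Slice numbers is also what the receiving end would report back as error information.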
After obtaining the error information, the receiving end feeds it back to the sending end, which requires a feedback channel. To reduce network load and simplify the implementation, the first embodiment of the present invention uses the existing H.264 communication mechanism and defines an extended SEI message that carries the error information, so that the sending end can use it to prevent error propagation. Indeed, only the combination of the error-information feedback mechanism with the error-propagation elimination strategy at the sending end can avoid the error propagation caused by the concealment applied at the receiving end.
In the preceding embodiments, the extended SEI message of H.264 provides an information feedback mechanism from the receiving end to the sending end, so that the sending end learns promptly which NALUs were lost. It can then eliminate error propagation in time and prevent the lost data from causing further errors.
Establishing the information feedback mechanism inside the H.264 framework saves network bandwidth and system processing resources and does not affect interoperability. The extended SEI message is defined as follows. As described above, SEI messages are also carried by the NALU, the basic unit of the H.264 bitstream. Each SEI field contains one or more SEI messages, and each SEI message consists of an SEI header and an SEI payload. The SEI header contains two code words: the payload type and the payload size. The length of the payload type is variable: a type between 0 and 255 is represented by one byte, a type between 256 and 511 by the two bytes 0xFF00 to 0xFFFE, and so on, so that users can define any number of payload types. In the existing H.264 standard, types 0 to 18 are already defined for specific information such as the buffering period and picture timing. The SEI field defined in H.264 can therefore hold as much user-defined information as required.
The sending end then begins to eliminate error propagation according to the error information fed back. Elimination that makes use of error information is more effective than existing methods that work without feedback. With error information such as the position of the lost Slice, the sending end can take targeted preventive measures against the lost Slice, for example by avoiding the lost Slice as a reference in subsequent encoding, which minimizes the decoder's dependence on that Slice at the receiving end.
H.264 encoding is Slice-based: the data of the same Slice in successive frames is linked by reference, and the data of a Slice in a later frame is predictively coded from the same Slice in earlier frames, so error propagation is confined to that Slice. The second embodiment of the present invention uses segment-by-segment successive intra coding: after an error occurs, the region of that Slice in subsequent frames is split into segments that form new Slices, for example by taking P macroblocks as a new Slice, and the new Slice is intra coded to remove its reference to, or dependence on, the Slice lost earlier. To guarantee transmission quality, a real-time H.264 video transmission system uses a data-rate control scheme that limits the fluctuation of the data volume per frame, so that the amount of data per frame is balanced and the video transmission is stable. The amount of data intra coded in any one frame, that is, the number of macroblocks, therefore cannot be too large, or the H.264 data-rate control range will be exceeded.
Figure 9 shows the principle of eliminating error propagation by segment-by-segment successive intra coding. When an unrecoverable packet loss occurs at the receiving end, the receiving end detects it and feeds the error information back to the sending end: the frame containing the lost Slice and its position within the frame are returned in an extended SEI message. The sending end extracts the location of the lost Slice from the SEI message. In Figure 9, for example, each frame is divided into three Slices, Slice#0, Slice#1 and Slice#2, and Slice#1 of frame n is lost in transmission, after which segment-by-segment successive intra coding must be carried out.
First, in frame n, the encoder takes P macroblocks of Slice#1 in macroblock scan order, starting from the beginning of the Slice, to form a new Slice#3; the remaining macroblocks still belong to Slice#1. There are now four Slices, and the new Slice#3 is intra coded.
Next, in frame n + 1, the Slice#3 formed in the previous step is sent as Slice#3 after intra coding, while the other Slices are still coded in the conventional way.
After that, it is determined whether any macroblocks remain in Slice#1. If some have not yet been split off, the procedure returns to the first step: in the next frame, a further segment of the remaining macroblocks of Slice#1 is formed into a new Slice, intra coded and sent, until all macroblocks have been processed.
The number P of macroblocks split off each time should be as large as possible, to reduce the number of splits, the processing delay and the extent of the impact, while staying within the H.264 data-rate control range described above. The number of macroblocks split off may differ from one time to the next, but the last split must be such that all macroblocks of the lost Slice have been processed.

Suppose, for example, that one frame of video data consists of 240 macroblocks, initially divided into Slices of 80 macroblocks each: macroblocks 1 to 80 form Slice#0, macroblocks 81 to 160 form Slice#1 and macroblocks 161 to 240 form Slice#2. Suppose also that the data-rate calculation gives a suitable segment size of P = 12 macroblocks. If Slice#1 is lost in frame n, its 80 macroblocks are intra coded segment by segment. In frame n+1, the first 12 macroblocks are intra coded to form Slice#3, so that from frame n+2 on Slice#3 can use conventional predictive coding; in frame n+2 the next 12 macroblocks are intra coded to form Slice#4, and so on, until in frame n+7 the last 8 macroblocks are intra coded to form Slice#9, which completes the segment-by-segment successive intra coding procedure.
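The worked example can be reproduced with a short scheduling sketch. It assumes a constant segment size P, although the text allows the size to vary, and frames are given as offsets from the frame n in which the loss occurred; the function name and tuple layout are hypothetical.

```python
def intra_refresh_schedule(first_mb, last_mb, p, next_slice_id):
    """Plan the segment-by-segment intra refresh of a lost Slice.

    first_mb, last_mb: macroblock range of the lost Slice (inclusive).
    p: number of macroblocks intra coded per frame (constant here by assumption).
    next_slice_id: number given to the first newly formed Slice.
    Returns a list of (frame offset from n, Slice number, first MB, last MB).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    schedule = []
    offset, slice_id, start = 1, next_slice_id, first_mb
    while start <= last_mb:
        end = min(start + p - 1, last_mb)  # the last segment may be shorter
        schedule.append((offset, slice_id, start, end))
        start, offset, slice_id = end + 1, offset + 1, slice_id + 1
    return schedule
```

Calling `intra_refresh_schedule(81, 160, 12, 3)` gives seven segments, for frames n+1 to n+7: six of 12 macroblocks followed by one of 8, numbered Slice#3 to Slice#9.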
Experimental results show that combining the error concealment and error-propagation elimination methods of the present invention yields very good video image quality.
[Seventh Embodiment of the Invention]
Finally, an improved Tornado coding scheme is proposed and used as the error-resilience protection policy in this solution. The main differences between this Tornado coding scheme and the conventional one are outlined below.
When Tornado codes are used to protect data transmission, multiple layers of check nodes strengthen the protection to some extent, but they also make the code computationally expensive, so the protection comes at the cost of a long delay. If the number of check node layers can be reduced without a significant loss of protection, the computation of the Tornado code drops, the transmission delay shrinks considerably, and a better ratio of protection performance to cost is obtained. The seventh embodiment of the present invention therefore sets up an erasure code with only one check node layer and protects data transmission with that erasure code.
This Tornado erasure code scheme has only one check node layer. The intermediate check node layers of the Tornado code are removed, and so is the inherent requirement that the last check node layer be generated by Reed-Solomon coding. As shown in Figure 10, the erasure code of the present invention thus has only one data node layer and one check node layer; it can be regarded as a structurally simplified, improved Tornado code.
The data node size L1, the number n of data nodes in the data node layer, and the number L of check nodes in the check node layer can be chosen according to actual needs, for example according to the data transmission rate, the data type (such as audio or video data), the required protection capability, and the maximum acceptable network delay.
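To make the one-layer structure concrete, the following Python sketch encodes n data nodes into L check nodes and recovers lost data nodes by iterative "peeling". It rests on assumptions that the text does not spell out: each check node is taken to be the XOR of its neighbouring data nodes (the usual construction for Tornado codes), and the bipartite graph `graph` is an arbitrary illustrative choice, not the graph of Figure 10. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length nodes."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_nodes, graph):
    """Check node j is the XOR of the data nodes listed in graph[j]."""
    return [reduce(xor_bytes, (data_nodes[i] for i in neighbours))
            for neighbours in graph]


def decode(received, checks, graph):
    """Peeling decoder; lost data or check nodes are passed in as None.

    Any check node with exactly one missing neighbour restores that
    neighbour. Nodes that cannot be restored stay None, which gives
    partial recovery.
    """
    data = list(received)
    progress = True
    while progress:
        progress = False
        for j, neighbours in enumerate(graph):
            if checks[j] is None:
                continue
            missing = [i for i in neighbours if data[i] is None]
            if len(missing) == 1:
                value = checks[j]
                for i in neighbours:
                    if data[i] is not None:
                        value = xor_bytes(value, data[i])
                data[missing[0]] = value
                progress = True
    return data


# Illustrative parameters: n = 4 data nodes of size L1 = 2 bytes, L = 2 check nodes.
data = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b", b"\xff\x00"]
graph = [[0, 1, 2], [1, 2, 3]]
checks = encode(data, graph)

# Data nodes 0 and 3 are lost in transit; each is covered by a different check node.
recovered = decode([None, data[1], data[2], None], checks, graph)
print(recovered == data)
```

If data nodes 1 and 2 are lost instead, both check nodes have two missing neighbours and neither node can be restored, which illustrates why the choice of graph and of L relative to n determines the protection capability.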
Suppose the prior-art Tornado code has m intermediate check node layers, that from the data node layer to the m-th intermediate layer the number of nodes shrinks by a geometric factor $\beta$ between adjacent layers, and that the number of nodes shrinks by a factor $\beta/(1-\beta)$ between the m-th layer and the last layer. The total number of nodes $\mathrm{Total}_{\mathrm{Node}}$ of the prior-art Tornado code is then:
$$\mathrm{Total}_{\mathrm{Node}} = n + n\beta + n\beta^{2} + \cdots + n\beta^{m} + \frac{n\beta^{m+1}}{1-\beta} = \frac{n}{1-\beta}$$
Since $\mathrm{Total}_{\mathrm{Node}} = n + L$, the choice of L is constrained: $L = [\beta/(1-\beta)]\,n$, so L cannot be set arbitrarily. Moreover, the number of nodes in every layer of the Tornado code must be an integer, so $n\beta$, $n\beta^{2}$, ..., $n\beta^{m}$ and $n\beta^{m+1}/(1-\beta)$ must all be integers. This is called the implicit integer node count condition. Given m and $\beta$ for the Tornado code, the condition that n must satisfy can be computed from it. For example, with m = 3 and $\beta = 1/2$, one obtains n = 16k, where k is any natural number, so the smallest value n can take is 16. The code rate r of the prior-art Tornado code is $r = n/(n+L) = 1-\beta = 1/2$, and the redundancy rate is $1 - r = \beta = 1/2$.
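The relation between the layer sizes, L, and the code rate can be checked numerically. The sketch below assumes the layering described above, namely that intermediate layer i holds $n\beta^{i}$ nodes and the last layer holds $n\beta^{m+1}/(1-\beta)$ nodes; the function name `prior_art_layers` is hypothetical.

```python
from fractions import Fraction


def prior_art_layers(n, m, beta):
    """Check-layer sizes of a multi-layer Tornado code (assumed layering).

    Layers 1..m hold n*beta**i nodes; the last layer holds
    n*beta**(m+1)/(1-beta) nodes.
    """
    layers = [n * beta ** i for i in range(1, m + 1)]
    layers.append(n * beta ** (m + 1) / (1 - beta))
    return layers


beta = Fraction(1, 2)
n = 16
layers = prior_art_layers(n, 3, beta)

L = sum(layers)                   # total number of check nodes
rate = Fraction(n) / (n + L)      # code rate r = n / (n + L)

print([int(x) for x in layers], int(L), rate)
```

For n = 16, m = 3 and $\beta = 1/2$ the check layers hold 8, 4, 2 and 2 nodes, so L = 16 = [β/(1−β)]n and r = 1/2, in agreement with the formulas in the text.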
Because the improved Tornado code of the present invention has no intermediate check node layers, it no longer needs to satisfy the implicit integer node count condition. The number of check nodes in its check node layer is $L = \beta n$. The geometric shrink factor $\beta$ between the data node layer and the check node layer can be set freely, so for a given number n of data nodes, L can be chosen flexibly.
The code rate of the improved Tornado code of the present invention is $r = 1/(1+\beta)$, and its redundancy rate is $1 - r = \beta/(1+\beta)$.
The improved Tornado code of the present invention can be written as TN(n+L, n). For example, TN(30, 20) denotes n = 20 data nodes in the data node layer and L = 10 check nodes in the check node layer. In this case $\beta = L/n = 10/20 = 1/2$ and the code rate is $r = 2/3 \approx 66.7\%$.
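The parameters of a TN(n+L, n) code follow directly from the formulas above. The helper below is a minimal sketch; the name `tn_parameters` is hypothetical and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def tn_parameters(total_nodes, data_nodes):
    """Return (L, beta, code rate, redundancy rate) for TN(total_nodes, data_nodes)."""
    check_nodes = total_nodes - data_nodes     # L
    beta = Fraction(check_nodes, data_nodes)   # beta = L / n
    rate = 1 / (1 + beta)                      # r = 1 / (1 + beta)
    redundancy = beta / (1 + beta)             # 1 - r
    return check_nodes, beta, rate, redundancy


L, beta, rate, redundancy = tn_parameters(30, 20)
print(L, beta, rate, redundancy)
```

For TN(30, 20) this gives L = 10, β = 1/2, r = 2/3 and a redundancy rate of 1/3. Because β is not tied to an integer condition across several layers, other combinations such as TN(25, 20) are equally admissible.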
In summary, the present invention builds on the six enhancement techniques described above, implements the entire H.264/ERRTP transport architecture in modular form, and combines the modules in a single protocol stack. Each technique keeps its own advantages, and by reinforcing one another they deliver better reliability and quality of service.
Those skilled in the art will understand that the specific implementation details and parameter choices mentioned in the description of the above embodiments can be determined otherwise for a particular application, without affecting the substance and scope of the present invention.
Although the present invention has been illustrated and described with reference to certain preferred embodiments, those of ordinary skill in the art should understand that various changes and equivalent substitutions may be made without departing from the spirit and scope of the present invention.

Claims

1. A multimedia communication method, characterized by comprising:
a sending end selecting a coding mode according to an error-resilience protection policy, encoding multimedia data, and sending the encoded multimedia data, encapsulated using a real-time transport protocol, to a receiving end;
the receiving end receiving the multimedia data and, if a transmission error has occurred in the received multimedia data, recovering or partially recovering the multimedia data affected by the transmission error.
2. The multimedia communication method according to claim 1, characterized by further comprising:
the receiving end measuring the communication quality, generating a quality of service report, and sending it back to the sending end;
the sending end adjusting the error-resilience protection policy according to the quality of service report.
3. The multimedia communication method according to claim 2, characterized by further comprising:
the receiving end compiling transmission error information from the multimedia data affected by transmission errors, and applying an error concealment policy;
the receiving end feeding the transmission error information back to the sending end;
the sending end applying an error-propagation elimination policy according to the transmission error information.
4. The multimedia communication method according to claim 1, characterized in that coding-related information is carried in the header information of the real-time transport protocol packet, and the receiving end recovers or partially recovers the multimedia data according to the coding-related information.
5. The multimedia communication method according to claim 1, characterized in that the encoded multimedia data is divided into two kinds of nodes: data nodes and check nodes.
6. The multimedia communication method according to claim 4, characterized in that the sending end selects a forward error correction coding mode according to the current network transmission conditions and/or the quality of service level of the multimedia data to be sent,
wherein the quality of service level of the multimedia data to be sent depends on the relative importance of the different data.
7. The multimedia communication method according to claim 6, characterized in that the header information of the real-time transport protocol packet contains:
a real-time transport protocol identification field, indicating that the packet is to be distinguished from a real-time transport protocol packet;
a forward error correction coding type field, indicating the type of forward error correction code used;
a forward error correction coding subtype field, indicating the parameter settings of the forward error correction coding mode;
a packet length field, indicating the length of the nodes obtained by forward error correction coding of the multimedia data;
a packet count field, indicating the number of such nodes carried in the real-time transport protocol packet.
8. The multimedia communication method according to claim 7, characterized in that the sending end divides an H.264 network abstraction layer unit into at least one data node of equal length and then applies forward error correction coding to obtain at least one check node;
the sending end groups the data nodes and check nodes and encapsulates them in at least one error-resilient real-time transport protocol packet for sending;
the receiving end, after receiving the real-time transport protocol packet, decapsulates it to obtain the data nodes and check nodes;
if data nodes were lost in transmission, the receiving end recovers or partially recovers the lost data nodes from the check nodes by forward error correction decoding, and separates out the H.264 network abstraction layer unit.
9. The multimedia communication method according to claim 8, characterized in that, before the sending end groups the data nodes and check nodes and encapsulates them in at least one error-resilient real-time transport protocol packet for sending, the method further comprises the step of:
the sending end and the receiving end negotiating, for each forward error correction code type, the correspondence between the values of the forward error correction code subtype field and the parameter settings of that forward error correction code which they indicate.
10. The multimedia communication method according to claim 9, characterized in that the sending end and the receiving end each build a correspondence table from the correspondence indicated by the forward error correction coding subtype field, the table being used to look up, from the forward error correction coding type field and the forward error correction coding subtype field, the corresponding forward error correction encoding or decoding processing module;
the sending end calls the corresponding forward error correction encoding processing module to perform forward error correction encoding;
the receiving end calls the corresponding forward error correction decoding processing module to perform forward error correction decoding.
11. The multimedia communication method according to claim 10, characterized in that the sending end determines the relative importance of the H.264 network abstraction layer unit from the network abstraction layer reference identifier in its header information, thereby determines the quality of service level, and then selects the forward error correction coding mode and determines the forward error correction coding type field and the forward error correction coding subtype field.
12. The multimedia communication method according to claim 10, characterized in that the sending end evaluates the network transmission conditions according to the transmission report fed back by the receiving end, and then selects the forward error correction coding mode and determines the forward error correction coding type field and the forward error correction coding subtype field.
13. The multimedia communication method according to claim 8, characterized in that the sending end strips the header information from at least one network abstraction layer unit having identical header information, then divides, encodes, and encapsulates these units together in the error-resilient real-time transport protocol packet, and merges the header information they share into the header information of that packet;
the receiving end obtains the carried header information from the header information of the received error-resilient real-time transport protocol packet and prepends it to each header-stripped network abstraction layer unit extracted from the packet, obtaining complete network abstraction layer units; if a transmission error is present, it first recovers or partially recovers the data nodes by forward error correction decoding according to a preset policy, and then extracts the network abstraction layer units from them.
14. The multimedia communication method according to claim 13, characterized in that, in the error-resilient real-time transport protocol header information, the network abstraction layer reference identifier field and the type field from the network abstraction layer unit header information are placed in the payload type field of the error-resilient real-time transport protocol header information.
15. The multimedia communication method according to claim 14, characterized in that the error-resilient real-time transport protocol identification field is the version information field of the error-resilient real-time transport protocol header information.
16. The multimedia communication method according to claim 15, characterized in that, in the error-resilient real-time transport protocol encapsulation format, the forbidden bit field from the network abstraction layer unit header information is placed in the marker field of the error-resilient real-time transport protocol header information;
the receiving end determines from the marker field of the error-resilient real-time transport protocol packet whether the network abstraction layer unit it carries is in error.
17. The multimedia communication method according to claim 14, characterized in that the error-resilient real-time transport protocol identification is a value of the marker field of the error-resilient real-time transport protocol header information, the marker field being located in that header information.
18. The multimedia communication method according to claim 17, characterized in that
the sending end first checks whether the forbidden bit field in the header information of at least one network abstraction layer unit is set, and accordingly classifies the units as normal network abstraction layer units or erroneous network abstraction layer units;
it then encapsulates the normal network abstraction layer units in error-resilient real-time transport protocol packets according to the error-resilient real-time transport protocol encapsulation format, and sets the error-resilient real-time transport protocol identification;
it encapsulates the erroneous network abstraction layer units in real-time transport protocol packets according to the real-time transport protocol encapsulation format;
the receiving end first checks whether the header information of each received packet has the error-resilient real-time transport protocol identification set, and accordingly classifies the packets as error-resilient real-time transport protocol packets or real-time transport protocol packets;
it then processes the error-resilient real-time transport protocol packets according to the error-resilient real-time transport protocol encapsulation format, and the real-time transport protocol packets according to the real-time transport protocol encapsulation format.
19. The multimedia communication method according to claim 8, characterized in that
the receiving end compiles statistics and generates the quality of service report;
the receiving end carries the quality of service report in an H.264 extension message and sends it to the sending end.
20. The multimedia communication method according to claim 19, characterized in that the H.264 extension message is supplemental enhancement information;
the supplemental enhancement information contains:
a payload type field, indicating that the payload is a quality of service report;
a payload length field, indicating the length of the quality of service report;
a payload, which holds the quality of service report.
21. The multimedia communication method according to claim 20, characterized in that the quality of service report is either a sender report or a receiver report, the two being distinguished by the payload type field;
when the quality of service report is placed in the payload of the supplemental enhancement information, the payload contains a version information field, a padding field, a reception report count field, and a sender synchronization source identifier field;
when the quality of service report is a sender report, the payload further contains a sender information block describing information about the sender of the quality of service report;
the payload contains at least one reception report block, describing multimedia statistics from different sources;
the payload contains a profile-specific extension, reserved for profile-specific function extensions.
22. The multimedia communication method according to claim 20, characterized in that the supplemental enhancement information carrying the quality of service report is in turn carried in a network abstraction layer unit;
the communication terminal sets the network abstraction layer reference identifier of that network abstraction layer unit according to the reliability required for transmitting the quality of service report.
23. The multimedia communication method according to claim 20, characterized in that the communication terminal dynamically adjusts the period at which the quality of service report is compiled and sent, according to the current network state and the requirements of higher-layer applications.
24. The multimedia communication method according to claim 19, characterized in that the receiving end counts the lost network abstraction layer units from the sequence numbers of the network abstraction layer units of the received video stream data, generates the quality of service report, and sends it back to the sending end;
the sending end computes the cumulative packet loss rate from the sequence numbers of the lost network abstraction layer units and adjusts the error-resilience protection policy accordingly.
25. The multimedia communication method according to claim 24, characterized in that the receiving end analyzes and computes network condition parameters from the received quality of service report; the parameters include the end-to-end instantaneous bandwidth, delay, and jitter.
26. The multimedia communication method according to claim 25, characterized in that the sending end defines a series of error-resilience protection policies of different levels and selects the policy of the appropriate level according to the quality of service report.
27. The multimedia communication method according to claim 26, characterized in that the receiving end derives location information for lost video stream data from the sequence numbers of the network abstraction layer units of the received video stream data, and sends it back to the sending end;
the sending end resends the lost video stream data to the receiving end according to that location information.
28. The multimedia communication method according to claim 8, characterized in that the sending end obtains the location information of the lost slice from the transmission error information and implements the error-propagation elimination policy by segmented successive intra coding of the lost slice.
29. The multimedia communication method according to claim 28, characterized in that the segmented successive intra coding comprises the following steps:
splitting a group of consecutive macroblocks off the lost slice to form a new slice, the remaining macroblocks still belonging to the lost slice;
intra-coding the new slice and sending it in the next frame, after which the new slice is coded conventionally.
30. The multimedia communication method according to claim 8, characterized in that
the receiving end detects transmission errors and compiles transmission error information;
after a transmission error occurs, the receiving end resynchronizes the video information;
the receiving end applies the error concealment policy according to the transmission error information.
31. The multimedia communication method according to claim 30, characterized in that the receiving end detects transmission errors and compiles the transmission error information from discontinuities in the network abstraction layer unit sequence numbers.
32. The multimedia communication method according to claim 31, characterized in that the receiving end obtains the location information of a lost slice from the gaps in the network abstraction layer unit sequence numbers, the location information comprising the number of the frame containing the lost slice and the position of the lost slice within that frame.
33. The multimedia communication method according to claim 35, characterized in that the error concealment policy comprises the step of: the receiving end substituting for the lost slice the corresponding slice of the frame preceding the frame that contains the lost slice.
34. The multimedia communication method according to any one of claims 8 to 33, characterized in that the error-resilient coding scheme comprises an improved "Tornado" erasure code;
the improved "Tornado" erasure code generates only one layer of check nodes for a group of data nodes.
35. A multimedia communication terminal having basic function modules for implementing multimedia communication, including a codec module for implementing multimedia encoding and decoding, characterized by further comprising the following module:
an error-resilient real-time transport control protocol module, which receives the multimedia data encoded by the codec module, applies error-resilience protection to it, and sends the protected data to the network side for transmission; the module also receives multimedia data from the network side, performs error correction on it, and passes it to the codec module for decoding.
36. The multimedia communication terminal according to claim 35, characterized by further comprising the following modules:
a protection method and policy negotiation module, responsible for negotiating error-resilience protection policies between the two communicating parties and determining a set of protection policies from which the error-resilient real-time transport control protocol module selects;
a forward error correction module, which implements at least one forward error correction protection method and maintains its parameters, wherein the protection method and policy negotiation module controls the forward error correction module to provide unequal protection and adaptive graded protection, and the error-resilient real-time transport control protocol module calls the forward error correction module to provide error-resilience protection and error correction.
37. The multimedia communication terminal according to claim 36, characterized by further comprising:
an error concealment module, which implements error concealment;
the codec module implements encoding and decoding according to the H.264 codec standard and also implements error-propagation elimination;
the terminal further comprises a network condition analysis and calculation module, which analyzes and computes the network conditions and provides information to the error concealment module and the codec module.
38. The multimedia communication terminal according to claim 37, characterized by further comprising:
a supplemental enhancement message extension processing module, which implements quality of service reporting and network condition reporting and sends the reports to the network condition analysis and calculation module.
39. The multimedia communication terminal according to claim 37, characterized in that its transport layer is based on the error-resilient real-time transport protocol/real-time transport control protocol and implements multimedia transport with error resilience;
its application protocol layer contains a protection mechanism and policy negotiation sublayer, which implements graded protection and unequal protection;
its H.264 video coding layer contains a supplemental enhancement message extension reporting layer, which implements reporting based on supplemental enhancement message extensions;
its H.264 network abstraction layer contains a forward error correction coding layer, which implements forward error correction coding.
40. The multimedia communication terminal according to any one of claims 35 to 39, wherein the basic function modules for implementing multimedia communication comprise one of the following or any combination thereof:
a main control module, responsible for controlling the entire terminal;
a user interface module, responsible for user input/output interaction and the display of information;
a network communication module, responsible for communicating with the network and providing the lower-layer transmission channel;
an input/output and low-level driver module, responsible for driving hardware devices;
a service module, configured to implement high-level services;
a communication process control module, configured to control the communication process;
an application protocol module, configured to implement application protocol functions;
a real-time transmission control protocol module, configured to implement real-time transmission control protocol functions;
a network abstraction layer module, configured to implement network abstraction layer functions;
an audio codec module, configured to implement audio encoding and decoding.
PCT/CN2006/002961 2005-11-03 2006-11-03 A multimedia communication method and the terminal thereof WO2007051425A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNB2006100690163A CN100466725C (en) 2005-11-03 2005-11-03 Multimedia communication method and terminal thereof
CN200510110013.5 2005-11-03

Publications (1)

Publication Number Publication Date
WO2007051425A1 (en) 2007-05-10

Family

ID=37390610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2006/002961 WO2007051425A1 (en) 2005-11-03 2006-11-03 A multimedia communication method and the terminal thereof

Country Status (2)

Country Link
CN (1) CN100466725C (en)
WO (1) WO2007051425A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101800751A (en) * 2010-03-09 2010-08-11 上海雅海网络科技有限公司 Distributed real-time data-coding transmission method
CN103118241A (en) * 2012-02-24 2013-05-22 金三立视频科技(深圳)有限公司 Mobile video monitoring streaming media transmission self-adaptive adjustment algorithm based on the 3rd generation telecommunication (3G) network
CN103167319A (en) * 2011-12-16 2013-06-19 中国移动通信集团公司 Transmission processing method, device and system of streaming media
CN112311802A (en) * 2020-11-05 2021-02-02 维沃移动通信有限公司 Information transmission method and information transmission device
WO2021180065A1 (en) * 2020-03-09 2021-09-16 华为技术有限公司 Data transmission method and communication apparatus
CN114070458A (en) * 2020-08-04 2022-02-18 成都鼎桥通信技术有限公司 Data transmission method, device, equipment and storage medium
CN115189810A (en) * 2022-07-07 2022-10-14 福州大学 Low-delay real-time video FEC coding transmission control method

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8356331B2 (en) * 2007-05-08 2013-01-15 Qualcomm Incorporated Packet structure for a mobile display digital interface
JP4488027B2 (en) * 2007-05-17 2010-06-23 ソニー株式会社 Information processing apparatus and method, and information processing system
BRPI0907748A2 (en) * 2008-02-05 2015-07-21 Thomson Licensing Methods and apparatus for implicit block segmentation in video encoding and decoding
CN102075312B (en) * 2011-01-10 2013-03-20 西安电子科技大学 Video service quality-based hybrid selective repeat method
CN102438002B (en) * 2011-08-10 2016-08-03 中山大学深圳研究院 A kind of based on the video file data transmission under Ad hoc network
US8549570B2 (en) * 2012-02-23 2013-10-01 Ericsson Television Inc. Methods and apparatus for managing network resources used by multimedia streams in a virtual pipe
CN102956233B (en) * 2012-10-10 2015-07-08 深圳广晟信源技术有限公司 Extension structure of additional data for digital audio coding and corresponding extension device
CN105653530B (en) * 2014-11-12 2021-11-30 上海交通大学 Efficient and scalable multimedia transmission, storage and presentation method
FR3031428A1 (en) * 2015-01-07 2016-07-08 Orange SYSTEM FOR TRANSMITTING DATA PACKETS ACCORDING TO A MULTIPLE ACCESS PROTOCOL
CN105307050B (en) * 2015-10-26 2018-10-26 何震宇 A kind of network flow-medium application system and method based on HEVC
CN107181783B (en) * 2016-03-11 2020-06-23 上汽通用汽车有限公司 Method and device for transmitting data in a vehicle using Ethernet
CN105916058B (en) * 2016-05-05 2019-09-20 青岛海信宽带多媒体技术有限公司 A kind of streaming media buffer playback method, device and display equipment
CN106921843B (en) * 2017-01-18 2020-06-26 苏州科达科技股份有限公司 Data transmission method and device
CN109756468B (en) * 2017-11-07 2021-08-17 中兴通讯股份有限公司 Data packet repairing method, base station and computer readable storage medium
CN108702487A (en) * 2017-11-20 2018-10-23 深圳市大疆创新科技有限公司 The image transfer method and device of unmanned plane
EP3550919B1 (en) * 2018-02-08 2020-06-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Wireless communication method and terminal
CN110139150A (en) * 2019-04-12 2019-08-16 北京物资学院 A kind of method for processing video frequency and device
CN110233716A (en) * 2019-05-31 2019-09-13 北京文香信息技术有限公司 A kind of communication interaction method, apparatus, storage medium, terminal device and server
CN110740135A (en) * 2019-10-21 2020-01-31 湖南新云网科技有限公司 Same-screen data transmission method, device and system for multimedia classrooms
CN111010593A (en) * 2019-11-08 2020-04-14 深圳市麦谷科技有限公司 Method and device for packaging H.265 video data based on FLV format
CN110769206B (en) * 2019-11-19 2022-01-07 深圳开立生物医疗科技股份有限公司 Electronic endoscope signal transmission method, device and system and electronic equipment
CN112866178B (en) * 2019-11-27 2023-09-05 北京沃东天骏信息技术有限公司 Method and device for transmitting audio data
CN111083510A (en) * 2019-12-18 2020-04-28 深圳市麦谷科技有限公司 Method and device for pushing HEVC (high efficiency video coding) video
CN111490984B (en) * 2020-04-03 2022-03-29 上海宽创国际文化科技股份有限公司 Network data coding and encryption algorithm thereof
CN111629279B (en) * 2020-04-13 2021-04-16 北京创享苑科技文化有限公司 Video data transmission method based on fixed-length format
CN111629282B (en) * 2020-04-13 2021-02-09 北京创享苑科技文化有限公司 Real-time erasure code coding redundancy dynamic adjustment method
CN111800388A (en) * 2020-06-09 2020-10-20 盐城网之易传媒有限公司 Media information processing method and media information processing device
CN113873340B (en) * 2021-09-18 2024-01-16 恒安嘉新(北京)科技股份公司 Data processing method, device, equipment, system and storage medium
CN113938881A (en) * 2021-10-18 2022-01-14 上海华讯网络系统有限公司 Transmission system and method suitable for internet data
CN114615549B (en) * 2022-05-11 2022-09-20 北京搜狐新动力信息技术有限公司 Streaming media seek method, client, storage medium and mobile device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1065168A (en) * 1991-02-19 1992-10-07 菲利浦光灯制造公司 Transmission system and the receiver that is used for this system
US20030229822A1 (en) * 2002-04-24 2003-12-11 Joohee Kim Methods and systems for multiple substream unequal error protection and error concealment
WO2004036760A1 (en) * 2002-10-15 2004-04-29 Koninklijke Philips Electronics N.V. System and method for providing error recovery for streaming fgs encoded video over an ip network
US6944802B2 (en) * 2000-03-29 2005-09-13 The Regents Of The University Of California Method and apparatus for transmitting and receiving wireless packet

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7110452B2 (en) * 2001-03-05 2006-09-19 Intervideo, Inc. Systems and methods for detecting scene changes in a video data stream

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1065168A (en) * 1991-02-19 1992-10-07 菲利浦光灯制造公司 Transmission system and the receiver that is used for this system
US6944802B2 (en) * 2000-03-29 2005-09-13 The Regents Of The University Of California Method and apparatus for transmitting and receiving wireless packet
US20030229822A1 (en) * 2002-04-24 2003-12-11 Joohee Kim Methods and systems for multiple substream unequal error protection and error concealment
WO2004036760A1 (en) * 2002-10-15 2004-04-29 Koninklijke Philips Electronics N.V. System and method for providing error recovery for streaming fgs encoded video over an ip network

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101800751A (en) * 2010-03-09 2010-08-11 上海雅海网络科技有限公司 Distributed real-time data-coding transmission method
CN103167319A (en) * 2011-12-16 2013-06-19 中国移动通信集团公司 Transmission processing method, device and system of streaming media
CN103167319B (en) * 2011-12-16 2016-06-22 中国移动通信集团公司 The transfer processing method of a kind of Streaming Media, Apparatus and system
CN103118241A (en) * 2012-02-24 2013-05-22 金三立视频科技(深圳)有限公司 Mobile video monitoring streaming media transmission self-adaptive adjustment algorithm based on the 3rd generation telecommunication (3G) network
WO2021180065A1 (en) * 2020-03-09 2021-09-16 华为技术有限公司 Data transmission method and communication apparatus
CN114070458A (en) * 2020-08-04 2022-02-18 成都鼎桥通信技术有限公司 Data transmission method, device, equipment and storage medium
CN112311802A (en) * 2020-11-05 2021-02-02 维沃移动通信有限公司 Information transmission method and information transmission device
CN112311802B (en) * 2020-11-05 2023-10-27 维沃移动通信有限公司 Information transmission method and information transmission device
CN115189810A (en) * 2022-07-07 2022-10-14 福州大学 Low-delay real-time video FEC coding transmission control method
CN115189810B (en) * 2022-07-07 2024-04-16 福州大学 Low-delay real-time video FEC coding transmission control method

Also Published As

Publication number Publication date
CN1863302A (en) 2006-11-15
CN100466725C (en) 2009-03-04

Similar Documents

Publication Publication Date Title
WO2007051425A1 (en) A multimedia communication method and the terminal thereof
AU2006321552B2 (en) Systems and methods for error resilience and random access in video communication systems
Wenger et al. RTP payload format for H. 264 video
US8462856B2 (en) Systems and methods for error resilience in video communication systems
EP1936868B1 (en) A method for monitoring quality of service in multimedia communication
Wang et al. RTP payload format for H. 264 video
WO2007045141A1 (en) A method for supporting multimedia data transmission with error resilience
CN100558167C (en) Multimedia video communication method and system
WO2006105713A1 (en) Video transmission protection method based on h.264
Wenger et al. RFC 3984: RTP payload format for H. 264 video
JP2005033556A (en) Data transmitter, data transmitting method, data receiver, data receiving method
Wang et al. RFC 6184: RTP Payload Format for H. 264 Video
AU2012216587B2 (en) Systems and methods for error resilience and random access in video communication systems
Esgueva Martínez Vídeo streaming modelling over optical networks
AU2012201576A1 (en) Improved systems and methods for error resilience in video communication systems

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 06805162; Country of ref document: EP; Kind code of ref document: A1)