WO2007051425A1

WO2007051425A1 - A multimedia communication method and the terminal thereof

Info

Publication number: WO2007051425A1
Application number: PCT/CN2006/002961
Authority: WO
Inventors: Zhong Luo; Bin Song
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2005-11-03
Filing date: 2006-11-03
Publication date: 2007-05-10
Also published as: CN1863302A; CN100466725C

Abstract

A multimedia communication method and a terminal thereof improve transmission reliability and communication quality. Building on the existing Real-time Transport Protocol (RTP), an Error Resilience Real-time Transport Protocol (ERRTP) provides a transport-layer encapsulation format that carries information about the error resilience coding scheme, so that multimedia data transmitted over ERRTP is marked with the corresponding error resilience coding scheme information and the error resilience mechanism is integrated into the transport layer. For the dedicated ERRTP encapsulation method and the header adaptation scheme designed for the H.264 network abstraction layer unit (NALU) structure, the header bytes of all NALUs in the same ERRTP packet can be merged into the ERRTP header, so that the important NALU information is reflected in the ERRTP header and transmission efficiency is improved. The message extension mechanism of H.264 itself is used to carry QoS report information, which implements an "in-band" QoS report mechanism at the higher-layer protocol and avoids additional channel overhead.

Description

Multimedia communication method and terminal thereof

This application claims the priority of the Chinese Patent Application entitled "Multimedia Communication Method and Its Terminal" submitted to the China Patent Office on November 03, 2005, application number 200510110013.5.

Technical field

The present invention relates to the field of multimedia communication technologies, in particular to multimedia communication technology supporting error resilience, and more particularly to a multimedia communication method and a terminal thereof.

Background technique

With the rapid development of the Internet and mobile communication networks, streaming media technology is becoming ever more widely used, from streaming broadcasts and movie playback to distance learning and online news sites. Currently, there are two ways to transmit video and audio on the Internet: downloading (Download) and streaming (Streaming). Streaming transmits the video/audio signal continuously; while the streaming media is playing, the remainder continues to download in the background. Streaming has two forms: progressive streaming (Progressive Streaming) and real-time streaming (Realtime Streaming). Real-time streaming delivers media in real time and is especially suited to live events. It must match the connection bandwidth, which means that image quality is lowered when network speed drops in order to reduce the required transmission bandwidth. With the rapid development of networks, especially the emergence of third-generation mobile communication systems (3G, 3rd Generation) and networks based on the Internet Protocol (IP), video communication is gradually becoming one of the main communication services. Two-way or multi-party video communication services, such as video telephony, video conferencing, and mobile terminal multimedia services, impose strict requirements on the transmission of multimedia data streams and on quality of service. Not only must network transmission offer better real-time performance, but video data compression coding must also be more efficient.

In view of the current demand for media communication, the ITU Telecommunication Standardization Sector (ITU-T) officially released the H.264 standard in 2003, following video compression standards such as H.261, H.263, and H.263+. It is an efficient compression coding standard developed jointly with the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) to meet the needs of network media delivery and communication in the new phase. It is also the main content of Part 10 of the MPEG-4 standard.

The purpose of the H.264 standard is to improve video coding efficiency and its adaptability to the network more effectively. The H.264 video compression coding standard has quickly become the mainstream standard in multimedia communication. A large number of H.264 real-time multimedia communication products (such as conference TV, videophones, and 3G mobile communication terminals) and network streaming media products have been released. With the official promulgation and widespread use of H.264, multimedia communication based on IP networks and on 3G and post-3G wireless networks will inevitably enter a new stage of rapid development.

The following is a brief introduction to the message composition and transmission mechanism of the H.264 standard. The H.264 standard uses a layered model that defines a video coding layer (VCL, Video Coding Layer) and a network abstraction layer (NAL, Network Abstraction Layer). The latter is designed for network transmission; it can adapt to video transmission over different networks and further improves "network friendliness". H.264 introduces an encoding mechanism oriented to IP packets, which facilitates packet transmission in the network and supports streaming of video over the network. It also has strong error resilience and is especially suited to wireless video transmission with high packet loss rates and serious interference. All H.264 data to be transmitted, including image data and other messages, is encapsulated into packets of a uniform format, namely network abstraction layer units (NALU, NAL Unit). Each NALU is a variable-length byte string with a defined syntax, consisting of one byte of header information, which indicates the data type, and an integer number of payload bytes. A NAL unit can carry a coded slice, a data partition of one of several types, or a sequence or picture parameter set. To enhance data reliability, each picture frame is divided into several slices (Slice), each carried by one NALU. A slice in turn consists of several smaller macroblocks, which are the smallest processing units. Generally, slices at corresponding positions in consecutive frames are correlated with each other, while slices at different positions are independent of each other, which prevents bit errors from spreading between slices.

H.264 data includes texture data of non-reference frames, sequence parameters, picture parameters, Supplemental Enhancement Information (SEI), texture data of reference frames, and so on. "SEI message" is a general term for messages that assist in decoding, display, and other aspects of H.264 video. The prior art defines various types of SEI messages and also retains reserved SEI messages, leaving room for extension for possible future applications. According to H.264, SEI messages are not required for reconstructing the luminance and chrominance pictures during decoding, and a decoder conforming to the H.264 standard is not required to do any processing of SEI. That is, not every terminal that meets the basic requirements of H.264 can process SEI messages; however, sending SEI to a terminal that cannot process it has no effect, because the terminal simply ignores the SEI messages it cannot handle. Under the SEI syntax rules, users can use the reserved messages to transmit custom messages and thereby extend functionality.

The SEI message is introduced first. H.264 provides a variety of mechanisms for message extension, including SEI. Supplemental Enhancement Information (SEI) is defined in H.264; its data representation area is independent of the video coding data, and its usage is given in the description of the NAL in the H.264 protocol. The basic unit of the H.264 code stream is the NALU. A NALU can carry various H.264 data types, such as video sequence parameters, picture parameters, slice data (that is, the actual image data), and SEI message data. SEI is used to deliver various messages and supports message extension. The SEI domain can therefore be used to transmit messages customized for a specific purpose without affecting the compatibility of H.264-based video communication systems. A NALU carrying SEI messages is called an SEI NALU. An SEI NALU contains one or more SEI messages. Each SEI message contains variables, mainly the payload type (payloadType) and payload size (payloadSize), which indicate the type and size of the message payload. The syntax and semantics of some commonly used H.264 SEI messages are defined in H.264 Annex D.8 and D.9. The payload contained in a NALU is called the Raw Byte Sequence Payload (RBSP), and SEI is one type of RBSP.
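As a rough illustration of the payloadType and payloadSize fields described above, the following sketch walks the SEI messages inside an SEI RBSP. It assumes the byte-wise coding of the H.264 SEI message syntax, in which each of the two values is written as a run of 0xFF bytes followed by one final byte. The function name `parse_sei_messages` is hypothetical, and the input is assumed to have had its emulation-prevention bytes removed and to end with the single trailing byte 0x80.

```python
def parse_sei_messages(rbsp: bytes):
    """Return a list of (payload_type, payload_bytes) from an SEI RBSP.

    Assumption: emulation-prevention bytes are already stripped and the
    RBSP ends with the byte-aligned trailing bits 0x80.
    """
    end = len(rbsp)
    if end and rbsp[-1] == 0x80:
        end -= 1  # drop the RBSP trailing bits

    messages = []
    pos = 0
    while pos < end:
        # payloadType: each 0xFF byte adds 255, the final byte adds its value.
        payload_type = 0
        while rbsp[pos] == 0xFF:
            payload_type += 255
            pos += 1
        payload_type += rbsp[pos]
        pos += 1

        # payloadSize uses the same coding.
        payload_size = 0
        while rbsp[pos] == 0xFF:
            payload_size += 255
            pos += 1
        payload_size += rbsp[pos]
        pos += 1

        messages.append((payload_type, rbsp[pos:pos + payload_size]))
        pos += payload_size
    return messages


if __name__ == "__main__":
    example = bytes([5, 3, 1, 2, 3, 0xFF, 0x01, 2, 9, 9, 0x80])
    for ptype, payload in parse_sei_messages(example):
        print(ptype, payload.hex())
```

A decoder that does not understand a given payload type can skip it using the size field alone, which is why custom messages do not break compatibility.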

The data representation area of the SEI is called the SEI domain for short. Each SEI domain contains one or more SEI messages, each consisting of SEI header information and an SEI payload. The SEI header information includes two fields: one identifies the type of payload in the SEI message and the other indicates the size of the payload. Users can define any number of custom payload types. H.264 decoders that do not support parsing this user-defined information automatically discard the data in the SEI domain. Therefore, carrying useful custom information in the SEI domain does not affect the compatibility of H.264-based video communication systems.

As described above, multimedia communication requires not only efficient media compression coding but also real-time network transmission. At present, multimedia streaming basically adopts the Real-time Transport Protocol (RTP) and its control protocol (RTCP, Real-time Transport Control Protocol). RTP is a transport protocol for multimedia data streams over the Internet, published by the Internet Engineering Task Force (IETF). RTP is defined to work in one-to-one or one-to-many transmission, with the goal of providing timing information and stream synchronization. RTP typically runs over the User Datagram Protocol (UDP), but it can also work over other protocols such as TCP (Transmission Control Protocol) or Asynchronous Transfer Mode (ATM).

RTP itself only guarantees the transmission of real-time data and does not provide a reliable transmission mechanism, flow control, or congestion control; it relies on RTCP to provide these services. RTCP manages transmission quality and exchanges control information between the current application processes. During an RTP session, each participant periodically transmits RTCP packets, which contain statistics such as the number of packets sent and the number of packets lost. The server can use this information to change the transmission rate dynamically, or even to change the payload type. RTP and RTCP work together to optimize transmission efficiency with effective feedback and minimal overhead, which makes them suitable for real-time data transmission over the network.

H.264 multimedia data is transmitted over IP networks on top of UDP and the RTP protocol above it. RTP itself is structurally applicable to different media data types, but for each higher-layer protocol or media compression coding standard used in multimedia communication (for example H.261, H.263, MPEG-1/-2/-4, MP3), the IETF develops a specification of the RTP payload (Payload) packaging method for that protocol, detailing an RTP encapsulation method optimized for that specific protocol. The corresponding IETF standard for H.264 is RFC 3984: RTP Payload Format for H.264 Video. This standard is currently the main standard for H.264 video stream transmission over IP networks and is widely used. In the field of video communication, the products of the major manufacturers are based on RFC 3984, and it is currently the only H.264/RTP transmission method.

In fact, the key difference between H.264 and other video compression coding protocols is that H.264 defines a new layer, the Network Abstraction Layer (NAL), which exposes the capabilities of the underlying layers through a standard interface while shielding the differences between underlying networks. H.264 defines the NAL to increase the separation between its video coding layer (VCL, Video Coding Layer) and the specific network transport protocol layer below it, which brings greater application flexibility; earlier ITU-T video compression coding protocols such as H.261 and H.263/H.263+/H.263++ have no such layer. However, how to design a more efficient scheme that exploits the advantages of the H.264 NAL in cooperation with the RTP bearer, so that RTP carries H.264 better, is a practical question worth studying.

The method proposed in RFC 3984 for carrying H.264 NAL layer data over RTP is the current mainstream transmission method. Building on the RTP protocol (RFC 3550), the scheme encapsulates NAL layer data in the RTP payload. The NAL layer sits between the VCL and RTP, and specifies that the video bitstream is divided into a series of NAL units (NALU, NAL Units) according to defined rules and structures. The RTP payload format for NALUs is defined in RFC 3984. The following briefly introduces the RTP frame format and the NALU encapsulation method in the prior art.

RTP is typically carried over the UDP protocol to take advantage of its multiplexing and checksum capabilities. If the underlying network provides multicast distribution, RTP supports transmission to multiple destinations. Features provided by RTP include payload type identification, sequence numbering, timestamping, and delivery monitoring.

The RTP packet format is as follows. The fixed RTP header occupies at least 12 bytes, while the IP and UDP headers occupy 20 bytes and 8 bytes respectively. Since RTP packets are encapsulated in UDP packets and then in IP packets, the headers occupy a total of 12+8+20=40 bytes. The detailed structure of the RTP packet header is shown in Figure 1.

From front to back, the RTP header information shown in Figure 1 is as follows: the first byte contains fields about the header structure itself; the second byte defines the payload type; the 3rd and 4th bytes are the packet sequence number (Sequence Number); bytes 5-8 are the timestamp (Timestamp); bytes 9-12 are the synchronization source identifier (SSRC ID); and finally comes the list of contributing source identifiers (CSRC IDs, Contributing Source Identifiers), whose number is variable.

The first 12 bytes appear in every type of RTP packet, while other data in the header, such as the contributing source identifiers, is present only when inserted by a mixer.

The specific meanings and full names of the above fields are described as follows:

The V field is the version (Version) information, which is 2 bits. The current version is 2, so V=2 is set. V=1 indicates an earlier RTP version, and V=0 indicates the original predecessor of RTP, used in voice over IP (VoIP) communication systems on the early Mbone network, which later evolved into RTP; V=3 has not yet been defined.

The P field is a padding flag (Padding), which is 1 bit. If P is set, it indicates that the packet contains one or more padding bytes (Padding) at the end, and the padding does not belong to a part of the payload;

The X field is an extension flag bit (Extension), which occupies 1 bit. The format of the header extension is described in detail in Section 5.3.1 of RFC 3550.

The CC field is the number of contributing sources (CSRC Count), which is 4 bits, indicating the number of CSRC identifiers at the end of the header information. The receiver can determine the length of the CSRC IDs list following the header information according to the CC field.

The M field is a marker bit (Marker), which occupies 1 bit. The interpretation of the marker bit is defined by a specific profile. It allows significant events in the packet stream to be marked; these are agreed by the communicating parties and are not constrained by the protocol.

The PT field is the payload type (PT, Payload Type), which is 7 bits. It identifies the format of the RTP payload and determines its interpretation by the application; the relationship between PT values and media formats can also be defined by dynamic negotiation through signaling outside RTP. Within an RTP session, the RTP source can change the PT.

The next field is the sequence number of 16 bits, which the receiver can use to detect packet loss and recover the packet sequence.

The time stamp occupies 32 bits, which reflects the sampling time of the first byte in the RTP packet, and the receiver adjusts the media playback time or synchronizes according to it.

The synchronization source identifier (SSRC ID) is 32 bits. Its value can be chosen randomly and uniquely identifies a media source. If a source changes its source transport address, a new SSRC identifier must be chosen.

The contributing source (CSRC) list identifies the sources that contributed to the payload of the packet. In multiparty communication, the CSRC IDs are inserted by the mixer.
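The field layout described above can be sketched as a small parser. This is a minimal illustration only: the function name `parse_rtp_header` and the dictionary keys are hypothetical, and header extensions (X=1) are detected but not parsed.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Split the fixed 12-byte RTP header and the CSRC list into fields."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")

    # Byte 1: V(2) P(1) X(1) CC(4); byte 2: M(1) PT(7);
    # then sequence number (16), timestamp (32), SSRC (32), network order.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])

    cc = b0 & 0x0F
    header_len = 12 + 4 * cc  # each CSRC identifier is 32 bits
    if len(packet) < header_len:
        raise ValueError("packet too short for the announced CSRC list")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:header_len]))

    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 0x01,
        "extension": (b0 >> 4) & 0x01,
        "csrc_count": cc,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
        "header_length": header_len,
    }


if __name__ == "__main__":
    pkt = bytes([0x80, 0xE0, 0x00, 0x01]) + struct.pack("!II", 1000, 0x12345678)
    print(parse_rtp_header(pkt + b"payload"))
```

Together with the 8-byte UDP header and the 20-byte IP header, a packet with no CSRC entries carries the 40 bytes of header overhead mentioned earlier.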

When carrying H.264 video, RTP packs the NAL units of H.264 into an RTP packet stream. This is defined mainly in RFC 3984, which gives the encapsulation and packetization format of H.264 NAL layer data in RTP. The RTP encapsulation format of the NALU is shown in Figure 2.

Figure 2 shows the encapsulation structure of NALUs in the RTP payload, including the NALU header information and the NALU data content; multiple NALUs are placed end to end in the payload of the RTP packet.

The NALU header information is the first byte and has three fields, whose meanings and full names are as follows:

The F field is the forbidden bit (forbidden_zero_bit), which is 1 bit. It is used to flag syntax errors and the like, and is set to 1 if there is a syntax violation. When the network identifies a bit error in the unit, it can set this bit to 1 so that the receiver discards the unit. This is mainly used to adapt to heterogeneous network environments (such as combined wired and wireless environments);

The NRI field is the NAL reference identifier (nal_ref_idc), which is 2 bits and indicates the importance of the NALU data. A value of 00 indicates that the content of the NALU is not used to reconstruct reference pictures for inter prediction; a non-00 value indicates that the current NALU is important data, such as a slice belonging to a reference frame, a sequence parameter set (SPS, Sequence Parameter Set), or a picture parameter set (PPS, Picture Parameter Set). The larger the value, the more important the current NAL unit;

The Type field is the NALU type (nal_unit_type), which is 5 bits, so there can be 32 NALU types. The correspondence between values and types is given in Table 1.

Table 1 Correspondence between Type field values and NALU content types in the NALU header information

Type value    Type of NALU content
0             Unspecified
1             Coded slice of a non-IDR picture
2             Coded slice data partition A
3             Coded slice data partition B
4             Coded slice data partition C
5             Coded slice of an IDR picture
6             SEI (Supplemental Enhancement Information)
7             SPS (sequence parameter set)
8             PPS (picture parameter set)
9             Access unit delimiter
10            End of sequence
11            End of stream
12            Filler data
13-23         Reserved
24-31         Unspecified

It can be seen that the one byte of NALU header information mainly conveys the validity and importance level of the NALU. Based on this information, the importance of the data carried by RTP can be determined.
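The three header fields and the type table can be illustrated with a short sketch that unpacks the first byte of a NALU. The function name `parse_nalu_header` and the `NAL_TYPE_NAMES` lookup are hypothetical helpers, and the names in the lookup simply follow Table 1.

```python
# Names follow Table 1; ranges are expanded for lookup convenience.
NAL_TYPE_NAMES = {
    0: "Unspecified",
    1: "Coded slice of a non-IDR picture",
    2: "Coded slice data partition A",
    3: "Coded slice data partition B",
    4: "Coded slice data partition C",
    5: "Coded slice of an IDR picture",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
    12: "Filler data",
}
NAL_TYPE_NAMES.update({t: "Reserved" for t in range(13, 24)})
NAL_TYPE_NAMES.update({t: "Unspecified" for t in range(24, 32)})


def parse_nalu_header(first_byte: int) -> dict:
    """Split the one-byte NALU header into F (1 bit), NRI (2 bits), Type (5 bits)."""
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        "type_name": NAL_TYPE_NAMES[nal_unit_type],
    }


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65, 0x41, 0x06):
        print(hex(byte), parse_nalu_header(byte))
```

A sender or an intermediate node could read `nal_ref_idc` in this way to decide how much protection a unit deserves, without looking at the payload.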

In the current H.264/RTP multimedia communication framework, quality of service (QoS) monitoring, and the congestion control and flow control based on it, are mainly implemented using the accompanying control protocol RTCP. RTCP is mainly used for control and reporting for the RTP protocol. It periodically transmits control packets to all participants in a two-party or multi-party session, using the same distribution mechanism as RTP packets. The underlying protocol provides multiplexing of data and control packets (for example, using separate UDP port numbers). RFC 3550 recommends that the share of session bandwidth added for RTCP be fixed at 5%.

The types and structure of RTCP packets are described below. RTCP defines the following packet types to carry various control information: sender report (SR, Sender Report), carrying transmission and reception statistics from active senders; receiver report (RR, Receiver Report), carrying reception statistics from participants that are not active senders; source description items (SDES, Source Description), which include the CNAME; end-of-participation indication (BYE); and application-specific functions (APP, Application-specific Function).

The packet structure of the RTCP sender and receiver reports is shown in Figure 3. By content type it can be divided into three sections: header information, sender information, and report blocks, followed finally by profile-specific extensions (a profile being a set of specific rules and exceptions tailored to the needs of a particular application scenario). The meanings of the fields shown in Figure 3 are briefly described as follows:

The V field is version information;

The P field is a padding flag bit (Padding);

The RC field is the reception report count (RC, Reception Report Count), indicating the number of reception report blocks contained in the packet;

The PT field is the packet type (PT, Packet Type);

Length field;

The SSRC of sender field is the synchronization source identifier (SSRC) of the originator of the SR packet, where a synchronization source uniquely identifies a media data source, such as the source of the video;

The NTP timestamp field is the Network Time Protocol (NTP) timestamp, which indicates the wall clock time (absolute date and time) and is used in conjunction with the RTP timestamp;

The RTP timestamp field is the RTP timestamp, that is, a timestamp generated by the RTP protocol;

The sender's packet count field indicates the total number of RTP packets transmitted by the sender from the start of transmission until the SR packet is generated;

The sender's byte count field indicates the total number of payload bytes (not including headers or padding) transmitted by the sender in RTP packets from the start of transmission until the SR packet is generated. This field can be used to estimate the average payload rate;

The following fields contain zero or more reception report blocks, each of which conveys statistics on the RTP packets received from a single synchronization source, including the fraction lost and the cumulative number of packets lost;

In addition, the extended highest sequence number received and the interarrival jitter both reflect the network transmission status;

The last SR (LSR, Last SR) field is 32 bits; it is the timestamp of the most recent SR received from the source, namely the middle 32 bits of the NTP timestamp of that SR;

The delay since last SR (DLSR, Delay since Last SR) field is 32 bits; it is the interval between receiving the last SR and sending this report. These parameters are used to calculate key parameters of the QoS report.

The Receiver Report (RR) packet format differs from the Sender Report (SR) as follows: the value of the packet type field is 201, and there is no sender information section.
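The SR fields listed above can be sketched as a parser. This is a minimal sketch assuming the RFC 3550 sender report layout (packet type 200, a 20-byte sender information section, and 24-byte reception report blocks); the function name `parse_sender_report` and the dictionary keys are hypothetical, the cumulative loss count is read as unsigned for simplicity, and profile-specific extensions are ignored.

```python
import struct


def parse_sender_report(packet: bytes) -> dict:
    """Parse an RTCP sender report and its reception report blocks."""
    b0, packet_type, length, ssrc = struct.unpack("!BBHI", packet[:8])
    if packet_type != 200:
        raise ValueError("not a sender report (packet type 200 expected)")

    # Sender information: NTP timestamp (2 x 32 bits), RTP timestamp,
    # sender's packet count, sender's byte count.
    ntp_msw, ntp_lsw, rtp_ts, pkt_count, byte_count = struct.unpack(
        "!IIIII", packet[8:28])

    report_count = b0 & 0x1F
    reports = []
    for k in range(report_count):
        off = 28 + 24 * k
        src, lost, ext_seq, jitter, lsr, dlsr = struct.unpack(
            "!IIIIII", packet[off:off + 24])
        reports.append({
            "ssrc": src,
            "fraction_lost": lost >> 24,          # 8-bit fixed-point fraction
            "cumulative_lost": lost & 0xFFFFFF,   # 24 bits, read as unsigned here
            "extended_highest_seq": ext_seq,
            "jitter": jitter,
            "lsr": lsr,
            "dlsr": dlsr,
        })

    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 0x01,
        "report_count": report_count,
        "length_words_minus_one": length,
        "ssrc": ssrc,
        "ntp_timestamp": (ntp_msw << 32) | ntp_lsw,
        "rtp_timestamp": rtp_ts,
        "packet_count": pkt_count,
        "byte_count": byte_count,
        # Middle 32 bits of the NTP timestamp: the value a receiver
        # later echoes back in the LSR field.
        "lsr_value": ((ntp_msw & 0xFFFF) << 16) | (ntp_lsw >> 16),
        "reports": reports,
    }
```

A receiver report could be parsed in the same way by expecting packet type 201 and skipping the 20-byte sender information section.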

According to the RTP/RTCP protocol standard, the functions of RTCP are as follows:

The basic function is to provide a feedback reporting mechanism on the transmission quality of real-time multimedia data, conveyed through RTCP sender reports (SR) and receiver reports (RR);

RTCP carries a persistent transport-layer identifier for each RTP source, called the canonical name (CNAME, Canonical Name). The SSRC identifier may change when a conflict is found or a program is restarted, so receivers need the CNAME to keep track of each participant;

So that RTP can scale up to a large number of participants, the rate of RTCP packets must be controlled;

Convey minimal session control information.

It can be seen that the QoS report is transmitted by using the RTCP protocol, and the QoS information is reported according to the report content specified by the RTCP protocol, and the QoS monitoring for the bearer media such as H.264 is implemented based on this.

However, while RTCP brings the ability to provide QoS reporting mechanisms, the use of periodic reporting methods results in additional network bandwidth overhead, up to 5%. If the network is congested, resulting in a drop in the transmission QoS, the extra traffic generated by the RTCP will make the situation worse.

H.264 is the main video protocol for future multimedia communication, and the networks for future multimedia communication applications are mainly IP-based packet-switched networks and wireless networks. IP networks provide "best effort" transmission and do not guarantee the QoS of the transmitted video data; in real-time video communication, packet loss, delay, and delay jitter all affect the quality of the restored video. The problem is even more prominent for H.264 code streams, which have been efficiently compressed.

Error resilience (Error Resilience) is the ability of a delivery mechanism to prevent errors from occurring, or to correct them to some extent after they have occurred. In a multimedia communication environment, it is critical that the video delivery mechanism be error resilient.

There are a variety of error resilience mechanisms, such as forward error correction (FEC, Forward Error Correction), automatic repeat request (ARQ, Automatic Repeat Request), error concealment (Error Concealment), joint source-channel coding (JSCC, Joint Source-Channel Coding), interleaving, and elimination of error propagation. Encoding the data to be protected with an error correction code essentially creates data redundancy, thereby increasing resistance to errors. The main error for packets on a network is packet loss, which is called an erasure error (Erasure Error) in error correction coding theory. Error correction codes for erasure errors form a large class called erasure codes (Erasure Codes). An erasure code divides the data stream into segments (Units) of equal size, also called data nodes (Data Nodes); for convenience, assume there are n data nodes. The data nodes are then combined according to mathematical rules to generate check nodes (Parity Nodes or Check Nodes). To strengthen protection, the check nodes can in turn be combined to generate second-layer check nodes, and so on for the third layer and the fourth layer, up to the Nth layer of check nodes.

In general, when multiple layers of check nodes are involved, the number of nodes in each layer decreases relative to the previous layer according to some rule, forming a layered, progressively shrinking multi-node structure. It can be pictured as a pyramid rotated 90 degrees to the right: the leftmost layer is the data node layer, followed to the right by the first layer of check nodes, the second layer of check nodes, ..., and the Nth layer of check nodes.

Such an erasure code has a very important property: the time complexity of processing is linear in the number n of data nodes, so it is called a linear-time code. Many other erasure codes, such as the well-known Reed-Solomon code, require much higher time complexity, on the order of n*log2n*log(logn). Linear-time erasure codes are therefore much better suited to real-time communication.

Tornado erasure code (hereinafter referred to as Tornado code) is simple in structure, efficient in operation and strong in protection. It has been widely used.

In the Tornado code, multiple check node layers are generated layer by layer from the data nodes. Both the check nodes and the data nodes are sent by the sender to the receiver over the network. If some nodes are lost in transit, then because each upper-layer node participates in generating lower-layer nodes, its information is already contained in those lower-layer nodes, so a lost node can be fully recovered from a sufficient number of nodes in the same layer or in lower layers. Let the number of data nodes be $n$ and the total number of check nodes be $m$. The code rate and redundancy rate of the erasure code are then defined as $r = n/(n+m)$ and $1-r = m/(n+m)$. Under the same conditions (protection capability, delay introduced, and so on), the higher the code rate (and hence the lower the redundancy rate), the more efficient the erasure code. Figure 4 shows the relationship between the data nodes and the check nodes of each layer in a typical Tornado code. A line between nodes in the figure is called an edge, and the node on the left of an edge participates in computing the node on the right. It can be seen that there is a many-to-many logical relationship between the nodes of two adjacent layers. The structure and performance of a Tornado code are mainly determined by three factors: (a) the number of data nodes and the rule by which the layers shrink, generally by a constant ratio; (b) the calculation method used to generate the next layer of nodes; (c) the association between the nodes of two adjacent layers.

The following relationships hold between the parameters of a Tornado code. Let the number of data nodes be $n$, the total number of check nodes be $m$, the scaling ratio be $p$, and the number of check node layers be $i$. The first $i-1$ check layers then contain $np, np^2, \ldots, np^{i-1}$ nodes respectively, and the last ($i$-th) layer contains $np^i/(1-p)$ nodes, so the total number of nodes is

$$n + m = n + np + np^2 + \cdots + np^{i-1} + \frac{np^i}{1-p} = \frac{n}{1-p},$$

which gives $m = np/(1-p)$, the implicit relationship between the scaling ratio and the number of check nodes. Since the node counts $np, np^2, \ldots, np^{i-1}$ and $np^i/(1-p)$ of every layer must be integers, the feasible values of $n$ can be calculated for given $i$ and $p$; for example, with $i=4$ and $p=1/2$, it can be inferred that $n$ must be a multiple of 16.

The most commonly used operation in Tornado code generation is exclusive OR (XOR), because XOR has a very convenient recovery property. For two bit sequences $A$ and $B$ of equal length, bitwise XOR yields a sequence $C = A \oplus B$ of the same length with the following properties: $A \oplus C = B$ and $B \oplus C = A$. XOR across more than two sequences has the corresponding recovery method. It can be seen that after the XOR operation the data nodes and check nodes are tied to one another, so that if any one node is lost it can be restored from all the remaining nodes. Because the final layer of check nodes has a different scaling ratio, it is generally calculated using a conventional error correction coding strategy, such as a Reed-Solomon code.
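The XOR recovery property can be shown in a few lines. This is only an illustration on byte strings; the helper names `xor_bytes` and `xor_all` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def xor_all(blocks) -> bytes:
    """XOR of any number of equal-length byte sequences."""
    return reduce(xor_bytes, blocks)


if __name__ == "__main__":
    a, b, d = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"

    # Two sequences: the check value c restores either input from the other.
    c = xor_bytes(a, b)
    print(xor_bytes(a, c) == b, xor_bytes(b, c) == a)

    # Several sequences: any single lost one is the XOR of everything else.
    check = xor_all([a, b, d])
    print(xor_all([check, a, d]) == b)
```

The same identity is what lets a check node stand in for exactly one missing node among those it was computed from.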

Adjacent layers of a Tornado code are associated with each other: the association specifies which nodes of the previous layer are used to compute each node of the next layer. In graph-theory terms, the nodes of two adjacent layers form a bipartite graph, and the association between the layers is given by the edges between the left and right nodes of that graph. In current Tornado code schemes, the parameters n, m, i, p, etc. are determined from the required protection capability and other constraints, such as a reasonable data node size and the maximum acceptable network delay; the edges are then distributed randomly according to a given node degree vector, after which Tornado encoding can be performed. During decoding at the receiving end, for the bipartite graph of each level, if a right node is correctly received and exactly one of the left nodes associated with it is lost, the lost node can be recovered from this right node together with all of its left nodes that were not lost, which achieves the error correction effect.
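The decoding rule for one bipartite layer can be sketched as follows. This is a simplified illustration in Python for a single layer with XOR check nodes; the function names, the fixed edge list, and the use of None to mark a lost node are assumptions made for the example, and a real Tornado code would draw its edges randomly from a degree distribution.

```python
def make_checks(left, edges):
    """Each right (check) node is the XOR of the left nodes it is connected to."""
    checks = []
    for neighbours in edges:
        acc = bytearray(len(left[0]))
        for j in neighbours:
            for k, byte in enumerate(left[j]):
                acc[k] ^= byte
        checks.append(bytes(acc))
    return checks


def peel_decode(left, right, edges):
    """Recover lost left nodes (None) using received right nodes.

    A right node is usable when it was received and exactly one of its
    left neighbours is lost; the process repeats until nothing changes.
    """
    left = list(left)
    progress = True
    while progress:
        progress = False
        for r, neighbours in enumerate(edges):
            if right[r] is None:
                continue
            missing = [j for j in neighbours if left[j] is None]
            if len(missing) != 1:
                continue
            acc = bytearray(right[r])
            for j in neighbours:
                if left[j] is not None:
                    for k, byte in enumerate(left[j]):
                        acc[k] ^= byte
            left[missing[0]] = bytes(acc)
            progress = True
    return left


if __name__ == "__main__":
    data = [b"\x01", b"\x02", b"\x04", b"\x08"]
    edges = [[0, 1], [1, 2], [2, 3]]          # left neighbours of each check node
    checks = make_checks(data, edges)
    received = [data[0], None, None, data[3]]  # two data nodes lost in transit
    print(peel_decode(received, checks, edges) == data)
```

Nodes that cannot be reached by this rule stay marked as lost, which is why a practical scheme adds further layers or a conventional code for the last layer.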

In fact, erasure codes form a very broad family, of which Tornado codes are only one member. Others include Reed-Solomon (RS) codes and low-density parity-check (LDPC) codes.

An important performance indicator of an erasure code is its error correction capability (or protection capability). For packet-loss errors, it is directly reflected in the maximum number of lost packets that can be tolerated for a given number of packets, or in the percentage of packets that can still be correctly recovered when the loss exceeds this maximum. In general, under the same conditions, the higher the protection capability, the higher the redundancy rate.

Protection capability applies not only to erasure codes; more broadly, all FEC codes can be measured by it. In video data, some data are relatively important, such as the structural parameters of video sequences, the structural parameters of pictures, and header information, while other data, such as picture content data, are relatively less important. When FEC is used for protection, a stronger code is applied to the relatively important data and a weaker code to the relatively unimportant data, which balances protection against efficiency. This method of applying FEC with different protection capabilities according to the relative importance of the data is called unequal protection (UEP, Unequal Protection). Unequal protection makes it easier to guarantee QoS for video communication services.

The idea of unequal protection is to protect data of different (relative) importance within the multimedia data using protection mechanisms of different strength. The mechanisms may differ at a coarse or a fine level: coarse-level classes differ in principle, while fine-level classes differ only in structure or parameters. Hierarchical protection divides the protection mechanisms into multiple levels according to protection capability, and is in effect an adaptive strategy. Combining unequal protection with hierarchical protection yields more complex and more powerful protection strategies.
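How unequal protection and hierarchical protection might be combined can be sketched in Python. Everything numeric here is a hypothetical assumption made for illustration: the three importance classes, the redundancy rates in the table, and the loss-rate thresholds are not specified by the original text. The only relation taken from the text is the redundancy rate m/(n+m), which is solved for m.

```python
import math
from fractions import Fraction

# Hypothetical table: importance class -> redundancy rate m/(n+m)
# for protection levels 0 (weakest) to 2 (strongest).
REDUNDANCY = {
    "parameter_sets": (Fraction(1, 5), Fraction(1, 3), Fraction(1, 2)),
    "slice_headers": (Fraction(1, 10), Fraction(1, 5), Fraction(1, 3)),
    "picture_data": (Fraction(0), Fraction(1, 10), Fraction(1, 5)),
}


def protection_level(loss_rate):
    """Hierarchical part: map the measured loss rate to a level (thresholds are assumed)."""
    if loss_rate < 0.01:
        return 0
    if loss_rate < 0.05:
        return 1
    return 2


def check_nodes_for(n_data, importance, loss_rate):
    """Unequal part: more check nodes for more important data at the same level."""
    rate = REDUNDANCY[importance][protection_level(loss_rate)]
    # redundancy rate = m / (n + m)  =>  m = n * rate / (1 - rate)
    return math.ceil(n_data * rate / (1 - rate))


if __name__ == "__main__":
    for importance in REDUNDANCY:
        print(importance, [check_nodes_for(8, importance, loss) for loss in (0.0, 0.02, 0.10)])
```

The table-driven form keeps the two decisions separate: the network condition selects the column and the data importance selects the row.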

In actual H.264 video communication, the image quality degradation caused by erasure errors such as packet loss is very serious, and such errors can even cause the decoder system to crash. In video communication based on the H.264 standard, therefore, in addition to the fault-tolerant elastic protection strategy, effective techniques against packet loss and other erasure errors must be adopted, and several video error-resistance methods must be combined to ensure the quality of the restored image.

Existing techniques against packet loss and erasure errors can be roughly divided into two categories: (a) Active error prevention: protective measures are taken in advance, such as introducing a redundancy mechanism, to try to ensure that data packets are not lost or that the receiving end can recover from a small amount of loss. (b) Error compensation: compensation measures are taken once an error has occurred, for example when network conditions deteriorate seriously, the packet loss rate is very high, and active error prevention is no longer effective.

Error compensation methods are divided into two types: error concealment and error diffusion elimination. Error concealment focuses on compensating for the immediate impact of an error, while error diffusion elimination removes the subsequent influence of the error as it spreads in space and time. Error concealment can itself lead to error diffusion: because of concealment, the contents of the decoded picture buffers in the encoder and the decoder no longer match, so the error spreads in the time domain.

The existing H.264/RTP transport architecture and RTCP-based QoS reporting method use RTP to encapsulate NALUs directly for transmission and use RTCP SR/RR reports to monitor QoS information. The related technical details have been introduced above.

In addition, the Tornado code used in the prior art is a relatively complicated scheme. It is used to implement data transmission protection for video compressed with H.261/H.263/H.263+/H.263++/H.264 coding.

In addition, the existing error elimination methods are either standalone error concealment methods or standalone error diffusion elimination methods. Error concealment methods include time-domain concealment, spatial-domain concealment, and joint space-time concealment. Error diffusion elimination methods include intra-frame coding, identification, adaptive intra block refresh, and the like.

The time-domain concealment method uses the information of adjacent frames on the time axis to estimate the missing data. The following methods can be used: simply replace the missing data with the data at the same position in an adjacent frame; or take motion into account and perform motion prediction based on the adjacent frame data. There are also more complicated concealment strategies, but their computational cost is very high.

The spatial-domain concealment method uses the regions spatially adjacent to the lost data region for error concealment. This method is computationally intensive.

The joint space-time concealment method combines spatial-domain and time-domain error concealment, that is, it uses spatial data and temporal data together for concealment.

The error diffusion elimination method based on intra-frame coding applies intra-frame coding to the macroblocks affected by errors: accurate error tracking is performed using the forward dependence of motion vectors, and coding the affected macroblocks in intra mode effectively prevents error diffusion. In addition, multi-level protection and unequal protection are not realized in the prior art, because there is no convenient scheme for monitoring network conditions and describing the relative importance of data.

In practical applications, the Tornado code scheme of the prior art is too complicated and inefficient; when applied to the protection of video data it introduces a large delay and cannot meet the performance requirements of real-time communication.

At the same time, a mechanism for reporting network conditions is lacking. The two communicating parties therefore cannot choose appropriate protection mechanisms according to network conditions, unequal protection and adaptive hierarchical protection cannot be used effectively, and the reliability of multimedia communication cannot meet requirements.

Moreover, the two methods of error concealment and error diffusion elimination are not well unified, and sometimes they are contradictory, and their effects cancel each other out.

The prior art lacks a joint working mechanism for the H.264 NAL and the RTP protocol. How to provide a protection mechanism based on the H.264 NAL, and the corresponding RTP encapsulation method, is undefined and remains a gap. Furthermore, good methods such as Tornado coding and other measures for efficient, adaptive hierarchical and unequal protection cannot at present be applied to H.264 video data.

The prior art cannot use the message extension mechanism of H.264 to report network status and QoS information. Without such a mechanism, many good technologies lack the preconditions needed for their application.

The prior art is relatively fragmented and lacks integration, and the individual techniques do not reinforce one another. At the same time, many technologies are still at the stage of academic discussion and have not entered the definition and development of communication protocols, which has hindered practical application. In integrating these technologies, the constraints of real-time communication performance must be considered: the selected technology must perform well and its computation must not be too complex.

The prior art protects the video communication stream with a fixed erasure code strategy and cannot adapt to changes in network conditions; the substitution mechanism used by the error concealment method may cause error diffusion; and the error diffusion elimination method requires a complicated mechanism or an additional feedback channel, which consumes system resources and network bandwidth.

In the prior-art solution, the header information of each NALU is encapsulated entirely in the payload, so the RTP protocol cannot directly know the attributes, level, importance, and so on of the payload, and a QoS mechanism based on them cannot be implemented. Moreover, this encapsulation format makes the NALU header information occupy payload resources: every NALU carries header information, and in many cases the header information of multiple NALUs of the same type in one RTP packet is identical, which wastes RTP transmission bandwidth.

The H.264/RTP multimedia communication framework uses the generic control protocol RTCP to transmit QoS reports for QoS monitoring. However, RTCP is not necessarily the best fit for specific video communication applications such as H.264. Because it opens a separate out-of-band logical channel to transmit QoS reports, it affects network conditions and can lead to conflicts.

The key point is that the prior art does not implement a fault-tolerant elastic protection strategy at the transport layer and cannot guarantee the reliability and communication quality of multimedia transmission.

Summary of the invention

A primary object of the present invention is to provide a multimedia communication method and terminal thereof that improve transmission reliability and communication quality.

The embodiment of the invention provides a multimedia communication method, including:

The transmitting end selects an encoding mode according to the fault-tolerant elastic protection policy, encodes the multimedia data, encapsulates the encoded multimedia data with the real-time transport protocol, and sends it to the receiving end; the receiving end receives the multimedia data and, if the received multimedia data has a transmission error, restores or partially restores the multimedia data affected by the transmission error.

Preferably, the method further includes:

The receiving end measures the communication quality, generates a quality of service report, and sends it back to the transmitting end;

The transmitting end adjusts the fault-tolerant elastic protection policy according to the quality of service report.

Preferably, the method further includes:

The receiving end obtains transmission error information from the multimedia data affected by the transmission error and implements an error concealment strategy;

The receiving end feeds back the transmission error information to the sending end;

The transmitting end implements an error diffusion elimination strategy according to the transmission error information.

Preferably, the real-time transport protocol header information carries code-related information, and the receiving end recovers or partially recovers the multimedia data according to the code-related information.

Preferably, the transmitting end obtains the location of the lost slice from the transmission error information and performs segmented successive intra-frame coding on the lost slice to implement the error diffusion elimination strategy.

According to an embodiment of the present invention, a multimedia communication terminal has a basic function module for implementing multimedia communication, and includes a codec module for implementing a multimedia codec function, and further includes:

A fault-tolerant elastic transmission control protocol module, configured to receive the multimedia data encoded by the codec module, apply fault-tolerant elastic protection to it, and send the protected data to the network side for transmission. The fault-tolerant elastic transmission control protocol module is further configured to receive multimedia data from the network side, perform error correction on it, and pass it to the codec module for decoding.

Preferably, the terminal further comprises:

A protection method and policy negotiation module, configured to negotiate the fault-tolerant elastic protection policy between the two communicating parties and to determine a set of protection policies from which the fault-tolerant elastic transmission control protocol module selects;

A forward error correction module, configured to implement at least one forward error correction protection method and to maintain the related parameters of that method, wherein the protection method and policy negotiation module controls the forward error correction module to implement the unequal protection and adaptive hierarchical protection functions, and the fault-tolerant elastic transmission control protocol module implements the fault-tolerant elastic protection and error correction functions by calling the forward error correction module.

Preferably, the terminal further comprises:

An error masking module, configured to implement the error concealment function;

The codec module, configured to implement encoding and decoding according to the H.264 standard and also to implement the error diffusion elimination function;

A network condition analysis calculation module, configured to analyze and calculate the network condition and provide the resulting information to the error masking module and the codec module.

Preferably, the terminal further comprises:

A supplemental enhancement information (SEI) message extension processing module, configured to implement the quality of service report and network status report functions and to send the reports to the network condition analysis calculation module.

Compared with the prior art, the technical solution of the present invention adopts a fault-tolerant elastic real-time transport protocol (ERRTP), which, on the basis of the existing RTP, provides a transport layer encapsulation format that can carry information related to the fault-tolerant elastic coding scheme, so that multimedia data transmitted over ERRTP carries the corresponding fault-tolerant elastic coding scheme information, thereby integrating the fault-tolerant elastic mechanism into the transport layer. A dedicated ERRTP encapsulation method and protocol header transformation scheme are given for the H.264 NALU structure: the header information bytes of all NALUs in the same ERRTP packet are merged into the ERRTP header information, in a way that does not affect the operation of the existing ERRTP protocol and devices and that directly reflects the attributes of the NALU payload in the ERRTP header information. On the one hand this greatly improves the bearer efficiency; on the other hand it provides the basis for implementing the QoS mechanism;

Based on the H.264 message extension mechanism, the communication quality is measured by the receiving end and fed back to the transmitting end; the extended message mechanism of the high-layer media protocol H.264 itself is used directly to carry the QoS report information, which avoids the use of additional channels and realizes an "in-band" QoS reporting mechanism;

At the transmitting end, one of several candidate fault-tolerant elastic coding schemes can also be selected according to current network conditions and the importance level of the multimedia data, thereby achieving unequal protection and a balance between protection capability and transmission efficiency;

On the basis of the feedback mechanism from the receiving end to the transmitting end, unequal protection is combined with the alternating, mixed use of multiple fault-tolerant elastic schemes: the transmitting end selects protection strategies of different levels according to the QoS report and the related network transmission status messages fed back by the receiving end, and, based on the data importance level reflected in the ERRTP header information, can also apply an appropriate protection strategy to data of each level;

For the H.264 NALU data stream, a combination of error concealment and error diffusion elimination is presented that combines the advantages of the two technologies and achieves error diffusion elimination through an error information feedback mechanism and segmented successive intra-frame coding;

An efficient Tornado code scheme is also provided. While ensuring that the data transmission protection capability is not significantly degraded, an erasure code with only one layer of check nodes is used, which reduces the amount of calculation needed to generate the check node layers and reduces the data transmission delay, so that the performance-to-cost ratio of data transmission protection is improved;

Finally, the above multimedia communication enhancement technologies are integrated into the multimedia communication system; the technologies and protocol architecture are modularized and work in coordination with each other to further enhance the reliability of multimedia communication. The transmission structure saves network transmission bandwidth; unequal protection achieves a balance between protection capability and transmission efficiency, facilitates QoS guarantees for multimedia transmission, further improves the quality of service, reduces redundancy, and improves transmission efficiency. Compatibility with the prior art improves the robustness of the new ERRTP method;

QoS reporting based on the H.264 message extension mechanism implements QoS monitoring in-band, reduces bandwidth overhead and system implementation complexity, and improves the effectiveness and efficiency of the current H.264 video network transmission quality reporting mechanism, thereby improving H.264 video network transmission quality; the unequal protection and multi-level protection strategies adapt to network transmission requirements more flexibly, accurately and promptly, improve protection capability, system efficiency and reliability, ensure accurate statistical information, and save system resources;

Combining error concealment with error diffusion elimination avoids the error diffusion caused by error concealment, achieves a good error elimination effect at low complexity, improves video transmission quality, saves overhead, simplifies the mechanism, and ensures system compatibility;

Using the improved Tornado erasure code scheme improves the cost-effectiveness and efficiency of data transmission and promotes the application of new technologies such as H.264;

Integrating multiple enhancement technologies into the multimedia communication system improves the quality of multimedia communication and greatly improves the performance of, and user satisfaction with, H.264-based multimedia communication products, such as conference television and video applications, on IP networks.

DRAWINGS

Figure 1 is a schematic diagram of the header information structure of an RTP data packet;

Figure 2 is a schematic diagram of the encapsulation format of NALU data in the RTP packet payload;

Figure 3 is a schematic diagram of the format of a QoS report data packet based on the RTCP protocol;

Figure 4 is a schematic diagram of the Tornado erasure code principle;

Figure 5 is a schematic diagram of the module structure of a multimedia communication terminal supporting fault-tolerant elasticity according to a first embodiment of the present invention;

Figure 6 is a schematic structural diagram of a multimedia communication protocol stack according to a first embodiment of the present invention;

Figure 7 is a schematic diagram of the header information structure of an ERRTP data packet according to second and third embodiments of the present invention;

Figure 8 is a schematic diagram of an SEI encapsulation format for carrying a QoS report according to a fourth embodiment of the present invention;

Figure 9 is a diagram of the principle of error diffusion elimination based on segmented successive intra coding according to a sixth embodiment of the present invention;

Figure 10 is a block diagram of the structure of the erasure code of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In order to make the objects, technical solutions and advantages of the present invention more comprehensible, the present invention will be further described in detail below with reference to the accompanying drawings.

The present invention integrates various enhancement techniques into a multimedia communication system, combining their respective advantages to improve system performance, transmission reliability, and communication quality. These enhancements include: the fault-tolerant elastic real-time transport protocol (ERRTP, Error Resilience Real-time Transport Protocol), which integrates FEC into the RTP protocol; the technique of merging NALU header information into the RTP header; the technique of carrying QoS reports and network status feedback in SEI extended messages; multi-level protection and unequal protection mechanisms based on feedback; techniques that combine error concealment and error diffusion elimination; and an improved Tornado coding scheme.

The invention combines these enhancement technologies in a multimedia communication system to realize fault-tolerant elastic H.264 video communication. The system includes a main control module, a user interface, a network communication module, I/O and underlying driver modules, various service modules, a communication process control module, an application protocol module, etc., and, to implement the enhancement technologies, also includes a protection method and policy negotiation module, an FEC module, an ERRTP module, an RTCP module, an H.264 NAL module, an H.264 encoder module, an H.264 decoder module, an audio codec module, an error masking module, an SEI message extension processing module, and a network condition analysis calculation module.

[First Embodiment of the Invention]

In the first embodiment, the enhancement technologies are implemented and modularized in a multimedia communication system, chiefly in a multimedia communication terminal. The device is first described in terms of the functional modules of the terminal; a complete diagram of the internal module structure of the terminal is shown in Figure 5. It should be noted that the functional modules mentioned here are all defined functionally, and may be implemented in software, hardware, firmware, or a combination of software and hardware. A complete multimedia communication terminal must first contain the following modules:

Main control module: responsible for the control of the entire terminal system;

User interface module: responsible for user input and output interaction; the user operates the terminal through interface control elements such as menus and buttons, and the module displays feedback information such as the current system status, parameters, and network status;

Network communication module: responsible for communication with the network, providing TCP, UDP, IP and lower-layer communication protocol stacks such as Ethernet, PPP, ATM, etc.;

I/O and underlying driver modules: responsible for driving hardware devices, such as video, audio capture devices and display/playback devices, and for video and audio data input and output;

Various service modules: implement specific services, such as videophone, multi-party conference, video mail, instant messaging, video chat, etc.;

Communication process control module: performs control during a specific communication process, such as applying for the chair in a multi-party conference, releasing the chair, requesting the floor, controlling the broadcast of a given site, site browsing, etc.;

Application protocol module: may be a specific application protocol such as the H.323 system (including H.225.0, RAS, H.245, H.235, H.460, etc.) or SIP. In this case, the protocol is a general term for a series of protocols, called a "protocol umbrella";

In addition, the enhancement technologies are implemented in the following modules:

Protection method and policy negotiation module: responsible for negotiating the protection methods between the communicating parties, determining the allowed set, and then negotiating, on the basis of the allowed set, a strategy for the mixed and alternating use of protection methods. The negotiation is carried out through the "application protocol module". This module controls the FEC module, which implements the different FEC protection modes and functions such as unequal protection and adaptive hierarchical protection;

FEC module: supports a variety of FEC protection methods, which may be subclasses within several major classes; assume that a total of T different methods are supported. According to the results of the negotiation (from the "protection method and policy negotiation module"), it protects the H.264 video data and the audio data (the latter being outside the scope of this patent). The module stores the generation rules and parameters of the various FEC subclasses, and therefore contains an internal database for this data. The module enables the mixed and alternating application of different protection methods;

ERRTP module: implements the ERRTP protocol; the ERRTP encapsulation format and the related encapsulation and decapsulation steps for H.264 are described in detail in the following embodiments;

RTCP module: implements the normal RTCP function. Although the present invention provides a reporting mechanism based on the H.264 SEI message extension that can report the main RTCP information, the use of RTCP is not excluded and the two reporting mechanisms can coexist. This is mainly for compatibility and interoperability, since the other terminal may not support the reporting mechanism based on the SEI message extension;

H.264 NAL module: implements the functions of the H.264 network abstraction layer;

H.264 encoder module: in addition to the normal H.264 encoder function, implements the error diffusion elimination function of the present invention, for which it obtains information from the "network condition analysis calculation module";

H.264 decoder module: implements the normal H.264 decoder function;

Audio codec module: implements the audio codec function; the supported protocols may be ITU-T G.711, G.722, G.723.1, G.728, G.722.2 (3GPP AMR), MP3 and AAC from MPEG, etc.;

Error masking module: implements the error concealment function provided by the present invention, using information from the "network condition analysis calculation module" and the "H.264 encoder module";

SEI message extension processing module: implements the QoS and network status report function of the present invention based on the SEI message extension. At the transmitting end, it collects data to form RTCP SR and RR reports and sends them out encapsulated in SEI extended messages; at the receiving end, it extracts the RTCP SR and RR reports from the SEI extended messages and passes the data to the "network condition analysis calculation module" for analysis and calculation;

Network condition analysis calculation module: analyzes the data from the "SEI message extension processing module" to obtain network status data such as packet loss rate, jitter, delay, and instantaneous end-to-end bandwidth, then uses this data to control the "H.264 encoder module" and the "error masking module", and also sends it to the "user interface module" so that it can be displayed to the user.

Having described the module structure of the communication terminal as a whole, the terminal is now described at the level of the protocol stack. A communication terminal system implements protocols at several different levels, which together form the protocol stack. The protocol stack of the terminal of the present invention has parts in common with that of an ordinary multimedia communication terminal and parts that differ, with new layers added in some places. Figure 6 is a block diagram of the structure of the multimedia communication protocol stack according to the first embodiment of the present invention.

The H.264/ERRTP multimedia transmission architecture of the present invention differs from the traditional H.264/RTP architecture mainly in that:

The RTP/RTCP layer of the ordinary terminal protocol stack is replaced with an ERRTP/RTCP layer, and the fault-tolerant elastic ERRTP incorporates the fault-tolerant elastic protection mechanism into the transport layer;

A "protection mechanism and policy negotiation layer" is added in the application protocol layer; it is mainly used by the communicating parties to negotiate the protection levels and related protection schemes when implementing multi-level protection and unequal protection;

The "SEI Extended Reporting Layer" is added between the H.264 VCL layer and the NAL layer. This layer facilitates the implementation of QoS monitoring and network transmission status based on SEI extended messages.

The "FEC layer" is added between the H.264 NAL layer and the ERRTP/RTCP layer. This layer implements node partitioning, encoding, and encapsulation for the H.264 NALU data stream.

Those skilled in the art will understand that the first embodiment of the present invention gives the basic module structure and protocol stack composition taking a typical H.264 service as an example. For other protocols, or for multimedia communication protocols or applications that appear in the future, it is only necessary to implement the relevant technical details for the specific application on the basis of the principle of the present invention; the object of the invention is achieved without affecting the essence and scope of the present invention.

On the premise of giving the overall architecture of the system, the implementation details of each enhancement technique will be described in turn below.

Aiming at many problems existing in the prior art, the present invention proposes an improved RTP protocol supporting fault tolerance resilience, which aims to integrate a fault-tolerant elastic mechanism into a transport layer protocol, which not only simplifies the transmission structure, reduces complexity, but also improves the fault-tolerant elastic mechanism. Flexibility enhances transmission reliability. Due to its fault tolerance, the present invention calls this improved RTP protocol a fault tolerant elastic real time transfer protocol (ERRTP or ER2TP, Error Resilience Real-time Transport Protocol). The main difference between ERRTP and RTP is that the ERTP protocol packet header information extension can carry information about the fault-tolerant elastic coding scheme, such as FEC type, protection capability, and coding parameters.

On the basis of ERRTP, the present invention conveniently realizes unequal protection. First, various protection measures with different protection capabilities are made available for selection. The sender then collects information such as the network status and the importance of the multimedia data, and uses these factors to select appropriate protection measures, achieving unequal protection and a balance between protection capability and transmission efficiency. Since the FEC-related information is carried in each ERRTP data packet, the transmitting end only needs to fill the information of the selected scheme into the ERRTP header information, and the receiving end can recover lost data or correct errors accordingly.

Finally, for the NALU data transmission application of H.264, a specific implementation method based on erasure code protection is given, including the steps of dividing, generating, encapsulating, and decapsulating data nodes and check nodes. A series of NALUs is divided equally into several data nodes, and the Tornado code is then used to generate the check nodes. All of these nodes are distributed over several ERRTP packets, and the receiver performs the inverse process.

[Second Embodiment of the Invention]

On the basis of the first embodiment, the transmitting and receiving parties implement unequal protection based on ERRTP. The main steps are as follows:

The transmitting end selects a fault-tolerant elastic coding scheme and performs erasure coding on the multimedia data. It encapsulates the encoded multimedia data in ERRTP packets, carries the information related to the fault-tolerant elastic coding scheme in the ERRTP header information, and sends the packets to the receiving end;

The receiving end decapsulates the received ERRTP packet, and extracts the information about the fault-tolerant elastic coding scheme from the ERRTP header information, and then selects the fault-tolerant elastic coding scheme for fault-tolerant elastic decoding according to the information of the fault-tolerant elastic coding scheme to obtain the multimedia data.

The unequal protection is reflected in the fact that the transmitting end selects the fault tolerant elastic coding scheme according to the current network transmission status and/or the quality of service level of the multimedia data to be transmitted.

First, the specific structure of ERRTP is introduced, and an embodiment of the ERRTP header information structure is given. Figure 7 is a diagram showing the structure of the ERRTP header information according to the first embodiment of the present invention. As can be seen from the figure, the version information field V has a value of 3, indicating the ERRTP protocol, to distinguish it from the traditional RTP protocol (V=2). The header information extension is followed by the fields carrying information about the fault-tolerant elastic coding scheme. In this example these are the fault-tolerant elastic coding type field, the fault-tolerant elastic coding subtype (parameter) field, the data packet length field, and the number of data packets field.

The fault-tolerant elastic coding type field indicates the erasure code type used by the fault-tolerant elastic coding scheme. It is also called the FEC Type field. It occupies 4 bits and can represent 16 different FEC types, which is sufficient in practical applications. The types defined here are large types, which are further subdivided into various schemes called subtypes. Examples of large types in actual applications are 0010 for the Tornado code and 0011 for the RS code. The two parties must agree on a look-up table (LUT) giving the correspondence between the FEC encoding types and their type codes; this table is called FECTypeLUT.

The fault-tolerant elastic coding subtype field indicates the parameter settings of the fault-tolerant elastic coding scheme. For each type of FEC coding, the various parameters must also be fixed before the code can be implemented, and this field identifies them. Since the resources in the ERRTP header information are limited, it is impossible to list there the specific parameters and rules of every FEC encoding scheme; the first embodiment of the present invention therefore uses the concept of subtypes to indicate the alternative parameter-setting schemes. This field is also called the FEC encoding subtype field, FEC Subtype, and occupies 9 bits. It represents the subtypes into which each large type defined in the FECTypeLUT is further subdivided.

The data packet length field indicates the length of the data nodes produced when the fault-tolerant elastic coding scheme erasure-codes the multimedia data. It is called the Data Length field and occupies 11 bits. Each packet must be shorter than the Maximum Transmission Unit (MTU); the MTU of current wired channels is 1500 = 0x5DC bytes and that of wireless channels is 100 bytes, so 11 bits (values up to 2047) are enough to store the data packet length.

The number of data packets field indicates the number of data nodes carried by the ERRTP packet. It is also called the Packet Number field and occupies 8 bits. For example, after several NALUs have been protected by forward error correction coding and encapsulated in multiple ERRTP packets, this field gives the number of data nodes carried in each ERRTP packet.

It can be seen that, with these fields available, the decoding end or a network node can verify the received data packets according to the FEC code type and check type given by the fields, and recover lost data packets.
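The four fields described above add up to 4 + 9 + 11 + 8 = 32 bits. The following is a minimal sketch of packing and unpacking them, assuming they are laid out in the order in which the text introduces them (FEC Type, FEC Subtype, Data Length, Packet Number) inside one big-endian 32-bit word. That ordering and the function names are assumptions made for illustration; the authoritative layout is the one in Figure 7.

```python
import struct

# Field widths are taken from the text; the bit ORDER is an assumption
# (the authoritative layout is the one shown in Figure 7).
FEC_TYPE_BITS, FEC_SUBTYPE_BITS, DATA_LENGTH_BITS, PACKET_NUMBER_BITS = 4, 9, 11, 8


def pack_fec_fields(fec_type, fec_subtype, data_length, packet_number):
    """Pack the four FEC fields into one 32-bit big-endian word."""
    fields = (
        (fec_type, FEC_TYPE_BITS),
        (fec_subtype, FEC_SUBTYPE_BITS),
        (data_length, DATA_LENGTH_BITS),
        (packet_number, PACKET_NUMBER_BITS),
    )
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return struct.pack("!I", word)


def unpack_fec_fields(raw):
    """Inverse of pack_fec_fields: return the four fields as a tuple."""
    (word,) = struct.unpack("!I", raw)
    packet_number = word & 0xFF
    data_length = (word >> 8) & 0x7FF
    fec_subtype = (word >> 19) & 0x1FF
    fec_type = (word >> 28) & 0xF
    return fec_type, fec_subtype, data_length, packet_number
```

For instance, `pack_fec_fields(0b0010, 0b000000010, 1000, 4)` yields a 4-byte extension that `unpack_fec_fields` turns back into the same four values.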

It should be noted that the FEC Subtype field mentioned above has a total of 9 bits for encoding the indication of the various alternative parameter-setting schemes. The technical details of how this coding indication is performed in the first embodiment of the present invention are given below.

First, the sending and receiving parties negotiate the correspondence table that defines what the field indicates. Before starting the transmission, the sender and the receiver agree, for each type of FEC code, on the correspondence between the values of FEC Subtype and the parameter-setting schemes of the FEC code that they indicate, and on the specific parameter settings of each alternative.

Then, the sender and the receiver each establish a correspondence table according to the negotiation result, which is used to look up the corresponding FEC coding type or FEC codec processing module according to the FEC Type and FEC Subtype fields;

In the process of transmitting and receiving, the transmitting end calls the corresponding erasure coding processing module to perform erasure coding, and the receiving end calls the corresponding erasure decoding processing module to perform erasure decoding.

In practical applications, subtype information actually indicates two aspects:

A. Generation rules for FEC coding (Generation Rule);

B. Protection strength (protection capability).

The generation rule is the rule or algorithm by which the transmitting end processes the data nodes to generate each check node. The receiving end does the opposite: if packet loss occurs during transmission, that is, some nodes are lost, the lost nodes can be recovered or partially recovered according to the generation rule. The generation rule is therefore very important information, on the basis of which both parties of the communication can operate the FEC mechanism. Each of the FEC types listed in the FECTypeLUT has a different generation rule; within each large type, such as the Tornado code, the generation rule of each subtype is combined with specific generation parameters. So for each subtype, the generation rule is combined with its generation parameters.

For example, for the Tornado code, the generation parameters include the following: the total number of data nodes, the total number of check nodes, the number of check node layers, the ratio by which the number of nodes shrinks between successive layers, and the association matrices describing node associations between successive layers (if there are L layers of check nodes, there are L such matrices) or, equivalently, a mathematical representation of the bipartite graph relating the nodes of two successive layers.

When the overall generation rule is the same, the generation parameters largely determine the protection strength of a subtype. For the Tornado code, for example, among the generation parameters given above, the total number of data nodes and the total number of check nodes largely determine the protection capability (strictly speaking, all the generation parameters are required to determine it fully). In the present invention, for each FEC large type, the main parameters that have the greatest effect on the protection capability are selected as representative generation parameters. Using the representative generation parameters, the subtypes under a large type can be arranged in order of protection capability from weak to strong (ascending order). The LUT created in this way is called FECSubTypeLUT.

How many subtypes each large type supports depends on the specific application and on the communication capabilities (CPU processing speed, memory, program complexity, etc.) and requirements. If the communication environment changes a lot and the performance of the network fluctuates widely, more subtypes generally need to be supported; otherwise fewer suffice. This can be agreed upon by the communicating parties through the capability negotiation process before the communication begins. The negotiation can be carried out through current mainstream multimedia communication framework protocols such as H.323 or the Session Initiation Protocol (SIP).

Assume that a large type needs to distinguish S subtypes (S ≤ 2⁹ - 1) and has k representative generation parameters, denoted by $p_1, p_2, \ldots, p_k$. Table 2 gives an example of such a correspondence; in the table, the superscript indicates the FEC large type and the subscript indicates the specific parameter.

For example, for the Tornado code, the correspondence can be set to: 000000010 - (24, 20) (total number of data nodes = 20, total number of check nodes = 4), 000000011 - (30, 20), 111111111 - others.

For an FEC coding subtype, a given generation rule combined with the corresponding generation parameters corresponds to a unique coding scheme; that is, it uniquely determines how check nodes are generated from the data nodes and how lost nodes are recovered. A database can be created to store the generation parameters for each large type and subtype. The generation rules themselves are implemented in hardware or software modules. Each large type therefore corresponds to an FEC processing module at the transmitting end, which is responsible for generating the check nodes, and to an FEC processing module at the receiving end, which is responsible for recovering lost nodes. The module for each large type reads the specific generation parameters of each subtype from the generation parameter database and processes the data accordingly. Both parties of the communication thus decide which FEC processing module to call and which generation parameters to read based on the two information fields FEC Type and FEC Subtype.
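The two-stage lookup described above can be sketched as a pair of dictionaries. The type codes (0010 for Tornado, 0011 for RS) and the two Tornado subtype entries come from the text; the module names, the dictionary layout, and the function `lookup_scheme` are hypothetical and only illustrate how FEC Type and FEC Subtype together select a processing module and its generation parameters.

```python
# FECTypeLUT: large-type code -> name of the FEC processing module.
# The codes are from the text; the module names are illustrative.
FEC_TYPE_LUT = {
    0b0010: "tornado",
    0b0011: "rs",
}

# FECSubTypeLUT: (module name, subtype code) -> (total nodes, data nodes).
# Only the two Tornado entries quoted in the text are filled in.
FEC_SUBTYPE_LUT = {
    ("tornado", 0b000000010): (24, 20),
    ("tornado", 0b000000011): (30, 20),
}


def lookup_scheme(fec_type, fec_subtype):
    """Resolve the header fields into a module name and its parameters.

    Raises KeyError when either code was not agreed on during negotiation.
    """
    module = FEC_TYPE_LUT[fec_type]
    total_nodes, data_nodes = FEC_SUBTYPE_LUT[(module, fec_subtype)]
    return {
        "module": module,
        "data_nodes": data_nodes,
        "check_nodes": total_nodes - data_nodes,
    }
```

Because both ends build the same tables during negotiation, the receiver can resolve the header fields of any packet without further signalling.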

With the development of multimedia communication technology, the H.264 video coding standard has gradually become the mainstream media coding format. Therefore, building on the first embodiment, the second embodiment of the present invention gives the specific steps for FEC encoding and decoding of the H.264 NALU data stream with ERRTP, as follows.

The sender merges multiple (say S) H.264 NALUs into one group for unified coding and transmission. The S NALUs are first re-divided into equal-length blocks; let the number of blocks be M. These M blocks are the data nodes. In this step, the S NALUs of H.264 are grouped together and concatenated end to end to form one large block, and the large block is then divided equally into M data blocks, each with a length of K bytes. If the total number of bytes of the large block (denoted TB) is not divisible by M, the length of each data block is rounded up to Ceiling(TB/M) bytes, where Ceiling(x) is the smallest integer not less than x, for any real number x. Zero padding is then applied to the data blocks that fall short, so that every block has Ceiling(TB/M) bytes.
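The partitioning step can be sketched as follows. The function names are hypothetical, and passing the original NALU lengths to the receiver as a separate list is an assumption made so that the example can show the inverse step; the text does not say how the NALU boundaries are conveyed.

```python
def split_into_data_nodes(nalus, m):
    """Concatenate the NALUs and cut the result into m equal-length blocks.

    Returns (nodes, k), where k = Ceiling(TB / m) and the tail of the last
    block is zero-padded.
    """
    big_block = b"".join(nalus)
    total_bytes = len(big_block)              # TB in the text
    k = -(-total_bytes // m)                  # integer Ceiling(TB / m)
    padded = big_block.ljust(k * m, b"\x00")  # zero padding
    nodes = [padded[i * k:(i + 1) * k] for i in range(m)]
    return nodes, k


def merge_data_nodes(nodes, nalu_lengths):
    """Receiver side: rebuild the big block and cut it back into NALUs.

    nalu_lengths is assumed to be known to the receiver (an assumption of
    this sketch, not something the text specifies).
    """
    big_block = b"".join(nodes)
    nalus, pos = [], 0
    for length in nalu_lengths:
        nalus.append(big_block[pos:pos + length])
        pos += length
    return nalus
```

With three NALUs of 5, 2 and 4 bytes and M = 4, TB = 11 and K = Ceiling(11/4) = 3, so the last data node carries one byte of zero padding.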

Then, FEC encoding is performed on the M data nodes to obtain N check nodes. The N check blocks are generated from the M data blocks by FEC encoding; as described above, the FEC Type and FEC Subtype information determines which FEC processing module is called to generate the check blocks.

Then, the sender encapsulates all data nodes and check nodes in ERRTP packets for transmission. In this case the fields should be set as follows:

Type field FEC Type = 0010, indicating the use of Tornado code;

The subtype field is selected by the sender according to the actual situation. For example, FEC Subtype = 000000010 means that the Tornado (24, 20) code is used, with 20 data nodes and 4 check nodes and a channel coding redundancy of 16.7%; this erasure code can completely recover the lost data packets when the packet loss rate is less than or equal to 3%;

Packet length Data Length = K bytes;

Number of packets Packet Number = (M+N)/P, which represents the number of data nodes carried in one ERRTP payload, where P is the number of ERRTP packets in a group.
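As a quick check of the figures in the field settings above, the redundancy of the Tornado (24, 20) code follows directly from the node counts. The value P = 6 packets per group in the second line is only an illustrative assumption, not a value given in the text.

```latex
\text{redundancy} = \frac{N}{M+N} = \frac{4}{20+4} = \frac{1}{6} \approx 16.7\%

\text{Packet Number} = \frac{M+N}{P} = \frac{24}{6} = 4 \qquad (\text{assuming } P = 6)
```

By the same calculation, the (30, 20) subtype has a redundancy of 10/30, about 33.3%, which is why it sits above (24, 20) in the ascending order of protection strength.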

After receiving the ERRTP packets, the receiver decapsulates the data nodes and the check nodes. The receiving end processes the packets in groups of P: each time a complete group of P packets has been received, it starts decoding and recovery. The number of packets in a group is determined by mutual agreement.

The receiving end performs fault-tolerant elastic decoding on the data nodes according to the check nodes. Each time it receives packet P+1, it checks whether any of the P packets received before were lost. If so, it uses the method described above: according to the FEC Type and FEC Subtype information, it determines which FEC processing module to call to decode and recover, or partially recover, the lost data. Finally, after obtaining the complete data nodes, it merges them back into the large block and splits the block into the S NALUs, reversing the sender's partitioning.

In practical applications, the ERRTP-based anti-packet-loss algorithm of the above example can greatly improve the packet loss resistance of the video code stream with a coding redundancy of less than 17%. Compared with the RTP payload header structure, only 4 bytes are added, so there is essentially no effect on transmission efficiency, while significant practical results are achieved.

Another key technical point of the present invention mentioned above is the implementation of unequal protection. It is embodied in two aspects: one is to select the appropriate coding scheme or parameters, that is, the aforementioned FEC coding type and subtype, according to the importance level of the multimedia data; the other is to select them according to the network conditions at different times. These two aspects are called hybrid and alternating use of FEC coding schemes, respectively. Hybrid use refers to using multiple FEC subtypes at the same time, mainly to protect data of different importance. Alternation refers to using different FEC subtypes at different times (under different network conditions).

Therefore, for the H.264 NALU data stream, since the header byte reflects the importance of the data as mentioned above, the sender can evaluate the QoS level according to the NRI field or Type field in the NALU header information and then select the fault-tolerant elastic coding scheme, that is, determine the FEC Type field and the FEC Subtype field. As for the network condition, network transmission generally has a corresponding network condition monitoring mechanism. Through these mechanisms the transmitting end obtains the transmission reports fed back by the receiving end, evaluates the network transmission status, and then selects the fault-tolerant elastic coding scheme, that is, determines the FEC Type field and the FEC Subtype field.

The H.264 code stream is transmitted or stored in units of NALUs, each consisting of NAL header information and a NAL payload. In H.264, different NALU types affect decoding and image reconstruction differently. For example, NRI equal to 0 means that the NALU carries a slice or slice data partition of a non-reference image, which does not affect subsequent decoding; a non-zero NRI indicates that the NALU carries a sequence/picture parameter set or a slice or slice data partition of a reference image, which seriously affects subsequent decoding.

Therefore, when packet protection is applied to the H.264 code stream, the H.264 data can be classified into two types according to the value of NRI (nal_ref_idc) or nal_unit_type: relatively important image data (for example, nal_ref_idc equal to 1) and secondary image data (for example, nal_ref_idc equal to 0). The important image data is then protected by the FEC1 code, which has high redundancy and strong packet loss resistance, and the secondary image data can be protected by the FEC2 code, which has less redundancy and weaker packet loss resistance.

This unequal protection algorithm ensures the correct recovery of the important information even in a high packet loss environment; for image information that the FEC2 code still fails to recover, techniques such as error concealment and error propagation prevention are adopted. FEC1 and FEC2 are only general notations representing any two subtypes, which can belong to the same large type or to different large types.
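The two-class selection can be sketched as below. The header layout (1-bit F, 2-bit NRI, 5-bit Type) is the standard H.264 NALU header. Mapping FEC1 to the Tornado (30, 20) subtype and FEC2 to the Tornado (24, 20) subtype is purely an illustrative assumption, as is treating every non-zero NRI as "important"; the text leaves both choices to the communicating parties.

```python
def parse_nalu_header(header_byte):
    """Split the one-byte H.264 NALU header into (F, NRI, Type)."""
    f = (header_byte >> 7) & 0x01
    nri = (header_byte >> 5) & 0x03
    nal_type = header_byte & 0x1F
    return f, nri, nal_type


# (FEC Type, FEC Subtype) pairs; which subtype plays the role of FEC1 or
# FEC2 is an assumption made for this example.
FEC1_STRONG = (0b0010, 0b000000011)  # Tornado (30, 20), more redundancy
FEC2_WEAK = (0b0010, 0b000000010)    # Tornado (24, 20), less redundancy


def select_fec(nalu):
    """Pick the stronger code for reference data (NRI != 0)."""
    _, nri, _ = parse_nalu_header(nalu[0])
    return FEC1_STRONG if nri != 0 else FEC2_WEAK
```

A NALU whose header byte is 0x65 (NRI = 3, Type = 5) is therefore sent with the stronger code, while one whose header byte is 0x01 (NRI = 0, Type = 1) gets the weaker one.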

Obviously, the above method can be extended to a more general case in which the data is divided into more classes according to the value of nal_unit_type, for example five classes: the most important data, the second most important data, data of average importance, less important data, and the least important data. It can also be divided into 7 or more classes. The classes can then be protected with the same number of FEC subtypes, each class of data corresponding to a different subtype. As long as their protection capabilities are ordered from weak to strong, these subtypes do not necessarily belong to the same large type. For image information that cannot be recovered even under the FEC code with the strongest protection capability, techniques such as error concealment and error propagation prevention are adopted.

Another form of unequal protection that is also within the scope of the present invention is to select FEC codes of different protection capabilities depending on the real-time conditions of the network. The ERRTP header information is then used to inform both parties of the communication so that they can correctly decode the data and recover the lost data. The degree to which the network's transmission performance is currently degraded can be divided into several levels, for example five levels: the most serious, the second most serious, more serious, less serious, and the least serious. It can also be divided into 7 or more levels. The levels can then be protected with the same number of FEC subtypes, each level corresponding to a different subtype. As long as their protection capabilities are ordered from weak to strong, these subtypes do not necessarily belong to the same large type. For image information that cannot be recovered even under the FEC code with the strongest protection capability, techniques such as error concealment and error propagation prevention are adopted. Network conditions can be sensed through various existing QoS monitoring methods.

More complex applications are also within the scope of the invention. Suppose a total of T FEC schemes (different types/subtypes) are available, all supported by the terminals of both parties, and the choice of FEC depends on both the importance of the data and the state of the network. Then a two-dimensional LUT can be used, as shown in Table 3:

Table 3 Two-dimensional LUTs mixed and alternately used in various FEC mechanisms

In the above table, the data importance levels and the network status levels are in ascending order. Each FEC entry carries a two-dimensional subscript, and the fault-tolerant elastic mechanism FEC(i, j), 0 < i < U, 0 < j < V, in the table may be any one of the above T FEC schemes.
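A two-dimensional LUT of this kind can be sketched as a nested list. The 3 x 3 size, the scheme labels, the zero-based indices, and the reading of "ascending" network status as increasing degradation are all assumptions made for illustration; the actual contents of Table 3 are negotiated by the two parties.

```python
# Rows: data importance level, ascending. Columns: network degradation
# level, ascending. Entries name one of T = 3 hypothetical FEC schemes,
# ordered A < B < C by protection capability.
FEC_2D_LUT = [
    ["FEC_A", "FEC_A", "FEC_B"],
    ["FEC_A", "FEC_B", "FEC_C"],
    ["FEC_B", "FEC_C", "FEC_C"],
]


def choose_fec(importance_level, network_level):
    """Look up the scheme for one (importance, network) pair."""
    if not 0 <= importance_level < len(FEC_2D_LUT):
        raise IndexError("importance level out of range")
    row = FEC_2D_LUT[importance_level]
    if not 0 <= network_level < len(row):
        raise IndexError("network level out of range")
    return row[network_level]
```

The hybrid use corresponds to moving along a column (different importance at one moment), and the alternating use to moving along a row (the same data class under changing network conditions).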

It should be mentioned that the above embodiments of the invention are described on the basis of FEC erasure codes, especially the Tornado code, but the invention can be applied to other similar fault-tolerant elastic mechanisms, in particular FEC coding schemes other than the Tornado code, without affecting the spirit and scope of the invention.

In another embodiment of the present invention, an improved Tornado erasure code is specifically employed. The improved Tornado erasure code generates only one layer of the check node for a group of data nodes. The coding delay is greatly reduced to meet the requirements of real-time communication.

In real-time video communication, the use of FEC packet protection introduces a delay whose size is related to the size of the image data group. S NALUs are grouped into one group, and one NALU contains the stream data of one slice. If each frame of image is encoded as one slice, the encoding end incurs a delay of S frames, and the decoding end likewise incurs a delay of S frames. The relationship between the NALUs and the number of data nodes is as follows:

$$\sum_{i=0}^{S-1} \mathrm{NalSize}_i = \mathrm{PackSize} \times \mathrm{DataNode} \qquad (1)$$

Equation (1) states that the sum of the S NALU lengths equals the number of data nodes multiplied by the size of each node packet. It follows from equation (1) that when the value of S is limited, the value of PackSize × DataNode is also limited. In addition, PackSize cannot be too small if IP network transmission is to remain efficient, so the value of DataNode is limited. In IP network real-time video communication, the delay of one frame of image, $T_{\mathrm{delay}}$, is calculated as follows:

$$T_{\mathrm{delay}} = T_{\mathrm{FEC}} + T_{\mathrm{codec}} + T_{\mathrm{net}} \qquad (2)$$

where $T_{\mathrm{FEC}}$ is the delay introduced by adding FEC protection, and $T_{\mathrm{codec}}$ and $T_{\mathrm{net}}$ are the H.264 codec processing delay and the network transmission delay, respectively. Owing to the rapid development of digital signal processing technology and IP networks, it can be assumed that both meet the real-time requirement:

$$T_{\mathrm{codec}} \le T_{\mathrm{frame}}, \qquad T_{\mathrm{net}} \le T_{\mathrm{frame}}, \qquad \text{where } T_{\mathrm{frame}} = 1 / F_{\mathrm{target}}$$

Here $F_{\mathrm{target}}$ is the decoding target frame rate (for example 10 Hz or 30 Hz). If each frame of image is encoded as one slice, equation (2) becomes:

$$T_{\mathrm{delay}} \le S \cdot T_{\mathrm{frame}} + 2 \cdot T_{\mathrm{frame}} = (S + 2) \cdot T_{\mathrm{frame}}$$

From the above two equations, the delay of one frame of image $T_{\mathrm{delay}}$ is essentially determined by the value of S, and the DataNode value is also closely tied to the value of S. Therefore, under the premise of ensuring the packet loss resistance of video communication, the delay introduced by FEC should be minimized so as to further ensure the QoS of real-time video communication.
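As a worked illustration of the delay bound, write $T_{\mathrm{delay}}$ for the delay of one frame of image, $F_{\mathrm{target}}$ for the decoding target frame rate, and $T_{\mathrm{frame}} = 1/F_{\mathrm{target}}$ for the frame period, so that the bound reads $(S+2)\,T_{\mathrm{frame}}$. The group size S = 3 is an illustrative assumption; the frame rates are the example values mentioned in the text.

```latex
F_{\mathrm{target}} = 30\ \mathrm{Hz}: \quad
T_{\mathrm{delay}} \le \frac{S+2}{F_{\mathrm{target}}} = \frac{3+2}{30\ \mathrm{Hz}} \approx 167\ \mathrm{ms}

F_{\mathrm{target}} = 10\ \mathrm{Hz}: \quad
T_{\mathrm{delay}} \le \frac{3+2}{10\ \mathrm{Hz}} = 500\ \mathrm{ms}
```

Each additional NALU in the group adds one frame period to the bound, which is why keeping S small matters most at low frame rates.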

The present invention employs an improved Tornado code protection algorithm for the case where DataNode is limited. The improved Tornado method does not use multi-level bipartite graph encoding, but only a single layer of check nodes. Compared with the original Tornado coding method, the improved method greatly increases the flexibility of the algorithm: the numbers of data nodes and check nodes can be set arbitrarily, and the complexity of the codec algorithm is also reduced, so that it can be used for packet loss protection in real-time video communication. In addition, the packet loss resistance of the improved Tornado code is essentially not reduced when the number of data nodes is limited. The specific principles and detailed steps of the improved Tornado coding method will be described in detail later.

[Third embodiment of the present invention]

Note that the above ERRTP example does not say how the header information of a NALU is handled when the NALU is encapsulated. Therefore, building on the second embodiment, ERRTP handles NALUs of the same type together and integrates their header information into the ERRTP header information. The most basic difference from RTP is that, in the ERRTP encapsulation process, the header information of NALUs with identical header information is integrated into the ERRTP header information.

The NALU header information structure has already been mentioned. To recap, the NALU header information includes:

a 1-bit F field, which indicates whether the NALU is in error;

a 2-bit NRI field indicating the importance of the NALU;

a 5-bit Type field indicating the type of the NALU.

The execution steps of the sender and the receiver are as follows. Using the ERRTP encapsulation format, the sender encapsulates multiple NALU data nodes or check nodes with the same header information in the same ERRTP packet. Engineering experience shows that, because the H.264 bitstream always contains adjacent sections whose NALUs have the same type attributes, this condition can generally be satisfied. Even where it cannot, several countermeasures are available. First, NALUs of the same type can be accumulated until a certain number is reached and then packaged into ERRTP. Second, if the number of NALUs of the same type does not reach that number, RTP padding can be used; this wastes some bandwidth, but the waste is insignificant. Third, if there are many NALUs of different types, ordinary RTP encapsulation can be used instead, since the receiving end can distinguish the two by the ERRTP identifier and process each accordingly.

As mentioned above, in the ERRTP encapsulation format, the common header information of the carried NALUs is integrated into the header information of the ERRTP packet; the carried NALUs, with their header information removed, are then partitioned, encoded, and encapsulated into the payload of the ERRTP packet according to the process described above. How, then, is the NALU header integrated into the ERRTP header? Two schemes are given below.

In the ERRTP encapsulation format, the NRI field and the Type field of the NALU header information are filled into the PT field of the ERRTP header information. As described above, the PT field occupies the last 7 bits of the second byte of the ERRTP header information. The format of such an ERRTP header is given in Figure 7, where the differences from RTP are indicated in bold; other parts of the figure are explained later.

In addition, the V field in the ERRTP header is used as the ERRTP identifier, as mentioned above. The F field of the NALU header information is filled into the M field of the ERRTP header information, which is the first bit of the second byte of the ERRTP header information. At the receiving end, the M field of the ERRTP packet is used to judge whether the NALUs carried by the ERRTP packet are in error, which realizes the forbidden-bit function of the F field. In this scheme, the version value tells the receiver of the packet that the protocol is ERRTP, so the subsequent processing must follow the processing flow of the ERRTP protocol.

In this scheme, the NALU header information byte (8 bits) takes the place of the 1-bit M field and the 7-bit PT field (8 bits in total) of the original RTP header information. The specific replacement order can be as follows:

F bits replace M bits;

NRI 2 bits replace the highest 2 bits of the PT 7 bits;

Type 5 bits replace the lowest 5 bits of the PT 7 bits.

In FIG. 7, the coding-related information in the ERRTP header, such as FEC Type, FEC Subtype, and Packet Number, identifies the coding mode used and the multimedia data packets. The receiving end restores or partially restores the multimedia data according to this coding-related information.

Such a replacement is justified. The 7 PT bits are inherently free to use, as mentioned earlier. As for the M field, RTP (RFC 3550) provides that a specific profile may specify that the M bit is not used and is instead combined with the PT, so that the PT can have up to 8 bits and distinguish 256 different types. Therefore, replacing the M bit with the F bit is fully RTP-compliant and does not cause interworking problems between ERRTP and traditional RTP.

It is easy to see that the ERRTP encapsulation format of the present invention has three clear advantages. First, the overhead is small; especially when one RTP packet carries multiple NALUs, the number of transmitted bits is clearly reduced. Second, the relative importance of the NALUs can be determined without decoding the H.264 NALU data in the RTP packet. Third, without decoding the H.264 NALU data in the RTP packets, it can be determined whether the packets can still be decoded correctly in the presence of bit losses.

To describe the technical details of the present invention further, the process of ERRTP encapsulation and decapsulation is described below. After the above processing, the multiple H.264 NALUs in the same ERRTP packet have identical types, that is, identical header information bytes, so when they are partitioned, encoded, and encapsulated into the ERRTP packet, the original header bytes can be stripped off; with N NALUs, N bytes are saved. When decapsulating, the N NALUs are extracted and decoded from the ERRTP packets in which they are located and re-partitioned into their original form. Then the 7 bits of the PT in the ERRTP header information are copied to the lowest 7 bits of a byte H (8 bits), and the highest bit of H is set to 0 as the F bit. The generated byte H is prepended to each extracted NALU, thus restoring each NALU. Of course, if the F field in the ERRTP header is 1, the NALUs in the ERRTP packet are in error, so they can be discarded directly, saving processing time.
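The first scheme's bit mapping and the restoration step can be sketched as follows. The function names are hypothetical, and the sketch handles only the M and PT bits of the second header byte; partitioning and FEC decoding of the payload are assumed to have been done already.

```python
def nalu_header_to_m_pt(nalu_header):
    """Map the NALU header byte onto the M bit and the 7-bit PT field.

    F -> M; NRI -> highest 2 bits of PT; Type -> lowest 5 bits of PT.
    """
    m = (nalu_header >> 7) & 0x01
    pt = nalu_header & 0x7F
    return m, pt


def second_header_byte(m, pt):
    """Second byte of the header: M in the first bit, PT in the last 7."""
    return (m << 7) | (pt & 0x7F)


def restore_nalus(m, pt, stripped_nalus):
    """Receiver side: rebuild each NALU from its header-less body.

    If M (carrying F) is 1 the packet's NALUs are in error and dropped.
    """
    if m == 1:
        return []
    h = pt & 0x7F  # lowest 7 bits from PT, highest bit 0 as the F bit
    return [bytes([h]) + body for body in stripped_nalus]
```

Under this mapping the second header byte is bit-for-bit identical to the shared NALU header byte, which is why stripping the header from each of the N NALUs saves N bytes without losing information.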

The second scheme is given below. Like the first, it fills the NRI and Type fields of the NALU header into the 7 bits of the PT field of the ERRTP header. It differs in two respects: the M field is used to identify ERRTP, and, as a consequence, the F field has no place to go. In this embodiment, NALUs are therefore treated separately according to F: an erroneous NALU with F set is still transmitted using the original RTP, while a normal NALU is transmitted using ERRTP with the F bit ignored. The specific details are as follows.

The M field is set to 1 to identify the ERRTP packet; it is the first bit of the 2nd byte of the ERRTP header information. For the F bit, the H.264 protocol specifies that it is 1 if there is a syntax conflict or an error. When the network recognizes a bit error in the unit, it can set the bit to 1 so that the receiver drops the unit. The bit is mainly used to adapt to different kinds of network environments, such as combined wired and wireless environments. The usage principle is as follows. Generally, when the transmitting end and the receiving end of the communication perform H.264 encoding and decoding on the video, the encoding end performs no "write" operation on the bit, and the decoding end performs a "read" operation on it. If F=1 is found, the receiving end discards the NALU during decoding. In current industry practice, the "write" operation on the F bit is mainly performed at a gateway between two different networks, for example in the case of transcoding (MPEG-4 to H.264, H.263 to H.264, etc.).

Therefore, the present invention ignores the F bit and need not follow its original H.264 definition. Thus the M field, which was originally used to carry the F bit, is freed and is used to identify the ERRTP packet. The advantage is that the version information does not need to be modified: ERRTP still uses the original version value V = 2. This also preserves the only RTP version value still available, leaving it for future extensions that carry more information.

However, in practical applications there is a small probability that the F bit needs to be used, for example when the NALU syntax is wrong. The present invention handles this case as follows. In the ERRTP encapsulation format, the F field in the NALU header information is ignored; but at the transmitting end, an erroneous NALU whose F field is set is still encapsulated in an RTP packet, and only normal NALUs are encapsulated in ERRTP. At the receiving end, it is determined whether a packet is an ERRTP packet or an RTP packet, and the packet is processed according to the corresponding encapsulation format. That is, when the F bit is used in some special cases, it is used for the purpose of the original H.264 definition, namely to indicate a possible H.264 NALU syntax error. If an intermediate device such as a gateway, while encoding the video according to the H.264 protocol, finds that a certain NALU has a syntax error, that NALU is encapsulated separately.

The alternating ERRTP and RTP processing described above can be summarized as follows:

The sending end first determines, for at least one NALU, whether the F field in its header information is set, and accordingly classifies it as a normal NALU or an erroneous NALU;

Then, the normal NALU is encapsulated into an ERRTP packet according to the ERRTP encapsulation format, and the ERRTP identifier is set; the erroneous NALU is encapsulated into an RTP packet according to the RTP encapsulation format;

The receiving end first determines whether the ERRTP identifier is set in the header information of the received packet, and accordingly classifies the packet as an ERRTP packet or an RTP packet;

The ERRTP packet is then processed according to the ERRTP encapsulation format, and the RTP packet is processed according to the RTP packet encapsulation format.
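The four steps above can be sketched as a sender and receiver pair. This is a simplified illustration only: the names `sender_encapsulate` and `receiver_decapsulate` are hypothetical, the header is reduced to two bytes (V in byte 0, M and PT in byte 1) instead of the full RTP header, one NALU is placed in each packet for brevity, and the payload type 96 used for ordinary RTP packets is an assumption.

```python
RTP_VERSION = 2
PLAIN_RTP_PT = 96   # assumed dynamic payload type for ordinary RTP packets


def sender_encapsulate(nalu):
    """Wrap one NALU: ERRTP (M=1) if normal, ordinary RTP (M=0) if F is set."""
    header = nalu[0]
    byte0 = RTP_VERSION << 6
    if header & 0x80:                                 # F set: erroneous NALU
        return bytes([byte0, PLAIN_RTP_PT]) + nalu    # M=0, NALU kept intact
    pt7 = header & 0x7F                               # NRI + Type go into PT
    return bytes([byte0, 0x80 | pt7]) + nalu[1:]      # M=1, header byte stripped


def receiver_decapsulate(packet):
    """Return (kind, nalu) after checking the ERRTP identifier (M bit)."""
    m_and_pt = packet[1]
    if m_and_pt & 0x80:                               # ERRTP packet
        return "ERRTP", bytes([m_and_pt & 0x7F]) + packet[2:]
    return "RTP", packet[2:]                          # ordinary RTP packet
```

A normal NALU loses its header byte on the wire and gets it back from the PT field, while a NALU with F set travels unchanged in an ordinary RTP packet.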

It can be seen that, for normal NALUs, the gateway performs ERRTP encapsulation of H.264 NALUs of the same type by the method described above, according to certain rules (determined by the specific application, and mainly stipulating how many NALUs of the same type are encapsulated in each ERRTP packet). Once a syntax error is found in a NALU, regular RTP encapsulation is required for that NALU. In this case, the regular RTP packet may contain only one H.264 NALU.

Finally, it should be noted that, judging from the NALU types given in Table 1 above and the values of the corresponding Type field, there are fewer than 16 existing types, so the 5 bits of Type can be reduced to 4 without affecting existing H.264 transmission. Therefore, in the ERRTP encapsulation format, as long as there are fewer than 16 NALU types, only the lower 4 bits of the Type field are used, and the highest bit of Type becomes an extension reserved bit, called the C field. The C bit is left for later use in further function expansion. After bit C is reserved, the NALU types given in Table 1 should be modified accordingly: there are 16 values in total, of which values 0-12 are the same as in Table 1 and values 13-15 are reserved.

Of course, although H.264 currently has only 13 NALU types, H.264 will continue to develop and more NALU types may appear. If the number of NALU types grows beyond 16 in the future, the lowest 4 bits of the 7-bit PT field plus the C bit will again have to be used together as the type indication.
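The bit layout of the PT field with the reserved C bit can be illustrated with a short sketch. The helper names `split_pt` and `nalu_type` are hypothetical, and the `extended` flag is an assumed way of modeling the future case in which the C bit is again used as part of the type.

```python
def split_pt(pt7):
    """Split the 7-bit PT field into (NRI, C, 4-bit Type)."""
    nri = (pt7 >> 5) & 0x3     # 2 bits of NRI
    c = (pt7 >> 4) & 0x1       # highest bit of the old 5-bit Type, now reserved
    type4 = pt7 & 0xF          # lower 4 bits of Type (values 0-15)
    return nri, c, type4


def nalu_type(pt7, extended=False):
    """Return the NALU type; with extended=True the C bit is the 5th type bit."""
    _, c, type4 = split_pt(pt7)
    return (c << 4) | type4 if extended else type4
```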

It should be mentioned that the biggest advantage of integrating the NALU header information into the ERRTP header information is that a multimedia transmission device can learn the relevant information about the NALUs it carries directly from the ERRTP header information, and can accordingly implement QoS policies for the real-time delivery of H.264 multimedia data. This is not possible with existing RTP, because the RTP layer is not concerned with NALU-layer information and cannot know the header information of each NALU in the payload, so such a QoS policy cannot be implemented.

On the basis of ERRTP, in order to achieve feedback from the receiving end, an enhancement is described in which SEI carries the QoS report. As can be seen from the foregoing description, RTCP provides the QoS reporting mechanism, but it is actually a general reporting method that can be used to report QoS as well as other information. For specific video communication applications, reporting with RTCP is not necessarily the most appropriate choice. If both the sender and the receiver of the QoS information can communicate using a higher layer protocol such as H.264, then H.264 can be considered for carrying the reported content. Starting from this point, the present invention directly uses H.264 to carry the QoS report information, which avoids the use of an additional channel and implements an "in-band" reporting mechanism.

Another basis for transmitting QoS reports through the H.264 higher layer protocol is that, in current video communication applications, adaptation to network transmission is mainly performed by terminals rather than by network intermediate devices such as routers, switches or gateways. Therefore, the encapsulation of the QoS report need not depend on the underlying protocol. The terminal can interpret the QoS report information carried in H.264 to implement QoS monitoring, independently of underlying protocols such as RTCP. Of course, adopting the H.264 "in-band" reporting mechanism does not exclude the RTCP reporting mechanism. The two mechanisms can be used alternatively or together, and using H.264 can reduce the RTCP reporting traffic. In addition, if the H.264 "in-band" reporting mode is adopted, H.264 packets can be given various protection measures, and the H.264 packets carrying the QoS report, which can be regarded as important data, can be given high-strength protection according to the principle of Unequal Protection (UEP). The correct arrival of the report data can thereby be ensured, and the reliability of QoS monitoring is improved.

[Fourth Embodiment of the Invention]

Based on the third embodiment, the H.264-based extended message mechanism to carry QoS reports is roughly divided into the following three basic steps.

First, each multimedia communication terminal collects statistics and generates a QoS report for the H.264 multimedia communication. The content of these reports may be the same as the SR and RR report contents of RTCP, or may be different, but the information they describe, such as the quality of service and the network status related to the H.264 media communication, is consistent;

Then, the terminal carries the QoS report in an H.264 extended message and sends it to the other communication terminals. The H.264 extended message mechanism has been mentioned above; a typical example is SEI, and the present invention basically uses the SEI message. Later extensions of H.264 may also use other extended message payloads;

While sending its QoS report, the terminal also receives the QoS reports sent by other terminals. In effect, each terminal executes its QoS policy according to these QoS reports.

The present invention uses the SEI message to carry the QoS report. Taking the existing RTCP QoS report as an example, the main content of the RTCP SR and RR reports can be used directly as the payload of the H.264 SEI message, so that these reports are carried by the extended SEI message.

Based on this idea, in the fourth embodiment of the present invention, a specific SEI extended message is defined for carrying QoS reports. H.264 specifies that SEI information is stored in a class of NALUs with Type = 6, as described above. The invention stores report messages similar to the RTCP SR and RR in the SEI domain, which not only ensures transmission efficiency but also effectively feeds back the channel state and the decoding information, and facilitates interactive resistance to packet loss between the encoding end and the decoding end. The specific structure is shown in Figure 8: the header information is arranged according to the SEI message structure, and the remaining QoS report content follows the format of the RTCP SR and RR reports.

The header information of the SEI message used to carry the QoS report contains the following fields:

The first byte (byte 0) is the payload type field (SEI Type), which indicates that the payload is the corresponding QoS report. In this embodiment, SEI Type = 200 indicates that the content stored in the SEI domain is similar to the sender report (SR) in RTCP, and SEI Type = 201 indicates that it is a receiver report (RR);

The second and third bytes (bytes 1, 2) are the payload length field (SEI Packet-Length), which is used to indicate the corresponding QoS report length, which is the same as the length field in the RTCP QoS report;

The 4th byte and later are the payload of the SEI message, that is, used to fill the corresponding QoS report.

The QoS report is likewise divided into the sender report and the receiver report, which are distinguished by the payload type field, that is, by different SEI Type values. The specific content of the QoS report can be the same as the RTCP SR and RR reports, as shown in Figure 2:

The version information field (V), which is 2 bits, takes the binary value 11, that is, V = 3, to distinguish it from the previous version;

The padding field (P), which is 1 bit, is used to indicate whether there is padding content, the same as RTCP;

The reception report count field (RC), which is 5 bits, indicates the number of reception report blocks contained in the QoS report;

The sender SSRC field, which is 32 bits, is used to identify the sender of the quality of service report;

For the sender report, there is also a sender information block for describing the information about the sender of the report;

Next come a number of reception report blocks describing multimedia statistics from different sources; each block contains the identifier of the source and the related statistical indicators of the multimedia stream, whose meanings were described earlier for RTCP;

Finally, there is a profile-specific extension, reserved for extending functionality for a particular profile.
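The message layout listed above can be sketched as a small packing and parsing routine. This is an illustration under assumptions rather than the normative format: the names `build_qos_sei` and `parse_qos_sei` are hypothetical, the padding bit is fixed at 0, the sender information block and the reception report blocks are passed as opaque byte strings, and the length field is computed in the RTCP manner of RFC 3550 (size in 32-bit words minus one), which is one reading of "the same as the length field in the RTCP QoS report".

```python
import struct

SEI_TYPE_SR = 200   # sender report, as in this embodiment
SEI_TYPE_RR = 201   # receiver report, as in this embodiment


def build_qos_sei(sei_type, ssrc, report_blocks=(), sender_info=b""):
    """Pack SEI Type (1 byte), length (2 bytes), then V/P/RC, SSRC and blocks."""
    if sei_type not in (SEI_TYPE_SR, SEI_TYPE_RR):
        raise ValueError("unknown SEI Type")
    if len(report_blocks) > 31:
        raise ValueError("RC is only 5 bits wide")
    first = (3 << 6) | (0 << 5) | len(report_blocks)   # V=3, P=0, RC
    payload = bytes([first]) + struct.pack("!I", ssrc)
    payload += sender_info + b"".join(report_blocks)
    total = 3 + len(payload)
    if total % 4:
        raise ValueError("message must be a whole number of 32-bit words")
    return struct.pack("!BH", sei_type, total // 4 - 1) + payload


def parse_qos_sei(msg):
    """Read back the header fields of a message built by build_qos_sei."""
    sei_type, length = struct.unpack("!BH", msg[:3])
    first = msg[3]
    (ssrc,) = struct.unpack("!I", msg[4:8])
    return {
        "kind": "SR" if sei_type == SEI_TYPE_SR else "RR",
        "length": length,
        "version": first >> 6,
        "padding": (first >> 5) & 1,
        "rc": first & 0x1F,
        "ssrc": ssrc,
    }
```

With one 24-byte reception report block the message is 32 bytes long, so the length field holds 7.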

It can be seen that the content of the QoS report given in Figure 8 is basically the same as that of RTCP. After the basic content of the RTCP RR and SR is written into the SEI domain, the RTCP information can be transmitted without a dedicated logical channel, which saves part of the bandwidth overhead. In fact, the essence of the present invention lies in the in-band bearer using the SEI message. As for how the QoS report is statistically generated, as long as the purpose of QoS monitoring is achieved, the essence and scope of the present invention are not affected. Once the QoS report is implemented, various QoS policies can be carried out on this basis. For example, the cumulative packet loss field of RTCP can be used to feed back decoding information in two-way video communication (where each terminal has both an encoder and a decoder), facilitating interactive resistance to packet loss.

In addition, the QoS report contains fields such as interarrival jitter and sender byte count, which can be used to sense the network status. The rate control algorithm can use the information in the interarrival jitter field to keep the encoder output rate nearly constant; the sender byte count field can be used to estimate the average payload rate, so that the sending end can reset the encoder parameters according to the network state. This includes adjusting the target frame rate, the quality of the restored image, and the resolution of the original image.

In order to improve the reliability of QoS report transmission, once the H.264 "in-band" reporting mode is adopted, H.264 data packets can be given various protection measures. The H.264 data packets carrying QoS reports can be regarded as important data and, according to the principle of unequal protection, can be given high-strength protection. This ensures the correct arrival of the report data. For example, the SEI carrying the QoS report is itself carried by a NALU and, as described above, the NALU header information indicates the importance of the content. The communication terminal can therefore set the nal_ref_idc field of the NALU to 1, 2, 3, etc. according to the reliability required for QoS report transmission. In the fault-tolerant elastic coding, protection measures of different strength are taken according to the level of this field.

The communication terminal can also dynamically adjust the transmission period of the SEI-based QoS report according to the current network state and the requirements of the higher-layer application. By default, the interval for writing RTCP information to the SEI domain (that is, the reporting period) is the same as the RTCP transmission interval recommended in RFC 3550. Of course, depending on the needs of the particular application (the specific protection method, etc.), the reporting period need not be exactly the one specified in RFC 3550 and may be adjusted. The reporting period is determined by the needs of the specific application. For example, an important use of the report data is to dynamically estimate network performance: packet loss rate, latency, jitter, and so on. If these data need to be measured frequently, the reporting period should be short; otherwise it can be long. When the network is in good condition, reporting can be stopped. In addition, the SEI message can carry not only the QoS report of the H.264 video but also a mixture of QoS reports for multiple media streams. It is only necessary to append the reception report blocks of the other media streams, for example an audio stream, to the QoS report, as long as the report block content for the specific SSRC source is added to the SR report. As mentioned earlier, in addition to using SEI for in-band monitoring, the communication terminal may also select the existing RTCP transmission, and may use either or both of the H.264 extended message and RTCP.

Once SEI is used to feed back network-status-related QoS reports from the receiving end, adaptive protection policy adjustment, including multi-level protection and unequal protection, is easy to implement.

[Fifth Embodiment of the Invention]

In view of the prior-art problem that protection cannot be adaptively adjusted to the network communication conditions, the present invention provides an adaptive-protection video transmission method that estimates the current communication status and adaptively adjusts the protection policy. First, according to their effect on the performance of the protection method, different parameter configurations are given, and a multi-level protection strategy with different protection capabilities is set up, from which an efficient and reliable protection can be selected under different communication conditions. Second, the receiving end collects communication statistics on the network status and communication quality and sends them back to the sending end. Finally, the sending end selects the most appropriate protection policy level according to the returned communication quality statistics.

The key to the scheme again lies in the method of collecting communication quality statistics and in the channel for sending the statistics back. The packet loss rate and the positions of the losses can be obtained from gaps in the sequence numbers of the H.264 NALUs, and an extended SEI message structure in the payload part of the NALU is defined to carry the statistical information, so that the statistics are transmitted from the receiving end to the transmitting end. Although this feedback mechanism differs from the SR/RR format of the QoS report, those skilled in the art will understand that the fundamental principles of the two methods are the same and only the content carried by the SEI differs. The following description therefore does not specifically distinguish between the QoS reporting scheme proposed above and the other embodiment in which the SEI carries the network packet loss rate.

Taking the Tornado erasure code as an example, the video stream data is protected according to the encoding and decoding method of the aforementioned Tornado erasure code. A Tornado erasure code requires parameters such as the number of data nodes, the number of check nodes, the scaling ratio, the number of check node layers, and the bipartite graphs used to calculate the check nodes. During video stream communication, the transmitting end divides the video stream data into data nodes, generates check nodes according to the Tornado encoding method, and sends them to the receiving end together; the receiving end performs error correction according to the Tornado decoding method to obtain the video stream data.

Since the bandwidth and other properties of an actual IP network change constantly and are unstable, a fixed protection policy leads to problems such as low efficiency or a high bit error rate. Therefore, this embodiment presets a series of protection strategies with different levels of protection strength, which are used to protect the video stream data at different communication quality levels. Protection policies of different levels can adapt to changes in network communication quality: they can meet the protection requirements when the channel degrades, and they can appropriately reduce the protection strength when the channel improves, thereby reducing system overhead and saving processing and bandwidth resources.

In order to provide protection strategies of different levels, Tornado erasure codes with different parameters must be set. As noted above, the parameters affecting the protection performance of a Tornado erasure code are mainly the number of data nodes, the number of check nodes, and the random distribution of node degrees on the two sides of the bipartite graph. For simplicity, the bipartite graph is generally not varied between Tornado codes of different capability; instead, Tornado erasure code protection strategies with different protection strengths are obtained by using different numbers of data nodes and check nodes. According to the principle of the Tornado erasure code, different numbers of data nodes and check nodes determine Tornado erasure codes with different code rates or redundancy rates, and thus give different protection strengths and system overhead.

The receiving end receives the data and performs Tornado erasure code decoding to obtain the video stream data, and performs statistics according to the data loss situation, and obtains statistical information to represent the communication quality.

The sending end needs to adjust the protection policy according to the communication quality, so statistics on the transmission must be collected. The receiving end collects the transmission statistics from the sequence numbers of the NALUs of the H.264 video stream data. In H.264-based two-way video communication, each terminal of the communication system has both an encoder and a decoder. NALUs are sequence-numbered, that is, all NALUs sent by a sending end are numbered in a single unified sequence. The receiving end can therefore determine whether a NALU has been lost from the sequence numbers of the received NALUs. If the NALU sequence numbers are discontinuous, NALUs have been lost: the missing sequence numbers are the sequence numbers of the lost NALUs, and their count is the number of lost NALUs. After accumulation over a period of time, the total number of lost NALUs in the period can be calculated and then normalized by the number of all NALUs in the period to obtain the cumulative loss rate (ALSR). Of course, the receiving end can also send the packet loss information directly to the sending end and let the sending end compute the statistics. Using the NALU sequence numbers for statistics not only ensures that the statistics are accurate but also makes direct use of existing data, without additional bearer overhead.
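The loss counting described above can be sketched as follows. The function name `loss_statistics` is hypothetical, and the sketch rests on simplifying assumptions: sequence numbers do not wrap around within the period, and the first and last NALUs of the period are received, so that the total number of NALUs is the span between the smallest and largest sequence number seen.

```python
def loss_statistics(received_seq):
    """Return (lost sequence numbers, ALSR) for one statistics period."""
    if not received_seq:
        return [], 0.0
    seen = set(received_seq)
    first, last = min(seen), max(seen)
    # Gaps in the numbering are the sequence numbers of the lost NALUs.
    lost = [s for s in range(first, last + 1) if s not in seen]
    total = last - first + 1            # all NALUs in the period
    return lost, len(lost) / total      # normalized cumulative loss rate
```

For example, if NALUs 1, 2, 4 and 7 arrive, NALUs 3, 5 and 6 are reported lost and the ALSR is 3/7.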

The receiving end sends the statistics and other data loss information back to the sending end in the extended SEI message. After the transmission statistics have been collected at the receiving end, they must be sent back to the sending end. In this embodiment, an extended SEI message structure is defined specifically to carry the transmission statistics sent back from the receiving end. After completing the statistics, the receiving end writes the information into the specifically defined extended SEI message body, writes that message into the SEI field of the encoded code stream that the terminal sends in the reverse direction, and thereby sends it back to the transmitting end. After receiving the SEI message, the sending end can read the statistics directly or derive the ALSR, thereby establishing a mechanism that truly senses the packet loss rate of the network.

As mentioned above, the SEI message is also carried by the basic unit of the H.264 code stream, the NALU. Each SEI field contains one or more SEI messages, and each SEI message is composed of SEI header information and an SEI payload. The SEI header information includes two codewords: the payload type and the payload size. The length of the payload type is not fixed. For example, when the type is between 0 and 255 it is represented by one byte; when the type is between 256 and 511 it is represented by two bytes, 0xFF00 to 0xFFFE; and so on, so that the user can define any number of payload types. In the existing H.264 standard, types 0 to 18 have already been defined as specific information such as buffering period, picture timing, and the like. It can be seen that the SEI domain defined in H.264 can store ample user-defined information as required. In a first embodiment of the invention, an extended SEI message for carrying statistical information is defined in the reserved SEI payload types.
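The variable-length payload type can be sketched as below. The helper names are hypothetical, and the sketch follows the coding rule of the H.264 SEI message syntax, in which every leading 0xFF byte adds 255 and a final byte below 0xFF adds its own value. Under that rule a single byte covers the values 0 to 254 and the two-byte forms 0xFF00 to 0xFFFE cover 255 to 509, so the ranges quoted in the text above are approximate.

```python
def encode_payload_type(value):
    """Encode an SEI payload type as 0xFF bytes followed by a final byte."""
    if value < 0:
        raise ValueError("payload type must be non-negative")
    out = bytearray()
    while value >= 255:
        out.append(0xFF)        # each 0xFF byte stands for 255
        value -= 255
    out.append(value)           # final byte is always below 0xFF
    return bytes(out)


def decode_payload_type(data):
    """Return (payload type, number of bytes consumed)."""
    value, i = 0, 0
    while data[i] == 0xFF:
        value += 255
        i += 1
    return value + data[i], i + 1
```

The types 200 and 201 used for the QoS reports in the fourth embodiment fit in a single byte under this rule.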

Finally, the sending end adjusts the Tornado erasure code according to the statistics sent back and adopts a protection strategy better suited to the current transmission situation, that is, it selects the appropriate protection strategy level. To this end, the transmitting end presets a series of decision thresholds corresponding to the different protection levels, one threshold for entering each level, and then selects the level corresponding to the threshold interval in which the ALSR falls. The statistics, feedback, and adjustment mechanism thus established can adapt accurately and promptly to the network transmission conditions and improves the protection capability.

Different protection strategy series are used for data of different importance. Considering the different protection requirements of critical and non-critical data, and in order to further improve adaptability, two different protection strategy series are set up, one for critical data and one for non-critical data. In this way, data with the two different communication requirements can be processed independently, and the protection strategy is selected with the protection strength suitable for each requirement, thereby improving system efficiency.

For example, Tornado codes of different levels are used as the protection scheme series, and the protection capability level is characterized by the parameters $n$, $l$, where $n$ represents the number of data nodes and $l$ represents the number of check nodes. The Tornado code protection scheme determined by the parameters $n$, $l$ is denoted $TN(n+l,\, n)$. The protection scheme series for key data is then:

$$TN_K(n_0+l_0,\, n_0),\; TN_K(n_1+l_1,\, n_1),\; \ldots,\; TN_K(n_{L-1}+l_{L-1},\, n_{L-1});$$

Likewise, the protection scheme series for non-critical data is:

$$TN_{NK}(n_0+l_0,\, n_0),\; TN_{NK}(n_1+l_1,\, n_1),\; \ldots,\; TN_{NK}(n_{L-1}+l_{L-1},\, n_{L-1}).$$

Set the threshold series $0 < G_1 < G_2 < \cdots < G_{L-1} < 1$, which is used to decide the protection level. When the sending end adjusts the protection policy, the following operations are performed according to the relationship between the ALSR and the thresholds $G_1, G_2, \ldots, G_{L-1}$:

If $0 < ALSR < G_1$, $TN_K(n_0+l_0,\, n_0)$ is used to protect the key data, and $TN_{NK}(n_0+l_0,\, n_0)$ is used to protect the non-critical data;

If $G_i < ALSR < G_{i+1}$, $i = 1, 2, \ldots, L-2$, $TN_K(n_i+l_i,\, n_i)$ is used to protect the key data, and $TN_{NK}(n_i+l_i,\, n_i)$ is used to protect the non-critical data;

If $G_{L-1} < ALSR < 1$, $TN_K(n_{L-1}+l_{L-1},\, n_{L-1})$ is used to protect the key data, and $TN_{NK}(n_{L-1}+l_{L-1},\, n_{L-1})$ is used to protect the non-critical data.
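The threshold-based selection above can be sketched with a binary search over the threshold series. The names `select_level`, `choose_schemes` and `code_rate` are hypothetical, each scheme is modeled as a pair (n, l), and the handling of an ALSR exactly equal to a threshold (it is assigned to the higher level here) is an assumption, since the text only states strict inequalities.

```python
from bisect import bisect_right


def select_level(alsr, thresholds):
    """Return level i such that G_i < ALSR < G_{i+1}; level 0 is below G_1."""
    return bisect_right(thresholds, alsr)


def choose_schemes(alsr, thresholds, key_series, nonkey_series):
    """Pick the (n, l) pair for key and for non-critical data at one level."""
    i = select_level(alsr, thresholds)
    return key_series[i], nonkey_series[i]


def code_rate(n, l):
    """Code rate of TN(n + l, n): data nodes over all transmitted nodes."""
    return n / (n + l)
```

With L levels the threshold list holds L-1 sorted values, and each scheme series holds L entries, so every ALSR value maps to exactly one scheme in each series.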

In addition, the sending end retransmits data according to the lost-data information sent back by the receiving end. When the receiving end collects the lost NALU information, it obtains the positioning information of the image frame corresponding to each lost NALU, which includes the sequence number of the frame and the position within the frame. The receiving end sends the positioning information back to the sending end, and the sending end can locate the corresponding video stream data and resend it. In real-time video communication, video stream data with too long a delay has lost its value, but in some service scenarios or under certain mechanisms, data with a certain delay is still valuable. For example, in video communication with a large buffer, as long as the delayed video stream data still falls within the buffer, the data can be used to avoid interruption of the video stream playback. It can be seen that the retransmission mechanism is of considerable value for improving the reliability and quality of service of video stream communication.

[Sixth embodiment of the present invention]

In addition to adopting the fault-tolerant elastic protection strategy, the sixth embodiment builds on the fifth embodiment and addresses both error concealment and error diffusion elimination. By combining an error concealment strategy at the receiving end with an error diffusion elimination strategy at the transmitting end, it reduces the video quality loss caused by errors and prevents bit errors from spreading. For error concealment, slice substitution compensates for the error loss with the lowest complexity. For error diffusion elimination, an error information feedback mechanism is established over the existing H.264 channel, and intra-frame coding is applied according to the feedback, so that diffusion is eliminated without adding network load. This ensures the error robustness of the video bitstream and also avoids the error spread caused by error concealment.

The basic idea of the scheme is to find the lost data, such as the location of the lost slice, from the statistics of the NALU sequence numbers at the receiving end. On the one hand, an efficient algorithm simply substitutes for the lost data to conceal the error loss; on the other hand, the error information is fed back to the sending end. The extended SEI message of H.264 establishes an error information feedback channel from the receiving end to the transmitting end. Once the sending end knows the error information, it immediately adopts an intra-frame coding strategy, isolating the erroneous slice to prevent the error from spreading.

In the H.264 video communication process, the transmitting end encodes the video data to obtain a video stream, encapsulates it into NALUs, and transmits them to the receiving end in packets. The receiving end receives the packets and decodes them. At this point, the receiving end needs to determine whether video stream data has been lost, so that the subsequent error elimination operations can be performed. The error elimination process is roughly divided into three major steps: concealment, feedback, and diffusion elimination.

First, the receiving end judges whether data is lost according to the NALU sequence interruption condition, and counts the information of the lost data, that is, the error information. As mentioned earlier, NALU is the basic unit of H.264 video stream data transfer, and each NALU has a unique serial number. Therefore, the receiving end knows which NALUs are lost according to whether the NALU sequence number is interrupted. It is thus possible to implement an error concealment strategy for lost data. The NALU serial number is used for statistics, which not only ensures the accuracy of the statistical information, but also directly uses the existing data information, and does not require additional bearer overhead.

First, the receiving end learns the sequence number by reading the header information of each received NALU, and detects that an error has occurred when the sequence numbers are discontinuous. From the previous NALU it can determine the video data that the missing NALU should have carried, and thus locate the data loss caused by the error. For example, if the NALU preceding the lost NALU carried the first slice of the Nth frame, the position of the slice carried by the lost NALU can be inferred from the transmission order: it should be the next slice of the current frame.
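The inference in the example above can be sketched as a small helper. The function name `locate_lost_slice` is hypothetical, and the sketch assumes that each NALU carries exactly one slice, that every frame has the same known number of slices, that slices are sent in order, and that indices are zero-based; none of these assumptions is stated in the text.

```python
def locate_lost_slice(prev_frame, prev_slice, slices_per_frame):
    """Return (frame, slice) presumed carried by the NALU lost after the given one."""
    if prev_slice + 1 < slices_per_frame:
        return prev_frame, prev_slice + 1     # next slice of the same frame
    return prev_frame + 1, 0                  # first slice of the next frame
```

If several consecutive NALUs are lost, the same step can be repeated once per missing sequence number.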

Then, the receiving end needs to resynchronize with the video stream. Because the H.264 video code stream is transmitted continuously, the receiving end must be synchronized with the data stream before it can receive correctly, and once the data stream is interrupted the receiving end must resynchronize. The decoder resynchronizes by finding the header information of the next NALU after the interruption. After that, the receiving end performs error concealment. Since the lost NALU is gone, the entire slice it carried is lost. The error concealment strategy is to replace the lost data with data that is adjacent in the time domain or the spatial domain. For example, the lost slice is concealed by restoring its image data from the slice at the corresponding position in the frame preceding the frame in which the data was lost.
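The temporal substitution in the example above can be sketched as follows. The function name `conceal_lost_slices` is hypothetical, and a decoded frame is modeled simply as a list of per-slice image data in which a lost slice is marked `None`; a real decoder would operate on macroblock rows of its picture buffers.

```python
def conceal_lost_slices(prev_frame, cur_frame):
    """Replace each lost slice (None) with the co-located slice of the previous frame."""
    return [
        prev_slice if cur_slice is None else cur_slice
        for prev_slice, cur_slice in zip(prev_frame, cur_frame)
    ]
```

Because the substituted data is only an approximation, later predicted frames may inherit the mismatch, which is why the concealment is combined with feedback to the sending end.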

After collecting the error information, the receiving end feeds it back to the transmitting end. Feeding back the error information requires a feedback channel. To reduce the network burden and simplify the implementation, the first embodiment of the present invention uses the existing H.264 communication mechanism and defines an extended SEI message to carry the error information, thereby establishing the feedback channel so that the sender can use the error information to prevent the error from spreading. In fact, combining the error information feedback mechanism with the error diffusion elimination strategy at the transmitting end also avoids the error spread caused by the error concealment performed earlier at the receiving end.

In the foregoing embodiment, the extended SEI message of H.264 provides an information feedback mechanism from the receiving end to the transmitting end, so that the sending end learns in time which NALUs are lost and can eliminate error spreading effectively and promptly, preventing future error spread caused by the lost data.

The advantage of establishing an information feedback mechanism within the H.264 system is that it saves network bandwidth overhead, saves system processing resources, and does not affect interoperability. The extended SEI message is defined as follows. As mentioned above, the SEI message is also carried by the NALU, the basic unit of the H.264 code stream. Each SEI field contains one or more SEI messages, and each SEI message is composed of the SEI header information and the SEI payload. The SEI header information includes two codewords: payload type and payload size. The length of the payload type codeword is variable. For example, a type between 0 and 254 is represented by one byte (0x00 to 0xFE), a type between 255 and 509 is represented by two bytes (0xFF00 to 0xFFFE), and so on, so that the user can define any number of payload types. In the existing H.264 standard, types 0 to 18 have already been defined for specific information such as buffering period and picture timing. It can be seen that the SEI field defined in H.264 can hold enough user-defined information as required.
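The variable-length coding of the two SEI header codewords can be sketched as below. In the H.264 SEI syntax, each leading 0xFF byte adds 255 to the value and a final byte below 0xFF ends the codeword. The function names and the payload type value 200 used for the error report are assumptions made for this example only; emulation prevention and RBSP trailing bits are omitted.

```python
def encode_sei_value(value: int) -> bytes:
    """Encode a payload type or payload size as 0xFF bytes plus one final byte."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while value >= 255:
        out.append(0xFF)  # each 0xFF byte stands for 255
        value -= 255
    out.append(value)     # final byte is always below 0xFF
    return bytes(out)


def decode_sei_value(data: bytes, pos: int = 0):
    """Decode one codeword starting at pos; return (value, next position)."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    value += data[pos]
    return value, pos + 1


def build_sei_message(payload_type: int, payload: bytes) -> bytes:
    """Concatenate payload type, payload size and payload (no emulation prevention)."""
    return encode_sei_value(payload_type) + encode_sei_value(len(payload)) + payload


# Usage: a hypothetical user-defined payload type 200 carrying a 3-byte error report.
message = build_sei_message(200, b"abc")
```
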

Then, the transmitting end starts error diffusion elimination according to the fed-back error information. Error diffusion elimination that uses the error information works better than existing error diffusion elimination without feedback. Using the error information, such as the location of the lost slice, the sender can take targeted precautions, for example by avoiding the lost slice as a reference in later encoding, which minimizes the receiver's dependence on that slice when decoding.

Since H.264 encoding is slice-based, that is, the same slice in preceding and succeeding frames is linked by reference and the slice data of a subsequent frame is predictively encoded from the same slice of the previous frame, error diffusion is also confined within the same slice. The second embodiment of the present invention performs intra-frame coding in stages: after a transmission error, the slice region of subsequent frames is segmented into new slices, for example P macroblocks are split off as a new slice, which is then intra-coded to eliminate its reference to, or dependency on, the previously lost slice. To ensure transmission quality, the H.264 real-time video transmission system uses a data rate control scheme that limits the fluctuation of each frame's data, equalizing the amount of data per frame and improving the stability of video transmission. Therefore, the amount of data intra-coded at one time in each frame, that is, the number of macroblocks, cannot be too large, or it will exceed the H.264 data rate control range.

Figure 9 shows the principle of error spread elimination by segmented successive intra coding. When the receiving end cannot recover from a packet loss, it detects the error information and feeds it back to the transmitting end; that is, the frame containing the lost slice and the intra-frame positioning information are sent back to the transmitting end through the extended SEI message. The sender extracts the location of the missing slice from the SEI message. For example, each frame in FIG. 9 is divided into three slices, Slice #0, Slice #1, and Slice #2, and Slice #1 of the nth frame is lost in transmission, so segmented successive intra coding is required.

First, in the nth frame, the encoding end splits P macroblocks, counted from the starting position in macroblock scanning order, into a new Slice #3; the remaining macroblocks still belong to Slice #1. There are now four slices, of which the new Slice #3 is intra-coded.

Next, in the (n+1)th frame, Slice #3, which was split off in the previous step, is intra-coded and then transmitted as Slice #3, while the other slices are still encoded as usual.

After that, it is determined whether Slice #1 still has macroblocks that have not been split off. If so, return to the first step: in the next frame, continue to split the remaining macroblocks of Slice #1 into a new slice, intra-code it, and send it, until all macroblocks have been processed.

The number of macroblocks P split off each time should be as large as possible, to reduce the number of divisions, the processing delay, and the range of influence, while still satisfying the aforementioned H.264 data rate control range. P may differ from one division to the next, but the last division must leave every macroblock of the lost slice processed. For example, suppose one frame of video stream data consists of 240 macroblocks, initially divided into slices of 80 macroblocks each: macroblocks 1 to 80 are Slice #0, macroblocks 81 to 160 are Slice #1, and macroblocks 161 to 240 are Slice #2. From the data rate calculation, the appropriate segment size is determined to be P = 12 macroblocks. Then, after Slice #1 is lost in the nth frame, its 80 macroblocks are intra-coded in successive segments. First, the first 12 macroblocks are intra-coded in the (n+1)th frame to form Slice #3. In the (n+2)th frame, Slice #3 can then use conventional predictive coding, and the next 12 macroblocks are intra-coded to form Slice #4. This continues until the (n+7)th frame, where the last 8 remaining macroblocks are intra-coded to form Slice #9, which completes the error diffusion elimination by segmented successive intra coding.
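The segmentation schedule in this example can be reproduced with a short sketch. The function name, the tuple layout, and the use of a constant segment size are assumptions for illustration; the text allows the segment size to vary between divisions.

```python
def segmented_intra_schedule(lost_slice_mbs, segment_size, first_new_slice_id,
                             first_frame_offset=1):
    """Plan which macroblocks of a lost slice are intra-coded in which frame.

    Returns a list of (frame_offset, new_slice_id, first_mb, mb_count) tuples,
    where frame_offset is counted from the frame n in which the slice was lost
    and first_mb is the 0-based position inside the lost slice.
    """
    schedule = []
    start = 0
    step = 0
    while start < lost_slice_mbs:
        # The last segment takes whatever is left, so every macroblock is covered.
        count = min(segment_size, lost_slice_mbs - start)
        schedule.append((first_frame_offset + step, first_new_slice_id + step,
                         start, count))
        start += count
        step += 1
    return schedule


# Usage: 80 macroblocks in the lost Slice #1, P = 12, new slices numbered from 3.
plan = segmented_intra_schedule(80, 12, 3)
for frame_offset, slice_id, first_mb, count in plan:
    print(f"frame n+{frame_offset}: Slice #{slice_id}, "
          f"{count} macroblocks starting at {first_mb}")
```

With these inputs the plan has seven entries, running from Slice #3 in frame n+1 to Slice #9 (8 macroblocks) in frame n+7, which matches the example in the text.
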

The experimental results show that the error concealment and error diffusion elimination method of the present invention is very effective in terms of the resulting video image.

[Seventh Embodiment of the Invention]

Finally, an improved Tornado coding scheme is proposed, which is used as a fault-tolerant elastic protection strategy. The following briefly describes the main differences between this Tornado coding scheme and the traditional coding scheme.

When Tornado codes are used to protect data transmission, setting multiple check node layers strengthens the protection capability to a certain extent, but it also makes the Tornado code computationally expensive, so the data incurs a long delay during transmission protection. If the number of check node layers can be reduced while ensuring that the data transmission protection capability does not drop significantly, the computation of the Tornado code is effectively reduced and the delay in the data transfer process is greatly shortened, yielding a higher performance-to-cost ratio for data transfer protection. Therefore, the seventh embodiment of the present invention sets an erasure code having only one check node layer and performs data transfer protection based on that erasure code.

This Tornado erasure code scheme has only one check node layer: the intermediate check node layers of the Tornado code are removed, and so is the inherent requirement that the last check node layer of the Tornado code be generated by a Reed-Solomon code. Thus, the erasure code of the present invention has only one data node layer and one check node layer, as shown in FIG. 10. It can be said that the erasure code of the present invention is a structurally simplified Tornado code, that is, an improved Tornado code. The data node size L1, the number n of data nodes in the data node layer, and the number L of check nodes in the check node layer of the improved Tornado code can be determined according to actual needs, based on factors such as the data transmission rate, the data type (such as audio data or video data), the required data protection capability, and the maximum acceptable network delay.
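A single-layer erasure code of this kind can be illustrated with the sketch below. It assumes, as in Tornado codes generally, that each check node is the XOR of the data nodes it is connected to and that lost data nodes are recovered by repeatedly using any check node with exactly one missing neighbor. The connection graph and all names are hypothetical; the actual graph of FIG. 10 is not reproduced here.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_checks(data_nodes, neighbors):
    """Build one check node per neighbor list as the XOR of its data nodes."""
    checks = []
    for indexes in neighbors:
        acc = bytes(len(data_nodes[0]))  # all-zero node of the same size
        for i in indexes:
            acc = xor_bytes(acc, data_nodes[i])
        checks.append(acc)
    return checks


def recover(data_nodes, checks, neighbors):
    """Fill in lost data nodes (None) where the check nodes allow it."""
    data = list(data_nodes)
    progress = True
    while progress:
        progress = False
        for check, indexes in zip(checks, neighbors):
            if check is None:
                continue  # this check node was itself lost
            missing = [i for i in indexes if data[i] is None]
            if len(missing) != 1:
                continue  # nothing to do, or not yet solvable
            acc = check
            for i in indexes:
                if data[i] is not None:
                    acc = xor_bytes(acc, data[i])
            data[missing[0]] = acc
            progress = True
    return data


# Usage: 4 data nodes, 4 check nodes on a hypothetical graph; nodes 1 and 2 are lost.
original = [b"ab", b"cd", b"ef", b"gh"]
graph = [[0, 1], [1, 2], [2, 3], [0, 3]]
check_nodes = encode_checks(original, graph)
restored = recover([original[0], None, None, original[3]], check_nodes, graph)
```

Nodes that cannot be resolved by any check stay None, which corresponds to partial recovery.
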

If the prior-art Tornado code has m intermediate check node layers, the scaling factor of the number of nodes between two adjacent layers from the data node layer to the mth intermediate layer is $\beta$, and the scaling factor of the number of nodes between the last layer and the mth layer is $\beta/(1-\beta)$, then the total number of nodes $\mathrm{Total\_Node}$ of the prior-art Tornado code is:

$$\mathrm{Total\_Node} = n + n\beta + n\beta^{2} + \cdots + n\beta^{m} + \frac{n\beta^{m+1}}{1-\beta} = \frac{n}{1-\beta}$$

Since $\mathrm{Total\_Node} = n + L$, the setting of L is constrained: $L = [\beta/(1-\beta)]\,n$, and L cannot be set arbitrarily. Because the number of nodes in each layer of the Tornado code must be an integer, $n\beta, n\beta^{2}, \ldots, n\beta^{m}$ and $n[\beta^{m+1}/(1-\beta)]$ are all required to be integers. This condition is called the implicit integer node number condition. According to this condition, once m and $\beta$ are given for the Tornado code, the condition that n must satisfy can be calculated. For example, when m=3 and $\beta = 1/2$, n=16k can be calculated, where k is an arbitrary natural number, so the minimum value that n can take is 16. The code rate r of the prior-art Tornado code is $r = n/(n+L) = 1-\beta = 1/2$, and the redundancy rate is $1 - r = \beta = 1/2$.

Because the improved Tornado code of the present invention has no intermediate check node layer, the implicit integer node number condition no longer applies to it. The number of check nodes L in its check node layer is $L = \beta n$, where the scaling factor $\beta$ between the number of nodes in the data node layer and in the check node layer can be set arbitrarily; given the number n of data nodes, L can be set flexibly.

The code rate r of the improved Tornado code of the present invention is $r = 1/(1+\beta)$, and its redundancy rate is $1 - r = \beta/(1+\beta)$.

The improved Tornado code of the present invention can be expressed as TN(n+L, n). For example, TN(30, 20) indicates that the number of data nodes in the data node layer is n=20 and the number of check nodes in the check node layer is L=10. In this case, $\beta = L/n = 10/20 = 1/2$ and the code rate is $r = 2/3 \approx 66.7\%$.
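The TN(30, 20) figures can be checked with exact fractions. The function name below is introduced only for this illustration.

```python
from fractions import Fraction


def improved_tornado_parameters(n: int, L: int):
    """Return (beta, code rate r, redundancy rate 1 - r) for TN(n + L, n)."""
    beta = Fraction(L, n)   # ratio of check nodes to data nodes
    r = 1 / (1 + beta)      # equals n / (n + L)
    return beta, r, 1 - r


# Usage: TN(30, 20), that is n = 20 data nodes and L = 10 check nodes.
beta, rate, redundancy = improved_tornado_parameters(20, 10)
print(beta, rate, redundancy)
```
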

In summary, the present invention integrates the above six enhancement technologies, modularizes the entire H.264/ERRTP transmission architecture, and combines them on one protocol stack, so that they not only deliver their respective advantages but also reinforce one another, providing better reliability and quality of service.

It will be understood by those skilled in the art that the description of the above embodiments, the specific implementation details and the parameter selections, etc., may be determined according to the specific application without affecting the essence and scope of the invention.

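Although the present invention has been illustrated and described with reference to its preferred embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention.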

Claims

1. A multimedia communication method, comprising:

2. The multimedia communication method according to claim 1, further comprising:

The receiving end collects statistics on the communication quality, generates a quality of service report, and sends it back to the sending end;

The sending end adjusts the fault tolerant elastic protection policy according to the quality of service report.

3. The multimedia communication method according to claim 2, further comprising:

4. The multimedia communication method according to claim 1, wherein the real-time transmission protocol header information carries coding-related information, and the receiving end recovers or partially recovers the multimedia data according to the coding-related information.

5. The multimedia communication method according to claim 1, wherein the encoded multimedia data is classified into two types: a data node and a check node.

6. The multimedia communication method according to claim 4, wherein the transmitting end selects a forward error correction coding mode according to a current network transmission status and/or a quality of service level of the multimedia data to be transmitted,

The quality of service level of the multimedia data to be transmitted depends on the relative importance of different data.

7. The multimedia communication method according to claim 6, wherein the real-time transport protocol header information includes:

a real-time transport protocol identification field, used to indicate the distinction from a real-time transport protocol;

a forward error correction coding type field, used to indicate the forward error correction coding type to be used;

a forward error correction coding subtype field, used to indicate the related parameter settings of the forward error correction coding mode;

a packet length field, used to indicate the length of a node obtained after performing forward error correction coding on the multimedia data;

a number-of-packets field, used to indicate the number of said nodes carried by the real-time transport protocol packet.

8. The multimedia communication method according to claim …, wherein the transmitting end divides the H.264 network abstraction layer unit into at least one data node of equal length, and then performs forward error correction coding to obtain at least one check node;

The sending end encapsulates the data node and the check node in at least one fault tolerant elastic real-time transport protocol packet for sending;

After receiving the real-time transport protocol packet, the receiving end decapsulates the data node and the check node;

If the data node is lost during the transmission, the receiving end recovers or partially recovers the lost data node from the check node by forward error correction decoding, and then separates out the H.264 network abstraction layer unit.

9. The multimedia communication method according to claim 8, wherein the step in which the transmitting end encapsulates the data node and the check node in at least one of the fault tolerant elastic real-time transport protocol packets for transmission further includes the step:

The transmitting end and the receiving end negotiate to determine, for each forward error correction coding type, the set of correspondences between the values of the forward error correction coding subtype field and the related parameters of the forward error correction coding that those values indicate.

10. The multimedia communication method according to claim 9, wherein the transmitting end and the receiving end both establish a correspondence table according to the correspondence indicated by the forward error correction coding subtype field, for querying the corresponding forward error correction coding or forward error correction decoding processing module by the forward error correction coding type field and the forward error correction coding subtype field; the transmitting end invokes the corresponding forward error correction coding processing module to perform forward error correction coding; the receiving end invokes the corresponding forward error correction decoding processing module to perform forward error correction decoding.

11. The multimedia communication method according to claim 10, wherein the transmitting end determines the quality of service level according to the relative importance indicated by the network abstraction layer reference identifier in the header information of the H.264 network abstraction layer unit, and then selects the forward error correction coding mode to determine the forward error correction coding type field and the forward error correction coding subtype field.

12. The multimedia communication method according to claim 10, wherein the transmitting end evaluates the network transmission status according to the transmission report fed back by the receiving end, and then selects the forward error correction coding mode to determine the forward error correction coding type field and the forward error correction coding subtype field.

13. The multimedia communication method according to claim 8, wherein the transmitting end removes the header information from at least one network abstraction layer unit having the same header information, combines the units, and then divides, encodes, and encapsulates them into the fault-tolerant elastic real-time transport protocol packet, and integrates the shared header information of the network abstraction layer units into the header information of the fault-tolerant elastic real-time transport protocol packet;

The receiving end extracts the carried header information from the header information of the received fault-tolerant elastic real-time transport protocol packet and adds it to the head of each header-stripped network abstraction layer unit extracted from that packet to obtain a complete network abstraction layer unit; if there is a transmission error, forward error correction decoding is performed according to the preset strategy to recover or partially recover the data node, and then the network abstraction layer unit is extracted therefrom.

14. The multimedia communication method according to claim 13, wherein, in the fault tolerant elastic real-time transmission protocol header information, the network abstraction layer reference identifier field and the type field in the network abstraction layer unit header information are filled in the payload type field of the fault-tolerant elastic real-time transport protocol header information.

15. The multimedia communication method according to claim 14, wherein the fault tolerant elastic real time transmission protocol identification field is a version information field of the fault tolerant elastic real time transmission protocol header information.

16. The multimedia communication method according to claim 15, wherein in the fault tolerant elastic real-time transport protocol encapsulation format, the forbidden bit field in the network abstraction layer unit header information is filled in a tag field of the fault tolerant elastic real-time transport protocol header information;

The receiving end determines, according to the tag field of the fault-tolerant elastic real-time transport protocol packet, whether the network abstraction layer unit it carries is in error.

17. The multimedia communication method according to claim 14, wherein the fault-tolerant elastic real-time transmission protocol identifier is a value of a tag field of the fault-tolerant elastic real-time transmission protocol header information, and the tag field is located in the fault-tolerant elastic real-time transfer protocol header information.

18. The multimedia communication method according to claim 17, wherein the transmitting end first determines whether the forbidden bit field in the header information of at least one of the network abstraction layer units is valid, and accordingly classifies the units into normal network abstraction layer units and erroneous network abstraction layer units;

Then, the normal network abstraction layer unit is encapsulated into the fault-tolerant elastic real-time transport protocol packet according to the fault-tolerant elastic real-time transport protocol encapsulation format, and the fault-tolerant elastic real-time transport protocol identifier is set;

Encapsulating the error network abstraction layer unit into the real-time transport protocol packet according to the real-time transport protocol encapsulation format;

The receiving end first determines whether the header information of the received packet is set to the fault-tolerant elastic real-time transport protocol identifier, and accordingly classifies the packet as a fault-tolerant elastic real-time transport protocol packet or a real-time transport protocol packet;

The fault tolerant elastic real-time transport protocol packet is then processed according to the fault tolerant elastic real-time transport protocol encapsulation format, and the real-time transport protocol packet is processed according to the real-time transport protocol packet encapsulation format.

19. The multimedia communication method according to claim 8, wherein the receiving end collects statistics and generates the quality of service report;

The receiving end carries the quality of service report by using an H.264 extended message, and sends the quality of service report to the sending end.

20. The multimedia communication method according to claim 19, wherein the H.264 extension message is supplemental enhancement information;

The supplemental enhancement information includes:

a payload type field for indicating that the payload is a corresponding quality of service report;

a payload length field, used to indicate a corresponding quality of service report length;

a payload, used to carry the corresponding quality of service report.

21. The multimedia communication method according to claim 20, wherein the quality of service report is divided into a sender report and a receiver report, and is indicated by the payload type field;

When the quality of service report is filled in the payload of the supplemental enhancement information, the payload of the supplemental enhancement information includes a version information field, a padding field, a received report number field, and a sender synchronization source identifier field;

When the quality of service report is a sender report, the payload further includes a sender information block, used to describe related information of the sender of the quality of service report;

The payload includes at least one reception report block, used to describe multimedia statistics from different sources;

The payload contains a specific-level extension, reserved for extensions at specific levels.

22. The multimedia communication method according to claim 20, wherein the supplemental enhancement information carrying the quality of service report is further carried by a network abstraction layer unit;

The communication terminal sets the network abstraction layer reference identifier of the network abstraction layer unit according to the reliability requirement of the quality of service report transmission.

23. The multimedia communication method according to claim 20, wherein the communication terminal dynamically adjusts a period of statistical generation and transmission of the quality of service report according to a current network state and a high-level application requirement.

24. The multimedia communication method according to claim 19, wherein the receiving end counts the lost network abstraction layer units according to the network abstraction layer unit sequence numbers of the received video stream data, generates the quality of service report, and sends it back to the sender;

The sending end calculates the cumulative packet loss rate according to the sequence numbers of the lost network abstraction layer units, and adjusts the fault-tolerant elastic protection strategy accordingly.

25. The multimedia communication method according to claim 24, wherein the receiving end analyzes and calculates a network condition parameter according to the received quality of service report; the parameter includes end-to-end instantaneous bandwidth, delay, and jitter.

26. The multimedia communication method according to claim 25, wherein the transmitting end sets a series of fault-tolerant elastic protection policies of different levels, and selects the corresponding fault-tolerant elastic protection policy according to the quality of service report.

27. The multimedia communication method according to claim 26, wherein the receiving end obtains the positioning information of the lost video stream data according to the network abstraction layer unit sequence numbers of the received video stream data, and sends it back to the sender;

The sending end sends the lost video stream data to the receiving end according to the positioning information of the lost video stream data.

28. The multimedia communication method according to claim 8, wherein the transmitting end obtains the positioning information of the lost slice according to the transmission error information, and performs segmented successive intra coding on the lost slice to implement the error diffusion elimination strategy.

29. The multimedia communication method according to claim 28, wherein the segmented successive intra coding comprises the following steps:

Splitting a set of consecutive macroblocks from the lost slice to form a new slice, the remaining macroblocks still belonging to the lost slice;

The new slice is intra-coded and transmitted at the next frame, after which the new slice is conventionally encoded.

30. The multimedia communication method according to claim 8, wherein the receiving end detects a transmission error and collects the transmission error information;

The receiving end performs video information resynchronization after a transmission error occurs;

The receiving end implements the error concealment policy according to the transmission error information.

31. The multimedia communication method according to claim 30, wherein the receiving end detects transmission errors and collects the transmission error information according to discontinuities in the network abstraction layer unit sequence numbers.

32. The multimedia communication method according to claim 31, wherein the receiving end obtains the positioning information of the lost slice according to the interruption of the sequence numbers of the network abstraction layer units, where the positioning information includes the number of the frame in which the lost slice is located and the position of the lost slice within that frame.

33. The multimedia communication method according to claim 35, wherein the error concealment policy comprises the step of: the receiving end replacing the lost slice with the corresponding slice of the frame preceding the frame in which the lost slice is located.

34. The multimedia communication method according to any one of claims 8 to 33, wherein the fault tolerant elastic coding scheme includes an improved "Tornado" erasure code; in the improved "Tornado" erasure code, a set of said data nodes generates only one layer of said check nodes.

35. A multimedia communication terminal having a basic function module for implementing multimedia communication, comprising a codec module for implementing a multimedia codec function, wherein the terminal further comprises the following modules:

36. The multimedia communication terminal according to claim 35, further comprising the following modules:

a forward error correction module, configured to implement at least one forward error correction protection method and to maintain related parameters of the forward error correction protection method, wherein the protection method and policy negotiation module controls the forward error correction module to implement unequal protection and adaptive hierarchical protection functions, and the fault-tolerant elastic real-time transport control protocol module implements fault-tolerant elastic protection and error correction functions by calling the forward error correction module.

37. The multimedia communication terminal according to claim 36, further comprising:

an error concealment module, used to implement the error concealment function;

38. The multimedia communication terminal according to claim 37, further comprising:

39. The multimedia communication terminal according to claim 37, wherein the transmission layer is based on the fault tolerant elastic real-time transmission protocol/real-time transmission control protocol for implementing a multimedia transmission function supporting error resilience;

The application protocol layer includes a protection mechanism and a policy negotiation sublayer for implementing hierarchical protection and unequal protection functions;

The H.264 video coding layer includes a supplementary enhanced message extension reporting layer for implementing a reporting function based on supplementary enhanced message extension;

The H.264 network abstraction layer includes a forward error correction coding layer for implementing forward error correction coding.

40. The multimedia communication terminal according to any one of claims 35 to 39, wherein the basic function module for implementing multimedia communication comprises one of the following or any combination thereof:

a main control module, used to control the entire terminal;

a user interface module, used for user input and output interaction and the display of information;

a network communication module, used to communicate with the network and provide a lower-layer transmission channel;

an input/output and underlying driver module, used to drive the hardware devices;

a service module, used to implement high-level services;

a communication process control module, used to control the communication process;

an application protocol module, used to implement application protocol functions;

a transport control protocol module, used to implement transport control protocol functions;

a network abstraction layer module, used to implement network abstraction layer functions;

an audio codec module, used to implement the audio codec function.